Hai Tran



I own a dataset that contains detailed data on board committees (all committees), committee memberships, and director bios for every U.S. public company (including non-S&P 1500 firms) from 1994 to 2014. Please contact me if you have research projects that could use this dataset. Here are some sample data files: committee memberships and committee descriptions. I use this dataset extensively in my job market paper on director awards and the market for directorships, as well as in my current working paper on the concentration of power within boards and its impact on firm performance.



For our joint project on conflicts of interest in mutual fund management, my co-authors and I hand-collected a unique dataset on mutual fund managers, their ownership in the funds they manage, and the number of other investment companies or accounts they manage. This dataset is also used in my joint research project on the impact of the readability of mutual fund prospectuses on fund flows.



Both of the unique datasets discussed above were hand-collected from SEC filings by a team of research assistants. All data are verified and corrected in a second review by an independent research assistant. For small-scale data collection projects, public companies’ filings can be queried through the SEC EDGAR search engine at http://www.sec.gov/cgi-bin/srch-edgar. Definitive proxy statements, which contain data on current directors as well as directors nominated for election, are filed as Form DEF 14A. Mutual fund prospectuses, which contain information on fund managers in the Statement of Additional Information, are filed as Form 485BPOS.

This process, however, is time-consuming. For each filing, the data collector needs to run a separate query, generate a list of results, and choose the correct filing. For large-scale projects, it is much easier to obtain URL links to company filings from the SEC index and match these links to the corresponding company-year in the Compustat database, or the corresponding fund-year in the CRSP database. The following tips may be useful for researchers planning to hand-collect data from SEC filings:
1. The SEC provides a comprehensive index of all company filings at https://www.sec.gov/Archives/edgar/full-index/. Open the folder for the year you wish to collect data on, download the file company.idx for each of the four quarters, and save the files in a local folder. File names should follow the format YYYY_Q_company.idx (for example, 2001_1_company.idx for the first quarter of 2001).
2. You can use the SAS code in this file to import the SEC index into SAS formatted data.
3. To merge with Compustat, you can use the CIK-gvkey mapping provided by the Compustat database. Please note that this mapping contains the current links and not historical links. Manual checks are necessary for companies that have been acquired or merged with other firms.
4. To merge with CRSP mutual funds, you will need to match CRSP fund names to SEC fund names for filings prior to 2006. From 2006 onward, you can match on fund tickers (required by the SEC after February 2006). This process is much more difficult than merging with Compustat. Please contact me via email if you have additional questions regarding this step.
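
For readers who do not use SAS, steps 1 to 3 can be sketched in Python. This is only a minimal illustration, not the SAS code referenced above. It assumes that each quarterly index sits under a YYYY/QTRn subfolder of the full-index directory, that company.idx is a fixed-width file whose data columns line up with its header row, and that the File Name column is a path relative to https://www.sec.gov/Archives/. The names Filing, index_url, local_name, parse_index, and attach_gvkey are hypothetical helpers, and cik_to_gvkey stands in for whatever CIK-gvkey mapping you export from Compustat.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Tuple

EDGAR_INDEX = "https://www.sec.gov/Archives/edgar/full-index"
# Header labels of company.idx (assumed to be aligned with the data columns).
COLUMNS = ("Company Name", "Form Type", "CIK", "Date Filed", "File Name")


class Filing(NamedTuple):
    company: str
    form_type: str
    cik: int
    date_filed: str
    url: str


def index_url(year: int, quarter: int) -> str:
    """URL of company.idx for one quarter (assumes a YYYY/QTRn folder layout)."""
    return f"{EDGAR_INDEX}/{year}/QTR{quarter}/company.idx"


def local_name(year: int, quarter: int) -> str:
    """Local file name following the YYYY_Q_company.idx convention."""
    return f"{year}_{quarter}_company.idx"


def parse_index(lines: Iterable[str], form_type: str) -> Iterator[Filing]:
    """Yield the filings of one form type from the lines of a company.idx file."""
    starts = None
    for line in lines:
        line = line.rstrip("\n")
        if starts is None:
            # Read column offsets from the header row instead of hard-coding them.
            if line.startswith(COLUMNS[0]):
                starts = [line.index(label) for label in COLUMNS]
            continue
        if not line.strip() or set(line.strip()) == {"-"}:
            continue  # skip blank lines and the dashed rule under the header
        bounds = starts + [None]
        fields = [line[a:b].strip() for a, b in zip(bounds, bounds[1:])]
        if fields[1] == form_type:
            yield Filing(fields[0], fields[1], int(fields[2]), fields[3],
                         "https://www.sec.gov/Archives/" + fields[4])


def attach_gvkey(
    filings: Iterable[Filing], cik_to_gvkey: Dict[int, str]
) -> List[Tuple[Filing, Optional[str]]]:
    """Pair each filing with a gvkey; None marks a CIK that needs a manual check."""
    return [(f, cik_to_gvkey.get(f.cik)) for f in filings]
```

For example, after saving the file from index_url(2001, 1) as local_name(2001, 1), opening it and passing the file object to parse_index(f, "DEF 14A") would list that quarter's proxy statements with their URLs. Filings that attach_gvkey pairs with None are the ones to check by hand, which is where acquired or merged companies tend to show up.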