Resources for Programming and manipulating financial datasets

This page provides program routines (mostly in SAS) that I have used, however it is neither a primer on SAS programming nor is it a manual. Rather, it provides specific links to set of tools that could help in manipulating or computing/creating datasets that are used in finance and accounting research. I provide the links from publicly available sources and also upload some of the programs that I might have written from scratch or edited from an existing program. I cannot guarantee the veracity of such programs. However, I make sure to test the programs or the datasets before I update here. I hope these resources would be helpful . Should you find this page useful or have any comments please feel free to email me

Difference in shares outstanding between CRSP and COMPUSTAT.

CRSP and COMPUSTAT are the most commonly used databases by a financial researcher. There is often confusion between variables under either of these dtatabases

This article provides a detailed description of calculating market value of equity with CRSP and Compustat database.

Fixed Effects regression

Everyday life of a researcher in corporate finance and asset pricing involves estimating fixed effects regressions on panel datasets. Here is a great resource by Gormley and Masta (RFS, 2014)on how to estimate the fixed effects regressions. They also provide SAS/STATA commands and a detailed review (PhD lecture slides). See also this interesting article.

Wondering how to interpret the results from 2SLS/3SLS , IV regressions ?

Here is a nice article by STATA corp, explaining how to overcome the missing/negative R

Probability of Informed Trading: Estimating PIN

Prof. Stephen Brown's website provides a detailed description along with PIN measures and SAS programs to compute PIN using TAQ data. To know more about the basic PIN (by Easely et al. 1996) and PINs computed using the Venter and DeJong model (Venter, J.H., de Jongh, D., 2006. Extending the ekop model to estimate the probability of informed trading. Studies in Economics and Econometrics 30, 25-39)

Computing number of business segments using COMPUSTAT

Ever wondered how to compute the number of Business segments of North American firms as in Fich and Shivdasani (2006) ? Here is a great video by WRDS, in addition this SAS macro will help you to compute the number of business segments a firm has on an annual basis. For convenience I have copied the SAS macro from WRDS.US repository.

/*  
    Macro:          nSegments.sas

Created by: Joost Impink

    Date:           March 2014
    Macro to count the number of segments (either industry segments or geo segments)
    Variables required:
        dsin            dataset in (needs to have gkvey, fyear and datadate)
        dsout           dataset out
        type            'ind' (default) or 'geo'
        unique          'true' (default) or 'false', count unique segments only     
                        (segments with the same industry count as 1 segment)
                        only relevant for industry segments     
        sicsreq         'true' (default) or 'false'; if true: require industry sic code (sics)
                        no be nonempty  
        var             name of variable to create        
        segmerged       location (libname + dataset name) of wrds_seg_merged 
                        e.g. segments.wrds_segmerged    
    Dependencies:
        - the macro requires local access to wrds_segmerged (Compustat) in a local libray
        - %runquit macro
         %macro runquit;                
            ; run; quit;                           
            %if &syserr. ne 0 %then %do;                  
            %abort cancel ;                   
            %end;                      
         %mend runquit;   
*/

%macro nSegments(dsin=, dsout=, type="ind", unique="true", sicsreq="true", var=, segmerged=);

    /*  Vars needed from dsin */

data seg1 (keep = gvkey fyear datadate);

set &dsin;

    %runquit;
    /*  Select segments segments */

proc sql;

        create table seg2 as
        select 
            a.gvkey, a.fyear, b.SICS1 as sics
        from
            seg1 a,
            segments.wrds_segmerged b
        where
            a.gvkey = b.gvkey
            /* use segment info of year of 10-K (not later years) */
            and a.datadate = b.datadate
            and a.datadate = b.srcdate

/* operating or business segments */

%if &type eq "ind" %then %do;

and b.stype IN ("BUSSEG", "OPSEG")

            %end;
            /* geo segments */

%else %do;

and b.stype = "GEOSEG"

            %end;           
            /* drop segments with missing sics1 ? */

%if &sicsreq eq "true" %then %do;

and b.SICS1 ne ""

            %end;
            and b.SALES ne .
            and b.SALES > 0
            ;   
    %runquit;
    /*  Unique industries? (only relevant for type "ind")*/

%if &type eq "ind" and &unique eq "true" %then %do;

    /*  Drop segments with same industry */

proc sort data=seg2 nodupkey; by gvkey fyear sics;%runquit;

    %end;

    /*  Count #segments  */

proc sql;

        create table seg3 as
        select distinct gvkey, fyear, count(*) as numSegs from seg2

group by gvkey, fyear

    ;
    %runquit;
    /*  Create output dataset */

proc sql;

create table &dsout as

        select a.*, b.numSegs as &var
        from

&dsin a

LEFT JOIN

            seg3 b
        ON
            a.gvkey = b.gvkey
        and a.fyear = b.fyear;      
    %runquit;


    /*  Cleanup */

proc datasets library=work;

delete seg1 - seg3 ;

    %runquit;
%mend;

Computing firm and industry level market to book ratio using Compustat

Market to book ratio is commonly applied in accounting and finance research. This article by WRDS provides a detailed guide and a SAS program to compute annual market to book ratio for US firms.