Dear community,

For my Master's thesis, I am planning to replicate the Fama-French five-factor model from scratch. However, I have run into a practical problem in constructing the factors.

To construct SMB, for example, I need to split my dataset into two groups by size and into three groups by value versus growth (book-to-market), as explained in your papers. My question is how to do this in practice. My dataset covers approximately 8,000 companies over roughly 50 years in panel-data format. Which techniques should I use to create these factors, and should I work in panel-data format or in time-series format?

I thought that in time-series format (i.e. one column per firm) I might be able to make these distinctions, since every firm would have its own variable. Unfortunately, my dataset is too large to convert into time-series format (even when I split it into subsamples). In panel-data format, however, all firms are listed under a single variable, so I am struggling with how to divide them into the groups needed to build the factors.
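A minimal sketch of how such a 2x3 sort could be done directly in long (panel) format with pandas, without pivoting to one column per firm: the breakpoints are computed separately within each date using `groupby`. The column names `date`, `firm`, `me` (market equity), `bm` (book-to-market) and `ret` (return over the following holding period) are hypothetical, and the sketch simplifies the published procedure: it uses breakpoints from all stocks rather than NYSE stocks only, ignores the June formation timing and the book-equity lag, and shows only the size/book-to-market sort (in the five-factor paper SMB averages the size sorts on book-to-market, profitability and investment).

```python
import numpy as np
import pandas as pd


def assign_2x3_portfolios(panel: pd.DataFrame) -> pd.DataFrame:
    """Label each firm-date row with a size group (S/B) and a value group (L/N/H).

    Assumes hypothetical columns 'date', 'me' and 'bm'. Breakpoints are
    computed within each date, so the panel never needs to be pivoted.
    Simplification: breakpoints use all stocks, not NYSE stocks only.
    """
    df = panel.copy()
    by_date = df.groupby("date")
    # Size breakpoint: cross-sectional median of market equity on each date.
    size_median = by_date["me"].transform("median")
    # Value breakpoints: 30th and 70th percentiles of book-to-market on each date.
    bm_p30 = by_date["bm"].transform(lambda s: s.quantile(0.3))
    bm_p70 = by_date["bm"].transform(lambda s: s.quantile(0.7))

    df["size_grp"] = np.where(df["me"] <= size_median, "S", "B")
    df["value_grp"] = np.select(
        [df["bm"] <= bm_p30, df["bm"] >= bm_p70], ["L", "H"], default="N"
    )
    return df


def smb_hml(df: pd.DataFrame) -> pd.DataFrame:
    """Value-weighted returns of the six portfolios, combined into SMB and HML.

    Assumes 'ret' on each row is the return earned after the sort date and
    that all six portfolios are non-empty on every date.
    """
    df = df.assign(w_ret=df["ret"] * df["me"])
    grp = df.groupby(["date", "size_grp", "value_grp"])
    # Value-weighted portfolio return: sum(me * ret) / sum(me) per portfolio.
    vw = (grp["w_ret"].sum() / grp["me"].sum()).unstack(["size_grp", "value_grp"])

    small = vw[[("S", "L"), ("S", "N"), ("S", "H")]].mean(axis=1)
    big = vw[[("B", "L"), ("B", "N"), ("B", "H")]].mean(axis=1)
    high = vw[[("S", "H"), ("B", "H")]].mean(axis=1)
    low = vw[[("S", "L"), ("B", "L")]].mean(axis=1)
    return pd.DataFrame({"SMB": small - big, "HML": high - low})
```

Because every step is a `groupby` on the date (and portfolio labels), the memory cost stays roughly proportional to the number of firm-date rows, which avoids building a very wide table with one column per firm.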

I hope you can help me with this problem!

Yours truly,

Niek van der Schaaf
