I have been using R for my financial research for quite some time, and in this post I would like to answer some common questions that social science scholars may encounter when using R for research.

1. Why R? "R is written by statisticians and for statisticians" (Norman Matloff). Compared with most other programming languages, R has a huge edge in statistical analysis. I am also a big fan of Python, Julia, Octave, and Haskell, but when it comes to statistical analysis, R is the MVP. More specifically, R has a community of statisticians (many of them PhDs in statistics) who contribute packages backed by peer-reviewed publications; see, for example, the Journal of Statistical Software (https://www.jstatsoft.org/index). In addition, R has the best data wrangling and visualization toolbox (dplyr and ggplot2) and good access to databases (in my case, WRDS and Quandl). Last but not least, R integrates elegantly with Markdown and LaTeX, which lets me write a whole paper without leaving R.

2. R is slow when not well written. Since most social science scholars do not have time to learn programming systematically, they often pay little attention to the performance of their code. Here I offer two simple solutions. First, avoid explicit for loops where you can: use vectorized functions, or apply family functions such as lapply() and vapply(). Note that apply functions are not automatically faster than a loop; the usual culprit in a slow loop is growing an object inside it instead of preallocating it. Second, use the Rcpp package to write performance-critical functions in C++ and call them from R, which can speed up loop-heavy code dramatically. There are also alternative implementations of R that aim to run R code much faster, such as pqR, FastR, Riposte, and Renjin.
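
To make the first point concrete, here is a minimal base-R sketch (the function names and n = 1e4 are illustrative assumptions, not a formal benchmark) that computes the same squares four ways: growing a vector with c() inside a loop, a loop over a preallocated vector, vapply(), and a single vectorized expression. All four return identical results; only their speed differs.

```r
n <- 1e4  # illustrative problem size

# 1. Growing a vector with c() inside a for loop: every iteration copies the
#    whole vector, so the run time grows roughly quadratically with n.
grow_loop <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# 2. The same for loop, but writing into a preallocated vector.
prealloc_loop <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# 3. An apply family function; vapply() also checks the type of each result.
apply_version <- function(n) vapply(seq_len(n), function(i) i^2, numeric(1))

# 4. A vectorized expression: the whole computation happens in compiled code.
vectorized <- function(n) seq_len(n)^2

# All four approaches give exactly the same answer...
results <- list(grow_loop(n), prealloc_loop(n), apply_version(n), vectorized(n))
stopifnot(all(vapply(results, identical, logical(1), vectorized(n))))

# ...but their elapsed times (in seconds) can differ a lot.
timings <- vapply(
  list(grow_loop, prealloc_loop, apply_version, vectorized),
  function(f) system.time(f(n))[["elapsed"]],
  numeric(1)
)
names(timings) <- c("grow", "prealloc", "vapply", "vectorized")
print(round(timings, 4))
```

On a typical machine the growing loop is clearly the slowest and the vectorized version the fastest, but exact timings depend on your hardware and R version, so it is worth running a comparison like this on your own code before rewriting it.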

3. R has a complex memory management system. R is an object-oriented language that keeps all objects in RAM rather than on disk, so, unlike SAS, it requires users to have a good sense of how much memory their data and intermediate copies take up. Personally, I get around this issue with PostgreSQL and Spark. Through its dbplyr backend, dplyr works very well with PostgreSQL: it translates dplyr verbs into SQL, so data can be read from and written to the database directly and only the results need to be pulled into R. Similarly, the sparklyr package lets R users harness Apache Spark to handle big data.
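
The following base-R sketch only illustrates the in-memory side of the problem; it does not connect to PostgreSQL or Spark, which would need a running server. It uses object.size() to show how much RAM a vector and a data frame occupy, and rm() plus gc() to release an object. The sizes assume 64-bit R, where a double takes 8 bytes, and the object names are hypothetical.

```r
# Every object in an R session lives in RAM. A double takes 8 bytes,
# so one million doubles need roughly 8 MB.
n <- 1e6
x <- runif(n)
x_mb <- as.numeric(object.size(x)) / 1024^2
cat(sprintf("x uses about %.1f MB\n", x_mb))

# A data frame with 10 such columns needs roughly ten times as much memory.
wide <- as.data.frame(matrix(runif(10 * n), ncol = 10))
wide_mb <- as.numeric(object.size(wide)) / 1024^2
cat(sprintf("wide uses about %.1f MB\n", wide_mb))

# R also copies objects when they are modified, so peak memory use can be
# higher than the size of the data itself. Remove objects you no longer
# need and let the garbage collector return the memory.
rm(wide)
invisible(gc())
```

Once a data set is several times larger than the available RAM, keeping it in a database and letting dplyr push the work to SQL, as described above, is usually more practical than trying to squeeze it into memory.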

4. Markdown vs LaTeX. Markdown is the future, but for now LaTeX is still preferable in academic writing. Markdown has real advantages: it can elegantly incorporate interactive graphs, and its learning curve is much flatter than that of LaTeX. HTML is the future, and PDF will eventually be history. However, when it comes to printing out a paper, you may want to use LaTeX instead of Markdown, because Markdown rendered to HTML is not divided into pages. Luckily, R supports both: the knitr package lets R users combine R code with Markdown in .Rmd files and with LaTeX in .Rnw files.
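
A minimal sketch of the two knitr formats, written as an R script so it is self-contained: it writes a tiny .Rnw file (LaTeX with R chunks delimited by <<>>= and @) and a tiny .Rmd file (Markdown with fenced R chunks) to a temporary directory, then, if knitr is installed, knits them to .tex and .md. The file names and the chunk label are hypothetical; turning the .tex file into a PDF still requires a LaTeX installation (for example via knitr::knit2pdf()).

```r
out_dir <- tempdir()

# An Rnw file: LaTeX with an R chunk opened by <<label>>= and closed by @.
rnw_file <- file.path(out_dir, "paper.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<summary-stats>>=",
  "summary(cars$dist)",
  "@",
  "\\end{document}"
), rnw_file)

# An Rmd file: Markdown with a fenced R chunk. The fence (three backticks)
# is built at run time only so this snippet can be displayed in a code block.
fence <- strrep("`", 3)
rmd_file <- file.path(out_dir, "paper.Rmd")
writeLines(c(
  "# Results",
  "",
  paste0(fence, "{r summary-stats}"),
  "summary(cars$dist)",
  fence
), rmd_file)

# knitr runs the chunks and weaves their output into LaTeX or Markdown.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit(rnw_file, output = file.path(out_dir, "paper.tex"), quiet = TRUE)
  knitr::knit(rmd_file, output = file.path(out_dir, "paper.md"), quiet = TRUE)
}
```

The R code is identical in both files; only the surrounding markup and the chunk delimiters change, which is what makes it fairly painless to move a paper between the two formats.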
