What is the difference between SPSS, R and STATA software?

Scott Reza Jafarian Kerman @Scott_Reza_Jafarian_Kerman

06 June 2013

I have been using SPSS for years and have not had any problems. I did not find STATA any more helpful, except for manual entry in some parts; however, I have not worked with R yet.

January Weiner Popular answer

Personally, I prefer R. Learning two frameworks would be too much trouble, and SPSS only would not suffice for me.

The main drawback of R is the learning curve: you need a few weeks just to be able to import data and create a simple plot, and you will not cease learning basic operations (e.g. for plotting) for many years. You will stumble upon the weirdest problems all the time because you have missed a comma, or because your data frame collapses to a vector when only one column is selected (or your matrix does when only one row is selected).
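The collapse mentioned above can be reproduced in a few lines. This is only a minimal sketch on a made-up toy data frame (the names `df`, `id` and `score` are hypothetical): in base R, selecting a single column of a data frame, or a single row of a matrix, drops the result to a plain vector unless `drop = FALSE` is given.

```r
# Hypothetical toy data, for illustration only
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

one_col <- df[, "score"]                 # collapses to a plain numeric vector
kept    <- df[, "score", drop = FALSE]   # stays a data frame with one column

m <- as.matrix(df)
one_row  <- m[1, ]                       # a single matrix row collapses to a named vector
kept_row <- m[1, , drop = FALSE]         # stays a 1 x 2 matrix

is.data.frame(one_col)   # FALSE
is.data.frame(kept)      # TRUE
is.matrix(one_row)       # FALSE
dim(kept_row)            # 1 2
```

Adding `drop = FALSE` wherever the number of selected rows or columns is not known in advance is one common way to avoid this surprise.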

However, once you have mastered this, you will have the full arsenal of modern cutting-edge statistical techniques at your disposal, along with in-depth manuals, references, specialized packages, graphical interfaces, a helpful community -- and all at no cost. Also, you will be able to produce stunning graphics.

That said, learning statistics is way more important than learning R or SPSS or whatever else. An easy-to-use graphical interface is deceptive if you do not understand what it really does. I have seen people doing complex analyses without really understanding basic concepts of statistics -- they thought that they got the correct results, but in fact were doing terrible things. R is helpful in the sense that it is hard to do something if you don't understand what you want (or need) to do.

Jonas Forsman

I can only speak from personal experiences, but I mainly use SPSS when I would like to check something quick and easy, like exploratory/confirmatory factor analysis and such. I switch to R when I need to implement/try out new analysis methodology.

I would recommend that you use what you (and/or your colleagues) know and can handle easily.

Stack Overflow also has a thread on the differences between R and SPSS: http://stackoverflow.com/questions/3787231/r-and-spss-difference

For an overview of different statistical environments, see:

http://brenocon.com/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/

Scott Reza Jafarian Kerman

Thank you very much for your answers. Please also explain about STATA...

January Weiner

@scott:

I have never worked in Stata, so what I'm about to say should be taken with a grain of salt. First of all, Stata seems to be as powerful as R, at least in some areas. On the downside, it also has a steep learning curve and lots of things need a command line operation. R is more extensive as a programming language, though, including object oriented frameworks, and seems to be more flexible when it comes to creating graphics (the examples of plots on the Stata homepage are not impressive for an R user). For me, however, the main disadvantage of Stata would be the fact that it is proprietary software. Nonetheless it seems to me a better choice than SPSS.

Carlos Jimenez-Gallardo

There is a very important element to consider: the time required for the analysis.

In some cases, when you know R thoroughly, it turns out to be one of the best tools available. But when you are up against a delivery deadline, using some alternative software seems a better idea; the problem, obviously, is which one.

I have been fortunate to work with MYSTAT, Epi-info, Systat, Statgraphics, Stata, JMP, SPSS, SAS, PSPP, R and FACTOR.

There are substantial differences between them: for example, some are better for experimental design (JMP), others are best for quality control (Statgraphics).

For psychometric work, FACTOR is the best option.

Anders Beckman

I have mainly worked with SPSS and I am trying to learn STATA as well as R. However, as I only use these systems infrequently, I depend on a GUI. Fortunately, there are several GUIs for R (depending on your OS). I think the main thing is the purpose behind your choice of system: the time you have to spend learning a new system must be weighed against the gains from it. If 90% of your needs are fulfilled with SPSS, the gain from another system seems marginal.

Scott Reza Jafarian Kerman

Thank you for your answers, I think I have got the point.

I can do my analysis with SPSS so far, but I have seen that most recent papers are analyzed with R or STATA, and therefore I was a little concerned about my work. I knew that the calculations are the same, but I thought maybe there are some more advanced techniques or other advantages in other software.

I think I have to put some effort into learning R.

Any suggestions?

Roland E Andersson

I have used SYSTAT, Statistix, Statistica and Stata over the years. I have so far not met any problem I could not solve with Stata. Stata has a GUI that has gotten better through the years, but I do not use it. The help files are easy to understand. If you just want to do some easy statistics, you can use the GUI just like in the others. If you want more complicated things, you can use the command input.

I am right now trying to learn R and the learning curve is sooooo much steeper. I suspect that only full-time statisticians can feel comfortable. The help files are not intuitive. The language is very technical. There are no examples.

If you are going to do advanced statistical work full-time for many years, I think R is good, but otherwise Stata can do almost everything you want. If you only want to use statistical software temporarily, you can choose some simpler software. Statistix is probably the cheapest and has lots of functionality.

Roland E Andersson

Here is an interesting comparison of the commands needed for some data-management work in R, SAS, SPSS and Stata. I am used to Stata and find it much more intuitive and efficient than the others, which I think is obvious from these examples.

http://r4stats.com/examples/data-management/

Blair Grace

My views as someone who has used Stata and R for many years, and dabbles in SPSS and SAS:

-I find using SAS is like going back to the 70s in terms of weird code and needing to define formats etc. just to get the data in. SAS's proprietary documentation is next to useless. SAS can do nearly any analysis you can think of.

-R is also very frustrating at times for many things. Documentation is terrible - you need to buy a textbook written in normal language (e.g. Crawley). Analysis possibilities are virtually limitless.

-I think Stata has hit the sweet spot between being powerful, expandable and still easy to use. The programming language is more consistent and intuitive than R or SAS. It is also fairly cheap. I do get frustrated occasionally when I hit the limit of available Stata code.

Karthik Balachandran

There are several important considerations that should dictate the choice of statistical software, apart from technical differences.

How often do you deal with data?

If you work with data once in a blue moon, it is best to stick with a program that your colleagues know and can troubleshoot, and that has a good GUI and documentation. SPSS is common in medical and social sciences, while STATA is the norm in econometrics. Both have their own strengths: SPSS is a behemoth, but its output can be copied directly into your reports without much modification. STATA has both a command line and a GUI, and its documentation is the best in the industry. If you are tech savvy, I suggest STATA is a good choice.

Is reproducible research important to you?

R has the best environment for reproducible research. Since what is done on a GUI is hard to document and therefore hard to reproduce, every scientist should aspire to produce reproducible research. Unfortunately, R is the hardest to learn initially, although it is not harder than SAS. Best of all, it is free and has a great community. You can easily share your code with someone on Stack Overflow and get answers, meaning you are not limited by the local expertise available.

How much are you willing to pay?

R is free, while all the others cost a limb. If you have a site license, then you can go for the one provided by your university.

Type of analysis

If your research involves cutting edge statistical analysis, chances are it may not be implemented in the software package provided by your university. In that case, R would be the best choice.

As the saying goes, relying on GUI tools is like waiting for a bus to reach your destination. Programming yourself is like driving a car. Hard to do at first and requires learning and maintenance, but worth it depending on your situation.

I don't have experience with JMP/SAS.

Roland E Andersson

"Is reproducible research important to you?

R has the best environment for reproducible research. Since what is done on a GUI is hard to document and therefore hard to reproduce, every scientist should aspire to produce reproducible research."

I would object to that. With Stata you can use do-files to keep everything reproducible, and include comments. It is easy. I do that all the time and I always keep a .do file for all my projects. When I find an error or something missing, or I want to expand my analyses, I can simply edit the .do file.

Roland

Karthik Balachandran

I agree that STATA has an option to save your commands as do-files. But if some of the data change, you may rerun the analysis, but the report has to be updated manually. Unlike R, STATA's integration with LaTeX is not very strong, although there are a few user-written programs which can do this. R with knitr and RMarkdown, on the other hand, can produce a PDF, a Word document or an HTML5 presentation with minimal changes to the code. This, of course, requires some initial time investment. So R is more powerful than STATA in this regard, but STATA is more user friendly.

®γσ, Lian Hu ENG

http://www.princeton.edu/~otorres/RStata.pdf

Geetanjali Pinto

I would like to know which would be the best software to use when dealing with Panel Data? Also can you please recommend the best way to go about learning that software?

M. Ricky Ramadhian

In your country, if your background is in medicine, are you expected to be proficient in SPSS or other statistical software?

Rogier Brussee

Much has already been said, but I have taught SPSS and use R.

SPSS: I don't like it. The good thing about it is that if you have a very simple thing to do, like making a pie chart or running a single t-test, AND you know where to click in the menus, it is easy. The bad thing is that this would have been easy in any system, including spreadsheets. By far the most problematic thing, though, is that it has the nasty habit of giving students the idea that they are doing objective science if and only if they have figured out a way to click on an SPSS menu that gives an answer with the word "significant" in it. Never mind which answer. To be honest, that may have as much to do with the people teaching and being taught SPSS as with the program itself.

It is also very difficult (not impossible, but difficult) to reproduce a result if you find out you made a small mistake somewhere, or if you have to run the same analysis on multiple datasets. There is an SPSS syntax that does things programmatically, but it is weird, and the menu-driven and syntax-driven parts of SPSS are poorly integrated (in fact, SPSS seems to be an ancient FORTRAN 60 program glued to a half-baked Java GUI that changes in unexpected ways with every release). A side effect of this is that there is surprisingly little online help showing how to do things; what exists often relies on videos whose main focus is to help you click through.

Last but not least, SPSS is freaking expensive if you are not in education. The consultant crowd uses it to show they are experts to managers who were scarred by SPSS themselves, and so IBM charges huge licence fees (even if you are in education, the licences that run out are annoying). Therefore many students will not be able to use it after they graduate, simply because it is too expensive for the rare cases they need it for.

R is a general-purpose programming language, although a quirky one, that just happens to have very good support for statistics and for data that looks like a spreadsheet (i.e. data frames). This leads to a different mindset, because you can extract parts of the data into variables, reuse them, process them further and write your own functions. This means that it is easy to describe exactly what you do, including being copy-paste ready from online sources like Stack Overflow or an email. It also means that there are all sorts of packages for processing data (the library system is very extensive and easy to use): e.g. loading data from social media off the net as so-called JSON files (a file format that has nothing to do with statistics), transforming them into a data frame, analysing them as a social network graph, and graphing the results can all be done inside R. To be honest, this is not always easy, because it seems all programming takes a non-trivial amount of fiddling before you find out how to do things, including getting to grips with R's somewhat quirky type system. So using R may have a steeper learning curve than clicking on a menu. However, it is a skill set that is useful for more than just doing statistics. Also, R (and the R environment RStudio) is free.
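To make the "different mindset" concrete, here is a small sketch in base R. The data and the names `trial`, `treated` and `summarise_values` are invented for illustration. It extracts part of a data frame into a variable, reuses it, and wraps a repeated calculation in a user-defined function.

```r
# Hypothetical measurements, for illustration only
trial <- data.frame(
  group = c("control", "control", "treated", "treated"),
  value = c(4.1, 3.9, 5.6, 6.0)
)

# Extract part of the data into a variable so it can be reused
treated <- trial[trial$group == "treated", ]

# A small user-defined function: mean and standard deviation of a numeric vector
summarise_values <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

treated_summary <- summarise_values(treated$value)

# The same idea applied per group with a base R helper
group_means <- tapply(trial$value, trial$group, mean)
```

Because every step is written down as code, the whole analysis can be rerun unchanged on a corrected or extended dataset.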

By the way, I have heard good things about the combination of Python, SciPy, pandas and matplotlib, but I have no hands-on experience. SciPy does not seem to have quite the same extent of support for statistics as R (but still quite a lot, and if you really need to you can actually use R from within Python), but the few times I used the programming language Python I found it rather convenient, and it has even more extensive general-purpose computing support than R.

Tips to get started with R:

* If you don't know how to do something, use something like ??"t-test" and/or google it with R in the query.

* The first time I used R, I had a lot of trouble getting spreadsheet data in and out of R, so I couldn't get started. It turns out it is easy if you know how: save your spreadsheet as a csv file (via the "Save As" menu in Excel) and use

data = read.csv("my_filename.csv")

and

write.csv(data, "your_filename.csv")

If that does not work, chances are that your computer is set up for a language that uses a comma instead of a dot as the decimal mark (like my native Dutch, or German), and Excel will use a ; as the separator in the csv. Then simply replace read.csv with read.csv2 and write.csv with write.csv2 in the above and you should be good to go.
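The round trip described above can be tried without touching any real files. The following is only a sketch: the data frame `scores` is made up, and the files are written to temporary paths from `tempfile()`. It also passes `row.names = FALSE`, because otherwise `write.csv` writes the row names as an extra first column.

```r
# Hypothetical example data; tempfile() paths are used so nothing is overwritten
scores <- data.frame(subject = c("a", "b", "c"), value = c(1.5, 2.25, 3))

# Comma as separator, dot as decimal mark
path_csv <- tempfile(fileext = ".csv")
write.csv(scores, path_csv, row.names = FALSE)
back_csv <- read.csv(path_csv)

# Semicolon as separator, comma as decimal mark
path_csv2 <- tempfile(fileext = ".csv")
write.csv2(scores, path_csv2, row.names = FALSE)
back_csv2 <- read.csv2(path_csv2)

# Both round trips should give back the original data
all.equal(back_csv, scores)
all.equal(back_csv2, scores)
```

If a file read with `read.csv` comes back as a single column full of semicolons, that is usually the sign that `read.csv2` is the one to use.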
