I am just starting to learn a statistical software package. I only know SPSS. I want to learn either R or SAS. Some people say R is better and others say SAS is better, so I am totally confused. Which one is best, easiest to understand, and most widely used in the medical field?
I wonder how you narrowed it down to those two. SAS is one of the most comprehensive statistical packages available, and compared with the others it has the most powerful tools for manipulating data sets. Although it has added much functionality, for at least two decades SAS has preserved its programming language and the idea that it should look pretty much the same regardless of what type of computer it runs on. In other words, it doesn't look much better running on a PC than it does on a mainframe or a UNIX machine, and given that most users are now on PCs, it is a pain to learn.
SAS's status as the dominant commercial package in terms of statistical features has been severely challenged by Stata. In a lot of ways Stata has surpassed SAS, although not in data manipulation. Stata is an interesting hybrid: it is not open source, but it can incorporate user-developed routines, greatly amplifying the capabilities of an already expansive program. Like SAS, it is mostly run from a command-line interface.
Finally, R. R is actually free and open source. It is not pretty and has a steep learning curve. It has routines available to do just about anything, including things no commercial package does or things only expensive specialty packages do. Although there is no direct tech support from a company, there are lively R support communities that give advice, help fix problems, etc. Many of the routines for R are well known, and most have been thoroughly compared with commercial software that does the same thing, demonstrating that R gets the same results. Just like commercial software, it might have a bug from time to time.
Personally, if I were starting out with a new platform, I wouldn't be considering SAS. I would be considering Stata or R. Based on other answers I have seen on ResearchGate, I think you will get a lot of responses favoring R.
Thank you, Robert. I also have some doubts about R and SAS. Some people say that in the clinical field only SAS results are accepted and results from other packages are not. Is that true?
If you don't mind me asking, what are you trying to do in the end? Will your focus be on graphics or on tables? What are your resources? Who will be your customers? How much time would you be willing to commit to learning a new language? Is there a standard for data analysis in your particular area?
SAS will give you more comprehensive yet parsimonious output.
To respond to Deepak's follow-up question, I don't think that SAS is the only acceptable package, but R might not be completely accepted. The medical field is a bit unusual in that the name of the statistical package is usually disclosed for even trivial statistics. SAS is certainly accepted, but so are SPSS, Stata, BMDP, and probably quite a number of commercial packages. I have seen Stata make huge gains in academics and don't expect to see that change. Right now, if I wanted a commercial package to start from scratch, I would choose Stata. I don't use it myself because I am nearly 30 years into another package, but if I had to make the choice now, it would be Stata for sure.
Hi again, Deepak. I was thinking about this more today. Is there a reason you can't stay with SPSS? If you did, you could add R for special statistics that can't be done in SPSS or that would require buying more SPSS modules. The biggest problem with SPSS is usually the cost of modules, support, and frequent upgrades. R would give you lots of options for things like multilevel modeling, structural equations, and propensity scoring, not to mention Bayesian statistics and simulations should you need them. If you want a user-friendly package like SPSS but cost is a problem, another option is SYSTAT. There is a special link for SPSS users at http://www.systat.com/SwitchToSystat.aspx. You might be interested to know that Systat version 13 was actually written in India (Cranesoft in Bangalore); I had an opportunity to interact with some of the programmers and statisticians during the beta test. There is a free 30-day demo version and a completely free version called MYSTAT, which is based on version 12, is limited to 100 variables, and lacks some of the advanced statistics; even so, MYSTAT has more statistics than SPSS Base. The graphs available in the SYSTAT family are very similar to those in SPSS, which means better than most. There is no reason you can't do most of your work in virtually any mainstream stats package and use R for what that package can't handle. I use SYSTAT that way, and most of the rest of my work is in HLM (Scientific Software International), but I have also used Mplus (www.statmodel.com) and other software over the years. An interesting free stats package is MicrOsiris from the University of Michigan Survey Research Center, where they also have IVEware, one of my favorites for the imputation of missing data. Of course, I have the support of colleagues who work in Stata, SAS, and R, although the capabilities of all these packages overlap a lot.
Our papers frequently cite two software packages (sometimes more, especially if one is IVEware). I hope this is helpful. Bob
Thank you, Robert. Actually, I wanted to do SEM and confirmatory factor analysis. I don't have SPSS add-on packages like AMOS; that's why I wanted to learn another software package that has all the options of SPSS and some more techniques as well. I want to learn it as soon as possible, so I am totally confused about which to learn first, SAS or R.
Thank you all for your valuable suggestions. My field is the analysis of medical and clinical data.
Hi Deepak, R will give you all those things. SAS is a costly annual license, although the academic license is not so bad; I have no idea about India. The SAS module that does SEM is not great but not terrible; I have only really looked at it once. It should also do CFA. To me Mplus (www.statmodel.com) is the leader in those areas, but it is expensive: you buy the program, but technical support requires a license after the first year. I understand that there are pretty good routines in R for these things, but I haven't seen them or worked with anyone using them; mostly we use Mplus. SYSTAT 13 has a relatively new (included) module for CFA. The SEM program in SYSTAT, RAMONA (which also does CFA), is relatively easy to use but old. Its biggest problem is the lack of the model statistics that all the more up-to-date programs give. If you can keep your access to SPSS for the basic tasks, I think using R for advanced statistics would work. The most important thing is to get hooked up with the online support communities for R, because there is no tech support. Or, if you can afford it, consider Mplus alongside SPSS. Mplus is really brilliant. They offer training courses, even in Asia sometimes. Bob
Thank you, Bob. Do you know SEM and CFA in detail? Do you know of applications of SEM in the medical and healthcare field?
Deepak, I don't do those things very often. My latest papers of that kind were a latent class growth analysis, "Trajectories of Internalizing Problems in War-Affected Sierra Leonean Youth: Examining Conflict and Postconflict Factors", and an SEM called "Conservation of Resources theory in the context of multiple roles: an analysis of within- and cross-role mediational pathways" (although this paper just came out, we wrote it more than 2 years ago), both done in Mplus. I have a CFA called "A Closer Look at the Measurement of Burnout" and another SEM, "Fit as a mediator of the relationship between work hours and burnout", done in SYSTAT. Finally, "The relationship between job experiences and psychological distress: A structural equation approach" was done in LISREL. All of these are available in full text on my ResearchGate profile. I don't think things are done much differently in medicine or healthcare, except that the investigation of measurement is relatively new in medicine compared with psychology; it's more widely known in psychiatry. I don't know much about publishing in healthcare at all; I don't read much about that.
My recommendation would be to start with R - it is free, so if you don't like it, you've lost nothing. If R doesn't give you what you need, then move on to SAS or Stata. Stata seems to be easy to understand and use. I recently saw someone turn it on for the first time on Tuesday morning, and by Wednesday afternoon she was comfortably writing syntax and manipulating data.
Good luck with your analyses.
Based on the simplicity of your question, I would say SAS.
Not that SAS is simple. It's a pain in the SAS to learn. But it has all the standard statistics. If you like memorization without understanding - it is perfect. R requires a minimum of understanding of programming and logic. It offers much more convenient flexibility. SAS is widely accepted as a standard (I suspect some payoff may be involved), but it is not bug-free. Neither is R. If you are barely literate in my field (statistics) and want to make money or impress your friends who don't really know statistics, then become another pain in the SAS.
Yeah, it's all true, but I don't see any compelling reason to give up SPSS, and R already gives options you can't get in SAS. Plus, in many academic areas Stata is either displacing SAS or already has (economics, for example). I was around when SPSS was king and SAS knocked it out (before there was even such a thing as a personal computer), so I know what this looks like. So between R and Stata, I don't see any real reason anyone needs to start with SAS at this point. I remember when even BMDP was bigger than SAS. If you look at where SAS advertises, they are moving on, too: they are focusing much more on their business products and less on their academic ones.
Besides being free, R has the advantages of multiple ways of doing things (for example, several packages for mixed models) and very good support via mailing lists -- in fact, package maintainers such as Douglas Bates or Frank Harrell often respond to questions on these lists.
I would recommend using R: it can perform a huge range of statistical analyses thanks to the excellent packages written for it, and it is free to use. However, if you are really interested in performing CFA and SEM, I would recommend Mplus (http://www.statmodel.com/). In my experience, Mplus is better suited to these types of analyses than R or LISREL.
SPSS is very user-friendly! R is a little confusing until you get used to it.
SAS, no doubt! SAS is a real statistical software package, built on the paradigm of the data matrix, with the statistical units as rows and the corresponding variables as columns. Moreover, in SAS any new analysis that produces new descriptors of the statistical units (e.g., principal component scores, classification into clusters, probability scores from linear discriminant analysis) adds them directly to the data matrix as new variables (columns).
Moreover, SAS does not require any programming beyond the logical sequence:
proc <name of the procedure> data=<name of the input data set>;
This lets you concentrate on the real thing rather than on accessory, distracting elements like program listings: the real thing is to explore data structures, not to build programs! Moreover, any statistical procedure you can imagine is in SAS.
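The rows-as-units, columns-as-variables idea can be sketched in a few lines of Python (standard library only). Each row is one statistical unit, and derived per-unit results are appended to every row as new columns, the way the poster describes SAS adding component scores or cluster memberships. The variable `sbp`, the data, and the crude two-group split are hypothetical stand-ins, not what any SAS procedure computes.

```python
from statistics import mean, stdev


def add_derived_columns(rows, var):
    """Append per-unit derived variables to each row (one row = one statistical unit)."""
    values = [row[var] for row in rows]
    m, s = mean(values), stdev(values)
    for row in rows:
        z = (row[var] - m) / s
        row[f"{var}_z"] = z                               # standardized score
        row[f"{var}_group"] = "high" if z > 0 else "low"  # hypothetical two-group split
    return rows


if __name__ == "__main__":
    patients = [{"id": 1, "sbp": 120}, {"id": 2, "sbp": 140}, {"id": 3, "sbp": 160}]
    for row in add_derived_columns(patients, "sbp"):
        print(row)
```

The table stays one structure throughout: every later analysis can use `sbp_z` or `sbp_group` exactly like an original variable.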
Many people I know use both R and SAS. The strength of SAS is that it is a powerful tool for data management. When you have very large databases that require lots of massaging/manipulation, it's great to have SAS at your fingertips. But R enthusiasts claim that SAS is a few years behind if you're interested in doing 'modern' statistics. A lot of new components are added (for free) to R that SAS will take much longer to implement, at a cost. So, ideally, you can conduct all your data management tasks in SAS, export your data, and do your fancy statistics in R. Et voilà!
SAS is in fact easy to learn, a lot easier than STATA. You learn simple commands and then keep adding more, but it is a very intuitive language. Your programs stay simple in the editor, and SAS doesn't care whether you write in capital letters. Anyone can learn SAS: I teach it to my dental students, and they had no problem with the basic commands. R is free but more difficult for those who are not into the world of math and pure statistics.
It is somewhat hard to answer this question, because we cannot say which software is better for a given field. It mostly depends on the type of analysis you want to perform. SAS is actually very strong in network usage and works better with large data in large-scale studies. In addition, SAS is more visual than R, which makes it simpler to use for those who are not familiar with programming languages. On the other hand, each package performs some kinds of analysis in the simplest way, and in general we cannot say which software is better than the others. Best wishes.
If you go into high-throughput data analysis in medicine, like all these "omics" things, there is no way of getting around R. When medical research enters "systems biology", one needs to simulate and analyse dynamic networks, and this can readily be done with R as well. If you stay in the field of "standard applications" in data analysis, what to use will be a matter of personal preference.
R is free software, and that is a strong advantage from my point of view. Moreover, R can be enriched with various free packages designed for specific purposes; e.g., for genetic analysis there is the package "genetics", and for haplotype analysis "haplo.stats". Each package has a good user's manual. On the other hand, R is not immediately friendly; it takes a while to get into it. Worth a try.
Don't think twice! Start using R now! SAS is still very powerful, but it is expensive, and the evolution of R is incredible. If R doesn't do what you want now, it will in a few months...
I dunno; I've programmed in S-PLUS (the commercial, costs-money implementation of the S language, of which R is a free implementation), and I'm a SAS programmer of many years' experience. Both have learning curves. But to manipulate and manage data, I would prefer SAS. SAS ver. 12 has a lot of powerful analytical and data management capabilities, and I disagree that it's at all 'behind the times' in terms of statistical analysis.
R and SAS require you to think about your data a bit differently, because R acts more like an object-oriented language. SAS will remind you of SPSS, although SAS draws its user syntax from PL/1, an IBM programming language from the 60s/70s. When SAS Institute adapted SAS for use on other OS platforms, it chose, wisely, to keep the user syntax the same so that current users wouldn't have to relearn.
However, SAS is not cheap. If you have to buy it yourself, check out less expensive alternatives. STATA is cheaper, and R is free.
As an addendum to my comment above, where do you expect to work and/or study? If the places you're looking at are predominantly SAS shops, then learning SAS would be a very good (if not necessary) idea. While it's true that many places use R, SAS and SPSS are still the go-to packages chosen by institutions for data management and analysis.
They are the same. R, SAS, S+, ... use the same methods and techniques. So use the software that you know.
I used SAS for many years and find it very powerful and well developed. R will be more and more popular and I think in the near future, R will be the top one.
R is not only free, it is more powerful than SAS, SPSS and STATA http://www.analyticbridge.com/group/productreviews2/forum/topics/product-reviews-comparing-r-matlab-sas-stata-spss
In my view, MINITAB is comparatively good software in terms of price and functionality. The graphics provided by Minitab are also quite good.
It depends on the accuracy you want.
Look at a comparison of packages for computing the sin(x):
===============================================
(1)SPSS:
Unable to compute sin(x) when x>3*10^15. See the log below:
===============================================
COMPUTE sinx=SIN(x).
EXECUTE.
>Warning # 625
>The absolute value of the argument to the SIN function is too large. The
>largest permissible value is about 3 * 10**15. The result has been set to the
>system-missing value.
>Command line: 24 Current case: 10 Current splitfile group: 1
...
1,00E+001 -,5440211108893698
1,00E+002 -,5063656411097588
1,00E+003 ,8268795405320025
1,00E+004 -,3056143888882522
1,00E+005 ,0357487979720165
1,00E+006 -,3499935021712929
1,00E+007 ,4205477931907825
1,00E+008 ,9316390271097260
1,00E+009 ,5458434494486996
1,00E+010
1,00E+011
1,00E+012
It is accurate until x=10^9.
===============================================
(2)STATA 11:
Unable to compute sin(x) when x>10^18. See the log below:
===============================================
. clear
.
. set obs 20
obs was 0. now 20
.
. gen x=10^(_n)
.
. gen y=sin(x)
(2 missing values generated)
1,00E+001 -.5440211296081543
1,00E+002 -.5063656568527222
1,00E+003 .8268795609474182
1,00E+004 -.3056143820285797
1,00E+005 .0357487984001637
1,00E+006 -.349993497133255
1,00E+007 .4205477833747864
1,00E+008 .9316390156745911
1,00E+009 .5458434224128723
1,00E+010 -.4875060319900513
1,00E+011 .9981087446212769
1,00E+012 -.0208029765635729
1,00E+013 .9688757658004761
1,00E+014 .9671487808227539
1,00E+015 .9944344162940979
1,00E+016 -.4965634644031525
1,00E+017 -.5700774192810059
1,00E+018 -.2154808789491653
1,00E+019
1,00E+020
It is accurate until about x=10^9. After 10^10 the error increases.
For example:
sin[stata](10^18)= -0.2154809
while sin(10^18) = -0.9929693
===============================================
(3)R:
===============================================
It computes every value, but it loses accuracy for large x; the error is already visible at x=10^15:
x sin(x)
1,00E+011 .9286936605443354419975
1,00E+012 -.6112387013579698713528
1,00E+013 -.2888852824922840678568
1,00E+014 -.2094084333834997091461
1,00E+015 .8582721324763733505847
1,00E+016 .7796799451610669784429
1,00E+017 -.4646441089357642995061
1,00E+018 -.9928161040530034675555
1,00E+019 -.8556574595436564623085
1,00E+020 -.7469218912594929316029
> sin(10^15)
[1] 0.8582721
(the true value is 0.8582728)
> sin(10^16)
[1] 0.7796799
(the true value is 0.779688)
> sin(10^18)
[1] -0.9928161
(the true value is -0.9929693)
===============================================
(4)LibreOffice Calc:
===============================================
1,00E+001 -0,5440211109
1,00E+002 -0,5063656411
1,00E+003 0,8268795405
1,00E+004 -0,3056143889
1,00E+005 0,035748798
1,00E+006 -0,3499935022
1,00E+007 0,4205477932
1,00E+008 0,9316390271
1,00E+009 0,5458434494
1,00E+010 -0,4875060251
1,00E+011 0,9286936605
1,00E+012 -0,6112387014
1,00E+013 -0,2888852825
1,00E+014 -0,2094084334
1,00E+015 0,8582721325
1,00E+016 0,7796799452
1,00E+017 -0,4646441089
1,00E+018 -0,9928161041
1,00E+019 #VALUE!
1,00E+020 #VALUE!
===============================================
(5)SAS:
===============================================
It is accurate up to 10^13; beyond that it does not work (missing values):
x sin(x)
1E11 0.928693660496592
1E12 -0.611238702376889
1E13 -0.288885294817525
1E14 .
For comparison, the values of sin(x) to 32-digit accuracy are:
1,00E+011 0.92869366049659195285722716417521
1,00E+012 -0.61123870237688949819202041532463
1,00E+013 -0.28888529481752512226278104601737
1,00E+014 -0.20940830749645230269474599582368
1,00E+015 0.85827279317023583552388639084841
1,00E+016 0.77968800660697875023552795403336
1,00E+017 -0.46453010483537269615452411397506
1,00E+018 -0.99296932074040507620955301726363
1,00E+019 -0.92706316604865038523412228966494
1,00E+020 -0.64525128526578084420581171131252
For floating-point arithmetic packages, we conclude that R and LibreOffice are more accurate than SPSS 20, STATA 11 and SAS 9.2.
The problem is that many statistical software packages are too expensive while their 'engine' for floating-point computations is too poor. Why pay so much for a program that cannot accurately compute even a simple function like sin(x)?
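A reference column like the 32-digit one above can be reproduced without a special calculator. The minimal Python sketch below (standard library only; `pi_decimal` and `sin_reference` are illustrative names, not part of any package) reduces x modulo 2π with enough digits of π and then sums the Taylor series using the `decimal` module. One fact worth keeping in mind: every 10^k with k ≤ 22 is exactly representable as a double, so the disagreements in the tables come from how each package reduces a huge argument modulo 2π, not from the input. Two plausible explanations, not verified for these specific packages: some math libraries reduce with a limited-precision π (the x87 `FSIN` instruction, for example, uses a 66-bit approximation), and Stata's `generate` creates `float` (single-precision) variables by default, so both x and sin(x) in the Stata log were likely stored with only about 7 significant digits (`gen double` avoids this).

```python
import math
from decimal import Decimal, localcontext


def pi_decimal():
    """pi to the current Decimal precision (series recipe from the Python decimal docs)."""
    with localcontext() as ctx:
        ctx.prec += 2  # guard digits for the intermediate sums
        three = Decimal(3)
        lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
        while s != lasts:
            lasts = s
            n, na = n + na, na + 8
            d, da = d + da, da + 32
            t = (t * n) / d
            s += t
    return +s  # unary plus rounds back to the caller's precision


def sin_reference(x, digits=40):
    """sin(x) for an exactly representable x, keeping about `digits` correct digits."""
    x = Decimal(x)  # exact: ints and floats convert without rounding
    with localcontext() as ctx:
        # The integer part of x / (2*pi) uses up about x.adjusted() + 1 digits,
        # so carry that many extra digits of pi through the reduction.
        ctx.prec = digits + max(x.adjusted(), 0) + 5
        two_pi = 2 * pi_decimal()
        r = x % two_pi  # argument reduction; the remainder has the sign of x
        if r > two_pi / 2:
            r -= two_pi
        elif r < -two_pi / 2:
            r += two_pi
        # Taylor series: sin r = r - r^3/3! + r^5/5! - ...
        term, total, k = r, r, 1
        while True:
            term = -term * r * r / ((2 * k) * (2 * k + 1))
            k += 1
            if total + term == total:
                break
            total += term
    return total


if __name__ == "__main__":
    for k in range(10, 21):
        x = float(10**k)  # exact, since 10**k fits in a double for k <= 22
        print(f"1e{k:<3} reference={float(sin_reference(x)):+.16f} math.sin={math.sin(x):+.16f}")
```

Running the file prints the reference value next to Python's `math.sin` for 10^10 through 10^20; on a platform whose C library does careful argument reduction (glibc does), the two columns should agree to roughly 15 digits.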
I don't necessarily need the degree of accuracy that Demetris needs. I need it for stats, not maths, i.e. estimation rather than accuracy (such a sweeping generalisation!). So for me that is less of a concern.
But I do need to be able to manage, import, sort and merge lots of data sets, handle many thousands of records with hundreds of variables, write code to make new variables, run statistical tests, make plots and charts, and do modelling.
R can do those last things well, though it helps to understand more stats. And I agree with Sandro that it is the way of the future for stats analysis, but I don't know how good it is for large-scale data management.
SAS has powerful data management features, and I think it is these that have led to its routine use in many large corporations and government departments, as well as by researchers. You'll have wider job opportunities if you can write SAS.
Stata is more friendly to use than SAS, much better Help screens, but again I'm not sure about data management.
There is also Epi Info, available free from WHO and CDC: more limited in its range of statistics, but it comes with data entry, editing and similar facilities.
And SPSS that you know already.
I like the link Jason gave - though I'm not sure all SAS features are well represented.
So I suggest you consider not only your present needs but also what your uses of software are likely to be in the future and your data management needs. And of course what is available to you and is supported in your setting.
When learning, there is quite a bit now on YouTube and other websites - you may find short summaries introducing you to the various features of the package you choose.
I'll be interested to hear what you decide, and why.
Dear Joanna,
In response to your doubt about R's ability to manage large-scale data: I work with microarray data, which means more than 30 thousand variables per sample, plus enrichment analysis, which correlates different online databases with your data. All of this on a regular personal computer.
I'm sure that SAS is very powerful software, and it may be the most powerful today. But as proprietary software, it will find it difficult to keep up with R in the next few years. And it is so expensive!
Wafik, you have the right of it. I'm unsure if there is a "best". Use what is available at least cost. If it does not have the bells and whistles you need, use what is next-most-available at least cost. Keep in mind that learning curves for each kind of software can be expensive too, in terms of time and effort. I use SPSS for this reason. Also available are SAS, Epistat, R, Stata, LISREL, EpiInfo, and likely others. (I understand Excel also has some statistical functions.) "Best" is a matter of choice, opportunity and need. But even before any of that, let's discuss research design issues, sample size and power estimation. Who uses what to plan experiments?
SPSS is great and current, while SAS is ailing and may soon be gone! I'd rather you continue with SPSS and add Epi Info, which is more international and epidemiological (depending on your area of interest). Best wishes!
I analyzed data from a genome-wide association study with information on more than one million SNPs. I loaded the data in R; it took some time, but it worked, and I did not face any problems during the analysis with R.
Dear Joanna,
Working with user-friendly programs like SAS or Excel is sometimes dangerous.
Look what happened when Carmen Reinhart and Kenneth Rogoff were wrong in their paper because of an Excel mistake.
I am reproducing from the attached link:
"This is a big deal because politicians around the world have used this finding from R&R to justify austerity measures that have slowed growth and raised unemployment. In the United States many politicians have pointed to R&R's work as justification for deficit reduction even though the economy is far below full employment by any reasonable measure. In Europe, R&R's work and its derivatives have been used to justify austerity policies that have pushed the unemployment rate over 10 percent for the euro zone as a whole and above 20 percent in Greece and Spain. In other words, this is a mistake that has had enormous consequences."
Here in Greece we know about austerity measures.
Now, finally, we learnt that these measures were due to an Excel mistake!
http://www.cepr.net/index.php/blogs/beat-the-press/how-much-unemployment-was-caused-by-reinhart-and-rogoffs-arithmetic-mistake
And what about Stata? I would like to know your opinion. In Spain some research groups are using Stata, and they are really happy with this software. I personally use only SPSS, but I'd like to know the alternatives to it.
Hello Juan. Like you, I use SPSS because 1) I have used it since its inception many years ago and know it well; 2) it generally produces most of the statistics I require in my work; 3) it is, or has been, available to me at little or no cost the entire time; 4) it interfaces well with ("imports") most other kinds of data formats. If SPSS doesn't meet one of the criteria above, I generally resort to STATA because it offers something (usually #2) that SPSS does not. (This could easily be SAS for the same reasons, except that I have not used it and do not care to spend the time learning it.) In extreme cases of statistical need I have used LISREL, because neither SPSS nor STATA performs the requisite statistical tests. There are many other options as well, e.g. R, S, Epistat, Systat, BMDP, EpiInfo, etc. I recommend, though, that you use one system as a "base" tool meeting requirements 1-4 above and select alternatives based on your own needs, abilities and preferences. I cannot speak to the "errors" described above, as they likely result from investigator decision-making, including the choice of statistical software. I'm certain other folks have their own opinions, but I would caution anyone against using "best" in their description, because at the end of the day it is a subjective criterion.
As implied in previous comments, SPSS seems relatively easy to use, and I resort to others when SPSS cannot carry out the required analysis, particularly multivariate graphical plots.
I would recommend R because it's free software, but if cost is not an issue for you, SAS has buoyant statistical capacity, though its interface is not very user-friendly.
My alternative packages are Minitab, Statistica and Genstat, which have an above-average capacity to carry out most statistical analyses, particularly those relating to biology and ecology.
Thank you for your answer, Dr. Holden. I really agree with all your opinions. Your comments are very useful to me and confirm that the approach I've taken so far may be correct.
Best wishes
SAS is the best, no question about it, if you have the money; where it falls short is its graphs, which are not great compared to R's.
If you know enough about syntax languages, R is the best compared with other software such as SPSS, STATA and Minitab, except for SAS. SAS is similar to R in its basic capabilities. But keep in mind that SAS is very expensive software, while R is free. Also, R develops much more quickly than SAS and other software, because R developers are also researchers in specialized fields. If you consider everything about R and SAS, you should start with R, because it is more creative than SAS. As someone who started with SAS, I can definitely suggest R to you.
Good luck
With respect to determining the best use of SAS Software (when/how to use/apply SAS Software), you first need to answer a couple of questions about the nature of the data that you wish to analyze. The first question that you need to ask is, "what is the format [file type] of the data that you want to analyze?". SAS can be used to analyze data stored in a number of file formats, e.g., *.txt, *.dat, *.csv, *.xls, *.xlsx, *.dbf, etc., which gives the user great flexibility in how to approach their analysis. The second question that you need to ask is, "how large is the dataset you want to analyze?". I have used SAS to analyze datasets having approximately 4,200,000 rows (observations) and at least 14 columns (variables), so SAS is capable of processing extremely large datasets with relative ease.
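For a sense of scale, a 4,200,000 × 14 numeric table like the one mentioned above would take about 470 MB if held entirely in memory as 8-byte doubles (the way R stores numeric vectors). The arithmetic is sketched below in Python; the helper names are hypothetical, and the "three copies" headroom is an assumed rule of thumb for copies made during analysis, not a documented limit of any package.

```python
def dense_table_bytes(n_rows, n_cols, bytes_per_value=8):
    """Approximate RAM for a dense numeric table stored as 8-byte doubles."""
    return n_rows * n_cols * bytes_per_value


def fits_in_ram(n_bytes, ram_gib, copies=3):
    """Assumed rule of thumb: leave room for a few working copies of the data."""
    return n_bytes * copies <= ram_gib * 1024**3


if __name__ == "__main__":
    size = dense_table_bytes(4_200_000, 14)
    print(f"{size:,} bytes = {size / 1e6:.1f} MB")  # 470,400,000 bytes = 470.4 MB
    print("fits in 2 GiB with 3 copies:", fits_in_ram(size, 2))  # True
```

The same arithmetic shows why very wide data grows quickly: a hypothetical 1,000,000 × 1,000 table of doubles is already 8 GB before any analysis starts.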
Once you have answered these questions, the next step would be to determine the type of analysis you want to accomplish with SAS. SAS can be used to run just about any type of statistical analysis you would want to perform (e.g., t-tests, ANOVA, regression analysis, Q-Q plots [quantile/probability plots], Pearson/Spearman/Kendall correlations, boxplots, histograms, cumulative distribution function plots, etc.). SAS can also be used to perform linear programming analyses to find the maxima and minima of a system, subject to stated system constraints. Besides the support.sas.com website, you can do a Google search on the type of data analysis you want to accomplish using SAS, and you will find publications from SAS users groups, written by experienced SAS users, that generally provide useful explanations and clues on how to approach your task(s). A large number of health researchers, environmental researchers, social scientists, pharmacological researchers, and epidemiologists use SAS as their statistical analysis tool.
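Two of the analyses listed above, Pearson and Spearman correlation, are simple enough to write out by hand, which can be handy for cross-checking output between packages (in SAS, PROC CORR with the PEARSON and SPEARMAN options; in R, `cor(x, y, method = "spearman")`). A minimal Python sketch follows; the function names and the dose/response data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    dose = [1, 2, 3, 4, 5]
    response = [2, 4, 6, 8, 100]  # monotone, but with one extreme value
    print("Pearson: ", round(pearson(dose, response), 4))
    print("Spearman:", spearman(dose, response))
```

With one extreme value the Pearson coefficient drops to about 0.74, while Spearman's rho stays at exactly 1 because the relationship is still perfectly monotone.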
R is an extremely powerful tool for statistical analysis with great flexibility, has a large user/support community and is free (no cost). You can download R onto your computer (I have it on my computer) and begin work. There are some good books that you can investigate to learn how to use R in your work. I am recommending some R books to you from my personal library (see below):
1. Introductory Statistics with R
Author: Peter Dalgaard
ISBN: 9780387790534
2. The Art of R Programming
Author: Norman Matloff
ISBN: 9781593273842
3. A First Course in Statistical Programming with R
Authors: W. John Braun, Duncan J. Murdoch
ISBN: 9780521694247
4. R in a Nutshell
Author: Joseph Adler
ISBN: 9780596801700
The SAS community realizes that R has important functionality that can be integrated into SAS programs. I am including a reference (journal article) from the Journal of Statistical Software (January 2012) that shows how R capability can be integrated into SAS using SAS macros.
Both tools (SAS and R) have excellent capabilities and are complementary, so either tool that you choose will be able to meet your analysis requirements. This is one of the situations where either choice you make will be the correct one for you.
Go for [R]!
It is not only free but also open source. Greater potential for growth and development comes from open-source software with a strong contributing community, such as [R]. I don't think commercial software can, in the long term, compete with the continuous development and updates generated by the user community.
As a student, I find it very comfortable to work with R. It's easy to use and provides useful packages for many applications of statistics, especially non-parametric tests and the like. SAS, however, provides detailed output when you run any data in it. But for me, R is more applicable in the medical field.
R/SPSS: Dear friends, it is the researcher performing the statistical analysis who decides: we must go with an appropriate application, and the rest will surely serve well.
I like SAS very much because I can run SAS programs in batch mode, so SAS doesn't use any computer resources while I am writing the program. I write SAS programs in TextPad, call SAS to run them, and then the results go to an LST file and the log to a LOG file.
I am not sure if R can do it.
Textpad
http://www.textpad.com/
use "External Tools" to call SAS
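Regarding whether R can do the same: R also runs in batch mode. `R CMD BATCH script.R` writes the output and messages to `script.Rout`, and `Rscript script.R` runs a script non-interactively, so the same write-in-the-editor, run-from-the-editor workflow should work through TextPad's External Tools. A minimal Python sketch that builds both kinds of command line follows; the SAS options shown (`-sysin`, `-log`, `-print`) are the usual invocation options but should be checked against your installation, and `batch_command`/`run_batch` are hypothetical helpers.

```python
import shutil
import subprocess
from pathlib import Path


def batch_command(script):
    """Build a typical batch-mode command line for a .sas or .R script (assumed invocations)."""
    path = Path(script)
    suffix = path.suffix.lower()
    if suffix == ".sas":
        # SAS: program in, log to prog.log, listing (results) to prog.lst
        return ["sas", "-sysin", str(path),
                "-log", str(path.with_suffix(".log")),
                "-print", str(path.with_suffix(".lst"))]
    if suffix == ".r":
        # R CMD BATCH: output and messages both go to script.Rout
        return ["R", "CMD", "BATCH", "--vanilla", str(path), str(path.with_suffix(".Rout"))]
    raise ValueError(f"unsupported script type: {script}")


def run_batch(script):
    """Run the batch command if the program is on PATH; return its exit code."""
    cmd = batch_command(script)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    print(batch_command("analysis.R"))
    print(batch_command("prog.sas"))
```

`run_batch("analysis.R")` would then execute the script and leave `analysis.Rout` next to it, much as SAS leaves `.log` and `.lst` files.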
As mentioned earlier, SAS would have been embraced by many more researchers if it were more affordable. Its potential is indeed exceptional.
SAS is not only software for statistical analysis; it can also be used for data management. If you have to merge data from different files, SAS will do the job. You can also save data from an analysis (for example, mean values or the slope of a regression) and merge them with other information. For example, you can combine a person's reaction to a certain treatment with their age, BMI, etc.
This makes it a very flexible tool. The main problem has been mentioned before: the price. Best wishes, Irene
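The workflow Irene describes (save a statistic per subject, such as a regression slope, then merge it with covariates like age and BMI) can be made concrete with a small sketch. It is written in plain Python rather than SAS or R syntax, and the subject IDs, measurements and covariate values are made up for illustration. In SAS the same idea would typically be an output data set from a procedure combined with a DATA step `MERGE ... BY` on the subject ID; that mapping is an assumption about a typical setup.

```python
from collections import defaultdict

# Hypothetical long-format measurements: (subject_id, time, response).
measurements = [
    ("A", 0, 10.0), ("A", 1, 12.0), ("A", 2, 14.0),
    ("B", 0, 20.0), ("B", 1, 19.0), ("B", 2, 18.0),
]

# Hypothetical per-subject covariates, kept in a separate "file".
covariates = {
    "A": {"age": 54, "bmi": 27.1},
    "B": {"age": 61, "bmi": 31.4},
}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def per_subject_slopes(rows):
    """Group rows by subject and fit one slope per subject (the 'saved' statistic)."""
    grouped = defaultdict(lambda: ([], []))
    for subject, t, y in rows:
        grouped[subject][0].append(t)
        grouped[subject][1].append(y)
    return {s: ols_slope(ts, ys) for s, (ts, ys) in grouped.items()}


def merge_on_subject(slopes, covs):
    """Inner join of the per-subject statistics with the covariates by subject ID."""
    return [
        {"subject": s, "slope": slopes[s], **covs[s]}
        for s in sorted(slopes)
        if s in covs
    ]


merged = merge_on_subject(per_subject_slopes(measurements), covariates)
for row in merged:
    print(row)
```

Each merged row pairs one subject's treatment-response slope with that subject's age and BMI, which is the kind of combined table you would then analyse further.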
As reported in previous answers, I would recommend R, too (although medicine is not my research field, I work in biostatistics). It is powerful, open-source (and free) software in constant evolution. Moreover, in my opinion R is much more than statistical software: it is also a programming environment. I would say that you can conduct any analysis you may need in R. Personally, I have conducted some analyses in R that I could not easily carry out in SAS (e.g., complex nonlinear mixed-effects modeling with several hierarchical levels and multiple random parameters).
Best regards.
It depends: one person finds R easy to use, while another will be happy with SPSS or perhaps SAS, and some prefer to work with MSTATC, Statistix or Stata. Go for whichever software is easily accessible, or use different software to find the solution to your problem. Each and every package is of great importance in its own place.
It depends. R may not be very suitable if you need to deal with large data, as by default it loads data into memory for processing. Under Windows, that means a maximum of 2 or 3 GB (or 16 GB with the 64-bit version). I believe statisticians and data scientists working in the medical field would prefer SAS for another reason: it is commercial software supported by a big company, which means that if something goes wrong with an application that depends on models or algorithms implemented in SAS, there may be a safety buffer.
If you have enough money to pay for SAS, fine... but in most cases, R is the best way!
R, being an open-source program, has become sophisticated for statistical data analysis thanks to tons of user feedback and developers (unlike S, the older language on which R is based)... Any statistician or bioinformatician these days would have started with R and eventually moved to SAS... SAS has a lot of features that R doesn't have, simply because you pay a lot to get the license... Stata is an equally fun and user-friendly program like SAS... If you are new to R/SAS, I would recommend starting with R, since you will learn a lot... If you are stuck with an error in R/SAS, "stackoverflow.com" is a good place to get help...
I prefer using R because I find it easier to program in R. But both programs have their own advantages.
R is definitely more cutting-edge than SAS, because the software is open source and thousands of R users can submit new features to R to quickly add new analyses, graphics and functions. If SAS wants to add a new feature, it must pay statisticians and software developers to create the new feature and test it prior to release. This development process takes months or even years, and most likely the new feature will not be available until SAS releases a new version or a new, expensive "add-on" product. That's why professional statisticians, statistics graduate students and bioinformaticians do most of their work in R... because R makes it easy to share your new methods with the rest of the world.
The main advantages of SAS are customer support and its handling of large data. SAS is very expensive. The base SAS software probably costs more than US$1,000, and each add-on package can cost hundreds or thousands more. A company could easily spend $10,000 or more on SAS licenses for a single researcher. However, all that money buys you world-class customer support. If something doesn't work right in SAS, paid customer support representatives will help you resolve the problem over the phone or by email. I've had a number of interactions with SAS customer support and they have mostly been pretty good. If something doesn't work in R, all you can do is post a message on the R mailing list or a forum and hope that someone chooses to help you in their free time.
SAS is also slightly better at handling large data than R. By default, R stores its data in your computer's RAM. For most people, that means you will only be able to hold about 2 GB to 16 GB of data in R. Base SAS stores your data in "virtual memory" on your computer's hard drive, which means you could easily handle 100 GB or more on a relatively cheap, old Windows PC with a fast hard drive. There are options you can use to expand the data-handling capabilities of R, but with the default settings SAS is better at handling large data. Both SAS and R can manipulate data via SQL queries on large databases, so the differences with respect to large data are not terribly significant, but I would give SAS a very slight edge.
I've been using SAS since 2000 and R since 2004. When I started using SAS in 2000, the average SAS user still seemed to be some kind of statistics geek. The annual SAS Users Group International (SUGI) conferences mostly featured statisticians and other researchers focused on quantitative work. However, over the last 10 years SAS seems to have largely abandoned academic researchers to devote most of its attention to big-business analytics. The SUGI conference became the "SAS Global Forum" in 2007, and since then many of its keynote speakers have been business "motivational speakers" in the mold of Tony Robbins. Seriously, they had former NFL quarterback Joe Theismann as a keynote speaker in 2012 and baseball general manager Billy Beane in 2013. The real SAS geeks now meet and share their papers at much smaller regional conferences like the "Northeast SAS Users Group" (NESUG) and others, while the official company-wide conference seems to cater to non-quantitative MBA types who make decisions about large license purchases.
I point this out because there has been an almost ideological shift in how people choose between SAS and R. For the most part, if you are an academic researcher at a university or belong to a small startup doing cutting-edge bioinformatics, you will need to use R. If you work for a large corporation (a drug company, etc.) or a large government agency that has already bought into SAS, you will continue using SAS. Other packages like Stata, SPSS and Minitab are mostly niche products with much smaller communities, in my opinion.
It depends on the purpose. If you are interested in doing "standard" analyses, any software will serve your purpose, given your available budget. However, many studies, and much exploration of data, are not standard. Visualization and data mining are very important, as is working with high-dimensional data in this omics age. SAS has improved greatly, but the typical SAS installation (without Enterprise Miner) is a bit hamstrung for the needs of modern medicine. If you have the money, SAS can do most things, and its graphical capabilities have finally entered the 21st century. One disadvantage I find is that it is hard to leverage other tools that are out there, perhaps better suited to a task, from within SAS. One exception, interestingly, is R, which can be accessed either through a macro or via PROC IML (which is not a favorite proc among SAS users).
R has a learning curve, but also a very strong ecosystem, and you can customize analyses and simulations to do almost anything you would like. It is not menu-driven, though there are even modules to help with that. Today you can create dynamic graphs using JavaScript directly from R, so mining, slicing and visualizing data is quite powerful.
My take is that SAS is the old warhorse which, for all its cost and "enterprise-level" promise, is not that much better than R, and in some respects is worse. Given the flexibility needed in current medical research and practice (think decision support and personalized medicine), as well as the changing demands of data volume and velocity, R is probably a wiser choice going forward.
Just one comment to another of the commenters: SAS output is parsimonious?!! It will give you every possible test under the sun. I've had new users (non-statisticians) asking which is the right one to use; there is no guidance from the manual or the software. Not giving you everything is sometimes a good thing.
Our working group often works with very large data sets and complex designs (multiple split-plots, nested and repeated, with different random factors, and everything in the same analysis). We also have many very experienced R users who were not able to develop working models for these data sets in R. Today there are very powerful procedures in SAS (GLIMMIX, HPMIXED) that were developed to analyse very complex designs with complicated data structures.
In my experience, R can handle very large data sets and can do complex analyses.
SAS is the standard, but SPSS, Minitab and others are very good as well.
To me, the primary reason for picking R or SAS is whether you want to work in a bleeding edge environment (R) or a stable development environment (SAS).
There are core R packages (relatively stable and well documented) provided by the R Project, and many more user-provided packages, which vary in quality and at their best extend the reach of new research in statistics. Fundamental changes in the software occur roughly every six months, and upgrading may break your old programs.
SAS adds new scripting languages and procedures but the old stuff works much like it always did and a ten year old program may well run and provide the same results it did ten years earlier. SAS can do just about anything... as long as the SAS programmer is willing to spend the time making it do anything. If you're a pharmaceutical company taking decades to develop new drugs for submission to FDA, this kind of stability may be helpful. It may not be as helpful if you're just publishing on the cutting edge of bioinformatics.
Secondary considerations are the size of your datasets and price.
R can only work with datasets that can be loaded into the random access memory of your computer. On the other hand, it doesn't cost you any money.
Some (but not all) SAS procedures will develop summary statistics one record at a time so that you can work with any dataset that you can store on a device. The IRS finds this to be helpful. For a single user of SAS, the cost ranges from thousands to tens of thousands of dollars. For institutional users of SAS, the cost per user goes way down as the number of institutional users increases.
Ease of use is not a consideration. Both R and SAS are openly user hostile. 8-)
Depending on the work at hand, I have used both R and SAS extensively. I prefer R.
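The point above that some SAS procedures build summary statistics one record at a time (so the data never have to fit in RAM, unlike R's default in-memory approach) can be illustrated with a one-pass, Welford-style mean and variance. Only a count, a running mean and a running sum of squared deviations are kept, so memory use stays constant however many records are read. This is a minimal Python sketch of the general streaming technique, not a description of how SAS is implemented internally; `io.StringIO` stands in for a large file on disk, and the `bmi` column is hypothetical.

```python
import csv
import io
import math


def streaming_summary(rows, column):
    """One-pass (Welford) count, mean and sample variance over an iterable of dict rows.

    Only three running values are stored, so memory use does not grow
    with the number of records.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for row in rows:
        x = float(row[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


# Stand-in for a large file on disk; csv.DictReader yields one record at a time.
data = io.StringIO("id,bmi\n1,22.0\n2,24.0\n3,26.0\n4,28.0\n")
n, mean, var = streaming_summary(csv.DictReader(data), "bmi")
print(n, mean, round(math.sqrt(var), 3))
```

With a real file you would pass `csv.DictReader(open(path, newline=""))` instead; because records are consumed and discarded one at a time, the file size is limited by disk space rather than memory.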
I have been using both SAS and R for quite a while, and without any hesitation I would choose R. It is open, free and flexible, and you can very easily share your developments with the research community. With an environment such as Eclipse or RStudio (both free) you can do pretty productive software development. As a researcher, I would go for R.
R is a trend. It has a good reputation and therefore seems the right choice; however, it is not a good choice. Let me help you. SAS actually cares about statisticians. Through their Academic Research program they provide me with free SAS courses, available worldwide, worth up to $11,000. They also offer every professor free teaching materials and use of SAS OnDemand. I met the leaders when I delivered my SAS paper at the SAS Global Forum in San Francisco. They gave me books as well as free software. They write to me to make sure I am doing well and offer me yearly extensions on my programming courses if I am too busy with Ph.D. research. Now, you tell me which company is better to use. I can guarantee you that Dr. Goodnight delivers. His entire team at SAS is personally involved; they know me and care about me. They even offered to write a success story about me when I spoke in San Francisco, and I wanted to wait until I had achieved more. They even cared about my baby Gabriel, did an interview with me in which we discussed how Gabriel is just 2, and sent me the entire K-12 curriculum free. I can personally guarantee that if you get involved with SAS you will never look back. They are my personal trainers. I know the people involved by name, and when I met them they knew my name and all my details. You just do not get this type of attention, support and help. They are worth what they charge, and if you only look into it, you too can become a free consumer of all their training. SAS is amazing and I love their company for what they have done for me personally: the training they have given me and the personal support at the conference and online. I can get in touch with Julie Petlick and she will meet all my needs with SAS and the training. If you do not invest in SAS, you are missing a major educational opportunity. Statistics is a lifelong study, and they are there with you on your journey to becoming an effective statistician or researcher.
Patrice, I have to disagree... It seems that SAS is very interested in, and cares a lot about, statisticians. But we are not choosing a retirement plan. There is no doubt that SAS is excellent software. But R is free, always up to date, and can be used by the biggest and smallest companies. So it is powerful, up to date and democratic. You can choose any field of statistical analysis and you will find a package in R, no matter how new it is. And if you don't like the way a package performs, you simply change it. So R is flexible too!
Of the many discussions of more stable programs (SPSS, SAS, etc.) versus open-source programs like R, Charles White's is the best I've seen. Though I'm a hardcore R user, he makes a great argument for adopting the historical approaches of one's own field.
For interested users, this particular discussion has surfaced before on ResearchGate:
http://tinyurl.com/l6sb589
Once again (and like others), I caution against Patrice's approach to this question. As Charles White has pointed out in a couple of ways, SAS no doubt has its place in business and a number of other disciplines that (1) demand a stable working environment and (2) are more conservative in how quickly they adopt new analytical tools. This is not to say that these fields stagnate; rather, in some fields, like medicine, new analytical tools may be difficult to deploy rapidly, or rapid adoption may be altogether detrimental to users. From my own experience, I would guess that most current scientists are willing to give new analytical tools a try after reviewing their advantages and disadvantages. After all, most of us are constantly searching for modeling frameworks that best match our data (i.e. matching model assumptions to our data). So, once again, when considering overall functionality we are faced with a rapidly developing R and relatively stable (aka static) programs like SAS and SPSS. On this point, here's a resource showing that R now has >31,000 functions compared to SAS's 1,100. And they are being developed more rapidly for R because of its open-source structure. Sure, one could argue that SAS has additional functions for various outputs, etc., but that's still quite a beating that R is giving SAS. Here's the article with the amateur analysis:
http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/
Re: "R is a trend" (Patrice Rasmussen above). From a review of Google Scholar citations:
http://r4stats.com/2012/05/09/beginning-of-the-end/
R does show an upward trend, whereas SAS shows a steep decline (also a trend). This can be explained in a number of ways, and I realize the shortcomings of such a sampling methodology. And, of course, the absolute number is still awe-inspiring (though I only know two people in my field who actually use SAS). Still, given the lack of additional information on R versus SAS usage across fields (and my current laziness to procure any), these data suggest that R is on a steady rise and SAS is starting to wane.
Patrice, I must also ask again, just to prove my point and that of others: what did you personally pay for your copy of SAS? As a scientist who interacts with and mentors several Latin American students with little access to such expensive resources, I think it's essential to consider the return on initial investment (especially when one is using, say, taxpayers' dollars to build educational infrastructure).
The cost of acquiring a SAS license is the only obvious issue limiting the widespread use of SAS. Apart from cost, SAS is undeniably one of the best options around.
And, to belabor the discussion, a nicely summarized comparison of SAS and R:
http://rconvert.com/conversion-switch-to-r-data/r-versus-sas-a-summary-list/
To this, I add: R has more concise code (aka fewer lines), which is better for those of us wanting to distribute reproducible code. (I limit myself to base R to avoid some of the confusion over function names, etc., but the code is often still shorter than SAS code in terms of number of lines.)
I've already recognized SAS as a powerful player in many fields. My only criticism is of the hard-liners who are intent on naming it the only/best option in town.
J. Patrick Kelley,
Thank you for the kind words you used to describe my earlier post and your analyses. The link you have provided is interesting and relevant but is obvious marketing of software services implemented in R. So I'd like to make a few comments on what the link says about documentation and about cost, and expand a little on my agreement with your discussion of "the best option in town." I tend to write strongly, but I recognize the following are only my opinions.
Documentation. Free SAS documentation appears to be of significantly higher quality than free R documentation. Both sets of free documentation are available on the web for your own evaluation. However, SAS and R both have high quality statistical authors who sell books related to the software.
Cost: To me, the single biggest cost to using any software is training the user. Software, classes, and books are obvious but seriously look at the user’s hourly wage, factor in overhead costs, and factor in the amount of time spent maintaining the software. Time spent actually using the software is time spent learning how to use it better. If you’re going to encourage someone to change software, please recognize that you’re asking them to give up valuable training and the new software needs to have a reasonable potential to provide that user/organization with more value.
Best Option in Town: If you start with no history of software that does what you need done, I recommend making a list of what it is you want done with new software, finding every software package that does what you want done, and asking around about the cost, available service, user community, and available training. Of course, most people reading this message will have a history of software that does what they need done.
As I have said before, I use SAS or R depending on job requirements but I prefer R.
Chuck
A very important point highlighted by Chuck: "training the user is the biggest cost"! I agree!
@ Chuck. Very good and informative message. I enjoyed reading that. I kept expecting to read some very strong verbiage (as you mentioned), but I saw nothing!
I agree with all of your points. I think you're right on about the cost of training the user. The original poster's question, like many other questions here on ResearchGate, asks for program recommendations as if the user has no previous experience with a program in a particular field. The cost of training is certainly relevant to those already in a field that may have a history with SAS or another program; I have no argument with that. But for a new user, the cost of procurement trumps the cost of training.
It depends on your objective in learning an additional package. If it is for your own data analysis, you may go for R because it is available free. If you want to enrich your CV for better job prospects, especially in the corporate sector, go for SAS, because a good proportion of the corporate sector owns and uses SAS and therefore requires SAS personnel. Accordingly, your investment in learning the package will be less for R than for SAS.
@Murali. You raise a good point. Also consider that there are many companies that use R, including Merck, Google, and Facebook. In those cases, any programming knowledge will likely help your resume.
Also, SAS can run R code, and we are working on ways to generalize this using Phoenix Integration's ModelCenter software.
I've worked with both R and SAS and generally find R easier. That said, when I was working in industry and had access to dedicated SAS customer support services, I found the quality of the SAS people and their commitment to customer support just phenomenal: several times I went to them with modeling questions and it was akin to hiring an experienced consultant. I can really appreciate the value that SAS as an entity adds to the field, but at the same time it is almost always easier for me to explore problems and formulate/code up solutions in R.
As for data management, I now find R's connectivity to Oracle, MySQL and Postgres databases really quite comparable to the SAS/ACCESS add-on, using ROracle and related packages, though this was not always the case.
Memory limitations can be an issue though.
Anyway I'm probably being too long winded here - my vote is for R.
I could not imagine working without one of those two. Both are good and any applied biostatistician should have access to both. That is my experience and opinion.
If you know SPSS (and I mean really know SPSS, as in you write your own "syntax"), there is no reason to learn SAS. Anything SAS can do, SPSS can do. In fact, over the years, SPSS has adopted a lot of SAS jargon (Type I and III SS and LSMEANS are just two examples).
If you're going to learn a new package to add skills and analyses not easily available to you in SPSS, you should learn R. R is an order of magnitude (or two) faster than SAS PROC IML in my benchmarking. R is a much better platform for bootstrapping and Markov chain Monte Carlo methods. R's graphics are still much superior to SAS's, especially for publication purposes.
I know (as in, I can and do program in) SAS, SPSS and R. SAS simply doesn't bring enough new tools to the table to be viable (vis-à-vis SPSS). R, by contrast, brings a fair number of new techniques to the party (not least a good implementation of Lee Wilkinson's Grammar of Graphics ideas).
Unless you have strong reasons for learning SAS, pick something that gives you options you didn't have before. When should you choose SAS? Well, suppose you've taken a job in the pharma industry and part of your job is doing analyses for regulatory submission. R isn't (last I heard) on the regulators' approved list, and given its rate of change, I don't expect it ever to be there. In that case, you need SAS, because SAS is what everyone uses.
Dennis Cason, there isn't any regulator-approved list of analysis programs for submissions to the FDA. There are standards for software validation and change management, which apply to software from any source. Some FDA regulators are also R users. The advantage of SAS over R for FDA-regulated studies is that SAS makes a point of being stable over decades, while R (my software of choice) will gleefully break old code if members of the R Project core group believe it will improve the software over the long term. Regulated projects can run from 5 to 20 years, and R releases updates roughly once every six months. For more (cryptic and convoluted) discussion of software validation for FDA-regulated studies, see: General Principles of Software Validation; Final Guidance for Industry and FDA Staff
Point taken, Charles. Wasn't there an effort to bring R (or perhaps some fork from R) into compliance with the FDA Guidance? If that effort has borne any fruit, I'm unaware of it.
But I believe my point remains valid: the rate of change (and the change policies) of the R code base are such that R in its present form isn't a viable tool for regulated studies.
Anyone who doubts that should look at the help wanted advertisements in AmStat News. Regulated shops seek SAS proficient programmers.
Regulated shops absolutely seek SAS proficient programmers.. Guidance for using R in FDA regulated environments can be found at: http://www.r-project.org/certification.html
I'm a professional and I will use whatever tools (software) my clients will make profitable for me to use. The general purpose statistical software I have experience using includes SAS, R, and Minitab. OK..., I've used SPSS but that was almost 30 years ago.... ;-)
R is for professional and creative statisticians. People who understand what they are doing and need to create novel analyses because they face new problems or are unsatisfied with standard solutions. SAS is for people who need to press the buttons in order to perform standard analyses, without knowing what they are doing.
I quote " SAS is for people who need to press the buttons in order to perform standard analyses, without knowing what they are doing." This statement is all wrong.
Richard will you ever learn?
While I started many years ago as a SAS user, I have been more comfortable with R recently. Two factors are driving this. First, unlike before, data size is no longer much of a constraint in R. Second, R graphics are far superior to, and more flexible than, SAS graphics. Somebody at SAS has to be reminded how important graphs are for statistical analyses. They surely can do a better job.
I exaggerate. Over the years, SAS has grown to allow other paradigms of statistical analysis beyond those embodied in its original design. Similarly, R has evolved and, moreover, could be used as the engine inside a SAS-like system. However, I believe that the original design philosophies still dominate the character of the present organisms.
Moreover, R remains an open system which grows as new needs arise while SAS is commercial, closed.