Searching for statistical software to analyse qualitative and quantitative data and produce graphs and statistical output for a clinical project
You should use R, combined with a nice IDE such as RStudio.
http://www.r-project.org/
http://www.rstudio.com/ide/
Kind regards,
Guillaume
There are plenty of them:
http://www.r-project.org/
http://www.gnu.org/software/octave/
http://docs.scipy.org/doc/scipy/reference/stats.html
In the end, just have a look at
http://en.wikipedia.org/wiki/List_of_statistical_packages#Open_source
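To give a taste of the scipy.stats route mentioned above, here is a minimal sketch; the measurements and group names are invented purely for illustration:

```python
import numpy as np
from scipy import stats

# Invented example data: one measurement in two hypothetical groups
control = np.array([128.0, 131.5, 125.2, 140.1, 133.7, 129.9, 136.4, 127.8])
treated = np.array([121.3, 118.9, 125.0, 119.7, 123.4, 116.8, 124.1, 120.5])

# Descriptive statistics
print("control: mean", control.mean(), "sd", control.std(ddof=1))
print("treated: mean", treated.mean(), "sd", treated.std(ddof=1))

# Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(control, treated, equal_var=False)
print("t =", t_stat, "p =", p_value)
```

The same few lines cover descriptives and a basic inferential test; graphs would come from matplotlib on top of this.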
I would like to echo @Guillaume Béraud: there is a wonderful IDE for R called RStudio: http://www.rstudio.com/
I think R is the most powerful free statistical software you can use, but it is really difficult to use if you don't have experience or enough statistical knowledge.
http://www.r-project.org/
The R website has instructions and also in Research Gate you'll find people who may help you with R.
Kind regards.
The "R" package is the best.
If you want free but not open-source software, you have more options, such as "Epi Info" or "OpenEpi", etc.
Depending on what you want to do, R (http://www.r-project.org) is the best open-source statistical tool available; however, the learning curve to get the most out of it is quite steep. Another option is Epi Info (http://wwwn.cdc.gov/epiinfo/), which is pretty easy to use. It is not open source, but it is free to use. RapidMiner (http://rapid-i.com/content/view/181/190/lang,en/) is another great open-source tool, but again there is quite a learning curve. There is also Tanagra (http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html), but I have no personal experience using it.
For qualitative data, EZ-Text (http://www.cdc.gov/hiv/topics/surveillance/resources/software/ez-text/index.htm) is an easy-to-use tool. Another option is WEFT QDA (http://www.pressure.to/qda/), which is free, but I have never used it so I can't comment on it. One of the best qualitative data management tools is Atlas.ti, and it provides a free demo version that is fully functional for analyzing up to 5 transcripts in a project.
If you are asking about open source, R is definitely the best one. Please visit: http://www.r-project.org/
Thanks
Saiful
I concur with the previous comments: R
http://www.r-project.org/
I will elaborate a bit more. R is a statistical environment. If you allow the analogy, it is a "Statistics OS".
It is multiplatform, so you can install it on basically most machines you can get your hands on.
By itself, it has many useful basic tools, but no frills. The real power of R comes from the "add-on packages", which are task-oriented programs developed by a very active community. These programs tackle most data-analysis tasks one could think of. And if you can't find the right tools, you can build your own ;)
It is a very powerful tool. But as in many cases of powerful software, the learning curve is... not mellow.
So, if you are going to tackle this beast (and I wholeheartedly recommend you do), you could benefit from the assistance of a more user-friendly Graphical User Interface (aka GUI).
I have tried several, since I'm trying to convince some technologically challenged colleagues to shift to this great tool.
At this moment two GUIs hold the top of the heap vis-à-vis ease and usability:
R-commander and RStudio.
R-commander (http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/) is the R-project's own GUI. It is installed from the CRAN repositories, and is in active development. It's not a bad GUI, but I personally found it a bit rough around the edges.
RStudio (http://rstudio.org/) is another open source GUI for R. It has a friendlier approach to it, and makes its use a bit easier for the novice. Two very strong selling points for RStudio are: the possibility to use it as a web service, and the easy integration of version control.
These two points (which I will not describe in detail here) are definitely not important for someone completely new to R, but as you get your hands dirty, they are definitely very interesting additions.
http://www.andrewheiss.com/blog/2012/04/17/install-r-rstudio-r-commander-windows-osx/
There's R:
http://www.r-project.org
And Python with the pandas library; there's also NumPy:
http://pandas.pydata.org
http://www.numpy.org
I recommend you this book I'm reading right now:
http://shop.oreilly.com/product/0636920023784.do
Hope this helps!
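As a minimal sketch of what the pandas route looks like (the dataset below is invented for the example):

```python
import pandas as pd

# Hypothetical clinical dataset: one row per patient
df = pd.DataFrame({
    "group": ["control", "control", "treated", "treated", "treated", "control"],
    "age":   [54, 61, 58, 49, 63, 57],
    "score": [3.2, 4.1, 5.0, 4.8, 5.6, 3.7],
})

# Grouped summary statistics in one line
summary = df.groupby("group")["score"].agg(["mean", "std", "count"])
print(summary)
```

From a table like this, plotting and the tests in scipy.stats are one step away.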
I would use R. It's free and there is a good introduction on the link below
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
I can also recommend the R book
http://www.amazon.co.uk/The-Book-Michael-J-Crawley/dp/0470510242/ref=ntt_at_ep_dpt_3
James,
can you please provide more details on your needs? What kind of statistics do you need?
I agree with others above that R is for sure the richest and most accurate free statistical environment, providing a great wealth of libraries.
It is definitely worth the learning investment.
OpenOffice is suitable for some simple tabular or graphical processing; if you need something more specific there are a number of specialized programs.
I agree, R seems to be the way to go. Great graphics...nothing can beat that. If you are looking for some basic statistics, I would also recommend Epi Info. Not as steep a learning curve as 'R'!
I think the question is indeed how much you want to invest. If you are a doctor just wanting to run a quick statistical t-test, then EpiInfo and R are not the answer (though not that hard either).
However, if you are using multiple platforms (I have Windows at work and Linux at home) and want to learn how to use statistical software, then R with a GUI (RStudio is cross-platform) is your best bet.
You can use Dropbox, for example, to sync across multiple computers, and if you really have an extra hour, to set up version control. If you change something and want to keep an overview of your changes, RStudio integrates well with Git and Bitbucket (a site offering free private cloud-hosted Git repositories). Also, for other scientists out there, it keeps backups of everything and shows you the changes you have made.
R + RStudio (+ Git version control + Bitbucket) is the way I work for cross-platform research. I am sure there are easier ways to do it, but if you want to continue in research, learn R even if it takes you a month.
Yes, R can help. I have seen it referenced in various published research papers, even to the point of being the primary tool of the research. It has the flexibility to be almost (?) considered a programming language in itself, and a great deal of power. R was actually inspired by an older language called S, whose commercial implementation was sold as S-PLUS. You can also get extension packages for R from CRAN (http://cran.r-project.org/). These can often help with a whole class of problem you are trying to solve.
Microsoft Excel has become almost a default choice, since it is ubiquitous. Cross-platform? There is an Excel version for Mac, but I do not know whether that has been accomplished on Linux. I have seen that people are trying. Also be aware that there is the OpenOffice suite which has something similar. I cannot say whether it does everything that Excel can do.
The choice between these two (I am ignorant of Epi, so I won't weigh in) comes down to Power (R) versus Simplicity (Excel). You will not lack for resources in either case. I have a bias towards R as a software engineer. Also, you may never have to worry about this, depending on your application, but I have heard that Excel has certain limitations on row count, etc., that make it impractical for heavy statistical calculations.
The first choice would generally be R. It will work on both Windows and Linux, is open source, and generally reliable (matches results of most commercial software for most things). Front ends like RStudio and Rcmdr and Deducer make the learning curve easier.
Some other choices, not as well developed or strong, but maybe more user-friendly, are SOFA (http://www.sofastatistics.com/home.php), PSPP (an open-source clone of SPSS, http://www.gnu.org/software/pspp/), OpenOffice/LibreOffice (Excel clones which will do some statistics), Orange (a Python-based visual programming interface for statistics and data mining), Octave and Scilab (Matlab clones), and WEKA (a big open-source Java-based machine learning platform).
I'd also vote for R on any OS... but on Linux specifically, I have found Emacs to be a robust text editor that can be easily configured for simultaneous use with R, much like RStudio on Windows/Mac. If I'm not mistaken, RKWard has not been updated for newer Ubuntu builds.
Dylan, RStudio also runs on Linux. In fact, RStudio Server only runs on a Linux box. They have both Red Hat and Debian builds.
I'll also throw my vote in for R + RStudio. RStudio is great for managing multiple projects and workspaces, displaying graphics, managing your R files... it's awesome, and I use it exclusively now as my statistical platform.
My vote is for R Project as well (http://www.r-project.org/) as a starting point. Depending on further requirements other open source tools can be evaluated.
What can I say... R of course. I don't know your specific needs, but I'm quite certain you will find not one but many R packages pertaining to your topic.
I fully agree with the R advisers! If you use Linux, you can also try RKWard. It is another GUI for R, and it can be useful for novices because it adds a close-to-SPSS approach (File menu and wizards for common analyses) to the command-line approach. Hope this is helpful!
R for sure. One great advantage is that, in addition to what has been said above, R benefits from a vibrant community which provides competent advice. Which can be seen in the above posts.
Another great feature is its modular structure. When you load R you only load the basic packages, unlike other software that loads just everything. So if you need to use certain models you just load them as per your need, which can improve performance.
But the greatest advantage is the fact that R is a few years ahead of commercial software because packages which contain state-of-the art statistical methods are contributed by prestigious authors. For top quantitative research, R is in many cases the only option to go because of the latest implementations available.
I would suggest R (not so good for graphics in my view) or ROOT if you have basic knowledge of C++.
No doubt R is very good and provides facilities for statistical analysis for all disciplines. Moreover, it covers almost all types of statistical analyses that one can think of.
Further, there are GUIs like R Commander, Deducer and many more that provide options to perform statistics like popular commercial software. But these R GUIs are limited in comparison to commercially available GUI-based software.
So those who are new to R will find it difficult to work with (I am still not comfortable with R). Thus, for those who prefer a GUI-based statistical package, I would recommend PSPP, an open-source package very similar to SPSS.
Definitely R-project:
Not so much because of its simplicity, but rather because of the living community working with it.
The folks who point out R's steep learning curve are absolutely right. But this is unavoidable, because statistics itself has a steep learning curve. The more "user-friendly" programs simply pre-package the more common statistical scenarios into a menu item or dialog box, and thus hide the complexity from you. But they do you a disservice, because if your data do not fit the assumptions of the hard-coded methods the software developer implemented, you will get misleading, nonsensical results and never even suspect there is a problem.
So if a developer of statistical software wants to be part of the solution rather than part of the problem, they will find themselves adding more and more menu options, buttons, etc. to address more and more scenarios where their product would otherwise lead the user dangerously astray. And so, the product evolves from deceptively simple to one with a "difficult to learn user interface". Past a certain number of choices the user is asked to make, a graphical interface really becomes less convenient than a command line (because you can't automate repetitive procedures by writing scripts or macros, for example, and you can't save your session log to later remember how you did something).
So the problem is not that R (or SAS) is user-hostile, the problem is that there is an epic disconnect between applied scientists and statisticians and a desperate need for them to understand each other better. User interfaces are the least of our worries.
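Alex's warning about hidden assumptions is easy to make concrete. The sketch below (Python with scipy, entirely synthetic data) runs a parametric t-test, which assumes roughly normal data, alongside the rank-based Mann-Whitney test on heavily skewed samples; on data like this the two tests can lead to different conclusions, and a menu-driven tool may never prompt you to check which is appropriate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two synthetic samples from a heavily skewed (lognormal) distribution;
# the second has a shifted location on the log scale
a = rng.lognormal(mean=0.0, sigma=1.0, size=30)
b = rng.lognormal(mean=0.5, sigma=1.0, size=30)

# Parametric test: assumes approximately normal data
t_res = stats.ttest_ind(a, b, equal_var=False)

# Nonparametric alternative: compares ranks, robust to skew
u_res = stats.mannwhitneyu(a, b, alternative="two-sided")

print("Welch t-test  p =", t_res.pvalue)
print("Mann-Whitney  p =", u_res.pvalue)
```

Knowing why you would pick one of these over the other is exactly the statistical learning curve that no interface can remove.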
The steep learning curve for R can be flattened quite a bit by using the R Commander interface. All tasks performed via the menus are shown in the output window, so one can get familiar with the syntax very quickly. Moreover, by reading the documentation one can understand exactly what the commands do, edit/append them with other options, and then run them with the Submit button.
This way of working lets one get accustomed to the building blocks of future scripts and syntax without struggling so much with the console.
Thus, one can learn the syntax relatively quickly without struggling too much with typing commands into the console and constantly checking the documentation. This is an advantage over other software such as SAS or Shazam; I never had such help in SAS, and it took me much longer to become familiar with it.
Definitely R, with or without an IDE. It is powerful and efficient, and there are tons of documentation you can read online. Also, the package tutorials are fairly comprehensive and normally include datasets for you to learn the usage of most of the functions. There are also several discussion groups where you can ask questions and browse answers. It is very flexible, and you can customize your script for whatever statistical task you want. Last but not least, it is cross-platform, so you can share your work without compatibility issues, or run your code wherever you are.
Kind regards,
Sergio
PAST is easy to learn and fairly comprehensive for most common statistical needs.
Use R. It's free, it is powerful, and you will have countless opportunities to make pirate jokes.
Like everyone else, I can advise R; it is very powerful (maybe too much so). There are interfaces for beginners (I used Rcmdr at the beginning and recommended it, but as you progress you realize that the choices it makes are not the best, and it becomes useless).
If you want easy open-source software for simple statistics, I advise Gnumeric, the GNOME spreadsheet, which works on Linux and Windows... It comes with a stats menu that lets you run many common tests, so it is better than several expensive products such as XLSTAT... The last version I tested reminded me of some software I used for statistics...
Good points: free, no trouble with data import, and it can be used as a spreadsheet like Excel or OpenOffice, but quicker... Bad point: only common statistics are available...
It would have been enough for my master's project... but unsuitable for my PhD.
SPSS is very user-friendly... you can even find how-to videos on YouTube for specific statistical tools.
http://www.gnu.org/software/pspp/
It is very useful and similar to SPSS. It does not have the issues associated with R.
You can check Gretl (http://gretl.sourceforge.net) out. It is a GUI driven package with much stuff for regression, time series and panel data. It has also some robust statistics and, if you need it, a programming language.
I agree with everyone above: R is definitely the way to go. Although it does require learning some basic commands, there are so many fantastic scripts available online that it is sometimes as easy as inputting your data in the correct manner. The graphic outputs are also fantastic.
Lots of free software is available online.
Good options would be R and Unistat.
SPSS gives very good output.
1. SPSS (it is commercial, but there are student versions with basic tools that are nice, in case you do not want to run factorial designs) is a very nice system to start with, offering many different statistical tools for several kinds of problems (it is also available for Linux).
2. R (user-friendly with RKWard or Rcmdr). The R project (http://www.r-project.org/) is not just a program but a quite nice platform for using different programs and scripts, and for implementing your own ideas as well.
Best wishes
It depends on what kind of statistical work will be done. I think both R (also available as RKWard, with a GUI) and GNU PSPP are good open-source programs for statistical work.
R is really good, but if you are new to it best to use Rcmdr or Deducer as GUI interfaces until you get more familiar with the command line interface which gives more power and versatility.
OK, "R", but for absolute beginners its "Rcmdr" package (R Commander) is better.
Depending on the complexity of your intended analysis, if it is reasonably basic, one possibility not mentioned so far is to use a spreadsheet like LibreOffice Calc or Gnumeric (both free open-source software available for GNU/Linux and Windows). I find that for some users it is the easiest way to do standard descriptive statistics, some basic inferential statistics and you can even extend the functions yourself, plus the learning curve is very shallow.
R is obviously the best choice. As you will become a part of a vast community of its users whose experience will help you in various ways.
What you want depends entirely on what you want to do with it. You can perform standard statistical tests in LibreOffice 3 or Excel (not open source). The SPSS student version is a possible solution. SAS is good if you know the syntax.
Alex kinda stole my thunder! He stated the pros and cons of using stat software too well to add much to that discussion. R would definitely be my first choice, and its syntax is quite similar to the C/C++/Java/JavaScript languages, so if you have had any exposure to these, you won't be thrown off by the syntax too much. If you want to do math modeling "nitty-gritty" style, Octave or Jmatlab are nice open-source alternatives to Matlab. If you plan to use multiple scripting languages, SQL database scripting, multiple stat programs (e.g., R and SAS), or general OO programming languages like Java/C++/C#, Eclipse is a nice extensible IDE. Bit of a learning curve, but well worth it.
SAS is not at all open source - it is proprietary. In fact SAS is probably the most expensive statistical software with respect to licensing. Whether you are looking for open source or just free/low cost statistical software, R is the way to go.
However, if you're in a corporate environment and your company is paying the bill, SAS is great.
Hi James, I didn't get a chance to go through all 64 answers, but I am going to ignore the open-source part for just a minute, because perhaps that isn't as important to you as the free part. At the opposite end of the scale from R is a very finished-looking, menu-driven package called MYSTAT, a free (student) version of SYSTAT based on the previous release (12) and limited to 100 variables at a time. It is available from www.systat.com. In terms of a compact, efficient package, there is MicrOsiris, available from the Survey Research Center at the University of Michigan. It won't do all the tricks that R will, but it is a solid package with good data management capability and can handle large data sets. It also interfaces with the excellent multiple-imputation software IVEware, available from the same source. Sorry I don't have the link handy, but it is easy to find. Bob
I have used R on Linux (Ubuntu 12.04) for several years. I started with R Commander, which permits a lot of statistical evaluations through a GUI. Nowadays I usually write a whole script for a comprehensive statistical analysis.
R it is. I code in Vim but found RStudio to be useful. RKWard crashed too many times with buffer overflows on Ubuntu and it showed inconsistent behaviour.
R is a most awesome piece of software for statistical analysis. RStudio is a GUI for R on Windows; on Linux, R itself looks like RStudio, so there is no need to install RStudio separately.
I agree that R is a good choice. This package is nice for shell scripts:
http://oldwww.acm.org/perlman/stat/
You can use the fully functional 30-day trial of "Centurion XVI" from: www.statgraphics.com
Gnumeric (http://projects.gnome.org/gnumeric) is a free office spreadsheet that can also draw some useful statistical plots (e.g. boxplots, which Excel for example cannot) and perform some statistical tests (descriptive stats, regression, F-tests, ANOVA, ...). And you can paste the formulas behind the tests into the spreadsheet, which makes the calculations more transparent than in most other software. Good for beginners.
Code School offers a free (sponsored by O'Reilly), very rudimentary introduction to R (http://tryr.codeschool.com/). Also R Commander is another "front-end" for R (http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/).
I cannot tell you if it's good or bad, since I haven't tried it myself (and am not a statistician), but the GNU project offers a whimsically-named alternative to SPSS: PSPP. See http://www.gnu.org/software/pspp/
And finally, although not exclusively statistical, see:
GNU Octave (http://www.gnu.org/software/octave/), Sage (http://www.sagemath.org/) and Maxima (http://maxima.sourceforge.net/)
Minitab is pretty good and affordable for commercial software, but I agree with many of the others that R is a very good option. If you want a more GUI-based feel with the power of R, you can use R Commander, a library that can be downloaded and used in the R environment. It has many drop-down menus, like most Windows programs. The added benefit is that when you run specific statistical procedures, it will show the actual R code. This will help you learn how to really tap into all the options, functionality, and power of this open-source software. Here is a link to a PDF that explains how to install and access both R and R Commander.
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/TheresaScott/How.to.Install.pdf
Good Luck.
R has a wide scope of statistical evaluation; the software allows you to design and code your own tests, which in essence lets you create a tailor-made statistical package for your project. R also allows models to be created and linked with other R features, which is a plus. On first use R does look difficult, but once you know some key lines of code it is very easy to build upon them, creating more advanced code. As for whether SPSS or R is better, it depends on your field: I believe SPSS is still the top software in the social sciences and the best tool for teaching statistics to undergraduate students (physical and social science). But I do believe that doctoral-level researchers and above in the physical sciences and engineering should be using R or one of its equivalents.
R and Octave are excellent choices for stats and modelling; I often use both, plus SAS for big data processing (SAS is quite ubiquitous). All have a "steep learning curve"; however, as Alex points out, statistics itself has a steep learning curve. I have found that in learning the languages, one gets closer to the underlying theorems and assumptions driving the methods used. This is critical for fully understanding and critiquing your selected method(s) of analysis, in order to avoid jumping to wrong conclusions, and perhaps to open new questions for investigation (this is my favorite part) illuminated by the limitations of the methods. Often it is in critiquing and analyzing methodological shortcomings in one's own research that new avenues of discovery open up. Then the fun really begins!
Go with R. You'll be glad that you did in the future when everyone else has caught up with you and seen how good it really is. I can't think of anything that I've ever done or wanted to do, that could not be done in R.
I still like R best, but here are a couple of other good alternatives.
SAGE: https://en.wikipedia.org/wiki/Sage_%28mathematics_software%29
PSPP: https://www.gnu.org/software/pspp/
List of others: http://srmo.sagepub.com/page/free-statistical-software;jsessionid=D721EFCB037B66DFBB1195398D7DD215
I agree with all others pointing to R, but a very good alternative to start with is PAST
http://folk.uio.no/ohammer/past/
in particular because the documentation is excellent.
@J.Jarrett said open source leads to unreliability.
On the contrary. Open source implies that a routine may be scrutinized by anyone capable. This leads to constant questioning, correction and evaluation. Search for "corrigendum" and you will find a number of papers whose results have been corrected after a routine was adjusted for the better.
Commercial software vendors know how to sell their products. But do you think their developers are infallible? If your software has a mistake, will you ever know? And if they happen to know... will they acknowledge it to you, so you may correct it rather than mislead the research community? There are not so many corrigenda from commercial packages, though maybe the errors are still out there.
You also said that open-source routines cannot be statistically robust. I don't think the language in which a routine is implemented affects its robustness at all. Maybe "statistically robust" is not the right term here, or do you get differing results over several runs of your R routine?
If I can think of issues affecting the reliability of R, they have to do with memory usage and inefficient processing, since scripts are not compiled, not with its open-source character.
For simple statistical analysis, try the R Commander package in R.
As an alternative to R (or higher-end commercial tools like SAS) look at SOFA Statistics: www.sofastatistics.com
If you do simple analyses like mean, SD, median and some t-tests with plotting, and you have only a few experimental groups, then Excel is good enough. You can handle large datasets in Excel, but it becomes a cumbersome and time-consuming activity.
If you have large datasets, then you need R, SAS or MATLAB, which are developed for handling large data and require some programming and a learning curve.
R uses scripts for the analysis, with a very big community at Nabble if you need help. It is available for the three major OSes.