I recently attended a workshop at Tufts University on Tableau Software for analysing Microsoft Excel data. It produces very good graphs.
Excel is quite a powerful instrument, often used in biomedical research to manage datasets. It does contain some basic statistical analysis tools and can serve for simple inferences.
I would, however, not call it "statistical software", as it is not designed for such use. I would also not recommend using Excel for statistical analysis, for the following reasons:
1. It is difficult to know what Excel is truly doing when applying a formula.
2. Excel has weird ways of handling missing data.
3. Programming analyses in Excel is complicated, and mistakes easily go unnoticed.
There are many easy-to-use free statistical packages available. R might be more difficult for a beginner to manage, so I would try the following first:
1. Epi-info http://wwwn.cdc.gov/epiinfo/
2. OpenEpi http://www.openepi.com (online)
3. WinPEPI http://www.brixtonhealth.com/pepi4windows.html
In epidemiology, once you decide you seriously want to analyse your data yourself, I would suggest taking a course and using a statistical package such as one of the following:
1. STATA (licensed)
2. R (open source)
3. MATLAB Statistics Toolbox (licensed, expensive, but quality support)
4. SPSS (yearly license)
5. Statistica (licensed, user friendly)
Sure, it isn't the best or most powerful tool, and it won't do a lot of things. But for the things it CAN do, I don't see why not to use it. Journals will accept sound and reproducible methods. As long as you attend to that, I don't think it matters whether you have done things in Excel (as long as your content really stands on statistics as basic as those Excel can do). I have in fact read published papers that only used Excel and mentioned that in their methods. Its graphs are much easier to produce if you don't need very advanced things, so again for me it's a matter of "can Excel do it?". It cannot do a lot of things, but for what it can do, it's still the easiest, and it has improved a great deal in design, so the graphs look better and better.
Dear Dr. Brijesh Sathian, Excel is a complete programming language, but the difficult part is that you have to write macros. Regarding the journal issue: journals require results and quality, not a particular software or tool. But the tool or software does affect the accuracy of the results.
I think you can use SPSS. It is a powerful analysis tool. Also, MATLAB has many built-in functions for statistical analysis.
Why use Excel for statistics? It is very hard to manage for statistical computations... there is a lot of good statistical software around!
It depends on what you want to do. Many standard analyses are possible with Excel, many researchers use Excel, and they obviously use it successfully. If you have really large amounts of data, or if you want to do some more specialized analyses (generalized linear models, for instance), Excel is out. In that case I would also recommend R.
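For instance, a logistic regression, one of the generalized linear models mentioned above, takes a single model call in R. A minimal sketch, assuming a made-up data frame d with a binary outcome y and two predictors (all names and data are hypothetical):

    # Hypothetical data: binary outcome plus two predictors
    d <- data.frame(
      y     = rbinom(100, 1, 0.4),                             # 0/1 outcome
      age   = rnorm(100, mean = 50, sd = 10),
      group = factor(sample(c("A", "B"), 100, replace = TRUE))
    )

    # Logistic regression, i.e. a generalized linear model with a binomial family
    fit <- glm(y ~ age + group, data = d, family = binomial)
    summary(fit)   # coefficients, standard errors, p-values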
I would not call Excel a statistical software package, but it has many useful functions. I personally use Statgraphics, but for data handling I use Excel. I did most of the calculations I need for interlaboratory comparisons of chemical data in Excel, by setting up tests such as Mandel's k and h, the Shapiro-Wilk and Lilliefors tests for normality, nested ANOVA, etc., step by step. It is not too easy, but it is doable. Most importantly, by setting up the calculations yourself you learn and understand better how the calculations are done inside the "black boxes" that real statistical software packages can be.
I also use R, which is great, but when you have a large amount of data you still need spreadsheets such as Excel to prepare it. So again, I try to automate most tests in Excel before transferring the data to R (or Statgraphics).
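For comparison, once the data are in R the normality checks are single calls and a nested ANOVA needs no step-by-step setup. A minimal sketch with invented lab/sample data; shapiro.test() and aov() are base R, while Mandel's h and k statistics would need an add-on package and are not shown:

    # Invented interlaboratory data: 3 samples nested within each of 4 labs
    dat <- data.frame(
      lab    = factor(rep(1:4, each = 9)),
      sample = factor(rep(1:3, times = 12)),
      value  = rnorm(36, mean = 10, sd = 1)
    )

    shapiro.test(dat$value)                      # Shapiro-Wilk normality test

    # Nested ANOVA: sample nested within lab
    fit <- aov(value ~ lab / sample, data = dat)
    summary(fit)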
Concerning your journal question: the journal normally does not care; the reviewer should look at the presented data and see whether it makes sense, no matter which route you took to arrive at your statistical calculations and interpretations.
Excel is excellent for data management and preprocessing raw data prior to data analysis. In fact, in my opinion, it is superior to any statistical package, including R, for that purpose. The graphics and some of the basic statistics such as t-tests, F-tests, correlations and even regressions are quite easily done in Excel with little to no data formatting beyond what would be necessary for any statistical package. I would not recommend Excel for any ANOVA analyses as it requires a unique data format for the built-in analyses. In addition, it cannot handle unbalanced designs larger than one-way and cannot handle anything larger than a two-way factorial design. You can do more complex and unbalanced designs in Excel using the LINEST function and dummy variable coding, but it is clunky unless you automate it using macros. If you are going to take the time to code Excel macros, you are better off learning to code in R as it is much more powerful and is actually designed for statistical analysis.
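As a point of comparison for the LINEST-plus-dummy-coding route just described, R codes factors automatically, so an unbalanced multi-factor model is a single call. A minimal sketch with invented, deliberately unbalanced data:

    # Invented unbalanced two-factor layout: unequal group sizes are fine
    d <- data.frame(
      treatment = factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C")),
      block     = factor(c(1, 2, 3, 1, 2, 1, 2, 3, 3)),
      yield     = c(5.1, 4.8, 5.4, 6.0, 6.3, 4.2, 4.5, 4.1, 4.4)
    )

    fit <- lm(yield ~ treatment + block, data = d)   # no manual dummy coding needed
    anova(fit)                                       # sequential ANOVA table
    summary(fit)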
I avoid Excel and use GraphPad Prism. It can create nice graphs/figures, and the statistics are very robust once you understand them. The graphs in Excel are aesthetically very poor.
@Brijesh, if your name belongs to the set {"Carmen M. Reinhart", "Kenneth S. Rogoff"}, then you have every right to use Excel, even with mistakes. But, as I can see, your name is "Brijesh Sathian", so you have no such right... (Neither do I, of course!)
I agree with some previous comments and I would also recommend R.
If you want to use a spreadsheet, Gnumeric is a suitable option since it uses the R internal procedures.
Regarding Excel, although it is widely used for data management, I think that more attention should be paid to some "conceptual" statistical mistakes, related to the terminology of descriptive measures (average, mean, median, ...) and also to the concepts of "probability" and "probability distribution" in the Normal, Student t and F distribution functions.
I just found out that there is another option for using Excel and R together: the RExcel add-in for Excel: https://en.wikipedia.org/wiki/RExcel with download from http://rcom.univie.ac.at/. I will look into this, as it would be ideal to combine both "worlds": you use the spreadsheets for the regular data manipulations and R for the proper statistical calculations.
Brian, I tried to finish your SurveyMonkey survey, but there must be a bug: it keeps requiring answers even though I had checked some.
If you are going to use Excel, examine your results carefully, as some of the functions can give strange answers. The most worrisome that I have personally experienced is negative standard deviations. I do use Excel, but not for analyses that require built-in function calls. For analysis, I use Minitab, SPSS, R or SAS.
I would advocate very strongly against using Excel for data management and even more for analysis. The reasons are: 1. it is highly error prone; 2. it is very limited in terms of the analysis it can do (you would likely need to import your Excel file into a stats program anyway); 3. it is very limited in the graphical representation of your data. For any researcher, the time invested in getting to know a statistical software package, and using it for data entry/management, will be more than repaid. I would not permit any student of mine to use Excel, and I will be giving this same message at an upcoming international talk. I personally favour STATA for the following reasons: 1. it is a nice blend of syntax programming and point-and-click for the beginner; 2. its licences are perpetual (with SAS you need to pay an annual subscription); 3. it has a much nicer graphing suite than SPSS and can do a larger variety of analyses; 4. it has a very user-friendly interface and is quite intuitive (I find SAS a bit more unwieldy in this regard); 5. it is available for Macs.
In principle any software is acceptable for scientific work and publishing tasks. A general condition is that you verify / validate the underlying procedures, and the assembly of your tasks, be it EXCEL, SAS, R or SPSS, etc., or a mixture of tools.
However, the propensity for errors is higher in some packages or functions. EXCEL can be problematic if you don't know exactly what you are going to calculate, or if you don't apply the correct constraints. For instance, empty cells in EXCEL are usually interpreted as the value "0", which generates incorrect results.
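For comparison, R marks a missing value explicitly as NA and forces you to decide how it is treated. A small sketch with an invented vector:

    x <- c(2, 4, NA, 8)        # NA marks the missing value explicitly

    mean(x)                    # returns NA: R refuses to guess silently
    mean(x, na.rm = TRUE)      # 4.67 once the missing value is explicitly dropped

    # Treating the blank as 0, as a spreadsheet formula may do, gives a different answer:
    mean(c(2, 4, 0, 8))        # 3.5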
I started programming statistical procedures in FORTRAN II some 45 years ago, and later in other languages such as C. But every procedure had to be at least crudely documented, tested and verified against a number of examples with known results. For special purposes, EXCEL may do a wonderful job if you dive into VBA programming and can utilize matrix/vector algebra. However, I would recommend validating such applications thoroughly. The high effort may be worthwhile in cases where you receive numerous data deliveries from lab instruments attached to the computer and want to perform a "real-time" analysis.
For casual users, programming novices, and single analysis tasks with little data, this is a clear "no-go".
There seems to be no unique "best" or "fits-all" solution for statistical analysis.
Ask your peers what is available at your institution and what they use and recommend as stat tool(s). It's the easiest way to get training, advice, help and support/troubleshooting for your work.
I use Excel extensively to create tools for public health program managers who need to perform their own planning and costing tasks. I use Excel because they find it less intimidating and more trustworthy due to its inherent transparency. However, when it comes to writing for peer-reviewed journals, I find that analyses done with STATA, SPSS, or MATLAB tend to raise fewer eyebrows than those done with Excel. R is popular among the epidemiologists and health economists I know, partly because it is free and open source, and partly because it does many jobs as well as the aforementioned programs.
Still want to give Excel a chance? Then consider some add-ins for Excel such as StatTools and @Risk (by Palisade), which are designed to extend the statistical capabilities of Excel.
A final word. Please use Excel 2010 or a newer version. The underlying code in these versions was extensively rewritten - partly to address the program's statistical calculation shortcomings.
A good statistical analysis must be reproducible, starting from your raw data. Using proper statistical software allows you to save your data analysis programs, to repeat your analysis if needed, and even to share it with other researchers, which is not possible with Excel.
@Sophie:
Agreed concerning reproducibility, but that is just one of many more criteria, like correctness in terms of scientific standards.
I disagree with the statement that it is impossible to save the data analysis steps and repeat the analysis if needed. There is a macro RECORDER function that you can use to track the steps of data manipulation and analysis. You can generalize the recorded steps and create VBA applications, even with nice dialogue inputs, function buttons, etc. And, of course, you can share the data and procedures with other researchers.
Anyway, for complex jobs, verification/validation is needed in order to ensure correctness and reproducibility. The basis is unit testing. However, most people have no idea how to plan, document and perform that job. Worse: time, resources and budget are rarely planned and sufficiently provided for that part of the research tasks.
-----------------------------------------------------------
Just another hint: data structures may be complex and require database structures. An easy approach is to use MS Access and to generate queries from EXCEL to create data views from the Access database, which can then be processed in EXCEL. That works really well, and you are not at risk of changing the raw data by mistake. You can even forward your database to persons without a license, as there is a free runtime application that lets other persons use the database, but only for reading data (not changing or adding it). For large amounts of data (>1 million records) and complex data structures I would recommend more professional databases like ORACLE, MySQL, MS SQL or INGRES.
The link from EXCEL to R has been mentioned already. There are more graphical frontends available to ease the use of R, like JGR http://rforge.net/JGR/index.html.
Personally, I prefer SAS and SAS/JMP for the stats analysis.
It is quite clear: one *can* work quite professionally with Excel - but if Excel is really used professionally (complex analyses on complex and/or big data), then everything becomes very difficult (e.g. VBA programming!), which renders the advantage of the easy-to-use, point-and-click philosophy obsolete. To me it seems much simpler to write an R script than to write (and manage) Excel macros and Visual Basic programs. However, there are really simple tasks that are frequently done with Excel, and for those there is no need for programming at all; because of that simplicity there won't be much risk of losing reproducibility. For more complex tasks I think that, although solutions might be available for Excel, the difficulty of solving these problems with Excel increases exponentially, whereas the difficulty in other (more "scientific") software doesn't increase much at all (or only linearly, if you want). In R, for comparison, the big hurdle is right at the beginning (argh: console input! programming!), but after you have managed this, there are no limits. In Excel it is just the other way around: the start is easy, but the very big hurdles come if you want to go for more sophisticated analyses.
There is of course no law against using Excel for statistical analysis. Certainly, if you have done it right and there is no reason to suspect that your analysis is faulty, I don't think any journal will refuse to publish your research solely because you used Excel. However, I will ask: why do you want to use Excel? Excel is a spreadsheet, and I might add very useful software, but it is not designed or intended for performing statistical analysis. There are several statistical software packages available. Some of them are available for free download from trustworthy entities such as the CDC and WHO and are very decent packages. If you still insist on using Excel, I caution you to triple-check and then triple-check your work again. Hard-core statistical software can make errors too, but using any software for a purpose other than what it was designed and intended for exponentially increases your probability of making such an error. You can choose whichever software you want, but I would invest some time in learning one or more 'proper' statistical packages if I were in your shoes.
Given the bundling of MS Excel and MS Access with Microsoft Office, these two software packages are readily available at no added cost to anyone with a desktop computer. As such, both are frequently used as a means to enter data from paper questionnaires, and as a means to convert data captured via tablet into a form usable by major statistics software (SAS, Stata, SPSS, R, what have you).
I've worked with many an analysis that arose from data originally manipulated in this way. And that's fine, so long as you recognise the limitations of Excel (and its dBase-clone cousin, MS Access).
I avoid the built-in functions of Excel, simply because I can perform them just as easily in SAS, the stat platform I happen to use. Packages such as SAS permit one to create new variables, modify specific values, manage complex data sets, etc., _and_ to document all of those things in a program for future reference. This documentation is key, if one wants to reproduce a set of steps performed, say, 6 months ago.
Some of Excel's internal formats can be problematic, as well, particularly internal date formats. For example, I had a data set that, when processed via Excel and then imported into SAS, had dates that were 60 years in the future! Not good. The most reliable workaround was to format the date as text in Excel, so that its internal formatting wouldn't muck the dates up.
I would not suggest that anyone use Visual Basic (VBS) to program their own formulae. Some very old - but still out there - computer viruses and worms were written in VBS; and, VBS has its own security holes.
Excel is fine for data-entry (do it twice!). But I recommend that you leave the heavy lifting to a good statistical package.
@Manuel, based on my personal experience, I must disagree with your point (2); problems can arise in Excel when porting dates across platforms. It happens. Even 1900/2000 (also known as Y2K) issues still pop up in dates with 2-digit years.
When it comes to error-free computing, Microsoft has a tendency to be behind the curve. Chronically.
Excellent answer from Paul Vaucher - Excel has limitations. You 'should' state what software you used for analysis in the methods. As a reviewer I would consider Excel an uncommon answer, but not a problem in and of itself. I have found both Stata and Epi Info easy to learn.
I avoid using Excel for even the most simple calculations. Try a simple experiment: perform simple stats on a data set (SEM, SD etc.) then repeat with a dedicated statistical package (SPSS, SAS etc.). The differences may alarm you!
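If you run that comparison, the reference values are easy to compute in R. A minimal sketch with made-up measurements; just make sure you compare like with like (sd() below uses the n - 1 denominator, like Excel's STDEV.S):

    x <- c(12.1, 11.8, 12.6, 12.0, 11.5, 12.3)   # made-up measurements

    n   <- length(x)
    s   <- sd(x)            # sample standard deviation (n - 1 denominator)
    sem <- s / sqrt(n)      # standard error of the mean

    c(mean = mean(x), sd = s, sem = sem)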
In my opinion, Excel is a very powerful tool for performing basic statistics and even some advanced ones. See some of my applications below, including some comparisons with results from statistical tables. It is very useful for educational purposes.
Standard Normal Table:
https://www.researchgate.net/publication/257652825_Standard_Normal_Distribution_Table_Calculated_by_Excel?ev=prf_pub
Statistical Calculations in Hydrology:
https://www.researchgate.net/publication/236447552_Spreadsheet_Model_for_Statistical_Calculations_in_Hydrology?ev=prf_pub
Random Field Generation:
https://www.researchgate.net/publication/257928431_Random_Field_Generation_Using_A_Spreadsheet?ev=prf_pub
Estimation of Population Variance:
https://www.researchgate.net/publication/258120613_Estimation_of_Population_Variance_(CI)?ev=prf_pub
Regarding research, several papers have been published based on Excel calculations:
https://www.researchgate.net/publication/230641636_Modeling_Monthly_Rainfall_Records_in_Arid_Zones_Using_Markov_Chains_Saudi_Arabia_Case_Study?ev=prf_pub
https://www.researchgate.net/publication/234081674_Analysis_of_Annual_Rainfall_Climate_Variability_in_Saudi_Arabia_by_Using_Spectral_Density_Function?ev=prf_pub
I should say that there are some bugs in Excel. I discovered that myself with the random number generation in Excel version 3. However, the problem was fixed in the newer versions; you can easily find that on the web.
But as I mentioned earlier, it is a very powerful tool for building models for illustration to students, and it can be checked against other software. There are many papers that use Excel in the journal "Software Engineering", and I think in other journals as well.
Nowadays I almost always avoid using Excel. One problem I find with Excel is that usually one has to copy the same formula to several cells, and there is the very real danger that, through editing mistakes, one or more cells end up with the wrong formula... and this is sometimes very difficult to detect. In a scripting (or programming) language, if the code is well designed, a formula is defined in a single place, which makes it much easier to detect errors and avoids possible inconsistencies.
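A small illustration of this point in R: the formula lives in exactly one place, so correcting it once corrects every use. A sketch with a hypothetical coefficient-of-variation helper and invented data:

    # Hypothetical helper: the formula is defined exactly once
    cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

    # ...and applied uniformly to every column, with no copy-pasting of formulas
    d <- data.frame(a = rnorm(20, mean = 10), b = rnorm(20, mean = 50, sd = 5))
    sapply(d, cv)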
Of course, reproducible analyses, to be usable in the future should be documented in a non-proprietary format, and preferably also human readable, and include comments. I find this much easier to achieve with R than with any other statistical software I have used in the last 35 years.
My experience of teaching R to biology undergrads and master students has been very positive. Communicating what needs to be done or was done is so much easier with a scripting language than with a graphical interface, that students adjusted very fast to this way of doing statistical analysis.
Of course, there are also personal preferences, but for anything that needs to be documented and/or repeated exactly in the same way, a script is best. This does not mean that the same analysis cannot be achieved in other ways, or that the effort needed to learn to use a different language or software is always justified.
So, I think the original question does not have a simple answer. The answer will depend on what is the design of your experiment, what type of statistical procedure you intend to use, how big your data sets are, and how many data sets you need to analyse per year. Say, if you are doing just a few t-tests, and you are already familiar with Excel, although doing such tests is also easy in R, I would just answer that Excel is good enough. But if you have data sets with repeated measurements, or a hierarchical design, or count data, or binary data, or several factors, then learning and using R or another statistics software would be my recommendation. This also goes hand in hand with your knowledge of statistics, as although the possibilities are much broader in R, to be able to take advantage of this much richer set of options you will need a better training in statistics.
As to whether manuscript reviewers or journal editors will accept statistical analyses done with Excel I think the answer is yes, as long as the analysis method is valid. Problems may arise if the limitations in Excel force you to use a suboptimal or invalid method for analyzing your data.
I support @Darryl statement:
"I would advocate very strongly against using Excel for data management [...]. The reasons are: 1. it is highly error prone"
I strongly suggest reading the following:
1) http://qz.com/75035/fixing-this-excel-error-transforms-high-debt-countries-from-recession-to-growth/
"It looks like the most frequently cited justification for fiscal austerity in the aftermath of the global financial crisis is based on a boneheaded Microsoft Excel error."
2) http://www.r-bloggers.com/did-an-excel-error-bring-down-the-london-whale/
"Excel is an excellent tool for many applications, but the intertwined cross-references of formulas make errors like this hard to detect, and hard to correct even if discovered. That's why a programming language designed for data analysis such as the R language is a better platform..."
However, if you still choose to use Excel (or you are forced to) for analysing your data, please be careful and check (at least) twice the formulas you're using.
Nowadays I only use Excel for printing small tables or for reading xls files and then exporting them into CSV format for further analysis with R...
Those were my two cents.
Looking for something else I found the following related and enlightening reading:
"Spreadsheet Addiction"
http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/#excel
In my opinion, any computer program or software has its own advantages and disadvantages. I just use them, but not strongly believe in them.
Dear Dr. Brijesh Sathian, again I am emphasizing that journals require results and quality, not the software or tool used. If you are comparing algorithms, then the software is important to mention in the paper. In my opinion, you have to write programs even if you are using Excel. EXCEL is a complete programming language, but the difficult part is that you have to write macros, and hence know about algorithms and programming. If you are making that much effort, then why not go for MATLAB? MATLAB has very good interfacing with EXCEL. The other alternative is SPSS: if you have it, use it, as it is easy to use and handle.
In response to Manuel's comments, I do not think it unrealistic to work nearly entirely in a stats program. I do analysis for a large number of co-authors who use Excel exclusively, and I have no problem importing their data into STATA and exporting the dataset back out to .csv if I have generated any new variables. Certainly my personal refusal to use Excel has not had any detrimental effect on my career whatsoever, so why should it be detrimental for my students? In fact, forcing myself to do everything with STATA has enhanced my analytic and data management skills, which has accelerated my academic work.
The analogy I would draw concerning the Excel versus stats software debate is that I can do surgery with a scalpel or a kitchen carving knife. I know which one my patients would prefer I use, and which one I prefer to use. So why would one not use the best possible tools to achieve one's objectives? The possible answers are 1. cost (understandable, but for most academics a stats program is paid for by the university); 2. ease/inertia/comfort zone (to me this is a poor reason for a scientist - I doubt the road to a Nobel prize is the easy one).
Excel is indeed a very useful tool for statistical applications. You can program your own formulae, you can make use of the ready-made formulae, and you can enable the Analysis Toolpak. I just wish the ready-made formulae and Toolpak would include quadratic regression and exponential regression (and their inferential counterpart), and more non-parametric tools.
We should also note that Excel not only has statistical functions and tools; it also has functions and VBA for making complete software of your own. Some of these programs have been approved by international agencies, like the program called "BIOSCREEN" approved by the EPA (Environmental Protection Agency). That program simulates the fate of oil spills.
@ Eddie & Amro:
In principle, I agree with you that you can perform any scientific calculation with EXCEL. However, you need stats/math knowledge and some programming skills.
These are usually scarce in young academics who are just taking their first steps in data analysis. Beginners without a strong affinity for computing are much better off using a standard statistics package like SPSS, SAS or R.
If you want to produce reliable results, you need verification of the results and validation of the program. That is much more laborious with EXCEL/VBA than with a standard stats package and its script language(s).
If you are collecting and processing mass data, covering millions of records, then the limited size of EXCEL spreadsheets makes analysis very hard, if not impossible.
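For data beyond the spreadsheet row limit, reading the file into R is straightforward. A minimal sketch; the file name and the columns value and site are hypothetical, and the data.table add-on package assumed here has to be installed first:

    # install.packages("data.table")   # one-time installation of the add-on package
    library(data.table)

    # Hypothetical file with millions of records, far beyond older Excel's ~65,000-row limit
    dt <- fread("measurements.csv")

    nrow(dt)                                                     # records actually read
    dt[, .(mean_value = mean(value, na.rm = TRUE)), by = site]   # grouped summary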
For those who still have it in mind to use EXCEL for their work (and there may be good reasons for this choice!), there are numerous books available for learning the use of EXCEL/VBA in scientific applications.
Personally, I found this book very helpful:
R. de Levie, Advanced Excel for Scientific Data Analysis, Oxford University Press, New York (2004), 625 pages; 2nd ed. (2008), 707 pages; 3rd ed., Atlantic Academic (2012), 646 pages.
The book covers a broad spectrum of topics, helping to understand what you do, including rules for good quality design of programs and debugging.
R. de Levie has published a lot of practical solutions for chemometrics applications based on EXCEL.
Look at his home page: http://www.bowdoin.edu/~rdelevie/
Please take my 5 cents as well.
Some 10 years ago, when teaching biologists, I strongly pressed them not to use Excel at all, recommending Word and Access, and then, if statistics were needed, any of the available packages (Statistica, SPSS, free programs).
Later I discovered that the Analysis ToolPak is evolving and provides a lot of readily available functions - if the data tables are in Excel.
After all, there are huge possibilities for manipulating data with the simple formulas built into any version of Excel. So I myself have used Excel a lot. Various books about VBA and "using Excel for xxx" are available.
Currently, I understand that:
1. Missing data in Excel are treated in various and unpredictable ways; if I forget this, the results are also unpredictable and wrong;
2. There is a limitation on the number of variables in Excel;
3. When copying formulae, there is a high chance of making mistakes that go unnoticed;
4. Bugs are also present;
5. Many statistical packages, including free ones, are much easier to apply.
So now, in my opinion, Excel is a tool for data entry and simple manipulation. Calculations should be made using statistical software designed for that purpose.
And last, if you need something for your specific purposes, it is still possible to use VBA or other programming languages to make tailored applications.
I agree with many comments above. I'd add that
1. Excel can easily mess up date fields
2. be careful when sorting in Excel (columns can be sorted independently)
3. it is difficult to share what you have done in a simple, compact manner. When I'm handed a complicated analysis completed by someone else using Excel, it can be difficult to tease out what they have done. If someone shows me their SAS, Stata, SPSS or R code, it is much easier to follow their logic.
4. if you have any interest in analytics, I'd recommend learning to use a statistics package. There are many advantages. If you conduct an analysis, you can save your code and replicate the analysis if the data set changes. It is also easier to accommodate updates to an analysis. Over time, you can save a repository of the code you have used on projects. In 5 years, you will be an expert. In 10 years, you will be invaluable.
One more bad thing in Excel:
If there is an empty line, filters work unpredictably. I recently lost time investigating why a data table obtained by filtering in Excel and the result of a crosstab of the same data in Access yielded different outcomes. Just wonderland!
@Pd
you will be an expert after 10 years of work in almost any field, without saving code or programming - if you are capable of the job. I know many biologists who are good in their field but simply NOT CAPABLE of any statistics except Student's t :)
Dear Dr. Brijesh Sathian,
Nowadays, Excel is a powerful programming language that embeds many options. It has versatile interfaces with other programming languages. I switched to modern EXCEL, developed many useful software tools in the areas of VLSI testing and data security, and published those tools in reputed international journals.
Excel can do a lot of simple statistics. If no advanced statistics are needed in your data analysis, it is okay to use it as a statistics tool, and I don't see any reason journals would not accept it. However, if your data analysis requires more advanced statistics, you should use statistical packages like SAS, SPSS, STATA, etc. There are many add-ins to Excel for more advanced data analysis; however, the learning curve required to master those macros and VBA (Visual Basic for Applications) is no smaller than the learning curve required to master a statistics package.
@Mingfu,
please check how these add-ins handle missing data in Excel, in just two cases: an empty cell or a zero.
For anybody who knows EXCEL and wants to keep using it, but does not have much of an idea about statistical programming, there is an excellent interface to the language R: since 2003 we have had access to the RExcel module, which links R through EXCEL. R and RExcel are both free (RExcel only for non-commercial use). This book by Richard Heiberger (the creator of RExcel) proved a good starter for learning R while keeping the convenience of EXCEL input:
http://www.amazon.de/Through-Excel-Spreadsheet-Interface-Statistics/dp/1441900519
It is also a matter of personal preference. Some may want to use Excel as a platform to access R modules through an interface; others like to use R as a platform to import Excel data and work in the R environment.
I recommend reading several papers by McCullough on this subject. For example: McCullough BD, Heiser DA (2008). "On the Accuracy of Statistical Procedures in Microsoft Excel 2007." Computational Statistics & Data Analysis, 52, 4570–4578.
or the most recent by Bruce D. McCullough and A. Talha Yalta (2013), "Spreadsheets in the Cloud - Not Ready Yet", in the Journal of Statistical Software: http://www.jstatsoft.org/v52/i07
Basically, these indicate that Excel is even today not very exact in computationally intensive calculations, and that Microsoft does not attend to the reported bugs quickly enough.
Apart from that, I think scientific journals should worry more about the replicability of the work they publish, and to that end they should recommend the use of free software.
There are at least three free statistical software packages that have reached a level of maturity comparable or superior to the best proprietary statistical software: R, Octave and Gretl.
In my opinion, the choice between Excel and other software still depends on how deep you want to go in your analysis and how large your data are. Excel has limitations, so you may use it for simple analyses and relatively small amounts of data. The nice thing about Excel is the user-friendly graphical interface, and that you can animate your results without heavy programming and immediately see how your analysis changes when you change the input data. You may see this example and many others on the website:
https://www.researchgate.net/publication/236863728_2D_Gaussian_Density_Function?ev=prf_pub
The problem with Excel is the user-friendly graphical interface. This means that while doing your analysis you do not at the same time document, in a useful (human-readable) way, what you are doing. Hence you cannot repeat what you are doing, you cannot automate what you are doing, and you cannot easily correct what you are doing if you only discover that there are mistakes after a lot of data manipulations.
If you are interested in reproducible research, do not use Excel. Use R. Learn about reproducible research and about literate programming (that is: writing programs which *humans* can read, as well as computers).
Dear Richard
I know that mathematicians do not like Excel; however, from an engineering point of view it is powerful software. In the engineering field, many software tools are based on Excel. I have worked with statisticians and probabilists, and they do not even like to work with Microsoft products at all.
When I worked in Excel on some basic analysis, some serious mistakes occurred unknowingly while using the filter tool and then removing data.
Though it is powerful software, if you are not well versed in it, it is dangerous to use and can lead to errors.
Sure, Excel is powerful, and if you don't like Microsoft, you can use LibreOffice. However, it encourages bad research practices, it leads to unreproducible research, and it is closed - you do not know which algorithms are used nor how they are implemented. R is nowadays easy to use and does not have these disadvantages. It has many, many advantages.
The call for caution in using EXCEL is surely justified. Just recently, one of my MD students showed me a graph that he cast doubts upon: a simple odds ratio plot (forest plot) with 95% confidence intervals, added as a low-high cell range. The calculations had been performed correctly, but when he switched the odds ratio axis to the log scale, the error bars were completely wrong. I tried another representation, originally intended for stock indices (high-low-final), and that was displayed correctly on the log scale. This is one more EXCEL error not yet reported.
A serious word of caution: you may produce wrong results from ANY software, even R, SPSS or SAS, if you exercise bad working practices - specifically, if you don't test your programs or scripts thoroughly. Any surprising or unexpected result should alert you and motivate error checking. Most of these findings will be the result of data and/or program and/or handling errors.
One more point to add: Excel deals with 256 cases maximum, so for larger data sets you should know how to use SPSS.
The links below show some comparisons between Excel calculations and tabulated statistical and mathematical functions, and they are identical:
- Normal Table.
https://www.researchgate.net/publication/257652825_Standard_Normal_Distribution_Table_Calculated_by_Excel?ev=prf_pub
- Gamma Function
https://www.researchgate.net/publication/258344123_Gamma_Function?ev=prf_pub
-Well-Function (Exponential Integral)
https://www.researchgate.net/publication/258818061_Well-Function?ev=prf_pub
@Sahifa:
You are addressing the limitation on variables, not cases!
The limitation on the number of variables per table is 255 (=2^8-1, not 256), which applies to MS ACCESS in a similar way (just keep the number of fields there to a maximum of 252 in order to avoid running into trouble!). The number of observations per table is limited to ~65,000 (2^16-1) in EXCEL, and to 2^32-1 in MS ACCESS.
If you are dealing with larger datasets, it is wise to use a fully proven statistical analysis system, be it R, SAS or SPSS, or similar.
This is really a question of picking the right tool for the job. Excel can be very useful, but to me, is most useful at enabling users to see what happens to an output as conditions are changed. Pivot tables also provide a good means of exploring data. Of course, Excel has other uses, but for many of these there are better alternatives. Excel can be used for data collection, but isn't particularly good at this (other free tools such as REDCap are much better).
Excel has statistical functions, but I'd hesitate to think of it as a 'statistics package', and at least in medical statistics there are far better options for this (Stata, R, SAS). It can manipulate data, but again, with large, complicated data sets, other more powerful options are available (we use SAS and SQL).
Still, it can be a useful tool... it really depends on the context and the skill set of the user. I've seen people use it for impressive analyses that include Markov chain Monte Carlo simulations. However, my preference would be to do these things in R.
I have been using Excel extensively in engineering, finance, accounting and BI, BUT not in statistics. At most, Excel is at its best for importing raw data or for exporting results from a statistical package for better illustration and elucidation. As a modus operandi, I explore Excel data in the Explorer preview only, because just opening an Excel (and especially an Access) dataset can sometimes modify the file structure, depending on the version of MS Office installed on your machine. Excel 2013 has been very much beefed up, in line with Google Fusion Tables, and addresses Business Intelligence issues to a greater extent, but it is still not quite statistical.
Dear Dr. Sartaj Alam,
Now, Excel is a complete programming language, and it comes built in with MS Office; otherwise, MATLAB and SPSS are good for statistical analyses.
I agree that it depends on the context of the analysis required. For example, descriptive statistics (which in some situations make up the bulk of the analysis) are fine for Excel, but predictive, inferential or non-probabilistic stats are more difficult to do in Excel UNLESS the user is highly trained in statistics and able to do a lot of work from first principles and understand the results. While STATA, SPSS and SAS are often used in health sciences research, I have found STATISTICA to be a great step up from Excel for non-statisticians wishing to do their own analysis in their applied field (e.g. healthcare). The system is much cheaper than SPSS; it puts out much more usably formatted results (tables) than STATA, in that they are more spreadsheet-like and very compatible with Excel, which almost everyone can use; and it has great online reference material. It is really important for those serious about research to do more than just an introductory stats course, and to seek out expert advice, but they shouldn't need to become biostatisticians themselves.
I agree with Pd Faris above: Excel is a useful tool for working with most data (e.g. exploring, cleaning, etc.) due to its graphical interface. However, it is much less useful for analyzing data of any depth, due to the restrictions imposed by relative cell references. For analysis of all but the simplest data sets, you are much better off using an actual stats package.
In my opinion, as long as Excel calculations meet your analytical and statistical needs, there is no reason NOT to use it. The spreadsheet program itself is, in my opinion, of no real importance as long as the experiment is designed correctly and the results are interpreted properly.
If the data violate a test's assumptions, I suggest also running an alternative test that does not rely on them; if it agrees with the first test, the violated assumption probably did not affect the conclusion.
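In R that cross-check is quick. A minimal sketch with two invented samples: check the normality assumption, then run both the parametric test and its non-parametric counterpart and compare the conclusions:

    # Two invented samples
    g1 <- c(5.2, 4.9, 6.1, 5.5, 5.0, 7.9)
    g2 <- c(4.1, 4.4, 3.9, 4.6, 4.2, 4.0)

    shapiro.test(g1)        # check the normality assumption in each group
    shapiro.test(g2)

    t.test(g1, g2)          # parametric comparison
    wilcox.test(g1, g2)     # non-parametric alternative; if both tests agree,
                            # the assumption issue probably did not matter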
Excel's built-in regression add-in will do the calculations correctly but provides very little in the way of output that will support good principles of analysis or presentations of results. There are a number of commercial add-ins which provide such capabilities, but here is a recently-released free one ('RegressIt") which, if I do say so myself, provides better graphical output than any other regression software out there: http://regressit.com.
It is acceptable, but SPSS and Prism are both more professional software for analysing any kind of data.
That depends on the characteristics of your experiment. If you are working with a balanced design, the analysis can be done even with pen and paper by hand. Excel has all the tools to do the statistical analysis of any experiment, though MATLAB or SCILAB may be better. What you should not do is analyze an unbalanced experiment using the traditional formulas of regression analysis, analysis of variance or analysis of covariance, because you can make serious mistakes. The most serious problem is that you can use a formula or function erroneously. In the past, we had to spend considerable time inverting the coefficient matrix to perform the calculations for the statistical analysis.
If you know the formulas and are an efficient user of MATLAB or Excel, you can do your analysis using such programs, but you will probably waste time that you could otherwise use for interpretation and for writing your paper.
I recommend using SPSS or STATA software rather than Excel. Sometimes, if you are not used to it, it's annoying.
One of my articles was rejected by an anonymous journal referee due to the use of Excel spreadsheets for Monte Carlo simulation - the tool was deemed inappropriate, a quite common notion within the statistical community. That motivated the work on the following article, which, I believe, provides evidence of Excel's suitability:
Botchkarev, Alexei (2015). Assessing Excel VBA Suitability for Monte Carlo Simulation. Spreadsheets in Education (eJSiE): Vol. 8: Iss. 2, Article 3. Available at: http://epublications.bond.edu.au/ejsie/vol8/iss2/3
Alexei, have you noticed that the screenshot of the Excel table about the Chi² test on page 21 of your publication suggests to "Accept the null hypothesis" (see cell T12)? You may want to change this...
But more interestingly: from the information you provide, the reviewer's rejection could have been well justified. It depends on whether or not you used Excel's built-in random number generator (RNG). The quality of the RNG is essential for MC simulations. If it were only stated that Excel was used, I would also be suspicious and think it inappropriate - unless it were clearly stated that a different, external and suitable RNG was used (you state in your own article that the Excel RNG is not ok).
That brings me to another question (just out of curiosity): if MC sims have to be programmed in VBA, and if even then some external RNG has to be used, what is the advantage of using Excel for this purpose at all? Only the fact that input and output can be done more or less conveniently in spreadsheets? (I wonder whether this can be a good argument in front of people who know how to program [at least in VBA].)
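For comparison, a bare-bones Monte Carlo run in R needs no VBA and uses the Mersenne-Twister generator that R ships with by default. A minimal sketch with an invented two-component cost model; all quantities are made up:

    set.seed(42)                         # reproducible stream from R's default RNG
    n_sim <- 100000

    # Invented model: total cost is the sum of two uncertain components
    cost_a <- rnorm(n_sim, mean = 100, sd = 15)
    cost_b <- runif(n_sim, min = 40, max = 80)
    total  <- cost_a + cost_b

    mean(total > 180)                    # simulated probability of exceeding a budget of 180
    quantile(total, c(0.05, 0.5, 0.95))  # spread of the simulated outcomes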
Dear Alexei
I have just had a paper published that is based on Excel; see the link below. In that paper there are many references that also used Excel and were published in good journals.
https://www.researchgate.net/publication/266266232_A_fully_distributed_spreadsheet_modeling_as_a_tool_for_analyzing_groundwater_level_rise_problem_in_Jeddah_city.
However, I should say that Excel 2003 had a problem with its RNG: it could produce random numbers outside the range 0 to 1. They solved that bug in the subsequent versions; you can read about it on the Excel support website.
Dear N. Abdulwarth,
You can do the statistical processing with any tool such as Excel, SPSS, etc., or using any programming language such as Python or R. All tools are acceptable, because the important point is the results and their discussion, not which tool you use.
The software does not matter; what is important is how we achieve our objectives. In fact, I have not used statistical software yet. In my view, MS Excel is an excellent program.
If we have simple tests like the t-test, z-test or F-test, we can use Excel for analysing our data. But if we have a designed experiment, we had better use SAS or Minitab for the life sciences, and SPSS for the social sciences, because some journals accept only specific software. However, MS Excel is a very good tool for data analysis.
Very simply, if someone can achieve their goal using Excel's capabilities, then I don't think there will be any difficulties.
Excel is an astoundingly powerful medium, and its programmability leads the list of features and benefits. It is the most widely used spreadsheet software and has immense public support. For sophisticated applications, there are some easier programs to use, but they often blind the researcher to what exactly is going on in the software. This can be readily controlled with Excel.
There are some distinct limitations and awkward situations in Excel, but there are also some very cute workarounds available via a Google search.
I use the macro recorder sometimes as a lazy man's way to do VBA coding. It produces workable code from tracking one's keystrokes. This is an easy way to get the detailed syntax for some command or other.
Speaking of Excel graphing, it is at its worst in the newer version. For instance, a label in a graph used to be movable just by holding the Ctrl key while pressing the arrow keys. Now that is gone, and it has become very cumbersome to do the same job. MS rushed out the new version before fully testing it to eliminate bugs. It offers some add-ons, but that is where the cumbersomeness comes in; otherwise it would be impossible to do the job any more.
Just an update about RExcel.
This is an email about the product from Mr. Bayer:
Dear Paulo,
I cannot tell you anything about RExcel licensing. statconnDCOM is not free for "noncommercial" use but only for Home & Student users fully complying with the HOME&STUDENT LICENSE.
statconn Excel (the successor to RExcel) will be available under the same license.
I believe the main goal of a scientific paper is to prove that the results came from the right statistics. If the calculations are done correctly in Excel, I believe that there is no problem.
Another important point is that the statistical technique must be used correctly, which, unfortunately, is in many cases left out, as in the case of assumptions for the use of a given technique. In these cases, the researcher may have used SPSS, SAS, STATA, R or Minitab, but their analysis may be compromised.
Excel is a great tool for the experimentalist. It is great for taking and treating data, and for smart control of external devices (motors, autosamplers, valves, etc.). The statistics I find most handy use histograms, because a lot of processes and devices do not have uncertainties that follow a Gaussian distribution. This is a snap in Excel. It is also a great teaching tool, because you can construct the math step-wise in cells and simultaneously show the results graphically.
MS Excel is a great tool for data analysis, but recently most scientific journals have been demanding reproducible research, which means that you must provide the data and code used in the development of the paper. So personally I would recommend using R, but Excel is OK for most basic data analysis.
It is a very bad idea, for a lot of reasons: the absence of any logical separation between statistical units (rows) and variables (columns), as there is in any good statistical software; the bad graphics; the lack of many crucial tools like principal component analysis and other multidimensional statistical techniques; the need to select the area to be analysed by mouse (rather than by logical operations or simply by direct call); and the lack of any efficient input-output cycling...
Alessandro, separating rows and columns to dedicate one to one thing or another is superficial and of no real consequence. Excel, augmented by VBA, lets users do exactly what they want. I have found that many dedicated packages make things seemingly easy but remove control and explicit correctness from the user for the sake of ease of use, with programming assumptions that are invisible to the user.
Dear Scot, sorry, but I do not think it is so superficial; it is trivial (if you want) but not superficial. In thirty years of statistical work, the most frequent errors, which in many cases destroyed the meaning of the analysis, were linked to this kind of confusion.
Scot, I think that if one has to advocate using VBA to do more serious statistical analysis with Excel, this actually disqualifies Excel. Using a programming language to do analyses is (or can be) very professional and can produce reproducible, modifiable analyses that are easy to communicate, but for this task there are far, far better tools available than VBA (e.g. SAS, Python, R, Julia, ...).
PS: I am afraid this attempt at an English text is quite clumsy - but hopefully still understandable.