There are arguments that we should teach statistics not only using manual calculation but also using software such as SPSS, Minitab, Stata, etc. Most teachers or lecturers prefer to use manual calculation. What is your expert opinion on this?
I have not taught statistics before, but I know R is the best statistics package. MATLAB and JMP are also good.
If you look through a typical intro stats book, it uses a lot of approximate methods, because the exact methods are too difficult or cumbersome to do by hand. Take, for example, the normal approximation to the binomial distribution: what is the probability of getting 50 to 100 positive tests given p = 0.30 and N = 250? If you use software, or even a good calculator, you can get the exact result. If you use the approximation, you get an approximate answer.
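To make that concrete, here is a minimal sketch in R (one of the packages mentioned in this thread) comparing the exact binomial answer with the normal approximation; the numbers are just those from the example above:

# Exact probability of 50 to 100 positive tests, with N = 250 and p = 0.30
exact <- pbinom(100, size = 250, prob = 0.30) - pbinom(49, size = 250, prob = 0.30)

# Normal approximation (continuity-corrected): mean = N*p = 75, sd = sqrt(N*p*(1-p))
approx <- pnorm(100.5, mean = 75, sd = sqrt(250 * 0.30 * 0.70)) -
          pnorm(49.5,  mean = 75, sd = sqrt(250 * 0.30 * 0.70))

exact - approx  # small but nonzero: the approximation is close, not exact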
Outside of an intro class, you need to use stats software. I would require the use of software, and easy-to-use software at that, for the first few stats classes. No Excel. No SAS. No R. Use JMP, Minitab, Stata, or SPSS.
We, as statisticians, make stats a lot harder than it has to be for NO reason. When we force students to use hand calculation methods, we detract from the utility of statistics. We make students hate stats. We also punish those willing to take the class. When a student takes a class in stats outside a stats department, the students will do all the same things just with a program. We end up looking like a bunch of jerks.... or worse... Probably much worse;-)
Usually I use the SPSS software for any statistical analysis.
Both manual calculations and SPSS are needed; that is the wisdom... I took basic stats and applied stats courses when I did my PhD. Manual calculations gave me time to mentally digest the material, and to appreciate how SPSS saved my time. So I did the Master's and PhD in about 5 years, while still teaching.
I do not teach basic statistics, but I teach some applied statistics courses in engineering. Normally we use statistics packages such as JMP and SPSS in our university. Alternatively, Microsoft Excel has a large number of statistical functions which can also be used.
In India SPSS is commonly used; otherwise, Excel spreadsheet calculations are used for customized analysis (if any).
I think it depends on the maturity of the students. It is so easy to throw in some numbers and wait for the crunching to be over. It would be nice if stats teachers would indicate what kind of students they teach with software. Statistics 101 should be about the concepts and not yet about the software. Applied statistics should use the software students will most likely encounter in their work (e.g., SPSS in India).
I prefer to use SPSS but have difficulties due to its English-only version. In our region the first foreign language taught in schools is German, and my students usually struggle with SPSS to understand it. We do not have translated handbooks, unfortunately.
I use SPSS to teach students. I have a textbook (Statistical Analysis using SPSS) of two parts.
I was taught little statistics at university but now find statistics very useful. Variability and uncertainty are a very fashionable (and also useful) research topic in various fields. I am working on stochastic model updating and uncertainty propagation. I find the above discussions very helpful. Thanks to everyone. All voted up.
I do not teach the probability and statistics course, which is mandatory for all students.
I agree with Michael about necessity of teaching concepts. I have found that most students do not have a good grasp of materials covered in the course.
The statistics packages SPSS and Minitab are popular in our university. There is also the Data Analysis ToolPak, an add-in for Excel. It does most of the analysis required for engineering applications.
I have worked on data analysis and structural reliability. I prefer programming in MATLAB and use the MATLAB statistics toolbox in my research.
I think both are important. If you never learn the basics, what if there is a power outage??? Just kidding. SPSS, Excel, and NVivo are priceless for statisticians and students alike. It can take weeks to learn the software and, quite frankly, most of the time I think my students still hire someone else. Once you do learn the software it is an amazing feeling. It is much more amazing if you knew how to do it by hand; pencil calculations first. Remember, you must be able to explain your findings, and the data entries that led to your findings. Herein lies the problem... teachers give students assignments which require the software and then have no instruction or step-by-step classes specifically for software training. It is just ridiculous. It took me a very long time to get both SPSS and NVivo down. I can't imagine how long it would take students who did not first learn what the formulas mean, such as the Pearson product-moment correlation coefficient, Spearman's rho, Cronbach's alpha, associative hypothesis testing, nodes, and mapping. Stats is a never-ending learning process. Bias, new technology, skewing... great discussion. See: http://www.youtube.com/watch?v=7bLZ7fqSEEc
--:0) Marilyn
I had to teach statistics using SPSS even before grad school, simply because that was the way the department determined statistics would be taught. However, I've since tutored and taught numerous students at various levels with far more freedom to choose how to introduce material, what subjects to cover and for how long, what tools to use, etc. The problem with using statistical software like SAS (or even worse, SPSS) to teach statistics to researchers who'll likely take a single undergraduate statistics class and 1 or 2 graduate-level multivariate stats courses is that you end up teaching them how to use the program to run analyses/tests they don't really understand. Students in many sciences (social, cognitive, medical, behavioral, etc.) don't typically have a solid background in mathematics. It's hard enough to ensure the underlying logic of basic statistics is understood. Introducing SPSS and similar "button-pushing" statistics into statistics courses is just asking for students who aren't typically interested in quantitative reasoning/mathematics to begin with to decide they don't need to learn more than how to plug data into some statistical software package that will (they believe) tell them whether or not their results are "significant". Garbage in, garbage out.
Using R or MATLAB to teach upper-level statistics courses can be useful. First, one can study statistics for years and not be capable of doing much research given the computational demands of many a statistical method/model/test. Second, even if one could use a calculator, pencil, and paper there's no good reason to and plenty of good reasons not to. In short, students should, at some point, be introduced to statistical software packages. However, R and similar packages/environments require that the student actually deal with the logic of the methods used and practice formal reasoning because they have to write a certain amount of code rather than just enter data, select from some option menu, and press a few buttons. While this is often too much to ask for intro stats students, for upper level students it not only introduces them to the tools researchers use but does so in a way that forces them to really deal with the statistical methods they use.
Thanks to all the professors for the answers and opinions. I agree with @Michael, @Behrouz, Dr. Marilyn, and @Andrew.
I usually divide the basic statistics course: until mid-semester the students learn concepts and manual calculation, and in the second half they do the same analyses using SPSS.
In my faculty we have only one statistics subject.
Does the software tell you which sample size you need or which factors to load in statistical hypothesis testing? Failing at the basics does not lead to reasonable results, whatever software you use.
For a broad overview on free mathematical tools, see Tarancon's thread
https://www.researchgate.net/post/Free_math_software_and_tools
We must use both manual methods and software. There are many packages available in the open domain as well. In industry we may not have time for manual methods, so we go for software-based systems.
My answer is quite in line with Andrew's and Michael's answers, but I'd like to go beyond them. The main question is: what is the aim of the course?
Most courses seem to want to enable students to perform some statistical analyses. They have data, they should associate it with the name of an analysis procedure, and finally they should use some tool to calculate the numbers/results the procedure can provide.
Well, if this is the aim, then any reasonably simple-to-use and easy-to-learn software will do. At a very basic level, Excel can do a good job, but Prism, Origin, or SPSS will eventually provide easier access to most analyses and produce nicer graphics (with less effort, at least).
However, in my opinion this is not a good aim, at least not for university courses, where scientists (and not "statistical technicians") should be educated. A statistics course for university students should teach "statistical thinking". This is not something that is aided by the typical software. Most of it consists of philosophical concepts (around "information", "knowledge", "learning")*, where computers are not required at all. But when it comes to getting a feeling for real data, one will need software that is flexible, so that the students can really play with the data, develop their own concepts for visualization and analysis, and experiment with them. Excel can be useful here (at least much more useful than Prism, Origin, Statistica, etc.). At the cost of learning a programming language, R would be the best alternative I can imagine.
A course should not be reduced to essentially teaching the application of a piece of software. A course should teach concepts. Some concepts may be visualized and learned with the aid of some software, but this software is typically not a standard package for performing statistical analyses. It must be software that is programmable; otherwise the students will not be able to "play around" with it.
For instance, the "standard statistical software packages" only rarely report the likelihood of the data under a hypothesis. This is so much "kept under the hood" that students will never understand it (and why teach it when the software does not report it?). Most of them do not (easily) provide the residuals. How should students then understand their meaning and how they represent information that is not (yet) used by the model?
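To illustrate what surfacing these quantities for students could look like, here is a small R sketch (the data are simulated, purely for illustration); R does expose both the log-likelihood and the residuals of a fitted model:

set.seed(1)
x <- rnorm(20)
y <- 2 * x + rnorm(20)          # simulated data, for illustration only

fit <- lm(y ~ x)                # a simple linear model
logLik(fit)                     # log-likelihood of the data under the fitted model
head(residuals(fit))            # residuals: information not (yet) captured by the model

# log-likelihood of the same data under a simple fixed hypothesis (mean 0, sd = sd(y)):
sum(dnorm(y, mean = 0, sd = sd(y), log = TRUE))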
Standard software often provides simple ways to produce nice plots of the data. This can be helpful, but it also disconnects the presentation from the data: students will often not understand what the plot is showing. And the offer of pre-defined diagram types blocks the students' creativity. Finding a way to present the data optimally supports understanding the data and the "story behind the data".
I know that the aims I consider relevant cannot be reached with standard two-week courses or one or two semesters of two hours per week. It requires a much higher investment - and a much higher commitment of the university to "statistical thinking" as a fundamental and vitally important part of any empirical science.
My 2 pence.
*To give an example: probability is often introduced only following the frequentist interpretation. It is not even mentioned that this interpretation has serious flaws and drawbacks, and it is completely missed to demonstrate that the interpretation is not as clear as thought and that there are several competing alternatives. You could easily take two whole weeks only to introduce the concept(s!) of probability. A typical course does it in the first 10 minutes of the first hour (omitting the more interesting 99.99% of the topic)... No wonder that students eventually learn to believe that the p-value of a hypothesis test seems to be the Holy Grail of all statistical analysis (and, as shown in several studies, most of them never understood its meaning! - see link).
http://www.dgps.de/fachgruppen/methoden/mpr-online/issue16/art1/haller.pdf
Jochen, thanks for this post (and the link) underpinning the idea that software is only a means, not the solution, for teaching statistics. Anyway, I think there are more pitfalls in statistics teaching hidden behind the easy input-SPSS-output way of working, with conclusions drawn about measurements of any kind - not only significance, as highlighted in the article linked above.
A basic understanding of any statistics is very important, and we can gain that understanding thoroughly with hand calculation. Once we are familiar with all the terminology and its interpretation, then I think we can use any statistical software package like SPSS, NCSS, etc. I mostly prefer SPSS because a lot of video tutorials are available on YouTube.
@Dr. Wilhelm:
Perfect.
@ Thaneswer Patel:
I largely agree. However, not all CAS, modelling software packages, and statistical packages are equal. SPSS is not only inherently limited by the fact that, as proprietary software, it lacks both the widespread community support and vast range of freely available scripts that e.g., R has, but is also designed to perform built-in analyses/models/tests as easily as possible. This sounds great, but it comes at too great a price: the easier it is to run a vast range of statistical analyses, the harder it is to easily customize any of these to one's needs or build others from scratch. There's a reason that SPSS looks a lot like Excel. It's meant to be intuitive and to require little to no background either in statistics or programming in order to use. If all one is doing is performing ritualistic t-tests, ANOVA, MANOVA, and the other half-dozen standard null hypothesis tests, then SPSS is perfect. However, as these are almost always used poorly, SPSS simply facilitates the "garbage in, garbage out" approach to mathematical modelling and statistical analysis. Free R GUI/environments include an Excel package RExcel, R Commander, RStudio, and several others. R itself is of course free, community support ranges from youtube videos to specialized packages for specific fields (not to mention loads of free tutorials), and R is vastly more powerful than SPSS. It is as much a scripting programming language as it is a statistical package.
It's great to have a research tool that has freely available tutorials on sites like youtube. It's even better, in my opinion, to have that and to have a wide range of free environments based on a free package like R that have more support and more freely available, easily customized programs as well as the ability to make your own. The problem is that whether we are talking about free or proprietary languages/packages like R, MATLAB, or S-PLUS, there is a learning curve. I love MATLAB's GUI and command-line options and (luckily) I have it for free. I have yet to find a comparable R environment, but that's largely because R is free. However, all such solutions require learning at least SOME programming. Thus initially they are much more challenging to use than SPSS or SAS. Research, however, isn't about the short term. SPSS is designed to make it easy to use statistical methods without knowing statistics. Once one DOES understand the mathematical, philosophical, and computational nuances of statistical methods/measures/models, SPSS generally just makes life harder. Although it is possible, I've yet to work with someone who knows how to write scripts using SPSS. Thus SPSS is perfect for the "reject the null, accept the alternative" ritual and the corresponding p-value tests, but it's much harder to use for formally and empirically defensible methods.
This isn't to say it's a waste by any means. Again, I'm lucky enough to have it for free, so for .xls or .csv input for formatting datasets and other data preparation (including some simple exploratory analyses) I do use SPSS. But after such initial stages, it's just harder to use.
The basics should be taught in class. As an example: what is correlation, how is it calculated, and why does the parameter matter? Of course, for real applications I will use the software, but it is not good to treat it as just a switch for "correlation". People should know the basic assumptions.
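For illustration, a small R sketch of exactly this point: Pearson's correlation computed from its defining formula and then checked against the built-in function (the data below are simulated):

set.seed(42)
x <- rnorm(30)
y <- 0.6 * x + rnorm(30)        # simulated data

# Pearson's r "by hand", straight from the definition
r_by_hand <- sum((x - mean(x)) * (y - mean(y))) /
             sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)          # the software "switch"
all.equal(r_by_hand, r_builtin) # TRUE: same number, but only the first version shows why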
Software usage is welcome, because when undergraduates carry out a project report or dissertation in their final year, they can use software such as SPSS, Minitab, EViews, etc.
After graduation, say they are in a firm where statistical packages are used for analytical work; they can then easily use the software.
Nowadays technology is becoming popular, so when teaching software packages students can get something more out of the subject matter.
Software gives output instantly, and when they get output instantly, students feel good.
At the same time, students should be trained to practice manual calculations; otherwise they do not know how to calculate and derive an answer.
Teachers or lecturers should use both manual calculations and software.
There are many aspects to this debate; students need understanding of statistical concepts and calculations; they also need experience in using software; I have noticed that many firms advertising jobs for statistical people mention R in the advertisement; on the other hand, Excel is on every desktop computer in the world and students ought to be able to use it well; I work in the health care industry; Microsoft Access is very important in my work; perhaps universities ought to offer a specific subject devoted to statistical computing where students, who know some statistics, get experience with Excel, R, SPSS, Access - and writing code; finally, I expect that there is a point at which each of us uses a package without really understanding what the package is doing; it's complicated.
I wonder whether college/university students shouldn't be able to learn to use software like Excel, Prism, Origin, Statistica, etc. on their own, without attending special university courses. I think this is a waste of time. One surely will require some software at some point to teach some principles (so that students can experience how things can affect the results!), but it should not be the (or a) key issue in a stats course. Learning a programming language like R is more difficult and requires some investment, but it will be rewarding. It trains structured thinking. Dealing with variable types pushes the students' noses into the topic of information content and the available options for data manipulation. They must think much more along with their data, and with this they also learn to use the tool. This should not be an impossible task for a young person with the highest education studying a natural science. The harder part is to thoroughly understand the philosophical concepts, the meanings, the deep connections. The difficult part in statistics is not to perform a t-test or a PCA and produce a biplot (a 10-year-old child could be taught to do this) but to understand the data, and to be creative enough to find good ways to let the data tell *its* story in a way we will understand (not the other way around [telling the data what it has to present]!). In my opinion, teaching the application of (quasi-)industrial tools should not be (the main) part of a university education that aims to grow scientists, not "scientific technicians". Maybe this is wanting too much, maybe most of it is only wishful thinking, too far from reality and its constraints - but nevertheless we have the duty to claim such aims and try to move a little (or as much as possible) in their direction.
Dear Jochen, I totally agree with you on all the points mentioned in your posts here. However, given the structure of our curricula (with the BA/MA split and with ever increasing content and ever decreasing time to teach and train it), it appears to be more feasible, practically speaking, to train what you call "statistical technicians" first (during the BA) and make students aware of the more far-reaching and exciting philosophical aspects of statistics when they enter the MA or even PhD programs. I am curious to hear what you think.
Dear Bettina,
The problem is that universities - at least in Germany - seem to be more and more becoming a substitute for vocational schools, and this development is intentional and politically driven. The number of students is ever increasing, fitting this concept. This is a political problem (at least I see it as a problem): we are "selling" the universities [at a loss] instead of building vocational schools to prepare future employees for jobs with a higher demand for expertise and skills in theoretical topics. But this is now far too off-topic for this thread.
Your idea to first train "application skills" in the BA and then add a substantive theoretical and philosophical background later (MA/PhD) does not seem like a good idea to me. I think this is often done already, and the result is clearly visible. The typical course of action is to train the use of some tools on simpler problems first (BA) and later on some more advanced problems (MA). Typical PhD courses are often mere revision courses for those who did not learn earlier or have forgotten most of the material when they really need it for their own work (which sums up to about 99% of the students). We have to admit that the present system largely fails to impart the required knowledge and skills. I do not see how we can change this effectively by only making minor adjustments here or there.
The problem clearly emerges nowadays because more and more data is generated, and the questions and experiments become more complicated. But we still provide 100-year-old cookbook recipes for standardized problems to students who are never taught the context of these solutions, their pros and cons, their pitfalls, the alternatives, and all that. This inevitably leads to scientists who will not research/examine the information content of their data but instead look for some well-defined feature and push the appropriate button to report a number (as a monkey could do).
The lack of teaching the "exciting philosophical aspects" is, in my opinion, one of the reasons why students are often uninterested in statistics. They often see it as a necessary evil or even as an albatross around the neck. They don't take away the real benefit and only prepare to pass the exam. If they first learned that statistical thinking is the basis of all science, that science is about accumulating knowledge, they would be astonished how difficult it is to define "knowledge", and how this relates to information, which in turn is obtained from data... and here we are.
A final remark: there are exceptions. There are good courses and even better students who understand things - despite the bad educational system. Just to mention that not everything is bad!
Yes, I do, but only the available ones: Simple Statistics for Beginners and Excel (including the Analysis ToolPak). The purpose is to lessen the time for computing and provide more time for discussing the meaning of the computed values.
Perhaps it is me nitpicking, but given the nuances of the discipline in question I have to ask: isn't it more important to provide more time for discussing the meaning of the computations? There is no such thing as "simple statistics." The most frequent measure of correlation, variously called Pearson's r, the product-moment coefficient, the correlation coefficient, and so on, is computationally about as easy as statistics gets. It is routinely misused, misunderstood, misinterpreted, and the reason for its development and indeed its name (or some of them) forgotten by most. Often, texts designed to explain multivariate statistics to the non-statistician but practicing researcher (e.g., Understanding Regression Analysis) skip quickly past covariance (the computation underlying Pearson's r), provide limited qualitative interpretations and concentrate on quantitative ones, provide no interpretations, or are simply misleading. The notion of correlation itself is deceptively simple: it is not simply the exemplar of statistical fallacies (i.e., "correlation does not equal causation") but is both essential to a model of causation (chance-raising, among others) and central to the logic of conditional inference and Bayesian methods. To illustrate, the fallacy taught in basic, intro stats that correlation isn't causation misses a certain subtlety. It is undoubtedly true that (given all ravens are black) the conditional "if x is a raven, x is black" is not equivalent to "if x is black, then x is a raven". However, if one learns that x is black, one should ideally believe that one has more justification than previously to believe that x is a raven. Similarly, it is absolutely true that smoking doesn't cause cancer. Tobacco lobbies have made a business of exploiting the fallacy equating correlation and causation. However, because smoking is highly correlated with cancer and correlation IMPLIES causation (classically, either A causes B, B causes A, or C causes both), there is a causal connection that, in this case, makes smoking very likely to result in cancer.
It's very easy to "lessen the time for computing": make the computations easy. In a linear algebra course, most of the computations are simple addition and subtraction, yet students with upper-level calculus courses under their belts fail to pass. Why? Because they are not used to the conceptual challenges that upper-level mathematics involves. It would be easy to solve many of the problems in a typical multivariate mathematics or linear algebra course using MATLAB or even the free, online version of "Mathematica" one can find on wolframalpha.com. However, what's the point of making students capable of doing poorly, slowly, and often inaccurately what computers/calculators were designed to do and do far better than any human can?
Most students up to and including university level are taught mathematics as if they were calculators. They are taught how to take input and apply rules to give output. This is not a foundation for sound research. Teaching the meaning of computed values is frequently limited in benefit to the context of the value. Teaching the logic, meaning(s), and nuances of statistical computations themselves ("simple" or otherwise) is key. Just about everybody can do arithmetic. It took the greatest mathematical minds to attempt to axiomatize it and perhaps the greatest logician ever to live to prove this isn't possible (using as inspiration a paradox a few thousand years old). We have computers to do computations. We need researchers to understand what these mean, when they should or shouldn't be used, and (as you say) "the meaning of the computed values". These are all related questions, and none are particularly related to the use of any statistical software. We don't teach students how to add, subtract, multiply, and divide by giving them the latest TI calculator. We don't teach algebra this way either. Teaching statistics by teaching how to run computations is teaching how to compute, not teaching statistics. Teaching students how to use a particular software package is often teaching them even less. As I said before, there is a time and a place (indeed, a NEED) to teach how to use statistical software. This is because much of modern statistical analysis requires such tools. But teaching to use these tools isn't the same as teaching what, how, when, and why to use them.
Great post, Andrew!
This is really hitting the mark:
Teaching statistics by teaching how to run computations is teaching how to compute, not teaching statistics. Teaching students how to use a particular software package is often teaching them even less.
But I found something to nitpick in your comment :)
However, if one learns that x is black, one should ideally believe that one has more justification than previously to believe that x is a raven.
That's not correct per se, and it misses a very important point about understanding statistics. It is surely correct if we consider that ravens are a minor part of the object classes we can observe, and that a relatively small fraction of these object classes share this attribute. But imagine we were talking about findings in the flat of a gothic or emo, where almost everything is black. There the data "x is black" would not change the justification of our belief that "x is a raven".
The important point is that the interpretation of the data intimately depends on the whole context. One can never simply ignore it and still derive sensible conclusions. And this consideration of the context, which is often complex and not well defined, is the essential and hardest part of doing statistics, and it is just that part that is usually not taught at all in statistics classes.
I'll give one example: much is taught (at least in medicine, and I suppose also in other life sciences) about t-tests (skipping the part that it is often taught as a [logically flawed] NHST), and students learn that the p-value (whatever it means!) is only "correct" under some assumptions about the data. It is usually not even mentioned that the test also assumes that the whole model is correct (not only the null hypothesis). But here it should become obvious that no model is ever really correct, and we must therefore abandon the entire "correct vs. not correct" discussion and turn to a discussion about the appropriateness and the value of the analysis. This requires thinking critically about the whole context (also, but not exclusively, touching on questions like "is the mean a sensible measure?", "what can influence the response?", "what is a sensible measure of an effect, and how can one recognize a relevant effect?"). And this is statistics. Calculating a p-value (if it is eventually required at all!) can be done by some calculator/software (who cares!?).
I just recall that we had a similar problem many, many years back when I was in school. In math class the question was raised whether we should use pocket calculators. Many classmates were very happy; they thought that life would become easier when all these stupid calculations could be done quickly with these electronic helpers. I was scared (well, and a little excited, too), because I envisioned that if the burden of calculation is taken away, we will get problems where thinking becomes more important. And thinking is much harder than calculating.
There the data "x is black" would not change the justification of our belief that "x is a raven".
Indeed. However, an important point I left out relates to the portion of my statement concerning belief. If one knows nothing whatsoever about x entity/object/etc., one has little reason to believe x is a raven vs. a computer vs. a contributor to a social media research site who idiotically rarely contributes anything except when sleep-deprived, bored, and not infrequently under the influence of alcohol. However, if one knows that ravens are black, and learns that x is also black, then the number of classes of entities x could belong to is DRASTICALLY reduced. Of course, this is trivial as you point out. What x could be is still a set with a cardinality so great as to render useless the new information that x is black.
However, this seemingly pointless, trivial example is perhaps not so trivial for two reasons. One is the ways in which the cognitive sciences have shown that the logical fallacy of equating correlation to causation has a logical foundation. Another is the formal justification used in machine learning, computational intelligence, and other A.I.-type fields: given any probability or sample space in which a partition of H includes overlapping regions A & B, and given the knowledge that B occurred or is true, we know that only the region of overlap between B & A can have occurred or be true. With my example, the overlap is so large that it tells us nothing, as you correctly point out. This is not, in general, true when it comes to research questions as such inquiries are far more finely tuned.
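If it helps, here is a tiny R sketch of that update with made-up numbers (every probability below is hypothetical, chosen only to contrast the two contexts Jochen and I described):

# P(raven | black) via Bayes' theorem; all numbers are invented for illustration
posterior_raven <- function(p_raven, p_black_given_raven, p_black_given_other) {
  p_black <- p_black_given_raven * p_raven + p_black_given_other * (1 - p_raven)
  p_black_given_raven * p_raven / p_black
}

# Outdoors: ravens are rare, and most other things are not black -> the datum helps a lot
posterior_raven(p_raven = 0.01, p_black_given_raven = 1.0, p_black_given_other = 0.05)

# In the "gothic flat": almost everything is black -> the same datum barely helps
posterior_raven(p_raven = 0.01, p_black_given_raven = 1.0, p_black_given_other = 0.95)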
In the end, what's important isn't the technically correct logic I sought to describe concerning probability spaces or the clearly fallacious equating of causation with correlation but this: "The important point is that the interpretation of the data intimately depends on the whole context." [Although I might replace "intimately" with "intimately and ultimately"]
SPSS is professional statistics software, whereas Excel is general data-collection software with statistical facilities.
GeoGebra is useful for the most elementary level of teaching statistics. It is easy to use, well documented, and can produce high quality figures if needed. For a more advanced level I use the program R.
In Elementary Statistics we use Microsoft Excel, as its use is widespread. For more advanced courses (multivariate statistics, sampling and so on) we use SPSS, as it is easy to use.
When teaching a basic statistics course, I use Minitab -- but only after explaining the concepts behind the computations (as many other posts have already correctly stressed). When teaching Analysis of Variance and Design of Experiments, I use Design Expert. When teaching discrete-event process simulation, I use Stat::Fit or Best::Fit for distribution-fitting.
I think the answer depends on who you are teaching and how long you have. I have taught statistics to undergraduates in allied health sciences who have only one stats class. For them, spending large amounts of time learning mathematical calculations for correlation, standard deviation, etc. only reinforces existing maths-phobia and doesn't achieve the main goal of the class (to understand the statistical principles underlying research and therefore to be able to interpret and critique the evidentiary basis for professional practice).
Learning the principles of the hypothetico-deductive paradigm, and analysing data collected on themselves gives them a deeper understanding of the process and an appreciation for what computers allow us to do.
I have also taught PhD level students in advanced applied statistics, which was one of many classes they took in stats and research methods. In that context, learning at least the mathematical principles via some hand calculations is important, but again, software applications help to reinforce and supplement the mathematical principles.
How many of us know how to operate a car safely without knowing how to tune a carburettor or fix a flat tyre? Being able to do the maths underlying stats is not essential to correctly use statistical software.
In my opinion, it depends on the field in which we want to use statistics. In clinical and health studies, SPSS is more adequate. In the optimization field, which needs DOE, JMP is more appreciated. In chemometrics, the Eigenvector PLS_Toolbox is the best.
Using only statistical software without prior knowledge of statistics may be a big mistake; it is necessary to know how the software works behind the scenes.
I teach graduate students in an information systems program. Due to the nature of their discipline, I always use a software package; primarily SPSS. I realize that you can do many of the same things in Excel but, even though they are both spreadsheet based, I find SPSS more intuitive. I have taught in a school of education and also used SPSS in their basic research classes. In both instances, for the non-scientist, I think the goal is to help students become both producers and consumers of research; for the most part, the latter. I leave computations to those in the more exact sciences.
Having said all of that, I made the same argument to my editor when first writing my stats text. I wanted to use only output from software programs, etc, to demonstrate hypothesis testing, etc. She disagreed and insisted that I include the formulas. I do briefly discuss them in my lectures, but all assignments are software based.
I teach graduate students in statistics. I use SAS and Excel. Using Excel is similar to working manually, as I did when I was a student in school, to understand the basic theory. Using SAS can help them develop the ability to do research that needs data-analysis skills to handle complex data sets. It is very important for them when getting a job after they graduate from school.
As a student, I appreciate the use of statistical software in the classroom; it is what is used in the field. However, while the use of these packages allows us to get at the heart of a specific analysis without having to wade through the mathematical mire of endless basic math to determine why it all works, at some point there needs to be a discussion of what the analysis means: how it works, what it determines, and why one analysis is better than another in certain cases. This provides students with the depth they need to fully be able to report findings.
I use Minitab, Excel, MATLAB, and a TI-83 calculator, in two ways depending upon the nature of the course (basic, mathematics-based, or graduate). One way is to use any of them after complete coverage of the subject; the other is reverse teaching, that is, first go to Minitab, for instance, enter a small random sample, obtain a list of results, then explain each term.
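If it is useful to anyone, here is roughly what that "reverse teaching" starting point could look like in R (Minitab works the same way in spirit; the small sample below is simulated):

set.seed(123)
scores <- round(rnorm(12, mean = 50, sd = 8))  # a small random sample

summary(scores)   # min, quartiles, median, mean, max: each term is then explained
sd(scores)        # standard deviation
var(scores)       # variance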
I do not teach statistics but we use SPSS. It is quite easy for students
Yes, SPSS is useful, but the Statistica software is easier for students, in my opinion.
We teach multivariate analysis quite a bit and use Solo and PLS_Toolbox+MATLAB.
Hi Neal, could you give me an outline of how you use MATLAB for multivariate analysis?
The Solo & PLS_Toolbox+MATLAB software packages have a large number of analysis and visualization tools. For example, we teach factor based analysis such as PCA starting with smaller data sets and then get into more complex sets to show the usefulness of the simple PCA approach. Once PCA is thoroughly introduced, we expand into other concepts such as regression analysis and multivariate/hyperspectral image analysis. We also teach multi-way analysis. I'd be happy to provide more information (maybe offline [email protected] ?).
I prefer teaching Stats by using R. I always deprecate the use of excel, at least in statistics classes.
I use Multivariate Statistical Program (MVSP) in my research as well as teaching.
As a literary person, I am allergic to maths and stats. But they are essential, even in psychology, especially if you do a little research and a thesis. SPSS was really the ideal software for this. With it you can learn and apply "recipes" without having to understand the underlying statistics. So much so that, while being useless at stats, I was able to write a very simple stats manual, which shows how to apply "recipes" with SPSS on research data (6,481 downloads from academia.edu as of today!). Learning Excel in practice is also necessary, to prepare the data for SPSS.
https://www.academia.edu/4360729/Stats_pour_les_nuls
Thanks very much Neal for the information. I appreciate your time. I will contact you off line, if I needed to take more of your time.
COMMENT 1
When I took statistics (for Psychology) as an undergraduate, there was a mainframe version of SPSS (and probably SAS and some other things) available on campus, but there were no computer labs with PC versions of those packages. So we did all of our course-related calculations via hand calculator. As a result, we were taught various computational formulae that yielded correct answers with no rounding error. But unfortunately, they did not provide great conceptual insight into the thing being calculated.
Here's an example. When we were first taught about the variance, we were shown the conceptual formula:
Sigma^2 = SUM((X-Xbar)^2) / (n-1)
But then, almost immediately, we were told not to use that formula when doing our calculations, because rounding error would creep in, throwing off the final result. Instead, we were told to use this computational formula:
Sigma^2 = (SUM(X^2) - SUM(X)^2/n) / (n-1)
With that formula in hand, we proceeded to do countless problems that entailed calculation of the variance.
So, when the semester ended, and someone asked for the formula for the variance, which one do you suppose we gave? As you can probably guess, we gave the computational formula that we had so much practice with. But unfortunately, it does not give any real conceptual insight into what the variance is.
Now that computers and software are readily available to students in stats classes, I think we can do much better. E.g., we could enter some raw data in a column in Excel or SPSS (or some other stats package), and do the following:
1. Compute the mean and copy it beside the raw scores,
2. Compute the deviations from the mean, sum them, and see that the sum is always zero (assuming there has been no rounding),
3. Compute the squares of the deviations from the mean and their sum
4. Divide the sum of squared deviations about the mean by n-1.
Obviously, one would not use this approach when carrying out real data analysis (nor should one typically use Excel for serious statistical analysis); but I think it is a good way to have students do computations (nearly) "by hand", using the conceptual formula rather than the computational formula that gives no conceptual insight.
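Here is a minimal sketch of those four steps in R, purely for illustration (Excel or SPSS would serve just as well for the classroom demonstration Bruce describes):

x   <- c(4, 7, 6, 5, 9, 3)     # some raw scores
dev <- x - mean(x)             # steps 1-2: deviations from the mean
sum(dev)                       # essentially zero (up to rounding)
ss  <- sum(dev^2)              # step 3: sum of squared deviations
ss / (length(x) - 1)           # step 4: divide by n - 1
var(x)                         # the built-in function gives the same value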
COMMENT 2
In a Psych stats class I teach, we do have labs in which students use SPSS. In those labs, we spend a lot of time on basic data management tasks (e.g., recoding, computing new variables, merging, aggregating, etc), because I think these are often neglected in stats classes. I.e., students are often handed pristine datasets that are ready to be analyzed. And then when they get out into the real world of data analysis, they are shocked to discover that real data files are usually quite messy, and that the majority of one's time is spent on data cleaning & data management. So, no matter which stats package one uses in teaching, I think it is very important to teach some basic data management.
HTH.
The software environment R (http://www.r-project.org/) is increasingly the choice for teaching statistics where the participants can withstand some coding. It is particularly useful for getting a handle on statistical ideas due to its interactive nature. The fact that you actually have to think a bit before interacting also enhances the learning process. There are now numerous introductory books on using R for just about any field of research you can imagine. Oh, and I forgot to mention that it's free...
"Do you use any software to teach statistics?
There are arguments that we should teach statistics not only using manual calculation but also using software such as SPSS, Minitab, Stata, etc. Most teachers or lecturers prefer to use manual calculation. What is your expert opinion on this?"
Of course, software such as SPSS, Minitab, Stata, etc. allows you to quickly perform statistical analysis of the results.
But in some cases you can use manual calculation. Does it matter?
It's not so important!
What is essential is obtaining new data, discovering new phenomena... conclusions!
Statistics (mathematics) is only a tool, whether software or manual calculation.
I agree with both of Bruce's comments. Using Excel to do "manual" calculations of standard deviation, etc. can aid understanding without tedious truly manual calculations, and enables "manual" calculations with large data sets. Truly manual calculations are restricted to smaller data sets because of the time it takes to calculate, for example, 100 deviation scores by hand.
I also agree with the importance of spending time on data management. Too often, we see no reporting in articles of data screening and cleaning for outliers and errors, checking of assumptions, etc.
Using SAS and other programs that require syntax can also aid understanding because it requires more careful deliberation, whereas the GUI interface of SPSS can lead to rushed, careless analysis. I teach students to run analyses both ways in SPSS (syntax and point-and-click).
Excel is good for many basic statistical operations and for working with datasets. Going up the stack SPSS or R are good choices. SPSS is pretty much standard statistics tool in the social science but if you are teaching in STEM then R is a really great tool (particularly for Computer Science students though Python is gaining some traction in this area too).
Yes! Professionals don't do statistics by hand or Excel. We use products like SAS, STATA, and SPSS. Teaching students how to merge and prepare datasets, run analyses, and interpret results should be based on a popular program that they can use after graduating. Knowing when to do particular analyses, how to setup research studies with the analyses in mind, and what the different statistics mean are non-software parts of any good stats course.
Definitely! That's what the students will use in real life. Unless we want them to spend months (or years) to calculate a statistic by hand. Also, for some processes, it's virtually impossible to do without statistical software. (Examples: large data simulation with huge number of replications). However, some manual calculations are needed for helping student understand the statistics.
Absolutely! I can only imagine the furore if we told students that they had to learn manually and without software!
Many institutions in India recommend the SPSS package for statistical applications, but we don't teach the students how to use it. The curriculum gives an introduction to software applications in statistics, which provides general awareness.
In industry it is mostly "measures of central tendency" through MS Excel spreadsheets (other free alternatives - OpenOffice Calc, Kingsoft, etc. - are not as compatible as Excel).
Teaching how to use statistical software in a course about statistics is like teaching how to use a book in a course about literature, or how to use a car in a course about logistics. Although knowing how to use the software (or a book, or a car) is essential to actually practice the subject, it is not the core of the subject. It is actually not even related to the subject with respect to its relevant content. I do see that there is a need to introduce some software in order to do something reasonable with real data during the course. But I feel that many courses seem to literally focus on this only, which I think is a shame. It educates students as mere tool users, lacking any sort of statistical thinking (and finally leading to the many, many really bad publications we can find out there).
To clarify the point: often students are taught the formula for the variance. You can practice its application with paper and pencil, with a pocket calculator, or with any spreadsheet or statistical software. However, usually they do not learn the meaning of variance, how it is related to a particular model of the uncertainty (lack of knowledge) about the variable and to particular aspects of our understanding of the variable, related to its information content and so on. Nothing like that, which I'd consider the statistics part of the whole story about "variance". The take-home message for the students often is: "Variance is a measure of the dispersion of the data. The Excel formula is =VAR(range)". I wonder what this has to do with statistics. - You can imagine more sensible and complex examples. Take a t-test. Students actually "learn": "mean values of two groups are compared with the t-test (when the values are normally distributed). A p
I totally agree with Jochen Wilhelm: the important thing to learn about statistics has to do with the methodology and the way to work through the problem you face, not the software tools you use to solve it. Anyway, the use of software is important to let students know how it works in the real world.
So let's use software as a tool to learn statistics, whatever software it is.
I have been teaching probability and statistics at a postdoc school of medical physics and I have NEVER mentioned software for performing statistical analysis. I have just said that such software exists. Those students were supposed mostly to understand and devise tests for medical analyses that are normally provided with the instruments. In that case the problem was estimating and UNDERSTANDING levels of confidence, efficiency, and power of (mostly) tests and radiological analyses. The goals were various: provide evidence of sophistication, of intentional or unintentional pollution, estimate the probability of illness, plan the timing of radiological exposure campaigns, etc. The students were upset at first, but afterwards I was thanked by some of them, since there will always be cases (and there were indeed) for which the software does not exist or your software may not contain that particular routine. If one does not need to do research, but just to repeat the same simple analyses, a package may help, but inventing new tests is much more exciting :)
Since I am senior, in my personal research I have always written my own code in Fortran (or gcc, g95 and derivatives); it seems to me that R is very flexible and can be employed within any environment. Mathematica, Maxima, Scilab, MATLAB, SAS, ... also provide useful routines, but teaching how to use them seems quite offensive to students, in my opinion. A Chinese proverb says: give a man a fish and you will feed him for a day; teach a man how to fish and you will feed him forever. The same should hold for students.
At the University of Zagreb, School of Medicine, the Statistica software is used to teach statistics.
I use the Minitab, MedCalc, and Excel software for teaching statistics.
We use SAS in the graduate courses for our majors, and we also use SPSS in some classes for non-majors as well as for our majors. For the basic undergraduate statistics course, we use the HAWKES Learning System to deliver the course material, but we do not use any software packages in it. I make students use calculators in all exams and in most homework assignments so that some brain activity is used to better understand the results. I add assignments that are based on SAS.
I prefer manual calculations while learning and studying statistics. After the fundamentals are fully learned, the software may be learned and used in the analysis and interpretation of research.
There are quite a lot of software packages for statistics: Excel, RStudio, MATLAB, etc. However, there is a major problem with them: if one learns statistics only through software (a program), one does not learn what each parameter means. So, at the first stage everyone should do the calculations manually, just to understand what is what (SD, SE, mean, etc.). Only afterwards should one learn software in order to simplify calculation. But the first step, manual calculation, is obligatory for a scientist to understand the concept/sense of statistics.
I strongly disbelieve that performing calculations manually, or just knowing how to write down the formulas, will enhance the understanding of the meaning. Knowing the formulas and doing some manual calculations surely enhances the understanding of the effects/influences of some parameters. For instance, having to divide the SD by sqrt(n) to get the SE makes it obvious that the SE will become smaller with increasing sample size, and some practice may help to really get this into the brain. But this kind of understanding is distinctly different from understanding the meaning of the SE.
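As a trivial illustration of that mechanical point (simulated data, nothing more than the formula in action):

set.seed(7)
for (n in c(10, 100, 1000)) {
  x <- rnorm(n, mean = 0, sd = 1)
  cat("n =", n, "  SE of the mean =", sd(x) / sqrt(n), "\n")
}
# The SE shrinks roughly by a factor of sqrt(10) at each step; what the SE *means*
# is a different question entirely.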
It starts as simply as with the mean (average): the formula is well known to everyone and it is no problem to calculate a mean. However, I do not know any course (or book) where the meaning of this statistic is properly explained. Sure, there is a "convenience solution"; people have used it for many hundreds of years for different purposes and so we got used to it. But in statistics, especially in the analysis of measurements (which many of us do), it has a particularly different and far deeper meaning, and to understand this, one actually should have a look at the struggle in the 18th/19th centuries to deal with measurements and at the thoughts of Laplace and Gauß. It might even be required to discuss Bayes (because Gauß' interpretation was Bayesian [as we would call it today]) and large-sample theory. I honestly do not see how anyone would be able to understand the meaning of the mean without going through all these developments, thoughts, theories, and insights. The short-cut of learning the formula and its application to real data is surely required, but it is neither challenging nor helpful for understanding.
And again it remains the question for the aim of the education: do we want to educate scientists understanding the concepts of analysis or do we want to educate technicians that know which button to click in order to get numbers that will be accepted by reviewers?
I believe that the appropriate integration of software usage into statistics graduate courses is to complement the usage of calculators for in-class examinations and homework assignments. When students are given very large data sets with sample sizes in the hundreds or thousands, it is necessary to use a software package. I try to give students actual data sets from public health applications. In such a case, the major work is cleaning up the data sets from errors, making such data ready for statistical analysis. Then, it is important that students understand very well which statistical method to use.
The use of calculators is very important too. Students should understand each formula used. They need to understand the theory behind each formula. How are certain parametric tests and nonparametric tests related, say. How can we design an experiment to reduce the margin of error of a confidence interval? Such applications are better suited for small data sets and for calculators.
It is so easy to violate the assumptions of statistical inference procedures, but most software packages will not warn you about such violations. As some say, "you put garbage in, you will get garbage out".
I know well what Jochen means. It is (for me) the healthy integration of theory and applications in learning/teaching statistics.
I use R, and Wolfram Mathematica to obtain datasets. We also use a standard scientific calculator.
I googled "software for teaching statistics" and it seems that there are many adaptations of R specially for teaching, offered by universities...
The software generates a large number of values for a piece of research. Software users are often unable to select the correct answers, and because of that they approach statisticians. Those who have studied statistics are able to identify the correct and required results from the software. My strong opinion is that statistics must be taught with manual calculations. There will not be any second opinion.
I am not a statistician, but statistical concepts are very important to my research (I am a biologist). I do not know which way of teaching statistics is best, yet let me share my experience not of teaching but of learning statistics and its methods. When I was a university student, only manual calculations were used to teach us statistics. I also used a slide rule a lot. Then the slide rule was replaced with calculators and finally with computers. Now I wish computers had been available to me from the very beginning! Actually, a virtual set of dice tells me a lot about distributions and gives a mechanistic understanding of null models. Unlike this, manual calculations distract my attention from the main topic (which is whether the observed happening is by chance only or not) to mathematically elegant concepts and correct calculations… So I would prefer well-adapted software over manual introductions.
I mainly use SPSS; I find it complete and easy to use,
and there are many examples and explanations on the website of the company.
There are arguments that we should teach statistics not only using manual calculation but also using software such as SPSS, Minitab, Stata, etc.
Yes, of course.
But whatever programs you use, it is still necessary to "think with your own head!"
For 5 years, I taught Data Analysis I and Data Analysis II in a very structured lab approach (lectures followed by practice and homework). The university was on a quarter system, and we met twice a week for 2 hours; it was intense training. The university had a Minitab license, so we used the menu-driven side of Minitab. It was helpful for teaching the concepts and the data analysis techniques to students in any major, and it does not require programming (which requires more background and skills). The software itself is limited in the availability of advanced statistical methods. Many of my students were grateful for the learning they acquired.