I ran multiple regression along with appropriate diagnostics. I have compelling evidence that the data are positively skewed and that heteroscedasticity is present. Is it possible to address both violations of the assumptions at the same time? If so, how?
Thanks.
Hi, Mart,
After investigating how to obtain accurate predictions using legacy linear models for decades, I switched to using a modern non-parametric alternative which explicitly identifies every statistically viable model within the sample that varies as a function of complexity and chance-adjusted classification accuracy. Efficacy of maximum-accuracy models is evaluated in validity analysis. Here are brief articles which respectively demonstrate the poor predictive accuracy of regression models and how to maximize their accuracy; the definition of (weighted) accuracy and the term "optimal" in this context; a statement of novometric (Latin: New Measurement) theory with two timely examples; exact discrete confidence intervals for model and for chance; and statistical power analysis in optimal statistical analysis.
Finally, the last link demonstrates the use of this method to evaluate whether a limited mask mandate reduced deaths due to COVID-19.
Related articles may be found at: https://ODAJournal.com
Thank you so much, Dr. Paul Yarnold , for putting in the effort to answer my question. I am currently doing my doctoral studies and have limited skills in addressing statistical challenges. With the answer and the references that you have provided, I will surely improve my work and my skills.
My Pleasure, Soon-to-be Dr. Mart Andrew Supeña Maravillas
If you decide to try maximum-accuracy analysis, and then have a question, or run into a snag, please post here and I will respond if I am able. I'd think a comparison of the regression vs. maximum-accuracy models (in training and validity analyses) would be both interesting and useful.
Hoping you have a fulfilling career!
Mart -
Be careful of overfitting to a particular sample. You can try a "cross-validation" to help you avoid that. But first, to look at model fit, including heteroscedasticity, you start with a "graphical residual analysis." There is a lot on the internet about "graphical residual analyses," and "cross-validations."
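For anyone who wants a concrete starting point, here is an illustrative R sketch of such a residual analysis ('fit' stands for your fitted lm object; adapt as needed):

# Graphical residual analysis for a fitted lm object called 'fit' (illustrative sketch)
r <- rstandard(fit)                                # standardized residuals
hist(r, breaks = 30, main = "Standardized residuals")
plot(scale(fitted(fit)), r,
     xlab = "Standardized predicted values",
     ylab = "Standardized residuals")              # a funnel shape suggests heteroscedasticity
abline(h = 0, lty = 2)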
I have worked with a great deal of highly skewed data. Because there are large differences between the sizes of members of the population for any given item, you can expect to see substantial heteroscedasticity. These two features are related; this is expected. Consider this paper:
https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity, for heteroscedasticity in regressions of the form
y = y* + (e_0)(z^gamma),
for finite populations. It explains how this works and provides references (note Brewer, and Cochran).
So, you use weighted least squares (WLS) regression. Here are some examples:
https://www.researchgate.net/publication/333642828_Estimating_the_Coefficient_of_Heteroscedasticity. You can use the following to do this with your own data: https://www.researchgate.net/publication/333659087_Tool_for_estimating_coefficient_of_heteroscedasticityxlsx. Note comments there on default gamma (coefficient of heteroscedasticity) when sample sizes are small, but I think you can still get a better gamma than zero (where you have homoscedasticity), for most very small samples. Just enter your OLS results in the Excel tool, and find a coefficient of heteroscedasticity.
Once you decide on a coefficient of heteroscedasticity, gamma, you use it in this regression weight:
w=z^(-2gamma)
where the best size measure, z, is predicted y. For practical purposes you use a preliminary predicted y, say your OLS ones. Then you will get good regression weights, w, which you can enter into software which gives you WLS regression results. (It does that by simultaneously minimizing the sum of the weighted squares of the estimated residuals with respect to each regression coefficient. For homoscedasticity, software just uses equal weights, where gamma=0.)
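If it helps, here is a rough R sketch of that workflow on simulated data of the above form (illustrative only; the Excel tool linked above estimates gamma more carefully, and the slope-based estimate below is just one crude option):

# Rough R sketch of the WLS workflow (illustrative; names are placeholders)
set.seed(1)
x <- runif(200, 1, 100)
y <- 5 + 2*x + rnorm(200) * (5 + 2*x)^0.5        # y = y* + e_0 * z^gamma, with gamma = 0.5
ols <- lm(y ~ x)
z   <- fitted(ols)                               # preliminary predicted y as the size measure z
gamma_hat <- coef(lm(log(abs(residuals(ols))) ~ log(z)))[2]   # crude slope-based estimate of gamma
wls <- lm(y ~ x, weights = z^(-2*gamma_hat))     # w = z^(-2*gamma)
summary(wls)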
For more information, see the following project, and its updates:
https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression.
There are updates in there, for example, on why you don't need an hypothesis test here, why transformations are not ideal, and on regression weights in statistical software. Sometimes you may want to read comments to updates there.
Cheers - Jim
PS - Please note that for ratio estimators (one regressor and no intercept term), you can use
w=x^(-2gamma) to simplify the process, because regression weights are just relative. Thus, for the model-based classical ratio estimator, CRE, where gamma = 0.5, we have
w=1/x.
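Continuing the R sketch above, the CRE is then just one line (illustrative):

# Classical ratio estimator: WLS through the origin with w = 1/x (gamma = 0.5)
cre <- lm(y ~ x - 1, weights = 1/x)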
Can you please tell us what exactly showed "high skewness and heteroscedasticity"? These assumptions are not about the data themselves (i.e. the dependent or independent variables), but about the errors of the model. It is entirely possible that your raw variables are heavily skewed, for example, while the distribution of the errors/residuals after fitting the model is not!
Hello, Rainer Duesing . I created a histogram of the standardized residuals and ran a descriptive analysis. Graphically, the distribution of the residuals deviates from normal. Based on the descriptive statistics, the skewness statistic = 2.973 (SE = 0.017). To check homoscedasticity, I plotted the regression standardized predicted values against the regression standardized residuals. Graphically, I see a funnel-shaped distribution.
All right, sounds good. Then I would like to add to the option of WLS that you should think about the data generating process, i.e. how the dependent variable is measured. There are lots of possible reasons why the residuals are skewed and heteroscedastic, e.g. bounded variables (like metric reaction times), count variables (which are discrete, like errors), or probabilities (bounded on both sides between 0 and 1). In these cases I would not try to "stretch" the OLS regression by using weights, but instead switch to an appropriate model from the generalized linear model family, like Poisson, negative binomial, or gamma regression.
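For illustration only (a sketch; y, x1, x2 are placeholders, and the right family depends on the data generating process):

# Sketches of GLM alternatives, depending on the outcome type (y, x1, x2 are placeholders)
fit_pois  <- glm(y ~ x1 + x2, family = poisson(link = "log"))   # counts
fit_nb    <- MASS::glm.nb(y ~ x1 + x2)                          # overdispersed counts
fit_gamma <- glm(y ~ x1 + x2, family = Gamma(link = "log"))     # positive, right-skewed outcomes
fit_beta  <- betareg::betareg(y ~ x1 + x2)                      # proportions in (0, 1); add-on package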
Just re-express your data. That is time-tested and scientifically appropriate. It is also likely to tell you more about your data and model than any black box fitter. If your data are quantitative measurements, then logs are usually the first function to try. If they are counts, then square root is the place to start. If they are ratios, try the reciprocal.
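In R terms, the first tries are one-liners (illustrative; y and x are placeholders):

# Tukey-style first-try re-expressions (y and x are placeholders)
fit_log   <- lm(log(y)  ~ x)   # quantitative measurements
fit_sqrt  <- lm(sqrt(y) ~ x)   # counts
fit_recip <- lm(I(1/y)  ~ x)   # ratios
# then repeat the residual plots for each candidate and keep the simplest that works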
The differing approaches of Paul F Velleman and Paul Yarnold are indicative of two very different purposes, and we don't know which is driving your analysis.
If you intend to test a theory, then you need to specify not only the causal relationships in your model but also the data generating processes that these causal processes entail, as Rainer Duesing points out. I would rephrase Paul Velleman's reply to the effect that a pre-specified model will tell you more about your theory than a black box model will.
However, if you are interested simply in using data to make predictions, then a black box approach can indeed be useful. Statistics is a toolkit. You can't decide what's the right tool for the job without specifying what the job is.
Dear Ronán Michael Conroy
I am sorry, but you could not possibly be more incorrect regarding ODA. But, of course, you have zero experience with the ODA paradigm--which is obvious from your reply here.
The "black box" is not black as you say--the algorithm and the empirical solution are both always pure white (with no distributional assumptions). There are two modalities: exploratory ("two-tailed") and confirmatory ("one-tailed"), and both are directed by the investigator's hypothesis and the associated experimental methodology.
Specific hypotheses can be tested. Every statistically viable model which exists for the sample, that varies as a function of complexity, is identified--rendering model misspecification for measured attributes literally impossible. Models consistent with the hypothesis are always developed via an algorithm which explicitly maximizes accuracy normed against chance. The p value is exact, there are exact discrete confidence intervals for model and chance, and performance of the model is based on results obtained in validity analysis...
In contrast, none of the above is true for parametric legacy methods.
It would be really great if those who do NOT understand the ODA paradigm, and who have never conducted an optimal analysis, would STOP saying things based on their misconceptions. In reality, such criticisms are only true for legacy methods...
Paul Yarnold I cannot evaluate how good and useful ODA really is, but I will have a look at it. Still, you cannot deny that it is a little suspicious, or at least unusual, if in essence 95% of all articles in a journal (ODA) are written by the editorial board, namely you and your colleagues. Could you please provide independent sources to assess the usability and/or superiority of ODA? Many thanks!
@Mart, when you observe heteroscedasticity in the residual plots, it is important to determine the source, and in particular whether you have pure or impure heteroscedasticity, because the solutions are different. If you have the impure form, you need to identify the important variable(s) that have been left out of the model and refit the model with those variables. Weighted regression is a good option (which I prefer to re-expressing or any other transformation approach), but as suggested by @Rainer, you can switch to an appropriate generalized linear model family, particularly the gamma GLM, which seems suited to dealing with heteroskedasticity in non-negative data.
Dear Rainer Duesing
Thank you for your reply. To answer your query--yes, very unusual, and very exhausting.
To begin to answer your question, it is important to understand the shift in the American university which occurred soon before the beginning of WWII, and which began changing again radically at the end of the Vietnam era. During the intervening period there was a tremendous push for excellence--and mankind ultimately set foot on a celestial body other than Earth (this was accomplished using operations research methods, not statistical methods). Today, the overwhelming majority of people active in the field of statistics simply follow and re-re-re-re-teach methods developed centuries ago--which still haven't won a Nobel prize.
In my personal quest, after trying every legacy method I could find for two decades, I began to understand that linear models are rarely sensible (few processes in nature are linear) and thus they are simply NOT able to make accurate predictions. This too is the sign of the times: everyone gets a trophy, and if everyone is "great", then why push oneself to "be all that you can be"? Inaccurate models are OK, even if fundamental assumptions are violated. The first psychologist to win a Nobel Prize (Simon, in economics) stated that people don't optimize, they satisfice.
I asked myself, what good is a theory, or a model obtained to test a hypothesis derived using the theory, if the theory/hypothesis/model can't make accurate predictions? Should one's model be as accurate as is possible, or should researchers continue to aspire to find inaccurate models? Are less accurate models somehow superior? These questions are rhetorical, obviously.
Why not make a model which has no assumptions, explicitly maximizes accuracy (expunging the effect of chance, of course), obtains exact p and CIs, and only considers validity performance when evaluating the results? What part of this sounds bad? What legacy method can make these claims? The answer to both of these questions is "none".
I published many applied and theoretical articles using optimal methods in many academic areas, in many top journals. It is how I became the youngest full Research Professor (ever) in Medicine (6 Divisions) and also in Emergency Medicine at Northwestern University Medical School, and simultaneously the youngest full Adjunct Professor of Psychology at the University of Illinois at Chicago. Twenty million dollars (direct costs) in R01 and other large grants--all powered by ODA--also helped.
To everyone who has no idea, there are tens of thousands of articles and books and software systems, and journals, conferences, grant programs, corporate positions and the like, for specialists in operations research--who all use complementary optimization methods. THIS is the birthplace of ODA...
But, like the campfire in Call of the Wild, the maximum-accuracy movement died in its infancy, soon after computers powerful enough to operationalize the math came into existence. This was a consequence of the "Greed is Good" era in the US epitomized by Hollywood, and the infamous "DOT.COM" disaster in the financial markets. Operations research methods were taught in systems engineering programs--which were the home of the most numerically-talented students of that era (where rocket scientists came from). Almost all of the systems departments were quickly disbanded because the smartest students decided it was foolish to work hard for salaries, when they could create their own dynasties using much less cognitive effort...
I refused to stop research in optimal methods, as did a handful of others...
Since that time, many people have published using ODA and CTA software--those I found using Google (I have no library access) were included in the list of publications via a tab on the ODA webpage. Unfortunately, I only devoted the better part of two days to look for articles by other labs, and presently I haven't sufficient money to hire any staff to help me to do everything (in truth, anything) I would like.
In my formative days (I started college when I was 12, more than a half-century ago), researchers did it all. Einstein said he first had to read everything related to his area, and then start to figure out how to discover the future. So, I find it incredible that professional statisticians today refuse to find out what I and thousands of others, rigorous scientists--the types who make rockets fly to the moon and beyond--spend our lives doing.
And yet--it is what it is. After my beloved mentor died (the awesome Dr. Roy Patterson), I decided to devote my remaining life to the mission of improving statistical methods. Based on Pearson's experiences with Biometrika, I decided to start a new statistics journal--focus on the ODA paradigm (which evolved into novometric theory), but make the journal free to read, and free to publish within.
Could EVERYONE who wishes to, please take advantage of this free resource and find out for yourself? Reading/working-through my two most recent books on ODA would be very useful as well.
If this requires too much effort, or if there is any other excuse to NOT read about, understand, and test ride ODA--please continue doing whatever you wish, and resist commenting substantively about things which you don't understand and haven't personally evaluated using your own data. It is sufficient to espouse what one believes one understands, don't you agree?
I have been thinking about, and slowly moving towards, adding new features to the ODA webpage. Future permitting, I hope to add a question/answer forum; forums for individual editors; and videos concerning running the software and understanding the output. An expanded version of my discussion here will find its way onto the ODA page.
Regarding the ODA editorial board, in prior years there were many more members, but it turned out that, at that time, there was little reason to keep them aboard. The board shrank to three members, and started growing again due to improving statistical winds. The primary criterion for board membership is experience in conducting/publishing professional articles using ODA and CTA (in ODA, or in other journals). All are encouraged and welcomed.
When Planck was asked why it took so long for his work to be accepted, and what he thought is needed for a scientific revolution to take root, he responded "tombstones"--the young will pick up the future and run with it. ODA is now read in 188 countries, more than any other scientific periodical in history (thank you, dot.com and the internet). The latest ODA article--showing that a limited mask mandate in Minnesota reduced the number of COVID-19-related fatalities--was conceived, analyzed, and written by a high-school student. The best is surely yet to come...
I hope that my response to your question is satisfactory to you, and I appreciate your asking the question--I wanted to write this mini-history for some time.
I also have a question for you. If you don't mind my asking, why is it that you "...cannot evaluate how good and useful ODA really is"? I'd wager you probably could. The software is simple to use and the output is simple to read and understand (assuming requisite study), there is a Stata interface and an R interface, and there are plentiful, varied examples in the literature. Until I am able to add a question forum to the ODA webpage, there is Research Gate.
Optimal wishes for all...
I have now looked at the ODA literature. First, ODA appears to be primarily a discriminant analysis, which is not relevant to the question originally asked. Second, the literature (mostly published in what appears to be a "house organ") fails to cite a very extensive existing literature in this area, which may not be surprising because Dr. Duesing isn't trained as a statistician (and seems, from his writing, to think little of the field of statistics). Third, it emphasizes "optimal" solutions. One thing practicing data scientists have learned is that optimality is often not the best criterion, as it leads to models that are overfit and will not replicate well on future data or have scientific relevance. As a side question, I wonder whether the ODA methods account appropriately for multiple comparisons (another large statistics literature) in determining significance.
I repeat my initial advice to Mart: consider re-expressions of your quantitative variables, starting with your dependent variable. Chances are, you can make the relationships of interest linear and fit a simple model to them. The literature of statistics going back to the 1940s has validated this approach, and it continues to work productively.
Dear Paul F Velleman
Most of the legacy herd of statisticians are afraid of their own tails--and justifiably so in my opinion. No worries, ODA cures all suboptimal issues from the past!
Your three-point summary is maximally worse than nonsense--your response is anti-sense (the exact opposite of reality).
Regarding (1), ODA handles a vast system of hypotheses including and surpassing all which may be addressed using legacy statistical methods.
Regarding (2), clearly you haven't read articles (mine and other labs') in legacy journals, or in my two most recent books, or listed in RG or ODA--in which ODA methods are compared in hundreds of articles against dozens of different legacy methods.
Regarding (3), novometric analysis finds every statistically viable model which exists in the sample that varies as a function of complexity--rendering model misspecification impossible; state-of-the-art adjustment for multiple comparisons is the standard operating procedure; and only effects obtained in validity analysis are used to evaluate the a priori hypothesis.
Regarding overfitting, imagine using all the popular legacy methods to analyze a data set, consisting of completely random data, with one "dependent" variable and several "independent" variables. What would happen? Well, ODA found absolutely nothing. Most legacy methods found PERFECT models in the random data. Anyone interested, please take a look at the four articles on this issue, published in ODA.
PS: Nope, not a house organ, good guess. I played my Spector Autograph 4-string bass in a power-trio classic rock (cover) and classic blues (mostly original) band. Ever since the '70s ended, good rock and blues keys, particularly on the organ, have been hard to find.
Note that I started out my response above the way I did, because of a book review I had read of ODA on Amazon. Then, turning to the question at hand, I responded that it was not surprising that skewed data, where there is likely a wide range [in predicted y values], would show heteroscedasticity in regression. See https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity.
However, from correspondence from Mart (the question author), there may be other issues not mentioned here.
All I'm saying is that if you have skewed predicted y values, then substantial heteroscedasticity is not a surprise. Heteroscedasticity in regression is a feature, not a bug.
Dear James R Knaub
Appreciate your tip on Amazon reviews of the APA ODA book. Only three reviews, however each is five stars--nice!
https://www.amazon.com/Optimal-Data-Analysis-Guidebook-Software/dp/1557989818/ref=sr_1_1?dchild=1&keywords=optimal+data+analysis&qid=1604104270&rnid=2941120011&s=books&sr=1-1#customerReviews
Strange that a copy of the book costs twice as much on Amazon as it does from the publisher...
One simple approach here is to apply data transformations. Quick guidelines for data transformations are described, e.g., here: https://xdplot.blogspot.com/p/data-transformations-guidelines.html
Yes, Dear James R Knaub, I agree that overfitting is surely not a great idea!
Spoiler Alert: more red pills (The Matrix)
In the good ole days, when research such as is presented in the articles linked above was released, other scientists/labs would attempt to replicate the findings. The articles linked above are frequently read and downloaded, but I haven't seen an attempted replication yet (ODA would gladly publish such research).
Also in my first response I included this: https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression
Besides references for estimating the coefficient of heteroscedasticity, there are updates on using regression weights for statistical software, why an hypothesis test for heteroscedasticity is not helpful, why you may not want to use a transformation, and other information in updates and comments to updates. - Cheers.
Paul F Velleman did you really mean me with your comment " because Dr. Duesing isn't trained as a statistician " in your last answer? It is basically true, but from the context I think you meant Dr Yarnold, didn't you?
To All:
Sciences today are marked by seemingly miraculous advances--except for the field of statistics which remains mired in the dust of antiquity. Many who make their living via applied statistics today intentionally avoid learning about and trying new directions: in general (not always true), the more power one holds, and the older one becomes, the more desperately one clings to the past. The fact that some of the most revered and important work--statistical analysis of drug safety--has resulted in two of the top ten leading causes of death (CDCP: taking a prescribed medication; iatrogenic illness) does not matter. The gate-keepers profit by unilaterally supporting and whenever possible enforcing failed historical methods, to the detriment of the rest of the world...
If one truly wishes to kill the future and bask in the past, the most effective procedure would be to learn the new methods and demonstrate (for applied scientists) or prove (for academics) that they are not superior to the ancient ways. Unfortunately, this is no longer possible, as the existence of the ODA paradigm has been mathematically proven:
Que sera, sera...
Rainer, my apologies. I did intend Dr Yarnold.
I think his recent postings speak for themselves.
Scholars and researchers don't write this way. They recognize that theirs is not the only right answer and that the rest of the tens of thousands of professional statisticians are not benighted fools.
And they don't publish in house organs, but in scholarly publications where their work can be reviewed by other scholars.
(And, BTW Dr Y: The discipline of statistics is hardly "ancient", having developed into its modern form only during the past century or so, and developed further in the past decade or two with the availability of adequate computer power for modern methods. Those of us who live and work in this discipline are not as moldy as you seem to think.)
Questions for Dr Y:
How does ODA deal with outliers in the data?
How does ODA deal with high leverage points in the data?
How does ODA deal with highly correlated predictors? (Does it produce coefficients for each predictor in some form of model?)
What form of model does it produce?
Is ODA focussed on prediction or on scientific understanding? If understanding, then what is it "optimizing"?
Oh, and why does my browser warn me that your URL reference is unsafe?
To return to the original question that motivated this thread, skewness and heteroskedasticity often appear together and are often both improved by the same re-expression. That is further evidence for the selected re-expression and a further simplification of the resulting model. Because this approach works, is well-studied and justified in the statistics literature, results in easily understood models, and provides ways to diagnose and understand ways in which data deviate from the model, it is often the best choice for the initial attack on problems such as the one that started this thread. The introduction of simple functions such as log, square root, and reciprocal to scientific "laws" and formulas has a substantially longer pedigree, back at least to Newton (reciprocal square law for gravity).
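If you want the data themselves to nominate the re-expression, a Box-Cox scan is one standard tool (a sketch; uses the MASS package and requires a positive response; 'fit' is your fitted lm object):

# Box-Cox scan of a fitted lm: the lambda maximizing the profile likelihood suggests the power
library(MASS)
bc <- boxcox(fit, lambda = seq(-1, 1, 0.1))   # lambda = 0 ~ log, 0.5 ~ sqrt, -1 ~ reciprocal
bc$x[which.max(bc$y)]                         # best lambda on the grid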
Dear Paul F Velleman
As I said earlier--you are terrified of your own tail!
I suspect your computer is also malfunctioning. My browser displays a locked icon, and all my internet certification is up-to-date and active.
For some unknown reason, after interacting with you, I can't help but to recall the absurdity taking place in the executive branch of the US government.
**
Dear Everyone:
Are you motivated to personally learn (is there another way?) and evaluate the use of maximum-accuracy modelling--about which thousands of researchers in a multitude of disciplines have published--in your research?
If so, great--the ODA eJournal is free, and books and software are comparatively affordable. You'll have to obtain other articles written by legions of authors (mostly from the US, Europe, and Russia) from a library. A question-and-answer forum is under development for the ODA website (http://ODAJournal.com).
Since my questions are so simple, why not answer them?
For example: outliers are a problem with data--not specifically with statistical methods. The challenge in analyzing data with outliers is to identify, isolate, and understand them. Often we learn more from the outliers than from the rest of the data. So how do your methods do that? If all they do is "optimize" a solution blindly, then they won't deal with outliers in the way that good data analysis requires. Either they'll devote too much attention to fitting the outliers, or they'll miss that they are outliers. If they do identify outliers, it must be with respect to some model--so what is that model?
BTW, in the article you cited earlier, 22 of 30 references are to your own work; 7 of those are in the same "house organ" journal, and 13 others are all from a single other journal. That's a "tell" in scholarly work. We build on the work of others and place our new ideas in the context of the established literature. Other scholars have provided solutions to some of the problems you claim to solve. For example, there is an extensive literature in classification and discrimination and, at least in that article, you don't cite any of it.
Dear Paul F Velleman
You are not able to be convinced--you must convince yourself. This requires that you begin by reading and working through my books, as everyone must do.
No worries, learning ODA is simple, done step-by-step. You may recall that the Amazon reviews linked in an earlier comment discussed how simple ODA is to learn, and how quickly users were able to analyze their data. Linked below is an article describing the experience of students and faculty in the Psychology Department of Loyola University Chicago, who learned ODA--which makes math intuitive, obvious.
You also must open your mind and actively seek truth--even if it means learning new tricks, and opening the door to competition with youth--who are far faster and stronger, and have the upper hand, versus those much further along in their careers.
Math is known to appeal to youth. As is the case in so many areas of science today, ODA opens the door for youth to become preeminent researchers. Data science is the highest-starting-salary position for people with undergraduate degrees--what if undergraduates also become new-era data analysts? I have empirical evidence this is possible. I digress...
Professionals don't learn their trade by asking random questions on the internet.
Professionals don't scan the references in one article and make a sweeping generalization about a new paradigm--knowing nothing else about the paradigm.
Perhaps responding to one of your questions will motivate you to do the right thing, for you, science, and any youth in which your discourse may have motivated FUD (fear, uncertainty, doubt)--as does the present US administration. So, I'll give it a try.
I have faith in youth--I believe those with experience using regression analysis, open minds and curiosity will immediately understand what I consider to be obvious implications of the article I link below.
You ask about outliers, and suggest they are important. I agree--I think they are often the most important responses.
So, here is a link to a little free article which shows why a regression model is inaccurate in predicting outlying data values, and how to maximize the accuracy (normed against chance) of a regression model--by modeling the sample outliers with maximum possible accuracy.
Improving the accuracy of legacy models was an early discovery: the paper linked above was already dated. Methodological evolution has yielded an alternative methodology which completely replaces regression analysis for any data measured using any ordered or (multi)categorical scale.
I'll gladly link the latter article herein, after you have read the article you requested (linked above) and commented here on which methodology best predicts the outlying data.
PS: Was I trained as a statistician? Yep. Started with my Dad, who taught statistics at the University of Chicago when I came along. I recall hearing stories about, and occasionally meeting, Kruskal, Wallace, Cattell, Youden, and others. I started graduate statistics as a freshman undergraduate, took every stat course in every college (Liberal Arts and Sciences, Engineering, Business) at the University of Illinois, Chicago--always got the highest grade. As a sophomore undergraduate my adviser in systems engineering had me review the world literature in behavioral decision theory (when I learned the value of a research librarian--back when libraries were libraries), and then write a textbook as well as a teaching manual for graduate students in their program, in exchange for honors credit (these books were still being used to teach that class, last I checked, years later). By the time I was a junior undergraduate I was paid to teach undergraduate stats at UIC (voted #2 in student ratings for best teacher of the year, University-wide, in my second year teaching the course). When I was a sophomore graduate student, I taught the most advanced graduate statistics course at Loyola University Chicago--and received the highest student rating of the class teacher in history, to that point.
At Northwestern I taught medical students, interns/residents, physicians, and various health-care professionals. My "Reading and Understanding Multivariate Statistics" books with my dear friend Dr. Larry Grimm (RIP) are the best-selling books at APA Press except for the publications manual, have been cited more than 2,000 times, and were translated into Japanese. Then there is the $20M (USD) in direct funds received by grants I (co)wrote--for which I designed and handled the analysis.
Obviously, I am trained as a statistician, and I trained >1,000 students, faculty, technicians, and staff in various aspects of statistics/measurement (in which I am an Elected Fellow in that APA Division). I am also a social/health/industrial psychologist, a systems engineer, and a public health expert (all these fields involve statistics). I have run ODA on various micro-, mini-, mainframe, and supercomputers: the NCSA allowed me exclusive use of a partition on an IBM 3090-600VF for a year (ran billions of Monte Carlo experiments studying p for ODA). I was a statistician on the DMC for a pharmaceutical company, the paid statistical reviewer for Archives of Physical Medicine and Rehabilitation, and on the editorial board of Educational and Psychological Measurement. Off the top of my head it is difficult to recall everything--statistics has been part of my life since I was a little kid, and I've been around a while...
Dear Paul Yarnold , I am not a formally trained statistician as you are, but a curious applied scientist with an interest in statistics, who wants to learn more. In my studies of psychology and my PhD in Cognitive Science, I encountered a lot of barriers, which I tried to overcome. Limited by the LM, I learned about the GLM; I learned how to handle multidimensional data (EEG and MRI) and how to analyze them with pretty advanced methods (e.g. source localization for EEG, trained by a physicist), and in recent years I have more and more discovered Bayesian analyses and how to apply them. Therefore, I can say I am very open minded and willing to learn something new.
I looked at several published papers about ODA (the links you provided throughout this discussion, e.g. regression away from the mean), I looked at your homepage, and I tried to figure out HOW ODA works. But none of the articles so far (please correct me if I am wrong and provide the sources) explains in detail how it is calculated, how it is technically applied, or how models are estimated. I would like to see a simple explanation of a t-test analogue, how it is done and calculated (not only some numbers stating that it is better). Especially in times of open access, where we all should be able to use the most advanced methods, it would be very helpful if you could provide some simple lines of code in R to show the superiority of ODA. Just some simulated simple data for two groups, first analyzed with the built-in t-test and then with the ODA analogue. I would really appreciate it. ;-) Since I am not a mathematician, I try to get a grip on concepts by trying them out with some data, to see what happens and how the models react to changes in the data.
I saw that you have an R package for ODA to be able to communicate with your software. Therefore, it should not be so much of a problem, should it? Many, many thanks for your help in advance!
I think you got vital information and explanations from the researchers, and to strengthen it you can refer to the following article.
Dear Rainer Duesing ,
Sorry, I don't have copies of the best articles which show the model for a t-test analogue (lost my library and file cabinets in a "small" earthquake in San Diego, some years ago). Here are the citations.
I don't use R (wish I did, it is seriously amazing). My extraordinary colleague, Dr. Jim Rhodes, conceived of and made the R module. I only use the ODA software--Rob and I designed the commands, they are few, powerful, simple and robust.
I will be absolutely thrilled to demonstrate a t-test example for you in this thread, Dr. Duesing! Let's agree on the problem/data set, then you send me the data (we can discuss file type/formatting), I'll run the analysis and post the results, and we can chat about it. I hope this is acceptable to you.
You'll have to do the t-test; I used to use SAS, but I stopped paying the yearly rental. If you'd like to use real data--which you published previously (if available)--that would make a sweet ODA article. And, I bet you know that I love simulation studies as well. It's all good!
Afterwards, I'd also be happy to demo a CTA (with a two-category "class" or "dependent" measure)--it runs in seconds to minutes depending on the fertility of the data (to grow models), the number of variables, and the number of observations. If you have any candidate data, let's discuss how to get me the file, and let's get it going!
Thank you, a pleasure...
PS: You obviously are substantially advanced, highly motivated. BTW, the scientist who developed the--current, to my knowledge--method of synthesizing (scoring) EEG data is the mother of my dear friend, Lori, who runs American Angler Sportfishing in San Diego. I tested many experimental devices on that vessel (I was a sponsored big game fisherman, mostly hunting yellowfin tuna). Used ODA to perfect methods, made a lot of records. Statistics=fun!
I always wanted, but never got to evaluate EEG data using ODA. Check out the links below--optimal weighted Markov analysis. This could be a LOT of fun...
Dr Y
1) It isn't very convincing when your responses to simple questions are ad hominem
2) My questions were entirely apt. Any good data scientist would ask the same questions of a new method
3) wherever I look in your references I find self-references rather than references to an established literature.
4) you still haven’t answered my questions or described your method
and, BTW, your reference to regression to the mean misunderstands that concept
Dear Paul F Velleman
Thank you for your interest--if you wish to, then I'm confident you'll easily and quickly learn the material and subsequently identify many superior models, some of which may be spectacular. Of course, I'd love to hear about them--either way...
Best wishes!
Paul Yarnold I sent you a small data set, really appreciate a demonstration.
Dear Rainer Duesing
Very cool! I attach the output.
The non-directional (2-tailed) ODA model was:
IF DV <= 11.455 THEN Predict GROUP=1; IF DV > 11.455 THEN Predict GROUP=2.
In training analysis the model was statistically significant (exact p=0.00001). The model correctly predicted 23/30 of the Group 1 observations (76.7% accuracy, 50% is expected by chance), and 27/30 (90.0%) of the Group 2 observations. This yields effect strength for sensitivity of ESS=66.7%, indicating a relatively strong model (Rainer--ESS=0 indicates chance, 100=perfect accuracy: there are specific ranges for qualitative assessments).
In training the model was correct 88.5% of the time it predicted an observation was from Class 1, and 79.4% of the time it predicted Class 2. The effect strength for predictive value is ESP=67.9 (relatively strong).
In one-sample (leave one out, or LOO) jackknife analysis the sensitivities for Groups 1 and 2 fell modestly (73.3% and 86.7%, respectively), as did the predictive values (84.6% and 76.5%, respectively). However, both ESS (60.0) and ESP (61.1) indicate a relatively strong effect in LOO analysis.
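(Rainer--you can verify these numbers yourself: for a two-group design, ESS is the sum of the two sensitivities minus 100, so in training ESS = 76.7 + 90.0 - 100 = 66.7 and in LOO ESS = 73.3 + 86.7 - 100 = 60.0; likewise ESP = 88.5 + 79.4 - 100 = 67.9 in training.)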
**
Analyses above required the following commands (two job-control lines, one naming the data set file and the other naming the output file, are placed before the following commands):
VARS N GROUP DV; (this lists the variables)
CLASS GROUP; (this lists the class or "dependent" variables)
ATTR DV; (this lists the attributes or "independent" variables)
LOO; (this conducts a LOO jackknife analysis)
MC ITER 25000; (the number of Monte Carlo experiments to use to estimate p)
GO; (begin the analysis)
The output log of this program run indicates that the training and LOO analysis required 9 CPU seconds on a Pentium/Intel computer built more than a decade ago.
PS: Sometimes, for a 2-tailed test, ODA is able to compute the exact p value (one-tail p is readily computed: I attached links to articles covering the discovery of 1- and 2-tailed exact distributions for ODA in an earlier post). In this example this was true. If ODA does not compute the exact p, then one uses Monte Carlo (MC) simulation (I actually use MC all the time, unless I have a reason not to). Research I did (on a supercomputer described in a prior post) to evaluate the ability of MC to converge to known exact p values showed that 25,000 iterations gets within a hair of exact, and is not much different from runs of 100K iterations or a million. With respect to confidence, any a priori target for p can be requested; those seen in the attached output are defaults.
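For intuition only--this is not the ODA software's algorithm, just a toy R sketch of the general permutation logic (dv and grp are placeholders for the attribute and the class variable):

# Toy Monte Carlo (permutation) p for a best-cutpoint model; dv, grp are placeholders
ess <- function(dv, grp) {                     # best two-group ESS over all cutpoints
  best <- 0
  for (c in sort(unique(dv))) {
    j <- mean(dv[grp == 2] > c) + mean(dv[grp == 1] <= c) - 1   # Youden-type index
    best <- max(best, 100 * abs(j))            # abs() covers both directions (two-tailed)
  }
  best
}
obs  <- ess(dv, grp)
null <- replicate(25000, ess(dv, sample(grp))) # relabel the class variable at random
mean(null >= obs)                              # estimated p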
The present analysis was nondirectional (exploratory or "two-tailed"). A directional (confirmatory or "one-tailed") analysis requires one more command. For example, if the hypothesis was that DV scores would be lower for Group 1 than for Group 2, the ODA command is "DIR < 1 2;". The same hypothesis could be stated as: "DIR > 2 1;".
In the output, PAC means "percentage accurate classification".
In novometric theory, ONLY the performance in validity analysis matters,
And, an exact discrete (95%) CI is computed for model and chance using an R program written by Dr. Jim Rhodes. I need to learn to use R!
Dr. Rhodes also wrote a statistical power calculator for ODA paradigm, which is available in R.
Dear Paul Yarnold I had a look at your analysis; please find attached my analyses as an R script. Since you do not use R, I also uploaded the important results. For all other readers: I sent a small two-group data set to Paul and wrote:
"A simple two-sided hypothesis could be, if the two groups differ. If we think about an intervention (change scores), I am also interested in how much A is better than B (or vice versa, since it is two sided)." So these could be the change scores of two groups, and an independent t-test would test whether their changes differ (i.e. the interaction in a 2x2 ANOVA). Data set attached.
1) Your ODA analysis: I am not really familiar with your whole terminology, so I might misinterpret some of the results. I do not get why there are two classification tables. If I am interested in whether the groups differ, how does an optimal cut-off point between the groups help? How do the classification cross tables help me? I am a little bit lost here, sorry. It seems as if the classification worked properly, but does this answer my questions? Please help.
2) I tried to do typical analyses in my script. I started with some data screening (descriptives, histograms, boxplots), which showed that a t-test might not be appropriate, but I did it for the sake of presentation here. It was not significant, albeit with a medium to large effect size.
3) The next analysis presented is a Bayesian analogue to a t-test with parameter estimation. This method is quite robust against outliers (as present in the data set), which you can also see in the output. It was able to reliably separate the groups. The nice feature here is that it not only gives you parameter estimates for mu and sigma of each group, but also an estimate of the population raw differences, the differences of the sigmas (i.e. whether interindividual changes differ across the groups), and an estimate of delta, i.e. the standardized effect size. The plots help very well to inspect whether everything is plausible and the MCMC converged. In my opinion, this would be the go-to method, since it answers all the questions I had (do the groups differ and by how much --> posterior plots of the mean differences with HDI intervals) and gives additional valuable information. Here, due to the robustness, 100% of the posterior of the mean differences was below 0; in frequentist talk, the groups therefore differ "significantly" (there are other options, such as declaring a region of practical equivalence, but this is beyond this test).
4) I tried to "mimic" the classification approach and used a simple logistic regression (a minimal sketch follows below). Similar to the t-test, it was not significant. But I was able to plot the predicted classification against the true groups (similar to your approach). The classification is not very different from your results (see output). So, please tell me, why is your method better, or what information did I get with ODA that I was not able to get with the other methods (despite the optimal cut-off, whose value I cannot see yet)?
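For readers without the attachment, the gist of this step was something like the following (a minimal sketch; 'dat' holds the DV and the group factor):

# Minimal sketch of the logistic classification check ('dat' has columns DV and group)
glm_fit <- glm(group ~ DV, data = dat, family = binomial)
pred <- ifelse(fitted(glm_fit) > 0.5, levels(dat$group)[2], levels(dat$group)[1])
table(predicted = pred, actual = dat$group)    # compare with the ODA cross-classification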
In my opinion, the Bayesian method provided all the information I was looking for, and additionally I am able to incorporate prior information if wanted/available/needed. So, why should I change?
Thanks for your help!!
Best Rainer
Dear Rainer Duesing
Beautiful, thank you.
t-test and logistic were useless--both missed a relatively strong effect.
Bayesian was cool! The use of weighting by prior odds for imbalanced sample sizes (to optimize ESS) is the only aspect of Bayesian methods which was adopted thus far in the ODA paradigm. We haven't compared Bayesian vs. ODA analysis--you are the first!
To answer your question--the first cross-classification table is for training analysis. In the ODA (novometric) paradigm a statistically significant training effect "authorizes" a validity analysis to determine if the finding might replicate if the model is used to classify the observations in another sample. The second cross-classification table is for validity analysis: here, LOO jackknife was used. If the sample was larger a split-half validity method could be used. If there were two or more independent samples, then a variety of additional models/methods are available in the ODA software (Rob's and my APA book on ODA explains these validation methods in detail).
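For intuition only, in the same toy spirit as my earlier sketch (again, not the ODA software--just the logic of LOO for a one-attribute cutpoint model; dv and grp are placeholders):

# Toy LOO jackknife for a one-attribute cutpoint model (illustrative only)
fit_cut <- function(dv, grp) {                 # returns c(cut, dir); dir = 1 predicts group 2 above the cut
  best <- c(-1, NA, 1)
  for (c in sort(unique(dv))) {
    j <- mean(dv[grp == 2] > c) + mean(dv[grp == 1] <= c) - 1
    if (abs(j) > best[1]) best <- c(abs(j), c, ifelse(j >= 0, 1, -1))
  }
  best[2:3]
}
loo <- sapply(seq_along(dv), function(i) {     # refit without observation i, then classify i
  m <- fit_cut(dv[-i], grp[-i])
  if ((dv[i] > m[1]) == (m[2] == 1)) 2 else 1
})
table(predicted = loo, actual = grp)           # LOO cross-classification table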
In novometric analysis (attributable to theoretical advances taking place over 20 years, ODA became CTA, which became novometric analysis), the results of analysis are evaluated ONLY with respect to validity findings. This is consistent with the emphasis placed by the ODA paradigm on finding models which achieve maximum possible accuracy in classifying individual observations in the sample, that replicate when they are applied to classify observations in an independent random sample. Model reproducibility is crucial in physics, in chemistry, in engineering, hopefully in medicine. These are not the only sciences in which reproducible models are important: all sciences should identify reproducible models--obviously. In fact, the failure of NSF- and NIH-funded research findings to replicate has already led to TWO structural cuts in funding from Congress. POTUS called NIH and CDC scientists idiots recently, due to their invalid classifications/predictions--how embarrassing is that?
Another useful aspect of the ODA paradigm is that results obtained for any and all analyses (i.e., confirmatory or exploratory, training or validity, univariate or multivariable, two or 10**6 groups, weighted or unweighted, and all other variable aspects of data, model and study design) can be directly compared using the ESS and D statistics.
There are a few questions which address aspects of ODA which I find valuable in discussing implications and applications of ODA models. I list the questions which come to mind--and I defer to you concerning the answers: I hope it is OK.
Another useful aspect of ALL ODA (novometric) models is that they can be used to classify observations that were NOT used in the original analysis.
No distributional features or assumptions challenge the validity of the estimated p value for the ODA paradigm (i.e., when the exact p cannot be determined due to computational load).
At the moment, this is all I can think of. Can you think of any others, Rainer?
If you don't wish to answer my questions, no worries bro. They are here for didactic reasons, if not also for publication reasons.
However, you could be solo or first author (as you wish) on a paper on THIS analysis published in ODA: a fairly short article comparing t-test, logistic, Bayesian, and ODA. It would take a few hours to write up, co-editing; I'd help you sculpt the piece, and when you were done it would take a few more hours to review/edit and get your approval, and then to format, final-review, and upload. If you wish, list the ODA article as a technical report on your vita until it becomes indexed--speaking of which, I am soon getting a DOI generator (finally saved up enough money).
BTW, including ODA and RG, more than 1,100 ODA articles were read this week so far (still have 5 hours to go on the counter for this week). Readers of 515 ODA articles were from 41 countries: Angola, Argentina, Australia, Bangladesh, Belgium, Brazil, Canada, Chile, China, Colombia, Costa Rica, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, India, Indonesia, Ireland, Israel, Italy, Japan, Lebanon, Malawi, Malaysia, Mexico, Pakistan, Philippines, Portugal, Qatar, Russia, Saudi Arabia, Singapore, Slovakia, South Africa, Spain, Sri Lanka, Sweden, Switzerland, Taiwan, Thailand, Turkey, United Arab Emirates, United Kingdom, Uruguay, and the US.
Or, send your article to a DOI journal--your call. It's all good...
And there are some big improvements in progress for the journal and the website--many new features. Things should become self-sustaining soon, Lord willing...
Super Coolness...
Oops, forgot to say:
1) The value of the optimal cutoff is that it tells you where maximum difference between the groups occurs: that is, it maximizes the (weighted) Area Under the ROC Curve. If one wants to predict (say, in a validity sample) whether an observation is from group 1 (e.g., not sick--gonna be fine) or from group 2 (e.g., sick--gonna die), it behooves one to ensure that the model used to predict--group 1 or group 2--is accurate. I wouldn't want to be misclassified into either group!
Predict if a stock will go up or down. Predict if an airplane will crash or not. Predict if a kid will pass or flunk... When accuracy is important, and errors are horrible, it is good to have a validated maximum-accuracy cutpoint! Much better than an arbitrary number, or a number decided upon by groupthink...
...Which begs the question---in what type of scientific study is it a good idea to find models which are invalid and/or inaccurate? ---AND--- Should the validity and the accuracy of a model be maximized to the extent that is possible, or are methods which do NOT maximize validity or accuracy preferred?
In my view both questions are rhetorical. In fact, it was this issue which motivated Rob and me to begin our search for a method to explicitly maximize predictive accuracy (i.e., AUC)--for every combination of study design, sample, a priori hypothesis, and data.
2) Many problems with a larger N and more attributes ("independent variables") have more than one solution (this application had only one). When there is more than one solution, ESS is not the only test statistic to consider:
Article on D: https://odajournal.com/2016/11/04/theoretical-aspects-of-the-d-statistic/
Dear Paul Yarnold ,
I am afraid I was not able to follow your answer all the way through, but I have the impression that you did not answer my focal questions:
- If I want to quantify by how much the groups differ, how is this done in ODA?
- The classification from my logistic regression is not very different from your ODA analysis, so what is the advantage of ODA?
- How do the optimal cut-off and the classification help me to quantify the group differences?
- Is ODA useful if my main goal is NOT to classify and predict, but to find functional relationships between variables (e.g. linear, non-linear functions)? How?
For the questions about Bayesian parameter analysis I absolutely recommend reading Kruschke - Doing Bayesian Data Analysis. It is very readable and presents everything you need to know about this approach. In fact, I used a script from Kruschke to calculate the t-test analogue.
But to give a quick summary (which may address some of your points): Bayesian analysis does not ask, nor answer, the same question as frequentist analyses. Frequentists answer the question of how likely the data are, given that the hypothesis (typically H0 = no effect) is true, i.e. p(D|H0). By contrast, Bayesian analysis answers the question of how likely the hypothesis is given the data, i.e. p(H0|D). For this, the posterior is (simply speaking) the product of your prior belief and the likelihood of the data you gathered. The result of this calculation (typically done with an MCMC algorithm) IS the uncertainty for the parameter. This is the "chance" in the model estimation. So, what you saw in my output IS the uncertainty about the model parameters, given the data and prior. There is also a "training" phase (the burn-in phase), which is typically not part of the result. Generalizability is gained by calculating more than one MCMC chain and comparing whether their estimates converge, despite different starting points.
Since the uncertainty is directly calculated by the model, we need no distributional assumptions for inference, since no p-values are needed or calculated (they are only needed for inference in frequentist calculations anyway). But of course you should know what the data generating process was, to be able to model a distribution. Then again, this could also be part of your prior, to model a wide variety of distributional forms, which in turn would show the best fit in the result. In my simple example, the response variable was modeled with a t-distribution (not for inference!!! only as a form) to be able to catch the outliers without sacrificing precision of mu and sigma. With a normal distribution, the only way to account for outliers would be to increase mu and sigma. With a t-distribution, this can be done with the normality parameter nu (df in other notations), which allows heavy tails. In turn, the value for nu is itself modeled with an exponential prior, to cover a large range and find the optimal value.
Therefore, this approach has the flexibility to use any model form and to give all parameters of practical interest, in my opinion. Is there an advantage to switching to ODA?
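For anyone who wants to try this without Kruschke's script, an approximate version with the brms package could look like this (a sketch only; priors are left at brms defaults rather than Kruschke's, and 'dat' holds the DV with group coded as a factor):

# Approximate BEST-style robust group comparison in brms (sketch; default priors)
library(brms)
fit <- brm(bf(DV ~ group, sigma ~ group),  # model location and scale separately per group
           family = student(),             # t likelihood; nu is estimated, giving robustness to outliers
           data = dat)
summary(fit)                               # posterior for the group difference (b_group2)
hypothesis(fit, "group2 > 0")              # e.g. posterior probability that group 2 lies above group 1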
Best, Rainer
Dear Rainer Duesing
Thank you for your reply. I address each of your comments in turn below. You still fail to respond to any of my queries.
**
RG: I am afraid I was not able to follow all through your answer, but I have the impression that you did not answer my focal questions:
PY: That is unfortunate. I answered all of your questions. I do so again herein.
**
RG: If I want to quantify by how much the groups differ, how is this done in ODA?
PY: The ESS statistic in validity analysis: 0=chance, 100=perfect model (absolute discrimination between groups based on the attribute).
**
RG: The classification of my logistic regression is not very different from your ODA analysis, so what is the advantage of ODA?
PY: "Not very different" is not the same as "the maximum possible".
1) Given this situation, it is natural to wonder what is the estimated cross-generalizability of the Bayesian finding--is such methodology available?
Please define “cross generalizability”.
A: Apparently, we have different ideas/definitions about that. And this is also part of the next few questions/answers. Bayes finds the best representation of the model in a multidimensional prior space. As validation (and prediction, see the next questions) you can, for example, inspect posterior predictive plots (as shown in the output at the top right), which are samples of distributions from the posterior. If they are similar to the data, then the model is able to mimic/predict/recreate/validate the findings and the data structure. Other possibilities to check this exist; please read Kruschke, chapter 7, or the new Gelman et al. book, Regression and Other Stories, which is also fully Bayesian.
2) Is it possible to estimate the effect strength for the Bayesian result--after eliminating the portion of the effect attributable to chance--for the training and for the validity results? How is the effect of chance operationalized in the Bayesian community?
A: What do you mean by "after eliminating the portion of the effect attributable to chance"? I would understand it as the deterministic part of a model, hence the functional model parameters. In this simple case this would be the best estimate of the mean difference or the standardized difference. If you mean something different with your terminology, please explain. Uncertainty is an essential part of data. It is exactly the distribution shown in the histograms; this is the representation of uncertainty.
3) If one wishes to predict for a new sample of observations (with unknown group status) which are from group 1 and which are from group 2, how is this done using the Bayesian findings (after removing the effect of chance)?
A: The main goal of the analysis was (as seen in my original question / hypothesis which I wanted to test) to assess the difference between the groups, not to make predictions, but you can of course turn it around and use a predictive Bayesian framework. But again, that was not the question or the goal. Prediction in the sense of "how well does the model recreate the data" is regularly done, see my point 1, posterior prediction. For direct prediction of group membership, all possible methods (e.g. discriminant analysis, logistic regression, cross tables) can be turned fully Bayesian if you like, but I personally haven't done it yet; no need for it yet.
4) It would also be interesting to compare the (training and validity) predictive values between ODA and Bayesian methods--can this be done with Bayesian methods? Is the same true for the Bayesian training and validity results?
A: Again, what do you mean by “training analysis”? How is it done in ODA? There is “training” in Bayes in the MCMC algorithm, as I wrote. Since THIS Bayesian analysis is not about prediction of group membership, this has not been done, if you are asking specifically for a cross table with classification. But it could be done; see 3).
5) If I want to quantify by how much the groups differ, how is this done in ODA?
PY: The ESS statistic in validity analysis: 0=chance, 100=perfect model (absolute discrimination between groups based on the attribute).
A: This clearly does not answer the question. If I ask “by how much do the groups differ”, an answer like “the ESS was 68 out of 100” does not answer it. If I want to quantify by how much more Group A gains than Group B, I want to know something on the scale of the original variables, or at least some form of standardized value which is directly related to the data, not to the properties of a cross table.
6) RG: How do the optimal cut off and the classification help me to quantify the group differences?
PY: In this example the only group difference is that the distribution of scores for one group lies below the distribution of scores for the other group.
A: That is not true--or do you mean that an estimate of the central tendency of one group lies below that of the other? Wouldn't ODA always find an optimal cut-off point? So what is the point? Is the only answer ODA gives that one group's central tendency lies below the other's and that classification worked? That is not very rich in information, wouldn't you agree, compared to the Bayesian approach, with lots of useful and easy-to-interpret parameters? And remember, the parameters given are estimates directly of the population parameters!! These are not the sample values, nor are the histograms confidence intervals!
7) Only ODA finds all statistically valid models in the sample, so model misspecification is impossible. RAINER--HOW DO YOU *GUARANTEE* THAT A LOGISTIC REGRESSION (OR ANY LINEAR) MODEL IS NOT MISSPECIFIED?
A: What do you mean by “all statistically valid models”? There is an infinite number of possible models, especially in more complex designs. So how do you test an infinite number, and does such an atheoretical approach not capitalize on chance, without any educated guesses about the general structure? Would ODA be able to find the original functional form of some variables if I gave you simulated data where I know a priori what the population parameters are? And how does ODA guarantee(!!) it?
8) In this example the logistic model was NOT statistically significant; the ODA model was. RAINER--DO YOU PREFER TO PUBLISH MODELS WITH P>0.05? OR DO YOU PREFER IT WHEN YOUR MODEL HAS EXPERIMENTWISE P<0.05?
Dear Lecturer Rainer Duesing
Sadly, you missed the train--I was hopeful you might be my fellow traveler, speeding straight into the future.
You (and others) refused to participate in an interactive conversation.
You feigned (I'm giving you the benefit of the doubt) methodological ignorance.
You (and others) demonstrated no authentic interest in modern methods.
You (and others) refused to exert any effort to educate yourself or to prove my assertions wrong--the opposite of both of which are essential characteristics of top-shelf researchers. The poignance of this fact is amplified by the reality that in every head-to-head comparison the transparent, easy-to-understand new paradigm defeats your obsolete methods, using your own data!
It is for these reasons that I appreciate your and others' (non)replies, because astute readers who possess an open mind will learn a lot about both points of view--pro-future vs. pro-past.
If/when an article on the subject of this thread (which you rejected) is published, I will make a post in this thread for truly interested scientists to take a look.
For anyone who missed the train before, God willing, more opportunity will come your way in the future. If you wish to hasten your arrival into the future, I suggest exerting a little effort and doing what authentic math-types do--that is, learn the new method. It is simple, elegant, powerful, enlightening, and fun.
With optimal wishes for all,
Professor Yarnold
I would say all of these statements are false (in my case):
1) You (and others) refused to participate in an interactive conversation.
2) You feigned (I'm giving you the benefit of the doubt) methodological ignorance.
3) You (and others) demonstrated no authentic interest in modern methods.
4) The poignance of this fact is amplified by the reality that in every head-to-head comparison the new paradigm defeats your obsolete methods, using your own data!
ad 1) This is strange, since I participated very vigorously, in my opinion; I showed my results and asked questions where I felt you did not answer them. If you have the impression that you answered them but I did not understand, I would ask you now to state clearly how it is done, and not to refer to your books (which I do not own; and even if I ordered them now, they would not arrive in time for me to participate in this discussion). You explicitly requested answers to your questions in your last post, and I gave answers to every single one of them to the best of my knowledge; but since some concepts are not well defined between the two of us, there seems to be some work ahead. I am still interested in new methods, nevertheless.
ad 2) Why ignorant? I am willing to understand, but there is little to work with, just some results. Please show us HOW and WHY. Some formulas would help, for example.
ad 3) How do you know that my interest is not authentic? I would say this is an allegation. What is true: I am not convinced of ODA yet, since the best evidence you have provided so far was a p-value smaller than that of a t-test/logistic regression, and a cross table which was not very different. Nothing more. More evidence than that is needed to convince me. Throwing around buzzwords like "optimal" or "novometric" does not help understanding. Please provide sound arguments for your method.
ad 4) How exactly did your ODA analysis defeat the Bayesian results? Is this a race? How did you quantify that ODA "was better"? What was your criterion? What was your dependent variable to assess it and come to this conclusion?
I think it is not really "optimal" to quit a scientific discussion because you did not convince someone on the first attempt. I am sceptical and don't jump on bandwagons (or trains) too quickly. So, as a scholar, you should be able to defend your position, even if you did not succeed immediately.
Oh, look out the train window--isn't the future clear and clean? Oh boy!
Time is precious, and you require a sufficient level of self-education before we can discuss what I tried several times in this thread to communicate.
Good luck!
Is this the "runaway train" you are sitting in? Since you answered again, I would assume it is not over yet.
Was your last argument an argument from authority, Mr. 'full professor'? Really? ;-)
OK, I quit; you are apparently not interested in true education and teaching people, but in selling your products. Good luck with that, mate.
Dear Rainer Duesing
I was very interested in the example you brought up, and I really appreciate someone who takes the time to delve into Bayesian methods.
I was surprised to see how the two approaches gave such different results.
By no means is this a criticism of you or what you presented: I just think that there are a few things that needed clarification, and I see too often an excessively hard opposition between Bayesianism and frequentism (in the Neyman-Pearson sense, which is how you use it), while the two schools had completely different purposes to start with.
[...] (see the article "Mindless Statistics"), but more so a "ritual" that became popular in the 1950s.
[...] (see the article "Bayesian inference and the parametric bootstrap") you would obtain roughly the same results: the probability of direction would still be 100%, as in your case, and the 95% HDI would be {-3.8, -2.3}, which is similar to what you find. With that sample size, the priors obviously have a strong impact on the results.
Dear Stefano Nembrini,
Thank you for your contribution, and let me start with your 8th point: I would say that I am both Bayesian and frequentist. 😉 My personal opinion is that frequentist statistics have their value (in the vein of Fisher or Neyman and Pearson, where I would rather go with Fisher, since truly repeated samples are very hard to find in psychology/the social sciences, so that N&P is not really an option where we have no measures in the long run...), but the application is often very flawed (and that is not the fault of the methods themselves!), resulting in mindless NHST and declarations of "significance". My whole example was not meant to demonstrate and contrast Bayes vs. frequentists, but just to show different methods and that they sometimes provide more or less information. I kept some things rather short for the sake of discussion, not because I would handle real data that way.
1) I would agree that this is close to "significant", but typically it would not be considered as such in most publications. I totally agree that a p-value does not tell the whole story, and personally I would not rely on it (there is no real difference between p=.08 and .04). I would rather look at effect sizes and confidence intervals for effect sizes to get an impression of the effect.
2) You are right about the t-test, BUT it is still one of the most used tests around! So I think it is fair, in this case, to compare BEST (you were right) to THE standard approach. I did not say that you cannot do other things.
3) I would totally agree again. After screening the data I would have chosen other robust methods than Welch's t-test (but see 2) for why I didn't; and Welch's is already a more "robust" test, which you won't find that often in publications). I am a huge fan of Rand R. Wilcox and his books on robust estimation techniques (maybe you know his work), so different methods would be more appropriate, but this would depend on the data and the research question. Maybe a completely different approach is sometimes useful, e.g. generalized linear models, when you know that you have counts or bounded data, where you a priori expect different distributions.
4) Explaining the priors would again have been beyond the scope of the discussion and was not the point. But in that example I stuck with the defaults of BEST (typically, I would use the Kruschke scripts for more flexibility; at the moment I am trying to work through the R bookdown by Solomon Kurz, who "translated" the complete Kruschke book into the brms package! https://bookdown.org/content/3686/ ). In the example I used a t-distribution for the noise parameter, an exponential distribution for nu of the t-distribution, a normal distribution for mu, and (I am not sure which) a gamma or uniform prior for sigma. The "subjectivity" argument is old and I do not support this view (especially with non-committal priors), since it is open to discussion, and different views/priors may be compared if necessary.
5) I am absolutely aware that "significance" does not mean "practical relevance"! I succinctly wrote it as "frequentist talk", since I have the impression that most of the time results are reduced to p-values, or even worse, dichotomous decisions, and 95% of all papers are frequentist. So maybe I exaggerated here. For the decision-making process: I fancy Kruschke's method of parameter estimation, which does not test against a strict zero value but declares a region of practical equivalence (ROPE) and tests how compatible the posterior distribution is with this ROPE. So you can have "significant" results which are "relevant" at the same time, since a result is only declared significant if the posterior lies outside the ROPE (which you have already declared to be of no relevance). But you can also have evidence for the null (posterior inside the ROPE) or inconclusive results. That is very appealing to me. I would rather have a well-gathered sample and a parameter estimate than a p-value. As you said, it is about the parameters (and precision); this is what we want to interpret in the end. Therefore, I do not like Bayes factors so much, since I have the impression that they can be used as reductionistically as the p-value (although they have other advantages, of course!)
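A rough sketch of such a ROPE check on posterior draws, continuing the hypothetical PyMC example from earlier in the thread (the ROPE limits below are arbitrary placeholders; they should be chosen on substantive grounds before seeing the results):

```python
import numpy as np

# Posterior draws of the group difference ("diff_of_means" in the sketch above).
diff = idata.posterior["diff_of_means"].values.ravel()
rope = (-0.1, 0.1)   # hypothetical region of practical equivalence

p_in_rope = np.mean((diff > rope[0]) & (diff < rope[1]))
interval = np.quantile(diff, [0.025, 0.975])   # central 95% interval (not an HDI)

# Kruschke-style decision: interval entirely outside the ROPE -> relevant
# difference; entirely inside -> practical equivalence; otherwise undecided.
print(f"P(diff in ROPE) = {p_in_rope:.3f}, 95% interval = {interval}")
```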
6) Can you elaborate on that point? I do not know what you are referring to.
7) See 5); I agree. But we do not necessarily stick to a "null" in the sense of a fixed value. Even in frequentist approaches, we could use equivalence tests to assess a null area instead of a point estimate.
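One frequentist version of this is the two one-sided tests (TOST) procedure; a sketch in Python, assuming statsmodels' ttost_ind and invented data:

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 1.0, 50)   # hypothetical group data
x2 = rng.normal(0.1, 1.0, 50)

# Two one-sided tests against the equivalence bounds (-0.3, 0.3): a small
# overall p-value supports "the true mean difference lies inside the bounds",
# i.e. practical equivalence rather than a point null.
p_overall, test_low, test_upp = ttost_ind(x1, x2, -0.3, 0.3, usevar="unequal")
print(f"TOST p-value: {p_overall:.3f}")
```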
I hope it became clear that I am not a Bayesian hardliner, but always searching for appropriate methods to understand the data and answer my research questions, and modern Bayesian approaches seem to do this quite well in my opinion. My example was not exhaustive and was not intended to be. 😉
Best, Rainer
When you refer to heteroscedasticity and skewness, do you mean the residuals?
If so, you just need to use inference methods that are robust.
If you mean the variables that you include in the model, remember that you are modeling the expected value, which might not be the most useful quantity if your variables are skewed.
Re-express your data. If the skewness is to the high end, try log(y) first, but be open to other functions. Chances are, the right re-expression will improve both skewness and heteroskedasticity.
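A minimal sketch of this re-expression idea in Python, with data simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical positively skewed data with multiplicative errors.
rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 200)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.4, 200))

# OLS on log(y): if the errors are multiplicative on the original scale,
# this often tames skewness and heteroscedasticity at the same time.
fit = sm.OLS(np.log(y), sm.add_constant(x)).fit()
print(fit.params)   # coefficients are interpreted on the log scale
```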
On October 28 I said, "I worked with a great deal of highly skewed data. Because there are large differences between sizes for members of the population for any given item, you can expect to see substantial heteroscedasticity." Then I gave a tool for doing WLS regression. Rather than hit-or-miss with a transformation, and then having to interpret the new variables, the tool I gave takes you straight from OLS to WLS with no transformations. You just keep the heteroscedasticity in the error structure, where it belongs, and it is also used in the regression coefficients.
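A rough sketch of that OLS-to-WLS step in Python, under the error structure y = y* + (e_0)(z^gamma) described earlier; the gamma value below is a placeholder, not an estimate from the Excel tool:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data whose error scale grows with a size measure z;
# here z is the predictor itself, as is common for establishment data.
rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
z = x
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, 200) * z**0.5

# gamma would be estimated from the OLS residuals (e.g. with the linked
# Excel tool); 0.5 is used here only for illustration.
gamma = 0.5
weights = 1.0 / z**(2 * gamma)   # regression weight w_i = 1 / z_i^(2*gamma)

wls_fit = sm.WLS(y, sm.add_constant(x), weights=weights).fit()
print(wls_fit.params)
```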
I then heard from the person asking the question, and he actually seemed to have other issues. Then other things were discussed. Now this seems back to the original question, as asked.
@Mart, heteroscedasticity has serious consequences for the OLS estimator, which is then no longer BLUE (the Best Linear Unbiased Estimator). It is important not only to suspect it but to test for it and visualize it on residual plots, and hence to identify its sources, perhaps outliers or other aspects of the generating process. Usually, as rightly offered by James R Knaub, the remedy is informed weighted regression; the idea is to give small weights to observations associated with higher variances, to shrink their squared residuals. When you identify and use the correct weights, heteroscedasticity is replaced by homoscedasticity. Good luck. @ibiloye
Because skewness often makes the natural heteroscedasticity in regression (due to differences in the sizes of predictions) more obvious, I expect observed heteroscedasticity and skewed data to go hand in hand. With multiple regression, however, it is often actually less obvious, which I think is a consequence of it being more difficult to find a model (the predicted values) which 'fully' describes the behavior of the y-variable. If more variables are actually needed, the relationship is often a bit murky. If more variables are used than are actually needed, that is problematic as well. And knowing and/or collecting exactly the ones needed may also be difficult or impossible.
Please see the following:
"When Would Heteroscedasticity in Regression Occur?"
Preprint, June 2021, J. Knaub,
https://www.researchgate.net/publication/352134279_When_Would_Heteroscedasticity_in_Regression_Occur
I believe it best to leave heteroscedasticity in the error structure and model it, not to transform the data to try to 'remove' it. There is often nothing wrong with having heteroscedasticity in regression, and it is often a sign of a flaw or flaws if you do not have it. The above article discusses this. A slightly improved version appears in the October 2021 issue of the Pakistan Journal of Statistics.
As I noted above on September 2, 2021, skewness highlights differences in the size of the predicted y_i, and may make the natural heteroscedasticity more pronounced. Thus a regression weight is needed. This is very natural. See my featured items at https://www.researchgate.net/profile/James-Knaub.
Seeing both skewness and heteroskedasticity cries out for re-expressing y. It should fix both and leave a simple OLS model.
A transformation may still leave some heteroscedasticity, and unnecessarily complicates interpretation. See my responses above.
An article on heteroscedasticity in regression, in case anyone has an interest:
Knaub, J.R., Jr. (2021), "When Would Heteroscedasticity in Regression Occur?" Pak. J. Statist., Vol. 37(4), 315-367, https://www.pakjs.com/,
https://www.researchgate.net/publication/354854317_WHEN_WOULD_HETEROSCEDASTICITY_IN_REGRESSION_OCCUR
Thank you - Jim Knaub