How do I determine the most suitable probability distribution for a dataset when I want to calculate the mean, standard deviation, variance, etc.?
I hope I understand your issue correctly. If so, I would suggest looking into the literature to see how similar constructs 'behaved' across various research examples; that will give you some idea of what to expect from your data in terms of means, SDs, and variances.
By using different goodness-of-fit measures. These may be the Chi-square test, the Kolmogorov-Smirnov (K-S) test, the Anderson-Darling test, relative mean square error, the mean absolute deviation index, etc.
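As a concrete illustration (in Python with scipy, on synthetic normal data), the K-S and Anderson-Darling tests can be run like this; note that estimating the parameters from the same sample makes the plain K-S test conservative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=5.0, scale=2.0, size=200)  # synthetic data for illustration

# Kolmogorov-Smirnov test against a normal with parameters estimated from the data.
# (The Lilliefors correction handles the estimated-parameter bias; plain kstest
# is shown here for simplicity.)
mu, sigma = sample.mean(), sample.std(ddof=1)
ks_stat, ks_p = stats.kstest(sample, 'norm', args=(mu, sigma))

# Anderson-Darling test for normality (does its own parameter estimation).
ad_result = stats.anderson(sample, dist='norm')

print(f"KS statistic={ks_stat:.3f}, p-value={ks_p:.3f}")
print(f"AD statistic={ad_result.statistic:.3f}")
```

A large p-value (or a statistic below the tabulated critical values, for Anderson-Darling) means the candidate distribution is not rejected.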
First you have to make sure that your data are random, i.e., that they possess the property of statistical stability. If the property of statistical stability is not violated, use the methods of probability theory without hesitation; in the opposite case, use the methods of the theory of hyper-random phenomena.
To calculate the mean, variance, and perhaps higher moments directly from data, you don't need to assume any particular probability distribution; just use the definitions of those quantities. The question of what probability distribution best represents your collected data is another story. There you need a higher level of statistics, namely the hypothesis-testing apparatus.
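For example, the first moments follow directly from their definitions, with no distributional assumption at all (a minimal Python sketch on made-up numbers):

```python
import numpy as np

data = np.array([2.1, 3.4, 1.8, 2.9, 3.1, 2.5, 2.2, 3.0])  # any sample

n = data.size
mean = data.sum() / n                         # first moment
var = ((data - mean) ** 2).sum() / (n - 1)    # unbiased sample variance
sd = var ** 0.5                               # standard deviation
skew = ((data - mean) ** 3).mean() / sd ** 3  # third standardized moment (biased form)

print(mean, var, sd, skew)
```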
The easiest thing to do is to use a software package that includes several distributions, so you can try many of them and get an idea of which ones relate better to your data. SPSS, STATISTICA, MATLAB and R are good choices. Any of them will let you perform a fitting procedure without deep knowledge of statistics.
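The same try-many-distributions screening can be scripted directly; a sketch in Python with scipy (the candidate list and the synthetic gamma data are my choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=300)  # stand-in for your sample

candidates = ['gamma', 'lognorm', 'weibull_min', 'expon', 'norm']
results = {}
for name in candidates:
    dist = getattr(stats, name)
    params = dist.fit(data)                        # maximum-likelihood fit
    ks_stat, ks_p = stats.kstest(data, name, args=params)
    results[name] = (ks_stat, ks_p)

# Rank by KS statistic (smaller = closer fit). Treat this as a screening tool,
# not a formal test, since the parameters were estimated from the same data.
for name, (s, p) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:12s} KS={s:.4f} p={p:.3f}")
```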
Hello everyone. I need help fitting a marginal distribution to drought duration. I am applying the Gamma, Exponential, Lognormal, Log-logistic and Weibull distributions, but the null hypothesis is rejected every time. Drought duration is discrete data, e.g. (1, 1, 2, 1, 1, 3, 2, 1, 1, 1, 1, 4, 1, 1, 7). I also applied the Poisson distribution, but it did not fit either. Can you please tell me how to fit a distribution to these data, or how to convert discrete data into continuous data? Thanks.
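For counts that start at 1, two candidates worth trying (my suggestion, not something established in this thread) are the geometric distribution and a Poisson shifted by one; a minimal sketch in Python with scipy, using the durations quoted above:

```python
import numpy as np
from scipy import stats

durations = np.array([1, 1, 2, 1, 1, 3, 2, 1, 1, 1, 1, 4, 1, 1, 7])

# Geometric distribution on support {1, 2, ...}: the MLE is p_hat = 1 / sample mean.
p_hat = 1.0 / durations.mean()

# Alternatively, a Poisson shifted to start at 1: fit Poisson to (durations - 1).
lam_hat = (durations - 1).mean()

# Compare fits by log-likelihood (both models have one parameter,
# so the log-likelihoods are directly comparable).
ll_geom = stats.geom.logpmf(durations, p_hat).sum()
ll_pois = stats.poisson.logpmf(durations - 1, lam_hat).sum()
print(f"geometric      p={p_hat:.3f}   loglik={ll_geom:.2f}")
print(f"shifted Poisson lam={lam_hat:.3f} loglik={ll_pois:.2f}")
```

On this particular sample the geometric fit copes better with the long duration of 7 than the shifted Poisson does; a chi-square goodness-of-fit test on grouped counts would be the next step.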
Replicated data contain exactly the same information as the original set. On the other hand, the claim that 'a lie repeated one thousand times becomes truth' gained some popularity long ago, and attempts to propagate it continue.
More seriously: if your 'replicated' data are in fact new observations, then yes, their MLE estimates will certainly be better (more precise, closer to reality) after 10,000 'replications': roughly 100 times better, since precision improves with the square root of the sample size.
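The "roughly 100 times better" figure comes from the 1/sqrt(n) rate of the standard error. A quick simulation illustrates the rate (using a factor of 100 in sample size for speed, which should shrink the spread of the estimator by a factor of about 10):

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 5000  # number of simulated experiments per sample size

def se_of_mean(n):
    # Empirical standard error of the sample mean: simulate `reps` experiments
    # of size n and measure the spread of the resulting estimates.
    samples = rng.normal(0.0, 1.0, size=(reps, n))
    return samples.mean(axis=1).std()

se_10 = se_of_mean(10)
se_1000 = se_of_mean(1000)
print(se_10 / se_1000)  # close to sqrt(1000/10) = 10
```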
If your data are in the form of probabilities at several points, then you can construct your cumulative continuous distribution function as follows.
Select four key points from your data, i.e. the origin O (0, 0), the quarter point N (0.25, PN), the middle point M (0.5, PM), and the end/truncation point T (1, PT). Then compute two control parameters aM = PM/(PT-PM) and aN = PN/(PT-PN), and the exponent b = Log(aN/aM)/Log(DN/ON), where DN and ON are the values of the state functions D and O at the quarter point (t = 0.25). Use the control parameters to define the probability distribution as PF = PT*aM*D^b/(O^b + aM*D^b), in which D and O are the so-called state functions, defined in terms of the state variable t as follows:
D = 0.25*(1 + 6*t^2 - 4*t^3 - cos(pi*t))
O = 0.25*(3 - 6*t^2 + 4*t^3 + cos(pi*t))
(note that D + O = 1). Here t is the state variable (between 0 and 1), defined as
t = (x - xO)/(xT - xO),
in which x is the working coordinate, and xO and xT are the coordinates of the origin and truncation point respectively.
For more information and worked examples, refer to the literature and to ResearchGate; look for phenomenon functions, state functions, the state-based philosophy, and the Persian Curves.
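The construction above can be sketched in a few lines of code. This is my reading of the formulas in the post; the anchor probabilities (PN = 0.2, PM = 0.5, PT = 0.9) are hypothetical values purely for illustration:

```python
import math

def persian_cdf(x, xO, xT, PN, PM, PT):
    """Cumulative curve through the four anchor points described above:
    origin (xO, 0), quarter point (PN), middle point (PM), end point (xT, PT)."""
    t = (x - xO) / (xT - xO)                                  # state variable in [0, 1]
    D = 0.25 * (1 + 6*t**2 - 4*t**3 - math.cos(math.pi * t))  # state functions
    O = 0.25 * (3 - 6*t**2 + 4*t**3 + math.cos(math.pi * t))  # note D + O = 1
    aM = PM / (PT - PM)
    aN = PN / (PT - PN)
    tN = 0.25
    DN = 0.25 * (1 + 6*tN**2 - 4*tN**3 - math.cos(math.pi * tN))
    b = math.log(aN / aM) / math.log(DN / (1 - DN))           # ON = 1 - DN
    return PT * aM * D**b / (O**b + aM * D**b)

# Hypothetical anchor probabilities; the curve reproduces all four anchors:
print(persian_cdf(0.5, 0.0, 1.0, PN=0.2, PM=0.5, PT=0.9))  # middle point -> PM
```

By construction the curve passes through 0 at the origin, PN at t = 0.25, PM at t = 0.5, and PT at the truncation point, which is easy to verify numerically.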
Dear Zulfigar Ali, thank you for your comment. My proposal for fitting a distribution curve is a universal one, and it is complete. It is intended as a replacement for all the distributions in the literature. I have used and verified it for decision-making in different fields, such as all branches of engineering, economics, management, earthquake and other hazard engineering, etc. One may fit a curve by entering the formulation in a few columns of Excel. Best regards.
Wahab Adewuyi Adejumo: Commonly used goodness-of-fit (GOF) tests can be used to identify the best model, e.g. the Anderson-Darling (AD) test, the Chi-square (χ2) goodness-of-fit test, and the Kolmogorov-Smirnov (KS) test.
You can use the available R packages, or the Easy_Fit software, to find it.
I looked in the literature at several R packages for fitting probability distribution functions to given data. Different packages are proposed depending on the data. A close look at these packages revealed that, overall, the Universal Persian Curves proposed by Ranjbaran's research team are exact, simple and cheap for the job. You may look at the file two posts before this one, and you may test it. The proposed formulation is a logical alternative to all the R packages in the literature.
I would be very pleased to receive your comment, here, and/or via
I am working on extremes in R and I have to estimate the parameters of the Gamma-Pareto and Gamma-Generalized Pareto distributions using MLE, L-moments and adaptive MCMC methods. Could you help me with R code for fitting the Gamma-Pareto and Gamma-Generalized Pareto distributions, estimating the parameters by MLE, L-moments and adaptive MCMC?
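I can't offer code for the composite Gamma-Pareto models, which need a custom likelihood, but the generalized Pareto piece can be fitted by MLE in a few lines. Here is a sketch in Python with scipy (synthetic exceedances with an assumed shape and scale), useful as a cross-check for whatever R routine you end up using:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Synthetic exceedances above a threshold, drawn from a generalized Pareto
# distribution with shape c = 0.2 and scale 1.0 (assumed values for illustration).
exceedances = stats.genpareto.rvs(c=0.2, scale=1.0, size=500, random_state=rng)

# Maximum-likelihood fit; floc=0 pins the location parameter at the threshold,
# which is the usual convention for peaks-over-threshold exceedances.
c_hat, loc_hat, scale_hat = stats.genpareto.fit(exceedances, floc=0)
print(f"shape={c_hat:.3f}, scale={scale_hat:.3f}")
```

For the R side, packages such as extRemes and evd provide GPD fitting; L-moment and Bayesian/MCMC estimation are available there as alternative methods.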
Use the AgriMetSoft distribution calculators. They offer several distributions, including the Gamma.
https://agrimetsoft.com/distributions-calculator/
In addition, the new version of "Data-Tool" (an Excel add-in from AgriMetSoft) has a feature to find the best among 12 distributions, with one click in Excel.
Hello, and I hope everything is fine. I had the honor of being your student at Shiraz University almost ten years ago.
I want to ask you to send me a simple worked example of the Universal Persian Curve (for example, for 20 or 30 discrete numbers), if it is not too much trouble for you.
Statisticians don't just look at the data and calculate the average; what statistics tries to do is recover the original probability distribution from which the collected data originated.
You can use probability distributions to model and predict the outcomes of your system.
You mentioned 4 out of more than a hundred. For a given set of cumulative probability data, possibly none of them will be suitable. Every day one may observe tens of new papers on fitting a curve to probability data!
But via the so-called Persian probability function, one may fit the function to the data in an easy way. The attached file may be useful.