What does the multifactor dimensionality reduction whole dataset statistics show?

I'm not sure I follow what you mean by "whole dataset statistics". All dimensionality reduction methods are "whole dataset statistics" in that they take the whole dataset and project it onto a lower-dimensional space. The example I give to beginning students is JPEG image files. A modern digital camera can take pictures at a resolution so high that the raw file is massive compared to .jpg, .gif, and other common image files. In order to post such pictures online, send them in a text message, etc., the size of the entire picture file (the "whole dataset") must be reduced. Simplistically, this is done with dimensionality reduction methods that find "parts" (data points) of the original picture file and project them onto the new lower-dimensional space as a single data point. The point is that the statistical methods used here act on the entire image file (all the data points).

However, you might mean something very different by "whole dataset" and/or "whole dataset statistics", but I am not sure what. This is particularly true because ROC analysis (e.g. area under the ROC curve), all forms of regression analysis (linear, logistic, etc.), multidimensional scaling (and most scaling techniques), and more are also "whole dataset statistics" in the only sense that I can think you mean by this term. Could you elaborate? Would you consider PCA or FA "whole dataset statistics"?
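
To make the "project the whole dataset onto a lower-dimensional space" idea concrete, here is a minimal sketch of PCA on 2-D points, reduced to one dimension. It is only an illustration of the general idea, not a description of how JPEG or any particular package works; the function name `project_to_first_component` and the sample points are made up for this example.

```python
import math

def project_to_first_component(points):
    """Project 2-D points onto their first principal component (1-D scores)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix, computed from every point.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # One number per original point: its coordinate along that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores

# Hypothetical data lying exactly on the line y = 2x.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, scores = project_to_first_component(points)
```

Note that the means, the covariance entries and therefore the projection direction all depend on every point in the input, which is the sense of "whole dataset" used above.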

Peter V. Zolotukhin

Dear Andrew, unfortunately I'm not a newbie in statistics, which would have been the easiest and best solution for the problem. I have worked, and still work, with most of the criteria used in molecular biology, biochemistry, epidemiology and diagnostics, including dimensionality reduction methods (incl. PCA).

MDR is a statistical method from human/molecular/population genetics, offered by the MDR software. It's a highly specialized tool that has its own terminology. So, if interested, please refer to the paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268866/. I also attach an example of the MDR output presented in the Supplementary materials of this paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500181/ - where you can see a "whole dataset statistics" section.
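
For readers unfamiliar with MDR, the core pooling step can be sketched as follows: each multilocus genotype combination is labelled high-risk or low-risk from its case:control ratio, which collapses several loci into a single binary attribute. This is a minimal illustration, not the MDR software's implementation. The function names, the toy data, the default threshold (overall case:control ratio) and the tie handling are all assumptions made here. Scoring the fitted model on all samples, rather than on cross-validation folds, is one plausible reading of "whole dataset statistics", but that reading is also an assumption and is not confirmed anywhere in this thread.

```python
from collections import Counter

def mdr_fit(genotypes, status, threshold=None):
    """Label each genotype combination high-risk (1) or low-risk (0)."""
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    if threshold is None:
        # Assumed default: the overall case:control ratio in the data.
        threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for combo in set(cases) | set(controls):
        n_case, n_ctrl = cases[combo], controls[combo]
        # A cell with cases but no controls is treated as high-risk.
        ratio = n_case / n_ctrl if n_ctrl else float("inf")
        # Assumption: ties (ratio == threshold) are labelled low-risk.
        labels[combo] = 1 if ratio > threshold else 0
    return labels

def balanced_accuracy(labels, genotypes, status):
    """Mean of sensitivity and specificity over the samples passed in."""
    tp = fn = tn = fp = 0
    for g, s in zip(genotypes, status):
        predicted = labels.get(g, 0)  # unseen combinations default to low-risk
        if s == 1 and predicted == 1:
            tp += 1
        elif s == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Hypothetical two-locus data: genotype pairs and case (1) / control (0) status.
genotypes = [(0, 0), (0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
status = [0, 0, 1, 1, 1, 1, 1, 0, 0]

labels = mdr_fit(genotypes, status)
whole_dataset_ba = balanced_accuracy(labels, genotypes, status)
```

Here the model is fitted and evaluated on the same nine samples, so the resulting balanced accuracy is a training-type figure and will generally be more optimistic than a cross-validated one.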

Andrew Messing

Dear Peter V. Zolotukhin:

I didn't mean to imply that you were a newbie at all, and I apologize for apparently doing so. I assumed the complete opposite, as I would for anybody who asks about statistics in terms of training, especially when the question is not about the standard statistics that have been around since before computers (many of which were known to be inferior to alternative methods even then, but the alternatives were too computationally demanding). I simply thought that there was a breakdown in communication due to different uses of terms or just different terms.

It is not entirely true, though, either that MDR uses terms specific to genetics/bioinformatics/etc. or that it is offered only through the MDR software. The latter I know because I've loaded the R MDR package (for your convenience, I have attached the paper "An R package implementation of multifactor dimensionality reduction" from the open access journal BioData Mining). The former I know because, even though I've rarely seen MDR used outside of genetics, bioinformatics, etc., I have frequently seen it in Springer's Lecture Notes in Computer Science volumes such as Artificial Intelligence: Theories and Applications (Lecture Notes in Computer Science Vol. 7297) and Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (Lecture Notes in Computer Science Vol. 7833), not to mention a few IEEE articles (some of which had nothing to do with the life sciences, e.g. "The Multifactor Extension of Grassmann Manifolds for Face Recognition"). Also, regardless of the fields in which I've read about MDR, I've frequently seen it compared to PCA and other dimensionality reduction methods (not to mention artificial neural networks).

Finally, I've read a fair number of papers on MDR for someone who hasn't yet found a good use for it, but I'm not sure "whole dataset statistics" is a technical term reserved just for MDR, given that I've seen it in everything from a geophysics PowerPoint presentation (see link) to a computer science research project (see 2nd link) to a paper on signal processing in geoscience (see attached paper). It just seems like the phrase isn't widely used, but where it is found it means exactly what it sounds like. After all, as you yourself state, you're no beginner to statistics, yet you are unsure of what the term means.

http://earth.esa.int/workshops/fringe07/participants/712/pres_712_perissin.pdf#page=5

http://www.cs.technion.ac.il/~cs234313/projects_sites/S13/44/site/site/scartechnion/viewer.html
