How do we ensure data is normally distributed before performing PCA and HCA using SPSS or R?

André I Wierdsma Popular answer

If your dependent variable is not normally distributed, you have four options: (1) forget about it, since the Central Limit Theorem will help you out; (2) transform the data or exclude outliers; (3) go non-parametric; or (4) use another model (gamma, Poisson, negative binomial, etc.). It is less of a problem if one of your continuous predictors is not normally distributed: regression models only assume that the predictor has a linear relation with the dependent variable. So don't kick such predictors out, but look at the relationships graphically.
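As a rough illustration of option (4), here is a minimal sketch in R that fits a Poisson model to a count outcome with a skewed predictor. The data are simulated and the names `x`, `y` and `fit` are made up for the example; they do not come from the question.

```r
# Simulated example only: a count outcome that is clearly not normal,
# and a right-skewed continuous predictor.
set.seed(11)
n <- 500
x <- rexp(n)                                 # skewed predictor
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))   # count outcome

# Poisson regression with a log link instead of ordinary least squares
fit <- glm(y ~ x, family = poisson(link = "log"))

# Estimated slope on the log scale (the simulation used 0.8)
coef_x <- unname(coef(fit)["x"])
summary(fit)
```

Neither `x` nor `y` needs to be normally distributed for this model; what matters is that the chosen family and link are plausible for the outcome.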

Richard James Telford

Standardising data is not the same as normalising it (although some people use the terms interchangeably).

Depending on the options you use, PCA will automatically standardise the variables for you.

For example

princomp(MyData, cor=TRUE)

will use the correlation matrix - equivalent to standardising the data to mean zero and standard deviation 1.
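A small sketch that checks this equivalence on simulated data (the data frame below is invented for the illustration). It compares `princomp` on the correlation matrix with `princomp` on variables standardised by `scale()`.

```r
# Simulated variables on very different scales (illustrative only)
set.seed(1)
a <- rnorm(100, mean = 50, sd = 10)
MyData <- data.frame(
  a = a,
  b = 0.01 * a + rnorm(100, mean = 0.5, sd = 0.1),
  c = rnorm(100, mean = 1000, sd = 300)
)

# PCA on the correlation matrix
pca_cor <- princomp(MyData, cor = TRUE)

# PCA on the covariance matrix of explicitly standardised variables
pca_std <- princomp(scale(MyData))

# Proportion of variance explained by each component
prop_cor <- pca_cor$sdev^2 / sum(pca_cor$sdev^2)
prop_std <- pca_std$sdev^2 / sum(pca_std$sdev^2)

# Loadings should agree up to the sign of each component
load_diff <- max(abs(abs(unclass(pca_cor$loadings)) -
                     abs(unclass(pca_std$loadings))))
```

The raw `sdev` values differ by a constant factor, because `scale()` divides by n - 1 while `princomp` uses n, but the proportions of variance and the loadings are the same.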

If the data are not normally distributed, it may be necessary to transform them. Plot a histogram (or, better, a qqnorm plot) of each variable and check whether there is marked skew or kurtosis. Log or square-root transforms are common, but there is a whole range of transformations that can be used.
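A minimal sketch of that check for one right-skewed variable, using simulated data. The `skewness` helper is a hypothetical convenience function written for this example, not something from base R.

```r
set.seed(42)
x <- rlnorm(200, meanlog = 0, sdlog = 1)   # simulated right-skewed variable

# Simple moment-based sample skewness (hypothetical helper)
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw  <- skewness(x)
skew_sqrt <- skewness(sqrt(x))
skew_log  <- skewness(log(x))   # log requires strictly positive values

# Visual check: points close to the line suggest approximate normality
op <- par(mfrow = c(1, 2))
qqnorm(x, main = "Raw"); qqline(x)
qqnorm(log(x), main = "Log-transformed"); qqline(log(x))
par(op)
```

For this simulated variable the square root reduces the skew and the log removes most of it; on real data the better transform has to be judged variable by variable.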

Philip M Nyenje

Hello Richard, thank you very much for your answer. I have realised that most of my variables are skewed to the right. Some of them need a log transformation and some need a square-root transformation. Can I use different transformations within the same dataset, or do we have to use the same transformation throughout and then ignore those variables that cannot be transformed?

I will try to read more about princomp.

Adrian Otoiu

It is usually a bad idea to transform data before doing any estimation or modelling, as the data lose their properties and the modelling is done on something that may not reflect the underlying phenomena. This comes from my professor James MacKinnon, and his advice is usually very good. Are you sure that data have to be normalized for PCA/HCA? I have seen several analyses where this was not done. Maybe you need to consult some classic textbooks on this.

Dear Vladimir: Thank you for this answer. Yes, I have read several papers and they suggest that data have to be normalised. I have seen that the two criteria for testing normality are also included in SPSS: Kolmogorov-Smirnov for n>50 and the other for n

In real life this doesn't really happen. I have not seen many studies where the data were first normalized, at least in regression analysis.

Vladimir Bakhrushin

First of all, you have to check the normality of the data. This can be done using criteria such as omega-squared or Kolmogorov-Smirnov. You must also verify the homogeneity of the existing samples. If they are heterogeneous, the normalization has no meaning.
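A hedged sketch of the Kolmogorov-Smirnov check in R with `ks.test`, on two simulated variables (the omega-squared criterion is not shown). One assumption to flag: the mean and standard deviation are estimated from the same sample being tested, which makes the reported p-values conservative, so they should be read as a rough guide only.

```r
set.seed(7)
normal_x <- rnorm(200, mean = 10, sd = 2)   # simulated, roughly normal
skewed_x <- rexp(200, rate = 1)             # simulated, strongly right-skewed

# One-sample KS test against a normal distribution whose parameters are
# estimated from the data (this makes the p-value only approximate)
ks_normal <- ks.test(normal_x, "pnorm", mean(normal_x), sd(normal_x))
ks_skewed <- ks.test(skewed_x, "pnorm", mean(skewed_x), sd(skewed_x))

ks_normal$p.value   # large p-value: no evidence against normality
ks_skewed$p.value   # small p-value: normality rejected
```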

André I Wierdsma

Formal normality tests and graphical methods will be of limited use (see the link).

You could enter a non-parametric correlation matrix in your factor analysis.

http://www.statisticalmisses.nl/index.php/frequently-asked-questions/77-what-is-wrong-with-tests-of-normality

Here is SPSS syntax for scale-free nonparametric factor analysis.
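The same idea can be sketched in R. This is not the SPSS syntax referred to above, only an illustration under stated assumptions: the data are simulated, the variable names are invented, and a Spearman rank correlation matrix is decomposed with `eigen` as a simple stand-in for the factor-analysis step.

```r
set.seed(3)
z <- rnorm(150)
dat <- data.frame(
  v1 = exp(z + rnorm(150, sd = 0.3)),       # skewed, related to v2
  v2 = exp(2 * z + rnorm(150, sd = 0.3)),   # skewed, related to v1
  v3 = rexp(150)                            # skewed, unrelated
)

# Rank-based (Spearman) correlation matrix: no normality assumption
rho <- cor(dat, method = "spearman")

# Principal components of the rank correlation matrix
eig <- eigen(rho)
prop_var <- eig$values / sum(eig$values)    # variance share per component
loadings <- eig$vectors
rownames(loadings) <- colnames(dat)

# Monotone transforms such as log do not change the ranks,
# so they leave the Spearman matrix unchanged
rho_log <- cor(log(dat), method = "spearman")
```

Because `rho` and `rho_log` are identical, the choice between log, square-root or no transformation does not affect this analysis, which is what makes it scale-free.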

Andre: This is a very interesting contribution. From the link you sent, the non-parametric methods do not require normality tests. I will try to read more about the nonparametric analysis syntax you sent.

Generally, I think that data are rarely normally distributed. Some can be transformed and some can't. However, if one kicks out a variable (or several) on the basis that it violates the requirements for normality, there is a possibility of missing important processes that could be explained by that variable. This could be an important limitation of multivariate statistics in understanding hydrological processes.