It serves much the same purpose: making all features comparable. If the clustering algorithm uses the Euclidean distance (e.g., k-means), you implicitly make a sort of isotropic assumption, and the results can be very bad if one axis is heavily skewed compared to the others.
To my mind, the only order that makes sense is log then standardization: the desired effect is to "unskew" the axis-wise distributions, and that effect is maximized when you apply the log over the full dynamic range, rather than to variables that already have unit standard deviation.
But before doing that, you should really plot the distribution of each axis and check whether it is badly skewed. If it isn't, you probably don't need the log, and simple centering + standardization should be fine.
You can also try power normalization (sign(x) * |x|^a, with a typically between 0.1 and 0.5), which also works well for making features comparable.
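To make the two options concrete, here is a minimal sketch in Python, assuming strictly positive features, NumPy/SciPy, and scikit-learn's StandardScaler; the synthetic data and the exponent a = 0.3 are purely illustrative:

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Strictly positive, heavily skewed synthetic features
X = rng.lognormal(mean=0.0, sigma=2.0, size=(500, 3))

# Option 1: log on the full dynamic range first, then standardize
X_log_std = StandardScaler().fit_transform(np.log(X))

# Option 2: power normalization sign(x) * |x|^a, a typically in [0.1, 0.5]
a = 0.3
X_pow_std = StandardScaler().fit_transform(np.sign(X) * np.abs(X) ** a)

# Per-axis skewness before and after (closer to 0 means less skewed)
print(skew(X), skew(X_log_std), skew(X_pow_std))
```

Either way, plotting the per-axis distributions before and after the transform is the quickest sanity check.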
For biological/biochemical data that is strictly positive, I generally recommend a log transform, even when the data at hand does not show severe skewness.
@David What if the variable has negative values? Wouldn't it be more appropriate to first scale into positive values (between 0 and 1, for example) and then apply the log transformation? I am assuming the proposed log transformation is log(a + x).
It's perfectly possible to deal with negative values by using another transformation to unskew the data, e.g. a cube root transformation. The other approach is to add a constant to your data to make all the negative values positive, e.g. +1000. However, you have to do this on all the features, otherwise they are no longer comparable. Then standardisation to a z-score is performed. When dealing with negative values I typically use a different transformation method, since I can't guarantee future data won't be negative.
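A small sketch of that approach, assuming NumPy and scikit-learn; note that np.cbrt is defined for negative inputs, so no shift is needed for it, and the +1000 offset is just the illustrative constant from above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Right-skewed synthetic data that also contains negative values
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 2)) - 2.0

# Cube root: handles negatives directly, preserves sign, reduces skew
X_cbrt = np.cbrt(X)

# Alternative: shift all features by the same constant so every value is
# positive, then log; the offset must exceed the most negative value
X_shift_log = np.log(X + 1000)

# Then standardise to z-scores
X_ready = StandardScaler().fit_transform(X_cbrt)
```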
Hi, I would like to confirm from the above discussion: does the log transformation have a similar effect to StandardScaler in preparing unskewed data for an SVM? Will it convert the data to something like a normal distribution?