It serves much the same purpose: making all features comparable. If the clustering algorithm uses the Euclidean distance (e.g., k-means), you implicitly make a sort of isotropic assumption, and the results can be very bad if one axis is heavily skewed compared to the others.
To my mind, the only order that makes sense is log then standardization: the desired effect is to "unskew" the axis-wise distributions, and that effect is maximized when you apply the log over the full dynamic range, rather than to variables that already have unit standard deviation.
But before doing that, you should really plot the distribution of each axis and check whether it is badly skewed. If it isn't, you probably don't need the log, and simple centering + standardization should be fine.
You can also try power normalization (sign(x) * |x|^a, with a typically between 0.1 and 0.5), which also works well for making features comparable.
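To make the two options concrete, here is a minimal sketch in Python, assuming strictly positive features, NumPy/SciPy, and scikit-learn's StandardScaler; the synthetic data and the exponent a = 0.3 are purely illustrative:

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Strictly positive, heavily skewed synthetic features
X = rng.lognormal(mean=0.0, sigma=2.0, size=(500, 3))

# Option 1: log on the full dynamic range first, then standardize
X_log_std = StandardScaler().fit_transform(np.log(X))

# Option 2: power normalization sign(x) * |x|^a, a typically in [0.1, 0.5]
a = 0.3
X_pow_std = StandardScaler().fit_transform(np.sign(X) * np.abs(X) ** a)

# Per-axis skewness before and after (closer to 0 means less skewed)
print(skew(X), skew(X_log_std), skew(X_pow_std))
```

Either way, plotting the per-axis distributions before and after the transform is the quickest sanity check.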
For biological/biochemical data that is strictly positive, I generally recommend a log transform, even when the data at hand does not show severe skewness.
@David What if the variable has negative values? Wouldn't it be more appropriate to first scale into positive values (between 0 and 1, for example) and then apply the log transformation? I am assuming the proposed log transformation is log(a + x).
It's perfectly possible to deal with negative values by using another transformation to unskew the data, e.g. a cube root transformation. The other approach is to add a constant to your data to make all the negative values positive, e.g. +1000. However, you have to do this on all the features, otherwise they are no longer comparable. Then standardisation to a z-score is performed. When dealing with negative values I typically use a different transformation method, since I can't guarantee future data won't be negative.
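A small sketch of that approach, assuming NumPy and scikit-learn; note that np.cbrt is defined for negative inputs, so no shift is needed for it, and the +1000 offset is just the illustrative constant from above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Right-skewed synthetic data that also contains negative values
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 2)) - 2.0

# Cube root: handles negatives directly, preserves sign, reduces skew
X_cbrt = np.cbrt(X)

# Alternative: shift all features by the same constant so every value is
# positive, then log; the offset must exceed the most negative value
X_shift_log = np.log(X + 1000)

# Then standardise to z-scores
X_ready = StandardScaler().fit_transform(X_cbrt)
```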
Hi, I would like to confirm from the above discussion: does the log transformation have a similar effect to StandardScaler in preparing unskewed data for an SVM? Will it convert the data to something like a normal distribution?