One of my time series variables has both positive and negative values. I would like to take the base-10 logarithm of this series. What is the most intuitive and correct method?
There are some very simple approaches, such as adding a constant to ensure all values are positive; however, I would recommend looking at Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis, by Anthony C. Atkinson.
Without knowing more about what you are doing, there are likely to be challenges raised by mathematicians and statisticians. For example, interpreting what you find may be an issue, among other things. If you share more details, others might offer suggestions.
Squaring each data point first and then taking the log would be appropriate in this case. The data may not lose their properties. However, this may change the policy implication. The best way is to use the variance approach over the mean of the data.
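To make the effect of squaring concrete, here is a small Python sketch. The function name is made up for illustration and is not from this thread. It only shows an arithmetic fact: log10(x^2) equals 2*log10(|x|), so values of opposite sign map to the same number, and zero is still undefined.

```python
import math

def log10_of_square(x):
    """Return log10(x**2). Raises ValueError at x == 0, where the log is undefined."""
    if x == 0:
        raise ValueError("log10(x**2) is undefined at x = 0")
    return math.log10(x * x)

# Compare log10(x**2) with 2*log10(|x|) for a few illustrative values.
for value in (-100.0, -10.0, 10.0, 100.0):
    print(value, log10_of_square(value), 2 * math.log10(abs(value)))
```

Whether losing the sign matters depends on the application, so this is worth checking before interpreting any result on the squared scale.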
Let's emphasize our preferences (after acknowledging the problem's specificities; see Sergei's nice example). I do like Xiouguang's answer because it is a neat transformation of a real variable. Everybody knows that a positive real variable x (not too small) can easily be transformed into y = log(x), and then, if the application is as simple as a regression, the coefficient is actually called an 'elasticity'. If x is a price, then x can easily be expressed in cents instead of dollars, or the good for which x is paid can be redefined from kilos to tons; thus there will not be any problem with the asymptote (because x, the price, cannot be "small" enough). Provided that those pre-transformations are made (in physics you can also convert any measure into another), I do not see any problem with my previous easy proposal: y = log(x) if x > 0 and y = -log(-x) if x < 0.
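A minimal Python sketch of the piecewise proposal (log of x for positive x, minus the log of -x for negative x), using base 10 as in the original question. The function names and the sample values are hypothetical. It also illustrates why the unit change matters: when |x| is below 1, the log is negative, so the sign of y comes out opposite to the sign of x.

```python
import math

def signed_log10(x):
    """y = log10(x) for x > 0 and y = -log10(-x) for x < 0; undefined at x == 0."""
    if x == 0:
        raise ValueError("signed log is undefined at x = 0")
    return math.copysign(1.0, x) * math.log10(abs(x))

def rescale_then_signed_log10(x, factor):
    """Change units first (for example dollars to cents with factor=100)."""
    return signed_log10(x * factor)

# Illustrative values only; 0.5 and -0.5 show the sign flip for |x| < 1.
for value in (-250.0, -0.5, 0.5, 250.0):
    print(value, round(signed_log10(value), 4))

# After rescaling into units where |x| >= 1, the sign of y matches the sign of x.
print(round(rescale_then_signed_log10(0.5, 100), 4))
```

The transform is only order-preserving when every nonzero |x| is at least 1, which is what the pre-transformation into smaller units is meant to guarantee.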
To answer Carlos R. Barreta: I think there is a problem with the solution he provided. In most cases, when you want to log-transform your data, it is because you need your variable to be normally distributed. If you apply the proposed solution, y = log(x) if x > 0 and y = -log(-x) if x < 0
Not quite. First, the need for a normal distribution is understandable, but it was not part of Hammad Hassan Mirza's original question. Second, the need for a normal distribution will be satisfied neither by adding a constant nor by using my solution. Since our data z are simply given and we can apply many transformations to them, say t(z), we must test whether t(z) is statistically normally distributed or not. And this, dear Andrea, should not be done by just assuming it and then using the log-normal distribution's properties! Clearly, somebody who is accustomed to assuming whatever she needs will eventually assume too much!! :)
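One way to test t(z) rather than assume normality is sketched below in Python. The choice of the Jarque-Bera test is mine, not something proposed in this thread, and the function name is hypothetical. The statistic is built from the sample skewness and excess kurtosis, and its asymptotic reference distribution is chi-square with 2 degrees of freedom, whose survival function is exp(-JB/2). The approximation is rough in small samples, where a test such as Shapiro-Wilk may be preferable.

```python
import math

def jarque_bera(sample):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments with divisor n, as used in the usual JB definition.
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) survival function
    return jb, p_value

# Made-up transformed values t(z), for illustration only.
t_of_z = [-2.1, -1.3, -0.4, 0.0, 0.2, 0.9, 1.1, 1.8, 2.5, 3.0]
jb_stat, p = jarque_bera(t_of_z)
print(jb_stat, p)
```

A small p-value is evidence against normality of t(z); a large one does not prove normality, it only fails to reject it.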
Hi all, I have a similar problem. I want to get your comments on what I did to my data:
a) I have a series of comparative data (2 groups) with both negative and positive values.
b) I transformed them to log base 2 using =sign(x)*log((abs(x))+1,2).
c) Then I performed a t-test on the transformed data and calculated the changes by subtracting one series from the other.
d) Please look at the Excel file that I have here.
Ultimately I want to know if there is any significant difference between the data collected from the 2 different groups. The differences calculated on the log scale were then converted back to absolute values.
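The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two groups are made-up numbers (the Excel file is not available here), the function names are hypothetical, and the t-test is taken to be Welch's unequal-variance version because the post does not say which variant was used. Only the t statistic and approximate degrees of freedom are computed; the p-value would come from a t table or a statistics package.

```python
import math
import statistics

def signed_log2p1(x):
    """Same as the spreadsheet formula sign(x) * log(abs(x) + 1, 2)."""
    return math.copysign(math.log2(abs(x) + 1.0), x)

def inverse_signed_log2p1(y):
    """Undo signed_log2p1: sign(y) * (2**abs(y) - 1)."""
    return math.copysign(2.0 ** abs(y) - 1.0, y)

def welch_t(a, b):
    """Welch t statistic and approximate degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical data for the two groups.
group_a = [-12.0, -3.0, 0.5, 4.0, 30.0]
group_b = [-1.0, 2.0, 7.0, 15.0, 63.0]

log_a = [signed_log2p1(v) for v in group_a]
log_b = [signed_log2p1(v) for v in group_b]
t_stat, df = welch_t(log_a, log_b)
mean_diff_log = statistics.mean(log_a) - statistics.mean(log_b)

print("t =", t_stat, "df =", df)
# Back-transformed log-scale difference versus the raw-scale difference.
print(inverse_signed_log2p1(mean_diff_log),
      statistics.mean(group_a) - statistics.mean(group_b))
```

Because the transform is nonlinear, the back-transformed difference of log-scale means is generally not the difference of the original means, so the two numbers on the last line should not be expected to agree.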
If you transform x such that x_i* = x_i + |min(x)| + 1 and regress y_i = b1 + b2 ln(x_i*) + e_i, then b2 = (dy_i/dln x_i)(x_i + |min(x)| + 1)/x_i, not the usual semi-elasticity dy_i/dln x_i. The marginal effect of a proportional increase in x_i is now b2 × x_i/(x_i + |min(x)| + 1), and it depends on x_i.
You can of course calculate an average marginal effect over the sample or an average marginal effect at the mean using this marginal effect formula if you wish to use this transformation.
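A short Python sketch of that calculation, under assumptions that should be checked against your own setup: the regressor is the natural log of the shifted variable, the shift is |min(x)| + 1 with min(x) negative so that every shifted value is at least 1, and the values of x and b2 below are made up. The function name is hypothetical.

```python
def shifted_log_marginal_effects(x, b2):
    """Marginal effect b2 * x_i / (x_i + shift) for each observation,
    where shift = |min(x)| + 1 (assumes min(x) is negative)."""
    shift = abs(min(x)) + 1.0
    return [b2 * xi / (xi + shift) for xi in x]

# Hypothetical data and a hypothetical estimated coefficient.
x = [-4.0, 0.0, 6.0]
b2 = 2.0

effects = shifted_log_marginal_effects(x, b2)

# Average of the observation-level marginal effects.
average_marginal_effect = sum(effects) / len(effects)

# Marginal effect evaluated at the sample mean of x.
x_mean = sum(x) / len(x)
effect_at_mean = b2 * x_mean / (x_mean + abs(min(x)) + 1.0)

print(effects)
print(average_marginal_effect, effect_at_mean)
```

The two summaries can differ noticeably, because the marginal effect is a nonlinear function of x_i and changes sign with it.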