Hi David, I don't know what redefining a variable would be, but I can say with more certainty what transforming a variable is. When we transform a variable, it is generally because it has distributional properties that make it unattractive for a statistical procedure we want to use. The essential property we want from a transformation is that it is monotonic. You can look this up, but essentially it means we don't mix up the order of things: case 7 stays between cases 6 and 8, assuming, if you will, that they are in rank order. If I take the natural log of a positive variable, or square a non-negative one, this is always true. These are variable transformations.

Now, sometimes we do something really radical to a variable, and the medical literature is particularly prone to this: we throw away information. We might have measured everyone's weight in kilograms, but then decided to throw away most of the information and classify everyone as healthy, overweight, or obese. Or even just healthy and obese; medicine is particularly fond of putting things into just two categories, and then very fond of odds ratios. Maybe it's because the ultimate outcome is alive or dead. In any event, I guess I might call that redefining the variable, because it no longer seems like transforming the variable to me. Case 99 (obese) may still be above case 1 (healthy), but I have thrown away all the information about where cases 97 and 98 are relative to 99 and where 2 and 3 are relative to 1, whereas a monotonic transformation would have preserved all that.

Bob
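To make the rank-order point above concrete, here is a small sketch in Python. The weights and the cut-offs in `weight_class` are invented for illustration only and are not clinical thresholds; `ranks` is a hypothetical helper. It shows that the log and the square keep positive values in the same order, that squaring can reorder values once negatives are involved, and that categorizing collapses distinct values into a few labels.

```python
import math

def ranks(values):
    """Return the rank (0 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result

def weight_class(kg):
    """Classify a weight using hypothetical, illustration-only cut-offs."""
    if kg < 75:
        return "healthy"
    if kg < 90:
        return "overweight"
    return "obese"

# Made-up weights in kilograms
weight_kg = [92.0, 61.5, 78.3, 104.2, 70.1, 88.9]

# Monotonic transformations: the order of the cases is unchanged
log_keeps_order = ranks([math.log(w) for w in weight_kg]) == ranks(weight_kg)
square_keeps_order = ranks([w ** 2 for w in weight_kg]) == ranks(weight_kg)

# Squaring is only monotonic on non-negative values
centered = [-2.0, 1.0, 3.0]
square_breaks_order = ranks([c ** 2 for c in centered]) != ranks(centered)

# Categorizing: six distinct weights become three labels
classes = [weight_class(w) for w in weight_kg]
distinct_before = len(set(weight_kg))
distinct_after = len(set(classes))

print(log_keeps_order, square_keeps_order, square_breaks_order)
print(distinct_before, "distinct values ->", distinct_after, "categories")
```

Within each category the original ordering can no longer be recovered from the labels alone, which is the information loss described above.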
The definition of a variable includes the operationalization of an attribute, that is, the precise formulation of the procedure by which something is observed and how the observation is translated into a value (the "value" can be numerical or categorical). A transformation, in contrast, does not relate to the way values are obtained from observations; it simply means that these values are mapped by some mathematical function. Thus, redefining a biased variable can reduce the bias, whereas a transformation can never reduce any bias.
Practical example: the variable "length" can be measured in meters, obtained by comparing the distance to a given reference (the "meter"). Distance may also be measured via time, for instance the time required to cover the distance by some means (the light-year is a prominent candidate here). The way the distance is measured defines what kinds of errors can be made; it defines the bias and the possible precision. I can transform "light-years" into "meters", but the precision and bias I will have for these "meters" (obtained by transforming light-years) are those of the "light-years scale", not those of the "meters scale".
I agree with Jachen about redefining a variable. It is a subjective matter: a variable may not be appropriate for the research one is doing, so it needs to be redefined. For instance, if inflation is one of the variables, we can pick CPI, SPI, WPI, PPI, core inflation, inflation YoY, inflation MoM, or the GDP deflator. These all represent inflation; which measure is more appropriate depends on the research.
Transformation, on the contrary, does not mean switching between different measures of a variable; it is done by applying some arithmetic operation. These operations are dictated by the modeling: for example, if inflation (by any measure) has a unit root, the first difference is taken before carrying on with the analysis. Similarly, for variables expressed in currency units, such as money supply, GDP, BOP, BOT, etc., the log is taken to transform the variable.
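As a rough sketch of the two operations just mentioned, the Python below takes a first difference and a log. The series are made-up numbers, and `first_difference` and `log_transform` are hypothetical helper names, not functions from any particular package. Note that the log only works for strictly positive values.

```python
import math

def first_difference(series):
    """Return x[t] - x[t-1] for each consecutive pair; one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]

def log_transform(series):
    """Natural log of each value; requires strictly positive inputs."""
    if any(x <= 0 for x in series):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(x) for x in series]

# Made-up price index levels (any inflation measure could stand in here)
price_index = [100.0, 102.0, 105.0, 109.0]
diff = first_difference(price_index)

# Made-up money supply in currency units, growing 10% per period
money_supply = [1000.0, 1100.0, 1210.0]
log_money = log_transform(money_supply)

# Differencing the logs gives approximate growth rates
log_growth = first_difference(log_money)

print(diff)
print([round(g, 4) for g in log_growth])
```

Both steps change only the scale on which the values are expressed; the choice of which inflation measure to use in the first place is a separate decision.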
Here is how I understand your question:
Redefining data means giving your variable new values or imposing a condition on it, e.g. you have age data and you want to call anyone younger than 15 "young", and so on. As for transforming data, I have been working with the Box-Cox power transformation: a variable does not follow a normal distribution, and you use this kind of technique to bring your data closer to normal. Basically, squaring is also transforming the data.
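A minimal sketch of both ideas in Python, under some assumptions: the "not young" label and the sample data are invented, and the Box-Cox parameter `lam` is fixed by hand here, whereas in practice it is usually estimated from the data. The Box-Cox formula used is the standard one, (x^lam - 1) / lam for lam != 0 and log(x) for lam = 0, defined for positive x.

```python
import math

def age_group(age):
    """Recode age with the cut-off from the example: under 15 is 'young'."""
    return "young" if age < 15 else "not young"  # second label is assumed

def box_cox(x, lam):
    """Box-Cox power transformation of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

# Redefining: four distinct ages collapse into two labels
ages = [8, 14, 15, 40]
groups = [age_group(a) for a in ages]

# Transforming: values are rescaled but stay in the same order
skewed = [1.0, 4.0, 9.0, 16.0]
transformed = [box_cox(x, 0.5) for x in skewed]

print(groups)       # ['young', 'young', 'not young', 'not young']
print(transformed)  # [0.0, 2.0, 4.0, 6.0]
```

The recoding cannot be undone from the labels, while the Box-Cox values can be mapped back to the originals for a known `lam`.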