Many papers discuss 'least median of squares' or 'least trimmed squares', but could anyone suggest a simple procedure for detecting outliers? Is box-plot-based detection really worthwhile for time series?
The box plot is a widely used technique that clearly identifies "outliers". Besides that, one may use lower and upper bands at 2 SD or 3 SD from the mean, which will also identify outliers.
You can simply choose a number of standard deviations from the mean beyond which you deem an observation to be an outlier. As Nawaz says, you could choose 2 SD, 3 SD or 1.96 SD, or even go for "big" outliers at 4 SD...
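A minimal sketch of the k-standard-deviation rule, using only the Python standard library. The function name `sd_outliers` and the sample series are hypothetical, and using the sample standard deviation (n − 1 denominator) is an assumption, since the thread does not say which one to use.

```python
from statistics import mean, stdev


def sd_outliers(values, k=3.0):
    """Return the indices of values lying more than k sample SDs from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]


# Hypothetical series with one obvious spike at index 6
series = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1, 9.7, 10.0]
print(sd_outliers(series, k=2.0))  # [6]
print(sd_outliers(series, k=3.0))  # []
```

With k = 3 nothing is flagged in this ten-point series, because the outlier inflates the very standard deviation it is measured against: with the sample SD, no |z| can exceed (n − 1)/√n, about 2.85 for n = 10. For short series the choice of k matters a great deal.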
For a time series, you can simply create dummy variables for the observations you deem too far from the mean to be credible.
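A sketch of the dummy-variable idea, assuming a simple linear-trend regression (the model, the helper names `ols` and `design_with_dummies`, and the data are all hypothetical). Each flagged observation gets an impulse dummy equal to 1 at that date and 0 elsewhere. The trend coefficients then come out identical to those from a fit that drops the flagged observation, and the dummy's coefficient measures how far that observation sits from the fitted line.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def design_with_dummies(n, flagged):
    """Columns: constant, linear trend t, and one impulse dummy per flagged time index."""
    return [[1.0, float(t)] + [1.0 if t == f else 0.0 for f in flagged] for t in range(n)]


# Hypothetical series with a suspicious value at t = 4
y = [1.0, 2.1, 2.9, 4.2, 20.0, 6.1, 6.9, 8.0]
flagged = [4]

b_dummy = ols(design_with_dummies(len(y), flagged), y)
kept = [t for t in range(len(y)) if t not in flagged]
b_drop = ols([[1.0, float(t)] for t in kept], [y[t] for t in kept])
print("with dummy:", b_dummy)  # [intercept, slope, dummy coefficient]
print("dropped:   ", b_drop)   # [intercept, slope]
```

Keeping the observation and dummying it out, rather than deleting it, preserves the regular spacing of the time index, which matters once lags or ARMA terms enter the model.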
You are right, James, but you must be quite careful with that strategy: you must first analyze whether there is an underlying economic reason for the outlier... sometimes, strange things actually happen.
I would suggest adapting the methods described and programmed by Verardi and Croux (2009), Dehon and Verardi (2010), and Jann (2010). Their work focuses on cross-section and panel data, but it should be a very good starting point for time series.
Let me also refer you to "Outliers in Statistical Data" by Barnett and Lewis (1995). You will find many simpler ways of detecting outliers there. I am also working on some outlier detection techniques for time-series data. (Sorry for the errors in the earlier version.)
The simplest statistical method is based on the interquartile range (IQR), which is a measure of the variability of the data.
The IQR is calculated as IQR = Q3 − Q1.
Here Q1 and Q3 are the medians of the lower and upper halves of the sorted data set, respectively. An outlier is any value x that lies at least 1.5 interquartile ranges below the first quartile Q1, or at least 1.5 interquartile ranges above the third quartile Q3. That is, one of these inequalities must hold:
x ≤ Q1 − 1.5 × IQR   or   x ≥ Q3 + 1.5 × IQR
A box plot (box-and-whisker plot) can be used to display these outliers graphically.
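The rule above translated into Python (standard library only). The quartiles are computed exactly as described, as medians of the lower and upper halves, excluding the overall median when n is odd. This is only one of several quartile conventions, so library routines such as `statistics.quantiles` may give slightly different values. Function names and data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.
    When n is odd, the overall median is left out of both halves. Needs n >= 2."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)


def iqr_outliers(values, k=1.5):
    """Return the values with x <= Q1 - k*IQR or x >= Q3 + k*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x <= low or x >= high]


series = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1, 9.7, 10.0]
print(quartiles(series))     # (9.9, 10.2)
print(iqr_outliers(series))  # [25.0]
```

Because the quartiles are barely moved by a single extreme value, the fences stay tight and the spike is caught. The rule treats the data as an unordered sample, though, so it cannot see values that are unusual only relative to their neighbours in time.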
I think you should start from the rationale of your study. Would the outliers potentially cause bias, reduce efficiency, or something else? Relative to your objective, you can then analyze which outliers are the problem. Techniques such as quantile regression might help.
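Quantile regression fits a chosen conditional quantile (the median when q = 0.5) by minimising the "check" loss instead of squared errors, so a few extreme observations pull the fit much less than in OLS. Below is a brute-force sketch for a single regressor such as a time trend. It relies on the standard result that some optimal line passes through two observations with distinct x values. The names and data are hypothetical; for real work a linear-programming implementation such as `QuantReg` in statsmodels would be the usual choice.

```python
from itertools import combinations


def check_loss(r, q):
    """Quantile-regression check loss: q*r for r >= 0, (q - 1)*r for r < 0."""
    return q * r if r >= 0 else (q - 1) * r


def quantile_line(x, y, q=0.5):
    """Exact quantile regression of y on a constant and x, by enumeration.
    Tries every line through two observations with distinct x and keeps the one
    with the smallest total check loss. O(n^3): fine for short series only."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - a - b * xk, q) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Hypothetical trend series y = 2 + 0.5 t with one gross outlier at t = 7
x = list(range(10))
y = [2.0 + 0.5 * t for t in x]
y[7] = 30.0
print(quantile_line(x, y))  # (2.0, 0.5): the median fit ignores the outlier
```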
This test is based on Wilks' method (1963), designed to detect a single outlying sample from a multivariate normal distribution, and approximates the distribution of the maximum squared Mahalanobis distance by a distribution function F following the Yang and Lee (1987) formulation. A significant squared Mahalanobis distance indicates an outlier. To use it, you must also download the ACR code, available on the MathWorks File Exchange.
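For intuition about the statistic behind that test, here is a sketch that computes the squared Mahalanobis distance of each observation from the sample mean, using the sample covariance matrix, and picks the largest. It does not reproduce the Yang and Lee (1987) distribution function or the MATLAB code, so it yields the candidate outlier, not a significance level. Names and data are hypothetical.

```python
from statistics import mean


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def squared_mahalanobis(data):
    """Squared Mahalanobis distance of each row from the sample mean,
    using the sample covariance matrix (n - 1 denominator)."""
    n, p = len(data), len(data[0])
    mu = [mean(row[k] for row in data) for k in range(p)]
    cen = [[row[k] - mu[k] for k in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in cen) / (n - 1) for j in range(p)] for i in range(p)]
    inv = invert(cov)
    return [sum(r[i] * inv[i][j] * r[j] for i in range(p) for j in range(p)) for r in cen]


# Hypothetical bivariate data: strongly correlated, except the last row
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9],
        [5.0, 5.1], [6.0, 6.0], [7.0, 7.2], [3.0, 6.0]]
d2 = squared_mahalanobis(data)
worst = max(range(len(d2)), key=d2.__getitem__)
print(worst, d2[worst])  # index 7 has the largest distance
```

The flagged point (3, 6.0) lies inside the range of each variable separately, so univariate box plots would not catch it; it stands out only because it breaks the strong positive relationship between the two columns.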
Outliers can exist for many reasons, not least of which is an error in the data source. Far too many people use data without ever looking at it. Statistical analysis should not be a substitute for common sense, and the first thing to do is to look at a chart of the data (levels, changes, etc.). Often a problem with the data will leap out of a chart without any further work being necessary. It sounds old-fashioned and not very scientific, but it can avoid a lot of errors.
I agree, and I never suggested discarding outliers. They merely warrant an investigation. I find lots of errors in trusted data sources, especially in daily financial data sets.
I agree with what several others have said about detecting outliers. However, within the context of time series, there is another type of outlier. Let's say that you have a series with given time-series properties: an ARMA structure reflecting the trend and cycle of the series. An observation that is "out of cycle" can be considered an outlier (the term for this is an "innovation outlier"; see Mills' classic textbook, p. 241, referencing Fox, 1972). This would not show up on traditional box plots. You have two options to catch this type of outlier. One is graphical: you can use a time-series box plot (basically a box plot for each of several time periods). The trick there is that the time periods must be short enough to catch the cycle. A better option, in my mind, is to use a time-series decomposition routine such as TRAMO/SEATS or X-11/X-12/X-13 ARIMA to decompose the series into trend, cycle, seasonal and error components (the error is most often called the "irregular" in program output), and then look for large errors. And John is absolutely right that this is potentially information you can use to assess data cleanliness. But it can also help you check for things like policy changes or structural breaks in the data.
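A much cruder stand-in for TRAMO/SEATS or X-13 ARIMA, to illustrate the "look for large irregulars" idea: a classical additive decomposition in which a centred moving average estimates trend and cycle together, seasonal effects are averages of the detrended values, and the irregular is what is left over. Irregulars more than k robust standard deviations (a rescaled median absolute deviation) from their median are flagged. The period s, the threshold k = 3.5 and all names are assumptions for illustration; the real programs fit ARIMA models and test formally for several outlier types.

```python
from statistics import median


def centered_ma(y, s):
    """Centred moving average of length s (a 2 x s average when s is even).
    Returns None for the first and last s // 2 points, where the window does not fit."""
    h = s // 2
    out = [None] * len(y)
    for t in range(h, len(y) - h):
        window = y[t - h:t + h + 1]
        if s % 2:
            out[t] = sum(window) / s
        else:
            # s + 1 points: the two end points get half weight
            out[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / s
    return out


def irregular_outliers(y, s, k=3.5):
    """Additive decomposition y = trend + seasonal + irregular, then flag the time
    indices whose irregular lies more than k robust SDs from the median irregular."""
    trend = centered_ma(y, s)
    detr = [(t, yt - tr) for t, (yt, tr) in enumerate(zip(y, trend)) if tr is not None]
    # Seasonal effect: mean detrended value at each cycle position, centred to sum to zero
    raw = []
    for j in range(s):
        vals = [d for t, d in detr if t % s == j]
        raw.append(sum(vals) / len(vals))
    seasonal = [r - sum(raw) / s for r in raw]
    irregular = {t: d - seasonal[t % s] for t, d in detr}
    med = median(irregular.values())
    mad = median(abs(e - med) for e in irregular.values())
    scale = 1.4826 * mad  # MAD rescaled to match the SD under normality
    return sorted(t for t, e in irregular.items() if abs(e - med) > k * scale)


# Hypothetical quarterly series: linear trend + fixed seasonal pattern, spike at t = 9
pattern = [5.0, -3.0, -4.0, 2.0]
y = [100.0 + t + pattern[t % 4] for t in range(20)]
y[9] += 15.0
print(irregular_outliers(y, 4))  # [9]
```

The spike also leaks into the moving-average trend and the seasonal means around it, so nearby irregulars are distorted too. The robust scale keeps those smaller distortions below the threshold here, but on real data the flagged dates are a starting point for investigation, not a verdict.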
I recommend using TRAMO (Time series Regression with ARIMA noise, Missing values and Outliers), developed by Agustin Maravall at the Bank of Spain. You can download it free of charge from the following URL: http://www.bde.es/bde/es/secciones/servicios/Profesionales/Programas_estadi/Programas.html
This is a well-known program used by many national statistical institutes and some national banks. It is also included in the commercial software EViews.
I agree with Kenneth: I don't think a box plot or any other statistical outlier test can detect all types of outliers. For example, there is the possibility of a 'trend' outlier or a 'seasonal' outlier.