Many papers discuss 'least median of squares' or 'least trimmed squares', but could anyone suggest a simple procedure for detecting outliers in time series? Is box-plot-based detection really worthwhile for time series?
The box plot is a widely used technique that clearly identifies "outliers". Besides that, one may use lower and upper bands at 2SD or 3SD from the mean, which will also identify outliers.
You can simply choose a number of standard deviations from the mean beyond which you deem a value to be an outlier. As Nawaz says, you could choose 2 sd, 3 sd or 1.96 sd, or even go for "big" outliers at 4 sd...
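A minimal sketch of the mean ± k·sd rule, assuming the sample standard deviation (n - 1 denominator) and made-up data; the function name sd_outliers is illustrative.

```python
import numpy as np

def sd_outliers(x, k=3.0):
    """Boolean mask of values more than k sample standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)  # sample standard deviation (n - 1 denominator)
    return np.abs(x - mean) > k * sd

# Illustrative data: ten observations with one suspicious value at position 6
series = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1, 9.7, 10.0])
print(np.flatnonzero(sd_outliers(series, k=2.0)))  # [6]
print(np.flatnonzero(sd_outliers(series, k=3.0)))  # []
```

With k = 3 the obvious value is not flagged: the outlier inflates the mean and standard deviation it is judged against, and with n observations no point can lie more than (n - 1)/sqrt(n) sample standard deviations from the mean (about 2.85 for n = 10).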
For a time series, you can just create dummy variables for those observations you deem too far from the mean to be credible.
You are right, James, but you must be quite careful with that strategy: you should first analyze whether there is an underlying economic reason for the outlier... sometimes, strange things actually happen.
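A minimal sketch of the dummy-variable idea, assuming (for illustration only) a regression on a constant and a linear time trend, with one 0/1 pulse dummy per flagged period; the function name fit_with_pulse_dummies is hypothetical.

```python
import numpy as np

def fit_with_pulse_dummies(y, X, outlier_idx):
    """OLS of y on X plus one 0/1 pulse dummy for each flagged observation."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    dummies = np.zeros((len(y), len(outlier_idx)))
    for j, t in enumerate(outlier_idx):
        dummies[t, j] = 1.0  # equals 1 only in period t
    Z = np.column_stack([X, dummies])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return coef, resid

t = np.arange(12)
y = 1.0 + 0.5 * t
y[5] += 8.0  # one aberrant observation
X = np.column_stack([np.ones_like(t, dtype=float), t])
coef, resid = fit_with_pulse_dummies(y, X, [5])
print(np.round(coef, 6))  # intercept, trend slope, dummy coefficient
```

The dummy absorbs the flagged observation completely, so its residual is zero and it no longer pulls on the trend estimates; the dummy's coefficient (about 8 here) measures the size of the deviation.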
I would suggest adapting the methods described and programmed by Verardi and Croux (2009), Dehon and Verardi (2010) and Jann (2010). Their work focuses on cross-sectional and panel data, but it should be a very good starting point for time series.
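Since the question asked for a simple procedure for least trimmed squares, here is a minimal sketch of LTS for a straight line y = a + b*x. It is not the Stata code of the authors above; it assumes random two-point starts followed by "concentration" steps (refit OLS on the h best-fitting points), in the spirit of the FAST-LTS algorithm. The coverage h, the number of starts and the number of steps are illustrative defaults, and the name lts_line is hypothetical.

```python
import numpy as np

def lts_line(x, y, h=None, n_starts=200, n_csteps=10, seed=0):
    """Least trimmed squares fit of y = a + b*x via random starts and C-steps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if h is None:
        h = (n + 3) // 2  # (n + p + 1) // 2 with p = 2 parameters
    X = np.column_stack([np.ones(n), x])
    rng = np.random.default_rng(seed)
    best_obj, best_coef = np.inf, None
    for _ in range(n_starts):
        i, j = rng.choice(n, size=2, replace=False)
        if x[i] == x[j]:
            continue  # no unique line through two points with equal x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        coef = np.array([y[i] - slope * x[i], slope])
        for _step in range(n_csteps):
            # C-step: refit OLS on the h points with the smallest squared residuals
            keep = np.argsort((y - X @ coef) ** 2)[:h]
            coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        obj = np.sort((y - X @ coef) ** 2)[:h].sum()  # trimmed sum of squares
        if obj < best_obj:
            best_obj, best_coef = obj, coef
    return best_coef, best_obj

x = np.arange(20, dtype=float)
y = 3.0 + 2.0 * x
y[15:] += 40.0  # last five observations shifted upwards
coef, obj = lts_line(x, y)
print(coef)
```

Because only the h smallest squared residuals count, the five shifted points do not affect the fitted line as long as they are fewer than about half the sample.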
Let me also refer you to "Outliers in Statistical Data" by Barnett and Lewis (1995). You would find many simpler ways of detecting outliers there. I am also working on some outlier detection techniques in time series data.
One of the simplest statistical methods is based on the interquartile range (IQR), which is a measure of the variability of the data.
The IQR is calculated as: IQR = Q3 - Q1
Here Q1 and Q3 are the medians of the lower and upper halves of the data set, respectively. An outlier is any value x that lies more than 1.5 interquartile ranges below the first quartile Q1, or more than 1.5 interquartile ranges above the third quartile Q3, i.e. one of these conditions holds:
x < Q1 - 1.5×IQR
x > Q3 + 1.5×IQR
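A minimal sketch of the 1.5 × IQR fence rule. It assumes Q1 and Q3 are computed as medians of the lower and upper halves of the sorted data (excluding the overall median when n is odd), which is only one of several quartile conventions; numpy.percentile's default interpolation can give slightly different fences. The function names are illustrative, and the rule ignores the time ordering of the observations.

```python
import numpy as np

def quartiles_median_of_halves(x):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data (n >= 2)."""
    s = np.sort(np.asarray(x, dtype=float))
    half = len(s) // 2  # the overall median is excluded when n is odd
    return np.median(s[:half]), np.median(s[len(s) - half:])

def iqr_outliers(x, k=1.5):
    """Mask of values outside Q1 - k*IQR and Q3 + k*IQR, plus the two fences."""
    x = np.asarray(x, dtype=float)
    q1, q3 = quartiles_median_of_halves(x)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (x < lower) | (x > upper), (lower, upper)

data = [12, 13, 14, 15, 15, 16, 17, 18, 45]
mask, fences = iqr_outliers(data)
print(np.flatnonzero(mask), fences)
```

Here Q1 = 13.5 and Q3 = 17.5, so the fences are 7.5 and 23.5 and only the value 45 falls outside them.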
A boxplot (box-and-whisker plot) can be used to display the outliers graphically.
I think you should start from the rationale of your study. Would the outliers potentially cause bias, a loss of efficiency, or something else? Relative to your objective, you can then analyze which outliers are the problem. Techniques such as quantile regression might help.
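If quantile regression is the route taken, a median (tau = 0.5) regression can be sketched without specialised software by iteratively reweighted least squares (IRLS) on the check loss. This is an approximation chosen for illustration; dedicated quantile regression routines typically solve a linear program instead. The function name, iteration count and eps floor are assumptions.

```python
import numpy as np

def quantile_regression_irls(X, y, tau=0.5, n_iter=200, eps=1e-6):
    """Approximate linear quantile regression via IRLS on the check loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        # For |r| >= eps, w * r**2 equals tau*|r| (r >= 0) or (1 - tau)*|r| (r < 0),
        # i.e. the check loss; |r| is floored at eps to avoid division by zero
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

x = np.arange(20, dtype=float)
y = 2.0 + 0.5 * x
y[10] += 30.0  # one large outlier
X = np.column_stack([np.ones_like(x), x])
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
lad = quantile_regression_irls(X, y, tau=0.5)
print("OLS:", ols, "median regression:", lad)
```

The single outlier shifts the OLS intercept by more than 1, while the median regression recovers the line through the other 19 points.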
This test is based on Wilks' method (1963), designed to detect a single outlying sample from a multivariate normal distribution, and relates the maximum squared Mahalanobis distance to an F distribution using the Yang and Lee (1987) formulation. A significant squared Mahalanobis distance indicates an outlier. To use it you must also download the ACR code, available on the MathWorks File Exchange.
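For readers without MATLAB, the statistic can be sketched in Python. This is an illustrative reimplementation under assumptions, not the File Exchange code: it computes squared Mahalanobis distances D² from the sample mean and covariance, takes the largest, and converts it with the Yang and Lee relation F = (n - p - 1) n D² / (p ((n - 1)² - n D²)), assumed here to follow F(p, n - p - 1) under multivariate normality. Comparing F with the upper alpha/n quantile of that distribution (for example scipy.stats.f.ppf(1 - alpha / n, p, n - p - 1)) is a Bonferroni-style critical value we assume; the function name is hypothetical.

```python
import numpy as np

def max_mahalanobis_f(X):
    """Largest squared Mahalanobis distance and its Yang-Lee F statistic."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))  # sample covariance, n - 1 denominator
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)
    i = int(np.argmax(d2))
    F = (n - p - 1) * n * d2[i] / (p * ((n - 1) ** 2 - n * d2[i]))
    return i, d2, F

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
X[0] = [8.0, -8.0]  # planted outlier
i, d2, F = max_mahalanobis_f(X)
print(i, d2[i], F)
```

With the sample covariance, every D² is bounded by (n - 1)²/n, so the denominator of F stays positive.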
Outliers can exist for many reasons, not least of which is an error in the data source. Far too many people use data without ever looking at it. Statistical analysis should not be a substitute for common sense, and the first thing to do is to look at a chart of the data (levels, changes, etc.). Often a problem with the data will leap out of a chart without any further work being necessary. It sounds old-fashioned and not very scientific, but it can avoid a lot of errors.
I agree and never suggested discarding outliers. They merely warrant an investigation. I find lots of errors in trusted data sources, especially in daily financial data sets.
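Alongside the chart, a few lines of code can list the largest period-to-period changes for inspection. A minimal sketch with made-up prices; the function name largest_changes is hypothetical. A single bad observation usually shows up as a pair of large changes with opposite signs, one into the bad value and one out of it.

```python
import numpy as np

def largest_changes(levels, top=3):
    """(period, change) pairs for the largest absolute one-period changes."""
    levels = np.asarray(levels, dtype=float)
    changes = np.diff(levels)  # changes[t - 1] = levels[t] - levels[t - 1]
    order = np.argsort(-np.abs(changes))[:top]
    return [(int(k) + 1, float(changes[k])) for k in order]

# Made-up daily prices with a likely misplaced decimal point at period 4
prices = [100.0, 101.0, 100.5, 102.0, 10.2, 103.0, 103.5]
for period, change in largest_changes(prices, top=2):
    print(period, round(change, 2))
```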
I agree with what several others have said about detecting outliers. However, within the context of time series, there is another type of outlier. Let's say that you have a series with given time-series properties: an ARMA structure reflecting the trend and cycle of the series. An observation "out of cycle" can be considered an outlier (the term for this is an "innovation outlier"; see Mills' classic textbook, p. 241, referencing Fox, 1972). This would not show up on traditional boxplots. You have two options to catch this type of outlier. One is graphical: you can use a time-series boxplot (basically a boxplot for different time periods). The trick there is that you need a short enough timeframe to be able to catch the cycle. A better option in my mind is to use a time-series decomposition routine such as TRAMO/SEATS or X-11/12/13 ARIMA to decompose the time series into trend, cycle, seasonal and error (most often called "irregular" in program output). Look for the large errors. And John is absolutely right that this is potentially information you can use to assess how clean the data are. But it can also help you check for things like policy changes or structural breaks in the data.
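TRAMO/SEATS and X-13 ARIMA are far more sophisticated, but the "decompose, then look for large irregulars" idea can be sketched with a classical additive decomposition: a centred moving-average trend(-cycle), seasonal means of the detrended series, and the remainder as the irregular. The robust z-score cut-off of 3.5 (median and MAD with the 0.6745 scaling constant), the simulated data and the function names are all assumptions for illustration.

```python
import numpy as np

def classical_irregular(y, period):
    """Irregular component of a classical additive decomposition."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if period % 2 == 0:
        w = np.r_[0.5, np.ones(period - 1), 0.5] / period  # 2 x period centred MA
    else:
        w = np.ones(period) / period
    half = len(w) // 2
    trend = np.full(n, np.nan)  # undefined at both ends
    trend[half:n - half] = np.convolve(y, w, mode="valid")
    detrended = y - trend
    seasonal = np.array([np.nanmean(detrended[s::period]) for s in range(period)])
    seasonal -= seasonal.mean()  # seasonal effects sum to zero
    return y - trend - np.resize(seasonal, n)  # np.resize repeats the seasonal pattern

def flag_large_irregulars(irregular, k=3.5):
    """Indices whose robust z-score (median/MAD) exceeds k in absolute value."""
    med = np.nanmedian(irregular)
    mad = np.nanmedian(np.abs(irregular - med))
    robust_z = 0.6745 * (irregular - med) / mad
    return np.flatnonzero(np.abs(np.nan_to_num(robust_z)) > k)  # NaN ends count as 0

rng = np.random.default_rng(0)
t = np.arange(120)  # ten years of simulated monthly data
y = 10 + 0.1 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 120)
y[30] += 6.0  # a spike that does not fit the pattern
flags = flag_large_irregulars(classical_irregular(y, 12))
print(flags)
```

A spike also leaks slightly into nearby trend values and into its month's seasonal mean, which is one reason the dedicated programs model outliers explicitly rather than relying on a plain decomposition.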
I recommend using TRAMO (Time series Regression with ARIMA noise, Missing values and Outliers), developed by Agustin Maravall at the Bank of Spain. You can download it free of charge from the following URL: http://www.bde.es/bde/es/secciones/servicios/Profesionales/Programas_estadi/Programas.html
This is a well-known program that is used by many national statistical institutes and some national banks. It is also included in the commercial software EViews.
I agree with Kenneth; I don't think a box plot or some other statistical outlier test can detect all types of outliers. For example, there is the possibility of a 'trend' outlier or a 'seasonal' outlier.
First of all, we have to distinguish between three issues: (1) outliers, (2) missing values and (3) structural breaks. For outliers, you can download the Trimming and Winsorising add-in from the Add-ins menu in the EViews main menu. Trimming removes the outlying values, while Winsorising replaces them with less extreme values.
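The EViews add-in's exact options are not reproduced here. As a minimal sketch of the two ideas, assuming percentile cut-offs of 5% and 95% (an illustrative choice), trimming sets values outside the cut-offs to missing so the time index stays intact, while winsorising pulls them back to the cut-off values. Function names are illustrative.

```python
import numpy as np

def trim(x, lower_pct=5.0, upper_pct=95.0):
    """Set values outside the percentile cut-offs to NaN, keeping the series length."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    out = x.copy()
    out[(x < lo) | (x > hi)] = np.nan
    return out

def winsorize(x, lower_pct=5.0, upper_pct=95.0):
    """Replace values outside the percentile cut-offs with the cut-off values."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)

x = np.arange(1.0, 21.0)
x[19] = 1000.0  # one extreme value at the end
print(trim(x))
print(winsorize(x))
```

Fixed percentile cut-offs act on both tails, so the smallest ordinary value (1.0) is trimmed or winsorised too, even though only the upper tail contains a real outlier.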
In addition, to make sure that you have dealt with the outliers, open the three series (the original series, the series after treating the outliers, and the series after treating the missing data) as a group, then plot them as a boxplot, and you will see the results.
For missing values, in EViews you click on Proc in the object window, then on interpolation, and choose the linear method. To make sure that you have dealt with the missing values, open the two series (the original series with missing values and the series after filling them in) as a group, and you will see the results.
In terms of structural breaks, you can use the Chow breakpoint test and then use dummy variables to deal with this third issue.
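Outside EViews, the same linear interpolation can be sketched with numpy.interp. Assumptions: missing values are coded as NaN, and gaps at the very start or end are filled with the nearest observed value, which is what numpy.interp does outside the observed range. The function name is illustrative.

```python
import numpy as np

def interpolate_missing(y):
    """Fill NaN gaps by linear interpolation over the time index."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    missing = np.isnan(y)
    filled = y.copy()
    filled[missing] = np.interp(t[missing], t[~missing], y[~missing])
    return filled

series = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
print(interpolate_missing(series))
```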
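A minimal sketch of the Chow breakpoint F statistic, assuming a known candidate break date and a regression on a constant and a time trend (both illustrative), with simulated data. Under the usual assumptions (no break, normal errors with constant variance) the statistic follows F(k, n - 2k), and the p-value can be taken from any F distribution routine. A step dummy equal to 1 from the break date onwards is then one way to model the break.

```python
import numpy as np

def chow_f(y, X, break_idx):
    """Chow breakpoint F statistic for a break starting at observation break_idx."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape

    def rss(yy, XX):
        beta, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        r = yy - XX @ beta
        return float(r @ r)

    rss_pooled = rss(y, X)
    rss_split = rss(y[:break_idx], X[:break_idx]) + rss(y[break_idx:], X[break_idx:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

rng = np.random.default_rng(0)
t = np.arange(60, dtype=float)
X = np.column_stack([np.ones_like(t), t])
noise = rng.normal(0, 0.5, 60)
y_stable = 1.0 + 0.2 * t + noise
y_break = y_stable + 4.0 * (t >= 30)  # level shift from period 30 onwards
F_stable, F_break = chow_f(y_stable, X, 30), chow_f(y_break, X, 30)
print(F_stable, F_break)
```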
Forecasting is one of the most important concepts in time series analysis. To forecast accurately, the model that best describes the data set should first be identified. However, one or more outliers can affect the parameters of the model and hence the forecasts. You can start with a plot of the time series.