I have applied a DEA model to measure the efficiency of some DMUs. I now want to find a model that explains the efficiency values obtained (between 0 and 1) as a function of some independent variables.
I do not know if anybody has ever tried it, but I think that DEA might also have potential as a discriminating technique, not only between DMUs (as usual) but also between the criteria or variables themselves, and could thus be used as a model-extraction technique.
You can use cognitive mapping (causal maps) to extract different sets of independent variables that might 'explain' the DEA ranking.
Then, for the same DMUs, you can run DEA consecutively for each of those sets of criteria/variables and check which set's discrimination agrees best with your original DEA ranking. You would be looking for correlation between the rankings from your original DEA model and the rankings obtained with the new sets of variables/criteria containing the possible explanations.
Maybe, with a modification to DEA, it is also possible to run all models (sets of variables) at once by introducing a new binary decision variable x_i, where x_i = 1 means variable i is included in the model and x_i = 0 means it is not.
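The consecutive-runs idea above can be sketched in a few lines. This is only an illustration with synthetic data, assuming an input-oriented CCR (constant-returns) DEA model in multiplier form; the variable names and candidate input sets are made up:

```python
# Run input-oriented CCR DEA separately for each candidate set of input
# variables, then check which set's ranking best agrees (Spearman rank
# correlation) with the ranking from the full model. Synthetic data only.
import numpy as np
from scipy.optimize import linprog
from scipy.stats import spearmanr

def ccr_efficiency(X, Y):
    """Input-oriented CCR scores via the multiplier LP, one LP per DMU.
    X: (n, m) inputs, Y: (n, s) outputs."""
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for o in range(n):
        # decision vars: [u_1..u_s, v_1..v_m]; maximize u'y_o -> minimize -u'y_o
        c = np.concatenate([-Y[o], np.zeros(m)])
        # feasibility cuts: u'y_j - v'x_j <= 0 for every DMU j
        A_ub = np.hstack([Y, -X])
        b_ub = np.zeros(n)
        # normalization: v'x_o = 1
        A_eq = np.concatenate([np.zeros(s), X[o]])[None, :]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (s + m))
        scores.append(-res.fun)
    return np.array(scores)

rng = np.random.default_rng(0)
n = 20
X_full = rng.uniform(1, 10, size=(n, 3))   # three candidate inputs
Y = rng.uniform(1, 10, size=(n, 1))        # one output

ref = ccr_efficiency(X_full, Y)            # the "original" DEA ranking
for cols in [(0, 1), (0, 2), (1, 2)]:      # candidate variable sets
    sub = ccr_efficiency(X_full[:, cols], Y)
    rho, _ = spearmanr(ref, sub)
    print(cols, round(rho, 3))
```

The set whose ranking correlates most strongly with the reference ranking would be the candidate 'explanation' in the sense described above.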
I was planning to use something like this in my PhD, but since I am not sure I will have time for it (it is not the core of my work), and since it might fit this question, I share it here hoping it might be helpful. If you plan to use something like this, just let me know so that I can rule out trying it myself.
I find your question very important, because to me DEA would be more helpful if it could also work as a learning technique rather than just a ranking technique that does not say how one can improve.
So my understanding is that you want to pinpoint and analyze potential determinants of efficiency differences in your data (not to determine the best model to describe the data). If so, there are two ways to do this:
1) Run DEA and build a regression model with the DEA efficiency scores as the dependent variable and the potential determinants as explanatory variables. This is the well-known two-stage approach, which has often been criticized and shown to produce biased results. However, you would be surprised how often it is still used, at least to determine which determinants might be relevant.
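For concreteness, a minimal sketch of that (often-criticized) second stage, using plain least squares on simulated data. The scores and the two 'determinants' here are fabricated for illustration; in practice the scores would come from the first-stage DEA:

```python
# Naive two-stage second step: regress DEA-style efficiency scores on
# candidate determinants by ordinary least squares. Simulated data only.
import numpy as np

rng = np.random.default_rng(1)
n = 50
Z = np.column_stack([np.ones(n),                 # intercept
                     rng.uniform(0, 1, n),       # e.g. relative DMU size
                     rng.integers(0, 2, n)])     # e.g. a public/private dummy
# fake "efficiency scores" in [0, 1] driven by the two determinants
theta = np.clip(0.5 + 0.3 * Z[:, 1] + 0.1 * Z[:, 2]
                + rng.normal(0, 0.05, n), 0, 1)

beta, *_ = np.linalg.lstsq(Z, theta, rcond=None)
print("estimated coefficients:", np.round(beta, 3))
```

The signs and magnitudes of the coefficients are then read as evidence on which determinants matter, with the caveats about bias noted above.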
2) Another option is a one-stage approach based on stochastic frontier analysis (SFA); this, however, is parametric. In the Bayesian SFA literature there are, for example, Varying Efficiency Distribution models; see, e.g., Koop, Osiewalski and Steel (1997), “Bayesian efficiency analysis through individual effects…”, Journal of Econometrics, 76. See also papers by Tsionas for alternative solutions. I recall that in the classical/frequentist approach to SFA there are models that allow efficiency to be parameterized with some exogenous variables (potential determinants), but I simply don't follow that literature closely. Also, I am not aware of a nonparametric method that would allow a one-stage procedure such as the one above.
I am attaching a section from my forthcoming book that provides some information about the two-stage application, as well as about how to analyze the efficiency scores and other related DEA results. I hope it helps.
If I understand correctly, you wish to determine which variables best explain the efficiency scores.
1) DEA provides an overall coefficient based on how well the inputs are used to produce a given level of output (or, given a level of inputs, what level of output is achieved), but nothing resembling coefficients that explain the efficiency scores.
You may need a second stage that uses the efficiency scores as the dependent variable and includes other variables that may not be under the control of your DMUs' management (environmental variables). For example, this could be done with a truncated regression (where the fully efficient units are excluded from the second stage, since the scores tend to be correlated with the estimators; Simar & Wilson, 2007), a bootstrap regression, etc.
The truncated regression is treated as non-parametric here in the sense that it makes no assumptions about the structure of the population. In this case the fully efficient units are not considered in the analysis: the regression is truncated at the upper limit, so only efficiency scores below unity are evaluated.
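A hedged sketch of that truncated second stage on simulated data: drop the units with a score of 1 and fit the remaining scores on an environmental variable by maximum likelihood, with the error distribution truncated at the upper limit of 1. This is only the truncated-regression step, not the full Simar & Wilson (2007) bootstrap algorithm, and all names and data are illustrative:

```python
# Truncated regression of (simulated) efficiency scores on one
# environmental variable z, truncating the normal error at 1 from above.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 200
z = rng.uniform(0, 1, n)
scores = np.minimum(0.6 + 0.3 * z + rng.normal(0, 0.08, n), 1.0)

mask = scores < 1.0                       # exclude fully efficient units
y, zz = scores[mask], z[mask]

def negloglik(params):
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)             # keep sigma positive
    mu = b0 + b1 * zz
    # truncated-normal log-density with upper truncation point at 1
    ll = (norm.logpdf((y - mu) / sigma) - np.log(sigma)
          - norm.logcdf((1.0 - mu) / sigma))
    return -ll.sum()

fit = minimize(negloglik, x0=[0.5, 0.0, np.log(0.1)],
               method="Nelder-Mead", options={"maxiter": 2000})
b0, b1, sigma = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(round(b0, 3), round(b1, 3), round(sigma, 3))
```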
Regression models are based on the assumption that the errors have constant variance (homoscedasticity), i.e. that the error variance does not depend on the explanatory variables or predictors used to estimate the dependent variable. Heteroscedasticity implies that the differences between the observed value and the true population value vary with the predictors' values used in the model; consequently, the variance of the errors is not constant.
Bootstrap methodology allows estimating robust standard errors by relaxing the homoscedasticity assumption when there is no assurance about how the errors are distributed. Bootstrapping is a technique for obtaining robust estimates of standard errors and confidence intervals for statistics such as the mean, median, proportion, odds ratio, correlation coefficients and regression coefficients. It is the most useful alternative to parametric estimates when the assumptions behind them are in doubt.
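One common form of this idea is the pairs bootstrap: resample (score, predictor) pairs with replacement, refit the regression each time, and take the spread of the refitted slopes as a standard error that does not rely on homoscedasticity. A minimal sketch on simulated (deliberately heteroscedastic) data:

```python
# Pairs bootstrap for the standard error of a regression slope.
import numpy as np

rng = np.random.default_rng(3)
n = 100
z = rng.uniform(0, 1, n)
scores = 0.5 + 0.3 * z + rng.normal(0, 0.05 * (1 + z), n)  # heteroscedastic noise

def slope(y, x):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)            # resample pairs with replacement
    boot.append(slope(scores[idx], z[idx]))
boot = np.array(boot)

print("slope:", round(slope(scores, z), 3),
      "bootstrap SE:", round(boot.std(ddof=1), 3))
```

The bootstrap standard deviation of the slopes plays the role of the robust standard error; percentiles of `boot` give confidence intervals.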
2) The question is: is DEA the best way of estimating efficiency scores given the data you have?
Alternatively, you may use Stochastic Frontier Analysis (Battese & Coelli 1988, 1992, 1995; Kumbhakar 1990), where you will see the impact of your inputs and outputs and may also be able to include environmental variables explaining the technical inefficiency.
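As a rough illustration of the SFA alternative, here is a compact sketch of the basic normal/half-normal stochastic frontier model (in the spirit of Aigner, Lovell & Schmidt, 1977) fitted by maximum likelihood; this is not one of the Battese & Coelli specifications with inefficiency determinants, just the simplest frontier, and all data are simulated:

```python
# Normal/half-normal stochastic production frontier: y = b0 + b1*x + v - u,
# v ~ N(0, sv^2) noise, u >= 0 half-normal inefficiency. Fit by MLE.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 5, n)                       # log input
v = rng.normal(0, 0.1, n)                      # symmetric noise
u = np.abs(rng.normal(0, 0.2, n))              # half-normal inefficiency
y = 1.0 + 0.6 * x + v - u                      # log output below the frontier

def negloglik(p):
    b0, b1, log_sv, log_su = p
    sv, su = np.exp(log_sv), np.exp(log_su)    # keep both positive
    sigma = np.hypot(sv, su)                   # composite std dev
    lam = su / sv
    eps = y - b0 - b1 * x
    # ALS (1977) log-likelihood for the composed error v - u
    ll = (np.log(2.0 / sigma) + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

fit = minimize(negloglik, x0=[0.0, 0.5, np.log(0.1), np.log(0.1)],
               method="Nelder-Mead", options={"maxiter": 5000})
b0, b1 = fit.x[:2]
sv, su = np.exp(fit.x[2]), np.exp(fit.x[3])
print(round(b1, 3), round(sv, 3), round(su, 3))
```

Extensions along the Battese & Coelli (1995) lines would additionally let the mean of u depend on environmental variables, which is what lets SFA explain the inefficiency in one stage.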
I think the best way to proceed is to look at the data you have and try to see which model better explains the inefficiency of your DMUs.
I hope this provides some background on the alternatives.
If you decide to conduct the above two-stage approach (DEA as the first step, followed by a censored regression), keep in mind that the efficiency scores are defined between 0 and 1.