I want to use some categorical variables in DEA using the DEAP software. Can binary data be used in DEAP? If not, is there any procedure in R or any other software that could help me with that?
I guess this would be a problem, because with binary (0/1) variables a DMU (alternative) only needs a value of 1 in a single criterion/variable (an output, for example) to end up on the efficiency frontier, since DEA maximizes the weights of the criteria in which each DMU excels. With a weight of 100% on that variable, such a DMU could never be deemed inefficient, at least when the DMUs do not differ much in their other inputs. And for each binary variable, roughly 50% of the alternatives (on average) would satisfy that.
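To make the argument above concrete, here is a minimal sketch in plain Python. It rests on assumptions that are not from the question: five made-up DMUs, a single input fixed at 1 for every DMU, and three binary outputs. Under those assumptions the CCR multiplier constraints reduce to "weighted output sum <= 1 for every DMU", so putting all the weight on one output is always feasible. The function name and the data are purely illustrative.

```python
# Hypothetical data: every DMU uses one input fixed at 1 and reports
# three binary outputs (all values are illustrative).
outputs = [
    (1, 0, 0),  # DMU 0
    (0, 1, 0),  # DMU 1
    (1, 1, 1),  # DMU 2
    (0, 0, 1),  # DMU 3
    (0, 0, 0),  # DMU 4
]


def single_weight_score(o, outputs):
    """Best CCR multiplier score for DMU o when only one output gets weight.

    With the input fixed at 1 (and input weight 1), the multiplier
    constraints reduce to sum_r u_r * y_rj <= 1 for every DMU j, u_r >= 0.
    The result is a lower bound on the true CCR score, which cannot exceed 1.
    """
    n_out = len(outputs[o])
    best = 0.0
    for r in range(n_out):
        u = [0.0] * n_out
        u[r] = 1.0  # all weight on output r
        feasible = all(
            sum(ur * yr for ur, yr in zip(u, y)) <= 1.0 for y in outputs
        )
        if feasible:
            best = max(best, sum(ur * yr for ur, yr in zip(u, outputs[o])))
    return best


scores = [single_weight_score(o, outputs) for o in range(len(outputs))]
print(scores)
```

In this toy data set, every DMU with at least one output equal to 1 already reaches a score of 1, so only the all-zero DMU is separated from the rest. With unequal inputs the picture is less extreme, but the lack of discrimination remains the core issue.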
I suggest that you aggregate variables/criteria (outputs and/or inputs). For example, by aggregating 3 binary outputs you would already get some more discrimination. Your new variable would take the following values:
1+1+1=3
1+1+0=2
1+0+0=1
0+0+0=0
This sort of variable redefinition could help overcome the problem. Since the results might depend on the chosen aggregation, I would do it in the most reasonable way from a modelling perspective, that is, aggregating the variables according to their meaning.
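The effect of the aggregation can be sketched in a few lines of plain Python. The data are made up, and the sketch assumes one input fixed at 1 for every DMU, constant returns to scale, and a simple unweighted sum as the aggregate. With a single input and a single output, the CCR score is then just each DMU's output divided by the largest output.

```python
# Hypothetical data: three binary outputs per DMU, one input fixed at 1.
outputs = [
    (1, 0, 0),  # DMU 0
    (0, 1, 0),  # DMU 1
    (1, 1, 1),  # DMU 2
    (0, 0, 1),  # DMU 3
    (0, 0, 0),  # DMU 4
]

# Aggregate the three binary outputs into one count variable (0..3).
aggregated = [sum(y) for y in outputs]

# Single input (= 1) and single output under constant returns to scale:
# the efficiency score is the output relative to the best observed output.
best = max(aggregated)
scores = [a / best if best > 0 else 0.0 for a in aggregated]

for dmu, (a, s) in enumerate(zip(aggregated, scores)):
    print(f"DMU {dmu}: aggregated output = {a}, score = {s:.3f}")
```

Only the DMU with all three outputs equal to 1 remains fully efficient here, which is the extra discrimination the aggregation is meant to provide. A different aggregation (for instance, grouping by meaning or using unequal weights) would change the scores.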
I would suggest having a look at the studies by Banker and Morey (1986) and Forsund (2001). Both are a bit old, but they may be helpful for you. Here are the links to both publications: http://astro.temple.edu/~banker/DEA/11%20The%20Use%20of%20Variables%20in%20Categorical%20Data%20Envelopment%20Analysis.pdf
http://www.icer.it/docs/wp2001/Forsund6-01.pdf
Both focus on using categorical variables together with continuous variables in DEA.
Actually, the basic assumption of the original DEA is that the variables are measured on a ratio scale. Thus, traditional DEA cannot be applied directly to binary variables. However, there are some ways to overcome this problem:
1. If you have many binary variables, you may use linear combinations of them with positive weights. The resulting variables are approximately ratio-scale. Principal component analysis does not work for this purpose, even though it is widely used, because some of its weights are negative. This means that originally inefficient units may become efficient.
2. Another way is to use a completely different technique that follows the spirit of the original DEA but computes the efficiency score in a different way. See the reference below:
Dehnokhalaji, A., Korhonen, P. J., Köksalan, M., Nasrabadi, N., and Wallenius, J. (2010): “Efficiency Analysis to Incorporate Interval Scale Data”, European Journal of Operational Research 207, pp. 1116-1121.
In most papers, principal component analysis is used, but as I mentioned, it is problematic. If your variables have a hierarchical structure, you could use AHP (Analytic Hierarchy Process; see Saaty 1980 or Saaty 1986).
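The problem with negative weights can be illustrated with a small plain-Python sketch. The weights below are made up for illustration and are not actual principal component loadings; the two DMUs are hypothetical as well. DMU A is at least as good as DMU B on every binary output, so any sensible composite output should rank A at least as high as B.

```python
# Hypothetical binary outputs: A dominates B (A has every output B has, plus one).
dmu_a = (1, 1, 1)
dmu_b = (1, 0, 1)

# Illustrative weights only; not estimated from any data.
positive_weights = (0.4, 0.2, 0.4)   # all weights > 0
mixed_weights = (0.7, -0.5, 0.6)     # one negative weight, as PCA may produce


def composite(y, weights):
    """Weighted linear combination of the binary outputs."""
    return sum(w * v for w, v in zip(weights, y))


# With positive weights the dominance order is preserved.
print(composite(dmu_a, positive_weights) > composite(dmu_b, positive_weights))

# With a negative weight the dominated DMU B gets the larger composite output.
print(composite(dmu_a, mixed_weights) < composite(dmu_b, mixed_weights))
```

Both comparisons evaluate to True: with a negative weight, the extra output of A is penalized, so B looks better than A on the composite variable even though A dominates it. This is the mechanism by which an originally inefficient unit may appear efficient after such a transformation.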
What about the case where the exogenous (environmental) variables are binary? I am running this model in R (rDEA package, Simar and Wilson 2007 model), but I am not sure whether it treats binary variables as dummies in the truncated regression model (I declared these variables as factors beforehand). Please advise.