If, among 50 utilities, some have data available for certain years and others have data for different years, how can we calculate technical efficiency for all of them using DEAP?
OPTION 1: Delete the items with missing data. The remaining set then has no missing values. This approach may introduce bias, and the result may not be representative of the subject matter you intend to study.
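As a minimal sketch of Option 1, the snippet below drops every record that contains a missing value and reports how much of the sample survives. The utility names, years, and figures are made up for illustration, and plain Python is used only to keep the example dependency-free.

```python
# Hypothetical records: (utility, year, output, input); None marks a missing value.
records = [
    ("U01", 2001, 120.0, 30.0),
    ("U01", 2002, None, 32.0),
    ("U02", 2001, 95.0, None),
    ("U02", 2002, 101.0, 27.0),
    ("U03", 2002, 80.0, 21.0),
]

def drop_incomplete(rows):
    """Keep only rows in which no field is missing (listwise deletion)."""
    return [row for row in rows if all(value is not None for value in row)]

complete = drop_incomplete(records)
share_kept = len(complete) / len(records)
print(f"kept {len(complete)} of {len(records)} rows ({share_kept:.0%})")
# prints: kept 3 of 5 rows (60%)
```

A low share of retained rows is a warning sign that the deleted cases may differ systematically from the retained ones.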
OPTION 2: Use imputation. Imputation means substituting the most probable values for the missing data. To impute missing values properly, first verify the pattern of the missing data. See attached files.
Generally, there are 2 approaches:
(i) Monotone missing data. If the pattern of the missing data is monotone, use either a parametric regression method that assumes multivariate normality or a non-parametric method that uses propensity scores. An article dealing with this score is attached.
(ii) Arbitrary missing data. If the pattern of the missing data is arbitrary, i.e. it has no structure or predictable pattern, apply the Markov Chain Monte Carlo (MCMC) method with assumed multivariate normality.
In both (i) and (ii), you might want to run a randomness test to verify whether the missing data is a random occurrence. After all the missing data has been filled in (imputed), you are ready for the TE calculation.
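The pattern check that decides between (i) and (ii) can be sketched as follows. This is an illustrative helper, not part of DEAP or any particular package: the function name is_monotone and the sample data are assumptions. It treats a pattern as monotone if the variables can be ordered so that, within every row, once a value is missing all later variables are missing too.

```python
def is_monotone(rows):
    """Return True if the columns can be ordered so that, in every row,
    once a value is missing all later columns are missing too."""
    n_cols = len(rows[0])
    # Count missing values per column and order columns from fewest to most.
    missing_counts = [sum(row[j] is None for row in rows) for j in range(n_cols)]
    order = sorted(range(n_cols), key=lambda j: missing_counts[j])
    for row in rows:
        seen_missing = False
        for j in order:
            if row[j] is None:
                seen_missing = True
            elif seen_missing:
                # An observed value after a missing one breaks monotonicity.
                return False
    return True

monotone_data = [
    [1.0, 2.0, 3.0],
    [1.5, 2.5, None],
    [0.9, None, None],
]
arbitrary_data = [
    [1.0, None, 3.0],
    [None, 2.5, 3.1],
    [0.9, 2.2, None],
]
print(is_monotone(monotone_data), is_monotone(arbitrary_data))
# prints: True False
```

Ordering the columns by their missing counts is enough here because, in a monotone pattern, the sets of rows with missing values are nested from one variable to the next.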
STOCHASTIC FRONTIER: If starting with the stochastic frontier model:
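(1) Yi = f(xi; B) exp(vi) exp(ui)
xi = input vector of producer i;
Yi = output of producer i;
f(xi; B) = production function, where B = vector of technology parameters;
exp(vi) = stochastic (random noise) component of the production function; and
exp(ui) = technical inefficiency component, where ui is the inefficiency term (ui <= 0 under this sign convention, so that exp(ui) <= 1).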
The technical efficiency may be given by:
(2) TEi = Yi / [f(xi; B) exp(vi)]
A value of unity means that the producer is 100% efficient. Substituting (1) into (2) gives TEi = exp(ui), so the inefficiency term is ui = log(TEi). The production function f(x; B) may take the Cobb-Douglas or translog form.
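To make equations (1) and (2) concrete, here is a small numerical sketch with a Cobb-Douglas production function. All parameter values and data are hypothetical, and v and u are set by hand so the result can be checked; in real data neither term is observed, and both have to be estimated from the fitted frontier model.

```python
import math

def cobb_douglas(x, b0, b):
    """f(x; B) = exp(b0) * prod(x_k ** b_k)."""
    return math.exp(b0) * math.prod(xk ** bk for xk, bk in zip(x, b))

def technical_efficiency(y, x, b0, b, v):
    """Equation (2): TE = Y / [f(x; B) exp(v)]."""
    return y / (cobb_douglas(x, b0, b) * math.exp(v))

# Hypothetical technology parameters and one producer's inputs.
b0, b = 0.5, [0.6, 0.3]
x = [40.0, 15.0]
v, u = 0.05, -0.20   # u <= 0 under the sign convention of equation (1)

# Equation (1): observed output = frontier * noise * inefficiency.
y = cobb_douglas(x, b0, b) * math.exp(v) * math.exp(u)

te = technical_efficiency(y, x, b0, b, v)
print(round(te, 4), round(math.log(te), 4))
# prints: 0.8187 -0.2
```

The recovered efficiency equals exp(u), and its logarithm returns the inefficiency term that was put in, which is the relationship between TE and ui used above.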