The Evaluation of the Efficiency with Data Envelopment Analysis in Case of Missing Values: A Fuzzy Approach

ELIAS K. MARAGOS & DIMITRIS K. DESPOTIS

Dept. of Informatics

University of Piraeus

80 Karaoli and Dimitriou

Postal code: 18534

Piraeus

GREECE

Abstract: Over the last decades, evaluating the productivity of educational units has become an important priority for many countries. A current approach treats schools as production units that use multiple inputs and produce multiple outputs. Data Envelopment Analysis (DEA) is a very effective methodology for estimating relative efficiencies in the presence of multiple inputs and outputs. The requirement of exact data in traditional DEA has lately been addressed by imprecise or fuzzy DEA approaches, which make it possible to take into account qualitative variables or variables with missing values. In this paper we evaluate the public high schools of the Greater Athens Area (GAA) for the academic year 2002-2003. The problem of missing values, which appeared in some school units, is handled by using the methodology of fuzzy DEA and the a-level sets of fuzzy numbers.

Keywords: Imprecise DEA, Efficiency, Missing values, Fuzzy DEA, Non-controllable Inputs, Sensitivity.


1 Introduction

Evaluation for educational purposes dates back to the 1950s, but the first quantitative approaches were established during the past two decades. The main purpose of school evaluation is to identify the factors that reflect the performance of a school. A current approach to school evaluation considers schools as production units that use multiple resources and produce multiple outputs. DEA provides an effective framework for evaluating the relative efficiency of production units with an unknown production function. These units are usually called decision making units (dmus). DEA evaluates the relative efficiency of a set of dmus by using the ratio of the weighted sum of outputs to the weighted sum of inputs. The relative efficiencies are estimated non-parametrically by constructing the frontier of best practice (efficient frontier) from the observed data. In that way a best set of weights is estimated for each dmu. DEA has the advantage that it handles multiple inputs and outputs easily. The common DEA methodology assumes that the data on the inputs and outputs of the dmus are known exactly and that the dmus are homogeneous. In real-life evaluations these two conditions are usually violated. Firstly, there are qualitative variables that need to be considered. In educational evaluation, a variable of that kind could be the satisfaction of the students with the operation of their school. Secondly, the dmus cannot be considered homogeneous because of non-controllable factors with vague values. Finally, real-life surveys very often contain missing values. In these cases the traditional DEA models are not suitable.
As the common deterministic DEA methodology cannot handle imprecise data, a usual way to handle these cases was to restrict the evaluation to the dmus with accurate data or to replace the vague values with exact ones. Such treatment led to a loss of information. Lately, imprecise or fuzzy models have strengthened DEA. These models extend DEA to a non-deterministic methodology that can handle even verbal variables. The existing fuzzy approaches are usually grouped into four categories: a) the tolerance approaches, like Sengupta [9]; b) the defuzzification approaches, like Lertworasirikul [7]; c) the a-level based approaches, like Kao and Liu [5] or Leon et al. [6]; and d) the fuzzy ranking approaches, like Guo and Tanaka [4]. In our survey we evaluate the public high schools operating in the Greater Athens Area (GAA) during the academic year 2002-2003. Our sample consists of 75 schools out of the 214 operating in that area. In our evaluation we take into account intra-school variables and we also consider socio-economic differences among the schools. We used the missing values present in two schools of our sample to test whether the fuzzy DEA methodology can serve as another means of sensitivity analysis. We treat the variables with missing values as fuzzy and adopt a triangular membership function for them. This leads to a sample of seventy-five schools, of which the majority (seventy-three) have precise data. We examined the influence of the inclusion of the two schools upon the ranking of the remaining seventy-three. The structure of the paper is the following.
Section 2 analyzes our problem setting and discusses the adopted methodology. The evaluation models are based on the Variable Returns to Scale DEA model of Banker and Morey [1], as full proportionality in input-output measurements cannot be supported in an educational context [3]. We also recognize that the non-controllable inputs may be fuzzy, and we extend the previous model [1] to a fuzzy version.

The results are contained in section 3 with a discussion. The paper ends with a conclusion.

2 Problem Setting and Methodology

We evaluate the public high schools in the Greater Athens Area, Greece, during the academic year 2002-2003. The performance analysis is restricted to a sample of 75 public high schools out of the 219 schools operating in GAA. The population of GAA is about 40% of the population of Greece. In our analysis we consider two kinds of variables, the socio-economic variables and the intra-school variables. The socio-economic variables are treated as non-controllable, while the intra-school variables are treated as controllable.

To capture the socio-economic environment within which the schools operate, we created three socio-economic indicators: a) the educational indicator (EDU), b) the occupation indicator (OCU) and c) the housing indicator (HOU). The values of these three indicators were computed using data from the National Census of 1991 (National Statistical Service of Greece). All the indicators except HOU are calculated for each municipality over the age group 30-50. This age group normally constitutes the potential parent group of the students attending high school in the period of the study. EDU is defined as the percentage of the residents of the municipality who have graduated at least from high school over the population of the municipality in the above age group. OCU is defined as the percentage of the residents who possess a high-level job (scientists, doctors, executives, merchants, teachers etc.) over the population of the municipality. The third indicator (HOU) captures the average housing facilities in each municipality for the above age group. It is calculated as the percentage of the residents who live in a house with a number of main rooms at least equal to the number of persons living in this house. Each school takes its values on those three indicators from the municipality to which it belongs. The Pearson correlation coefficient for each pair of those three indicators was greater than 96%, showing a very strong correlation among them. The intra-school indicators were created using performance data for each school of the sample. As the above socio-economic indicators were found to be highly correlated, we aggregated them into one non-controllable variable (SOC) using Principal Component Analysis. We then rescaled the principal component SOC to avoid negative values. Table 1 summarizes the socio-economic indicators with some statistics.
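
The aggregation step can be illustrated with a minimal sketch. It assumes that the principal component is extracted from the correlation matrix of the three indicators and that the rescaling is a simple shift to positive values; neither detail is stated in the text. The indicator values and the names `first_component` and `shift_positive` are hypothetical and are not the study data or the authors' code.

```python
import math

def standardize(column):
    """Return z-scores (sample standard deviation) of one indicator."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def first_component(columns, iterations=200):
    """First principal component of the correlation matrix via power iteration."""
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1)
             for j in range(k)] for i in range(k)]
    w = [1.0] * k  # positive start vector, so the loadings stay positive here
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    scores = [sum(w[i] * z[i][t] for i in range(k)) for t in range(n)]
    return w, scores

def shift_positive(scores, offset=0.05):
    """Assumed rescaling: shift so that the smallest score equals `offset`."""
    lowest = min(scores)
    return [s - lowest + offset for s in scores]

# Hypothetical indicator values for five municipalities (not the study data).
edu = [0.30, 0.45, 0.60, 0.75, 0.90]
ocu = [0.32, 0.40, 0.50, 0.62, 0.85]
hou = [0.65, 0.70, 0.78, 0.85, 0.95]

weights, scores = first_component([edu, ocu, hou])
soc = shift_positive(scores)
print([round(v, 2) for v in soc])
```

Because the three indicators are strongly and positively correlated, the first component loads positively on all of them, so a single SOC value per municipality preserves the common ordering of the indicators.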

The performance data for the schools were collected by a survey accredited by the Ministry of National Education and Religious Affairs. The intra-school indicators considered in the study are: regular full time teachers (FTT), full time teachers equivalent (FTE), number of students per teacher (SPT), university entrants (UEN) and technological university entrants (TEN). FTT is the percentage of regular full time teachers over the total number of teachers in the school. FTE is computed as full time teachers equivalent over the school year. SPT is defined as the students-to-teachers ratio and captures the class size. To handle SPT as an input, the inverse of the students-to-teachers ratio is used. UEN is the percentage of the students of the school who matriculated in one of the Universities of the Greek State after participating in the national matriculation examinations. TEN is the percentage of the students of the school who matriculated in one of the Technological Universities of the Greek State after participating in the national matriculation examinations. UEN and TEN are included as outputs because a commonly accepted measure of a well-performing school is based on the records of its students in the national matriculation examinations [8].

Table 1: Statistical table for the socio-economic indicators and the non-controllable input SOC
Statistic / HOU / OCU / EDU / SOC
Min / 0.62 / 0.3 / 0.24 / 0.05
Max / 0.98 / 0.91 / 0.96 / 0.99
Av / 0.77 / 0.49 / 0.59 / 0.48
St.Dev / 0.09 / 0.14 / 0.19 / 0.30
Table 2. Statistical analysis of the controllable inputs/output mix of the models A, B
Statistic / FTE (Input) / FTT (Input) / SPT (Input) / UEN (Output) / TEN (Output)
Min / 13 / 0.09 / 0.01 / 0.09 / 0.125
Max / 54 / 1 / 0.82 / 0.69 / 0.750
Av / 25.8 / 0.91 / 0.11 / 0.43 / 0.431
St.dev / 7.42 / 0.18 / 0.09 / 0.12 / 0.116
Table 3. The examined models and their input-output mixes
Model / A / B / C
# of schools with exact data / 75 / 73 / 73
# of schools with fuzzy data / 0 / 2 / 2
Precise inputs / FTT, FTE, SPT, SOC (75 cases) / FTT, FTE (74 cases); SPT (73 cases); SOC (75 cases) / FTT, FTE (74 cases); SPT (73 cases); SOC (75 cases)
Fuzzy inputs / 0 / FTT, FTE (1 case); SPT (2 cases) / FTT, FTE (1 case); SPT (2 cases)
Precise outputs / UEN (75 cases) / UEN (75 cases) / UEN, TEN (75 cases)

Table 2 summarizes the intra-school indicators with some statistics. We examined three DEA models, A, B and C. In model A we evaluated the school efficiencies via the deterministic model of Banker and Morey [1] (model 1) and estimated exact efficiencies for the seventy-five schools. Because the schools S1-14 and S3-15 have missing values, we replaced the missing values with the average values of the corresponding variables (for S1-14 the variables with missing values are the inputs FTT, FTE and SPT; for S3-15 the missing value is in the SPT variable). In models B and C we evaluated the schools via a fuzzy extension of the previous model and estimated fuzzy efficiencies for the seventy-five schools.
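
The replacement of missing values used for model A can be sketched as follows. This is only an illustration of mean imputation on one variable; the function name `impute_with_mean` and the input values are hypothetical and do not come from the survey data.

```python
def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical FTE column for four schools; the second school has a missing value.
fte = [20.0, None, 30.0, 25.0]
fte_model_a = impute_with_mean(fte)
print(fte_model_a)
```

In models B and C the same two schools keep their gaps, which are then represented by fuzzy numbers instead of a single average.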

The fuzzified version of model 1 is shown in model 2. This model accepts fuzzy variables. Since precise numbers are a limiting case of fuzzy numbers, the model can handle fuzzy and non-fuzzy variables together. It can also handle controllable and non-controllable variables. In model 2, dedicated signs denote the fuzzy operators "less than or equal" and "greater than or equal".

The fuzzy variables are modelled as normalized triangular fuzzy numbers (see Figure 1) with a) centre equal to the average value of the variable in the sample of the 73 schools, b) left spread Sleft equal to the difference between the average and the observed minimum value, and c) right spread Sright equal to the difference between the observed maximum and the average value. The rest of the variables are treated as variables with exact data.

Figure 1. The considered membership function for the fuzzy variables
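
Since the figure is not reproduced here, the shape it describes can be sketched in code. The function below is a plain triangular membership function built from the centre and the two spreads defined above; the name `triangular_membership` is illustrative, and the FTE values are the bounds listed in Table 4 for levels 0.0 and 1.0.

```python
def triangular_membership(x, centre, s_left, s_right):
    """Membership degree of x in a normalized triangular fuzzy number."""
    if x == centre:
        return 1.0
    if centre - s_left < x < centre:
        # rising edge, from 0 at (centre - s_left) to 1 at the centre
        return (x - (centre - s_left)) / s_left
    if centre < x < centre + s_right:
        # falling edge, from 1 at the centre to 0 at (centre + s_right)
        return ((centre + s_right) - x) / s_right
    return 0.0

# FTE: average 25.83, observed minimum 13, observed maximum 54 (Table 4).
centre = 25.83
s_left = centre - 13.0
s_right = 54.0 - centre

for x in (13.0, 19.415, 25.83, 39.915, 54.0):
    print(x, round(triangular_membership(x, centre, s_left, s_right), 3))
```

An exact observation corresponds to both spreads being zero, in which case the function returns 1 at the observed value and 0 elsewhere.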

The lower and upper bounds of the a-level sets for the fuzzy inputs are evaluated by the formulas:
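
The formulas themselves did not survive in this text. The sketch below shows bounds that follow from the triangular shape and the spread definitions given above, namely lower = centre - (1 - a) Sleft and upper = centre + (1 - a) Sright; this reconstruction is an assumption, although it agrees with the values listed in Table 4. The function name `alpha_cut` is illustrative.

```python
def alpha_cut(minimum, average, maximum, a):
    """Lower and upper bound of the a-level set of a triangular fuzzy number
    with centre `average`, left spread (average - minimum) and
    right spread (maximum - average)."""
    s_left = average - minimum
    s_right = maximum - average
    lower = average - (1.0 - a) * s_left
    upper = average + (1.0 - a) * s_right
    return lower, upper

# FTE bounds for the eleven levels a = 0, 0.1, ..., 1 (compare with Table 4).
for step in range(11):
    a = step / 10
    lower, upper = alpha_cut(13.0, 25.83, 54.0, a)
    print(f"{a:.1f}  {lower:.2f}  {upper:.2f}")
```

At a = 0 the interval is the full observed range, and at a = 1 it collapses to the average, which is the value used for imputation in model A.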

In order to translate the fuzzy inequalities of model 2 into ordinary inequalities we used the definition of Tanaka et al. [10]. According to this definition, if we have two fuzzy numbers, the relation (19)

holds if:

Using the meaning of fuzzy inequality and the left and right bounds of the a-level sets, model 2 is finally transformed to the parametric model 3. In this model the only fuzzy variables are the inputs, and the parameter is the desired level of possibility of the fuzzy number.

Model 3 is used for the estimation of the fuzzy efficiencies of model C. Eliminating the constraint that refers to TEN (relation 29) makes model 3 suitable for estimating the efficiencies according to model B. Note that, for the schools with exact data, it holds:

(33)

and

MinFTT=MaxFTT=FTTobserved (34)

MinFTE=MaxFTE=FTEobserved (35)

MinSPT=MaxSPT=SPTobserved (36)

Sleft = Sright = 0 (37)

(If the non-controllable inputs or the outputs are fuzzy numbers, model 2 can be transformed to a parametric model in a similar way.)

3 Results and Discussion

As mentioned, we estimated three kinds of school efficiencies: a) efficiencies with a precise data set (model A), b) fuzzy efficiencies with the mixed exact and fuzzy data of model B and c) fuzzy efficiencies with the mixed exact and fuzzy data of model C. We solved the parametric model 3 for the schools of model B for a = 0, 0.1, 0.2,…, 1. Table 4 contains the examined levels of possibility and the values of the left and right bounds of the ranges for the fuzzy inputs that correspond to each level of possibility. We observed that some schools attain efficiency scores equal to unity at all examined levels of possibility. The majority of the examined schools had fuzzy efficiency scores. We ranked the schools a) according to their efficiencies in model A (Ranking A), b) according to their fuzzy efficiencies in model B (Ranking B) and c) according to their fuzzy efficiencies in model C (Ranking C). There are several approaches to fuzzy ranking. We used the ranking procedure of Chen and Klein [2] because we wanted to avoid any assumption about the form of the membership function of the efficiencies. Chen and Klein suggest that three or more levels of possibility are needed for the ranking results to be valid. Since we used eleven levels of possibility, we consider our results valid. We tested the stability of the ranking among the examined models. The results are included in Table 5. Fifteen schools were characterized as efficient in model A. There were eleven efficient schools in each of the two fuzzy models. Ranking B and Ranking C were identical, i.e. the efficient schools of model B were the same as the efficient schools of model C. This means that including or excluding the output TEN does not affect the set of efficient schools, and that the fuzzy ranking of the schools was stable. Our second concern was to identify the schools that remain efficient in every evaluation model.
We observed that seven of the fifteen schools characterized as efficient in model A are also efficient according to the fuzzy models B and C. This strengthens the characterization of these seven schools as efficient. The number of efficient schools in either fuzzy model is lower by about a quarter (eleven against fifteen) than the number of efficient schools in model A. This means that the fuzzy models led to a better discrimination among the schools than the evaluation of model A.
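
The ranking index is not written out in the text. The sketch below uses the form in which the Chen and Klein index is commonly stated for fuzzy efficiencies: for each unit, the sum over the examined levels of (upper bound - c), divided by that sum minus the sum of (lower bound - d), where c is the smallest lower bound and d the largest upper bound over all units and levels. That form is an assumption here, and the three units with their efficiency intervals are hypothetical.

```python
def chen_klein_index(lower, upper, c, d):
    """Ranking index of one unit from its a-level efficiency intervals.
    `lower` and `upper` hold one bound per examined level of possibility."""
    numerator = sum(u - c for u in upper)
    denominator = numerator - sum(l - d for l in lower)
    if denominator == 0:
        raise ValueError("all units have the same crisp efficiency")
    return numerator / denominator

def rank_units(intervals):
    """Return the indices and the unit names ordered from best to worst."""
    c = min(min(lo) for lo, _ in intervals.values())
    d = max(max(up) for _, up in intervals.values())
    indices = {name: chen_klein_index(lo, up, c, d)
               for name, (lo, up) in intervals.items()}
    order = sorted(indices, key=indices.get, reverse=True)
    return indices, order

# Hypothetical efficiency intervals at the levels a = 0, 0.5, 1.
intervals = {
    "unit_1": ([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]),    # efficient at every level
    "unit_2": ([0.6, 0.7, 0.8], [1.0, 0.9, 0.8]),
    "unit_3": ([0.5, 0.55, 0.6], [0.7, 0.65, 0.6]),
}
indices, order = rank_units(intervals)
print(order)
```

The index uses only the interval bounds at each level, which is why no assumption about the shape of the membership function of the efficiencies is needed. A unit that is efficient at every level obtains the index 1.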

4 Conclusion

The common problem of the robustness of an efficient dmu was studied by comparing the results of a deterministic and a fuzzy DEA model. In our case the examined dmus were seventy-five public high schools operating in GAA, Greece. Our sample contained missing values. In order to test the robustness of an efficient school, we extended the deterministic DEA model of Banker and Morey [1] into a non-deterministic fuzzy DEA model that can handle controllable and non-controllable variables. Using the fuzzy set theory of Zadeh [11] and the concept of a-level sets, we transformed the fuzzy model into a parametric one whose parameter is the desired level of possibility. We believe that fuzzy DEA models provide another way of performing sensitivity analysis, as they build on the notions of data perturbation and data uncertainty. We also believe that models of that kind are valuable because they permit the handling of variables with imprecise or verbal structure. This is important in educational evaluation, where some very important variables are qualitative.


Table 4. The examined levels of possibility and the Left and Right bounds of the ranges for the fuzzy inputs
Level / LFTE / UFTE / LFTT / UFTT / LSPT / USPT
0.0 / 13.00 / 54.00 / 0.09 / 1.00 / 0.01 / 0.82
0.1 / 14.28 / 51.18 / 0.17 / 0.99 / 0.02 / 0.75
0.2 / 15.57 / 48.37 / 0.25 / 0.98 / 0.03 / 0.68
0.3 / 16.85 / 45.55 / 0.33 / 0.97 / 0.04 / 0.61
0.4 / 18.13 / 42.73 / 0.41 / 0.96 / 0.05 / 0.54
0.5 / 19.42 / 39.92 / 0.50 / 0.95 / 0.06 / 0.47
0.6 / 20.70 / 37.10 / 0.58 / 0.95 / 0.07 / 0.40
0.7 / 21.98 / 34.28 / 0.66 / 0.94 / 0.08 / 0.33
0.8 / 23.26 / 31.46 / 0.74 / 0.93 / 0.09 / 0.25
0.9 / 24.55 / 28.65 / 0.83 / 0.92 / 0.10 / 0.18
1.0 / 25.83 / 25.83 / 0.91 / 0.91 / 0.11 / 0.11
Table 5. The ranking of the schools for the models A, B and C
School / Ranking A / Fuzzy Ranking B / Fuzzy Ranking C
S1-1 / 1 / 1 / 1
S1-10 / 41 / 65 / 65
S1-11 / 1 / 1 / 1
S1-12 / 1 / 1 / 1
S1-13 / 63 / 57 / 57
S1-14 / 44 / 1 / 1
S1-2 / 34 / 63 / 63
S1-3 / 1 / 17 / 17
S1-4 / 37 / 38 / 38
S1-5 / 18 / 56 / 56
S1-6 / 39 / 60 / 60
S1-7 / 24 / 59 / 59
S1-8 / 1 / 8 / 8
S1-9 / 35 / 55 / 55
S2-1 / 46 / 56 / 56
S2-10 / 1 / 1 / 1
S2-11 / 28 / 10 / 10
S2-12 / 54 / 57 / 57
S2-13 / 1 / 4 / 4
S2-14 / 60 / 31 / 31
S2-15 / 48 / 54 / 54
S2-16 / 57 / 25 / 25
S2-17 / 1 / 1 / 1
S2-18 / 32 / 35 / 35
S2-19 / 30 / 12 / 12
S2-2 / 33 / 26 / 26
S2-20 / 74 / 47 / 47
S2-21 / 62 / 24 / 24
S2-22 / 72 / 36 / 36
S2-23 / 45 / 23 / 23
S2-24 / 1 / 7 / 7
S2-25 / 26 / 12 / 12
S2-26 / 27 / 5 / 5
S2-27 / 42 / 24 / 24
S2-28 / 49 / 23 / 23
S2-29 / 1 / 4 / 4
S2-3 / 16 / 35 / 35
S2-30 / 1 / 3 / 3
S2-31 / 56 / 22 / 22
S2-32 / 51 / 21 / 21
S2-33 / 69 / 27 / 27
S2-34 / 29 / 6 / 6
S2-35 / 50 / 23 / 23
S2-36 / 64 / 10 / 10
S2-37 / 55 / 5 / 5
S2-38 / 25 / 5 / 5
S2-39 / 47 / 4 / 4
S2-4 / 38 / 23 / 23
S2-40 / 21 / 9 / 9
S2-41 / 61 / 19 / 19
S2-5 / 36 / 5 / 5
S2-6 / 1 / 1 / 1
S2-7 / 20 / 16 / 16
S2-8 / 65 / 15 / 15
S2-9 / 19 / 1 / 1
S3-1 / 73 / 11 / 11
S3-10 / 68 / 12 / 12
S3-11 / 70 / 14 / 14
S3-12 / 40 / 10 / 10
S3-13 / 1 / 3 / 3
S3-14 / 66 / 10 / 10
S3-15 / 71 / 4 / 4
S3-16 / 58 / 6 / 6
S3-2 / 43 / 6 / 6
S3-3 / 22 / 6 / 6
S3-4 / 52 / 3 / 3
S3-5 / 31 / 6 / 6
S3-6 / 75 / 8 / 8
S3-7 / 23 / 7 / 7
S3-8 / 67 / 6 / 6
S3-9 / 52 / 5 / 5
S4-1 / 17 / 1 / 1
S4-2 / 1 / 2 / 2
S4-3 / 59 / 1 / 1
S4-4 / 1 / 1 / 1
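
The counts quoted in the discussion can be checked directly against Table 5. The short sketch below lists the schools with rank 1 in Ranking A and in Ranking B (Ranking C is identical to B) and computes their overlap; the variable names are illustrative.

```python
# Schools with rank 1 in Table 5.
efficient_a = {"S1-1", "S1-11", "S1-12", "S1-3", "S1-8", "S2-10", "S2-13",
               "S2-17", "S2-24", "S2-29", "S2-30", "S2-6", "S3-13", "S4-2",
               "S4-4"}
efficient_b = {"S1-1", "S1-11", "S1-12", "S1-14", "S2-10", "S2-17", "S2-6",
               "S2-9", "S4-1", "S4-3", "S4-4"}

# Schools that are efficient in the deterministic and in the fuzzy models.
robust = sorted(efficient_a & efficient_b)

# Relative reduction in the number of efficient schools from model A to B.
reduction = (len(efficient_a) - len(efficient_b)) / len(efficient_a)

print(len(efficient_a), len(efficient_b), len(robust))
print(robust)
print(round(100 * reduction, 1))
```

The overlap consists of seven schools, and the reduction from fifteen to eleven efficient schools is 4/15, i.e. roughly a quarter.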


References:

[1] R.D. Banker, R. Morey, Efficiency analysis for exogenously fixed inputs and outputs, Operations Research, Vol. 34, No. 4, 1986, pp. 513-521.

[2] C.B. Chen, C.M. Klein, A simple approach to ranking a group of aggregated fuzzy utilities, IEEE Trans. Systems Man Cybernet. Part B: Cybernet., Vol. 27, 1997, pp. 26-35.

[3] M.A. Conceicao-Silva-Portela, E. Thanassoulis, Decomposing school and school-type efficiency, European Journal of Operational Research, Vol. 132, 2001, pp. 357-373.

[4] P. Guo, H. Tanaka, Fuzzy DEA: a perceptual evaluation method, Fuzzy Sets and Systems, Vol. 113, 2001, pp. 149-160.

[5] C. Kao, T. Liu, Fuzzy efficiency measures in Data Envelopment Analysis, Fuzzy Sets and Systems, Vol. 113, 2000, pp. 427-437.

[6] T. Leon, V. Liern, J.L. Ruiz, I. Sirvent, A fuzzy mathematical programming approach to the assessment of efficiency with DEA models, Fuzzy Sets and Systems, Vol. 139, 2003, pp. 407-419.

[7] S. Lertworasirikul, Fuzzy Data Envelopment Analysis for Supply Chain Modeling and Analysis, Dissertation Proposal in Industrial Engineering, North Carolina State University, 2001.

[8] T. Kirjavainen, H. Loikkanen, Efficiency differences of Finnish senior secondary schools: An application of DEA and Tobit analysis, Economics of Education Review, Vol. 17, 1998, pp. 377-394.

[9] J.K. Sengupta, A fuzzy system approach in Data Envelopment Analysis, Comput. Math. Appl., Vol. 24, 1992, pp. 259-266.

[10] H. Tanaka, H. Ichihashi, K. Asai, A formulation of fuzzy linear programming problem based on comparison of fuzzy numbers, Control and Cybernetics, Vol. 13, 1984, pp. 185-194.

[11] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems, Vol. 1, 1978, pp. 3-28.
