A case on Decision Tree

‘A rule-based technique to predict corporate bankruptcy’

Data description and input selection

For the experiment, we used yearly financial data collected by the Korea Credit Guarantee Fund. The corporations in the analysis belong to the manufacturing industry and have asset sizes between $1 million and $7 million. As usual, bankrupt corporations are far fewer than healthy ones, an imbalance that tends to prevent data mining techniques from learning anything useful about the minority class. To deal with this unbalanced distribution, we drew samples of equal size from both groups. The data consist of 944 bankrupt corporations and 944 healthy (non-bankrupt) corporations from fiscal years 1999 to 2002. Outliers were detected and eliminated. The variables were standardized to have a mean of zero and a standard deviation of one, which helps reduce measurement errors (Peel et al., 1986). Out of 83 variables in total, 54 were selected by a t-test as a preliminary screening, and 11 of those were finally selected by a stepwise logistic regression. Table 1 summarizes the variables used in the analysis and their definitions.
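
The preprocessing steps described above (equal-size sampling, standardization, and t-test screening) can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the Welch form of the t statistic is an assumption (the text does not say which t-test was used), and outlier removal and the stepwise logistic regression are omitted.

```python
import random
import statistics


def balanced_sample(bankrupt, healthy, seed=0):
    """Undersample the larger group so both groups have equal size."""
    rng = random.Random(seed)
    n = min(len(bankrupt), len(healthy))
    return rng.sample(bankrupt, n), rng.sample(healthy, n)


def standardize(values):
    """Rescale one variable to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def welch_t(a, b):
    """Two-sample t statistic for one variable across two groups.

    Assumption: Welch's unequal-variance form; the source only says 't-test'.
    """
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


if __name__ == "__main__":
    # Hypothetical toy values for a single financial ratio, not data from the study.
    bankrupt_ratio = [8.1, 7.4, 9.0, 6.8]
    healthy_ratio = [3.2, 4.1, 2.9, 3.8, 4.4, 3.0, 3.5]
    b, h = balanced_sample(bankrupt_ratio, healthy_ratio)
    z = standardize(b + h)
    print(welch_t(z[: len(b)], z[len(b):]))
```

In a screening step of this kind, a variable would be kept when the absolute value of its t statistic exceeds a chosen critical value; the threshold used in the study is not stated in the text.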

Table 1. List of financial variables selected

| Variable | Name | Definition |
|---|---|---|
| X13 | interest expenses to sales | (interest expenses / sales) × 100 |
| X17 | profit to sales | (profit / sales) × 100 |
| X24 | operating profit to sales | (operating profit / sales) × 100 |
| X27 | ordinary profit to total capital | (ordinary profit / total capital) × 100 |
| X28 | current liabilities to total capital | (current liabilities / total capital) × 100 |
| X103 | growth rate of tangible assets | (tangible assets at the end of the year / tangible assets at the beginning of the year × 100) - 100 |
| X108 | turnover of managerial assets | sales / {total assets - (construction in progress + investment assets)} |
|  | net financing cost | interest expenses - interest income |
| X127 | net working capital to total capital | {(current assets - current liabilities) / total capital} × 100 |
| X129 | growth rate of current assets | (current assets at the end of the year / current assets at the beginning of the year × 100) - 100 |
| X140 | ordinary income to net worth | (ordinary income / net worth) × 100 |
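
As a worked illustration of the definitions in Table 1, the sketch below computes four of the listed ratios from one firm's statement items. The dictionary keys and the example figures are hypothetical and are not taken from the Korea Credit Guarantee Fund data; only the formulas follow the table.

```python
def pct(numerator, denominator):
    """Ratio expressed as a percentage: (numerator / denominator) x 100."""
    return numerator / denominator * 100


def growth_rate(end_value, begin_value):
    """Growth rate in percent: (end / begin x 100) - 100."""
    return end_value / begin_value * 100 - 100


def financial_ratios(s):
    """Compute a few Table 1 variables from a dict of statement items.

    The key names (e.g. 'current_assets_begin') are illustrative assumptions.
    """
    return {
        "X13": pct(s["interest_expenses"], s["sales"]),
        "X24": pct(s["operating_profit"], s["sales"]),
        "X127": pct(s["current_assets"] - s["current_liabilities"], s["total_capital"]),
        "X129": growth_rate(s["current_assets"], s["current_assets_begin"]),
    }


if __name__ == "__main__":
    # Hypothetical firm, amounts in thousands of dollars.
    firm = {
        "sales": 2000.0,
        "interest_expenses": 50.0,
        "operating_profit": 120.0,
        "current_assets": 900.0,
        "current_assets_begin": 750.0,
        "current_liabilities": 600.0,
        "total_capital": 1500.0,
    }
    for name, value in financial_ratios(firm).items():
        print(name, round(value, 2))
```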

Figure 1. Decision tree from See5 run