Ensemble Learning With Decision Trees



Recursive partitioning methods such as decision trees are non-parametric statistical techniques that can provide extremely flexible and interpretable predictive models for complex data sets. These methods work by recursively partitioning the feature space into sets of observations with similar response values, and the resulting classification or regression rules map naturally onto decision-making processes. As such, these methods are much more flexible than traditional linear or logistic regression models, automatically perform variable selection, and scale to large data sets. We will discuss the construction of such models with practical examples and consider the strengths and weaknesses of the method. In particular, these methods can suffer from high variance because of their close dependence on the training data. Popular ensemble approaches have therefore emerged, primarily in the machine learning literature, which combine sets of decision trees to overcome this problem. We will briefly discuss three of these approaches, namely bagging, boosting and random forests.
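As a rough illustration of the ideas above, the following pure-Python sketch fits a one-split regression tree (a "stump") by minimising squared error, and then applies bagging: each stump is trained on a bootstrap resample of the data and their predictions are averaged, which reduces the variance of any single tree. The function names (`fit_stump`, `bagged_stumps`) and the toy step-function data are illustrative assumptions, not part of the seminar material.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Fit a one-split regression stump: choose the threshold on x that
    minimises the total squared error of predicting each side's mean."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave observations on both sides
        ml, mr = statistics.mean(left), statistics.mean(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample (sampling with
    replacement) and average the ensemble's predictions."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

# Toy data: a step function. A single stump depends heavily on which
# observations it sees; averaging bootstrapped stumps smooths this out.
xs = [i / 10 for i in range(40)]
ys = [0.0 if x < 2.0 else 1.0 for x in xs]
model = bagged_stumps(xs, ys)
```

Boosting and random forests modify this recipe rather than replace it: boosting fits each new tree to the residual errors of the ensemble so far, while random forests additionally restrict each split to a random subset of the features to further decorrelate the trees.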

The ERIM PhD Seminar Series is dedicated to enhancing the methodological dialogue among PhD candidates and is organised by the ERIM PhD Council.