Recruitment paused

Data science for the public sector



This project focuses on developing and applying novel non-parametric methods to explore large data sets. In particular, we aim to develop methods that can be used both to summarize relationships and, if appropriate, to predict relevant events. The focus will be on new algorithmic approaches that require no distributional assumptions, and on their application in the public sector.

Specific problems that often arise in “big data” prediction and exploration settings, and that this study will address, include:

  • Sample selection and balance problems (e.g., many prediction problems concern relatively “rare” events, for which non-random samples are often taken and used as input for prediction models);
  • Automated variable selection procedures (e.g., which variables, or combinations thereof, are relevant for predicting and/or understanding certain events);
  • Time dependence and dynamics of the problems (i.e., in quickly changing environments, models need to be updated frequently or even continuously, yet many advanced prediction methods are computationally intensive, making it impossible to provide fast and accurate results);
  • Interpretation of results. Machine learning methods place a heavy emphasis on prediction performance, so interpreting their results is often not trivial. Applications in the public sector, however, can have far-reaching societal impact. Understanding the roles of leading indicators and/or underlying interactions is therefore of great importance.


Big data, machine learning, unsupervised learning, supervised learning, dimension reduction, cluster analysis and least-squares methods.


Development of methodology to summarize relationships and, if appropriate, predict relevant events in the public sector.

Supervisory Team

Patrick Groenen
Professor of Statistics
  • Promotor
Michel van de Velden
Associate Professor of Statistics
  • Daily Supervisor