Processing of Erroneous and Unsafe Data
Defended on Thursday, 19 June 2003

Statistical offices have to overcome many problems before they can publish reliable data. This thesis examines two of them. The first is the occurrence of errors in the collected data. Because of these errors, publication figures cannot be based directly on the collected data; the errors first have to be localised and corrected. We focus on the localisation of errors in a mix of categorical and numerical data. The problem is formulated as a mathematical optimisation problem. Several new algorithms for solving it are proposed, and computational results for the most promising algorithms are compared. The second problem is the occurrence of unsafe data, i.e. data that would reveal too much sensitive information about individual respondents. Such data need to be protected before publication. The thesis examines various aspects of the protection of unsafe data.

Keywords

Branch-and-bound algorithm, cell suppression, cutting plane algorithm, error localisation, imputation, mixed integer programming, local suppression, optimisation, statistics, statistical data editing, statistical disclosure control, vertex generation

