Learning About Your Customers on a "Data Diet": Customer-Base Analysis Using Repeated Cross-Sectional Summary (RCSS) Data



Firms increasingly face the “data smog” problem: although rapid advances in information technology let them collect huge amounts of customer-activity data, they are unable to make meaningful use of most of it. We ask a critical question that many firms face today: can customer data be stored and analyzed in an easy-to-manage, scalable format without significantly compromising the inferences that can be made about customers' transaction activity? We address this question in the context of customer-base analysis. A number of researchers have developed customer-base analysis models that perform very well given individual-customer-level data. We explore the possibility of estimating these models using repeated cross-sectional summaries (RCSS) of the transaction data (e.g., four quarterly histograms). Such summaries are easy to create, visualize and distribute, irrespective of the size of the customer base. RCSS data have the added advantage that individual customers cannot be identified, which also makes them attractive from a privacy standpoint. We focus on the widely used Pareto/NBD model and carry out a comprehensive simulation study covering a broad spectrum of market scenarios. Our results consistently and convincingly establish that the model fit (and parameter values) obtained from three or four cross-sections of RCSS data can closely match the model fit (and parameter values) obtained from individual-level data. We also confirm the simulation results on a real dataset of online CD purchases.
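To make the idea of RCSS data concrete, here is a minimal Python sketch of how individual-level transaction records might be collapsed into quarterly histograms. It assumes (this is our illustration, not a detail given in the abstract) that each cross-section records, for every customer in the cohort, the number of transactions in that period, with high counts pooled into a right-censored top bin. The function name `rcss_histograms` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple


def rcss_histograms(
    transactions: Iterable[Tuple[Hashable, float]],
    customers: Sequence[Hashable],
    period_length: float,
    n_periods: int,
    max_bin: int,
) -> List[List[int]]:
    """Collapse individual transactions into repeated cross-sectional histograms.

    Hypothetical helper for illustration only.
    transactions: (customer_id, time) pairs, time measured from the start
        of the observation window (e.g., in weeks).
    customers: every customer in the cohort, including those with no
        transactions in a given period (they populate the zero bin).
    Returns one histogram per period: entry k counts customers with exactly
    k transactions, except the last entry, which pools "max_bin or more".
    """
    # Per-period transaction counts for each customer.
    per_period = [Counter() for _ in range(n_periods)]
    for cid, t in transactions:
        period = int(t // period_length)
        if 0 <= period < n_periods:  # ignore activity outside the window
            per_period[period][cid] += 1

    histograms = []
    for counts in per_period:
        hist = [0] * (max_bin + 1)
        for cid in customers:
            # Counter returns 0 for customers absent from this period.
            hist[min(counts[cid], max_bin)] += 1
        histograms.append(hist)
    return histograms


if __name__ == "__main__":
    # Three customers observed over four 13-week quarters; counts of 2+ are pooled.
    customers = ["a", "b", "c"]
    transactions = [("a", 1.0), ("a", 2.0), ("b", 14.0), ("a", 30.0),
                    ("c", 40.0), ("c", 41.0), ("c", 42.0)]
    print(rcss_histograms(transactions, customers,
                          period_length=13.0, n_periods=4, max_bin=2))
    # [[2, 0, 1], [2, 1, 0], [2, 1, 0], [2, 0, 1]]
```

Each output row sums to the cohort size, and no customer identifiers survive in it, which illustrates why such summaries are compact and privacy-friendly. Note that the zero bin can only be filled if the full cohort is known, not just the customers who happened to transact.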
Contact information:
Dr. G. Liberali
This research seminar is organised by the Erasmus Centre for Marketing of Innovation (ECMI).