Forecasting, Censoring and Scalability for Social Networking Data




Marketers are increasingly interested in understanding how customers interact with each other and in predicting future interactions. Social networking data is often collected during a finite observation period, so some observed connections may not persist in the future, while unobserved connections may appear after observation has ended. Conditioning on the observed data without taking this uncertainty into account can therefore lead to misleading inferences and poor predictions. In addition, a dataset of N individuals contains N(N-1)/2 interdependent dyads, so standard models and methods are intractable for all but the smallest datasets. We present a nonparametric Bayesian framework for modeling censored network data that manages this scalability problem while accounting for interdependent variation in unobserved customer traits. Because draws from a Dirichlet process are discrete, many individuals share identical latent trait values; by exploiting this discreteness, we dramatically reduce the number of likelihood computations at each iteration of an MCMC algorithm. We demonstrate the need for, and effectiveness of, our model using a dataset of call records from a major Chinese cell phone service provider.
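
The counting argument behind the scalability claim can be illustrated with a short sketch. It is not the authors' algorithm: it assumes, purely for illustration, that each individual carries a cluster label (standing in for a shared Dirichlet process atom) and that a dyad's likelihood term depends on the individuals only through their pair of labels. The names `count_dyads`, `distinct_cluster_pairs`, `n`, and `k` are hypothetical.

```python
from itertools import combinations
import random


def count_dyads(n):
    """Number of unordered pairs among n individuals: n(n-1)/2."""
    return n * (n - 1) // 2


def distinct_cluster_pairs(labels):
    """Set of unordered label pairs that occur among all dyads.

    Under the illustrative assumption that a dyad's likelihood term depends
    only on the two cluster labels, this is the set of distinct terms that
    would need to be evaluated.
    """
    return {
        tuple(sorted((labels[i], labels[j])))
        for i, j in combinations(range(len(labels)), 2)
    }


# Hypothetical example: 200 individuals assigned at random to 5 clusters.
rng = random.Random(0)
n, k = 200, 5
labels = [rng.randrange(k) for _ in range(n)]

n_dyads = count_dyads(n)                       # grows quadratically in n
n_pairs = len(distinct_cluster_pairs(labels))  # at most k(k+1)/2

print(f"dyads: {n_dyads}, distinct cluster pairs: {n_pairs}")
```

With k clusters there are at most k(k+1)/2 distinct label pairs, regardless of N, which is why grouping dyads by shared latent values can cut the per-iteration work so sharply when k is much smaller than N.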

Contact information:
Dr. S. Puntoni