Uncovering the Path to Purchase Using Topic Models



When gathering information for an intended purchase, consumers submit search phrases to online search engines. These search phrases express consumers’ needs in their own words and thus provide valuable information to marketing managers. Interpreting the phrases yields a better understanding of consumers’ purchase intentions, which is critical for marketing success. In this paper, we develop a model that connects the latent topics embedded in consumers’ search phrases to their website visits and purchase decisions. Our model captures the dynamics and heterogeneity in the latent topics that consumers search along the path to purchase. In doing so, we apply topic models, which have traditionally been used to analyze long text documents, to short search phrases. Using a unique dataset provided by a hospitality firm and containing more than 8,000 search phrases submitted by consumers, our model identifies five latent topics, “loyalty”, “convenience”, “luxury”, “economy”, and “location”, underlying the searches that led consumers to the firm’s website. Compared with a model that uses existing semantic heuristics such as Latent Dirichlet Allocation, or with a model that uses none of the textual information in consumers’ search phrases, our model provides a better evaluation of a consumer’s position on the path to purchase and achieves substantially higher predictive accuracy in five-fold cross-validation. We also extend the discussion to aggregator websites and to the segments of consumers who respond to the firm’s ads. Marketing managers can use our method to extract structured information from consumers’ search phrases and to better design offerings and promotions that target the right consumers.