Driver Analysis Using Topics Derived From Unstructured Textual Data

Yong Zhang, Ph.D., Procter & Gamble

Presented at the 10th Annual BayesiaLab Conference on Wednesday, October 26, 2022.

Abstract

We investigated how to conduct driver analysis based on topics derived from unstructured textual data, such as online consumer reviews, ratings, complaints, comments, and survey verbatims. The major challenge is the high missing rate of topics in each individual document: a consumer review, for example, may mention just a few topics, leaving all the others missing. This leads to a high overall missing rate in the data. Because the explicit missing-data mechanism is unknown, BayesiaLab recommends using Approximate Dynamic Imputation (ADI) to impute the missing values. We performed simulations to study different methods of processing missing data and of performing driver and impact analysis. On simulated data, both complete and with values missing under a mixed missing-data mechanism, the Filtered State and ADI approaches tend to learn the same or very similar model structures, drivers, and impacts. At a low missing rate (~10%), the structures, drivers, and impacts are the same as those of the simulated Ground Truth (GT) Bayesian belief network (BBN) model; at a medium missing rate (40–60%), they also tend to be the same as or very similar to those of the GT BBN model, via equivalent model structures; at a high missing rate (80%), they tend to recover most of the correct structure, drivers, and impacts.
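The simulation design described above (complete data, then missing values injected under a mixed mechanism at low, medium, and high missing rates) can be sketched roughly as follows. This is not the author's or BayesiaLab's procedure: the 1 to 5 topic ratings, the 50/50 split between missing-completely-at-random (MCAR) and missing-not-at-random (MNAR) cells, the "lowest ratings go missing first" rule, and the names `inject_missing` and `missing_rate` are all hypothetical choices for illustration. The sketch only produces the masked data; it does not implement ADI or Filtered States, which are handled inside BayesiaLab.

```python
import random
from typing import List, Optional

Grid = List[List[Optional[int]]]


def inject_missing(data: List[List[int]], target_rate: float,
                   mcar_share: float = 0.5, seed: int = 0) -> Grid:
    """Hypothetical mixed mechanism: return a copy of `data` with cells set to None.

    A fraction `mcar_share` of the missing cells is chosen uniformly at random
    (MCAR); the rest are the lowest remaining ratings (an assumed MNAR rule).
    """
    if not 0.0 <= target_rate <= 1.0 or not 0.0 <= mcar_share <= 1.0:
        raise ValueError("target_rate and mcar_share must be in [0, 1]")
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    n_missing = round(target_rate * len(cells))
    n_mcar = round(mcar_share * n_missing)
    # MCAR part: a uniformly random subset of cells.
    rng.shuffle(cells)
    mcar = cells[:n_mcar]
    # MNAR part: among the remaining cells, the lowest values go missing first
    # (random tie-breaking), so missingness depends on the hidden value itself.
    rest = cells[n_mcar:]
    rest.sort(key=lambda c: (data[c[0]][c[1]], rng.random()))
    mnar = rest[:n_missing - n_mcar]
    masked: Grid = [list(row) for row in data]  # copy; the input is not mutated
    for i, j in mcar + mnar:
        masked[i][j] = None
    return masked


def missing_rate(grid: Grid) -> float:
    """Fraction of cells that are None."""
    values = [v for row in grid for v in row]
    return sum(v is None for v in values) / len(values)


# Usage: 500 simulated reviews x 8 topics, masked at low, medium, and high rates.
gen = random.Random(42)
complete = [[gen.randint(1, 5) for _ in range(8)] for _ in range(500)]
for rate in (0.1, 0.5, 0.8):
    print(rate, missing_rate(inject_missing(complete, rate)))
```

Mixing the two mechanisms is one simple way to mimic a situation where the true mechanism is unknown; the masked grids could then be exported and processed with each missing-value method being compared.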

About the Presenter

Dr. Yong Zhang leverages Bayesian data and modeling science to develop strategies for product design, manufacturing, storage, and transportation across P&G to improve consumers’ quality of life and positively influence the environment and society under different climate change scenarios. He develops first-principles and data science/machine learning methods and tools through Front-End Innovation projects to enable and promote the capability across P&G for breakthrough consumer understanding and product innovation. These methods and tools, based on nonparametric Bayesian statistics and deep learning algorithms, can be used to extract and integrate information from a variety of data sources to build a “Body of Evidence” for consumer and product research.






Bayesia USA

info@bayesia.us

Bayesia S.A.S.

info@bayesia.com

Bayesia Singapore

info@bayesia.com.sg

Copyright © 2024 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.