Treatment of Missing Data in Bayesian Structural Learning: A Simulation Study for Social Science

Presented at the 9th Annual BayesiaLab Conference on October 12, 2021.


Bayesian networks allow us to uniquely visualise data and tackle complex interdisciplinary problems. Bayesian networks are based on Bayes' theorem. The premise of this theory is that initial (prior) beliefs can be updated based on new evidence. Part of the appeal of this method is its intuitive nature. The process of updating beliefs, given new information, is common to everyday scenarios. Bayesian networks can be used for variable inference (identifying the value of variables), parameter inference (identifying probabilistic dependencies between variables), and structure learning (understanding associations among variables). Social science is an area with large amounts of complex interdisciplinary data where Bayesian networks may be useful to unravel relationships among variables. However, the uptake of Bayesian networks in social science is relatively low. Here, we look at how Bayesian networks have been applied to antibiotic resistance and antimicrobial use and explore potential barriers to their use in this field of study. The complex nature of this biosocial phenomenon means that applications are increasingly making use of social science data, e.g., survey data. This type of data is often associated with high levels of missing data. Here, we further consider how this missing data can be addressed for Bayesian network structure learning. We compare a commonly used method in social science, multiple imputation by chained equations (MICE), with one specific for Bayesian network learning, structural expectation-maximization (SEM). We simulate multiple incomplete data sets with different missingness mechanisms, numbers of categorical variables, and amounts of missing data. We evaluate and compare the performance of MICE and SEM in capturing the real Bayesian network structure under each condition. We find that applying either method (MICE or SEM) provides better structure recovery than doing nothing, and SEM, in general, outperforms MICE.
This finding is robust across missingness mechanisms, the number of variables, and the amount of missing data. This suggests that taking advantage of the additional information provided by network structure during SEM can improve the performance of Bayesian networks for social science and other interdisciplinary analyses.
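To make the idea of simulating incomplete data sets more concrete, the following is a minimal Python sketch of how missing values might be injected into complete categorical data. It is not the presenters' simulation code. The abstract does not name the missingness mechanisms it studied, so the sketch assumes two from the standard taxonomy: missing completely at random (MCAR) and missing at random (MAR). All function names, the toy data generator (independent variables, with no network structure), and the deletion rates are hypothetical.

```python
import random


def simulate_categorical(n_rows, n_vars, n_levels=3, seed=0):
    """Toy complete data set: independent categorical variables (assumption)."""
    rng = random.Random(seed)
    return [[rng.randrange(n_levels) for _ in range(n_vars)] for _ in range(n_rows)]


def add_mcar(data, rate, seed=1):
    """MCAR: every cell is deleted with the same probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in data]


def add_mar(data, rate, driver=0, seed=2):
    """MAR: deletion depends only on a fully observed `driver` column.

    Rows where the driver equals level 0 lose cells at twice the base rate,
    other rows at half of it. The driver column itself is never deleted.
    """
    rng = random.Random(seed)
    masked = []
    for row in data:
        p = min(1.0, 2 * rate) if row[driver] == 0 else rate / 2
        masked.append([v if j == driver or rng.random() >= p else None
                       for j, v in enumerate(row)])
    return masked


def missing_fraction(data):
    """Share of cells marked as missing (None)."""
    cells = [v for row in data for v in row]
    return sum(v is None for v in cells) / len(cells)


complete = simulate_categorical(n_rows=2000, n_vars=4)
mcar = add_mcar(complete, rate=0.2)
mar = add_mar(complete, rate=0.2)
print(round(missing_fraction(mcar), 3), round(missing_fraction(mar), 3))
```

In a study of the kind described above, incomplete data sets like these would be sampled from a known network rather than from independent variables, then handed to MICE or SEM so that the learned structure could be compared with the true one.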

Presentation Video

About the Presenters

Madeleine Clarkson
Irvine Building
University of St Andrews
St Andrews, KY16 9AL, Fife, UK
mcc23@st-andrews.ac.uk

Ms. Madeleine Clarkson has an undergraduate degree in Economics from the University of Cape Town, South Africa, and an MSc in the Control of Infectious Disease from the London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom. She has worked as a research assistant in infectious disease modeling at Imperial and LSHTM. She is currently undertaking a Ph.D. in Bayesian network analysis of antimicrobial resistance at the University of St Andrews, based within Dr. V Anne Smith's lab.

Xuejia Ke
Harold Mitchell Building
University of St Andrews
St Andrews, KY16 9TH, Fife, UK
xk5@st-andrews.ac.uk

Ms. Xuejia Ke has an undergraduate degree in Pharmacy from China Pharmaceutical University, China, and an undergraduate degree in Pharmacology and Biochemistry from the University of Strathclyde, United Kingdom. She has an MSc in Bioinformatics from the University of Edinburgh, United Kingdom. For her MSc project, she worked on statistical models and software for RNA-seq quantification from subcellular fractions. She is currently undertaking a Ph.D. in Bayesian network analysis of social science data at the University of St Andrews within Dr. V Anne Smith's lab.

Presentation Slides

Copyright © 2024 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.