Elicit-and-Augment: A Hybrid Bayesian Framework for Risk Modeling in Data-Scarce Construction Projects

Abstract
Construction projects are inherently uncertain, complex, and data-sparse, rendering conventional deterministic risk management (RM) frameworks insufficient for proactive and scalable decision-making. In response, this research introduces an integrated hybrid Bayesian framework that combines subjective, experience-based expert reasoning with sparse empirical project data to provide probabilistic, explainable, and generalizable risk predictions across large-scale infrastructure and construction projects.
Developed in collaboration with Politecnico di Milano, Jacobs Solutions, and the Garrick Institute of Risk Sciences (UCLA), the framework is applied to a real-world case study of 44 construction projects in Italy. The core model is an elicitation-based Bayesian Network (BN), in which expert judgment is systematically elicited to define the structure and conditional dependencies among key risk variables such as technical complexity, financial health, permitting delays, schedule volatility, and contractor reliability.
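To make the idea of an elicitation-based BN concrete, here is a minimal sketch with a small hypothetical graph (permitting delay and contractor reliability influence schedule volatility, which influences the risk level). The variables, graph, and all CPT numbers are illustrative stand-ins for expert judgments, not the study's model; inference is exact, by brute-force enumeration.

```python
from itertools import product

# Hypothetical expert-elicited priors (illustrative numbers, not from the study).
P_PERMIT_DELAY = {True: 0.3, False: 0.7}
P_LOW_RELIABILITY = {True: 0.2, False: 0.8}

# P(high schedule volatility | permitting delay, low contractor reliability).
P_VOLATILE = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}

# P(risk level | schedule volatility).
P_RISK = {
    True: {"low": 0.1, "medium": 0.3, "high": 0.6},
    False: {"low": 0.6, "medium": 0.3, "high": 0.1},
}

LEVELS = ("low", "medium", "high")


def joint(delay, unreliable, volatile, risk):
    """Probability of one full assignment, factorised along the DAG."""
    p_v = P_VOLATILE[(delay, unreliable)]
    return (
        P_PERMIT_DELAY[delay]
        * P_LOW_RELIABILITY[unreliable]
        * (p_v if volatile else 1.0 - p_v)
        * P_RISK[volatile][risk]
    )


def risk_posterior(evidence):
    """P(risk level | evidence) by enumerating all parent-node assignments.

    `evidence` maps any of "delay", "unreliable", "volatile" to a bool.
    """
    scores = {level: 0.0 for level in LEVELS}
    for delay, unreliable, volatile in product((True, False), repeat=3):
        state = {"delay": delay, "unreliable": unreliable, "volatile": volatile}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # assignment contradicts the observed evidence
        for level in LEVELS:
            scores[level] += joint(delay, unreliable, volatile, level)
    total = sum(scores.values())
    return {level: score / total for level, score in scores.items()}


print(risk_posterior({}))               # prior risk distribution
print(risk_posterior({"delay": True}))  # updated after observing a permitting delay
```

Observing a permitting delay raises the probability of high risk from about 0.26 to 0.43 under these assumed numbers; a real model would use a BN library and many more nodes, but the factorisation is the same.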
To overcome the limitations of the small project dataset and improve posterior estimation, the model integrates Generative Adversarial Networks (GANs) trained on the available tabular project database. This “elicit-and-augment” strategy combines probabilistic expert reasoning with synthetic data to mitigate overfitting and improve generalization. While GANs have predominantly been used in image processing, this study pioneers their application in structured tabular data augmentation for risk modeling, generating statistically coherent synthetic project cases that mirror the complex multivariate dependencies observed in real projects.
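One common way to combine an expert-elicited conditional probability table (CPT) with observed cases is a Dirichlet prior whose pseudo-counts come from the elicited probabilities. The sketch below assumes that scheme and adds a hypothetical down-weighting of synthetic cases. It does not implement the GAN itself: the synthetic counts are placeholders for cases a tabular GAN might generate, and the function name, equivalent sample size (`ess`), and `synthetic_weight` are illustrative assumptions, not the study's settings.

```python
LEVELS = ("low", "medium", "high")


def posterior_cpt_row(expert_probs, ess, real_counts, synthetic_counts, synthetic_weight=0.5):
    """Posterior mean of one CPT row, P(risk level | one parent configuration).

    Prior: Dirichlet(ess * expert_probs), i.e. the elicited judgment is worth
    `ess` virtual projects. Real cases add full counts; synthetic cases are
    down-weighted by `synthetic_weight` (a hypothetical design choice).
    """
    pseudo_counts = {
        level: ess * expert_probs[level]
        + real_counts.get(level, 0)
        + synthetic_weight * synthetic_counts.get(level, 0)
        for level in LEVELS
    }
    total = sum(pseudo_counts.values())
    return {level: count / total for level, count in pseudo_counts.items()}


# Hypothetical inputs for projects observed with high schedule volatility.
expert = {"low": 0.1, "medium": 0.3, "high": 0.6}
real = {"low": 1, "medium": 2, "high": 4}
synthetic = {"low": 4, "medium": 10, "high": 16}

print(posterior_cpt_row(expert, ess=10, real_counts=real, synthetic_counts=synthetic))
```

With these numbers the row becomes 0.125 / 0.3125 / 0.5625. Setting `synthetic_weight=0` recovers a plain expert-plus-real-data update, which makes it easy to check how much the augmented cases shift each estimate.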
A comparative evaluation was conducted to benchmark the performance of this Bayesian approach against alternative modeling paradigms:
- Deterministic machine learning models, including Artificial Neural Networks (ANN), Decision Trees, and XGBoost, were trained on the available objective project data;
- A Fuzzy Logic model was developed based solely on expert-derived qualitative inputs.
Each model’s ability to estimate project risk levels (low, medium, high) was recorded and compared across multiple scenarios. Results show that although deterministic models are effective when trained on rich datasets, they performed poorly under data sparsity and failed to capture causal dependencies. Fuzzy logic models, on the other hand, captured expert insights but lacked adaptability and inference depth. The Bayesian Network model outperformed all others, offering interpretable, scalable, and uncertainty-aware insights, and its classification accuracy increased by 18% after synthetic augmentation.
Furthermore, the framework integrates:
- Information-theoretic elicitation optimization, using expected value of information (EVI) to guide expert questioning;
- Scenario-based inference and counterfactual simulations, to evaluate the impact of targeted mitigation strategies (e.g., early permitting);
- Explainability tools, including influence-path analysis and node sensitivity diagnostics, supporting stakeholder transparency.
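As a minimal sketch of EVI-guided questioning, assuming a simple one-shot decision model, the block below ranks two hypothetical expert questions by the expected value of perfect information (EVPI), a standard upper bound on what any imperfect expert answer could be worth. The questions, probabilities, payoffs (negative costs in arbitrary units), and the function name are illustrative assumptions, not the framework's actual elicitation procedure.

```python
def expected_value_of_perfect_information(prior, utility, actions):
    """EVPI of learning a discrete uncertain variable before choosing an action.

    prior:   {state: probability}
    utility: {(action, state): payoff}, here negative costs in arbitrary units
    actions: iterable of available decisions
    """
    actions = tuple(actions)

    def expected_utility(action):
        return sum(p * utility[(action, s)] for s, p in prior.items())

    # Value of the best single action chosen under current uncertainty.
    value_without_info = max(expected_utility(a) for a in actions)
    # Value if the true state were revealed first: best action for each state.
    value_with_info = sum(
        p * max(utility[(a, s)] for a in actions) for s, p in prior.items()
    )
    return value_with_info - value_without_info


# Hypothetical candidate expert questions (illustrative numbers only).
QUESTIONS = {
    "Will permitting be delayed?": (
        {"yes": 0.3, "no": 0.7},
        {("early_permitting", "yes"): -20, ("early_permitting", "no"): -20,
         ("do_nothing", "yes"): -100, ("do_nothing", "no"): 0},
        ("early_permitting", "do_nothing"),
    ),
    "Will the contractor underperform?": (
        {"yes": 0.2, "no": 0.8},
        {("extra_supervision", "yes"): -15, ("extra_supervision", "no"): -15,
         ("do_nothing", "yes"): -40, ("do_nothing", "no"): 0},
        ("extra_supervision", "do_nothing"),
    ),
}

evpi = {q: expected_value_of_perfect_information(*spec) for q, spec in QUESTIONS.items()}
for question in sorted(evpi, key=evpi.get, reverse=True):
    print(f"{evpi[question]:6.2f}  {question}")
```

Under these assumed numbers the permitting question has the higher EVPI (14 vs. 5 cost units), so it would be asked first. A fuller treatment would model imperfect expert answers (expected value of sample information) and derive payoffs from the BN's posteriors rather than fixed tables.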
This work showcases how Bayesian inference, augmented with GAN-based data synthesis, can enable intelligent, data-efficient, and trustworthy risk management in capital project environments. It represents a replicable methodology for operationalizing probabilistic graphical models in domains where data fragmentation and epistemic uncertainty are structural barriers.
About the Presenter
Dr. Ania Khodabakhshian is a postdoctoral researcher and senior construction project manager with a multidisciplinary background spanning construction engineering, artificial intelligence, project management, and probabilistic risk modeling. She holds a Ph.D. in Architecture, Built Environment and Construction Engineering from Politecnico di Milano and has conducted visiting research at the UCLA Garrick Institute of Risk Sciences and the MIT Schwarzman College of Computing.
Her research focuses on integrating machine learning and probabilistic graphical models into risk management for complex construction, infrastructure, and energy systems. Her contributions include advances in Bayesian Networks, LLM-driven semantic analysis, generative AI for data augmentation, techno-economic analysis for clean energy solutions, energy retrofit policymaking, and the ethical governance of AI applications in industry.
Dr. Khodabakhshian has published extensively in high-impact, peer-reviewed journals such as the Journal of Building Engineering, Buildings, the Journal of Architectural Engineering, and ITcon, and has co-authored several academic books with major publishers, including Springer and Wiley. She has presented at numerous international conferences on construction and digital technologies and has served as a scientific committee member and reviewer for renowned conferences and journals in the construction domain. Her work has been cited over 150 times.
She has received multiple competitive awards and research grants from organizations including CIB World, the California Energy Commission, and the ENHANCE Alliance. She currently serves as a Senior Project Manager for data center developments at Lombardini22 and has collaborated with industry leaders such as Jacobs Solutions, CyrusOne, Terna, Develog, DEA Capital, Generali, and AQ Compute on applying AI to resilient infrastructure planning and delivery.