The BayesiaLab Digest - January 15, 2015
Here is this week's selection of interesting new journal articles on applied research with Bayesian networks.
Benndorf, M., Kotter, E., Langer, M., Herda, C., Wu, Y., Burnside, E.S., 2015.
Development of an online, publicly accessible naive Bayesian decision support tool for mammographic mass lesions based on the American College of Radiology (ACR) BI-RADS lexicon.
Eur Radiol 1–8.
Objectives
To develop and validate a decision support tool for mammographic mass lesions based on a standardized descriptor terminology (BI-RADS lexicon) to reduce variability of practice.
Materials and methods
We used separate training data (1,276 lesions, 138 malignant) and validation data (1,177 lesions, 175 malignant). We created naïve Bayes (NB) classifiers from the training data with tenfold cross-validation. Our “inclusive model” comprised BI-RADS categories, BI-RADS descriptors, and age as predictive variables; our “descriptor model” comprised BI-RADS descriptors and age. The resulting NB classifiers were applied to the validation data. We evaluated and compared classifier performance with ROC-analysis.
Results
In the training data, the inclusive model yields an AUC of 0.959; the descriptor model yields an AUC of 0.910 (P < 0.001). The inclusive model is superior to the clinical performance (BI-RADS categories alone, P < 0.001); the descriptor model performs similarly. When applied to the validation data, the inclusive model yields an AUC of 0.935; the descriptor model yields an AUC of 0.876 (P < 0.001). Again, the inclusive model is superior to the clinical performance (P < 0.001); the descriptor model performs similarly.
Conclusion
We consider our classifier a step towards a more uniform interpretation of combinations of BI-RADS descriptors. We provide our classifier at www.ebm-radiology.com/nbmm/index.html.
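For readers who want to see the mechanics behind the abstract above, here is a minimal sketch of a categorical naive Bayes classifier evaluated with tenfold cross-validation and ROC AUC. It is not the authors' implementation. The descriptor values, the synthetic data generator, the Laplace smoothing, and all function names are assumptions made purely for illustration, and the numbers it prints say nothing about the published results.

```python
import math
import random
from collections import defaultdict

# Hypothetical descriptor domains, loosely modelled on BI-RADS vocabulary.
DOMAINS = [
    ("oval", "round", "irregular"),                 # mass shape
    ("circumscribed", "indistinct", "spiculated"),  # mass margin
    ("<50", "50-69", "70+"),                        # age band
]

def make_synthetic(n, seed=0):
    """Generate made-up lesions; label 1 = malignant, 0 = benign."""
    rng = random.Random(seed)
    weights = {  # assumed sampling weights per class and feature
        0: [(5, 4, 1), (7, 2, 1), (4, 4, 2)],
        1: [(1, 1, 8), (1, 2, 7), (2, 4, 4)],
    }
    rows, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.15 else 0
        row = tuple(rng.choices(dom, weights=w)[0]
                    for dom, w in zip(DOMAINS, weights[y]))
        rows.append(row)
        labels.append(y)
    return rows, labels

def train_nb(rows, labels, alpha=1.0):
    """Count class and (class, feature, value) frequencies."""
    class_count = {0: 0, 1: 0}
    counts = defaultdict(float)
    for row, y in zip(rows, labels):
        class_count[y] += 1
        for j, v in enumerate(row):
            counts[(y, j, v)] += 1
    return class_count, counts, alpha

def predict_proba(model, row):
    """Posterior probability of class 1, with Laplace smoothing."""
    class_count, counts, alpha = model
    n = class_count[0] + class_count[1]
    log_post = {}
    for c in (0, 1):
        lp = math.log((class_count[c] + alpha) / (n + 2 * alpha))
        for j, v in enumerate(row):
            num = counts.get((c, j, v), 0.0) + alpha
            den = class_count[c] + alpha * len(DOMAINS[j])
            lp += math.log(num / den)
        log_post[c] = lp
    m = max(log_post.values())  # subtract the max for numerical stability
    return math.exp(log_post[1] - m) / sum(math.exp(v - m) for v in log_post.values())

def cross_val_scores(rows, labels, k=10, seed=0):
    """Score every case with a model trained on the other k-1 folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(rows)
    for fold in range(k):
        test = set(idx[fold::k])
        train = [i for i in idx if i not in test]
        model = train_nb([rows[i] for i in train], [labels[i] for i in train])
        for i in test:
            scores[i] = predict_proba(model, rows[i])
    return scores

def auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rows, labels = make_synthetic(400, seed=1)
cv_scores = cross_val_scores(rows, labels, k=10)
print(f"Cross-validated AUC on synthetic data: {auc(cv_scores, labels):.3f}")
```

The pairwise AUC computation is quadratic in the number of cases, which is acceptable at this scale; a rank-based formula would be preferable for larger datasets.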
McVittie, A., Norton, L., Martin-Ortega, J., Siameti, I., Glenk, K., Aalders, I., 2015.
Operationalizing an ecosystem services-based approach using Bayesian Belief Networks: An application to riparian buffer strips.
Ecological Economics 110, 15–27.
The interface between terrestrial and aquatic ecosystems contributes to the provision of key ecosystem services including improved water quality and reduced flood risk. We develop an ecological–economic model using a Bayesian Belief Network (BBN) to assess and value the delivery of ecosystem services from riparian buffer strips. By capturing the interactions underlying ecosystem processes and the delivery of services we aim to further the operationalization of ecosystem services approaches. The model is developed through outlining the underlying ecological processes which deliver ecosystem services. Alternative management options and regional locations are used for sensitivity analysis. We identify optimal management options but reveal relatively small differences between impacts of different management options. We discuss key issues raised as a result of the probabilistic nature of the BBN model. Uncertainty over outcomes has implications for the approach to valuation particularly where preferences might exhibit non-linearities or thresholds. The interaction between probabilistic outcomes and the statistical nature of valuation estimates suggests the need for further exploration of sensitivity in such models. Although the BBN is a promising participatory decision support tool, there remains a need to understand the trade-off between realism, precision and the benefits of developing joint understanding of the decision context.
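To make the idea of valuing probabilistic outcomes under alternative management options concrete, the following is a small sketch of a three-node discrete Bayesian network evaluated by enumeration. Every node, state, probability, monetary value, and cost in it is a hypothetical placeholder; none is taken from the paper, whose model is considerably richer.

```python
# Hypothetical network: Rainfall -> WaterQuality <- Management.
# All numbers below are invented for illustration only.
P_RAIN = {"low": 0.6, "high": 0.4}

# P(water quality = good | management option, rainfall)
P_GOOD_WQ = {
    ("none", "low"): 0.50, ("none", "high"): 0.30,
    ("narrow", "low"): 0.60, ("narrow", "high"): 0.40,
    ("wide", "low"): 0.70, ("wide", "high"): 0.50,
}

VALUE = {"good": 100.0, "poor": 20.0}               # assumed benefit per state
COST = {"none": 0.0, "narrow": 5.0, "wide": 12.0}   # assumed option cost

def p_good(management):
    """Marginalize rainfall out to get P(water quality = good | management)."""
    return sum(P_RAIN[r] * P_GOOD_WQ[(management, r)] for r in P_RAIN)

def expected_net_value(management):
    """Expected benefit of the water-quality outcome minus the option cost."""
    pg = p_good(management)
    return pg * VALUE["good"] + (1.0 - pg) * VALUE["poor"] - COST[management]

def posterior_rain(management, quality):
    """Bayes' rule: P(rainfall | observed water quality, management)."""
    joint = {}
    for r, pr in P_RAIN.items():
        pg = P_GOOD_WQ[(management, r)]
        joint[r] = pr * (pg if quality == "good" else 1.0 - pg)
    z = sum(joint.values())
    return {r: w / z for r, w in joint.items()}

def best_option():
    """Management option with the highest expected net value."""
    return max(COST, key=expected_net_value)

for option in COST:
    print(option, round(p_good(option), 3), round(expected_net_value(option), 2))
print("best:", best_option())
```

With these made-up inputs the options differ by only a few units of expected net value, which mirrors the kind of small margin between management options that the abstract mentions and shows why sensitivity to the probability tables and valuation estimates matters.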
Rancoita, P.M.V., Zaffalon, M., Zucca, E., Bertoni, F., de Campos, C.P., n.d.
Bayesian network data imputation with application to survival tree analysis.
Computational Statistics & Data Analysis.
Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way of handling the missingness consists of discarding from the analysis patients with missing covariates, further reducing the sample size. Alternatively, if the mechanism that generated the missing data allows it, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods to deal with complete data later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists of modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied to the completed dataset. The Bayesian network is directly learned from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is due to the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
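As a rough illustration of imputing with a Bayesian network learned from incomplete data, the sketch below runs plain parameter EM on a fixed two-node network X -> Y and then fills each gap with the most probable completion. This is a deliberate simplification and not the authors' method: the paper uses structural EM with an exact anytime maximization step, whereas here the structure is assumed known, the variables are binary, and the records are invented.

```python
from itertools import product

STATES = (0, 1)  # both hypothetical variables are binary

def completions(record, px, py_x):
    """Posterior weight of each full (x, y) assignment consistent with record.

    A value of None marks a missing entry.
    """
    x_obs, y_obs = record
    cands = [(x, y) for x, y in product(STATES, STATES)
             if x_obs in (None, x) and y_obs in (None, y)]
    joint = {c: px[c[0]] * py_x[c[0]][c[1]] for c in cands}
    z = sum(joint.values())
    return {c: w / z for c, w in joint.items()}

def em(records, iters=50):
    """Estimate P(X) and P(Y | X) from incomplete records via EM."""
    px = {0: 0.5, 1: 0.5}
    py_x = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
    for _ in range(iters):
        # E-step: expected counts, with a tiny pseudo-count to avoid zeros.
        counts = {c: 1e-6 for c in product(STATES, STATES)}
        for rec in records:
            for c, w in completions(rec, px, py_x).items():
                counts[c] += w
        # M-step: renormalize expected counts into probabilities.
        total = sum(counts.values())
        row = {x: counts[(x, 0)] + counts[(x, 1)] for x in STATES}
        px = {x: row[x] / total for x in STATES}
        py_x = {x: {y: counts[(x, y)] / row[x] for y in STATES} for x in STATES}
    return px, py_x

def impute(record, px, py_x):
    """Fill missing entries with the most probable completion."""
    post = completions(record, px, py_x)
    return max(post, key=post.get)

# Invented records: X and Y mostly agree; some entries are missing.
records = ([(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 1)] * 40 + [(1, 0)] * 10
           + [(0, None)] * 10 + [(None, 1)] * 10)
px, py_x = em(records)
completed = [impute(r, px, py_x) for r in records]
print(px, py_x)
```

A completed dataset such as `completed` could then be handed to any downstream method that requires complete cases, which is the role the survival tree plays in the paper.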