1 of 18

2018 BayesiaLab Conference in Chicago

Hilton Chicago/Magnificent Mile, November 1–2, 2018

The 6th Annual BayesiaLab Conference

Conference Photo Gallery

Conference Presentations

Introducing BayesiaLab 8

Dr. Lionel Jouffe, CEO, Bayesia S.A.S.

Presentation Video

About the Presenter

Dr. Lionel Jouffe is co-founder and CEO of France-based Bayesia S.A.S. Lionel holds a Ph.D. in Computer Science from the University of Rennes and has worked in Artificial Intelligence since the early 1990s. While working as a Professor/Researcher at ESIEA, Lionel started exploring the potential of Bayesian networks.
After co-founding Bayesia in 2001, he and his team have been working full-time on the development of BayesiaLab. Since then, BayesiaLab has emerged as the leading software package for knowledge discovery, data mining, and knowledge modeling using Bayesian networks. It enjoys broad acceptance in academic communities, business, and industry.

Understanding Product and Consumer Segments

Yong Zhang, Ph.D., Procter & Gamble

Abstract

Driver models built on individual products can be biased or even misleading. We model and analyze a single product or a consumer segment by “borrowing” information from other product legs or similar studies. Specifically, we leverage Impact Analysis, Landscape Analysis, and Profile Analysis to sharpen our understanding of a single product leg or consumer segment with a relatively small base size. These new analysis methods have been added as new features into BayesiaLab 8.0.

Presentation Video

Presentation Slides

About the Presenter

Dr. Yong Zhang leverages Bayesian data and modeling science to develop a product design, manufacturing, storage, and transportation strategy across P&G to improve consumers’ life quality and drive positive influence on the environment and society under different climate change scenarios. He develops modeling and simulation methods and tools through Front End Innovation projects to enable and promote the capability across P&G for breakthrough consumer understanding and product innovation. The methods and tools can be used to extract and integrate information from a variety of data sources to find a “Body of Evidence” for consumer and product research based on Nonparametric Bayesian statistics and deep learning algorithms.

Crowdsourcing Bayesian Networks with Prediction Markets

Charles Twardy, Ph.D., KeyW Corporation & George Mason University

Abstract

In this talk, we look at how the SciCast system uses prediction markets to efficiently crowdsource both the structure and parameters of a Bayesian network. Prediction markets are among the most accurate methods to combine forecasts; forecasters form a consensus probability distribution by trading contingent securities. A combinatorial prediction market forms a consensus joint distribution over many related events by allowing conditional trades or trades on Boolean combinations of events. In general, that is infeasible, but using a Bayesian network underneath makes it tractable for large problems. As a bonus, the market solves some difficulties with Bayes net elicitation, including knowledge bottlenecks, divided or overlapping expertise, and some impossibility theorems that affect other aggregation methods. We have a formal paper in press in J. Artificial Intelligence Research.

Presentation Video

Presentation Slides

About the Presenter

Dr. Charles R. Twardy is a Senior Data Scientist with KeyW Corporation and an affiliate professor at George Mason University. From 2011 to 2015, he led the George Mason DAGGRE and SciCast teams for the IARPA ACE and ForeST forecasting challenges. The final result was a novel Bayesian prediction market that allowed users to make forecasts conditional on the outcomes of other forecasts at scale on the fly. Forecast accuracy improved by about 35% over baseline, with much greater expressivity.

From 2015 to 2016, as a Senior Data Scientist for NTVI Federal, he helped the Defense Suicide Prevention Office establish procedures for handling and analyzing sensitive personal data and provided sound approaches to hot-spot detection. In late 2016, he joined Sotera (now KeyW), where he worked on two DARPA big data programs and on IARPA CREATE, a successor to IARPA ACE.

Dr. Twardy focuses on evidence and inference, with a special interest in causal models, Bayesian networks, Bayesian search theory (especially wilderness search), and information theory. He has also worked on argument mapping, trajectory clustering, sensor-selection, counter-IED models, source credibility, image recognition, environmental decision-making, and epidemiology.

He received a dual Ph.D. in Cognitive Science and History & Philosophy of Science from Indiana University and a B.A. from the Interdisciplinary Majors program at the University of Virginia.

Using Bayesian Networks to Characterize Wildlife Habitat Use

Steven F. Wilson, Ph.D., Standpoint Decision Support, Inc.

Abstract

Using trail cameras to capture images of wildlife is becoming an increasingly important management tool. Improving technology is reducing cost, improving reliability, and allowing the deployment of cameras in large numbers in remote locations. Images can provide important information regarding the distribution and abundance of different species, but resulting data rarely meet parametric assumptions, and analyses can be challenging. I used Bayesian networks to characterize habitat use by caribou, wolves, moose, and black bears in a study area in northern British Columbia, Canada, using data collected at 85 remote cameras. I first used unsupervised learning to generate a network that described the habitat around each camera based on a large number of field-measured variables and then used clustering to identify latent factors that represented higher-level habitat features. I modeled habitat use by the four species based on these latent factors as well as snow conditions. Results provide insight into the partitioning of available habitat by the different wildlife species and how habitats can be managed to meet conservation goals.

Presentation Video

About the Presenter

Steven F. Wilson, Ph.D., EcoLogic Research, 302-99 Chapel Street, Nanaimo, BC V9R 5H3, Canada, steven.wilson@ecologicresearch.ca

Steve Wilson has 30 years of experience working at technical and professional levels in strategic and operational planning for wildlife and other ecological values. He specializes in quantitative approaches to decision support and policy analysis. Steve holds a Ph.D. in wildlife ecology from the University of British Columbia in Vancouver.

Bayesian Network Modeling of Imagery Features

Nicholas V. Scott, Ph.D., Riverside Research: Bayesian Network Modeling of Imagery Features From Direct Numerically Simulated Turbulent Sediment-Laden Oscillatory Flow

Abstract

Direct numerically simulated data can serve as a proxy for understanding many issues concerning multidimensional remotely sensed data. As a step towards performing operational Bayesian belief network modeling for rivers, which is of practical utility to naval intelligence, direct numerically simulated sediment-laden oscillatory flow is used to estimate statistical surface layer spatial eddy scales. This is done using spatial realizations of the sediment concentration, vertical velocity, and pressure fields, along with feature extraction algorithms that utilize self-organizing mapping, independent component analysis, and two-dimensional omnidirectional Morlet wavelet analysis. Stress versus scale statistical distributions exhibit distinct phase modulation over the three ambient forcing phases of maximum negative, zero, and maximum positive velocity. The stress versus sediment concentration scale distribution, which is of great pertinence to riverine remote sensing, exhibits a significant amount of large eddy scales, suggesting coherent large-scale sediment structure formation, possibly due to particle interstitial forces. Estimated statistical results, in turn, serve as feature parameters for naïve Bayesian belief network modeling of bottom boundary layer stress and surface eddy scale observations. From a diagnostic reasoning viewpoint, initial results suggest that robustly inferring sub-surface boundary layer stress from surface sediment concentration eddy scales uniquely may be a difficult task. From a prognostic reasoning viewpoint, preliminary model results suggest that large sediment concentration eddy scales may result from the application of large positive Reynolds stress. The model formalism used allows for the ability to statistically characterize flow structure at depth from observations taken across a surface boundary layer. This makes the results relevant to image analysis at the air-sea interfacial boundary layer in large-scale coastal and riverine systems.

Presentation Video

Authors

Nicholas V. Scott, Ph.D. Riverside Research Institute, Dayton Research Center

Tian-Jian Hsu, Ph.D. University of Delaware, Center for Applied Coastal Research

About the Presenter

Dr. Nicholas Scott has been a member of the professional staff at Riverside Research in Dayton, OH, since October 2012, working predominantly in the area of hyperspectral and multispectral image analysis. He investigates the applicability of traditional and non-traditional signal and image processing techniques to the extraction of information from remotely sensed imagery. His present work includes cognitive modeling of geo-intelligence information and the application of pattern recognition techniques to turbulent flow imagery. He is also involved in the application of probabilistic graphical modeling algorithms for information fusion and statistical inference.

Methicillin-Resistant Staphylococcus Aureus in Children Living with Industrial Hog Operation Workers

Jacqueline MacDonald Gibson, Ph.D., University of North Carolina, Chapel Hill

Abstract

Since the first documented transmission of antibiotic-resistant Staphylococcus aureus from hogs to a Dutch child in 2004, evidence of such hog-to-human transfer has mounted. However, the factors contributing to transmission risk remain poorly understood. Using empirical data collected from 198 children living with workers employed by North Carolina industrial hog operations, we developed the first Bayesian network models quantifying transmission of methicillin-resistant S. aureus (MRSA) from industrial hog operation workers to children living in the same household. Multiple learning algorithms were tested for variable selection, and the augmented naïve Bayes algorithm was then used to learn a network from the resulting variables. Network performance in predicting children’s carriage of MRSA was evaluated through 10 runs of five-fold cross-validation. The network with the highest area under the receiver-operating characteristic (AUROC) curve was selected as the final model, with other networks used for sensitivity analysis. Overall, 14% of the children living with hog farm workers carried MRSA. The best-performing network maintained an AUROC above 0.90 during cross-validation. The network revealed that the variables with the most influence on the risk of MRSA transmission to children living with hog farm workers include workers having direct contact with hog manure, workers bringing home face masks and work suits, and the type of health insurance available to the child. The results can be used to design intervention programs to prevent the spread of MRSA from hog farms to human populations.

Presentation Video

About the Presenter

Jackie MacDonald Gibson has a multi-disciplinary background in mathematics and engineering that she applies to risk assessment and policy problems. Before joining the University of North Carolina faculty, she was Associate Director of the Water Science and Technology Board, U.S. National Research Council. She was also a Senior Engineer at the RAND Corp. She holds Ph.D. degrees in Engineering and Public Policy and Civil and Environmental Engineering from Carnegie Mellon University; an M.S. in Civil and Environmental Engineering from the University of Illinois at Urbana–Champaign; and a B.A. in mathematics from Bryn Mawr College.

An Application of Bayesian Networks to Yield Prediction in Portuguese Viticulture

Louis Sharrock, MSc, Imperial College London

Abstract

The development of robust models for wine grape yield prediction is of clear and growing relevance to viticulture and the winemaking sector. A precise, timely, and dynamic forecast of annual yield enables vineyards to optimise planning and management of all seasonal activities, including vineyard intervention dates, irrigation, nutrient management, and post-harvest procedures. Furthermore, yield is principally determined by atmospheric factors, in which context the provision of relevant and accessible predictions can improve the capacity of agriculture to manage the risks and exploit the opportunities that will result from future climate change. Conventional crop forecasting systems have been either statistical (regression-based) or dynamical (computer simulation-based) in their formulation. The former are typically limited by an inability to capture the complex interactions between variables relevant to yield, and the latter by their computational complexity and the need for lengthy calibration.

In this thesis, we propose a novel Bayesian network approach to yield prediction in the Douro Demarcated Region of Portugal, which integrates our prior knowledge of the yield formation process with several data-driven statistical learning procedures. While the need for further model refinement is evident, initial results indicate that our model has relatively high predictive skill and is largely successful in replicating well-established physiological relationships between particular climatic variables and wine grape yield. Sensitivity analyses identify a set of variables known to be particularly significant for yield in this region whilst also providing tentative evidence in support of previously hypothesised relationships between global warming and the advancement of grapevine phenology.

Presentation Video

About the Presenter

Louis Sharrock is a PhD student at Imperial College London, aligned with the Centre for Doctoral Training in the Mathematics of Planet Earth. His current work focuses on large-scale inference in state space models with applications to environmental monitoring. He previously completed a BA in Mathematics at the University of Cambridge and an MSc in Statistics at Imperial College, with a thesis based on a novel application of Bayesian Networks in viticulture.

Synthesis of Causal Discovery and Machine Learning

Robert Stoddard, Software Engineering Institute, Carnegie Mellon University

Abstract

Fundamental research at the Software Engineering Institute at Carnegie Mellon University has raised questions surrounding the synthesis of both causal discovery and machine learning. Specifically, our research team has employed both the CMU open-source tool called Tetrad (for causal graph discovery from data) and BayesiaLab for supervised/unsupervised machine learning. This talk will briefly orient the audience to the Tetrad causal discovery process, share some contrasting results, and pose a list of open research questions regarding the potential synergy of the two technologies.

Presentation Video

Presentation Slides

Bayesian-Neural Networks Ensemble Modeling: An Initial Experiment

Robert “Mick” McWilliams, Ph.D., Phoenix VII Predictive Analytics

Abstract

Neural networks can achieve strong predictive accuracy in situations in which predictor variables exhibit nonlinear relationships with the targeted outcome of the “dependent” variable. However, a significant drawback of neural networks is their "black box" nature. It can be difficult to impossible to understand the “why” behind a neural network’s predictions. The goal of "explainable machine learning" is to address this black box challenge. This presentation reports results from an initial experiment that uses the often-explored Sonar data set to test whether a synergetic combination of BayesiaLab and neural network modeling might be able to address the explainable machine learning challenge while possibly also increasing the accuracy of predictive models.

Presentation Video

About the Presenter

Robert “Mick” McWilliams is President & Principal Consultant at Phoenix VII Predictive Analytics. Phoenix VII provides advanced machine learning and predictive analytics services to an array of marketing research industry clients and specializes in Bayesian network modeling using the BayesiaLab platform. Mick earned his Ph.D. in Research Sociology from Virginia Polytechnic Institute & State University. Prior to founding Phoenix VII, he held senior VP-level data scientist roles for three prominent marketing research firms.

Optimizing Mix of Local Media & Promotional Budgets

Neeraj Kulkarni, CIEK Solutions

Abstract

Our client, a US top 10 optical retailer, was spending significant dollars on promotion and media budgets in local markets in pursuit of aggressive growth plans in the U.S. As plans were being made to expand distribution to new markets, senior executives required greater decision support on how much to budget for promotion versus media across local markets.

Some of the key questions they were trying to solve:

Should less money be spent on promotion and more on brand media advertising to generate growth in local markets?
Which promotions and media have historically been high contributors to sales across all markets?
How can media and promotional dollars be optimized locally across the top 10 key markets?
What is the ROI on media and promotional spending, and how can it be maximized going forward to meet sales objectives?

The presentation highlights the use of Bayesialab as the primary software for understanding causal relationships among media and promotions at the local market level. We also used additional open-source programming software like R/python for data aggregation and running scenarios to optimize promotional and media budgets.

Presentation Video

About the Presenter

In 2015, Neeraj founded CIEK, where he is currently President. He leads all facets of the company that help clients optimize marketing investments by integrating experiential knowledge with historical data to create actionable, predictive strategies and predict business outcomes with certainty. In 2017, CIEK was also recognized as one of the rising innovative companies in the VA/DC area. Prior to starting CIEK, Neeraj led analytics and data strategy for powerhouse advertising agencies like The Martin Agency and Havas. He has also led major strategic domestic and international engagements with numerous Fortune 500 companies, including retail, CPG, transportation services, credit and financial services, and travel and hospitality. He is also on the board of two stealth start-ups in media and marketing technology.

Modern Approaches to Causal Modelling in Customer Experience Measurement

Brooks Gard, Course5 Intelligence

Abstract

Traditionally, statistical techniques such as correlation analysis and linear regression were used to perform key driver analysis on survey data, where a set of attributes and outcome variables such as overall satisfaction or likelihood to purchase are rated using a scale-based question. However, survey data, particularly scale-type questions, sometimes introduce peculiar data conditions that are not accommodated by traditional approaches.

Presently, due to the availability of high computing power and advanced software, it is possible for us to use more robust statistical techniques, such as Bayesian methods, which allow for the development of robust driver models. In our presentation, we will showcase how we applied Bayesian technologies to help a major utility company solve complex causal modeling of customer experience management, providing required diagnostics on the model to obtain actionable insights to optimize the customer experience.

Presentation Video

Meta-Modeling with Bayesian Networks to Facilitate Intelligent Use of Engineering Simulation

Zack Xuereb Conti, Harvard University Graduate School of Design & Singapore University of Technology and Design

Abstract

Abstract For decades, engineers have utilised engineering simulation tools such as finite element analysis to aid consulting architects on how proposed building designs are likely to behave before they are constructed. More recently, the emergence of computational tools in architecture, together with faster simulation algorithms, are enabling architects in the early stages to explore and evaluate a larger variety of design options by searching iteratively through a so-called ‘design space’. Design space in our context is a multidimensional mathematical space typically bound by the simulation inputs and outputs. When the number of variables defining the design space is more than a handful, it becomes cognitively challenging to draw meaningful inferences on how the input design variables are influencing the simulation response. Consequently, using simulation blindly leads to a shallow understanding of the design space.

In response, we adopt Bayesian networks to compress input/output simulation data into a simulation metamodel whose underlying relationships can be explored. Simulation metamodels are widely used in fields such as aerospace and automotive engineering for quick response prediction. However, most metamodels are typically formulated as forwarding mathematical functions (inputs to output) whose mapping remains difficult to infer global knowledge from when studying multiple variables. Bayesian networks, on the other hand, do not distinguish between inputs and outputs. Thus, the influence between design variables and simulation response can be explored bi-directionally to reveal the important dependencies driving the engineering response over the entire distribution of sampled data points in the design space. Through an applied case study involving structural design, we will illustrate how designers may utilise a Bayesian network metamodel to reveal valuable insight into practice.

Presentation Video

Bayesian Sense-Making in Data Science

Michael Thompson, Ph.D., Procter & Gamble

Presentation Video

Presentation Slides

Decision Support in Extreme Environments — Designing a Medical Support System for a Mission to Mars

David Musson, MD, P.hD., Northern Ontario School of Medicine & Lunar Medical Inc.

Abstract

In 2016, Lunar Medical Inc. and collaborating partners completed a detailed concept for an Advanced Crew Medical System (ACMS) for the Canadian Space Agency. ACMS is the concept for a comprehensive medical support system for long-duration spaceflight. The system will support crew medical care for multi-month spaceflight missions to destinations beyond low Earth orbit (BLEO). BLEO missions will be unlike current missions to the International Space Station (ISS) and similar low Earth orbit (LEO) destinations. While ISS astronauts remain in close contact with Mission Control and can easily return to Earth in a matter of hours for medical treatment, long-duration BLEO missions, such as a return to the moon for extended periods or to near-Earth asteroids (NEA) and Mars, will be very different. Any early return to Earth due to crewmember illness or injury will take weeks or months and may simply not be an option. Crews will be required to monitor and maintain crewmember health and well-being throughout the mission, and any direct ground medical support will be complicated by time-delayed communications with Earth. A central component of the ACMS will be its spaceflight medicine decision support system (SMDSS or DSS). SMDSS will assist the crew medical officer in diagnostic and treatment decisions during the mission. Key challenges include our limited understanding of medical illness in the spaceflight environment, time-delayed communications with the Earth, and limited onboard medical expertise. ACMS and SMDSS will also need to support crew health and medical operations when communication with Mission Control is interrupted or in circumstances where a medically trained astronaut is unavailable or unable to respond. System constraints and requirements for the range of missions currently under consideration will be presented.

Presentation Video

Unsupervised Bayesian Network Learning for Non-Obvious Marketing Insight

Charles Hammerslough, Ph.D. & Praveen Singaraju, VSA Partners

Abstract

Marketing researchers frequently deploy consumer surveys to test marketing hypotheses or to segment their potential audience. This paper shows examples of how Unsupervised Bayesian Networks unlock additional insights from consumer surveys beyond their original purposes. Unsupervised Learning approaches the data from an assumption-free perspective of knowledge representation. It discovers the underlying structures of consumer demand and expectations within the market.

The paper outlines the challenges and opportunities of Unstructured Learning, sets out best practices for analyzing consumer marketing data, and presents three case studies drawn from survey data on consumer packaged goods, restaurants, and Internet services. It draws some non-obvious conclusions in each of the three domains, suggests new directions in product development, and refines value propositions.

Presentations Slides

About the Presenters

Charles Hammerslough is the Discipline Lead for Data Science at VSA Partners, a design, marketing, and branding agency located in Chicago and New York. He oversees a wide range of projects involving marketing and predictive analytics. Prior to VSA, he was the Director of Modeling for the 2012 Obama for America campaign and a vice president of Research and Development at Nielsen. He received his Ph.D. in Sociology and Demography from Princeton University.

Praveen Singaraju is the Director of Strategy and Analytics at VSA Partners. He specializes in advanced analytics of marketing, pricing and trade, consumer and branding, and go-to-market strategies. Prior to VSA, he was a Senior Associate at PricewaterhouseCoopers and an Analytics Consultant at Nielsen Marketing Analytics. He received his MSBA from the University of Cincinnati.

Natural Language Processing and Bayesian Networks

Benoit Hubert, Ipsos

Presentation Video

Access to this presentation video is restricted. Please contact the presenter, Mr. Benoit Hubert, at benoit.hubert@ipsos.com to obtain access to this video.

Large-Scale Inference for Intelligence Analysis

Ero Carrera, Ph.D., Google

Abstract

The large amounts of data related to threats in the cyber environment and the theoretical developments in probabilistic reasoning in the last few decades have enabled better ways to reason about threat intelligence. In this talk, I will give an overview of the historical developments that underlie the foundation of our work using Bayesian Networks and probabilistic reasoning to tackle some of the problems faced in computer security, together with some examples of areas where they are being applied with practical and interpretable results.

Presentation Video

Access to this presentation video is restricted. Please contact the presenter, Mr. Ero Carrera, at ero.carrera@gmail.com to obtain access to this video.

About the Presenter

Ero Carrera is currently a Senior Software Engineer for the Threat Analyst Group at Google. Previously, the author worked at F-Secure and Zynamics GmbH (acquired by Google), spanning a career of nearly 20 years with a focus on reverse engineering and cybersecurity work.

2018 BayesiaLab Conference in Chicago

The 6th Annual BayesiaLab Conference

Conference Photo Gallery

Conference Presentations

Introducing BayesiaLab 8

Presentation Video

About the Presenter

Other Presentations by Lionel Jouffe

Understanding Product and Consumer Segments

Abstract

Presentation Video

Presentation Slides

About the Presenter

Crowdsourcing Bayesian Networks with Prediction Markets

Abstract

Presentation Video

Presentation Slides

About the Presenter

Using Bayesian Networks to Characterize Wildlife Habitat Use

Abstract

Presentation Video

About the Presenter

Bayesian Network Modeling of Imagery Features

Abstract

Presentation Video

Authors

About the Presenter

Methicillin-Resistant Staphylococcus Aureus in Children Living with Industrial Hog Operation Workers

Abstract

Presentation Video

About the Presenter

An Application of Bayesian Networks to Yield Prediction in Portuguese Viticulture

Abstract

Presentation Video

About the Presenter

Synthesis of Causal Discovery and Machine Learning

Abstract

Presentation Video

Presentation Slides

Bayesian-Neural Networks Ensemble Modeling: An Initial Experiment

Abstract

Presentation Video

About the Presenter

Optimizing Mix of Local Media & Promotional Budgets

Abstract

Presentation Video

About the Presenter

Modern Approaches to Causal Modelling in Customer Experience Measurement

Abstract

Presentation Video

Meta-Modeling with Bayesian Networks to Facilitate Intelligent Use of Engineering Simulation

Abstract

Presentation Video

Bayesian Sense-Making in Data Science

Presentation Video

Presentation Slides

Decision Support in Extreme Environments — Designing a Medical Support System for a Mission to Mars

Abstract

Presentation Video

Unsupervised Bayesian Network Learning for Non-Obvious Marketing Insight

Abstract

Presentations Slides

About the Presenters

Natural Language Processing and Bayesian Networks

Presentation Video

Large-Scale Inference for Intelligence Analysis

Abstract

Presentation Video

About the Presenter

2018 BayesiaLab Conference in Chicago

The 6th Annual BayesiaLab Conference

Conference Photo Gallery

Conference Presentations

Understanding Product and Consumer Segments

Abstract

Presentation Video

Presentation Slides

About the Presenter

Using Bayesian Networks to Characterize Wildlife Habitat Use

Abstract