Knowledge Discovery with Bayesian Networks and Virtual Reality
Friday, January 19, 2018, 1:00 p.m. – 4:00 p.m.
CIC Boston, Lighthouse West, 50 Milk Street, 20th Floor, Boston, MA 02109
Many of today's popular machine-learning techniques produce black-box models. While such models can be extremely powerful in terms of their predictive performance, they often turn out to be useless for the structural understanding of their underlying problem domains. Thus, questions about why and how something is happening can rarely be answered with such models. Most importantly, causal questions are entirely out of scope for even the most advanced Artificial Intelligence methods.
In this seminar, we pursue a different approach and perform machine learning with the objective of discovering new knowledge from data.
For this purpose, we present Bayesian networks as a type of Artificial Intelligence that can help explore complex problems. We introduce the remarkably simple theory behind Bayesian networks and how it relates to probability calculus and statistics.
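To give a flavor of how a Bayesian network relates to probability calculus, here is a minimal sketch of a two-node network (Disease -> Symptom) in plain Python. It is not BayesiaLab code; the variable names and all probabilities are hypothetical and chosen only for illustration. The network stores one marginal and one conditional table, and the joint distribution, and with it any inference, follows from their product.

```python
# Hypothetical two-node Bayesian network: Disease -> Symptom.
# All probabilities below are made-up illustration values.

# Prior P(Disease)
p_disease = {True: 0.01, False: 0.99}

# Conditional table P(Symptom | Disease), indexed as [disease][symptom]
p_symptom_given_disease = {
    True: {True: 0.90, False: 0.10},
    False: {True: 0.05, False: 0.95},
}


def joint(disease, symptom):
    """P(Disease, Symptom) via the chain rule: P(D) * P(S | D)."""
    return p_disease[disease] * p_symptom_given_disease[disease][symptom]


def marginal_symptom(symptom):
    """P(Symptom), obtained by marginalizing Disease out of the joint."""
    return sum(joint(d, symptom) for d in (True, False))


def posterior_disease(symptom):
    """P(Disease = True | Symptom), obtained by conditioning on the evidence."""
    return joint(True, symptom) / marginal_symptom(symptom)


if __name__ == "__main__":
    # Inference "against" the arrow: from observed symptom back to disease.
    print(round(posterior_disease(True), 3))  # about 0.154
```

The same tables answer questions in either direction: `marginal_symptom` reasons from cause to effect, while `posterior_disease` reasons from effect to cause.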
Furthermore, we use BayesiaLab's machine-learning algorithms to produce meaningful and easily interpretable graphical models of complex domains with hundreds and even thousands of variables. We go from raw data to very explicit, high-dimensional models within seconds.
In the seminar, we showcase examples from different fields of study, including finance, biology, and economics, and employ BayesiaLab's supervised and unsupervised learning algorithms. We can directly compare the machine-learned graphs to our background knowledge.
However, a new challenge emerges at this point. It is no longer the lack of explicitness that hinders human comprehension; it is the opposite. The multitude of simultaneous relationships leads to cognitive overload when complex graphical models are flattened for display on screen or paper.
Fortunately, recent advances in Virtual Reality have opened up new opportunities for overcoming the constraints of two dimensions. With the recent launch of BayesiaLab 7, we can now leverage Virtual Reality methods to visualize Bayesian networks in three dimensions. The depth of space allows us to untangle complex Bayesian network graphs, so that our natural cognitive ability can capture the richness of relationships represented in the models. This approach facilitates the exploration of large and complex problem domains that were practically impossible to comprehend in the past.
Seminar participants will have the opportunity to try out BayesiaLab's VR module using the Oculus Rift during the last hour of the seminar. This VR module is available as a free download for all users of BayesiaLab 7 Professional.
- Big Data & Artificial Intelligence, their promise and their limitations for research
- Map of Analytic Modeling
- Purpose of Models: Prediction vs. Explanation
- Source of Models: Data vs. Theory
- Why Bayesian Networks?
- Introductory Example: Differential Diagnosis of Diseases
- Joint Probability Distribution
- Inference through conditioning and marginalizing
- Independence assumptions from domain knowledge
- Direct encoding of causal knowledge into a Bayesian network
- Properties of Bayesian Networks
- Compact representation of the joint probability distribution
- No distinction between dependent and independent variables
- Omnidirectional inference
- Non-parametric & probabilistic
- What is BayesiaLab?
- Supervised Learning for Classification
- Learning = Searching
- Minimum Description Length as a heuristic for network learning
- Information-theoretic measures: Entropy, Mutual Information, Kullback-Leibler Divergence
- The Wisconsin Breast Cancer Database
- The Cancer Genome Atlas
- Unsupervised Learning for Knowledge Discovery
- S&P 500 Ticker Data
- New Vehicle Experience Survey, including 1,000 variables consisting of product features, consumer ratings, demographics, and psychographics
- Introducing the new Multinet Data Clustering algorithm for discovering behavioral segments among consumers
- Virtual Reality Demo with the Oculus Rift
- 3D exploration of Bayesian networks
- Database of 10-K filings of public companies
- National Health and Nutrition Examination Survey
- Federal Crash Databases (FARS, NASS, LTCCS)
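As a rough illustration of the information-theoretic measures named in the outline above (entropy, mutual information, and Kullback-Leibler divergence), the following sketch computes them for small discrete distributions in plain Python. The function names and example tables are assumptions made for this illustration; this is not how BayesiaLab implements these measures.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """Mutual information I(X; Y) in bits from a joint table joint[x][y].

    Computed as the KL divergence between the joint distribution and the
    product of its marginals, i.e. the distance from independence.
    """
    px = [sum(row) for row in joint]        # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
    flat_joint = [pxy for row in joint for pxy in row]
    flat_indep = [a * b for a in px for b in py]
    return kl_divergence(flat_joint, flat_indep)


if __name__ == "__main__":
    # Hypothetical joint tables for two binary variables.
    independent = [[0.25, 0.25], [0.25, 0.25]]
    perfectly_linked = [[0.5, 0.0], [0.0, 0.5]]
    print(entropy([0.5, 0.5]))
    print(mutual_information(independent))
    print(mutual_information(perfectly_linked))
```

Mutual information is zero when two variables are independent and grows as knowing one variable tells us more about the other, which is why such measures are useful for scoring candidate relationships during network learning.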
Who should attend?
Biostatisticians, clinical scientists, data scientists, decision scientists, demographers, ecologists, econometricians, economists, epidemiologists, knowledge managers, management scientists, market researchers, marketing scientists, operations research analysts, policy analysts, predictive modelers, research investigators, risk managers, social scientists, statisticians, plus students and teachers of related fields.
Please note that this seminar is geared towards applied researchers, NOT software developers or computer scientists. Questions related to algorithms, programming, scalability, architecture, infrastructure, etc., will be out of scope at this event.