## Knowledge Elicitation & Reasoning with Bayesian Networks

This seminar was recorded on May 12, 2017, at Indiana Wesleyan University in West Chester, Ohio.

## The Seminar at a Glance

In this half-day workshop, we demonstrate how to elicit human knowledge—in the absence of data—for developing a high-dimensional computational model of a problem domain. As our case study topic, we examine the market for electric vehicles, for which only little consumer data exists given the novelty of the technology. More specifically, we utilize the web-based **Bayesia Expert Knowledge Elicitation Environment (BEKEE)** and develop a Bayesian network model for estimating the long-term volume potential of electric vehicles.

We shall see that systematically eliciting and encoding numerous pieces of (admittedly imperfect) knowledge into a Bayesian network can produce a remarkably useful approximation of the underlying domain. In a way, we leverage the "wisdom of crowds" to quantify a multitude of interactions within our high-dimensional problem domain. As a result, the audience's collective understanding is represented in a Bayesian network, which provides us with a common framework for reasoning about the problem domain, such as producing a volume estimate or evaluating policy options, e.g. tax incentives, etc.

Furthermore, by using a Bayesian network model, we can preserve all the uncertainty that exists in the audience's knowledge and perform inference consciously taking into account all that uncertainty. Thus, we manage to avoid two common and problematic extremes in reasoning, i.e., (i) suppressing uncertainty by calculating with the fake precision of single-point estimates, or (ii) being overwhelmed by uncertainty and abandoning quantitative reasoning altogether.

## Knowledge Elicitation Exercise: The Potential of Electric Vehicles

Back in 2011, President Obama predicted that by 2015 there would 1 million electric vehicles on the road in the U.S. As it turns out, less than 30% of that target was achieved by 2015. Today, the number of electric vehicles being sold remains tiny considering the size of the overall vehicle market. It is easy to come up with a long list of reasons for why sales are so low: limited range, inadequate charging infrastructure, long charging time, high battery cost, relatively low fuel prices, etc.

### What's the True Potential?

More important than looking for what currently constricts electric vehicle sales is determining what would happen under counterfactual conditions: what if the electric vehicle range were better, what if there were more charging points, what if batteries were cheaper, what if fuel prices were higher? Such counterfactual conditions might help us understand the ultimate volume potential of electric vehicles. That is the quantity we clearly wish to know for making sound decisions about long-term investments in infrastructure and product development.

### No Data = No Model = No Science?

Of course, if we had a mathematical model describing all the dynamics in this problem domain, we could compute various scenarios and come up with a volume forecast. However, we don't have any data from which we could generate such a model. Currently, none of the available electric vehicles are close to the technical specifications that we may wish to consider as realistic scenarios for the future.

Without data, and without a mathematical model generated from such data, are we, then, stuck with mere speculation? Is it even possible to come up with reasonably objective volume estimate?

In this day and age of "Big Data," we may be led to believe that facts can only be established from data, especially in the context of a scientific inquiry. This is a misconception. Even without data, humans do often possess reliable knowledge, qualitative or quantitative, tacit or explicit, about many aspects of the world. We believe that a useful amount of knowledge exists regarding the problem domain at hand. Also, there is one particular type of knowledge that data on its own cannot yield, and that is causality. For that, we always have to rely on human expertise.

### The Wisdom of Crowds

Although there may not be a single expert who can fully comprehend the entire domain, there may be numerous individuals, or stakeholders, who are more or less knowledgeable about different parts of the puzzle. It is our objective to break down the overall problem domain into numerous simpler questions, which are perhaps more easily "knowable," at least to some.

So, we are not looking for a single authoritative opinion. Rather, we are looking to collect and consolidate the full spectrum of thought, including causal relationships, from a number of stakeholders. This is where the "wisdom of crowds" comes into play. We want stakeholders to provide their individual and independent assessments of different elements and relationships within the problem domain.

### Knowledge Elicitation with the Bayesia Expert Knowledge Elicitation Environment

While the objective of collecting multiple opinions is straightforward, there are many technical and practical challenges in terms of implementing such a process. In this seminar, we propose employing the **Bayesia Expert Knowledge Elicitation Environment (BEKEE)** for this purpose. More specifically, we plan to utilize BEKEE for collecting the opinions of our seminar participants about the forces that affect the electric vehicle market, such as the issues mentioned above. We certainly don't expect to have any bona fide experts in this field. Rather, we anticipate a fairly diverse range of opinions given the presumably wide range of backgrounds of the participants.

### Spreadsheets are Deterministic, Opinions are Probabilistic

One may be tempted to think that the knowledge collected via BEKEE can be assembled in a spreadsheet so that various conditions can be simulated. Unfortunately, spreadsheets are of little use in this context. A multitude of opinions cannot be neatly translated into spreadsheet formulas.

Quite naturally, combining multiple assessments across numerous dimensions produces uncertainty, which cannot be directly encoded in a spreadsheet. A Bayesian network, on the other hand, can explicitly capture the uncertainty generated from the diversity of opinions. Using the BEKEE platform, we can directly encode the elicited knowledge, along with all the associated uncertainty, into a Bayesian network in using BayesiaLab software platform.

### Reasoning with Bayesian Networks and BayesiaLab

On the basis of the newly-generated Bayesian network, which represents the collective knowledge of the seminar participants, we can use the simulation tools in BayesiaLab to reason probabilistically about the implications of various hypothetical conditions. Thus, the collective knowledge will produce a volume estimate, i.e. the principal quantity of interest in this seminar.

Furthermore, we can use the same Bayesian network model to reason about the causal effects of hypothetical policy mandates and other government interventions beyond the natural equilibrium state of the electric vehicle market.

## What is BEKEE?

Everybody is talking about "Big Data" and all the opportunities that are associated with it. Very often, though, we hear almost as much about the challenges that come with this flood of data. Where to store it, how to analyze it, how to explain it, the list goes on and on. We think this is a very nice problem to have. Much more serious problems exist on the opposite end of the spectrum, where there is not enough data. Unfortunately, all the advanced knowledge discovery algorithms fail in the absence of data.

In over ten years of continuous development, and in increasingly sophisticated ways, BayesiaLab has permitted deriving knowledge from data through its machine learning algorithms, very much in the spirit of understanding "Big Data." However, BayesiaLab has maintained an equal focus on managing knowledge that exists beyond measurable and countable data points, such as the knowledge contained in the human mind. BayesiaLab's graphical user interface has made it highly intuitive for individual subject matter experts to encode their own domain understanding into a Bayesian network, thus capturing what they explicitly or implicitly know. What is especially important, one can very easily and formally capture causal directions in a Bayesian network graph, which is something that few other frameworks can do.

However, when it comes to consolidating the collective knowledge from a group of experts, rather than from an individual, the process is not as straightforward any longer. Traditionally, one would perhaps bring the experts together in a brainstorming session and let them form a common understanding. Subsequently, such a consensus could be encoded manually. However, brainstorming sessions are prone to introducing a wide range of biases, which can be disastrously counterproductive in studying complex domains.

Bayesia Expert Knowledge Elicitation Environment, or BEKEE for short, is a new web application that is designed to minimize detrimental group biases. The central idea is not to coerce consensus, but rather to elicit everyone's individual views regarding the domain under study. To ensure the independent elicitation of probabilities, BEKEE queries stakeholders individually via an interactive questionnaire linked to the core BayesiaLab application. Retrieving expert views in such a fashion generates many "parallel universes" in terms of domain understanding. These different perspectives can be formally compared by the facilitator and potentially returned to the group for a formal debate in the case of seriously conflicting assessments.

In most cases, this is an iterative process and, even if stakeholder opinions do not converge, BayesiaLab will compile all views and produce a unifying Bayesian network. This graph is now the mathematically correct summary of all the available expert opinions. As such, it can be utilized as a formal representation of the underlying domain. Most importantly, this graph is not merely a qualitative illustration. Rather, a Bayesian network is a fully computable model of the domain, which immediately facilitates the simulation of what-if scenarios.

In fact, we can evaluate this Bayesian network model the same way as a statistical model estimated from "Big Data." One might still prefer a data-based model if data were indeed available, but in the absence thereof, the formally-encoded collective expert knowledge best represents what is known at the time.

## The Motivation for BEKEE

### Complexity & Cognitive Challenges

It is presumably fair to state that reasoning in complex environments creates cognitive challenges for humans. Adding uncertainty to our observations of the problem domain, or even considering uncertainty regarding the structure of the domain itself, makes matters worse. When uncertainty blurs so many premises, it can be particularly difficult to find a common reasoning framework for a group of stakeholders.

### No Data, No AI?

It is presumably fair to say that Artificial Intelligence is nowadays perceived to be associated with Big Data. We, however, plan to employ AI at the opposite end of the spectrum, where we have no data. More specifically, we propose using Bayesian networks as a form of Artificial Intelligence that makes formal reasoning and optimization possible under such conditions.

### No Data, No Analytics?

If we had hard observations from our domain in the form of data, it would be quite natural to build a traditional analytic model for decision support. However, the real world often yields only fragmented data or no data at all. It is not uncommon that we merely have the opinions of individuals who are more or less familiar with the problem domain.

### To an Analyst With Excel, Every Problem Looks Like Arithmetic.

In the business world, it is typical to use spreadsheets to model the relationships between variables in a problem domain. Also, in the absence of hard observations, it is reasonable that experts provide assumptions instead of data. Any such expert knowledge is typically encoded in the form of single-point estimates and formulas. However, using of single values and formulas instantly oversimplifies the problem domain: firstly, the variables, and the relationships between them, become deterministic; secondly, the left-hand side versus right-hand side nature of formulas restricts inference to only one direction.

### Taking No Chances!

Given that cells and formulas in spreadsheets are deterministic and only work with single-point values, they are well suited for encoding “hard” logic, but not at all for “soft” probabilistic knowledge that includes uncertainty. As a result, any uncertainty has to be addressed with workarounds, often in the form of trying out multiple scenarios or by working with simulation add-ons.

### It Is a One-Way Street!

The lack of omni-directional inference, however, may the bigger issue in spreadsheets. As soon as we create a formula linking two cells in a spreadsheet, e.g. B1=function(A1), we preclude any evaluation in the opposite direction, from B1 to A1.

Assuming that A1 is the cause, and B1 is the effect, we can indeed use a spreadsheet for inference in the causal direction, i.e. perform a simulation. However, even if we were certain about the causal direction between them, unidirectionality would remain a concern. For instance, if we were only able to observe the effect B1, we could not infer the cause A1, i.e. we could not perform a diagnosis from effect to cause. The one-way nature of spreadsheet computations prevents this.

### Bayesian Networks to the Rescue!

Bayesian networks are probabilistic by default and handle uncertainty “natively.” A Bayesian network model can work directly with probabilistic inputs, probabilistic relationships, and deliver correctly computed probabilistic outputs. Also, whereas traditional models and spreadsheets are of the form y=f(x), Bayesian networks do not have to distinguish between independent and dependent variables. Rather, a Bayesian network represents the entire joint probability distribution of the system under study. This representation facilitates omni-directional inference, which is what we typically require for reasoning about a complex problem domain.

### Probabilistic Reasoning

On the basis of this newly-generated Bayesian network, we can reason probabilistically about the implications of actual observations or hypothetical scenarios. Of particular interest are diagnostic inference tasks, i.e. reasoning from observed effects back to the not-directly-observable cause. Our example illustrates that reasoning correctly about the given question can become intractable—both cognitively and mathematically—unless we employ Bayesian networks as a reasoning framework. The example also proves how valuable even vague and diverse opinions can become if we systematically elicit them from stakeholders and encode them in a Bayesian network.

### Optimization Under Uncertainty

In addition to ad hoc inference, we can use the Bayesian network in conjunction with BayesiaLab's search algorithms to identify the optimal course of action for achieving the desired outcome, given current conditions and uncertainties and while also taking into account any costs and utilities.

### Value of Information

Finally, the Bayesian network allows us to quantify the value of information and measure the consistency (or conflict) of different pieces of actual or hypothetical evidence. Most importantly, the network can identify those pieces of yet-to-be-observed evidence that would reduce our uncertainty the most or would have the greatest impact on determining the optimal course of action.