It was exciting to see the launch of the beta version of IBM's Watson Analytics. The interface is highly intuitive, and the visualization of the results is brilliant.
However, I must point out a fundamental flaw in the modeling approach presented in Watson Analytics. Throughout the site and all its video tutorials, statistical and causal concepts are consistently conflated. Prediction, explanation, association, and impact, for example, are presented as if they were interchangeable. There are countless instances in Watson where measures of association are falsely labeled with causal descriptions.
So, we are now facing the opposite of the problem that plagued science in the last century. Throughout most of the 20th century, statisticians considered causal inference without experiments inconceivable, which turned out to be a considerable obstacle to scientific progress (Pearl, 1999). Today, by contrast, data scientists seem utterly unconcerned about the critical distinction between statistical and causal inference; there is little awareness of the requirements for proper causal identification and estimation. Watson is a prime example. So, we've gone from anathema to anything goes.
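The distinction can be made concrete with a toy simulation (my own sketch, not taken from Watson Analytics): a confounder Z drives both X and Y, so the X–Y association looks strong even though the causal effect of X on Y is exactly zero. Adjusting for Z recovers the true null effect.

```python
# Toy confounding example (illustration only, not Watson's method):
# Z causes both X and Y; X has NO causal effect on Y,
# yet the naive regression slope of Y on X is about 1.0.
import random

random.seed(0)
n = 100_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]       # X := Z + noise
y = [2 * zi + random.gauss(0, 1) for zi in z]   # Y := 2Z + noise (no X term)

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

# Association: the naive slope is ~1.0, a pure artifact of Z.
naive = slope(x, y)

# Adjustment: residualize X and Y on Z, then regress the residuals
# (Frisch-Waugh); the partial slope is ~0, the true causal effect.
bxz = slope(z, x)
byz = slope(z, y)
rx = [xi - bxz * zi for xi, zi in zip(x, z)]
ry = [yi - byz * zi for yi, zi in zip(y, z)]
adjusted = slope(rx, ry)
```

Labeling the naive slope an "impact" or "driver", as an analytics dashboard might, is precisely the conflation at issue: it is a perfectly good measure of association and a completely wrong measure of causal effect.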
Causal Identification and Estimation Done Properly...
For a fairly comprehensive discussion of causal identification and estimation, please see the following two items on our website:
Causality for Policy Assessment and Impact Analysis, a 2-hour video lecture.