## Description

Many statistical analyses aim at a causal explanation of the data. The early observational studies on the risks of smoking (Cornfield et al., 1959), for example, aimed at something deeper than showing the poorer prognosis of smokers. The hoped-for interpretation was causal: those who smoked would, on average, have enjoyed better health had they not done so and, consequently, any future intervention against smoking would, at least in a similar population, have a positive impact on health. Causal interpretations and questions are the focus of the present book. They underpin many statistical studies in a variety of empirical disciplines, including the natural and social sciences, psychology, and economics.

Epidemiology and biostatistics are notable for a traditionally cautious attitude towards causality. Early researchers in these areas did not feel the need to use the word ‘causal’. Emphasis was on the requirement that the study be ‘secure’: that its conclusions should not rely on special assumptions about the nature of uncontrolled variation, something that is ideally achieved only in experimental studies. In the work of Fisher (1935), security was achieved largely by using randomization within an experimental context. Randomization ensures that, when we form contrasts between the treatment groups, we are comparing ‘like with like’: there are no systematic pre-existing differences between the groups that might offer alternative explanations of an observed difference in response.

Another idea, originated by Fisher (1932) and later developed by Cochran (1957), Cox (1960), and Cox and McCullagh (1982), is the use of supplementary variables to improve the efficiency of estimators and of instrumental variables to make a causal effect of interest identifiable. The use of supplementary and instrumental variables in causal inference is discussed in Chapter 16 of this book, ‘Supplementary variables for causal estimation’, by Roland Ramsahai.

Early advances in the theory of experimental design, largely contributed by Rothamsted researchers, are discussed in Chapter 1, ‘Statistical causality: some historical remarks’, by David Cox. Also discussed in this chapter are some implications of the ‘Rothamsted view’ (and of the controversies that arose around it) for the current discussion on causal inference. A technical discussion of the problems of causal inference in randomized experiments in medicine is given in Chapter 21, ‘Causal inference in clinical trials’, by Krista Fischer and Ian White.

The 1960s witnessed the early development of a theory of causal inference in observational studies, a notable example being the work of Bradford Hill (1965). Hill proposed a set of guidelines to strengthen the case for a causal interpretation of the results of a given observational study. One of these guidelines, the presence of a dose–response relationship, is discussed in depth in Chapter 19, ‘Nonreactive and purely reactive doses in observational studies’, by Paul Rosenbaum. Hill’s guidelines are informal, however, and they do not provide a definition of ‘causal’. During the 1990s, a wider community of researchers, drawn from such disciplines as statistics, philosophy, economics, social science, machine learning, and artificial intelligence, proposed a more ambitious approach to causality, reminiscent of philosophers’ long struggle to reduce causality to probabilities. These researchers turned cause–effect relationships into objects that can be manipulated mathematically (Pearl, 2000). They attempted to formalize concepts such as confounding and to set up various formal frameworks for causal inference from observational and experimental studies. In a given application, such frameworks allow us (i) to define the target causal effects, (ii) to express the causal assumptions clearly and to determine whether they are sufficient to allow estimation of the target effects from the available data, (iii) to identify analysis methods and algorithms that make that estimation feasible, and (iv) to identify further observations or experiments that would make it feasible, or assumptions under which the conclusions of the analysis have a causal interpretation.

In retrospect, such effort came late. Many of the tools for conceptualizing causality had been available for some time, as in the case of the potential outcomes representation (Rubin, 1974), which Rubin adapted from experimental (Fisher, 1935) to observational studies in the early 1970s. Potential outcomes are discussed in Chapter 2, ‘The language of potential outcomes’, by Arvid Sjölander, and are used in several other chapters of this book. In this representation, any individual is characterized by a notional response Y_k to each treatment T_k, regarded as fixed even before the treatment is applied. In Chapter 10, ‘Cross-classifications by joint potential outcomes’, Arvid Sjölander discusses the idea of a ‘principal stratification’ of the individuals, on the basis of the joint values of several potential outcomes of the same variable (so that each stratum specifies exactly how that variable would respond to each of a variety of settings of some other variable). A sometimes serious limitation of this approach is that typically there is no way of telling which individual falls into which stratum. In Chapter 10, principal stratification is used to bound nonidentifiable causal effects and to deal with problems due to imperfect observations. Principal stratification is also used in Chapter 21 to deal with problems of protocol nonadherence and of contamination between treatment arms, in the context of randomized clinical trials.
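As a small illustrative sketch (not taken from the book; the stratum labels and function names are hypothetical), the case of a binary treatment and a binary response makes the idea concrete: the joint pair of potential outcomes defines four principal strata, yet only one member of the pair is ever observed.

```python
# Illustrative sketch, assuming a binary treatment T in {0, 1} and a binary
# response. Each individual carries a fixed pair of potential outcomes
# (Y0, Y1), determined before treatment is applied; the joint value of the
# pair defines the individual's principal stratum.

def principal_stratum(y0, y1):
    """Classify an individual by the joint value of its potential outcomes."""
    return {
        (0, 0): "never-responder",   # responds under neither treatment
        (0, 1): "helped",            # responds only if treated
        (1, 0): "hurt",              # responds only if untreated
        (1, 1): "always-responder",  # responds under both treatments
    }[(y0, y1)]

def observed_outcome(y0, y1, t):
    """Only the outcome under the treatment actually received is observed."""
    return y1 if t == 1 else y0

# A 'helped' individual and an 'always-responder', both treated, produce
# identical observed data -- which is why stratum membership is typically
# not identifiable from observations alone.
assert principal_stratum(0, 1) == "helped"
assert principal_stratum(1, 1) == "always-responder"
assert observed_outcome(0, 1, t=1) == observed_outcome(1, 1, t=1)
```

The point of the sketch is exactly the limitation noted above: distinct strata can generate the same observed record, so causal effects defined at the stratum level can often only be bounded, not identified.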

Another legacy from the past is the use of graphical representations of causality, anticipated by Wright’s work on path diagrams (Wright, 1921, 1934) and later advocated by Cochran (1965). This area is currently dominated by the Nonparametric Structural Equation Models (NPSEMs) discussed in Chapter 3, ‘Structural equations, graphs and interventions’, by Ilya Shpitser. Shpitser emphasizes a conceptual symbiosis between NPSEMs and potential outcomes, where NPSEMs contribute a transparent language for expressing assumptions in terms of the conditional independencies implied by the structure of a causal graph (Dawid, 1979; Geiger et al., 1990). Constructive criticism of NPSEMs is given in Chapter 5, ‘Causal inference as a prediction problem: assumptions, identification and evidence synthesis’, by Sander Greenland. A strong interpretation of an NPSEM regards each node in the model as associated with a fixed collection of potential responses to the various possible configurations of interventions on the set of parents of that node. This interpretation sheds light on some of the problems of nonstochasticity and nonidentifiability that Greenland mentions in relation to NPSEMs.
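A minimal sketch may help fix ideas (the graph, functional forms, and probabilities below are hypothetical, chosen only for illustration, and are not drawn from Chapter 3). In an NPSEM each node is a function of its graph parents and an independent exogenous error, and an intervention replaces the equation for the intervened node while leaving the others untouched:

```python
# Minimal sketch of a structural equation model for the chain X -> Z -> Y.
# All functional forms and probabilities are hypothetical. Each node is a
# deterministic function of its parents and an independent error term; the
# independence of the errors is what licenses the conditional independencies
# implied by the graph (here, Y independent of X given Z).
import random

def simulate(intervene_z=None):
    e_x, e_z, e_y = (random.random() for _ in range(3))
    x = int(e_x < 0.5)                              # X = f_X(e_X)
    if intervene_z is not None:
        z = intervene_z                             # intervention replaces f_Z
    else:
        z = int(e_z < (0.8 if x else 0.2))          # Z = f_Z(X, e_Z)
    y = int(e_y < (0.9 if z else 0.1))              # Y = f_Y(Z, e_Y)
    return x, z, y

# Under the intervention do(Z = 1), the distribution of Y no longer depends
# on X, and P(Y = 1) is simply 0.9 by construction.
random.seed(0)
sample = [simulate(intervene_z=1) for _ in range(1000)]
mean_y = sum(y for _, _, y in sample) / len(sample)
assert 0.85 < mean_y < 0.95
```

The ‘strong interpretation’ mentioned above corresponds to reading `f_Y` as a fixed table of potential responses of Y to every possible setting of its parent Z, with the error term selecting a row of that table for each individual.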

In the light of these problems, some researchers have set aside potential outcomes and NPSEMs in favour of approaches that fully acknowledge the stochastic nature of the world. One of these is the decision-theoretic approach of Chapter 4. The focus here is on the assumptions under which an inference of interest, which we would ideally obtain from an experiment, can instead be drawn from a given set of observational data. In general, inferences are not transportable between an observational and an experimental regime of data collection, because the distributions of the domain variables in the two regimes may be completely different. The decision-theoretic approach introduces special ‘regime indicator’ variables and uses them to formalize, in terms of conditional independence relationships, those conditions of invariance between regime-specific distributions that make cross-regime inference possible. In some problems of causal inference, the decision-theoretic approach leads to the same conclusions one reaches by using potential outcomes, while relaxing the strong assumptions of the latter. Chapters 8, 14, 15, and 16 further illustrate the use of the decision-theoretic formalism in combination with explicit graphical representations of the assumed data-generating mechanism. A theme for future research is the comparison of different formulations of statistical causality, in relation to real data-analysis situations and in terms of their ability to clarify the assumptions behind the validity of an inference.
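Schematically, and with notation assumed here rather than quoted from Chapter 4, the idea can be written as follows. Let $F_T$ be a regime indicator, with $F_T = \emptyset$ denoting the observational regime and $F_T = t$ the experimental regime that sets $T = t$. A typical invariance condition licensing cross-regime inference about a response $Y$ is the conditional independence

```latex
% Regime indicator F_T: F_T = \emptyset is the observational regime,
% F_T = t the regime intervening to set T = t.
Y \mathrel{\perp\!\!\!\perp} F_T \mid T
\quad\Longrightarrow\quad
p(y \mid T = t,\ F_T = t) \;=\; p(y \mid T = t,\ F_T = \emptyset)
```

which says that, given the treatment actually received, the conditional distribution of the response is the same in both regimes, so that experimental quantities can be estimated from observational data.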