Individual Causal Models and Population System Models in Epidemiology
Commentary by
John W. Lynch PhD MPH

The Need for More Comprehensive Causal Models
During much of the "Modern Epidemiology" era, the analytic methods and causal models of epidemiology have been directed toward risk factor effects on individuals. Recently, epidemiology has turned again to more broadly addressing population phenomena whose effects on health of populations cannot be viewed as the sum of effects on the individuals in that population (1-3). These phenomena include how population patterns of exposure and not just numbers of exposed individuals affect the health of populations. Exposure patterns are particularly important determinants of infection levels in populations. Changing the pattern of connections between exposed and unexposed individuals can often affect infection levels more than changing the exposure status of individuals (4-7). Similarly, different patterns of the distribution of income in a population can have population health effects beyond the effects attributable to individual incomes (8-10). Analysis of how population patterns affect population disease levels could be called "population system epidemiology". The focus of population system epidemiology is on population phenomena rather than individual phenomena. A characteristic of a system in contrast to a "heap" is that the arrangement of elements makes a difference. When the pattern of exposures or connections between individuals in a population makes a difference for disease levels, we are dealing with a population system.
The thesis of this commentary is that population system epidemiology needs causal models with different characteristics from the currently dominant causal models. The sufficient-component cause model (11) has been valuable because it abstracts joint effects of multiple exposures in individuals. It does not, however, encompass population system phenomena. Schwartz and Carpenter in this issue employ the sufficient-component cause model to analyze effects whose origins lie at a population rather than an individual level (12). But we will show that this model does not provide a basis for examining many population effects. Population transmission models are causal models with a markedly different focus and structure from the sufficient-component cause model. They are non-linear models where the patterns of contact intrinsically affect population health levels in ways ignored by the sufficient-component cause model. Traditionally, transmission models have ignored joint effects of multiple exposures, a hallmark of the sufficient-component cause model. New directions for causal epidemiologic models that include virtues of both the sufficient-component cause model and transmission models are suggested.
Some Aspects of Causal Models
To contrast sufficient-component cause models with infection transmission models, we should consider first the nature of causal models. Causal models help scientists by abstracting particular elements of causal phenomena while discarding details that get in the way of deriving useful theory and insights about the phenomena under study. Reality, especially the biological and social reality with which epidemiology deals, is diverse and multifaceted. But theory, by its very nature, must be focused and unifying. A narrow focus on causal phenomena that disregards realistic details can help achieve wide applicability of theory. For example, to develop a revolutionizing and productive theory about bodies in motion, Newton had to ignore Aristotle's insistence that the slowing of bodies in motion had to be a central part of theories of motion because the observation of such slowing was so universal. To formulate universal laws of motion, Newton used a model of motion that ignored this universal reality. Newton's laws of motion are validated by their theoretical utility, not their fit to reality.
Sufficient-component cause theory, in a similar fashion to Newton's laws, ignores several aspects of reality but has demonstrated its utility for developing theory about the effects of multiple exposures in individuals. The ability of this model to generate productive new insights is linked to the way it ignores details on timing of exposure and on population phenomena arising from patterns of connection between individuals that it cannot neatly encompass. Transmission system theory, on the other hand, has employed dynamic models where timing and connection between individuals are central but joint effects of multiple exposures are ignored. Rather than modeling discrete individuals, transmission system models most commonly model interactions between continuous population segments. This enables transmission system theory to use differential equations to focus on population rather than individual outcomes.
These two contrasting approaches to modeling causal issues of central concern to epidemiology focus on such different phenomena that they might seem incommensurate. We need to elaborate each of these basic theories so that they can better relate to each other. If used in isolation from each other, neither of these approaches provides an adequate basis for population system epidemiology.
The Structure of Epidemiological Data
Perhaps the best way to see the need for integrating sufficient-component cause theory with transmission theory is to begin with data rather than theory. Standard epidemiological analysis methods array data for separate individuals into rows, and data on both outcome and predictor variables for each individual into columns. This is represented in the individual data plane in Figure 1. That standard analyses do not view individuals as being part of a system is manifest in the assumption that the row an individual is in (i.e., the arrangement of individuals) makes no difference to the results of standard analyses. This two-dimensional data arrangement at the individual level can be viewed as the face of a three-dimensional cube, depicted in Figure 1, where the third dimension defines the pattern of connections between individuals that generate population system phenomena.
Figure 1
The three dimensional shape of epidemiological data

Both social network analysis (13) and phylogenetic analysis (14) are performed in the network plane that is perpendicular to the individual data plane. For these analytic methods, the data are arranged as a square matrix with individuals along both axes. The values in the matrix represent degrees of connection between individuals. If connections are described dichotomously, there may be a 1 or a 0 in each cell depending upon whether a connection between two individuals exists or not. Instead of a single dichotomous connection variable, there may be multiple continuous connection variables. These might have directionality from axis 1 to axis 2 or vice versa. In the case of phylogenetic analysis using DNA sequence data, there might be a variable for each base pair location indicating identity or difference at the site.
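The square-matrix arrangement described above can be sketched in a few lines of Python. This is only an illustration: the four individuals, their labels, and the three connections are hypothetical, and the connections are assumed to be dichotomous and undirected.

```python
# Hypothetical data: four individuals and three reported connections.
individuals = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]


def adjacency_matrix(nodes, edge_list):
    """Build a symmetric 0/1 matrix with individuals along both axes."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edge_list:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected: mirror the entry
    return matrix


network_plane = adjacency_matrix(individuals, edges)

# Row sums give each individual's number of connections. This is the part
# of the network plane that survives when the data are collapsed into one
# column of the individual data plane.
degrees = [sum(row) for row in network_plane]

for name, row, degree in zip(individuals, network_plane, degrees):
    print(name, row, "connections:", degree)
```

A directed or continuous connection variable would use the same layout, with the mirrored assignment dropped or the 1 replaced by a measured strength of connection.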
Many variables collected in field studies might not directly measure degrees of connection between individuals. They might only reflect chances of having connections with different groups of individuals. One class of variables of this type includes variables that describe an aspect of a contact that both individuals report identically. For example, the type of sex act or the courtship time between meeting and having sex can be used to reflect who is likely to be connected to whom. Geographic or social locations where contacts are made could be used to reflect many different types of contact beyond sexual contact. The intrinsic value of these variables lies in the network plane. If they are analyzed only in the individual data plane, much of their value will be lost because their value to population systems epidemiology lies in describing relationships in the network plane.
Figure 1 is presented for its heuristic value. It is not an exact representation of the shape of epidemiological data needed for the analysis of any particular model. The number of different connection variables may not correspond to the number of individual variables measured as the existence of a layer of connection for each individual variable in the figure might imply. Figure 1 does not include the time dimension that is needed for dynamic analysis of population systems. Figure 1 does, however, capture the essential argument we wish to make that the dimension of connections between individuals is an integral part of epidemiological data even if it is ignored in data analysis.
Standard epidemiological analyses assume that the dimension connecting individuals is irrelevant. They assume that the outcome in one individual is independent of the outcome in other individuals. This assumption is also inherent to any use of the sufficient-component cause model to represent populations. This assumption is violated whenever transmission of infection generates new sources of infectious agent for risk factors to transmit. Because this violation is so readily apparent for infectious diseases, we use infectious disease examples in this commentary. But social network connections are also likely to have other health effects that will violate this standard assumption of epidemiological analysis. Examples include social support, social stress, transmission of behavior norms, transmission of knowledge that influences behavior, and transmission of power relationships that help organize society or that generate exploitation.
Because standard epidemiological analyses and the sufficient-component cause model ignore network connections between individuals, they also ignore the forces that determine different patterns of network connections. Models that incorporate network structure could help highlight these important determinants of disease and bring them into the realm of epidemiological investigation.
Network Connections, Individual Risk, And Population Risk
We can think of infection as flowing in the network plane that is ignored or assumed to be irrelevant by standard analytic methods in epidemiology. Transmission system models are constructed mainly to capture phenomena occurring in this network plane. Epidemiological data relevant to the network plane would most often not classify every pair of individuals as to whether or not they are connected. Data that relate to individual connections would most often be collected from a sample of individuals. Such data would only relate to the connections made by individuals in the sample and usually would only be helpful in indicating the class of individuals contacted rather than the exact individuals who were contacted. Data gathered from individuals about their contacts are said to be "egocentric". They describe the contacts around individuals but not the overall population pattern of contacts. To demonstrate the importance of data relevant to the network plane, however, we use theoretical examples where complete network data of a dichotomous nature are available. In our examples, circles represent individuals and lines represent the existence of a connection between individuals. Time relationships are ignored.
Figure 2
Distinct Patterns Of Connection Between Identical Individuals with Two Connections

The example in Figure 2 demonstrates that egocentric information is critically incomplete even when all the individuals in the population are identical. The egocentric information from either population A or population B in Figure 2 would indicate that each individual is connected to two other individuals. But egocentric information does not establish whether or not those individuals form chains that sustain transmission. In A, they would. In B, they would not. Although all individuals in populations with either pattern A or B would appear to be the same, the population with pattern A would have higher levels of infection. This illustrates the fact that these two populations are not just the sum of the individuals in them as seen from the individual data plane. The two populations need to be defined as well by the network plane defining the pattern of connections between individuals.
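A small Python sketch can make the same point numerically. The two layouts below are hypothetical stand-ins and are not taken from Figure 2: pattern A is assumed to be a single closed loop of six individuals, and pattern B is assumed to be two separate triangles. Time and transmission probability are ignored, so the sketch only counts how many individuals an introduced infection could ever reach.

```python
from collections import deque


def ring(n):
    """One closed loop: individual i is linked to i + 1, wrapping around."""
    return [(i, (i + 1) % n) for i in range(n)]


def separate_triangles(n):
    """Disjoint fully linked groups of three; n must be a multiple of 3."""
    edges = []
    for start in range(0, n, 3):
        a, b, c = start, start + 1, start + 2
        edges += [(a, b), (b, c), (a, c)]
    return edges


def neighbour_sets(n, edges):
    """Map each individual to the set of individuals they are linked to."""
    neighbours = {i: set() for i in range(n)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return neighbours


def reachable_count(neighbours, source):
    """Number of individuals an infection starting at `source` could reach."""
    seen = {source}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for contact in neighbours[person]:
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return len(seen)


n = 6
pattern_a = neighbour_sets(n, ring(n))                # assumed layout A
pattern_b = neighbour_sets(n, separate_triangles(n))  # assumed layout B

# Egocentric view: the number of connections each individual reports.
degrees_a = [len(pattern_a[i]) for i in range(n)]
degrees_b = [len(pattern_b[i]) for i in range(n)]

# Network view: how far infection can travel from each possible introduction.
reach_a = [reachable_count(pattern_a, i) for i in range(n)]
reach_b = [reachable_count(pattern_b, i) for i in range(n)]

print("connections per individual:", degrees_a, degrees_b)
print("individuals reachable:     ", reach_a, reach_b)
```

Under these assumptions every individual reports exactly two connections in both populations, yet a single introduction can reach the whole of the assumed pattern A and only half of the assumed pattern B.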
Different Effects of Network Roles and Individual Risk on Population Risk
In figure 2, all individuals in a network play the same role in their network. Now we consider an example where individuals play different roles in their network. In Figure 3, individuals with three contacts can be distinguished by how close or far they are to a connecting link between two groups. Moreover, most individuals have 3 connections but one individual has only two. Consider the situation where transmission across each link occurs with some specified probability and there is a random introduction of infection into the network. There is a range of transmission probability values where the individual with only two links has the lowest chance of becoming infected after random introduction of infection to the population. Thus, from an individual risk view, this individual has the lowest risk of infection. The contribution of that individual to infection levels in the population system, however, can, at certain transmission probabilities, be greater than that of any other individual. Eliminating one connection to this individual can do more to lower average infection levels in the population after introduction of infection than eliminating a connection to any other individual. Analysis only in the individual plane, that is to say any analysis founded upon the sufficient-component cause model, would miss this fact.
Figure 3
A Network In Which The Most Important Individual Determining Population Levels Of Infection Has The Lowest Risk of Infection

The failure of individual risk analyses to identify key individuals determining transmission at the population level is particularly notable when individuals are distinguished by risk factor status. Consider two populations where for each individual in the first population, there is a corresponding individual in the second population with exactly the same risk factors and exactly the same history of contacts. Any description of individuals in the individual data plane will find these two populations to be identical. They differ only in the way individuals are connected. By changing contact patterns, however, we can change the level of infection from zero to complete.
Consider the transmission of an infection that does not induce immunity in a population with individuals who do and do not have a risk factor. Suppose for the sake of exposition that this risk factor is a gene. The gene might facilitate the transmission of infection either to or from the individual with the gene. The group with the gene might make all of their contacts with individuals not having this gene. If these latter individuals are incapable of sustaining chains of transmission on their own, then the population will not sustain transmission. If individuals with the gene make all of their contacts with each other, infection might flow quickly between them but not at all to the group without the gene. If the group with the gene makes just enough contacts with each other to sustain circulation between them, they might make the rest of their contacts with the group without the gene and infect many or most of them. In summary, the proportion of contacts made by individuals with a risk factor that are with individuals without the risk factor may be decreased from one to zero. Starting at one, the infection level will be zero. It could then rise as the fraction is decreased. At some point it will reach a peak and then fall as the fraction is decreased further until individuals with the risk factor only contact other individuals with the risk factor. At this level, the maximum infection level is the fraction of the population having the risk factor.
Most egocentric data in epidemiological studies would not provide data on whether the contacts of a subject had a gene or not. Even if such data is obtained, analysis of this network data in the individual data plane cannot determine the population level of infection. A transmission model analysis capturing phenomena occurring in the network plane is required.
The previous examples emphasized how analyses in the individual plane obscure effects of patterns of contact in the network plane. We now consider an example where the patterns are the same but the type of effect a risk factor has differs. The gene in the previous example might have one of two effects. It might make the person with the gene more susceptible to infection or it might make the person with the gene generate more infectious agent when they become infected. These two effects can be quite distinct because the survival of a transmitted agent can depend upon quite different things than the proliferation of the agent in the host. Consistent with these effects, a gene might make an individual "X" times as susceptible to infection or "X" times as contagious if they become infected. The increase in contagiousness will raise population levels of infection considerably more than the increase in susceptibility. Standard risk factor epidemiology, however, has a very strong bias to investigate and detect risk factors affecting susceptibility rather than contagiousness. A population system approach to epidemiology is needed to focus on the risk factor effects that are most important at the population level. An analysis in the individual data plane, that is to say an analysis based on sufficient-component cause concepts, could detect risk factors affecting contagiousness. It could do so by defining exposures in terms of risk factors in the persons contacted. But individual level analyses could not derive the insight that individuals with exposures having contagiousness effects have a larger impact on infection levels in the population than individuals with exposures having equally strong susceptibility effects. Models of population system phenomena rather than models of exposure effects in individuals are required to derive such insights.
The Sufficient-Component Cause Model
The sufficient-component cause model has been a productive, though under-utilized, causal model. It has focused and unified epidemiological thinking (11). It has clarified concepts of confounding and effect modification. It has helped clarify the mathematical relationships for assessing the public health impact of exposure to risk factors (15). It has also served as the basis for defining observations that can distinguish whether two risk factors act in distinct causal pathways, have distinct roles in the same causal pathway, or play the same role in pathogenesis (16). But its limitations for analysis of infectious disease effects have long been noted (17).
Two limitations deserve discussion. The first is that the sufficient-component cause model does not treat time in a fashion that allows for dynamic analysis. Dynamic analysis underlies the definition of endemic and epidemic thresholds, the basic reproduction number, the rate of rise of epidemics, and endemic infection levels. The effects of risk factors for transmission should ideally be defined in terms of their effects on these transmission system measurements. But because the sufficient-component cause model handles time as an element of variable definition rather than as an explicit model element, dynamic analysis is practically impossible. Models for analysis of population systems in epidemiology must handle time explicitly in the model structure. Transmission models have this characteristic.
The second limitation is that effects in the network plane are not incorporated in an analyzable fashion into the sufficient-component cause model. The examples just provided are meant to make that clear. The problem is not just that defining exposure as a function of the contacts made is unwieldy in the context of the sufficient-component cause model. The real problem is that individual characteristics alone do not determine the population risks. This is evident in the Figure 2 example where populations of identical individuals have different risks depending upon patterns of connection. The inadequacy of individual risk assessment for population risk assessment is further evident in the Figure 3 example where the individual with the least individual risk generates the most population risk. The importance of causal models that capture phenomena in the network plane that the sufficient-component cause model misses is further emphasized by the example where a gene affected infection risk. In that example, infection risk could vary from zero to very high in populations with identical individuals.
That defining effects at the individual level misses whole conceptual areas of importance for population infection control is evident in the last example. The sufficient-component cause model does not have a structure that could detect the fact that risk factors increasing contagiousness have greater effects on population levels of infection than do risk factors with equal effects on susceptibility. Extending this insight to non-infectious disease areas, we see that the sufficient-component cause model lacks the essential structure needed to assess the effects of factors affecting contact patterns in populations on the level of disease in populations.
Transmission Models
Transmission system models explicitly model phenomena in the network plane that most epidemiological analyses ignore. They have a long and exponentially growing tradition of development in epidemiology. Many useful concepts coming out of this tradition have been presented by Anderson and May (18). The differential equation models commonly used in this tradition handle time explicitly and therefore offer a basis for dynamic analysis that the sufficient-component cause model lacks. Perhaps the characteristic that most distinguishes transmission models from the sufficient-component cause model is that they model non-linear population effects (4). Non-linearity of effects at the population system level has two important implications. First, individual effects will not sum to population effects. Second, patterns of connection between individuals will have effects at the population level.
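The non-linearity can be illustrated with the simplest textbook transmission model, sketched below in Python. This is an assumption-laden illustration rather than a model taken from the works cited here: it assumes a single homogeneous population, an infection that confers no immunity (an SIS model), a transmission rate `beta`, and a recovery rate `gamma`, with the infected fraction I following dI/dt = beta * I * (1 - I) - gamma * I. The parameter values and the fixed-step Euler integration are likewise chosen only for exposition.

```python
def endemic_level(beta, gamma, initial=0.01, dt=0.05, steps=4000):
    """Integrate dI/dt = beta*I*(1 - I) - gamma*I and return the final I.

    I is the infected fraction of a continuous, homogeneously mixing
    population. Simple Euler steps are used for clarity, not accuracy.
    """
    infected = initial
    for _ in range(steps):
        change = beta * infected * (1.0 - infected) - gamma * infected
        infected += dt * change
    return infected


gamma = 1.0
for beta in (0.8, 2.0, 4.0):
    level = endemic_level(beta, gamma)
    # beta / gamma is the basic reproduction number of this model.
    print(f"beta={beta:.1f}  R0={beta / gamma:.1f}  endemic level={level:.3f}")
```

In this model the endemic level is zero whenever `beta / gamma` is at most one and is `1 - gamma / beta` above that threshold, so doubling the transmission rate from 2 to 4 raises the endemic level from 0.5 to 0.75 rather than doubling it. Population effects of this kind cannot be obtained by summing fixed individual effects.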
The dominant tradition in transmission system modeling ignores many aspects of reality so that, like Newton's model of motion, the essence of the phenomenon under study can be illuminated, insights can be gained, and new analytic tools can be developed. The major simplification made by transmission models is that populations are treated as continuous entities and individuals are ignored. This simplification is intrinsic to the use of differential equation models. This simplification imposes further simplifications with regard to contact patterns. The type of differential equations commonly used to construct transmission system models cannot capture details of individual connection patterns like those seen in Figures 2 and 3. Moreover, in differential equation transmission models, contact is an instantaneous event. Differential equations can be defined that incorporate continuous population segments where the basic units are not individuals but pairs of individuals (19). In this type of differential equation model, contact can have duration and need not be instantaneous. But contact models of this type still cannot define individual networks of the type in Figures 2 or 3.
Recently there has been better acceptance of discrete individual models in the transmission system modeling tradition. Discrete individual models of transmission have been especially useful in the examination of sexually transmitted infections (20-24). The reason for this acceptance is that individual network patterns where contacts occur in ongoing relationships are now seen to be important determinants of population infection levels of sexually transmitted infections. Individual models can capture these determinants better than differential equation models.
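A minimal discrete individual model of this general kind can be sketched in Python as follows. It is not the approach of any of the cited models: the fixed contact network, the discrete time steps, the per-link transmission probability `p_transmit`, the per-step recovery probability `p_recover`, and the four-person chain used as input are all assumptions made for illustration.

```python
import random


def simulate_sis(neighbours, initially_infected, p_transmit, p_recover,
                 steps, seed=0):
    """Discrete-time infection without immunity on a fixed contact network.

    `neighbours` maps each individual to the individuals they are in an
    ongoing relationship with. Returns the number infected at each step.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    infected = set(initially_infected)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for person in sorted(infected):
            for contact in neighbours[person]:
                if contact not in infected and rng.random() < p_transmit:
                    newly_infected.add(contact)
        # Recovered individuals return to the susceptible state.
        recovered = {p for p in sorted(infected) if rng.random() < p_recover}
        infected = (infected - recovered) | newly_infected
        history.append(len(infected))
    return history


# Hypothetical chain of four individuals: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

certain_spread = simulate_sis(chain, {0}, p_transmit=1.0, p_recover=0.0, steps=3)
no_spread = simulate_sis(chain, {0}, p_transmit=0.0, p_recover=1.0, steps=3)
typical_run = simulate_sis(chain, {0}, p_transmit=0.5, p_recover=0.2, steps=20)

print(certain_spread)
print(no_spread)
print(typical_run)
```

Because each individual and each link is represented explicitly, individual characteristics such as a risk factor that multiplies `p_transmit` or `p_recover` for particular people could be attached to the same structure, which is what makes models of this type a natural place to combine the two traditions.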
Blended Transmission and Sufficient-Component Cause Models
One reason for promoting the development of transmission models with discrete individuals who connect to each other in population patterns is that by including individuals in transmission models, one can meld the theoretical insights of transmission models with the theoretical insights of the sufficient-component cause model. Such blended models could provide a basis for fully using all the dimensions of epidemiological data. A new approach we have taken to discrete individual models of transmission systems may offer some advantages in this endeavor (25). One advantage of this new approach is that it allows for theoretical model analysis that is difficult or impossible with other approaches. Another advantage is that it explicitly incorporates data on the social and geographic setting of contact. Such data is readily collectible in epidemiological studies. Most other discrete individual models only incorporate data from the network plane relevant to connections between specific individuals. Such data is not readily collectible in epidemiological studies.
The sufficient-component cause model cannot be integrated directly into transmission models because it lacks an explicit integration of time. Graph theoretic models may have an advantage over the sufficient-component cause model in this regard (26). Graph theoretic models, like the sufficient-component cause model, are designed to address the joint effects in individuals of multiple variables. By including time relationships in arrows instead of just in variable definitions, this type of model has been able to clarify some aspects of confounding that were not so readily apparent just from analysis of the sufficient-component cause model (26). In their current form, however, graph theoretic models do not incorporate data relevant to the network plane. They only model individual effects and they assume that there are no dependencies between individuals. But in the same manner that G-estimation methods have been used to incorporate time data (27), graph theoretic models might be elaborated to incorporate network data.
The point of this commentary is not to advocate any particular approach to developing integrative causal models. Rather, the point is that the theoretical foundation for population system epidemiology cannot depend wholly upon individual level models. Dynamic models of population processes are needed that incorporate exposure patterns, time, and contact networks in the manner of transmission models while also incorporating the joint effects of multiple exposures in individuals in the manner of the sufficient-component cause model. Bringing both these traditions together could enhance the pursuit of a population systems approach to epidemiology.
References