Epidemiology 802 Chapter 4

University of Michigan

Compartmental Model Analysis of Epidemiologic Processes

Chapter 4

Epidemiological Statistics Relating Exposure and Disease

by

Jim Koopman


Chapter Outline

Chapter Purpose

Our only disease model so far, the constant rate model, did not specify exposure status. In this chapter we do so. We start out in the simple two by two world of exposure and disease relationships, which has been a beacon of clarity for developing epidemiological thinking. By duplicating our constant rate model and calling one group exposed and the other unexposed, we generate the four categories of population that allow us to calculate prevalence odds ratios, incidence odds ratios, rate ratios, risk ratios, risk differences, rate differences, and population attributable risks. Note that epidemiologists often treat these statistics as model parameters when they use them to make scientific inferences or to project the population effects of changing exposure. Epidemiologists rarely make explicit, however, the models in which the parameters they are estimating are found. We relate these statistics to model parameters for very simple models.

Whether a statistic that we use to relate exposure and disease reflects the causal action of exposure in producing the disease is a function of two things: 1) whether the statistic reflects the causal action parameterized by a causal model, and 2) whether the parameterization of causal action in the causal model employed reflects what is happening in the real world. In other words, we need our statistics to have a theoretical connection to our parameters of interest, and we need those parameters to be embedded in models that capture the essential aspects of the real world with sufficient accuracy. In this chapter we concentrate on point 1, how statistics reflect a particular parameterization of causal action. We set aside the issue of evaluating whether a model adequately captures the essence of the phenomenon it addresses. The constant rate model that we use to address this first issue is so simple that we could say right off that for many practical questions it will not sufficiently capture the actions we might be interested in. This simple model, however, allows us to see a little more clearly how epidemiological statistics relate to causal model parameters. The constant rate model may be extremely simplistic and unrealistic, but it underlies a great deal of epidemiological thinking. By making its predictions and relationships explicit with numerical solutions to the dynamics it entails, and by making more explicit the assumptions we would have to make about a disease for this model to apply, we make clearer the need to elaborate this model to make it more realistic.

The statistics we examine are almost always calculated and used by epidemiologists without considering the dynamics of the processes that generated the data from which the statistics are calculated. We, however, examine these statistics dynamically as the disease process unfolds in our model population. The expected relationships of the statistics over time are something that many epidemiologists will not have deeply considered. By forcing ourselves to predict these relationships, drawing their dynamic curves before Stella™ generates them, we will experience one of the greatest values of modeling: the modeling provides a framework on which we can clarify our thinking about processes and relationships. By formalizing a numerical solution to the dynamic process we are considering, we get immediate feedback as to when our thinking is correct and when it is wrong. Moreover, the results of our models will most likely force us to consider how processes work in ways that we had not considered earlier. This is especially true when we add growth and death processes to the population that we have put into 2 by 2 categories.

The 2 by 2 models we will generate will be too simple to provide a basis for constructing testable theory about how disease generating processes determine the population distribution of disease by exposure. But they will be more complete and explicit than the models that sit vaguely behind the uses to which epidemiologists put the statistics we will examine. Just as for the constant rate vs. the simple risk model, the model we develop will predict many aspects of the relationships we examine that are ignored by the standard 2 by 2 calculations epidemiologists make. Our models will not only predict which statistic will be greater than, less than, or equal to which other; they will predict the shape of the curves relating those statistics over time.

While this chapter, like the previous two, must be seen as basically providing preliminary skills needed before one can begin constructing scientifically useful models, it advances our skill in using models to create a framework on which we can make more explicit what we do and do not know. Our models will treat the disease generating processes too globally to make clear what we do not know about those processes and what we must do to generate new knowledge. Still, we will begin to see how this works by generating relationships over time that had not previously been considered. This naturally makes us think about what these relationships would be if the unrealistic assumptions of the constant rate model were not being made. It generates a need to add more realistic details to our models, as we will do in later chapters.

Parameterizing the effect of exposure

One reason we estimate statistics relating exposure and disease in a population is that we want to assess the effect of a risk factor on the development of a disease process. That risk factor might be a personal behavior, an environmental contamination, a social or administrative factor, or a biologic condition. If we were developing detailed causal models of such diverse risk factors, the models for each type of risk factor would be different. For this chapter, however, we are dealing at a less detailed and more abstract level so that we can treat all types of risk factors the same. The causal effect we will be parameterizing is one which is manifest directly upon processes within the individuals exposed which result in disease. We assume that what happens within one individual has absolutely no effect upon what happens in other individuals. I remind you here that actually there are no individuals in our models, only populations. I commonly write as if our models dealt with individuals because that seems to help our thinking about processes that affect individuals in the real world. But models that actually deal with individuals are of a quite different nature than those presented here.

Our population will be divided into only four segments, determined by dichotomous classifications on exposure and disease. Thus what we assume about the causal effect we will parameterize is that its action on the well, exposed segment of the population is affected neither by what has happened in the population in the past nor by how many diseased or unexposed individuals there are.

When we specify the causal action of exposure in this way, we are excluding effects of exposure which alter the transmission of an infectious agent. The type of exposures we will deal with do not increase the susceptibility to infection, they do not increase the degree to which an infectious agent multiplies in an individual and thereby affect the amount of agent an individual will disseminate into the environment, they do not affect the number of contacts an individual makes where transmission might take place if the person contacted were infectious, they do not affect the chances of transmission if a contact with an infectious individual is made, and they do not affect who an individual contacts. We will see later on that infectious disease risk factors which have these effects can generate quite different relationships between the statistics which we will examine.

As stated in the introduction, whether a statistic reflects real world causal effects will be a function of two things: 1) whether the way we formulate the statistic reflects the causal action that is parameterized in our causal model, and 2) whether the parameterization of causal action in our model reflects what is happening in the real world. The rest of this chapter makes unrealistically simple abstractions in order to focus on point 1.

We begin by parameterizing causal actions very simply. Exposure is assumed to be dichotomous. Each individual (or more accurately each population segment) is assumed to be always exposed or never exposed. The actions of exposures are assumed to be always present and everywhere constant. Disease is also assumed to be dichotomous, and there is no recovery from disease: getting a disease means that an individual is categorized as having that disease for the rest of their life. Not only are the exposure and the disease presumed to be very simple, the causal effects of the exposure on the disease are also assumed to be very simple. The exposure is presumed to multiply the base rate of disease in the unexposed by a constant relative rate as follows:

Exposed Rate = Base Rate * Relative Rate

In the models of exposure effect we will develop, exposure is presumed to have the same multiplicative effect on disease rates under all circumstances. We will therefore call this parameterization of exposure effect a Relative Exposure Effect. We could instead have parameterized the effect of exposure as adding a fixed amount to the disease rate.

Exposed Rate = Base Rate + Rate Difference

We call such an exposure effect an absolute or additive exposure effect. When we only have one class of exposed and unexposed individuals in our model, the choice is arbitrary. In this simple case there is a one to one relationship between the Relative Rate and the Rate Difference given that the Base Rate of disease in the unexposed is fixed.

Rate Difference = Base Rate * Relative Rate - Base Rate.

Rate Difference = Base Rate * (Relative Rate - 1)          Equation (1)
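A minimal Python sketch of the two parameterizations and of Equation (1), assuming a single class of exposed and unexposed individuals. The function names are hypothetical, chosen only for illustration.

```python
def exposed_rate_relative(base_rate, relative_rate):
    """Relative exposure effect: Exposed Rate = Base Rate * Relative Rate."""
    return base_rate * relative_rate


def exposed_rate_additive(base_rate, rate_difference):
    """Absolute (additive) exposure effect: Exposed Rate = Base Rate + Rate Difference."""
    return base_rate + rate_difference


def rate_difference_from_relative(base_rate, relative_rate):
    """Equation (1): Rate Difference = Base Rate * (Relative Rate - 1)."""
    return base_rate * (relative_rate - 1)


# Values used in the model later in this chapter: Base_rate = .005, RelativeRate = 2
base, rr = 0.005, 2
rd = rate_difference_from_relative(base, rr)
# With one exposed class, both parameterizations give the same exposed rate
print(rd, exposed_rate_relative(base, rr), exposed_rate_additive(base, rd))
```

The one to one mapping holds only because the base rate is fixed. A second class with a base rate of 0.01 and the same relative rate of 2 would have a rate difference of 0.01 rather than 0.005, which is why the choice stops being arbitrary once classes differ.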

I do not want you to get the idea, however, that the choice between these two different parameterizations is arbitrary. When there are multiple classes of individuals, say some that have high levels of predisposing factors to the effect of the exposure and others that have lower levels, or some individuals that get the disease via a mechanism completely unrelated to the exposure and others that do not, then the choice between these two parameterizations of exposure effect is not arbitrary. The choice for particular exposures between these parametrizations and various alternative ones which we will later examine becomes a central way that we put scientifically useful theory into our models.

Statistics relating exposure and disease

The statistics we will discuss are the rate ratio, risk ratio, prevalence odds ratio, incidence odds ratio, rate difference, risk difference, population attributable risk (PAR) or etiologic fraction, and the population attributable risk percent (PAR%) or attributable risk percent. We will fix the rate ratio as a time independent parameter of our model. In other words, our models will assume that an exposure multiplies a rate of disease by some constant. That means that we cannot examine the effects of any dynamic process we model upon the rate ratio. There will be no such effects because we have assumed that there are none. Instead we will examine what the assumption of a constant rate ratio effect implies regarding the behavior of the other statistics, namely the risk ratio, the prevalence odds ratio, the incidence odds ratio, the risk difference, and the population attributable risk. As discussed above in reference to Equation (1), fixing the rate ratio and the rate of disease in the unexposed also fixes the rate difference. Thus, just as we will not be examining the effects of dynamic processes on the rate ratio, we will not be examining the effects of the dynamic development of disease upon the rate difference either.

Why examine the dynamics of these statistics?

Let us reiterate. We will examine how the risk ratio, prevalence odds ratio, incidence odds ratio, risk difference, PAR, and PAR% reflect a causal action that we are modeling as a fixed rate ratio. Why do we want to do this? One reason is that if a statistic imperfectly reflects the causal action generated by a rate ratio effect in our models, we should understand the nature of these imperfections in order to better guide decisions about causation when we are presented with a risk ratio, prevalence odds ratio, risk difference or PAR. Other reasons for presenting the material in this chapter are more didactic. Understanding these relationships in this very simple model will provide a basis for understanding the behavior of these common epidemiological statistics in more complex models, where their behavior may more readily stimulate hypotheses about the causal system.

Both of the reasons just presented for pursuing the material in this chapter presume that the basic task of epidemiology is to build ever better and more accurate causal models of how the causes of disease generate patterns of disease in populations. Epidemiologists are not necessarily used to conceiving of their enterprise in this fashion, nor to thinking about how statistics relating exposure and disease reflect the parameters of causal models. Epidemiologists are usually satisfied with a statistic if they feel there is some generally monotonic relationship between the value of the statistic and the degree of causal effect. The nature of the quantitative relationships is often ignored. Since epidemiologists are usually preoccupied with separating out causal effects from the effects of sample selection biases, information biases, and confounding, the question of which underlying causal model they are trying to get at with the statistics they estimate seems secondary. It should not be. If you took Epidemiology 801, I hope that you can appreciate that the only way to conclude that an effect is causal is to judge that the data are more likely to be explained by a causal model than by a model of a bias or a confounding variable effect. Epidemiologists who cannot think of the statistics they calculate from their study data as estimates of the parameters of some underlying causal model have no basis for coming to causal conclusions other than a set of arbitrary rules of thumb like Hill's criteria. My hope is that in this course you acquire some of the basis for making more scientifically sound decisions regarding causal models.

Let me say that again. We should continually try to relate the real world we are observing to causal theories about that world. By doing so we advance both the development and evaluation of new scientific theories. This also provides a basis on which to guide public health policy. Comparing observations to theory is basic to both the scientific process and public health practice. If we don't have explicit causal models in mind when we estimate statistics that relate exposure and disease, we will be missing opportunities to advance scientific understanding and we very well might make quite poor public health decisions.

Since reality is very complex, the relationships of the statistics we will examine to real world causal processes are also very complex. We advance our understanding, however, by beginning with an austere caricature of the causal processes generating disease. We start off dealing with disease as a dichotomous, on-off state that is generated by constantly acting risk factors whose levels and effects remain constant. There will be no feedback affecting exposure levels or exposure effects. We will assume that there is only a one step process leading to disease and that the exposure affects the relative rate of this process. In our first model we will further assume that our population is a cohort of originally well individuals whom we follow over time and that there is no variation in susceptibility to the exposure effect.

Ratio Statistics

Since our causal parameter will be a ratio of rates, we might expect our other ratio statistics, the risk ratio and the odds ratio, to reflect it most closely. But in what ways and under what conditions will they deviate from it? That is something we should know. Since this is such a simple issue, you probably have learned how incidence odds ratios, prevalence odds ratios, rate ratios, and risk ratios for constant rate processes relate to each other. But being able to build a model of these relationships will solidify this knowledge for you and improve your ability to use these relationships.

Let us be clear about our odds ratio measures. The prevalence odds ratio uses the current number of cases (population sizes) in the four exposure and disease categories. We will first calculate it from a cohort study of a fixed population. Later, when we add vital dynamics to our model population, we will calculate it from cross sectional studies taken at different points in time. The incidence odds ratio uses only the new cases that occur over a defined period of time for the disease categories. For the sake of convenience we will confine our time period to a single dt step in the numerical solution of our model.

Absolute effect statistics

Risk differences, the PAR or EF, and the PAR% are three different proportion statistics that we use to reflect the amount of disease caused by a risk factor. The numerator of each of these proportions is the number of cases attributable to an exposure under the assumption that the exposed individuals differ from the unexposed individuals only with regard to exposure. The denominator of the risk difference is the number of exposed individuals in the population. The denominator of the PAR is the number of diseased individuals in the population.

The PAR has received a great variety of different names at different times. Levin and Lilienfeld called it the attributable risk (before MacMahon or anyone else began calling the risk difference the attributable risk!). MacMahon and Pugh (1970) called it the Population Attributable Risk. Cole and MacMahon (1971) then called it the Population Attributable Risk Percent. Ouellet et al. (1979) called it the attributable fraction. Miettinen gave it the name Etiologic Fraction, but he also gave other fractions the same appellation. Rothman (1986) called the analogous rate based measure the attributable proportion. Hennekens labels a measure expressing the proportion of exposed diseased individuals who have disease attributable to the exposure as the attributable risk percent and then erroneously equates this measure with the PAR or attributable proportion. A more confusing state of affairs does not seem humanly possible. To help in your understanding of this issue, let us derive these measures.

The risk difference expresses the number of exposed individuals with disease, minus those exposed diseased individuals who would have gotten disease without exposure, divided by the number of exposed individuals. We deal with only four categories of individuals in all of these proportions, so let us call the exposed diseased ED, the exposed well EW, the unexposed diseased UD, and the unexposed well UW. These four categories will correspond to four different compartments (or in Stella™ terms, stocks) in our model. The risk difference will be labeled "RD" and the population attributable risk "PAR". We will define the population of individuals who have disease attributable to exposure as those who got disease when exposed but would not have gotten disease if they had not been exposed. We will label this theoretical population AC, the population of attributable cases. AC is a subpopulation of ED. In the real world, an exposed case whose disease is attributable to an exposure cannot be distinguished from an exposed case who developed disease not as a result of exposure but as a result of background factors. We will see later that distinguishing between cases that developed from the action of exposure and those that did not is fundamental to the construction of useful scientific theory about how exposure actions generate patterns of disease in populations.

For the moment, however, we take the view of these statistics that we get from our 2 by 2 classification of exposure and disease. Let us begin by considering the number of cases attributable to an exposure. We consider any case that would not have occurred if the exposed individual had not been exposed to be a case attributable to exposure. Note that this is not the same as the number of cases caused by an exposure, because a case might have been caused by an exposure but, had it not been caused by the exposure, it might have been caused by background factors. The number of attributable cases is thus the number of cases in the exposed population minus the number of cases expected in that population in the absence of exposure.


For the risk difference we have

$$RD = \frac{ED}{ED+EW} - \frac{UD}{UD+UW},$$

which is the proportion of exposed individuals who have disease attributable to exposure. If we multiply this risk difference by the exposure fraction and divide by the case fraction we have the PAR or EF, where N = ED + EW + UD + UW is the total population.

$$PAR = RD \times \frac{(ED+EW)/N}{(ED+UD)/N} = \frac{RD\,(ED+EW)}{ED+UD}$$

This same fraction is often derived by first calculating the proportion of the entire population (including both exposed and unexposed) which is attributable to exposure and then dividing by the case fraction.

$$PAR = \frac{RD\,(ED+EW)/N}{(ED+UD)/N}$$
The proportion that Hennekens (and in some cases Miettinen also) labels as the etiologic fraction is what Rothman calls the attributable risk percent. We will adopt Rothman's terminology.

$$AR\% = \frac{RD\,(ED+EW)}{ED}$$

When the exposure is being unvaccinated, AR% corresponds to a statistic that is commonly used to assess vaccine efficacy which we will examine in later chapters.
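The formulas above can be checked with a short Python sketch that computes RD, PAR and AR% from the four compartments ED, EW, UD and UW, and recomputes PAR and AR% directly from the number of attributable cases. The counts are hypothetical, chosen only so the arithmetic is easy to follow, and AR% is returned as a fraction rather than a percent.

```python
def exposure_statistics(ED, EW, UD, UW):
    """Return (RD, PAR, AR%) from the four exposure-disease compartments."""
    exp_risk = ED / (ED + EW)
    unexp_risk = UD / (UD + UW)
    rd = exp_risk - unexp_risk
    n = ED + EW + UD + UW
    exposure_fraction = (ED + EW) / n
    case_fraction = (ED + UD) / n
    # PAR: risk difference times exposure fraction, divided by case fraction
    par = rd * exposure_fraction / case_fraction
    # AR%: attributable cases as a share of exposed cases (a fraction, not a percent)
    ar_pct = rd * (ED + EW) / ED
    return rd, par, ar_pct


def attributable_cases(ED, EW, UD, UW):
    """Exposed cases minus the cases expected among the exposed without exposure."""
    return ED - (UD / (UD + UW)) * (ED + EW)


# Hypothetical 2 by 2 counts
ED, EW, UD, UW = 30, 70, 10, 90
rd, par, ar_pct = exposure_statistics(ED, EW, UD, UW)
ac = attributable_cases(ED, EW, UD, UW)
print(rd, par, ar_pct)
print(ac / (ED + UD), ac / ED)  # PAR and AR% recomputed from the attributable cases
```

Here RD = 0.2, PAR = 0.5 and AR% = 2/3 (up to floating-point rounding). Since the risk ratio is 3, AR% also equals (3 - 1)/3; with "unvaccinated" as the exposure, this is the form of the usual vaccine efficacy statistic.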

For all the statistics we have enumerated, we will build models that allow us to examine how the relationships we described above change when we introduce different susceptibilities to disease in the population. For the statistics which do not require us to estimate risks, namely the odds ratio and the PAR, we will consider how adding vital dynamics changes the relationships of these statistics to the relative causal effect parameters in our causal models.

A Compartmental Model of Constant Relative Exposure Effect:

We now construct a model of a constant rate disease process where exposure increases this constant rate. We do not include a birth or a death process. A model without a birth or death process could fit two situations: 1) the disease occurs over a short enough time interval so that births and deaths in the interval are negligible, or 2) a cohort of exposed and unexposed individuals is assembled and followed prospectively.

Both exposed and unexposed individuals will be classified into only two states: well and diseased. There will be only one transition between these states that is allowed: that is from well to diseased. We could think of this as exposure randomly affecting individuals in the population of exposed with there being no incubation time between when an exposure has its effect and when the disease develops. There will be no recovery from disease so that one cannot go from the diseased to the well category of individuals. The rate of transition from well to diseased will be higher for the exposed than the unexposed, but it will be constant for all time. Everyone in the population will be equally susceptible. Everyone in the exposed population will be equally exposed. The exposure will have exactly the same effect on everyone in the population.

A STELLA™ model of this situation with our derived statistics is seen in diagram 4.1.

Diagram 4.1

A constant rate of disease model in a cohort study where exposure increases the instantaneous flow rate into the diseased state.


ED(t) = ED(t - dt) + (NewExpDis) * dt ; INIT ED = 0

NewExpDis = EW*RelativeRate*Base_rate

EW(t) = EW(t - dt) + (- NewExpDis) * dt ; INIT EW = 1000

NewExpDis = EW*RelativeRate*Base_rate

UD(t) = UD(t - dt) + (NewUnexpDis) * dt ; INIT UD = 0

NewUnexpDis = UW*Base_rate

UW(t) = UW(t - dt) + (- NewUnexpDis) * dt ; INIT UW = 1000

NewUnexpDis = UW*Base_rate

Base_rate = .005

RelativeRate = 2

PAR_or_EF = RiskDifference*(EW+ED)/(ED+UD)

AR% = RiskDifference*(ED+EW)/ED

RiskDifference = ExpRisk-UnexpRisk

UnexpRisk = UD/(UD+UW)

ExpRisk = ED/(ED+EW)

RiskRatio = ExpRisk/UnexpRisk

IncidOR = NewExpDis*UW/(NewUnexpDis*EW)

PrevOR = ED*UW/(UD*EW)

Note that we start off with everyone in the well state, consistent with a cohort study. After any period of time, the number of exposed individuals in the diseased state divided by all exposed individuals is the risk of disease across that period for exposed individuals. The risk ratio and risk difference statistics are thus easy to calculate. The prevalence odds ratio is similarly easy to calculate: it is the cross product ratio. Note that in this case, where everyone starts off well and no one recovers from the diseased state to move back into the well state, the prevalence odds ratio corresponds to the incidence odds ratio over the whole period from the start of follow-up to the current time. Usually, however, we would think of an incidence odds ratio as using incident cases over a shorter period of time. In the above model, we use the flow from well to diseased to represent the new cases. The flow is actually expressed per time unit, not per dt, but it is calculated for every dt and any calculated flow acts only for a single dt. Since our calculated odds ratio has one flow in the numerator and one in the denominator, the time units of the flows cancel out, so the ratio of flows equals the actual ratio of new cases during the single dt in which that ratio is calculated.
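For checking predictions outside STELLA, here is a minimal Python sketch of the Diagram 4.1 equations solved by Euler's method (STELLA's default integration method). It assumes DT = 1 and a 40-time-unit run, since the chapter does not state the DT it used; the function and dictionary names are hypothetical, chosen to mirror the STELLA variable names. Statistics are recorded after each step because at time 0 the diseased stocks are empty and the ratios are undefined.

```python
def simulate(base_rate=0.005, relative_rate=2.0, n_each=1000.0, dt=1.0, t_end=40.0):
    """Euler solution of the Diagram 4.1 stocks.

    Returns a list of (time, statistics) pairs, one per dt step after time 0.
    """
    ED, EW, UD, UW = 0.0, n_each, 0.0, n_each  # cohort: everyone starts well
    out = []
    for step in range(1, int(round(t_end / dt)) + 1):
        # Flows per time unit, computed from the stocks at the start of the step
        new_exp_dis = EW * relative_rate * base_rate
        new_unexp_dis = UW * base_rate
        # Each flow acts for a single dt
        ED, EW = ED + new_exp_dis * dt, EW - new_exp_dis * dt
        UD, UW = UD + new_unexp_dis * dt, UW - new_unexp_dis * dt
        # Flows at the new time, used for the incidence OR as in IncidOR
        new_exp_dis = EW * relative_rate * base_rate
        new_unexp_dis = UW * base_rate
        exp_risk = ED / (ED + EW)
        unexp_risk = UD / (UD + UW)
        out.append((step * dt, {
            "RiskRatio": exp_risk / unexp_risk,
            "PrevOR": ED * UW / (UD * EW),
            "IncidOR": new_exp_dis * UW / (new_unexp_dis * EW),
            "RiskDifference": exp_risk - unexp_risk,
        }))
    return out


for t, stats in simulate():
    if t in (1.0, 10.0, 20.0, 40.0):
        print(t, {name: round(value, 4) for name, value in stats.items()})
```

As with the STELLA version, draw your predicted curves before running it.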

The risk difference estimates the ratio of the number of attributable cases to the number of exposed individuals at risk. Multiplying this by the number of exposed individuals gives the number of attributable cases, and dividing that by the total number of cases gives the PAR or EF. Multiplying the risk difference by the number of exposed individuals (ED+EW) and dividing by the number of exposed cases (ED) gives the AR%.

The rate ratio is entered as a parameter in our causal model, and therefore we do not have to derive any statistic for it. This should help you understand the difference between a parameter and a statistic. A statistic is something that is calculated from variable (compartment) values; a parameter is an element of the model itself. Usually we calculate statistics because those statistics reflect parameters of interest to us. In our model, the rate ratio is a parameter, and the risk ratio and odds ratios are potentially observable statistics that reflect that parameter.

Homework C4.1

Given the above model structure, draw the curves you would expect for the following graph. Only after drawing the curves Only after drawing the curves Only after drawing the curves (Don't be a slouch now!), construct and numerically solve the model. Please inform the professor of any differences between what you predicted and what you observed and be able to explain how and why the differences arose.

If you peek at the simulation results first, you lose a lot of the stimulus for careful thinking, so put your pencil to this paper before flipping the page. Try to be as accurate as possible. Try to figure out the exact location where the values will start, whether they will go up or down, and whether they will go up or down in straight lines, in convex curves, concave curves, or sigmoid curves. Write out your logic for why you drew what you did before you turn the page! (You learn a lot about the logic of a model by predicting what the model will produce and then seeking to explain any unexpected findings.) Then, after flipping the page, write out any differences between the relationships you predicted and those STELLA produced. Everyone should have had some differences.

Graph 4.1

Exposure effect ratio statistics and parameters for a constant rate and constant exposure effect model


Homework C4.2

Given the above model structure, draw the curves you would expect for the following graph. Only after drawing the curves Only after drawing the curves Only after drawing the curves (Don't be a slouch now!), construct and numerically solve the model. Please inform the professor of any differences between what you predicted and what you observed and be able to explain how and why the differences arose.

Graph 4.2

Quantitative exposure effect statistics for a constant rate and constant exposure effect model


Numerical solutions of ratio statistics and parameters

In the first instant of the simulation, the disease is rare, so the OR and the risk ratio have the same value. (I trust your introductory epidemiology course covered this adequately, so I do not go into great detail. Make sure, however, that you fully understand why this should be the case.) Subsequently the prevalence OR will have a greater value than the risk ratio, since the diseased category does not go into the denominator of the odds ratio as it does for the risk ratio. In fact, risk ratios always have an upper limit, which is the inverse of the risk in the unexposed. This limit is reached when all of the exposed are ill. For example, if the risk in the unexposed is 0.2, then the maximum risk ratio, reached when all of the exposed are diseased, is 5. ORs, on the other hand, have no upper limit. The risk ratio will fall over time because the number of new cases per time unit falls faster in the exposed than in the unexposed. That is because the higher disease rate of the exposed drains their well individuals faster, leaving fewer well individuals for the disease rate to act on.
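A quick way to see these relationships without a simulation is the exact continuous-time solution of the constant rate model, in which the cumulative risk is 1 - exp(-rate * t). This is a swapped-in check, not STELLA's Euler output, so its values differ slightly from a DT = 1 run. The parameter values are those of Diagram 4.1 and the helper names are hypothetical.

```python
import math


def exact_risks(t, base_rate=0.005, relative_rate=2.0):
    """Exact continuous-time cumulative risks (exposed, unexposed) in a constant-rate cohort."""
    exp_risk = -math.expm1(-base_rate * relative_rate * t)  # 1 - exp(-a*b*t)
    unexp_risk = -math.expm1(-base_rate * t)                 # 1 - exp(-a*t)
    return exp_risk, unexp_risk


def odds(p):
    return p / (1.0 - p)


for t in (0.001, 10, 40, 400):
    pe, pu = exact_risks(t)
    risk_ratio = pe / pu
    prev_or = odds(pe) / odds(pu)
    cap = 1.0 / pu  # largest possible risk ratio: every exposed individual ill
    print(f"t={t}: RR={risk_ratio:.4f}  OR={prev_or:.4f}  RR cap={cap:.2f}")

print(1.0 / 0.2)  # cap when the unexposed risk is 0.2, as in the example above
```

Near t = 0 both ratios sit at the relative rate of 2; afterwards the risk ratio falls toward 1 while staying under its cap, and the OR climbs without bound.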

To examine whether a function is increasing or decreasing over time, one could examine its first derivative. We, however, are trying to avoid calculus in this course, so see if considering the issue as follows can clarify the direction of change. The prevalence OR will rise because the odds of disease in the exposed will rise faster than the odds of disease in the unexposed. To see the inevitability of this, we can compare the OR at one instant with the OR an instant before. Our instant will be "dt".

Let "a" be the rate of disease development in the unexposed. Let "b" equal our exposure effect or relative rate parameter. Then if our prevalence OR at "t" is

$$\text{prevalence OR}_t = \frac{ED_t/EW_t}{UD_t/UW_t},$$

our OR at "t+dt" will be

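$$\text{OR}_{t+dt} = \frac{(ED_t + ab\,EW_t\,dt)\,/\,(EW_t - ab\,EW_t\,dt)}{(UD_t + a\,UW_t\,dt)\,/\,(UW_t - a\,UW_t\,dt)}.$$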

The flows from the well to the diseased in both the exposed and unexposed individuals are added and subtracted appropriately from the values at "t" to get the values at "t+dt". If the odds ratio is rising over time, we have:

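$$\frac{(ED_t + ab\,EW_t\,dt)\,/\,(EW_t - ab\,EW_t\,dt)}{(UD_t + a\,UW_t\,dt)\,/\,(UW_t - a\,UW_t\,dt)} > \frac{ED_t/EW_t}{UD_t/UW_t} \qquad \text{Equation (2)}$$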

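Inspecting the relationships between these two prevalence ORs should convince you that inequality (2) is correct. In each dt the exposed well, EW, shrink by the fraction ab·dt, while the unexposed well, UW, shrink only by the fraction a·dt, so the denominator of the exposed odds falls proportionally faster than the denominator of the unexposed odds. That will increase the OR. The numerators pull slightly the other way: because the exposed have already accumulated proportionally more cases, ED grows by a smaller fraction each dt than UD does. The faster draining of the exposed well outweighs this, however, so the odds of disease in the exposed rise proportionally faster than the odds in the unexposed, and the OR increases.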
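Equation (2) can also be checked numerically. The Python sketch below assumes the Diagram 4.1 values (a = 0.005, b = 2, 1000 individuals per group) and DT = 1. It steps the model forward and, at every step, compares the growth factor of the exposed odds ED/EW with that of the unexposed odds UD/UW, and records whether the prevalence OR rose. The helper names are hypothetical.

```python
def one_step(ED, EW, UD, UW, a=0.005, b=2.0, dt=1.0):
    """One Euler step of Diagram 4.1 (a = base rate, b = relative rate)."""
    new_exp, new_unexp = a * b * EW, a * UW
    return ED + new_exp * dt, EW - new_exp * dt, UD + new_unexp * dt, UW - new_unexp * dt


def prevalence_or(ED, EW, UD, UW):
    """Prevalence OR written as (ED/EW) / (UD/UW)."""
    return (ED / EW) / (UD / UW)


def check_equation_2(n_steps=400, a=0.005, b=2.0, dt=1.0):
    """For each step, return (exposed odds growth, unexposed odds growth, OR rose?)."""
    # Take one step first: at t = 0 both odds are zero and the OR is undefined
    state = one_step(0.0, 1000.0, 0.0, 1000.0, a, b, dt)
    rows = []
    for _ in range(n_steps):
        nxt = one_step(*state, a, b, dt)
        exp_growth = (nxt[0] / nxt[1]) / (state[0] / state[1])
        unexp_growth = (nxt[2] / nxt[3]) / (state[2] / state[3])
        rows.append((exp_growth, unexp_growth, prevalence_or(*nxt) > prevalence_or(*state)))
        state = nxt
    return rows


rows = check_equation_2()
print(rows[0])
print(all(rose for _, _, rose in rows))
```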

We can go further than just saying that the odds ratio increases over time. We can predict that it will increase faster over time, so that the curve of its increase will be convex. To examine whether a function is concave or convex, we would examine the second derivative of the function. There are various ways one might examine relationships at incremental time steps to determine whether the differences across time steps increase or decrease with each step. While doing the relevant algebra would be good practice, my experience is that students don't gain much from this rather complex exercise, so we will not pursue this issue. Let us just state that the second derivative of the prevalence OR is positive, so that the curve is convex (forms a cup) instead of concave (forms a cave).

The risk ratio will fall by decreasing amounts over time, so it too is convex.

The incidence OR as we have calculated it will be precisely the same as the relative rate parameter. That is because we have defined both the numerator and the denominator of this ratio as the flow divided by the stock from which the flow arises. The numerator and denominator of the incidence OR are thus the rate in the exposed and the rate in the unexposed. Over 40 time periods the relationships are as follows:
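For readers who want to reproduce these curves outside STELLA, here is a minimal Python sketch of the closed two-by-two cohort. It assumes fixed Euler steps of length dt = 1, 1000 people per group, a base rate of 0.005, and a relative rate of 2; these values, and the function name `simulate_cohort`, are illustrative choices rather than the author's model file. It records the prevalence OR, the risk ratio, and the incidence OR at every step.

```python
# Sketch of the two-by-two constant rate cohort (no births or deaths),
# integrated with fixed Euler steps. Stock names mirror EW, ED, UW, UD;
# dt = 1 and the parameter defaults are illustrative assumptions.


def simulate_cohort(base_rate=0.005, relative_rate=2.0, n0=1000.0,
                    dt=1.0, periods=40):
    """Return one (prevalence OR, risk ratio, incidence OR) tuple per step."""
    ew, ed = n0, 0.0  # exposed well, exposed diseased
    uw, ud = n0, 0.0  # unexposed well, unexposed diseased
    history = []
    for _ in range(int(round(periods / dt))):
        # Flows are computed from stock values at the start of the step.
        new_exp = ew * base_rate * relative_rate
        new_unexp = uw * base_rate
        # Incidence odds: the flow over the well stock it arises from.
        incidence_or = (new_exp / ew) / (new_unexp / uw)
        ew, ed = ew - new_exp * dt, ed + new_exp * dt
        uw, ud = uw - new_unexp * dt, ud + new_unexp * dt
        prevalence_or = (ed / ew) / (ud / uw)
        risk_ratio = (ed / n0) / (ud / n0)  # cumulative risks in a closed cohort
        history.append((prevalence_or, risk_ratio, incidence_or))
    return history


if __name__ == "__main__":
    for step, (p_or, rr, i_or) in enumerate(simulate_cohort(), start=1):
        if step % 10 == 0:
            print(f"step {step:3d}  prevalence OR={p_or:.3f}  "
                  f"RR={rr:.3f}  incidence OR={i_or:.3f}")
```

Running it with `periods=400` gives the longer horizon discussed next; the incidence OR stays at the relative rate throughout because each of its terms is a flow divided by its own source stock.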

Graph 4.2 (filled in)

Exposure effect ratio statistics and parameters for a constant rate and constant exposure effect model


Over 400 time periods we see the following:


At the end of 400 time periods, about 86.5% of unexposed and about 98% of exposed individuals are diseased.
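These percentages can be checked directly. The Python sketch below assumes the base rate of 0.005 and relative rate of 2 used elsewhere in this chapter and compares the exact continuous-time risk, 1 - exp(-rate·t), with the value produced by Euler steps of length 1 (the step length is an assumption made here for illustration).

```python
import math

# Fraction of each cohort diseased after T periods, assuming the base rate
# and relative rate used in this chapter (0.005 and 2).
BASE_RATE = 0.005
RELATIVE_RATE = 2.0
T = 400
DT = 1.0

# Exact solution of the constant rate model: risk = 1 - exp(-rate * t).
exact_unexp = 1 - math.exp(-BASE_RATE * T)
exact_exp = 1 - math.exp(-BASE_RATE * RELATIVE_RATE * T)

# Euler steps remove the fraction rate * DT of the well stock at each step.
steps = int(round(T / DT))
euler_unexp = 1 - (1 - BASE_RATE * DT) ** steps
euler_exp = 1 - (1 - BASE_RATE * RELATIVE_RATE * DT) ** steps

print(f"unexposed: exact {exact_unexp:.4f}, Euler {euler_unexp:.4f}")
print(f"exposed:   exact {exact_exp:.4f}, Euler {euler_exp:.4f}")
```

With DT = 1 the Euler figure for the unexposed lands slightly above the exact one, so a figure quoted to a tenth of a percent depends on how the model is solved.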

Numerical solution of the quantitative effect parameters

The PAR will be falling for the same reason that the RR is falling. At the start, cases among the exposed are generated at b times the rate (here two times the rate) at which cases are generated among the unexposed, so at the start b times as many cases are produced in the exposed as in the unexposed. One out of every b of these exposed cases, however, is attributed to the same causes as those affecting the unexposed, so that a fraction (b - 1)/b of the exposed cases at the start is attributed to exposure. As more and more of the unexposed become ill, a smaller share of the exposed cases is attributable to exposure, so the PAR drops.
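The reasoning in this paragraph can be sketched numerically. The hypothetical function below computes the share of cumulative exposed cases in excess of what the unexposed risk would produce, (P_E - P_U)/P_E, using the exact continuous-time cohort solution rather than STELLA's stepwise integration (an assumption). It starts at (b - 1)/b and falls as the unexposed accumulate disease.

```python
import math


def exposed_attributable_fraction(t, base_rate=0.005, relative_rate=2.0):
    """(P_E - P_U) / P_E in the exact continuous-time closed cohort.

    P = 1 - exp(-rate * t); math.expm1 keeps small-t values accurate.
    """
    p_exp = -math.expm1(-base_rate * relative_rate * t)
    p_unexp = -math.expm1(-base_rate * t)
    return (p_exp - p_unexp) / p_exp


for t in (0.001, 40, 100, 400):
    print(t, round(exposed_attributable_fraction(t), 3))
```

Because this fraction equals 1 - 1/RR, it must fall whenever the risk ratio falls, which is the link the paragraph draws.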

Homework C4.3

Make sure you go over the relationships in the following graph carefully in your mind and play with the algebra behind them so that you thoroughly understand why the curves in the graph have the shapes they do. It will help you to build the simulation that gives these results and then examine the results in a table of all the pertinent entities, including all of the individual odds, risks, rates, and compartment sizes. Being able to formalize the details of a process and look at what is happening to each entity separately can be a great aid to careful and logical thinking.

(To be handed in). Explain why the curves for the different statistics in Graph 4.2 have the shapes that they do.

Graph 4.2 (filled in)

As an example of graphs or tables that might help you interpret graph 4.2, we include graph 4.3.

Graph 4.3 (filled in)


Note that one of the most important uses of modeling is to help you think more clearly about a problem. You very often cannot predict what the behavior of your model is going to be because the situation you are modeling is difficult to analyze. When you see the behavior and notice something you did not expect, you are forced to reexamine your model, reexamine the behavior of all its elements, and clarify your understanding of what is happening. The key to using simulation models to advance your understanding is to always be asking the question: "Why did it do that?" If you haven't put something terribly complex and impenetrable into your model, you should quite often be able to answer that question and advance your understanding. When you can't, then at least you have a healthy reminder of how little you know. Don't be discouraged by your inability to predict model behavior. If you could always predict model behavior accurately, you wouldn't need a tool like STELLA® and you wouldn't need a course like this.

Cases attributable to an exposure and cases caused by an exposure

To clarify the difference between cases attributable to an exposure and cases caused by an exposure, we now build a model that separates out the cases caused by the exposure. In some unusual situations a cause of a cancer might leave a biomolecular genetic trail, so that a biomarker of the causal action could be detected. In that case we would want a model that separated out these cases. In fact, if we had such a marker, we would probably want to designate cases with the marker as representing a different disease from cases without it. Since the two diseases would have different causes and different means of prevention, failing to distinguish the biomarked cases from the others would only dilute any examination of the effects of causal or preventive factors. In a later chapter we will examine this issue in more detail. For now, let us just mention that any model with two separate flows into the disease category is called a model of simple independent action. We say that the effects of the exposure risk factor and the background risk factors have joint effect relationships described by the model of simple independent action.

The major value of constructing such a model now, however, is just to clarify our thinking about what the attributable risk statistics we have calculated above represent. By clarifying the model that underlies these statistics, we will clarify the assumptions that are needed to use these statistics in any predictive fashion.

In Diagram 4.2 we have separated out the cases caused by the exposure in our model. This is just like the model in Diagram 4.1 except that we are now distinguishing two distinct categories of individuals in the ED category. Note that in this model the sum of the two flows out of EW exactly equals the flow out of EW in the model represented by Diagram 4.1. Dividing a flow in a Stella™ model does not affect the total volume of the flow. Stella™ does not calculate one flow out of a compartment and then calculate the other flow based upon what is left. It calculates both flows using compartment values at the beginning of the dt.

Likewise, in a differential equation, splitting an outflow term into two separate flows does not change the total flow, although the reason is a little different. In a differential equation, draining a compartment through one flow has no effect on the value of another flow from the same compartment because, on an instantaneous basis, any draining of the compartment is infinitesimally small.
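The point about STELLA computing both flows from start-of-step values can be illustrated with a small Python sketch. The helper names below (`step_simultaneous`, `step_sequential`, `run`, `run_unsplit`) are hypothetical and not STELLA code; the parameter values (base rate 0.005, relative rate 2, dt = 1, 400 steps) are assumptions. The simultaneous split reproduces the single unsplit outflow of Diagram 4.1 exactly, while draining one flow before computing the other does not.

```python
# Hypothetical Euler helpers contrasting two ways of handling a split outflow.


def step_simultaneous(ew, base_rate, relative_rate, dt):
    """Both pieces of the split flow use EW from the start of dt."""
    background = ew * base_rate
    exposure_caused = ew * base_rate * (relative_rate - 1)
    ew_next = ew - (background + exposure_caused) * dt
    return ew_next, background * dt, exposure_caused * dt


def step_sequential(ew, base_rate, relative_rate, dt):
    """Not what STELLA does: drain one flow, then compute the other
    from what is left."""
    background = ew * base_rate
    remaining = ew - background * dt
    exposure_caused = remaining * base_rate * (relative_rate - 1)
    ew_next = remaining - exposure_caused * dt
    return ew_next, background * dt, exposure_caused * dt


def run(step, n0=1000.0, base_rate=0.005, relative_rate=2.0, dt=1.0, periods=400):
    """Apply one of the step functions repeatedly; return (EW, total diseased)."""
    ew, diseased = n0, 0.0
    for _ in range(int(round(periods / dt))):
        ew, back, caused = step(ew, base_rate, relative_rate, dt)
        diseased += back + caused
    return ew, diseased


def run_unsplit(n0=1000.0, base_rate=0.005, relative_rate=2.0, dt=1.0, periods=400):
    """A single outflow from EW, as in Diagram 4.1."""
    ew, ed = n0, 0.0
    for _ in range(int(round(periods / dt))):
        flow = ew * base_rate * relative_rate
        ew, ed = ew - flow * dt, ed + flow * dt
    return ew, ed


if __name__ == "__main__":
    print("unsplit     ", run_unsplit())
    print("simultaneous", run(step_simultaneous))
    print("sequential  ", run(step_sequential))
```

The same comparison is what Homework C4.4 asks you to make inside STELLA with a derived variable.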

Diagram 4.2

A constant rate of disease model in a cohort study where exposure increases the instantaneous flow rate into the diseased state and where cases caused by the exposure are distinguished from cases caused by background factors

Background_ED(t) = Background_ED(t - dt) + (NewBackExpDis) * dt

INIT Background_ED = 0

NewBackExpDis = EW_2*Base_rate

EW_2(t) = EW_2(t - dt) + (- NewBackExpDis - NewExpCausedDis) * dt

INIT EW_2 = 1000

NewBackExpDis = EW_2*Base_rate

NewExpCausedDis = EW_2*Base_rate*(RelativeRate-1)

ExpCaused_ED(t) = ExpCaused_ED(t - dt) + (NewExpCausedDis) * dt

INIT ExpCaused_ED = 0

NewExpCausedDis = EW_2*Base_rate*(RelativeRate-1)

UD_2(t) = UD_2(t - dt) + (NewUnexpDis_2) * dt

INIT UD_2 = 0

NewUnexpDis_2 = UW_2*Base_rate

UW_2(t) = UW_2(t - dt) + (- NewUnexpDis_2) * dt

INIT UW_2 = 1000

NewUnexpDis_2 = UW_2*Base_rate

AttributableCases = (Background_ED+ExpCaused_ED)-(Background_ED+EW_2+ExpCaused_ED)*(UD_2/(UD_2+UW_2))

Base_rate = .005

RelativeRate = 2
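As a sketch for readers working outside STELLA, the Diagram 4.2 equations can be transcribed into Python. The transcription assumes Euler steps with dt = 1 and a 400-period run (the listing does not state DT or run length), and the variable names follow the listing converted to snake_case. It records ExpCaused_ED and AttributableCases side by side, the two quantities compared in Graph 4.4.

```python
def simulate_diagram_4_2(base_rate=0.005, relative_rate=2.0, n0=1000.0,
                         dt=1.0, periods=400):
    """Euler transcription of Diagram 4.2.

    Returns a list of (ExpCaused_ED, AttributableCases) pairs, one per step.
    """
    ew_2, background_ed, exp_caused_ed = n0, 0.0, 0.0
    uw_2, ud_2 = n0, 0.0
    rows = []
    for _ in range(int(round(periods / dt))):
        # All flows use stock values from the start of the step.
        new_back_exp_dis = ew_2 * base_rate
        new_exp_caused_dis = ew_2 * base_rate * (relative_rate - 1)
        new_unexp_dis_2 = uw_2 * base_rate

        ew_2 -= (new_back_exp_dis + new_exp_caused_dis) * dt
        background_ed += new_back_exp_dis * dt
        exp_caused_ed += new_exp_caused_dis * dt
        uw_2 -= new_unexp_dis_2 * dt
        ud_2 += new_unexp_dis_2 * dt

        # Exposed cases minus the cases expected at the unexposed risk.
        exposed_cases = background_ed + exp_caused_ed
        exposed_total = background_ed + ew_2 + exp_caused_ed
        attributable_cases = exposed_cases - exposed_total * (ud_2 / (ud_2 + uw_2))
        rows.append((exp_caused_ed, attributable_cases))
    return rows


if __name__ == "__main__":
    rows = simulate_diagram_4_2()
    for t in (1, 10, 100, 400):
        caused, attributable = rows[t - 1]
        print(f"t={t:3d}  caused={caused:7.2f}  attributable={attributable:7.2f}")
```

With dt = 1, row t - 1 corresponds to time t; tabulating the pairs this way is one route into Homework C4.5.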

Graph 4.4

Comparison of cases attributable to an exposure and cases caused by an exposure for a constant exposure and exposure effect model.


Homework C4.4

Confirm that the sum of the two exposed and diseased compartments in model 4.2 equals the single exposed and diseased compartment in model 4.1 by making a derived variable for the sum and then examining the two values in a table. Use the free floating decimal point option to ensure a more exact comparison. You will need to have models 4.1 and 4.2 on the same sheet. To construct model 4.2, you can copy and paste model 4.1, bomb the parts you don't need, and continue with the reconstruction.

Homework C4.5 (Hand in)

Explain why the curves in graph 4.4 have the relationships that they do. Use other graphs or tables generated by numerical solution of the model to help you with your explanation. Comment upon the relationships between the two curves when the disease is rare.

Modeling cases attributable to an exposure

The difference between cases caused by an exposure and cases attributable to an exposure can be seen more clearly if we develop a model which has a separate compartment for cases attributable to an exposure (AC in the calculations presented earlier). An exposed individual whose disease was caused by the exposure does not count as an attributable case if background factors would subsequently have caused the disease anyway. We can model this easily enough just by adding an outflow from the cases caused by the exposure that occurs at the rate at which background factors cause disease. The model form is seen in Diagram 4.3.
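A minimal Python sketch of this idea follows, assuming Euler steps with dt = 1, a closed cohort of 1000 per group, a base rate of 0.005, and a relative rate of 2. The stock names `ac` and `caused_not_attr` are hypothetical labels for the attributable cases and for the exposure-caused cases that background factors would later have caused. Under these assumptions AC divided by the exposed cohort size matches the risk difference at every step, one of the checks in Homework C4.6.

```python
def simulate_diagram_4_3(base_rate=0.005, relative_rate=2.0, n0=1000.0,
                         dt=1.0, periods=400):
    """Euler sketch of the attributable-cases model.

    Returns (AC / n0, risk difference) for each step.
    """
    ew, background_ed = n0, 0.0
    ac = 0.0               # cases attributable to exposure
    caused_not_attr = 0.0  # exposure-caused, but background would have caused them later
    uw, ud = n0, 0.0
    out = []
    for _ in range(int(round(periods / dt))):
        new_background = ew * base_rate
        new_exp_caused = ew * base_rate * (relative_rate - 1)
        # Outflow from AC at the background rate: these cases would have
        # occurred anyway, so they stop counting as attributable.
        overtaken = ac * base_rate
        new_unexp = uw * base_rate

        ew -= (new_background + new_exp_caused) * dt
        background_ed += new_background * dt
        ac += (new_exp_caused - overtaken) * dt
        caused_not_attr += overtaken * dt
        uw -= new_unexp * dt
        ud += new_unexp * dt

        risk_exposed = (background_ed + ac + caused_not_attr) / n0
        risk_unexposed = ud / n0
        out.append((ac / n0, risk_exposed - risk_unexposed))
    return out


if __name__ == "__main__":
    for t, (ac_share, rd) in enumerate(simulate_diagram_4_3(), start=1):
        if t in (1, 50, 400):
            print(f"t={t:3d}  AC/N_E={ac_share:.5f}  risk difference={rd:.5f}")
```

The match is not a coincidence of the chosen numbers: with Euler steps both quantities equal (1 - a·dt)^n - (1 - ab·dt)^n after n steps.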

Diagram 4.3

A constant rate of disease model in a cohort study where exposure increases the instantaneous flow rate into the diseased state and where cases attributable to the exposure are distinguished from cases caused by background factors and from cases caused by the exposure but that would have been caused by background factors if the exposure had not acted first.

Homework C4.6

Run models 4.1 and 4.3 on the same sheet to confirm that AC over all of the exposed individuals equals the risk difference, that AC over all of the diseased individuals equals the PAR, and that AC over the exposed and diseased individuals equals the PAR%.

Epidemiologic parameters in open populations with births and deaths:

Many epidemiologic studies do not follow closed cohorts without births and deaths. Often we make cross-sectional observations of dynamic populations with births and deaths. From such studies we cannot estimate risks, and therefore cannot estimate risk ratios or risk differences, but we can estimate the prevalence odds or the incidence odds. What interpretation can we give to the prevalence odds ratio in terms of rate ratios or risk ratios? The answer to this question for populations at equilibrium was presented to the epidemiologic community by Olli Miettinen in the early 1970s. Most doctoral students will know that in a population at equilibrium with regard to exposure, disease, and total size, the prevalence odds ratio equals the rate ratio as long as exposure does not have a differential effect on the duration of disease. Most, however, will not be able to explain well why this is the case, and few will be able to describe how deviations from the equilibrium assumptions affect the relationship between the odds ratio and the rate ratio. We explore these issues here.

We build the model in Diagram 4.3, where new individuals come into the population only in the well state. We set the birth rate into this well state equal to the death rate out of both the well and the disease states, so that disease affects neither the death rate nor the population size. (Remember that this gives us a precisely balanced equilibrium: if the birth or death rate were to change, the population would grow explosively or collapse unless both changed by precisely the same amount.) We set the initial values at their equilibrium values for a population in which exposure has no effect on the disease rate. We run the simulation for five time units just to confirm that everything is at equilibrium, and then we start the exposure effect. This allows us to see how a rising disease rate affects the relationship between the prevalence odds ratio and the rate ratio.

Homework C4.7

Explain the logic we used to enter the initial equilibrium values for the exposed and unexposed diseased individuals and derive the formulas that were entered.

Diagram 4.3

ED(t) = ED(t - dt) + (New_Exp_Cases - EDdeaths) * dt

INIT ED = EW*BaseRate/Birth_Death_Rate

New_Exp_Cases = EW*BaseRate*RelativeRate

EDdeaths = ED*Birth_Death_Rate

EW(t) = EW(t - dt) + (Exp_Births - New_Exp_Cases - EWdeaths) * dt

INIT EW = 1000

Exp_Births = (EW+ED)*Birth_Death_Rate

New_Exp_Cases = EW*BaseRate*RelativeRate

EWdeaths = EW*Birth_Death_Rate

UD(t) = UD(t - dt) + (New_Well_Cases - UDdeaths) * dt

INIT UD = UW*BaseRate/Birth_Death_Rate

New_Well_Cases = UW*BaseRate

UDdeaths = UD*Birth_Death_Rate

UW(t) = UW(t - dt) + (UnExp_Births - New_Well_Cases - UWdeaths) * dt

INIT UW = 1000

UnExp_Births = (UW+UD)*Birth_Death_Rate

New_Well_Cases = UW*BaseRate

UWdeaths = UW*Birth_Death_Rate

BaseRate = 0.005

Birth_Death_Rate = 0.02

RelativeRate = If time <5 then 1.0 else 2.0

Prevalence_OR = ED*UW/(UD*EW)
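The listing above can be transcribed into a short Python sketch. It assumes a relative rate of 1 (no exposure effect) before time 5, Euler steps of dt = 0.25, and a run to time 400; the step size, run length, and function name are illustrative assumptions. The prevalence OR stays at one until the effect begins and then climbs toward the relative rate of 2.

```python
def simulate_open_population(base_rate=0.005, birth_death_rate=0.02,
                             relative_rate_after=2.0, effect_start=5.0,
                             dt=0.25, t_end=400.0):
    """Euler sketch of the open population model; returns (times, prevalence ORs)."""
    ew, uw = 1000.0, 1000.0
    # No-effect equilibrium: base_rate * W = birth_death_rate * D in each group.
    ed = ew * base_rate / birth_death_rate
    ud = uw * base_rate / birth_death_rate
    t = 0.0
    times, odds_ratios = [], []
    for _ in range(int(round(t_end / dt))):
        # Assumption: relative rate 1 (no exposure effect) before effect_start.
        relative_rate = 1.0 if t < effect_start else relative_rate_after
        new_exp_cases = ew * base_rate * relative_rate
        new_well_cases = uw * base_rate
        exp_births = (ew + ed) * birth_death_rate
        unexp_births = (uw + ud) * birth_death_rate
        # Tuple assignment: every right-hand side uses start-of-step values.
        ew, ed, uw, ud = (
            ew + (exp_births - new_exp_cases - ew * birth_death_rate) * dt,
            ed + (new_exp_cases - ed * birth_death_rate) * dt,
            uw + (unexp_births - new_well_cases - uw * birth_death_rate) * dt,
            ud + (new_well_cases - ud * birth_death_rate) * dt,
        )
        t += dt
        times.append(t)
        odds_ratios.append(ed * uw / (ud * ew))
    return times, odds_ratios


if __name__ == "__main__":
    times, ors = simulate_open_population()
    for target in (5.0, 10.0, 50.0, 100.0, 400.0):
        i = times.index(target)
        print(f"t={target:5.1f}  prevalence OR={ors[i]:.4f}")
```

Because births are proportional to each group's total size while deaths remove well and diseased alike, each group's total stays fixed, which keeps the equilibrium analysis that follows simple.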

Homework C4.7

Predict the pattern that the prevalence odds ratio will follow. It should start out at one, since we begin the simulation with no exposure effect. Fill in the following graph before looking ahead at what the simulation produced. Then explain the model behavior and the reasons for any differences between the behavior the simulation shows and what you predicted. Use the model to explore different graphs and tables that help your explanation. Try to form hypotheses about why an observed pattern appears, and then think about which model output would help you evaluate those hypotheses.

Graph 4.4

Prevalence Odds Ratio statistic and Relative Rate Parameters in a model of constant disease effect from the onset of disease effect to equilibrium


The Prevalence OR at equilibrium when exposure is having an effect can be derived by determining the equilibrium value for each stock algebraically. One way we do that is by setting the inflows and outflows to a stock equal. Doing that for two of the four stocks used in calculating the prevalence OR allows us to demonstrate that the prevalence OR in a population at equilibrium equals the rate ratio. We do that below:

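Stock ED: at equilibrium the inflow of new exposed cases equals the outflow of exposed diseased deaths, so

$$a\,b\,EW = \mu\,ED \quad\Longrightarrow\quad \frac{ED}{EW} = \frac{a\,b}{\mu}$$

Stock UD: likewise,

$$a\,UW = \mu\,UD \quad\Longrightarrow\quad \frac{UD}{UW} = \frac{a}{\mu}$$

Dividing the first odds by the second,

$$\text{prevalence OR} = \frac{ED/EW}{UD/UW} = \frac{a\,b/\mu}{a/\mu} = b,$$

which is the rate ratio. Here a is BaseRate, b is RelativeRate, and μ is Birth_Death_Rate.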

One way to think about this equality of the prevalence OR and the rate ratio is to recall that if we were following a closed cohort, the OR would progressively rise above the rate ratio, as we saw earlier, because the exposed well stock is depleted faster than the unexposed well stock. Here, however, new individuals are continually fed into the well stocks. Relative to its size, the exposed well stock receives more new individuals than the unexposed well stock, because more of the exposed population is being drained away into disease. This relatively greater replenishment of EW compared to UW props up the denominator of the exposed odds and offsets the depletion that the relatively greater flow from EW into ED would otherwise produce.
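The equilibrium algebra can also be evaluated numerically. The Python sketch below solves inflow = outflow for the diseased stock in each group, using the fact that each group's total size stays fixed in this model; the group size of 1250 follows from the initial values (1000 well plus 1000 × 0.005 / 0.02 = 250 diseased). The function names are hypothetical. The resulting prevalence OR equals the relative rate whatever base rate or birth/death rate is chosen.

```python
def equilibrium_group(incidence_rate, birth_death_rate, group_size):
    """Equilibrium (well, diseased) for one group of the open population model.

    Setting inflow = outflow for the diseased stock gives
    incidence_rate * W = birth_death_rate * D, with W + D = group_size.
    """
    well = birth_death_rate * group_size / (incidence_rate + birth_death_rate)
    diseased = group_size - well
    return well, diseased


def equilibrium_prevalence_or(base_rate, relative_rate, birth_death_rate,
                              group_size=1250.0):
    """Prevalence OR when both groups sit at their equilibria."""
    ew, ed = equilibrium_group(base_rate * relative_rate, birth_death_rate, group_size)
    uw, ud = equilibrium_group(base_rate, birth_death_rate, group_size)
    return (ed / ew) / (ud / uw)


BASE_RATE, RELATIVE_RATE, MU = 0.005, 2.0, 0.02
N = 1250.0  # 1000 well + 250 diseased per group at the no-effect start

ew, ed = equilibrium_group(BASE_RATE * RELATIVE_RATE, MU, N)
uw, ud = equilibrium_group(BASE_RATE, MU, N)
print(f"EW={ew:.1f} ED={ed:.1f} UW={uw:.1f} UD={ud:.1f} "
      f"OR={equilibrium_prevalence_or(BASE_RATE, RELATIVE_RATE, MU):.4f}")
```

Trying other base rates or birth/death rates in `equilibrium_prevalence_or` is a quick way to see that only the relative rate survives in the equilibrium OR.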

When the exposure effect is newly introduced, many people in the exposed categories have been there for some time without experiencing any exposure effect. They dilute the OR, which rises gradually as fewer such individuals remain. The patterns produced by this simulation are seen in graph 4.4 (filled in). Note that the odds ratio stays constant at one for the first five time units, before the exposure effect begins.

Graph 4.4 (filled in)

Prevalence Odds Ratio statistic and Relative Rate Parameters in a model of constant disease effect from the onset of disease effect to equilibrium


The following graphic output might help explain why the prevalence odds ratio rises asymptotically to the relative rate after the exposure effect begins.

Graph 4.5


Graph 4.6


Review questions:

1 In a fixed cohort where all of the population is equally susceptible and the rates of disease in the exposed and the unexposed stay constant,

a what will be the pattern of the risk ratio over time? Why will it have that pattern?

b what will be the pattern of the risk difference over time? Why will it have that pattern?

c what will be the pattern of the rate difference over time? Why will it have that pattern?

d what will be the pattern of the prevalence odds ratio over time? Why will it have that pattern?

e what will be the pattern of the PAR over time? Why will it have that pattern?

f what will be the pattern of the AR% over time? Why will it have that pattern?

2 When an open population with births and deaths and homogeneous susceptibility is at equilibrium with regard to population size and disease frequency, the prevalence odds ratio equals the rate ratio. Explain why this is so.

3 How could simulation models like those presented in this chapter help in the process of generating hypotheses about what is causing disease in the real world?