Our only disease model so far, the constant rate model, did not
specify exposure status. In this chapter we do so. We start out
in the simple two by two world of exposure and disease relationships
which has been a beacon of clarity for developing epidemiological
thinking. By duplicating our constant rate model and calling one
group exposed and another unexposed, we generate the four categories
of population that allow us to calculate prevalence odds ratios,
incidence odds ratios, rate ratios, risk ratios, risk differences,
rate differences, and population attributable risks. Note that
these statistics are often treated as model parameters by epidemiologists
when they use them to make scientific inferences or to project
the population effects of changing exposure. Epidemiologists rarely
make very explicit, however, the models in which the parameters
they are estimating are found. We relate these statistics to model
parameters for very simple models.
Whether a statistic which we use to relate exposure and disease
reflects the causal action of exposure in producing the disease
is a function of two things: 1) whether the statistic reflects
the causal action parameterized by a causal model, and 2) whether
the parameterization of causal action in the causal model employed
reflects what is happening in the real world. In other words we
need our statistics to have a theoretical connection to our parameters
of interest and we need those parameters to be embedded in models
that capture the essential aspects of the real world with sufficient
accuracy. In this chapter we concentrate on point 1, how statistics
reflect a particular parameterization of causal action. We leave
the issue of evaluating whether a model adequately captures the
essence of the phenomenon addressed aside. The constant rate model
which we use to address this first issue is so simple that we
could say right off that for many practical questions the model
will not sufficiently capture the actions we might be interested
in. This simple model, however, allows us to see a little more
clearly the issues of how epidemiological statistics relate to
causal model parameters. The constant rate model may be extremely
simplistic and unrealistic but it underlies a great deal of epidemiological
thinking. By making its predictions and relationships explicit
with numerical solutions to the dynamics it entails, and by making
more explicit the assumptions we would have to make about a disease
to make this model applicable, we make clearer the need to elaborate
this model to make it more realistic.
The statistics we examine are almost always calculated and used
by epidemiologists without considering the dynamics of the processes
that generated the data from which the statistics are calculated.
We, however, examine these statistics dynamically as the disease
process unfolds in our model population. The expected relationships
of the statistics over time are something that many epidemiologists
will not have deeply considered. By forcing ourselves to predict
these relationships, by drawing out their dynamic curves before
STELLA generates those curves, we will experience one of
the greatest values of modeling. The modeling will provide a framework
on which we can clarify our thinking about processes and relationships.
By formalizing a numerical solution to the dynamic process we
are considering, we will get immediate feedback as to when our
thinking is correct and when it is wrong. Moreover we will most
likely be forced by the results of our models to consider how
processes are working in ways that we have not considered earlier.
This is especially true when we add growth and death processes
to the population which we have put into 2 by 2 categories.
The 2 by 2 models we will generate will be too simple to provide
a basis for constructing testable theory about how disease generating
processes determine the population distribution of disease by
exposure. But they will be more complete and explicit models
than those that sit vaguely behind the uses to which epidemiologists
put the statistics we will examine. Just as for the constant rate
vs. the simple risk model, the model we will develop will predict
many aspects of the relationships we will examine which are ignored
by the standard 2 by 2 calculations which epidemiologists make.
Our models will not only predict which statistic will be greater,
lesser, or equal to which other, they will predict the shape of
the curves relating those statistics over time.
While this chapter, like the previous two, must be seen as basically
providing preliminary skills before one can begin the process
of constructing scientifically useful models, it advances our
skill in using models to create a framework on which we can make
more explicit what we do and what we do not know. While we will
be treating the disease generating processes too globally in our
models for the models to make clear what we do not know about
the processes we are modeling and what we must do to generate
new knowledge, we will begin to see how this process works by
generating relationships over time which we had not previously considered.
This naturally makes us think about what these relationships would
be if the unrealistic assumptions of the constant rate model were
not being made. It generates a need to add more realistic details
to our models as we will do in later chapters.
One reason we estimate statistics relating exposure and disease
in a population is that we want to assess the effect of a risk
factor on the development of a disease process. That risk factor
might be a personal behavior, an environmental contamination,
a social or administrative factor, or a biologic condition. If
we were developing detailed causal models of such diverse risk
factors, the models for each type of risk factor would be different.
For this chapter, however, we are dealing at a less detailed and
more abstract level so that we can treat all types of risk factors
the same. The causal effect we will be parameterizing is one which
is manifest directly upon processes within the individuals exposed
which result in disease. We assume that what happens within one
individual has absolutely no effect upon what happens in other
individuals. I remind you here that actually there are no individuals
in our models, only populations. I commonly write as if our models
dealt with individuals because that seems to help our thinking
about processes that affect individuals in the real world. But
models that actually deal with individuals are of a quite different
nature than those presented here.
Our population will be divided into only four segments as determined
by dichotomous classifications on exposure and disease. Thus what
we assume about the causal effect we will parameterize is that
the effect upon the well and exposed segment of the population
is not affected by what has happened in the population in the
past and how many diseased individuals or unexposed individuals
there are.
When we specify the causal action of exposure in this way, we
are excluding effects of exposure which alter the transmission
of an infectious agent. The type of exposures we will deal with
do not increase the susceptibility to infection, they do not increase
the degree to which an infectious agent multiplies in an individual
and thereby affect the amount of agent an individual will disseminate
into the environment, they do not affect the number of contacts
an individual makes where transmission might take place if the
person contacted were infectious, they do not affect the chances
of transmission if a contact with an infectious individual is
made, and they do not affect who an individual contacts. We will
see later on that infectious disease risk factors which have these
effects can generate quite different relationships between the
statistics which we will examine.
As stated in the introduction, whether a statistic reflects real
world causal effects will be a function of two things: 1) whether
the way we formulate the statistic reflects the causal action
that is parameterized in our causal model, and 2) whether the
parameterization of causal action in our model reflects what is
happening in the real world. The rest of this chapter makes unrealistically
simple abstractions in order to focus on point 1.
We begin by parameterizing causal actions very simply. Exposure
is assumed to be dichotomous. Each individual (or more accurately
each population segment) is assumed to be always exposed or never
exposed. The actions of exposures are assumed to be always present
and everywhere constant. Disease is also assumed to be dichotomous
and there is no recovery from disease. Getting a disease means
that an individual is categorized as having that disease for the
rest of their lives. Not only are the exposure and the disease
presumed to be very simple, the causal effects of the exposure
on the disease are also assumed to be very simple. The exposure
is presumed to multiply the base rate of disease in the
unexposed by a constant relative rate as follows:
Exposed Rate = Base Rate * Relative Rate
In the models of exposure effect we will develop, exposure is
presumed to have the same multiplicative effect on disease rates
under all circumstances. We will therefore call this parameterization
of exposure effect a Relative Exposure Effect. We could have parameterized
the effect of exposure as adding a fixed amount to the disease
rate.
Exposed Rate = Base Rate + Rate Difference
We call such an exposure effect an absolute or additive exposure
effect. When we only have one class of exposed and unexposed individuals
in our model, the choice is arbitrary. In this simple case there
is a one to one relationship between the Relative Rate and the
Rate Difference given that the Base Rate of disease in the unexposed
is fixed.
Rate Difference = Base Rate * Relative Rate - Base Rate.
Rate Difference = Base Rate * (Relative Rate - 1)          Equation (1)
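The one-to-one relationship in Equation (1) can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the values 0.005 and 2 are illustrative choices for the Base Rate and the Relative Rate.

```python
def rate_difference(base_rate, relative_rate):
    """Equation (1): the additive effect implied by a multiplicative effect."""
    return base_rate * (relative_rate - 1)


def relative_rate_from_difference(base_rate, rate_diff):
    """Inverse of Equation (1); only defined when base_rate > 0."""
    return rate_diff / base_rate + 1


base_rate = 0.005      # illustrative Base Rate in the unexposed
relative_rate = 2.0    # illustrative Relative Rate

rate_diff = rate_difference(base_rate, relative_rate)

# With a single exposed class, both parameterizations give the same exposed rate.
exposed_rate_multiplicative = base_rate * relative_rate
exposed_rate_additive = base_rate + rate_diff

print(rate_diff, exposed_rate_multiplicative, exposed_rate_additive)
```

Changing base_rate while holding relative_rate fixed changes rate_diff, which is one way to see why the two parameterizations stop being interchangeable once the population has classes with different base rates.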
I do not want you to get the idea, however, that the choice between
these two different parameterizations is arbitrary. When there
are multiple classes of individuals, say some that have high levels
of predisposing factors to the effect of the exposure and others
that have lower levels, or some individuals that get the disease
via a mechanism completely unrelated to the exposure and others
that do not, then the choice between these two parameterizations
of exposure effect is not arbitrary. The choice for particular
exposures between these parametrizations and various alternative
ones which we will later examine becomes a central way that we
put scientifically useful theory into our models.
The statistics we will discuss are the rate ratio, risk ratio,
prevalence odds ratio, incidence odds ratio, rate difference,
risk difference, population attributable risk (PAR) or etiologic
fraction, and the population attributable risk percent (PAR%)
or attributable risk per cent. We will fix the rate ratio as a
time independent parameter of our model. In other words, our models
will assume that an exposure multiplies a rate of disease by some
constant. That means that we cannot examine the effects of any
dynamic process we model upon the rate ratio. There will be no
such effects because we have assumed that there are no such effects.
Instead we will be examining what the assumption of a constant
rate ratio effect implies regarding the behavior of the other
parameters, namely the risk ratio, the prevalence odds ratio,
the incidence odds ratio, the risk difference, and the population
attributable risk. As discussed above in reference to Equation
(1), fixing the rate ratio and the rate of disease in the unexposed
also fixes the rate difference. Thus just as we will not be examining
the effects of dynamic processes on the rate ratio, we will not
be examining the effects of the dynamic development of disease
upon the rate difference either.
Let us reiterate. We will examine how the risk ratio, prevalence
odds ratio, incidence odds ratio, risk difference, PAR, and PAR%
reflect a causal action which we are modeling as a fixed rate
ratio. Why do we want to do this? One reason is that if a statistic
imperfectly reflects the causal action generated by a rate ratio
effect in our models, we should understand the nature of these
imperfections in order to better guide decisions about causation
when we are presented with a risk ratio, prevalence odds ratio,
risk difference or PAR. Other reasons for presenting the material
in this chapter are more didactic. Understanding these relationships
in this very simple model will provide a basis for understanding
the behavior of these common epidemiological statistics in more
complex models where their behavior may be more able to stimulate
us to make hypotheses about the causal system.
Both of the reasons just presented for pursuing the material in
this chapter presume that the basic task of epidemiology is to
build ever better and more accurate causal models of how the causes
of disease generate patterns of disease in populations. Epidemiologists
are not necessarily used to conceiving of their enterprise in
this fashion. Epidemiologists are not used to thinking about how
statistics relating exposure and disease reflect the parameters
of causal models. Epidemiologists are usually satisfied with a
statistic if they feel there is some generally monotonic relationship
between the value of the statistic and the degree of causal effects.
The nature of the quantitative relationships is often ignored.
Since epidemiologists are usually preoccupied with separating
out causal effects from the effects of sample selection biases,
information biases, and confounding, the problem of what is the
underlying causal model that they are trying to get at with the
statistics that they estimate seems secondary. It should not be.
If you took epidemiology 801, I hope that you can appreciate that
the only way to conclude that an effect is a causal effect is
to make a judgment that data are more likely to be explained by
a causal model than by a model of a bias or a confounding variable
effect. Epidemiologists who cannot think of the statistics they
calculate from data collected in their studies in terms of estimates
for the parameters of some underlying causal model have no basis
for coming to causal conclusions other than a set of arbitrary
rules of thumb like Hill's criteria. My hope is that in this course
you acquire some of the basis for making more scientifically sound
decisions regarding causal models.
Let me say that again. We should continually try to relate the
real world we are observing to causal theories about that world.
By doing so we advance both the development and evaluation of
new scientific theories. This also provides a basis on which to
guide public health policy. Comparing observations to theory is
basic to both the scientific process and public health practice.
If we don't have explicit causal models in mind when we estimate
statistics that relate exposure and disease, we will be missing
opportunities to advance scientific understanding and we very
well might make quite poor public health decisions.
Since reality is very complex, the relationships of the statistics
we will examine to real world causal processes are also very complex.
We advance our understanding, however, by beginning with an austere
caricature of the causal processes generating disease. We start
off dealing with disease as a dichotomous, on-off, state that
is generated by constantly acting risk factors whose levels and
effects remain constant. There will be no feedback affecting exposure
levels or exposure effects. We will assume that there is only
a one step process leading to disease and that the exposure affects
the relative rate of this process. In our first model we will
further assume that our population is a cohort of originally well
individuals which we are following in time and that there is no
variation in susceptibility to the exposure effect.
Since our causal parameter will be a ratio of rates, we might
expect our other ratio statistics, the risk ratio and the odds
ratio, to reflect it most closely. But in what ways and under
what conditions will they deviate from it? That is something we
should know. Since this is such a simple issue, you probably have
learned how incidence odds ratios, prevalence odds ratios, rate
ratios, and risk ratios for constant rate processes relate to
each other. But being able to build a model of these relationships
will solidify this knowledge for you and improve your ability
to use these relationships.
Let us be clear about our odds ratio measures. The prevalence
odds ratio uses the current number of cases (population sizes)
in the four exposure and disease categories. We will first be
calculating it from a cohort study of a fixed population. Later,
when we add vital dynamics to our model population, we will be
calculating it from cross sectional studies with relationship
to time. The incidence odds ratio uses only new cases that occur over
a defined period of time for the disease categories. For the sake
of convenience we will confine our time period to a single dt
step in the numerical solution of our model.
Risk differences, the PAR or EF, and the PAR% are three different
proportion statistics that we use to reflect the amount of disease
caused by a risk factor. The numerator of each of these proportions
is the number of cases attributable to an exposure under the assumption
that the exposed individuals differ from the unexposed individuals
only with regard to exposure. The denominator of the risk difference
is the number of exposed individuals in the population. The denominator
of the PAR is the number of diseased individuals in the population.
The PAR has received a great variety of different names at different
times. Levin and Lilienfeld called it the attributable risk (before
MacMahon or anyone else began calling the risk difference the
attributable risk!). MacMahon and Pugh (1970) called it the Population
Attributable Risk. Cole and MacMahon (1971) then called it the
Population Attributable Risk Percent. Ouellet et al. (1979) called
it the attributable fraction. Miettinen gave it the name Etiologic
Fraction, but he also gave other fractions the same appellation.
Rothman (1986) called the analogous rate based measure the attributable
proportion. Hennekins labels a measure expressing the proportion
of both exposed and diseased individuals who have disease attributable
to the exposure as the attributable risk percent and then erroneously
equates this measure with the PAR or attributable proportion.
A more confusing state of affairs does not seem humanly possible.
To help in your understanding of this issue, let us derive these
measures.
The risk difference expresses the number of exposed individuals
with disease minus those exposed diseased individuals that would
have gotten disease without exposure divided by the number of
exposed individuals. We only have four categories of individuals
that we are dealing with in all of these proportions so let us
call the exposed diseased ED, the exposed well EW, the unexposed
diseased UD, and the unexposed well UW. These four categories
will correspond to four different compartments (or in Stella
terms, stocks) in our model. The risk difference will be labeled
"RD" and the population attributable risk "PAR".
The population of individuals who have disease attributable to
exposure we will define as those who got disease when exposed
but would not have gotten disease if they had not been exposed.
We will label this theoretical population as AC, the population
of attributable cases. AC is a subpopulation of ED. In the real
world an exposed case whose disease is attributable to an exposure
cannot be distinguished from an exposed case who developed disease
not as a result of exposure but as a result of background factors.
We will see later that making this distinction between cases that
developed from the action of exposure and those that did not is fundamental to
construction of useful scientific theory about how exposure actions
generate patterns of disease in populations.
For the moment, however, we take the view of these statistics
that we get from our 2 by 2 classification of exposure and disease.
Let us begin by considering the number of cases attributable to
an exposure. We consider any case that would not have occurred
if the exposed individual had not been exposed to be a case attributable
to exposure. Note that this is not the same as the number of cases
caused by an exposure because a case might have been caused by
an exposure but if it had not been caused by the exposure it might
have been caused by background factors. The number of attributable
cases is thus the number of cases in the exposed population minus
the number of cases expected in the absence of exposure.
AC = ED - (ED + EW) * UD / (UD + UW)
For the risk difference we have
RD = AC / (ED + EW) = ED / (ED + EW) - UD / (UD + UW)
which is the proportion of exposed individuals who have disease
attributable to exposure. If we multiply this risk difference
by the exposure fraction and divide by the case fraction we have
the PAR or EF.
PAR = RD * [(ED + EW) / (ED + EW + UD + UW)] / [(ED + UD) / (ED + EW + UD + UW)] = AC / (ED + UD)

This same fraction is often derived by first calculating the proportion
of the entire population (including both exposed and unexposed)
which is attributable to exposure and then dividing by the case
fraction.
PAR = [AC / (ED + EW + UD + UW)] / [(ED + UD) / (ED + EW + UD + UW)] = AC / (ED + UD)

The proportion that Hennekins (and in some cases Miettinen also)
labels as the etiologic fraction is what Rothman calls attributable
risk percent. We will adopt Rothman's terminology.
AR% = AC / ED = RD * (ED + EW) / ED
When the exposure is being unvaccinated, AR% corresponds to a
statistic that is commonly used to assess vaccine efficacy which
we will examine in later chapters.
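The measures derived in this section can all be computed from the four compartment sizes. The following Python sketch does so; the function name is hypothetical and the counts in the example are invented purely for illustration.

```python
def attributable_measures(ED, EW, UD, UW):
    """Attributable-case measures from the four 2 by 2 compartment sizes."""
    exposed = ED + EW
    unexposed = UD + UW
    total = exposed + unexposed
    unexposed_risk = UD / unexposed

    # Attributable cases: exposed cases minus those expected without exposure.
    AC = ED - exposed * unexposed_risk

    RD = AC / exposed          # risk difference: per exposed individual
    PAR = AC / (ED + UD)       # PAR or EF: per case in the whole population
    AR_percent = AC / ED       # AR%: per exposed case

    # Same PAR by the other route: RD times exposure fraction over case fraction.
    exposure_fraction = exposed / total
    case_fraction = (ED + UD) / total
    PAR_alt = RD * exposure_fraction / case_fraction

    return {"AC": AC, "RD": RD, "PAR": PAR, "AR%": AR_percent, "PAR_alt": PAR_alt}


# Invented counts, for illustration only.
measures = attributable_measures(ED=200, EW=800, UD=100, UW=900)
print(measures)
```

The three proportions share the numerator AC and differ only in the denominator, which is the point the competing names tend to obscure.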
For all the statistics we have enumerated, we will build models
that allow us to examine how the relationships we described above
change when we introduce different susceptibilities to disease
in the population. For the statistics which do not require us
to estimate risks, namely the odds ratio and the PAR, we will
consider how adding vital dynamics changes the relationships of
these statistics to the relative causal effect parameters in our
causal models.
We now construct a model of a constant rate disease process where
exposure increases this constant rate. We do not include a birth
or a death process. A model without a birth or death process could
fit two situations: 1) the disease occurs over a short enough
time interval so that births and deaths in the interval are negligible,
or 2) a cohort of exposed and unexposed individuals is assembled
and followed prospectively.
Both exposed and unexposed individuals will be classified into
only two states: well and diseased. There will be only one transition
between these states that is allowed: that is from well to diseased.
We could think of this as exposure randomly affecting individuals
in the population of exposed with there being no incubation time
between when an exposure has its effect and when the disease develops.
There will be no recovery from disease so that one cannot go from
the diseased to the well category of individuals. The rate of
transition from well to diseased will be higher for the exposed
than the unexposed, but it will be constant for all time. Everyone
in the population will be equally susceptible. Everyone in the
exposed population will be equally exposed. The exposure will
have exactly the same effect on everyone in the population.
A STELLA model of this situation with our derived statistics
is seen in diagram 4.1.

ED(t) = ED(t - dt) + (NewExpDis) * dt ; INIT ED = 0
NewExpDis = EW*RelativeRate*Base_rate
EW(t) = EW(t - dt) + (- NewExpDis) * dt ; INIT EW = 1000
NewExpDis = EW*RelativeRate*Base_rate
UD(t) = UD(t - dt) + (NewUnexpDis) * dt ; INIT UD = 0
NewUnexpDis = UW*Base_rate
UW(t) = UW(t - dt) + (- NewUnexpDis) * dt ; INIT UW = 1000
NewUnexpDis = UW*Base_rate
Base_rate = .005
RelativeRate = 2
PAR or EF = RiskDifference*(EW+ED)/(ED+UD)
AR% = RiskDifference*(ED+EW)/ED
RiskDifference = ExpRisk-UnexpRisk
UnexpRisk = UD/(UD+UW)
ExpRisk = ED/(ED+EW)
RiskRatio = ExpRisk/UnexpRisk
IncidOR = NewExpDis*UW/(NewUnexpDis*EW)
PrevOR = ED*UW/(UD*EW)
Note that we start off with everyone in the well state, consistent
with a cohort study. After any period of time the number of exposed
individuals in the diseased state divided by all exposed individuals
is the risk of disease across that period of time for exposed
individuals. The risk ratio and risk difference statistics are
thus easy to calculate. The prevalence odds ratio measure is similarly
easy to calculate. It is the cross product ratio. Note that in
this case where everyone starts off well and once in the diseased
state no one recovers and moves back into the well state, the
prevalence odds ratio corresponds to the incidence odds ratio
over the period from the start of follow-up to the current time. Usually, however, we would think of an incidence
odds ratio as using incident cases over a shorter period of time.
In the above model, we use the flow from well to diseased to represent
the new cases. The flow is actually expressed per time unit, not
per dt. But it is calculated for every dt and any calculated flow
only acts for a single dt. Since our calculated odds ratio will
have one flow in the numerator and one in the denominator, the
denominator units of the flows will cancel out so the ratio of
flows will be the actual ratio of flows during the single dt where
that ratio is calculated.
The risk difference estimates the ratio of number of attributable
cases to the number of exposed individuals at risk. Multiplying
this by the number of exposed individuals gives the number of
attributable cases and dividing this by the total number of cases
gives the PAR or EF. Multiplying the risk difference by the number of
exposed individuals and dividing by the number of exposed cases (ED) gives the AR%.
The rate ratio is entered as a parameter in our causal model and
therefore we do not have to derive any statistic for it. This
should help you understand the difference between a parameter
and a statistic. A statistic is something that is calculated from
variable (compartment) values, a parameter is an element in a
model. Usually we calculate statistics because those statistics
reflect parameters of interest to us. In our model, the rate ratio
is a parameter and the risk ratio and odds ratios would be potentially
observable statistics that reflect that parameter.
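For readers without STELLA, the model equations listed above can be approximated in Python with the same Euler updating that STELLA performs. This is a sketch under stated assumptions, not the STELLA model itself: the function name simulate and the step size dt = 0.25 are choices made here, while the stocks, flows, and statistics follow the equation listing. If you are working through the exercise, run it only after you have drawn your predicted curves.

```python
def simulate(base_rate=0.005, relative_rate=2.0,
             n_exposed=1000.0, n_unexposed=1000.0,
             t_end=40.0, dt=0.25):
    """Euler solution of the four-compartment constant rate model."""
    ED, EW, UD, UW = 0.0, n_exposed, 0.0, n_unexposed
    rows = []
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        # Flows are expressed per time unit and computed from current stocks.
        new_exp_dis = EW * relative_rate * base_rate
        new_unexp_dis = UW * base_rate

        # Each calculated flow acts for a single dt.
        ED, EW = ED + new_exp_dis * dt, EW - new_exp_dis * dt
        UD, UW = UD + new_unexp_dis * dt, UW - new_unexp_dis * dt

        # Statistics are undefined at t = 0 (no cases yet), so they are
        # recorded only after the first step.
        exp_risk = ED / (ED + EW)
        unexp_risk = UD / (UD + UW)
        risk_diff = exp_risk - unexp_risk
        flow_exp = EW * relative_rate * base_rate    # flows at the new time
        flow_unexp = UW * base_rate
        rows.append({
            "t": step * dt,
            "RiskRatio": exp_risk / unexp_risk,
            "RiskDifference": risk_diff,
            "PrevOR": ED * UW / (UD * EW),
            "IncidOR": flow_exp * UW / (flow_unexp * EW),
            "PAR": risk_diff * (ED + EW) / (ED + UD),
            "AR%": risk_diff * (ED + EW) / ED,
        })
    return rows


rows = simulate()
first, last = rows[0], rows[-1]
```

Each entry of rows is a dictionary of statistics at one time point, so the list can be printed as a table or passed to any plotting tool to compare against hand-drawn predictions.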
Given the above model structure, draw the
curves you would expect for the following graph. Only after drawing
the curves (Don't be a slouch now!),
construct and numerically solve the model. Please inform the professor
of any differences between what you predicted and what you observed
and be able to explain how and why the differences arose.
If you peek at the simulation results first,
you lose a lot of the stimulus for careful thinking so put your
pencil to this paper before flipping the page. Try to be as accurate
as possible. Try to figure out the exact location where the values
will start, whether they will go up or down, and whether they
will go up or down in straight lines, in convex curves, concave
curves, or sigmoid curves. Write out your logic for why you drew
what you did before you turn the page! (You learn a lot about
the logic of a model by predicting what the model will produce
and then seeking to explain any unexpected findings.) Then after
flipping the page, write out any differences between the relationships
you predicted and STELLA described. Everyone should have had some
differences.


In the first instant of the simulation, the disease is rare so
the OR and Risk Ratio have the same value. (I trust your education
in your introductory epidemiology course was adequate on this
and so do not go into great detail. Make sure, however, you understand
fully the logic as to why this should be the case.) Subsequently
the prevalence OR will have a greater value than the Risk Ratio
since the diseased category does not go into the denominator of
the odds ratio as it does for the risk ratio. In fact, risk ratios
always have a capped upper limit which is the inverse of the risk
in the unexposed. This limit is achieved when all of the exposed
are ill. For example if the risk in the unexposed is 0.2, then
the maximum risk ratio when all of the exposed are diseased is
5. ORs, on the other hand, have no upper limit. The risk ratio
will fall over time because the number of new cases per time unit
will fall faster in the exposed than in the unexposed.
That is because the exposed have fewer well individuals left for the
disease rate to act on over time, since their higher disease rate has
drained out the well individuals faster.
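The contrast between the capped risk ratio and the unbounded odds ratio can be seen numerically. In this Python sketch the unexposed risk of 0.2 comes from the example above, while the function names and the three exposed risks are hypothetical values chosen for illustration.

```python
def max_risk_ratio(unexposed_risk):
    """Upper limit of the risk ratio, reached when every exposed person is ill."""
    return 1.0 / unexposed_risk


def odds_ratio(exposed_risk, unexposed_risk):
    """Odds ratio computed from two risks (each strictly between 0 and 1)."""
    exposed_odds = exposed_risk / (1.0 - exposed_risk)
    unexposed_odds = unexposed_risk / (1.0 - unexposed_risk)
    return exposed_odds / unexposed_odds


unexposed_risk = 0.2
ceiling = max_risk_ratio(unexposed_risk)

# Exposed risks approaching 1: the risk ratio approaches the ceiling,
# while the odds ratio keeps growing.
exposed_risks = [0.9, 0.99, 0.999]
risk_ratios = [r / unexposed_risk for r in exposed_risks]
odds_ratios = [odds_ratio(r, unexposed_risk) for r in exposed_risks]

for r, rr, oratio in zip(exposed_risks, risk_ratios, odds_ratios):
    print(r, round(rr, 3), round(oratio, 1))
```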
To examine whether a function is increasing or decreasing over
time, one could examine its first derivative. We, however, are
trying to avoid calculus in this course, so see if considering
the issue as follows can clarify the direction of change. The
prevalence OR will rise because the odds of disease in the exposed
will rise faster than the odds of disease in the unexposed. To
see the inevitability of this, we can compare the OR at one instant
with the OR an instant before. Our instant will be "dt".
Let "a" be the rate of disease development in the unexposed. Let "b" equal our exposure effect or relative rate parameter. Then if our prevalence OR at "t" is
prevalence OR(t) = (ED / EW) / (UD / UW),
our OR at "t+dt" will be
OR(t+dt) = [(ED + a*b*EW*dt) / (EW - a*b*EW*dt)] / [(UD + a*UW*dt) / (UW - a*UW*dt)].
The flows from the well to the diseased in both the exposed and unexposed individuals are added and subtracted appropriately from the values at "t" to get the values at "t+dt". If the odds ratio is rising over time, we have:
[(ED + a*b*EW*dt) / (EW - a*b*EW*dt)] / [(UD + a*UW*dt) / (UW - a*UW*dt)] > (ED / EW) / (UD / UW)          Equation (2)
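Inspecting the relationships between these two prevalence ORs
should convince you that the relationships in inequality equation
(2) are correct. The denominator of the numerator (EW) has a relative
decrease of a*b*dt, which is greater than the relative decrease of a*dt
in the denominator of the denominator (UW). That will increase the OR.
The numerators need more care. The numerator of the numerator (ED) has a
relative increase of a*b*dt*EW/ED and the numerator of the denominator (UD)
has a relative increase of a*dt*UW/UD. The ratio of these two relative
increases is b divided by the current prevalence OR, so they are about equal
at the start and the relative increase in ED becomes the smaller one once the
OR exceeds b. Combining all four changes, the relative change in the OR over
one small dt is approximately a*dt*(b/ExpRisk - 1/UnexpRisk), which is
positive whenever the risk ratio is below b. That is why the OR rises.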
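The comparison of the prevalence OR at "t" with the OR at "t+dt" can also be checked numerically, one step at a time. The sketch below is an illustration only: the function names and the step size dt = 0.25 are assumptions made here, and the rates are the ones used in the model listing.

```python
def prevalence_or(ED, EW, UD, UW):
    """Odds of disease in the exposed over odds of disease in the unexposed."""
    return (ED / EW) / (UD / UW)


def euler_step(ED, EW, UD, UW, a, b, dt):
    """Move the well-to-diseased flows for one dt in both exposure groups."""
    new_exp = a * b * EW * dt
    new_unexp = a * UW * dt
    return ED + new_exp, EW - new_exp, UD + new_unexp, UW - new_unexp


a, b, dt = 0.005, 2.0, 0.25   # unexposed rate, relative rate, assumed step size

# Take one step first so that there are cases and the OR is defined.
state = euler_step(0.0, 1000.0, 0.0, 1000.0, a, b, dt)

rising = []
for _ in range(1600):   # 1600 steps of 0.25 cover 400 time units
    next_state = euler_step(*state, a, b, dt)
    rising.append(prevalence_or(*next_state) > prevalence_or(*state))
    state = next_state

always_rising = all(rising)
print(always_rising)
```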
We can go further than just saying that the odds ratio increases
over time. We can predict that it will increase faster over time
so that the curve of its increase will be convex. To examine whether
a function is concave or convex, we would examine the second derivative
of the function. There are various ways one might examine relationships
at incremental time steps to determine if the differences across
time steps are increasing with each time step or decreasing with
each time step. While doing the relevant algebra would be a good
practice, my experience is that students don't gain much from
this rather complex exercise so we will not pursue this issue.
Just let us state that the second derivative of the prevalence
OR is positive so that the curve is convex (forms a cup) instead
of concave (forms a cave).
The risk ratio will fall by decreasing amounts over time so that
it too is convex.
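Convexity can be checked without calculus by looking at second differences: if the value at each time point lies below the average of its two neighbors, the curve forms a cup. The Python sketch below assumes the exact solution of the constant rate model, in which the risk by time t is 1 - exp(-rate * t); the function names are hypothetical.

```python
import math


def prev_or(t, a=0.005, b=2.0):
    """Prevalence OR at time t; odds = exp(rate * t) - 1 in each group."""
    return math.expm1(a * b * t) / math.expm1(a * t)


def risk_ratio(t, a=0.005, b=2.0):
    """Risk ratio at time t; risk = 1 - exp(-rate * t) in each group."""
    return math.expm1(-a * b * t) / math.expm1(-a * t)


def second_differences(f, times):
    """f(t-1) - 2 f(t) + f(t+1) at each interior point of an evenly spaced grid."""
    values = [f(t) for t in times]
    return [values[i - 1] - 2.0 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]


times = [float(t) for t in range(1, 401)]
or_second_diffs = second_differences(prev_or, times)
rr_second_diffs = second_differences(risk_ratio, times)

or_is_convex = all(d > 0 for d in or_second_diffs)
rr_is_convex = all(d > 0 for d in rr_second_diffs)
or_is_rising = prev_or(400.0) > prev_or(1.0)
rr_is_falling = risk_ratio(400.0) < risk_ratio(1.0)
print(or_is_convex, rr_is_convex, or_is_rising, rr_is_falling)
```

Positive second differences for a rising curve mean it rises by increasing amounts; for a falling curve they mean it falls by decreasing amounts.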
The incidence OR as we have calculated it will be precisely the same as the relative rate parameter. That is because we have defined both the numerator and the denominator of this ratio as the flow over the amount from which the flow is arising. The numerator and denominator of the incidence OR are thus the rate in the exposed and the rate in the unexposed. Over 40 time periods the relationships are as follows:

Over 400 time periods we see the following:

At the end of 400 time periods more than 86.5% of unexposed and
98% of exposed individuals are diseased.
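These percentages can be checked directly. The sketch below compares the exact constant rate solution, 1 - exp(-rate * t), with an Euler solution; the step size dt = 1 is an assumption made here, since the step size used for the graphs is not stated. The exact unexposed fraction comes out slightly under 86.5% and the Euler value with dt = 1 slightly over it, so the stated figures depend a little on dt.

```python
import math


def exact_fraction_diseased(rate, t):
    """Exact cumulative risk under a constant rate with no recovery."""
    return 1.0 - math.exp(-rate * t)


def euler_fraction_diseased(rate, t, dt):
    """Same quantity from Euler steps, as a simulation package would compute it."""
    well = 1.0
    for _ in range(int(round(t / dt))):
        well -= well * rate * dt
    return 1.0 - well


base_rate, relative_rate, t_end = 0.005, 2.0, 400.0

exact_unexposed = exact_fraction_diseased(base_rate, t_end)
exact_exposed = exact_fraction_diseased(base_rate * relative_rate, t_end)
euler_unexposed = euler_fraction_diseased(base_rate, t_end, dt=1.0)
euler_exposed = euler_fraction_diseased(base_rate * relative_rate, t_end, dt=1.0)

print(exact_unexposed, exact_exposed)
print(euler_unexposed, euler_exposed)
```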
The PAR will be falling for the same reason that the RR is falling.
At the start, cases among the exposed are generated at two times
the rate that cases are generated among the unexposed. More generally,
given that the relative rate is "a", at the start "a"
times as many cases are produced in the exposed as in the unexposed.
One out of every "a" of these, however, is attributed to the same causes
as those affecting the unexposed, so that "a" - 1 out of every "a"
exposed cases at the start are attributed to exposure. As more and
more of the unexposed become ill, fewer of the exposed cases are
attributable to exposure, so the PAR drops.
Make sure you go over the relationships in the following graph carefully in your mind and play with the algebra behind them so that you have a thorough understanding of why the statistics in the graph have the shapes they do. It will help you to build the simulation that gives these results and then examine the results in a table of all the pertinent entities, including all of the individual odds, risks, rates and compartment sizes. Being able to formalize the details of a process and look at what is happening to each entity separately can be a great aid to careful and logical thinking.
(To be handed in). Explain why the curves
for the different statistics in Graph 4.2 have the shapes that
they do.
As an example of graphs or tables that might help you interpret graph 4.2, we include graph 4.3.

Note that one of the most important uses of modeling is to help
you think more clearly about a problem. You very often cannot
predict what the behavior of your model is going to be because
the situation you are modeling is difficult to analyze. When you
see the behavior, and note something that you might not have expected,
then you are forced to reexamine your model, reexamine the behavior
of all the elements of your model, and clarify your understanding
of what is happening. The key to using simulation models to advance
your understanding is to always be asking the question: "Why
did it do that?" If you haven't put something terribly complex
and impenetrable into your model, you should quite often be able
to answer that question and advance your understanding. When you
can't, then at least you have a healthy reminder of how little
you know. Don't be discouraged by your inability to predict model
behavior. If you could always predict model behavior accurately
you wouldn't need a tool like STELLA® and you wouldn't need
a course like this.
To clarify the difference between cases attributable to an exposure and cases caused by an exposure, we now build a model where we separate out the cases caused by an exposure. In some unusual situations there might be a biomolecular genetic trail left by a cause of a cancer so that a biomarker of the causal action could be detected. In that case we would want to have a model that separated out these cases. In fact, if we had such a marker, we would probably want to designate cases with the marker as representing a different disease from cases without the marker. Since the disease would have different causes and different ways of preventing it, failing to distinguish the biomarked cases from those not biomarked would just dilute any examination for the effects of causal or preventive factors. In a later chapter we will examine this issue in more detail. For now just let us mention that any model with two separate flows into the disease category is called a model of simple independent action. We say that the effects of the exposure risk factor and the background risk factors have joint effect relationships described by the model of simple independent action.
The major value of constructing such a model now, however, is just to clarify our thinking about what the attributable risk statistics we have calculated above represent. By clarifying the model that underlies these statistics, we will clarify the assumptions that are needed to use these statistics in any predictive fashion.
In Diagram 4.2 we have separated out the cases caused by the exposure in our model. This is just like the model in Diagram 4.1 except that we are now distinguishing two distinct categories of individuals in the ED category. Note that in this model the sum of the two flows out of EW exactly equals the flow out of EW in the model represented by Diagram 4.1. Dividing a flow in a Stella model does not affect the total volume of the flow. Stella does not calculate one flow out of a compartment and then calculate the other flow based upon what is left. It calculates both flows using compartment values at the beginning of the dt.
Likewise in a differential equation, dividing an outflow term
into two separate inflows does not change the total flow. Here,
however, the reasons are a little different. In a differential
equation, draining a compartment from which another flow is
occurring has no effect on the value of that other flow because,
on an instantaneous basis, any draining of the compartment
is infinitesimally small.
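As a short worked illustration of this point, write r for Base_rate and a for RelativeRate (these one-letter symbols are shorthand introduced here, not names used in the model). The two separate outflows from the exposed well compartment in the divided model add up to the single outflow of the undivided model:

```latex
\frac{d\,EW}{dt}
  = -\underbrace{r\,EW}_{\text{background cases}}
    \;-\; \underbrace{r\,(a-1)\,EW}_{\text{exposure-caused cases}}
  = -\,a\,r\,EW
```

Both terms on the right are evaluated at the same instantaneous value of EW, so neither depends on how much the other has drained.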

Background_ED(t) = Background_ED(t - dt) + (NewBackExpDis) * dt
INIT Background_ED = 0
NewBackExpDis = EW_2*Base_rate
EW_2(t) = EW_2(t - dt) + (- NewBackExpDis - NewExpCausedDis) * dt
INIT EW_2 = 1000
NewBackExpDis = EW_2*Base_rate
NewExpCausedDis = EW_2*Base_rate*(RelativeRate-1)
ExpCaused_ED(t) = ExpCaused_ED(t - dt) + (NewExpCausedDis) * dt
INIT ExpCaused_ED = 0
NewExpCausedDis = EW_2*Base_rate*(RelativeRate-1)
UD_2(t) = UD_2(t - dt) + (NewUnexpDis_2) * dt
INIT UD_2 = 0
NewUnexpDis_2 = UW_2*Base_rate
UW_2(t) = UW_2(t - dt) + (- NewUnexpDis_2) * dt
INIT UW_2 = 1000
NewUnexpDis_2 = UW_2*Base_rate
AttributableCases = (Background_ED+ExpCaused_ED)-(Background_ED+EW_2+ExpCaused_ED)*(UD_2/(UD_2+UW_2))
Base_rate = .005
RelativeRate = 2
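For readers without STELLA at hand, the listing above can be approximated in Python. This is a minimal sketch under stated assumptions: it uses Euler's method, so that every flow in a step is computed from the stock values at the beginning of that step; it assumes model 4.1 has a single flow of EW*Base_rate*RelativeRate out of the exposed well compartment; and the function name, the `_1` suffix for the model 4.1 stocks, and the dictionary keys are hypothetical.

```python
def simulate_split_flows(end_time=400.0, dt=1.0, base_rate=0.005, relative_rate=2.0):
    """Euler simulation of the divided-flow model next to an undivided one."""
    # Divided-flow stocks (names follow the equation listing).
    ew_2, background_ed, exp_caused_ed = 1000.0, 0.0, 0.0
    uw_2, ud_2 = 1000.0, 0.0
    # Undivided exposed stocks, assumed to match model 4.1.
    ew_1, ed_1 = 1000.0, 0.0

    for _ in range(int(round(end_time / dt))):
        # All flows use stock values from the beginning of the step.
        new_back_exp_dis = ew_2 * base_rate
        new_exp_caused_dis = ew_2 * base_rate * (relative_rate - 1.0)
        new_unexp_dis = uw_2 * base_rate
        new_exp_dis_single = ew_1 * base_rate * relative_rate

        ew_2 -= (new_back_exp_dis + new_exp_caused_dis) * dt
        background_ed += new_back_exp_dis * dt
        exp_caused_ed += new_exp_caused_dis * dt
        uw_2 -= new_unexp_dis * dt
        ud_2 += new_unexp_dis * dt
        ew_1 -= new_exp_dis_single * dt
        ed_1 += new_exp_dis_single * dt

    split_total_ed = background_ed + exp_caused_ed
    # Same expression as AttributableCases in the listing.
    attributable_cases = split_total_ed - (split_total_ed + ew_2) * (ud_2 / (ud_2 + uw_2))
    return {
        "ew_2": ew_2,
        "background_ed": background_ed,
        "exp_caused_ed": exp_caused_ed,
        "split_total_ed": split_total_ed,
        "single_ed": ed_1,
        "attributable_cases": attributable_cases,
    }

result = simulate_split_flows()
for name, value in result.items():
    print(f"{name:20s} {value:12.6f}")
```

Comparing `split_total_ed` with `single_ed` in the output shows that dividing the flow leaves the total unchanged, and comparing `attributable_cases` with `exp_caused_ed` shows that the two quantities are not the same thing.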

Confirm that the sum of the two exposed and diseased compartments in model 4.2 equals the single exposed and diseased compartment in model 4.1 by making a derived variable for the sum and then examining the two values in a table. Use the free floating decimal point option to ensure that you have a more exact comparison. You will need to have models 4.1 and 4.2 on the same sheet. To construct model 4.2, you can copy and paste model 4.1, bomb the parts you don't need, and continue with the reconstruction.
Homework C4.5 (Hand in)
Explain why the curves in graph 4.4 have the relationships that they do. Use other graphs or tables generated by numerical solution of the model to help you with your explanation. Comment upon the relationships between the two curves when the disease is rare.
The difference between cases caused by an exposure and cases attributable to an exposure can be seen more clearly if we develop a model which has a separate compartment for cases attributable to an exposure (AC in the calculations presented earlier). Exposed and diseased individuals whose disease was caused by the exposure may nevertheless not have their disease attributable to exposure if it would subsequently have been caused by background factors anyway. We can model this easily enough by adding an outflow from the cases caused by the exposure that occurs at the rate of disease caused by background factors. The model form is seen in diagram 4.3.

Run model 4.1 and 4.3 on the same sheet
to confirm that AC over all of the exposed individuals equals
the risk difference, that AC over all of the diseased individuals
equals the PAR, and that AC over the exposed and diseased individuals
equals the PAR%.
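The first of these confirmations can be sketched in Python as follows. This is an illustration under assumptions, not the STELLA model itself: the stock names (`attributable`, `caused_not_attributable`) are hypothetical, Euler's method with dt = 1 is assumed, and the parameter values are the ones used earlier in the chapter. The only structural change from the divided-flow model is an outflow from the exposure-caused cases at the background rate.

```python
def simulate_attributable_stock(end_time=400.0, dt=1.0, base_rate=0.005, relative_rate=2.0):
    """Euler simulation with a separate stock for cases attributable to exposure."""
    ew, background_ed = 1000.0, 0.0
    attributable = 0.0             # caused by exposure and still attributable to it
    caused_not_attributable = 0.0  # caused by exposure, but background causes would have acted by now
    uw, ud = 1000.0, 0.0

    for _ in range(int(round(end_time / dt))):
        # Flows are computed from stock values at the beginning of the step.
        new_background = ew * base_rate
        new_exp_caused = ew * base_rate * (relative_rate - 1.0)
        overtaken = attributable * base_rate  # outflow at the background rate
        new_unexposed = uw * base_rate

        ew -= (new_background + new_exp_caused) * dt
        background_ed += new_background * dt
        attributable += (new_exp_caused - overtaken) * dt
        caused_not_attributable += overtaken * dt
        uw -= new_unexposed * dt
        ud += new_unexposed * dt

    exposed_total = ew + background_ed + attributable + caused_not_attributable
    risk_exposed = (exposed_total - ew) / exposed_total
    risk_unexposed = ud / (ud + uw)
    return {
        "attributable": attributable,
        "exposed_total": exposed_total,
        "risk_difference": risk_exposed - risk_unexposed,
    }

out = simulate_attributable_stock()
print("AC / all exposed:", out["attributable"] / out["exposed_total"])
print("risk difference :", out["risk_difference"])
```

The two printed values should agree to within floating point rounding, whatever end time is chosen.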
Many epidemiologic studies are not of cohorts without births and
deaths. Many times we make cross sectional observations of dynamic
populations with births and deaths. From such studies we cannot
estimate risks and therefore cannot estimate risk ratios or risk
differences. But we can estimate the prevalence odds or the incidence
odds. What interpretation can we give to the prevalence odds in
terms of rate ratios or risk ratios? The answer to this question
for populations at equilibrium was presented to the epidemiologic
community by Olli Miettinen in the early 70s. Most doctoral students
will know that the prevalence odds ratio in a population at equilibrium
with regard to exposure, disease, and total size is the same as
the rate ratio as long as exposure does not have a differential
effect on duration of disease. Most will not be able to explain
well why this is the case, however. Few will be able to describe
well how deviations from the equilibrium assumptions will affect
the relationships between the odds ratio and the rate ratio. We
explore these issues here.
We build the model in diagram 4.3 where new individuals come into
the population only in the well state. We choose our birth rate
into this well state to equal the death rate out of either
the well or the disease state so that disease does not affect
death rate or population size. (Remember that this gives us a
precisely balanced equilibrium but if birth or death rates should
change, we would get explosive growth or collapse of our population
unless both change by precisely the same amount.) We set the initial
values in the population at their equilibrium values given no
effect of exposure on increasing the disease rate. We run the
simulation for five time units just to confirm that everything
is at equilibrium and then we start the exposure effect. This
allows us to see how a rising disease rate affects the relationship
between the prevalence odds ratio and the rate ratio.
Explain the logic we used to enter the initial
equilibrium values for the exposed and unexposed diseased individuals
and derive the formulas that were entered.
ED(t) = ED(t - dt) + (New_Exp_Cases - EDdeaths) * dt
INIT ED = EW*BaseRate/Birth_Death_Rate
New_Exp_Cases = EW*BaseRate*RelativeRate
EDdeaths = ED*Birth_Death_Rate
EW(t) = EW(t - dt) + (Exp_Births - New_Exp_Cases - EWdeaths) * dt
INIT EW = 1000
Exp_Births = (EW+ED)*Birth_Death_Rate
New_Exp_Cases = EW*BaseRate*RelativeRate
EWdeaths = EW*Birth_Death_Rate
UD(t) = UD(t - dt) + (New_Well_Cases - UDdeaths) * dt
INIT UD = UW*BaseRate/Birth_Death_Rate
New_Well_Cases = UW*BaseRate
UDdeaths = UD*Birth_Death_Rate
UW(t) = UW(t - dt) + (UnExp_Births - New_Well_Cases - UWdeaths) * dt
INIT UW = 1000
UnExp_Births = (UW+UD)*Birth_Death_Rate
New_Well_Cases = UW*BaseRate
UWdeaths = UW*Birth_Death_Rate
BaseRate = 0.005
Birth_Death_Rate = 0.02
RelativeRate = If time <5 then 1.0 else 2.0
Prevalence_OR = ED*UW/(UD*EW)
Predict the pattern that the prevalence
odds ratio will follow. It should start out at one since we begin
the simulation with no exposure effect. Fill in the following
graph before looking ahead at what the simulation produced. Then
explain the model behavior and the reasons for any difference
between what subsequent simulation showed model behavior to be
and what you predicted. Use the model to play around looking at
different graphs and tables to help your explanation. Try to make
hypotheses about why an observed pattern is appearing and then
try to think about model output that would help you to evaluate
those hypotheses.

The Prevalence OR at equilibrium when exposure is having an effect
can be derived by determining the equilibrium value for each stock
algebraically. One way we do that is by setting the inflows and
outflows to a stock equal. Doing that for two of the four stocks
used in calculating the prevalence OR allows us to demonstrate
that the prevalence OR in a population at equilibrium equals the
rate ratio. We do that below:
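Stock ED : EW*BaseRate*RelativeRate = ED*Birth_Death_Rate, so ED/EW = BaseRate*RelativeRate/Birth_Death_Rate

Stock UD : UW*BaseRate = UD*Birth_Death_Rate, so UD/UW = BaseRate/Birth_Death_Rate

Prevalence OR = (ED/EW)/(UD/UW) = RelativeRate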
One way to think about this equality of the Prevalence OR and
the Rate Ratio is to consider that if we were following a cohort,
the OR would progressively rise above the Rate Ratio as we saw
earlier. But new individuals are being fed into the Well stocks.
Relatively more individuals are being fed into the exposed well
stock as compared to the unexposed well stock because more individuals
are being drained away into disease in the exposed population.
This greater relative increase in EW as compared to UW drives
up the denominator and compensates for the increase that would
take place from the relatively greater inflow into ED as compared
to UD.
When exposure effect is newly introduced, many people in the exposed
categories have been there for some time without experiencing
exposure effects. They dilute out the OR. The OR gradually rises
as fewer such individuals remain. The patterns produced by this
simulation are seen in graph 4.4 (filled in). Note that the odds
ratio is constant for the first five time units.
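The behavior just described can be reproduced outside STELLA with a few lines of Python. This is a minimal sketch under stated assumptions: Euler's method with dt = 0.25 (the time step is a choice made here, not a value taken from the model), a relative rate of 1 (no exposure effect) before time 5 and 2 afterwards, and initial diseased stocks set to their no-effect equilibrium values. The function and variable names are illustrative.

```python
def simulate_open_population(end_time=400.0, dt=0.25, base_rate=0.005,
                             birth_death_rate=0.02, exposure_start=5.0,
                             relative_rate_after=2.0):
    """Return a list of (time, prevalence odds ratio) pairs."""
    ew = uw = 1000.0
    # Equilibrium with no exposure effect: inflow (well * base rate) = outflow (diseased * death rate).
    ed = ud = 1000.0 * base_rate / birth_death_rate

    history = []
    for step in range(int(round(end_time / dt)) + 1):
        time = step * dt
        history.append((time, ed * uw / (ud * ew)))
        relative_rate = 1.0 if time < exposure_start else relative_rate_after

        # Flows computed from stock values at the beginning of the step.
        new_exp_cases = ew * base_rate * relative_rate
        new_unexp_cases = uw * base_rate
        exp_births = (ew + ed) * birth_death_rate
        unexp_births = (uw + ud) * birth_death_rate

        d_ew = exp_births - new_exp_cases - ew * birth_death_rate
        d_ed = new_exp_cases - ed * birth_death_rate
        d_uw = unexp_births - new_unexp_cases - uw * birth_death_rate
        d_ud = new_unexp_cases - ud * birth_death_rate

        ew += d_ew * dt
        ed += d_ed * dt
        uw += d_uw * dt
        ud += d_ud * dt
    return history

history = simulate_open_population()
for time, odds_ratio in history[::200]:
    print(f"time={time:6.1f}  prevalence OR={odds_ratio:.4f}")
```

Printing other quantities at each step, such as ED/EW and UD/UW separately, is one way to see which part of the odds ratio is moving after the exposure effect begins.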

The following graphic output might help explain why the prevalence odds ratio rises asymptotically to the relative rate after the exposure effect begins.


1 In a fixed cohort where all of the population is equally susceptible and the rates of disease in the exposed and the unexposed stay constant,
a what will be the pattern of the risk ratio over time? Why will it have that pattern?
b what will be the pattern of the risk difference over time? Why will it have that pattern?
c what will be the pattern of the rate difference over time? Why will it have that pattern?
d what will be the pattern of the prevalence odds ratio over time? Why will it have that pattern?
e what will be the pattern of the PAR over time? Why will it have that pattern?
f what will be the pattern of the AR% over time? Why will it have
that pattern?
2 When an open population with births and deaths and homogeneous
susceptibility is at equilibrium with regard to population size
and disease frequency, the prevalence odds ratio equals the rate
ratio. Explain why this is so.
3 How could simulation models like those presented in this chapter help in the process of generating hypotheses about what is causing disease in the real world?