Epidemiology 802

Compartmental Model Analysis of Epidemiologic Processes

Chapter 9

Multistage Disease Processes and Time Dependent Flows

by

Jim Koopman

Purpose of This Chapter

Multistage models of disease development are a theoretically and practically useful way of modeling time delays between when a disease process begins and when it is manifest. Modeling time delays is useful for both infectious and non-infectious diseases in two ways. First it captures important aspects of dynamics. Second it creates a foundation for modeling the joint effects of exposures and for understanding the dynamics of screening programs.

Modeling the incubation period for a contagious disease is useful because the distribution of incubation periods between onset of the infection process and the onset of contagiousness affects the speed with which an infection can spread through a population. For chronic non-infectious diseases the dynamics of disease development in individuals will affect the age distribution of disease. For example the rising frequency of cancer with age has been explained mostly in terms of multistage models. The dynamics of disease development after a transient exposure will affect the way we analyze exposure effects and evaluate the effects of exposure control programs. Some model of disease development dynamics is always assumed when analyzing exposure-disease relationships.

While we address these direct uses of multistage modeling in this chapter, they do not present the major motivation for the presenting multistage models. This chapter is meant to create a basis for further practical and theoretical modeling issues. One practical issue for which it creates a modeling foundation is the design and evaluation of screening programs. In the next chapter we will use multistage models to gain insight into how predictive values should change after the institution of a screening program and how intervals between repeat screenings will affect the performance of screening programs. These issues are not well addressed by the static approaches to relating sensitivity, and specificity to predictive values at different levels of disease prevalence.

An issue needing multistage models that has even broader practical and theoretical implications is the joint effects of two or more different exposures. The pattern of disease in the population by exposure is dependent in predictable ways on how two different exposures relate to each other in causing disease. There are three major ways that they may relate to each other: 1) they may contribute to advancing the same stage of a disease process, 2) they may contribute to different stages of the same disease process, or 3) they may contribute to distinct pathways to the disease. To correctly assess exposure effects statistically and to predict the consequences of eliminating exposures, the analysis model used must have joint effects that correspond those produced by the underlying causal relationships. By building models of these three relationships we will gain insight into the analytic models that will be appropriate for different causal situations.

Single Stock Models of Disease Incubation:

Up to this point we have dealt with disease processes as if the time that you entered a disease process didn't influence the time when you developed a disease. Making this assumption simplifies the mathematics of disease processes and it reduces the time a computer must take to simulate a process. Although such assumptions are usually unrealistic, they provide appropriate caricatures of systems in the initial exploration of how those systems might be expected to behave.

Simple one stage disease process models can be particularly useful for studying the equilibrium states of systems. Sometimes, but not always, the equilibrium states don't depend on the distribution of times that it took individuals to reach a disease state. Sometimes they only depend upon the mean time that it takes individuals to reach the disease state. We will illustrate one such situation in this chapter. Later in the course we will also see that the shape of the distribution of incubation periods can markedly affect how fast an infectious disease epidemic rises but it does not affect the equilibrium levels of infectious diseases.

A simple alternative to modeling the development time to disease is to assume that there is a fixed time to disease development so that everyone takes exactly this fixed time to develop the disease. We can produce simulations with such fixed delays using the conveyor stock in Stella. The conveyor model is equivalent to having a compartment for each dt in the numerical estimation procedure and having the entire contents of each compartment move into the next compartment each dt. The negative exponential distribution and the fixed delay represent two extremes of possible distributions of incubation periods. Let us generate these two extreme distributions using the following Stella models:

Diagram 9.1

In the conveyor we put 1 individual in the most distant time slot from the exit and zeros in the 99 other time slots. In the Well Reservoir we put one individual as well. You can appreciate that we have to plot the flows from the well state on different scales in these two models. The maximum rate from the reservoir is 0.01 but there are continuous flows. From the conveyor, our one individual flows out all at once. Make sure you can reproduce the output in graph 9.1.

Certainly there must be some more realistic alternatives between these two extremes.

Graph 9.1


The Effect of Infectious Period Distributions on Infectious Disease Transmission Dynamics

Let us consider one implication of these two models for infectious diseases. Rather than using these models for incubation periods, we first consider a simple SIR model where we use these two models for the infectious period. In the first SIR model we use a reservoir for the infection compartment so that the distribution of times spent in that compartment has a negative exponential form as seen in Graph 9.1. In the second, we use a conveyor so that every individual has exactly the same duration of stay in the infectious state. We set the rate of exit from the reservoir equal to the inverse of the time length of the conveyor so that in each case the average duration in the infectious state is the same. The diagram for our model is seen in Diagram 9.2.

Diagram 9.2


COMPARTMENTS

Icon(t) = Icon(t - dt) + (NewIcon - NewRcon) * dt

INIT Icon = .0001

TRANSIT TIME = 2

INFLOW LIMIT = INF

CAPACITY = INF

INFLOWS:

NewIcon = Scon*c*beta*(Icon/(Icon+Rcon+Scon))

OUTFLOWS:

NewRcon = CONVEYOR OUTFLOW

Ires(t) = Ires(t - dt) + (NewIres - NewRres) * dt

INIT Ires = .0001

INFLOWS:

NewIres = Sres*c*beta*(Ires/(Ires+Rres+Sres))

OUTFLOWS:

NewRres = Ires/D

Rcon(t) = Rcon(t - dt) + (NewRcon) * dt

INIT Rcon = 0

INFLOWS:

NewRcon = CONVEYOR OUTFLOW

Rres(t) = Rres(t - dt) + (NewRres) * dt

INIT Rres = 0

INFLOWS:

NewRres = Ires/D

Scon(t) = Scon(t - dt) + (- NewIcon) * dt

INIT Scon = .999

OUTFLOWS:

NewIcon = Scon*c*beta*(Icon/(Icon+Rcon+Scon))

Sres(t) = Sres(t - dt) + (- NewIres) * dt

INIT Sres = .999

OUTFLOWS:

NewIres = Sres*c*beta*(Ires/(Ires+Rres+Sres))

PARAMETERS

beta = .25

c = 4

D = 2

Now let us examine the incidence and cumulative incidence curves from this model. We see in Graph 9.2 that the epidemic rises faster given the conveyor model of duration of contagiousness. Since the basic reproduction number is the same for both models, however, both have the same threshold values for contact rates and transmission probabilities at which there will be rising transmission and both come to the same final fraction of the population infected.

Graph 9.2


Let us consider some technical points about how we generated these curves in Graph 9.2. Whenever a delay, a quick burst inflow, or a conveyor is used in our simulations, we should stick to the Euler method of numerical estimation. To attain accuracy with the Euler method, a very small time step is required in this model because during the rapid rise of the epidemic in the reservoir case, flows can be relatively large. When a sufficiently small time step is used, however, you see that the final fraction of individuals infected is the same for both models. The initial rate of rise of the epidemics, however, is different.

If we start with a small enough seed of initially infected individuals, we can examine the initial period of the epidemic to see how long it takes either the total number of infected individuals or the flow of newly infected individuals to double. The reservoir model has a constant doubling time from the start which persists until the fraction of the population which is not susceptible can perceptibly affect the doubling time. The conveyor model starts out with a uniform distribution and takes a few doubling times until cases are distributed along the conveyor in an exponential fashion. After that point there is again a period of constant doubling times until the decreased fraction susceptible begins to slow the rise.

The period of constant doubling times at the very beginning of an epidemic can be used to estimate the basic reproduction rate of the transmission process. We can see from these results, however, that to use that doubling time to estimate the basic reproduction number, we will have to have a model that describes the distribution of infectious periods and not just an average duration for the infectious period.

Homework 9.1

Construct two SIR models with vital dynamics that have an unchanging population size. In one use a reservoir for the infectious period and in the other use a conveyor. Demonstrate that when the average duration of the infectious period is the same, these two models lead to the same equilibrium values.

Multistage Disease Processes:

Realistically the time one develops a disease is likely to depend upon when one started incubating the disease. That is to say that a one stage reservoir model is not completely realistic. There will almost always be a distribution of disease onsets after exposure where some individuals have short incubation periods and some have longer periods. If that is the case then the conveyor model is not ideally realistic either. For some purposes we might construct our models with these two extremes and see if we come to the same conclusion with either extreme. If we do, then we could generally feel confident that a more realistic intermediate model would also lead us to that same conclusion. Because of limitations we find in Stella on the nature of flows into and out of conveyors, however, we may not be able to test our model with these two extremes in all cases. Thus we are motivated to find an intermediate model. Fortunately there is a very convenient one readily at hand. It is the multistage model.

We can create a realistic distribution of incubation periods with most incubation periods clustered around a peak by introducing multiple stages to the disease development process.

Let us think about a multistage process in physiological terms. Consider inhaling a dose of influenza viruses as the chance event that starts a disease process. It takes some time for those viruses to bounce around at random until they finally attach to the cells that they can invade. Then the cell membrane molecules have to bounce around at random until all the necessary linkages between virus and cell membranes are made that lead to actual virus entry and separation from virus RNA and attached proteins from its viral membrane. Then the virus proteins have to randomly go about until they hit their targets. It seems that we could always go into finer and finer detail with such descriptions. In that sense, any disease development process seems to have a lot of random event sequences that it must go through. Only a few of those, however, will be long enough to make much of a difference. For the influenza virus situation, there will have to be several generations of virus replication going through all of these minute stages before there is enough virus present to make one ill. We could think of each viral generation as a stage of the incubation period.

For the incubation of cancer we generally think of stages during which somatic cells have to undergo mutations. Each cumulative mutations marks another stage. We will see later that such multistage models can be elaborated further by specifying whether the mutations have to occur in sequential order or whether any order will do the job.

A simple two stage model:

Lets build a simple two stage disease process and examine the distributions of times to entry into the disease state by plotting the flows into the disease state. The model we will build is as seen in STELLA diagram 9.3

Diagram 9.3


COMPARTMENTS:

First_stage_incubating(t) = First_stage_incubating(t - dt) + (- Asymptomatic_progressions) * dt

INIT First_stage_incubating = 1

OUTFLOWS: Asymptomatic_progressions = First_stage_incubating*Stage_1_Progression_rate

Second_stage_incubating(t)

= Second_stage_incubating(t - dt) + (Asymptomatic_progressions - Disease_onsets) * dt

INIT Second_stage_incubating = 0

INFLOWS: Asymptomatic_progressions = First_stage_incubating*Stage_1_Progression_rate

OUTFLOWS: Disease_onsets = Second_stage_incubating*Stage_2_progression_rate

Diseased(t) = Diseased(t - dt) + (Disease_onsets) * dt

INIT Diseased = 0

INFLOWS: Disease_onsets = Second_stage_incubating*Stage_2_progression_rate

PARAMETERS

Stage_1_Progression_rate = 1

Stage_2_Progression_rate = 1

DERIVED VARIABLES

Well_Total = First_stage_incubating+Second_stage_incubating

Incidence_rate_or_Hazard_function = Disease_onsets/Well_Total

In this model we put all of the modeled population into the first stage compartment before we begin running the program. There is no input into this compartment so everyone in the compartment will have spent the same amount of time in the compartment. Therefore we can think of this as a model of a cohort of individuals who are identified at some point of time and followed. We will consider the start of the model to be just at the moment that individuals have begun their first step toward developing disease. Then individuals will have to take a second step to get into the second compartment and a third step to get into the compartment where disease is manifest. For example, we might think of time zero in the first stage as the time of HIV infection and the disease onsets in the third compartment as AIDS onsets.

Alternatively, we might think of an entire cohort of individuals being exposed to some hazard beginning at time zero that defines the onset of the disease process. For example we might be interested in examining the incubation period through which smoking generates cancer. Smoking might generate a constant chance of reaching the next stage of the disease process which might be some somatic mutation. Then once that stage is reached (in the second compartment in our model), there could be a constant hazard that the next step which takes one into the disease state could be taken. This step might be the growth of a tumor to a critical detectable size or it might be attainment of the capacity to metastasize. For most cancers we will in fact probably want more than two stages. Growth processes in general require us to use several stages and metastases don't become immediately manifest as symptoms. Thus in this example we might want actually to use more than two stages. But we want to begin our multistage modeling with the simplest multistage model.

In either example, because we have put everyone into the first stage of disease at time zero, time zero is when we define the risk as beginning. At that time no one will have made any progression to disease. Of course with many disease processes, if we were to select a cohort of well individuals for study, some of those individuals would be at various stages already along the progression to disease. We assume in this model that we have some way of defining a beginning point for the process.

Suppose we want the total average incubation period or time spent in non-diseased stages to be two time units so we will divide the total average incubation period into two parts.

Note that we can treat the flow into the diseased state in this two stage model just as we treated the flow into the diseased state in the one stage model. Across any discrete time interval, the number of individuals flowing into the diseased state represents the number of individuals that have an incubation period corresponding to the midpoint of that time interval. To emphasize this we present graph 9.3. The total number of healthy individuals in this graph has been calculated by summing the individuals in the two different incubating states.

Graph 9.3

As the "dt" constituting the base of the rectangles used to measure the volume under the curve defined by "Disease Onsets" gets smaller and smaller, the sum of the volume of those rectangles gets closer to the total volume under that curve. That total volume is one since we put one individual in the first well stage. As the step sizes go to zero, the rectangles will have no volume. That is to say the probability of any individual fitting into any particular rectangle as the dt goes to zero will approximate zero. But the height of the curve can be interpreted as the probability density of having an incubation period of the corresponding length of time.

Note that just as in the case of one stage disease, the survival function is one minus the cumulative risk function, the probability density function is the derivative of the cumulative risk function or the negative derivative of the survival function, and the hazard function is the probability density function divided by the survival function. Let us summarize this symbolically:

  1. F(x) = cumulative risk or cumulative density function. In this STELLA® model implementation it is just the value of the Diseased stock.
  2. f(x) = the flow into the diseased stock
  3. 2 S(x) = 1-F(x) = the survival function. In this STELLA® model implementation it is just the value of the sum of First Stage Incubating and Second Stage Incubating stocks.
  4. 3 f(x) = dF(x) / dx = -dS(x) / dx = the probability density function for incubation periods assuming that incubation can be considered to start at time zero. In this STELLA® model implementation it is just the value of the flow "Disease Onsets".
  5. 4 l(x) = f(x) / S(x) = Hazard function or rate of disease development. You would have to make a derived variable to get this from the STELLA® simulation. This derived variable would be the flow into the disease compartment divided by all of the well compartments leading up to that flow.

For those of you with a calculus background, you might also be interested to see that

l(x) = -d ln(S(x)) / dx.

In this case, as opposed to the one stage time independent or autonomous model, our hazard function is not constant over time. Note that our model is still time independent (flows don't depend on the values of stocks at any time in the past) and it is still autonomous (time itself does not affect parameter values). We have not put any delays or conveyors into the system and we have not put time into any of the functions defining flows. The hazard function is the instantaneous rate at which disease is developing in well individuals. In Stella we have to create this as a derived variable by taking the flow into the disease stage per unit time and dividing by the total well population. Note that unlike the one stage disease process, this hazard function changes value over time. The reason for this is that at first there is no one in the stage just before disease so no one can get disease. As more and more individuals flow into that pre-disease state, the total proportion of individuals in that state will rise, and the overall disease rate will rise. Let us look at this function graphically.

Graph 9.4


The duration of separate stages and the overall duration

There are various ways we could divide the incubation period into two parts. We could have a lot of the incubation period in the first stage and little in the second. We could make the two parts equal as done for graph 9.3. Or we could make the first stage short and the second stage long. We want now to examine how these different divisions of the total incubation time into the different stages affects the probability density function of the incubation period.

If we divided our incubation stages evenly so that the average time in each stage is 1 time unit, the flow rates out of each incubating stage would have to be one in order for the average time spent in each of them to equal one. We will label the flow into the diseased state as IR 50 per 100 for this 50-50 division of the incubation period. Alternatively we might divide the time periods up so that on the average half a year is spent in the first stage incubating and 1.5 years in the second stage incubating. To make this division, we would make the flow rate out of the first stage to equal 1/0.5 or 2 and out of the second stage to equal 1/1.5 or .66667. In the graph below, we label the flow into the diseased state as IR 25 per 100 for these values.

Note that in order to get a total time of 2, it is not the rates that add up, it is one over the rates! We could also divide the time up so that only 1.5 time units on average were spent in the first stage and 0.5 were spent in the second stage. We would just switch around the previous flow rates for this purpose. We label the corresponding flow IR 75 per 100. Similarly we could divide the periods into .2 and 1.8 years so that the flow rates would be 5 and 1/1.8. (IR 10 per 100) Or we could go to the extreme where 99% of the incubating time would be spent in the second stage so the flow rates would be 50 and 1/1.98. (IR 1 per 100) (Better use a very small time step when you have a flow rate of 50 or alternatively change your time units!)

Given these combinations of flow rates, we get the distributions of incubation times as seen in graph 9.5 by plotting the flows into the diseased state.

Graph 9.5


Homework 9.2

Reproduce a set of curves like those in graph 5.4 but where the average total incubation period is only 0.5 time units. Make sure that your results are not influenced by the time period for the dt which you have chosen by demonstrating that your results are similar even when you halve the dt. With a total incubation time in the two stages of 0.5, you are going to have some pretty big flow rates. Make sure you understand how to handle such high rates without generating artifacts.

Note that in a two stage model like this, the final distribution of incubation periods or times spent in the non-diseased state will be the same if we make the first stage fast and the second slow or vice versa. We see this above graph where the IR 25 per 100 and the IR 75 per 100 curves are congruent.

Homework 9.3

Explain why the IR 25 per 100 and the IR 75 per 100 curves are congruent. Be as specific as possible. See if you can use any mathematical logic in your explanation.

Note that for all practical purposes, we could treat the IR 1 per 100 model as a one stage process. The second stage happens so fast that it doesn't make a noticeable difference in the incubation period. From exercise 1 above you should have seen that adding a small, fast moving stage complicates the calculation process considerably. Thus there is every reason to avoid such stages in a model and little reason to include them.

Note also that to move the peak of our distribution of incubation times maximally to the larger numbers, we divided the total incubation period evenly. But even then, the curve has a very short left side and a long right side.

How can we move the peak even further to the right while still preserving the same total average incubation time? We can do that by adding more stages. Graph 9.6 presents the distribution of onset times for a total average incubation time of 10 years when there are 1, 2, 4, 20, and 40 stages. Each stage is given exactly the same length. The flow rate out of each stage is r/10 where r is the number of stages.

Graph 9.6


You will note that as the number of stages is increased, the curve gets more bell-like but that it still has a longer tail on the right than on the left.

The Erlang distribution:

The curves in graph 9.6 are Erlang distributions. Erlang distributions are a subset of a larger family of distributions called gamma distributions. Gamma distributions can describe probability density functions just like the negative exponential function rate*e-rate*time describes a probability density function. In fact the negative exponential distributions are a subset of the Erlang distributions. There is a closed form for the gamma distribution which is kemtm-1e -kt / m! . When m is constrained to the integer numbers, we have an Erlang distribution. In that case k corresponds to the flow rate out of each of the sequential compartments and m corresponds to the number of compartments.

The average incubation period is m / k . That is because the average duration in each stage is one over k and there are m stages.

The Multi-Hit process.

In the above examples, one has to be in a stage before the events that lead to some other stage can take place. In some situations where two different events are involved in leading up to a disease process, it may not be necessary to have one event occur before the other. For example, perhaps there are a set of genetic mutations that need to be accumulated before the disease develops and it makes no difference which mutation occurs first.

The accumulation of random risk over time (Weibull) process.

The Weibull distribution:

Consider another way of making the flow out of an incubating disease stage depend upon the time that one has been in that incubating stage. Instead of adding additional stages, we could make the flow dependent upon the amount of time that has been spent in a stage as we fill in the formula for the flow regulator. That approach will only work in a cohort study where everyone starts in the incubating compartment at time 0 and no new individuals come into that compartment after time zero. For that reason this approach does not have as wide applicability in compartmental modeling as the multistage model does. In the model in Diagram 9.4 we have filled in the flow rate as having a linear dependence on time.

Diagram 9.4

Diseased(t) = Diseased(t - dt) + (Disease_Onsets) * dt

INIT Diseased = 0

Well(t) = Well(t - dt) + (- Disease_Onsets) * dt

INIT Well = 1

Weibull_Parameter = .25

Disease_Onsets = Well*Weibull_Parameter*time

Incidence_density_or_Hazard_Function = Disease_Onsets/Well

This gives us graph 5.6

Graph 5.6


Curve 3 is the Weibull distribution. It has the statistical advantage of having only one parameter. The closed form for this curve is at(base of the natural logarithms to the power)[-at2 / 2]. The a is the Weibull parameter and t is the time since entering the state. In this model the time of entering the state is time zero so that t is just simulation time which we can enter into STELLA® just by typing time.

The average incubation period given this type of process is .

You can see by the fact that the flow rate out of the healthy state can exceed in value the number of remaining healthy incubating individuals that this distribution has a different shape from the Erlang distribution. Also the first part of the curve is not concave as is the case with the Erlang distribution.

For your entertainment:

See how close you can make a Weibull distribution approximate a 3 stage Erlang distribution or the generalized three stage Erlang from the last exercise. Start with a Weibull parameter of around 0.02. Also see how well you can make a Weibull approximate a 3 stage Erlang distribution or a 10 stage Erlang distribution. Suppose you are trying to model the incubation period from the onset of HIV infection to the onset of AIDS and you have data only on half of the rising part of the curve. Comment upon the problems of using a Weibull distribution if the real distribution were an Erlang distribution.

The reason for using a distribution of this sort in data models is the advantage of having only one parameter instead of two required by Erlang distributions. It is difficult to think of a causal process, however, that would make the flow rate linearly dependent upon the time in a state. Moreover the fact that in Stella we can only use this distribution when examining a cohort that all start their risk at the same time makes it of limited application for compartmental models.

Review Questions:

1 Describe the two extreme distributions for a one stage disease process. How can you generate these with a Stella® simulation?

2 Describe two different approaches to generating distributions that are intermediate between the extreme distributions discussed in question 1. Discuss the biological relevance of these approaches.

3 Describe what happens to the shape of the incubation period distribution as "n" increases when there are "n" stages of disease and the flow out of each stage is some constant times "n".

4 How can one generate the probability density function of an Erlang distribution using Stella?

5 How can one generate the cumulative density function of an Erlang distribution using Stella?

6 How can one generate the survival function associated with an Erlang distribution using Stella.