Summary of the Problem

In this practical we will explore alternative methods of likelihood maximization for a large data set of ABO blood group phenotypes. The data set consists of 6000 individuals, and the observed counts for the A, B, AB and O blood groups are 1920, 1627, 607 and 1846 individuals, respectively.

If fA, fB, fAB and fO denote the population frequencies of each blood group phenotype, then the likelihood for the data can be written as:

   L ∝ fA^1920 * fB^1627 * fAB^607 * fO^1846

In addition, if we assume Hardy-Weinberg Equilibrium and denote the population frequencies of the A, B and O alleles as pA, pB and pO, so that fA = pA^2 + 2*pA*pO, fB = pB^2 + 2*pB*pO, fAB = 2*pA*pB and fO = pO^2, the likelihood for the data can be written as:

   L ∝ (pA^2 + 2*pA*pO)^1920 * (pB^2 + 2*pB*pO)^1627 * (2*pA*pB)^607 * (pO^2)^1846

In the following sections we will obtain maximum likelihood estimates of the allele frequencies for this sample.

Defining the Log-Likelihood Function

The first step we will take is to define a simple R function that evaluates the log-likelihood of the observed counts as a function of two allele frequencies, pA and pB. The third allele frequency is simply:

   pO = 1.0 - pA - pB

For simplicity, we will ignore the multinomial constant. Here is what the log-likelihood function might look like:


lnL <- function(pA, pB, nA = 1920, nB = 1627, nAB = 607, nO = 1846) {

   # The frequency of the O allele is determined by the other two
   pO <- 1.0 - pA - pB

   # Sum the log-likelihood contributions of the four phenotypes,
   # using the Hardy-Weinberg genotype frequencies
   nA * log(pA^2 + 2*pA*pO) + nB * log(pB^2 + 2*pB*pO) +
        nAB * log(2*pA*pB) + 2 * nO * log(pO)

}

Using this function you should be able to evaluate the log-likelihood of the data for any arbitrary set of allele frequencies. For example, with pA = 0.1 and pB = 0.5 the log-likelihood should be approximately -10104.75. Using this function and a little bit of patience, you could try to maximize the likelihood by trial and error...
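
For instance, you can check the value quoted above directly:

lnL(pA = 0.1, pB = 0.5)
# [1] -10104.75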

The E-M Algorithm

The first approach to likelihood maximization we will try is the E-M algorithm. At each iteration, we will estimate the expected number of individuals carrying each possible genotype, based on the observed counts for each phenotype and the current estimates of the allele frequencies.


EM <- function (pA, pB, nA = 1920, nB = 1627, nAB = 607, nO = 1846, debug = FALSE) {

    # Evaluate the likelihood using initial estimates
    llk <- lnL(pA, pB, nA, nB, nAB, nO)

    # Count the number of iterations so far
    iter <- 1

    # Loop until convergence ...
    while (TRUE)
       {
       # Estimate the frequency for allele O
       pO = 1.0 - pA - pB
 
       # First we carry out the E-step

       # The counts for genotypes O/O and A/B are effectively observed
       # Estimate the counts for the other genotypes
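       # e.g. P(A/A | A phenotype) = pA^2 / (pA^2 + 2*pA*pO) = pA / (pA + 2*pO)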
       nAA <- nA * pA / (pA + 2*pO)
       nAO <- nA - nAA
       nBB <- nB * pB / (pB + 2*pO)
       nBO <- nB - nBB
       
       # Print debugging information
       if (debug)
          {
          cat("Round #", iter, "lnLikelihood = ", llk, "\n")
          cat("    Allele frequencies: pA = ", pA, ", pB = ", pB, ", pO = ", pO, "\n")
          cat("    Genotype counts:    nAA = ", nAA, ", nAO = ", nAO, ", nBB = ", nBB, 
              ", nBO = ", nBO, "\n")
          }

       # Then the M-step
       pA <- (2 * nAA + nAO + nAB) / (2 * (nA + nB + nO + nAB))
       pB <- (2 * nBB + nBO + nAB) / (2 * (nA + nB + nO + nAB))

       # Then check for convergence: stop when the relative change in lnLikelihood is negligible
       llk1 <- lnL(pA, pB, nA, nB, nAB, nO)

       if (abs(llk1 - llk) < (abs(llk) + abs(llk1)) * 1e-6) break       

       # Otherwise keep going
       llk <- llk1
       iter <- iter + 1
       }

  list(pA = pA, pB = pB)
  }

To evaluate the performance of the algorithm, you can run the EM function with the debug flag set to TRUE. For example, you could try EM(pA = 0.4, pB = 0.4, debug = TRUE).
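
Since the function returns its final estimates as a list, you can also capture them for later use:

estimates <- EM(pA = 0.4, pB = 0.4)
estimates$pA
estimates$pB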

Methods Based on Scoring

Although matrix inversions can make these methods quite tedious to carry out by hand, they are a breeze in R. The R functions D, deriv and deriv3 allow derivatives to be calculated analytically for simple functions. They can also be used to calculate the Hessian matrix of second derivatives.

Try them out


D(expression(x^2), "x")
D(expression(1/x), "x")
D(expression(x*log(x)), "x")
D(expression(cos(x)), "x")

dxy <- deriv(expression(x^2+y^2), c("x", "y"), function(x, y) { } )
dxy(1,1)
dxy(2,0)
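
In each case, deriv returns a function that evaluates the original expression and attaches the partial derivatives at that point as a "gradient" attribute.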

In this case, we will first use deriv3 to construct a function that evaluates our log-likelihood, its gradient and its matrix of second derivatives at an arbitrary point.


L <- deriv3(expression(
      nA * log(pA^2 + 2*pA*(1 - pA - pB)) + nB * log(pB^2 + 2 * pB * (1 - pA - pB)) +
      nAB * log(2 * pA * pB) + 2 * nO * log(1 - pA - pB)), c("pA", "pB"),
      function (pA, pB, nA = 1920, nB = 1627, nAB = 607, nO = 1846) { } )
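
For example, evaluating this function at pA = 0.1 and pB = 0.5 should return the familiar log-likelihood of about -10104.75, with the score and second derivatives attached as attributes:

estimates <- L(0.1, 0.5)
estimates[1]                  # the log-likelihood
attr(estimates, "gradient")   # the score vector
attr(estimates, "hessian")    # the matrix of second derivatives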

We can now use this information to construct a simple algorithm based on Newton's method:


Newton <- function(pA, pB, nA = 1920, nB = 1627, nAB = 607, nO = 1846, debug = FALSE) {

   # Calculate likelihood, score and information
   pointEstimates <- L(pA, pB, nA, nB, nAB, nO)

   iter <- 1

   while (TRUE) {

       llk <- pointEstimates[1]
       S <- attr(pointEstimates, "gradient") [1,]
       I <- -attr(pointEstimates, "hessian") [1,,]

       # Print debugging information
       if (debug)
          {
          cat("Round #", iter, "lnLikelihood = ", llk, "\n")
          cat("    Allele frequencies: pA = ", pA, ", pB = ", pB, ", pO = ", 1-pA-pB, "\n")
          cat("    Score: ", S, "\n")
          }

       # Update the point: solve(I, S) computes delta = I^-1 S, the Newton-Raphson step
       delta <- solve(I, S)

       pA <- pA + delta[1]
       pB <- pB + delta[2]

       # Calculate likelihood, score and information
       pointEstimates <- L(pA, pB, nA, nB, nAB, nO)
       llk1 <- pointEstimates[1]

       # Check for convergence 
       if (abs(llk1 - llk) < (abs(llk) + abs(llk1)) * 1e-6) break

       # Keep going
       iter <- iter + 1
       }

    list(pA = pA, pB = pB)
    }
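
As with the E-M algorithm, you can watch the iterations by running the function with the debug flag set, for example:

Newton(pA = 0.3, pB = 0.3, debug = TRUE)

Newton's method typically needs far fewer iterations to converge than the E-M algorithm, although each iteration involves more work.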

The Nelder-Mead Simplex Method

R has the built-in ability to minimize a function using the Simplex method. To do this we will use the R function optim(). You can learn more about it by typing help(optim). Since optim() assumes that the function to be optimized takes a vector of parameters, we will first define an appropriate wrapper function for the likelihood:


wrapper <- function(p, nA = 1920, nB = 1627, nAB = 607, nO = 1846) {

   pA <- p[1]
   pB <- p[2]

   # Return -Inf for allele frequencies outside the admissible range
   if (pA <= 0.0 || pA >= 1.0 || pB <= 0.0 || pB >= 1.0 || pA + pB >= 1.0) return(-Inf)

   lnL(pA, pB, nA, nB, nAB, nO)

   }

The key parameters to optim() are the starting values, the function to be optimized, the optimization method and control parameters specific to each method. In this case, we are optimizing the wrapper function using the "Nelder-Mead" method. Since optim() minimizes by default and we want to maximize the function, we will ask it to rescale function values by multiplying them by -1 (via the fnscale control parameter). Optionally, we can request debugging output via the trace control parameter.

For example:


optim(c(0.1, 0.1), wrapper, method = "Nelder-Mead", control = list(fnscale = -1))
optim(c(0.1, 0.1), wrapper, method = "Nelder-Mead", control = list(fnscale = -1, trace = TRUE))

Again, you can evaluate the performance of the algorithm by trying a number of alternative starting points...
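
For instance, starting from a different point in the admissible parameter space:

optim(c(0.3, 0.3), wrapper, method = "Nelder-Mead", control = list(fnscale = -1))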