University of Michigan Center for Statistical Genetics
FUGUE Project Home Page

FUGUE is the program used to construct haplotypes for the chromosome 22 and 19 linkage disequilibrium maps.

FUGUE is currently under development, but it already provides some functionality not available in other programs. If you decide to use FUGUE, please e-mail me at goncalo@umich.edu.

Source Distribution

This version is recommended for Unix users with access to the GNU C++ compiler. To install FUGUE, unpack the archive below, type make, and follow the instructions. Have fun!

    fugue-0.2.3.tar.gz

Example Files

This archive includes example input files only. These files are used in the brief FUGUE tutorial at the bottom of this page.
    fugue-examples.tar.gz

Precompiled Binaries

If you do not have access to a C++ compiler, one of the following precompiled versions may work on your system:

    Linux-fugue.tar.gz      For GNU/Linux systems
    SunOS-fugue.tar.gz      For Sun Workstations
    Windows-fugue.tar.gz    For Windows Workstations

Brief Fugue Tutorial

This tutorial will give you a feel for how FUGUE works. To run it, you will need to have FUGUE and a recent version of MERLIN installed.

We will first see how to estimate haplotype frequencies in a sample of families, unrelated individuals, or both. This is a two-step process: MERLIN is used to enumerate all possible haplotypes for each founder (assuming no recombination), and FUGUE then uses an EM algorithm to estimate haplotype frequencies.
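To make the two steps concrete, here is a minimal Python sketch under simplifying assumptions: every founder is an unrelated individual with complete genotypes at biallelic SNPs, so the candidate haplotype pairs come from phase ambiguity alone (MERLIN's pedigree-based enumeration, which can rule many pairs out, is not reproduced), and the EM step assumes Hardy-Weinberg equilibrium. The function names phase_pairs and em_frequencies are hypothetical; this is an illustration of the general technique, not FUGUE's code, and it does not read merlin.hap.

```python
from collections import defaultdict
from itertools import product
import math


def phase_pairs(genotypes):
    """All unordered haplotype pairs consistent with the unphased genotypes of
    ONE unrelated individual (no relatives, no missing data).
    genotypes: list of (allele1, allele2) tuples, one per marker."""
    het = [i for i, (a, b) in enumerate(genotypes) if a != b]
    pairs = set()
    # The first heterozygous site keeps its given orientation, so each unordered
    # pair is generated once: 2**(k-1) pairs for k heterozygous sites.
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(str(a))
            h2.append(str(b))
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def em_frequencies(founder_sets, max_iter=500, tol=1e-6):
    """EM estimate of haplotype frequencies. founder_sets holds, for each
    founder, the list of unordered (h1, h2) pairs that founder could carry.
    Returns the frequencies and the log-likelihood evaluated in the last pass."""
    haps = sorted({h for pairs in founder_sets for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # equal starting frequencies
    prev_llk = -math.inf
    llk = prev_llk
    for _ in range(max_iter):
        counts = defaultdict(float)
        llk = 0.0
        for pairs in founder_sets:
            # E-step: Hardy-Weinberg probability of each candidate pair
            # (a heterozygous pair can arise in two orders, hence the factor 2).
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            llk += math.log(total)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        freq = {h: counts[h] / (2 * len(founder_sets)) for h in haps}
        if llk - prev_llk < tol:
            break
        prev_llk = llk
    return freq, llk


# Hypothetical genotypes for three unrelated founders at 3 SNPs.
founders = [
    [(1, 2), (1, 2), (1, 1)],  # two heterozygous sites: phase is ambiguous
    [(1, 1), (1, 1), (1, 1)],
    [(2, 2), (2, 2), (1, 1)],
]
sets = [phase_pairs(g) for g in founders]
freq, llk = em_frequencies(sets)
print({h: round(f, 3) for h, f in freq.items()}, round(llk, 3))
```

Because the unambiguous founders carry 111 and 221, the EM shifts the ambiguous founder toward the 111/221 phase, and the frequencies of 121 and 211 shrink toward zero.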

We will use the three input files family.dat, family.ped and family.map. For a description of input formats, see the MERLIN tutorial. If you examine the input files, you will find that they include genotypes for 569 individuals in 77 families spanning 3 or 4 generations. A total of 10 SNP markers, with an average heterozygosity of 48%, are listed.
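Assuming the 48% figure refers to expected heterozygosity (the page does not say whether it is expected or observed), it is computed per marker as 1 minus the sum of squared allele frequencies and then averaged over markers. The short Python sketch below uses hypothetical function names and hypothetical allele frequencies; it only shows the formula. For a biallelic SNP the value cannot exceed 50%, so an average of 48% means most of these markers have fairly common minor alleles.

```python
def expected_heterozygosity(allele_freqs):
    """Chance that two alleles drawn at random from the population differ:
    1 - sum(p_i ** 2) over the alleles of one marker."""
    return 1.0 - sum(p * p for p in allele_freqs)


def average_heterozygosity(markers):
    """Mean expected heterozygosity over a list of per-marker allele frequencies."""
    return sum(expected_heterozygosity(m) for m in markers) / len(markers)


# A biallelic SNP peaks when both alleles have frequency 0.5 ...
print(expected_heterozygosity([0.5, 0.5]))              # 0.5
# ... and a SNP with allele frequencies 0.6 / 0.4 already reaches 48%.
print(round(expected_heterozygosity([0.6, 0.4]), 2))    # 0.48
```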

To ask MERLIN to list all possible non-recombinant haplotypes for each family, we will use the --all, --zero and --founders command line options. Issue the command:

   prompt> merlin -d family.dat -p family.ped -m family.map --zero --founders --all

This will generate a merlin.chr file, which details sets of possible haplotypes for each family, and a merlin.hap file, which summarizes possible haplotype sets for each founder. The latter file is automatically detected and used by FUGUE as input. To run FUGUE, issue the command:

   prompt> fugue -t 0.005

The screen output details the estimated haplotype frequencies (excluding haplotypes with frequencies of zero or close to zero) and the estimated log-likelihood of the data, which is accurate only up to an arbitrary constant. The -t 0.005 command line option requests that only haplotypes with estimated frequencies of 0.005 or greater be displayed.

FUGUE - Frequency Using Graphs
(c) 2001 Goncalo Abecasis

The following parameters are in effect:
                    Input File :      merlin.hap (-fname)
                      Max Bits :              16 (-b9999)
                      Restarts :               0 (-r9999)
         Convergence Threshold :           1e-06 (-c99.999)
             Display Threshold :           0.005 (-t99.999)
            Divide-And-Conquer :             OFF (-a[+|-])

Filtering data...

Total: 390 Haplotypes in 76 Sets
Known: 0 Haplotypes in 0 Sets [UNDERESTIMATE]

1024 haplotype frequencies will be estimated
  [~0 Mb of memory required]
Starting with equal allele frequencies...
Pass  9, log(lk) = -634.581

Best log(lk) = -634.58

Haplotypes with estimated frequency > 0.005
  0.55% 1111111222
 39.09% 1111112111
  0.84% 1111112112
  9.45% 1111121222
  5.59% 1111122111
  7.67% 1121121222
  0.83% 1121122111
  0.84% 2221112111
  1.10% 2222221221
 30.43% 2222221222
  0.56% 2222222111

These 11 haplotypes represent 96.94% of total probability
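The bookkeeping in this output can be checked with a few lines of Python. The sketch copies the 11 displayed frequencies from the run above (the other 1013 estimates are not shown, so they cannot be included), filters them by a display threshold the way the text describes -t ("0.005 or greater", hence >=), and adds up the share of probability they cover. The helper names are hypothetical. The displayed values sum to about 96.95%, which matches the reported 96.94% up to the rounding of the printed percentages, and 2 to the power of 10 gives the 1024 possible haplotypes of 10 biallelic SNPs that FUGUE reports it will estimate.

```python
# Frequencies printed by FUGUE in the run above, converted from percentages.
displayed = {
    "1111111222": 0.0055,
    "1111112111": 0.3909,
    "1111112112": 0.0084,
    "1111121222": 0.0945,
    "1111122111": 0.0559,
    "1121121222": 0.0767,
    "1121122111": 0.0083,
    "2221112111": 0.0084,
    "2222221221": 0.0110,
    "2222221222": 0.3043,
    "2222222111": 0.0056,
}


def above_threshold(freqs, threshold):
    """Keep haplotypes whose frequency reaches the display threshold."""
    return {h: f for h, f in freqs.items() if f >= threshold}


def coverage_percent(freqs):
    """Share of total probability covered by a set of haplotypes, in percent."""
    return 100.0 * sum(freqs.values())


n_snps = len(next(iter(displayed)))
print(n_snps, 2 ** n_snps)            # 10 1024
print(round(coverage_percent(displayed), 2))

# A stricter threshold, e.g. -t 0.05, would leave only the common haplotypes.
common = above_threshold(displayed, 0.05)
print(len(common), round(coverage_percent(common), 2))
```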

Other commonly used options include the -a option, which generates an approximate solution for datasets with many SNP markers (>20), and the -r option, which tries to avoid local maxima of the likelihood by carrying out a number of random restarts.
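The idea behind random restarts can be illustrated with a small, self-contained Python sketch: EM only guarantees that the likelihood does not decrease from one pass to the next, so a single run can settle on a local maximum; running it from several starting points and keeping the fit with the highest log-likelihood improves the odds of finding the global one. The function names, the fixed number of EM passes, and the way random starting frequencies are drawn are all assumptions for illustration; this page does not describe how FUGUE's -r option chooses its starting points.

```python
import math
import random


def log_likelihood(founder_sets, freq):
    """Log-likelihood of each founder's candidate haplotype pairs under
    Hardy-Weinberg equilibrium (factor 2 for heterozygous pairs)."""
    return sum(
        math.log(sum(freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs))
        for pairs in founder_sets
    )


def em_from(founder_sets, start, n_iter=200):
    """Plain EM for haplotype frequencies, started from the given frequencies."""
    freq = dict(start)
    n_chrom = 2 * len(founder_sets)
    for _ in range(n_iter):
        counts = dict.fromkeys(freq, 0.0)
        for pairs in founder_sets:
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        freq = {h: c / n_chrom for h, c in counts.items()}
    return freq, log_likelihood(founder_sets, freq)


def em_with_restarts(founder_sets, n_restarts, seed=0):
    """Run EM from equal frequencies plus n_restarts random starting points
    and keep the solution with the highest log-likelihood."""
    rng = random.Random(seed)
    haps = sorted({h for pairs in founder_sets for pair in pairs for h in pair})
    starts = [{h: 1.0 / len(haps) for h in haps}]
    for _ in range(n_restarts):
        raw = [rng.random() + 1e-3 for _ in haps]  # keep every start strictly positive
        total = sum(raw)
        starts.append({h: r / total for h, r in zip(haps, raw)})
    return max((em_from(founder_sets, s) for s in starts), key=lambda fit: fit[1])


# Hypothetical candidate pairs for three founders.
sets = [[("111", "221"), ("121", "211")], [("111", "111")], [("221", "221")]]
best_freq, best_llk = em_with_restarts(sets, n_restarts=5, seed=1)
print({h: round(f, 3) for h, f in best_freq.items()}, round(best_llk, 3))
```

Each restart costs a full EM run, so the number of restarts trades running time against confidence in the final estimate.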

FUGUE-CC, a companion program to FUGUE, is suitable for the analysis of haplotypes in case-control datasets. A similar analysis could be carried out with the standard version of FUGUE and a little bit of scripting, but FUGUE-CC is a timesaver.

For this example, we will use the cc.dat and cc.ped input files. These files contain a set of 43 affected and 44 unaffected individuals genotyped at 6 SNPs. To compare the haplotype frequencies in the case and control samples, run FUGUE-CC with the following options:

  prompt> fugue-cc -d cc.dat -p cc.ped -s 10

In the program output, you will see estimated haplotype frequencies and the corresponding log-likelihoods for the combined sample (LLK_ALL), for cases only (LLK_CASES), and for controls only (LLK_CONTROLS). In addition, you will see a log-likelihood ratio statistic, defined as LLK_CASES + LLK_CONTROLS - LLK_ALL. The best way to evaluate its significance is to generate a number of permuted datasets and analyze each one.

The -s 10 command line option tells FUGUE-CC to generate 10 such permutations. In this case, none of the permuted datasets produced a ratio as large as the one observed in the original sample, so additional permutations are recommended. Here is the output with 100 permutations:

FUGUE FOR CASE-CONTROL DATA
(c) 2001-2003 Goncalo Abecasis

The following parameters are in effect:
                     Data File :          cc.dat (-dname)
                 Pedigree File :          cc.ped (-pname)
        Random Restarts for EM :               0 (-e9999)
Random Permutations for Sample :             100 (-s9999)

The pedigree file includes:
43 cases, 44 controls, 0 individuals of unknown phenotype
87 founders, 0 non-founders

Haplotyping Combined Sample
===========================

Haplotypes with estimated frequency > 0.001
 34.10% 112111
  1.19% 112112
  0.99% 121221
 19.53% 121222
  5.53% 122111
  0.79% 221221
 35.59% 221222
  2.29% 222111

These 8 haplotypes represent 100.00% of total probability
The logLikelihood of the data is -202.9073

Haplotyping Case Sample
=======================

Haplotypes with estimated frequency > 0.001
 20.56% 112111
  1.16% 121221
 25.22% 121222
  3.86% 122111
  1.19% 221221
 48.01% 221222

These 6 haplotypes represent 100.00% of total probability
The logLikelihood of the data is -84.8631

Haplotyping Control Sample
==========================

Haplotypes with estimated frequency > 0.001
 47.36% 112111
  2.30% 112112
  1.17% 121221
 13.83% 121222
  6.93% 122111
 23.64% 221222
  4.77% 222111

These 7 haplotypes represent 100.00% of total probability
The logLikelihood of the data is -103.5748

Haplotyping Random Permutations of the Data
===========================================

Permutation    1: llk(cases) = -128.953, llk(controls) =  -67.058, llk(sum) = -196.011
Permutation    2: llk(cases) =  -49.853, llk(controls) = -141.836, llk(sum) = -191.689
(... subsequent lines removed ...)


Summary of Results
==================

logLikelihood for Combined Sample:  -202.907
logLikelihood for Cases:             -84.863
logLikelihood for Controls:         -103.575
logLikelihood for Cases + Controls: -188.438
logLikelihood ratio:                  14.469
Permutations with higher ratio:        0/100

Hmm... Even with 100 permutations, none exceed the result in the original sample. This could be quite an interesting finding! ... Unfortunately, this is only a simulated dataset!
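The summary numbers above can be reproduced by hand. The Python sketch below recomputes the log-likelihood ratio from the printed log-likelihoods, does the same for the two permutations shown in the output, and counts how many permuted ratios reach the observed one. Because permuting case/control labels leaves the combined sample unchanged, LLK_ALL is the same for every permutation and only the case and control fits need to be redone. The function names are hypothetical, and the (hits + 1) / (n + 1) p-value estimate is a common convention for permutation tests, not necessarily something FUGUE-CC computes (it reports the raw count, here 0/100).

```python
import random


def llk_ratio(llk_cases, llk_controls, llk_all):
    """The statistic as defined in the text: LLK_CASES + LLK_CONTROLS - LLK_ALL."""
    return llk_cases + llk_controls - llk_all


def permuted_labels(labels, rng):
    """One permutation of case/control labels; the numbers of cases and
    controls stay fixed, only their assignment to individuals changes."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def empirical_pvalue(observed, permuted_stats):
    """Count permutations with a ratio at least as large as the observed one,
    and return that count with the (hits + 1) / (n + 1) p-value estimate."""
    hits = sum(1 for s in permuted_stats if s >= observed)
    return hits, (hits + 1) / (len(permuted_stats) + 1)


# Log-likelihoods copied from the FUGUE-CC output above.
LLK_ALL = -202.907
observed = llk_ratio(-84.863, -103.575, LLK_ALL)
perm_stats = [
    llk_ratio(-128.953, -67.058, LLK_ALL),   # permutation 1
    llk_ratio(-49.853, -141.836, LLK_ALL),   # permutation 2
]
print(round(observed, 3))                    # 14.469, as in the summary
print(empirical_pvalue(observed, perm_stats))

# A new labelling of the 43 cases and 44 controls for one more permutation.
relabelled = permuted_labels(["case"] * 43 + ["control"] * 44, random.Random(1))
```

With 0 of 100 permutations reaching the observed ratio, the (hits + 1) / (n + 1) estimate is 1/101, just under 0.01; resolving smaller p-values requires more permutations.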