FUGUE Project Home Page
FUGUE is the program used to construct haplotypes for the
chromosome 22 and
19 linkage disequilibrium maps.
FUGUE is currently under development, but it already provides
some functionality not available in other programs. If you decide
to use FUGUE, please email me at
goncalo@umich.edu.
Source Distribution
This version is recommended for Unix users with access to the
GNU C++ compiler. To install FUGUE, unpack the archive below,
type make and follow instructions. Have fun!
fugue0.2.3.tar.gz
Example Files
This archive includes example input files only. These files are
used in the brief FUGUE tutorial at the bottom of this page.
fugueexamples.tar.gz
Precompiled Binaries
If you do not have access to a C++ compiler, one of the
following precompiled versions may work on your system:
Linuxfugue.tar.gz For GNU/LINUX systems
SunOSfugue.tar.gz For Sun Workstations
Windowsfugue.tar.gz For Windows Workstations
Brief Fugue Tutorial
This tutorial will give you a feel for how FUGUE works. To run it,
you will need to have FUGUE and a recent version of
MERLIN installed.
We will first see how to estimate haplotype frequencies in a sample
of families, unrelated individuals, or both. This is a two-step process:
MERLIN enumerates all possible haplotypes for each
founder (assuming no recombination), and FUGUE then uses an EM algorithm
to estimate haplotype frequencies.
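The second step can be illustrated with a minimal EM sketch in Python. This is not FUGUE's implementation, whose internals are not described on this page. It assumes each founder comes with a list of compatible unordered haplotype pairs, as an enumeration step would provide. The function name, data layout, and toy data are all hypothetical.

```python
from collections import defaultdict


def em_haplotype_frequencies(compatible_pairs, max_iter=1000, tol=1e-6):
    """Estimate haplotype frequencies by EM.

    compatible_pairs: one entry per individual, each a list of unordered
    (hap1, hap2) pairs that are consistent with that individual's genotypes.
    """
    haplotypes = sorted({h for pairs in compatible_pairs
                         for pair in pairs for h in pair})
    # Start from equal frequencies for every haplotype seen in the data.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n_chromosomes = 2 * len(compatible_pairs)

    for _ in range(max_iter):
        # E-step: split each individual across their compatible pairs,
        # weighting each pair by its probability under the current estimates.
        counts = defaultdict(float)
        for pairs in compatible_pairs:
            weights = [(2.0 if a != b else 1.0) * freq[a] * freq[b]
                       for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: new frequencies are the expected haplotype counts
        # divided by the number of chromosomes.
        new_freq = {h: counts[h] / n_chromosomes for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Hypothetical toy data for two SNPs: three 11/11 homozygotes, two 22/22
# homozygotes, and one double heterozygote whose phase is ambiguous.
toy = ([[("11", "11")]] * 3
       + [[("22", "22")]] * 2
       + [[("11", "22"), ("12", "21")]])

freq = em_haplotype_frequencies(toy)
for hap, f in sorted(freq.items()):
    print(f"{100 * f:6.2f}% {hap}")
```

In the toy data the unambiguous individuals make haplotypes 11 and 22 common, so the EM iterations resolve the ambiguous individual as 11/22 and the estimates for 12 and 21 shrink toward zero.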
We will use the three input files family.dat, family.ped and family.map.
For a description of input formats, see the
MERLIN tutorial. If you examine the input files, you will find that they
include genotypes for 569 individuals in 77 families with 3 to 4 generations.
A total of 10 SNP markers, with an average heterozygosity of 48%, are listed.
To ask MERLIN to list all possible nonrecombinant haplotypes
for each family, we will use the --all, --zero and --founders
command line options. Issue the command:
prompt> merlin -d family.dat -p family.ped -m family.map --zero --founders --all
This will generate a merlin.chr file, which details sets of possible haplotypes
for each family, and a merlin.hap file, which summarizes possible haplotype sets
for each founder. The latter file will be automatically detected and used by FUGUE as
input. To run FUGUE, issue the command:
prompt> fugue -t 0.005
Your screen output will detail estimated haplotype frequencies (excluding haplotypes
with frequencies of zero or close to zero) and the estimated log-likelihood of the
data, accurate up to an arbitrary constant. The -t 0.005 command line option
requests that only haplotypes with estimated frequencies of 0.005 or greater
be displayed.
FUGUE - Frequency Using Graphs
(c) 2001 Goncalo Abecasis
The following parameters are in effect:
Input File : merlin.hap (-fname)
Max Bits : 16 (-b9999)
Restarts : 0 (-r9999)
Convergence Threshold : 1e-06 (-c99.999)
Display Threshold : 0.005 (-t99.999)
DivideAndConquer : OFF (-a[+|-])
Filtering data...
Total: 390 Haplotypes in 76 Sets
Known: 0 Haplotypes in 0 Sets [UNDERESTIMATE]
1024 haplotype frequencies will be estimated
[~0 Mb of memory required]
Starting with equal allele frequencies...
Pass 9, log(lk) = -634.581
Best log(lk) = -634.58
Haplotypes with estimated frequency > 0.005
0.55% 1111111222
39.09% 1111112111
0.84% 1111112112
9.45% 1111121222
5.59% 1111122111
7.67% 1121121222
0.83% 1121122111
0.84% 2221112111
1.10% 2222221221
30.43% 2222221222
0.56% 2222222111
These 11 haplotypes represent 96.94% of total probability
Other commonly used options include the -a option, which generates an
approximate solution in datasets with many SNP markers (>20), and the -r
option, which tries to avoid local maxima in the likelihood by carrying out
a number of random restarts.
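The idea behind random restarts can be sketched in a few lines of Python. This is a generic illustration, not FUGUE's code: the bumpy toy surface, the simple hill climber, and all names are hypothetical stand-ins for the haplotype likelihood and the EM iterations.

```python
import random


def toy_log_likelihood(x):
    # Hypothetical surface with a local peak near x = -1 (height -1)
    # and the global peak near x = 2 (height 0).
    return max(-1.0 - (x + 1.0) ** 2, -(x - 2.0) ** 2)


def hill_climb(x, step=0.01, n_steps=2000):
    # Move uphill in small steps until no neighbour is better.
    for _ in range(n_steps):
        best = max((x - step, x, x + step), key=toy_log_likelihood)
        if best == x:
            break
        x = best
    return toy_log_likelihood(x), x


def best_of_restarts(n_restarts, seed=1):
    # One initial run plus n_restarts extra runs from random starting points;
    # keep whichever run ends with the highest log-likelihood.
    rng = random.Random(seed)
    runs = [hill_climb(rng.uniform(-3.0, 3.0)) for _ in range(n_restarts + 1)]
    return max(runs), runs


best, runs = best_of_restarts(20)
print("best log-likelihood %.4f at x = %.2f" % best)
```

A single run that starts to the left of the valley between the peaks stops at the lower peak. Repeating the search from several random starting points and keeping the best result makes that outcome much less likely, at the cost of extra computation.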
A companion program to FUGUE, FUGUECC, is suitable for the analysis of
haplotypes in case-control datasets. Similar analyses could be carried out
with the standard version of FUGUE and a little bit of scripting, but
FUGUECC is a time-saver.
For this example, we will use the cc.dat and cc.ped input files.
These files contain a set of 44 affected and 43 unaffected individuals genotyped at
6 SNPs. To compare the haplotype frequencies in the case and control samples,
run FUGUECC with the following options:
prompt> fuguecc -d cc.dat -p cc.ped -s 10
In the program output, you will see estimated haplotype frequencies and
corresponding log-likelihoods for the
combined sample (LLK_ALL), for cases only (LLK_CASES), and for controls only
(LLK_CONTROLS). In addition, you will see a log-likelihood ratio statistic
defined as LLK_CASES + LLK_CONTROLS - LLK_ALL. The best way to evaluate
its significance is to generate a number of permuted datasets and analyse
each one.
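The permutation idea can be sketched in Python as follows. This is an illustration only: how FUGUECC permutes the data internally is not described on this page, so the sketch assumes case and control labels are shuffled with group sizes held fixed. A simple difference in means stands in for the log-likelihood ratio, and all names and data are hypothetical.

```python
import random


def mean_difference(values, labels):
    # Stand-in test statistic: absolute difference between group means.
    cases = [v for v, lab in zip(values, labels) if lab == "case"]
    controls = [v for v, lab in zip(values, labels) if lab == "control"]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_test(values, labels, statistic, n_perm, seed=0):
    # Compute the statistic on the real labels, then count how many
    # label shuffles give a strictly higher value.
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    n_higher = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) > observed:
            n_higher += 1
    return observed, n_higher


# Hypothetical data in which cases and controls are perfectly separated.
values = [1.0] * 10 + [0.0] * 10
labels = ["case"] * 10 + ["control"] * 10

observed, n_higher = permutation_test(values, labels, mean_difference, 100)
print("observed = %.3f, permutations with higher statistic: %d/100"
      % (observed, n_higher))
```

Because the toy groups are perfectly separated, no shuffle can produce a larger difference, so the count is zero. With real data the count divided by the number of permutations estimates how often chance alone gives a statistic as extreme as the observed one.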
The -s 10 command line option tells FUGUECC to generate 10 such
permutations. In this case, a similar statistic was not observed in any of the
permuted data sets, and additional permutations are recommended. Here
is the output with 100 permutations:
FUGUE FOR CASE-CONTROL DATA
(c) 2001-2003 Goncalo Abecasis
The following parameters are in effect:
Data File : cc.dat (-dname)
Pedigree File : cc.ped (-pname)
Random Restarts for EM : 0 (-e9999)
Random Permutations for Sample : 100 (-s9999)
The pedigree file includes:
43 cases, 44 controls, 0 individuals of unknown phenotype
87 founders, 0 nonfounders
Haplotyping Combined Sample
===========================
Haplotypes with estimated frequency > 0.001
34.10% 112111
1.19% 112112
0.99% 121221
19.53% 121222
5.53% 122111
0.79% 221221
35.59% 221222
2.29% 222111
These 8 haplotypes represent 100.00% of total probability
The logLikelihood of the data is -202.9073
Haplotyping Case Sample
=======================
Haplotypes with estimated frequency > 0.001
20.56% 112111
1.16% 121221
25.22% 121222
3.86% 122111
1.19% 221221
48.01% 221222
These 6 haplotypes represent 100.00% of total probability
The logLikelihood of the data is -84.8631
Haplotyping Control Sample
==========================
Haplotypes with estimated frequency > 0.001
47.36% 112111
2.30% 112112
1.17% 121221
13.83% 121222
6.93% 122111
23.64% 221222
4.77% 222111
These 7 haplotypes represent 100.00% of total probability
The logLikelihood of the data is -103.5748
Haplotyping Random Permutations of the Data
===========================================
Permutation 1: llk(cases) = -128.953, llk(controls) = -67.058, llk(sum) = -196.011
Permutation 2: llk(cases) = -49.853, llk(controls) = -141.836, llk(sum) = -191.689
(... subsequent lines removed ...)
Summary of Results
==================
logLikelihood for Combined Sample: -202.907
logLikelihood for Cases: -84.863
logLikelihood for Controls: -103.575
logLikelihood for Cases + Controls: -188.438
logLikelihood ratio: 14.469
Permutations with higher ratio: 0/100
Hmm... Even with 100 permutations, none exceed the result in the original sample.
This could be quite an interesting finding! ... Unfortunately, this is only a simulated
dataset!
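The arithmetic behind the summary can be checked with a short Python sketch. It assumes the three log-likelihoods are the negative values -202.907, -84.863 and -103.575, which is the reading consistent with a ratio of 14.469. The add-one p-value estimate is one common convention for permutation tests, not something FUGUECC reports, and the function name is hypothetical.

```python
# Log-likelihoods from the summary, taken as negative values (assumption).
llk_all = -202.907
llk_cases = -84.863
llk_controls = -103.575

# Statistic as defined in the text: LLK_CASES + LLK_CONTROLS - LLK_ALL.
llr = llk_cases + llk_controls - llk_all
print("log-likelihood ratio: %.3f" % llr)


def empirical_p_value(n_higher, n_permutations):
    # Add-one estimate: counts the observed data set as one of the
    # arrangements, so the estimate can never be exactly zero.
    return (n_higher + 1) / (n_permutations + 1)


p = empirical_p_value(0, 100)
print("empirical p-value with 0/100 higher: about %.4f" % p)
```

With no permutation exceeding the observed ratio, 100 permutations can only show that the p-value is around 0.01 or smaller. Pinning it down more precisely requires more permutations.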
