LASER: Locating Ancestry from SEquence Reads

LASER is a program to estimate individual ancestry by directly analyzing shotgun sequence reads without calling genotypes. LASER uses principal components analysis (PCA) and Procrustes analysis to analyze sequence reads of each sample and place the sample into a reference PCA space constructed using genotypes of a set of reference individuals. With an appropriate reference panel, the estimated coordinates of the sequence samples reflect their ancestral background and can be used to correct for population stratification in association studies. LASER can accurately estimate ancestry even with modest amounts of data, such as the off-target sequence data generated by targeted sequencing experiments.
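
The Procrustes step mentioned above can be illustrated with a short NumPy sketch. This is not LASER's implementation; it only shows the general technique of fitting a scaling, an orthogonal transformation, and a translation that map one set of PCA coordinates onto another, and then applying that fitted map to a new point. The function names `fit_procrustes` and `apply_procrustes` are hypothetical, and the sketch assumes both coordinate sets list the same reference individuals in the same order.

```python
import numpy as np


def fit_procrustes(X, Y):
    """Fit rho (scale), A (orthogonal matrix), and b (translation) that
    minimize ||rho * X @ A + b - Y||_F, where the rows of X and Y are the
    same individuals in two coordinate systems. Illustrative only."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean  # center both configurations
    Yc = Y - y_mean
    # The SVD of the cross-product matrix gives the optimal orthogonal map.
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    A = U @ Vt
    rho = s.sum() / np.sum(Xc ** 2)  # optimal scaling factor
    b = y_mean - rho * x_mean @ A    # translation aligning the centroids
    return rho, A, b


def apply_procrustes(coords, rho, A, b):
    """Map coordinates from the source space into the target space."""
    return rho * np.asarray(coords, dtype=float) @ A + b
```

In a placement workflow of this kind, `X` would hold the reference individuals' coordinates in a sample-specific PCA, `Y` their coordinates in the reference PCA space, and `apply_procrustes` would then be called on the study sample's coordinates.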

Since version 2.0, the software package has included TRACE, a program for tracing an individual's genetic ancestry from genotype data. TRACE follows the same analysis framework as LASER and can accurately place study samples into a reference ancestry space using a relatively small number of genotypes. When run with the same reference panel, LASER and TRACE can place sequenced and genotyped samples into the same ancestry space.

LASER can also perform standard PCA on genotype data to explore population structure and to create the reference ancestry space. LASER (version 2.01 or later) implements several options for computing PC scores and PC loadings.
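
For readers unfamiliar with the terms, the sketch below shows one generic way to obtain PC scores and PC loadings from a genotype matrix using a singular value decomposition. It is an illustration under stated assumptions, not a description of the options LASER provides (those are documented in the LASER manual). It assumes genotypes are coded as 0, 1, or 2 copies of an allele with no missing values, and the function name `genotype_pca` is hypothetical.

```python
import numpy as np


def genotype_pca(G, k=2):
    """Generic PCA of an (individuals x SNPs) genotype matrix coded 0/1/2.

    Returns the top-k PC scores (one row per individual), the top-k PC
    loadings (one row per retained SNP), and a boolean mask of the SNPs
    that were retained. Illustrative only.
    """
    G = np.asarray(G, dtype=float)
    centered = G - G.mean(axis=0)   # center each SNP column
    sd = centered.std(axis=0)
    keep = sd > 0                   # drop monomorphic SNPs (zero variance)
    Z = centered[:, keep] / sd[keep]  # standardize the remaining SNPs
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :k] * s[:k]       # coordinates of individuals on the PCs
    loadings = Vt[:k].T             # weight of each SNP on the PCs
    return scores, loadings, keep
```

Multiplying the standardized genotypes by the loadings reproduces the scores, which is why loadings from a reference panel can in principle be reused to project additional individuals.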

Comments and suggestions are welcome; please email Chaolong Wang at chaolong@umich.edu or Gonçalo Abecasis at goncalo@umich.edu.

If you use LASER, please take a minute to fill out the registration form. We will keep you updated when a new version is released.

Reference for LASER:

  • C Wang*, X Zhan*, J Bragg-Gresham, HM Kang, D Stambolian, E Chew, K Branham, J Heckenlively, The FUSION Study, RS Fulton, RK Wilson, ER Mardis, X Lin, A Swaroop, S Zöllner, GR Abecasis (2014) Ancestry estimation and control of population stratification for sequence-based association studies. Nature Genetics, 46: 409-415 [link].

Downloads:

  • Software package: Version 2.01 (Linux 64-bit)
  • LASER manual: Detailed instructions and examples for running LASER
  • TRACE manual: Detailed instructions and examples for running TRACE
  • HGDP data: Includes 632,958 autosomal SNPs for 938 unrelated individuals from the Human Genome Diversity Project (see Notes below).
  • Reference sequence: The human reference sequence hs37d5.fa.
  • Archive: Software package of version 1.03

Notes:

  • The HGDP data in Downloads are based on the Illumina 650K SNP data published by Li et al. (2008, Science 319: 1100-1104). We processed the data as described in our paper (Wang et al. 2014, Nature Genetics 46: 409-415). The main steps were updating genomic coordinates to Build 37, removing tri-allelic SNPs, flipping alleles to the forward strand, and converting the data to the reference genotype format read by the LASER program. We post the processed data to assist users of LASER. The original data can be downloaded from the Stanford HGDP website.
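
Two of the processing steps named above, removing tri-allelic SNPs and flipping alleles to the forward strand, can be sketched as follows. This is not the script used to prepare the posted data. The record layout (SNP ID, observed alleles, strand) and the function names `to_forward_strand` and `clean_snps` are assumptions made for illustration.

```python
# Base complements used when converting reverse-strand alleles.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_forward_strand(alleles, strand):
    """Return alleles on the forward strand; strand is '+' or '-'."""
    if strand == "+":
        return tuple(alleles)
    if strand == "-":
        return tuple(COMPLEMENT[a] for a in alleles)
    raise ValueError(f"unknown strand: {strand!r}")


def clean_snps(snps):
    """Drop SNPs with more than two distinct alleles and flip the rest.

    snps is an iterable of (snp_id, alleles, strand) records, a
    hypothetical layout chosen for this sketch.
    """
    cleaned = {}
    for snp_id, alleles, strand in snps:
        if len(set(alleles)) > 2:  # tri-allelic (or worse): skip
            continue
        cleaned[snp_id] = to_forward_strand(alleles, strand)
    return cleaned
```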

Software History:

  • Details of the version changes are documented in Section 8 of the LASER manual.
  • June 5, 2014 - Upload version 2.01 software package
  • May 19, 2014 - Upload version 2.0 software package
  • August 8, 2013 - Upload version 1.03 software package
  • June 19, 2013 - Upload version 1.02 software package
  • March 11, 2013 - Upload version 1.01 software package
  • February 1, 2013 - Upload version 1.0 software package, HGDP data, and the reference sequence hs37d5.fa