Researchers at Microsoft have created a very large synthetic dataset (120K individuals) based on the SNP frequencies and pedigree from the GAW14 COGA dataset. The dataset contains large amounts of population structure, family relatedness, and cryptic relatedness. It should be useful to researchers who want to develop GWAS algorithms that can handle such data. Details of how the dataset was generated can be found in the Methods section of:
C. Lippert, J. Listgarten, Y. Liu, C.M. Kadie, R.I. Davidson, and D. Heckerman. FaST linear mixed models for genome-wide association studies. Nature Methods, 8: 833-835, Oct 2011 (doi:10.1038/nmeth.1681).
The synthetic data based on COGA GAW14 is available here. Please email Jean MacCluer, firstname.lastname@example.org, for the password required to open the file. Click here for the README file. If you use the data in a publication, please cite the reference above.