The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database, working on a global scale and using genetics as a tool. Samples are collected in two ways. First, the project comprises a consortium of ten scientific teams from around the world, united by a core ethical and scientific framework, each responsible for sample collection and analysis in its respective region. Second, the project promotes public participation in countries around the world, and anyone can participate by purchasing a participation kit (Video S1). Mitochondrial DNA (mtDNA), typed in female participants, is inherited from the mother without recombining and is therefore particularly informative with respect to maternal ancestry. Over the first 18 months of public participation in the project, we have built up the largest database of mtDNA variants to date, containing 78,590 entries from around the world. Here, we describe the procedures used to generate, manage, and analyze the genetic data, and the first insights from them. We can now understand new aspects of the structure of the mtDNA tree and develop much better ways of classifying mtDNA. We therefore release this dataset and the new methods we have developed, and will continue to update them as more people join the Genographic Project.

Introduction

The plethora of human mitochondrial DNA (mtDNA) studies in recent years has made this molecule one of the most extensively investigated genetic systems. Its abundance in human cells; its uniparental, nonrecombining mode of inheritance; and its high mutation rate compared to that of the nuclear genome have made mtDNA attractive to scientists from many disciplines.
Knowledge of mtDNA sequence variation is rapidly accumulating, and the field of anthropological genetics, which initially made use of only the first hypervariable segment (HVS-I) of mtDNA, is currently being transformed by complete mtDNA genome analysis. While contemporary combined sources offer approximately 65,000 HVS-I records (Oleg Balanovsky, unpublished data) and over 2,000 complete mtDNA sequences, difficulties remain in standardizing these published data, as they report varying sequence lengths and different coding-region SNPs, and apply any number of methodologies for classifying haplotypes into informative haplogroups (Hgs) [2,3]. For example, some studies have defined the HVS-I range to comprise nucleotides 16093–16383, some 16024–16365, and some adhered to the widely accepted definition of 16024–16383, while others extended the reported range to include positions such as 16390 and 16391 because of their predictive value in identifying certain specific clades [7,8]. Even more serious is the problem of Hg assignment, which, in the absence of complete sequence data, is best achieved by genotyping a combination of coding-region biallelic polymorphisms. Forensic studies (which comprise a significant portion of the existing dataset) and many population studies published before 2002 have predicted Hgs based on the HVS-I motif alone, thereby ignoring the occurrence of homoplasy and back mutations [2,9]. Moreover, it has been shown that many published mtDNA databases contain errors that distort phylogenetic and medical conclusions [10–15]. Therefore, it has become abundantly clear that a phylogenetically reliable and systematically quality-controlled database is needed to serve as a standard for the comparison of any newly reported data, whether medical, forensic, or anthropological. The Genographic Project, begun in 2005, allows any individual to participate by purchasing a buccal swab kit.
Male samples are analyzed for a combination of male-specific Y chromosome (MSY) short tandem repeat loci and SNPs. Female samples undergo a standard mtDNA genotyping process that includes direct sequencing of the extended HVS-I (16024–16569) and the typing of a panel of 22 coding-region biallelic sites. Results are returned anonymously through the Internet (http://www.nationalgeographic.com/genographic) after passing a multi-layered quality check process in which phylogenetic principles are applied throughout, and which is supported by a specialized laboratory information management system. HVS-I haplotypes are reported based on the direct sequencing results. Hgs are defined by a combined use of the 22-SNP panel results and the HVS-I haplotypes. Following successful typing and reporting of the genotyping results, each participant may elect to donate his or her anonymous genotyping results to Genographic’s research database. The magnitude of the project and its worldwide scale offer a unique opportunity to create a large, rapidly expanding, standardized database of HVS-I haplotypes and corresponding coding-region SNPs. Here, we report our experience from.
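The standardization problem described above, in which published HVS-I records cover different nucleotide ranges, can be illustrated with a minimal Python sketch. It is not the project's own pipeline. It assumes that a haplotype is stored as a set of variant strings whose leading digits give the nucleotide position (for example `16126C`); the function names and the two example haplotypes are hypothetical and serve only as illustration. The three ranges are the ones quoted in the text.

```python
import re

# Leading digits of a variant string give its nucleotide position.
_POSITION = re.compile(r"^(\d+)")


def position(variant: str) -> int:
    """Return the nucleotide position encoded at the start of a variant."""
    match = _POSITION.match(variant)
    if match is None:
        raise ValueError(f"cannot read a position from {variant!r}")
    return int(match.group(1))


def common_range(ranges):
    """Return the (start, end) interval shared by all reported ranges."""
    start = max(lo for lo, _ in ranges)
    end = min(hi for _, hi in ranges)
    if start > end:
        raise ValueError("the reported ranges do not overlap")
    return start, end


def restrict(variants, start, end):
    """Keep only the variants that fall inside start..end (inclusive)."""
    return frozenset(v for v in variants if start <= position(v) <= end)


# HVS-I ranges quoted in the text for different studies.
reported_ranges = [(16093, 16383), (16024, 16365), (16024, 16383)]
shared = common_range(reported_ranges)

# Hypothetical haplotypes, invented for illustration only.
haplotype_a = {"16069T", "16126C", "16390A"}
haplotype_b = {"16126C"}

# After trimming, both records carry the same variants in the shared range.
trimmed_a = restrict(haplotype_a, *shared)
trimmed_b = restrict(haplotype_b, *shared)
print(shared, sorted(trimmed_a), trimmed_a == trimmed_b)
```

Trimming to the shared interval makes records comparable, but it also discards positions such as 16390 that the text notes are informative for certain clades. A database sequenced over one fixed, extended range avoids that trade-off.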