Overview
HGPGD supports searching for population genetic differences at the single-gene level or at the system level (KEGG pathways and GO categories). Details are as follows:
(1) Query types
query types | single gene | KEGG pathway | GO category: BP | GO category: MF | GO category: CC
number | 18,158 | 220 | 3,269 | 862 | 508
(2) Population genetic features
(11 features in total; detailed descriptions are available for single gene, KEGG pathway, and GO category):
allele frequency
Fst
r^2
Dprime
Block number
Block size
SNP density
Haplotype diversity
tagSNP percent
captured percent
average max r^2
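Two of the features listed above, r^2 and Dprime, are pairwise linkage disequilibrium (LD) measures. The sketch below uses the standard textbook definitions for two biallelic loci; it is an illustration only and not necessarily the procedure HGPGD uses. The function name `ld_measures` and its parameters are hypothetical, and the haplotype frequency `p_ab` is assumed to be already estimated (for example, from phased data).

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Largest magnitude D can reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    # |D'| is |D| scaled by its maximum; guard against monomorphic loci.
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

With `p_ab = 0.5` and `p_a = p_b = 0.5`, the two loci are perfectly correlated, so both Dprime and r^2 equal 1.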
(3) Genetic differences between population pairs
There are 11 HapMap populations in total:
ASW: African Americans from the American Southwest
CEU: Utah residents with Northern and Western European ancestry from the CEPH collection
CHB: Han Chinese in Beijing, China
CHD: Chinese in Metropolitan Denver, Colorado
GIH: Gujarati Indians in Houston, Texas
JPT: Japanese in Tokyo, Japan
LWK: Luhya in Webuye, Kenya
MEX: Mexican ancestry in Los Angeles, California
MKK: Maasai in Kinyawa, Kenya
TSI: Toscans in Italy
YRI: Yoruba in Ibadan, Nigeria
Users can search for genetic differences between any two populations (55 population pairs in total).
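The figure of 55 follows from choosing 2 populations out of 11 without regard to order. A minimal sketch that enumerates the pairs is shown here; the names `POPULATIONS` and `population_pairs` are illustrative and not part of HGPGD.

```python
from itertools import combinations

# The 11 HapMap population codes listed above.
POPULATIONS = ["ASW", "CEU", "CHB", "CHD", "GIH", "JPT",
               "LWK", "MEX", "MKK", "TSI", "YRI"]

def population_pairs(populations=POPULATIONS):
    """Return every unordered pair of distinct populations."""
    return list(combinations(populations, 2))

pairs = population_pairs()
print(len(pairs))  # 11 * 10 / 2 = 55
```

Each pair appears once, so ("ASW", "CEU") is included but ("CEU", "ASW") is not.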
1. HapMap genotype data
We used public data from the HapMap project. The International HapMap Project, launched in 2002, is an international effort to document the common SNPs in the human genome. Currently, HapMap includes 11 sample populations: African Americans from the American Southwest (ASW), Utah residents with Northern and Western European ancestry from the CEPH collection (CEU), Han Chinese in Beijing, China (CHB), Chinese in Metropolitan Denver, Colorado (CHD), Gujarati Indians in Houston, Texas (GIH), Japanese in Tokyo, Japan (JPT), Luhya in Webuye, Kenya (LWK), Mexican ancestry in Los Angeles, California (MEX), Maasai in Kinyawa, Kenya (MKK), Toscans in Italy (TSI), and Yoruba in Ibadan, Nigeria (YRI). We selected 1,002 unrelated individuals and 1,063,592 autosomal SNPs in all 11 HapMap populations. Of these, 987,019 SNPs passed the quality control (QC) criteria: Hardy-Weinberg equilibrium (HWE) p > 0.001 in an individual population, call frequency > 0.75, and minor allele frequency (MAF) > 0.01 (Table 1).
Table 1 Summary of HapMap data
HapMap populations | ASW | CEU | CHB | CHD | GIH | JPT | LWK | MEX | MKK | TSI | YRI | total
Number of HapMap samples | 83 | 174 | 86 | 85 | 88 | 89 | 90 | 77 | 171 | 88 | 176 | 1,207
Number of unrelated individuals | 49 | 116 | 86 | 85 | 88 | 89 | 90 | 50 | 143 | 88 | 118 | 1,002
SNPs in all 11 populations | 1,063,592
SNPs passed QC | 987,019
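The three QC thresholds described above can be expressed as a simple predicate. This is a sketch under stated assumptions: the `SnpStats` record and both function names are hypothetical, the per-population statistics are assumed to be precomputed, and requiring a SNP to pass in every population is an assumption, since the text does not spell out how per-population results are combined.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
HWE_P_MIN = 0.001
CALL_RATE_MIN = 0.75
MAF_MIN = 0.01

@dataclass
class SnpStats:
    """Precomputed statistics for one SNP in one population (hypothetical record)."""
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency

def passes_qc(stats: SnpStats) -> bool:
    """Apply the three strict-inequality thresholds to one record."""
    return (stats.hwe_p > HWE_P_MIN
            and stats.call_rate > CALL_RATE_MIN
            and stats.maf > MAF_MIN)

def snp_passes_all(per_population: dict) -> bool:
    """Assumption: a SNP is kept only if it passes in every population."""
    return all(passes_qc(s) for s in per_population.values())
```

For example, `snp_passes_all({"CEU": SnpStats(0.4, 0.98, 0.12), "YRI": SnpStats(0.0002, 0.99, 0.30)})` returns `False` because the YRI record fails the HWE threshold.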
2. Human genome data
A total of 18,158 autosomal gene entries, each with at least 2 SNPs within the gene region, were extracted from the "seq-gene" file downloaded from the NCBI FTP site. Each record includes chromosome, chr_start, chr_stop, and feature_id (NCBI gene ID), and has a "feature_type" of "gene" and a "group_label" of "reference". Genes with 0 or 1 common SNPs in HapMap were removed from our study. The average size of the retained genes was 38,353 bp.
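The gene filter described above (keep a gene only if at least 2 SNPs fall within its region) might be sketched as follows. The function names and input layout are hypothetical, and treating chr_start and chr_stop as inclusive boundaries is an assumption.

```python
from bisect import bisect_left, bisect_right

def count_snps_in_gene(sorted_positions, chr_start, chr_stop):
    """Count SNP positions p with chr_start <= p <= chr_stop.

    sorted_positions must be sorted in ascending order.
    """
    return bisect_right(sorted_positions, chr_stop) - bisect_left(sorted_positions, chr_start)

def genes_with_min_snps(genes, snps_by_chrom, min_snps=2):
    """Return IDs of genes containing at least min_snps SNPs.

    genes:         iterable of (gene_id, chromosome, chr_start, chr_stop)
    snps_by_chrom: dict mapping chromosome -> sorted list of SNP positions
    """
    kept = []
    for gene_id, chrom, start, stop in genes:
        positions = snps_by_chrom.get(chrom, [])
        if count_snps_in_gene(positions, start, stop) >= min_snps:
            kept.append(gene_id)
    return kept
```

Binary search on the sorted positions keeps each lookup logarithmic, which matters when roughly a million SNPs are checked against thousands of genes.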
3. Gene Ontology data
The GO project is a collaborative effort to develop and use ontologies to support biologically meaningful annotation of genes and their products. It provides an ontology of defined biological descriptors (GO terms) representing gene product properties and is structured as a directed acyclic graph. The GO project contains three ontologies: biological process (BP), describing a broad biological objective; molecular function (MF), describing the elemental activities of a gene product at the molecular level; and cellular component (CC), describing the location of the gene product. In this study, each GO category was treated as a functional gene set and used to identify associations with genetic differences among the 11 HapMap populations. The "term" file (the definitions of each node or term) and the "graph_path" file (the parent-child relationships for each node) were downloaded from the Gene Ontology website. To associate the GO categories with gene IDs, the file "gene2go" was downloaded from the NCBI FTP site. Entries lacking supporting evidence, namely those with the evidence codes "NAS" (non-traceable author statement) and "ND" (no biological data available), were removed from "gene2go". Finally, the HGPGD database contains 4,989 GO categories with at least ten genes each: BP, 3,629 categories; MF, 862 categories; and CC, 508 categories.
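The two filtering steps described above (dropping "NAS" and "ND" annotations, then keeping categories with at least ten genes) can be sketched as below. The function name and the tuple layout are hypothetical, and the sketch deliberately omits propagating annotations to parent terms through "graph_path", so it covers only direct annotations.

```python
from collections import defaultdict

# Evidence codes the text says were removed.
EXCLUDED_EVIDENCE = {"NAS", "ND"}

def go_gene_sets(annotations, min_genes=10):
    """Build GO category -> gene set, keeping sufficiently large categories.

    annotations: iterable of (gene_id, go_id, evidence_code) tuples,
                 e.g. parsed from the corresponding gene2go columns.
    """
    genes_by_go = defaultdict(set)
    for gene_id, go_id, evidence in annotations:
        if evidence in EXCLUDED_EVIDENCE:
            continue  # skip annotations without supporting evidence
        genes_by_go[go_id].add(gene_id)
    # Keep only categories with at least min_genes distinct genes.
    return {go_id: genes for go_id, genes in genes_by_go.items()
            if len(genes) >= min_genes}
```

Using a set per category means a gene annotated to the same term under several evidence codes is counted once.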
4. KEGG data
The 220 functional pathways (each containing at least 10 genes) were obtained from the KEGG pathway database. KEGG is a useful resource for the systematic analysis of gene functions and records networks of molecular interactions in cells.
Here, you can search for the population genetic differences of a single gene, a KEGG pathway, or a GO category.