ARTICLE
Detect and adjust for population stratification in
population-based association study using genomic
control markers: an application of Affymetrix
Genechips Human Mapping 10K array
Ke Hao1, Cheng Li1,2, Carsten Rosenow3 and Wing H Wong*,1,4
1Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA; 2Department of Biostatistics,
Dana Farber Cancer Institute, Boston, MA, USA; 3Genomics Collaboration, Affymetrix, Santa Clara, CA, USA;
4Department of Statistics, Harvard University, Cambridge, MA, USA
Population-based association design is often compromised by false or nonreplicable findings, partially due
to population stratification. Genomic control (GC) approaches were proposed to detect and adjust for this
confounder. To date, the performance of this strategy has not been extensively evaluated on real data.
More than 10 000 single-nucleotide polymorphisms (SNPs) were genotyped on subjects from four
populations (including an Asian, an African-American and two Caucasian populations) using the GeneChip
Mapping 10K array. On these data, we tested the performance of two GC approaches in different
scenarios including various numbers of GC markers and different degrees of population stratification. In
the scenario of substantial population stratification, both GC approaches are sensitive using only 20–50
random SNPs, and the mixed subjects can be separated into homogeneous subgroups. In the scenario of
moderate stratification, both GC approaches have poor sensitivities. However, the bias in the association test
can still be corrected even when no statistically significant population stratification is detected. We
conducted extensive benchmark analyses on GC approaches using SNPs over the whole human genome.
We found that GC methods can cluster subjects into homogeneous subgroups if there is a substantial difference in
genetic background. The inflation factor, estimated from GC markers, can effectively adjust for the
confounding effect of population stratification regardless of its extent. We also suggest that as few as
50 random SNPs with heterozygosity >40% should be sufficient as genomic controls.
European Journal of Human Genetics (2004) 12, 1001–1006. doi:10.1038/sj.ejhg.5201273
Published online 15 September 2004
Keywords: population stratification; population-based study; association test; genomic control
Introduction
In theory, for an equivalent sample size, association tests are far
more powerful than pedigree-based linkage studies in
searching for genomic regions underlying human diseases.1
The basic idea behind the association test is that disease
alleles are more frequent in ascertained cases than in
controls. Markers physically close to the disease loci will
also be detected because of linkage disequilibrium (LD).
However, the application of this approach is compromised
by false or nonreplicable findings,2 partially due to
population stratification, which causes unlinked markers
to show association with the phenotype.3,4 Recent population
admixture can also bias association tests; an example is
the spurious finding between immunoglobulin haplotype
Gm3,5,13,14 and NIDDM in the Gila River Indian Community.5
The study was confounded by the subjects' degree of
Caucasian genetic heritage.6
Received 18 February 2004; revised 11 June 2004; accepted 22 July 2004
*Correspondence: Dr WH Wong, Department of Biostatistics, Harvard
University, Harvard School of Public Health, 655 Huntington Ave.,
Building II, Room 441, Boston, MA 02115, USA. Tel: +1 617 432 4912;
Fax: +1 617 739 1781; E-mail: wwong@hsph.harvard.edu
© 2004 Nature Publishing Group All rights reserved 1018-4813/04 $30.00
www.nature.com/ejhg
To overcome this serious danger, a correction strategy
has been proposed.6,7 It requires genotyping additional
unlinked markers, often called 'genomic control (GC)
markers', as the cost of detecting and correcting possible
confounders. Under the assumption of no association
between GC markers and phenotype and no population
stratification, the $\chi^2$ statistic of the association test between
the $i$th GC marker and case–control status, denoted $Y_i^2$,
follows a $\chi^2$ distribution with one degree of freedom
under an additive genetic model. The sum of the $\chi^2$
statistics of $n$ GC markers, denoted $Y_n^2$, then follows a $\chi^2$
distribution with $n$ degrees of freedom, which provides a simple
test of whether population stratification is present.
Furthermore, we assume the test statistic is inflated by a
factor $\lambda$, so that $Y_n^2/\lambda \sim \chi_n^2$. If we assume $\lambda$ is constant for all loci, we
can then use it to adjust for population stratification. One
robust way to estimate the inflation factor is:
$$\lambda = \mathrm{Median}(Y_i^2)/0.456,$$
where 0.456 is simply the median of the $\chi^2$ distribution with
one degree of freedom.8 We denote this method as the
combined $\chi^2$ approach in this paper. An alternative
method, proposed by Pritchard et al,9 tackles this problem
in two steps. First, the GC markers are used to separate
the study subjects into genetically homogeneous subgroups,
and second, the association tests are conducted
within each subgroup.
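The combined chi-square calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the input is assumed to be a list of precomputed 1-df statistics for the GC markers, and the closed-form tail probability used here is valid only for an even number of degrees of freedom (which covers n = 20 or 50). The 1-df tail probability uses the identity P(chi-square > x) = erfc(sqrt(x/2)).

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.456  # median of chi-square (1 df), as quoted in the text


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this helper only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def stratification_p_value(gc_stats):
    """Test for stratification: sum of n 1-df statistics against chi-square(n)."""
    return chi2_sf_even_df(sum(gc_stats), len(gc_stats))


def inflation_factor(gc_stats):
    """Median-based estimate of the inflation factor lambda."""
    return median(gc_stats) / CHI2_1DF_MEDIAN


def adjusted_p_value(stat, lam):
    """1-df P-value of an association statistic after dividing by lambda."""
    return math.erfc(math.sqrt(stat / lam / 2.0))
```

For example, `inflation_factor` applied to the GC-marker statistics gives the value by which a candidate marker's statistic is divided in `adjusted_p_value`; a larger estimate yields a larger (more conservative) adjusted P-value.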
To date, the performance of the GC approaches has not
been examined extensively in real genotype data. Previous
studies were based on simulated data or a small number
of GC markers. The Affymetrix Mapping 10K array has
recently become available, offering the ability to genotype
more than 10 000 single-nucleotide polymorphisms (SNPs)
across the human genome in a timely manner.10 Using this
technology, we evaluated the current genomic control
approaches in testing and controlling for population
stratification.
Methods
Study subjects
Four groups of subjects were used in the current study: (1)
20 Asians, (2) 42 African-Americans, (3) 42 Caucasians
collected by Coriell Institute, and (4) 54 Caucasian subjects
collected from Utah, USA. The DNA samples of groups (1–3)
were purchased from Coriell Institute, and the group (4)
samples were collected by the Centre d'Etude du Polymorphisme
Humain (CEPH) research laboratory. All subjects
were unrelated individuals and remained anonymous to
the authors.
Genotyping
A total of 250 ng of genomic DNA from each subject was
digested with XbaI at 37°C for 4 h. The DNA fragments
underwent ligation to a universal adaptor and then PCR
amplification with a common primer. The amplicon was
cleaved by partial DNase I digestion into shorter fragments and
labeled with biotinylated ddATP using terminal deoxytransferase.
The labeled DNA was injected into the
microarray cartridge and incubated overnight. The hybridized
microarray was washed and stained following a
three-step protocol, and was scanned according to the manufacturer's
directions (Affymetrix). Finally, the genotype was
determined using automated scoring software (Affymetrix).
The detailed genotyping procedure has been
described previously.10 This data set has
been made available to the public at
http://www.affymetrix.com/support/developer/resource_center/index.affx?terms=no.
Statistical analysis
Only autosomal markers were used in the analysis. We
first compared the allele frequencies and heterozygosities
of the genotyped SNPs among populations. Second, we
evaluated the performance of the genomic control method in
detecting population stratification through an iterative
procedure. We pooled genotypes from different groups
together, for example, Asians and African-Americans, and
attempted to detect this mixture using the GC approach. In
each iteration, we randomly selected $n = 20$ or 50
SNPs from the data set, calculated $Y_n^2$ in the combined $\chi^2$
approach, and tested for population stratification.
Overall, 10 000 iterations were carried out, and we
summarized the power as $P(P < 0.05)$, the fraction of
iterations with a P-value below 0.05. Furthermore, we
assessed the degree of bias that ignoring the underlying
population stratification would cause in association tests.
Armitage's trend test for the additive model was used.8 We
randomly assigned a fraction (0, 25, 50, 75 or 100%) of
each ethnic group to be disease affected, pooled two groups
together, and tested for disease–SNP association. This
simulation procedure was repeated 10 000 times, and we
recorded the rejection rate at the 0.05 level. Upon observing
substantial population stratification, we also estimated the
inflation factor ($\lambda$), and calculated the rejection rate again
while controlling for population stratification.
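The bias simulation described above can be sketched as follows, under stated assumptions. Genotypes are assumed to be coded 0/1/2 with no missing calls, each group is given as a list of per-SNP genotype lists, and the trend statistic is computed in its correlation form (sample size times the squared Pearson correlation between genotype and case status, which equals the Armitage trend statistic with additive weights). All function names and the data layout are hypothetical, not taken from the paper.

```python
import random

CHI2_1DF_CRIT_05 = 3.841  # 0.05 critical value of chi-square with 1 df


def trend_test_stat(genotypes, status):
    """Additive trend statistic: N * r^2 between genotype (0/1/2) and status (0/1)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_s = sum(status) / n
    sxy = sum((g - mean_g) * (s - mean_s) for g, s in zip(genotypes, status))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((s - mean_s) ** 2 for s in status)
    if sxx == 0 or syy == 0:      # monomorphic SNP, or all cases / all controls
        return 0.0
    return n * sxy * sxy / (sxx * syy)


def assign_cases(group_size, fraction, rng):
    """Randomly label a fraction of one group as cases (1); the rest are controls (0)."""
    n_cases = round(group_size * fraction)
    status = [1] * n_cases + [0] * (group_size - n_cases)
    rng.shuffle(status)
    return status


def rejection_rate(geno_a, geno_b, frac_a, frac_b, n_iter, rng, lam=1.0):
    """Fraction of iterations rejecting the null at 0.05 for a random SNP.

    geno_a[j] and geno_b[j] hold the genotypes of SNP j in groups A and B;
    lam is the inflation factor (1.0 means no adjustment).
    """
    rejected = 0
    for _ in range(n_iter):
        j = rng.randrange(len(geno_a))
        status = (assign_cases(len(geno_a[j]), frac_a, rng)
                  + assign_cases(len(geno_b[j]), frac_b, rng))
        stat = trend_test_stat(geno_a[j] + geno_b[j], status) / lam
        if stat > CHI2_1DF_CRIT_05:
            rejected += 1
    return rejected / n_iter
```

Calling `rejection_rate` with `frac_a=1.0, frac_b=0.0` corresponds to the fully mismatched 1/0 design, and `frac_a=frac_b=0.5` to the ethnicity-matched design; passing an estimated inflation factor as `lam` gives the adjusted rate.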
We also applied Pritchard's approach, which is a
model-based clustering method that uses unlinked SNPs to
infer population structure and assign individuals to
clusters.9 The method is implemented in the software
STRUCTURE (version 2), which was downloaded from
http://pritch.bsd.uchicago.edu. We evaluated this method
by pooling two ethnic groups together and running STRUCTURE
to detect the population structure using 50 or 500
GC SNPs.
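For the combined chi-square approach, the resampling procedure used to estimate power (random draws of n GC SNPs, repeated many times) can be sketched as below. This is an illustration only: it assumes the 1-df trend statistic of every SNP in the pooled sample has already been computed, it handles only even n through a closed-form chi-square tail, and the function names are hypothetical.

```python
import math
import random
from statistics import median


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this helper only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def estimate_power(marker_stats, n_markers, n_iter, rng, alpha=0.05):
    """Return (power, median P-value) over n_iter random draws of n_markers SNPs."""
    p_values = []
    for _ in range(n_iter):
        drawn = rng.sample(marker_stats, n_markers)   # sample without replacement
        p_values.append(chi2_sf_even_df(sum(drawn), n_markers))
    power = sum(p < alpha for p in p_values) / n_iter
    return power, median(p_values)
```

A call such as `estimate_power(stats, 20, 10000, random.Random(1))` mirrors the 10 000 iterations described in the text; the seed and the variable `stats` are placeholders.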
Performance of genomic control markers
K Hao et al
Results
A total of 158 unrelated individuals from four ethnic
groups were genotyped on 10 043 SNP markers by the array,
with an overall call rate of 96.4%. These SNPs are fairly
polymorphic in our study samples, and the average
heterozygosity (>40%) and allele frequency (>20%) of
the SNPs were similar across all four ethnic groups (Table 1).
Using the combined $\chi^2$ approach, we found 10–20 SNPs
were sufficient to detect population stratification in the
scenarios of Asian-Caucasian, Asian-African American and
Caucasian-African American mixture at the nominal 0.05
level (Table 2). The power to reject the null hypothesis (no
population stratification) was over 95% in these cases
using only 10 genomic control SNPs. However, when
mixing the Caucasian subjects collected by Coriell Inc. and
those collected by CEPH, we had limited power to detect
the stratification even using 50 random SNPs (Table 2).
One possible explanation is that there was no significant population
stratification between these two groups of Caucasian
subjects, in which case association tests could be conducted without
adjustment. However, we observed substantial bias in the
test when any two ethnic groups were mixed (Table 3). Even
when mixing the two groups of Caucasian subjects,
the rejection rate could be more than 20% under the null
hypothesis (Table 3). It should be noted that in the 0.5/0.5
situation of Table 3, we simulated case–control studies
matched on ethnicity. When the sample size is small to moderate,
using the asymptotic $\chi^2$ distribution tends to yield
overestimated P-values and results in a conservative test.11
Only when the sample size becomes large is the asymptotic
P-value accurate.11 As a consequence, in the 0.5/0.5
column of Table 3, the rejection rates were slightly less
than the $\alpha$ level except when mixing the two Caucasian groups,
where population stratification was less severe. Upon
observing strong bias in marker–disease association testing
when population stratification was ignored, we used the
estimated inflation factor ($\lambda$) to adjust the association
tests, and obtained the correct rejection rate (Table 4). In
addition, we simulated situations of mixing two ethnic
groups (eg Asian and African-American), where one group
was matched in cases and controls but the other group was
mismatched. In this case, we also observed elevated false-
Table 1 Mean heterozygosity and allele frequency of the SNPs among study subjects
Groups                 Caucasian (n = 42)   Utah (n = 54)   Asian (n = 20)   African-American (n = 42)
Heterozygosity (%)     45.9                 45.8            41.3             46.8
Allele frequency (%)   25.3                 25.0            22.8             25.2
Table 2 Power of testing for population stratification at 0.05 level*
                                 10 random SNPs          20 random SNPs          50 random SNPs
Ethnic groups                    Power    M(p)           Power    M(p)           Power    M(p)
Asian vs Caucasian               97.2%    3.5 × 10^-5    100%     3.3 × 10^-5    100%     <10^-15
Utah vs Caucasian                9.0%     0.424          23.2%    0.192          38.8%    0.091
African-American vs Caucasian    99.2%    2.9 × 10^-11   100%     <10^-15        100%     <10^-15
Asian vs Utah                    97.4%    2.8 × 10^-8    100%     <10^-15        100%     <10^-15
African-American vs Asian        99.4%    3.6 × 10^-9    99.9%    <10^-15        100%     <10^-15
African-American vs Utah         99.8%    6.5 × 10^-13   100%     <10^-15        100%     <10^-15
*Power is estimated on 10 000 iterations; M(p), median P-value.
Table 3 Rejection rate of association test under the null hypothesis at 0.05 level(a)
Case/control ratio               1/0 (%)   0.75/0.25 (%)   0.5/0.5 (%)   0.25/0.75 (%)   0/1 (%)
Asian vs Caucasian               55.4      28.1            4.1           28.0            56.8
Utah vs Caucasian                23.1      9.4             5.1           9.1             22.0
African American vs Caucasian    62.2      37.0            4.4           37.2            62.3
Asian vs Utah                    57.5      28.9            4.2           27.2            57.6
African American vs Asian        60.9      32.5            4.3           32.5            60.2
African American vs Utah         66.2      40.0            4.3           40.7            65.4
(a) Estimation was based on 10 000 iterations. Caucasian, Caucasian samples collected by Coriell Institute. Utah, Caucasian samples collected by
CEPH lab.