BMG Genomics: An ancestry informative marker panel design for individual ancestry estimation of Hispanic population using whole exome sequencing data

Li-Ju WangCatherine W. ZhangSophia C. SuHung-I H. ChenYu-Chiao ChiuZhao LaiHakim BouamarAmelie G. RamirezFrancisco G. CigarroaLu-Zhe Sun & Yidong Chen

Abstract

Background

Europeans and American Indians were the major genetic ancestry of Hispanics in the U.S. These ancestral groups have markedly different incidence rates and outcomes in many types of cancers. Therefore, the genetic admixture may cause a biased genetic association study with cancer susceptibility variants specifically in Hispanics. For example, the incidence rate of liver cancer has been shown with a substantial disparity between Hispanic, Asian, and non-Hispanic white populations. Currently, ancestry informative marker (AIM) panels have been widely utilized with up to a few hundred ancestry-informative single nucleotide polymorphisms (SNPs) to infer ancestry admixture. Notably, current available AIMs are predominantly located in intron and intergenic regions, while the whole-exome sequencing (WES) protocols commonly used in translational research and clinical practice do not cover these markers. Thus, it remains challenging to accurately determine a patient’s admixture proportion without additional DNA testing.

Results

In this study, we designed a unique AIM panel that infers 3-way genetic admixture from three distinct and selective continental populations (African (AFR), European (EUR), and East Asian (EAS)) within evolutionarily conserved exonic regions. Initially, about 1 million exonic SNPs from selective three populations in the 1000 Genomes Project were trimmed by their linkage disequilibrium (LD), restricted to biallelic variants, and finally, we optimized to an AIM panel with 250 SNP markers, or the UT-AIM250 panel, using their ancestral informativeness statistics. Comparing to published AIM panels, UT-AIM250 performed better accuracy when we tested with three ancestral populations (accuracy: 0.995 ± 0.012 for AFR, 0.997 ± 0.007 for EUR, and 0.994 ± 0.012 for EAS). We further demonstrated the performance of the UT-AIM250 panel to admixed American (AMR) samples of the 1000 Genomes Project and obtained similar results (AFR, 0.085 ± 0.098; EUR, 0.665 ± 0.182; and EAS, 0.250 ± 0.205) to previously published AIM panels (Phillips-AIM34: AFR, 0.096 ± 0.127, EUR, 0.575 ± 0.290, and EAS, 0.330 ± 0.315; Wei-AIM278: AFR, 0.070 ± 0.096, EUR, 0.537 ± 0.267, and EAS, 0.393 ± 0.300). Subsequently, we applied the UT-AIM250 panel to a clinical dataset of 26 self-reported Hispanic patients in South Texas with hepatocellular carcinoma (HCC). We estimated the admixture proportions using WES data of adjacent non-cancer liver tissues (AFR, 0.065 ± 0.043; EUR, 0.594 ± 0.150; and EAS, 0.341 ± 0.160). Similar admixture proportions were identified from corresponding tumor tissues. In addition, we estimated admixture proportions of The Cancer Genome Atlas (TCGA) collection of hepatocellular carcinoma (TCGA-LIHC) samples (376 patients) using the UT-AIM250 panel. The panel obtained consistent admixture proportions from tumor and matched normal tissues, identified 3 possible incorrectly reported race/ethnicity, and/or provided race/ethnicity determination if necessary.

Conclusions

Here we demonstrated the feasibility of using evolutionarily conserved exonic regions to infer admixture proportions and provided a robust and reliable control for sample collection or patient stratification for genetic analysis. An R implementation of UT-AIM250 is available at https://github.com/chenlabgccri/UT-AIM250.

Learn More Button

 

 

 

Article Categories: All News, Research Paper

Since 2004, UT Health San Antonio, Greehey Children’s Cancer Research Institute’s (Greehey CCRI) mission has been to advance scientific knowledge relevant to childhood cancer, contribute to understanding its causes, and accelerate the translation of knowledge into novel therapies. Greehey CCRI strives to have a national and global impact on childhood cancer by discovering, developing, and disseminating new scientific knowledge. Our mission consists of three key areas — research, clinical, and education.

Stay connected with the Greehey CCRI on Facebook, Twitter, LinkedIn, and Instagram.