At the turn of the century, in a landmark global effort, scientists built a reference human genome sequence to better understand the genes underlying health and disease. However, they stitched this reference sequence together from the genomes of individuals representing only a small slice of humanity.
“Most of the omics data, including transcriptomic data, is dominated by samples that have been obtained from individuals of European ancestry,” said Roderic Guigó, a computational genomics researcher at Barcelona Institute of Science and Technology. To bridge this gap, he teamed up with Marta Melé, a transcriptomics and functional genomics researcher at Barcelona Supercomputing Center.
Now, the researchers have analyzed samples from people belonging to eight genetically diverse populations and identified thousands of novel transcripts not found in the reference transcriptome.1 Their findings, published in Nature Communications, highlight the extent of ancestry bias in gene maps, a bias that keeps scientists from gaining important insights into the biology and disease risk of non-European populations.
Current Human Gene Maps Are Biased Towards European Ancestry
It is not surprising that there is a Eurocentric bias, but it was interesting to see the extent, said Divya Tej Sowpati, a computational genomics researcher at the Center for Cellular and Molecular Biology, who was not involved in the study. “It is good that someone went ahead and showed it. It was important to catalog [it], and it’s commendable,” he added.
Members of the research team that uncovered the extent of ancestry bias pictured in facilities housing MareNostrum5, the supercomputer which was critical for processing the vast amounts of data generated by the study.
Mario Ejarque / BSC-CNS
Melé agreed that the ancestry bias in transcriptomes was not unexpected. “When we started this project, we had the suspicion that this might be the case, that [there] could be a bias,” she said. What surprised her was that nobody had looked at these differences despite the well-documented bias in the field towards samples from European ancestry.
For instance, most of the sequence of the original reference human genome came from just a handful of people, many of whom were recruited through a newspaper advertisement in New York. By looking at more diverse genomes, such as those from the GenomeIndia Project or the Egyptian Multiomics Dataset, scientists have uncovered previously unreported genes associated with differences in disease risk between people of European and non-European ancestry.2
Long-Read RNA Sequencing Uncovers the Extent of European Bias in Genomics
Building on such studies, Guigó, Melé, and their team set out to investigate whether transcripts differed between populations. For this, they used long-read RNA sequencing (RNA-seq), a technology that can sequence full-length RNA transcripts from end to end.3 They sequenced RNA extracted from B cell lines derived from 43 people belonging to eight populations across Africa, America, Asia, and Europe.
By employing a series of stringent filters to ensure quality, the team identified more than 155,000 transcripts from these samples. Of these, more than 41,000 were novel and had not been reported in any official gene map. Nearly 700 of these novel transcripts came from DNA regions previously thought to contain no genes.
To study the extent of European ancestry bias, the team then grouped the cell line samples by European or non-European ancestry and compared them against conventional reference maps. Non-European samples carried more novel transcripts than European samples did, highlighting that non-European transcripts are underrepresented in reference gene maps.
Guigó, Melé, and their team also identified more than 2,200 population-specific transcripts present in one ancestry but not in others. While the non-European population-specific transcripts were mostly novel, most of the European population-specific transcripts had already been characterized.
Missing Transcripts Have Implications for Disease Biology
The team discovered that many of the novel ancestry-specific transcripts occurred in genes associated with autoimmune diseases, which present differently between the populations. Current reference maps do not contain information about such transcripts.
“When you lack [a] reference that is unbiased or that represents the populations fully…it has the potential for you to miss important connections…between genetics, diseases, and genetic ancestry,” said study coauthor Fairlie Reese, a genomics researcher in Melé’s group. This hampers the understanding, diagnosis, and treatment of diseases in non-European populations.
“[For] example…if there is a mutation…in a transcript that is not annotated, that we don’t have the [correct] map [of], we’re going to think that this mutation or this change doesn’t have any effect,” explained Melé. In contrast, a more complete characterization of transcripts from all over the world can reveal the implications of such mutations, “because we have the maps that are correct and are more representative of the whole humanity.”

Divya Tej Sowpati, a computational genomics researcher at the Center for Cellular and Molecular Biology, was involved in the GenomeIndia Project, which uncovered the genetic diversity of the Indian population.
Shambhavi Garde
The data generated in the study can offer important insights into disease susceptibility and severity in populations all over the world, agreed Sowpati. “This is paving way to a new paradigm in RNA-seq or that kind of a transcript-based analysis.”
Still, Sowpati noted that the study's small sample size does not capture all of human genetic diversity. “But this is a good starting point.”
Guigó agreed, “This is not a sufficiently representative sampling of the human transcriptome diversity.” As part of the human pangenome project, an ongoing effort to build a more complete and more diverse human reference genome, scientists have catalogued the transcriptomes of hundreds of populations, and analyzing those data is important, he added.
“[This] is more like the tip of the iceberg,” said Melé. She hopes that scientists can expand such investigations to other populations as well as other cell types. “We need to fix it as a scientific community, not only [us], to get more representation of other populations and other cell types and tissues.”
