Scientists have fully mapped centromeres in a diploid human cell line for the first time. Their work will help reveal maternal and paternal sequence divergence and advance functional genomic research.
Due to their repetitive and complex DNA sequences, centromeres have been viewed as the “black boxes” of the genome for decades. Often overlooked in sequencing projects but playing a critical role in cell division, centromeres are a significant aspect of the genome that scientists are only just starting to understand.
“While the rest of the genome shuffles between the maternal and paternal origins, centromeres are inherited intact: one from the maternal and one from the paternal origin for each chromosome, carrying [ancestral] information,” said Simona Giunta, a human genomics researcher at the Sapienza University of Rome.
In a new study, Giunta and her team released the near-complete genome sequence—for the first time including both parental centromeres—of a human diploid cell line commonly used in laboratories around the world.1 The study establishes a foundation for generating high-quality reference genomes across all widely used cell lines, ensuring that functional genomic studies more accurately capture patient-specific genetic variation and better inform the development of tailored therapies.
When Giunta established her lab in 2021, she embarked on a quest to fill the unresolved gaps left in the official human genome reference at the time, especially around centromeres. In a recent Science paper, she and her team reported that human centromeres have a unique organization that is specific to each chromosome and consistent across individuals.2 Now, in this new study, they released the assembled genome sequence of a reference human cell line to validate their prior results. “We put centromeres and any other region of the genome in the picture, opening a new way to do genome biology in every field,” Giunta said.
Derived from a noncancerous human retinal pigment epithelial (RPE) cell, RPE-1 is one of the most widely used reference cell lines in experimental settings, often serving as a key model in drug discovery and genetic disease studies. As a diploid cell line, RPE-1 carries both maternal and paternal centromeres, making it an ideal platform for improving scientists’ understanding of the role of centromeric DNA in cell regulation and disease.
Highly repetitive regions, such as centromeres, are notoriously challenging to assemble. The team therefore used long-read sequencing technologies and advanced computational algorithms to capture these sequences in unprecedented detail. Remarkably, most of the RPE-1 sequence closely resembles recent high-quality, publicly available human genomes, such as those from the Human Pangenome Reference Consortium. The team also found no evidence of polyploidy or other extensive chromosomal rearrangements in the RPE-1 centromere sequences, underscoring the cell line’s continued value as a model for studying human cellular processes and functional genomics.
Separating the maternal and paternal haplotypes in genomic research is crucial, as it helps researchers understand how each allele uniquely influences gene activity and uncover hidden patterns of regulation.3 However, assembling centromeres in diploid or polyploid genomes is challenging because highly similar sequences on the two homologous chromosomes must be corrected manually. To resolve this challenge, the team relied on experts in the manual curation of genome gaps at The Rockefeller University. While the newly published genome may still contain some errors, it is significantly more complete than previous versions.
Giunta and her team also showed that in regions like centromeres, sequence identity between haplotypes drops to as low as 85 percent, suggesting that the notion that humans are 99.9 percent genetically identical will be revised as more genomes are sequenced. Andrea Bodnar, a biochemist at the Gloucester Marine Genomics Institute, who was not involved in this study but was part of the team that first immortalized the RPE-1 cell line, said that the genomic information generated in this study will provide scientists with a robust new foundation for more reliable discoveries in human biology.4
“The near-complete assembly of a diploid genome that was carried out by one laboratory is a remarkable achievement. However, for others to use this work properly, future works will need to improve the tools available to validate and use this assembly for more precise genomic analysis,” said Cheng-Zhong Zhang, a bioinformatician at Harvard University, who was not involved in the study. According to Zhang, the authors also raise an important point of caution: Scientists should sequence the RPE-1 cells used in their own labs and compare them against the new reference sequence to ensure that further transgene insertion has not altered their cells’ genome.
“To really understand the role of the centromeres’ sequence component within the genome, we needed to lay the foundations. We are dedicating this to the people doing their experiment and now using this sequencing,” said Giunta. The study opens the door to the complete genome assembly of other cell lines, which could help scientists better understand disease-related mutations and genetic variants in all individuals.