To understand how organisms are related, researchers use molecular information to construct phylogenetic trees. Most of the time, scientists use thousands of protein-coding sequences to determine these relationships. However, when organisms evolve rapidly, acquire the same characteristics independently, or lose traits, accurately assigning a species a place on the tree of life becomes difficult.
In response, researchers are on the hunt for new genomic markers to help them resolve species’ lineages. Transposable elements, non-coding regions of DNA that can replicate themselves in the genome, present one potential option for acquiring more phylogenetic information. When researchers first considered these self-copying sequences for phylogeny studies, though, identifying these elements in the genome was difficult.
With advances in genome sequencing and assembly tools, researchers can now identify and analyze transposable elements more easily. Using these new technologies, researchers at the Okinawa Institute of Science and Technology built phylogenetic trees from transposable element sequences, and these trees closely resembled those built from conventional gene sequences.1 The findings, published in Current Biology, provide a pathway for using transposable elements alongside protein-coding genes to determine species’ origins.
According to Thomas Bourguignon, an evolutionary biologist and study coauthor, the study came about by serendipity. His graduate student, Cong Liu, was annotating termite genomes that the group was studying for another project. This process included marking transposons and sorting them into families.
In an attempt to manage his data, Liu decided to cluster species based on their transposons. When he saw that the grouping resembled a tree, he admitted, “I somehow took it for granted.” When Bourguignon spotted the coincidental arrangement, he thought it was worth exploring further.
“There’re these questions about, how much do transposons, let’s say, coevolve with their host somehow. It’s unclear how much phylogenetic signal there is,” said Bourguignon. He said that in the present work he and his team showed, “there is a huge amount of phylogenetic signal, so much so that you can basically use them to build the actual phylogeny.”
The team assembled transposable element libraries for 45 different species of termites and two species of cockroach, which served as more distantly related organisms. They then combined these individual libraries, representing almost 38,000 sequences, to study how these elements are distributed across species.
Researchers can organize transposable elements into families based on their sequence similarities. Bourguignon, Liu, and their team saw that the prevalence of certain transposable element families was greater in some species than others, with some of these elements even being specific to termite families. This suggested that transposable element families have the potential to be used for phylogenetic profiling.
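The idea of profiling species by the transposable element families they carry can be illustrated with a minimal Python sketch. The species names, family names, and the choice of Jaccard distance are assumptions made for illustration; the article does not say how the team scored similarity between species.

```python
from itertools import combinations

# Hypothetical toy data: species -> set of transposable element families
# annotated in its genome. None of these names come from the study.
te_families = {
    "termite_A": {"fam1", "fam2", "fam3"},
    "termite_B": {"fam1", "fam2", "fam4"},
    "cockroach": {"fam1", "fam5"},
}

def presence_absence_matrix(families_by_species):
    """Return (sorted family names, {species: [0/1 per family]})."""
    all_families = sorted(set().union(*families_by_species.values()))
    matrix = {
        species: [1 if fam in fams else 0 for fam in all_families]
        for species, fams in families_by_species.items()
    }
    return all_families, matrix

def jaccard_distance(a, b):
    """1 - |shared families| / |families in either species|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def pairwise_distances(families_by_species):
    """Jaccard distance for every unordered pair of species."""
    return {
        (s1, s2): jaccard_distance(families_by_species[s1], families_by_species[s2])
        for s1, s2 in combinations(sorted(families_by_species), 2)
    }

columns, matrix = presence_absence_matrix(te_families)
distances = pairwise_distances(te_families)
```

In this toy data set the two termites share more families with each other than either shares with the cockroach, so their distance is smaller. A distance table like this could be passed to any standard tree-building method.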
Next, the team constructed two phylogenetic trees of these 47 species from their transposable element data and compared them to a model that they previously assembled using highly conserved genomic regions.2 One tree only considered the presence or absence of a transposable element family in the species’ genome, whereas the second tree focused on whether the transposon family was within 100 base pairs of the highly conserved regions of the termite genome.
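The second, stricter criterion can be sketched as a simple interval filter. This is a hedged illustration, not the team's pipeline: the tuple layouts, the 0-based half-open coordinates, the function names, and the rule that a gap of exactly 100 base pairs still counts are all assumptions.

```python
FLANK_BP = 100  # window size taken from the article

def gap_between(a, b):
    """Base pairs separating two half-open intervals (0 if they touch or overlap)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))

def families_near_conserved(te_hits, conserved_regions, flank=FLANK_BP):
    """Return the TE families with at least one copy within `flank` bp
    of a conserved region on the same contig.

    te_hits: iterable of (contig, start, end, family)   # hypothetical layout
    conserved_regions: iterable of (contig, start, end) # hypothetical layout
    """
    regions_by_contig = {}
    for contig, start, end in conserved_regions:
        regions_by_contig.setdefault(contig, []).append((start, end))

    kept = set()
    for contig, start, end, family in te_hits:
        for region in regions_by_contig.get(contig, []):
            if gap_between((start, end), region) <= flank:
                kept.add(family)
                break
    return kept

# Toy annotations, invented for illustration only.
hits = [
    ("c1", 0, 50, "famA"),       # 70 bp upstream of the conserved region
    ("c1", 1000, 1100, "famB"),  # 700 bp downstream, too far
    ("c2", 10, 20, "famC"),      # contig has no conserved region
]
conserved = [("c1", 120, 300)]
nearby = families_near_conserved(hits, conserved)
```

Only `famA` passes the filter here. Running the filter once per species would give a reduced presence or absence profile limited to families that sit next to conserved regions.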
Both trees made from the transposable element families closely resembled the model built from conserved genomic regions. However, the tree that only considered the presence or absence of the families differed more from the model in how it arranged related species than did the tree restricted to regions flanking conserved sequences.
To further explore the use of transposable elements as phylogenetic markers, the researchers compared the similarity of their trees assembled from transposable element information to those that they built using either 13 mitochondrial protein genes or more than 1,400 orthologous genes.
Of these, the tree made from orthologous genes most closely resembled the model tree, while the one made from mitochondrial genes was highly dissimilar to it. Again, the researchers saw that trees made from transposable element family data restricted to within 100 base pairs of the conserved genome regions were closer to the established trees.
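One common way to put a number on how closely two trees resemble each other is the Robinson-Foulds distance, which counts the groupings found in one tree but not the other. The article does not state which metric the team used, so the sketch below is only an assumed illustration. It uses a rooted variant on trees written as nested tuples, with invented species labels.

```python
def clades(tree):
    """Return (set of non-trivial clades, frozenset of all leaves).

    A tree is a nested tuple whose leaves are strings, e.g. (("A", "B"), "C").
    """
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(collect(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = collect(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found, all_leaves

def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: clades present in exactly one tree."""
    clades_a, leaves_a = clades(tree_a)
    clades_b, leaves_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same species")
    return len(clades_a ^ clades_b)

# Hypothetical five-species trees, not taken from the study.
model_tree = ((("A", "B"), "C"), ("D", "E"))
te_tree = ((("A", "C"), "B"), ("D", "E"))

same = rf_distance(model_tree, model_tree)
different = rf_distance(model_tree, te_tree)
```

A distance of zero means the two trees group species identically, and larger values mean more disagreement. In the toy example, the trees disagree only on whether A pairs with B or with C.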
Ultimately, the researchers concluded that, using modern genome sequencing and annotation tools, transposable elements can provide useful information to help distinguish phylogenetic relationships. “It’s a great exploration of how best to use this new resource,” agreed Max Telford, a zoologist studying animal evolution at University College London who was not involved with the study. One strength of the research, he said, was that transposable elements are more resistant to convergent evolution. “Anything that gives us more information means that we’re likely to be able to reconstruct those very difficult to resolve bits of the tree,” he said.
Telford continued, “It’s a great effort with some small caveats.” One of these, Telford said, was that researchers don’t know what the true evolutionary tree for these organisms should look like. Additionally, he said that previous efforts using transposable elements were even stricter about where in the genome a sequence needed to be before two species were considered to share the element. Being stricter, Telford said, improves the ability to determine whether the species do indeed have the same transposable element.
In the future, the team hopes to determine whether lower quality genomes can also provide useful insights. “It’s also something more general, and invites more tests in other lineages,” said Liu.
