For large language models (LLMs) like ChatGPT, accuracy often means complexity. To make good predictions, ChatGPT must deeply understand the concepts and features associated with each word, but how it arrives at that understanding is typically a black box.
Similarly, protein language models (PLMs), LLMs trained on amino acid sequences rather than text, are dense with information. Scientists often have a hard time understanding how these models solve problems, and as a result, they struggle to judge the reliability of the models’ predictions.
“These models give you an answer, but we have no idea why they give you that answer,” said Bonnie Berger, a mathematician and computer scientist at the Massachusetts Institute of Technology. Because it’s difficult to assess the models’ performance, “people either put zero trust or all their trust in these protein language models,” Berger said. She believes that one way to calm these qualms is to try to understand how PLMs think.
Recently, Berger’s team applied a tool called sparse autoencoders, which are often used to make LLMs more interpretable, to PLMs.1 By making the dense information within PLMs sparser, the researchers could uncover information about a protein’s family and its functions from a single sequence of amino acids. This work, published in the Proceedings of the National Academy of Sciences, may help scientists better understand how PLMs come to certain conclusions and increase researchers’ trust in them.

“[This study] tells us a lot about what the models are picking up on,” said James Fraser, a biophysicist at the University of California, San Francisco who was not involved in the study. “It’s certainly cool to get this kind of look under the hood of what was previously kind of a black box.”
Berger thought that part of people’s excitement about PLMs came from AlphaFold’s success. But while both PLMs and AlphaFold are AI tools, they work quite differently. AlphaFold predicts a protein’s structure from alignments of many related protein sequences. Models like these typically boast a high level of accuracy, but researchers must spend considerable time and resources to train them.
PLMs, on the other hand, are designed to predict features of a protein, like how it interacts with other proteins, from a single sequence. Rather than comparing many related sequences, PLMs learn the relationship between a protein’s sequence and its function. Because they work from a single sequence, they are much faster, but they may not be as accurate.
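To make that concrete, here is a minimal sketch in PyTorch of what single-sequence prediction can look like: a small prediction head on top of per-residue embeddings from a pretrained PLM. The `embed_sequence` function is a hypothetical stand-in for a real model such as ESM, and the dimensions and example sequence are illustrative only.

```python
import torch
import torch.nn as nn

def embed_sequence(sequence: str, dim: int = 320) -> torch.Tensor:
    """Hypothetical stand-in for a pretrained PLM encoder (e.g., ESM).
    A real model would return learned per-residue embeddings; here we
    return random numbers of the right shape purely for illustration."""
    return torch.randn(len(sequence), dim)

class PropertyHead(nn.Module):
    """Small head that maps a mean-pooled sequence embedding to a score,
    e.g., a predicted probability that the protein binds a partner."""
    def __init__(self, dim: int = 320):
        super().__init__()
        self.classifier = nn.Linear(dim, 1)

    def forward(self, residue_embeddings: torch.Tensor) -> torch.Tensor:
        pooled = residue_embeddings.mean(dim=0)  # one vector per protein
        return torch.sigmoid(self.classifier(pooled))

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # a single amino acid sequence
score = PropertyHead()(embed_sequence(sequence))
print(float(score))  # untrained, so the value is meaningless; this shows the flow only
```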
“When large language models that only take a single sequence came along, people thought, ‘We should believe this too,’” Berger said. “But now, they’re at the stage of, ‘Oh my gosh, they’re not always right.’” To know when PLMs are right or wrong, researchers first need to understand them.
PLMs are highly complex. Each neuron in the neural network, AI’s equivalent of a brain, encodes more than one discrete piece of information, or feature, and each feature is in turn spread across multiple neurons.

“You store information in clusters of neurons, so the information is very tightly compressed,” said Onkar Gujral, a graduate student in Berger’s group who led the study. “Think of it as entangled information, and we need to find a way to disentangle this information.”
This is where the sparse autoencoders come in. They allow information stored in the neural network to spread out among more neurons. With less tightly packed information, researchers can more easily figure out which neuron in the network associates with which feature of a protein, much like how neuroscientists try to assign specific functions to brain regions.
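A minimal sketch of the idea, assuming PyTorch: a sparse autoencoder expands a dense activation vector from the PLM into a much wider set of latent units and penalizes activity so that only a few of them fire for any given protein. The dimensions and penalty weight below are illustrative, not those used in the study.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Expands dense activations into a much wider, mostly-zero latent space,
    then reconstructs the original activations from those latents."""
    def __init__(self, d_model: int = 320, d_latent: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, activations: torch.Tensor):
        latents = torch.relu(self.encoder(activations))  # sparse features
        reconstruction = self.decoder(latents)
        return reconstruction, latents

sae = SparseAutoencoder()
activations = torch.randn(8, 320)  # stand-in for PLM activations

# Training objective: reconstruct faithfully while keeping the latents sparse.
reconstruction, latents = sae(activations)
l1_weight = 1e-3  # illustrative value
loss = ((reconstruction - activations) ** 2).mean() + l1_weight * latents.abs().mean()
loss.backward()
```

Each latent unit that survives the sparsity pressure tends to respond to a narrower, more nameable property than any single raw neuron does, which is what makes the assignment of features to units feasible.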
Next, the team fed the processed information to Claude, an LLM, which added annotations such as the protein’s name, family, and related pathways. “By disentangling the information, we can now interpret what’s going on inside the protein language model,” Gujral said.
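A hedged sketch of what that annotation step can look like in practice: gather the proteins that most strongly activate a given latent and assemble a plain-text prompt asking an LLM to name what they share. The helper function, prompt wording, and example proteins below are illustrative and are not taken from the paper.

```python
def build_annotation_prompt(latent_id: int, top_proteins: list[str]) -> str:
    """Assemble a plain-text prompt asking an LLM (e.g., Claude) to describe
    what the proteins that most strongly activate one latent have in common."""
    listing = "\n".join(f"- {name}" for name in top_proteins)
    return (
        f"The following proteins most strongly activate latent unit {latent_id} "
        f"of a sparse autoencoder trained on protein language model activations:\n"
        f"{listing}\n"
        "In one sentence, what family, function, or pathway do they share?"
    )

prompt = build_annotation_prompt(42, ["Hemoglobin subunit alpha", "Myoglobin", "Cytoglobin"])
print(prompt)  # this text would then be sent to the LLM through its API
```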
Fraser said, “This paper is among the first in a group of similar papers that came out roughly around the same time,” citing several preprints from other groups that also used sparse autoencoders to better understand PLMs.2-4
But Berger’s team didn’t think that disentangling information was enough. They also wanted to follow the models’ train of thought. To do this, the researchers used transcoders, a variant of sparse autoencoders that track how information changes from one “layer” of the neural network to another. “It might give you the model’s logic of thinking—its change of thoughts—which can give you more confidence in its output,” Berger said.
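A minimal sketch of a transcoder under the same assumptions as the autoencoder above: rather than reconstructing one layer’s activations, it learns a sparse mapping from the input of a layer to the output that layer produces, so each latent describes part of the computation happening between layers.

```python
import torch
import torch.nn as nn

class Transcoder(nn.Module):
    """Like a sparse autoencoder, but trained to predict a layer's output
    from that layer's input, exposing the computation in between."""
    def __init__(self, d_model: int = 320, d_latent: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, layer_in: torch.Tensor):
        latents = torch.relu(self.encoder(layer_in))  # sparse description of the step
        predicted_out = self.decoder(latents)
        return predicted_out, latents

layer_in = torch.randn(8, 320)   # activations entering a layer (stand-in values)
layer_out = torch.randn(8, 320)  # activations the real layer actually produced

transcoder = Transcoder()
predicted_out, latents = transcoder(layer_in)
loss = ((predicted_out - layer_out) ** 2).mean() + 1e-3 * latents.abs().mean()
loss.backward()
```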
Fraser thought that the quest to make PLMs more interpretable is a “really cool frontier,” but he still questions its practicality. “We’ve got AI interpreting AI. Then we need more AI to interpret that result—we’re going down a rabbit hole,” he said. “It’s very, very hard to directly figure out what features the autoencoders are actually revealing.”
Berger, on the other hand, is confident that she’ll be able to put her tool to use. Her team previously developed a PLM to optimize antibody design for therapeutics and another to predict interactions between drugs and their targets.5,6 She hopes to use sparse autoencoders and transcoders to better understand these models.