Researchers Decode How Protein Language Models Think, Making AI More Transparent

By admin · September 29, 2025 · The Scientist

For large language models (LLMs) like ChatGPT, accuracy often means complexity. To make good predictions, ChatGPT must deeply understand the concepts and features associated with each word—but how it arrives at that understanding is typically a black box.

    Similarly, protein language models (PLMs), which are LLMs used by protein scientists, are dense with information. Scientists often have a hard time understanding how these models solve problems, and as a result, they struggle to judge the reliability of the models’ predictions.

[Image: Bonnie Berger, a mathematician and computer scientist at the Massachusetts Institute of Technology, who is interested in using large language models to study proteins.]

    “These models give you an answer, but we have no idea why they give you that answer,” said Bonnie Berger, a mathematician and computer scientist at the Massachusetts Institute of Technology. Because it’s difficult to assess the models’ performance, “people either put zero trust or all their trust in these protein language models,” Berger said. She believes that one way to calm these qualms is to try to understand how PLMs think.


    Recently, Berger’s team applied a tool called sparse autoencoders, which are often used to make LLMs more interpretable, to PLMs.1 By making the dense information within PLMs sparser, the researchers could uncover information about a protein’s family and its functions from a single sequence of amino acids. This work, published in the Proceedings of the National Academy of Sciences, may help scientists better understand how PLMs come to certain conclusions and increase researchers’ trust in them.

[Image: James Fraser, a biophysicist at the University of California, San Francisco, who uses computational approaches to study protein conformation. He was not involved in the study.]

    “[This study] tells us a lot about what the models are picking up on,” said James Fraser, a biophysicist at the University of California, San Francisco who was not involved in the study. “It’s certainly cool to get this kind of look under the hood of what was previously kind of a black box.”

Berger thought that part of people’s excitement about PLMs had come from AlphaFold’s success. But while both PLMs and AlphaFold are AI tools, they work quite differently. AlphaFold predicts protein structure by aligning many protein sequences. Models like these typically boast a high level of accuracy, but researchers must spend considerable time and resources to train them.

    On the other hand, PLMs are designed to predict features of a protein, like how it interacts with other proteins, from a single sequence. PLMs learn the relationship between protein sequence and function instead of the relationship between different protein sequences. While they learn much faster, they may not be as accurate.

    “When large language models that only take a single sequence came along, people thought, ‘We should believe this too,’” Berger said. “But now, they’re at the stage of, ‘Oh my gosh, they’re not always right.’” To know when PLMs are right or wrong, researchers first need to understand them.

PLMs are highly complex. Each neuron in the neural network—AI’s equivalent of a brain—is assigned to more than one discrete unit of information, called a token. Conversely, each token is often processed by multiple neurons.

[Image: Onkar Gujral, a mathematics PhD student at the Massachusetts Institute of Technology advised by Bonnie Berger, and the lead author of the study.]

    “You store information in clusters of neurons, so the information is very tightly compressed,” said Onkar Gujral, a graduate student in Berger’s group who led the study. “Think of it as entangled information, and we need to find a way to disentangle this information.”


    This is where the sparse autoencoders come in. They allow information stored in the neural network to spread out among more neurons. With less tightly packed information, researchers can more easily figure out which neuron in the network associates with which feature of a protein, much like how neuroscientists try to assign specific functions to brain regions.
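The idea of spreading information across more units can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' model: it uses small, made-up dimensions and untrained random weights standing in for a fitted autoencoder (in practice the encoder and decoder are learned with a sparsity penalty so that only a handful of latent units fire for any input).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a dense PLM activation of width 8,
# expanded into a 4x wider latent space.
d_model, d_latent = 8, 32

# Random stand-ins for trained weights; real sparse autoencoders
# learn W_enc/W_dec under an L1 (sparsity) penalty.
W_enc = rng.normal(0, 0.5, (d_model, d_latent))
b_enc = -0.5 * np.ones(d_latent)   # negative bias pushes most units to zero
W_dec = rng.normal(0, 0.5, (d_latent, d_model))

def encode(x):
    # ReLU zeroes out weakly activated units, yielding a sparse code
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(z):
    # Reconstruct the original dense activation from the sparse code
    return z @ W_dec

x = rng.normal(size=d_model)       # one dense activation vector
z = encode(x)
x_hat = decode(z)

print(f"active latent units: {np.count_nonzero(z)} / {d_latent}")
```

Each active latent unit is then a candidate interpretable feature (a protein family, a functional motif) that can be labeled, which is far easier than untangling the original densely packed neurons.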

    Next, the team fed the processed information to Claude, an LLM, which added annotations such as the protein’s name, family, and related pathways. “By disentangling the information, we can now interpret what’s going on inside the protein language model,” Gujral said.

    Fraser said, “This paper is among the first in a group of similar papers that came out roughly around the same time,” citing several preprint publications by other groups of researchers that also used sparse autoencoders to better understand PLMs.2-4

    But Berger’s team didn’t think that disentangling information was enough. They also wanted to follow the models’ train of thought. To do this, the researchers used transcoders, a variant of sparse autoencoders that track how information changes from one “layer” of the neural network to another. “It might give you the model’s logic of thinking—its change of thoughts—which can give you more confidence in its output,” Berger said.
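A transcoder differs from the sketch above mainly in what it reconstructs: instead of recovering the same activation, it maps one layer's activation to the next layer's through a sparse bottleneck. The following toy version again uses assumed dimensions and untrained random weights purely to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical widths for two adjacent PLM layers and the sparse bottleneck.
d_in, d_sparse, d_out = 8, 32, 8

# Random stand-ins for trained weights; a real transcoder is fit so that
# decode(encode(layer_L activation)) approximates the layer L+1 activation.
W_enc = rng.normal(0, 0.5, (d_in, d_sparse))
b_enc = -0.5 * np.ones(d_sparse)
W_dec = rng.normal(0, 0.5, (d_sparse, d_out))

def transcode(h_l):
    # Sparse ReLU code: each active unit is a candidate feature the model
    # carries from layer L into layer L+1.
    z = np.maximum(0.0, h_l @ W_enc + b_enc)
    h_next_hat = z @ W_dec     # predicted next-layer activation
    return z, h_next_hat

h_l = rng.normal(size=d_in)    # activation from layer L
z, h_next_hat = transcode(h_l)
print("active features between layers:", np.count_nonzero(z))
```

Reading off which sparse features stay active from layer to layer is what gives the "train of thought" view Berger describes.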

    Fraser thought that the quest to make PLMs more interpretable is a “really cool frontier,” but he still questions its practicality. “We’ve got AI interpreting AI. Then we need more AI to interpret that result—we’re going down a rabbit hole,” he said. “It’s very, very hard to directly figure out what features the autoencoders are actually revealing.”

    Berger, on the other hand, is confident that she’ll be able to put her tool to use. Her team previously developed a PLM to optimize antibody design for therapeutics and another to predict the interaction between drugs and their targets.5,6 She hopes to use sparse autoencoders and transcoders to better understand these models.
