Key Takeaways
- Meta’s TRIBE v2 is designed to predict how the brain processes images, sounds, and speech.
- The model was trained on more than 1,000 hours of fMRI data from 720 people.
- It combines video, audio, and text signals, then turns them into brain maps with tens of thousands of voxels.
- In tests, TRIBE v2's predictions often matched the group-average brain response more closely than a single person's scan did.
- Meta has released the code, weights, and an interactive demo for researchers.
Meta’s TRIBE v2 is a new AI model that tries to predict how the human brain reacts to media. It looks at images, sounds, and speech, then estimates which parts of the brain will respond and how strongly. That makes it more than a clever tech demo. It could become a practical tool for neuroscience research, especially when real brain scans are slow, expensive, and noisy.
The pitch is simple: instead of running a fresh lab experiment every time researchers want to test a stimulus, TRIBE v2 can simulate the likely brain response first. Meta says the model was trained on a large fMRI dataset and can generalize to new subjects without retraining, which is a major step for “in-silico” neuroscience, or brain research done with computational models.
How TRIBE v2 works
TRIBE v2 takes three inputs: video, audio, and text. Each one is processed by a separate Meta model first: text goes through Llama 3.2, audio through Wav2Vec-BERT 2.0, and video through V-JEPA 2. A transformer then combines the signals and predicts activity across roughly 70,000 voxels, the tiny 3D units that make up an fMRI scan.
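To make that pipeline concrete, here is a minimal PyTorch sketch of a trimodal encoding model in the same spirit: three streams of precomputed features are projected into a shared space, fused by a small transformer, and read out to voxels. Every name, feature dimension, and layer count below is an illustrative assumption, not Meta's released code.

```python
# Minimal sketch of a TRIBE-style trimodal encoding model, assuming
# per-timestep features already extracted by three frozen encoders.
# All dimensions and layer counts here are illustrative assumptions.
import torch
import torch.nn as nn

class TrimodalEncodingModel(nn.Module):
    def __init__(self, d_text=4096, d_audio=1024, d_video=1280,
                 d_model=512, n_voxels=70_000):
        super().__init__()
        # Project each modality's features into a shared space.
        self.proj_text = nn.Linear(d_text, d_model)
        self.proj_audio = nn.Linear(d_audio, d_model)
        self.proj_video = nn.Linear(d_video, d_model)
        # A small transformer fuses the aligned streams over time.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)
        # One linear readout maps each fused timestep to all voxels.
        self.readout = nn.Linear(d_model, n_voxels)

    def forward(self, text_feats, audio_feats, video_feats):
        # Each input: (batch, time, d_modality), assumed aligned to
        # the fMRI sampling rate (one step per scan volume).
        fused = (self.proj_text(text_feats)
                 + self.proj_audio(audio_feats)
                 + self.proj_video(video_feats))
        fused = self.fusion(fused)
        return self.readout(fused)  # (batch, time, n_voxels)

model = TrimodalEncodingModel()
txt = torch.randn(1, 30, 4096)   # e.g. text-encoder features
aud = torch.randn(1, 30, 1024)   # audio-encoder features
vid = torch.randn(1, 30, 1280)   # video-encoder features
pred = model(txt, aud, vid)      # predicted fMRI: (1, 30, 70000)
```

The design choice this sketch mirrors is late fusion: each modality keeps its own specialized encoder, and only the compact summary features are mixed, which keeps the trainable part small relative to the frozen backbones.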
That setup matters because the brain rarely deals with just one sense at a time. When you watch a movie or listen to a podcast, your brain is handling multiple streams at once. TRIBE v2 is designed for that messier, real-world kind of input, not just simple lab stimuli. Meta’s tests suggest that combining all three channels can improve prediction, especially in regions where different senses meet and blend.
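Under the assumptions of the sketch above, one crude way to probe that fusion claim is an ablation: silence one modality and see how prediction quality changes per voxel. Real evaluations would retrain or mask more carefully, so treat this purely as an illustration, and note that the `measured` scans mentioned in the final comment are hypothetical.

```python
# Continuing the sketch above: drop one modality by zeroing its
# features, then compare prediction quality voxel by voxel.
import torch

def voxelwise_corr(pred, actual):
    # Pearson correlation per voxel across time; inputs are
    # (time, n_voxels) tensors.
    p = pred - pred.mean(dim=0)
    a = actual - actual.mean(dim=0)
    return (p * a).sum(dim=0) / (p.norm(dim=0) * a.norm(dim=0) + 1e-8)

with torch.no_grad():
    full = model(txt, aud, vid)[0]                       # all channels
    no_audio = model(txt, torch.zeros_like(aud), vid)[0] # audio ablated

# On real data, comparing voxelwise_corr(full, measured) against
# voxelwise_corr(no_audio, measured) would show where the audio
# stream helps, e.g. in regions where the senses converge.
```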
The model also seems to follow a useful rule: more training data helps. Meta reports that TRIBE v2 still has room to improve as larger fMRI datasets become available. That is a familiar pattern in modern AI, and it hints that brain-prediction models may keep getting stronger as neuroscience data grows.
Why researchers care
One of the most interesting parts of the story is not just that TRIBE v2 predicts brain activity, but that it reproduces classic neuroscience findings. In controlled tests, it identified known brain areas for faces, places, bodies, and language. It also matched expected patterns in language experiments, including the well-documented contrast between responses to full sentences and to unstructured word lists. In other words, it did not just make random guesses. It echoed results scientists have spent decades documenting by hand.
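Here is a hypothetical sketch of the kind of in-silico check this describes: predict voxel responses to two stimulus sets and build a per-voxel contrast map, the same logic classic localizer experiments apply to real scans. The random data and the `contrast_map` helper are invented for illustration, not part of any released TRIBE code.

```python
# Sketch of an in-silico localizer contrast: compare predicted voxel
# responses to two stimulus conditions. All data here are stand-ins.
import numpy as np

def contrast_map(pred_a, pred_b):
    """Per-voxel difference in mean predicted response.

    pred_a, pred_b: arrays of shape (n_stimuli, n_voxels) holding the
    model's predicted responses to two conditions, e.g. full sentences
    vs. scrambled word lists.
    """
    return pred_a.mean(axis=0) - pred_b.mean(axis=0)

# Toy example with random "predictions" standing in for model output.
rng = np.random.default_rng(0)
pred_sentences = rng.normal(size=(50, 70_000))
pred_word_lists = rng.normal(size=(50, 70_000))
contrast = contrast_map(pred_sentences, pred_word_lists)

# Voxels with large positive values would be candidate language areas.
top_voxels = np.argsort(contrast)[-10:]
```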
That could save time in the lab. Instead of guessing which experiment is worth running, researchers could test ideas on the model first, then focus their expensive scans on the most promising questions. Meta also says the tool may help with future brain-inspired AI systems and, eventually, disease research. The company has made the model code, weights, and demo publicly available.
What TRIBE v2 still cannot do
Even with its progress, TRIBE v2 is not a full picture of the mind. It works through fMRI, which is useful but slow and indirect. That means it misses the faster electrical activity happening inside neurons. It also only covers three input types, so it does not handle touch, smell, or balance. And it treats the brain mostly as a receiver of input, not as an active system making choices and taking action.
So the real takeaway is balanced. TRIBE v2 is not a mind reader, and it is not ready to explain everything the brain does. But it is a strong sign that AI can help neuroscience move faster, especially in areas where human testing is costly and hard to scale. For researchers, that is a big deal. For everyone else, it is another reminder that AI is becoming deeply useful in places far beyond chatbots and image generation.

