
What Happens Inside the Brain When Someone Watches Your Ad

We built a tool that predicts real fMRI brain activation in response to video content, using Meta's TRIBE v2 neural encoding model. Here is what the neuroscience means for how you make creative decisions - and why measuring attention is no longer enough.

The problem with attention metrics

For years, video advertising has been measured by a single proxy: did someone watch it? View-through rate, average watch time, completion rate. These metrics tell you that eyes were on the screen. They do not tell you what the brain did with what it saw.

A viewer can watch 30 seconds of your video and retain nothing emotionally meaningful. Or they can glance at four seconds of a scene and have a strong associative memory encoded. The difference is not visible in the analytics dashboard. It is happening inside the brain - in the visual cortex, in the auditory processing regions, in the memory and emotion centers that determine whether your message actually lands.

This gap between viewership and cognitive impact is the fundamental unresolved problem in video creative. And it is why most A/B testing in video advertising tells you very little about why one creative outperforms another.

700+ human fMRI subjects behind the training data for the TRIBE v2 model (Meta AI Research, Algonauts 2025)
20,000 cortical vertices predicted per frame of video (TRIBE v2 neural encoding model)
6 brain regions tracked in real time: visual, auditory, language, emotion, memory, prefrontal (app.publicimpact.ai)

What TRIBE v2 actually does

TRIBE v2 is a neural encoding model developed at Meta AI Research. It was trained on fMRI data from over 700 human subjects watching video content. The model learned, at a very fine resolution, how different visual and auditory signals map to brain activation patterns across the cortex.

When you upload a video to app.publicimpact.ai, the model processes every frame, extracts audio and visual features, runs them through the same encoding pathways learned from real brain data, and outputs predicted BOLD (Blood-Oxygen-Level-Dependent) signals across 20,000 cortical vertices. These are the same signals a neuroscientist would measure in an fMRI scanner - predicted, within minutes, from your video file.
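
To make that concrete, here is a minimal sketch of what a neural encoding model does at prediction time: a multimodal feature vector per frame is mapped through learned weights to one predicted value per cortical vertex. The shapes, the random stand-in features, and the simple linear mapping are illustrative assumptions, not TRIBE v2's actual architecture.

```python
import numpy as np

# Illustrative dimensions only (not TRIBE v2's real internals):
# T video frames, D multimodal features per frame, V = 20,000 cortical vertices.
T, D, V = 300, 1024, 20_000
rng = np.random.default_rng(0)

# In the real model, the features come from audio/visual backbones and the
# weights are fit on fMRI recordings from 700+ subjects. Both are random
# stand-ins here.
features = rng.standard_normal((T, D))          # one feature vector per frame
encoder_weights = rng.standard_normal((D, V))   # learned mapping: features -> vertices
encoder_bias = np.zeros(V)

# Predicted BOLD signal: one value per cortical vertex, for every frame.
predicted_bold = features @ encoder_weights + encoder_bias
print(predicted_bold.shape)   # (300, 20000)
```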

Algonauts 2025 winner: TRIBE v2 achieved the highest predictive accuracy of any model in the Algonauts Challenge 2025 - the leading international benchmark for brain-computer alignment research. This is not a marketing model trained on click data. It is neuroscience research running inside a browser.
Source: Meta AI Research / Algonauts Challenge 2025
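
For readers who want to know what "predictive accuracy" means here: encoding benchmarks like Algonauts typically score a model by correlating its predicted BOLD time series against the measured one, vertex by vertex, on held-out subjects. The sketch below shows that metric in its simplest form; the exact Algonauts 2025 scoring protocol is not reproduced here.

```python
import numpy as np

def encoding_accuracy(predicted, measured):
    """Vertex-wise Pearson correlation between predicted and measured BOLD.

    Both arrays have shape (T, V): T time points, V cortical vertices.
    Returns one correlation per vertex; the mean across vertices is a
    common single-number summary of encoding accuracy.
    """
    pred = predicted - predicted.mean(axis=0)
    meas = measured - measured.mean(axis=0)
    numerator = (pred * meas).sum(axis=0)
    denominator = np.sqrt((pred ** 2).sum(axis=0) * (meas ** 2).sum(axis=0))
    return numerator / denominator

# Synthetic example: with random, unrelated signals the mean correlation is ~0.
rng = np.random.default_rng(0)
predicted = rng.standard_normal((300, 20_000))
measured = rng.standard_normal((300, 20_000))
print(encoding_accuracy(predicted, measured).mean())
```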

The six brain regions that matter for marketing

The tool aggregates the cortical predictions into six Regions of Interest (ROIs) that are directly interpretable for creative decisions (a minimal code sketch of this aggregation follows the list):

ROI 1 (Visual Cortex): Responds to motion, contrast, color, and compositional complexity. Peaks at cuts, movement, and high-contrast visual moments.
ROI 2 (Auditory Cortex): Tracks tonal variation, speech rhythm, music, and background sound. Reveals whether your audio is working with or against your visual.
ROI 3 (Language Areas): Processes spoken words, captions, and written text. Shows which moments of dialogue or narration actually get processed semantically.
ROI 4 (Emotion, Amygdala): The most commercially important signal. Emotional activation predicts memorability, social sharing, and purchase intent more reliably than any click metric.
ROI 5 (Memory, Hippocampus): Encodes episodic context. High memory activation at the moment your brand or product appears is the neurological correlate of brand recall.
ROI 6 (Prefrontal Cortex): Associated with decision-making, evaluation, and intent. Elevated activity here during your CTA is a direct signal of conscious consideration.
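
Mechanically, collapsing 20,000 vertex predictions into these six scores is an averaging step: each ROI is a set of vertices, and the region's time series is the mean prediction across those vertices at each moment. A minimal sketch, using made-up ROI masks in place of the tool's actual atlas:

```python
import numpy as np

N_VERTICES = 20_000
ROI_NAMES = ["visual", "auditory", "language", "emotion", "memory", "prefrontal"]
rng = np.random.default_rng(0)

# Made-up ROI masks: a boolean per vertex marking membership in each region.
# The real tool would use a cortical atlas, not random assignment.
roi_masks = {name: rng.random(N_VERTICES) < 0.05 for name in ROI_NAMES}

# Predicted BOLD for a 60-second video sampled once per second: (time, vertices).
predicted_bold = rng.standard_normal((60, N_VERTICES))

# One time series per ROI: the mean prediction across that region's vertices.
roi_series = {name: predicted_bold[:, mask].mean(axis=1)
              for name, mask in roi_masks.items()}
print(roi_series["memory"].shape)   # (60,) -> one memory value per second
```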

How the analysis works end-to-end

Upload a video to app.publicimpact.ai and the pipeline runs automatically on an A100 GPU (a hedged sketch of the full chain follows the six steps):

1. Transcription: WhisperX extracts the spoken word with timestamp-level precision. Language is detected automatically - German, English, or any other language works without configuration.
2. Feature extraction: TRIBE v2 processes visual frames and audio signals in parallel, extracting the multimodal features the neural encoder was trained on.
3. Neural encoding: The model predicts BOLD activation across 20,000 cortical vertices, frame by frame. In effect, this simulates an fMRI scan of a viewer watching your video.
4. ROI time series: Predictions are averaged over the six brain regions and plotted as time series. You can see exactly when Visual, Emotion, and Memory activation peaks and drops across the video timeline.
5. Brain heatmaps: Nilearn renders glass brain images at the moments of highest and lowest total activation - giving you a visual representation of which cortical areas lit up and when.
6. GPT-4o recommendations: Based on the ROI scores, transcript, and heatmaps, GPT-4o generates a qualitative analysis with specific, actionable recommendations for improving the creative.
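
Chained together, the pipeline is a straight pass through those six stages. The sketch below is an illustrative orchestration only - every helper is a stand-in, and it does not reproduce the production code behind app.publicimpact.ai (WhisperX, TRIBE v2, Nilearn, and GPT-4o are not actually called here):

```python
import numpy as np

N_VERTICES = 20_000
ROI_NAMES = ["visual", "auditory", "language", "emotion", "memory", "prefrontal"]

def transcribe(video_path):
    # Stage 1 stand-in for WhisperX: word-level timestamps.
    return [{"word": "brand", "start": 38.0, "end": 38.4}]

def extract_features(video_path, n_frames=60, n_features=512):
    # Stage 2 stand-in for the multimodal feature extractor.
    return np.random.randn(n_frames, n_features)

def predict_bold(features):
    # Stage 3 stand-in for the TRIBE v2 encoder: features -> vertex-wise BOLD.
    return features @ np.random.randn(features.shape[1], N_VERTICES)

def aggregate_rois(bold):
    # Stage 4: average vertices within each (here random) ROI mask -> (time, 6).
    masks = [np.random.rand(N_VERTICES) < 0.05 for _ in ROI_NAMES]
    return np.stack([bold[:, m].mean(axis=1) for m in masks], axis=1)

def analyze_video(video_path):
    transcript = transcribe(video_path)
    bold = predict_bold(extract_features(video_path))
    roi_series = aggregate_rois(bold)
    # Stages 5 and 6 (Nilearn glass brains, GPT-4o recommendations) are omitted.
    return {"transcript": transcript, "roi_series": roi_series}

result = analyze_video("my_ad.mp4")
print(result["roi_series"].shape)   # (60, 6): one row per second, one column per ROI
```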

What this means for how you make creative decisions

The practical implications are significant. Consider a 60-second brand film. Traditional analysis tells you average watch time and where drop-off occurs. Neural analysis tells you something much more useful: at second 38, when your presenter says the brand name for the first time, does the memory region spike? At second 52, when you show the product, does the emotional region activate? At your call-to-action, does prefrontal engagement rise - or has the video already exhausted the viewer's cognitive resources and produced a flat response?

These are the questions that determine whether a video converts. They cannot be answered by view counts. They can only be answered by understanding what is happening inside the brain of the viewer.
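
One simple way to operationalize those questions: take the ROI time series from the analysis, take the timestamps that matter (first brand mention, product reveal, CTA), and check whether the relevant region rises above its own baseline in a window around each moment. A hedged sketch with synthetic data - the window length and z-score threshold are illustrative choices, not a validated standard:

```python
import numpy as np

def spikes_at(roi_series, t, fps=1.0, window_s=3.0, z_threshold=1.0):
    """Does a single ROI rise above its own baseline around time t (seconds)?

    roi_series: 1-D array with one activation value per frame.
    The window length and z-score threshold are illustrative, not a standard.
    """
    z = (roi_series - roi_series.mean()) / roi_series.std()
    start = int(max(0.0, (t - window_s) * fps))
    end = int(min(len(z), (t + window_s) * fps))
    return bool(z[start:end].max() > z_threshold)

# Synthetic example: 60-second video, memory ROI, brand name spoken at second 38.
rng = np.random.default_rng(0)
memory_series = rng.standard_normal(60)
memory_series[38] += 3.0   # pretend the memory region spikes at the brand mention
print(spikes_at(memory_series, t=38))   # True
```

In practice, the timestamp for the brand mention would come straight from the WhisperX transcript alignment rather than being hard-coded.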

What the heatmaps reveal: In tests with existing ad creatives, the glass brain heatmaps consistently show two patterns. Videos with high emotional and memory activation at brand moments perform significantly better in recall studies. Videos with high visual activation but flat emotional response produce attention without purchase intent - the "I've seen this ad but can't remember what it was for" phenomenon.
Internal analysis, app.publicimpact.ai

Why this is possible now

Two things changed in the last 18 months. First, Meta's TRIBE v2 research matured to the point where cross-subject neural encoding predictions are accurate enough to be useful outside the lab. The Algonauts 2025 benchmark results confirmed that TRIBE v2's predictions correlate strongly with actual measured brain activity - across subjects who were not in the training data.

Second, the cost of GPU inference dropped to a point where running this kind of analysis per-video is commercially viable. The entire pipeline runs on a single A100 GPU in three to five minutes. A year ago, equivalent compute would have required a reserved cluster and a six-figure budget.

The result is a capability that previously existed only inside neuroscience research labs - now accessible to anyone with a video file and a browser.

How to use it in practice

app.publicimpact.ai is live. Upload any video file. The first analysis takes an extra three minutes or so while the GPU starts up; subsequent uploads within the same session are faster. The output includes the full ROI time series, brain heatmaps, transcript alignment, and the GPT-4o creative analysis.

Try the neural video analysis tool.

Upload any video file and get a full brain activation report in under five minutes. Built with Meta's TRIBE v2 model on an A100 GPU.