m51 Lab · Product · May 2026 · 11 min

M51 Cortex: the brain that learns while the agents work

We rebuilt the core of M51 AI OS. Cortex is an intelligence architecture that does more than retrieve information: it remembers, learns and reasons causally about what actually works for each individual customer. Here is what is live, why it is technically different, and what value it gives the marketing team.

Key findings (TL;DR)

  • M51 Cortex is the new core architecture powering all 17 AI agents in M51 AI OS, live in production from May 2026.
  • Cortex is not a better search engine. It is a learning system that remembers what the agents have done, tracks cause and effect in marketing data, and builds reusable knowledge over time.
  • Every single claim the agents rely on is traceable back to source, date and model. No anonymous answers, no source-less hallucinations.
  • Norwegian is first-class in Cortex, not a translation afterthought. Tokenizer, search, embeddings and reasoning are tested against Norwegian terminology.
  • For the marketing team that means: shorter time from question to qualified answer, higher trust in AI-generated reports, and a platform that gets better every week without anyone having to retrain anything.
  • Measured against the old Brain v1 RAG architecture, Cortex retrieves relevant information 8-16× more often on the same customer queries.

Background: why we rebuilt the brain

In 2025, M51 AI OS ran on what we internally called Brain v1. It was a competent retrieval architecture: documents were indexed, the agents found relevant chunks, and reports were written on that basis. It worked for one thing: answering questions about data that had already been written down.

It did not work for what we actually wanted: a system that gets smarter for every customer, every campaign and every decision it is involved in. Brain v1 had three structural limitations.

It forgot context between sessions. Each agent session started from scratch. The lessons from last week were not available as lessons, only as documents the agent might rediscover.

It could not distinguish correlation from causation. Marketing data is full of correlations that are not causal. Brain v1 found patterns but could not tell whether they were causes.

It had no mechanism for comparing what-happened with what-could-have-happened. Counterfactual reasoning was out of reach.

Cortex is the answer to those three limitations, gathered in one architecture. It was designed over the winter of 2025/26, built in sprints through the spring, and is now live for our pilot customers.

What Cortex actually is

Cortex is a multi-layer intelligence system that sits between the agents and the customer data. From the outside it looks like a search. On the inside it is four things at the same time.

A memory system with provenance

Every atomic claim stored in Cortex is tagged with source document, source line, time of extraction and the model that pulled it out. When an agent cites something, you know exactly where it comes from. It is not a feature on top, it is a design assumption running through every layer.
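As a minimal sketch of what such a record could look like, the snippet below models one atomic claim with the four provenance fields named above. The class name, field names, file name and model label are all hypothetical; the article does not describe Cortex's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Claim:
    """One atomic claim plus the four provenance fields (hypothetical schema)."""
    text: str
    source_document: str
    source_line: int
    extracted_at: datetime
    extraction_model: str

    def citation(self) -> str:
        """Render a human-readable pointer back to the source."""
        return (
            f"{self.source_document}:{self.source_line} "
            f"(extracted {self.extracted_at.date().isoformat()} "
            f"by {self.extraction_model})"
        )


# Illustrative values only; none of these come from a real customer.
claim = Claim(
    text="Churn rose 12 percent last quarter",
    source_document="churn_export.csv",   # hypothetical file name
    source_line=42,
    extracted_at=datetime(2026, 5, 4, tzinfo=timezone.utc),
    extraction_model="extractor-a",       # hypothetical model label
)
print(claim.citation())
```

Making the record immutable (`frozen=True`) is one way to express that provenance is fixed at extraction time rather than edited afterwards.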

A reflection loop

After an agent has delivered an analysis or report, a dedicated reflection step distills what was useful, what was wrong, and what should be remembered next time. These reflections become updated beliefs in Cortex, not just logs no one reads.

A skill library

When a work pattern repeats several times and delivers quality, it gets packaged as a reusable skill with built-in verification logic. The next time a similar task comes in, the agent picks up the skill instead of starting from zero. This is procedural knowledge that compounds.
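The promotion idea can be sketched roughly as follows: a pattern becomes a skill once it has been seen often enough with good enough quality, and each skill carries a check for whether it fits a task. The thresholds, class names and the quality score are assumptions made for illustration, not M51's actual promotion gates.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Skill:
    name: str
    applies_to: Callable[[dict], bool]  # verification logic: does this task fit?


@dataclass
class SkillLibrary:
    min_runs: int = 3         # assumed threshold, not an M51 value
    min_quality: float = 0.8  # assumed threshold, not an M51 value
    history: dict = field(default_factory=dict)  # pattern name -> quality scores
    skills: dict = field(default_factory=dict)   # pattern name -> Skill

    def record(self, pattern: str, quality: float,
               applies_to: Callable[[dict], bool]) -> None:
        """Log one run of a work pattern and promote it once it qualifies."""
        scores = self.history.setdefault(pattern, [])
        scores.append(quality)
        repeated = len(scores) >= self.min_runs
        good = sum(scores) / len(scores) >= self.min_quality
        if pattern not in self.skills and repeated and good:
            self.skills[pattern] = Skill(pattern, applies_to)

    def pick(self, task: dict) -> Optional[Skill]:
        """Return the first skill whose own check says it fits, else None."""
        for skill in self.skills.values():
            if skill.applies_to(task):
                return skill
        return None


def fits_b2b_saas_monthly(task: dict) -> bool:
    return task.get("segment") == "b2b_saas" and task.get("kind") == "monthly_report"


library = SkillLibrary()
for quality in (0.9, 0.85, 0.8):  # invented quality scores
    library.record("b2b_saas_monthly_report", quality, fits_b2b_saas_monthly)

match = library.pick({"segment": "b2b_saas", "kind": "monthly_report"})
no_match = library.pick({"segment": "retail", "kind": "monthly_report"})
```

Here `match` is the promoted skill and `no_match` is `None`, which mirrors the point that a skill should know when it does not apply.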

A causal model

In parallel with ordinary semantic and time-based representation, Cortex models what actually causes what in each customer's data. That is the difference between knowing "ad spend went up at the same time as conversions rose" and knowing "ad spend caused a specific lift in conversions in this segment but not in that one".
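To make the segment-level distinction concrete, here is a deliberately simple sketch that compares conversions with and without raised spend inside each segment. A difference in means only has a causal reading when the raised-spend units were assigned at random (for example in a holdout experiment); that is an assumption here, and this stand-in is not Cortex's causal model. All rows are invented.

```python
from collections import defaultdict
from statistics import mean


def lift_by_segment(rows):
    """rows: iterable of (segment, spend_raised, conversions).

    Returns mean conversions with raised spend minus mean conversions
    without it, per segment. Causal only under random assignment.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for segment, spend_raised, conversions in rows:
        groups[segment][spend_raised].append(conversions)
    return {
        segment: mean(g[True]) - mean(g[False])
        for segment, g in groups.items()
        if g[True] and g[False]  # need both arms to compare
    }


# Invented observations for two hypothetical segments.
rows = [
    ("segment_a", True, 30), ("segment_a", True, 34),
    ("segment_a", False, 20), ("segment_a", False, 24),
    ("segment_b", True, 15), ("segment_b", True, 17),
    ("segment_b", False, 15), ("segment_b", False, 17),
]
lifts = lift_by_segment(rows)
print(lifts)
```

In this toy data the pooled numbers would suggest that raised spend goes with more conversions, while the per-segment view shows the lift sitting entirely in `segment_a`.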

We will not go into more implementation detail in this article. What matters for the customer is what the sum of these four choices means in practice.

The four principles Cortex is built on

These four are the non-negotiable design choices that set Cortex apart from an enhanced RAG setup. Each choice has a cost. We believe the cost is worth it.

1. Provenance is non-negotiable

Every claim that ends up in a report, a campaign proposal or a chat answer from Nova is traceable. You can click from "according to data from last quarter, churn has risen 12 percent" to the exact row in the data source, the date it was recorded and the extraction model that interpreted it. This is built for a market where AI-generated claims must be defensible to boards, customers and management, not just look convincing.

2. Cortex learns actively, not passively

A traditional RAG system receives information when someone feeds in new documents. Cortex receives information the same way, but it also learns from the agents' own work: the insight they generate is distilled by the reflection layer and written back into the brain. That means an agent that has worked with a customer for three months has a qualitatively different understanding of that customer than an agent new to the data, even if both have access to the same raw documents.

3. Skills compound over time

When an agent finds a work pattern that delivers quality, for example a particular way to structure a monthly report for B2B SaaS customers, it can be promoted to a reusable skill. The skill has its own verification logic: it recognises when it fits and it knows when it does not. As the platform runs for more customers, the skill library grows. It is one of the few components in a SaaS platform where every new customer makes the product better for all the others, without any data leaking between them.

4. Causal sits alongside semantic

Most AI systems for marketing operate on similarity and chronology. "Which campaigns resemble this one?" "What happened just before this happened?" Cortex has an explicit causal model alongside those, because marketing decisions are causal by nature. The question is rarely "what does this resemble", it is almost always "what is making this happen".

What the customer actually notices

Architecture is interesting for engineers. For the marketing team, Cortex means five concrete things.

Shorter time from question to qualified answer

When Nova or one of the specialist agents is asked something, Cortex retrieves pre-synthesized context instead of searching from scratch. In our internal measurements, time from prompt to finished answer on complex analysis questions is reduced by 40 to 60 percent compared with Brain v1.

Traceable reports

Every KPI, every claim, every recommendation in an automatically generated report points back to the source data. That builds trust, and it also supports compliance and provides audit evidence if anything is questioned.

Consistency over time

An agent that works with the same customer over weeks and months remembers what is relevant for that customer specifically. Same tone of voice, same priorities, same context. This is not a fine-tuning problem, it is an architecture problem that Cortex solves through the reflection layer.

Better recommendations over time

As the skill library grows, suggested actions become more precise. A good example: campaign structures that have delivered documented returns for comparable customers can be proposed as a starting point, instead of every proposal starting from a blank slate.

Honest uncertainty

Cortex treats confidence scoring as a first-class concern. A claim that is weakly supported is flagged as weakly supported. A recommendation based on little data says so itself. This is the opposite of AI systems that deliver every answer with the same confident tone regardless of the evidence.
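A minimal sketch of the flagging behaviour, assuming each claim already carries a confidence score between 0 and 1. How Cortex computes that score is not published; the 0.5 cut-off and all names below are hypothetical.

```python
from dataclasses import dataclass

WEAK_BELOW = 0.5  # assumed cut-off, not an M51 value


@dataclass(frozen=True)
class ScoredClaim:
    text: str
    confidence: float  # 0.0 to 1.0, computed upstream (method not shown)


def render(claim: ScoredClaim) -> str:
    """Attach the confidence to the claim and flag weak support explicitly."""
    if claim.confidence < WEAK_BELOW:
        return f"{claim.text} [weakly supported, confidence {claim.confidence:.2f}]"
    return f"{claim.text} [confidence {claim.confidence:.2f}]"


print(render(ScoredClaim("Segment A responds to video ads", 0.30)))
print(render(ScoredClaim("Branded search drives most sign-ups", 0.85)))
```

The point of the sketch is that the uncertainty travels with the claim into the rendered text, so a reader never sees a weak claim without its flag.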

Where Cortex differs from the alternatives

There are reasonable comparisons and there are misleading comparisons. Here are the reasonable ones.

System | What it is | Where Cortex is different
Traditional RAG | Retrieves document chunks based on similarity | Cortex retrieves pre-synthesized claims with provenance, not raw chunks
Enterprise search (e.g. Glean) | Single-tenant, designed for large enterprises | Cortex is multi-tenant by design, optimized for the agency and team model
Per-user agent memory (Mem0, Letta, Zep) | Remembers what a specific user has said | Cortex models business data, not conversation history
GraphRAG | Knowledge graph over static documents | Cortex has semantic graph, time graph and causal graph in parallel, continuously updated
Generic LLM with context window | Remembers within a single conversation | Cortex remembers across conversations, agents and over time, with provenance

Most AI services for marketing today fall into one of these categories. Cortex takes the best from several and adds causality and reflection on top. It is not an incremental choice, it is an architecture decision we made explicitly to build a moat that grows with usage.

What it means for you as a customer

The comparison table above tells you how Cortex differs architecturally. What we measure internally is what actually happens when an agent looks for information in your customer data. Against the old Brain v1 RAG architecture, Cortex retrieves relevant information between 8 and 16 times more often on the same customer queries.
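The article does not define how "more often" is measured. One plausible reading, shown here purely as an assumption, is a ratio of hit counts when both systems are run on the same queries. The hit lists below are invented to show the arithmetic and are not M51 measurements.

```python
def relative_hit_rate(baseline_hits, candidate_hits):
    """Ratio of relevant-hit counts for two systems on the same queries.

    Each argument is a list of booleans, one per query, True when the
    system retrieved something relevant for that query.
    """
    if len(baseline_hits) != len(candidate_hits):
        raise ValueError("both systems must be scored on the same queries")
    baseline_count = sum(baseline_hits)
    if baseline_count == 0:
        raise ValueError("ratio is undefined when the baseline never hits")
    return sum(candidate_hits) / baseline_count


# Invented example: 20 queries, baseline hits once, candidate hits ten times.
baseline = [True] + [False] * 19
candidate = [True] * 10 + [False] * 10
ratio = relative_hit_rate(baseline, candidate)
print(ratio)
```

A ratio like this depends heavily on how low the baseline is, which is worth keeping in mind when comparing it with absolute hit rates.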

For you as a marketing director or analyst, that translates into three concrete things in daily work:

  • Far fewer answers where the agent "cannot find anything". If the information exists in your master folder, Cortex picks it up on the first attempt.
  • Reports and recommendations are built on several relevant data points at once, not just the first hit that comes along. That delivers more precise conclusions and fewer surface-level observations.
  • The time you spend verifying AI output goes down. When retrieval hits the right thing, there is less to track down and double-check before you send the report onwards.

The explanation sits in the architecture: the provenance layer lets retrieval operate on pre-synthesized claims with source tags instead of raw text chunks. The causal and time graphs filter out hits that look lexically relevant but are wrong in time or context. It is the difference between "resembles the question" and "is the right answer".
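The filtering step can be illustrated with a small sketch in which each candidate hit carries a validity window and a context tag, and hits outside the question's time or context are dropped before ranking. This covers only the time and context part, not causal filtering; the field names, the `channel` tag and the scores are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hit:
    text: str
    lexical_score: float  # how closely the text resembles the question
    valid_from: date
    valid_to: date
    channel: str          # stand-in for "context"


def filter_hits(hits, as_of: date, channel: str):
    """Keep hits valid on `as_of` for `channel`, best lexical match first."""
    kept = [
        h for h in hits
        if h.valid_from <= as_of <= h.valid_to and h.channel == channel
    ]
    return sorted(kept, key=lambda h: h.lexical_score, reverse=True)


# Invented candidates: the top lexical match is a year out of date.
hits = [
    Hit("CPA summary, Q1 2025", 0.95, date(2025, 1, 1), date(2025, 3, 31), "meta_ads"),
    Hit("CPA summary, Q1 2026", 0.90, date(2026, 1, 1), date(2026, 3, 31), "meta_ads"),
    Hit("CPA summary, Q1 2026", 0.92, date(2026, 1, 1), date(2026, 3, 31), "google_ads"),
    Hit("Creative test, Feb 2026", 0.70, date(2026, 2, 1), date(2026, 2, 28), "meta_ads"),
]
relevant = filter_hits(hits, as_of=date(2026, 2, 15), channel="meta_ads")
print([h.text for h in relevant])
```

The highest-scoring candidate is removed because it describes the wrong year, which is the "resembles the question" versus "is the right answer" distinction in miniature.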

Norwegian is first-class

One detail that often disappears in AI marketing: which language the system is optimized for. Many English-first platforms treat Norwegian as a translation surface. The result is agents that understand Norwegian questions approximately but think in English and translate back.

In Cortex, Norwegian is a first-class citizen. Concretely that means:

  • Tokenizer choices are tested on Norwegian text, not optimized for English and then reused for Norwegian in the hope that they carry over.
  • The embedding model is chosen because it scores well on Norwegian benchmarks, not because it is popular in Silicon Valley.
  • The search layer handles Norwegian morphology (compound words, inflections, casing variants) correctly.
  • The reflection loop generates internal notes in the same language as the reports they are meant to support.
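To illustrate why compound words and casing matter for search, here is a toy dictionary-based splitter that also handles the linking letters "s" and "e" common in Norwegian compounds. It is a sketch under simple assumptions (a tiny hand-written vocabulary, greedy first match) and says nothing about how Cortex's search layer is implemented.

```python
LINKING = ("", "s", "e")  # common Norwegian linking elements between parts


def split_compound(word, vocab):
    """Split a compound into known parts; return [] if no split is found.

    Casing is normalized first so that 'Budsjett' and 'budsjett' match.
    """
    word = word.casefold()
    if word in vocab:
        return [word]
    # Require at least three characters on each side of the split.
    for i in range(3, len(word) - 2):
        head, tail = word[:i], word[i:]
        if head not in vocab:
            continue
        for link in LINKING:
            if tail.startswith(link):
                rest = split_compound(tail[len(link):], vocab)
                if rest:
                    return [head] + rest
    return []


# Tiny illustrative vocabulary; a real lexicon would be far larger.
vocab = {"markedsføring", "budsjett", "kampanje", "rapport"}

print(split_compound("Markedsføringsbudsjett", vocab))
print(split_compound("KAMPANJERAPPORT", vocab))
```

With a split like this, a query for "budsjett" can match a document that only contains "markedsføringsbudsjett", which plain whole-word matching would miss.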

English is also fully supported, but it is not the assumed default at every layer of Cortex.

See also: How well does AI know the Norwegian market? (GEO series, part 1)

What is live now, and what is coming

Cortex v2.0 is live in production from May 2026 for our existing customers. That means:

  • The full provenance layer is active. All claims and reports generated from launch onwards are traceable.
  • The reflection loop runs on every agent session and builds up learning per customer.
  • The skill library has a limited set of verified skills at launch and grows continuously based on real customer work streams.
  • The causal model is active for the most-used data sources (Google Ads, Meta Ads, GA4, Search Console). More sources arrive through the summer.

What we are working to complete later this year:

  • Cross-tenant federated learning, letting the platform answer questions like "which campaign patterns work for B2B SaaS customers under 50 employees" without customer data crossing tenant boundaries.
  • Counterfactual rehearsal at full scope, where the agents simulate alternative outcomes before giving a recommendation.
  • Expanded skill ecosystem with customer-specific skills that can be locked to a single customer for confidential work patterns.
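The cross-tenant item above is still planned, so the following is only a sketch of the general idea: each tenant reduces its own data to aggregates, and only those aggregates are combined centrally. Sharing plain aggregates is a simplification of federated learning and carries no formal privacy guarantee; the function names, the minimum-tenant rule and all numbers are assumptions.

```python
from collections import defaultdict


def local_summary(campaigns):
    """Runs inside one tenant. campaigns: iterable of (pattern, lift).

    Returns {pattern: (campaign_count, total_lift)}; no raw rows leave.
    """
    acc = defaultdict(lambda: [0, 0.0])
    for pattern, lift in campaigns:
        acc[pattern][0] += 1
        acc[pattern][1] += lift
    return {pattern: (n, total) for pattern, (n, total) in acc.items()}


def combine(summaries, min_tenants=2):
    """Runs centrally on summaries only. Returns mean lift per pattern,
    dropping patterns seen in fewer than `min_tenants` tenants."""
    merged = defaultdict(lambda: [0, 0, 0.0])  # tenants, campaigns, total lift
    for summary in summaries:
        for pattern, (n, total) in summary.items():
            merged[pattern][0] += 1
            merged[pattern][1] += n
            merged[pattern][2] += total
    return {
        pattern: total / n
        for pattern, (tenants, n, total) in merged.items()
        if tenants >= min_tenants
    }


# Invented data for two hypothetical tenants.
tenant_a = local_summary([("video_first", 2.0), ("video_first", 4.0), ("carousel", 1.0)])
tenant_b = local_summary([("video_first", 6.0)])
shared = combine([tenant_a, tenant_b])
print(shared)
```

The minimum-tenant rule is one simple way to avoid publishing a pattern that could be traced back to a single customer.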

Why we talk about this openly

Two reasons.

First, m51 Lab exists to make our technology choices verifiable. We publish what we build and why. That gives our customers something concrete to evaluate us on, not just sales presentations.

Second, we believe it is possible to talk about architecture without giving away how it is built. Cortex involves technical choices that are competitively sensitive for us as a company. Implementation details like which models extract claims, how the promotion gates for skills are calibrated, and which confidence formulas we use are not part of this article. Nor of the next one. They are part of the product you buy.

What we show here is enough for a technical evaluator to compare Cortex with the alternatives. It is enough for a marketing director to understand what the difference is for the working day. It is not a recipe for building it yourself, and it is not meant to be.

Three principles we carry forward

Traceability is a feature, not a cost

Every time we have made provenance easier, it has led to users trusting AI results more and using them more actively. That justifies the extra implementation complexity tenfold.

Learning that does not compound is not learning

A system that forgets between every session is not learning. The reflection layer in Cortex was the most expensive design choice we made and the one with the biggest effect on quality over time.

Causality is not an academic exercise

Marketing is causal work. An AI system that does not model cause and effect cannot give recommendations with integrity. It can describe, not recommend.

See also: more m51 Lab research

We pruned MiniMax-M2.7: the first public REAP variant

SeoGemma4 v2 vs Claude Sonnet: can open source write Norwegian SEO audits?

NorGEO-Bench: is your company mentioned by AI?


About M51 AI OS

M51 AI OS is a Norwegian SaaS platform that gives marketing teams and agencies a complete AI-driven operating system. 17 specialized AI agents work with customer data and brand context to automate analysis, reporting, ad production, SEO and campaign optimization. The platform runs on a curated mix of Claude Sonnet and Opus, GPT and our own fine-tuned models, coordinated through the Nova agent and powered by M51 Cortex as the intelligence core.

m51 Lab is the research and development arm behind the platform. We publish open research models on HuggingFace, document our architecture choices transparently and let lessons from Lab work flow directly into product decisions.

Want to see how Cortex works for your team? Book a demo.
