GEO Series, Part 1 · April 2026 · 12 min

How well does AI know the Norwegian market?

We tested the three best models. No internet, no tools. Here are the results.

When you ask ChatGPT for a recommendation, or Gemini for help with research: does the model actually know the answer? Or is it looking it up for you, right then and there?

We took the three leading AI models (OpenAI's GPT 5.4, Anthropic's Claude Opus 4.6, and Google's Gemini 3.1 Pro) and tested them on 66 questions about the Norwegian market. No internet. No tools. Based solely on what they learned during training.

The results matter for everyone working on visibility in an AI-driven world.


Why this is relevant for GEO

Modern AI models have access to tools. ChatGPT searches Bing. Gemini uses Google Search. Claude has web search.

But here is the key point: models search because they are uncertain. They know they don't know enough. When a user asks "Which performance agency should I choose in Oslo?", the model will typically:

  • Check its own knowledge (training data)
  • Realize it is uncertain
  • Search the web to confirm or find the answer

That means what the model finds on the web determines the answer the user gets. And this is where GEO (Generative Engine Optimization) comes in: Is your business visible, correctly described, and well represented in the places AI models search?

For marketing teams this represents a fundamental shift. Content must be produced and structured not only for humans and search engines, but also for AI models that interpret and relay it. This is, among other things, what M51 AI OS is designed for: AI agents that continuously optimize content for both traditional search engines and generative AI models.

To understand the scale of this problem, we tested how much the models actually know without the internet.

How we tested

We built an automated benchmarking system in Python that:

  • Sent 66 questions to all three models via API, with an explicit instruction to use only their own knowledge, without tools
  • Verified each answer using Claude Opus 4.6 with web search enabled, which fact-checked it against actual sources on the internet
  • Scored the answers on a scale from 0 (completely wrong/hallucinated) to 3 (fully correct)
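The loop behind these steps can be sketched in a few lines. This is a minimal illustration, not the actual m51.ai system: the function names and prompt wording are our own, and the model and the verifier are passed in as plain functions so the loop itself stays testable without API access.

```python
# Minimal sketch of the benchmark loop described above.
# ask_model and verify_answer stand in for the real API calls;
# all names here are illustrative, not the actual m51.ai code.

OFFLINE_INSTRUCTION = (
    "Answer using only your own knowledge from training. "
    "Do not use tools or web search."
)

def run_benchmark(questions, ask_model, verify_answer):
    """Send each question to a model, then score the answer 0-3.

    ask_model(prompt) -> answer text
    verify_answer(question, answer) -> integer score in 0..3
    """
    results = []
    for question in questions:
        answer = ask_model(f"{OFFLINE_INSTRUCTION}\n\n{question}")
        score = verify_answer(question, answer)
        results.append({"question": question, "answer": answer, "score": score})
    return results

def summarize(results):
    """Average score, and that average as a percent of the 0-3 maximum."""
    avg = sum(r["score"] for r in results) / len(results)
    return {"average": round(avg, 2), "percent": round(avg / 3 * 100)}
```

The percent column in the tables below is this same conversion: an average of 2.24 out of 3 is roughly 75%.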

We ran two separate tests:

Test 1: 41 questions about the Norwegian market in general. Large companies (Equinor, DNB), mid-sized ones (Kahoot, Oda), calendar events (Mother's Day in February, russ season), Norwegian products (Kvikk Lunsj, Grandiosa), and the marketing industry.

Test 2: 25 questions about real Norwegian companies drawn from an actual client list. Everything from Tufte Wear and Swims to Permakem, Geonor, and Klubbkoppen. No hints were given to the verifier. It had to search and assess on its own.

Results: General questions

When we asked about things like Equinor, Vipps, and May 17th, the models performed reasonably well:

Model | Average | Percent | Hallucinations
GPT 5.4 | 2.24/3 | 75% | 0
Claude Opus 4.6 | 2.05/3 | 68% | 10
Gemini 3.1 Pro | 1.85/3 | 62% | 0

75% for the best model might sound acceptable. But look at what happens when we dig deeper.

Score per category

Category | GPT 5.4 | Claude Opus 4.6 | Gemini 3.1 Pro
Large companies | 2.1 | 2.3 | 2.1
Mid-sized companies | 2.5 | 2.3 | 2.0
Norwegian products | 2.3 | 1.8 | 1.5
Calendar events & culture | 2.3 | 2.3 | 1.7
Marketing & industry | 2.4 | 2.0 | 2.0
Own questions | 1.7 | 1.0 | 1.7

The "Own questions" category contained three targeted tests: a list of performance agencies in Oslo, a question about M51, and one about Try. Even for one of Norway's best-known agencies (Try), no model scored more than 2 out of 3.

Results: Real Norwegian companies

This is where it got really interesting. We gave the models 25 company names from an actual client list — a mix of everything from Tufte Wear to Klubbkoppen — and asked what they knew.

Model | Average | Percent | Hallucinations
GPT 5.4 | 1.84/3 | 61% | 1
Claude Opus 4.6 | 1.64/3 | 55% | 4
Gemini 3.1 Pro | 2.04/3 | 68% | 0

The drop from general to specific knowledge

Model | General test | Client test | Change
GPT 5.4 | 75% | 61% | −14 pp
Claude Opus 4.6 | 68% | 55% | −13 pp
Gemini 3.1 Pro | 62% | 68% | +6 pp

GPT and Claude dropped 13–14 percentage points when tested on real companies. Gemini actually went up, which may indicate that Google's training data has better coverage of Norwegian company registers.

Some examples from the test

Companies the models knew well (score 2–3): Tufte Wear, Swims, Drytech, Hamax, Smartcraft, Restaurant Kontrast, Semine, Geodata

Companies the models struggled with (score 0–1): Nordlandskaia, Tangelo, Accountflow, Klubbkoppen, Permakem

Companies the models fabricated information about: Claude claimed that Aidn AS provides "digital booking solutions for dental clinics" and that Depend Norge sells incontinence products for Essity. Both wrong. Claude hallucinated in 16% of the client test answers.

Case: What do the models know about M51?

We asked all three models "What do you know about M51 in Oslo?" and got three very different answers:

GPT 5.4 (score 1/3). Honest, but knew nothing:

"I am not sure what you mean by M51 in Oslo, because it can refer to several things. (...) I do not have enough reliable knowledge to say concretely what M51 is without risking a guess."

Claude Opus 4.6 (score 0/3). Hallucinated:

"M51 is an address/location at Maridalsveien 51 in Oslo, which is known as a creative business and cultural hub."

Completely fabricated. M51 has nothing to do with Maridalsveien.

Gemini 3.1 Pro (score 2/3). Got it right:

"M51 (often known as M51 Marketing) is a digital marketing agency based in Oslo."

Correct, but incomplete. No model mentioned m51.ai, the AI initiative, or the specialization in performance marketing.

The Hallucination Problem

One of the most important findings: when the models don't know the answer, they behave very differently.

Behavior | GPT 5.4 | Claude Opus 4.6 | Gemini 3.1 Pro
Says "don't know" | Often | Rarely | Sometimes
Hallucinates | 1 case | 14 cases | 0 cases
Makes up company names | No | Yes | No

Claude is the model that most often fabricates information instead of admitting ignorance. In the agency test, Claude invented three agencies that don't exist: "Performance Group", "Novus Media" and "Blis Digital".

For GEO this means: if your company doesn't have a clear digital presence, you risk AI models either ignoring you, or worse, fabricating incorrect information about you.

What Does This Mean for GEO?

1. The models know little about Norwegian business

Even the best models only score 55–68% on real Norwegian companies. They are uncertain, and they know they are uncertain. That is why they actively use tools and web search.

2. Web search is the default, not the exception

When ChatGPT, Gemini or Claude are used by people in practice, they almost always have access to search. That means what is on the web about your company is what becomes the answer — not what the model "knows", but what it finds.

3. Visibility is no longer only about Google search

Traditional SEO optimizes for Google's ranking. GEO is about optimizing for how AI models understand and present your information. There is a difference:

  • SEO: Be found in search results
  • GEO: Be correctly understood and recommended by AI models

4. Structured information is crucial

Companies that had clear, structured information on their websites, in Brønnøysundregistrene, on LinkedIn and in trade media were more often correctly identified. Those that lacked this were either overlooked or hallucinated about.

5. Incorrect information spreads

When Claude hallucinates that your company does something other than what you actually do, it can spread to users who trust their AI assistant. Errors in training data or inadequate online presence can turn into incorrect recommendations.

Practical GEO Actions Based on the Findings

  • Check what AI models say about you. Ask ChatGPT, Gemini and Claude about your company, without tools enabled. Is the answer correct?
  • Ensure structured data. Update company information in Brønnøysundregistrene, Google Business Profile, LinkedIn, and on your own websites. AI models draw from these sources.
  • Write clear "About us" pages. A clear, fact-based description of what the company does, for whom, and where. This makes it easier for AI models to understand and reproduce correctly.
  • Be visible in trade media. Companies mentioned in Kampanje, Shifter, E24 or similar were more often correctly identified. Press coverage is training data.
  • Monitor over time. New model versions appear constantly. What the model knows today is not the same as what the next version knows. GEO is an ongoing process.
  • Consider llms.txt. Read more about this below.

GEO actions require continuous work with content, structured data and digital visibility. That is why we built M51 AI OS: a platform where AI agents automate content production, SEO optimization and campaign management, so marketing teams can focus on strategy instead of manual work.


llms.txt: A New Standard for AI Visibility?

There is a concrete measure that addresses the problem we uncovered in the test: a file called llms.txt.

What is llms.txt?

Think of it as robots.txt for AI models. While robots.txt tells search engines what they can and cannot crawl, llms.txt tells AI models what your website is about — in a format they actually understand.

The file is placed in the root folder of the website (e.g. yourcompany.com/llms.txt) and is written in Markdown. Readable for both humans and machines. It was proposed by Jeremy Howard (founder of Answer.AI) in September 2024, and the specification is available at llmstxt.org.

What does it look like?

# Company Name
> A short description of what the company does, and for whom.

## Services
- [Service A](https://dittselskap.no/tjeneste-a): Description
- [Service B](https://dittselskap.no/tjeneste-b): Description

## About us
- [About the company](https://dittselskap.no/om-oss): Who we are and what we do

There is also a variant called llms-full.txt — a complete Markdown export of the entire website's content in one file. Data shows that AI agents visit llms-full.txt twice as often as llms.txt.

Why is this relevant to what we found?

Think back to the M51 example. Claude hallucinated that M51 was a creative cultural hub. GPT knew nothing. If m51.ai had had an llms.txt file with:

# M51 Marketing
> A digital marketing agency in Oslo, specialized in performance marketing,
> content production and AI-driven marketing.

## Services
- [Performance Marketing](https://m51.ai/tjenester): Paid advertising, SEO, SEM
- [AI Lab](https://m51.ai/lab): Research and articles on AI and marketing

...any AI model with web search would be able to fetch this information directly. Structured, correct, and in a format the model can easily use in its response to the user.
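Writing such a file by hand is quick, but it can also be generated from structured company data. A minimal Python sketch follows; the helper name and its input format are our own assumptions, and only the output shape follows the llmstxt.org proposal.

```python
# Minimal sketch: assemble an llms.txt body from structured
# company data. The output follows the Markdown shape proposed
# at llmstxt.org; the helper and its input format are illustrative.

def build_llms_txt(name, description, sections):
    """sections maps a section title to a list of (link text, url, blurb)."""
    lines = [f"# {name}", f"> {description}", ""]
    for title, links in sections.items():
        lines.append(f"## {title}")
        lines.extend(f"- [{text}]({url}): {blurb}" for text, url, blurb in links)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Serve the result from the site root (e.g. yourcompany.com/llms.txt) so any AI agent with web search can fetch it.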

Who has implemented it?

Among early adopters we find Anthropic (the company behind Claude), Cloudflare, Stripe, Zapier and Hugging Face. As of March 2026, 7.4% of Fortune 500 companies have implemented llms.txt.

The honest assessment

We must be transparent: llms.txt is not a documented silver bullet yet.

  • None of the major LLM providers (OpenAI, Google, Anthropic) have officially confirmed that they use llms.txt in their models
  • An analysis of 300,000 domains showed no clear correlation between having llms.txt and being cited by AI models
  • Google has stated that they will not base AI Overviews on llms.txt
  • The standard lacks W3C standardization and formal validation

But here is the point: llms.txt costs almost nothing to implement. It takes 15 minutes to write a file that gives AI agents with web search a clean, structured source to draw from. Even if no LLM uses the file directly in training data today, any AI agent that searches your website will find and read it.

And as our test showed: AI models search actively, because they know they don't know enough.

Our recommendation

llms.txt is a low-cost, low-risk measure that may become more important over time. We recommend it as part of a broader GEO strategy — not as the only measure, but as a practical supplement to good content strategy, structured data and industry visibility.

Methodology

  • Models tested: GPT 5.4 (OpenAI), Claude Opus 4.6 (Anthropic), Gemini 3.1 Pro (Google)
  • A total of 66 questions across 8 categories, all in Norwegian
  • Offline test: No model had access to tools or the internet, verified via the API response
  • Verification: Claude Opus 4.6 with web search fact-checked each answer
  • Scoring: 0 (completely wrong/hallucinated) to 3 (completely correct and comprehensive)
  • Code: The benchmark system is open source and available for replication

Test conducted April 2026 by m51.ai Lab.


This article is part 1 of the GEO series from m51.ai Lab, where we examine how generative AI affects visibility, marketing and business in Norway.

Read part 2: Can Norwegian-trained AI models compete with GPT and Gemini?
