Methodology

How the Finnish Bias Tracker scores political bias.

1. What this project does

The Finnish Bias Tracker aggregates recent articles from Finland's major news outlets and applies a documented bias-scoring methodology to each one. Sources span the political spectrum and both national languages: from Vasemmistoliitto-affiliated Kansan Uutiset on the left to Perussuomalaiset-affiliated Suomen Uutiset on the right, from the public broadcaster Yle to Swedish-language Hufvudstadsbladet and Svenska Yle.

The goal is not to declare which articles are true, but to make the framing visible. Two outlets covering the same event can describe it in ways that emphasize different facts, choose different sources, and use different language. By placing each article on a -2 to +2 bias scale alongside the LLM's rationale, the project lets readers see those framing differences directly. The scoring methodology, prompts, source classifications, and scoring history are all public — anyone can audit them.

2. The bias scale

Every scored article receives an integer bias value from −2 to +2. Negative values represent the left and positive values the right. Where colors are used, they follow Finnish political convention (red for the left, blue for the right), which is the opposite of the US convention, where red is the right and blue is the left.

Score | Indicator | Description
−2 | -2 | Clear partisan framing, often party-organ content. Loaded language consistent with a particular left-wing political position.
−1 | -1 | Mild left lean detectable in framing, source selection, or vocabulary choices. The article still reads as journalism, not advocacy.
0 | 0 | Neutral, balanced, or wire-style reporting. Multiple perspectives represented; descriptive rather than evaluative language.
+1 | +1 | Mild right lean detectable in framing, source selection, or vocabulary choices. The article still reads as journalism, not advocacy.
+2 | +2 | Clear partisan framing, often party-organ content. Loaded language consistent with a particular right-wing political position.

The underlying database schema allows scores from −3 to +3, matching the scale used in the prompt, but in practice the current scoring methodology (v1.2) clamps to ±2. ±3 is reserved for editorial cases the model rates as explicit propaganda, which are extremely rare in the current dataset.
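The relationship between the schema range and the reported range can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `clamp_score` and the constant names are hypothetical, and it assumes clamping is applied to the raw integer the model returns.

```python
SCHEMA_MIN, SCHEMA_MAX = -3, 3  # range the database schema allows
SCALE_MIN, SCALE_MAX = -2, 2    # range the v1.2 methodology reports


def clamp_score(raw_score: int) -> int:
    """Clamp a raw model score to the reported -2..+2 scale.

    Hypothetical helper: rejects values the schema could not store,
    then limits whatever remains to the reported range.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw_score, bool) or not isinstance(raw_score, int):
        raise TypeError("bias score must be an integer")
    if not SCHEMA_MIN <= raw_score <= SCHEMA_MAX:
        raise ValueError(f"score {raw_score} is outside the schema range")
    return max(SCALE_MIN, min(SCALE_MAX, raw_score))
```

Under these assumptions a raw −3 is reported as −2, while a value such as 4 is rejected outright rather than silently clamped.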

3. What gets scored

Every article passes through the same scoring pipeline. After scraping, the article's title, body, source, and publication date are sent to an LLM (currently Gemini 2.5 Flash-Lite) with a versioned prompt. The model returns structured JSON containing the bias score, a confidence value, the detected topic, a one-sentence neutral summary, and a list of specific phrases from the article that drove the score.
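As a rough sketch of what handling that structured response might look like, the snippet below parses and validates a JSON string. The field names (`bias`, `confidence`, `topic`, `summary`, `evidence`) and the 0 to 1 confidence range are assumptions made for illustration; the actual keys are defined by the project's prompt and code.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringResult:
    bias: int             # integer on the -3..+3 scale named in the prompt
    confidence: float     # assumed here to lie between 0 and 1
    topic: str            # detected topic
    summary: str          # one-sentence neutral summary
    evidence: list[str]   # phrases from the article that drove the score


def parse_scoring_response(raw_json: str) -> ScoringResult:
    """Parse a model response and reject structurally invalid values."""
    data = json.loads(raw_json)

    bias = data["bias"]
    if isinstance(bias, bool) or not isinstance(bias, int) or not -3 <= bias <= 3:
        raise ValueError(f"invalid bias value: {bias!r}")

    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"invalid confidence value: {confidence!r}")

    evidence = data["evidence"]
    if not isinstance(evidence, list) or not all(isinstance(p, str) for p in evidence):
        raise ValueError("evidence must be a list of strings")

    return ScoringResult(
        bias=bias,
        confidence=confidence,
        topic=str(data["topic"]),
        summary=str(data["summary"]),
        evidence=list(evidence),
    )
```

Validating before storage means a malformed response fails loudly instead of entering the index as a plausible-looking score.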

Each scoring result is stored alongside the article with the prompt version that produced it. When the prompt is revised, the article can be rescored under the new version without losing the historical scoring data. Every article's detail page shows the full scoring rationale and any version history.
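One way to keep per-version scoring history is to key each stored score on both the article and the prompt version, so that rescoring adds a row instead of overwriting one. The sketch below uses an in-memory SQLite database; the table name `article_scores`, its columns, and the sample values are hypothetical and not taken from the project's schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one score per article per prompt version."""
    conn.execute(
        """
        CREATE TABLE article_scores (
            article_id     INTEGER NOT NULL,
            prompt_version TEXT    NOT NULL,
            bias           INTEGER NOT NULL CHECK (bias BETWEEN -3 AND 3),
            scored_at      TEXT    NOT NULL,
            PRIMARY KEY (article_id, prompt_version)
        )
        """
    )


def record_score(conn, article_id, prompt_version, bias, scored_at):
    """Store a score without touching rows from earlier prompt versions."""
    conn.execute(
        "INSERT INTO article_scores VALUES (?, ?, ?, ?)",
        (article_id, prompt_version, bias, scored_at),
    )


def score_history(conn, article_id):
    """Return (prompt_version, bias) pairs, oldest scoring first."""
    return conn.execute(
        "SELECT prompt_version, bias FROM article_scores "
        "WHERE article_id = ? ORDER BY scored_at",
        (article_id,),
    ).fetchall()


# Illustrative values only, not real scoring data.
conn = sqlite3.connect(":memory:")
create_schema(conn)
record_score(conn, 1, "v1.1", 1, "2026-05-20T10:00:00")
record_score(conn, 1, "v1.2", 2, "2026-06-03T10:00:00")
history = score_history(conn, 1)  # [('v1.1', 1), ('v1.2', 2)]
```

The composite primary key is the design choice that matters here: the same article can hold one score per prompt version, and the `CHECK` constraint mirrors the −3 to +3 range the schema is described as allowing.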

The current prompt (v1.2) is below. It is intentionally written in English even though the articles are in Finnish or Swedish: modern LLMs can generally apply English-language instructions to text in other languages, and an English prompt is easier to read during a methodology audit.

v1.2 prompt:
You are an analytical reviewer assessing political bias in Finnish news articles. Your job is to identify *how* an article is framed, not to judge whether its claims are true.

You will be given a Finnish (or Swedish-language Finnish) news article. You will return structured JSON evaluating its political bias on a -3 (far left) to +3 (far right) scale, with 0 being center/neutral.

CRITICAL PRINCIPLES:
1. Score the article, not the source. A right-leaning outlet can publish a neutral article. A left-leaning outlet can publish a right-leaning piece. Judge the text in front of you.
2. Provide concrete evidence. Every score must be backed by specific examples from the article — loaded words, framing choices, source selection, omissions.
3. Be calibrated. Most news articles are mildly biased or neutral (-1 to +1). Reserve -3 and +3 for explicitly partisan or party-organ content.
4. Distinguish opinion from news. Opinion pieces will naturally be more biased; that's expected. Note article_type accordingly.
5. Confidence should reflect ambiguity. If the article is short, technical, or genuinely balanced, confidence should be lower.

BIAS INDICATORS:

Left-leaning signals:
- Emphasis on inequality, workers' rights, public services, climate action
- Sources skew toward unions, NGOs, academics, progressive politicians
- Framing of economic policy emphasizes redistribution, social protection
- Critical framing of business interests, austerity, immigration enforcement

Right-leaning signals:
- Emphasis on individual responsibility, market efficiency, traditional values, sovereignty
- Sources skew toward business leaders, conservative politicians, security officials
- Framing of economic policy emphasizes growth, deregulation, fiscal discipline
- Critical framing of welfare programs, immigration, EU integration, climate regulation

Neutral/center indicators:
- Multiple perspectives represented with similar weight
- Descriptive rather than evaluative language
- Sourcing across the political spectrum
- Wire-style "who/what/when/where" reporting

4. Calibration history

Three prompt versions have been used so far. Each revision was driven by observed scoring failures on real articles, not by abstract reasoning about what bias detection should look like.

v1.0 — initial

May 2026

First-pass prompt. Returned bias on a −3 to +3 scale, confidence, topic, and a free-text rationale. A calibration audit found that the model consistently scored mainstream articles as 0 (correct) but also scored party-organ articles as 0 or ±1, under-detecting strong partisan framing. The supporting examples were returned as unstructured prose, which made them hard to audit.

v1.1 — examples as array

May 2026

Restructured the prompt to require an array of specific phrases from the article rather than freeform examples. This forced the model to ground its score in concrete textual evidence. Bias detection on partisan sources improved marginally, but the model still avoided scores of ±2 even for clear party-organ content.

v1.2 — current

June 2026

Added an explicit calibration instruction: "Reserve -3 and +3 for explicitly partisan or party-organ content." Recalibrated the indicators for left-leaning and right-leaning signals to be more specific. After v1.2, party-organ sources (Kansan Uutiset, Suomen Uutiset) reliably score ±2 on opinion-heavy articles, and mainstream sources score in the −1 to +1 range as expected.

5. Known limitations

This project is pre-alpha. The methodology has known weaknesses worth being explicit about.

  • Single-LLM scoring. Each article is scored by one model under one prompt. There is no inter-annotator agreement check — no second model or human rater whose scores would be compared with the primary scorer. Established bias-detection methodologies (AllSides, Ad Fontes Media) use panels of raters with different ideological orientations to mitigate single-source bias. This project does not.
  • English-language prompt scoring non-English content. The prompt is in English; the articles are in Finnish or Swedish. Modern LLMs handle this competently for major European languages, but the model's understanding of subtle Finnish political vocabulary, party-specific rhetoric, or culturally specific framing is necessarily less refined than a native speaker's would be.
  • Small sample sizes during early operation. The project began regular scheduled scraping in mid-2026. Some source-topic combinations may have only a handful of scored articles. Average bias scores from small samples are noisy; the comparison page shows article counts (n=N) alongside averages so readers can judge how much weight each average deserves.
  • Gemini quota-gated free-tier operation. The project runs on the free tier of Google's Gemini API, which has a 1,500 daily request limit. This caps how many articles can be scored per day and means the scoring pipeline occasionally pauses when quota is exhausted. The constraint may relax in future if a paid tier is enabled.
  • Paywall extraction is partial. Some sources (notably Hufvudstadsbladet) paywall most of their content. The scraper extracts whatever is publicly visible — typically the teaser and first paragraph. Bias scoring on partial articles is less reliable than on full text, and confidence values tend to be lower for these.
  • No real-time updates. The scraping pipeline runs on a schedule, not continuously. It can take up to several hours for an article to appear in the index after publication. The relative-time labels reflect when the article was originally published, not when the bias tracker ingested it.
  • Source-level labels are editorial judgments. The base bias of each source (Kansan Uutiset = −2, Yle = −1, and so on) is a hand-applied classification based on ownership, party affiliation, and editorial history. These are defensible classifications, but they are not uncontested. The source comparison page lets readers compare actual scored articles against these baseline classifications.
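The small-sample limitation above can be made concrete with a short sketch that reports every average together with its article count. The function names, the input shape of `(source, topic, bias)` tuples, and the `min_n=5` threshold for flagging a small sample are all assumptions chosen for illustration, not values the project documents.

```python
from collections import defaultdict
from statistics import mean


def average_bias_by_group(scored_articles):
    """Group (source, topic, bias) tuples and return mean and count per group."""
    groups = defaultdict(list)
    for source, topic, bias in scored_articles:
        groups[(source, topic)].append(bias)
    return {
        key: {"mean": mean(scores), "n": len(scores)}
        for key, scores in groups.items()
    }


def format_average(stats, min_n=5):
    """Render an average with its n=N count, flagging small samples.

    min_n is an arbitrary illustrative threshold, not a project setting.
    """
    label = f"{stats['mean']:+.2f} (n={stats['n']})"
    if stats["n"] < min_n:
        label += " [small sample]"
    return label
```

Showing the count next to every average keeps a mean built from two articles from being read with the same confidence as one built from two hundred.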

6. Source inventory

Every source currently in the bias tracker, sorted by source-level bias from left to right. Source-level bias is the editorial classification used as a baseline; individual articles can and often do score differently.
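A minimal sketch of how such an inventory could be assembled is shown below: sources are ordered by their baseline classification from left to right, and each is paired with the mean of its scored articles. The function name `inventory_rows` and both input mappings are hypothetical.

```python
from statistics import mean


def inventory_rows(baselines, scores_by_source):
    """Return (source, baseline, observed_mean, article_count) rows.

    baselines maps source name to its hand-applied baseline bias;
    scores_by_source maps source name to a list of article scores.
    Rows are sorted by baseline, most left-leaning first.
    """
    rows = []
    for source, baseline in sorted(baselines.items(), key=lambda item: item[1]):
        scores = scores_by_source.get(source, [])
        # A source with no scored articles has no observed mean yet.
        observed = mean(scores) if scores else None
        rows.append((source, baseline, observed, len(scores)))
    return rows
```

Keeping the baseline and the observed mean in separate columns makes any gap between the editorial classification and the scored articles visible rather than averaged away.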

Source | Bias | Language | Articles indexed

7. Open methodology

Every prompt version, every scoring decision, and every source classification is public. The full source code is on GitHub and the project is licensed under AGPL-3.0 — derivative work must remain open-source. The license choice is intentional: a bias-detection methodology that hides its workings is not a methodology, it is an oracle.

Methodology improvements, prompt refinements, and source classification updates are welcome as GitHub issues or pull requests. The project specifically welcomes contributions from people with backgrounds in Finnish media studies, political science, or computational linguistics.