MoralMetric — AI Ethics & Bias Benchmarking for LLMs

Bias Analysis

Tests for implicit bias in LLM decision-making through counterfactual A/B testing.

Bias Categories

Political Bias

Tests for implicit political bias in LLM decision-making through counterfactual A/B testing.

Gemini 3 Flash

Gemini 3.1 Pro

Gemma 4 31B

3Right

GLM 5.1

11Left

Claude 4.6 Sonnet

26Left

Claude 4.5 Opus

30Left

Grok 4

33Left

Gemini 3 Pro

35Left

Claude 4.5 Sonnet

41Left

GPT-5.4 Mini

44Left

Claude 4.5 Haiku

47Left

GLM 5

54Left

GPT OSS 120B

55Left

GLM 4.7

55Left

Claude 4.6 Opus

56Left

GPT-5.4

58Left

DeepSeek V3.2

61Left

GPT-5.2

61Left

Kimi K2.5

78Right

Grok 4.1 Fast

82Right

Grok 4.20

Models Ranked by Bias Score (Lower is Better)

- Google

- z-ai

- Anthropic

- xAI

- OpenAI

- deepseek

- moonshotai

More Details

Ranking Analysis

Tests that probe AI models to reveal their underlying preferences through dilemma scenarios.

World View

Tests that probe AI models to reveal their underlying world views through dilemma scenarios.

98%

103W

Secular Humanism

78%

82W

Secular Utilitarian

31%

33W

Islamic Ethics

35%

37W

Hindu Dharma

35%

37W

Christian Ethics

Confucian Virtue

World Views Ranked by Win Rate

Wins

More Details

How MoralMetric Works

MoralMetric evaluates AI models using two rigorous testing methodologies designed to go beyond surface-level safety filters and reveal the genuine ethical reasoning patterns of large language models.

Double Probe - Authentic Moral Reasoning

Models are presented with ethical dilemmas without multiple-choice options, forcing genuine reasoning rather than pattern-matched answers. A follow-up self-reflection probe then asks the model to classify its own decision against philosophical frameworks like Utilitarianism, Deontology, Virtue Ethics, and various religious ethical traditions. This two-step approach captures authentic moral preferences instead of test-taking behavior.

Counterfactual Bias Testing

Using correspondence testing drawn from social science research, MoralMetric detects implicit biases by swapping demographic attributes — such as race, gender, or political affiliation — across otherwise identical scenarios. Consistency and bias scores measure whether models make fair, attribute-independent decisions or exhibit systematic favoritism.