Turning complex LLM value-alignment findings into an accessible, interactive evaluation platform.


time
June 2024 – Sep 2024
company
Microsoft Research
team
1 Partner Research Manager
1 Senior Research PM
1 Principal Research PM Manager
1 Product Manager
5 Researchers
1 Engineer
2 UX Designers
overview
Microsoft Research’s work on Societal AI evaluates how large language models align with human values across multiple ethical and cultural frameworks. The research introduced a rigorous evaluation framework to uncover underlying value tendencies in LLMs. I translated these research findings into a practical benchmarking tool for real-world comparison and interpretation.
During my 4-month internship, I worked as a Product Manager in the Social Computing Group, collaborating with researchers, designers, and engineers to transform complex value-alignment research into a functional, web-based evaluation platform. I led product direction, defined feature scope, and designed experiences that made value-alignment insights accessible to broader audiences.
design highlight
Value Compass: A Unified Benchmark for
Comparing and Interpreting LLM Value Alignment
Unified Benchmarking
Across 4 Value Systems
A unified leaderboard that ranks LLMs across four foundational value systems. Users can compare models at a glance and drill down into specific dimensions for deeper analysis.
From Static Scores
to Meaningful Interpretation
The model detail page moves beyond static scores. By combining value profiles, radar charts, and real evaluation cases, users can see how a model’s responses are interpreted and translated into specific value scores.
Side-by-Side Comparison Across Models
Users can compare up to five models and generate a comparison report. A “value space” view shows how models cluster based on similar value patterns, helping users see broader cultural differences at a glance.
intro
Evaluating how LLMs align with
cultural, ethical, and social values
Microsoft Research has studied AI safety and value alignment for years. However, much of this work remains difficult for non-technical audiences to interpret or apply.
Value Compass translates AI alignment research into an interactive public tool. Built on benchmarks evaluating 30+ LLMs across multiple human value systems, it enables fine-grained exploration of how different models behave across cultures and contexts.
Adaptive Framework
Evaluate model performance across different cultural and social contexts, rather than relying on a single static framework.
Comparable Scoring
Make value differences visible through consistent, comparable scoring grounded in observable model behavior.
Grounded in Social Science
Draw on insights from sociology, ethics, and AI safety to define and interpret human values in AI systems.
problem framing
Rapid Discovery With Researchers
& Stakeholders Under Constraint
In the first two weeks, I ran rapid interviews and syncs with researchers and stakeholders. The value alignment research itself was solid, but translating it into a usable product under tight time constraints exposed four key challenges.
No Clear, Dynamic Overview Across Models
Comparing models across four value systems required manually piecing together papers and spreadsheets. There was no single interface to see overall patterns or switch context easily.
Scores Without Intuitive Explanations
Final scores were available, but it wasn’t clear how specific responses led to those value interpretations, making results hard to trust.
Too Academic for Broader Audiences
Frameworks remained paper-centric with dense tables and jargon. Non-technical users struggled to meaningfully compare models without clearer visuals.
A Three-Month Public Launch Deadline
We had less than three months to ship a public version, with almost no room for iteration.
scoping
Phased Execution Plan Under a 12-Week Constraint

impact
Led Solo PM Efforts to Ship Value Compass
Turning an Internal Research Benchmark Into a Public Tool in Under 3 Months
30+ LLMs
Enabled fine-grained comparison of cultural and ethical alignment across 30+ LLMs.
4 core modules
Took the platform from MVP prototype to launched product as the sole product owner.
~ 30%
Lowered cognitive barriers for non-research users by ~30%, validated by user surveys.
Product Strategy 1
Dynamic Overview for Diverse Users
decision
Defined User Personas
Using insights from primary stakeholder interviews and internal reviews, I developed three detailed personas representing our target audiences. These personas guided feature design and prioritization, and I shared them during initial syncs with the research leadership team.

23 years old · Master's Student

35 years old · Product Manager
The original structure separated research and evaluation into different sites, creating a fragmented experience. I restructured the Leaderboard as the primary entry point and consolidated research content under a “Resources” section.

Product Strategy 2
Intuitive Explanations Making Research Transparent
decision
Making Scoring Transparent and Traceable
Static rankings alone weren’t enough. I redesigned the product to make the scoring logic visible and traceable. Users can move across value systems, adjust dimensions dynamically, and compare models visually. Each score links back to real evaluation cases, so people can see how specific responses led to particular value interpretations.
For those unfamiliar with the underlying frameworks (such as the Schwartz theory of basic values and its metrics), layered explanations make the research understandable without requiring users to read academic papers.

Product Strategy 3
Accessibility for Diverse Needs
decision
Supporting Different Depths of Exploration
To boost retention and engagement, I designed the benchmarks with progressive depth. Professional users can dive deeper into value distributions and cultural correlation maps through interactive visuals in the model detail analysis and model comparison results.
Casual users get “Test Your Values”: a quick, fun 14-question quiz that matches models to their personal values in minutes. This turns dense research into an approachable, useful tool for everyone.

Reflection & Next Step
This project is the result of a collaborative effort. Below is part of the team that contributed to this project. If you're interested in learning more, please visit our project website and check out the paper, which has been accepted to ACL 2025. 👏

Reflection
Research-based products must balance usability with methodological rigor.
I worked with researchers on an internal survey to identify usability gaps, focusing on whether the UX obscured the research logic or weakened the experience.
Design should make complex knowledge accessible to broader audiences without diluting it.
LLM research processes are inherently complex. I wasn’t trying to simplify the findings, but to create clearer pathways into them. By segmenting user needs, introducing visual comparisons, and layering explanations, I lowered the entry barrier while preserving technical depth.
As models become more capable, their value leanings and potential risks also scale. Making these patterns visible is critical for responsible AI development.
next step
From Static Benchmark to Interactive Exploration
As a next step, I explored more interactive ways to surface cultural differences in model behavior. I introduced map-based cultural visualization and Arena-style comparison to move beyond aggregate scores. Instead of only presenting final rankings, users can now explore how value alignment shifts across regions and models. This shifted the product from a static benchmark into a more exploratory tool.
