
Alignment Evaluation — Dr. Xiaoyuan Yi (Microsoft Research Asia)

When we say an AI system is “aligned,” what do we actually mean?

And more importantly — how would we know?

Most alignment evaluation today relies on static benchmarks: fixed tests, frozen snapshots.

They check whether a model avoids certain outputs or follows instructions.
But they don’t tell us how a system behaves when things change.

Benchmarks age.
They leak.
They get optimized against.

And sometimes, they measure what a model knows about values rather than how it actually behaves.

This session explores a different direction.

We'll move from static benchmarks and their limits,
to dynamic evaluations that evolve alongside the system,
to psychometrics-based methods that borrow tools from how we measure human values and personality.

Because alignment isn’t a checklist.

The goal is to move toward more valid assessments of what AI systems actually value, and how they act on those values.

Part of Module 8: Agentic, Alignment & Control Evaluations.

Dr. Xiaoyuan Yi is a Senior Researcher at Microsoft Research Asia in the Social Computing Group. His research focuses on value alignment and evaluation of large language models, and he has developed frameworks like the Value Compass Leaderboard and CLAVE for assessing LLMs' values in more rigorous, adaptive ways. He holds a PhD from Tsinghua University, has published 50+ papers at top conferences including ICLR, NeurIPS, ACL, and EMNLP, and co-developed Jiuge, a classical Chinese poetry generation system at Tsinghua.


Want to join this session?

Sign up to register and get notified about upcoming lectures.

Previous
March 25

Evaluating AI Agents — Dr. Cozmin Ududec