
Meta-Evaluations: The Scientific and Institutional Foundations of AI Evaluation — Patricia Paskov

Everyone is building AI evaluations. But what makes those evaluations credible enough to inform real decisions?

As AI systems become more powerful and more embedded in society, evaluations increasingly shape decisions about deployment, safeguards, procurement, compliance, and public claims about capability and risk. But many evaluations remain difficult to compare, reproduce, or interpret, and the standards needed to judge when evidence is strong enough for a given purpose are still underdeveloped.

This session explores the emerging field of meta-evaluation: the effort to build the scientific and institutional foundations that make AI evaluations more trustworthy and decision-relevant.

We will examine how standards of rigor should vary with the claim at stake, why proportionality matters, and what kinds of documentation and methodological discipline can make evaluation evidence more legible across contexts. We will also look beyond methods to the wider ecosystem: shared standards, interoperable infrastructure, access and verification mechanisms, and the incentives and institutions needed to support independent scrutiny.

Part of Module 10: Governance, Policy, and Regulation.

Patricia Paskov is an AI evals and policy researcher at RAND and a Research Affiliate at the Oxford Martin AI Governance Initiative. Her work focuses on building the scientific and institutional foundations for AI evaluation, from methodological guidance on human baselines and benchmark design to frameworks for third-party AI auditing. In Spring 2026, she is collaborating with the Cooperative AI Foundation to build multi-agent evals.

She is also the Resilience Section Lead of the 2026 International AI Safety Report and a Working Group Chair at the EvalEval Coalition. Her research has appeared at ICML (where it received a 2025 Spotlight Award), in Science, and at FAccT, as well as in policy venues including RAND and Carnegie. She has presented to stakeholders including the US Department of Defense, the EU AI Office, and Microsoft Research.

