Trustworthy AI Element Out of Context

Tuesday, August 19, 2025
11:30 AM - 12:00 PM
AI Risk Summit Track 1 (Salon I)

About This Session

Foundation models are increasingly deployed in embodied AI systems, such as vision-language-action (VLA) models for humanoid robots, but ensuring that they perform reliably outside their original development context remains challenging. Current model cards, short documents describing a model’s intended use and performance, often provide only static, high-level metrics that lack the specificity needed for safe reuse in new operational scenarios. In this paper, we propose an evaluation framework that embeds an AI model’s Operational Design Domain (ODD) into its model card and testing regimen. Our approach draws on the automotive concept of a Safety Element out of Context (SEooC), treating pre-trained AI components as modular safety elements developed on assumptions about their eventual context of use. We introduce two techniques to characterize and validate model behavior across ODD dimensions: using pre-trained models as feature embedders to generate pseudo-labels that segment test data, and applying perturbation methods to stress-test robustness in rare or extreme conditions. By coupling thorough ODD-aligned evaluations with clear documentation of assumptions and results, developers can systematically assess whether an AI model (e.g., a VLA policy in a robot) can be trusted in a new context or whether additional training and safeguards are required. This paradigm for trustworthy AI out of context facilitates the safe reuse of advanced AI models in embodied systems, accelerating innovation in robotics while managing safety and ethical risks.
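As a rough illustration of the ODD-aligned evaluation idea described above, the following minimal sketch slices test results by pseudo-labeled ODD condition and flags slices that fall below an accuracy threshold. It is not the speaker's implementation. The prototype embeddings, the condition names, the nearest-prototype labeling rule, and the 0.8 threshold are all assumptions made for illustration; in practice the embeddings would come from a pre-trained feature embedder.

```python
from collections import defaultdict
from math import dist

# Hypothetical prototype embeddings, one per ODD condition (illustrative values).
ODD_PROTOTYPES = {
    "indoor_bright": (0.9, 0.1),
    "indoor_dim": (0.1, 0.9),
}

def pseudo_label(embedding, prototypes=ODD_PROTOTYPES):
    """Assign the ODD condition whose prototype is nearest to the embedding."""
    return min(prototypes, key=lambda name: dist(embedding, prototypes[name]))

def evaluate_by_odd(samples, min_accuracy=0.8):
    """Compute per-slice accuracy.

    samples: iterable of (embedding, prediction_correct) pairs.
    min_accuracy: assumed acceptance threshold for a slice.
    """
    counts = defaultdict(lambda: [0, 0])  # slice name -> [correct, total]
    for embedding, correct in samples:
        slice_name = pseudo_label(embedding)
        counts[slice_name][0] += int(correct)
        counts[slice_name][1] += 1
    report = {}
    for name, (n_correct, n_total) in counts.items():
        accuracy = n_correct / n_total
        report[name] = {
            "accuracy": accuracy,
            "n": n_total,
            "within_odd": accuracy >= min_accuracy,
        }
    return report

# Toy test set: embeddings paired with whether the model's output was correct.
samples = [
    ((0.8, 0.2), True), ((0.95, 0.05), True), ((0.85, 0.1), True),
    ((0.2, 0.8), True), ((0.1, 0.95), False), ((0.15, 0.9), False),
]
report = evaluate_by_odd(samples)
for name, stats in sorted(report.items()):
    print(name, stats)
```

A per-slice report of this kind is the sort of result a model card could record next to its stated assumptions, so that a slice failing the threshold signals a condition where more training or safeguards may be needed before reuse.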

Speaker

Barnaby Simkin

Director, Trustworthy AI - NVIDIA

Barnaby leads the development of a scalable AI risk management system spanning multiple business units, ensuring alignment with overarching legal and operational requirements. This involves building an ecosystem of tools that help engineering teams assess trustworthiness, and increasing oversight across the company by promoting standardized documentation such as model cards and risk/impact assessments.