Operationalizing Agentic AI Safety & Evaluation for Multi-Agent Financial Systems

Video by FINOS via YouTube
Operationalizing Agentic AI Safety & Evaluation for Multi-Agent Financial Systems

Vincent Caldeira (Field CTO at Red Hat) and Valentina Rodriguez Sosa (Principal Architect at Red Hat) map out a comprehensive, technical architecture for deploying multi-agent AI safely into production within regulated financial environments. They cover evaluation-driven development (EDD), open telemetry trace analysis, guardrailing economics, and automated red-teaming.

🇬🇧 Join us in London! Catch the latest on Agentic AI and DevSecOps at OSFF London on June 25, 2026: https://hubs.ly/Q041YV9Z0 (Use Code: 26YTOSFFLN20C)

🕒 Timestamps:
0:00 Introduction: System Behavior vs. Component Safety
0:50 Strategic Context: The Financial Interest in AI Agents
1:32 Architectural Differences: Traditional BPM vs. Non-Deterministic Multi-Step Workflows
2:05 Intent-Based Orchestration and Self-Correction Loops
2:36 The AgentOps Life Cycle: Building for Autonomy
3:05 Evaluation-Driven Development (EDD) Explained
3:34 Practical Dev Cycle: Executing the Harness Inner/Outer Loops
4:26 Telemetry Foundations: Using OpenTelemetry Standards
4:47 Capture Strategy: Generating Trace Telemetry for LLM Calls & Tools
5:24 Emphasizing Trajectory Validation Over Final Output
5:37 Managing Statistical Fat Tails in Non-Deterministic Systems
6:30 LLM-as-a-Judge: Reviewing Chain-of-Thought Decisions
7:02 FINOS Case Study: The "Finite Agent" Earnings Call Analysis Workflow
8:10 Operationalizing Workloads and the OWASP Top 10 for LLMs
9:24 Software Supply Chain Trusted Provenance for AI Artifacts
9:52 Guardrailing Architectures: Content Compliance and Cost Reduction Economics
11:43 Security Control: Signing Artifacts and Models with Sigstore
12:42 Automated Red-Teaming at Scale: Deploying Garak for Adversarial Testing
13:45 Closing Summary: Bridging Safety and Innovation

📊 The Problem: The Statistical Fat Tail of Non-Deterministic Agents Traditional financial software relies on deterministic step-based pathways managed by standard Business Process Management (BPM) systems. Multi-agent systems, however, utilize intent-based orchestration—allowing models to dynamically pick loops, leverage system tools, and self-correct on the fly. This introduces a massive architectural risk: because agents are non-deterministic, they cannot be completely validated through traditional testing. A single prompt deviation could trigger an unpredictable execution trajectory, leading to regulatory failure, data liability, or runaway compute costs.

🏗️ The Solution: Evaluation-Driven Development & Telemetry Architectures
Vincent and Valentina detail an end-to-end operational framework built explicitly to mitigate non-deterministic risks:
* Evaluation-Driven Development (EDD): Shifting testing to evaluate the complete trajectory (the sequence of agent thoughts and tool calls) rather than just checking the final output.
* OpenTelemetry Trace Baselines: Instrumenting agents to produce uniform open-telemetry trace logs for every tool engagement and LLM inference, serving as the debugging foundation for LLM-as-a-Judge validation architectures.
* Automated Adversarial Testing (Garak): Replacing finite human testing schedules with automated open-source red-teaming pipelines to run up to 70,000 statistical execution paths—stress-testing the system for prompt injection, shell breaking, and PI leakage.

⚙️ Why This Matters for Financial Engineering
* Guardrailing Cost Economics: Implementing input/output guardrails acts as an operational defense line—blocking malicious or redundant text blocks to significantly reduce institutional token expense overheads.
* Cryptographic Attestation (Sigstore): Enforcing cryptographic supply-chain signing on data pipelines and model configurations ensures verifiable provenance across all deployment environments.

🌐 More about FINOS: https://www.finos.org/
📧 Join our newsletter: https://www.finos.org/sign-up
🎙️ Listen to our Open Source in Finance Podcast: https://www.youtube.com/@FINOS/podcasts
LinkedIn: https://www.linkedin.com/company/finosfoundation

#FINOS #OSFFToronto #RedHat #AgenticAI #LLMOps #AgentOps #OpenTelemetry #DevSecOps #Sigstore #Garak #ResponsibleAI

Source