About This Session
Large Language Models (LLMs) have transformed how businesses automate complex workflows.
At Block, Inc., we've integrated LLMs deeply into our operational fabric, automating critical risk operations tasks with significant business impact.
However, deploying LLMs to production is only the beginning: continuously evaluating their effectiveness and maintaining visibility into their performance present significant challenges.
This talk provides a deep dive into practical frameworks and methodologies for evaluating, monitoring, and improving LLM-based applications at scale.
We'll explore:
Robust prompt engineering: Techniques for designing, testing, and iterating on prompts to maximize impact.
Evaluation frameworks: Leveraging LLMs themselves as "judges" to measure the quality and effectiveness of applications (a minimal sketch follows this list).
Continuous performance monitoring: Strategies to track LLM effectiveness over time, identify performance drift, and proactively address degradation (a second sketch follows this list).
Observability and user impact: Using telemetry data, session replay, click tracking, and A/B testing to measure real-world usage and value.
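To make the "LLM-as-judge" idea concrete, here is a minimal sketch of the pattern. It assumes the `openai` Python client and an illustrative judge model (`gpt-4o-mini`); the rubric, the `judge` helper, and the tiny evaluation set are hypothetical, not Block's actual framework.

```python
# Minimal LLM-as-judge sketch. The judge model, rubric, and client
# usage below are illustrative assumptions, not Block's actual stack.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """\
You are grading the output of an LLM application.

Task given to the application:
{task}

Application output:
{output}

Rate the output from 1 (unusable) to 5 (excellent) for correctness
and completeness. Reply with the number only."""

def judge(task: str, output: str, model: str = "gpt-4o-mini") -> int:
    """Score one application output with an LLM judge."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic grading reduces score variance
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(task=task, output=output)}],
    )
    return int(response.choices[0].message.content.strip())

# Example: average judge scores over a small evaluation set.
eval_set = [
    ("Summarize this dispute in one sentence.",
     "Customer disputes a $40 charge from an unrecognized merchant."),
]
scores = [judge(task, output) for task, output in eval_set]
print(f"mean score: {sum(scores) / len(scores):.2f}")
```

Pinning temperature to 0 and asking for a bare number keeps scores comparable across runs; a production framework would also validate the reply and average over multiple samples.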
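And a toy illustration of the drift-monitoring idea: compare a recent window of judge scores against a baseline window and flag degradation when the mean drops. The window sizes, threshold, and `detect_drift` helper are illustrative assumptions, not a production alerting system.

```python
# Toy drift check over a stream of judge scores. Thresholds and
# window sizes are illustrative assumptions.
from statistics import mean

def detect_drift(scores: list[float],
                 baseline_window: int = 50,
                 recent_window: int = 20,
                 max_drop: float = 0.3) -> bool:
    """Return True if the recent mean score fell more than
    `max_drop` below the baseline mean."""
    if len(scores) < baseline_window + recent_window:
        return False  # not enough history to compare yet
    baseline = mean(scores[:baseline_window])
    recent = mean(scores[-recent_window:])
    return baseline - recent > max_drop

# Example: scores drop from ~4.5 to ~3.9, which trips the alert.
history = [4.5] * 50 + [3.9] * 20
print(detect_drift(history))  # True
```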
Speaker
Suraj Jayakumar
AI Scientist - Block, Inc.
An experienced machine learning practitioner in the fintech industry who has led critical modeling initiatives at Venmo and Cash App.
Currently driving AI-powered automation at Cash App, focused on scaling and optimizing Risk Operations through advanced LLM-based solutions.