© 2025 The Build. All rights reserved.


Demos Are Misleading

Tom Spencer · Episode 8 (Audio Only) · Category: Points of View

Evaluating LLMs based on a few online demos is unreliable because single examples don’t capture a model’s varied behaviors across tasks.
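One way to make this point concrete is to ask how much a handful of successful demos actually tells you about a model's underlying success rate. The sketch below is an illustration, not something taken from the episode: it uses the standard Wilson score interval, and the function name `wilson_interval` and the example counts (3 of 3 demos, 180 of 200 evaluation cases) are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate.

    z = 1.96 corresponds to roughly 95% confidence.
    Hypothetical helper for illustration only.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Assumed example counts: three impressive demos vs. a broader evaluation set.
few_demos = wilson_interval(3, 3)
broad_eval = wilson_interval(180, 200)

print(f"3/3 demos:     {few_demos[0]:.2f} to {few_demos[1]:.2f}")
print(f"180/200 cases: {broad_eval[0]:.2f} to {broad_eval[1]:.2f}")
```

Under these assumptions, three flawless demos remain statistically compatible with a success rate below 50%, while the larger sample narrows the range considerably. The interval covers sampling noise only; it does not account for demos being hand-picked or drawn from a single task, which is the further concern raised here.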

Segment: 2

Start Time: 17:48