- Uncover business-critical issues hidden in support conversations
- Replace disconnected metrics with insights that drive action
- Equip teams across product, ops, and support with data that matters
- Connect the dots between customer experience and company outcomes
Increase in CSAT
30%
Decrease in agent ramp time
120%
Increase in monthly coaching sessions
300%
CHALLENGE
Limited Insight Into Chatbot Accuracy and Risk
Betterment’s chatbot vendors provided only high-level, black-box metrics. The team could see outcomes but never the reasons behind them: there was no visibility into missed intents, flawed answers, or whether the bot included required disclosures.
To fill these gaps, the team ran manual, multi-iteration sprints to analyze bot performance, but each cycle was time-consuming and narrow in scope, offering little confidence in outcomes.
Without a transparent, scalable way to evaluate accuracy, detect hallucinations, or understand where models broke down, leaders couldn’t assess chatbot readiness or ensure customer safety in a regulated environment.
SOLUTION
A Transparent Framework to Analyze Chatbot Models
Betterment used Maestro to establish a clear, consistent framework for evaluating how their scripted and generative chatbots performed. The team gained transparent visibility into accuracy gaps, disclosure issues, hallucination risk, and how reliably each model followed required guardrails.
With this insight, Betterment could pinpoint where responses broke down, understand what needed refinement, and improve models far more efficiently.
Maestro provided a unified, scalable way to measure model behavior, giving the team confidence in their decisions about chatbot readiness and improvement.









