- Uncover business-critical issues hidden in support conversations
- Replace disconnected metrics with insights that drive action
- Equip teams across product, ops, and support with data that matters
- Connect the dots between customer experience and company outcomes
- Increase in CSAT: 30%
- Decrease in agent ramp time: 120%
- Increase in monthly coaching sessions: 300%
Company Overview
Betterment is a digital financial services company offering automated investing, retirement planning, and cash management. With a complex product and a fast-growing customer base, Betterment’s CX team needed more from their bots—better performance, stronger compliance controls, and real visibility into what was happening in every conversation.
The Challenge
Chatbots often serve as the first point of interaction between brands and customers. When their performance is subpar, it can negatively impact customer satisfaction. Natalie Langdale, Chatbot Strategy Manager at Betterment, understands the importance of getting chatbot performance right. In this article, we share Betterment's approach to Chatbot QA and the techniques used to boost chatbot accuracy, efficiency, and overall experience through MaestroQA.
Betterment’s rapid growth presented several challenges when it came to maintaining chatbot performance and customer satisfaction:
- Managing high ticket volumes: With a growing customer base, the number of inquiries increased significantly. Betterment needed their chatbot to handle basic queries while allowing human agents to focus on complex issues.
- Limited visibility in reporting: While Betterment’s chatbot platform, Ada, provided high-level metrics such as containment rate and automated resolution rate, the data was not granular enough to identify the root causes of issues or knowledge gaps in chatbot responses.
- High-level metrics that don’t show the full picture: The chatbot tool’s basic stats failed to give a complete understanding of where the bot was falling short, making it difficult to refine performance and help maintain consistent, accurate responses.
- High-stakes compliance: In a highly regulated industry, Betterment needed to ensure their chatbot was never offering inaccurate or unlicensed financial advice.
- Bot ≠ Agent: Despite being their top-volume support channel, the chatbot wasn’t held to the same quality standards as human agents.
The Turning Point
When Betterment began exploring generative models, the stakes rose. Their compliance team required assurance that AI responses were being monitored, reviewed, and improved over time.
To move forward with a generative bot, Betterment needed:
- A rigorous QA process that could evaluate every response for compliance and quality
- A way to spot risks fast and take immediate action
- A shared QA framework that worked across both scripted and generative bots
The Solution
Betterment implemented a robust Chatbot QA program with MaestroQA, allowing them to address both the performance and compliance challenges while gaining deeper insights into chatbot interactions.
- Building a Comprehensive QA Rubric: Betterment developed a tailored QA rubric to evaluate their chatbot on four critical areas:
  - Accuracy: Ensuring that the chatbot understands the user’s inquiry and responds with the correct information.
  - Completeness: Verifying that every inquiry is fully addressed.
  - Compliance: Ensuring that the bot adheres to strict financial regulations and does not provide misleading or incorrect advice.
  - Clarity: Maintaining a conversational tone that aligns with Betterment’s brand while being clear and concise.
- Using MaestroQA to Dig Deeper: While Ada provided general metrics, MaestroQA allowed Betterment to go beyond surface-level data. The platform enabled detailed analysis of chatbot conversations, helping identify specific gaps in knowledge and opportunities for improvement. By reviewing individual interactions, the team was able to improve chatbot responses in real time.
- Regulatory Compliance with RAG (Retrieval-Augmented Generation): To avoid compliance risks, Betterment implemented RAG technology. With RAG, the bot retrieves verified source material before answering and can cite it, reducing the likelihood of generating incorrect or inappropriate responses. The integration of MaestroQA further strengthened this by providing detailed feedback on the bot’s performance.
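To make the rubric concrete, here is a minimal Python sketch of how a reviewed bot conversation might be scored against the four areas above. Only the four criterion names come from the rubric; the pass/fail grading, the equal weighting, the rule that a compliance miss zeroes the score, and the names `BotReview`, `score_review`, and `failure_counts` are all hypothetical choices for illustration, not Betterment's or MaestroQA's actual implementation.

```python
from dataclasses import dataclass

# The four criteria come from the rubric described above. Everything else
# (pass/fail grading, equal weights, compliance auto-fail) is an assumption.
CRITERIA = ("accuracy", "completeness", "compliance", "clarity")


@dataclass
class BotReview:
    """One graded bot conversation (hypothetical structure)."""
    conversation_id: str
    passed: dict  # criterion name -> bool


def score_review(review: BotReview) -> float:
    """Return a 0-100 score; a compliance miss zeroes it (assumed rule)."""
    missing = [c for c in CRITERIA if c not in review.passed]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    if not review.passed["compliance"]:
        return 0.0
    met = sum(1 for c in CRITERIA if review.passed[c])
    return 100.0 * met / len(CRITERIA)


def failure_counts(reviews) -> dict:
    """Count how often each criterion was missed across a set of reviews."""
    counts = {c: 0 for c in CRITERIA}
    for review in reviews:
        for c in CRITERIA:
            if not review.passed[c]:
                counts[c] += 1
    return counts
```

Aggregating misses per criterion, as `failure_counts` does, is one simple way to see whether problems cluster around knowledge gaps (accuracy, completeness) or around tone and compliance.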
Using MaestroQA, Betterment:
- Evaluated bot performance with the same rigor as human agents
- Identified specific response issues and retraining opportunities
- Identified trends to update training content and improve outputs
- Connected bot and human CSAT for a holistic customer view
Impact
With a focus on both chatbot performance and compliance, Betterment achieved substantial improvements:
- Improved Visibility into Performance: The combination of MaestroQA and Ada allowed Betterment to move beyond high-level metrics, providing detailed insights that enabled continuous improvement of chatbot interactions.
- Better Compliance Management: RAG technology and regular checks helped maintain the chatbot’s adherence to legal standards, reducing compliance risks and increasing trust in chatbot responses.
- Accurate and Relevant Responses: Detailed analysis of conversations through MaestroQA allowed the team to fine-tune responses and close knowledge gaps, significantly improving the quality of chatbot interactions.
Since implementing Chatbot QA, Betterment has seen the following improvements to their chatbot performance:
- BSAT: +5%
- Containment Rate: +2%
- Automated Resolution Rate: +2%
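For readers unfamiliar with these metrics, the sketch below shows one common way to compute a containment rate and to express a change between two periods in percentage points. The definition used here (the share of bot conversations not handed off to a human) is a widely used convention and an assumption on our part; the source does not state how Ada defines the metric or whether the gains above are percentage points or relative changes. The function names are hypothetical.

```python
def containment_rate(total_conversations: int, escalated_to_human: int) -> float:
    """Share of bot conversations that were not handed off to a human agent.

    Assumed definition; a chatbot platform may compute this differently.
    """
    if total_conversations <= 0:
        raise ValueError("total_conversations must be positive")
    if not 0 <= escalated_to_human <= total_conversations:
        raise ValueError("escalated_to_human must be between 0 and the total")
    return (total_conversations - escalated_to_human) / total_conversations


def percentage_point_change(before: float, after: float) -> float:
    """Difference between two rates, expressed in percentage points."""
    return round((after - before) * 100, 2)
```

Tracking the change in percentage points rather than as a relative percentage keeps small movements in a rate easy to compare from one period to the next.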
Strategic Outcomes
Betterment’s QA insights didn’t just improve individual conversations—they drove strategic decisions:
- Compliance approval: RAG plus QA transparency enabled internal signoff for generative AI use in support
- Vendor switch: Persistent quality issues with Ada’s generative model—surfaced via QA data—led to Betterment onboarding a new hybrid vendor (Ultimate AI)
- Deeper root cause analysis: MaestroQA’s dashboards helped surface underlying friction drivers, especially in cases where containment dipped or negative feedback spiked
“Maestro helped us uncover what high-level metrics couldn’t. Without it, we wouldn’t have caught the consistency issues in Ada’s generative model—or had the data to justify moving on.” — Natalie Langdale
What’s Next
Betterment is now rolling out its new hybrid chatbot model and applying the same QA rigor from day one. With every conversation scored, documented, and acted on, they’re evolving bots from basic deflection tools into compliant, high-performing extensions of their CX team.