Can QA Be Automated with ChatGPT-Type Technology?
Manual QA programs were invented to solve the Metrics Trust Gap. With the rise of AI-powered tools like ChatGPT, there’s growing interest in automating QA processes. But can machine learning truly replicate the objectivity and sensibility of human QA analysts? To answer this, we conducted an experiment comparing ChatGPT’s ability to grade customer service tickets with that of a human QA analyst.
Are people in your company asking whether QA can be automated with ChatGPT-type technology?
What percentage of QA do you think can be automated with ChatGPT-type technology?
Experiment Design
ChatGPT may be able to automate parts of the QA process. The open question is whether it can match the objectivity and sensibility of a human QA analyst.
In our experiment, we aimed to investigate this question by tasking ChatGPT with grading 200 tickets based on a single question: "Did the agent demonstrate active listening?" We then compared ChatGPT's results with those of a human QA analyst.
Why Active Listening?
Active listening was chosen as the focus for the following reasons:
It is fair and applicable across companies.
It doesn't require knowledge of internal systems or company-specific data.
GPT has no access to those systems, so it cannot grade against them.
It remains a complex and interesting criterion to evaluate.
Phase 1: Data Collection and Cleaning
Accurate test results depend on the quality of the underlying data, so every conversation must be properly labeled, formatted, and anonymized before grading.
Our data collection involved five steps:
Assembled a small group of test customers
Selected one Yes/No question from each Scorecard that had at least 200 scores
Focused on chat and email conversations
Anonymized and removed any sensitive information
Cleaned the data to exclude "internal" notes, chatbots, and other non-customer-agent messages
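The cleaning and anonymization steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the experiment: the Message shape, the author_type values, and the two regular expressions are all assumptions, and real anonymization would need to cover far more than emails and phone numbers.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message shape; field names and values are assumptions.
    author_type: str  # e.g. "customer", "agent", "bot", "internal_note"
    text: str


# Deliberately simple patterns, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def clean_conversation(messages: list[Message]) -> list[Message]:
    """Keep only customer and agent messages, with sensitive details redacted."""
    kept = []
    for m in messages:
        if m.author_type not in ("customer", "agent"):
            continue  # drop internal notes, chatbot turns, and similar
        kept.append(Message(m.author_type, redact(m.text)))
    return kept
```

Filtering by author type before redacting keeps internal notes out of the prompt entirely, so they never reach the model even in redacted form.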
Phase 2: Example Prompts
GPT, as a text prediction tool, relies on high-quality prompts to generate accurate answers.
Our prompt consisted of four parts:
Generic context
Question context
Specific examples of good and bad active listening
The conversation to grade
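One way to picture the four parts is as sections joined into a single prompt string. The sketch below is an assumption about layout only: the section titles, the example format, and the closing instruction are hypothetical and are not the wording used in the experiment.

```python
def build_prompt(
    generic_context: str,
    question_context: str,
    examples: list[tuple[str, str]],
    conversation: str,
) -> str:
    """Join the four prompt parts into one string, in the order listed above."""
    # Each example is a (label, snippet) pair, e.g. ("good", "...").
    example_lines = [f"[{label}] {snippet}" for label, snippet in examples]
    sections = [
        ("Context", generic_context),
        ("Question", question_context),
        ("Examples", "\n".join(example_lines)),
        ("Conversation to grade", conversation),
    ]
    body = "\n\n".join(f"{title}:\n{content}" for title, content in sections)
    # Hypothetical closing instruction to force a Yes/No grade.
    return body + "\n\nAnswer with exactly one word: Yes or No."
```

Placing the conversation last keeps the fixed parts identical across tickets, so only the final section changes from one grading call to the next.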
Objective of Experiment
Determine whether the agent displayed active listening skills, which allow the agent to offer personalized recommendations and identify opportunities for customer success.
Challenges in Prompt Engineering
During the experiment, we encountered several challenges, such as the limitations of prompt size and the risk of over-specification, which could lead to errors or hallucinations by the AI.
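The prompt-size limit forces a trade-off between the number of examples and the room left for the conversation itself. A minimal sketch of one way to handle it is below. It assumes a character budget as a stand-in for the model's real limit, which is measured in tokens, and the function and parameter names are hypothetical.

```python
def fit_examples(examples: list[str], fixed_parts_len: int, max_chars: int) -> list[str]:
    """Keep examples, in order, until the next one would exceed the budget.

    fixed_parts_len is the combined length of the parts that cannot be
    trimmed (contexts plus the conversation to grade).
    """
    remaining = max_chars - fixed_parts_len
    kept = []
    for example in examples:
        cost = len(example) + 1  # +1 for the newline separator
        if cost > remaining:
            break
        kept.append(example)
        remaining -= cost
    return kept
```

Trimming examples rather than the conversation keeps the text being graded intact, at the cost of giving the model less guidance on long tickets.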
Phase 3: Experiment Results
The study assessed GPT's performance in grading 200 tickets, comparing its answers to those of a human grader.
After initial testing, we ran 10 iterations to improve the prompts.
The final results showed 58% alignment between GPT and the human grader.
Unfortunately, this level of alignment was not high enough for pilot customers to integrate it into their QA programs.
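The article does not define how alignment was calculated. If it means simple per-ticket agreement, 58% of 200 tickets corresponds to about 116 matching grades. A minimal sketch of that calculation, under that assumption and with made-up labels rather than data from the experiment:

```python
def agreement_rate(model_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of tickets where the model and the human gave the same grade."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not model_labels:
        raise ValueError("no labels to compare")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


# Illustrative labels only, not results from the experiment.
model = ["Yes", "No", "Yes", "Yes", "No"]
human = ["Yes", "Yes", "Yes", "No", "No"]
print(agreement_rate(model, human))  # 0.6
```

Raw agreement does not account for matches that would occur by chance on a Yes/No question, so it is best read as an upper bound on how well the two graders actually agree.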
Phase 4: Exploring Future Possibilities
Potential Follow-Up Ideas
The results do not imply that GPT cannot work. There is still a lot to explore:
GPT needs further prompt testing, though more complex prompts can introduce errors.
High-quality automated QA requires third-party vendor customization, and the effectiveness of ChatGPT for this is uncertain.
Johnny Appleseed may need more resources for AutoQA, and the effectiveness of a GPT-based approach is unknown.
Optimal results may require a combination of narrow models and expert systems.
Data cleaning is crucial for accuracy and security.
Conclusion: The Ongoing Battle Between AI and Human QA
While ChatGPT shows potential in automating QA processes, the current technology falls short of replacing human analysts. As AI continues to evolve, businesses will need to explore new strategies and tools to meet the demands of high-quality customer service.
If you're interested in learning more about ChatGPT and its potential for automating QA processes, request a demo today. Stay updated on the latest trends in customer service technology by signing up for our CEO Series.