ChatGPT retook a practice CPA exam after failing the first time (not so different from the 50% of people who fail on their first attempt) and passed comfortably.
The major difference was that the original Accounting Today experiment used ChatGPT 3.5, while this latest experiment, outlined in a recent academic paper, used Version 4.0.
ChatGPT 4.0’s scores were:
AUD – 87.5%;
BEC – 85.7%;
FAR – 78%; and
REG – 82%.
The researchers, who include the academics behind another study that tested ChatGPT against accounting undergrad questions, first tested GPT 4.0 in a "zero-shot" scenario.
ChatGPT explains a "zero-shot" scenario as one where the model is provided with a prompt or a question, along with some high-level instructions or descriptions, but no explicit training on the specific task. The model relies solely on its pre-existing knowledge and general understanding to generate a response, without any additional fine-tuning or exposure to specific examples.
In this scenario, ChatGPT 4.0 performed a little better than Version 3.5 but still failed, with an average score of 67.8%. This is the equivalent of sitting down for the CPA exam without having done any real studying, relying instead on only what one already knows.
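As an illustration only, here is a minimal sketch of what a zero-shot call might look like through the OpenAI Python client. The model name, system prompt and sample question are placeholders assumed for this example, not the researchers' actual materials.

```python
# Minimal zero-shot sketch using the OpenAI Python client.
# The model name, system prompt, and sample question are illustrative
# placeholders, not the researchers' actual exam materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A company buys equipment for $100,000 with a five-year useful life "
    "and no salvage value. Under straight-line depreciation, what is the "
    "annual depreciation expense?\n"
    "(a) $10,000  (b) $20,000  (c) $25,000  (d) $50,000"
)

response = client.chat.completions.create(
    model="gpt-4",  # assumed model identifier
    messages=[
        # Zero-shot: only high-level instructions plus the question itself,
        # with no worked examples for the model to imitate.
        {"role": "system",
         "content": "You are taking a CPA practice exam. "
                    "Answer with the letter of the best choice."},
        {"role": "user", "content": question},
    ],
)

print(response.choices[0].message.content)  # expected answer: (b) $20,000
```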
The researchers then tried a "10-shot" scenario, where they first primed the AI with 10 sample accounting questions to provide subject matter training and get the AI used to thinking like an accountant. This test also included slight changes to the settings (which is possible when accessing 4.0 via the API versus the web client, which is how Accounting Today did its experiment) to eliminate randomness in the model's responses and reduce creativity.
In this scenario, ChatGPT 4.0 scored an average of 74.4% across all sections. The bot came up just short of the 75% needed to pass.
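A few-shot run with tightened settings can be sketched in roughly the same way: worked question-and-answer pairs are placed in the conversation ahead of the new question, and a sampling parameter such as temperature is set to zero to remove randomness. The two example items below stand in for the paper's ten, and both the items and the assumption that temperature was the setting adjusted are illustrative, not taken from the study.

```python
# Few-shot sketch: prior question/answer pairs are supplied as earlier turns,
# and temperature=0 makes the output deterministic rather than "creative".
# The example items below are placeholders, not the study's ten questions.
from openai import OpenAI

client = OpenAI()

examples = [
    ("Which financial statement reports a company's revenues and expenses "
     "over a period? (a) Balance sheet (b) Income statement "
     "(c) Statement of cash flows (d) Statement of retained earnings",
     "(b) Income statement"),
    ("Under accrual accounting, revenue is generally recognized when it is "
     "(a) collected in cash (b) earned (c) budgeted (d) invoiced",
     "(b) earned"),
    # ...the paper primed the model with 10 such items.
]

messages = [{"role": "system",
             "content": "You are taking a CPA practice exam. "
                        "Answer with the letter of the best choice."}]
for q, a in examples:
    messages.append({"role": "user", "content": q})
    messages.append({"role": "assistant", "content": a})

new_question = ("Inventory is reported on the balance sheet at "
                "(a) selling price (b) lower of cost or net realizable value "
                "(c) replacement cost (d) historical cost only")
messages.append({"role": "user", "content": new_question})

response = client.chat.completions.create(
    model="gpt-4",     # assumed model identifier
    messages=messages,
    temperature=0,     # removes sampling randomness
)
print(response.choices[0].message.content)
```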
Finally, the researchers used "chain of thought" prompting to further prime ChatGPT. Chain-of-thought prompting can be thought of as breaking a larger problem into several intermediate steps to reach the final answer, taking advantage of the bot's ability to remember things in the conversation and apply them to its responses, rather than treating every prompt as an independent event. Functionally, this is equivalent to studying before the exam.
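A rough sketch of the idea, again assuming the OpenAI Python client and a made-up depreciation question rather than the paper's prompts: a worked example spells out its intermediate steps, and the instructions ask the model to do the same before committing to a final answer.

```python
# Chain-of-thought sketch: a worked example shows intermediate reasoning steps,
# and the system prompt asks the model to reason the same way before answering.
# The wording and the example items are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system",
     "content": "Work through each exam question step by step, then state "
                "the final answer on its own line."},
    # Worked example demonstrating the reasoning pattern to imitate.
    {"role": "user",
     "content": "Equipment costs $100,000, has a five-year life and no "
                "salvage value. What is annual straight-line depreciation?"},
    {"role": "assistant",
     "content": "Step 1: Depreciable base = $100,000 - $0 = $100,000.\n"
                "Step 2: Annual expense = $100,000 / 5.\n"
                "Answer: $20,000."},
    # New question the model should solve with the same step-by-step pattern.
    {"role": "user",
     "content": "Equipment costs $90,000, has a salvage value of $10,000 and "
                "a four-year life. What is annual straight-line depreciation?"},
]

response = client.chat.completions.create(
    model="gpt-4",   # assumed model identifier
    messages=messages,
    temperature=0,
)
print(response.choices[0].message.content)  # expected final answer: $20,000
```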
It was this setup — using chain-of-thought prompting on a model that was previously primed with 10 accounting questions — that resulted in ChatGPT passing the practice exam with an average of 84.3% across all four sections.
“The results of our study demonstrate that ChatGPT can perform sufficiently well to pass important accounting certifications. This calls into question the ‘competitive advantage’ of the human accountant relative to the machine,” said the study’s conclusion. “To our knowledge, for the first time, AI has performed as well as a majority of human accountants on a real-world accounting task. This raises important questions of how will machine and accountant work together in the future. We encourage research to help understand where machine and human abilities are best deployed in accounting. We also encourage research that develops and invents the capabilities for machines to perform greater amounts of accounting work — freeing accountants to innovate and add greater value to their organizations and society.”
The paper also found that ChatGPT 4.0, run through the same 10-shot training and chain-of-thought prompting techniques, passed the exam for Certified Management Accountants with an average of 86.6%, the exam for Certified Internal Auditors with an average of 85.5%, and the test for Enrolled Agents with an average of 83.8%.
David Wood, a Brigham Young University professor and one of the main authors of both this paper and a previous paper that tested ChatGPT 3.5 against accounting class questions, told Accounting Today that the results show that accountants cannot afford to ignore AI.
“I am amazed and excited by how fast this technology is changing. So far, using ChatGPT in my own work has made me more productive and I enjoy it! It has allowed me to add creativity to my work and remove some of the mundane, boring parts of my job. The more I use this technology, the more I believe it is going to prove disruptive and change what we do as accountants and educators. My overall belief is that the changes will be positive, but I do think it will be a bumpy process implementing this technology into our work,” he said.