The discussions about artificial intelligence, and specifically ChatGPT, run the gamut from “It’s the answer to all the things” to “The machines are going to kill us all.” In my limited testing of its efficacy, I’ve found it is an efficient way to gather information when compared to manual searches, and it can write decent summaries of long blocks of text. I’ve talked to other people who have used it to help write or debug code, find recipes using specific ingredients, solve complicated math problems, and explain complex concepts by adding “explain like I am 5” to their query.
I’ve found it is not an eloquent writer by any stretch of the imagination. The few tests I’ve run required a good deal of editing to make the content interesting to read and pull the reader through the story. But as far as I can tell, the information it has pulled is accurate. I can foresee cases where it finds content that is out of date — or wrong on its face — and “unknowingly” returns false data. But so will any search engine you use.
Which is why a New York Times story a friend sent me recently stopped me in my tracks. In summary, an attorney named Steven Schwartz used ChatGPT to write a legal brief for a case he brought in federal district court. His client, Roberto Mata, was suing Avianca Airlines for injuries he sustained when a metal serving cart hit his knee on a 2019 flight from El Salvador to New York. The airline asked the court to dismiss the suit because the statute of limitations had expired, but Schwartz countered with the aforementioned brief, citing multiple court decisions supporting his client’s argument that the suit was still valid.
This is pretty standard operating procedure so far for a rather mundane lawsuit. But when the airline’s attorneys started digging into the 10-page brief, they could not locate the referenced cases, so the judge ordered Schwartz to provide copies. He returned to ChatGPT, requested the decisions and sent its response to the court.
Here’s where it gets very weird. It turns out none of the cases was real; they had been entirely fabricated by ChatGPT. The named cases, including Martinez v. Delta Air Lines, Zicherman v. Korean Air Lines and Varghese v. China Southern Airlines, simply did not exist, and the decisions ChatGPT supplied for them were filled with complete legal nonsense. ChatGPT generated not only fake case names but also the details of each case.
With this in my back pocket, I decided to ask ChatGPT, “Are you capable of lying?”
It responded, “As an AI language model, I don’t have intentions or emotions, so I don’t have the capability to lie or tell the truth. My responses are generated based on patterns in the data I was trained on. When I state that I don’t have the capability to lie, it means that I don’t have a personal agenda or the ability to intentionally deceive. However, as an AI, I can still produce incorrect or misleading responses if the information in my training data is flawed or if I misunderstand the context of the question. It’s important to understand that while AI language models like me aim to be as helpful and accurate as possible, we are not infallible and may not always provide perfect responses. Users should always verify critical information from reliable sources and exercise their judgment when using information from AI models.”
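If you’re curious, a query like mine can also be reproduced programmatically rather than through the ChatGPT website. Below is a minimal sketch using OpenAI’s publicly documented Python client; the model name and prompt are illustrative assumptions (I used the web interface, and answers will vary from run to run).

from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# Illustrative model name and prompt; the wording of the reply changes between runs.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Are you capable of lying?"}],
)

# Print the model's reply. As with any ChatGPT output, verify anything it claims.
print(response.choices[0].message.content)

Whichever way you ask, the point stands: the reply reads confidently whether or not it is accurate.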
This makes sense, since it is pulling from available data, which could be incorrect. However, it does not explain how it created a series of legal cases from whole cloth. In its efforts to “learn” and become a more useful tool, is AI acquiring the ability to lie?
Google AI researchers have said that large language models, like ChatGPT, can hallucinate, meaning they can return content that sounds plausible but is incorrect or nonsensical. Programmers, however, seem confident that they can overcome these hallucinations through better programming and by updating the neural network to immediately reward good responses and reject those that are incorrect. But will these changes stick? Or will the AI simply learn a new way to mimic human behavior and find more creative ways to lie?
Remember that ChatGPT was just introduced publicly in November 2022, so we aren’t even a full year into its use. But one thing we should take away from Mata v. Avianca Airlines is that we can’t simply trust that ChatGPT is correct all the time.
Businesses, including accounting firms, may want to place some rules around its use and require employees to double-check any data it returns in response to a query. It is not inherently good or bad, but a tool that we need to use properly … that is, until the machines really do take over.
(For more on protecting your firm from ChatGPT’s capacity to “hallucinate,” see “Don’t be a victim of AI hallucinations.”)