The ChatGPT Lawyer Case: Mata v. Avianca
The short version of the story is that a lawyer used the services of a paralegal to look up supporting cases for suing somebody, the paralegal used ChatGPT, ChatGPT made stuff up, the lawyer signed off on the suit, it went to court, THEN IT GOT FOUND OUT.
The longer version of the story can be found at Inner City Press:
Plaintiff’s counsel Steven A. Schwartz, not admitted to SDNY, admits in an affidavit signed May 24 and docketed on May 25 that while “Peter Loduca, Esq, an associate for the law firm of Levidow Levidow & Oberman, PC became the attorney of record on the case” he, Schwartz, “continued to perform all of the legal work that the case required.”
…
As Schwartz formally puts it, “your affiant consulted the artificial intelligence website Chat GPT [and] did locate and cite the following cases in the affirmation in opposition submitted, which this Court has found to be nonexistent:
Varghese v. China Southern Airlines Co Ltd, 925 F.3d 1339 (11th Cir. 2019)
Shaboon v. Egyptair 2013 IL App (1st) 111279-U (Il App. Ct. 2013)
Petersen v. Iran Air 905 F. Supp 2d 121 (D.D.C. 2012)
Martinez v. Delta Airlines, Inc, 2019 WL 4639462 (Tex. App. Sept. 25, 2019)
Estate of Durden v. KLM Royal Dutch Airlines, 2017 WL 2418825 (Ga. Ct. App. June 5, 2017)
Miller v. United Airlines, Inc, 174 F.3d 366 (2d Cir. 1999).
Again, those cases do not exist. Schwartz writes that “the citations and opinions in question were provided by Chat GPT which also provided its legal source and assured the reliability of its content. Excerpts from the queries presented and responses provided are attached” to his affidavit.
The judge grilled the lawyer in the case on June 8th, last week, and… oof. I am not a lawyer, but I did not get the impression that this was one of the good interactions with a judge:
Judge Castel: Did you look for the Varghese case?
Schwartz: Yes. I couldn't find it.
Judge Castel: And yet you cited it in your filing.
Schwartz: I had no idea Chat GPT made up cases. I was operating under a misperception.
— Inner City Press (@innercitypress) June 8, 2023
The entire thread is jaw-dropping. Read the whole thing:
OK – now the Chat GPT case, Mata v. Avianca, sanctions against the lawyer(s) who filed a brief with non-existent or hallucinated cases. Inner City Press is covering the case https://t.co/sa0Kw08K3L and will live tweet, thread below pic.twitter.com/rWswMTcWv8
— Inner City Press (@innercitypress) June 8, 2023
Pretty soon the judges will be AI too, and will react “ah yes, Estate of Durden v. KLM Royal Dutch Airlines — very on point.”
My cringe muscles wore out.
ChatGPT has no DEPTH of understanding of serious fields. It has wonderful command of English and can make stuff up.
I figured this out in the first 5 minutes of using Chat, when I asked it to code up a trading engine. Everything was technically correct (meaning I think its code would compile) but structurally it was a disaster. It’s a tool for me, not a replacement for me.
These lawyers not only did no checking (zero) on Chat’s work, but this was probably the first time they’d ever used Chat… or they’ve never, ever, checked Chat’s work to see where it goes off the rails.
I don’t think this is a depth issue. It would be a depth issue if ChatGPT brought back ‘random’ cases or tendentiously related cases in support of its affidavit… instead it invented cases. That’s more of a personality disorder than a depth issue. If it were a depth issue, one might think it would get better with more practice; if it’s a personality disorder, I’m not sure what we should expect with more practice.
I’m no expert, but at a high level the problem of fake citations seems solvable for someone who wanted to make a law-centric LLM, using some combination of RLHF and a supplemental corpus — the LLM has to be trained that citations are not subject to pattern-matching but have to be pulled from a dataset of actual cases.
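A minimal sketch of what “pulled from a dataset of actual cases” could mean in practice, under loud assumptions: the KNOWN_CASES set and the helper names below are hypothetical stand-ins, not a real API; a real system would check against an authoritative reporter database (Westlaw, Lexis, CourtListener) rather than a hard-coded set.

```python
# Hypothetical sketch: validate model-generated citations against a
# corpus of real cases before they ever reach a filing. KNOWN_CASES
# is an illustrative stand-in for a real citation database.
import re

# One real case included as an example entry.
KNOWN_CASES = {
    "Zicherman v. Korean Air Lines Co., 516 U.S. 217 (1996)",
}

# Loose pattern for citation-shaped strings: "X v. Y ... (year)".
CITATION = re.compile(r"[A-Z][\w'.\- ]+ v\. [\w'.,\- ]+ \([\w. ]*\d{4}\)")

def unverified_citations(draft: str) -> list[str]:
    """Return every citation in the draft that is NOT in the corpus."""
    return [c for c in CITATION.findall(draft) if c not in KNOWN_CASES]

draft = ("Varghese v. China Southern Airlines Co. Ltd., "
         "925 F.3d 1339 (11th Cir. 2019).")
print(unverified_citations(draft))
# -> ['Varghese v. China Southern Airlines Co. Ltd., 925 F.3d 1339 (11th Cir. 2019)']
```

The regex is beside the point; the idea is that generation would be gated by a lookup against real data instead of trusting the model’s pattern-matching.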
Then it would become a depth problem.
So there are a few ways to look at this cautionary tale:
1. The LLM will just get better as we add more parameters to its learning model.
2. As we add more parameters (like from GPT-1 to 4), will it reach a point where we can’t quite tell where it needs a better model? Which is to say, v.4 passed a legal ‘sniff test’ even if it failed at what a competent lawyer is expected to validate.
3. If/when we reach that point, how will we know whether it’s serving up ‘knowledge’ or substituting ‘gnosis’?
4. And here I’m not even talking about Super Intelligence or consciousness.
That doesn’t solve anything. It would just start making up the contents of actual cases instead of making up the contents of fictional cases.
Everyone here is wildly overestimating how these things work. They have no knowledge whatsoever except which words look good next to other words. They don’t even vaguely comprehend any aspect of what they are saying; they don’t understand what the different nouns refer to, or what they are saying those nouns are doing, or anything.
It just turns out that words looking good next to other words is how humans generally score communications.
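To make that concrete, here is a deliberately crude toy: a bigram model that picks each next word purely by how often it followed the previous word. It is nowhere near an actual LLM, and every name in it is made up for illustration, but it shows fluency with no model of truth at all.

```python
# Toy bigram "language model": sample each next word by co-occurrence
# counts from a tiny training text. Fluent-looking output, zero
# knowledge of whether the sentence is true.
import random
from collections import Counter, defaultdict

corpus = "the court held that the airline owed a duty of care".split()

# Count which word follows which in the training text.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev):
    """Sample a next word weighted only by how often it followed prev."""
    words, weights = zip(*follows[prev].items())
    return random.choices(words, weights=weights)[0]

word, out = "the", ["the"]
for _ in range(6):
    word = next_word(word)
    out.append(word)
print(" ".join(out))  # e.g. "the airline owed a duty of care"
```

An actual LLM conditions on far more context with a neural network rather than raw counts, but the objective is the same family: score which word looks good next.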
My advice to every single other legal group out there that has had a case against this firm in the last two years: Go back and double-check that the cases exist.
My advice to every single legal group out there that has had a case against anybody in the last year: Go back and double-check that the cases exist.
This appears to be a personal injury lawyer whose state lawsuit was removed to federal court, where I believe he was not admitted to practice, though he’s practiced law for 30 years. His firm doesn’t have a subscription to a legal research service that covers federal law. Since the injury occurred in flight between El Salvador and New York, the case implicated the Montreal Convention on aviation liability, and an additional issue was raised about the effect of the stay issued under federal bankruptcy law on the running of the limitations period.
This mainly looks like a lawyer who initially thought he was handling a standard PI case and suddenly found himself outside his area of experience (and his group’s support). He relied on technology that he didn’t understand to bridge that gap. But any lawyer who didn’t check an opposing party’s legal authority would mostly just be exposing their own incompetence.
Would this level of incompetence qualify as “gross”?
“A gross is a unit of measurement equal to 144 units,” ChatGPT, Esq.
I think a lot of people overestimate the amount of difficult legal research required in typical lawyering. Most PI cases are largely fact-driven within a framework of legal principles that are repeated from court opinion to court opinion. In those cases, if a lawyer cites a court opinion for a well-known proposition (“airlines have a duty of reasonable care to their customers”), probably few would check to see if the case existed or if it’s being misrepresented. If a court opinion is presented for a novel or highly determinative legal proposition (“airlines have no duty of care to their customers”), I can’t imagine the opponent not reading it.
Generally speaking, about half the cases cited even in good briefs can safely be ignored, usually because all they’re cited for is uncontested general propositions. It’s still a good idea to check them. Recently, an opponent of mine cited some cases for proposition X, which was true. But the real issue was the closely related proposition Y. I checked the cases and was able to say, “As the very cases plaintiff cites make clear, [proposition Y].”
The lawyers were probably tired of writing the same opposition time and time again and decided to take a shortcut.