ADRIAN MA, HOST:
Five Canadian news outlets are suing OpenAI. That's the owner of the generative AI software ChatGPT. And these news outlets claim OpenAI violated copyright law by using their articles to train its large language model. Now they're seeking damages that could amount to billions of dollars. OpenAI defends itself by saying it relies on publicly available data, and it abides by international copyright principles. To help unpack all this a bit, we've called Pina D'Agostino. She's a law professor at York University in Toronto, Canada. Pina, thanks for taking the time.
PINA D'AGOSTINO: Thanks for having me.
MA: So I just sort of gave the gist of the legal challenge here. But could you briefly break down why this is such a big deal?
D'AGOSTINO: Well, it's a big deal because we're talking about valuable content and who has the right to access and own that content. And so it's about issues that we've had for a long time now, ownership, access and control to valuable works. So training data - well, that's great to say training data, but it's actually about newspaper articles and different pieces that authors have written and published in these leading publishing houses that are now being cannibalized, if you will, digested by the large language models of OpenAI.
MA: What do you make of OpenAI's argument that it's abiding by, you know, fair use and other international copyright principles?
D'AGOSTINO: Well, of course, it's going to say that. But in actuality, they were trying to license and pay for it, but they just didn't reach an agreement or an amount they were willing to go with. So they decided to keep using the works, the content because they needed it. So I actually don't buy it. If it's valuable enough to train these large language models, it's also worthy of some compensation.
MA: What do you think some of the hurdles will be for the news organizations in this case?
D'AGOSTINO: Well, you already mentioned it - right? - the fair use, fair dealing in Canada. We have fair dealing. And so whenever there's copyright infringement, you can use a fair dealing defense, saying that OpenAI, in this case, is allowed to use these valuable works because, you know, they could say that it's for research. It's to train. It's to educate the models. And all of these are grounds that can constitute fair dealing and therefore an exception which would not attract compensation.
MA: You know, OpenAI has faced a number of similar challenges to this from artists and other news organizations.
D'AGOSTINO: Yeah.
MA: And we even see the issue of artificial intelligence come up in other professions, like the world of acting or the auto industry. You're someone who's been following this closely from a legal and sort of ethical perspective. How are you thinking about ways that AI can be developed in a way that is responsible, and it also kind of enables people to create things?
D'AGOSTINO: That's a great question. And I really believe that we have the mechanisms in society. We just need to go by them. So licensing - responsible AI starts with licensing. Why not license the content? And it's just a matter of agreeing on a price because by not allowing a license to take place, it's saying that it's worth nothing or that the authors' works, the publishers' works are worth nothing. But obviously, it is worth something very valuable to OpenAI because without it, they couldn't do anything. And it's also about the good quality training data - so garbage in, garbage out. In this case, they want, like, not garbage. They want, like, the good stuff, and this is the publishers' stuff. So you need to pay for that.
MA: We've been speaking with Pina D'Agostino, who's a law professor at York University in Toronto. Pina, thanks so much for being here.
D'AGOSTINO: My pleasure. Thanks for having me.
MA: NPR reached out to OpenAI for comment about the lawsuit and did not receive a response before this broadcast.
(SOUNDBITE OF MUSIC)
Transcript provided by NPR, Copyright NPR.
NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.