AI Litigation Update - Creative Law Center

AI Litigation Update

Dateline: July 23, 2023

Legal challenges to generative AI are on the rise. This post, an AI litigation update, is a snapshot in time looking at what is going on in two of the active cases:

  • Andersen, et al. v. Stability AI Ltd. The case brought by three artists, on behalf of themselves and a class of similarly situated individuals, against the AI image generator companies Stability AI, Midjourney, and DeviantArt. ("Artists' Class Action")
  • Getty Images (US), Inc. v. Stability AI. Getty Images claims Stability AI ‘unlawfully’ scraped millions of copyright protected images from its site to train its AI machine.

Andersen v. Stability: AI Litigation Round 1

The Artists' Class Action was brought in January 2023. In April, all the defendants filed motions to dismiss the case. A hearing on the motions was held a few days ago.

There was no ruling from the judge at the hearing, but he said he was inclined to toss "almost everything" in the case. His skepticism of the artists' claims seems to come from his belief that there is no "substantial similarity" between the artists' works and the output from the AI image generators.

Before I talk about why I think the judge's comments during the hearing miss the mark, let's quickly review what it takes to prove copyright infringement.

There are two ways to prove copyright infringement. The first is by showing direct copying. Direct copying in the context of copyright infringement refers to the act of exactly or very closely duplicating original, copyrighted material without the authorization of the copyright holder. Getting evidence of direct copying is difficult, which is why most cases rely on the second way of proving infringement.

The second way to prove infringement is by showing that the defendant had access to the original protected work and that the infringing work is substantially similar.

"Access" refers to the opportunity or ability of the alleged infringer to view, hear, or otherwise come into contact with the copyrighted work. If the alleged infringer had no reasonable opportunity to access the work, it is less likely that they could have copied it.

"Substantial similarity" refers to the degree to which the accused work resembles the copyrighted work. If the two works are substantially similar, it may tend to prove that copying occurred.

The Artists Allege Direct Copying, Not Access + Substantial Similarity

During the hearing Judge Orrick is reported to have said, "I don't think the claim regarding output images is plausible at the moment, because there's no substantial similarity between images created by the artists and the AI systems."

I don't know what the judge was looking at when he made that statement, but in her opinion piece in the New York Times, lead plaintiff Sarah Andersen included a visual comparison of her work with work generated by the AI image generator. As she says, "It’s not perfect — but it has captured the signature elements of my drawing style."

This is where I'm stuck. The judge's comment doesn't make sense to me. I don't think the case should turn on whether the output is substantially similar. Andersen alleges direct copying in her complaint.

Andersen v. Stability AI Complaint, excerpt.

If I go to the library and take out some books, then make copies for my home library and return the originals, the copies of those books in my house are a direct infringement. It can't be otherwise, regardless of what I do with those books.

The Artists' complaint alleges that is exactly what Stability did -- "Stability has embedded and stored compressed copies of the Training Images." It's direct infringement regardless of the output.

Without expert testimony and a clear understanding of how AI machine training works, I don't think the judge should dismiss this piece of the case. AI is complicated and technical. It deserves close, informed scrutiny.

How AI works. Slide from Creative Law Center February 2023 workshop: AI and Your Creative Work.

AI is a series of complex computer algorithms and programs that process and analyze (massive amounts of) data in order to identify patterns and make predictions. The second step, "process and analyze data," is what the AI companies want a free pass on in this case and in all AI litigation. They want to use all the data they can get their hands on.

Correction: They have used all the data they can get their hands on.

For the AI companies, there are two best possible outcomes in any AI litigation. The first best outcome is a dismissal of the case against them. The second best outcome is a finding that the "process and analyze data" step in their business model is considered to be fair use.

I don't see how a judge can dismiss a well-pled case without understanding how AI machines use the input data. That understanding can only come from expert testimony. And expert testimony comes during the discovery phase of the case and at trial.

The problem with the Artists' Class Action complaint is that the judge does not think it is well-pled. He wants more facts and specificity.

And, not for nothin' (to borrow a favorite phrase from Brian Tyler Cohen), to address the judge's concern that the output from the AI image generator is not substantially similar to the protected creative work: research has shown that image generators can output identical copies of images in their training data, not just images that are substantially similar.

Training data and exact copies produced from AI image generators.

What's Next for the Artists' Class Action

To be clear, this post only addresses the part of the Artists' Class Action complaint that alleges direct copyright infringement of Sarah Andersen's work. She is the only named plaintiff who has copyright registrations on her work. The other artists had not, at the time the complaint was filed, registered their work with the U.S. Copyright Office.

In the United States, if you want to enforce your rights in court, you must have a copyright registration on your creative work. Without a copyright registration, the infringement claims of the other artists should get dismissed.

The judge did signal that while he is likely to dismiss most of the case, he will give the artists the opportunity to amend their complaint to address some of its deficiencies. We need to think of this complaint as a first pancake, an initial attempt that is not going to turn out as well as the next one.

Getty v. Stability: AI Litigation Round 0

Getty's lawsuit against Stability for copyright infringement (and other claims like trademark tarnishment) has gotten bogged down in procedural and jurisdictional matters that aren't particularly interesting. It's as though Stability is playing a corporate shell game, "You sued the wrong entity. We're Stability Ltd., not Stability, Inc."

Stability is trying to get the case either dismissed or moved out of Delaware to California where the Artists' Class Action is pending. Getty has spent the time since filing its complaint in March 2023 trying to sort these preliminary issues out.

Getty's case, on the face of its complaint, is stronger than the Artists'. Getty has filed for copyright registrations on thousands of the images that it licenses. Getty also has a copyright registration on its database of images. The database includes all the image metadata, containing information like the alt-text, or description of the image, that tells the AI machine what it is looking at.

PowerPoint slide revealing alt-text.

The above image of the woman with a magnifying glass was licensed from Adobe and used in a PowerPoint deck. PowerPoint used AI to generate the alt-text description of the image.

Getty can prove that Stability used its images and database to train its AI machine because examples of the output, the AI-generated images, contain the Getty watermark.

AI-generated image with Getty logo.

Getty included this AI-generated image in its complaint. It's ugly, but it is clearly using information gleaned from the Getty database of images. The grotesqueness of the image supports a claim for trademark tarnishment, according to Getty.

A Final Word

There are lots of open questions on AI-generated content, not the least of which is how the Supreme Court's recent decision on fair use in Warhol v. Goldsmith will play out in these cases.

AI litigation is not like an algebra text. There are no answers in the back of the book. Perhaps there will be some answers from these two cases. It's worth keeping an eye on them.

I'll continue to provide updates. Sometimes, however, my updates come in the form of emails. Join my email list by downloading the Fair Use Guide if you'd like to receive those updates. 

About the Author

Kathryn Goldman helps small business people, writers, artists, and creative professionals make a living from their creative work by teaching them how to protect and enforce their rights. She is an attorney who writes these posts to help you be more thoughtful about intellectual property and the law as you build your business, write your stories, and create your art.

  • I wondered why they didn’t include that AI image shown in the NYT piece, and it’s possible they didn’t because it was actually obtained by uploading one of Sarah’s original works in order to generate it.
    That’s exactly what happened with Kelly McKernan, they’d made a FB and Twitter post showing images that were pretty similar to one of their paintings – although I’d argue the style doesn’t really look anything like theirs beyond it being watercolor – claiming it was generated using just a prompt in MidJourney with their name in it.
    However, when I tested the exact same thing I got nothing like that in MidJourney, in all versions none of the images with that prompt looked anything like their work.
    I asked on their FB post, showing my tests with the very different images, if they were sure their original wasn’t uploaded / linked in order to get something like that. After they checked with the person who provided that image result to them, it turned out that yes, they did indeed upload / link to Kelly’s original image, which is the only reason why it looked so similar.
    Someone would have to force it that way to get something at all like their work.

    I have to wonder if that same person provided the generated image shown in the NYT article by Sarah Andersen without at first disclosing that they used her original work to get that result too. Maybe the attorney had already been contacted, with plans in motion to file, before they found out the images were not in fact achieved with only a prompt, because that example is not in the Exhibits filed.

    The person who provided Kelly with the forced image admitted to that in a Twitter thread responding to me, also saying they ‘smoke a lot of weed i don’t remember this shit’ when responding to some of the details I was asking them for.
    If you look up a tweet thread from user @Zn2plusC that says ‘none of these uses img2img’ that shows a collection of images that don’t look like any of Kelly’s works, you’ll see the reply that says ‘these did though’ and those are the images that Kelly had shared at first incorrectly stating they were made with a prompt only.

    Kelly also has said some things that may end up helping the defendants & hurting their own case, such as tweeting in response to a Medium article about AI art that used their name to generate Mona Lisa images in Kelly’s style, saying “this isn’t even how I’d paint the Mona Lisa”, which contradicts their argument that these models are copying or stealing their original painting styles.

    Question is, are examples like those shown in your blog post submittable as evidence if they don’t include the lead plaintiffs’ work being copied? I thought class actions were supposed to be based on lead plaintiffs with the best examples of what the lawsuit is claiming, but is that correct?
    The examples were rare instances. The researchers used the 350,000 most duplicated images in the dataset out of the billions in it (since humans duplicate images all over the web, there are popular ones that will show up a lot more), generated 175 million images from those 350K images, and that turned up 94 images that were similar out of the 175 million.
    Definitely not something happening in the vast majority of uses, but then we don’t know how much the output is going to be focused on now as infringing or just the input for training.

    If the judge wants the specific works infringed to be pointed out as he seemed to be requesting, it seems they’d need to show proof these AI models are copying specific works in order to show it’s actually there in the model unless that’s turned up in discovery, if it makes it that far. Especially regarding MidJourney & DeviantArt since they didn’t make the LAION dataset if they’re going for infringement at the training stage.

    I did tests with Sarah’s name as well and the results also didn’t look like her work except for being black & white illustrations of people with big eyes, but they look more like Tim Burton’s style than hers.
    That might be enough to show she’s in their dataset simply because of the B&W aspect with big eyes, but if it needs to be tied to specific works then it might not.

    • One of the things it seems that the judge wants is information about what the AI machines were trained on. If the artists who are plaintiffs in the class action uploaded their own artwork in order to generate an AI image that is substantially similar to their own, that’s a problem. Outside of the information contained in the LAION dataset, AI companies aren’t sharing what their machines have been trained on. So, if the artists did not upload their own images and if they are able to generate images that closely resemble their style, I would argue that that is enough to move forward into the discovery phase of the case to obtain information about the actual training data.

      • I agree that it would be enough to move to discovery if indeed they generated images resembling their work without uploading the originals to force it.
        Being given falsified evidence by an irresponsible person and then finding out it was done in a way that would discredit it, is the only reason I can fathom why they didn’t include such evidence in their claim if they had it.
        If that’s true then would they have to bring in a new plaintiff that actually does have evidence of copies to save the case?

  • Paul Pearson says:

    To me, the samples are clear-cut copies. There seems to be little difference. Ask Joe Public what the difference is, and I think they would say, “they are copies”, which says it all.

    • Yeah, those definitely look like copies, but none of them are copies of any of the plaintiffs’ works. Do the examples have to be those belonging to the lead plaintiffs? I thought they did, but I could be wrong.

      • The examples of the infringing copies would not have to be generated by the named plaintiffs in order to be used as evidence. But there has to be enough information about how they were generated in order to authenticate them and admit them into evidence.

        • So the copies wouldn’t have to be generated by the named plaintiffs, but would they need to be copies of the plaintiffs’ works? Or could they use a generated copy of anyone’s work as their evidence?
          I thought the plaintiffs would need to show copying of their own work in order to be the lead plaintiffs in a class action, but is that not true?
