Founder of Midjourney admits the obvious about image use consent
David Holz, creator of the now very popular and controversial AI image-generation platform Midjourney, has admitted that his company never obtained consent for the hundreds of millions of images it fed to its AI to train it in image generation.
Predictably, this has outraged many photographers and other creators even more than Midjourney's mere existence had already angered some of them.
Holz's admission went viral after Twitter users shared an interview with the entrepreneur that Forbes Magazine conducted in September of this year.
At one point in the Q&A with the well-known business magazine, the Midjourney founder was asked whether he had sought consent from living artists, or from the owners of work still under copyright, before using their images for AI training.
Holz bluntly answered, “No. There isn’t really a way to get a hundred million images and know where they’re coming from.” This of course should have been obvious.
At the scale on which Midjourney needed existing images to learn its techniques, gathering enough of them could only have been done through mass scraping or something similar.
Gaining permission from the creators of so many different photos and other visuals would indeed have been extraordinarily difficult if not impossible.
The Midjourney creator further elaborated, “It would be cool if images had metadata embedded in them about the copyright owner or something. But that’s not a thing; there’s not a registry.”
He also said, “There’s no way to find a picture on the internet, and then automatically trace it to an owner and then have any way of doing anything to authenticate it.”
Even if such a registry did exist, the sheer number of images involved, likely in the hundreds of millions, would make seeking consent from each image’s creator a task worthy of the pyramid builders.
Despite these logistical hurdles, the backlash against Holz’s admission from some corners of artistic Twitter shouldn’t have surprised anyone.
words of David Holz (midjourney founder), from forbes article (link below): pic.twitter.com/rnWP28rrag
— Maciej Kuciara (@maciej_kuciara) December 20, 2022
What’s more, this backlash comes on the heels of an earlier, still-ongoing protest against the art-sales platform ArtStation over its decision to allow AI-rendered images to be sold on the site.
Many artists who are unhappy with ArtStation’s decision, or who sympathize with those selling human-made art on the platform, are likely to be just as unhappy about the Midjourney revelation.
Holz also explained in the interview that his platform was essentially trained on a dataset built from a “big scrape of the internet.”
“We use the open data sets that are published and train across those. And I’d say that’s something that 100% of people do,” Holz said in defense of his methods.
Artists disagree. As one put it in a parody: “We just stole all the copyrighted artwork, mushed it through an AI, reproduced it infinitely, and make money off of it.”
Another artist, David Lung, expressed disbelief: “David Holz blatantly admits to theft and copyright infringement in this article! His attitude is, ‘yeah, we stole from you to build a platform that we make a profit from, what are you going to do about it?’”
Others have questioned Holz’s claim that the images he used had no metadata, noting that they always embed metadata and contact information in any art they post online.
Despite these suspicions, Holz was probably telling the truth on that count, at least in part. A scrape of the internet large enough to yield more than a hundred million images would be bound to dredge up many with no contact information or metadata at all.
Midjourney’s creations have made photographers and digital artists of all kinds uneasy. The platform can produce remarkably good, even spectacular, visuals from little more than a user’s prompt.
It’s still too early to tell what will come of this, but some have already predicted, at the very least, the death of the graphic arts industry.
Holz did little to placate his critics when he added that creators today can’t completely opt out of having their work used in AI training datasets.
Nor can artists fully prevent Midjourney users from naming them in prompts as examples of the look or style they want their creations to resemble.
Some websites have fought back against this whole phenomenon by giving artists tools to find out whether their work has been “used.”
One example is a site called “Have I Been Trained?”, which claims to search more than 5.8 billion images to find out whether a specific image by a specific person has been added to an AI dataset.
Undoubtedly, many of the images in AI datasets didn’t come from professional artists and photographers at all, but quite a few did, and some of those works could be recognizably drawn on by these algorithms to generate specific visuals.
That said, another major problem for artists angered by all this, and by Holz’s admissions in particular, is that no clear legal framework has yet established whether such training sets amount to copyright infringement.
In other words, even if an artist could prove that their work was used without permission to create certain visuals, successfully taking that claim to court would be a whole other work of legal art.