The latest tool to take over the internet is a powerful AI image generator that has turned the heads of many creatives.
You’ve probably seen the most popular trend to take over . . . well, Twitter, I suppose. The tool is pretty simple in concept: users type in two descriptive words or phrases, usually two wildly different themes or ideas. Think Darth Vader and kittens, or Top Gun and . . . something lame. The tool combines the two phrases and generates original images that morph them together. Think of it as automated photoshopping.
It’s easy to see how something like this would take off. There are endless opportunities for comedic pairings and shocking imagery that are destined to go viral.
So how does the tool work, who created it, and what can we expect to see in the future from these types of apps?
How Does It Work?
Currently, the most popular app you’ll see on social media is “DALL-E mini.” It creates nine-image grids, each a variation on the same weird-looking theme. The results are always a little skewed, so they’re not hyper-realistic. They just look like weird drawings . . . almost.
DALL-E itself was developed by OpenAI, the AI research lab co-founded by Elon Musk. (DALL-E mini, despite the name, is an open-source community recreation inspired by OpenAI’s model, not an OpenAI product.) DALL-E is a multimodal implementation of GPT-3 (generative pre-trained transformer) with 12 billion parameters that “swaps text for pixels.” The tool is trained on “text-image pairs” from the internet. So, basically, it has learned to connect phrases with visual concepts from a huge number of captioned images, rather than scouring the web each time you type a prompt. It uses “zero-shot” learning to generate output from a description and cue.
I had to look up what “zero-shot” learning is. The classic example: if a computer learns what a horse is but has never been shown a zebra, zero-shot learning is the idea that it can still work out that a zebra looks like a striped horse. So it can make connections between things it was never explicitly trained on. It’s not too different from how we arrive at conclusions through deduction . . . right?
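If you want to poke at this idea yourself, OpenAI’s CLIP model is publicly available. CLIP is a sibling of DALL-E that is also trained on text-image pairs, and OpenAI uses it to rank DALL-E’s candidate images. Below is a minimal sketch, assuming the Hugging Face transformers and Pillow libraries and a local photo called zebra.jpg (the filename and captions are just placeholders), that scores an image against captions the model was never explicitly trained to classify:

```python
# A toy zero-shot check, roughly in the spirit of the zebra example above.
# Assumes: pip install transformers pillow torch, and a local "zebra.jpg".
# CLIP is OpenAI's text-image pairing model, not DALL-E itself.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("zebra.jpg")  # placeholder: any photo you want to test
captions = ["a photo of a horse", "a photo of a striped horse", "a photo of a zebra"]

# Encode the captions and the image, then score every caption against the
# image in CLIP's shared text-image space.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # one probability per caption

for caption, prob in zip(captions, probs[0].tolist()):
    print(f"{prob:.2f}  {caption}")
```

If the photo really is a zebra, “a photo of a zebra” should come out on top, and “a striped horse” usually scores well too. That’s the zero-shot connection in action, with no zebra-specific training required.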
The Problem
Similar to the reaction that deepfakes elicit, I’ve seen some people questioning the ethics behind the whole thing. It’s not too difficult to understand this viewpoint, as any kind of AI-generated creation will always make us a little uneasy, for different reasons. I’ve seen comments from people who are put off by the idea of two “things” that were never meant to share a frame being mashed into one image. It’s all relative to what images you’re using.
The issue of morality aside, as a creative, I can’t help but see this as a potential way to lose work and/or credibility in the coming years, as it gets easier and easier to replace photo-editing skills with something like DALL-E.
So you’re probably thinking, Are you serious? Those images look like a deep-fried nightmare that can only live in the meme-verse, not something cohesive, artistic, or (dare I say) professional. Well, that’s because we’re using the free, not-as-advanced version of the tool. That’s right — there’s a more powerful tool called DALL-E 2, which asks more of its users and enforces specific content restrictions and allowances. (You have to join a waitlist if you want to use it. In case you’re wondering . . . yes, I am on the waitlist.)
So what does a more powerful DALL-E look like? Well, hold onto your butts.
This image comes directly from OpenAI’s site — a case study for the tool. The text description was “An astronaut riding a horse in a photorealistic style.”
So, as you can see, it’s not just “real”-looking photos; there are many variations in how your images can look. That artistic range is what worries me, because the tool is so capable that it’s hard to imagine it failing to find applications everywhere.
Impact on Photography and Video
While DALL-E isn’t specifically a threat to original video creation, it’s not hard to see where this could go. There are already several video-generation tools available; they’re just not capable of producing what the average shooter can create right now. There are plenty of AI-generated scripts and stories that are starting to trickle out into the world, but no one is producing total video+audio generation all in one package.
It’s also easy to visualize what this threat looks like to creatives. If a client is working on a new ad campaign, and they have a specific request for a desert valley with big bluffs in the background, a winding river cutting through at golden hour, and a car parked in the bottom-left corner of the road . . . well, they could pay for the shoot and everything it entails: models, location, photographer, photo assistants, lodging, transportation — the list goes on. Or they could just do it themselves, save a ton of money without leaving the office, and assign the project to an intern.
Maybe I’m just being paranoid; maybe I’m just insecure about the future of my own skills. Nevertheless, the fear is real!
But it’s not all bad. I think there are endless fun applications for these types of tools. For instance, I have a good friend who created an Instagram account that is meant to replicate a modern “film photographer.” He uses all the same hashtags for Kodak film stocks and old cameras, with cliché slogans for captions — the only catch is that none of the images are real. They’re all 100% computer generated. Even “his” profile picture is computer generated. It’s a fun, comedic bit that I appreciate — I enjoy that people are making light of a seemingly dark situation.
The idea of AI-generated users isn’t new, though. We’ve been dealing with bots in comment sections and forums for years now, and it’s becoming more common for internet users to be skeptical of almost everything they see online. Our clients know this as well, so it creates a weird dynamic where most creatives are now just trying to reach a certain level of originality and specificity in their own artistic voices. Perhaps this new tool is just one more iron in the fire of inspiration we need to make good art.
Are you inspired yet?
My Own Experience
Every time I used the app, my reactions couldn’t have been stranger. Usually, upon seeing the images for the first time, I laughed. Then I caught myself just staring at them in silence, almost mesmerized by the strangeness. It’s like watching a dog sniff around the yard. You don’t know why it’s so transfixing, but it is.
So here are a few of my own creations in all of their weird, random glory.
Okay, just going to keep moving . . .
This one was my personal favorite, for obvious reasons. Okay, moving on.
After trying a few pop culture mashups, I figured I’d try something stock-photo-esque to see just what we’re dealing with here. I typed in “sunset mountaintop,” and this is what it made . . .
My first thought was that it wasn’t terrible; it wasn’t great either — it was just about what I expected. That being said, it did generate these images on its own, and some of them are kind of convincing.
So, now that we have a basic understanding of what DALL-E is, what it can do, and how it works, what do you think? Is this here to stay? Will this completely change the creative landscape for professionals and their careers? Only time will tell.
Cover image created using DALL-E mini.