You might have noticed that I’ve been doing an awful lot of introspective thinking and writing lately.
Sure, giving in to nostalgia can be fun. Who doesn’t enjoy thinking about the 80s, or remembering what it was like to start a business? I want to write these stories down to preserve them, because I know that memories are malleable.
But there’s another reason I like to focus on my own past, and on the world I grew up in. It’s because I am the sum of all those experiences, and because I am the lens through which I interpret the world.
Think of it as wearing a pair of glasses that makes everything look bigger than it actually is, or greener, or whatever. Ideally, you’d take those glasses off to see the world as it is, but you can’t take yourself off.
This means that the only way to get a clearer view of the world is to do the next best thing: take those distortions into account. That’s why introspection is so important: if you don’t understand yourself, you’re very unlikely to perceive events accurately.
Perception among human beings can vary widely. Just ask anyone what they think about politics, human rights, religion… you name it, people see it differently.
Here’s something that’s at least as interesting to me: our generative AI models also seem to have very different interpretations of the world.
Today, I’ve brought in Daniel, who runs two publications I enjoy and subscribe to here on Substack, to help tell the rest of this story. Daniel is especially helpful in this conversation because he has spent considerable time and energy understanding how different AI image generators work.

Our idea is to generate three images from three of the very best image generators out there, using the exact same prompt (instructions), so we can see how each AI “sees the world” differently.
If you want to read about how to create some of these images, Daniel shares some excellent tips on how to craft better images in his publications; I’ve used some of those tips here. You might also check out Dan’s experiment in determining whether AI can be funny (don’t miss the comments section).

When using different text-to-image models, you'll be forgiven for expecting all of them to respond in the same way to a given prompt.
After all, they're a product of machine learning, calculating and predictable. We give them an instruction, and they spit out a formulaic result.
Right?
But very quickly, you'll notice that each model has its own "personality," if you will.
Take the following prompt:
"Surrealist painting of a medieval village viewed through a pair of neon glasses"
Here's how Midjourney V6 responds to this:
Here's DALL-E 3:
And finally, here's Stable Diffusion XL:
Not quite the same vibe, is it?
Midjourney is more literal, simultaneously showing us the view with and without the glasses.
DALL-E 3 ends up incorporating the pair of glasses into the composition itself.
Stable Diffusion XL ditches the physical glasses altogether, opting instead to present us with the view from the inside as if affected by this invisible pair of neon glasses we're wearing.
And that's to say nothing of how much their use of colors, styles, and so on varies, too.
Three different models. Three different lenses.
Of course, there's a perfectly straightforward explanation for this: Each model is trained on a different underlying dataset, by different teams with their own aesthetic biases, using differing technical approaches, and so on.
But I find it curious how text-to-image models, in a way, end up wearing their own sets of "lenses."
Each model is "the sum of its experiences," as Andrew put it.
Much like us, text-to-image models can arrive at different interpretations of the world, even when seemingly given the exact same starting point.
I don't want to get too philosophical or ascribe human qualities to AI models, but isn't it interesting how they, even if inadvertently, might offer us a new prism through which to view our human experience?
If we're willing to let them, that is.
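If you'd like to try the same single-prompt experiment yourself, here's a minimal sketch in Python. It isn't the exact workflow Daniel used: Midjourney is left out because, as of this writing, it only runs through Discord and has no official public API, while DALL-E 3 can be reached through OpenAI's Images API and Stable Diffusion XL can be run locally with Hugging Face's diffusers library. The model names, parameters, and output handling below are illustrative assumptions, not a verified recipe.

```python
# Rough sketch: send the same prompt to two of the three models discussed above.
# Assumes an OPENAI_API_KEY in the environment and a GPU for the local SDXL run.
import torch
from openai import OpenAI                        # pip install openai
from diffusers import StableDiffusionXLPipeline  # pip install diffusers

PROMPT = ("Surrealist painting of a medieval village "
          "viewed through a pair of neon glasses")

# DALL-E 3 via the OpenAI Images API.
client = OpenAI()
dalle = client.images.generate(model="dall-e-3", prompt=PROMPT,
                               size="1024x1024", n=1)
print("DALL-E 3 image URL:", dalle.data[0].url)

# Stable Diffusion XL locally via diffusers.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe(prompt=PROMPT).images[0].save("sdxl_village.png")
```

Even a quick script like this makes the point: identical input, noticeably different output.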
Let’s circle back to something Daniel noticed briefly here:
Of course, there's a perfectly straightforward explanation for this: Each model is trained on a different underlying dataset, by different teams with their own aesthetic biases, using differing technical approaches, and so on.
In a similar way, each of us has grown up in a different environment. Every one of us has been exposed to a “different underlying dataset”—the sum of all our life’s experiences. And, like today’s generative AI models, we’ve all got slightly different software—our DNA.
Now, think about yourself for a moment. Have you ever had to take your own lens into account?
Let us know if this helps you think a little today!