Using an AI Art Generator for My Novel's Character

Ellie Sullivan of Eternity's Echo is brought to life with technology... and I muse on what this means for the future.

by H.C. Southwark

Lately I've been experimenting with an AI art generator to get some ideas on how my characters look outside of my own head. Of course I already have a good sense of their appearances, and I can describe them in the prose of the novel just fine. But you know what they say: a picture is worth many words.

The trick is, the AI art generator doesn't know them like I do.

Honestly, there is a learning curve to this, and I can see that there is a talent differential between those who have mastered the technology and those who haven't. It's not as simple as type and click.

With that throat clearing out of the way... here she is: Ellie Sullivan, the protagonist of my novel, Eternity's Echo.

There's a lot to like about this image of her.

For starters, Ellie is a sixteen-year-old girl with dirty blonde hair that is wild from mistreatment. She wears a red coat and a red scarf that hides the wounds on her neck.

In addition to getting these basic details mostly correct, the image also has a certain wistfulness to it, with the play of light and her gaze fixed upward.

In the story, Ellie is dead—she is currently serving as a soul reaper, guiding souls to the other side, and she's not very happy about it. I was struck by this image simply because it's out of character for her current circumstances, but very much in-character for what she wants.

However, it's one of the latest images I've generated. It took many, many discards to get decent results.

On the other side of accuracy—which is to say, not accurate at all—are images like these.

I'll be the first to admit they are cool, but they are not Ellie at all. The first makes her look like a superheroine (or villainess?) in a comic book. The other screams medieval D&D. Eternity's Echo is fantasy, but the characters do not look any different from ordinary people, except maybe a little unkempt... when you can see them at all (perks of being a reaper).

This is what happens when typing "red coat, red scarf" into the AI generator goes wrong. My best guess is that the AI has been trained on a lot of fantasy images, and my other instructions were fantasy-oriented. In that sense, the machine has done its job. Just not specifically enough.

A closer attempt here, with the red scarf that I wanted to visually mark Ellie among the cast. However, the coat is all wrong. Leather? Black? I suppose one could get away with leather in November in Colorado Springs, under the right weather conditions. But come February, good luck.

Maybe it's a good thing Ellie is already dead?

I couldn't help but keep this image, though, because the pure snark in her face... absolutely Ellie. I was aiming for an image out of the first chapter, where she is waiting for her latest "customer" to die... and the machine read the prompt "annoyed, snarky" just right.

A rare example of the bot reading the prompt "long hair" as, well... long.

Given that the dataset probably includes a lot of "modeling" images, rather than everyday images of ordinary people, prompts like "messy" and "ragged" and "tangled" for hair just end up producing a fashionably tousled look.

Additionally, the scarf is missing... it's surprisingly hard to get a robot to picture a scarf, no matter how I try to emphasize it.

However, the robots seem to like when things become more abstract, perhaps because then they can focus more on the character in the foreground. Yay for the scarf making an appearance!

The extremely abstract but very lovely clock-ish face in the background came about after dozens of variations of prompts, plus some image2image work. It's the kind of thing a human designer could do in moments, but it took me hours to gradually coax the machine to spit it out. Even then, the results are always a little wonky: is that a clock, a compass, or a roulette table?

However, perhaps there is such a thing as "over-prompting," a point at which diminishing returns actually invert and the results get worse. I kept butting into this wall over and over.

This is a positive result from such an experiment, where the abstract "clock halo" in the distance has instead become the background, raising the question of where the character actually is. Alternate dimension? I could hypothesize she is standing in front of one of the doors of the Hells, this one a big broad wooden door with a clock-like frosted window. I'll go with that.

Also, goodbye scarf. We hardly knew ye.

However, sometimes surprises result in new delights, like this composition of the clouds helping the fine lines of the "clock halo." This was a result of image2image with the image below.

This image is one of a dozen variations on the original, all of which have a lot of good things going for them. The scarf is present, the hair is wild, the crossed-arms posture is defensive, stubborn, and subtly vulnerable at the same time... it all screams "Ellie."

It's enough to forgive the short hair, too-light coat, and the... large clock necklace thing? I'm half convinced she has a stethoscope looped around her neck.

This image, however, has an additional flaw that about half of the viewers pick up on right away. If you're not catching it, follow that sleeve down to the pocket.

This image is a good ending for this experiment, I think. Those who read the book know what is happening here.

This is Ellie with Niles, her mentor. The AI bots have quite a difficult time with ensemble images. For example, in this one, the prompt "blue coat" was completely ignored in favor of red, and they're both blondes. Even their pants are vaguely green? Color coordinated reapers!

In all, I can say that I'm quite pleased with how the AI art is turning out. My skills at it are markedly better than when I first started a month before producing these images. And I'm still just a noob.

I would never have been able to produce these images myself. Perhaps with the right amount of time, motivation, and pictorial references, I could have produced something approaching the last image with Ellie and Niles. However, that would take far more time than I can dedicate to it. As a writer, my time is better spent actually telling the stories about these characters.

I also could not have offloaded this work onto a human being... the amount of time to digitally paint the images in this post would have been hours upon hours. I am also a poor starving artist who can hardly swim for myself, so holding someone else above water too? That's not happening.

There's been a lot of debate on what AI art generators mean for the future. Add in AI speech and text bots (like the infamous ChatGPT), and the major intellectual arts of pictorial and verbal mediums are seemingly under a robot siege. It's still far too early to know the future, though.

One consequence I haven't heard people talk much about, however, is the increased accessibility of art in general. If AI art generators did not exist, then it's not that I would have hired an artist to produce these images. They simply would not exist.

AI means I was able to generate images from my books, from my imagination, in a way impossible before.

In a way, AI has "democratized" artwork.

We may be looking at a future where art is even more present in our lives, because everyone can generate whatever image comes into their head. It's possible this will be a world full of clutter. But another way to look at it is that perhaps the opposite will happen: what if the future will be beautiful?

Maybe we will just have to live through it to see.
