Thursday, December 1, 2022

Hello DALL-E, well hello, DALL-E...

"A sad Al Capone sitting on a park bench all alone" suffers from not resembling the mobster. The park bench is fine.
     The press prides itself as watchdogs, investigators, the people who dig up the bad news and shine the light of day on it.
     But there's also a pervasive cheerleader aspect to the media that doesn't get acknowledged often enough. At least not publicly. Privately, we've all had the experience where some trusted writer or critic or news source raves about something — a new restaurant, movie, even a computer program — so much that we try it out ourselves, and what was presented as a fragrant ladle of savory wonder turns out, in real life, to be an unappealing spoonful of mediocre meh.
     In late September, the Washington Post went wild for DALL-E — yes, the name is a mashup pun of "Dali" and the movie "Wall-E" — an artificial intelligence image generator, in "AI can now create any image in seconds, bringing wonder and danger," by Nitasha Tiku.

     "Wonder and danger." That seemed something to try out myself. Particularly as someone who needs fresh images on a daily basis, for my blog, and sometimes for special projects — for my new book, I had to hire an artist and pay her slightly more than I received for writing the book itself.
     Wouldn't it be easier to just plug my needs into a free program that "has dazzled the public, attracting digital artists, graphic designers, early adopters, and anyone in search of online distraction"?
     I went to the site and signed up and found WaPost-level enthusiasm.
     "DALL·E 2 can create original, realistic images and art from a text description," it promised. "It can combine concepts, attributes, and styles."
     Let's go! I pursed my lips, thought hard, then typed, "Dante Alighieri smoking a cigarette on the moon."
     "Something went wrong," a message told me, vaguely. "Please try again in a few minutes." I did. Something was still wrong. A couple more attempts, then I deployed my "wait a while" strategy and came back the next day.
      In the fresh dawn of a new day, an image of the dour Florentine was now doable. DALL-E tossed off four drawings. Only in the first did the man look vaguely like Dante. In one he was in inexplicable whiteface. None were on the moon — the moon was behind him. They did get the cigarette right. But at least it was working.
     Okay, try something else. A Beatles lyric came to me. I typed: "Rocking horse people eating marshmallow pie." DALL-E understood "rocking horse" and "people" but couldn't combine them. It could do "marshmallow" but not in a pie.
     Again again, as the Teletubbies used to cry.
     "A train station filled with birds reading newspapers"

     The first image was quite pleasant, though it missed the "birds reading newspapers" aspect. None were up to the level of a skilled high school art student working on an assignment.
     I tried to think of an image I might actually need: guns. I write about guns from time to time, so I asked for "Many handguns all pointed at the viewer." Here DALL-E balked.
     "It looks like this request may not follow our design policy," pouted a dog and cat — I bet they hired an artist to render them — looking at me reproachfully. A reminder that the online world, when it isn't busily vile, is reflexively timid.
     When I wanted to see "Criminals on a street corner in Chicago" all the images were of men in hoodies with their backs to the viewer — earlier DALL-E versions were criticized for the race bias the creators had initially hardwired into the system. Obviously nobody wants to be accused of that. Their idea of a Chicago cop was closer to a mall security guard. They did create images of a beautiful sunflower.

     I'd say 90 percent of my requests generated unusable garbage. "Humpty Dumpty juggling maraschino cherries" showed AI has yet to grasp the concept of the famous eggman — they seemed to think he was a clown, with the same splayed, clawlike hands DALL-E likes to render; AI, like all novice artists, seems to do a particularly bad job of hands. 
     "A beautiful woman playing Scrabble with martians" contained women who weren't beautiful, games that weren't Scrabble, and no denizens of the red planet. (Beauty of course is subjective, but DALL-E faces, composites of many photos, have a blurred, stitched-together quality more scary than appealing.)
     I thought I'd try out a real-life example. In my book, for the day the Cubs finally won the World Series, I thought of the Hindu god Ganesh, "the remover of obstacles," and asked artist Lauren Nassef to draw it with arms holding various Cubs tokens. I asked DALL-E: "The elephant-headed god ganesh wearing a Cubs cap and holding a pennant." I got this:


     It couldn't do a Cubs cap — trademark infringement? — but it got Ganesh down fairly well, and the image was well executed enough to give me pause. Was DALL-E getting better? Or perhaps my standards were just lowering.
     Are artists endangered? We have to remember that the human default is to fear doom at every new development. A century and a half after Edison offered recorded music, there are still jobs for violinists and violin-makers.
     But art, well, while I'd encourage you to go online and give it a spin — it's free — there are still more than a few bugs in the system. For the moment. One thing about technology is, it gets better. I scoffed when my wife bought a robot vacuum. Now I love it; no cleaning lady ever vacuumed downstairs with the singular focus of our Eufy.
     I went back to the Post story, to see if maybe I missed some faint whiff of disappointment. I found a little:
     "The ability to create original, sometimes accurate, and occasionally inspired images from any spur-of-the-moment phrase, like a conversational Photoshop, has startled even jaded internet users with how quickly AI has progressed."
     "Sometimes accurate, and occasionally inspired..." So there was a wink there. The Post hedged its bets, went for subtle. It takes a certain confidence to declare something not up to its advertising. I don't know why they didn't want to come out and say it flat out: DALL-E, not so hot. Maybe Jeff Bezos is an investor.
     I was about to splash the above in the newspaper, maybe on a Monday across two pages, tsk-tsking the entire operation, when I had breakfast with my younger son, and he raved about DALL-E, and whipped out his phone, and showed me images that were much more arresting than mine. That gave me pause. I realized I was trying for broad esoterica when what was needed was narrow, concrete and specific. I held this for over a month, and figured I was safe to let it rip here in the less rigorous world of online punditry.
     Since then, using DALL-E (an apt name, given that Dali too was given to excessive hype and fraudulence), I've dialed back my scope and increased my success rate. "An angry man in profile vomiting a stream of fire" for a Caren Jeskey post gave me exactly that (actually three: the program gives you four different options for every request). I couldn't find an apt photo for Wednesday's column on anti-semitism, and asked DALL-E for a picture crowded with hooked noses. The automaton delivered, again nothing great, but good enough. The future isn't here, AI artwise, but we sure can hear its approaching footsteps.

DALL-E had the Ford Mustang down fairly well, but failed at the "made of red brick" part.

3 comments:

  1. Funnily enough, I just signed up for that today, after seeing the program mentioned on Eric Zorn’s blog, with a tip of his hat to you. Then my husband did the same, after I created a little scene involving his favorite historical figures in an unlikely setting.

  2. Wasn't it Johnny Carson who used to do "Stump the Band" on his show? The audience would submit obscure and weird song titles for the house musicians. Let's try "Stump the AI"...

    In eighth grade, I had this zany Japanese art teacher. He was a lot like Jack Soo, the actor and comedian. After a class field trip to the Art Institute, we were ordered to draw people from a different time who were doing something from our own time. Crazy, man, crazy!

    I was very into cars and driving at 13 and read a lot of books and magazines about them. I drew a reasonable facsimile of a '60 T-Bird, with two of the Pharaoh's bodyguards, who were trying to scrub off the Egyptian schmutz. To make it a bit more authentic, I added a couple of pyramids and palm trees, and a crescent moon, as well.

    I've never been able to draw worth jack, so the result was as terrible as it was ludicrous. But my efforts did earn me a passing grade. Probably for originality, if nothing else. Four years later, my sister was doing the same kinds of things in his classes. She, too, thought he was a goofy guy.

    For old times' sake, Mr. S, let's try "Two Ancient Egyptians Washing a Thunderbird"...if it got the Mustang, it'll get the car right. But not those two guys, or the background. Stumped again? Hell, yeah...and thanks muchly!

  3. While it's interesting to see the results of your requests, I don't care much about DALL-E one way or the other. Clearly, it'll take a few upgrades before it can match the drawing in your book of ganesh wearing a Cubs hat along *with* wearing a giant #1 finger, while holding a baseball and a hot dog...

    However, I *am* duly amazed by "for my new book, I had to hire an artist and pay her slightly more than I received for writing the book itself." Say what, now? I mean those sketches are swell, but holy moly!


Comments are vetted and posted at the discretion of the proprietor.