San Francisco artificial intelligence research laboratory OpenAI announced last week that its image generation AI, DALL-E, has received a major upgrade, MIT Technology Review reported.
DALL-E 2, as the upgraded tool is called, converts text prompts into images like its predecessor. However, the new version is reportedly far more advanced, creating images that more accurately match the text prompt and can even be tweaked to incorporate different styles.
If you were hoping that working in a creative field would be a surefire way to keep AI from automating your job, it looks like not even that field of expertise is safe.
DALL-E 1, the earlier version of the tool, seemed like a fun party trick: input a few simple words, such as “avocado + armchair,” and the tool would valiantly produce an image for the absurd mashup. The results bore the trademark psychedelic visuals and glitches common to AI image production, but were nonetheless impressive.
DALL-E 2 is far more specific and accurate. Two test phrases provided by OpenAI, “Teddy bears mixing sparkling chemicals as mad scientists, steampunk” and “A macro 35mm film photography of a large family of mice wearing hats cozy by the fireplace,” resulted in storybook-ready perfection. Requests to generate images in the style of Vermeer or Gauguin were also quite successful.
“One way you can think about this neural network is transcendent beauty as a service,” Ilya Sutskever, cofounder and chief scientist at OpenAI, told MIT Technology Review. “Every now and then it generates something that just makes me gasp.”
But these are examples of moments when DALL-E 2 performed to the utmost of its ability. A prompt to depict an astronaut riding a horse in the style of Andy Warhol leaves something to be desired. The same prompt, in photorealistic style as opposed to Warhol’s style, is much more impressive. Yet a closer look reveals some weaknesses. Like many beginner artists, DALL-E 2 seems to have some trouble depicting hands and feet.
DALL-E 2 can also be used to edit existing pictures. For example, a dog sitting on an armchair can be replaced with a cat. DALL-E 2 could potentially have as big an impact on how people produce images as Photoshop did. Its development, however, is part of a larger research project on AGI, or artificial general intelligence, a term for a truly intelligent agent.
“Our aim is to create general intelligence,” researcher Prafulla Dhariwal told MIT Technology Review. “Building models like DALL-E 2 that connect vision and language is a crucial step in our larger goal of teaching machines to perceive the world the way humans do, and eventually developing AGI.”
OpenAI has not yet released DALL-E 2 as easily accessible software because the company is still testing the technology. The researchers are attempting to ensure that it is not used to create violent images or deepfakes, among other concerns.
However, there are plans to eventually release DALL-E 2 to the public.