AUGUST 29, 2022, was the big day for the 150-year-old Colorado State Fair in Pueblo, Colorado. Alongside the numerous horse and livestock competitions, the fair also ran a series of more traditional art competitions, covering home-made dolls, quilts, porcelain and needlework, as well as canned carrots, medicinal remedies and holiday breads.
An important event was the announcement of the winners of the various fine art categories. This year a relatively unknown artist by the name of Jason Allen beat 20 other artists to take first place in the digital category with his work Théâtre D’opéra Spatial.
The artwork that Jason Allen presented is absolutely magnificent and looks like a professional painting, reminiscent of the Renaissance and of the late 19th-century painter Gustave Moreau. In the piece, three classical figures in flowing robes stand in a baroque opera theatre and stare through a circular viewport into a glowing, sunlit landscape.
The only catch is that Allen did not “paint” Théâtre D’opéra Spatial, but used the artificial intelligence (AI) software Midjourney to generate the artwork. In fact, Allen is not an artist in the traditional sense of the word, but the president of a Colorado-based company, Incarnate Games, that produces tabletop fantasy games.
Naturally, the disclosure by Allen caused tremendous controversy on social media, where working artists and art aficionados castigated him for hastening the death of artistry and creative jobs, and for being deceptive in winning with a machine-generated piece. Others accused him of automated plagiarism, since the AI relies on millions of learned art pieces to design a work of art.
Allen, however, emphasised the human element, arguing that his input was instrumental in shaping the award-winning painting. According to him, art should not be judged merely by its method of creation. Perhaps the best solution would be a separate category for AI-created art. AI technology is merely a tool, like the paintbrush, and could empower new inventions and reshape our world. But without the person, there is no creative force.
Photography was also not considered an art form for a long time, since people claimed that it merely entailed the pushing of a button. Only later did people realise that it is about composition, colour and light. The same is true of AI. It can open up a new world of possibilities for artists. It will not disappear and should rather be embraced by artists as just another new tool. AI can contribute to the enigmatic beauty of an artwork, but the soul, emotion and human effort will always make the artwork unique.
Allen’s artificial intelligence artwork is an excellent example of how rapidly AI-generated art has advanced over the last few years. Using machine learning and trained on billions of internet images, the AI systems have resolutely pushed the boundaries of what computers can create.
Modern text-to-image software tools such as DALL-E 2 and Midjourney, as well as AI-powered art tools such as Wombo Dream, NightCafe and starryai, have grown significantly in sophistication. They can not only generate fake people, objects or locations, but can also mimic entire visual styles, such as a storybook, cartoon, historical diagram or photograph.
In April 2022 when OpenAI released DALL-E 2, the world was shocked by its ability to transform a scene described in words (called a “prompt”) into numerous visual styles that can be mundane, photorealistic or fantastic. Not long after the release of DALL-E 2, Google and Meta announced their own text-to-image AI models.
The commercial software Midjourney became a very popular AI art generator thanks to its capability to let people freely create new images on command by typing the command “/imagine” followed by a short text description. Within about sixty seconds, four newly generated images appear. Allen also used Adobe Photoshop to remove visual artefacts or add missing detail, and Gigapixel AI to increase the quality and sharpness of the picture.
Interestingly, Stable Diffusion, an open source image synthesis model, has just been released and allows any person with a PC and a decent graphics processing unit (GPU) to design almost any visual reality they can dream of. The software can imitate nearly any visual style. All that needs to be done is to feed it a descriptive phrase, and the results appear on the screen like magic.
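At its core, a diffusion model produces an image by starting from pure random noise and repeatedly removing a little of that noise, step by step, until a picture emerges. The toy Python sketch below illustrates only this iterative denoising idea: the "model" here simply nudges a random vector toward a fixed target, whereas Stable Diffusion uses a trained neural network to predict the noise, and all the numbers below are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in for a finished "image" (in reality, millions of pixels).
target = np.array([0.2, 0.8, 0.5, 0.1])

# Generation starts from pure random noise.
x = rng.normal(size=4)

for step in range(50):
    # A real diffusion model uses a trained network to *predict* the
    # noise in x; this toy version can simply compute it directly.
    predicted_noise = x - target
    # Remove a fraction of the predicted noise at each step.
    x = x - 0.1 * predicted_noise

# After many small denoising steps, x has converged close to the target.
```

The point of the sketch is the loop structure: many small denoising steps, each conditioned (in the real system) on the text prompt, gradually turn noise into an image.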
Stable Diffusion is the creation of Emad Mostaque, a former hedge fund manager from London. He formed a company, Stability AI, to make novel applications of deep learning available to the masses. Stable Diffusion was released on August 22, 2022, as open source software and matches the quality of DALL-E 2.
The free and open source software, Stable Diffusion, led to dozens of creative projects, such as upgrading MS-DOS game art, converting Minecraft graphics into realistic renderings, transforming movie scenes into 3D, and translating childlike scribbles into rich illustrations, among many others. Just like Adobe Photoshop in the 1990s, image synthesis makes the visualisation of ideas possible, lowers the barriers to entry and accelerates the capabilities of artists.
Stable Diffusion was trained on a dataset of 5.85 billion publicly accessible images scraped from the internet, including personal blogs and image sites such as Pinterest, Flickr, Getty Images and DeviantArt. Stable Diffusion has thus absorbed the styles of many living artists, and some artists have forcefully objected to this practice. Allen, however, countered this argument by pointing out that people, too, learn art by looking at art created by others and studying their techniques.
During training, the model learns to associate certain words with images and with the relationships between coloured pixels. It does not duplicate images from the source set, but creates novel combinations of the styles it has learned.
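The word-image association described above can be sketched in miniature: during training, captions and images are mapped into a shared vector space, and matching pairs are pulled close together, so that at generation time a prompt can steer the image toward the right content. The vectors below are invented purely for illustration; real embeddings have hundreds of dimensions learned from billions of examples.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings in a shared text-image space.
caption_vec = np.array([0.9, 0.1, 0.3])    # embedding of "a red apple"
matching_img = np.array([0.8, 0.2, 0.25])  # embedding of an apple photo
unrelated_img = np.array([0.1, 0.9, 0.7])  # embedding of a blue car photo

# Training pushes the caption closer to its matching image than to others.
```

A well-trained model scores the caption far more similar to the matching image than to the unrelated one, which is what lets a text prompt guide the denoising process.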
Although free and open source, Stable Diffusion is quite arduous to install and resource-hungry, particularly with regard to GPU power and memory.
After the release of Stable Diffusion many people raised serious concerns with regard to its cultural and economic impact, as well as its possible misuse. Although Stable Diffusion includes certain filters and an invisible tracking watermark embedded in the images, these restrictions can easily be circumvented since it is open source code. This can possibly lead to the creation of images containing propaganda, violent imagery, pornography, copyright violation, celebrity deepfakes, and many more.
If historical technology trends hold, software that today needs a powerful graphics processing unit may in the near future run on a smartphone, which could lead to an explosion in AI creative output. Stable Diffusion and other models are already experimenting with dynamic video generation, and it is quite possible that we will soon see photorealistic video generated from text prompts. Thereafter, audio, music, real-time video games and 3D virtual reality experiences may follow. It will be the era of unlimited entertainment, generated on demand and in real time. It seems that the Star Trek “holodeck” experience, allowing us to create anything we can imagine, will soon be with us.
We are just beginning to explore the power of AI synthesis. However, the technology is not without dangers. In future we may never know whether a piece of media came from an actual camera, or whether we are communicating with a human. Without new verification systems, it will be difficult to believe what we see online. But technology will keep changing and improving, for, as the Ancient Greek philosopher Heraclitus observed in his great wisdom around 500 BC, “the only constant is change”.
Professor Louis C H Fourie is an Extraordinary Professor of the University of the Western Cape.
BUSINESS REPORT