A clarification on the principles of AI painting

This text contains content generated by ChatGPT (GPT3.5-Turbo), but not all of it is copied verbatim.

As artificial intelligence develops, AI painting has attracted attention from all walks of life. Since the end of 2022 it has been a controversial emerging technology that has touched the nerves of many people. However, many of those who voice opinions on AI painting misunderstand how it works. One of the most common misconceptions is that AI painting simply combines the works of painters and, at most, smooths the seams with so-called technical methods, without any originality or creativity. This idea betrays the spirit of rigorous scientific research and insults the truth-seeking spirit of art. It is detrimental to both AI painting and the development of traditional art. It is therefore necessary to explain the principles behind the family of AI painting tools led by Stable Diffusion. It should also be noted that although Stable Diffusion was the first to cause a sensation in AI painting, it is not necessarily the most advanced painting program available; for subjects other than human figures in particular, Midjourney may be superior.

Stable Diffusion Official Repository | NovelAI Official Website | Midjourney Official Website

First of all, we need to admit that ingrained ways of thinking inevitably lead us to different positions on, and understandings of, AI painting. For people accustomed to traditional thinking, it is very natural to assume that AI painting simply combines the works of painters. Having seen a great deal of standardized, mass-produced traditional and commercial art, they naturally regard AI painting as an extension of traditional art, which makes it hard for them to understand its form and meaning at a fundamental level. This conceptual confinement keeps them within the framework of traditional art and makes it difficult for them to see what AI painting deconstructs and goes beyond. The "jigsaw puzzle" view, the claim that AI painting merely pieces together fragments of existing works, is the product of outdated ideas and half-understood knowledge, or a way for people with ulterior motives to spread panic among the public. Many painters readily believe it also because piecing together parts of other people's works is a classic form of plagiarism, which fits neatly with the undeniable fact that AI is embroiled in copyright disputes.

However, we cannot deny that AI painting does have flaws. It still has limitations when drawing certain specific things, and it is not as flexible as human artists or as capable of naturally expressing emotions and subtle expressions. Details such as hands and line work are difficult for it to depict. Nor is AI painting completely independent: the models need to learn from and imitate existing artworks, so their output is somewhat formulaic and limited in creativity. Moreover, because some of the content in the training sets infringes copyright and even human rights, these works also face storms of public opinion and complex legal disputes. However, this does not mean that AI painting is simply a collage or, as some people think, a so-called "alchemy" process. Rather, it shows that AI is a complex industrial product about which it is difficult to draw definitive conclusions. With better training and deep learning algorithms, such models can come closer to human thinking and expression. After all, if it were just a simple jigsaw puzzle, why would elements like hands and lines come out blurry? Could it be that the "original image" was already blurry and drawn carelessly in a single stroke? On reflection, probably not. When faced with a technology one has never understood, it is more practical to treat it as a simple "black box" than to make up stories about its internal principles.

Logical reasoning alone is empty talk; in practice, the official repository of Stable Diffusion already explains how it works. The explanation just involves a lot of technical terminology, and "jigsaw puzzle" is an incorrect oversimplification of it. At generation time, Stable Diffusion is essentially a denoising algorithm. It can be understood as a piece of "black technology" that makes unclear images clear. As an analogy, in winter a lot of water vapor condenses on the window and the outside world becomes blurry. This is the situation Stable Diffusion starts from when generating an image: it encodes the user's prompt with natural language processing (NLP) and begins with an extremely noisy image, like thick fog on the glass. By iterating on the noisy image under the guidance of the prompt, which is like wiping the glass with a piece of paper, it gradually produces a clear and realistic image. This paper is a neural network, and it is made with a technique that appears throughout AI, the "gradient descent algorithm". With its power, the network learns to erase the fog called "image noise". Mathematically, in middle-school terms, gradient descent is like finding the vertex of an upward-opening function graph such as the quadratic function y = ax^2 + bx + c, whose vertex has the abscissa -b/(2a). The difference is that the functions a computer processes, unlike those learned in middle school, do not have just one variable; they can have billions of parameters. With the powerful computing power of CPUs or GPUs, a computer can minimize a specific loss function, that is, approach the lowest point of an n-dimensional function step by step, and the network obtained in this way can then generate an image that matches specific input conditions.
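The middle-school analogy can be tried out directly. The following is only a minimal sketch of gradient descent, not code from Stable Diffusion: the function names, the learning rate, and the example coefficients are all made up for illustration. The first function walks downhill on a quadratic and should land near -b/(2a); the second does the same for a function with several parameters, which is the same idea that real training applies to billions of them.

```python
def minimize_quadratic(a, b, c, x0=0.0, lr=0.1, steps=200):
    """Gradient descent on f(x) = a*x**2 + b*x + c, assuming a > 0.

    The constant c shifts the curve up or down but does not change the slope.
    """
    x = x0
    for _ in range(steps):
        grad = 2 * a * x + b  # f'(x), the slope at the current x
        x -= lr * grad        # step downhill, against the slope
    return x


def minimize_bowl(targets, lr=0.1, steps=200):
    """Same idea with many parameters: f(p) = sum((p_i - t_i)**2)."""
    params = [0.0] * len(targets)
    for _ in range(steps):
        grads = [2 * (p - t) for p, t in zip(params, targets)]
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


# Hypothetical example values, chosen only for illustration.
a, b, c = 1.0, -4.0, 3.0
x_min = minimize_quadratic(a, b, c)
vertex = -b / (2 * a)  # closed-form vertex from the text
bowl_min = minimize_bowl([1.0, -2.0, 0.5])

print(x_min, vertex)
print(bowl_min)
```

For a quadratic the formula gives the answer at once, so the loop is unnecessary there. It becomes useful once the function has too many parameters for any closed-form solution, since each step only needs the local slope.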

The original meaning of Diffusion is "spreading", which is this AI's core job. ("Stable" presumably echoes the name of its backer, Stability AI, with a touch of self-praise about stability.) The image-processing part of this process first requires a noisy image. This image consists of random noise and is extremely unclear. It is generated from random numbers determined by a seed, while the NLP-processed prompt guides every later step. In fact, the process is not so different from human painting, except that the number of iterations is much larger: human painting starts with a draft, and this noisy image is the AI's draft. At this point the AI, like a human, decides what should go where and what color it should be. Then, using a deep neural network (whose weights and parameters were themselves adjusted by training on a large number of images), it iterates on the image, and each iteration removes part of the noise, running the "diffusion" in reverse and turning the image into a more detailed and complex one. So the paper or cloth used to wipe the window was made from what the network learned across all those training images. During training, the network's output is scored after each step with a mathematical function, and its parameters are adjusted based on that feedback for the next step.
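The loop described above, starting from noise and refining it step by step, can be mimicked with a toy example. This is a deliberately simplified sketch and not the actual Stable Diffusion sampler: the "image" is a short list of numbers, and the trained neural network is replaced by a stand-in oracle that already knows the target, which a real model never does. Real samplers also follow a noise schedule and work on a compressed latent representation. All names and constants here are hypothetical.

```python
import random


def toy_denoise(target, steps=50, strength=0.2, seed=0):
    """Start from random noise and remove a fraction of the 'noise' each step."""
    rng = random.Random(seed)
    # The "draft": pure Gaussian noise, one value per pixel.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for _ in range(steps):
        # Stand-in for the trained network (an assumption for illustration):
        # the predicted noise is whatever separates the image from the target.
        predicted_noise = [x - t for x, t in zip(image, target)]
        # Wipe away part of the fog; the rest is left for later iterations.
        image = [x - strength * n for x, n in zip(image, predicted_noise)]
        # Record the largest remaining per-pixel deviation.
        errors.append(max(abs(x - t) for x, t in zip(image, target)))
    return image, errors


# A made-up four-pixel "picture" used only for this demonstration.
target = [0.0, 0.5, 1.0, 0.5]
final_image, errors = toy_denoise(target)

print(errors[0], errors[-1])
print(final_image)
```

The point of the sketch is the shape of the process: no stored picture is pasted in at any step, and the result emerges from many small corrections to random noise. In the real system the corrections come from learned weights rather than from a known target.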

From this perspective, the principles of Stable Diffusion are not so evil; at the very least, it is not blatant copying. However, this does not mean that works produced with Stable Diffusion can claim full copyright and receive full legal protection. The issue should really be taken to a higher level, to a question about copyright itself: can the principles of SD be regarded as a kind of learning similar to that of humans, so that the resulting works can claim full copyright? Or should they be regarded as a technical experiment whose products are destined for the public domain, a niche hobby for CS enthusiasts that is strictly prohibited from commercial use?

In any case, we can neither ignore the practical value of AI painting nor simply take sides on it. AI painting can greatly improve the efficiency of digital graphics and algorithmic drawing and accelerate technological progress. With it, small creators can overcome limits of knowledge and funding and bring better ideas to their works. They can raise some donations (rather than profiting directly from AI painting) and remain free to collaborate with human painters, pushing the development of art to a new stage. At the same time, AI painting will also empower technology and bring new opportunities to CS, which helps improve the computer as a great tool of the Internet age.

In summary, to understand AI painting we need to think dialectically, comparing and analyzing. We should neither see only its limitations nor exaggerate its advantages and practical value. Many CS practitioners have pointed out that the explosive growth of AI from 2022 to 2023 is not innovative in principle; it largely comes from ever larger training sets. Whatever the obstacles, AI will continue to evolve, and technological iteration will inevitably affect art in many ways. At the same time, traditional art creators should be respected. Art is not an industry, but a way for humans to pursue beautiful things and a means of social interaction. Promoting the joint development of artificial intelligence and culture and art is most likely to become the mainstream voice in the future.

SHSEE Diary Project

As time goes by, words remain. There are 49 days left until the high school entrance examination, and I hope to record what I see, hear, and feel by writing a diary until about a week before the exam.
This is the 2nd article in this project. Thank you for your support!
