Amplified Publishing Category
AI-based Illustration Tools
by Vinay P. Namboodiri
There exists a plethora of computer-aided illustration tools, ranging from those developed by mainstream companies such as Adobe and Autodesk to popular iPad apps. Despite their widespread popularity, each of these tools requires significant effort from illustrators. At the other end, programming tools such as PyGame enable programmers to create illustrations, but only with significant programming effort. Writers are adept at weaving vivid imaginary tales through text, yet there are presently no tools that would allow illustrations to be generated from such descriptions. Towards this end, we could benefit from recent advances in AI such as text-based image generation. The aim is to explore interactive text- and image-based generation tools that could aid the easy creation of illustrations, aimed at specific use-cases such as helping illustrate books for children.
At the heart of developing AI-based tools that produce illustrations from text and interaction is a specific class of generative modeling techniques known as Generative Adversarial Networks (GANs). A GAN comprises a pair of neural networks that take on adversarial roles: a generator network that produces images, which can be thought of as a forger, and a discriminator network, which can be thought of as an art expert trying to detect the forgeries. The two networks are trained as adversaries, one aiming to fool the other and the other aiming to catch the first at its task. The approach has come a long way since the initial work by Goodfellow et al. in 2014, where only very basic image generation was possible, to recent realistic image generation techniques such as StyleGAN by Karras et al., which can generate portraits of high quality.
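The adversarial game described above can be made concrete with a toy example. The sketch below is a minimal illustration assuming only NumPy, not any published GAN implementation: a one-dimensional affine "generator" plays against a logistic "discriminator", each taking gradient steps on its own objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: samples from N(3, 0.5), the distribution to imitate.
def real_batch(n):
    return rng.normal(3.0, 0.5, n)

g_a, g_b = 1.0, 0.0   # generator: maps noise z to a sample as g_a * z + g_b
d_w, d_b = 0.1, 0.0   # discriminator: D(x) = sigmoid(d_w * x + d_b)

lr, n = 0.05, 64
for step in range(2000):
    z = rng.normal(0.0, 1.0, n)
    fake = g_a * z + g_b
    real = real_batch(n)

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    p_real = sigmoid(d_w * real + d_b)
    p_fake = sigmoid(d_w * fake + d_b)
    d_w += lr * (np.mean((1 - p_real) * real) - np.mean(p_fake * fake))
    d_b += lr * (np.mean(1 - p_real) - np.mean(p_fake))

    # Generator step: ascend log D(fake) (the "non-saturating" loss).
    p_fake = sigmoid(d_w * fake + d_b)
    dfake = (1 - p_fake) * d_w        # gradient of log D(fake) w.r.t. fake
    g_a += lr * np.mean(dfake * z)
    g_b += lr * np.mean(dfake)

# After training, g_a * z + g_b should produce samples close to the real data.
```

In a real GAN both players are deep networks trained by backpropagation on image batches, but the alternating two-player update is exactly this loop.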
While GANs can generate images, two further techniques are fundamental to the approach explored here: the ability to generate images from text, and the ability to interactively edit the images a GAN generates.
Fig 1: Progress in image generation using GANs over the years. Figure credit Bolei Zhou (2021)
Text to Images
The initial ideas for text-to-image generation were explored in works such as Reed et al. (2016). The techniques have since progressed, and in particular DALL-E by OpenAI (Ramesh et al.) has shown that it is possible to generate realistic images from textual cues. Interestingly, the approach shows that large-scale learning of the correlation between images and text makes novel image generation possible from textual cues alone. An example is shown in figure 2, where novel images are generated from textual cues that the model was not trained on.
Fig 2: Generation of images based on text. Image from Ramesh et al., project titled DALL-E by OpenAI
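A key element of the DALL-E framing is that text and image are both treated as token sequences and modelled jointly by a single autoregressive model. The sketch below illustrates only the sequence construction; the vocabulary sizes are hypothetical placeholders, not OpenAI's actual values.

```python
# DALL-E style framing: text tokens and discrete image tokens are concatenated
# into one stream for a single autoregressive model.
TEXT_VOCAB = 1000      # hypothetical BPE vocabulary size for text
IMAGE_VOCAB = 512      # hypothetical codebook size of a discrete image VAE

def joint_sequence(text_tokens, image_tokens):
    """Concatenate text and image tokens into one stream.

    Image tokens are shifted past the text vocabulary so the two
    modalities occupy disjoint id ranges in a shared embedding table.
    """
    return list(text_tokens) + [TEXT_VOCAB + t for t in image_tokens]

seq = joint_sequence([12, 7, 903], [5, 5, 130])
# Training predicts seq[i+1] from seq[:i+1]; at test time the model is
# prompted with only the text tokens and samples the image tokens.
```

Once the image tokens are sampled, a separate decoder (a discrete VAE in DALL-E) maps them back to pixels.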
Interactive Image Generation
While the ability to generate images from text provides us with very interesting possibilities, we may also want to manipulate the generated image: for instance, to edit the placement of the various objects or to change the scale of some of them. This requires means of editing the generated samples, which is an interesting challenge, and there have been interesting works that aim to provide such controls. For instance, GANSpace by Härkönen et al. shows how to obtain interpretable controls, as shown in figure 3. However, these controls have so far been demonstrated mainly on realistic images; more research would be required to obtain similar controls for editing illustrations.
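The core idea of GANSpace is to find edit directions by applying PCA to latent or feature vectors sampled while running a trained generator, and then to move a latent code along a principal axis. A minimal sketch, assuming NumPy and using random vectors as a stand-in for a real generator's latent samples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for latent codes collected while sampling a trained generator;
# in GANSpace these are activations at an early layer of the GAN.
latents = rng.normal(size=(500, 16))

# PCA via SVD of the centred samples: principal axes = candidate edit directions.
mean = latents.mean(axis=0)
_, _, components = np.linalg.svd(latents - mean, full_matrices=False)

def edit(z, component_idx, strength):
    """Move a latent code along one principal direction."""
    return z + strength * components[component_idx]

z = rng.normal(size=16)
z_edited = edit(z, component_idx=0, strength=3.0)
# Feeding z and z_edited through the generator would show one attribute
# (e.g. pose or lighting) varying while the rest of the image stays put.
```

Which attribute each direction controls is discovered by inspection; GANSpace's contribution is that these unsupervised PCA directions turn out to be surprisingly interpretable.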
Incorporating Existing Tools
The approaches discussed so far are driven purely by AI techniques such as GANs. An alternative direction is to incorporate existing programming tools that help generate illustrations, such as Manim, PyGL or Blender utilities. These are higher-level APIs that use graphics rendering pipelines such as OpenGL to generate illustrations. An interesting research challenge is to incorporate such pipelines into the image generation pipeline. This could enable rapid, data-driven image generation that benefits from libraries of such programs and learns by adapting their parameters.
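As a toy analogue of such a pipeline, the sketch below defines an illustration entirely by a handful of numeric parameters and renders it to SVG; a data-driven method could, in principle, search over or learn such parameters. This is an illustrative example only and does not use the Manim or Blender APIs.

```python
import math

# A toy parametric "illustration program": a few numeric parameters fully
# determine the drawing, so a learning method could adapt them from data.
def sun_svg(radius=20, n_rays=8, ray_len=12, cx=50, cy=50):
    parts = [f'<circle cx="{cx}" cy="{cy}" r="{radius}" fill="gold"/>']
    for i in range(n_rays):
        a = 2 * math.pi * i / n_rays
        x1 = cx + radius * math.cos(a)
        y1 = cy + radius * math.sin(a)
        x2 = cx + (radius + ray_len) * math.cos(a)
        y2 = cy + (radius + ray_len) * math.sin(a)
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" '
                     f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="orange"/>')
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
            + "".join(parts) + "</svg>")

svg = sun_svg(n_rays=12)  # a sun with twelve rays, as a complete SVG string
```

Because the output is a deterministic function of the parameters, a library of such programs forms a searchable, differentiable-in-spirit space of illustrations, which is what makes combining them with learning-based generation attractive.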
Fig 3: Interpretable controls for editing GAN images. Images from GANSpace by Härkönen et al.
Some progress in this direction has been made through our own work (Venkataramaiyer et al.) on the shading of sketches. In this work, we showed that, given a sketch, the task of generating shading or hatching can be automated. Hatching or shading presumes an understanding of the 3D shape of an object, even though the sketch itself is drawn on a 2D surface. Through our GAN-based approach, we are able to generate an accurate hatch pattern automatically. An illustration of our work is shown in figure 4.
Fig 4: Generation of shading from a sketch, our work presented in Venkataramaiyer et al.
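To make the task concrete, the sketch below generates hatching procedurally, with line spacing driven by a darkness value. This hand-written rule is only an illustration of what hatching output looks like; SHAD3S learns the mapping from data with a GAN rather than applying any fixed rule.

```python
import numpy as np

def hatch_lines(width, height, darkness, angle_deg=45.0):
    """Procedural hatching: darker regions get more closely spaced lines.

    darkness is in [0, 1]; returns a list of (start, end) point pairs.
    """
    spacing = max(2.0, 20.0 * (1.0 - darkness))
    theta = np.deg2rad(angle_deg)
    d = np.array([np.cos(theta), np.sin(theta)])   # hatch line direction
    n = np.array([-d[1], d[0]])                    # perpendicular direction
    diag = np.hypot(width, height)
    lines = []
    # March perpendicular to the hatch direction, one segment per step.
    t = -diag
    while t <= diag:
        centre = np.array([width / 2, height / 2]) + t * n
        lines.append((centre - diag * d, centre + diag * d))
        t += spacing
    return lines

dense = hatch_lines(100, 100, darkness=0.9)   # tight hatching for dark areas
sparse = hatch_lines(100, 100, darkness=0.2)  # wide hatching for light areas
```

A learned approach replaces the fixed spacing rule with a network that infers the shading implied by the sketch's 3D shape, which a rule like this cannot do.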
The success of these various research works shows that it is now possible to consider AI-based tools that could help generate illustrations. However, the works presented are specific to the datasets the various AI models are trained on. To obtain tools for a particular domain, such as helping illustrate children's books, it would be necessary to obtain datasets at scale.
Moreover, the works developed for text-based image generation and interactive image generation have mainly focused on techniques trained on collections of realistic images. To develop techniques that help illustrators, it would be important to adapt text-based image generation and interactive editing to illustrations. The domain of illustration is also very wide; it would probably not be feasible to develop tools covering the whole range, and more feasible to aim at specific applications such as illustrations for children's books.
To develop specific AI-based tools, we would also need further input from illustrators and artists on which tools they would prefer and which would help them create illustrations more easily.
Finally, the use of such AI-based tools would also need to be guided by ethical guidelines that ensure fairness and representation for all demographics of society.
Developing tools for specific sectors such as children's book publishing could help those sectors rapidly produce more books that encourage the new generation, while reducing costs. Moreover, this could help us understand the future of cooperation between artists and AI, and could lead to the next generation of collaboration between artists and AI tools.
• Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, "Generative Adversarial Networks", 2014; arXiv:1406.2661.
• Tero Karras, Samuli Laine, Timo Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks", 2018; arXiv:1812.04948.
• Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever, "Zero-Shot Text-to-Image Generation", 2021; arXiv:2102.12092.
• Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, Sylvain Paris, "GANSpace: Discovering Interpretable GAN Controls", NeurIPS 2020; arXiv:2004.02546.
• Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee, "Generative Adversarial Text to Image Synthesis", ICML 2016; arXiv:1605.05396.
• Raghav B. Venkataramaiyer, Abhishek Joshi, Saisha Narang, Vinay P. Namboodiri, "SHAD3S: A Model to Sketch, Shade and Shadow", WACV 2021; arXiv:2011.06822.