Amplified Publishing Category
AI-based Illustration Tools
by Vinay P. Namboodiri
There exists a plethora of computer-aided illustration tools, ranging from those developed by mainstream companies such as Adobe and Autodesk to popular iPad apps. Despite their widespread popularity, each of these tools requires significant effort from illustrators. At the other end, programming tools such as PyGame enable programmers to create illustrations, but only with significant programming effort. Writers are adept at weaving vivid imaginary tales through text, yet there are presently no tools that would allow illustrations to be generated from such descriptions. Towards this end, we could benefit from recent advances in AI such as text-based image generation. The aim is to explore interactive text- and image-based generation tools that could aid the easy creation of illustrations, aimed at specific use-cases such as helping illustrate books for children.
At the heart of developing AI-based tools that produce illustrations from text and interaction is a specific class of generative modeling techniques known as Generative Adversarial Networks (GANs). A GAN comprises a pair of neural networks that take on adversarial roles: a generator network that produces images, which can be thought of as a forger, and a discriminator network, which can be thought of as an art expert trying to detect the forgeries. The two networks are trained as adversaries, one aiming to fool the other and the other aiming to catch the first at its task. The approach has come a long way since the initial work by Goodfellow et al. in 2014, where only very basic image generation was possible, to recent realistic image generation techniques such as StyleGAN by Karras et al., which can generate portraits of high quality.
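The adversarial game described above can be made concrete with a toy example. The sketch below is a minimal illustration assuming only NumPy, not any published GAN implementation: a one-dimensional affine "generator" plays against a logistic "discriminator", each taking gradient steps on its own objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: samples from N(3, 0.5), the distribution to imitate.
def real_batch(n):
    return rng.normal(3.0, 0.5, n)

g_a, g_b = 1.0, 0.0   # generator: maps noise z to a sample as g_a * z + g_b
d_w, d_b = 0.1, 0.0   # discriminator: D(x) = sigmoid(d_w * x + d_b)

lr, n = 0.05, 64
for step in range(2000):
    z = rng.normal(0.0, 1.0, n)
    fake = g_a * z + g_b
    real = real_batch(n)

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    p_real = sigmoid(d_w * real + d_b)
    p_fake = sigmoid(d_w * fake + d_b)
    d_w += lr * (np.mean((1 - p_real) * real) - np.mean(p_fake * fake))
    d_b += lr * (np.mean(1 - p_real) - np.mean(p_fake))

    # Generator step: ascend log D(fake) (the "non-saturating" loss).
    p_fake = sigmoid(d_w * fake + d_b)
    dfake = (1 - p_fake) * d_w        # gradient of log D(fake) w.r.t. fake
    g_a += lr * np.mean(dfake * z)
    g_b += lr * np.mean(dfake)

# After training, g_a * z + g_b should produce samples close to the real data.
```

In a real GAN both players are deep networks trained by backpropagation on image batches, but the alternating two-player update is exactly this loop.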
While GANs can generate images, two further techniques are fundamental to the approach explored here: the ability to generate images from text, and the ability to interactively edit the images a GAN generates.
Fig 1: Progress in image generation using GANs over the years. Figure credit Bolei Zhou (2021)
Text to Images
The initial ideas for text-to-image generation were explored in works such as Reed et al. (2016). The techniques have since progressed, and in particular DALL-E by OpenAI (Ramesh et al.) has shown that it is possible to generate realistic images from textual cues. Interestingly, the approach shows that large-scale learning of the correlation between images and text makes novel image generation possible from textual cues alone. An example is shown in figure 2, where novel images are generated from textual cues that the model was not trained on.
Fig 2: Generation of images based on text. Image from Ramesh et al., project titled DALL-E by OpenAI
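A key element of the DALL-E framing is that text and image are both treated as token sequences and modelled jointly by a single autoregressive model. The sketch below illustrates only the sequence construction; the vocabulary sizes are hypothetical placeholders, not OpenAI's actual values.

```python
# DALL-E style framing: text tokens and discrete image tokens are concatenated
# into one stream for a single autoregressive model.
TEXT_VOCAB = 1000      # hypothetical BPE vocabulary size for text
IMAGE_VOCAB = 512      # hypothetical codebook size of a discrete image VAE

def joint_sequence(text_tokens, image_tokens):
    """Concatenate text and image tokens into one stream.

    Image tokens are shifted past the text vocabulary so the two
    modalities occupy disjoint id ranges in a shared embedding table.
    """
    return list(text_tokens) + [TEXT_VOCAB + t for t in image_tokens]

seq = joint_sequence([12, 7, 903], [5, 5, 130])
# Training predicts seq[i+1] from seq[:i+1]; at test time the model is
# prompted with only the text tokens and samples the image tokens.
```

Once the image tokens are sampled, a separate decoder (a discrete VAE in DALL-E) maps them back to pixels.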
Interactive Image Generation
While the ability to generate images from text provides us with very interesting possibilities, we may also want to manipulate the generated image: for instance, to edit the placement of the various objects or to change the scale of some of them. This requires means of editing the generated samples, which is an interesting challenge, and there have been interesting works that aim to provide such controls. For instance, GANSpace by Härkönen et al. shows how to obtain interpretable controls, as shown in figure 3. However, these controls have so far been demonstrated mainly on realistic images; more research would be required to obtain similar controls for editing illustrations.
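The core idea of GANSpace is to find edit directions by applying PCA to latent or feature vectors sampled while running a trained generator, and then to move a latent code along a principal axis. A minimal sketch, assuming NumPy and using random vectors as a stand-in for a real generator's latent samples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for latent codes collected while sampling a trained generator;
# in GANSpace these are activations at an early layer of the GAN.
latents = rng.normal(size=(500, 16))

# PCA via SVD of the centred samples: principal axes = candidate edit directions.
mean = latents.mean(axis=0)
_, _, components = np.linalg.svd(latents - mean, full_matrices=False)

def edit(z, component_idx, strength):
    """Move a latent code along one principal direction."""
    return z + strength * components[component_idx]

z = rng.normal(size=16)
z_edited = edit(z, component_idx=0, strength=3.0)
# Feeding z and z_edited through the generator would show one attribute
# (e.g. pose or lighting) varying while the rest of the image stays put.
```

Which attribute each direction controls is discovered by inspection; GANSpace's contribution is that these unsupervised PCA directions turn out to be surprisingly interpretable.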
Incorporating Existing Tools
The approaches discussed so far are driven purely by AI techniques such as GANs. An alternative direction is to incorporate existing programming tools that help generate illustrations, such as Manim, PyGL or Blender utilities. These are higher-level APIs that use graphics rendering pipelines such as OpenGL to generate illustrations. An interesting research challenge is to incorporate such pipelines into the image generation pipeline. This could enable rapid, data-driven image generation that benefits from libraries of such programs and learns by adapting their parameters.
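As a toy analogue of such a pipeline, the sketch below defines an illustration entirely by a handful of numeric parameters and renders it to SVG; a data-driven method could, in principle, search over or learn such parameters. This is an illustrative example only and does not use the Manim or Blender APIs.

```python
import math

# A toy parametric "illustration program": a few numeric parameters fully
# determine the drawing, so a learning method could adapt them from data.
def sun_svg(radius=20, n_rays=8, ray_len=12, cx=50, cy=50):
    parts = [f'<circle cx="{cx}" cy="{cy}" r="{radius}" fill="gold"/>']
    for i in range(n_rays):
        a = 2 * math.pi * i / n_rays
        x1 = cx + radius * math.cos(a)
        y1 = cy + radius * math.sin(a)
        x2 = cx + (radius + ray_len) * math.cos(a)
        y2 = cy + (radius + ray_len) * math.sin(a)
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" '
                     f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="orange"/>')
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
            + "".join(parts) + "</svg>")

svg = sun_svg(n_rays=12)  # a sun with twelve rays, as a complete SVG string
```

Because the output is a deterministic function of the parameters, a library of such programs forms a searchable, differentiable-in-spirit space of illustrations, which is what makes combining them with learning-based generation attractive.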
Fig 3: Interpretable controls for editing GAN images. Images from GANSpace by Härkönen et al.
Some progress in this direction has been made through our own work (Venkataramaiyer et al.) on the shading of sketches. In this work, we showed that, given a sketch, the task of generating shading or hatching can be automated. Hatching or shading presumes an understanding of the 3D shape of an object, even though the sketch itself is drawn on a 2D surface. Through our GAN-based approach, we are able to generate an accurate hatch pattern automatically. An illustration of our work is shown in figure 4.
Fig 4: Generation of shading from a sketch, our work presented in Venkataramaiyer et al.
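To make the task concrete, the sketch below generates hatching procedurally, with line spacing driven by a darkness value. This hand-written rule is only an illustration of what hatching output looks like; SHAD3S learns the mapping from data with a GAN rather than applying any fixed rule.

```python
import numpy as np

def hatch_lines(width, height, darkness, angle_deg=45.0):
    """Procedural hatching: darker regions get more closely spaced lines.

    darkness is in [0, 1]; returns a list of (start, end) point pairs.
    """
    spacing = max(2.0, 20.0 * (1.0 - darkness))
    theta = np.deg2rad(angle_deg)
    d = np.array([np.cos(theta), np.sin(theta)])   # hatch line direction
    n = np.array([-d[1], d[0]])                    # perpendicular direction
    diag = np.hypot(width, height)
    lines = []
    # March perpendicular to the hatch direction, one segment per step.
    t = -diag
    while t <= diag:
        centre = np.array([width / 2, height / 2]) + t * n
        lines.append((centre - diag * d, centre + diag * d))
        t += spacing
    return lines

dense = hatch_lines(100, 100, darkness=0.9)   # tight hatching for dark areas
sparse = hatch_lines(100, 100, darkness=0.2)  # wide hatching for light areas
```

A learned approach replaces the fixed spacing rule with a network that infers the shading implied by the sketch's 3D shape, which a rule like this cannot do.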
The success of these various research works shows that it is now possible to consider AI-based tools that could help generate illustrations. However, the works presented are specific to the datasets the various AI models are trained on. To obtain tools for a particular domain, such as helping illustrate children's books, it would be necessary to obtain datasets at scale.
Moreover, the works developed for text-based image generation and interactive image generation have mainly focused on techniques trained on collections of realistic images. To develop techniques that help illustrators, it would be important to adapt text-based image generation and interactive editing to illustrations. The domain of illustration is also very wide; it would probably not be feasible to develop tools covering the whole range, and more feasible to aim at specific applications such as illustrations for children's books.
To develop specific AI-based tools, we would also need further input from illustrators and artists on which tools they would prefer and which would help them create illustrations more easily.
Finally, the use of such AI-based tools would also need to be guided by ethical guidelines that ensure fairness and representation for all demographics of society.
Developing tools for specific sectors such as children's book publishing could help those sectors rapidly produce more books that encourage the new generation, while reducing costs. Moreover, this could help us understand the future of cooperation between artists and AI, and could lead to the next generation of collaboration between artists and AI tools.
• Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, "Generative Adversarial Networks", 2014; arXiv:1406.2661.
• Tero Karras, Samuli Laine, Timo Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks", 2018; arXiv:1812.04948.
• Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever, "Zero-Shot Text-to-Image Generation", 2021; arXiv:2102.12092.
• Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, Sylvain Paris, "GANSpace: Discovering Interpretable GAN Controls", NeurIPS 2020; arXiv:2004.02546.
• Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee, "Generative Adversarial Text to Image Synthesis", ICML 2016; arXiv:1605.05396.
• Raghav B. Venkataramaiyer, Abhishek Joshi, Saisha Narang, Vinay P. Namboodiri, "SHAD3S: A Model to Sketch, Shade and Shadow", WACV 2021; arXiv:2011.06822.