Intelli-Paint: Towards Developing More Human-Intelligible Painting Agents (ECCV 2022)

1Australian National University, 2Adobe Research

Fig. 1 - We propose Intelli-Paint, which addresses the need for more human-intelligible painting agents. (Left) Painting sequence visualization demonstrating that our method bears a significantly closer resemblance to the human painting style than the previous state of the art. (Right) This resemblance is achieved through 1) a progressive layering strategy which allows for a more human-like evolution of the canvas, 2) a sequential attention mechanism which focuses on different image regions in a coarse-to-fine fashion, and 3) a brushstroke regularization formulation which allows our method to obtain detailed results while using significantly fewer brushstrokes.

Abstract

Stroke-based rendering methods have recently become a popular solution for the generation of stylized paintings. However, current research in this direction focuses mainly on improving the quality of the final canvas, and thus often fails to consider how intelligible the generated painting sequences are to actual human users. In this work, we motivate the need to learn more human-intelligible painting sequences in order to facilitate the use of autonomous painting systems in a more interactive context (e.g. as a painting assistant or teaching tool for human users, or for robotic painting applications). To this end, we propose a novel painting approach which learns to generate output canvases while exhibiting a painting style that is more relatable to human users. The proposed painting pipeline, Intelli-Paint, consists of 1) a progressive layering strategy which allows the agent to first paint a natural background scene before adding each of the foreground objects in a progressive fashion, 2) a novel sequential brushstroke guidance strategy which helps the painting agent shift its attention between different image regions in a semantic-aware manner, and 3) a brushstroke regularization strategy which allows for a ~60-80% reduction in the total number of required brushstrokes without any perceivable difference in the quality of the generated canvases. Through both quantitative and qualitative results, we show that the resulting agents not only show enhanced efficiency in output canvas generation but also exhibit a more natural-looking painting style which would better assist human users in expressing their ideas through digital artwork.

Why do we need more human-intelligible painting sequences?

Interactive Painting Applications. The practical merits of a stroke-based rendering approach over pixel-based image stylization methods (e.g. using GANs, VAEs) rely on its ability to mimic the human artistic creation process. In fact, several previous works, including Paint Transformer, Optim and RL, describe this ability to mimic a human-like painting process as their motivation for using a brushstroke-based approach to image generation. The idea is that, once trained, the learned painting agent can then act as a painting assistant / teaching tool for human users (Paint Transformer, RL (Huang et al., 2019)).

Robotic Painting Applications. Robotic applications for the expression of AI creativity are being increasingly explored. For instance, Pindar Van Arman's CloudPainter robot has gained widespread attention for the automated creation of artistic paintings. Our contribution is significant in this direction, as our method not only learns a painting sequence which is more interpretable to actual human users, but more importantly provides an *efficient painting plan* which would allow a robotic agent to paint a vivid scene using significantly fewer brushstrokes than previous works.


Fig. 2 - Overview of the progressive gridding inference strategy from previous works. The painting agent divides the overall image into successively finer grids, and then proceeds to paint each of them in parallel.



Despite the above-mentioned motivations, the generation of competitive results with previous works is invariably dependent on a progressive grid-based division strategy. In this setting, the agent divides the overall image into successively finer grids and then proceeds to paint each of them in parallel. Experimental analysis reveals that this not only reduces the efficiency of the final agent, but also leads to mechanical (grid-based) painting sequences (refer Fig. 3) which are not directly applicable to actual human users.
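
To make the contrast concrete, the following Python sketch illustrates the gridding procedure described above. It is not the implementation of any particular prior work; `paint_patch` is a hypothetical per-cell stroke optimizer standing in for the respective method's stroke prediction module.

    import numpy as np

    def progressive_grid_paint(target, paint_patch, levels=(1, 2, 4)):
        """target: HxW(x3) array; paint_patch: hypothetical optimizer mapping
        (target_patch, canvas_patch) -> painted canvas patch."""
        H, W = target.shape[:2]
        canvas = np.zeros_like(target)
        for k in levels:                      # 1x1, 2x2, 4x4, ... grids
            hs, ws = H // k, W // k
            for i in range(k):
                for j in range(k):
                    ys = slice(i * hs, (i + 1) * hs)
                    xs = slice(j * ws, (j + 1) * ws)
                    # every cell at this level is refined independently, which is
                    # what produces the mechanical, grid-wise painting sequence
                    canvas[ys, xs] = paint_patch(target[ys, xs], canvas[ys, xs])
        return canvas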




Fig. 3 - Previous works exhibit a mechanical (grid-based) painting
sequence which is not directly applicable to human users.

Mimicking the human painting process

In order to address the need for more human-intelligible painting sequences, we propose a novel Intelli-Paint pipeline which learns to paint canvases while mimicking the human painting process using three main modules.

  1. Progressive Layering
  2. Sequential Brushstroke Guidance
  3. Brushstroke Regularization


We next analyse the importance of each module in learning a more human-relatable painting style.

Progressive Layering

The human painting process is often progressive and multi-layered. That is, instead of painting everything on the canvas at once, humans often first paint a basic background layer before progressively adding each of the foreground objects on top of it (refer Fig. 1). However, such a strategy is hard to learn with previous works, which directly minimize the pixel-wise distance between the generated canvas and the target image.

As shown in Fig. 4 below, we propose a progressive layering module which, much like a human artist, allows the painted canvas to evolve in multiple successive layers.


Fig. 4 - Ablation analysis on the role of progressive layering. (Left, Right) Painting sequences without and with the use of progressive layering. Instead of painting everything on the canvas at once (left), the progressive layering approach (right) allows the painting agent to draw a given canvas in multiple layers (e.g. painting a realistic background layer before progressively adding the foreground objects on top of it).
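
The sketch below illustrates one way such a two-layer schedule could be realised. It is an illustrative simplification rather than our exact training formulation; `fg_mask`, `fit_strokes` and `render` are assumed placeholders (e.g. a binary mask from an off-the-shelf salient-object detector, a generic masked stroke optimizer, and a stroke renderer).

    import numpy as np

    def paint_in_layers(target, fg_mask, fit_strokes, render,
                        bg_budget=100, fg_budget=150):
        canvas = np.ones_like(target)              # blank (white) canvas
        # Layer 1: background. The foreground region is ignored, so the agent
        # first lays down a coherent background scene, as a human painter would.
        bg_strokes = fit_strokes(target, canvas, mask=1 - fg_mask, budget=bg_budget)
        canvas = render(canvas, bg_strokes)
        # Layer 2 (and onwards): each foreground object is added on top of the
        # finished background rather than being painted simultaneously with it.
        fg_strokes = fit_strokes(target, canvas, mask=fg_mask, budget=fg_budget)
        canvas = render(canvas, fg_strokes)
        return canvas, bg_strokes + fg_strokes     # assuming stroke lists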

Sequential Brushstroke Guidance

Once the background layer has been painted, the sequential brushstroke guidance strategy helps our method add different foreground features in a semantic-aware manner.


Fig. 5 - Ablation analysis on the role of sequential guidance. (Left, Right) Painting sequences without and with the use of sequential guidance after painting the background layer. Instead of adding brushstrokes randomly all over the canvas (left), our approach (right) constrains the painting agent to focus on and refine different image areas in a coarse-to-fine, semantic-aware manner.
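
As a rough illustration, the sketch below shows how a sequence of shrinking attention windows can constrain where strokes are placed. The region boxes and the `propose_strokes` function are assumptions for the example (e.g. boxes from a detector and normalized stroke centers predicted by the painting network); the actual guidance in our pipeline is learned rather than hard-coded.

    def guided_stroke_positions(regions, propose_strokes, steps_per_region=3):
        """regions: list of (x0, y0, x1, y1) boxes in image coordinates;
        propose_strokes(): stroke centers (u, v) in [0, 1]^2 window coordinates."""
        placed = []
        for (x0, y0, x1, y1) in regions:          # visit one semantic region at a time
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            for s in range(steps_per_region):
                # attention window shrinks around the region center: coarse -> fine
                scale = 1.0 - 0.5 * s / max(steps_per_region - 1, 1)
                w, h = (x1 - x0) * scale, (y1 - y0) * scale
                wx0, wy0 = cx - w / 2, cy - h / 2
                for (u, v) in propose_strokes():  # map window coords to the image
                    placed.append((wx0 + u * w, wy0 + v * h))
        return placed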

Brushstroke Regularization

Current works on autonomous painting are often limited to an (almost) fixed brushstroke budget irrespective of the complexity of the target image. Experiments reveal that this not only reduces the efficiency of the generated painting sequence but also results in redundant / overlapping brushstroke patterns (refer Fig. 6) which impart an unnatural painting style to the final agent. As shown in Fig. 6 below, we propose a brushstroke regularization formulation which removes these painting redundancies, thereby considerably improving both the painting efficiency and the human-relatability of our approach.



Fig. 6 - Extreme ablation analysis on the role of brushstroke regularization. (Left, Right) Painting sequences before and after brushstroke regularization. The example on the left represents an extreme illustration of overlapping patterns occurring as a result of using a predetermined brushstroke budget (1000 brushstrokes) irrespective of the complexity of the target image (e.g. notice the unnatural way in which brushstrokes combine to form the final image). On the right, we see how our brushstroke regularization formulation removes these redundancies, allowing our method to paint a vivid scene in fewer than 100 brushstrokes while exhibiting a more natural painting style.
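
For intuition, the sketch below shows one simple post-hoc way to remove such redundancies: discard strokes whose visible contribution to the final canvas (after occlusion by later strokes) is negligible. This is only a hypothetical heuristic for illustration, not the regularization term used in our training objective; `stroke_alpha` is an assumed renderer returning each stroke's HxW opacity map.

    import numpy as np

    def prune_redundant_strokes(strokes, stroke_alpha, vis_thresh=0.05):
        alphas = [stroke_alpha(s) for s in strokes]
        kept = []
        for i, (s, a) in enumerate(zip(strokes, alphas)):
            # accumulate the opacity of all strokes painted after stroke i
            occluded = np.zeros_like(a)
            for a_later in alphas[i + 1:]:
                occluded = occluded + (1 - occluded) * a_later
            visible = a * (1 - occluded)           # part still seen in the final canvas
            if visible.sum() > vis_thresh * a.sum():
                kept.append(s)                     # keep only perceptible strokes
        return kept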

Painting Efficiency


Fig. 7 - Qualitative method comparison w.r.t. painting efficiency. (Left) Comparison of final canvas outputs using ~300 brushstrokes for (b) Ours, (c) Paint Transformer, (d) Optim, (e) RL and (f) Semantic-RL. We observe that our approach provides a more accurate depiction of the fine-grained features in the target image while using a low brushstroke count.

Resemblance with Human Painting Style


Fig. 8 - Qualitative method comparison w.r.t. resemblance to the human painting style. We compare different methods (b-f). Each painting sequence is generated using a different brushstroke count (indicated in the boxes), so as to ensure a similar pixel-wise reconstruction loss with the target image. The corresponding frames for each sequence are captured after ~10%, 40%, 60% and 100% of the overall painting episode. We observe that our method bears a closer resemblance to the human painting style (column a) than previous works.

Conclusion

In this paper, we emphasize that the practical merits of an autonomous painting system should be evaluated not only by the quality of the generated canvas but also by the interpretability of the corresponding painting sequence to actual human artists. To this end, we propose a novel Intelli-Paint pipeline which uses progressive layering to allow for a more human-like evolution of the painted canvas. The painting agent focuses on different image areas through a sequence of coarse-to-fine localized attention windows and is able to paint detailed scenes while using a limited number of brushstrokes. Experiments reveal that, in comparison with previous state-of-the-art methods, our approach not only shows improved painting efficiency but also exhibits a painting style which is much more relatable to actual human users. We hope our work opens new avenues for the further development of interactive and robotic painting applications in the real world.


BibTeX

If you find our work useful in your research, please cite the following work:
@inproceedings{singh2022intelli,
  title={Intelli-Paint: Towards Developing More Human-Intelligible Painting Agents},
  author={Singh, Jaskirat and Smith, Cameron and Echevarria, Jose and Zheng, Liang},
  booktitle={European Conference on Computer Vision},
  pages={685--701},
  year={2022},
  organization={Springer}
}