What Anime CLIP Sees

CLIP is a neural network that scores the similarity between a text phrase and an image. If the image is generated by another neural network (or any differentiable program), we can differentiate the scoring function with respect to the parameters of that network. For a given set of parameters we then get not only a score but also a gradient: the direction in which to change those parameters so that the generated image becomes more like the target phrase.
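To make the idea of "a score plus a direction" concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not the code used for the images in this post: the matrix `A` is a hypothetical stand-in for "generator followed by CLIP's image encoder", `TEXT` is a hypothetical stand-in for the CLIP embedding of the phrase, and the gradient is derived by hand rather than by automatic differentiation. The score is a cosine similarity, which is how CLIP compares embeddings.

```python
import math

# Hypothetical stand-ins, not taken from the real models:
# A plays the role of "generator followed by image encoder" (a fixed linear map),
# TEXT plays the role of the text embedding of the target phrase.
A = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
TEXT = [0.0, 1.0, 0.0, 1.0]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def embed(z):
    """Map generator parameters z to an image embedding e = A z."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def score(z):
    """Cosine similarity between the image embedding and the text embedding."""
    e = embed(z)
    dot = sum(ei * ti for ei, ti in zip(e, TEXT))
    return dot / (norm(e) * norm(TEXT))


def score_gradient(z):
    """Gradient of score with respect to z, derived by hand for this linear toy."""
    e = embed(z)
    ne, nt = norm(e), norm(TEXT)
    dot = sum(ei * ti for ei, ti in zip(e, TEXT))
    # d(cos)/de = t / (|e||t|) - (e.t) e / (|e|^3 |t|)
    g_e = [ti / (ne * nt) - dot * ei / (ne ** 3 * nt) for ei, ti in zip(e, TEXT)]
    # Chain rule through e = A z: grad_z = A^T g_e
    return [sum(A[i][j] * g_e[i] for i in range(len(A))) for j in range(len(z))]


def optimize(z, steps=100, lr=0.1):
    """Gradient ascent: step the parameters in the direction that raises the score."""
    for _ in range(steps):
        g = score_gradient(z)
        z = [zj + lr * gj for zj, gj in zip(z, g)]
    return z


z0 = [1.0, -0.5, 0.2]   # arbitrary starting parameters
z1 = optimize(z0)       # parameters after following the gradient
```

Comparing `score(z0)` with `score(z1)` shows the similarity rising as the parameters follow the gradient. In a real setup the two linear maps would be replaced by the StyleGAN2 generator and CLIP's encoders, and a framework's automatic differentiation would supply the gradient.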

The following images were generated using Aydao's This Anime Does Not Exist StyleGAN2 model in nagolinc's Colab notebook. The phrase fed into CLIP takes the form "This is X."

The results here suggest that CLIP can express semantic concepts in this domain. "This is isolation." yields a man alone, his face covered. With "This is the world going insane." the faces are laughing madly. "This is a hero." captures dynamic figures in flashy yellows and reds.

This is isolation.

This is you dying.

This is the world going insane.

This is boundaryless, infinite void.

This is the return of the heroic figure.

This is a hero.

This is the transcendence of mortality.