This is great. Those examples are not the best quality, but they're impressive.
That prompted me to try generating ambigrams with Stable Diffusion. The results looked odd, as ambigrams tend to, but the "text" was largely illegible. I wonder when the state of the art will be able to handle that request.
It's odd that image AIs still can't overlay text. If you ask DALL-E or Midjourney to include even a few letters, they produce something like a random nearest neighbor: they not only scramble the idea of the word but also scribble on anything that looks remotely like writing yet is not in any language. Maybe they're still developing the ability to read, or maybe they're secretly creating a completely new script and language.
It's a side effect of the way the text input is represented before the model uses it. The model doesn't get the text as a sequence of characters but as a sequence of tokens.
This paper [1] shows that giving character-level awareness to the model can improve the "visual spelling".
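To make the token point concrete, here is a minimal sketch of a greedy longest-match subword splitter. The vocabulary `TOY_VOCAB` and the helpers `toy_tokenize` and `to_ids` are made up for illustration; real text encoders use learned vocabularies with many thousands of entries, and their splitting rules differ. The point is only that the model receives opaque IDs, not letters.

```python
# Hypothetical toy vocabulary; not the vocabulary of any real model.
TOY_VOCAB = {"amb": 0, "ig": 1, "ram": 2, "duck": 3, "rabbit": 4}

def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedily split `word` into the longest pieces found in `vocab`."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces

def to_ids(pieces, vocab=TOY_VOCAB):
    """Map pieces to integer IDs; -1 marks pieces outside the toy vocabulary."""
    return [vocab.get(piece, -1) for piece in pieces]

pieces = toy_tokenize("ambigram")
ids = to_ids(pieces)
print(pieces, ids)
```

Nothing in the ID sequence says that the first piece is spelled a-m-b, so the spelling has to be learned indirectly, if at all.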
The technical nugget is hidden in the code, so to spell it out: the fun trick here is to alternate, on odd and even steps, between moving toward a duck in the latent space image and moving toward a rabbit in the 90-degree rotated version of the latent space image.
(Normally you would feed the output of step n right back in as input to step n+1. That's the part that doesn't happen as usual here, since the image gets rotated between steps.)
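A rough sketch of that alternation, under heavy assumptions: the real denoising step is replaced by a hypothetical `toy_step` that just nudges a small grid toward a fixed target, and `duck` and `rabbit` are random grids standing in for whatever the two prompts would pull toward. None of these names come from the original code. It only illustrates the control flow of stepping in one orientation, rotating, stepping in the other, and rotating back.

```python
import math
import random

def rotate_ccw(grid):
    """Rotate a square grid 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def rotate_cw(grid):
    """Rotate a square grid 90 degrees clockwise (inverse of rotate_ccw)."""
    return [list(row)[::-1] for row in zip(*grid)]

def toy_step(latent, target, rate=0.3):
    """Hypothetical stand-in for one denoising step: move toward `target`."""
    return [[x + rate * (t - x) for x, t in zip(lrow, trow)]
            for lrow, trow in zip(latent, target)]

def alternate_views(latent, duck, rabbit, steps=50):
    for step in range(steps):
        if step % 2 == 0:
            # Upright view: pull toward the duck.
            latent = toy_step(latent, duck)
        else:
            # Rotated view: pull toward the rabbit, then rotate back.
            rotated = toy_step(rotate_ccw(latent), rabbit)
            latent = rotate_cw(rotated)
    return latent

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

def random_grid(rng, size=8, scale=1.0):
    return [[scale * rng.gauss(0, 1) for _ in range(size)] for _ in range(size)]

rng = random.Random(0)
duck = random_grid(rng)
rabbit = random_grid(rng)
start = random_grid(rng, scale=5.0)
final = alternate_views(start, duck, rabbit)

print(distance(start, duck), distance(final, duck))
print(distance(rotate_ccw(start), rabbit), distance(rotate_ccw(final), rabbit))
```

In this toy the result settles on a compromise that is closer to the duck when viewed upright and closer to the rabbit when viewed rotated than the starting noise was.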
Ambigrams are cool. You can rotate the term you want to ambigram-ize, write it underneath the word, and gently merge the two together "column by column". It's nice that you ask, because I recently saw a YouTube Short about the name Klint that nicely depicts the idea (volume warning): https://www.youtube.com/shorts/3I6rkpAQXmI
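For the "column by column" part, here is a tiny sketch of which letters end up sharing a column. It assumes a 180-degree rotation, so the rotated word reads in reverse order underneath the original; `ambigram_columns` is a made-up helper name, and the actual merging of the shapes is still the hand-drawn part.

```python
def ambigram_columns(word):
    """Pair each letter with the letter that lands under it after a 180-degree turn."""
    rotated = word[::-1]  # rotating the whole word by 180 degrees reverses the letter order
    return list(zip(word, rotated))

pairs = ambigram_columns("Klint")
for top, bottom in pairs:
    # Each column needs one glyph that reads as `top` upright and `bottom` upside down.
    print(top, "<->", bottom)
```

For an odd-length name the middle letter pairs with itself, so it has to look the same upside down.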
I've generally been disappointed by my prompts for optical illusions. I thought it would be better at it. An optical illusion is basically what happens when you relax the constraints on a graphical depiction, allowing objects to be connected in ways that are inconsistent with 3D geometry. The trick is that the inconsistency has to be global, not local. Anywhere you zoom in still looks like normal 3D space. I expected SD to be good at this, as a priori it never had a conception of how 3D space must look to begin with.
Here's where it sucked. It seems to have learned the superficial aesthetic of an optical illusion or of "Escher" without learning the relevant component. It spits out things that either aren't optical illusions or are just a random, disconnected spattering of geometrical inconsistencies without any overarching theme. A person-made optical illusion will generally have a single main loop of impossibly connected objects, or at least some simple overall topology. The illusion is expected to exist on the global scale of the image, not as a weird pocket of a mostly normal image.