Stable Diffusion has recently gained a lot of attention in the AI and computer vision community. It is a diffusion model trained on image–text pairs, able to generate images that match a given textual prompt. In contrast to e.g. Google’s Imagen, both the implementation and the trained weights of the model have been made publicly available.
stable_diffusion by sremes
A simple Stable Diffusion demo
I’ve put together a small demo of the model using the following Python libraries: gradio, transformers, and diffusers. Gradio provides an easy way to set up a simple web interface for testing machine learning models. I’m also using a machine translation model from Hugging Face to translate Finnish prompts into English, the language the Stable Diffusion model expects.
While setting up this demo, I also had to figure out how to install CUDA in WSL2/Ubuntu on Windows, as well as set up port forwarding and firewall rules so that the gradio app running inside WSL2 could be reached from the local network.
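Because WSL2 runs in its own virtual network, the Windows host has to forward the port itself. One way to do this (a sketch, assuming gradio’s default port 7860; run in an elevated PowerShell, substituting your own WSL2 address for the placeholder) is:

```shell
# Find the WSL2 VM's current IP address (it changes across reboots):
wsl hostname -I

# Forward port 7860 on the Windows host to the WSL2 VM:
netsh interface portproxy add v4tov4 listenport=7860 listenaddress=0.0.0.0 connectport=7860 connectaddress=<WSL2-IP>

# Allow inbound connections to that port through the Windows firewall:
netsh advfirewall firewall add rule name="gradio WSL2" dir=in action=allow protocol=TCP localport=7860
```

Since the WSL2 address is not stable across reboots, the portproxy rule may need to be refreshed with the new `connectaddress` after a restart.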