So far, we have been using AI services like ChatGPT and Midjourney to help us with content creation. But there is still a lot of manual work involved in adding new coloring pages:
- Create and download an image via Midjourney
- Ask ChatGPT to create a description (& an optional title)
- Manually register the new image in the database file
We don’t mind the work involved in using Midjourney. It’s fine to review the generated images, maybe refine them, and then select the version we like most. But the rest of the work is quite cumbersome, and it would be great to automate it.
The envisioned target process should look as follows:
- Create and download image via Midjourney
- Run a script that automatically describes the coloring page and registers it in the database (ideally as part of the build pipeline)
The script would need to do something like:
- find newly added images
- describe the images
- generate a title
- register the image in the db.json file
For this we intend to use Hugging Face Transformers, as it lets us easily experiment with many different pretrained models. A rough sketch of what such a script could look like follows below.
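To make this more concrete, here is a minimal sketch of such a script. Everything specific in it is an assumption on our part: the images directory, the structure of db.json, the chosen captioning model, and the naive title derived from the description are all placeholders, not our final implementation.

```python
# Hypothetical sketch: caption new coloring pages and register them in db.json.
# The paths, the db.json schema, and the model checkpoint are assumptions.
import json
from pathlib import Path

from transformers import pipeline

IMAGES_DIR = Path("public/images")  # assumed location of the coloring pages
DB_FILE = Path("db.json")

def main() -> None:
    db = json.loads(DB_FILE.read_text(encoding="utf-8"))
    known = {entry["file"] for entry in db["images"]}

    # An image-to-text pipeline downloads a pretrained captioning model on first use.
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

    for image in sorted(IMAGES_DIR.glob("*.png")):
        if image.name in known:
            continue  # already registered
        description = captioner(str(image))[0]["generated_text"]
        db["images"].append({
            "file": image.name,
            "title": description.title(),  # placeholder; a proper title needs a language model
            "description": description,
        })

    DB_FILE.write_text(json.dumps(db, indent=2, ensure_ascii=False), encoding="utf-8")

if __name__ == "__main__":
    main()
```

Run as part of the build pipeline, this would pick up any images that are not yet in db.json and register them automatically.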
First look
As mentioned in the last blog post, I found an image-to-text model that seems to work pretty well with our AI-generated coloring pages. But how do we find and use these models in our code?
First of all, there are a lot of models listed on Hugging Face. I personally find searching by task useful.
Many of the models can be tested online right away to get a first impression. I think that’s a really great feature, though some interesting models like Llama 2 unfortunately don’t support it.
Many (or maybe all?) models provide a simple example on the model card to get started easily.
Before we can use a model, we need to prepare a Python environment as described here: Installation (huggingface.co)
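In essence, the setup boils down to installing the library plus a backend. The following is the minimal install we would use; PyTorch as the backend and Pillow for image handling are our choices, not the only option:

```
pip install transformers torch pillow
```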
Run the demo
We are using gitpod.io to develop www.usmalbilder.ch. The base image seems to be well equipped for our purposes.
We just created the environment following the documentation, copied over the example code, and ran it:
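For reference, here is the kind of example code the model card provides. We assume the BLIP checkpoint Salesforce/blip-image-captioning-base here, based on the output shown below:

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load a pretrained image-captioning model and its matching processor.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# The demo image from the model card.
img_url = "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# Conditional captioning: the model completes the given text prefix.
inputs = processor(raw_image, "a photography of", return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```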
That worked! This is the demo image used:
and the model outputs: “a photography of a woman and her dog on the beach”
Not sure about the cell phone though 😉
Conclusion
We have discussed how we would like to use AI models to automate some of our manual work, and we ran a first example using Hugging Face Transformers. I think being able to integrate AI models so easily into applications and workflows is really great and should further accelerate the adoption of AI.