
AI Image Generation For Ecommerce Product Photography

AI is moving a mile a minute. But so far, I haven't seen many people applying it to the e-commerce/DTC world. I had a stab at combining Dreambooth and Stable Diffusion over the last week to create AI-generated product photography for one of my SKUs. The results were surprisingly decent. And of course, since the underlying tech is all open-source, you can replicate this for your brand if you want.

Greg Aubert · Nov 7, 2022 · 4 min read


What Did The Results Look Like?

Let's start at the end so you can just see what this thing can do.
I used some existing product photos I had. They were roughly cropped, though I was a bit lazy, so a few of the images still have other elements in them.

Images In

These were the input images (displayed as a GIF here for your viewing convenience)

Images Out

And here's an example of the final result of asking the AI to kindly display my product with a forest background:

Final result

To me, this is pretty good. It passes the bar of believability and certainly meets (or probably beats) the standard of what I could achieve myself on the weekend with an iPhone or camera.

Immediately, I'm thinking of all the time and money saved from having to organise photoshoots on-location.
I'll share these ideas and more interesting AI image results further below.

But first, I'll do a quick background on how it's done.

How It Works

Behind the scenes, it's a combination of Stable Diffusion and Dreambooth.

  • Stable Diffusion - Stability AI's Emad Mostaque opened Pandora's box on 22nd August 2022 by releasing a truly open-source text-to-image AI model. Previous models such as DALL-E were gated to some degree.
  • Dreambooth - a technique published by Google researchers in 2022 that lets you 'teach' a text-to-image model a specific subject and then generate new images of it; open-source implementations for Stable Diffusion appeared around October 2022.

Why is Dreambooth needed?
Remember, the goal was to generate ecommerce product photos of my chosen SKU.
With Stable Diffusion, it's text-to-image only. This means I can only use words to prompt it. This could probably work for Coca-Cola, but sadly there isn't enough image data of my particular product in its training set, so the AI will have no idea what I'm talking about.
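
To make that concrete, here's a minimal sketch of plain text-to-image prompting. It assumes the Hugging Face diffusers library and a standard Stable Diffusion checkpoint (not necessarily the exact setup I used); the model names and prompts are just illustrative:

```python
# Rough sketch: plain text-to-image with an off-the-shelf Stable Diffusion
# checkpoint via the Hugging Face diffusers library. The prompt is the only
# input, so the model can only draw things it saw during training.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# A generic concept like this works fine...
image = pipe("photo of a can of cola on a beach").images[0]
image.save("cola_beach.png")
# ...but a niche product name means nothing to the base model.
```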

Dreambooth bridges that gap.
It allows me to 'teach' it what a word means.
I can upload a bunch of my own images and assign them to a keyword. So now the AI knows what it means and can happily generate away with that as the subject.

After setting everything up, you can then run prompts like the one below.
'coffeetube' is my keyword, which it now understands to mean my Mushroom Coffee product.
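
For reference, the generation step looks roughly like this in code. This is a sketch that assumes the fine-tuned weights were saved out by a DreamBooth training script (the Hugging Face diffusers example script is a common choice); the folder name and settings below are illustrative, not my exact setup:

```python
# Sketch: generating with the DreamBooth fine-tuned weights, where
# 'coffeetube' is the keyword the model has been taught to associate
# with my product. Paths and parameters are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./coffeetube-dreambooth",  # hypothetical output folder from fine-tuning
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "photo of a coffeetube in a forest",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("coffeetube_forest.png")
```

Swap the prompt text and you get the kitchen, train tracks and landmark shots further down.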

Overcoming Some Snags

Sadly, the AI is not perfect (yet).
One thing it struggles with is text and logos on packaging. I hoped this would be OK since I didn't need it to change the text, just keep it exactly as it is. But apparently not.
Here's what the raw initial output is like.

Not so good on the packaging text

It kind of got it shape-wise, but it's clearly mangled.
Unfortunately, 'kind-of' isn't good enough for logos and writing on packaging.

However, on the plus side it did very well with the complexity of the shadows, the indentation of the sand and the impracticalities of bringing a French press to the beach.

So while instant perfection would have been nice, it's still good. Because out of the two, fixing text/logos is the easier job to do manually. It can be done using the original label vector files and a high-res real photo.

In contrast, manually creating the shadows, lighting changes and sand patterns is a lot more work to achieve a photo-realistic result.
Here's the same image post-edit:

All fixed :)

Pushing Things Further

Ok let's have some fun and see what this thing can do.

First, a warm-up: "photo of a coffeetube in a kitchen"
This is a highly probable photo that I'd want, showing the product in the environment where it's used.

Nailed it - I'm happy.

Next up, how about some images I couldn't realistically get?

Prompt: "photo of a coffeetube on train tracks"

Granted, the shape is a little off. But if I'd spent more time generating variations, I could have improved this, since the AI has already shown it's capable of getting the shape right.
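
If you want to do that systematically, a simple approach (again sketched with diffusers; the seed count and paths are arbitrary) is to render the same prompt across a batch of seeds and cherry-pick the best result:

```python
# Sketch: generate several variations of the same prompt by sweeping the
# random seed, then keep the render where the product shape comes out right.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./coffeetube-dreambooth", torch_dtype=torch.float16
).to("cuda")

prompt = "photo of a coffeetube on train tracks"
for seed in range(8):  # arbitrary number of attempts
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"train_tracks_seed_{seed}.png")
```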

The cool thing is that it got the train tracks part of it spot-on.

Train-related jeopardy isn't particularly part of the brand identity. But if it were, I can only imagine the increase in content production (and decrease in my employer's liability insurance premiums).

Next up, I tried putting in a prompt with a famous landmark.
The idea behind this was:

  • These are hard photos to get in real life
  • The AI models should have tonnes of images of it

So here we go: "photo of a coffeetube next to the Eiffel Tower"

Damn, nice.

I really like this photo. Not least because it's so much more of a pain in the arse to get in real life than, say, the kitchen one.

This prompted an around-the-world tour of landmarks from the comfort of my keyboard.

Egyptian pyramids
Iguazu Falls - probably my favourite image out of the whole project
Machu Picchu
Great Wall Of China

The final one was interesting as it managed to render a pretty convincing hand in a typical influencer-style product pose.
Annoyingly though, the product looks too small relative to the hand.
(And, for some reason, there is an enormous pile of Mushroom Coffee at the photographer's feet.)

Limitations Today

I found that the AI models were strongest at generating images of the product in various background environments (forest, beach, kitchen etc.) or next to landmarks.

I struggled to get it to render:

  • people with the product, especially with faces
  • the main subject (i.e. coffeetube) interacting with a second subject.
    E.g. "photo of a dog playing with a coffeetube"

How DTCs Can Use This Today

Despite being only a matter of weeks old, I believe this tech is ready for real use cases today.

1. MVPs for Paid Acquisition

Armed with this, Performance Marketers can spin up new paid ads tests without pestering creative teams for new product shoots.
You can have a completely new idea and feasibly have ads and landing pages live the same day to test it - all with new creative.

2. Creative Production

Creative teams are usually stretched thin with requests for all kinds of images. While you wouldn't use this for your top-priority jobs (outdoor ads, print articles, etc.), there could be a place for it for the smaller ones. This means when you do run a photoshoot, you can focus on the main goals without the distraction of dozens of low-impact photo requests.

3. New Product Development Concepting

On the manufacturing side, you can quickly create variations of your product and send them to your suppliers to make samples.