OpenAI brings new AI image generation directly to GPT-4o

OpenAI unveils advanced image generation in ChatGPT and Sora

Mar 27, 2025

OpenAI has introduced powerful new AI image-generation capabilities into ChatGPT and Sora, powered by its cutting-edge GPT-4o model.

This upgrade enables users to generate and edit images directly within the chat interface, eliminating the need for external tools like DALL-E…

…or Midjourney. 😬

With GPT-4o’s multimodal capabilities, users can now:

Create new images from scratch using natural language prompts.
Edit existing images, including those featuring people, while maintaining realism and scene complexity, again, with natural language prompts.
Upload and modify files within ChatGPT, streamlining the image creation process.

It may not sound like much, but it is. This is big.

The Age of Multimodal AI Tools is here.

First it was Gemini 2.0 Flash. Now OpenAI.

Chatting with your Image Generator (and soon, Video Generators) will be a basic requirement from now on.

When was the last time you had a chat with Midjourney?

Here's what you need to know:

Users can generate/edit images using natural language or uploaded files.
GPT-4o understands what you ask from it, leveraging its vast knowledge to understand context better, leading to better and more relevant high-quality visuals.
Its multimodal capabilities allow it to generate more accurate text rendering and better integration of images within conversations.
As it understands text and images, it can generate structured visuals like menus, diagrams, and infographics with readable text, addressing a key limitation of previous models.
Image editing has been improved, allowing modifications to existing images while accurately handling complex scenes with up to 20 distinct objects. It can see the image, analyze it, and understand your request.
The updated image-generation abilities are now available to ChatGPT Free, Plus, Team, and Pro users.

So, it is good?

Let’s see some examples. Consider the following: Each one of these requires an understanding of the context, something previous image generators can’t do.

1. Studio Ghibli-style memes

Users quickly discovered the tool can transform regular photos into artwork mimicking the whimsical, hand-drawn aesthetic of Studio Ghibli films. People jumped on this, turning everything from selfies to memes into Ghibli-esque scenes, and it’s taken off across social media, especially on platforms like X.

2. Fictional Movie Posters

@thekitze shared this poster on X.

3. Web Screenshots

4. Extract assets from images

5. Voxel Art

6. Comic Stories

7. UI Designs

8. Marketing Ads

@Salmaaboukarr shared on X, using the following prompt: Create BIC creative ad, add the pen as hair accessory used to tie models hair up. compelling messaging

So, what do you think? Is it good? is it hype?

Michael Banner

Mar 28

Can you do things like character and style references in ChatGPT? This is one thing that makes me use Midjourney over DALL-E based images as I can keep consistency very high. Even asking GPT to tweak an image (in past experience) results in other changes that I never asked for.

Expand full comment

1 reply by Erik Knöbl | AI Video Creator

1 more comment...