Generate Images Directly in ChatGPT with GPT-4o

Words Matter: OpenAI’s New Image Generator Delivers Accurate Text

OpenAI has officially released “Images in ChatGPT,” a feature that brings image generation directly into the ChatGPT platform. Powered by the GPT-4o model, it lets users create images inside their chat conversations, a significant step forward for AI content creation.

Enhanced Image Generation Capabilities and User Accessibility

The “Images in ChatGPT” feature is available on every ChatGPT tier, including Free, Plus, Pro, and Team, which broadens access to advanced image creation. OpenAI’s Taya Christianson said free-tier users face limits comparable to those for DALL-E 3, about three images per day, but noted that these restrictions could change depending on demand. Users who prefer DALL-E can still reach it through a dedicated custom GPT.

OpenAI research lead Gabriel Goh described GPT-4o as an “omnimodal” model that can process many forms of data, including text, images, audio, and video. The model also has improved binding, the ability to attach attributes such as color and shape to the correct objects, which addresses a longstanding problem in AI-generated imagery. Unlike earlier models, GPT-4o can handle 15 to 20 objects in a single image without mixing up their colors or shapes.

Text rendering is another strength. Text in AI-generated images has typically been malformed or meaningless. Goh explained that getting it right took months of iterative work. Although perfect text rendering remains difficult, especially at small sizes, the team has reached a level at which text in images is consistently legible and usable.

Rather than the diffusion approach typical of image generators, the system uses an autoregressive method: it produces an image from left to right and top to bottom, much as a language model generates text. This approach appears to contribute to its improved text rendering and binding.
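OpenAI has not published how its implementation works, so the following is only a rough illustration of what “left to right, top to bottom” generation means in principle. It assumes an image is represented as a grid of discrete tokens and that each token is chosen from everything generated before it. The function `toy_next_token` is a hypothetical placeholder for a trained model, and the grid size and vocabulary are arbitrary; this is a conceptual sketch, not OpenAI’s method.

```python
from typing import Callable, List, Tuple


# Hypothetical stand-in for a trained model: maps the tokens generated so far
# to the next token. A real system would use a large neural network here.
def toy_next_token(prefix: List[int], vocab_size: int = 16) -> int:
    # Deterministic toy rule so the example is reproducible.
    return (sum(prefix) + len(prefix)) % vocab_size


def generate_raster(
    height: int,
    width: int,
    next_token: Callable[[List[int]], int] = toy_next_token,
) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Fill a token grid one cell at a time in raster order (row by row, left to right)."""
    sequence: List[int] = []           # flat history that every new token is conditioned on
    order: List[Tuple[int, int]] = []  # (row, col) positions in the order they were generated
    grid = [[0] * width for _ in range(height)]
    for row in range(height):          # top to bottom
        for col in range(width):       # left to right
            token = next_token(sequence)  # condition on all earlier tokens
            sequence.append(token)
            grid[row][col] = token
            order.append((row, col))
    return grid, order


grid, order = generate_raster(2, 3)
print(order)  # → [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(grid)   # → [[0, 1, 3], [7, 15, 15]]
```

Because every position is conditioned on all earlier ones, content produced early (for example, the first letters of a word) can shape what comes next. The article links this property to better text rendering and binding; the toy rule above only shows the generation order, not that effect.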

During its presentation, OpenAI demonstrated a range of uses: detailed scientific diagrams, such as Newton’s prism experiment with accurate labels; multi-panel comics with consistent characters and dialogue; and informational posters with precise text. The demonstration also covered practical applications, including transparent-background images for stickers, restaurant menus, and logos.

Jackie Shannon, ChatGPT’s multimodal product lead, highlighted how the system draws on its world knowledge. She compared it to a person drawing: an artist works within their own artistic limits but also brings everything they know about the world to the image. Because the model incorporates world knowledge during image generation, users can request an image of Newton’s prism experiment without explaining what the experiment is.

The trade-off is speed: images take longer to generate. Shannon acknowledged that latency still needs work, but said the quality of the images, together with the model’s capabilities and world knowledge, makes up for the extra wait.

Addressing Misuse and Ensuring Responsible AI Deployment

Responding to concerns about possible misuse, OpenAI pointed to its safeguards. The system is designed to block sexual deepfakes, reject requests for CSAM, and prevent watermark removal. Generated images carry no visible watermark, but they include standard C2PA metadata identifying them as created by OpenAI. The company also uses proprietary internal tools to verify whether an image came from its system.

Shannon acknowledged that every system has limitations, describing the current safeguards as an initial approach that will keep evolving. Users own the images they create with ChatGPT and may use them as they choose, subject to OpenAI’s terms of use.

With “Images in ChatGPT,” OpenAI expands what ChatGPT can do and opens new possibilities for AI-powered creativity, giving users a capable tool for visual expression directly within the chat interface.