ChatGPT Image Generator Simplifies Photo Manipulation for All

Revolutionizing Image Manipulation: OpenAI’s GPT Image 1.5

For most of photography’s roughly 200-year history, convincingly altering a photo required a darkroom, some Photoshop expertise, or, at minimum, a steady hand with scissors and glue. On Tuesday, OpenAI released a tool that reduces the process to typing a sentence.

Contents

Revolutionizing Image Manipulation: OpenAI’s GPT Image 1.5
A Competitive Landscape
Key Features of GPT Image 1.5
The Multimodality of the Model
Enhanced Image Editing Capabilities

A Competitive Landscape

The release isn’t entirely unexpected: OpenAI had been developing conversational image editing since the introduction of GPT-4o in 2024. Google, however, launched a public prototype earlier, in March, and later refined it into a popular model known as Nano Banana, along with a premium variant called Nano Banana Pro. The positive reception of Google’s models in the AI community caught OpenAI’s attention and spurred the company to accelerate its own development.

Key Features of GPT Image 1.5

OpenAI’s new GPT Image 1.5 is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs approximately 20 percent less through the API. The model, rolled out to all ChatGPT users, is a significant step toward making photorealistic image manipulation an accessible activity that requires little to no visual skill.

The “Galactic Queen of the Universe” added to a photo of a room with a sofa using GPT Image 1.5 in ChatGPT.

The Multimodality of the Model

What sets GPT Image 1.5 apart is its designation as a “native multimodal” image model: image generation occurs within the same neural network that processes language prompts. In contrast, DALL-E 3, a previous OpenAI image generator built into ChatGPT, used a different technique called diffusion to create images. The latest model treats images and text as the same kind of data, sequences of chunks called “tokens,” so a single network can process both together.
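The idea of treating text and images as one stream of tokens can be illustrated with a toy sketch. This is not OpenAI’s implementation, whose details are not described here; the byte-level text tokens, the four-level pixel quantizer, and every name in the code (`Token`, `text_to_tokens`, `image_to_tokens`, `build_sequence`) are assumptions made purely for illustration. Production models use learned tokenizers with far larger vocabularies.

```python
from dataclasses import dataclass
from typing import List

# Toy assumptions, not real model parameters:
TEXT_VOCAB_SIZE = 256  # one token id per UTF-8 byte
IMAGE_LEVELS = 4       # each grayscale pixel is quantized to one of 4 codes


@dataclass
class Token:
    id: int
    modality: str  # "text" or "image"


def text_to_tokens(prompt: str) -> List[Token]:
    """Map each UTF-8 byte of the prompt to a text token (ids 0..255)."""
    return [Token(byte, "text") for byte in prompt.encode("utf-8")]


def image_to_tokens(pixels: List[List[int]]) -> List[Token]:
    """Quantize grayscale pixels (0..255) into image tokens in raster order.

    Image ids are offset past the text range so that both modalities
    share a single vocabulary without colliding.
    """
    tokens = []
    for row in pixels:
        for value in row:
            code = value * IMAGE_LEVELS // 256  # 0..IMAGE_LEVELS-1
            tokens.append(Token(TEXT_VOCAB_SIZE + code, "image"))
    return tokens


def build_sequence(prompt: str, pixels: List[List[int]]) -> List[Token]:
    """Concatenate both modalities into the one sequence a single network would read."""
    return text_to_tokens(prompt) + image_to_tokens(pixels)


sequence = build_sequence("add a cat", [[0, 255], [128, 64]])
print([(token.id, token.modality) for token in sequence])
```

The point of the sketch is only that, once both inputs live in one id space, nothing downstream needs to know which modality a token came from; a diffusion-based generator, by contrast, runs as a separate image-specific process.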

Enhanced Image Editing Capabilities

With this multimodal framework, GPT Image 1.5 alters visual reality far more effectively than previous models. Users can change a subject’s pose and position, and even the scene’s perspective. The model can also remove objects, modify visual styles, adjust clothing, and refine specific areas while maintaining facial likeness across multiple edits. Because the editing happens in conversation, users can refine and adjust an image over several turns, much as they would revise a draft email in ChatGPT.

This advancement signifies a major milestone in user-friendly image manipulation technologies, democratizing the ability to create and alter images in remarkable ways.

Image Credit: arstechnica.com