The last time OpenAI’s ChatGPT launched an image generation model, it quickly went viral across the internet. People were captivated by the ability to create Ghibli-style portraits of themselves, turning personal memories into animated artwork. Now, ChatGPT is taking things a step further with a new natively multimodal model, “gpt-image-1”, which powers image generation directly inside ChatGPT and is now available via API. In this article, we will explore the key features of OpenAI’s gpt-image-1 model and how to use it for image generation and editing.
What is gpt-image-1?
gpt-image-1 is the latest and most advanced multimodal language model from OpenAI. It stands out for its ability to generate high-quality images while incorporating real-world knowledge into the visual content. While gpt-image-1 is recommended for its robust performance, the Image API also supports other specialized models like DALL·E 2 and DALL·E 3.

The Image API offers three key endpoints, each designed for specific tasks:
- Generations: Create images from scratch using a text prompt.
- Edits: Modify existing images using a new prompt, either partially or entirely.
- Variations: Generate variations of an existing image (available with DALL·E 2 only).

Key Features of gpt-image-1
gpt-image-1 offers several key features:
- High-fidelity images: Produces detailed and accurate visuals.
- Diverse visual styles: Supports a wide range of aesthetics, from photorealistic to abstract.
- Precise image editing: Enables targeted modifications to generated images.
- Rich world knowledge: Understands complex prompts with contextual accuracy.
- Consistent text rendering: Renders text within images reliably.
Availability
The OpenAI API enables users to generate and edit images from text prompts using the GPT Image or DALL·E models. At present, image generation is accessible only through the Image API, though support for the Responses API is actively being developed.
To read more about gpt-image-1, refer to OpenAI’s official documentation.
gpt-image-1 Pricing
Before diving into how to use and deploy the model, it’s important to understand the pricing to ensure effective and budget-conscious usage.
The gpt-image-1 model is priced per token, with different rates for text and image tokens:
- Text input tokens (prompts): $5 per 1M tokens
- Image input tokens (uploaded images): $10 per 1M tokens
- Image output tokens (generated images): $40 per 1M tokens
In practical terms, this roughly equates to:
- ~$0.02 for a low-quality square image
- ~$0.07 for a medium-quality square image
- ~$0.19 for a high-quality square image
For more detailed pricing by image quality and resolution, refer to OpenAI’s official pricing page.

Note: This model generates images by first creating specialized image tokens. Therefore, both latency and overall cost depend on the number of tokens used. Larger image dimensions and higher quality settings require more tokens, increasing both time and cost.
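To make the per-token pricing concrete, here is a minimal sketch of the arithmetic using the three rates listed above. The function name estimate_cost_usd and the token counts in the usage lines are illustrative assumptions, not values reported by the API; the real token counts for a request depend on the prompt, the image size, and the quality setting.

```python
# Rates in USD per 1M tokens, taken from the pricing list above.
RATES_PER_MILLION = {
    "text_input": 5.00,
    "image_input": 10.00,
    "image_output": 40.00,
}


def estimate_cost_usd(text_input_tokens=0, image_input_tokens=0, image_output_tokens=0):
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    counts = {
        "text_input": text_input_tokens,
        "image_input": image_input_tokens,
        "image_output": image_output_tokens,
    }
    return sum(counts[kind] * RATES_PER_MILLION[kind] / 1_000_000 for kind in counts)


# Hypothetical token counts, chosen only to illustrate the arithmetic.
cost = estimate_cost_usd(text_input_tokens=100, image_output_tokens=1_000)
print(f"Estimated cost: ${cost:.4f}")
```

Because image output tokens are billed at the highest rate, they usually dominate the total, which is why size and quality settings matter most for the budget.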
How to Access gpt-image-1?
To generate the API key for gpt-image-1:
- Sign in to the OpenAI platform
- Go to Project > API Keys
- Verify your account
For this, first, go to: https://platform.openai.com/settings/organization/general. Then, click on “Verify Organization” to start the verification process. It’s quite similar to any KYC verification, where, depending on the country, you’ll be asked to upload a photo ID and then verify it with a selfie.
You may follow this documentation provided by OpenAI to better understand the verification process.
gpt-image-1: Hands-on Application
Finally, it’s time to see how we can generate images using the gpt-image-1 API.
We will be using the image generation endpoint to create images based on text prompts. By default, the API returns a single image, but we can set the n parameter to generate multiple images in a single request.
Before running our main code, we first need to install the library and set up the environment.
!pip install openai
import os
os.environ['OPENAI_API_KEY'] = "<your-openai-api-key>"
Generating Images Using gpt-image-1
Now, let’s try generating an image using this new model.
Input Code:
from openai import OpenAI
import base64

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()
prompt = """
A serene, peaceful park scene where humans and friendly robots are enjoying the
day together - some are walking, others are playing games or sitting on benches
under trees. The atmosphere is warm and harmonious, with soft sunlight filtering
through the leaves.
"""
result = client.images.generate(
    model="gpt-image-1",
    prompt=prompt
)
# The image is returned as base64-encoded data
image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)
# Save the image to a file
with open("utter_bliss.png", "wb") as f:
    f.write(image_bytes)
Output:

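As mentioned earlier, the n parameter lets one request return several images. The sketch below shows one way to save all of them. The helper name save_b64_images and the file-naming scheme are assumptions made for illustration. It takes a list of base64 strings, which for a real response would come from the b64_json field of each returned item.

```python
import base64
from pathlib import Path


def save_b64_images(b64_images, prefix="image", out_dir="."):
    """Decode base64-encoded images and write them as numbered PNG files.

    Hypothetical helper: returns the list of paths that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, b64_data in enumerate(b64_images, start=1):
        file_path = out_path / f"{prefix}_{index}.png"
        file_path.write_bytes(base64.b64decode(b64_data))
        saved.append(file_path)
    return saved


# Usage with the API (commented out because it needs an API key and network access):
# result = client.images.generate(model="gpt-image-1", prompt=prompt, n=3)
# paths = save_b64_images([item.b64_json for item in result.data], prefix="park")
```

Keep in mind that each extra image adds its own output tokens, so cost and latency grow with n.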
Editing Images Using gpt-image-1
gpt-image-1 offers a range of image editing options. The image edits endpoint lets us:
- Edit existing images
- Generate new images using other images as a reference
- Edit parts of an image by uploading an image and a mask indicating which areas should be replaced (a process known as inpainting)
Editing an Image Using a Mask
Let’s try editing an image using a mask. We’ll upload an image and provide a mask to specify which parts of it should be edited.

The transparent areas of the mask will be replaced based on the prompt, while the colored areas will remain unchanged.
Now, let me ask the model to add Elon Musk to my uploaded image.
Input Code:
from openai import OpenAI
import base64

client = OpenAI()
# Send the original image together with the mask that marks the editable area
result = client.images.edit(
    model="gpt-image-1",
    image=open("/content/analytics_vidhya_1024.png", "rb"),
    mask=open("/content/mask_alpha_1024.png", "rb"),
    prompt="Elon Musk standing in front of Company Logo"
)
image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)
# Save the image to a file
with open("Elon_AV.png", "wb") as f:
    f.write(image_bytes)
Output:

Points to note while editing an image using gpt-image-1:
- The image you want to edit and the corresponding mask must be in the same format and dimensions, and each should be less than 25MB in size.
- The prompt you give can be used to describe the entire new image, not just the portion being edited.
- If you supply multiple input images, the mask will be applied only to the first image.
- The mask image must include an alpha channel. If you’re using an image editing tool to create the mask, make sure it’s saved with an alpha channel enabled.
- If you have a black-and-white image, you can use a program to add an alpha channel and convert it into a valid mask, as shown below:
from PIL import Image
from io import BytesIO
# 1. Load your black & white mask as a grayscale image
mask = Image.open("/content/analytics_vidhya_masked.jpeg").convert("L")
# 2. Convert it to RGBA so it has space for an alpha channel
mask_rgba = mask.convert("RGBA")
# 3. Then use the mask itself to fill that alpha channel
#    (black pixels become fully transparent, white pixels fully opaque)
mask_rgba.putalpha(mask)
# 4. Convert the mask into bytes
buf = BytesIO()
mask_rgba.save(buf, format="PNG")
mask_bytes = buf.getvalue()
# 5. Save the resulting file
img_path_mask_alpha = "mask_alpha.png"
with open(img_path_mask_alpha, "wb") as f:
    f.write(mask_bytes)
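The file requirements listed above can be checked locally before sending a request. The following is a rough sketch that uses only the standard library. The function names are hypothetical, and it only understands PNG files: it reads the width, height, and color type from the IHDR header and raises a ValueError for anything else. It treats color types 4 (grayscale with alpha) and 6 (RGBA) as having an alpha channel and does not inspect palette-based transparency.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 25 * 1024 * 1024  # the 25MB limit noted above, assumed here to mean 25 MiB


def read_png_header(path):
    """Return (width, height, color_type) from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        # 8-byte signature + 8-byte chunk header + first 10 bytes of IHDR data
        header = f.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return width, height, color_type


def check_edit_inputs(image_path, mask_path):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for path in (image_path, mask_path):
        if os.path.getsize(path) >= MAX_BYTES:
            problems.append(f"{path} is not smaller than 25MB")
    image_w, image_h, _ = read_png_header(image_path)
    mask_w, mask_h, mask_color_type = read_png_header(mask_path)
    if (image_w, image_h) != (mask_w, mask_h):
        problems.append("image and mask dimensions differ")
    if mask_color_type not in (4, 6):
        problems.append("mask has no alpha channel")
    return problems
```

Calling check_edit_inputs on the image and mask paths before the edit request gives a quick list of anything that needs fixing, without spending tokens on a request that would be rejected.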
Best Practices for Using the Model
Here are some tips and best practices to follow while using gpt-image-1 for generating or editing images.
- You can customize how your image looks by setting options like size, quality, file format, compression level, and whether the background is transparent or not. These settings help you control the final output to match your specific needs.
- For faster results, go with square images (1024×1024) and standard quality. You can also choose portrait (1024×1536) or landscape (1536×1024) formats. Quality can be set to low, medium, or high, and both size and quality default to auto if not specified.
- Note that the Image API returns the base64-encoded image data. The default format is png, but we can also request it in jpeg or webp.
- If you are using jpeg or webp, then you can also specify the output_compression parameter to control the compression level (0-100%). For example, output_compression=50 will compress the image by 50%.
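The options described above can be collected and validated in one place before calling the API. This is a minimal sketch: build_image_options is a hypothetical helper, and the allowed values are the ones listed in this section, written as the strings the API is assumed to expect (for example "1024x1024"). The parameter names size, quality, output_format, output_compression, and background are assumed to match the Image API’s keyword arguments, so check the current API reference before relying on them.

```python
ALLOWED_SIZES = {"auto", "1024x1024", "1536x1024", "1024x1536"}
ALLOWED_QUALITIES = {"auto", "low", "medium", "high"}
ALLOWED_FORMATS = {"png", "jpeg", "webp"}


def build_image_options(size="auto", quality="auto", output_format="png",
                        output_compression=None, transparent_background=False):
    """Validate output settings and return them as keyword arguments (hypothetical helper)."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {output_format}")

    options = {"size": size, "quality": quality, "output_format": output_format}

    if output_compression is not None:
        # Compression only applies to the jpeg and webp formats mentioned above.
        if output_format not in {"jpeg", "webp"}:
            raise ValueError("output_compression requires jpeg or webp")
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be between 0 and 100")
        options["output_compression"] = output_compression

    if transparent_background:
        # JPEG cannot store transparency, so reject that combination.
        if output_format == "jpeg":
            raise ValueError("a transparent background needs png or webp")
        options["background"] = "transparent"

    return options


options = build_image_options(size="1024x1024", quality="medium",
                              output_format="webp", output_compression=50)
print(options)
# Usage with the API (commented out because it needs an API key):
# result = client.images.generate(model="gpt-image-1", prompt=prompt, **options)
```

Validating locally catches mismatched combinations, such as a compression level with png output, before a request is sent.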
Applications of gpt-image-1
From creative design and e-commerce to education, enterprise software, and gaming, gpt-image-1 has a wide range of applications.
- Gaming: content creation, sprite masks, dynamic backgrounds, character generation, concept art
- Creative Tools: artwork generation, style transfer, design prototyping, visual storytelling
- Education: visual aids, historical recreations, interactive learning content, concept visualization
- Enterprise Software: slide visuals, report illustrations, data-to-image generation, branding assets
- Advertising & Marketing: campaign visuals, social media graphics, localized content creation
- Healthcare: medical illustration, patient scan visuals, synthetic image data for model training
- Architecture & Real Estate: interior mockups, exterior renderings, layout previews, renovation ideas
- Entertainment & Media: scene concepts, promotional material, digital doubles
Limitations of gpt-image-1
The gpt-image-1 model is a powerful and versatile tool for image generation, but there are still a few limitations to keep in mind:
- Latency: More complex prompts can take up to 2 minutes to process.
- Text Rendering: While significantly better than the DALL·E models, the model may still face challenges with precise text alignment and clarity.
- Consistency: Although it can generate visually consistent images, the model may occasionally struggle to maintain uniformity for recurring characters or brand elements across multiple images.
- Composition Control: Even with improved instruction-following capabilities, the model may not always place elements accurately in structured or layout-sensitive designs.
Model Comparison
Here’s how OpenAI’s gpt-image-1 compares with the popular DALL·E models:
Model | Endpoints | Features |
DALL·E 2 | Generations, Edits, Variations | Lower cost, supports concurrent requests, includes inpainting capability |
DALL·E 3 | Generations only | Higher resolution and better image quality than DALL·E 2 |
gpt-image-1 | Generations, Edits (Responses API coming soon) | Excellent instruction-following, detailed edits, real-world awareness |
Conclusion
OpenAI’s gpt-image-1 showcases powerful image generation capabilities, with support for both creation and editing from simple textual prompts. While generating images may take some time, the quality and control it offers make it highly practical and rewarding overall.
Image generation models like this facilitate faster content creation, personalization, and prototyping. With built-in customization options for size, quality, and format, and even inpainting capabilities, gpt-image-1 offers developers complete and transparent control over the desired output.
While some might worry that this technology could replace human creativity, it’s important to note that such tools aim to enhance human creativity and serve as helpful aids for artists. While we should definitely respect originality, we must also embrace the convenience that this technology brings. We must find the right balance where such tools help us innovate without taking away the value of authentic, human-made work.