ERNIE-Image img2img Complete Guide: Professional Workflows from Sketch to Refinement
Published: 2026-05-06
Author: Yan Ming
Tags: img2img, Image-to-Image, ComfyUI, Denoise, ERNIE-Image
Introduction
ERNIE-Image, Baidu's open-source 8B image generation model, is best known for text-to-image (text2img) generation. However, the official ERNIE-Image collection on HuggingFace explicitly lists support for both "text2img and img2img" modes.
img2img (image-to-image) is one of the most practical features in AI image generation: you provide a reference image, and the model generates a new image based on the original's content and structure, guided by your text prompt. Typical uses include sketch rendering, style transfer, photo restoration, and concept design.
This article provides an in-depth guide to ERNIE-Image's img2img capabilities, covering both ComfyUI and diffusers deployment methods, plus essential denoise parameter techniques.
What is img2img? The Difference from text2img
Core Concepts
text2img (text-to-image): Starts from scratch, generating a completely new image from a text description. Denoise = 1.0 — the model has full creative freedom.
img2img (image-to-image): Starts from an existing image, transforming it based on a text description. Denoise < 1.0 — the model recreates the image while preserving parts of the original.
Workflow Comparison
text2img:
Prompt → Encoding → Empty Latent Space → Diffusion Sampling → Decoding → Output Image
img2img:
Prompt + Input Image → Encoding → VAE Encode Input → Add Noise → Diffusion Sampling → Decoding → Output Image
The key to img2img is the denoise parameter — it controls how much of the original image is preserved:
- denoise = 1.0: Equivalent to text2img — original image information is completely overwritten
- denoise = 0.8: Major changes — preserves basic composition
- denoise = 0.5: Moderate changes — retains many original details
- denoise = 0.2: Subtle refinements — color grading, minor style tweaks
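To make the "Add Noise" step concrete, here is a toy sketch of how a denoise value trades original content against noise. It uses a simple linear blend on a short list of numbers standing in for a latent. This is an illustration only: the article does not state which noise schedule ERNIE-Image uses, and `add_noise` is a hypothetical helper, not part of ComfyUI or diffusers.

```python
import random

def add_noise(latent, denoise, seed=0):
    """Blend a latent with Gaussian noise.

    denoise=0.0 returns the latent unchanged; denoise=1.0 returns pure noise.
    Toy linear blend, not the scheduler of any specific model.
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be in [0, 1]")
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    return [(1.0 - denoise) * x + denoise * n for x, n in zip(latent, noise)]

latent = [0.5, -1.0, 2.0, 0.0]  # stand-in for a VAE-encoded image
for d in (0.2, 0.5, 0.8, 1.0):
    noised = add_noise(latent, d)
    original_weight = 1.0 - d  # share of the original latent left in the blend
    print(f"denoise={d:.1f} original_weight={original_weight:.1f} "
          f"noised={[round(v, 2) for v in noised]}")
```

The sampler then starts from the noised latent instead of from pure noise, which is why lower denoise values stay closer to the input.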
Denoise Parameter Deep Dive
The denoise parameter is the heart of img2img, determining the balance between "new" and "old." Here's what different values produce:
Denoise Value Quick Reference
| Denoise | Change Level | Use Cases | Recommended Steps |
|---|---|---|---|
| 0.1-0.2 | Minimal | Color tweaks, brightness adjustment | 8-15 |
| 0.2-0.3 | Subtle | Style fine-tuning, image noise cleanup | 10-20 |
| 0.3-0.5 | Moderate | Style transfer, material changes | 15-30 |
| 0.5-0.7 | Significant | Scene modification, outfit changes | 20-40 |
| 0.7-0.85 | Major | Sketch rendering, concept design | 30-50 |
| 0.85-0.95 | Almost new | Fresh creation preserving composition | 40-50 |
| 1.0 | Completely new | Equivalent to text2img | 8 (Turbo) / 50 (Standard) |
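The quick reference can also be kept as a small lookup for scripts. The sketch below encodes the table's bands; `describe_denoise` is a hypothetical helper, and two choices are assumptions not stated in the table: a value on a shared boundary (such as 0.2) goes to the lower band, and anything above 0.95 is treated as "Completely new".

```python
# (upper bound, change level, recommended step range) from the quick reference table.
DENOISE_BANDS = [
    (0.2, "Minimal", (8, 15)),
    (0.3, "Subtle", (10, 20)),
    (0.5, "Moderate", (15, 30)),
    (0.7, "Significant", (20, 40)),
    (0.85, "Major", (30, 50)),
    (0.95, "Almost new", (40, 50)),
]

def describe_denoise(denoise):
    """Return (change level, recommended step range) for a denoise value."""
    if not 0.1 <= denoise <= 1.0:
        raise ValueError("the table covers denoise values from 0.1 to 1.0")
    for upper, level, steps in DENOISE_BANDS:
        if denoise <= upper:
            return level, steps
    # At 1.0 the step count depends on the model variant (8 Turbo / 50 Standard),
    # so no single range is returned here.
    return "Completely new", None

for value in (0.15, 0.4, 0.75, 1.0):
    print(value, describe_denoise(value))
```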
Practical Examples
Example 1: Photo → Anime Style (denoise = 0.4)
Input: Real person photograph
Prompt: "a cute anime girl, detailed eyes, chibi style, pastel colors"
Denoise: 0.4
Result: Preserves facial contours and expressions, converts to anime art style.
Example 2: Sketch → Detailed Render (denoise = 0.75)
Input: Hand-drawn architectural sketch
Prompt: "modern glass office building, sunset lighting, photorealistic, architectural photography"
Denoise: 0.75
Result: Preserves building outline and structure, generates photorealistic rendering.
Example 3: Old Photo Restoration (denoise = 0.15)
Input: Blurry vintage photograph
Prompt: "clear portrait, high resolution, sharp details, warm lighting"
Denoise: 0.15
Result: Enhances clarity and color while preserving original content.
ComfyUI img2img Workflow Setup
Core Nodes
The ERNIE-Image img2img workflow builds on the text2img workflow with these additions/modifications:
- Load Image — Load the reference image
- VAE Encode — Encode input image to latent space representation
- KSampler — Key node, set denoise < 1.0
Step-by-Step Setup
Step 1: Load Model and Encoders
Load Diffusion Model → ERNIE-Image-Turbo.safetensors
Checkpoint Loader → T5-XXL + VAE
Step 2: Load Input Image
Load Image → Select your reference image
Step 3: VAE Encode Input
VAE Encode → Connect Load Image and VAE nodes
This converts the pixel-space image to latent space representation — critical for img2img.
Step 4: Prompt Encoding
CLIP Text Encode (Positive) → Your positive prompt
CLIP Text Encode (Negative) → Negative description (e.g., "blurry, low quality")
ERNIE-Image's Prompt Enhancer (PE) works in img2img mode too. Enable PE for complex style transfers, disable it for precise reproduction.
Step 5: KSampler Configuration
KSampler:
  model → Load Diffusion Model output
  positive → CLIP Text Encode (Positive)
  negative → CLIP Text Encode (Negative)
  latent_image → VAE Encode output (KEY: not empty latent!)
  denoise → 0.4 (adjust based on needs)
  steps → 8 for Turbo, 30-50 for Standard
  cfg → 6.0
  sampler_name → euler
  scheduler → normal
Step 6: VAE Decode Output
VAE Decode → Connect KSampler output
Save Image → Save the result
Complete Node Connection Diagram
[Load Image] ──→ [VAE Encode] ──→ [KSampler.latent_image]
[CLIP Text Encode (+)] ──→ [KSampler.positive]
[CLIP Text Encode (-)] ──→ [KSampler.negative]
[Load Diffusion Model] ──→ [KSampler.model]
[KSampler] ──→ [VAE Decode] ──→ [Save Image]
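The same graph can be written in ComfyUI's API (JSON) workflow format, where each node has a `class_type` and its inputs reference other nodes as `[node_id, output_index]`. The sketch below is a rough outline under several assumptions: the text encoder and VAE are loaded through separate `CLIPLoader` and `VAELoader` nodes (the article's Step 1 lists a Checkpoint Loader for them), the encoder and VAE file names are placeholders, and `CLIPLoader` is left without its `type` field because the article does not say which value ERNIE-Image needs. `link` and `dangling_links` are hypothetical helpers.

```python
import json

def link(node_id, output_index=0):
    """Reference another node's output in ComfyUI API format."""
    return [node_id, output_index]

workflow = {
    # "Load Diffusion Model" node
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "ERNIE-Image-Turbo.safetensors",
                     "weight_dtype": "default"}},
    # Placeholder file names; CLIPLoader also needs a `type` value (not given in the article).
    "2": {"class_type": "CLIPLoader", "inputs": {"clip_name": "text_encoder.safetensors"}},
    "3": {"class_type": "VAELoader", "inputs": {"vae_name": "vae.safetensors"}},
    "4": {"class_type": "LoadImage", "inputs": {"image": "reference.jpg"}},
    "5": {"class_type": "VAEEncode", "inputs": {"pixels": link("4"), "vae": link("3")}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": link("2"),
                     "text": "a cute anime girl, detailed eyes, chibi style, pastel colors"}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": link("2"), "text": "blurry, low quality"}},
    "8": {"class_type": "KSampler",
          "inputs": {"model": link("1"), "positive": link("6"), "negative": link("7"),
                     "latent_image": link("5"),  # encoded input image, not an empty latent
                     "denoise": 0.4, "steps": 8, "cfg": 6.0, "seed": 0,
                     "sampler_name": "euler", "scheduler": "normal"}},
    "9": {"class_type": "VAEDecode", "inputs": {"samples": link("8"), "vae": link("3")}},
    "10": {"class_type": "SaveImage",
           "inputs": {"images": link("9"), "filename_prefix": "ernie_img2img"}},
}

def dangling_links(wf):
    """List (node_id, input_name, target_id) for links that point at missing nodes."""
    missing = []
    for node_id, node in wf.items():
        for name, value in node["inputs"].items():
            is_link = isinstance(value, list) and len(value) == 2 and isinstance(value[0], str)
            if is_link and value[0] not in wf:
                missing.append((node_id, name, value[0]))
    return missing

print(dangling_links(workflow))
print(json.dumps(workflow["8"], indent=2))
```

A dictionary like this is what a ComfyUI server accepts as the `prompt` field of a POST to its `/prompt` endpoint; exporting a working graph from the UI in API format is a safer way to get the exact loader fields for a given install.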
Diffusers Python API
For programmatic workflows, use HuggingFace Diffusers:
Installation
pip install -U diffusers transformers accelerate
Code Example
import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Load model
pipe = DiffusionPipeline.from_pretrained(
    "baidu/ERNIE-Image-Turbo",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Load input image
input_image = Image.open("reference.jpg").convert("RGB")
input_image = input_image.resize((1024, 1024))

# img2img generation
prompt = "a futuristic cityscape at night, neon lights, cyberpunk style"
output_image = pipe(
    prompt=prompt,
    image=input_image,
    strength=0.6,            # Equivalent to ComfyUI's denoise
    num_inference_steps=8,   # Turbo mode
    guidance_scale=6.0,
).images[0]
output_image.save("output.jpg")
Key Parameters:
- strength: Equivalent to denoise. Range: 0.0-1.0
- num_inference_steps: Turbo = 8 steps, Standard = 30-50 steps
- guidance_scale: CFG guidance strength, recommend 5.0-7.0
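One interaction between these parameters is easy to miss. In the stock diffusers img2img pipelines (for example `StableDiffusionImg2ImgPipeline`), `strength` also shortens the schedule: only about `num_inference_steps * strength` steps are actually run. The sketch below reproduces that arithmetic in plain Python. Whether the ERNIE-Image pipeline follows the same convention is an assumption, and `effective_steps` is a hypothetical helper.

```python
def effective_steps(num_inference_steps, strength):
    """Return (skipped_steps, executed_steps) under the stock diffusers img2img convention."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    executed = min(int(num_inference_steps * strength), num_inference_steps)
    skipped = max(num_inference_steps - executed, 0)
    return skipped, executed

for steps, strength in [(8, 0.6), (30, 0.5), (8, 0.1)]:
    skipped, executed = effective_steps(steps, strength)
    print(f"steps={steps} strength={strength} -> skip {skipped}, run {executed}")
```

Under this convention, 8 steps at strength 0.1 rounds down to zero executed steps, so low-strength refinement passes may need a higher `num_inference_steps` to have any effect.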
Practical Use Cases
1. Sketch to Render
Workflow: Hand-drawn sketch → ERNIE-Image img2img → Detailed rendering
Input: Pencil sketch of building/character/product
Prompt: "photorealistic [description], high detail, professional rendering"
Denoise: 0.7-0.8
Tip: The cleaner the sketch, the better the output. Use black-and-white line art — avoid colored distractions.
2. Style Transfer
Workflow: Original image → ERNIE-Image img2img → New style image
Input: Any image
Prompt: "[target style] style, [description]"
Denoise: 0.3-0.5
Example Prompts:
- "oil painting style, Van Gogh inspired, thick brush strokes"
- "watercolor illustration, soft colors, dreamy atmosphere"
- "cyberpunk neon style, dark background, glowing elements"
3. Image Enhancement/Restoration
Workflow: Low-quality image → ERNIE-Image img2img → Enhanced HD image
Input: Blurry, low-resolution, or noisy image
Prompt: "high resolution, sharp details, professional photography, 4K quality"
Denoise: 0.1-0.3
Tip: Low denoise values preserve original content while improving quality.
4. Photo Stylization
Workflow: Regular photo → ERNIE-Image img2img → Artistic photo
Input: Ordinary photograph
Prompt: "[style] photograph, dramatic lighting, cinematic"
Denoise: 0.3-0.5
Recommended Styles:
- "film photography, Kodak Portra 400"
- "black and white portrait, dramatic shadows"
- "double exposure, surreal composition"
5. Content Expansion (Outpainting Variant)
While ERNIE-Image has dedicated outpainting (see EI-017), img2img can achieve similar results:
Input: Cropped image placed on a larger canvas, with blank areas where new content should appear
Prompt: Maintain original scene description, add expansion direction details
Denoise: 0.5-0.7
PE (Prompt Enhancer) in img2img
Whether to enable ERNIE-Image's built-in Prompt Enhancer in img2img mode depends on the task:
When to Enable PE
- Style transfer: PE enriches style descriptions for finer results
- Sketch rendering: PE adds material, lighting details
- Creative exploration: When you want the model to take creative liberties
When to Disable PE
- Precise reproduction: When strict prompt adherence is needed
- Image restoration: When minimal changes are desired
- Commercial projects: When predictable output matters
Testing Method
Generate 2 images with PE on and 2 with PE off for the same input and prompt. Compare and choose the best approach.
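The comparison is easier to judge when both arms share the same seeds, so that PE is the only variable. The sketch below builds the four run configurations. Reusing seeds is a suggestion rather than something the article prescribes, and the `use_pe` key is a hypothetical name: how PE is actually toggled depends on the ComfyUI node or pipeline in use.

```python
from itertools import product

def pe_ab_plan(prompt, image_path, seeds=(1, 2)):
    """Build run configs: every seed once with PE on and once with PE off."""
    return [
        {"prompt": prompt, "image": image_path, "seed": seed, "use_pe": use_pe}
        for use_pe, seed in product((True, False), seeds)
    ]

plan = pe_ab_plan("watercolor illustration, soft colors, dreamy atmosphere", "reference.jpg")
for run in plan:
    print(run["use_pe"], run["seed"])
```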
FAQ
Q: Can img2img change image dimensions?
Yes, but identical input/output resolution produces the best results. If you need different dimensions, resize the input image externally first.
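When resizing externally, keeping the aspect ratio avoids the stretching that a fixed square resize causes. The sketch below computes a target size with roughly the pixel count of 1024×1024, snapped to a multiple of 16. Both the pixel budget and the multiple are assumptions chosen for illustration, not documented ERNIE-Image requirements, and `fit_size` is a hypothetical helper.

```python
import math

def fit_size(width, height, target_area=1024 * 1024, multiple=16):
    """Scale (width, height) to about target_area pixels, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = math.sqrt(target_area / (width * height))

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)

print(fit_size(1920, 1080))
print(fit_size(512, 512))
```

The resulting tuple can be passed to an image library's resize call before the image is loaded into the workflow.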
Q: What's the best denoise value?
No single answer. Start at 0.4 and adjust:
- Not enough change → increase denoise
- Too much change → decrease denoise
- Adjust in 0.1 increments
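The adjustment rule above fits in a few lines. In this sketch the feedback labels and the clamping range of 0.1 to 1.0 are assumptions (the lower bound is simply the smallest value in the quick reference table), and `adjust_denoise` is a hypothetical helper.

```python
def adjust_denoise(current, feedback, step=0.1, low=0.1, high=1.0):
    """Move denoise one step up or down based on feedback, clamped to [low, high]."""
    if feedback == "more":      # not enough change
        target = current + step
    elif feedback == "less":    # too much change
        target = current - step
    elif feedback == "ok":
        target = current
    else:
        raise ValueError("feedback must be 'more', 'less', or 'ok'")
    # Round to avoid float artifacts such as 0.30000000000000004.
    return round(min(high, max(low, target)), 2)

value = 0.4  # suggested starting point
for feedback in ("more", "more", "less", "ok"):
    value = adjust_denoise(value, feedback)
    print(feedback, value)
```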
Q: Why is img2img quality worse than text2img?
Common causes:
- Wrong denoise value: Too high loses the original, too low makes minimal changes
- Unclear prompt: img2img needs more precise descriptions
- Poor input quality: Blurry or low-resolution input degrades output
Q: Can I chain multiple img2img passes?
Absolutely! Multi-pass img2img is an advanced technique:
- First pass: denoise=0.5 — style transfer
- Second pass: denoise=0.2 — detail refinement
- Third pass: denoise=0.1 — final polish
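The three passes above amount to a loop that feeds each output back in as the next input. The sketch below shows that control flow with a stand-in function in place of the model: `run_img2img` is a hypothetical callable that you would replace with a wrapper around your own pipeline or ComfyUI call.

```python
def chain_passes(image, passes, run_img2img):
    """Apply img2img repeatedly, feeding each result into the next pass."""
    history = []
    for label, denoise in passes:
        image = run_img2img(image, denoise)
        history.append((label, denoise))
    return image, history

# Pass schedule from the list above.
PASSES = [
    ("style transfer", 0.5),
    ("detail refinement", 0.2),
    ("final polish", 0.1),
]

def fake_img2img(image, denoise):
    """Stand-in for a real model call; records the denoise value in a string."""
    return f"{image}>d{denoise}"

final, history = chain_passes("input.png", PASSES, fake_img2img)
print(final)
print(history)
```

Decreasing the denoise value from pass to pass keeps later passes from undoing the structure established by earlier ones.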
Summary
ERNIE-Image's img2img capability is a practical tool for reworking existing images:
- The denoise parameter is key — master it and you master img2img
- ComfyUI workflows are most flexible — visual operation, easy debugging
- Diffusers API suits batch processing — programmatic, extensible
- PE module works in img2img too — choose on/off based on the scenario
From sketch to refinement, photo to art — img2img transforms ERNIE-Image from just a "text-to-image" tool into a true AI image creation partner.
References: ERNIE-Image HuggingFace Collection, ComfyUI Official Documentation, Diffusers img2img Documentation