ERNIE-Image img2img Complete Guide: Professional Workflows from Sketch to Refinement

Published: 2026-05-06
Author: Yan Ming
Tags: img2img, Image-to-Image, ComfyUI, Denoise, ERNIE-Image

Introduction

ERNIE-Image, Baidu's open-source 8B-parameter image generation model, is best known for text-to-image (text2img) generation. However, the official ERNIE-Image collection on HuggingFace explicitly lists support for both "text2img and img2img" modes.

img2img (image-to-image) is one of the most practical features in AI image generation: you provide a reference image, and the model generates a new image based on the original's content and structure, guided by your text prompt. Typical uses range from sketch rendering and style transfer to photo restoration and concept design.

This article provides an in-depth guide to ERNIE-Image's img2img capabilities, covering both ComfyUI and diffusers deployment methods, plus essential denoise parameter techniques.

What is img2img? The Difference from text2img

Core Concepts

text2img (text-to-image): Starts from scratch, generating a completely new image from a text description. Denoise = 1.0 — the model has full creative freedom.

img2img (image-to-image): Starts from an existing image, transforming it based on a text description. Denoise < 1.0 — the model recreates the image while preserving parts of the original.

Workflow Comparison

text2img: Prompt → Encoding → Empty Latent Space → Diffusion Sampling → Decoding → Output Image

img2img: Prompt → Encoding, plus Input Image → VAE Encode → Add Noise → Diffusion Sampling → Decoding → Output Image

The key to img2img is the denoise parameter — it controls how much of the original image is preserved:

denoise = 1.0: Equivalent to text2img — original image information is completely overwritten
denoise = 0.8: Major changes — preserves basic composition
denoise = 0.5: Moderate changes — retains many original details
denoise = 0.2: Subtle refinements — color grading, minor style tweaks
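A minimal NumPy sketch of what the "Add Noise" step does may make the list above more concrete. It assumes a flow-matching style schedule in which the starting latent is a straight-line blend of the encoded input and Gaussian noise, with the noise level set equal to denoise. Whether ERNIE-Image uses exactly this schedule (or a shifted variant) is an assumption, and `noisy_start_latent` and the latent shape are hypothetical.

```python
import numpy as np

def noisy_start_latent(x0: np.ndarray, denoise: float, seed: int = 0) -> np.ndarray:
    """Blend an encoded input latent with Gaussian noise at level `denoise`.

    Assumes a linear (flow-matching style) schedule: x = (1 - s) * x0 + s * noise,
    with the starting noise level s equal to denoise. Real samplers may shift s.
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be in [0, 1]")
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(x0.shape).astype(x0.dtype)
    # denoise = 0 returns the input unchanged; denoise = 1 returns pure noise.
    return (1.0 - denoise) * x0 + denoise * noise

# Hypothetical latent: 16 channels on a 128x128 grid (assumed shape).
x0 = np.random.default_rng(42).standard_normal((16, 128, 128)).astype(np.float32)
for d in (0.2, 0.5, 0.8, 1.0):
    start = noisy_start_latent(x0, d)
    corr = np.corrcoef(start.ravel(), x0.ravel())[0, 1]
    print(f"denoise={d:.1f}  correlation with input={corr:.2f}")
```

The printed correlation with the input falls as denoise rises: low values keep most of the original signal, while 1.0 starts from noise alone, exactly like text2img.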

Denoise Parameter Deep Dive

The denoise parameter is the heart of img2img, determining the balance between "new" and "old." Here's what different values produce:

Denoise Value Quick Reference

Denoise	Change Level	Use Cases	Recommended Steps
0.1-0.2	Minimal	Color tweaks, brightness adjustment	8-15
0.2-0.3	Subtle	Style fine-tuning, denoising	10-20
0.3-0.5	Moderate	Style transfer, material changes	15-30
0.5-0.7	Significant	Scene modification, outfit changes	20-40
0.7-0.85	Major	Sketch rendering, concept design	30-50
0.85-0.95	Almost new	Fresh creation preserving composition	40-50
1.0	Completely new	Equivalent to text2img	8(Turbo)/50(Standard)
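If you script your runs, the quick-reference table can be kept as data so a chosen denoise value maps back to its row. This is a hypothetical helper, not part of any ERNIE-Image tool. Band edges are treated as inclusive upper bounds, so a boundary value such as 0.2 falls into the lower row, values below 0.1 count as "Minimal", and values above 0.95 are treated as the 1.0 row.

```python
from typing import NamedTuple

class DenoiseBand(NamedTuple):
    upper: float      # inclusive upper bound of the band
    change: str
    use_cases: str
    steps: str

# Rows transcribed from the quick-reference table.
DENOISE_BANDS = [
    DenoiseBand(0.2, "Minimal", "Color tweaks, brightness adjustment", "8-15"),
    DenoiseBand(0.3, "Subtle", "Style fine-tuning, denoising", "10-20"),
    DenoiseBand(0.5, "Moderate", "Style transfer, material changes", "15-30"),
    DenoiseBand(0.7, "Significant", "Scene modification, outfit changes", "20-40"),
    DenoiseBand(0.85, "Major", "Sketch rendering, concept design", "30-50"),
    DenoiseBand(0.95, "Almost new", "Fresh creation preserving composition", "40-50"),
    DenoiseBand(1.0, "Completely new", "Equivalent to text2img", "8 (Turbo) / 50 (Standard)"),
]

def lookup_denoise(value: float) -> DenoiseBand:
    """Return the table row for a denoise value in (0, 1]."""
    if not 0.0 < value <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    for band in DENOISE_BANDS:
        if value <= band.upper:
            return band
    raise AssertionError("unreachable: the last band ends at 1.0")

print(lookup_denoise(0.4).change)   # → Moderate
print(lookup_denoise(0.75).steps)   # → 30-50
```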

Practical Examples

Example 1: Photo → Anime Style (denoise = 0.4)

Input: Real person photograph
Prompt: "a cute anime girl, detailed eyes, chibi style, pastel colors"
Denoise: 0.4

Result: Preserves facial contours and expressions, converts to anime art style.

Example 2: Sketch → Detailed Render (denoise = 0.75)

Input: Hand-drawn architectural sketch
Prompt: "modern glass office building, sunset lighting, photorealistic, architectural photography"
Denoise: 0.75

Result: Preserves building outline and structure, generates photorealistic rendering.

Example 3: Old Photo Restoration (denoise = 0.15)

Input: Blurry vintage photograph
Prompt: "clear portrait, high resolution, sharp details, warm lighting"
Denoise: 0.15

Result: Enhances clarity and color while preserving original content.

ComfyUI img2img Workflow Setup

Core Nodes

The ERNIE-Image img2img workflow builds on the text2img workflow with these additions/modifications:

Load Image — Load the reference image
VAE Encode — Encode input image to latent space representation
KSampler — Key node, set denoise < 1.0

Step-by-Step Setup

Step 1: Load Model and Encoders

Load Diffusion Model → ERNIE-Image-Turbo.safetensors
Load CLIP → T5-XXL text encoder
Load VAE → VAE

Step 2: Load Input Image

Load Image → Select your reference image

Step 3: VAE Encode Input

VAE Encode → Connect Load Image and VAE nodes

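This converts the pixel-space image into a latent-space representation, which KSampler then uses as its starting point instead of an empty latent. This step is what makes the workflow img2img.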
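Before the VAE encoder runs, the pixel image is typically converted to a channels-first float array scaled to [-1, 1], with height and width that divide evenly by the VAE's downsampling factor. The sketch below shows that preprocessing with PIL and NumPy and computes the resulting latent shape. The downsampling factor (8) and latent channel count (16) are assumptions for illustration, not confirmed ERNIE-Image values; check the model's VAE config.

```python
import numpy as np
from PIL import Image

def preprocess_for_vae(img: Image.Image, factor: int = 8) -> np.ndarray:
    """Convert a PIL image to a (3, H, W) float32 array in [-1, 1].

    H and W are center-cropped down to multiples of `factor` (the assumed
    VAE downsampling factor) so the encoder yields a whole-number latent grid.
    """
    img = img.convert("RGB")
    w, h = img.size
    new_w, new_h = w - w % factor, h - h % factor
    left, top = (w - new_w) // 2, (h - new_h) // 2
    img = img.crop((left, top, left + new_w, top + new_h))
    arr = np.asarray(img, dtype=np.float32) / 255.0   # (H, W, 3) in [0, 1]
    arr = arr * 2.0 - 1.0                              # rescale to [-1, 1]
    return arr.transpose(2, 0, 1)                      # channels first

def latent_shape(pixels: np.ndarray, factor: int = 8, channels: int = 16) -> tuple:
    """Expected latent shape; `factor` and `channels` are assumed values."""
    _, h, w = pixels.shape
    return (channels, h // factor, w // factor)

x = preprocess_for_vae(Image.new("RGB", (1030, 770), (255, 0, 0)))
print(x.shape)            # → (3, 768, 1024)
print(latent_shape(x))    # → (16, 96, 128)
```

Under the assumed factor of 8, the sampler works on a grid 8x smaller per side than the image, which is why encoding is a prerequisite for the diffusion step rather than an optional extra.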

Step 4: Prompt Encoding

CLIP Text Encode (Positive) → Your positive prompt
CLIP Text Encode (Negative) → Negative description (e.g., "blurry, low quality")

ERNIE-Image's Prompt Enhancer (PE) works in img2img mode too. Enable PE for complex style transfers, disable it for precise reproduction.

Step 5: KSampler Configuration

KSampler:
  model → Load Diffusion Model output
  positive → CLIP Text Encode (Positive)
  negative → CLIP Text Encode (Negative)
  latent_image → VAE Encode output (KEY: not empty latent!)
  denoise → 0.4 (adjust based on needs)
  steps → 8 for Turbo, 30-50 for Standard
  cfg → 6.0
  sampler_name → euler
  scheduler → normal
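It helps to know how KSampler combines denoise with steps. In ComfyUI's KSampler, denoise below 1.0 does not cut the number of sampling steps: the sampler builds a schedule of int(steps / denoise) steps and runs only the last `steps` of them. The sketch below reproduces that slicing with a linear sigma schedule from 1.0 to 0.0; the linear schedule and the helper names are assumptions for illustration, because the real values depend on the selected scheduler and the model's noise settings.

```python
def linear_sigmas(n: int) -> list[float]:
    """n steps -> n + 1 sigmas from 1.0 down to 0.0 (assumed linear schedule)."""
    return [1.0 - i / n for i in range(n + 1)]

def comfy_style_sigmas(steps: int, denoise: float) -> list[float]:
    """Mimic KSampler: build int(steps / denoise) steps, keep the last `steps`."""
    if denoise > 0.9999:
        return linear_sigmas(steps)
    if denoise <= 0.0:
        return []
    full = linear_sigmas(int(steps / denoise))
    return full[-(steps + 1):]

sig = comfy_style_sigmas(8, 0.4)
print(len(sig) - 1)        # → 8 (sampling steps that actually run)
print(round(sig[0], 3))    # → 0.4 (starting noise level)
```

So with the Turbo settings above (8 steps, denoise 0.4), all 8 steps still run, concentrated in the low-noise end of the schedule.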

Step 6: VAE Decode Output

VAE Decode → Connect KSampler output
Save Image → Save the result

Complete Node Connection Diagram

[Load Image] ──→ [VAE Encode] ──→ [KSampler.latent_image]
[CLIP Text Encode (+)] ──→ [KSampler.positive]
[CLIP Text Encode (-)] ──→ [KSampler.negative]
[Load Diffusion Model] ──→ [KSampler.model]
[KSampler] ──→ [VAE Decode] ──→ [Save Image]
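The same graph can be queued programmatically through ComfyUI's HTTP API, which accepts an "API format" workflow (node IDs mapped to `class_type` and `inputs`) POSTed to `/prompt`. The sketch below builds that workflow in Python using ComfyUI's built-in node classes (UNETLoader, CLIPLoader, VAELoader, LoadImage, VAEEncode, CLIPTextEncode, KSampler, VAEDecode, SaveImage). Splitting the text encoder and VAE into separate loader nodes is an assumption about how your ERNIE-Image files are packaged; the file names and the CLIPLoader `type` value are placeholders you must replace, and the server address assumes a default local ComfyUI.

```python
import json
import urllib.request

def build_img2img_workflow(image_name: str, positive: str, negative: str,
                           denoise: float = 0.4, steps: int = 8, seed: int = 0) -> dict:
    """ComfyUI API-format graph mirroring the node diagram (placeholders marked)."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "ERNIE-Image-Turbo.safetensors", "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",  # placeholder file name and type
              "inputs": {"clip_name": "text_encoder.safetensors", "type": "REPLACE_ME"}},
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "vae.safetensors"}},
        "4": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "5": {"class_type": "VAEEncode", "inputs": {"pixels": ["4", 0], "vae": ["3", 0]}},
        "6": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["2", 0]}},
        "7": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["2", 0]}},
        "8": {"class_type": "KSampler", "inputs": {
            "model": ["1", 0], "positive": ["6", 0], "negative": ["7", 0],
            "latent_image": ["5", 0],  # encoded input, not an empty latent
            "seed": seed, "steps": steps, "cfg": 6.0,
            "sampler_name": "euler", "scheduler": "normal", "denoise": denoise}},
        "9": {"class_type": "VAEDecode", "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0], "filename_prefix": "ernie_img2img"}},
    }

def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a running ComfyUI with the models installed):
# wf = build_img2img_workflow("reference.jpg", "watercolor illustration", "blurry, low quality")
# print(queue_workflow(wf))
```

Links are written as `[node_id, output_index]`, so `["5", 0]` on `latent_image` is the API-format equivalent of the VAE Encode arrow in the diagram.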

Diffusers Python API

For programmatic workflows, use HuggingFace Diffusers:

Installation

pip install -U diffusers transformers accelerate

Code Example

```python
import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Load model
# Assumes the pipeline class registered for this checkpoint accepts `image`
# and `strength`; if it only supports text2img, check the model card for a
# dedicated img2img pipeline.
pipe = DiffusionPipeline.from_pretrained(
    "baidu/ERNIE-Image-Turbo",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Load input image (a square resize distorts non-square inputs)
input_image = Image.open("reference.jpg").convert("RGB")
input_image = input_image.resize((1024, 1024))

# img2img generation
prompt = "a futuristic cityscape at night, neon lights, cyberpunk style"
output_image = pipe(
    prompt=prompt,
    image=input_image,
    strength=0.6,           # Plays the role of ComfyUI's denoise
    num_inference_steps=8,  # Turbo mode
    guidance_scale=6.0,
).images[0]
output_image.save("output.jpg")
```

Key Parameters:

strength: Equivalent to denoise. Range: 0.0-1.0
num_inference_steps: Turbo = 8 steps, Standard = 30-50 steps
guidance_scale: CFG guidance strength, recommend 5.0-7.0
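Unlike ComfyUI, diffusers' img2img pipelines generally apply `strength` by skipping the start of the schedule, so only about num_inference_steps × strength steps are executed; the exact rounding differs between pipelines. The sketch below uses the Stable Diffusion img2img convention, int(num_inference_steps × strength), and assumes ERNIE-Image's pipeline behaves similarly. `executed_steps` and `steps_for_target` are hypothetical helpers.

```python
import math

def executed_steps(num_inference_steps: int, strength: float) -> int:
    """Steps actually run under the Stable Diffusion img2img convention."""
    return min(int(num_inference_steps * strength), num_inference_steps)

def steps_for_target(target_steps: int, strength: float) -> int:
    """Smallest num_inference_steps that executes at least `target_steps`."""
    n = math.ceil(target_steps / strength)
    while executed_steps(n, strength) < target_steps:  # guard against float rounding
        n += 1
    return n

print(executed_steps(8, 0.5))      # → 4
print(steps_for_target(8, 0.5))    # → 16
```

To keep 8 effective Turbo steps at strength 0.5 you would therefore request 16, which mirrors how ComfyUI's KSampler stretches its schedule internally.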

Practical Use Cases

1. Sketch to Render

Workflow: Hand-drawn sketch → ERNIE-Image img2img → Detailed rendering

Input: Pencil sketch of building/character/product
Prompt: "photorealistic [description], high detail, professional rendering"
Denoise: 0.7-0.8

Tip: The cleaner the sketch, the better the output. Use black-and-white line art — avoid colored distractions.
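One simple way to get clean black-and-white line art from a scanned or photographed sketch is to convert it to grayscale, stretch the contrast, and threshold it. The PIL sketch below does that; the function name and the threshold of 160 are illustrative assumptions to tune per scan.

```python
from PIL import Image, ImageOps

def clean_line_art(img: Image.Image, threshold: int = 160) -> Image.Image:
    """Grayscale -> autocontrast -> hard threshold; returns an RGB image."""
    gray = ImageOps.grayscale(img)
    # Stretch contrast, ignoring the darkest/lightest 1% (specks, glare).
    gray = ImageOps.autocontrast(gray, cutoff=1)
    # Pixels brighter than the threshold become white paper, the rest black ink.
    bw = gray.point(lambda p: 255 if p > threshold else 0)
    # RGB output can go straight into an img2img pipeline or a Load Image node.
    return bw.convert("RGB")

# Usage: clean_line_art(Image.open("sketch.jpg")).save("sketch_clean.png")
```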

2. Style Transfer

Workflow: Original image → ERNIE-Image img2img → New style image

Input: Any image
Prompt: "[target style] style, [description]"
Denoise: 0.3-0.5

Example Prompts:

"oil painting style, Van Gogh inspired, thick brush strokes"
"watercolor illustration, soft colors, dreamy atmosphere"
"cyberpunk neon style, dark background, glowing elements"

3. Image Enhancement/Restoration

Workflow: Low-quality image → ERNIE-Image img2img → Enhanced HD image

Input: Blurry, low-resolution, or noisy image
Prompt: "high resolution, sharp details, professional photography, 4K quality"
Denoise: 0.1-0.3

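Tip: Low denoise values preserve the original content while improving quality. The output keeps the input's dimensions, so upscale a low-resolution input before this pass if you need a larger result.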
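A minimal PIL sketch for enlarging a low-resolution input before the low-denoise pass, assuming a 2x Lanczos upscale with sides rounded down to multiples of 16. Both numbers are illustrative, and the multiple-of-16 constraint is a conservative assumption rather than a documented ERNIE-Image requirement.

```python
from PIL import Image

def upscale_for_refinement(img: Image.Image, scale: float = 2.0,
                           multiple: int = 16) -> Image.Image:
    """Lanczos upscale, then round each side down to a multiple of `multiple`."""
    w, h = img.size
    new_w = max(multiple, int(w * scale) // multiple * multiple)
    new_h = max(multiple, int(h * scale) // multiple * multiple)
    return img.convert("RGB").resize((new_w, new_h), Image.LANCZOS)

# Usage: run the result through img2img at denoise 0.1-0.3 so the model
# adds detail to the enlarged image.
# hd_input = upscale_for_refinement(Image.open("old_photo.jpg"))
```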

4. Photo Stylization

Workflow: Regular photo → ERNIE-Image img2img → Artistic photo

Input: Ordinary photograph
Prompt: "[style] photograph, dramatic lighting, cinematic"
Denoise: 0.3-0.5

Recommended Styles:

"film photography, Kodak Portra 400"
"black and white portrait, dramatic shadows"
"double exposure, surreal composition"

5. Content Expansion (Outpainting Variant)

While ERNIE-Image has dedicated outpainting (see EI-017), img2img can achieve similar results:

Input: The original image placed on a larger canvas, with blank areas where the scene should extend
Prompt: Keep the original scene description and add details for the area being extended
Denoise: 0.5-0.7
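A minimal PIL sketch for building the enlarged canvas. Instead of leaving the new area flat white, it fills it with a heavily blurred, stretched copy of the image so the sampler starts from plausible colors. This fill strategy, the function name, and the blur radius are assumptions: one common trick, not an ERNIE-Image requirement.

```python
from PIL import Image, ImageFilter

def pad_for_expansion(img: Image.Image, left: int = 0, right: int = 0,
                      top: int = 0, bottom: int = 0,
                      blur_radius: float = 32) -> Image.Image:
    """Place `img` on a larger canvas whose new areas hold a blurred fill."""
    img = img.convert("RGB")
    w, h = img.size
    size = (w + left + right, h + top + bottom)
    # Stretch the whole image over the canvas, then blur it into a soft backdrop.
    canvas = img.resize(size, Image.BICUBIC).filter(ImageFilter.GaussianBlur(blur_radius))
    # Put the untouched original back at its offset.
    canvas.paste(img, (left, top))
    return canvas

# Usage: extend a photo 256 px to the right, then run img2img at denoise 0.5-0.7.
# wide = pad_for_expansion(Image.open("scene.jpg"), right=256)
```

Because plain img2img has no mask, the original region is also redrawn at denoise 0.5-0.7; that is one reason to prefer dedicated outpainting when the original pixels must stay untouched.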

PE (Prompt Enhancer) in img2img

ERNIE-Image's built-in Prompt Enhancer is also available in img2img mode, and whether to use it depends on how much creative latitude you want:

When to Enable PE

Style transfer: PE enriches style descriptions for finer results
Sketch rendering: PE adds material, lighting details
Creative exploration: When you want the model to take creative liberties

When to Disable PE

Precise reproduction: When strict prompt adherence is needed
Image restoration: When minimal changes are desired
Commercial projects: When predictable output matters

Testing Method

Generate 2 images with PE on and 2 with PE off, keeping the input image, prompt, denoise value, and seeds identical. Compare the results and choose the better approach.
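The comparison is only fair if everything except the PE setting is held fixed. A small sketch that lays out the four runs; `use_pe` is a hypothetical field name, so map it to however your front end (ComfyUI node, script, or app) actually exposes the Prompt Enhancer toggle.

```python
from itertools import product

def pe_ab_jobs(prompt: str, denoise: float, seeds=(1234, 5678)) -> list[dict]:
    """Balanced PE on/off comparison: same prompt, denoise and seeds in both arms."""
    return [
        {"prompt": prompt, "denoise": denoise, "seed": seed, "use_pe": use_pe}
        for use_pe, seed in product((True, False), seeds)
    ]

for job in pe_ab_jobs("watercolor illustration, soft colors", 0.4):
    print(job)
```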

FAQ

Q: Can img2img change image dimensions?

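Yes, indirectly. In the ComfyUI workflow the latent comes from the VAE-encoded input, so the output has the same dimensions as the image you feed in. To change dimensions, resize the input image first (externally or with an image-scaling node), ideally keeping its aspect ratio.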
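Resizing straight to a fixed square (as the Diffusers example does with 1024x1024) stretches non-square inputs. A minimal sketch that instead scales the image to roughly one megapixel while keeping its aspect ratio; the pixel budget and the multiple-of-16 rounding are assumptions, not documented ERNIE-Image limits.

```python
import math
from PIL import Image

def fit_to_budget(img: Image.Image, target_pixels: int = 1024 * 1024,
                  multiple: int = 16) -> Image.Image:
    """Resize to about `target_pixels` while keeping the aspect ratio."""
    w, h = img.size
    scale = math.sqrt(target_pixels / (w * h))
    # Round each side to the nearest multiple (assumed size constraint).
    new_w = max(multiple, round(w * scale / multiple) * multiple)
    new_h = max(multiple, round(h * scale / multiple) * multiple)
    return img.convert("RGB").resize((new_w, new_h), Image.LANCZOS)

# Usage: input_image = fit_to_budget(Image.open("reference.jpg"))
```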

Q: What's the best denoise value?

No single answer. Start at 0.4 and adjust:

Not enough change → increase denoise
Too much change → decrease denoise
Adjust in 0.1 increments
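The adjustment rule above is simple enough to encode directly. A hypothetical helper, with the lower bound of 0.05 chosen arbitrarily so a run never drops to zero change:

```python
def next_denoise(current: float, feedback: str, step: float = 0.1,
                 lo: float = 0.05, hi: float = 1.0) -> float:
    """Not enough change -> raise denoise; too much change -> lower it."""
    if feedback == "more":
        value = current + step
    elif feedback == "less":
        value = current - step
    else:
        raise ValueError("feedback must be 'more' or 'less'")
    # Clamp to a usable range and round away float noise (e.g. 0.30000000000000004).
    return round(min(hi, max(lo, value)), 2)

d = 0.4                          # suggested starting point
d = next_denoise(d, "more")      # → 0.5
d = next_denoise(d, "less")      # → 0.4
```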

Q: Why is img2img quality worse than text2img?

Common causes:

Wrong denoise value: Too high loses the original, too low makes minimal changes
Unclear prompt: img2img needs more precise descriptions
Poor input quality: Blurry or low-resolution input degrades output

Q: Can I chain multiple img2img passes?

Absolutely! Multi-pass img2img is an advanced technique:

First pass: denoise=0.5 — style transfer
Second pass: denoise=0.2 — detail refinement
Third pass: denoise=0.1 — final polish
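Chaining passes just means feeding each output back in as the next input. A minimal sketch, where `img2img` is any function taking (image, denoise) and returning an image; the diffusers wrapper in the comment is an assumption built on the Diffusers example in this article.

```python
from typing import Callable, Sequence, TypeVar

ImageT = TypeVar("ImageT")

def multi_pass(image: ImageT, img2img: Callable[[ImageT, float], ImageT],
               strengths: Sequence[float] = (0.5, 0.2, 0.1)) -> ImageT:
    """Run img2img repeatedly, each pass refining the previous output."""
    for s in strengths:
        image = img2img(image, s)
    return image

# Example wrapper (assumes `pipe`, `prompt` and `input_image` from the Diffusers example):
# def run(img, s):
#     return pipe(prompt=prompt, image=img, strength=s,
#                 num_inference_steps=8, guidance_scale=6.0).images[0]
# final = multi_pass(input_image, run)
```

If you use diffusers, low strengths also execute proportionally fewer steps, so the later passes may need a higher `num_inference_steps`.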

Summary

ERNIE-Image's img2img capability provides powerful tools for reworking existing images:

The denoise parameter is key — master it and you master img2img
ComfyUI workflows are most flexible — visual operation, easy debugging
Diffusers API suits batch processing — programmatic, extensible
PE module works in img2img too — choose on/off based on the scenario

From sketch to refinement, photo to art — img2img transforms ERNIE-Image from just a "text-to-image" tool into a true AI image creation partner.

References: ERNIE-Image HuggingFace Collection, ComfyUI Official Documentation, Diffusers img2img Documentation
