ERNIE-Image img2img Complete Guide: Professional Workflows from Sketch to Refinement

May 6, 2026

Published: 2026-05-06
Author: Yan Ming
Tags: img2img, Image-to-Image, ComfyUI, Denoise, ERNIE-Image


Introduction

ERNIE-Image is Baidu's open-source 8B image generation model, best known for text-to-image (text2img) generation. The official ERNIE-Image collection on HuggingFace, however, explicitly lists support for both "text2img and img2img" modes.

img2img (image-to-image) is one of the most practical features in AI image generation: you provide a reference image, and the model generates a new image that follows the original's content and structure while being guided by your text prompt. Typical uses include sketch rendering, style transfer, photo restoration, and concept design.

This article provides an in-depth guide to ERNIE-Image's img2img capabilities, covering both ComfyUI and diffusers deployment methods, plus essential denoise parameter techniques.


What is img2img? The Difference from text2img

Core Concepts

text2img (text-to-image): Starts from scratch, generating a completely new image from a text description. Denoise = 1.0 — the model has full creative freedom.

img2img (image-to-image): Starts from an existing image, transforming it based on a text description. Denoise < 1.0 — the model recreates the image while preserving parts of the original.

Workflow Comparison

text2img:
  Prompt → Encoding → Empty Latent Space → Diffusion Sampling → Decoding → Output Image

img2img:
  Prompt + Input Image → Encoding → VAE Encode Input → Add Noise → Diffusion Sampling → Decoding → Output Image

The key to img2img is the denoise parameter — it controls how much of the original image is preserved:

  • denoise = 1.0: Equivalent to text2img — original image information is completely overwritten
  • denoise = 0.8: Major changes — preserves basic composition
  • denoise = 0.5: Moderate changes — retains many original details
  • denoise = 0.2: Subtle refinements — color grading, minor style tweaks
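To make the trend in the list above concrete, here is a minimal sketch that blends a toy "latent" with random noise. It assumes a simple linear interpolation between the original values and the noise; real schedulers use their own noise schedules, so this is not ERNIE-Image's actual noising step, and add_noise is a hypothetical helper written only for illustration.

```python
import random

def add_noise(latent, denoise, seed=0):
    """Blend a toy latent (a list of floats) with Gaussian noise.

    Assumption: linear interpolation (1 - denoise) * latent + denoise * noise.
    This only illustrates the trend, not a real scheduler.
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0.0 and 1.0")
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    return [(1.0 - denoise) * x + denoise * n for x, n in zip(latent, noise)]

latent = [0.5, -1.0, 2.0, 0.0]
for d in (0.2, 0.5, 0.8, 1.0):
    # The higher the denoise value, the less of the original survives.
    print(d, [round(v, 2) for v in add_noise(latent, d)])
```

At denoise = 0.0 the toy latent comes back unchanged, and at denoise = 1.0 only the noise remains, which mirrors why 1.0 behaves like text2img.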

Denoise Parameter Deep Dive

The denoise parameter is the heart of img2img, determining the balance between "new" and "old." Here's what different values produce:

Denoise Value Quick Reference

Denoise      Change Level      Use Cases                                Recommended Steps
0.1-0.2      Minimal           Color tweaks, brightness adjustment      8-15
0.2-0.3      Subtle            Style fine-tuning, denoising             10-20
0.3-0.5      Moderate          Style transfer, material changes         15-30
0.5-0.7      Significant       Scene modification, outfit changes       20-40
0.7-0.85     Major             Sketch rendering, concept design         30-50
0.85-0.95    Almost new        Fresh creation preserving composition    40-50
1.0          Completely new    Equivalent to text2img                   8 (Turbo) / 50 (Standard)
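For scripted workflows, the quick reference can be turned into a small lookup. The sketch below copies the rows from the table; the function name lookup_denoise and the tie-break for values that sit on a boundary between two rows (such as 0.2) are choices made here, not something the table specifies.

```python
# Rows from the quick-reference table: (low, high, change level, recommended steps)
DENOISE_TABLE = [
    (0.10, 0.20, "Minimal", (8, 15)),
    (0.20, 0.30, "Subtle", (10, 20)),
    (0.30, 0.50, "Moderate", (15, 30)),
    (0.50, 0.70, "Significant", (20, 40)),
    (0.70, 0.85, "Major", (30, 50)),
    (0.85, 0.95, "Almost new", (40, 50)),
]

def lookup_denoise(denoise):
    """Return (change level, (min_steps, max_steps)) for a denoise value.

    Boundary values shared by two rows resolve to the higher row (assumption).
    For denoise = 1.0 the steps are None, because the table gives a single
    value per model variant (8 for Turbo, 50 for Standard), not a range.
    """
    if denoise == 1.0:
        return "Completely new", None
    for low, high, level, steps in reversed(DENOISE_TABLE):
        if low <= denoise <= high:
            return level, steps
    raise ValueError(f"denoise {denoise} is outside the ranges in the table")

print(lookup_denoise(0.4))
print(lookup_denoise(0.75))
```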

Practical Examples

Example 1: Photo → Anime Style (denoise = 0.4)

Input: Real person photograph
Prompt: "a cute anime girl, detailed eyes, chibi style, pastel colors"
Denoise: 0.4

Result: Preserves facial contours and expressions, converts to anime art style.

Example 2: Sketch → Detailed Render (denoise = 0.75)

Input: Hand-drawn architectural sketch
Prompt: "modern glass office building, sunset lighting, photorealistic, architectural photography"
Denoise: 0.75

Result: Preserves building outline and structure, generates photorealistic rendering.

Example 3: Old Photo Restoration (denoise = 0.15)

Input: Blurry vintage photograph
Prompt: "clear portrait, high resolution, sharp details, warm lighting"
Denoise: 0.15

Result: Enhances clarity and color while preserving original content.


ComfyUI img2img Workflow Setup

Core Nodes

The ERNIE-Image img2img workflow builds on the text2img workflow with these additions/modifications:

  1. Load Image — Load the reference image
  2. VAE Encode — Encode input image to latent space representation
  3. KSampler — Key node, set denoise < 1.0

Step-by-Step Setup

Step 1: Load Model and Encoders

Load Diffusion Model → ERNIE-Image-Turbo.safetensors
Checkpoint Loader → T5-XXL + VAE

Step 2: Load Input Image

Load Image → Select your reference image

Step 3: VAE Encode Input

VAE Encode → Connect Load Image and VAE nodes

This converts the pixel-space image to latent space representation — critical for img2img.

Step 4: Prompt Encoding

CLIP Text Encode (Positive) → Your positive prompt
CLIP Text Encode (Negative) → Negative description (e.g., "blurry, low quality")

ERNIE-Image's Prompt Enhancer (PE) works in img2img mode too. Enable PE for complex style transfers, disable it for precise reproduction.

Step 5: KSampler Configuration

KSampler:
  model → Load Diffusion Model output
  positive → CLIP Text Encode (Positive)
  negative → CLIP Text Encode (Negative)
  latent_image → VAE Encode output (KEY: not empty latent!)
  denoise → 0.4 (adjust based on needs)
  steps → 8 for Turbo, 30-50 for Standard
  cfg → 6.0
  sampler_name → euler
  scheduler → normal

Step 6: VAE Decode Output

VAE Decode → Connect KSampler output and the VAE
Save Image → Save the result

Complete Node Connection Diagram

[Load Image] ──→ [VAE Encode] ──→ [KSampler.latent_image]
[CLIP Text Encode (+)] ────────→ [KSampler.positive]
[CLIP Text Encode (-)] ────────→ [KSampler.negative]
[Load Diffusion Model] ────────→ [KSampler.model]

[KSampler] ──→ [VAE Decode] ──→ [Save Image]
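The same wiring can be written as a ComfyUI API-format graph, which is handy for scripting. The sketch below covers only the img2img-specific nodes. The class_type and input names are those of stock ComfyUI nodes as I understand them; the loader nodes are deliberately left out because they depend on how ERNIE-Image is packaged in your install, so the loader ids "1", "2", "3", the node ids "10" to "16", and the function name build_img2img_graph are all hypothetical.

```python
import json

def build_img2img_graph(model_ref, clip_ref, vae_ref, image_name,
                        positive, negative, denoise=0.4, steps=8, seed=0):
    """Return the img2img part of an API-format ComfyUI graph.

    model_ref, clip_ref and vae_ref are [node_id, output_index] links to
    loader nodes that are NOT built here and must be merged in separately.
    """
    return {
        "10": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "11": {"class_type": "VAEEncode",
               "inputs": {"pixels": ["10", 0], "vae": vae_ref}},
        "12": {"class_type": "CLIPTextEncode",
               "inputs": {"text": positive, "clip": clip_ref}},
        "13": {"class_type": "CLIPTextEncode",
               "inputs": {"text": negative, "clip": clip_ref}},
        "14": {"class_type": "KSampler",
               "inputs": {"model": model_ref, "positive": ["12", 0],
                          "negative": ["13", 0],
                          "latent_image": ["11", 0],  # encoded input, not an empty latent
                          "denoise": denoise, "steps": steps, "cfg": 6.0,
                          "sampler_name": "euler", "scheduler": "normal",
                          "seed": seed}},
        "15": {"class_type": "VAEDecode",
               "inputs": {"samples": ["14", 0], "vae": vae_ref}},
        "16": {"class_type": "SaveImage",
               "inputs": {"images": ["15", 0], "filename_prefix": "ernie_img2img"}},
    }

graph = build_img2img_graph(["1", 0], ["2", 0], ["3", 0], "reference.jpg",
                            "a cute anime girl", "blurry, low quality")
print(json.dumps(graph["14"], indent=2))
```

Once the loader nodes are added, a graph like this is normally queued by sending it as the prompt field of a POST request to a running ComfyUI server's /prompt endpoint.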

Diffusers Python API

For programmatic workflows, use HuggingFace Diffusers:

Installation

pip install -U diffusers transformers accelerate

Code Example

import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Load the model in half precision to reduce GPU memory use.
# Note: this assumes the pipeline class resolved for this checkpoint
# accepts `image` and `strength`; check the model card if it does not.
pipe = DiffusionPipeline.from_pretrained(
    "baidu/ERNIE-Image-Turbo",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Load the input image, force RGB, and resize to the working resolution
input_image = Image.open("reference.jpg").convert("RGB")
input_image = input_image.resize((1024, 1024))

# img2img generation
prompt = "a futuristic cityscape at night, neon lights, cyberpunk style"
output_image = pipe(
    prompt=prompt,
    image=input_image,
    strength=0.6,           # Equivalent to ComfyUI's denoise
    num_inference_steps=8,  # Turbo mode
    guidance_scale=6.0,
).images[0]

output_image.save("output.jpg")

Key Parameters:

  • strength: Equivalent to denoise. Range: 0.0-1.0
  • num_inference_steps: Turbo = 8 steps, Standard = 30-50 steps
  • guidance_scale: CFG guidance strength; 5.0-7.0 is recommended
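One detail worth checking when combining strength with a low step count: in the diffusers img2img pipelines I am familiar with (for example StableDiffusionImg2ImgPipeline), only about int(num_inference_steps * strength) steps are actually run. Whether the ERNIE-Image pipeline follows the same rule is an assumption, so treat the sketch below as a sanity check rather than a description of its internals. Both helper names are hypothetical.

```python
def effective_steps(num_inference_steps, strength):
    """Steps actually executed, assuming the int(steps * strength) rule
    used by diffusers' StableDiffusionImg2ImgPipeline."""
    return min(int(num_inference_steps * strength), num_inference_steps)

def steps_for_target(target_steps, strength):
    """Smallest num_inference_steps that still runs target_steps steps."""
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0.0, 1.0]")
    n = target_steps
    while effective_steps(n, strength) < target_steps:
        n += 1
    return n

print(effective_steps(8, 0.6))   # 4
print(effective_steps(8, 0.1))   # 0
print(steps_for_target(8, 0.6))  # 14
```

Under that assumption, 8 Turbo steps at strength 0.6 run only 4 denoising steps, and very low strengths can leave no steps at all, so it may be worth raising num_inference_steps when strength is small.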

Practical Use Cases

1. Sketch to Render

Workflow: Hand-drawn sketch → ERNIE-Image img2img → Detailed rendering

Input: Pencil sketch of building/character/product
Prompt: "photorealistic [description], high detail, professional rendering"
Denoise: 0.7-0.8

Tip: The cleaner the sketch, the better the output. Use black-and-white line art and avoid color, which tends to carry over into the render.

2. Style Transfer

Workflow: Original image → ERNIE-Image img2img → New style image

Input: Any image
Prompt: "[target style] style, [description]"
Denoise: 0.3-0.5

Example Prompts:

  • "oil painting style, Van Gogh inspired, thick brush strokes"
  • "watercolor illustration, soft colors, dreamy atmosphere"
  • "cyberpunk neon style, dark background, glowing elements"

3. Image Enhancement/Restoration

Workflow: Low-quality image → ERNIE-Image img2img → Enhanced HD image

Input: Blurry, low-resolution, or noisy image
Prompt: "high resolution, sharp details, professional photography, 4K quality"
Denoise: 0.1-0.3

Tip: Low denoise values preserve original content while improving quality.

4. Photo Stylization

Workflow: Regular photo → ERNIE-Image img2img → Artistic photo

Input: Ordinary photograph
Prompt: "[style] photograph, dramatic lighting, cinematic"
Denoise: 0.3-0.5

Recommended Styles:

  • "film photography, Kodak Portra 400"
  • "black and white portrait, dramatic shadows"
  • "double exposure, surreal composition"

5. Content Expansion (Outpainting Variant)

While ERNIE-Image has dedicated outpainting (see EI-017), img2img can achieve similar results:

Input: Cropped image placed on a larger canvas, with blank areas where new content should appear
Prompt: Maintain original scene description, add expansion direction details
Denoise: 0.5-0.7

PE (Prompt Enhancer) in img2img

ERNIE-Image's built-in Prompt Enhancer can help or hurt in img2img mode, depending on the task:

When to Enable PE

  • Style transfer: PE enriches style descriptions for finer results
  • Sketch rendering: PE adds material, lighting details
  • Creative exploration: When you want the model to take creative liberties

When to Disable PE

  • Precise reproduction: When strict prompt adherence is needed
  • Image restoration: When minimal changes are desired
  • Commercial projects: When predictable output matters

Testing Method

Generate 2 images with PE on and 2 with PE off for the same input and prompt. Compare and choose the best approach.
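The comparison can be scripted so both settings see identical inputs. In the sketch below, generate is a callable you supply that wraps your own pipeline or ComfyUI call, and use_pe is a hypothetical flag name for switching the Prompt Enhancer; neither is taken from the ERNIE-Image API. Reusing the same seeds for the PE-on and PE-off runs is a choice made here so that the only difference between paired images is the PE setting.

```python
from itertools import product

def pe_ab_jobs(prompt, seeds=(0, 1)):
    """Build one job per (PE setting, seed) pair: 2 settings x 2 seeds = 4 images."""
    return [{"prompt": prompt, "seed": seed, "use_pe": use_pe}
            for use_pe, seed in product((True, False), seeds)]

def run_ab_test(generate, prompt, seeds=(0, 1)):
    """Call generate(**job) for every job and group the results by PE setting."""
    results = {True: [], False: []}
    for job in pe_ab_jobs(prompt, seeds):
        results[job["use_pe"]].append(generate(**job))
    return results

# Stand-in generator for demonstration; replace with a real image call.
def fake_generate(prompt, seed, use_pe):
    return f"{'pe_on' if use_pe else 'pe_off'}_seed{seed}.png"

print(run_ab_test(fake_generate, "oil painting style, thick brush strokes"))
```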


FAQ

Q: Can img2img change image dimensions?

Yes, but identical input/output resolution produces the best results. If you need different dimensions, resize the input image externally first.
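A minimal sketch of that external resize step, using Pillow. The 1024-pixel long side follows the 1024x1024 example earlier in this article; rounding each side to a multiple of 16 is an assumption about what the model prefers, and both function names are hypothetical.

```python
def fit_size(width, height, target_long_side=1024, multiple=16):
    """Scale (width, height) so the long side is about target_long_side,
    keeping the aspect ratio and rounding each side to `multiple`."""
    scale = target_long_side / max(width, height)

    def snap(value):
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)

def resize_for_img2img(path, target_long_side=1024, multiple=16):
    """Open an image, convert it to RGB, and resize it with fit_size."""
    from PIL import Image  # Pillow is only needed when actually resizing

    image = Image.open(path).convert("RGB")
    size = fit_size(image.width, image.height, target_long_side, multiple)
    return image.resize(size, Image.LANCZOS)

print(fit_size(1920, 1080))
print(fit_size(1000, 750))
```

Resizing before generation keeps the aspect ratio under your control instead of leaving it to whatever the pipeline or node does with a mismatched input.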

Q: What's the best denoise value?

No single answer. Start at 0.4 and adjust:

  • Not enough change → increase denoise
  • Too much change → decrease denoise
  • Adjust in 0.1 increments
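The tuning loop above is easy to encode if you are iterating in a script. This is only a sketch: next_denoise is a hypothetical helper, and the 0.1 to 0.95 clamp is an assumption borrowed from the ranges in the quick-reference table.

```python
def next_denoise(current, feedback, step=0.1, low=0.1, high=0.95):
    """Suggest the next denoise value from feedback on the last result.

    feedback: "too_little" (not enough change), "too_much", or "ok".
    """
    if feedback == "too_little":
        candidate = current + step
    elif feedback == "too_much":
        candidate = current - step
    elif feedback == "ok":
        candidate = current
    else:
        raise ValueError(f"unknown feedback: {feedback!r}")
    # Clamp to the assumed useful range and drop floating-point residue.
    return round(min(high, max(low, candidate)), 2)

print(next_denoise(0.4, "too_little"))  # 0.5
print(next_denoise(0.4, "too_much"))    # 0.3
```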

Q: Why is img2img quality worse than text2img?

Common causes:

  1. Wrong denoise value: Too high loses the original, too low makes minimal changes
  2. Unclear prompt: img2img needs more precise descriptions
  3. Poor input quality: Blurry or low-resolution input degrades output

Q: Can I chain multiple img2img passes?

Absolutely! Multi-pass img2img is an advanced technique:

  1. First pass: denoise=0.5 — style transfer
  2. Second pass: denoise=0.2 — detail refinement
  3. Third pass: denoise=0.1 — final polish
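The three passes above can be chained with a small loop in which each pass receives the previous pass's output. In this sketch, run_pass is a callable you provide (for example a thin wrapper around your pipeline call that accepts an image and a strength); it is a hypothetical name, not part of any library.

```python
def multi_pass(image, run_pass, strengths=(0.5, 0.2, 0.1)):
    """Run img2img repeatedly, feeding each output into the next pass.

    Returns the final image and the list of all intermediate results,
    so an earlier pass can be kept if a later one overshoots.
    """
    history = [image]
    for strength in strengths:
        image = run_pass(image, strength)
        history.append(image)
    return image, history

# Stand-in pass for demonstration: records which strengths were applied.
def fake_pass(image, strength):
    return image + [strength]

final, history = multi_pass([], fake_pass)
print(final)         # [0.5, 0.2, 0.1]
print(len(history))  # 4
```

Keeping the intermediate results is a design choice made here: decreasing strengths make each pass less destructive, but it is still useful to be able to fall back to an earlier stage.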

Summary

ERNIE-Image's img2img capability provides powerful tools for reworking existing images:

  • The denoise parameter is key — master it and you master img2img
  • ComfyUI workflows are most flexible — visual operation, easy debugging
  • Diffusers API suits batch processing — programmatic, extensible
  • PE module works in img2img too — choose on/off based on the scenario

From sketch to refinement, photo to art — img2img transforms ERNIE-Image from just a "text-to-image" tool into a true AI image creation partner.


References: ERNIE-Image HuggingFace Collection, ComfyUI Official Documentation, Diffusers img2img Documentation

ERNIE-Image Team