ERNIE-Image Character LoRA Training Complete Guide: From Dataset Preparation to ComfyUI Deployment

May 26, 2026

Abstract: Character consistency is a core pain point in AI image generation. This article walks through the complete ERNIE-Image character LoRA workflow, from building a high-quality dataset to tuning training parameters and deploying in ComfyUI, so that your AI character keeps its identity across different scenes, angles, and expressions.


Why Character LoRA Is the Watershed Moment for AI Image Generation

Character consistency has long been one of the hardest problems in AI image generation. You may have run into these scenarios:

  • The first character image looks perfect, but the second one looks completely different
  • Different expressions and angles of the same character appear as entirely different people
  • After training a LoRA, the character's features take over the whole image, so every output looks nearly identical

ERNIE-Image's 8B DiT architecture appears to respond well to LoRA training. As one Reddit user put it: "Unlike Z-Image Turbo, ERNIE-Image seems to be really good for LoRA training", a view echoed by other community members.


Part 1: Core Concepts of Character LoRA Training

What is LoRA?

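LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique. It freezes the pre-trained weights and adds a pair of small trainable low-rank matrices to selected layers (typically the attention projections), so the model adapts to a specific style or character while training only a small fraction of the original parameter count. The exact fraction depends on the rank and on which layers are adapted; it is usually a few percent or less.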
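
To make the parameter savings concrete, here is a minimal sketch that counts the trainable weights a LoRA adapter adds to a single dense layer. The hidden size of 4096 is a hypothetical value chosen for illustration, not ERNIE-Image's actual layer width, and the function names are made up for this example.

```python
def lora_param_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of a dense layer's weights that a rank-`rank` LoRA adapter trains."""
    full = d_in * d_out              # frozen weight W has shape (d_out, d_in)
    adapter = rank * (d_in + d_out)  # A is (rank, d_in), B is (d_out, rank)
    return adapter / full


def lora_scale(alpha: float, rank: int) -> float:
    """Scaling applied to the update: W' = W + (alpha / rank) * B @ A."""
    return alpha / rank


if __name__ == "__main__":
    # 4096 is a hypothetical hidden size, used only to illustrate the arithmetic.
    for rank in (32, 64):
        frac = lora_param_fraction(4096, 4096, rank)
        print(f"rank={rank}: {frac:.2%} of the layer's weights")
    print("scale at alpha=32, rank=64:", lora_scale(32, 64))
```

Doubling the rank doubles the adapter size, which is why higher-rank character LoRAs produce larger files and need more VRAM during training.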

ERNIE-Image LoRA Training Advantages:

  • Moderate parameter count: 8B DiT architecture balances training cost and output quality
  • Text rendering preserved: Character LoRA training doesn't break ERNIE-Image's text rendering capability
  • Layout compatibility: Character LoRA works alongside ERNIE-Image's poster/infographic generation

Character LoRA vs IP-Adapter vs Reference-Only

Method | Identity Retention | Flexibility | Training Cost | Best Use Case
Character LoRA | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Medium | Fixed character, multi-scene reuse
IP-Adapter | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | No training | Temporary character reference
Reference-Only | ⭐⭐ | ⭐⭐⭐⭐ | No training | Style transfer primarily

Bottom line: If you need the same character across 10+ images in different scenes, a character LoRA is the most reliable of these three options.


Part 2: Dataset Preparation — 80% of LoRA Quality

The Golden Rule: 15-30 High-Quality Images

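Community advice from Reddit: "25-30 images should give you good results for lora training." With too few images, the LoRA tends to memorize the specific poses and backgrounds it was shown and generalizes poorly. Going well beyond 30 usually adds curation and training cost, and helps only if the extra images are just as clean and consistent as the rest.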

Dataset Construction Checklist

Required coverage (at least 2-3 images per category):

  1. Front portrait — Clear face, neutral expression
  2. Side/3/4 profile — Shows character silhouette
  3. Full body — Shows proportions, clothing style
  4. Different expressions — Smile, serious, surprised, etc.
  5. Different lighting — Natural, indoor, backlit

Image quality standards:

  • Resolution ≥ 512×512 (recommended 768×768 or higher)
  • No watermarks, blur, or excessive filters
  • Character occupies at least 50% of the frame
  • Simple backgrounds to avoid training interference

❌ Images to avoid:

  • Group shots (multiple characters)
  • Extreme angles (bird's-eye, low-angle >45°)
  • Distorted images (wide-angle distortion)
  • Images of other characters mixed into the set
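
The checklist above can be partly automated. The following is a minimal sketch, assuming you have already tagged each image with one of five category labels and recorded its pixel size (for example by reading `Image.open(path).size` with Pillow). The category names and the record layout are hypothetical conventions for this example, not part of any ERNIE-Image tooling.

```python
from collections import Counter

# Illustrative labels for the five coverage groups in the checklist.
REQUIRED_CATEGORIES = ("front", "profile", "full_body", "expression", "lighting")


def audit_dataset(images, min_side=512, min_per_category=2, size_range=(15, 30)):
    """Return a list of human-readable problems; an empty list means the set passes.

    `images` is a list of dicts such as
    {"name": "01.png", "category": "front", "width": 768, "height": 768}.
    """
    problems = []
    lo, hi = size_range
    if not lo <= len(images) <= hi:
        problems.append(f"dataset has {len(images)} images, expected {lo}-{hi}")

    # Coverage: every checklist group needs a minimum number of images.
    counts = Counter(img["category"] for img in images)
    for cat in REQUIRED_CATEGORIES:
        if counts[cat] < min_per_category:
            problems.append(
                f"category '{cat}' has {counts[cat]} images, need >= {min_per_category}"
            )

    # Resolution: the shorter side must reach the minimum.
    for img in images:
        if min(img["width"], img["height"]) < min_side:
            problems.append(
                f"{img['name']} is {img['width']}x{img['height']}, below {min_side}px"
            )
    return problems
```

Checks such as watermarks, blur, group shots, and lens distortion still need a human eye; this only catches the countable problems.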

Caption Generation Strategy

Method 1: General Description + Trigger Word

Template (where {mycharacter} is your trigger word): {mycharacter} a young woman, [detailed clothing/features description], [scene description], [lighting description]

Example:

{mycharacter} a young Asian woman with long black hair, wearing a red qipao, standing in a traditional Chinese garden, soft morning light

Method 2: Layered Description (Recommended)

Trigger word + Fixed character traits + Variable descriptions (scene, expression, clothing)

This approach allows post-training control through the variable portion while keeping character identity stable.
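
A layered caption is easy to assemble programmatically, which keeps the fixed part identical across the whole dataset. This is a minimal sketch; the function name and argument names are hypothetical, and the ordering of the variable parts is just one reasonable choice.

```python
def build_caption(trigger, fixed_traits, scene=None, expression=None,
                  clothing=None, lighting=None):
    """Compose a layered caption: trigger word, fixed traits, then variable parts."""
    # Variable parts are optional; missing ones are simply skipped.
    variable = [part for part in (expression, clothing, scene, lighting) if part]
    return ", ".join([f"{trigger} {fixed_traits}", *variable])


if __name__ == "__main__":
    print(build_caption(
        "{mycharacter}",
        "a young Asian woman with long black hair",
        clothing="wearing a red qipao",
        scene="standing in a traditional Chinese garden",
        lighting="soft morning light",
    ))
```

Because the trigger word and fixed traits always come from the same two arguments, only the scene, expression, clothing, and lighting vary from image to image.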


Part 3: Training Parameter Deep Dive

Base Configuration

Parameter | Recommended | Notes
Learning Rate | 5e-5 ~ 1e-4 | Lower values suit character LoRA
Network Rank | 32 ~ 64 | Characters need higher rank
Network Alpha | 16 ~ 32 | Typically half of rank
Epochs | 10 ~ 20 | Too many = overfitting
Batch Size | 1 ~ 4 | Depends on VRAM
Optimizer | AdamW8bit | VRAM-friendly
Dataset Repeats | 10 ~ 20 | Adjust based on dataset size
Resolution | 512 or 768 | Match training target
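
Dataset size, repeats, epochs, and batch size together determine how many optimizer steps a run takes, so it helps to compute that number before training. The sketch below assumes the common repeat-based convention in which each epoch shows every image `repeats` times, with no gradient accumulation; your trainer may count differently.

```python
import math


def total_training_steps(num_images, repeats, epochs, batch_size):
    """Optimizer steps for a repeat-based trainer without gradient accumulation."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 20 images, 10 repeats, 10 epochs at two different batch sizes.
    print(total_training_steps(20, 10, 10, batch_size=1))
    print(total_training_steps(20, 10, 10, batch_size=2))
```

If the result is far above or below the step budget you planned, adjust repeats or epochs rather than the learning rate first.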

ERNIE-Image Specific Parameters

# ERNIE-Image LoRA Training Configuration
model: baidu/ERNIE-Image
base_model_revision: main
network_dim: 64
network_alpha: 32
learning_rate: 1e-4
lr_scheduler: cosine
lr_warmup_steps: 100
max_train_steps: 2000
mixed_precision: bf16
train_batch_size: 2
resolution: 768
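
To see what the cosine scheduler with warmup does to the learning rate over the run, here is a minimal sketch. It assumes linear warmup from zero followed by a half-cosine decay to zero, which is a common definition; the trainer you use may implement the schedule slightly differently (for example with a nonzero floor).

```python
import math


def cosine_lr(step, base_lr=1e-4, warmup_steps=100, max_steps=2000):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    progress = min(progress, 1.0)  # clamp in case step exceeds max_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 50, 100, 1050, 2000):
        print(step, f"{cosine_lr(step):.2e}")
```

The defaults mirror the configuration above: the rate peaks at 1e-4 once warmup ends at step 100 and is halfway down by the midpoint of the decay.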

Overfitting Detection

Good training signals:

  • Loss drops rapidly in the first 1000 steps, then stabilizes
  • Character features gradually become clearer
  • Consistent character across different prompts

Overfitting signals:

  • Loss continues dropping but quality degrades
  • All generated images look identical
  • Character features "consume" background details

Solutions: Stop early and keep an earlier checkpoint, reduce epochs, or add regularization images.
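
Early stopping can be sketched as a simple patience rule. Since training loss can keep falling while image quality degrades, the monitored score here should come from something other than the training loss, for example a loss on a few held-out images or a manual rating of sample images at each checkpoint. The class below is a hypothetical helper, not part of any specific trainer.

```python
class EarlyStopper:
    """Stop when a monitored score has not improved for `patience` evaluations."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = None
        self.best_step = None
        self.bad_evals = 0

    def update(self, step, score):
        """Record a score (lower is better); return True if training should stop."""
        if self.best_score is None or score < self.best_score - self.min_delta:
            self.best_score, self.best_step, self.bad_evals = score, step, 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopper(patience=2)
    # Made-up scores for illustration, evaluated every 250 steps.
    for step, score in [(250, 0.30), (500, 0.22), (750, 0.20), (1000, 0.21), (1250, 0.23)]:
        if stopper.update(step, score):
            print(f"stop at step {step}, keep checkpoint from step {stopper.best_step}")
            break
```

The checkpoint worth keeping is the one at `best_step`, not the last one saved before stopping.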


Part 4: ComfyUI Deployment Workflow

Basic ERNIE-Image + Character LoRA Workflow

[Load Checkpoint: ERNIE-Image Base]
       ↓
[Load LoRA: character_lora.safetensors] → weight: 0.8~1.0
       ↓
[CLIP Text Encode] → positive prompt
[CLIP Text Encode] → negative prompt
       ↓
[Empty Latent Image] → 768x768
       ↓
[Sampler] → DPM++ 2M Karras, 20-30 steps, CFG 5-7
       ↓
[VAE Decode]
       ↓
[Save Image]
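
The same graph can be written in ComfyUI's API (JSON) workflow format and queued from a script. The sketch below uses ComfyUI's standard built-in node classes. It is an assumption that ERNIE-Image loads through `CheckpointLoaderSimple`; depending on how the model is packaged, dedicated loader nodes may be needed instead. The checkpoint filename and the filename prefix are hypothetical.

```python
import json


def build_workflow(positive, negative, lora_strength=0.8, seed=0):
    """Return an API-format ComfyUI graph as a dict keyed by node id.

    Links are written as [source_node_id, output_index].
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "ernie_image_base.safetensors"}},  # hypothetical file
        "2": {"class_type": "LoraLoader",
              "inputs": {"model": ["1", 0], "clip": ["1", 1],
                         "lora_name": "character_lora.safetensors",
                         "strength_model": lora_strength,
                         "strength_clip": lora_strength}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"clip": ["2", 1], "text": positive}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"clip": ["2", 1], "text": negative}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 768, "height": 768, "batch_size": 1}},
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["2", 0], "positive": ["3", 0], "negative": ["4", 0],
                         "latent_image": ["5", 0], "seed": seed, "steps": 25, "cfg": 6.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "mycharacter"}},
    }


if __name__ == "__main__":
    workflow = build_workflow("{mycharacter} a young woman, city street at night",
                              "blurry, low quality", lora_strength=0.9)
    # This payload is what a running ComfyUI server expects on its /prompt endpoint.
    print(json.dumps({"prompt": workflow}, indent=2)[:200])
```

Keeping the workflow in code makes it easy to sweep the LoRA strength or seed while leaving the rest of the graph unchanged.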

Advanced: Character LoRA + ControlNet Combo

[Character LoRA] → Maintain character identity
[ControlNet: OpenPose] → Control character pose
[ControlNet: Canny] → Control composition
→ Stack all three for precise character control

Advanced: Character LoRA + IP-Adapter Style Transfer

[Character LoRA] → Character identity consistent
[IP-Adapter] → Reference specific art style
→ Character stays the same, style changes

Part 5: Common Pitfalls and Solutions

Pitfall 1: LoRA Weight Too High

Symptom: Character features too strong, rigid images

Solution: Lower LoRA weight to 0.6-0.8, or adjust CFG scale

Pitfall 2: Insufficient Dataset Diversity

Symptom: Character only generates in specific scenes/angles

Solution: Add multi-angle, multi-expression training data

Pitfall 3: Trigger Word Pollution

Symptom: The character appears even without the trigger word, meaning the model has tied the character to generic caption terms instead of the trigger

Solution: Make sure every training caption includes the trigger word, and use a unique made-up token rather than a real character name
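
A quick script can confirm that no caption is missing the trigger word before you start training. This sketch assumes captions are stored as one `.txt` file per image in a single folder and that the trigger word comes first in each caption; both are conventions assumed for this example.

```python
from pathlib import Path


def captions_missing_trigger(caption_dir, trigger):
    """Return sorted names of .txt caption files that do not start with `trigger`."""
    missing = []
    for path in sorted(Path(caption_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if not text.startswith(trigger):
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    # "dataset/captions" is a hypothetical folder name.
    for name in captions_missing_trigger("dataset/captions", "{mycharacter}"):
        print("missing trigger word:", name)
```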

Pitfall 4: Overtraining Reduces Generalization

Symptom: All generated characters look identical

Solution: Reduce training steps, add regularization images (similar but different characters)


Part 6: Practical Case — Training an Anime Character LoRA

Dataset (20 images)

  • Front portraits × 5 (different expressions)
  • Side/3/4 profiles × 4
  • Full body × 5 (different angles)
  • Half body × 3
  • Close-ups (eyes/hands) × 3

Training Configuration

network_dim: 64
network_alpha: 32
learning_rate: 1e-4
max_train_steps: 2500
epochs: 15

Test Prompts

Front: {mycharacter} a beautiful anime girl, long silver hair, blue eyes, smiling, white dress, fantasy background, cinematic lighting

Side: {mycharacter} side view, looking over shoulder, sunset lighting, long silver hair flowing in wind

Full body: {mycharacter} full body, standing in a magical forest, glowing particles, dynamic pose, detailed fantasy outfit


Part 7: ERNIE-Image Character LoRA vs Competitors

Dimension | ERNIE-Image | Flux.2 | SDXL
Character Consistency | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐
Text Rendering | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐
Training Cost (more ⭐ = lower cost) | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐
Community Resources | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐
Commercial License | Apache 2.0 ✅ | Apache 2.0 ✅ | Open RAIL++

ERNIE-Image's unique edge: After character LoRA training, text rendering capability is preserved. You can generate posters and infographics with the character's name — something other models struggle with.


Summary

Character consistency LoRA training is the key step that transforms AI image generation from "toy" to "tool." ERNIE-Image, with its 8B DiT architecture and Apache 2.0 open-source license, provides a highly cost-effective platform for character LoRA training.

Key takeaways:

  1. Dataset quality > training parameters — 15-30 high-quality, multi-angle images are fundamental
  2. Overfitting is the most common trap — use early stopping and regularization data
  3. ComfyUI workflows enable rapid deployment of training results
  4. ERNIE-Image's text rendering + character LoRA is a unique combination advantage

References: Reddit r/StableDiffusion, r/ComfyUI, HuggingFace baidu/ERNIE-Image, dev.to LoRA Training Guide, RunDiffusion Character Consistency Template

ERNIE-Image Team
