ERNIE-Image Character LoRA Training Complete Guide: From Dataset Preparation to ComfyUI Deployment

Abstract: Character consistency is a core pain point in AI image generation. This article walks through the complete ERNIE-Image character LoRA workflow, from dataset construction through training parameter tuning to ComfyUI deployment, so that your character keeps its identity across different scenes, angles, and expressions.

Why Character LoRA Is the Watershed Moment for AI Image Generation

In the AI image generation space, character consistency has been the single biggest challenge. Have you ever experienced these scenarios:

The first character image looks perfect, but the second one looks completely different
Different expressions and angles of the same character appear as entirely different people
After training a LoRA, the outputs collapse: the character's features are "eaten away" and every image looks nearly identical

ERNIE-Image's 8B DiT architecture is reported to respond well to LoRA training. One Reddit user noted: "Unlike Z-Image Turbo, ERNIE-Image seems to be really good for LoRA training." Similar reports appear elsewhere in the community.

Part 1: Core Concepts of Character LoRA Training

What is LoRA?

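LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique. It freezes the pre-trained weights and injects small trainable low-rank matrices, typically into the attention layers, so the model adapts to a specific style or character while training only a small fraction of the original parameter count (often a few percent or less, depending on the rank and on which layers are targeted).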
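
As a rough illustration of why LoRA is parameter-efficient, the sketch below counts the trainable parameters that a rank-r adapter adds to a single linear layer. The 4096×4096 layer size and the helper names are illustrative assumptions, not ERNIE-Image's actual dimensions.

```python
# LoRA leaves the frozen weight W (d_out x d_in) untouched and learns
# a low-rank update instead:
#   W' = W + (alpha / rank) * B @ A,   B: d_out x rank,   A: rank x d_in
# Only A and B receive gradients.


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter (A plus B)."""
    return rank * (d_in + d_out)


def lora_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Adapter size relative to the frozen matrix it adapts."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    return alpha / rank


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection with rank 64 and alpha 32.
    print(lora_param_count(4096, 4096, 64))
    print(f"{lora_fraction(4096, 4096, 64):.2%}")
    print(lora_scale(32, 64))
```

The fraction for the whole model is usually lower than the per-layer figure, because only some layers carry adapters.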

ERNIE-Image LoRA Training Advantages:

Moderate parameter count: 8B DiT architecture balances training cost and output quality
Text rendering preserved: Character LoRA training doesn't break ERNIE-Image's text rendering capability
Layout compatibility: Character LoRA works alongside ERNIE-Image's poster/infographic generation

Character LoRA vs IP-Adapter vs Reference-Only

Method	Identity Retention	Flexibility	Training Cost	Best Use Case
Character LoRA	⭐⭐⭐⭐⭐	⭐⭐⭐	Medium	Fixed character, multi-scene reuse
IP-Adapter	⭐⭐⭐	⭐⭐⭐⭐⭐	No training	Temporary character reference
Reference-Only	⭐⭐	⭐⭐⭐⭐	No training	Style transfer primarily

Bottom line: If you need the same character across 10+ images in different scenes, character LoRA is the most reliable of these three options.

Part 2: Dataset Preparation — 80% of LoRA Quality

The Golden Rule: 15-30 High-Quality Images

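Community advice on Reddit points the same way: "25-30 images should give you good results for lora training." With too few images, the LoRA tends to memorize specific poses and backgrounds (overfitting); padding the set with near-duplicate or low-quality images raises training cost without improving identity.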

Dataset Construction Checklist

Required angles (at least 2-3 per category):

Front portrait — Clear face, neutral expression
Side/3/4 profile — Shows character silhouette
Full body — Shows proportions, clothing style
Different expressions — Smile, serious, surprised, etc.
Different lighting — Natural, indoor, backlit
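
The coverage rule above can be checked mechanically before training. This is a minimal sketch that assumes you have hand-tagged each image with the categories it covers; the tag names and the `missing_coverage` helper are hypothetical.

```python
from collections import Counter

# Hypothetical tag names, one per checklist category.
REQUIRED_CATEGORIES = ("front", "profile", "full_body", "expression", "lighting")


def missing_coverage(tags_per_image, minimum=2):
    """Return {category: shortfall} for categories with fewer than `minimum` images.

    tags_per_image: one iterable of tags per image; an image may carry
    several tags (for example a smiling front portrait).
    """
    counts = Counter(tag for tags in tags_per_image for tag in set(tags))
    return {
        category: minimum - counts[category]
        for category in REQUIRED_CATEGORIES
        if counts[category] < minimum
    }


if __name__ == "__main__":
    dataset = [
        {"front", "expression"},
        {"front", "lighting"},
        {"profile"},
        {"full_body", "lighting"},
        {"full_body", "expression"},
    ]
    print(missing_coverage(dataset))
```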

Image quality standards:

Resolution ≥ 512×512 (recommended 768×768 or higher)
No watermarks, blur, or excessive filters
Character occupies at least 50% of the frame
Simple backgrounds to avoid training interference
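
The measurable parts of these standards (resolution and subject size) can be screened with a small script. The sketch below works on metadata you supply yourself; `ImageInfo`, `quality_issues`, and the `subject_fraction` estimate are assumptions for illustration, and watermarks or blur still need a visual check.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    name: str
    width: int
    height: int
    subject_fraction: float  # estimated share of the frame covered by the character


def quality_issues(img, min_side=512, min_subject=0.5):
    """List the quality standards that an image fails to meet."""
    issues = []
    shorter_side = min(img.width, img.height)
    if shorter_side < min_side:
        issues.append(f"shorter side {shorter_side}px is below {min_side}px")
    if img.subject_fraction < min_subject:
        issues.append(
            f"character fills {img.subject_fraction:.0%} of the frame, "
            f"below {min_subject:.0%}"
        )
    return issues


if __name__ == "__main__":
    for info in (
        ImageInfo("front_01.png", 1024, 1024, 0.7),
        ImageInfo("side_02.png", 480, 768, 0.6),
    ):
        print(info.name, quality_issues(info) or "ok")
```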

❌ Images to avoid:

Group shots (multiple characters)
Extreme angles (bird's-eye, low-angle >45°)
Distorted images (wide-angle distortion)
Images of other characters mixed into the set

Caption Generation Strategy

Method 1: General Description + Trigger Word

Trigger word: {mycharacter} a young woman, [detailed clothing/features description], [scene description], [lighting description]

Example:

{mycharacter} a young Asian woman with long black hair, wearing a red qipao, standing in a traditional Chinese garden, soft morning light

Method 2: Layered Description (Recommended)

Trigger word + Fixed character traits + Variable descriptions (scene, expression, clothing)

This approach allows post-training control through the variable portion while keeping character identity stable.
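
The layered scheme can be sketched as a small caption builder that keeps the trigger word and fixed traits constant and varies only the rest. The function name and argument layout are illustrative assumptions, not part of any training tool.

```python
def build_caption(trigger, fixed_traits, variable_parts):
    """Join trigger word, fixed character traits, and per-image variable parts."""
    head = f"{trigger} {fixed_traits}"
    # Skip empty variable parts so captions never contain stray commas.
    return ", ".join([head] + [part for part in variable_parts if part])


if __name__ == "__main__":
    print(
        build_caption(
            "{mycharacter}",
            "a young Asian woman with long black hair",
            [
                "wearing a red qipao",
                "standing in a traditional Chinese garden",
                "soft morning light",
            ],
        )
    )
```

Writing every caption through one helper keeps the fixed portion identical across the dataset, which is the point of the layered approach.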

Part 3: Training Parameter Deep Dive

Base Configuration

Parameter	Recommended	Notes
Learning Rate	5e-5 ~ 1e-4	Use the lower end for character LoRA
Network Rank	32 ~ 64	Characters need higher rank
Network Alpha	16 ~ 32	Typically half of rank
Epochs	10 ~ 20	Too many = overfitting
Batch Size	1 ~ 4	Depends on VRAM
Optimizer	AdamW8bit	VRAM-friendly
Dataset Repeats	10 ~ 20	Adjust based on dataset size
Resolution	512 or 768	Match training target

ERNIE-Image Specific Parameters

# ERNIE-Image LoRA Training Configuration
model: baidu/ERNIE-Image
base_model_revision: main
network_dim: 64
network_alpha: 32
learning_rate: 1e-4
lr_scheduler: cosine
lr_warmup_steps: 100
max_train_steps: 2000
mixed_precision: bf16
train_batch_size: 2
resolution: 768
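
Epochs, dataset repeats, batch size, and max_train_steps all interact, so it helps to work out the step count before launching a run. The sketch below assumes the common convention that one epoch sees every image `repeats` times and that no gradient accumulation is used; your trainer may count steps differently.

```python
import math


def steps_per_epoch(num_images, repeats, batch_size):
    """Optimizer steps in one epoch, assuming each image is seen `repeats` times."""
    return math.ceil(num_images * repeats / batch_size)


def total_steps(num_images, repeats, batch_size, epochs):
    """Optimizer steps for a full run of `epochs` epochs."""
    return steps_per_epoch(num_images, repeats, batch_size) * epochs


def epochs_within_budget(num_images, repeats, batch_size, max_train_steps):
    """Whole epochs that fit inside a max_train_steps budget."""
    return max_train_steps // steps_per_epoch(num_images, repeats, batch_size)


if __name__ == "__main__":
    # 20 images, 10 repeats, batch size 2, 15 epochs
    print(total_steps(20, 10, 2, 15))
    # How many epochs fit into max_train_steps: 2000 with the same dataset?
    print(epochs_within_budget(20, 10, 2, 2000))
```

If the computed total differs a lot from max_train_steps, one of the two settings will cut the run short, so it is worth deciding which one you intend to be the limit.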

Overfitting Detection

Good training signals:

Loss drops quickly early in training (roughly the first 1000 steps), then levels off
Character features gradually become clearer
Consistent character across different prompts

Overfitting signals:

Loss continues dropping but quality degrades
All generated images look identical
Character features "consume" background details

Solutions: Stop training earlier, reduce epochs, or add regularization images.
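
Because loss can keep falling while image quality degrades, early stopping works better when it follows a quality score per saved checkpoint rather than the loss itself. The sketch below assumes you already have such a score (for example, your own rating of a fixed prompt grid); the scoring method and the `pick_checkpoint` helper are hypothetical.

```python
def pick_checkpoint(scores, patience=2):
    """Return the index of the best checkpoint under patience-based early stopping.

    scores: one quality score per saved checkpoint, in training order
            (higher is better).
    patience: number of consecutive non-improving checkpoints tolerated
              before stopping.
    """
    if not scores:
        raise ValueError("need at least one checkpoint score")
    best_idx = 0
    stale = 0
    for idx in range(1, len(scores)):
        if scores[idx] > scores[best_idx]:
            best_idx = idx
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # quality has stopped improving; later checkpoints are ignored
    return best_idx


if __name__ == "__main__":
    # Hypothetical scores for checkpoints saved at regular intervals.
    print(pick_checkpoint([0.52, 0.61, 0.70, 0.68, 0.66, 0.90], patience=2))
```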

Part 4: ComfyUI Deployment Workflow

Basic ERNIE-Image + Character LoRA Workflow

[Load Checkpoint: ERNIE-Image Base]
       ↓
[Load LoRA: character_lora.safetensors] → weight: 0.8~1.0
       ↓
[CLIP Text Encode] → positive prompt
[CLIP Text Encode] → negative prompt
       ↓
[Empty Latent Image] → 768x768
       ↓
[Sampler] → DPM++ 2M Karras, 20-30 steps, CFG 5-7
       ↓
[VAE Decode]
       ↓
[Save Image]
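
The same graph can be written in ComfyUI's API (JSON) workflow format. The sketch below uses stock ComfyUI node class names; whether ERNIE-Image loads through the stock checkpoint loader or needs a dedicated loader node is an assumption to verify against your installation, and both file names are hypothetical.

```python
import json


def build_workflow(positive, negative, lora_strength=0.8, seed=0):
    """Build an API-format workflow dict mirroring the node chain above."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "ernie_image_base.safetensors"}},  # hypothetical file
        "2": {"class_type": "LoraLoader",
              "inputs": {"model": ["1", 0], "clip": ["1", 1],
                         "lora_name": "character_lora.safetensors",
                         "strength_model": lora_strength,
                         "strength_clip": lora_strength}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["2", 1]}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["2", 1]}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 768, "height": 768, "batch_size": 1}},
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["2", 0], "positive": ["3", 0],
                         "negative": ["4", 0], "latent_image": ["5", 0],
                         "seed": seed, "steps": 25, "cfg": 6.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "mycharacter"}},
    }


if __name__ == "__main__":
    workflow = build_workflow(
        "{mycharacter} a young woman, standing in a garden, soft morning light",
        "blurry, watermark",
    )
    print(json.dumps(workflow, indent=2))
```

A running ComfyUI server accepts this dict wrapped as {"prompt": workflow} on its /prompt endpoint, which makes it easy to script batches over different prompts or LoRA strengths.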

Advanced: Character LoRA + ControlNet Combo

[Character LoRA] → Maintain character identity
[ControlNet: OpenPose] → Control character pose
[ControlNet: Canny] → Control composition
→ Stack all three for precise character control

Advanced: Character LoRA + IP-Adapter Style Transfer

[Character LoRA] → Character identity consistent
[IP-Adapter] → Reference specific art style
→ Character stays the same, style changes

Part 5: Common Pitfalls and Solutions

Pitfall 1: LoRA Weight Too High

Symptom: Character features too strong, rigid images

Solution: Lower LoRA weight to 0.6-0.8, or adjust CFG scale

Pitfall 2: Insufficient Dataset Diversity

Symptom: Character only generates in specific scenes/angles

Solution: Add multi-angle, multi-expression training data

Pitfall 3: Trigger Word Pollution

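Symptom: The character appears even when the trigger word is absent, which means the model has tied the identity to generic caption terms rather than to the trigger

Solution: Include the trigger word in every training caption, and use a unique made-up token as the trigger rather than a real character name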
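
A quick check over the caption set catches captions that were written without the trigger word. This is a minimal sketch; the `captions_missing_trigger` helper and the file names are hypothetical, and it assumes captions are already loaded into a dict.

```python
def captions_missing_trigger(captions, trigger):
    """Return the sorted names of captions that do not contain the trigger word.

    captions: dict mapping image file name to caption text.
    """
    return sorted(name for name, text in captions.items() if trigger not in text)


if __name__ == "__main__":
    captions = {
        "front_01.txt": "{mycharacter} a young woman, smiling, soft morning light",
        "side_02.txt": "a young woman, side view, sunset lighting",
    }
    print(captions_missing_trigger(captions, "{mycharacter}"))
```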

Pitfall 4: Overtraining Reduces Generalization

Symptom: All generated characters look identical

Solution: Reduce training steps and add regularization images (images of similar but different characters)

Part 6: Practical Case — Training an Anime Character LoRA

Dataset (20 images)

Front portraits × 5 (different expressions)
Side/3/4 profiles × 4
Full body × 5 (different angles)
Half body × 3
Close-ups (eyes/hands) × 3

Training Configuration

network_dim: 64
network_alpha: 32
learning_rate: 1e-4
# Both a step limit and an epoch count are set; check which one your trainer honors
max_train_steps: 2500
epochs: 15

Test Prompts

Front: {mycharacter} a beautiful anime girl, long silver hair, blue eyes, smiling, white dress, fantasy background, cinematic lighting

Side: {mycharacter} side view, looking over shoulder, sunset lighting, long silver hair flowing in wind

Full body: {mycharacter} full body, standing in a magical forest, glowing particles, dynamic pose, detailed fantasy outfit

Part 7: ERNIE-Image Character LoRA vs Competitors

Dimension	ERNIE-Image	Flux.2	SDXL
Character Consistency	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Text Rendering	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐
Training Cost	⭐⭐⭐	⭐⭐	⭐⭐⭐⭐
Community Resources	⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐
Commercial License	Apache 2.0 ✅	Varies by variant (check the model card)	CreativeML Open RAIL++-M

ERNIE-Image's unique edge: After character LoRA training, text rendering capability is preserved. You can generate posters and infographics that include the character's name, which many other models struggle with.

Summary

Character consistency LoRA training is the key step that transforms AI image generation from "toy" to "tool." ERNIE-Image, with its 8B DiT architecture and Apache 2.0 open-source license, provides a highly cost-effective platform for character LoRA training.

Key takeaways:

Dataset quality > training parameters — 15-30 high-quality, multi-angle images are fundamental
Overfitting is the most common trap — use early stopping and regularization data
ComfyUI workflows enable rapid deployment of training results
ERNIE-Image's text rendering + character LoRA is a unique combination advantage

References: Reddit r/StableDiffusion, r/ComfyUI, HuggingFace baidu/ERNIE-Image, dev.to LoRA Training Guide, RunDiffusion Character Consistency Template