Version: 1.0 | Date: 2025-10-31 | Status: Research & Planning
Local translation enables privacy-preserving, offline translation directly in the browser or on the user’s machine without external API calls. This document analyzes the available options for the Quarto Review translation module.
| Model Name | Parameters | Size | Languages | Quality |
|---|---|---|---|---|
| Xenova/nllb-200-distilled-600M | 600M | ~600MB | 200 | Good |
| Xenova/nllb-200-1.3B | 1.3B | ~1.3GB | 200 | Better |
| facebook/nllb-200-3.3B | 3.3B | ~3.3GB | 200 | Best |
Recommended for Quarto Review: Xenova/nllb-200-distilled-600M
Loading Time (First Run):
Translation Speed (WASM Backend):
Short sentence (10 words): 1-2 seconds
Medium paragraph (50 words): 5-8 seconds
Long paragraph (200 words): 20-30 seconds
Full document (1000 words): 100-150 seconds
Translation Speed (WebGPU Backend):
Short sentence (10 words): 0.1-0.2 seconds (10x faster)
Medium paragraph (50 words): 0.5-1 second (10x faster)
Long paragraph (200 words): 2-3 seconds (10x faster)
Full document (1000 words): 10-15 seconds (10x faster)
Memory Usage:
Quality (for EN ↔ NL, EN ↔ FR):
```typescript
import { pipeline } from '@xenova/transformers';

// Initialize (first run downloads the model)
const translator = await pipeline(
  'translation',
  'Xenova/nllb-200-distilled-600M',
  { device: 'webgpu' } // or 'wasm' for CPU
);

// Translate
const result = await translator('Hello world', {
  src_lang: 'eng_Latn',
  tgt_lang: 'nld_Latn',
});
console.log(result[0].translation_text); // "Hallo wereld"
```
| Model Name | Parameters | Size | Languages | Quality |
|---|---|---|---|---|
| Xenova/m2m100_418M | 418M | ~400MB | 100 | Good |
| facebook/m2m100_1.2B | 1.2B | ~1.2GB | 100 | Better |
Recommendation: Use NLLB-200 instead (better quality, more languages)
Multiple language-pair-specific models:
- Xenova/opus-mt-en-nl (EN→NL only)
- Xenova/opus-mt-nl-en (NL→EN only)
- Xenova/opus-mt-en-fr (EN→FR only)
- Xenova/opus-mt-fr-en (FR→EN only)

Each model: ~300MB
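Because each Opus-MT model covers a single direction, the module needs to map a language pair to a model id before loading. A minimal sketch, assuming only the four pairs listed above exist (the helper name `opusModelFor` is illustrative, not part of the real module):

```typescript
// Language pairs that have a dedicated Opus-MT model in this plan.
const SUPPORTED_PAIRS = new Set(['en-nl', 'nl-en', 'en-fr', 'fr-en']);

// Return the Opus-MT model id for a pair, or undefined if no
// pair-specific model exists (callers then fall back to NLLB-200).
function opusModelFor(from: string, to: string): string | undefined {
  const pair = `${from}-${to}`;
  return SUPPORTED_PAIRS.has(pair) ? `Xenova/opus-mt-${from}-${to}` : undefined;
}
```

An unsupported pair such as EN→DE returns `undefined`, which is the signal to use a multilingual model instead.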
Loading Time:
Translation Speed (WASM):
Short sentence (10 words): 0.5-1 second
Medium paragraph (50 words): 2-3 seconds
Long paragraph (200 words): 8-12 seconds
Full document (1000 words): 40-60 seconds
Translation Speed (WebGPU):
Short sentence (10 words): 0.05-0.1 seconds
Medium paragraph (50 words): 0.2-0.4 seconds
Long paragraph (200 words): 1-2 seconds
Full document (1000 words): 5-10 seconds
Pros:
Cons:
| Model | Parameters | Size | Quality | Browser? |
|---|---|---|---|---|
| Phi-3-mini | 3.8B | 2.4GB | Excellent | Yes (WebGPU) |
| Mistral-7B | 7B | 4.5GB | Excellent | Marginal |
| Llama-3-8B | 8B | 5GB | Excellent | No |
Translation Speed (Phi-3-mini, WebGPU):
Short sentence (10 words): 5-10 seconds
Medium paragraph (50 words): 30-60 seconds
Long paragraph (200 words): 2-5 minutes
Full document (1000 words): 10-25 minutes
Recommendation: NOT suitable for interactive translation in browser
| Model | Backend | Time (50-word paragraph) | Relative Speed |
|---|---|---|---|
| NLLB-200-600M | WebGPU | 0.5-1s | 🔥🔥🔥 (Best) |
| Opus-MT | WebGPU | 0.2-0.4s | 🔥🔥🔥🔥 (Fastest) |
| NLLB-200-600M | WASM | 5-8s | 🔥 (Slow) |
| Opus-MT | WASM | 2-3s | 🔥🔥 (OK) |
| Phi-3-mini | WebGPU | 30-60s | ❄️ (Very Slow) |
| Model | BLEU Score | Human Rating | Use Case |
|---|---|---|---|
| GPT-4 | 45-55 | ⭐⭐⭐⭐⭐ Excellent | Production (API) |
| Claude 3.5 | 45-55 | ⭐⭐⭐⭐⭐ Excellent | Production (API) |
| NLLB-200-600M | 25-35 | ⭐⭐⭐ Good | Local/Privacy |
| Opus-MT | 24-32 | ⭐⭐⭐ Good | Local/Speed |
| Google Translate | 35-45 | ⭐⭐⭐⭐ Very Good | Production (API) |
| M2M-100 | 22-30 | ⭐⭐ Fair | Legacy |
| Model | RAM | VRAM (WebGPU) | Total |
|---|---|---|---|
| NLLB-200-600M | 600MB | 500MB | 1.1GB |
| Opus-MT | 300MB | 300MB | 600MB |
| NLLB-200-1.3B | 1.3GB | 1GB | 2.3GB |
| Phi-3-mini | 2.4GB | 2GB | 4.4GB |
| Feature | Chrome | Edge | Safari | Firefox |
|---|---|---|---|---|
| WASM (CPU) | ✅ 100% | ✅ 100% | ✅ 100% | ✅ 100% |
| WebGPU | ✅ 113+ | ✅ 113+ | ✅ 18+ | 🟡 Flag |
| Transformers.js | ✅ | ✅ | ✅ | ✅ |
WebGPU Global Support (Oct 2024): ~70%
Strategy: Start with WASM, upgrade to WebGPU if available
```typescript
// translation/providers/local-ai.ts
import { pipeline } from '@xenova/transformers';

export class LocalAIProvider extends TranslationProvider {
  private translator: any = null;
  private backend: 'wasm' | 'webgpu' = 'wasm';

  async initialize(): Promise<void> {
    if (this.translator) return; // already initialized; don't reload the model

    // Detect WebGPU support
    const hasWebGPU = await this.detectWebGPU();
    this.backend = hasWebGPU ? 'webgpu' : 'wasm';
    console.log(`Using ${this.backend} backend for translation`);

    this.translator = await pipeline(
      'translation',
      'Xenova/nllb-200-distilled-600M',
      {
        device: this.backend,
        dtype: this.backend === 'webgpu' ? 'fp16' : 'q8',
        progress_callback: (progress: any) => {
          // Show download progress to the user
          this.emitProgress(progress);
        },
      }
    );
  }

  async detectWebGPU(): Promise<boolean> {
    if (!navigator.gpu) return false;
    try {
      const adapter = await navigator.gpu.requestAdapter();
      return adapter !== null;
    } catch {
      return false;
    }
  }

  async translate(text: string, from: Language, to: Language): Promise<string> {
    await this.initialize();
    const result = await this.translator(text, {
      src_lang: this.getLanguageCode(from),
      tgt_lang: this.getLanguageCode(to),
    });
    return result[0].translation_text;
  }
}
```
Pros:
Cons:
Strategy: Use fast Opus-MT for EN↔NL↔FR, fallback to API for others
```typescript
export class HybridLocalProvider extends TranslationProvider {
  private opusModels = new Map<string, any>();
  private fallbackProvider?: TranslationProvider;

  constructor(private config: LocalAIConfig) {
    super();
  }

  async initialize(): Promise<void> {
    // Pre-load the common pairs
    await this.loadOpusModel('en', 'nl');
    await this.loadOpusModel('nl', 'en');
    await this.loadOpusModel('en', 'fr');
    await this.loadOpusModel('fr', 'en');

    // Set up the fallback (could be NLLB or an API provider)
    this.fallbackProvider = new OpenAIProvider(this.config);
  }

  private async loadOpusModel(from: Language, to: Language): Promise<void> {
    const key = `${from}-${to}`;
    const modelName = `Xenova/opus-mt-${from}-${to}`;
    const translator = await pipeline('translation', modelName, {
      device: 'webgpu',
    });
    this.opusModels.set(key, translator);
  }

  async translate(text: string, from: Language, to: Language): Promise<string> {
    const key = `${from}-${to}`;
    const opusModel = this.opusModels.get(key);
    if (opusModel) {
      // Use the fast Opus-MT model
      const result = await opusModel(text);
      return result[0].translation_text;
    }
    // Fall back to the API provider
    return this.fallbackProvider!.translate(text, from, to);
  }
}
```
Pros:
Cons:
Strategy: Offload translation to Web Worker, batch requests
```typescript
// translation/workers/translation-worker.ts
import { pipeline } from '@xenova/transformers';

let translator: any = null;

self.addEventListener('message', async (event: MessageEvent) => {
  const { type, payload } = event.data;

  if (type === 'initialize') {
    translator = await pipeline(
      'translation',
      'Xenova/nllb-200-distilled-600M',
      { device: payload.device }
    );
    self.postMessage({ type: 'ready' });
    return;
  }

  if (type === 'translate') {
    const { texts, from, to } = payload;
    // Batch translation
    const results = await Promise.all(
      texts.map((text: string) => translator(text, {
        src_lang: from,
        tgt_lang: to,
      }))
    );
    const translations = results.map((r: any) => r[0].translation_text);
    self.postMessage({
      type: 'translations',
      payload: translations,
    });
  }
});
```
```typescript
// translation/providers/worker-local-ai.ts
export class WorkerLocalAIProvider extends TranslationProvider {
  private worker: Worker;
  private ready = false;

  constructor(config: LocalAIConfig) {
    super();
    this.worker = new Worker(
      new URL('../workers/translation-worker.ts', import.meta.url),
      { type: 'module' }
    );
    this.worker.addEventListener('message', (event) => {
      if (event.data.type === 'ready') {
        this.ready = true;
      }
    });
    this.worker.postMessage({
      type: 'initialize',
      payload: { device: 'webgpu' },
    });
  }

  async translateBatch(texts: string[], from: Language, to: Language): Promise<string[]> {
    await this.waitForReady();
    return new Promise((resolve) => {
      const handler = (event: MessageEvent) => {
        if (event.data.type === 'translations') {
          this.worker.removeEventListener('message', handler);
          resolve(event.data.payload);
        }
      };
      this.worker.addEventListener('message', handler);
      this.worker.postMessage({
        type: 'translate',
        payload: {
          texts,
          from: this.getLanguageCode(from),
          to: this.getLanguageCode(to),
        },
      });
    });
  }

  private async waitForReady(): Promise<void> {
    while (!this.ready) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }
  }
}
```
Pros:
Cons:
Implement a tiered translation system with automatic fallbacks:
```
Tier 1: Opus-MT (WebGPU)   → Fast, local, common pairs
  ↓ (if pair not available)
Tier 2: NLLB-200 (WebGPU)  → Slower, local, all languages
  ↓ (if WebGPU not available)
Tier 3: NLLB-200 (WASM)    → Slow, local, fallback
  ↓ (if user prefers cloud)
Tier 4: OpenAI/Google API  → Fast, high-quality, requires API key
```
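The tier selection above can be sketched as a pure function. This is a minimal illustration, assuming the four Opus-MT pairs listed earlier and a user preference flag for cloud translation; the names (`selectTier`, `preferCloud`) are illustrative:

```typescript
type Tier = 'opus-mt-webgpu' | 'nllb-webgpu' | 'nllb-wasm' | 'cloud-api';

// Pairs with a dedicated Opus-MT model (assumption: the four pairs above).
const OPUS_PAIRS = new Set(['en-nl', 'nl-en', 'en-fr', 'fr-en']);

function selectTier(
  from: string,
  to: string,
  hasWebGPU: boolean,
  preferCloud: boolean
): Tier {
  if (preferCloud) return 'cloud-api';                            // Tier 4
  const pair = `${from}-${to}`;
  if (hasWebGPU && OPUS_PAIRS.has(pair)) return 'opus-mt-webgpu'; // Tier 1
  if (hasWebGPU) return 'nllb-webgpu';                            // Tier 2
  return 'nllb-wasm';                                             // Tier 3
}
```

For example, EN→NL on a WebGPU-capable browser lands in Tier 1, while EN→DE on the same browser drops to Tier 2 because no pair-specific model exists.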
Phase 1: NLLB-200 with WASM (baseline)
Phase 2: Add WebGPU support
Phase 3: Add Opus-MT for common pairs
Phase 4: Add Web Worker
```typescript
interface LocalTranslationConfig {
  // Auto-detect the best option
  mode: 'auto' | 'opus-mt' | 'nllb-200' | 'hybrid';

  // Backend preference
  backend: 'auto' | 'webgpu' | 'wasm';

  // Model variant
  model: 'fast' | 'balanced' | 'quality';
  // fast     = Opus-MT (300MB)
  // balanced = NLLB-200-600M (600MB)
  // quality  = NLLB-200-1.3B (1.3GB)

  // Download behavior
  downloadOnLoad: boolean; // true = immediate, false = on-demand

  // Performance
  useWebWorker: boolean;
  maxBatchSize: number;
}

const DEFAULT_CONFIG: LocalTranslationConfig = {
  mode: 'hybrid',        // Opus-MT + NLLB fallback
  backend: 'auto',       // WebGPU if available
  model: 'balanced',     // NLLB-200-600M
  downloadOnLoad: false, // Don't block initial load
  useWebWorker: true,    // Non-blocking
  maxBatchSize: 10,      // Translate 10 sentences at once
};
```
```typescript
// Don't initialize until the user enables translation
if (config.translation?.enabled) {
  // Show a "Preparing translation..." message
  await translationModule.initialize();
}
```
```typescript
// Cache translations to avoid re-translating identical text
const cache = new Map<string, string>();

async translate(text: string, from: Language, to: Language): Promise<string> {
  const key = `${from}-${to}-${hash(text)}`; // `hash` = any stable string hash
  if (cache.has(key)) {
    return cache.get(key)!;
  }
  const result = await this.translator(text, { src_lang: from, tgt_lang: to });
  const translation = result[0].translation_text;
  cache.set(key, translation);
  return translation;
}
```
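Any stable, fast string hash works for the cache key; a minimal sketch using FNV-1a (a common non-cryptographic hash, chosen here for illustration only) could be:

```typescript
// FNV-1a: a tiny, deterministic 32-bit string hash. Collisions are
// possible but rare enough for a translation cache keyed per language pair.
function hash(text: string): string {
  let h = 0x811c9dc5; // FNV offset basis (32-bit)
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // multiply by the FNV prime
  }
  return (h >>> 0).toString(16); // unsigned hex string
}
```

The same input always produces the same key, so repeated translations of an unchanged sentence hit the cache.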
```typescript
// Translate multiple sentences in one call
const sentences = ['Hello', 'How are you?', 'Goodbye'];
const results = await Promise.all(
  sentences.map(s => translator(s, { src_lang: 'eng_Latn', tgt_lang: 'nld_Latn' }))
);
```
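To respect the `maxBatchSize` setting from the config, a document's sentences can be chunked before dispatch. A sketch, where `translateChunk` is a hypothetical callback (e.g. a worker round-trip) rather than a real API:

```typescript
// Split sentences into chunks of at most maxBatchSize and translate the
// chunks sequentially, preserving order in the result array.
async function translateInBatches(
  sentences: string[],
  maxBatchSize: number,
  translateChunk: (chunk: string[]) => Promise<string[]>
): Promise<string[]> {
  const out: string[] = [];
  for (let i = 0; i < sentences.length; i += maxBatchSize) {
    const chunk = sentences.slice(i, i + maxBatchSize);
    out.push(...await translateChunk(chunk)); // one worker message per chunk
  }
  return out;
}
```

Sequential chunks keep each worker message small and allow progress updates between chunks; chunks could also run concurrently if memory permits.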
```typescript
// Translate visible sentences first
async translateDocument() {
  const visibleSentences = this.getVisibleSentences();
  const invisibleSentences = this.getInvisibleSentences();

  // Translate visible sentences immediately
  await this.translateBatch(visibleSentences);
  this.render();

  // Translate off-screen sentences in the background
  await this.translateBatch(invisibleSentences);
  this.render();
}
```
```typescript
// Use quantized models for faster inference
const translator = await pipeline(
  'translation',
  'Xenova/nllb-200-distilled-600M',
  {
    device: 'webgpu',
    dtype: 'q8', // 8-bit quantization: smaller and faster, slight quality loss
  }
);
```
Setup: EN → NL, Modern browser (Chrome), WebGPU
| Model | Total Time | User Experience |
|---|---|---|
| NLLB-200 (WebGPU) | 50-75 seconds | Good (progressive) |
| NLLB-200 (WASM) | 500-750 seconds | Poor (too slow) |
| OpenAI GPT-4 (API) | 10-15 seconds | Excellent |
Setup: FR → EN, MacBook Pro M1, WebGPU
| Model | Total Time | User Experience |
|---|---|---|
| Opus-MT (WebGPU) | 2-3 seconds | Excellent |
| NLLB-200 (WebGPU) | 5-8 seconds | Good |
| NLLB-200 (WASM) | 50-75 seconds | Fair |
Setup: User types sentence, auto-translate to target
| Model | Delay | User Experience |
|---|---|---|
| Opus-MT (WebGPU) | <0.5s | Feels instant |
| NLLB-200 (WebGPU) | 1-2s | Acceptable |
| NLLB-200 (WASM) | 5-8s | Frustrating |
| OpenAI API | 0.5-1s | Excellent |
Recommended: Hybrid Opus-MT + NLLB-200 (WebGPU preferred)
Rationale:
First-time user:
1. User clicks "Enable Translation"
2. Show: "Downloading translation model... (600MB)"
3. Progress bar: 0% → 100%
4. Show: "Preparing translator... (5s)"
5. Show: "Ready to translate!"
6. Translation now available instantly (cached)
Returning user:
1. User clicks "Enable Translation"
2. Show: "Loading translator... (1s)"
3. Show: "Ready to translate!"
4. Translation available instantly
During translation:
1. User clicks "Translate Document"
2. Show progress: "Translating... 15/150 sentences"
3. Progressive rendering (show sentences as translated)
4. Completion: "Translation complete! (45 seconds)"
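The status strings in the flows above can come from a single formatter, so the UI stays consistent across the download, loading, and translation phases. A minimal sketch (the function name is illustrative):

```typescript
// Format the in-progress vs. completion messages shown during
// document translation, matching the flow described above.
function progressMessage(done: number, total: number): string {
  return done < total
    ? `Translating... ${done}/${total} sentences`
    : 'Translation complete!';
}
```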
End of Local Translation Options Analysis