🎙 TTS Pruebak

tts-pruebak.musikak.com

🎤 Comparison: Voice Cloning

Voice cloned from Lazkao Txiki's audio using 3 different models · Same text for each model

🔷 ORIGINAL REFERENCE
Lazkao Txiki (original) Reference
0.0s 5168 KB Basque · Bertsos
🔶 F5-TTS (flow-matching, ~24x RTF)
F5-TTS · General text 19.3s
19.3s 904 KB 24 kHz
Text

"Welcome to this test. This is a voice cloned from Lazkao Txiki using F5 TTS. The quick brown fox jumps over the lazy dog."

F5-TTS · Financial text 25.6s
25.6s 1198 KB 24 kHz
Text

"Numbers are handled naturally here. Five point two million dollars and forty two percent. This is how this voice sounds when reading financial data in English."

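The sizes and durations listed for the two F5-TTS clips agree with what uncompressed audio at 24 kHz mono would produce. A minimal sketch of that check follows. The page does not state the file format, so 16-bit PCM WAV, a 44-byte header, and 1 KB = 1024 bytes are all assumptions, and `estimate_duration_s` is a hypothetical helper name.

```python
def estimate_duration_s(size_kb, sample_rate_hz, channels,
                        bits_per_sample=16, header_bytes=44):
    """Estimate the playback time of an uncompressed PCM WAV from its file size.

    Assumes 1 KB = 1024 bytes and a plain 44-byte WAV header (both assumptions).
    """
    payload_bytes = size_kb * 1024 - header_bytes
    bytes_per_second = sample_rate_hz * channels * (bits_per_sample // 8)
    return payload_bytes / bytes_per_second


# Sizes and sample rate as listed for the two F5-TTS clips.
f5_general = estimate_duration_s(904, 24_000, channels=1)
f5_financial = estimate_duration_s(1198, 24_000, channels=1)
print(f"{f5_general:.1f} s, {f5_financial:.1f} s")  # prints: 19.3 s, 25.6 s
```

Both estimates match the durations shown next to the clips, which suggests (but does not prove) that the files are 16-bit PCM. The same function could give a rough duration for any clip whose size, sample rate, and channel count are known.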
🔶 MOSS-TTS-Nano (Audio Tokenizer + LLM, 0.1B)
Nano · General text 100M
0.0s 2910 KB 48 kHz stereo
Text

"Welcome to this test. This is a voice cloned from Lazkao Txiki using MOSS TTS Nano. The quick brown fox jumps over the lazy dog."

Nano · Financial text 100M
0.0s 3300 KB 48 kHz stereo
Text

"Numbers are handled naturally here. Five point two million dollars and forty two percent. This is how this voice sounds when reading financial data in English."

🔶 MOSS-TTS-Realtime (1.7B/2.33B, LLM + Codec)
Realtime · General text 2.33B
0.0s 1530 KB 48 kHz stereo
Text

"Welcome to this test. This is a voice cloned from Lazkao Txiki using MOSS TTS Realtime model. The quick brown fox jumps over the lazy dog."

Realtime · Financial text 2.33B
0.0s 1530 KB 48 kHz stereo
Text

"Numbers are handled naturally here. Five point two million dollars and forty two percent. This is how this voice sounds when reading financial data in English."

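The MOSS entries list 0.0s where the F5-TTS entries list an actual duration. If the clips are WAV files (an assumption, since the page does not name the format), the duration, sample rate, and channel count can be read directly from the file header with Python's standard `wave` module. The sketch below writes its own one-second silent clip so that it runs without any downloads; `describe_wav` and the file name `demo_clip.wav` are hypothetical.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (duration_s, sample_rate_hz, channels) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration_s = wav.getnframes() / rate
    return duration_s, rate, channels


# Stand-in for a downloaded clip: one second of 48 kHz stereo 16-bit silence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_clip.wav")  # hypothetical name
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
    wav.setframerate(48_000)
    wav.writeframes(b"\x00" * (48_000 * 2 * 2))  # frames * channels * bytes

print(describe_wav(demo_path))  # prints: (1.0, 48000, 2)
```

Pointing `describe_wav` at a real clip would confirm or correct the sample rate and channel layout shown on the page.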
📊 Model comparison

| Model | Params | Sample rate | Channels | RTF (CPU) | Cloning | Languages |
|---|---|---|---|---|---|---|
| F5-TTS | ~335M | 24 kHz | Mono | ~24x | ✅ Zero-shot | Any |
| MOSS-TTS-Nano | 100M | 48 kHz | Stereo | ~8x | ✅ Zero-shot | 20 |
| MOSS-TTS-Realtime | 2.33B | 48 kHz | Stereo | ~15x | ✅ Zero-shot | 20 |
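
The RTF figures can be turned into rough wall-clock estimates. The sketch below assumes RTF means synthesis time divided by audio duration, so ~24x would read as about 24 seconds of CPU time per second of generated audio. The page does not define the term, and some projects report the inverse ratio, so this is one interpretation only; `synthesis_time_s` is a hypothetical helper.

```python
def synthesis_time_s(audio_duration_s, rtf):
    """Wall-clock synthesis time, assuming RTF = synthesis time / audio duration."""
    return audio_duration_s * rtf


# F5-TTS general clip: 19.3 s of audio at ~24x RTF on CPU (figures from the page).
print(f"{synthesis_time_s(19.3, 24):.0f} s")  # prints: 463 s
```

Under that reading, the 19.3 s F5-TTS clip would take close to eight minutes to synthesize on CPU, while a clip of the same length at ~8x would take roughly two and a half minutes.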