GitHub - agentem-ai/izwi-audio: Inference for Hugging Face audio models

Izwi is a high-performance, Rust-based TTS inference engine optimized for Qwen3-TTS on Apple Silicon using MLX.

Summary

Izwi is a high-performance text-to-speech (TTS) inference engine written in Rust and designed for Qwen3-TTS models on Apple Silicon (M1 and later). It uses MLX to take advantage of unified memory and Metal GPU acceleration. Key features include ultra-low-latency streaming, model management through a React-based UI, and OpenAI-compatible REST API endpoints. It supports several Qwen3-TTS models for base speech generation and for custom voice cloning from reference audio, as well as Qwen3-ASR models for speech-to-text transcription. It can be deployed with Docker or installed natively on macOS or Linux, and the repository provides quick start guides for both production and development environments.
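
As a rough illustration of what "OpenAI-compatible" could mean for a client, the sketch below builds a request in the shape of OpenAI's `/v1/audio/speech` endpoint (a JSON body with `model`, `input`, and `voice`). The summary does not state Izwi's host, port, route, model identifiers, or voice names, so `BASE_URL`, `MODEL`, `VOICE`, and the assumption that Izwi serves this exact path are all hypothetical placeholders to check against the repository's own documentation.

```python
import json
import urllib.request

# Hypothetical values: none of these are stated in the summary above.
BASE_URL = "http://localhost:8080"  # assumed server address
MODEL = "qwen3-tts"                 # assumed model identifier
VOICE = "default"                   # assumed voice name


def build_speech_request(text, base_url=BASE_URL, model=MODEL, voice=VOICE):
    """Build a POST request shaped like OpenAI's /v1/audio/speech call."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.out"):
    """Send the request and write the returned audio bytes to out_path.

    The audio container (WAV, MP3, ...) depends on the server's defaults.
    """
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("Hello from Izwi")` would only succeed against a running server that actually exposes this route. Building the request separately from sending it lets the payload be inspected without a network connection.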

(Source: GitHub)