GitHub - agentem-ai/izwi-audio: Inference for Hugging Face audio models
Summary
Izwi is a high-performance text-to-speech (TTS) inference engine written in Rust and designed for Qwen3-TTS models on Apple Silicon (M1 and later). It uses MLX to take advantage of unified memory and Metal GPU acceleration. Key features include ultra-low-latency streaming, direct model management through a React-based UI, and OpenAI-compatible REST API endpoints. It supports several Qwen3-TTS models for base speech generation and for custom voice cloning from reference audio, as well as Qwen3-ASR models for speech-to-text transcription. Izwi can be deployed with Docker or installed natively on macOS or Linux, and detailed quick start guides cover both production and development environments.
(Source: GitHub)
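
Because the summary mentions OpenAI-compatible REST endpoints, a client call might look roughly like the sketch below. It only illustrates the request shape used by OpenAI's speech API (a JSON body with model, input, and voice fields). The host and port, the endpoint path as served by Izwi, the model identifier, the voice name, and the output file name are all assumptions that the summary does not confirm, so check the repository's quick start guide for the real values.

```python
import json
import urllib.request

# Everything below is an assumption for illustration, not taken from the Izwi docs.
BASE_URL = "http://localhost:8080"   # hypothetical host and port
SPEECH_PATH = "/v1/audio/speech"     # path used by OpenAI's speech API; assumed for Izwi


def build_speech_request(text, model="qwen3-tts", voice="default", base_url=BASE_URL):
    """Build (but do not send) a POST request in the OpenAI speech-API shape.

    The model and voice defaults are placeholder names.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        base_url.rstrip("/") + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.out", **kwargs):
    """Send the request and write the returned audio bytes to out_path.

    The audio container format depends on the server, so the file
    extension here is deliberately neutral.
    """
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With a server running, calling synthesize("Hello from Izwi") would write the response body to disk. Keeping request construction separate from sending makes the payload easy to inspect without a live server.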