AI Engineer (Speech & Voice Systems)

IYKYKnow.ai

🌎 Remote

Posted on: 3 September, 2025

About the Role

We are hiring an AI Engineer specializing in Speech & Voice Systems to own the design and optimization of speech recognition, text-to-speech, and wake-word detection pipelines for a next-generation consumer AI product. The role focuses on delivering natural, low-latency voice interactions that work reliably on constrained hardware (edge devices) and scale seamlessly with cloud infrastructure.

Responsibilities

  • Develop and optimize speech-to-text (STT) systems (e.g., Whisper, Vosk) for low-latency recognition.
  • Implement and enhance text-to-speech (TTS) systems (Coqui, Piper, VITS), including multi-voice support and style variations.
  • Prototype and refine wake-word detection and integrate noise suppression, VAD, AGC, and audio normalization for robust performance.
  • Apply model optimization techniques (quantization, ONNX, CTranslate2, GGUF/ggml) for offline/CPU-first inference.
  • Design caching, streaming, and batching strategies to meet real-time performance targets (<2s response).
  • Collaborate with backend engineers to expose APIs (/stt, /tts, /wakeword) for integration into apps.
  • Monitor performance via observability dashboards (Prometheus/Grafana, OpenTelemetry).
  • Ensure privacy-first design: local-first processing with optional cloud fallback.

Qualifications

  • 5+ years of professional experience in applied AI, with at least 3 years focused on speech technologies.
  • Proven experience building production-grade STT/TTS systems.
  • Strong knowledge of audio/DSP fundamentals: resampling, denoising, VAD, loudness normalization.
  • Proficiency in Python (PyTorch); experience with FastAPI/Docker for model serving.
  • Familiarity with wake-word frameworks (Porcupine, Snowboy) and streaming audio integration.
  • Track record of delivering low-latency speech systems optimized for edge devices.

Nice to Have

  • Experience with multilingual voice systems.
  • Familiarity with real-time streaming architectures (WebRTC, gRPC).
  • Exposure to IoT/edge deployment.

Tags: ai, ml