An open-source AI voice agent that integrates with Asterisk/FreePBX over AudioSocket or RTP
Adds AI voice to legacy Asterisk systems without ripping out existing telephony.
Businesses with legacy phone systems
Retell AI · Vapi · Bland AI
My repo was shared here once before by someone else, so I wanted to follow up on the progress since then.
https://news.ycombinator.com/item?id=46380399
I've been working with Asterisk/FreePBX systems for years. I wanted to add AI voice capabilities to legacy phone systems without paying per-minute SaaS fees or ripping out the entire telephony stack.
So I built AVA, a self-hosted AI voice agent that integrates with traditional phone systems. Most solutions demand an expensive migration to a cloud-only provider; AVA offers a self-hosted path for connecting AI agents to the phone system you already run, while keeping data private and lowering operational costs.
AVA is a Dockerized Python app that sits alongside your Asterisk server. It connects via ARI (Asterisk REST Interface) and routes call audio to AI providers — OpenAI Realtime, Deepgram, Google Live API, ElevenLabs, Telnyx, or fully local models (Vosk + llama.cpp + Piper). You can mix and match STT/LLM/TTS in a modular pipeline, or use a single provider end-to-end.
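To illustrate the mix-and-match idea, here is a minimal sketch of a modular STT/LLM/TTS pipeline. It is not AVA's actual code: the `SpeechToText`, `LanguageModel`, `TextToSpeech`, and `Pipeline` names are hypothetical, and the stub stages stand in for real providers.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical stage interfaces; any provider adapter that has these
# methods can be dropped into the pipeline.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Pipeline:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, caller_audio: bytes) -> bytes:
        """Run one conversational turn: audio in, audio out."""
        text = self.stt.transcribe(caller_audio)
        answer = self.llm.reply(text)
        return self.tts.synthesize(answer)


# Stub stages so the sketch runs without any real provider.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("ascii")


class EchoLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("ascii")


pipeline = Pipeline(stt=FakeSTT(), llm=EchoLLM(), tts=FakeTTS())
```

Swapping a provider then means replacing one constructor argument, for example a local STT adapter in place of a cloud one, without touching the other two stages.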
Two audio transport paths: We support both AudioSocket (low-latency TCP with TLV framing) and ExternalMedia RTP (UDP, better for NAT). A transport orchestrator auto-negotiates sample rates and codecs between what Asterisk sends on the wire and what each AI provider expects — so you can run 8kHz ulaw from Asterisk into a provider that wants 24kHz linear16 without manual config.
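As a rough illustration of the 8kHz ulaw to 24kHz linear16 case, the sketch below decodes G.711 mu-law and upsamples by a factor of 3. It is an assumption-laden example, not the orchestrator's implementation: the function names are hypothetical, and linear interpolation stands in for the filtered resampler a production path would likely use.

```python
import struct
from typing import List


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def upsample_linear(samples: List[int], factor: int) -> List[int]:
    """Upsample by an integer factor using linear interpolation.

    The last input sample is held, so the output has len(samples) * factor
    entries. A real resampler would also apply a low-pass filter.
    """
    out: List[int] = []
    for i, a in enumerate(samples):
        b = samples[i + 1] if i + 1 < len(samples) else a
        for k in range(factor):
            out.append(a + (b - a) * k // factor)
    return out


def ulaw8k_to_linear16_24k(payload: bytes) -> bytes:
    """Convert 8 kHz mu-law bytes to 24 kHz little-endian 16-bit PCM."""
    pcm_8k = [ulaw_to_linear(b) for b in payload]
    pcm_24k = upsample_linear(pcm_8k, 3)
    return struct.pack(f"<{len(pcm_24k)}h", *pcm_24k)
```

A 20 ms mu-law frame at 8kHz is 160 bytes, so the converted frame is 160 × 3 samples × 2 bytes = 960 bytes.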
Session lifecycle: A typed session store tracks every call from StasisStart through hangup — audio diagnostics, barge-in counts, provider state, conversation turns. Every call is fully observable and debuggable after the fact.
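A typed session store along these lines could be sketched as follows. The field and method names (`CallSession`, `SessionStore`, `on_stasis_start`, `on_hangup`) are hypothetical and chosen only to mirror the lifecycle described above; the real store tracks more than this.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CallSession:
    channel_id: str
    provider: str
    started_at: float = field(default_factory=time.time)
    ended_at: Optional[float] = None
    barge_in_count: int = 0
    turns: List[str] = field(default_factory=list)


class SessionStore:
    """In-memory store keyed by channel ID (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, CallSession] = {}

    def on_stasis_start(self, channel_id: str, provider: str) -> CallSession:
        # Create the record when the call enters the Stasis application.
        session = CallSession(channel_id=channel_id, provider=provider)
        self._sessions[channel_id] = session
        return session

    def get(self, channel_id: str) -> Optional[CallSession]:
        return self._sessions.get(channel_id)

    def on_hangup(self, channel_id: str) -> CallSession:
        # Mark the call as ended but keep the record for later inspection.
        session = self._sessions[channel_id]
        session.ended_at = time.time()
        return session
```

Keeping the record after hangup, rather than deleting it, is what makes a call inspectable after the fact.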
Barge-in and VAD were the hardest problems. We use a dual-mode VAD — WebRTC VAD combined with energy-based RMS detection, scored into a single confidence value (40% WebRTC weight, 40% energy ratio, 20% agreement bonus). Frame smoothing prevents single-frame glitches from triggering false interrupts. When barge-in fires, we kill active playback (both streaming and file-based) via ARI, flush provider audio buffers, release conversation gating tokens, and optionally suppress provider output for a configurable window to prevent pre-barge audio from re-queuing. The system supports three interrupt sources: local VAD, Asterisk's native talk detection events, and provider-side interruption signals.
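The scoring and smoothing can be sketched roughly like this. The 40/40/20 weights come from the description above; everything else is an assumption for illustration: the WebRTC decision is passed in as a boolean, the energy ratio is taken as RMS over a threshold capped at 1.0, the agreement bonus applies when both detectors report speech, and "smoothing" is modeled as requiring several consecutive confident frames.

```python
import math
import struct

WEBRTC_WEIGHT = 0.4
ENERGY_WEIGHT = 0.4
AGREEMENT_BONUS = 0.2


def frame_rms(pcm16: bytes) -> float:
    """RMS of a frame of little-endian signed 16-bit PCM."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def vad_confidence(webrtc_is_speech: bool, rms: float, energy_threshold: float) -> float:
    """Combine both detectors into one confidence value in [0, 1]."""
    # Assumed normalization: ratio of frame energy to a threshold, capped at 1.
    energy_ratio = min(rms / energy_threshold, 1.0) if energy_threshold > 0 else 0.0
    energy_is_speech = energy_ratio >= 1.0
    score = WEBRTC_WEIGHT * (1.0 if webrtc_is_speech else 0.0)
    score += ENERGY_WEIGHT * energy_ratio
    if webrtc_is_speech and energy_is_speech:
        score += AGREEMENT_BONUS
    return score


class BargeInSmoother:
    """Fire only after `min_frames` consecutive frames reach `threshold`."""

    def __init__(self, threshold: float = 0.6, min_frames: int = 3) -> None:
        self.threshold = threshold
        self.min_frames = min_frames
        self._run = 0

    def update(self, confidence: float) -> bool:
        # A single low-confidence frame resets the run, so one-frame
        # glitches never accumulate into an interrupt.
        self._run = self._run + 1 if confidence >= self.threshold else 0
        return self._run >= self.min_frames
```

With 20 ms frames and `min_frames=3`, the caller has to speak for about 60 ms before playback is interrupted; the threshold and frame count here are placeholder values, not AVA's tuned settings.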
The hardest latency challenge was bridging legacy SIP/RTP with modern WebSocket streams. We use a two-container architecture: a lightweight orchestrator for ARI state management and an optional heavier container for local model inference. There are 6 pre-validated golden baseline configs if you just want something working out of the box, plus an Admin UI for visual setup.
Try the live demo at (925)-736-6718: option 5 for Google, 6 for Deepgram, 7 for OpenAI Realtime, 8 for the local hybrid pipeline, and 9 for ElevenLabs.
Code is MIT. I'd love feedback on the transport layer (src/core/transport_orchestrator.py) and the VAD tuning (src/core/vad_manager.py).