We dropped Go for Rust in our real-time telephony AI media plane


by bajpailabs · May 21, 2026


AI Analysis


Go's GC pauses break voice AI conversations; Rust's deterministic latency actually solves this.

Strengths

•20 ms PCM frame processing with zero GC jitter is legitimately hard engineering
•Sub-500 ms SLA across the ASR, LLM, and TTS pipeline shows real production constraints
•Whitepaper details actual latency budgets, not just marketing claims about 'real-time'

Weaknesses

•Whitepaper-only submission with no working demo or code repository visible
•Hospitality booking niche may limit broader adoption beyond call centers

Post Description

In building Vivik, an execution-grade telephony AI engine, we faced a brutal constraint: the human conversational loop.

In psychoacoustics, a delay under 250 ms feels instantaneous. At 500 ms, users notice lag. Beyond 800 ms, conversations start feeling strained, and by 1.5 seconds, the illusion of real-time interaction collapses.

That creates an extremely tight latency budget for voice AI:

• Network RTT: 50–200 ms
• LLM inference: 200–800 ms
• TTS synthesis: 100–400 ms
• ASR processing: 100–300 ms

To stay consistently within a 500 ms SLA, the orchestration and media layers themselves must add almost no overhead. Even the low ends of those ranges add up to 450 ms, leaving roughly 50 ms of headroom.
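The budget arithmetic is easy to check. The sketch below totals the listed ranges against a 500 ms SLA, under the simplifying assumption that the stages run strictly one after another; the stage names and helper functions are illustrative, not part of Vivik.

```rust
// Stage latency ranges (min_ms, max_ms) copied from the list in the post.
// Assumption: stages run strictly in sequence, so their latencies add.
const STAGES: [(&str, u32, u32); 4] = [
    ("network_rtt", 50, 200),
    ("llm_inference", 200, 800),
    ("tts_synthesis", 100, 400),
    ("asr_processing", 100, 300),
];

const SLA_MS: u32 = 500;

/// Returns (best_case_ms, worst_case_ms) for the whole pipeline.
fn pipeline_totals() -> (u32, u32) {
    STAGES
        .iter()
        .fold((0, 0), |(lo, hi), &(_, min, max)| (lo + min, hi + max))
}

/// Milliseconds left for orchestration and media processing in the best case
/// (negative if the SLA is already exceeded before any overhead is added).
fn best_case_headroom_ms() -> i64 {
    i64::from(SLA_MS) - i64::from(pipeline_totals().0)
}

fn main() {
    let (best, worst) = pipeline_totals();
    println!("best case: {best} ms, worst case: {worst} ms");
    println!("headroom under a {SLA_MS} ms SLA: {} ms", best_case_headroom_ms());
}
```

Pipelines that overlap stages, for example by starting TTS on the first LLM tokens, do not add up this simply, so treat the totals as a rough envelope rather than a model of any particular system.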

We initially built the entire system in Go. It worked well for concurrency and distributed orchestration, but under production-scale load, we hit an architectural wall: non-deterministic GC tail latency.

The Media Plane processes raw PCM audio in strict 20 ms frames. Even tiny scheduling delays create audible jitter, packet drift, and conversational instability.
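To make the 20 ms cadence concrete, this sketch computes frame sizes and how many frame deadlines a stalled thread misses. The post does not state its sample rates, so 8 kHz narrowband and 16 kHz wideband, mono, 16-bit linear PCM are assumptions, and the function names are hypothetical.

```rust
/// Samples in one mono PCM frame of `frame_ms` milliseconds at `sample_rate_hz`.
fn samples_per_frame(sample_rate_hz: u32, frame_ms: u32) -> u32 {
    sample_rate_hz * frame_ms / 1000
}

/// Bytes in that frame for 16-bit linear PCM (2 bytes per sample).
fn bytes_per_frame(sample_rate_hz: u32, frame_ms: u32) -> u32 {
    samples_per_frame(sample_rate_hz, frame_ms) * 2
}

/// Whole frame deadlines that pass while the processing thread is stalled.
fn frames_missed(stall_ms: u32, frame_ms: u32) -> u32 {
    stall_ms / frame_ms
}

fn main() {
    const FRAME_MS: u32 = 20;
    // Assumed rates: 8 kHz narrowband telephony and 16 kHz wideband.
    for rate in [8_000, 16_000] {
        println!(
            "{rate} Hz: {} samples / {} bytes per {FRAME_MS} ms frame",
            samples_per_frame(rate, FRAME_MS),
            bytes_per_frame(rate, FRAME_MS)
        );
    }
    println!("a 100 ms stall misses {} frame deadlines", frames_missed(100, FRAME_MS));
}
```

At 8 kHz a frame is 160 samples (320 bytes); at 16 kHz it is 320 samples (640 bytes). Every 20 ms of stall is one more frame that has to be buffered, dropped, or concealed.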

Under a 25,000 RPS stress test:

• Go implementation → P99 latency: 1,550 ms
• Rust (Tokio) implementation → P99 latency: 310 ms

The issue wasn’t average latency. It was the tail.
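Why averages hide the problem can be shown with a hypothetical latency distribution (the numbers below are invented for illustration, not taken from the stress test). The sketch computes the mean and nearest-rank percentiles; `percentile` and `mean` are illustrative helpers.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p` percent
/// of all samples are less than or equal to it. `p` is an integer in 1..=100.
fn percentile(samples: &[u32], p: usize) -> u32 {
    assert!(!samples.is_empty() && (1..=100).contains(&p));
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let rank = (p * sorted.len() + 99) / 100; // ceil(p / 100 * n), 1-based
    sorted[rank - 1]
}

fn mean(samples: &[u32]) -> f64 {
    samples.iter().map(|&x| f64::from(x)).sum::<f64>() / samples.len() as f64
}

fn main() {
    // Hypothetical latencies: 98% of requests at 300 ms, 2% hit a 1,500 ms stall.
    let mut latencies = vec![300u32; 980];
    latencies.extend(std::iter::repeat(1_500u32).take(20));
    println!(
        "mean = {:.0} ms, p50 = {} ms, p99 = {} ms",
        mean(&latencies),
        percentile(&latencies, 50),
        percentile(&latencies, 99)
    );
}
```

With only 2% of requests stalling, the mean reads 324 ms and the median 300 ms, while P99 reads 1,500 ms, five times the typical request.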

Even highly optimized GC pauses become catastrophic in real-time telephony. A tiny scheduler interruption under heavy throughput creates queue backpressure that cascades across live audio streams. In practice, a 1.5-second spike means the system goes silent mid-sentence.

We solved this by separating the architecture into two isolated worlds:

1. Control Plane (Go + NATS): handles orchestration, routing, distributed state, and API coordination. Managed GC is acceptable here because this plane never touches live media streams.

2. Media Plane (Rust): handles resampling, low-pass filtering, VAD, and packet-level audio processing with deterministic memory behavior.
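The post does not say what its dual-gate VAD computes. A minimal sketch of one common gating scheme, assuming an RMS energy gate with separate open and close thresholds (hysteresis), shows what O(n) per frame means in practice: one pass over the samples, then a constant-time decision. `HysteresisVad` and its thresholds are hypothetical; this is not Vivik's algorithm.

```rust
/// Hypothetical hysteresis VAD: opens when frame RMS rises above `open_rms`,
/// and closes only when RMS falls below the lower `close_rms`.
pub struct HysteresisVad {
    open_rms: f64,
    close_rms: f64,
    active: bool,
}

impl HysteresisVad {
    pub fn new(open_rms: f64, close_rms: f64) -> Self {
        assert!(close_rms < open_rms, "close threshold must sit below open threshold");
        Self { open_rms, close_rms, active: false }
    }

    /// One pass over the frame to compute RMS (O(n)), then an O(1) state update.
    /// Returns true while the gate considers the stream to be speech.
    pub fn process(&mut self, frame: &[i16]) -> bool {
        if frame.is_empty() {
            return self.active;
        }
        let energy: f64 = frame.iter().map(|&s| f64::from(s) * f64::from(s)).sum();
        let rms = (energy / frame.len() as f64).sqrt();
        if self.active {
            if rms < self.close_rms {
                self.active = false;
            }
        } else if rms > self.open_rms {
            self.active = true;
        }
        self.active
    }
}

fn main() {
    let mut vad = HysteresisVad::new(1000.0, 300.0);
    // Constant-amplitude 160-sample frames: RMS equals the amplitude.
    for amp in [100i16, 1500, 600, 600, 200] {
        println!("amp {amp:>5} -> speech = {}", vad.process(&vec![amp; 160]));
    }
}
```

The gap between the two thresholds keeps a speaker's quieter trailing syllables (the 600-amplitude frames above) from chopping the gate shut, which a single 1,000 threshold would do.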

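Rust’s ownership model removes the need for a garbage collector entirely. Allocation itself still happens at runtime, but the point at which each value is freed is fixed at compile time: it is dropped deterministically when its owner goes out of scope. With no background collector that can pause the process, the Media Plane maintains a flat latency profile even under sustained throughput.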
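A minimal sketch of what deterministic memory behavior can look like in a media-plane stage, assuming a one-pole IIR low-pass filter as the DSP step (the post does not describe its filter design). The output buffer is allocated once in `new`, reused for every frame, and freed at a known point when the stage is dropped; `LowPassStage` and `alpha` are hypothetical names.

```rust
/// Hypothetical per-stream stage: one-pole IIR low-pass
/// y[n] = y[n-1] + alpha * (x[n] - y[n-1]), written into a buffer allocated once.
pub struct LowPassStage {
    alpha: f32,    // smoothing factor in (0, 1]; smaller means a lower cutoff (assumed)
    state: f32,    // y[n-1], carried across frames so frame boundaries are seamless
    out: Vec<i16>, // preallocated output buffer, reused every frame
}

impl LowPassStage {
    pub fn new(alpha: f32, max_frame_len: usize) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0);
        Self { alpha, state: 0.0, out: Vec::with_capacity(max_frame_len) }
    }

    /// Filters one frame. Never reallocates while frame.len() <= max_frame_len.
    pub fn process(&mut self, frame: &[i16]) -> &[i16] {
        self.out.clear(); // sets len to 0 but keeps the allocated capacity
        for &x in frame {
            self.state += self.alpha * (f32::from(x) - self.state);
            self.out.push(self.state.round() as i16);
        }
        &self.out
    }
}

fn main() {
    let mut lp = LowPassStage::new(0.25, 320);
    let ptr_before = lp.out.as_ptr();
    for _ in 0..50 {
        lp.process(&[1000i16; 320]);
    }
    let last = *lp.process(&[1000i16; 320]).last().unwrap();
    // Same heap buffer after 51 frames: no per-frame allocation on the hot path.
    assert_eq!(lp.out.as_ptr(), ptr_before);
    println!("settled output = {last}");
} // `lp` is dropped here, and its buffer is freed at exactly this point
```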

We also eliminated traditional synchronization primitives.

Mutexes in real-time audio systems introduce priority inversion risks that immediately surface as glitches or packet jitter. Instead, the engine relies on fully lock-free communication patterns:

• SPSC ring buffers for PCM transfer between socket and DSP threads
• Michael-Scott queues using atomic CAS operations for multi-producer coordination
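The post does not show its ring buffer. A minimal lock-free SPSC sketch, assuming `i16` samples and a power-of-two capacity, pairs Release stores with Acquire loads on the head and tail counters so neither thread ever takes a lock. Each slot is an `AtomicI16`, which keeps the example free of `unsafe`; production rings usually store plain samples in an `UnsafeCell` buffer instead. `spsc_ring`, `Producer`, and `Consumer` are hypothetical names.

```rust
use std::sync::atomic::{AtomicI16, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Shared ring state. Indices are free-running counters; the slot is `index & mask`.
struct Ring {
    slots: Box<[AtomicI16]>,
    mask: usize,
    head: AtomicUsize, // advanced only by the consumer
    tail: AtomicUsize, // advanced only by the producer
}

pub struct Producer(Arc<Ring>); // not Clone: exactly one producer
pub struct Consumer(Arc<Ring>); // not Clone: exactly one consumer

pub fn spsc_ring(capacity: usize) -> (Producer, Consumer) {
    assert!(capacity.is_power_of_two(), "capacity must be a power of two");
    let slots: Box<[AtomicI16]> = (0..capacity).map(|_| AtomicI16::new(0)).collect();
    let ring = Arc::new(Ring { slots, mask: capacity - 1, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) });
    (Producer(Arc::clone(&ring)), Consumer(ring))
}

impl Producer {
    /// Writes one sample; returns false instead of blocking when the ring is full.
    pub fn push(&mut self, sample: i16) -> bool {
        let r = &self.0;
        let tail = r.tail.load(Ordering::Relaxed); // only this thread writes `tail`
        let head = r.head.load(Ordering::Acquire); // see slots the consumer has freed
        if tail.wrapping_sub(head) == r.slots.len() {
            return false; // full
        }
        r.slots[tail & r.mask].store(sample, Ordering::Relaxed);
        r.tail.store(tail.wrapping_add(1), Ordering::Release); // publish the sample
        true
    }
}

impl Consumer {
    /// Reads one sample, or None when the ring is empty. Never blocks.
    pub fn pop(&mut self) -> Option<i16> {
        let r = &self.0;
        let head = r.head.load(Ordering::Relaxed); // only this thread writes `head`
        let tail = r.tail.load(Ordering::Acquire); // see samples the producer published
        if head == tail {
            return None; // empty
        }
        let sample = r.slots[head & r.mask].load(Ordering::Relaxed);
        r.head.store(head.wrapping_add(1), Ordering::Release); // hand the slot back
        Some(sample)
    }
}

fn main() {
    let (mut tx, mut rx) = spsc_ring(64);
    let socket_thread = thread::spawn(move || {
        for s in 0..1_000i16 {
            // Demo only: spin until there is room. A real audio path would
            // count the overrun and drop instead of waiting.
            while !tx.push(s) {
                std::hint::spin_loop();
            }
        }
    });
    let mut received = Vec::with_capacity(1_000);
    while received.len() < 1_000 {
        match rx.pop() {
            Some(s) => received.push(s),
            None => std::hint::spin_loop(),
        }
    }
    socket_thread.join().unwrap();
    assert!(received.iter().copied().eq(0..1_000i16));
    println!("received {} samples in order", received.len());
}
```

Because `push` and `pop` take `&mut self` and neither half implements `Clone`, the type system enforces the single-producer, single-consumer contract.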

Rust’s SIMD support also let us use AVX-512 and ARM NEON instructions to process multiple audio samples per instruction, significantly increasing call density per CPU core.
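As an illustration of per-instruction sample parallelism, here is a saturating mix of two PCM streams. It uses SSE2 (`_mm_adds_epi16`, 8 lanes of `i16`) rather than the AVX-512 and NEON paths the post mentions, because SSE2 is guaranteed on every x86_64 CPU; other targets fall back to scalar code. `mix_saturating` is a hypothetical function, not Vivik's kernel.

```rust
/// Mixes two PCM streams with saturating 16-bit addition (clips instead of wrapping).
/// x86_64 path: SSE2 saturating add via std::arch, 8 samples per instruction.
#[cfg(target_arch = "x86_64")]
fn mix_saturating(a: &[i16], b: &[i16], out: &mut [i16]) {
    use std::arch::x86_64::{__m128i, _mm_adds_epi16, _mm_loadu_si128, _mm_storeu_si128};
    assert!(a.len() == b.len() && b.len() == out.len());
    let vec_len = a.len() / 8 * 8; // samples covered by full 8-lane chunks
    for i in (0..vec_len).step_by(8) {
        // SAFETY: SSE2 is part of the x86_64 baseline, i + 8 <= len for all three
        // slices, and the loadu/storeu forms accept unaligned pointers.
        unsafe {
            let va = _mm_loadu_si128(a.as_ptr().add(i) as *const __m128i);
            let vb = _mm_loadu_si128(b.as_ptr().add(i) as *const __m128i);
            _mm_storeu_si128(out.as_mut_ptr().add(i) as *mut __m128i, _mm_adds_epi16(va, vb));
        }
    }
    // Scalar tail for the last len % 8 samples.
    for i in vec_len..a.len() {
        out[i] = a[i].saturating_add(b[i]);
    }
}

/// Portable scalar fallback for other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn mix_saturating(a: &[i16], b: &[i16], out: &mut [i16]) {
    assert!(a.len() == b.len() && b.len() == out.len());
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x.saturating_add(y);
    }
}

fn main() {
    let a: Vec<i16> = (0..20).map(|i| i * 1_000).collect();
    let b = vec![20_000i16; 20];
    let mut out = vec![0i16; 20];
    mix_saturating(&a, &b, &mut out);
    println!("{out:?}"); // from index 13 on, the sums clip at i16::MAX
}
```

A 512-bit register holds 32 `i16` lanes, and on AArch64 the analogous NEON intrinsic is `vqaddq_s16`; runtime detection with `is_x86_feature_detected!` is the usual way to select a wider path when the CPU supports it.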

The takeaway: managed runtimes are exceptional for distributed systems and asynchronous I/O. But once your workload crosses into hard real-time media constraints and human perceptual boundaries, averages stop mattering. Tail latency becomes the entire system.

By separating orchestration from deterministic signal processing, we reduced P99 latency from 1,550 ms to a stable 310 ms under load.

Our full engineering breakdowns, including the mathematical foundations behind our O(n) dual-gate VAD signal logic, are detailed in the Vivik whitepaper:

https://vivik.bajpailabs.com/whitepaper

Would love to hear how others are approaching real-time media constraints alongside LLM execution boundaries.