© 2026 Komal Vardhan Lolugu

2025 · Developer tools · npm · v0.2.1 · Author & Maintainer

azure-realtime-webrtc

The missing npm package for Azure OpenAI's Realtime API. Handles ephemeral tokens, SDP negotiation, WebRTC data channels, audio streams, and function calling — so you can build voice AI in minutes, not days.

0 runtime dependencies (pure TypeScript)
32+ typed server events
3 high-level SDK classes: VoiceAssistant, TextChat, ToolAgent
5 lines to a working voice assistant
§ 01

The Problem

Azure OpenAI's Realtime API supports WebRTC, but there was no official SDK for it. Developers had to implement SDP negotiation, ephemeral token exchange, peer connections, audio stream management, and function calling loops by hand, and each of these was a multi-day task.

§ 02

The Solution

I built azure-realtime-webrtc, a TypeScript SDK with zero runtime dependencies. The low-level RealtimeClient handles the WebRTC and WebSocket transports and exposes 32+ typed events. The high-level SDK adds VoiceAssistant, TextChat, and ToolAgent. A server module provides Express middleware so that API keys never reach the browser.

§ 02b

How it works

01
Token server

Your server requests an ephemeral token from Azure, so the API key stays server-side. createRealtimeMiddleware() sets this up in one call.
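To make the token step concrete, here is a minimal sketch of what a token endpoint has to do. It does not use createRealtimeMiddleware() (its signature is not shown on this page), and every name in it is hypothetical. The Azure route, the `api-key` header, and the `client_secret` response shape are assumptions based on Azure's preview documentation and should be checked against the current docs.

```typescript
// Hypothetical names throughout; this is not the package's API.

interface TokenServerConfig {
  endpoint: string;   // e.g. "https://my-resource.openai.azure.com"
  apiKey: string;     // long-lived key; must stay on the server
  deployment: string; // realtime model deployment name
  apiVersion: string; // assumption: sent as the api-version query parameter
}

interface OutgoingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the server-to-Azure request that mints an ephemeral session token.
// The route is an assumption taken from Azure's preview documentation.
function buildSessionRequest(cfg: TokenServerConfig): OutgoingRequest {
  const base = cfg.endpoint.replace(/\/+$/, ""); // drop trailing slashes
  const query = `api-version=${encodeURIComponent(cfg.apiVersion)}`;
  return {
    url: `${base}/openai/realtimeapi/sessions?${query}`,
    method: "POST",
    headers: { "api-key": cfg.apiKey, "content-type": "application/json" },
    body: JSON.stringify({ model: cfg.deployment }),
  };
}

interface BrowserTokenPayload {
  token: string;     // short-lived secret the browser may hold
  expiresAt: number; // expiry timestamp as reported by Azure
}

// Reduce Azure's response to the two fields the browser needs, so nothing
// else from the server-side exchange is forwarded by accident.
function toBrowserPayload(azureResponse: unknown): BrowserTokenPayload {
  const secret = (
    azureResponse as { client_secret?: { value?: unknown; expires_at?: unknown } } | null
  )?.client_secret;
  const value = secret?.value;
  const expiresAt = secret?.expires_at;
  if (typeof value !== "string" || typeof expiresAt !== "number") {
    throw new Error("unexpected token response shape");
  }
  return { token: value, expiresAt };
}
```

In a route handler, the server would send the built request with `fetch` and respond with `toBrowserPayload(await res.json())`, so only the ephemeral token crosses to the client.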

02
SDP negotiation

The browser fetches the ephemeral token, creates an RTCPeerConnection, and sends its SDP offer to Azure. Azure returns the SDP answer, which completes the handshake.
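The handshake order can be sketched as follows. This is an illustration, not the package's internals: `PeerLike`, `buildSdpRequest`, and `negotiate` are hypothetical names, and the peer connection is typed structurally so the sketch also runs outside a browser. Sending the offer as `application/sdp` with the ephemeral token as a Bearer credential is an assumption drawn from Azure's preview documentation, and the WebRTC URL is left as a parameter.

```typescript
// Hypothetical sketch; names are illustrative and not the package's API.

interface SdpDescription {
  type: string; // "offer" or "answer" in this flow
  sdp?: string;
}

// The subset of RTCPeerConnection this sketch relies on.
interface PeerLike {
  createOffer(): Promise<SdpDescription>;
  setLocalDescription(desc: SdpDescription): Promise<void>;
  setRemoteDescription(desc: SdpDescription): Promise<void>;
}

interface SdpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: the offer is posted as raw SDP, authorized by the ephemeral
// token, with the deployment name passed as a query parameter.
function buildSdpRequest(
  webrtcUrl: string,
  deployment: string,
  token: string,
  offerSdp: string,
): SdpRequest {
  return {
    url: `${webrtcUrl}?model=${encodeURIComponent(deployment)}`,
    method: "POST",
    headers: { authorization: `Bearer ${token}`, "content-type": "application/sdp" },
    body: offerSdp,
  };
}

// Transport is injected so the same flow works with fetch or a test double.
type SendSdp = (request: SdpRequest) => Promise<string>;

async function negotiate(
  pc: PeerLike,
  webrtcUrl: string,
  deployment: string,
  token: string,
  sendSdp: SendSdp,
): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  if (!offer.sdp) throw new Error("offer has no SDP body");
  const answerSdp = await sendSdp(buildSdpRequest(webrtcUrl, deployment, token, offer.sdp));
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });
}
```

In a browser, `sendSdp` could be a thin wrapper around `fetch` that returns `res.text()`. The microphone track and the data channel need to be added to the connection before `createOffer()` is called, otherwise they will not appear in the offer.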

03
Bidirectional streams

WebRTC media tracks carry audio in both directions. The data channel carries every JSON event, which the SDK surfaces as typed TypeScript events.
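One way to turn data channel messages into typed events is a discriminated union plus a small dispatcher, sketched below. `TypedEvents` is a hypothetical class, not the package's RealtimeClient, and only three event shapes are modelled. The event names are assumed from the Realtime API's event vocabulary and stand in for the 32+ events the package types.

```typescript
// Hypothetical sketch; only three of the many server events are modelled.

type ServerEvent =
  | { type: "session.created"; session: { id: string } }
  | { type: "response.audio_transcript.delta"; delta: string }
  | { type: "response.done" };

type EventType = ServerEvent["type"];
type EventOf<T extends EventType> = Extract<ServerEvent, { type: T }>;

class TypedEvents {
  // `any` is confined to internal storage; the public on() signature stays typed.
  private handlers = new Map<string, Array<(event: any) => void>>();

  on<T extends EventType>(type: T, handler: (event: EventOf<T>) => void): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Feed this the `data` string of each data channel message.
  // Returns true when at least one handler received the event.
  // Note: only the `type` field is checked; payload fields are trusted as-is.
  dispatch(raw: string): boolean {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      return false; // not JSON, ignore
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    const type = (parsed as { type?: unknown }).type;
    if (typeof type !== "string") return false;
    const list = this.handlers.get(type);
    if (!list || list.length === 0) return false;
    for (const handler of list) handler(parsed);
    return true;
  }
}

// Usage: the handler's `event` parameter is narrowed to the matching shape,
// so `event.delta` is known to be a string here.
const events = new TypedEvents();
let transcript = "";
events.on("response.audio_transcript.delta", (event) => {
  transcript += event.delta;
});
```

In a browser this would be wired up with something like `channel.onmessage = (m) => events.dispatch(m.data)`.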

04
Function calling loop

registerTool() wires a handler to a tool definition. The SDK executes the handler and sends the result back automatically.
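The loop that registerTool() automates can be sketched as a small registry. `ToolRegistry` and its methods are hypothetical and do not reflect the package's actual signatures. The event shapes (`response.function_call_arguments.done`, a `function_call_output` conversation item, then `response.create`) are assumed from the Realtime API's documented function calling flow.

```typescript
// Hypothetical sketch of a function calling loop; not the package's API.

interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
}

type ToolHandler = (args: Record<string, unknown>) => unknown | Promise<unknown>;

// Server event signalling that the model finished emitting call arguments.
interface FunctionCallDone {
  type: "response.function_call_arguments.done";
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
}

type ClientEvent =
  | {
      type: "conversation.item.create";
      item: { type: "function_call_output"; call_id: string; output: string };
    }
  | { type: "response.create" };

class ToolRegistry {
  private tools = new Map<string, { definition: ToolDefinition; handler: ToolHandler }>();

  register(definition: ToolDefinition, handler: ToolHandler): void {
    this.tools.set(definition.name, { definition, handler });
  }

  // Definitions to advertise to the model when configuring the session.
  definitions(): ToolDefinition[] {
    return Array.from(this.tools.values(), (tool) => tool.definition);
  }

  // Run the matching handler, then send its result and ask for a new response.
  // Failures are reported to the model as an error payload instead of thrown.
  async handle(call: FunctionCallDone, send: (event: ClientEvent) => void): Promise<void> {
    let output: unknown;
    try {
      const tool = this.tools.get(call.name);
      if (!tool) throw new Error(`unknown tool: ${call.name}`);
      output = await tool.handler(JSON.parse(call.arguments));
    } catch (err) {
      output = { error: err instanceof Error ? err.message : String(err) };
    }
    send({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: call.call_id,
        output: JSON.stringify(output ?? null),
      },
    });
    send({ type: "response.create" });
  }
}
```

Reporting handler failures back as a tool output, rather than throwing, lets the model explain the failure to the user. That is a design choice made for this sketch, not a documented behaviour of the package.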

§ 03

What I Learnt

  • 01

    SDP negotiation with ephemeral tokens is the hardest part — the token has a short TTL and the WebRTC handshake must complete before it expires.

  • 02

    Zero runtime dependencies is a hard constraint that forces better design — you implement exactly what the use-case needs, nothing more.

  • 03

    Audio-synced transcript (buffering text and dripping it at speech pace) makes voice AI feel dramatically more natural.

  • 04

    Framework guides (React, Next.js, Vue, Angular, Vanilla JS) are as important as the API itself.
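The audio-synced transcript from point 03 can be sketched as a buffer that releases text against elapsed time. `TranscriptPacer` is a hypothetical class, and pacing by a fixed characters-per-second rate is an assumption made for this sketch. The package may pace against the actual audio playback position instead.

```typescript
// Hypothetical sketch: buffer transcript deltas and release them at a
// steady rate so on-screen text does not race ahead of the spoken audio.

class TranscriptPacer {
  private readonly charsPerSecond: number;
  private pending = ""; // text received but not yet shown
  private budget = 0;   // fractional characters carried between ticks

  constructor(charsPerSecond: number) {
    this.charsPerSecond = charsPerSecond;
  }

  // Called for every transcript delta as it arrives.
  push(delta: string): void {
    this.pending += delta;
  }

  // Called from a timer or animation frame with the time since the last
  // tick. Returns the text that should be appended to the display now.
  tick(elapsedMs: number): string {
    if (this.pending.length === 0) {
      this.budget = 0; // do not bank time while idle
      return "";
    }
    this.budget += (elapsedMs / 1000) * this.charsPerSecond;
    const count = Math.min(Math.floor(this.budget), this.pending.length);
    this.budget -= count;
    const out = this.pending.slice(0, count);
    this.pending = this.pending.slice(count);
    return out;
  }

  // Release everything at once, for example when the response ends
  // or the user interrupts.
  flush(): string {
    const out = this.pending;
    this.pending = "";
    this.budget = 0;
    return out;
  }
}
```

Resetting the budget while the buffer is empty keeps a long pause from causing a burst of text when the next delta arrives.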

§ 04

Technologies Used

TypeScript

Full strict types — 32+ server events, 11 client events, all SDK classes

WebRTC

Browser-to-Azure peer connection — audio media + JSON data channel

WebSocket

Alternative connection mode for server-side / Node.js usage

Express (optional peer dep)

Token server middleware — rate limiting, CORS, SDP proxy

Rollup

Tree-shakeable ESM + CJS build — four entry points
