Azure OpenAI's Realtime API supports WebRTC, but there was no official SDK for it. Developers had to implement SDP negotiation, ephemeral token exchange, peer connections, audio stream management, and function calling loops by hand, and each of these was a multi-day task.
The missing npm package for Azure OpenAI's Realtime API. Handles ephemeral tokens, SDP negotiation, WebRTC data channels, audio streams, and function calling — so you can build voice AI in minutes, not days.
Built azure-realtime-webrtc, a TypeScript SDK with zero runtime dependencies. The low-level RealtimeClient handles both WebRTC and WebSocket connections and exposes 32+ typed events. The high-level SDK adds VoiceAssistant, TextChat, and ToolAgent. A server module provides Express middleware so API keys never reach the browser.
Your server requests an ephemeral token from Azure, so the API key stays server-side. createRealtimeMiddleware() sets this up in one call.
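A minimal sketch of the server-side token request, assuming the Azure preview sessions endpoint (`/openai/realtimeapi/sessions`), an `api-key` header, and a response containing `client_secret.value` and `client_secret.expires_at`. Check these against the current Azure documentation, since the endpoint and api-version have changed between previews. `fetchEphemeralToken`, `TokenServerConfig`, and `FetchLike` are hypothetical names, not the package's API; the fetch function is injected so the sketch runs without network access.

```typescript
// Minimal fetch-like signature so the sketch compiles without DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface TokenServerConfig {
  resource: string; // hypothetical: Azure OpenAI resource name
  deployment: string; // hypothetical: realtime model deployment name
  apiVersion: string; // assumption: a preview api-version exposing /realtimeapi/sessions
  apiKey: string; // long-lived secret; used only in this server-side request
}

interface EphemeralToken {
  value: string; // short-lived secret that is safe to hand to the browser
  expiresAt: number; // Unix time in seconds
}

async function fetchEphemeralToken(
  cfg: TokenServerConfig,
  fetchImpl: FetchLike,
): Promise<EphemeralToken> {
  // Assumption: preview sessions endpoint; verify the path and api-version in the Azure docs.
  const url =
    `https://${cfg.resource}.openai.azure.com/openai/realtimeapi/sessions` +
    `?api-version=${encodeURIComponent(cfg.apiVersion)}`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "api-key": cfg.apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ model: cfg.deployment }),
  });
  if (!res.ok) throw new Error(`Token request failed with HTTP ${res.status}`);

  // Assumption: the response carries client_secret.value and client_secret.expires_at.
  const data = (await res.json()) as {
    client_secret?: { value?: string; expires_at?: number };
  };
  const value = data.client_secret?.value;
  const expiresAt = data.client_secret?.expires_at;
  if (!value || typeof expiresAt !== "number") {
    throw new Error("Response did not contain an ephemeral client secret");
  }
  // Only the ephemeral secret leaves this function; the API key never does.
  return { value, expiresAt };
}
```

In a real server you would typically pass the global `fetch` and return only `value` and `expiresAt` to the browser.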
The browser fetches the ephemeral token, creates an RTCPeerConnection, and sends an SDP offer to Azure. Azure returns an SDP answer.
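The offer/answer exchange can be sketched as follows, assuming the WebRTC endpoint accepts the raw SDP offer as an `application/sdp` POST body with the ephemeral token as a Bearer credential and replies with the SDP answer as plain text. `PeerLike` is a hypothetical structural subset of `RTCPeerConnection` (only the three methods used here) so the sketch also runs outside a browser; the endpoint URL is left as a parameter rather than guessed.

```typescript
// Structural subset of RTCPeerConnection used by the sketch (hypothetical names).
interface SessionDescription {
  type: "offer" | "answer";
  sdp?: string;
}
interface PeerLike {
  createOffer(): Promise<SessionDescription>;
  setLocalDescription(desc: SessionDescription): Promise<void>;
  setRemoteDescription(desc: SessionDescription): Promise<void>;
}

type SdpFetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Performs the offer/answer exchange. Assumes the caller already added the
// microphone track and created the data channel, so the offer describes both.
async function negotiateSdp(
  pc: PeerLike,
  rtcUrl: string, // WebRTC endpoint for your region and deployment (not guessed here)
  ephemeralToken: string,
  fetchImpl: SdpFetchLike,
): Promise<void> {
  // 1. Create the local offer and apply it to the peer connection.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  if (!offer.sdp) throw new Error("Peer connection produced an empty SDP offer");

  // 2. POST the raw SDP, authenticated with the short-lived token (never the API key).
  const res = await fetchImpl(rtcUrl, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ephemeralToken}`,
      "Content-Type": "application/sdp",
    },
    body: offer.sdp,
  });
  if (!res.ok) throw new Error(`SDP exchange failed with HTTP ${res.status}`);

  // 3. The answer comes back as plain-text SDP; applying it completes the handshake.
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
}
```

In a browser, add the microphone track and create the data channel before calling `negotiateSdp`, so the offer describes both the audio and the event channel.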
The WebRTC media track carries audio in both directions. The data channel carries all JSON events, which the SDK surfaces as typed TypeScript events.
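One way to turn raw data-channel messages into typed events is a discriminated union keyed on `type`. The sketch models only a small, assumed subset of Realtime server events (names follow the preview event schema and may differ in newer API versions). `TypedEventRouter` is a hypothetical class, not the SDK's RealtimeClient, and `JSON.parse` is cast without validation for brevity.

```typescript
// Assumed subset of Realtime server events, modeled as a discriminated union on `type`.
type ServerEvent =
  | { type: "session.created"; session: { id: string } }
  | { type: "response.audio_transcript.delta"; delta: string }
  | { type: "response.done"; response: { id: string } }
  | { type: "error"; error: { message: string } };

type EventType = ServerEvent["type"];
type EventOf<T extends EventType> = Extract<ServerEvent, { type: T }>;

// Hypothetical router; not the SDK's RealtimeClient.
class TypedEventRouter {
  private readonly handlers = new Map<string, Array<(event: ServerEvent) => void>>();

  // The handler's parameter type is inferred from the event name.
  on<T extends EventType>(type: T, handler: (event: EventOf<T>) => void): void {
    const list = this.handlers.get(type) ?? [];
    // Safe widening: dispatch() only calls this handler with events whose type is T.
    list.push(handler as unknown as (event: ServerEvent) => void);
    this.handlers.set(type, list);
  }

  // Feed raw data-channel messages here, e.g. from dataChannel.onmessage.
  dispatch(raw: string): void {
    // Unchecked cast for brevity; unknown event types simply have no handlers.
    const event = JSON.parse(raw) as ServerEvent;
    for (const handler of this.handlers.get(event.type) ?? []) handler(event);
  }
}
```

Because `on()` is generic over the event name, the compiler infers the event shape inside each handler, so `e.delta` is only accessible on transcript-delta events.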
registerTool() wires a handler to a tool definition. When the model calls that tool, the SDK executes the handler and sends the result back automatically.
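A sketch of what an automatic tool loop can look like, assuming the Realtime event flow in which the server emits `response.function_call_arguments.done` (with `call_id`, `name`, and JSON-encoded `arguments`) and the client answers with a `function_call_output` conversation item followed by `response.create`. `ToolRegistry` is hypothetical and only illustrates the pattern behind registerTool(); `send` stands in for writing JSON to the data channel.

```typescript
// Handlers may return a plain value or a Promise.
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Assumed shape of the server event that signals a finished function call.
interface FunctionCallDone {
  type: "response.function_call_arguments.done";
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded arguments produced by the model
}

// Hypothetical registry illustrating the pattern; not the SDK's registerTool().
class ToolRegistry {
  private readonly tools = new Map<string, ToolHandler>();
  private readonly send: (event: object) => void;

  constructor(send: (event: object) => void) {
    this.send = send; // e.g. (e) => dataChannel.send(JSON.stringify(e))
  }

  register(name: string, handler: ToolHandler): void {
    this.tools.set(name, handler);
  }

  async handle(event: FunctionCallDone): Promise<void> {
    let output: unknown;
    try {
      const handler = this.tools.get(event.name);
      if (!handler) throw new Error(`No handler registered for tool "${event.name}"`);
      output = await handler(JSON.parse(event.arguments) as Record<string, unknown>);
    } catch (err) {
      // Report failures to the model instead of leaving the call unanswered.
      output = { error: err instanceof Error ? err.message : String(err) };
    }
    // Return the result tied to the original call_id, then request a follow-up response.
    this.send({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify(output ?? null),
      },
    });
    this.send({ type: "response.create" });
  }
}
```

Errors are sent back as tool output rather than thrown, so the model always receives an answer for every `call_id` and can explain the failure to the user.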
SDP negotiation with ephemeral tokens is the hardest part — the token has a short TTL and the WebRTC handshake must complete before it expires.
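To make the TTL problem concrete, here is a sketch that refuses to start the handshake on an already-expired token and otherwise races it against the token's remaining lifetime. It assumes the expiry is a Unix timestamp in seconds; the function name `runBeforeTokenExpiry` and the 2-second default margin are hypothetical choices.

```typescript
// Runs `start` only if the token is still valid, and rejects if it has not
// settled before the token expires (minus a safety margin).
// Assumption: expiresAtSec is a Unix timestamp in seconds.
function runBeforeTokenExpiry<T>(
  start: () => Promise<T>,
  expiresAtSec: number,
  nowMs: number = Date.now(),
  marginMs = 2_000, // hypothetical default buffer for network jitter
): Promise<T> {
  const remainingMs = expiresAtSec * 1000 - nowMs - marginMs;
  if (remainingMs <= 0) {
    // Do not even begin the handshake with a token that is about to expire.
    return Promise.reject(new Error("Ephemeral token expired; request a new one"));
  }
  return new Promise<T>((resolve, reject) => {
    // This only rejects the returned promise; it does not cancel the handshake,
    // so a real client would also close the peer connection on timeout.
    const timer = setTimeout(
      () => reject(new Error("Token expired before the WebRTC handshake completed")),
      remainingMs,
    );
    start().then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```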
Zero runtime dependencies is a hard constraint that forces better design: you implement exactly what the use case needs, nothing more.
Audio-synced transcript (buffering text and dripping it at speech pace) makes voice AI feel dramatically more natural.
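The buffering idea can be sketched as a small queue that accepts transcript deltas as fast as they arrive and releases them at a fixed characters-per-second rate on each timer tick. `TranscriptDrip` and its `charsPerSecond` rate are hypothetical; a real implementation might instead pace text against the audio playback position.

```typescript
// Hypothetical pacing buffer: accepts transcript deltas as fast as they arrive
// and releases them at roughly speaking pace, one timer tick at a time.
class TranscriptDrip {
  private pending = ""; // received but not yet shown
  private shown = ""; // already revealed to the user
  private carry = 0; // fractional characters owed from earlier ticks
  private readonly charsPerSecond: number;

  constructor(charsPerSecond: number) {
    this.charsPerSecond = charsPerSecond; // hypothetical tuning knob
  }

  push(delta: string): void {
    this.pending += delta;
  }

  // Call from a timer with the elapsed milliseconds; returns the newly revealed text.
  tick(elapsedMs: number): string {
    this.carry += (elapsedMs * this.charsPerSecond) / 1000;
    const count = Math.min(Math.floor(this.carry), this.pending.length);
    this.carry -= count;
    // Slices by UTF-16 code units, which is acceptable for a sketch.
    const out = this.pending.slice(0, count);
    this.pending = this.pending.slice(count);
    // Don't bank time while idle, or the next delta would appear in one burst.
    if (this.pending.length === 0) this.carry = 0;
    this.shown += out;
    return out;
  }

  // Reveal everything left, e.g. when the response's audio has finished playing.
  flush(): string {
    const out = this.pending;
    this.pending = "";
    this.carry = 0;
    this.shown += out;
    return out;
  }

  get text(): string {
    return this.shown;
  }
}
```

Drive `tick()` from `setInterval` (or `requestAnimationFrame`) with the measured elapsed time, and call `flush()` when the response's audio finishes.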
Framework guides (React, Next.js, Vue, Angular, Vanilla JS) are as important as the API itself.
Full strict types — 32+ server events, 11 client events, all SDK classes
Browser-to-Azure peer connection — audio media + JSON data channel
WebSocket transport as an alternative connection mode for server-side / Node.js usage
Token server middleware — rate limiting, CORS, SDP proxy
Tree-shakeable ESM + CJS build — four entry points
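The rate-limiting part of a token server can be sketched as a fixed-window counter per client. `FixedWindowLimiter` is hypothetical and framework-agnostic; an Express middleware could call `allow(req.ip)` before minting a token and respond with HTTP 429 when it returns false.

```typescript
// Hypothetical fixed-window rate limiter keyed by client (e.g. IP address).
// Entries are never evicted here; a production version would prune stale windows.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit; // max requests per client per window
    this.windowMs = windowMs; // window length in milliseconds
  }

  // Returns true if the request is allowed, false if the client is over its limit.
  allow(clientId: string, nowMs: number = Date.now()): boolean {
    const w = this.windows.get(clientId);
    if (!w || nowMs - w.start >= this.windowMs) {
      // First request, or the previous window has elapsed: start a new window.
      this.windows.set(clientId, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```

A fixed window is the simplest scheme to reason about; its known weakness is that a client can burst up to twice the limit across a window boundary, which a sliding-window or token-bucket limiter would smooth out.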