On-Device Text-to-Speech SDK for Embedded, Offline, and Hybrid Deployments | Switchboard Audio SDK

Switchboard is the audio SDK that runs speech synthesis directly on device, with no internet connection required at runtime. Cloud TTS is available as a bring-your-own-provider option.

Most TTS SDKs are cloud services with a mobile wrapper. When the network drops, the audio drops with it, and that is a failure the SDK itself cannot fix.

What "on-device TTS" actually means

Cloud-first TTS routes every synthesis request through a remote API: your text goes out, encoded audio comes back, and your app plays it. In a tunnel, on an aircraft, in a warehouse with spotty Wi-Fi, or on a device with no data plan, the request fails and the audio stops.

On-device speech synthesis runs a voice model entirely on the local processor, so no request leaves the device. On iOS and macOS, Switchboard uses Core ML. On Android, the model runs on the CPU, which is the most efficient execution path on that platform. Desktop targets support Core ML (on macOS) and CUDA (on NVIDIA GPUs).
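The platform mapping above can be sketched as a small lookup. This is an illustration only: the backend names, the function, and the CPU fallback are assumptions made for the example, not Switchboard SDK identifiers or documented behavior.

```python
from typing import List


def preferred_backends(os_name: str, has_nvidia_gpu: bool = False) -> List[str]:
    """Return hypothetical execution backends in preference order for a target OS.

    The accelerator choices follow the text above; the trailing "cpu" fallback
    on Apple and desktop targets is an assumption made for this sketch.
    """
    name = os_name.lower()
    if name in ("ios", "macos"):
        return ["coreml", "cpu"]
    if name == "android":
        # The text states that CPU execution is the most efficient path on Android.
        return ["cpu"]
    if name in ("windows", "linux"):
        # CUDA is only usable when an NVIDIA GPU is present.
        return ["cuda", "cpu"] if has_nvidia_gpu else ["cpu"]
    raise ValueError(f"unsupported target: {os_name}")
```

Calling `preferred_backends("android")` returns `["cpu"]`, while a Linux desktop with an NVIDIA GPU gets `["cuda", "cpu"]`.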

Hybrid edge-cloud TTS: the Switchboard architecture

Switchboard supports both on-device and cloud TTS, and the decision of which to use is yours. Many deployments use both: on-device for latency-sensitive or offline-capable paths, cloud for voices or quality tiers that require it.

Switchboard provides the on-device execution layer. Audio output streams before synthesis completes regardless of which path handles the request.
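Because the routing decision belongs to the application, it can be expressed as an ordinary function. The sketch below is one possible policy, not a Switchboard API: the `Request` type, the `choose_route` function, and the idea of a set of cloud-only voices are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Route(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Request:
    text: str
    voice: str
    latency_sensitive: bool = False


def choose_route(req: Request, online: bool, cloud_only_voices: FrozenSet[str]) -> Route:
    """Pick a synthesis path for one request (hypothetical policy)."""
    if not online:
        # Without connectivity, only the local engine can produce audio.
        return Route.ON_DEVICE
    if req.latency_sensitive:
        # Latency-sensitive paths stay local even when the network is available.
        return Route.ON_DEVICE
    if req.voice in cloud_only_voices:
        # Voices or quality tiers that exist only in the cloud go to the provider.
        return Route.CLOUD
    return Route.ON_DEVICE
```

The policy defaults to local synthesis and reaches for the cloud only when a request explicitly needs it, which matches the split described above. A real application would also decide what to do when a cloud-only voice is requested offline, for example substituting a local voice.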

Who this is for

The clearest fit is any application that cannot guarantee connectivity at runtime. Field-service, navigation, and enterprise mobile apps have workflows that continue through dead zones, where a cloud-only TTS SDK would simply go silent.

High-volume workloads have a different motivation. Per-character cloud TTS pricing grows with every utterance, so applications that synthesize large volumes of routine output benefit from keeping that work local. Switchboard's MAU-based licensing ties costs to user count rather than output volume.

Switchboard is also a practical foundation for any team that wants to avoid coupling their application to a single cloud vendor. Switching providers is a configuration change.
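One way to keep provider choice in configuration is to hide each cloud service behind the same callable shape and select it by name. The following is a minimal sketch under that assumption: the provider names, the `PROVIDERS` registry, and the `cloud_provider` config key are hypothetical, and the stub functions stand in for real network calls.

```python
from typing import Callable, Dict

# A synthesizer takes text and returns encoded audio bytes.
Synthesizer = Callable[[str], bytes]


def provider_a(text: str) -> bytes:
    # Stub: a real implementation would call provider A's HTTP API here.
    return b"A:" + text.encode("utf-8")


def provider_b(text: str) -> bytes:
    # Stub: a real implementation would call provider B's HTTP API here.
    return b"B:" + text.encode("utf-8")


# Hypothetical registry keyed by the name used in application config.
PROVIDERS: Dict[str, Synthesizer] = {
    "provider_a": provider_a,
    "provider_b": provider_b,
}


def synthesize_cloud(text: str, config: Dict[str, str]) -> bytes:
    """Dispatch to whichever cloud provider the config names."""
    name = config["cloud_provider"]
    if name not in PROVIDERS:
        raise ValueError(f"unknown cloud provider: {name}")
    return PROVIDERS[name](text)
```

With this shape, moving from one provider to another means changing the `cloud_provider` value; the calling code does not change.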

How Switchboard compares

The alternatives most often evaluated alongside Switchboard are Cartesia and ElevenLabs. Both are cloud-only services: synthesis runs on their infrastructure, requires a network connection, and is billed per character.

Switchboard's differentiator is the on-device execution layer, with cloud synthesis integrated through your own provider.

SDK features

Audio output begins streaming before synthesis completes, which keeps perceptible latency low for longer utterances. Cloud integration is bring-your-own: you configure your preferred TTS provider endpoint and define when your application routes to it.
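The latency benefit of streaming comes from handing audio to the player chunk by chunk instead of waiting for the whole utterance. The sketch below shows the pattern with a Python generator. It is not the Switchboard API: `synthesize_stream` and `play` are hypothetical, and the "audio" is placeholder bytes derived from the text rather than real PCM samples.

```python
from typing import Iterable, Iterator, List


def synthesize_stream(text: str, chunk_words: int = 4) -> Iterator[bytes]:
    """Yield one placeholder audio chunk per group of words.

    A real engine would yield PCM buffers; here each chunk is just the
    UTF-8 bytes of the words it covers, so the flow is easy to inspect.
    """
    words = text.split()
    for start in range(0, len(words), chunk_words):
        segment = " ".join(words[start:start + chunk_words])
        yield segment.encode("utf-8")


def play(stream: Iterable[bytes]) -> List[bytes]:
    """Consume chunks as they arrive and return what was played."""
    played: List[bytes] = []
    for chunk in stream:
        # Stand-in for writing the chunk to the audio output buffer.
        played.append(chunk)
    return played
```

Because the generator is lazy, the first chunk is available to `play` as soon as it is produced, so time to first audio depends on the first segment rather than on the length of the full utterance.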

Native SDK packages are available for iOS and Android. React Native is supported, and the EdgeSpeech demo on GitHub provides a working reference implementation. Flutter integration is also documented.

Integration

Working examples across platforms are maintained in the Switchboard public repositories on GitHub. Full documentation and integration guides are available at docs.switchboard.audio.

Pricing

Switchboard uses MAU-based licensing rather than per-character billing. The free tier covers up to 10K MAUs with no credit card required. Growth pricing is $100/month per 10K MAUs. Offline apps, hardware deployments, and other specialized configurations are handled under the Custom plan.
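To see how MAU-based and per-character billing diverge, the arithmetic can be sketched as follows. Only the free-tier limit and the Growth price come from the text above. Everything else is an assumption made for illustration: that Growth is billed in whole 10K blocks, that it applies to all MAUs once the free tier is exceeded, and the per-character rate and usage figures, which are placeholders rather than any vendor's actual price.

```python
import math

FREE_TIER_MAUS = 10_000      # from the pricing text
GROWTH_USD_PER_BLOCK = 100   # from the pricing text: $100/month per 10K MAUs
BLOCK_SIZE = 10_000


def growth_monthly_cost(maus: int) -> int:
    """Estimated monthly cost in USD under the assumptions stated above."""
    if maus < 0:
        raise ValueError("maus must be non-negative")
    if maus <= FREE_TIER_MAUS:
        return 0
    # Assumption: partial blocks are rounded up to a whole 10K block.
    return math.ceil(maus / BLOCK_SIZE) * GROWTH_USD_PER_BLOCK


def per_character_monthly_cost(maus: int, chars_per_user: int,
                               usd_per_million_chars: float) -> float:
    """Monthly cost in USD for a hypothetical per-character billing model."""
    return maus * chars_per_user * usd_per_million_chars / 1_000_000
```

Under these assumptions, 25,000 MAUs cost $300 per month on Growth regardless of output volume. With the same user count, a placeholder usage of 20,000 characters per user, and a placeholder rate of $15 per million characters, the per-character model comes to $7,500 and scales linearly with usage. Consult switchboard.audio/pricing for the actual billing rules.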

Full pricing details, including flexible models for teams that do not measure MAUs, are at switchboard.audio/pricing.

Common questions

Can I use on-device synthesis for some requests and cloud synthesis for others?
Yes. Your application controls routing, so you can direct specific voices or use cases to cloud while keeping routine synthesis local.

Which cloud TTS providers work with Switchboard?
Any cloud TTS service your application can call works alongside Switchboard's on-device layer.

Is on-device synthesis quality comparable to cloud?
Switchboard's on-device voices use neural TTS models rather than concatenative synthesis, and in most listening contexts they are perceptually indistinguishable from cloud output.

What licensing applies to offline and hardware deployments?
These fall under the Custom plan. Contact the team to discuss your deployment.
