Picovoice Alternative: On-Device Speech Recognition with Switchboard SDK
If you're evaluating alternatives to Picovoice for on-device speech-to-text, this page covers how Switchboard SDK compares, what migration looks like on iOS and Android, and what you gain by switching.
Why Developers Look for a Picovoice Alternative
Picovoice solves a real problem: on-device voice processing without a cloud dependency. Porcupine handles wake words, Cheetah and Leopard handle streaming and file-based speech recognition, and Rhino handles intent extraction. On its own terms, the SDK is solid.
What developers typically run into:
Model inflexibility. Picovoice uses proprietary acoustic models. Customization requires going through Picovoice's console, and you're bound to their release cadence.
Pipeline isolation. Picovoice handles speech recognition as a standalone concern. Integrating it alongside noise suppression, voice effects, real-time communication, or music playback requires bespoke glue code that quickly becomes the hardest part of the project.
Pricing at scale. The free tier is limited, and the per-device or per-usage model creates unpredictable costs as your app grows.
No visual tooling. Building and debugging the audio pipeline is a code-only exercise.
Switchboard SDK is an audio pipeline SDK that takes a different approach: rather than providing isolated voice primitives, it gives you a modular, node-based audio graph where STT, VAD, effects, communication, and playback all live in the same composable system.
On-Device STT Parity
Switchboard delivers on-device speech recognition through the Whisper extension combined with the SileroVAD extension. The pipeline runs entirely on-device with GPU acceleration on both iOS and Android; no audio leaves the device.
The architecture works as follows: SileroVAD monitors the microphone input continuously, detects speech start and end boundaries, and triggers the Whisper STT node to transcribe only the segments that contain speech. Because Whisper only runs when SileroVAD detects speech, it avoids the false positives that plague threshold-only approaches.
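The gating pattern described above is easy to reason about in isolation. The sketch below is a minimal, SDK-independent illustration in Kotlin; the `SpeechGate` class, its parameters, and the callback are hypothetical and not part of the Switchboard API. It accumulates frames while the VAD reports speech and hands off a complete segment for transcription once silence has persisted long enough.

```kotlin
// Minimal sketch of VAD-gated transcription, independent of any SDK.
// A real pipeline would feed per-frame Silero VAD decisions; here
// `isSpeech` stands in for that decision.
class SpeechGate(
    private val silenceFramesToEnd: Int,               // frames of silence that close a segment
    private val onSegment: (List<FloatArray>) -> Unit  // e.g. hand the segment to Whisper
) {
    private val buffer = mutableListOf<FloatArray>()
    private var inSpeech = false
    private var silentFrames = 0

    fun process(frame: FloatArray, isSpeech: Boolean) {
        if (isSpeech) {
            inSpeech = true
            silentFrames = 0
            buffer.add(frame)
        } else if (inSpeech) {
            silentFrames++
            buffer.add(frame)  // keep trailing context for the recognizer
            if (silentFrames >= silenceFramesToEnd) {
                onSegment(buffer.toList())  // transcribe only this speech segment
                buffer.clear()
                inSpeech = false
                silentFrames = 0
            }
        }
        // Frames before any speech are dropped, so the recognizer never sees pure silence.
    }
}
```

With a 30 ms frame size, for example, `silenceFramesToEnd = 25` corresponds to roughly 750 ms of trailing silence before a segment is emitted.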
Two model sizes are supported out of the box: ggml-tiny.en for lower latency and ggml-base.en for higher accuracy. Because Whisper is an open model, you are not locked into a proprietary model ecosystem.
You can review working implementations here:
Audio Transcription App for Android: real-time transcription with configurable VAD thresholds and silence duration, Kotlin/Compose
Voice Control App for iOS: SwiftUI app driven entirely by voice commands using Whisper STT + SileroVAD
The main gap to be aware of: Switchboard does not have a native equivalent to Porcupine's always-on wake word detection. If your use case requires a persistent, ultra-low-power wake word listener running before the main pipeline is active, that is worth evaluating carefully. For apps where voice is user-initiated (push-to-talk, tap-to-speak, or any UI-triggered flow), this gap does not apply.
Migration Guide
Conceptual Shift
With Picovoice, you configure individual recognizer objects (a wake word handle, an STT handle, an intent handle) and wire them together in application code. With Switchboard, you define an audio graph (a JSON configuration describing nodes and the connections between them), and your application code interacts with named nodes through events and actions. The Switchboard Editor lets you construct and validate that graph in a browser before writing any native code.
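To make the shift concrete, here is a hypothetical audio graph configuration for a VAD-gated transcription pipeline. The node type names and property keys below are assumptions for illustration only; consult the Switchboard documentation for the actual schema.

```json
{
  "nodes": [
    { "id": "mic", "type": "Microphone" },
    { "id": "vad", "type": "SileroVAD", "threshold": 0.5 },
    { "id": "stt", "type": "WhisperSTT", "model": "ggml-base.en" }
  ],
  "connections": [
    { "from": "mic", "to": "vad" },
    { "from": "vad", "to": "stt" }
  ]
}
```

The point of the shape, not the exact keys: the microphone, VAD, and STT nodes are declared once, connected declaratively, and your application code then addresses them by `id` rather than constructing and wiring recognizer objects itself.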
iOS Migration
The iOS SDK uses Swift and integrates via Swift Package Manager. The Whisper and SileroVAD extensions are initialized at app startup and loaded into the audio graph before the engine starts.
The Voice Control App for iOS demonstrates the full initialization and event subscription pattern for on-device STT. The repo includes the AudioGraph.json configuration alongside the Swift integration layer, so you can see exactly how the graph drives application logic without the two being entangled.
Key migration steps:
1. Remove Picovoice SDK dependencies and recognizer initialization.
2. Add the Switchboard SDK and the Whisper and SileroVAD extensions.
3. Define your audio graph in JSON (or export it from the Switchboard Editor).
4. Initialize the extensions at app startup and start the engine with your graph configuration.
5. Subscribe to the transcription event on your STT node and route the output into your existing command-matching or intent logic.
If you were using Rhino for intent recognition, step 5 is where you plug in your existing intent layer. Switchboard delivers transcribed text, and your intent logic operates on that text independently.
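If you are replacing Rhino, the intent layer can be as simple as matching transcribed text against your command vocabulary. The sketch below is plain Kotlin with no SDK dependency; the command set and the keyword-matching strategy are illustrative stand-ins for whatever Rhino contexts you are migrating.

```kotlin
// Illustrative intent layer operating on transcribed text.
data class Intent(val name: String, val keywords: List<String>)

class IntentMatcher(private val intents: List<Intent>) {
    fun match(transcript: String): String? {
        // Normalize the transcript into a set of lowercase words.
        val words = transcript.lowercase().split(Regex("\\W+")).toSet()
        // Return the first intent with any keyword present in the transcript.
        return intents.firstOrNull { intent ->
            intent.keywords.any { it in words }
        }?.name
    }
}

val matcher = IntentMatcher(listOf(
    Intent("lights_on",  listOf("on", "enable")),
    Intent("lights_off", listOf("off", "disable"))
))
// matcher.match("turn the lights on") returns "lights_on"
```

Keyword matching is the simplest possible strategy; because Switchboard hands you plain text, you can swap in fuzzy matching, a grammar, or an on-device LLM without touching the audio pipeline.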
Android Migration
The Android SDK targets Kotlin. The Audio Transcription App for Android covers the full integration, including the SwitchboardHandler pattern for keeping SDK interactions cleanly separated from the ViewModel and UI layers.
The Android example also includes real-time VAD configuration controls (adjustable threshold and silence duration), which is useful if you're coming from Picovoice and want to tune sensitivity to match your existing behaviour.
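When tuning sensitivity, it helps to treat it as two independent dials: how confident the VAD must be before a frame counts as speech, and how long silence must last before an utterance is closed. A hedged sketch of the trade-off (these parameter names are illustrative, not the Switchboard API):

```kotlin
// Illustrative VAD tuning parameters; names are hypothetical.
data class VadTuning(
    val speechThreshold: Float = 0.5f,  // probability above which a frame counts as speech
    val minSilenceMs: Int = 700         // trailing silence that ends an utterance
)

// Higher threshold / shorter silence: snappier, but risks splitting or clipping utterances.
val responsive = VadTuning(speechThreshold = 0.6f, minSilenceMs = 400)
// Lower threshold / longer silence: fewer clipped words, at the cost of latency.
val forgiving = VadTuning(speechThreshold = 0.4f, minSilenceMs = 1000)
```

Matching these two values to the behaviour of your existing Picovoice configuration is usually the bulk of the tuning work.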
Key migration steps on Android follow the same pattern as iOS, with Kotlin idioms and the appropriate Android extension packages substituted.
Ready to Migrate?
The Switchboard Audio SDK documentation is the starting point for the full API reference and extension catalogue.
Contact hello@synervoz.com if you're migrating a production integration and want support.