ElevenLabs Integration¶
Official ElevenLabs API Documentation
ElevenLabs is not just “voice mode.” Through KeyPool you can build production speech products against the official ElevenLabs SDK/API while using a KeyPool team token and a stable service endpoint.
Use this page when building real applications: low-latency text-to-speech, speech-to-text, voice conversion, sound effects, voice discovery, usage accounting, and agent/conversation APIs.
Base URL and Auth¶
Use your KeyPool team token as the SDK API key and point the official SDK at the ElevenLabs KeyPool base URL:
```bash
KEYPOOL_BASE_URL=https://keypool.example.com
ELEVENLABS_BASE_URL=$KEYPOOL_BASE_URL/v1/elevenlabs
```
Official SDKs append ElevenLabs' own `/v1/...` paths. For example, `baseUrl=https://keypool.example.com/v1/elevenlabs` plus the SDK's `/v1/speech-to-text` becomes:

```text
https://keypool.example.com/v1/elevenlabs/v1/speech-to-text
```
For raw HTTP, both forms below are accepted by KeyPool:
```text
/v1/elevenlabs/v1/speech-to-text   # exact provider path, preferred
/v1/elevenlabs/speech-to-text      # convenience form; KeyPool adds /v1
```
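As an illustration of the convenience rewrite (this is a local sketch, not KeyPool's actual routing code), both forms can be normalized to the same provider path:

```python
def normalize_elevenlabs_path(path: str) -> str:
    """Model KeyPool's convenience rewrite: if the path after the
    /v1/elevenlabs prefix does not already start with /v1, prepend it
    so both accepted forms reach the same provider endpoint."""
    prefix = "/v1/elevenlabs"
    if not path.startswith(prefix):
        raise ValueError(f"not an ElevenLabs route: {path!r}")
    rest = path[len(prefix):]
    if not rest.startswith("/v1/"):
        rest = "/v1" + rest
    return prefix + rest

print(normalize_elevenlabs_path("/v1/elevenlabs/speech-to-text"))
# -> /v1/elevenlabs/v1/speech-to-text
print(normalize_elevenlabs_path("/v1/elevenlabs/v1/speech-to-text"))
# -> /v1/elevenlabs/v1/speech-to-text
```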
For ElevenLabs SDKs, set apiKey to your KeyPool team token. The SDK will send the expected ElevenLabs-style auth header to KeyPool.
```typescript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

export const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.KEYPOOL_TOKEN!,
  baseUrl: `${process.env.KEYPOOL_BASE_URL}/v1/elevenlabs`,
});
```
```python
import os

from elevenlabs.client import ElevenLabs

client = ElevenLabs(
    api_key=os.environ["KEYPOOL_TOKEN"],
    base_url=f"{os.environ['KEYPOOL_BASE_URL']}/v1/elevenlabs",
)
```
Speech-to-Text Availability¶
Speech-to-text availability can differ from that of other ElevenLabs endpoints such as the account or voice APIs. If speech-to-text is unavailable through your KeyPool workspace, client code should treat `503 NO_AVAILABLE_KEYS` as temporary STT capacity exhaustion, not as a bad KeyPool token.
For repeated `503` responses on STT, retry later or ask your KeyPool administrator whether STT is enabled for your workspace.
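A retry loop with exponential backoff fits this guidance. The sketch below assumes your HTTP layer raises a hypothetical `ServiceUnavailable` exception on `503 NO_AVAILABLE_KEYS`; it is not part of any SDK:

```python
import random
import time

class ServiceUnavailable(Exception):
    """Hypothetical exception raised by your HTTP layer on 503 NO_AVAILABLE_KEYS."""

def call_stt_with_backoff(transcribe, max_attempts=5, base_delay=1.0):
    """Retry a zero-argument transcription callable on temporary STT
    capacity exhaustion, using exponential backoff with jitter."""
    for attempt in range(max_attempts):
        try:
            return transcribe()
        except ServiceUnavailable:
            if attempt == max_attempts - 1:
                raise  # still exhausted after all attempts
            time.sleep(base_delay * 2**attempt + random.uniform(0.0, base_delay))
```

`401`/`403` responses should not go through this loop: those indicate a token or policy problem that retrying cannot fix.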
Choose the Right ElevenLabs API¶
| Use case | Endpoint family | SDK resource | Notes |
|---|---|---|---|
| Fast voice generation | `POST /v1/text-to-speech/{voice_id}` | `client.textToSpeech.convert()` / `client.text_to_speech.convert()` | Returns an audio stream. Use `output_format` for telephony or media pipelines. |
| Low-latency voice | `POST /v1/text-to-speech/{voice_id}/stream` | `client.textToSpeech.stream()` | Use `optimize_streaming_latency` for interactive UX. |
| Word/character alignment | text-to-speech with timestamps | `convertWithTimestamps` | Useful for karaoke captions, lip sync, subtitles. |
| Speech transcription | `POST /v1/speech-to-text` | `speechToText.convert()` | Requires `model_id` (`scribe_v2` or `scribe_v1`) and `file` or `cloud_storage_url`. |
| Voice changer | `POST /v1/speech-to-speech/{voice_id}` | `speechToSpeech.convert()` | Preserves timing/emotion from input audio. |
| Sound effects | `POST /v1/sound-generation` | `textToSoundEffects.convert()` | Great for games, videos, notification sounds. |
| Voice selection | `/v1/voices/search`, `/v1/voices/{voice_id}` | `voices.search()`, `voices.get()` | Search by name, labels, category, voice type. |
| Account/quota | `/v1/user` | `user.get()` | Inspect subscription character count/limit. |
| Conversational AI | `/v1/convai/...` | `conversationalAi.*` | Agent setup, signed URLs, WebRTC tokens, conversation history. |
Text-to-Speech: Production Pattern¶
Pick a stable voice ID, request a practical output format, and consume the returned stream. For phone/RTC flows, use ulaw_8000, alaw_8000, pcm_16000, or Opus formats. For web playback, use MP3.
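The format guidance above can be captured in a small lookup. The channel names below are this page's own illustration, not API values; only the `output_format` strings come from the guidance above:

```python
# Illustrative mapping from delivery channel to recommended output_format.
OUTPUT_FORMATS = {
    "telephony_ulaw": "ulaw_8000",    # µ-law for North American telephony
    "telephony_alaw": "alaw_8000",    # A-law for European telephony
    "rtc_pcm": "pcm_16000",           # raw PCM for RTC/media pipelines
    "web": "mp3_44100_128",           # MP3 for browser playback
}

def output_format_for(channel: str) -> str:
    """Return the recommended output_format for a delivery channel."""
    try:
        return OUTPUT_FORMATS[channel]
    except KeyError:
        raise ValueError(f"unknown channel: {channel!r}")
```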
TypeScript¶
```typescript
import { writeFile } from "node:fs/promises";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient({
  apiKey: process.env.KEYPOOL_TOKEN!,
  baseUrl: `${process.env.KEYPOOL_BASE_URL}/v1/elevenlabs`,
});

// Collect a web ReadableStream into a single Buffer.
async function streamToBuffer(stream: ReadableStream<Uint8Array>) {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return Buffer.concat(chunks);
}

const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "ElevenLabs request through KeyPool.",
  modelId: "eleven_multilingual_v2",
  outputFormat: "mp3_44100_128",
  voiceSettings: {
    stability: 0.45,
    similarityBoost: 0.8,
    style: 0.2,
    useSpeakerBoost: true,
  },
  // Optional: deterministic-ish generation for repeatable tests.
  seed: 1234,
});

await writeFile("speech.mp3", await streamToBuffer(audio));
```
Python¶
```python
import os
from pathlib import Path

from elevenlabs.client import ElevenLabs

client = ElevenLabs(
    api_key=os.environ["KEYPOOL_TOKEN"],
    base_url=f"{os.environ['KEYPOOL_BASE_URL']}/v1/elevenlabs",
)

audio_iter = client.text_to_speech.convert(
    "JBFqnCBsd6RMkjVDRZzb",
    text="ElevenLabs request through KeyPool.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
    voice_settings={
        "stability": 0.45,
        "similarity_boost": 0.8,
        "style": 0.2,
        "use_speaker_boost": True,
    },
    seed=1234,
)

Path("speech.mp3").write_bytes(b"".join(audio_iter))
```
Low-Latency Streaming TTS¶
For voice assistants or live UX, stream audio instead of waiting for a full file. The optimize_streaming_latency query option trades quality for latency:
- `0`: default quality
- `1`-`3`: progressively lower latency
- `4`: lowest latency; can mispronounce numbers/dates because text normalization is reduced
```typescript
const stream = await client.textToSpeech.stream("JBFqnCBsd6RMkjVDRZzb", {
  text: "This chunk can start playing before the full response is ready.",
  modelId: "eleven_flash_v2_5",
  outputFormat: "mp3_44100_128",
  optimizeStreamingLatency: 2,
});

// Pipe `stream` to your HTTP response, Web Audio pipeline, or a file.
```
Speech-to-Text with Scribe¶
Official API requirements:
- Endpoint: `POST /v1/speech-to-text` with a multipart form body
- Required: `model_id=scribe_v2` or `scribe_v1`
- Required input: exactly one of `file` or `cloud_storage_url`
- Minimum audio length: 100 ms
- `file_format="pcm_s16le_16"` can reduce latency when sending raw 16 kHz mono PCM; use WAV/MP3/etc. as `other`/default
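These requirements can be checked client-side before uploading audio. The validator below is a local sketch mirroring the list above, not part of the ElevenLabs SDK:

```python
def validate_stt_request(model_id, file=None, cloud_storage_url=None, audio_ms=None):
    """Pre-flight check for a speech-to-text request; returns a list of
    problems, empty when the request looks well-formed."""
    errors = []
    if model_id not in ("scribe_v2", "scribe_v1"):
        errors.append("model_id must be scribe_v2 or scribe_v1")
    # Exactly one input source: XOR of file and cloud_storage_url.
    if (file is None) == (cloud_storage_url is None):
        errors.append("provide exactly one of file or cloud_storage_url")
    if audio_ms is not None and audio_ms < 100:
        errors.append("audio must be at least 100 ms")
    return errors
```

Failing fast locally avoids paying an upstream `422` round trip for malformed payloads.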
TypeScript¶
```typescript
// Bun runtime: Bun.file reads the local audio file.
const file = new Blob([await Bun.file("meeting.wav").arrayBuffer()], { type: "audio/wav" });

const transcript = await client.speechToText.convert({
  file,
  modelId: "scribe_v2",
  languageCode: "en", // optional; omit for auto-detect
  tagAudioEvents: true,
  diarize: true,
  timestampsGranularity: "word",
});

console.log(transcript.text);
```
Python¶
```python
with open("meeting.wav", "rb") as audio:
    transcript = client.speech_to_text.convert(
        file=audio,
        model_id="scribe_v2",
        language_code="en",
        tag_audio_events=True,
        diarize=True,
        timestamps_granularity="word",
    )

print(transcript.text)
```
For larger files, prefer cloud_storage_url so the client does not upload through your application server.
Voice Discovery and Selection¶
Do not hardcode demo voices forever. Search by category, labels, language, or voice type, then store selected voice_id in your product configuration.
```typescript
const voices = await client.voices.search({
  pageSize: 10,
  voiceType: "default",
  category: "premade",
  search: "narration",
  includeTotalCount: false,
});

for (const voice of voices.voices ?? []) {
  console.log(voice.name, voice.voiceId, voice.labels, voice.previewUrl);
}
```
```python
voices = client.voices.search(
    page_size=10,
    voice_type="default",
    category="premade",
    search="narration",
    include_total_count=False,
)

for voice in voices.voices or []:
    print(voice.name, voice.voice_id, voice.labels, voice.preview_url)
```
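Once results are in hand, a selection step can filter by label and persist only the `voice_id`. The filtering logic and sample data below are this page's own sketch; the field names (`name`, `voice_id`, `labels`) mirror the search response:

```python
def pick_voice_id(voices, required_labels):
    """Return the voice_id of the first voice whose labels contain all
    required key/value pairs, or None when nothing matches."""
    for voice in voices:
        labels = voice.get("labels") or {}
        if all(labels.get(k) == v for k, v in required_labels.items()):
            return voice["voice_id"]
    return None

# Hypothetical search results reduced to plain dicts.
catalog = [
    {"name": "Brian", "voice_id": "abc123", "labels": {"use_case": "narration"}},
    {"name": "Lily", "voice_id": "def456", "labels": {"use_case": "conversational"}},
]
print(pick_voice_id(catalog, {"use_case": "narration"}))  # abc123
```

Store the chosen `voice_id` in configuration so redeploys and catalog changes cannot silently swap your product's voice.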
Voice Changer / Speech-to-Speech¶
Voice changer keeps timing and emotion from input audio while changing the speaker voice.
```typescript
const input = new Blob([await Bun.file("source.wav").arrayBuffer()], { type: "audio/wav" });

const changed = await client.speechToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  audio: input,
  modelId: "eleven_english_sts_v2",
  outputFormat: "mp3_44100_128",
  removeBackgroundNoise: true,
});
```
```python
with open("source.wav", "rb") as audio:
    changed = client.speech_to_speech.convert(
        "JBFqnCBsd6RMkjVDRZzb",
        audio=audio,
        model_id="eleven_english_sts_v2",
        output_format="mp3_44100_128",
        remove_background_noise=True,
    )
```
Sound Effects¶
Sound generation is useful for product UX, games, video editing, and synthetic datasets.
```typescript
const sfx = await client.textToSoundEffects.convert({
  text: "A short soft confirmation chime, friendly SaaS product UI",
  durationSeconds: 1.2,
  promptInfluence: 0.45,
  outputFormat: "mp3_44100_128",
});
```
```python
sfx = client.text_to_sound_effects.convert(
    text="A short soft confirmation chime, friendly SaaS product UI",
    duration_seconds=1.2,
    prompt_influence=0.45,
    output_format="mp3_44100_128",
)
```
Account and Quota Introspection¶
```typescript
const user = await client.user.get();
const sub = user.subscription;

console.log({
  tier: sub.tier,
  used: sub.characterCount,
  limit: sub.characterLimit,
  remaining: sub.characterLimit - sub.characterCount,
});
```
```python
user = client.user.get()
sub = user.subscription

print({
    "tier": sub.tier,
    "used": sub.character_count,
    "limit": sub.character_limit,
    "remaining": sub.character_limit - sub.character_count,
})
```
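The same arithmetic can drive a pre-flight guard that flags low quota before a large batch job. The threshold logic below is a local sketch, not an SDK feature:

```python
def quota_headroom(character_count, character_limit, reserve=0.1):
    """Return (remaining, low) where `low` is True when remaining
    characters fall within `reserve` (a fraction) of the limit."""
    remaining = character_limit - character_count
    low = remaining <= character_limit * reserve
    return remaining, low

# 95k of 100k used: only 5k characters left, flagged as low.
print(quota_headroom(95_000, 100_000))  # (5000, True)
```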
Error Handling¶
| Status/code | Meaning | Recommended handling |
|---|---|---|
| `401 AUTH_*` from KeyPool | Missing/invalid team token | Fix the KeyPool token; do not retry with the same token. |
| `403 SERVICE_FORBIDDEN` | Token policy does not allow ElevenLabs | Ask an admin to add the ElevenLabs scope. |
| `429 QUOTA_EXCEEDED` | KeyPool team-token quota hit | Back off until the quota window resets. |
| `503 NO_AVAILABLE_KEYS` | The requested ElevenLabs capability is temporarily unavailable through your KeyPool workspace | Retry later or switch feature path. |
| Upstream `422` | Request validation error | Fix the payload (`model_id`, `file`, voice ID, enum names). |
| Upstream `401 detected_unusual_activity` | ElevenLabs rejected the request | Retry later or ask your KeyPool administrator to check service availability. |
Development Checklist¶
- Use official ElevenLabs SDKs, not generic AI SDK wrappers, when you need full ElevenLabs feature coverage.
- Configure `baseUrl`/`base_url` to KeyPool and use the KeyPool team token as `apiKey`.
- Pick output formats deliberately: `ulaw_8000`/`alaw_8000` for telephony, `mp3_44100_128` for web, PCM for low-latency pipelines.
- For STT, use `scribe_v2` first and include `language_code` when known.
- Use voice search at setup time; store `voice_id`, not voice names.
- Avoid sending secrets in browser code. If using Conversational AI signed URLs/WebRTC tokens, mint them server-side through KeyPool.
Interactive API Reference¶
For all ElevenLabs endpoints and in-browser testing, see API Reference → ElevenLabs.