ElevenLabs Integration

Official ElevenLabs API Documentation

ElevenLabs is not just “voice mode.” Through KeyPool you can build production speech products against the official ElevenLabs SDK/API while using a KeyPool team token and a stable service endpoint.

Use this page when building real applications: low-latency text-to-speech, speech-to-text, voice conversion, sound effects, voice discovery, usage accounting, and agent/conversation APIs.

Base URL and Auth

Use your KeyPool team token as the SDK API key and point the official SDK at the ElevenLabs KeyPool base URL:

KEYPOOL_BASE_URL=https://keypool.example.com
ELEVENLABS_BASE_URL=$KEYPOOL_BASE_URL/v1/elevenlabs

Official SDKs append ElevenLabs' own /v1/... paths to the base URL. For example, baseUrl=https://keypool.example.com/v1/elevenlabs plus the SDK's /v1/speech-to-text becomes:

https://keypool.example.com/v1/elevenlabs/v1/speech-to-text

For raw HTTP, both forms below are accepted by KeyPool:

/v1/elevenlabs/v1/speech-to-text   # exact provider path, preferred
/v1/elevenlabs/speech-to-text      # convenience form; KeyPool adds /v1
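
For raw HTTP clients, both accepted forms can be normalized to the preferred exact provider path before a request is sent. The following is a minimal sketch; keypool_elevenlabs_url is a hypothetical helper name and is not part of KeyPool or the ElevenLabs SDK.

```python
def keypool_elevenlabs_url(keypool_base_url: str, provider_path: str) -> str:
    """Build the preferred KeyPool URL for an ElevenLabs provider path.

    Accepts either "/v1/speech-to-text" (exact provider path) or
    "/speech-to-text" (convenience form) and always returns the exact form.
    """
    path = "/" + provider_path.strip("/")
    # Add the provider's /v1 prefix only when the caller left it out.
    if not (path == "/v1" or path.startswith("/v1/")):
        path = "/v1" + path
    return keypool_base_url.rstrip("/") + "/v1/elevenlabs" + path


print(keypool_elevenlabs_url("https://keypool.example.com", "/speech-to-text"))
```

Both input forms resolve to the same URL, so logs and metrics show a single path per endpoint.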

For ElevenLabs SDKs, set apiKey to your KeyPool team token. The SDK will send the expected ElevenLabs-style auth header to KeyPool.

TypeScript

import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

export const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.KEYPOOL_TOKEN!,
  baseUrl: `${process.env.KEYPOOL_BASE_URL}/v1/elevenlabs`,
});

Python

import os
from elevenlabs.client import ElevenLabs

client = ElevenLabs(
    api_key=os.environ["KEYPOOL_TOKEN"],
    base_url=f"{os.environ['KEYPOOL_BASE_URL']}/v1/elevenlabs",
)

Speech-to-Text Availability

Speech-to-text availability can differ from general ElevenLabs account or voice endpoints. If speech-to-text is unavailable through your KeyPool workspace, client code should treat 503 NO_AVAILABLE_KEYS as temporary STT capacity exhaustion rather than a bad KeyPool token.

For repeated 503 responses on STT, retry later or ask your KeyPool administrator whether STT is enabled for your workspace.
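
One way to apply the "retry later" advice is a bounded retry with exponential backoff. This is a sketch under stated assumptions: NoAvailableKeys is a hypothetical exception that your own code raises when it sees 503 NO_AVAILABLE_KEYS, and the attempt count and delays are illustrative rather than recommended values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NoAvailableKeys(Exception):
    """Hypothetical stand-in for however your client surfaces 503 NO_AVAILABLE_KEYS."""


def with_stt_retry(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on NoAvailableKeys, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except NoAvailableKeys:
            if attempt == attempts - 1:
                raise  # still exhausted: surface it so someone can ask the admin
            sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")
```

Mapping the SDK's error type to NoAvailableKeys is left to the caller. Other failures, such as an invalid token, propagate immediately and are not retried.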

Choose the Right ElevenLabs API

| Use case | Endpoint family | SDK resource | Notes |
| --- | --- | --- | --- |
| Fast voice generation | POST /v1/text-to-speech/{voice_id} | client.textToSpeech.convert() / client.text_to_speech.convert() | Returns audio stream. Use output_format for telephony or media pipelines. |
| Low-latency voice | POST /v1/text-to-speech/{voice_id}/stream | client.textToSpeech.stream() | Use optimize_streaming_latency for interactive UX. |
| Word/character alignment | text-to-speech with timestamps | convertWithTimestamps | Useful for karaoke captions, lip sync, subtitles. |
| Speech transcription | POST /v1/speech-to-text | speechToText.convert() | Requires model_id (scribe_v2 or scribe_v1) and file or cloud_storage_url. |
| Voice changer | POST /v1/speech-to-speech/{voice_id} | speechToSpeech.convert() | Preserves timing/emotion from input audio. |
| Sound effects | POST /v1/sound-generation | textToSoundEffects.convert() | Great for games, videos, notification sounds. |
| Voice selection | /v1/voices/search, /v1/voices/{voice_id} | voices.search(), voices.get() | Search by name, labels, category, voice type. |
| Account/quota | /v1/user | user.get() | Inspect subscription character count/limit. |
| Conversational AI | /v1/convai/... | conversationalAi.* | Agent setup, signed URLs, WebRTC tokens, conversation history. |

Text-to-Speech: Production Pattern

Pick a stable voice ID, request a practical output format, and consume the returned stream. For phone/RTC flows, use ulaw_8000, alaw_8000, pcm_16000, or Opus formats. For web playback, use MP3.

TypeScript

import { writeFile } from "node:fs/promises";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient({
  apiKey: process.env.KEYPOOL_TOKEN!,
  baseUrl: `${process.env.KEYPOOL_BASE_URL}/v1/elevenlabs`,
});

async function streamToBuffer(stream: ReadableStream<Uint8Array>) {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return Buffer.concat(chunks);
}

const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "ElevenLabs request through KeyPool.",
  modelId: "eleven_multilingual_v2",
  outputFormat: "mp3_44100_128",
  voiceSettings: {
    stability: 0.45,
    similarityBoost: 0.8,
    style: 0.2,
    useSpeakerBoost: true,
  },
  // Optional: deterministic-ish generation for repeatable tests.
  seed: 1234,
});

await writeFile("speech.mp3", await streamToBuffer(audio));

Python

import os
from pathlib import Path
from elevenlabs.client import ElevenLabs

client = ElevenLabs(
    api_key=os.environ["KEYPOOL_TOKEN"],
    base_url=f"{os.environ['KEYPOOL_BASE_URL']}/v1/elevenlabs",
)

audio_iter = client.text_to_speech.convert(
    "JBFqnCBsd6RMkjVDRZzb",
    text="ElevenLabs request through KeyPool.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
    voice_settings={
        "stability": 0.45,
        "similarity_boost": 0.8,
        "style": 0.2,
        "use_speaker_boost": True,
    },
    seed=1234,
)

Path("speech.mp3").write_bytes(b"".join(audio_iter))
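
The output_format strings used above appear to follow a codec_samplerate[_bitrate] naming pattern (for example mp3_44100_128 or ulaw_8000). Assuming that pattern holds for the formats you use, a small helper can make the choice explicit in configuration. The names describe_output_format and DEFAULT_FORMATS are hypothetical, and the per-channel defaults simply restate the formats mentioned in this section.

```python
from typing import Dict


def describe_output_format(output_format: str) -> Dict[str, object]:
    """Split a string such as "mp3_44100_128" into codec, sample rate, bitrate."""
    codec, *numbers = output_format.split("_")
    info: Dict[str, object] = {"codec": codec, "sample_rate_hz": int(numbers[0])}
    if len(numbers) > 1:
        # Only some codecs carry a bitrate suffix (in kbps).
        info["bitrate_kbps"] = int(numbers[1])
    return info


# Hypothetical per-channel defaults, taken from the formats named in this section.
DEFAULT_FORMATS = {
    "web": "mp3_44100_128",
    "telephony": "ulaw_8000",
    "rtc": "pcm_16000",
}

print(describe_output_format(DEFAULT_FORMATS["web"]))
```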

Low-Latency Streaming TTS

For voice assistants or live UX, stream audio instead of waiting for a full file. The optimize_streaming_latency query option trades quality for latency:

  • 0: default quality
  • 1-3: progressively lower latency
  • 4: lowest latency; can mispronounce numbers/dates because text normalization is reduced

TypeScript

const stream = await client.textToSpeech.stream("JBFqnCBsd6RMkjVDRZzb", {
  text: "This chunk can start playing before the full response is ready.",
  modelId: "eleven_flash_v2_5",
  outputFormat: "mp3_44100_128",
  optimizeStreamingLatency: 2,
});

// Pipe `stream` to your HTTP response, Web Audio pipeline, or a file.
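
On the Python side, a streamed response can be written out chunk by chunk while recording how long the first audio took to arrive, which is the number that matters when tuning optimize_streaming_latency. This sketch assumes the streaming call yields an iterable of bytes chunks; drain_audio is a hypothetical helper, and the fake stream below only stands in for real audio.

```python
import io
import time
from typing import BinaryIO, Iterable, Optional, Tuple


def drain_audio(chunks: Iterable[bytes], sink: BinaryIO) -> Tuple[int, Optional[float]]:
    """Write chunks to `sink` as they arrive.

    Returns (total_bytes, seconds_until_first_chunk); the second value is
    None when the stream produced no audio at all.
    """
    started = time.monotonic()
    first_chunk_after: Optional[float] = None
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty chunks
        if first_chunk_after is None:
            first_chunk_after = time.monotonic() - started
        sink.write(chunk)
        total += len(chunk)
    return total, first_chunk_after


# Stand-in for a streaming response; real chunks would come from the SDK.
fake_stream = iter([b"ID3", b"", b"\x00\x01"])
buffer = io.BytesIO()
total, first_chunk_after = drain_audio(fake_stream, buffer)
print(total)
```

Any writable binary object works as the sink, including an open file or a socket wrapper.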

Speech-to-Text with Scribe

Official API requirements:

  • Endpoint: POST /v1/speech-to-text
  • Multipart form body
  • Required: model_id = scribe_v2 or scribe_v1
  • Required input: exactly one of file or cloud_storage_url
  • Minimum audio length: 100 ms
  • file_format="pcm_s16le_16" can reduce latency when sending raw 16 kHz mono PCM; for encoded audio such as WAV or MP3, leave file_format at its default, other.
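
The requirements above can be checked locally before uploading, which avoids a round trip that would end in an upstream 422. This is a sketch only: check_stt_request and wav_duration_ms are hypothetical helpers, and the duration check reads WAV headers only (other containers would need a different parser).

```python
import io
import wave
from typing import Optional

ALLOWED_MODELS = {"scribe_v2", "scribe_v1"}
MIN_AUDIO_MS = 100


def check_stt_request(
    model_id: str,
    file: Optional[object] = None,
    cloud_storage_url: Optional[str] = None,
) -> None:
    """Raise ValueError if the request breaks the requirements listed above."""
    if model_id not in ALLOWED_MODELS:
        raise ValueError(f"model_id must be one of {sorted(ALLOWED_MODELS)}")
    # Exactly one input source: reject both "neither" and "both".
    if (file is None) == (cloud_storage_url is None):
        raise ValueError("provide exactly one of file or cloud_storage_url")


def wav_duration_ms(wav_bytes: bytes) -> float:
    """Duration of a WAV payload in milliseconds, computed from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return 1000.0 * wav.getnframes() / wav.getframerate()
```

A caller could run check_stt_request first and then compare wav_duration_ms(payload) against MIN_AUDIO_MS before sending the file.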

TypeScript

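// Bun.file is Bun-specific; on Node.js, read the bytes with readFile from "node:fs/promises" instead.
const file = new Blob([await Bun.file("meeting.wav").arrayBuffer()], { type: "audio/wav" });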

const transcript = await client.speechToText.convert({
  file,
  modelId: "scribe_v2",
  languageCode: "en",             // optional; omit for auto-detect
  tagAudioEvents: true,
  diarize: true,
  timestampsGranularity: "word",
});

console.log(transcript.text);

Python

with open("meeting.wav", "rb") as audio:
    transcript = client.speech_to_text.convert(
        file=audio,
        model_id="scribe_v2",
        language_code="en",
        tag_audio_events=True,
        diarize=True,
        timestamps_granularity="word",
    )

print(transcript.text)

For larger files, prefer cloud_storage_url so the client does not upload through your application server.

Voice Discovery and Selection

Do not hardcode demo voices forever. Search by category, labels, language, or voice type, then store the selected voice_id in your product configuration.

TypeScript

const voices = await client.voices.search({
  pageSize: 10,
  voiceType: "default",
  category: "premade",
  search: "narration",
  includeTotalCount: false,
});

for (const voice of voices.voices ?? []) {
  console.log(voice.name, voice.voiceId, voice.labels, voice.previewUrl);
}

Python

voices = client.voices.search(
    page_size=10,
    voice_type="default",
    category="premade",
    search="narration",
    include_total_count=False,
)
for voice in voices.voices or []:
    print(voice.name, voice.voice_id, voice.labels, voice.preview_url)
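
Once a search turns up a suitable voice, its voice_id can be persisted so the product never depends on a voice name. A minimal sketch, assuming a JSON file is an acceptable configuration store; save_voice_choice, load_voice_id, and the "voices" key are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def save_voice_choice(config_path: Path, role: str, voice_id: str) -> None:
    """Record the voice_id chosen for a product role (for example "narrator")."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("voices", {})[role] = voice_id
    config_path.write_text(json.dumps(config, indent=2))


def load_voice_id(config_path: Path, role: str) -> Optional[str]:
    """Return the stored voice_id for a role, or None if nothing was saved."""
    if not config_path.exists():
        return None
    return json.loads(config_path.read_text()).get("voices", {}).get(role)
```

At request time the application reads the ID back with load_voice_id and passes it to the text-to-speech call.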

Voice Changer / Speech-to-Speech

Voice changer keeps timing and emotion from input audio while changing the speaker voice.

TypeScript

// Bun.file is Bun-specific; on Node.js, read the bytes with readFile from "node:fs/promises" instead.
const input = new Blob([await Bun.file("source.wav").arrayBuffer()], { type: "audio/wav" });
const changed = await client.speechToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  audio: input,
  modelId: "eleven_english_sts_v2",
  outputFormat: "mp3_44100_128",
  removeBackgroundNoise: true,
});

Python

with open("source.wav", "rb") as audio:
    changed = client.speech_to_speech.convert(
        "JBFqnCBsd6RMkjVDRZzb",
        audio=audio,
        model_id="eleven_english_sts_v2",
        output_format="mp3_44100_128",
        remove_background_noise=True,
    )

Sound Effects

Sound generation is useful for product UX, games, video editing, and synthetic datasets.

TypeScript

const sfx = await client.textToSoundEffects.convert({
  text: "A short soft confirmation chime, friendly SaaS product UI",
  durationSeconds: 1.2,
  promptInfluence: 0.45,
  outputFormat: "mp3_44100_128",
});

Python

sfx = client.text_to_sound_effects.convert(
    text="A short soft confirmation chime, friendly SaaS product UI",
    duration_seconds=1.2,
    prompt_influence=0.45,
    output_format="mp3_44100_128",
)

Account and Quota Introspection

TypeScript

const user = await client.user.get();
const sub = user.subscription;
console.log({
  tier: sub.tier,
  used: sub.characterCount,
  limit: sub.characterLimit,
  remaining: sub.characterLimit - sub.characterCount,
});

Python

user = client.user.get()
sub = user.subscription
print({
    "tier": sub.tier,
    "used": sub.character_count,
    "limit": sub.character_limit,
    "remaining": sub.character_limit - sub.character_count,
})
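
The remaining-character figure can gate a synthesis request before it is sent. This sketch assumes one quota unit per character of input text, which may not match how a given plan or model is billed; remaining_characters and fits_quota are hypothetical helpers.

```python
def remaining_characters(character_count: int, character_limit: int) -> int:
    """Characters left in the current period, never below zero."""
    return max(character_limit - character_count, 0)


def fits_quota(text: str, character_count: int, character_limit: int) -> bool:
    """True if `text` fits in what is left (assumes 1 character = 1 unit)."""
    return len(text) <= remaining_characters(character_count, character_limit)


print(fits_quota("hello", character_count=95, character_limit=100))
```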

Error Handling

| Status/code | Meaning | Recommended handling |
| --- | --- | --- |
| 401 AUTH_* from KeyPool | Missing/invalid team token | Fix KeyPool token; do not retry with same token. |
| 403 SERVICE_FORBIDDEN | Token policy does not allow ElevenLabs | Ask admin to add ElevenLabs scope. |
| 429 QUOTA_EXCEEDED | KeyPool team-token quota hit | Back off until quota window resets. |
| 503 NO_AVAILABLE_KEYS | The requested ElevenLabs capability is temporarily unavailable through your KeyPool workspace | Retry later or switch feature path. |
| Upstream 422 | Request validation error | Fix payload (model_id, file, voice ID, enum names). |
| Upstream 401 detected_unusual_activity | ElevenLabs rejected the request | Retry later or ask your KeyPool administrator to check service availability. |
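
The handling rules above can be encoded as a single lookup so every call site reacts the same way. This is a sketch under the assumption that your HTTP layer has already extracted the status and an error code string from the response; how the code is carried in the response body is not specified here, and Action and recommended_action are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    FIX_TOKEN = "fix the KeyPool token; do not retry with the same token"
    ASK_ADMIN = "ask an admin to add the ElevenLabs scope"
    BACK_OFF = "back off until the quota window resets"
    RETRY_LATER = "retry later or switch feature path"
    FIX_PAYLOAD = "fix the request payload"
    UNKNOWN = "not covered by the table; log and investigate"


def recommended_action(status: int, code: str = "") -> Action:
    """Map an HTTP status and error code to the handling described above."""
    if status == 401 and code.startswith("AUTH_"):
        return Action.FIX_TOKEN
    if status == 401 and code == "detected_unusual_activity":
        return Action.RETRY_LATER
    if status == 403 and code == "SERVICE_FORBIDDEN":
        return Action.ASK_ADMIN
    if status == 429 and code == "QUOTA_EXCEEDED":
        return Action.BACK_OFF
    if status == 503 and code == "NO_AVAILABLE_KEYS":
        return Action.RETRY_LATER
    if status == 422:
        return Action.FIX_PAYLOAD
    return Action.UNKNOWN
```

Only BACK_OFF and RETRY_LATER are worth retrying automatically; the other actions need a configuration or payload change first.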

Development Checklist

  • Use official ElevenLabs SDKs, not generic AI SDK wrappers, when you need full ElevenLabs feature coverage.
  • Configure baseUrl/base_url to KeyPool and use the KeyPool team token as apiKey.
  • Pick output formats deliberately: ulaw_8000/alaw_8000 for telephony, mp3_44100_128 for web, PCM for low-latency pipelines.
  • For STT, use scribe_v2 first and include language_code when known.
  • Use voice search at setup time; store voice_id, not voice names.
  • Avoid sending secrets in browser code. If using Conversational AI signed URLs/WebRTC tokens, mint them server-side through KeyPool.

Interactive API Reference

For all ElevenLabs endpoints and in-browser testing, see API Reference → ElevenLabs.