ElevenLabs Integration

Official ElevenLabs API Documentation

ElevenLabs is more than a “voice mode.” Through KeyPool you can build production speech products with the official ElevenLabs SDKs and API, authenticating with a KeyPool team token against a stable service endpoint.

Use this page when building real applications: low-latency text-to-speech, speech-to-text, voice conversion, sound effects, voice discovery, usage accounting, and agent/conversation APIs.

Base URL and Auth

Use your KeyPool team token as the SDK API key and point the official SDK at KeyPool's ElevenLabs base URL:

KEYPOOL_BASE_URL=https://keypool.example.com
ELEVENLABS_BASE_URL=$KEYPOOL_BASE_URL/v1/elevenlabs

Official SDKs append ElevenLabs' own /v1/... paths. For example, a baseUrl of https://keypool.example.com/v1/elevenlabs plus the SDK's /v1/speech-to-text path becomes:

https://keypool.example.com/v1/elevenlabs/v1/speech-to-text

For raw HTTP, both forms below are accepted by KeyPool:

/v1/elevenlabs/v1/speech-to-text   # exact provider path, preferred
/v1/elevenlabs/speech-to-text      # convenience form; KeyPool adds /v1
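
As a minimal sketch of the two path forms above, the helper below normalizes either form to the exact provider path. The function name `elevenlabs_url` is hypothetical and not part of KeyPool or the ElevenLabs SDK; it only illustrates the URL layout described here.

```python
def elevenlabs_url(keypool_base_url: str, provider_path: str) -> str:
    """Build the exact-provider-path form of a KeyPool ElevenLabs URL."""
    base = keypool_base_url.rstrip("/")
    path = "/" + provider_path.lstrip("/")
    # Add the provider's /v1 prefix when the caller used the convenience form.
    if not (path == "/v1" or path.startswith("/v1/")):
        path = "/v1" + path
    return f"{base}/v1/elevenlabs{path}"


# Both calls print https://keypool.example.com/v1/elevenlabs/v1/speech-to-text
print(elevenlabs_url("https://keypool.example.com", "/v1/speech-to-text"))
print(elevenlabs_url("https://keypool.example.com/", "speech-to-text"))
```

Always emitting the exact provider path keeps raw HTTP calls aligned with what the official SDKs send.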

For ElevenLabs SDKs, set apiKey (api_key in Python) to your KeyPool team token. The SDK then sends the token in its usual ElevenLabs auth header (xi-api-key), which KeyPool accepts.

import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

export const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.KEYPOOL_TOKEN!,
  baseUrl: `${process.env.KEYPOOL_BASE_URL}/v1/elevenlabs`,
});

import os
from elevenlabs.client import ElevenLabs

client = ElevenLabs(
    api_key=os.environ["KEYPOOL_TOKEN"],
    base_url=f"{os.environ['KEYPOOL_BASE_URL']}/v1/elevenlabs",
)

Speech-to-Text Availability

Speech-to-text availability can differ from general ElevenLabs account or voice endpoints. If speech-to-text is unavailable through your KeyPool workspace, client code should treat 503 NO_AVAILABLE_KEYS as temporary STT capacity exhaustion rather than a bad KeyPool token.

For repeated 503 responses on STT, retry later or ask your KeyPool administrator whether STT is enabled for your workspace.
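
One way to treat 503 as temporary capacity exhaustion is a bounded retry with exponential backoff. The sketch below is not a KeyPool or ElevenLabs API: `call_with_capacity_retry` is a hypothetical helper, and it assumes the raised exception exposes a `status_code` attribute (check how your SDK version reports HTTP errors before relying on this).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_capacity_retry(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` while it fails with HTTP 503, backing off exponentially."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Assumption: the exception carries the HTTP status as `status_code`.
            status = getattr(exc, "status_code", None)
            if status != 503 or attempt == max_attempts:
                raise  # not a capacity error, or out of attempts
            # Wait 1x, 2x, 4x ... the base delay, plus up to 25% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise RuntimeError("unreachable")
```

Usage would look like `call_with_capacity_retry(lambda: client.speech_to_text.convert(...))`. Errors other than 503, such as a 401 for a bad token, are re-raised immediately and never retried.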

Choose the Right ElevenLabs API

Use case	Endpoint family	SDK resource	Notes
Fast voice generation	`POST /v1/text-to-speech/{voice_id}`	`client.textToSpeech.convert()` / `client.text_to_speech.convert()`	Returns audio stream. Use `output_format` for telephony or media pipelines.
Low-latency voice	`POST /v1/text-to-speech/{voice_id}/stream`	`client.textToSpeech.stream()`	Use `optimize_streaming_latency` for interactive UX.
Word/character alignment	text-to-speech with timestamps	`convertWithTimestamps`	Useful for karaoke captions, lip sync, subtitles.
Speech transcription	`POST /v1/speech-to-text`	`speechToText.convert()`	Requires `model_id` (`scribe_v2` or `scribe_v1`) and `file` or `cloud_storage_url`.
Voice changer	`POST /v1/speech-to-speech/{voice_id}`	`speechToSpeech.convert()`	Preserves timing/emotion from input audio.
Sound effects	`POST /v1/sound-generation`	`textToSoundEffects.convert()`	Great for games, videos, notification sounds.
Voice selection	`/v1/voices/search`, `/v1/voices/{voice_id}`	`voices.search()`, `voices.get()`	Search by name, labels, category, voice type.
Account/quota	`/v1/user`	`user.get()`	Inspect subscription character count/limit.
Conversational AI	`/v1/convai/...`	`conversationalAi.*`	Agent setup, signed URLs, WebRTC tokens, conversation history.

Text-to-Speech: Production Pattern

Pick a stable voice ID, request a practical output format, and consume the returned stream. For phone/RTC flows, use ulaw_8000, alaw_8000, pcm_16000, or Opus formats. For web playback, use MP3.
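
The format guidance above can be captured in a small lookup so the choice lives in one place. This is only a sketch: the target names (`telephony`, `web`, `pipeline`) and the `pick_output_format` helper are hypothetical, while the format strings are the ones named on this page.

```python
# Delivery target -> output_format value, following the guidance on this page.
OUTPUT_FORMATS = {
    "telephony": "ulaw_8000",   # phone/RTC flows
    "web": "mp3_44100_128",     # browser playback
    "pipeline": "pcm_16000",    # raw PCM for low-latency processing
}


def pick_output_format(target: str) -> str:
    """Return the output_format for a delivery target, or raise ValueError."""
    try:
        return OUTPUT_FORMATS[target]
    except KeyError:
        known = ", ".join(sorted(OUTPUT_FORMATS))
        raise ValueError(f"unknown target {target!r}; expected one of: {known}") from None


print(pick_output_format("telephony"))  # ulaw_8000
```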

TypeScript

import { writeFile } from "node:fs/promises";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient({
  apiKey: process.env.KEYPOOL_TOKEN!,
  baseUrl: `${process.env.KEYPOOL_BASE_URL}/v1/elevenlabs`,
});

async function streamToBuffer(stream: ReadableStream<Uint8Array>) {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return Buffer.concat(chunks);
}

const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "ElevenLabs request through KeyPool.",
  modelId: "eleven_multilingual_v2",
  outputFormat: "mp3_44100_128",
  voiceSettings: {
    stability: 0.45,
    similarityBoost: 0.8,
    style: 0.2,
    useSpeakerBoost: true,
  },
  // Optional: best-effort determinism for repeatable tests; identical output is not guaranteed.
  seed: 1234,
});

await writeFile("speech.mp3", await streamToBuffer(audio));

Python

import os
from pathlib import Path
from elevenlabs.client import ElevenLabs

client = ElevenLabs(
    api_key=os.environ["KEYPOOL_TOKEN"],
    base_url=f"{os.environ['KEYPOOL_BASE_URL']}/v1/elevenlabs",
)

audio_iter = client.text_to_speech.convert(
    "JBFqnCBsd6RMkjVDRZzb",
    text="ElevenLabs request through KeyPool.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
    voice_settings={
        "stability": 0.45,
        "similarity_boost": 0.8,
        "style": 0.2,
        "use_speaker_boost": True,
    },
    seed=1234,
)

Path("speech.mp3").write_bytes(b"".join(audio_iter))

Low-Latency Streaming TTS

For voice assistants or live UX, stream audio instead of waiting for a full file. The optimize_streaming_latency query option trades quality for latency:

0: default quality
1-3: progressively lower latency
4: lowest latency; can mispronounce numbers/dates because text normalization is reduced

const stream = await client.textToSpeech.stream("JBFqnCBsd6RMkjVDRZzb", {
  text: "This chunk can start playing before the full response is ready.",
  modelId: "eleven_flash_v2_5",
  outputFormat: "mp3_44100_128",
  optimizeStreamingLatency: 2,
});

// Pipe `stream` to your HTTP response, Web Audio pipeline, or a file.
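
For raw HTTP clients, the same request can be expressed as a URL with query options. The sketch below assumes `output_format` and `optimize_streaming_latency` are passed as query parameters and that the base URL is the ELEVENLABS_BASE_URL value from the setup section; `tts_stream_url` is a hypothetical helper, not an SDK function.

```python
from urllib.parse import quote, urlencode


def tts_stream_url(
    elevenlabs_base_url: str,
    voice_id: str,
    output_format: str = "mp3_44100_128",
    optimize_streaming_latency: int = 0,
) -> str:
    """Build the streaming TTS URL with its format and latency query options."""
    # Levels 0 through 4 are the only values described on this page.
    if optimize_streaming_latency not in (0, 1, 2, 3, 4):
        raise ValueError("optimize_streaming_latency must be an integer from 0 to 4")
    query = urlencode(
        {
            "output_format": output_format,
            "optimize_streaming_latency": optimize_streaming_latency,
        }
    )
    base = elevenlabs_base_url.rstrip("/")
    return f"{base}/v1/text-to-speech/{quote(voice_id, safe='')}/stream?{query}"


print(
    tts_stream_url(
        "https://keypool.example.com/v1/elevenlabs",
        "JBFqnCBsd6RMkjVDRZzb",
        optimize_streaming_latency=2,
    )
)
```

The request itself would be a POST to this URL with the JSON body (text, model_id) and your KeyPool team token as the API key.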

Speech-to-Text with Scribe

Official API requirements:

Endpoint: POST /v1/speech-to-text
Multipart form body
Required: model_id = scribe_v2 or scribe_v1
Required input: exactly one of file or cloud_storage_url
Minimum audio length: 100 ms
Optional: file_format="pcm_s16le_16" can reduce latency when sending raw 16-bit, 16 kHz mono little-endian PCM; for encoded audio such as WAV or MP3, leave file_format at its default value, other.
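
Checking these requirements before uploading avoids a round trip that would end in an upstream 422. The following is a minimal sketch of such a pre-flight check; `validate_stt_request` and its parameters are hypothetical and only encode the rules listed above.

```python
from typing import List, Optional

SCRIBE_MODELS = ("scribe_v2", "scribe_v1")
MIN_AUDIO_MS = 100


def validate_stt_request(
    model_id: str,
    has_file: bool,
    cloud_storage_url: Optional[str] = None,
    duration_ms: Optional[int] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems: List[str] = []
    if model_id not in SCRIBE_MODELS:
        problems.append(f"model_id must be one of {', '.join(SCRIBE_MODELS)}")
    # Exactly one input source is allowed: both or neither is an error.
    if has_file == bool(cloud_storage_url):
        problems.append("provide exactly one of file or cloud_storage_url")
    # Duration is only checked when the caller knows it.
    if duration_ms is not None and duration_ms < MIN_AUDIO_MS:
        problems.append(f"audio must be at least {MIN_AUDIO_MS} ms long")
    return problems


print(validate_stt_request("scribe_v2", has_file=True, duration_ms=2500))  # []
```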

TypeScript

// Bun.file is Bun-specific; on Node, read the bytes with readFile from node:fs/promises instead.
const file = new Blob([await Bun.file("meeting.wav").arrayBuffer()], { type: "audio/wav" });

const transcript = await client.speechToText.convert({
  file,
  modelId: "scribe_v2",
  languageCode: "en",             // optional; omit for auto-detect
  tagAudioEvents: true,
  diarize: true,
  timestampsGranularity: "word",
});

console.log(transcript.text);

Python

with open("meeting.wav", "rb") as audio:
    transcript = client.speech_to_text.convert(
        file=audio,
        model_id="scribe_v2",
        language_code="en",
        tag_audio_events=True,
        diarize=True,
        timestamps_granularity="word",
    )

print(transcript.text)

For larger files, prefer cloud_storage_url so the client does not upload through your application server.

Voice Discovery and Selection

Avoid hardcoding demo voice IDs long-term. Search by category, labels, language, or voice type, then store the selected voice_id in your product configuration.

const voices = await client.voices.search({
  pageSize: 10,
  voiceType: "default",
  category: "premade",
  search: "narration",
  includeTotalCount: false,
});

for (const voice of voices.voices ?? []) {
  console.log(voice.name, voice.voiceId, voice.labels, voice.previewUrl);
}

voices = client.voices.search(
    page_size=10,
    voice_type="default",
    category="premade",
    search="narration",
    include_total_count=False,
)
for voice in voices.voices or []:
    print(voice.name, voice.voice_id, voice.labels, voice.preview_url)

Voice Changer / Speech-to-Speech

The voice changer keeps the timing and emotion of the input audio while changing the speaker's voice.

// Bun.file is Bun-specific; on Node, read the bytes with readFile from node:fs/promises instead.
const input = new Blob([await Bun.file("source.wav").arrayBuffer()], { type: "audio/wav" });
const changed = await client.speechToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  audio: input,
  modelId: "eleven_english_sts_v2",
  outputFormat: "mp3_44100_128",
  removeBackgroundNoise: true,
});

with open("source.wav", "rb") as audio:
    changed = client.speech_to_speech.convert(
        "JBFqnCBsd6RMkjVDRZzb",
        audio=audio,
        model_id="eleven_english_sts_v2",
        output_format="mp3_44100_128",
        remove_background_noise=True,
    )

Sound Effects

Sound generation is useful for product UX, games, video editing, and synthetic datasets.

const sfx = await client.textToSoundEffects.convert({
  text: "A short soft confirmation chime, friendly SaaS product UI",
  durationSeconds: 1.2,
  promptInfluence: 0.45,
  outputFormat: "mp3_44100_128",
});

sfx = client.text_to_sound_effects.convert(
    text="A short soft confirmation chime, friendly SaaS product UI",
    duration_seconds=1.2,
    prompt_influence=0.45,
    output_format="mp3_44100_128",
)

Account and Quota Introspection

const user = await client.user.get();
const sub = user.subscription;
console.log({
  tier: sub.tier,
  used: sub.characterCount,
  limit: sub.characterLimit,
  remaining: sub.characterLimit - sub.characterCount,
});

user = client.user.get()
sub = user.subscription
print({
    "tier": sub.tier,
    "used": sub.character_count,
    "limit": sub.character_limit,
    "remaining": sub.character_limit - sub.character_count,
})
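
The subscription numbers can also gate a request before it is sent. The sketch below assumes, for illustration only, that each character of input text consumes one quota character; actual accounting may differ by model or plan. Both function names are hypothetical.

```python
def remaining_characters(character_count: int, character_limit: int) -> int:
    """Characters left in the current quota window, never negative."""
    return max(character_limit - character_count, 0)


def fits_quota(text: str, character_count: int, character_limit: int) -> bool:
    """True if `text` fits in the remaining quota (assumes 1 character = 1 unit)."""
    return len(text) <= remaining_characters(character_count, character_limit)


print(remaining_characters(9_500, 10_000))  # 500
print(fits_quota("x" * 600, 9_500, 10_000))  # False
```

In practice you would feed `sub.character_count` and `sub.character_limit` from the snippet above into these helpers.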

Error Handling

Status/code	Meaning	Recommended handling
`401 AUTH_*` from KeyPool	Missing/invalid team token	Fix KeyPool token; do not retry with same token.
`403 SERVICE_FORBIDDEN`	Token policy does not allow ElevenLabs	Ask admin to add ElevenLabs scope.
`429 QUOTA_EXCEEDED`	KeyPool team-token quota hit	Back off until quota window resets.
`503 NO_AVAILABLE_KEYS`	The requested ElevenLabs capability is temporarily unavailable through your KeyPool workspace	Retry later or switch feature path.
Upstream `422`	Request validation error	Fix payload (`model_id`, file, voice ID, enum names).
Upstream `401 detected_unusual_activity`	ElevenLabs rejected the request	Retry later or ask your KeyPool administrator to check service availability.
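
The table can be turned into a single decision function so every call site reacts the same way. This is a sketch under assumptions: the `Action` enum and `classify_error` are hypothetical names, and how you extract the error code string from a response body depends on the payload shape, which this page does not specify.

```python
from enum import Enum


class Action(Enum):
    FIX_TOKEN = "fix_token"        # do not retry with the same token
    ASK_ADMIN = "ask_admin"        # token policy lacks ElevenLabs scope
    BACK_OFF = "back_off"          # wait for the quota window to reset
    RETRY_LATER = "retry_later"    # temporary unavailability
    FIX_PAYLOAD = "fix_payload"    # request validation failed upstream
    UNKNOWN = "unknown"            # not covered by the table


def classify_error(status: int, code: str = "") -> Action:
    """Map an HTTP status and error code to the handling recommended above."""
    if status == 401 and code.startswith("AUTH_"):
        return Action.FIX_TOKEN
    if status == 401 and code == "detected_unusual_activity":
        return Action.RETRY_LATER
    if status == 403 and code == "SERVICE_FORBIDDEN":
        return Action.ASK_ADMIN
    if status == 429 and code == "QUOTA_EXCEEDED":
        return Action.BACK_OFF
    if status == 503 and code == "NO_AVAILABLE_KEYS":
        return Action.RETRY_LATER
    if status == 422:
        return Action.FIX_PAYLOAD
    return Action.UNKNOWN


print(classify_error(503, "NO_AVAILABLE_KEYS"))  # Action.RETRY_LATER
```

Note that the two 401 cases are handled differently: a KeyPool `AUTH_*` code means the token itself is wrong, while an upstream `detected_unusual_activity` may clear up on its own.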

Development Checklist

Use official ElevenLabs SDKs, not generic AI SDK wrappers, when you need full ElevenLabs feature coverage.
Configure baseUrl/base_url to KeyPool and use the KeyPool team token as apiKey.
Pick output formats deliberately: ulaw_8000 or alaw_8000 for telephony, mp3_44100_128 for web, PCM for low-latency pipelines.
For STT, use scribe_v2 first and include language_code when known.
Use voice search at setup time; store voice_id, not voice names.
Avoid sending secrets in browser code. If using Conversational AI signed URLs/WebRTC tokens, mint them server-side through KeyPool.

Interactive API Reference

For all ElevenLabs endpoints and in-browser testing, see API Reference → ElevenLabs.