Groq Integration¶
Official Groq API Documentation
Groq offers fast inference for large language models and other generative AI tasks. Use KeyPool to call Groq with your team token and a stable base URL.
KeyPool Endpoint for Groq¶
To interact with Groq via KeyPool, configure your SDK with the following base URL:
{YOUR_KEYPOOL_BASE_URL}/v1/groq
This applies to any KeyPool host, including custom domains such as:
https://your-keypool.example.com/v1/groq
Important Note: Do not include /openai/v1 in the base_url when using KeyPool. The Groq SDK already appends Groq's OpenAI-compatible paths (for example /openai/v1/chat/completions).
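If the base URL is ever in doubt, it can help to check how the final request path composes. A minimal sketch (the host below is a placeholder, not a real endpoint):

```python
# Illustrative only: the Groq SDK appends its own OpenAI-compatible path,
# so the KeyPool base_url must stop at /v1/groq.
base_url = "https://your-keypool.example.com/v1/groq"
sdk_path = "/openai/v1/chat/completions"  # added by the SDK, not by you
final_url = base_url + sdk_path
# A doubled /openai/v1 segment means base_url was misconfigured.
assert final_url.count("/openai/v1") == 1
print(final_url)
```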
Authentication¶
Use your KeyPool team token as a bearer token when making requests to the KeyPool Groq endpoint.
Do not send a personal Groq API key when using KeyPool.
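For raw HTTP calls outside the SDKs, the expected header shape is a standard bearer token. A minimal sketch (the fallback token value is a placeholder for illustration):

```python
import os

# KEYPOOL_TOKEN is your KeyPool team token, not a personal Groq API key.
token = os.environ.get("KEYPOOL_TOKEN", "kp-example-token")
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}
print(headers["Authorization"])
```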
Rate Limit Behavior¶
Groq's free-plan limits are enforced at the Groq organization level. Adding multiple KeyPool team tokens does not create multiple Groq quotas, and a large context window does not mean the whole context is usable on the free plan. For example, openai/gpt-oss-120b and openai/gpt-oss-20b have large context windows, but their free-plan tokens-per-minute (TPM) limit is far smaller than the context window.
Groq free-plan traffic is best suited for demo, development, and prototype usage. When a Groq model is temporarily out of usable request or token budget, KeyPool may return 429.
If you receive a 429 from KeyPool:
- Check the Retry-After response header before retrying.
- Reduce prompt size or max_tokens/max_completion_tokens for free-tier GPT-OSS requests.
- Prefer smaller or higher-throughput models such as llama-3.1-8b-instant when you need repeated development calls.
- Treat very large coding or document-manipulation prompts as best-effort on Groq free-tier access; use a higher-limit provider or a paid Groq tier for sustained large-context work.
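The steps above can be sketched as a small retry helper. The names here are illustrative, not part of the KeyPool or Groq SDKs; adapt `send` to your HTTP client of choice:

```python
import time

def retry_delay(retry_after, attempt):
    """Prefer the server's Retry-After header; fall back to exponential backoff."""
    try:
        return float(retry_after)
    except (TypeError, ValueError):
        return float(2 ** attempt)

def call_with_retry(send, max_retries=3):
    """`send` performs one request and returns (status_code, headers, body)."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, body
        # Honor Retry-After when the server provides it.
        time.sleep(retry_delay(headers.get("Retry-After"), attempt))
    return status, body
```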
Successful responses use Groq's normal response format, including usage metadata where Groq returns it.
Key Features and Usage¶
KeyPool supports Groq chat completions, audio transcriptions, and model listing.
1. Chat Completions¶
Perform text generation using Groq's powerful language models.
Basic Completion¶
Python Example:
import os
from groq import Groq
KEYPOOL_BASE_URL = os.environ.get("KEYPOOL_BASE_URL")
KEYPOOL_TOKEN = os.environ.get("KEYPOOL_TOKEN")
client = Groq(
api_key=KEYPOOL_TOKEN,
base_url=f"{KEYPOOL_BASE_URL}/v1/groq",
)
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "Explain the concept of quantum entanglement in a short paragraph.",
}
],
model="llama-3.3-70b-versatile",
)
print(chat_completion.choices[0].message.content)
TypeScript Example:
import Groq from "groq-sdk";
const KEYPOOL_BASE_URL = process.env.KEYPOOL_BASE_URL;
const KEYPOOL_TOKEN = process.env.KEYPOOL_TOKEN;
const client = new Groq({
apiKey: KEYPOOL_TOKEN,
baseURL: `${KEYPOOL_BASE_URL}/v1/groq`,
});
async function basicCompletion() {
const chatCompletion = await client.chat.completions.create({
messages: [
{
role: "user",
content: "Explain the concept of quantum entanglement in a short paragraph.",
},
],
model: "llama-3.3-70b-versatile",
});
console.log(chatCompletion.choices[0].message.content);
}
basicCompletion();
Streaming Completions¶
Receive responses in real time as they are generated.
Python Example:
import os
from groq import Groq
KEYPOOL_BASE_URL = os.environ.get("KEYPOOL_BASE_URL")
KEYPOOL_TOKEN = os.environ.get("KEYPOOL_TOKEN")
client = Groq(
api_key=KEYPOOL_TOKEN,
base_url=f"{KEYPOOL_BASE_URL}/v1/groq",
)
stream = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "Count to 10, one number per response.",
}
],
model="llama-3.1-8b-instant",
stream=True,
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
print()
TypeScript Example:
import Groq from "groq-sdk";
const KEYPOOL_BASE_URL = process.env.KEYPOOL_BASE_URL;
const KEYPOOL_TOKEN = process.env.KEYPOOL_TOKEN;
const client = new Groq({
apiKey: KEYPOOL_TOKEN,
baseURL: `${KEYPOOL_BASE_URL}/v1/groq`,
});
async function streamingCompletion() {
const stream = await client.chat.completions.create({
messages: [
{
role: "user",
content: "Count to 10, one number per response.",
},
],
model: "llama-3.1-8b-instant",
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
process.stdout.write("\n");
}
streamingCompletion();
Tool Use / Function Calling¶
Enable the model to call external functions or tools.
Python Example:
import os
from groq import Groq
KEYPOOL_BASE_URL = os.environ.get("KEYPOOL_BASE_URL")
KEYPOOL_TOKEN = os.environ.get("KEYPOOL_TOKEN")
client = Groq(
api_key=KEYPOOL_TOKEN,
base_url=f"{KEYPOOL_BASE_URL}/v1/groq",
)
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
},
},
}
]
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "What's the weather like in Boston?",
}
],
model="llama-3.3-70b-versatile",
tools=tools,
tool_choice="auto",
)
choice = chat_completion.choices[0]
if choice.message.tool_calls:
tool_call = choice.message.tool_calls[0]
print(f"Model wants to call tool: {tool_call.function.name} with arguments: {tool_call.function.arguments}")
elif choice.message.content:
print(choice.message.content)
TypeScript Example:
import Groq from "groq-sdk";
const KEYPOOL_BASE_URL = process.env.KEYPOOL_BASE_URL;
const KEYPOOL_TOKEN = process.env.KEYPOOL_TOKEN;
const client = new Groq({
apiKey: KEYPOOL_TOKEN,
baseURL: `${KEYPOOL_BASE_URL}/v1/groq`,
});
async function chatToolUse() {
const tools = [
{
type: "function" as const,
function: {
name: "get_current_weather",
description: "Get the current weather in a given location",
parameters: {
type: "object",
properties: {
location: {
type: "string",
description: "The city and state, e.g. San Francisco, CA",
},
},
required: ["location"],
},
},
},
];
const chatCompletion = await client.chat.completions.create({
messages: [
{
role: "user",
content: "What's the weather like in Boston?",
},
],
model: "llama-3.3-70b-versatile",
tools: tools,
tool_choice: "auto",
});
const choice = chatCompletion.choices[0];
if (choice.message.tool_calls) {
const toolCall = choice.message.tool_calls[0];
console.log(`Model wants to call tool: ${toolCall.function.name} with arguments: ${toolCall.function.arguments}`);
} else if (choice.message.content) {
console.log(choice.message.content);
}
}
chatToolUse();
JSON Mode¶
Force the model to respond with valid JSON.
Python Example:
import os
import json
from groq import Groq
KEYPOOL_BASE_URL = os.environ.get("KEYPOOL_BASE_URL")
KEYPOOL_TOKEN = os.environ.get("KEYPOOL_TOKEN")
client = Groq(
api_key=KEYPOOL_TOKEN,
base_url=f"{KEYPOOL_BASE_URL}/v1/groq",
)
chat_completion = client.chat.completions.create(
messages=[
{"role": "system", "content": "Respond in JSON only."},
{"role": "user", "content": 'Generate a JSON object with "item" and "price" keys.'},
],
model="llama-3.3-70b-versatile",
response_format={"type": "json_object"},
)
parsed_response = json.loads(chat_completion.choices[0].message.content)
print(parsed_response)
assert "item" in parsed_response
assert "price" in parsed_response
TypeScript Example:
import Groq from "groq-sdk";
const KEYPOOL_BASE_URL = process.env.KEYPOOL_BASE_URL;
const KEYPOOL_TOKEN = process.env.KEYPOOL_TOKEN;
const client = new Groq({
apiKey: KEYPOOL_TOKEN,
baseURL: `${KEYPOOL_BASE_URL}/v1/groq`,
});
async function chatJsonMode() {
const chatCompletion = await client.chat.completions.create({
messages: [
{ role: "system", content: "Respond in JSON only." },
{ role: "user", content: 'Generate a JSON object with "item" and "price" keys.' },
],
model: "llama-3.3-70b-versatile",
response_format: { type: "json_object" },
});
const parsedResponse = JSON.parse(chatCompletion.choices[0].message.content!);
console.log(parsedResponse);
// Further assertions could be made on the structure of parsedResponse
}
chatJsonMode();
2. Audio Transcription (Whisper)¶
Transcribe audio files using Groq's Whisper integration.
Python Example:
import os
import httpx
from groq import Groq
import tempfile
from pathlib import Path
SAMPLE_AUDIO_URL = "https://cdn.openai.com/API/docs/audio/alloy.wav"
KEYPOOL_BASE_URL = os.environ.get("KEYPOOL_BASE_URL")
KEYPOOL_TOKEN = os.environ.get("KEYPOOL_TOKEN")
client = Groq(
api_key=KEYPOOL_TOKEN,
base_url=f"{KEYPOOL_BASE_URL}/v1/groq",
)
# Download sample audio
with tempfile.TemporaryDirectory() as tmpdir:
audio_path = Path(tmpdir) / "sample.wav"
response = httpx.get(SAMPLE_AUDIO_URL)
audio_path.write_bytes(response.content)
with open(audio_path, "rb") as f:
transcription = client.audio.transcriptions.create(
file=("sample.wav", f.read()),
model="whisper-large-v3-turbo",
response_format="json",
)
print(f"Transcription: {transcription.text}")
TypeScript Example:
import Groq from "groq-sdk";
const SAMPLE_AUDIO_URL = "https://cdn.openai.com/API/docs/audio/alloy.wav";
const KEYPOOL_BASE_URL = process.env.KEYPOOL_BASE_URL;
const KEYPOOL_TOKEN = process.env.KEYPOOL_TOKEN;
const client = new Groq({
apiKey: KEYPOOL_TOKEN,
baseURL: `${KEYPOOL_BASE_URL}/v1/groq`,
});
async function audioTranscription() {
// Fetch the audio file
const audioResp = await fetch(SAMPLE_AUDIO_URL);
const audioBuffer = await audioResp.arrayBuffer();
const audioFile = new File([audioBuffer], "sample.wav", {
type: "audio/wav",
});
const transcription = await client.audio.transcriptions.create({
file: audioFile,
model: "whisper-large-v3-turbo",
response_format: "json",
});
console.log(`Transcription: ${transcription.text}`);
}
audioTranscription();
3. List Available Models¶
Retrieve a list of models available through the Groq API.
Python Example:
import os
from groq import Groq
KEYPOOL_BASE_URL = os.environ.get("KEYPOOL_BASE_URL")
KEYPOOL_TOKEN = os.environ.get("KEYPOOL_TOKEN")
client = Groq(
api_key=KEYPOOL_TOKEN,
base_url=f"{KEYPOOL_BASE_URL}/v1/groq",
)
models = client.models.list()
print("Available Groq Models:")
for model in models.data:
print(f"- {model.id}")
TypeScript Example:
import Groq from "groq-sdk";
const KEYPOOL_BASE_URL = process.env.KEYPOOL_BASE_URL;
const KEYPOOL_TOKEN = process.env.KEYPOOL_TOKEN;
const client = new Groq({
apiKey: KEYPOOL_TOKEN,
baseURL: `${KEYPOOL_BASE_URL}/v1/groq`,
});
async function listModels() {
const models = await client.models.list();
console.log("Available Groq Models:");
models.data.forEach((model) => {
console.log(`- ${model.id}`);
});
}
listModels();
Troubleshooting Blocked Requests¶
If Groq returns a non-rate-limit 403 or a response that says the request was
blocked, check the User-Agent sent by your client. Some TypeScript SDKs use
an OpenAI-compatible default user agent, such as OpenAI/JS, which can be
treated differently by provider-side request filters.
For server-side Node.js, Bun, or Worker-style runtimes where your client can set
request headers, use an application-specific User-Agent:
import Groq from "groq-sdk";
const client = new Groq({
apiKey: process.env.KEYPOOL_TOKEN,
baseURL: `${process.env.KEYPOOL_BASE_URL}/v1/groq`,
defaultHeaders: {
"User-Agent": "my-app-groq-client/1.0",
},
});
For raw fetch calls, set the same header explicitly:
const response = await fetch(`${process.env.KEYPOOL_BASE_URL}/v1/groq/openai/v1/chat/completions`, {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.KEYPOOL_TOKEN}`,
"Content-Type": "application/json",
"User-Agent": "my-app-groq-client/1.0",
},
body: JSON.stringify({
model: "llama-3.1-8b-instant",
messages: [{ role: "user", content: "Say hello." }],
}),
});
Browsers generally do not allow JavaScript to set User-Agent. If this problem
appears in a browser-only integration, make the Groq call from a backend route
or server-side function where you control request headers.
Direct Groq 403 From Server-Side Runtimes¶
Groq's direct API can return HTTP 403 or Cloudflare 1010 from some server-side runtimes when request headers are rejected before the API request is processed.
When using KeyPool, send requests to your KeyPool Groq endpoint instead of calling Groq directly. If you still see a blocked response, check the troubleshooting section above and set an application-specific User-Agent where your runtime allows it.
Error Handling¶
KeyPool handles various error conditions gracefully, providing informative responses. Common error scenarios include:
- 401 Unauthorized: Invalid or missing KeyPool team token.
- 403 Forbidden: Your team token does not have permission to access the Groq service.
- 429 Quota Exceeded: Your team's rate limit for Groq has been reached.
- 502 Upstream Failed: The request to the Groq API failed (e.g., network error, Groq API down).
- 503 No Available Keys: Groq is temporarily unavailable through your KeyPool workspace.
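One way to act on these codes in client code (a sketch; the split between transient and configuration errors follows the list above):

```python
# 401/403 indicate token or permission problems and will not resolve on retry;
# 429/502/503 are transient and are reasonable to retry with backoff.
TRANSIENT = {429, 502, 503}

def should_retry(status):
    return status in TRANSIENT
```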
Interactive API Reference¶
For all Groq endpoints and in-browser testing, see API Reference → Groq.