Groq runtime authorization

Author: Veto

Wrap Groq tool calls with Veto. Each call returned through Groq's tool-use path is evaluated before dispatch and resolved as allow, review, or deny, with an exportable decision record for every governed decision.

Why Groq needs guardrails

Groq's LPU inference returns tool calls in well under a second, which is what makes Groq agents feel real-time. Without guardrails, the same speed leaves those agents unbounded: a model loop on Groq can decide and dispatch tens of tool calls per minute. If those tools are paid (Twilio, SendGrid, an external API), a single prompt injection can consume a vendor budget before a human notices.

Enforcement is model-agnostic: Veto enforces policy regardless of which inference provider produced the tool call. Whether the call comes from an open-weight model on Groq or from another provider in fallback, the policy, the enforcement, and the decision record are the same.

Speed compounds cost

LPU inference makes tool loops fast. Without rate limits at the guardrail boundary, vendor API bills can spike before anyone reviews the run.

Open weights, open jailbreaks

Open-weight model behavior is widely studied, and jailbreak techniques are widely documented. Prompt-level defenses are therefore easy to probe, so enforcement runs at the call site, not in the prompt.

No built-in guardrails

Groq is an inference provider, not an agent framework. Enforcement is your responsibility, not the platform's.

Before and after Veto

The example below shows standard Groq function calling: the model returns tool_calls and your code dispatches them at LPU speed. Adding Veto places a policy check between selection and dispatch: same model, same tools, with each governed call evaluated against policy first.

import os
import json
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

tools = [
    {
        "type": "function",
        "function": {
            "name": "fetch_external_api",
            "description": "GET data from an external API endpoint",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"},
                    "api_key": {"type": "string"},
                },
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_sms",
            "description": "Send an SMS via Twilio",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["to", "body"],
            },
        },
    },
]

# user_message is the incoming end-user text, supplied by your application.
response = client.chat.completions.create(
    model=os.environ["GROQ_MODEL"],
    messages=[{"role": "user", "content": user_message}],
    tools=tools,
    tool_choice="auto",
)

# Groq is fast: sub-second tool selection. That speed is the threat.
# Loop the model 60 times per minute and burn through your Twilio budget.
# execute_tool is your own dispatcher; nothing checks the call before it runs.
for tool_call in response.choices[0].message.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    execute_tool(tool_call.function.name, args)
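
With a guard in place, the same loop checks every call before its handler runs. The following is a minimal sketch, not Veto's actual SDK: `guard` is a stand-in callable that mirrors the `veto.guard(tool=..., arguments=..., context=...)` call shape used elsewhere on this page, and `dispatch_guarded`, `demo_guard`, and `fake_call` are hypothetical names introduced only for illustration.

```python
import json
from types import SimpleNamespace


def dispatch_guarded(tool_calls, guard, registry, context):
    """Evaluate each tool call with `guard` before running its handler."""
    results = []
    for tool_call in tool_calls or []:
        name = tool_call.function.name
        try:
            args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError:
            # Malformed arguments never reach a handler.
            results.append({"tool": name, "result": "Blocked: arguments are not valid JSON"})
            continue
        decision = guard(tool=name, arguments=args, context=context)
        if decision.decision != "allow":
            results.append({"tool": name, "result": f"Blocked: {decision.reason}"})
            continue
        handler = registry.get(name)
        if handler is None:
            # The model may name a tool that was never registered.
            results.append({"tool": name, "result": "Blocked: unknown tool"})
            continue
        results.append({"tool": name, "result": handler(**args)})
    return results


def demo_guard(tool, arguments, context):
    """Stand-in policy (assumption): at most one SMS per recipient per hour."""
    if tool == "send_sms" and context.get("recipient_hourly_count", 0) >= 1:
        return SimpleNamespace(decision="deny", reason="SMS rate limit reached for this recipient")
    return SimpleNamespace(decision="allow", reason="")


def fake_call(name, args):
    """Build an object shaped like an OpenAI-compatible tool call, for local testing."""
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=json.dumps(args)))


if __name__ == "__main__":
    registry = {"send_sms": lambda to, body: f"queued for {to}"}
    calls = [fake_call("send_sms", {"to": "+15550100", "body": "hi"})]
    print(dispatch_guarded(calls, demo_guard, registry, {"recipient_hourly_count": 0}))
    print(dispatch_guarded(calls, demo_guard, registry, {"recipient_hourly_count": 1}))
```

Returning a "Blocked" result instead of raising keeps the loop running and gives the model a message it can act on in the next turn.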

Parallel dispatch with rate limits

Groq's async client lets you dispatch tool calls in parallel. Veto's per-call evaluation enforces rate limits and allowlists in the same parallel path: denials short-circuit without blocking other in-flight calls.

groq_parallel_with_veto.py

import os
import json
import asyncio
from groq import AsyncGroq
from veto_sdk import Veto

client = AsyncGroq(api_key=os.environ["GROQ_API_KEY"])
veto = Veto(api_key=os.environ["VETO_API_KEY"])

TOOL_REGISTRY = {
    "fetch_external_api": fetch_external_api_impl,
    "send_sms": send_sms_impl,
    "run_inference": run_inference_impl,
}

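async def dispatch_with_veto(tool_call, ctx):
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)
    # Evaluate policy before the handler runs; anything other than "allow" is not dispatched.
    decision = veto.guard(
        tool=name,
        arguments=args,
        context=ctx,
    )
    if decision.decision != "allow":
        return {"tool": name, "result": f"Blocked: {decision.reason}"}
    # The model can name a tool that was never registered; report it instead of raising,
    # so one bad call does not fail the other calls gathered in the same turn.
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"tool": name, "result": f"Blocked: unknown tool {name!r}"}
    result = await handler(**args)
    return {"tool": name, "result": result}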

# Groq's LPU returns the full tool_calls list at very low latency.
# Dispatch in parallel while Veto enforces per-call policy.
# TOOLS is the tool schema list passed to Groq; get_hourly_count is your own
# usage lookup. user_id is passed in so the lookup is scoped to the caller.
async def run_turn(messages, user_id):
    response = await client.chat.completions.create(
        model=os.environ["GROQ_MODEL"],
        messages=messages,
        tools=TOOLS,
        tool_choice="auto",
    )
    calls = response.choices[0].message.tool_calls or []
    ctx = {"hourly_calls": await get_hourly_count(user_id)}
    return await asyncio.gather(*(dispatch_with_veto(c, ctx) for c in calls))
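
The example above reads `hourly_calls` from a `get_hourly_count` helper that this page does not define. As one possible backing for it, here is a minimal in-memory sliding-window counter. `SlidingWindowCounter` and `build_context` are hypothetical names, and the context keys are taken from the conditions in the policy file on this page. A multi-process deployment would likely need a shared store instead; that is an assumption, not something the page specifies.

```python
import time
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per key over a trailing window (in-memory, single process)."""

    def __init__(self, window_seconds=3600, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._events = defaultdict(deque)

    def _prune(self, key):
        # Drop timestamps that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        events = self._events[key]
        while events and events[0] <= cutoff:
            events.popleft()
        return events

    def record(self, key):
        """Register one event for `key` at the current time."""
        self._prune(key).append(self.clock())

    def count(self, key):
        """Return how many events for `key` fall inside the window."""
        return len(self._prune(key))


def build_context(api_calls, sms_sends, user_id, recipient, local_hour):
    """Assemble the context dict that the policy conditions read from."""
    return {
        "hourly_calls": api_calls.count(user_id),
        "recipient_hourly_count": sms_sends.count(recipient),
        "local_hour": local_hour,
    }
```

Because the context is computed once per turn, calls dispatched in parallel within that turn all see the same counts. Call `record()` for each allowed call so the next turn sees the updated totals.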

Policy configuration

Define guardrails in declarative YAML. Tune rate-limit thresholds, domain allowlists, and approval routing without redeploying.

veto/policies.yaml

rules:
  - name: rate_limit_external_api
    description: Cap external API calls at 60 per hour
    tool: fetch_external_api
    when: context.hourly_calls >= 60
    action: deny
    message: "Hourly external API call cap reached: pausing dispatch"

  - name: api_domain_allowlist
    description: Only call allowlisted external API hosts
    tool: fetch_external_api
    when: "!args.url.match(/^https:\\/\\/api\\.(stripe|twilio|sendgrid|openweathermap)\\.invalid\\//)"
    action: deny
    message: "External API host not in allowlist"

  - name: strip_api_key_in_args
    description: Reject calls that pass API keys as plain args
    tool: fetch_external_api
    when: "args.api_key != null"
    action: deny
    message: "API keys must come from a secret store, not from the model"

  - name: sms_per_recipient_cap
    description: At most 1 SMS per recipient per hour
    tool: send_sms
    when: context.recipient_hourly_count >= 1
    action: deny
    message: "SMS rate limit reached for this recipient"

  - name: sms_business_hours
    description: SMS only during 9am-8pm local
    tool: send_sms
    when: context.local_hour < 9 || context.local_hour >= 20
    action: deny
    message: "SMS sending restricted to 09:00-20:00 local time"
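
An allowlist pattern such as the one in `api_domain_allowlist` is easiest to review by running it against a few URLs. The sketch below uses Python's `re` module with the same pattern. The page does not say which regex dialect the policy engine uses, so treat this as an approximation of the rule, and `is_allowlisted` as a hypothetical helper name.

```python
import re

# Same pattern as the api_domain_allowlist rule, written as a Python raw string.
ALLOWED_API_URL = re.compile(
    r"^https://api\.(stripe|twilio|sendgrid|openweathermap)\.invalid/"
)


def is_allowlisted(url: str) -> bool:
    """Return True when the URL starts with an allowlisted https API origin."""
    return ALLOWED_API_URL.match(url) is not None


if __name__ == "__main__":
    for candidate in (
        "https://api.stripe.invalid/v1/charges",
        "http://api.stripe.invalid/v1/charges",
        "https://api.stripe.invalid.evil.example/v1",
        "https://api.stripe.invalid",
    ):
        print(candidate, is_allowlisted(candidate))
```

The trailing `/` in the pattern is what rejects look-alike hosts such as `api.stripe.invalid.evil.example`. It also rejects a bare origin with no path, so callers need to include at least `/`.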

How Veto fits

Install the SDK

pip install veto-sdk groq

Define policies

Create veto/policies.yaml with rate limits, allowlists, and approval routes. The same policies apply across Groq tool-calling paths.

Guard the dispatch loop

Call veto.guard() for each governed tool_call returned by Groq before invoking the underlying handler.

Use cases

High-throughput SMS agents

Groq's low latency makes voice and SMS agents feel responsive. Cap sends per recipient per hour, restrict sending to business hours, and block opted-out numbers, all without slowing the loop.

External API rate guards

Agents that call paid APIs (Stripe, weather, geocoding) get hard rate limits at the guardrail boundary. Budget protection is applied in the local decision path, on every decision.

Multi-model deployments

Use Groq-hosted open models for fast paths and a fallback provider for reasoning paths. One Veto policy file governs both: enforcement is the same regardless of inference provider.

Open-weight jailbreak defense

Open-weight models can be jailbroken with widely known techniques. Enforcement at the tool boundary makes jailbreak output non-executable: the call is blocked regardless of the model output.

Frequently asked questions

Does Veto add latency to Groq tool calls?

Veto evaluates policy in-process before dispatch. Groq's LPU latency remains the dominant cost; Veto adds one local policy-evaluation step per call, and no network call to a remote decision service sits on the hot path.

Which Groq models does this work with?

Use Groq models in your account that emit OpenAI-compatible tool or function calls. Keep the compatibility check in CI because provider catalogs change; Veto guards the dispatch path, not the model.

Can I share policies between Groq and OpenAI agents?

Yes. Policies are model-agnostic: they match on tool name and arguments, not on the inference provider. One veto/policies.yaml governs your Groq fast path and your OpenAI reasoning path together.

What about streaming tool calls?

Veto evaluates on the complete tool_call object once the model has finished emitting its arguments. With streaming, accumulate the tool_call deltas, parse the final JSON, then call veto.guard() before dispatching.
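
The accumulation step can be sketched as follows. This assumes the streamed fragments follow the OpenAI-compatible shape, where each fragment carries an `index`, an optional `id`, and a `function` with an optional `name` and a partial `arguments` string. Plain dicts stand in for SDK objects so the sketch runs on its own, and `accumulate_tool_calls` is a hypothetical helper name.

```python
import json


def accumulate_tool_calls(deltas):
    """Merge streamed tool_call fragments (grouped by index) into complete calls."""
    partial = {}
    for delta in deltas:
        slot = partial.setdefault(
            delta["index"], {"id": None, "name": "", "arguments": ""}
        )
        if delta.get("id"):
            slot["id"] = delta["id"]
        function = delta.get("function") or {}
        if function.get("name"):
            slot["name"] = function["name"]
        # Argument JSON arrives in pieces; concatenate in arrival order.
        slot["arguments"] += function.get("arguments") or ""

    complete = []
    for index in sorted(partial):
        slot = partial[index]
        complete.append(
            {
                "id": slot["id"],
                "name": slot["name"],
                # Raises json.JSONDecodeError if the stream ended mid-arguments.
                "arguments": json.loads(slot["arguments"] or "{}"),
            }
        )
    return complete


if __name__ == "__main__":
    stream = [
        {"index": 0, "id": "call_a", "function": {"name": "send_sms", "arguments": '{"to": "+1555'}},
        {"index": 0, "function": {"arguments": '0100", "body": "hi"}'}},
    ]
    for call in accumulate_tool_calls(stream):
        print(call["name"], call["arguments"])
```

Only the parsed result should be passed to the guard. A stream that ends mid-arguments raises during parsing, so a truncated call is never evaluated or dispatched.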

Wrap one Groq tool path and inspect the decision record.