Groq runtime authorization
Wrap Groq tool calls with Veto. Calls returned through Groq's tool-use path are evaluated before dispatch: allow, review, or deny, with an exportable decision record per governed decision.
Why Groq needs guardrails
Groq's LPU inference returns tool calls in well under a second. That speed is what makes Groq agents feel real-time. The same speed leaves them unbounded without guardrails. A model loop on Groq can decide and dispatch tens of tool calls per minute. If those tools are paid (Twilio, SendGrid, an external API), one prompt injection can consume a vendor budget before a human notices.
Enforcement is model-agnostic: Veto enforces policy regardless of which inference provider produced the tool call. Open-weight models on Groq, another provider as fallback: same policy, same enforcement, same decision record.
LPU inference makes tool loops fast. Without rate limits at the guardrail boundary, vendor API bills can spike before anyone reviews the run.
Open-weight model behavior is widely studied. Jailbreak techniques are widely documented. Enforcement runs at the call site, not the prompt.
Groq is an inference provider, not an agent framework. Enforcement is your responsibility, not the platform's.
Before and after Veto
The example below shows standard Groq function calling: the model returns tool_calls and your code dispatches them at LPU speed. Adding Veto places a policy check between selection and dispatch. Same model, same tools, each governed call evaluated against policy first.
import os
import json
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

tools = [
    {
        "type": "function",
        "function": {
            "name": "fetch_external_api",
            "description": "GET data from an external API endpoint",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"},
                    "api_key": {"type": "string"},
                },
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_sms",
            "description": "Send an SMS via Twilio",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["to", "body"],
            },
        },
    },
]

# user_message and execute_tool are defined by your application.
response = client.chat.completions.create(
    model=os.environ["GROQ_MODEL"],
    messages=[{"role": "user", "content": user_message}],
    tools=tools,
    tool_choice="auto",
)

# Groq is fast: sub-second tool selection. That speed is the threat.
# Loop the model 60 times per minute and burn through your Twilio budget.
for tool_call in response.choices[0].message.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    execute_tool(tool_call.function.name, args)

Parallel dispatch with rate limits
Groq's async client lets you dispatch tool calls in parallel. Veto's per-call evaluation enforces rate limits and allowlists in the same parallel path: denials short-circuit without blocking other in-flight calls.
import os
import json
import asyncio
from groq import AsyncGroq
from veto_sdk import Veto

client = AsyncGroq(api_key=os.environ["GROQ_API_KEY"])
veto = Veto(api_key=os.environ["VETO_API_KEY"])

# Async handlers defined by your application.
TOOL_REGISTRY = {
    "fetch_external_api": fetch_external_api_impl,
    "send_sms": send_sms_impl,
    "run_inference": run_inference_impl,
}

async def dispatch_with_veto(tool_call, ctx):
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)
    decision = veto.guard(
        tool=name,
        arguments=args,
        context=ctx,
    )
    # Anything other than "allow" (review or deny) stops before the handler runs.
    if decision.decision != "allow":
        return {"tool": name, "result": f"Blocked: {decision.reason}"}
    handler = TOOL_REGISTRY[name]
    result = await handler(**args)
    return {"tool": name, "result": result}

# Groq's LPU returns the full tool_calls list at very low latency.
# Dispatch in parallel while Veto enforces per-call policy.
async def run_turn(messages, user_id):
    response = await client.chat.completions.create(
        model=os.environ["GROQ_MODEL"],
        messages=messages,
        tools=TOOLS,  # same schema list as `tools` in the first example
        tool_choice="auto",
    )
    calls = response.choices[0].message.tool_calls or []
    # get_hourly_count is an application-defined lookup. Context keys must
    # match what the policies read (for example, context.hourly_calls).
    ctx = {"hourly_calls": await get_hourly_count(user_id)}
    return await asyncio.gather(*(dispatch_with_veto(c, ctx) for c in calls))

Policy configuration
Define guardrails in declarative YAML. Tune rate-limit thresholds, domain allowlists, and approval routing without redeploying.
rules:
  - name: rate_limit_external_api
    description: Cap external API calls at 60 per hour
    tool: fetch_external_api
    when: context.hourly_calls >= 60
    action: deny
    message: "Hourly external API call cap reached: pausing dispatch"

  - name: api_domain_allowlist
    description: Only call allowlisted external API hosts
    tool: fetch_external_api
    when: "!args.url.match(/^https:\\/\\/api\\.(stripe|twilio|sendgrid|openweathermap)\\.invalid\\//)"
    action: deny
    message: "External API host not in allowlist"

  - name: strip_api_key_in_args
    description: Reject calls that pass API keys as plain args
    tool: fetch_external_api
    when: "args.api_key != null"
    action: deny
    message: "API keys must come from a secret store, not from the model"

  - name: sms_per_recipient_cap
    description: At most 1 SMS per recipient per hour
    tool: send_sms
    when: context.recipient_hourly_count >= 1
    action: deny
    message: "SMS rate limit reached for this recipient"

  - name: sms_business_hours
    description: SMS only during 9am-8pm local
    tool: send_sms
    when: context.local_hour < 9 || context.local_hour >= 20
    action: deny
    message: "SMS sending restricted to 09:00-20:00 local time"

How Veto fits
Install the SDK
pip install veto-sdk groq
Define policies
Create veto/policies.yaml with rate limits, allowlists, and approval routes. The same policies apply across Groq tool-calling paths.
Guard the dispatch loop
Call veto.guard() for each governed tool_call returned by Groq before invoking the underlying handler.
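The guard step above can be sketched as a plain dispatch loop. This is a minimal illustration, not the SDK itself: `StubGuard` and `Decision` are hypothetical stand-ins that only mirror the `guard(tool=..., arguments=..., context=...)` call shape and the `decision` / `reason` fields used elsewhere on this page, and `fake_call` builds an object shaped like one entry of Groq's `message.tool_calls`.

```python
import json
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Decision:
    decision: str  # "allow", "review", or "deny"
    reason: str = ""


class StubGuard:
    """Hypothetical stand-in for the Veto client; not the real SDK."""

    def guard(self, tool, arguments, context):
        if tool == "send_sms" and context.get("recipient_hourly_count", 0) >= 1:
            return Decision("deny", "SMS rate limit reached for this recipient")
        return Decision("allow")


def dispatch(tool_calls, guard, registry, context):
    """Evaluate every tool call before its handler is invoked."""
    results = []
    for call in tool_calls:
        name = call.function.name
        args = json.loads(call.function.arguments)
        decision = guard.guard(tool=name, arguments=args, context=context)
        if decision.decision != "allow":
            # "review" and "deny" both stop here; the handler never runs.
            results.append({"tool": name, "result": f"Blocked: {decision.reason}"})
            continue
        results.append({"tool": name, "result": registry[name](**args)})
    return results


def fake_call(name, **args):
    """Build an object shaped like an entry of message.tool_calls."""
    return SimpleNamespace(
        function=SimpleNamespace(name=name, arguments=json.dumps(args))
    )


sent = []  # records recipients whose handler actually ran
registry = {"send_sms": lambda to, body: sent.append(to) or "queued"}
calls = [fake_call("send_sms", to="+15550100", body="hi")]

allowed = dispatch(calls, StubGuard(), registry, {"recipient_hourly_count": 0})
blocked = dispatch(calls, StubGuard(), registry, {"recipient_hourly_count": 1})
```

The point of the shape is that the handler lookup sits after the decision check, so a blocked call never reaches the code that spends money.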
Use cases
High-throughput SMS agents
Groq's low latency makes voice and SMS agents feel responsive. Cap sends per recipient per hour, restrict sending to business hours, and block opted-out numbers without slowing the loop.
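The SMS rules in the policy example read `context.recipient_hourly_count` and `context.local_hour`, so the caller has to supply those values. One way to compute them is sketched below, assuming an in-memory sliding window and a fixed UTC offset per recipient. `RecipientWindow` and `sms_context` are hypothetical helper names, and a production deployment would likely back the counter with a shared store.

```python
import time
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone


class RecipientWindow:
    """Counts sends per recipient over a sliding window (one hour by default)."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self._sends = defaultdict(deque)  # recipient -> send timestamps

    def record(self, recipient, now=None):
        self._sends[recipient].append(time.time() if now is None else now)

    def count(self, recipient, now=None):
        now = time.time() if now is None else now
        sends = self._sends[recipient]
        # Drop timestamps that have aged out of the window.
        while sends and now - sends[0] >= self.window_seconds:
            sends.popleft()
        return len(sends)


def sms_context(window, recipient, utc_offset_hours, now=None):
    """Build the context dict the SMS rules are evaluated against."""
    now = time.time() if now is None else now
    tz = timezone(timedelta(hours=utc_offset_hours))
    return {
        "recipient_hourly_count": window.count(recipient, now),
        "local_hour": datetime.fromtimestamp(now, tz=tz).hour,
    }


window = RecipientWindow()
t0 = 1_700_000_000  # fixed timestamp so the example is reproducible
window.record("+15550100", now=t0)
ctx_soon = sms_context(window, "+15550100", utc_offset_hours=0, now=t0 + 60)
ctx_later = sms_context(window, "+15550100", utc_offset_hours=0, now=t0 + 3600)
```

Call `record` only after a send is allowed and dispatched, so denied attempts do not consume the recipient's quota.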
External API rate guards
Agents that call paid APIs (Stripe, weather, geocoding) get hard rate limits at the guardrail boundary. Each decision is checked against the budget in the decision path, before the call is dispatched.
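To make the three `fetch_external_api` rules from the policy example concrete, here is a plain-Python rendering of what they express. It is an illustration of the checks, not Veto's policy engine: `check_fetch` is a hypothetical function, and it swaps the policy's regex for a parsed-hostname comparison via `urllib.parse.urlsplit`. The hosts are the same placeholder `.invalid` names used in the policy.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.stripe.invalid", "api.twilio.invalid"}  # placeholder hosts
HOURLY_CAP = 60


def check_fetch(args, context):
    """Return (action, message) for a fetch_external_api call."""
    # Rate limit: deny once the caller-supplied hourly count reaches the cap.
    if context.get("hourly_calls", 0) >= HOURLY_CAP:
        return "deny", "Hourly external API call cap reached"

    # Allowlist: compare the parsed hostname exactly, so lookalike hosts
    # such as api.stripe.invalid.evil.example do not pass.
    parts = urlsplit(args.get("url", ""))
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        return "deny", "External API host not in allowlist"

    # Secrets: the model must not supply API keys as arguments.
    if args.get("api_key") is not None:
        return "deny", "API keys must come from a secret store, not from the model"

    return "allow", ""


ok = check_fetch({"url": "https://api.stripe.invalid/v1/charges"}, {"hourly_calls": 3})
lookalike = check_fetch(
    {"url": "https://api.stripe.invalid.evil.example/x"}, {"hourly_calls": 3}
)
capped = check_fetch({"url": "https://api.stripe.invalid/v1"}, {"hourly_calls": 60})
```

Because the check looks only at the call's arguments and the caller-supplied context, its outcome does not depend on what the model was prompted to say.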
Multi-model deployments
Use Groq-hosted open models for fast paths and a fallback provider for reasoning paths. One Veto policy file governs both: enforcement is the same regardless of inference provider.
Open-weight jailbreak defense
Open-weight models can be jailbroken with widely known techniques. Enforcement at the tool boundary makes jailbreak output non-executable: the call is blocked regardless of the model output.
Frequently asked questions
Does Veto add latency to Groq tool calls?
Which Groq models does this work with?
Can I share policies between Groq and OpenAI agents?
What about streaming tool calls?
Wrap one Groq tool path and inspect the decision record.