Iterating on an AI agent means iterating on its prompt, its tools, and the way it handles edge cases. The test-calls API places real calls (bot-to-bot or SIP loopback) against an agent, driven by a scenario prompt you supply. Every run produces a real call log with transcript, grading, and billing, so you see exactly how the agent behaves and what it costs. Use it for:
  • Pre-deploy smoke tests after every prompt edit
  • Regression suites wired into CI (subscribe to the test-call.completed webhook and fail the build if a run fails or its score drops)
  • Stress-testing concurrency limits

One-shot: single run

curl -X POST https://api.thunderphone.com/v1/test-calls \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_type":     "agent",
    "target_id":       12,
    "direction":       "outbound",
    "scenario_prompt": "You are a polite caller asking about refund policy for order 12345.",
    "consent_to_charge": true
  }'
Fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| target_type | string | yes | agent or phone_number |
| target_id | integer | yes | The agent id (or phone number id) |
| direction | string | yes | outbound (bot places) or inbound (bot answers) |
| scenario_prompt | string | no | Drives what the test bot says |
| mode | string | no | bot (bot-to-bot, default) or sip (SIP loopback) |
| consent_to_charge | boolean | yes | Must be true. Test calls cost 2× the normal rate |
| target_number | string | no | Override for the bot's caller id (E.164) |
The response is a Test call run object in status="queued". Poll until status becomes completed or failed; once call_id is set, load the transcript via GET /v1/calls/{call_id}/transcript.
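The polling step described above can be sketched in Python. This is a minimal sketch under stated assumptions: `fetch_run` is a hypothetical function you supply that performs the GET for a single run and returns the decoded JSON (the exact path is not shown on this page), and the run object is assumed to expose `status` and `call_id` as described above.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}  # terminal states named in the text above


def poll_run(fetch_run, run_id, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Call fetch_run(run_id) until the run reaches a terminal status.

    fetch_run is a hypothetical, caller-supplied function that performs the
    HTTP GET for one run and returns the decoded JSON as a dict.
    """
    waited = 0.0
    while True:
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if waited >= timeout_s:
            raise TimeoutError(f"run {run_id} still {run.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


def transcript_path(run):
    """Return the transcript path for a run, or None while call_id is unset."""
    call_id = run.get("call_id")
    return None if call_id is None else f"/v1/calls/{call_id}/transcript"
```

Passing `fetch_run` and `sleep` in as arguments keeps the loop independent of any HTTP client and lets you exercise it in tests without placing a billable call.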

Batches: parallel scenarios

Run N scenarios concurrently — useful for regression suites that hit every known edge case in parallel:
curl -X POST https://api.thunderphone.com/v1/test-call-batches \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_type":     "agent",
    "target_id":       12,
    "direction":       "outbound",
    "run_count":       5,
    "stagger_seconds": 2,
    "scenario_prompts": [
      "Ask about refund policy.",
      "Ask for hours of operation.",
      "Complain about a delayed shipment.",
      "Ask to speak with a human.",
      "Ask an unrelated trivia question."
    ],
    "consent_to_charge": true
  }'
The response carries a run_ids list of child run ids. To fetch the batch status:
curl https://api.thunderphone.com/v1/test-call-batches/{batch_id} \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY"
run_count is capped at 20. stagger_seconds (0–60 s) spaces out the spawning of runs so they do not all hit the agent at once.
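Checking the limits locally before sending a batch avoids a round trip for an invalid request. The helper below is a rough sketch, not part of any official SDK: `build_batch_payload` is a hypothetical name, the lower bound of 1 on `run_count` is an assumption, and defaulting `run_count` to one run per prompt simply mirrors the example above.

```python
MAX_RUN_COUNT = 20        # cap stated above
MAX_STAGGER_SECONDS = 60  # upper bound stated above


def build_batch_payload(target_id, scenario_prompts, *, consent_to_charge,
                        stagger_seconds=2, target_type="agent",
                        direction="outbound", run_count=None):
    """Build a JSON-serializable body for POST /v1/test-call-batches."""
    if consent_to_charge is not True:
        # The API requires explicit consent because test calls are billed.
        raise ValueError("consent_to_charge must be True")
    if run_count is None:
        # Assumption: one run per prompt, as in the example request.
        run_count = len(scenario_prompts)
    if not 1 <= run_count <= MAX_RUN_COUNT:
        raise ValueError(f"run_count must be between 1 and {MAX_RUN_COUNT}")
    if not 0 <= stagger_seconds <= MAX_STAGGER_SECONDS:
        raise ValueError(f"stagger_seconds must be between 0 and {MAX_STAGGER_SECONDS}")
    return {
        "target_type": target_type,
        "target_id": target_id,
        "direction": direction,
        "run_count": run_count,
        "stagger_seconds": stagger_seconds,
        "scenario_prompts": list(scenario_prompts),
        "consent_to_charge": True,
    }
```

Serialize the returned dict with `json.dumps` and send it with the same headers as the curl example.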

Hook it into CI

The test-call.completed webhook fires once per run. Subscribe to it and fail your CI job if any run returns status: "failed" or scores below your threshold on the companion call.graded event:
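# Sketch of a CI integration using FastAPI. verify() and trigger_ci_failure()
# are placeholders: supply your own signature check and CI failure hook.
import json

from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
MIN_SCORE = 0.8  # fail the build when a grade falls below this score

@app.post("/thunderphone-hook")
async def hook(request: Request):
    body = await request.body()
    if not verify(request.headers, body):  # reject unsigned or tampered payloads
        raise HTTPException(status_code=401)
    event = json.loads(body)
    if event["type"] == "test-call.completed":
        run = event["data"]["test_call_run"]
        if run["status"] != "completed":
            trigger_ci_failure(run)
    elif event["type"] == "call.graded" and event["data"]["grade"]["score"] < MIN_SCORE:
        trigger_ci_failure(event["data"])
    return {"received": True}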
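If the CI job collects webhook events and decides pass or fail at the end, the same rules can be written as a pure function that is easy to unit test. The following is a sketch only: the event shapes are taken from the handler above, the `id` field on a run is an assumption, and `find_failures` and `ci_exit_code` are hypothetical names.

```python
def find_failures(events, min_score=0.8):
    """Return human-readable reasons why a set of webhook events should fail CI."""
    failures = []
    for event in events:
        kind = event.get("type")
        data = event.get("data", {})
        if kind == "test-call.completed":
            run = data.get("test_call_run", {})
            if run.get("status") != "completed":
                # "id" is an assumed field name, used only for the message.
                failures.append(f"run {run.get('id')} ended with status {run.get('status')!r}")
        elif kind == "call.graded":
            score = data.get("grade", {}).get("score")
            if score is not None and score < min_score:
                failures.append(f"score {score} is below threshold {min_score}")
    return failures


def ci_exit_code(events, min_score=0.8):
    """Map the failure list to a process exit code: 0 passes, 1 fails the build."""
    return 1 if find_failures(events, min_score) else 0
```

A CI step could call `sys.exit(ci_exit_code(events))` once all expected events have arrived.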

Patterns

Per-prompt regression corpus

Maintain a JSON file of {name, scenario_prompt, expected_outcome} entries. On every prompt change, run the full set as a batch, then diff the transcripts and grades against the previous run.
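One way to implement the grade diff is sketched below. It assumes you keep your own mapping from scenario name to score for each run (how runs map back to corpus entries is not specified here), and `load_corpus` and `find_regressions` are hypothetical helper names.

```python
import json

REQUIRED_KEYS = {"name", "scenario_prompt", "expected_outcome"}


def load_corpus(text):
    """Parse the corpus JSON and check that every entry has the expected keys."""
    entries = json.loads(text)
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"corpus entry {entry.get('name')!r} is missing {sorted(missing)}")
    return entries


def find_regressions(previous, current, tolerance=0.0):
    """Compare two {scenario name: score} maps.

    Returns {name: (old_score, new_score)} for every scenario whose score
    dropped by more than `tolerance`. Scenarios absent from `previous`
    have no baseline and are skipped.
    """
    dropped = {}
    for name, new_score in current.items():
        old_score = previous.get(name)
        if old_score is not None and old_score - new_score > tolerance:
            dropped[name] = (old_score, new_score)
    return dropped
```

The `scenario_prompt` values from `load_corpus` can be passed as the `scenario_prompts` list of a batch request. A small `tolerance` helps absorb run-to-run noise in grading.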

Per-release smoke test

Run a single batch of five happy-path scenarios after every deploy. This check is latency-sensitive, so set stagger_seconds to 0.

Latency benchmarking

Run identical scenarios against different product tiers (spark, bolt, storm-base). Compare the call.graded scores and the duration_seconds from each resulting call log.
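The comparison can be reduced to a per-tier summary. The sketch below assumes you have already flattened each result into a dict with `tier`, `score`, and `duration_seconds` keys; that record shape and the `summarize_by_tier` name are illustrative, not an API response format.

```python
from statistics import mean, median


def summarize_by_tier(results):
    """Group result records by tier and summarize score and call duration.

    Each record is a dict with "tier", "score", and "duration_seconds".
    """
    grouped = {}
    for record in results:
        grouped.setdefault(record["tier"], []).append(record)

    summary = {}
    for tier, rows in grouped.items():
        summary[tier] = {
            "runs": len(rows),
            "mean_score": mean(r["score"] for r in rows),
            "median_duration_seconds": median(r["duration_seconds"] for r in rows),
        }
    return summary
```

The median is used for duration so that a single unusually long call does not dominate the comparison.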

Next steps

Test calls reference

Every query parameter, status code, and batch shape.

AI grading

Auto-score every test run to track quality over time.

Issue reports

Flag specific tests for human review.

test-call.completed webhook

Stream results into your CI / Slack / PagerDuty.