Iterating on an AI agent means iterating on its prompt, its tools, and the way it handles edge cases. The test-calls API runs real (bot-to-bot or SIP loopback) calls against an agent using a scenario prompt you supply — every run produces a real call log with transcript, grading, and billing, so you see exactly how the agent behaves and what it costs. Use it for:
  • Pre-deploy smoke tests after every prompt edit
  • Regression suites wired into CI (hook the test-call.completed webhook → fail the build if the score drops)
  • Stress-testing concurrency limits

One-shot: single run

curl -X POST https://api.thunderphone.com/v1/test-calls \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_type":     "agent",
    "target_id":       12,
    "direction":       "outbound",
    "scenario_prompt": "You are a polite caller asking about refund policy for order 12345.",
    "consent_to_charge": true
  }'
Fields:

| Field             | Type    | Required | Description                                      |
|-------------------|---------|----------|--------------------------------------------------|
| target_type       | string  | yes      | agent or phone_number                            |
| target_id         | integer | yes      | The agent id (or phone number id)                |
| direction         | string  | yes      | outbound (bot places) or inbound (bot answers)   |
| scenario_prompt   | string  | no       | Drives what the test bot says                    |
| mode              | string  | no       | bot (bot-to-bot, default) or sip (SIP loopback)  |
| consent_to_charge | boolean | yes      | Must be true. Test calls cost 2× the normal rate |
| target_number     | string  | no       | Override for the bot's caller id (E.164)         |
The response is a Test call run object in status="queued". Poll until status becomes completed or failed; once call_id is set, load the transcript via GET /v1/calls/{call_id}/transcript.
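The queue-then-poll flow above can be sketched in Python. The fetch callable is a stand-in for whatever GET request returns the run object (see the test calls reference for the exact endpoint), so the helper stays agnostic about the HTTP client you use:

```python
import time

TERMINAL = {"completed", "failed"}

def poll_run(fetch, run_id, interval=2.0, timeout=120.0):
    """Poll a test-call run until it reaches a terminal status.

    fetch: any callable taking the run id and returning the run object
    as a dict (e.g. a thin wrapper around your HTTP GET).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = fetch(run_id)
        if run["status"] in TERMINAL:
            return run
        time.sleep(interval)  # back off between polls
    raise TimeoutError(f"run {run_id} still pending after {timeout}s")
```

Once the returned run carries a call_id, fetch the transcript from GET /v1/calls/{call_id}/transcript as described above.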

Batches: parallel scenarios

Run N scenarios concurrently — useful for regression suites that hit every known edge case in parallel:
curl -X POST https://api.thunderphone.com/v1/test-call-batches \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_type":     "agent",
    "target_id":       12,
    "direction":       "outbound",
    "run_count":       5,
    "stagger_seconds": 2,
    "scenario_prompts": [
      "Ask about refund policy.",
      "Ask for hours of operation.",
      "Complain about a delayed shipment.",
      "Ask to speak with a human.",
      "Ask an unrelated trivia question."
    ],
    "consent_to_charge": true
  }'
Response carries a run_ids list of child run ids. Fetch batch status:
curl https://api.thunderphone.com/v1/test-call-batches/{batch_id} \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY"
run_count is capped at 20; stagger_seconds spaces out the spawn to avoid hammering the agent (0–60 s).
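A small client-side helper can enforce these limits before you POST. Deriving run_count from the prompt list mirrors the curl example above (5 prompts, run_count 5) but is an assumption, not a documented requirement:

```python
def batch_payload(target_id, prompts, stagger_seconds=0, direction="outbound"):
    """Build a /v1/test-call-batches request body, checking documented limits.

    Assumes one scenario prompt per run, so run_count = len(prompts).
    """
    if not 1 <= len(prompts) <= 20:
        raise ValueError("run_count is capped at 20")
    if not 0 <= stagger_seconds <= 60:
        raise ValueError("stagger_seconds must be between 0 and 60")
    return {
        "target_type": "agent",
        "target_id": target_id,
        "direction": direction,
        "run_count": len(prompts),
        "stagger_seconds": stagger_seconds,
        "scenario_prompts": list(prompts),
        "consent_to_charge": True,
    }
```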

Hook it into CI

The test-call.completed webhook fires once per run. Subscribe to it and fail your CI job if any run returns status: "failed" or scores below your threshold on the companion call.graded event:
# CI webhook receiver (FastAPI shown). verify() and trigger_ci_failure()
# are your own signature check and build-failure hooks.
import json

from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

THRESHOLD = 0.8  # minimum acceptable grade

@app.post("/thunderphone-hook")
async def hook(request: Request):
    body = await request.body()
    if not verify(body):  # reject unsigned or forged payloads
        raise HTTPException(status_code=401)
    event = json.loads(body)
    if event["type"] == "test-call.completed":
        run = event["data"]["test_call_run"]
        if run["status"] != "completed":
            trigger_ci_failure(run)
    elif event["type"] == "call.graded" and event["data"]["grade"]["score"] < THRESHOLD:
        trigger_ci_failure(event["data"])
    return {"ok": True}
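The verify(body) step is left to you. A common webhook-signing scheme is a hex HMAC-SHA256 of the raw request body compared in constant time; Thunderphone's actual header name and algorithm may differ, so treat this as a sketch and confirm against the webhook reference:

```python
import hashlib
import hmac

def verify_signature(body: bytes, header_sig: str, secret: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw body.

    Assumption: the provider signs the exact raw bytes with a shared
    secret and sends the hex digest in a header.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(expected, header_sig)
```

Always sign-check the raw bytes before json.loads, as the handler above does.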

Patterns

Per-prompt regression corpus

Maintain a JSON file of {name, scenario_prompt, expected_outcome} tuples. On every prompt change, run the full set as a batch; diff the transcripts and grades against the previous run.
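The diff step can be a few lines of Python, assuming you key each run's grade score by its scenario name (the {name: score} shape here is your own bookkeeping, not an API response):

```python
def find_regressions(previous, current, tolerance=0.0):
    """Return scenario names whose score dropped vs. the previous run.

    previous, current: dicts mapping scenario name -> grade score.
    tolerance: allowed drop before a scenario counts as a regression.
    """
    return sorted(
        name
        for name, score in current.items()
        if name in previous and previous[name] - score > tolerance
    )
```

Fail the CI job if find_regressions(...) returns a non-empty list.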

Per-release smoke test

A single batch of five happy-path scenarios you run after every deploy. Latency-sensitive, so keep stagger_seconds: 0.

Latency benchmarking

Run identical scenarios against different product tiers (spark, bolt, storm-base). Compare the call.graded scores and the duration_seconds from each resulting call log.
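A sketch of the comparison, assuming you tag each call log with the tier it ran against before aggregating (the tier key is your own annotation; duration_seconds comes from the call log):

```python
from statistics import mean

def tier_latency(call_logs):
    """Average duration_seconds per product tier.

    call_logs: iterable of dicts with your own "tier" tag and the
    call log's "duration_seconds".
    """
    by_tier = {}
    for log in call_logs:
        by_tier.setdefault(log["tier"], []).append(log["duration_seconds"])
    return {tier: mean(durations) for tier, durations in by_tier.items()}
```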

Next steps

Test calls reference

Every query parameter, status code, and batch shape.

AI grading

Auto-score every test run to track quality over time.

Issue reports

Flag specific tests for human review.

test-call.completed webhook

Stream results into your CI / Slack / PagerDuty.