- Pre-deploy smoke tests after every prompt edit
- Regression suites wired into CI (hook the test-call.completed webhook → fail the build if score drops)
- Stress-testing concurrency limits
One-shot: single run
| Field | Type | Required | Description |
|---|---|---|---|
| target_type | string | yes | agent or phone_number |
| target_id | integer | yes | The agent id (or phone number id) |
| direction | string | yes | outbound (bot places) or inbound (bot answers) |
| scenario_prompt | string | no | Drives what the test bot says |
| mode | string | no | bot (bot-to-bot, default) or sip (SIP loopback) |
| consent_to_charge | boolean | yes | Must be true. Test calls cost 2× normal rate |
| target_number | string | no | Override for the bot’s caller id (E.164) |
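As a minimal sketch of assembling a request body from the fields above, the helper below validates the documented values before anything is sent. The function name `build_test_call_payload` is hypothetical, and the endpoint that accepts this body is not shown here, so the sketch stops at producing a JSON-ready dict.

```python
from typing import Any, Dict, Optional

# Allowed values, taken from the field table above.
VALID_TARGET_TYPES = {"agent", "phone_number"}
VALID_DIRECTIONS = {"outbound", "inbound"}
VALID_MODES = {"bot", "sip"}


def build_test_call_payload(
    target_type: str,
    target_id: int,
    direction: str,
    *,
    consent_to_charge: bool,
    scenario_prompt: Optional[str] = None,
    mode: Optional[str] = None,
    target_number: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented fields and return a JSON-ready dict.

    Hypothetical helper: only the field names and allowed values come
    from the table; the validation rules are a client-side convenience.
    """
    if target_type not in VALID_TARGET_TYPES:
        raise ValueError(f"target_type must be one of {sorted(VALID_TARGET_TYPES)}")
    if not isinstance(target_id, int) or isinstance(target_id, bool):
        raise ValueError("target_id must be an integer")
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {sorted(VALID_DIRECTIONS)}")
    if consent_to_charge is not True:
        # The caller must opt in explicitly; test calls cost 2x the normal rate.
        raise ValueError("consent_to_charge must be True")
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")

    payload: Dict[str, Any] = {
        "target_type": target_type,
        "target_id": target_id,
        "direction": direction,
        "consent_to_charge": True,
    }
    # Optional fields are sent only when set, so server-side defaults apply.
    if scenario_prompt is not None:
        payload["scenario_prompt"] = scenario_prompt
    if mode is not None:
        payload["mode"] = mode
    if target_number is not None:
        payload["target_number"] = target_number
    return payload
```

Making `consent_to_charge` a required keyword argument keeps the billing opt-in visible at every call site instead of hiding it inside the helper.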
The run is created with status="queued". Poll until status becomes completed or
failed; once call_id is set, load the transcript via
GET /v1/calls/{call_id}/transcript.
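The polling step could look roughly like the sketch below. It assumes you supply a `fetch_run` callable that returns the run as a dict with a `status` key (and `call_id` once set); how that callable reaches the API is left open, and the names `wait_for_run` and `transcript_path` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional

# Terminal states named in the text above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_run(
    fetch_run: Callable[[], Dict[str, Any]],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_run() until the run reaches a terminal status.

    Any status other than completed/failed is treated as still pending,
    since intermediate status names are not assumed here.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = fetch_run()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError("test run did not finish before the timeout")
        sleep(interval_seconds)


def transcript_path(run: Dict[str, Any]) -> Optional[str]:
    """Return the transcript path once call_id is set, else None."""
    call_id = run.get("call_id")
    if call_id is None:
        return None
    return f"/v1/calls/{call_id}/transcript"
```

Injecting `fetch_run` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise in a unit test.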
Batches: parallel scenarios
Run N scenarios concurrently, which is useful for regression suites that hit every known edge case in parallel. The batch response includes a run_ids list of child run ids, and the status of the batch as a whole can be fetched separately.
run_count is capped at 20; stagger_seconds (0–60 s) spaces out the spawns
to avoid hammering the agent.
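A client-side check of these limits might look like the sketch below. The lower bound of 1 on run_count and the even spacing in `spawn_offsets` (run i starting i × stagger_seconds after the first) are assumptions made for illustration; both function names are hypothetical.

```python
from typing import List

MAX_RUN_COUNT = 20        # documented cap on run_count
MAX_STAGGER_SECONDS = 60  # documented upper bound on stagger_seconds


def validate_batch_settings(run_count: int, stagger_seconds: int = 0) -> None:
    """Raise ValueError if the batch settings fall outside the documented limits."""
    if not 1 <= run_count <= MAX_RUN_COUNT:
        raise ValueError(f"run_count must be between 1 and {MAX_RUN_COUNT}")
    if not 0 <= stagger_seconds <= MAX_STAGGER_SECONDS:
        raise ValueError(f"stagger_seconds must be between 0 and {MAX_STAGGER_SECONDS}")


def spawn_offsets(run_count: int, stagger_seconds: int = 0) -> List[int]:
    """Estimate when each run spawns, in seconds after the first.

    Assumes evenly spaced spawns; the real scheduler may behave differently.
    """
    validate_batch_settings(run_count, stagger_seconds)
    return [i * stagger_seconds for i in range(run_count)]
```

Under that spacing assumption, the last entry of `spawn_offsets` gives a rough lower bound on how long a batch takes before every run has even started.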
Hook it into CI
The test-call.completed webhook
fires once per run. Subscribe to it and fail your CI job if any run
returns status: "failed" or scores below your threshold on the
companion call.graded event:
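One way to express that gate is sketched below. The event shape is an assumption: each event is taken to be a dict with a `type` key plus `run_id`, `status`, and `score` fields, which may not match the real webhook payloads. The function name `find_ci_failures` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def find_ci_failures(events: Iterable[Dict[str, Any]], min_score: float) -> List[str]:
    """Collect human-readable reasons the CI job should fail.

    Assumed (not documented) event fields: type, run_id, status, score.
    """
    failures: List[str] = []
    for event in events:
        kind = event.get("type")
        run_id = event.get("run_id")
        if kind == "test-call.completed" and event.get("status") == "failed":
            failures.append(f"run {run_id}: status failed")
        elif kind == "call.graded":
            score = event.get("score")
            # Events without a score are ignored rather than treated as failures.
            if score is not None and score < min_score:
                failures.append(f"run {run_id}: score {score} below {min_score}")
    return failures
```

In a CI script you would typically print the returned reasons and exit non-zero when the list is non-empty.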
Patterns
Per-prompt regression corpus
Maintain a JSON file of {name, scenario_prompt, expected_outcome}
tuples. On every prompt change, run the full set as a batch; diff the
transcripts and grades against the previous run.
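The corpus-and-diff pattern can be sketched as follows. It assumes grades are numeric scores keyed by scenario name, which is an illustration rather than a documented format; `load_corpus` and `diff_grades` are hypothetical helpers.

```python
import json
from typing import Dict, List

REQUIRED_KEYS = {"name", "scenario_prompt", "expected_outcome"}


def load_corpus(text: str) -> List[Dict[str, str]]:
    """Parse the corpus JSON and check each entry has the three expected keys."""
    entries = json.loads(text)
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"corpus entry missing keys: {sorted(missing)}")
    return entries


def diff_grades(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Return score deltas (current minus previous) for scenarios whose score changed.

    Scenarios that appear in only one of the two runs are skipped.
    """
    return {
        name: round(current[name] - previous[name], 6)
        for name in current
        if name in previous and current[name] != previous[name]
    }
```

A negative delta marks a scenario that got worse after the prompt change, which is usually the first transcript worth reading.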
Per-release smoke test
Run a single batch of five happy-path scenarios after every deploy. The check is latency-sensitive, so keep stagger_seconds: 0.
Latency benchmarking
Run identical scenarios against different product tiers (spark,
bolt, storm-base). Compare the call.graded scores and the
duration_seconds from each resulting call log.
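A comparison like this can be reduced to per-tier averages, as in the sketch below. It assumes you have already collected one record per run with `tier`, `score`, and `duration_seconds` keys; only `duration_seconds` is a name from the call log, while the other keys and the function name `summarize_by_tier` are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List


def summarize_by_tier(results: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Group run records by tier and average their score and duration."""
    grouped: Dict[str, List[Dict[str, Any]]] = {}
    for record in results:
        grouped.setdefault(record["tier"], []).append(record)
    return {
        tier: {
            "runs": len(rows),
            "mean_score": mean(row["score"] for row in rows),
            "mean_duration_seconds": mean(row["duration_seconds"] for row in rows),
        }
        for tier, rows in grouped.items()
    }
```

Averages hide outliers, so for a small number of runs per tier it may be worth looking at the individual durations as well.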
Next steps
Test calls reference
Every query parameter, status code, and batch shape.
AI grading
Auto-score every test run to track quality over time.
Issue reports
Flag specific tests for human review.
test-call.completed webhook
Stream results into your CI / Slack / PagerDuty.