V2 Pricing (RL)
Reinforcement-learning policy. Read persisted runs, recompute on demand, retrain, or inspect the active policy.
Read persisted RL recommendations (latest by default)
Returns recommendations from a persisted run, filtered to `[start_date, end_date]`. `runs_ago` selects the run by recency (`0` = latest, the default; `1` = the run before it). Use `GET /v2/runs` to see how many runs exist. Returns `404 no_run_at_offset` when no run exists at that offset.
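As a minimal sketch of how a client might address one persisted run, the helper below builds the request URL. It assumes that `start_date`, `end_date`, and `runs_ago` are passed as query parameters and that dates are ISO 8601 strings; neither is confirmed here. `BASE_URL` and `recommendations_url` are hypothetical names.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def recommendations_url(start_date: date, end_date: date, runs_ago: int = 0) -> str:
    """Build a GET /v2/recommendations URL for one persisted run.

    Assumes the three filters are query parameters with ISO 8601 dates.
    """
    if runs_ago < 0:
        raise ValueError("runs_ago must be 0 (latest) or a positive offset")
    if start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    query = urlencode(
        {
            "start_date": start_date.isoformat(),
            "end_date": end_date.isoformat(),
            "runs_ago": runs_ago,
        }
    )
    return f"{BASE_URL}/v2/recommendations?{query}"
```

Calling `recommendations_url(date(2024, 1, 1), date(2024, 1, 7), runs_ago=1)` targets the run before the latest one. A `404 no_run_at_offset` response to that URL means fewer than two runs exist.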
Compute V2 (RL) recommendations on demand
Invokes the RL orchestrator synchronously. This call is expensive: prefer `GET /v2/recommendations` and rely on the nightly batch. Set `dry_run=true` to compute without persisting.
List persisted RL runs (newest first)
Returns run metadata so you can pick a `runs_ago` offset for `GET /v2/recommendations`.
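Because the list is ordered newest first, a run's position in it can serve as its `runs_ago` offset. The sketch below illustrates that mapping. The response shape (a list of objects with a `run_id` field) is an assumption, and `runs_ago_for` is a hypothetical helper name.

```python
from typing import Any, Mapping, Optional, Sequence


def runs_ago_for(
    runs: Sequence[Mapping[str, Any]],
    run_id: str,
    id_key: str = "run_id",  # assumed field name; adjust to the real payload
) -> Optional[int]:
    """Return the runs_ago offset of run_id in a newest-first run list.

    Returns None when the run is not in the list.
    """
    for offset, run in enumerate(runs):
        if run.get(id_key) == run_id:
            return offset
    return None


# Illustrative payload only; real run metadata will differ.
example_runs = [
    {"run_id": "run-c"},  # newest, runs_ago = 0
    {"run_id": "run-b"},  # runs_ago = 1
    {"run_id": "run-a"},  # runs_ago = 2
]
```

Offsets are relative to the newest run, so they shift whenever a new run is persisted. Resolve the offset shortly before calling `GET /v2/recommendations` rather than caching it.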
Train a fresh RL policy artifact for a company
Training runs synchronously and typically takes 30 seconds to 5 minutes, so set your client timeout accordingly. The fresh artifact is persisted to GCS as the **latest** policy. Set `auto_approve=true` to also promote it to **approved** in the same call.
Promote a trained policy artifact to the approved slot
Use after training with `auto_approve=false`. The `model_id` must match the current latest artifact; otherwise the call returns `409 model_id_stale` (read `GET /v2/model` for the current id).
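One way a client might handle `409 model_id_stale` is to re-read the current id and try again. The sketch below takes the two HTTP calls as injected functions, since the promote endpoint's path and payload are not given here. `promote_latest`, `get_latest_model_id`, and `promote` are hypothetical names, and `promote` is assumed to return the HTTP status code.

```python
from typing import Callable


def promote_latest(
    get_latest_model_id: Callable[[], str],  # e.g. wraps GET /v2/model
    promote: Callable[[str], int],           # wraps the promote call, returns HTTP status
    max_attempts: int = 3,
) -> str:
    """Promote the current latest artifact, re-reading the id after a 409.

    Returns the model_id that was promoted.
    """
    for _ in range(max_attempts):
        model_id = get_latest_model_id()
        status = promote(model_id)
        if status == 409:
            # model_id_stale: a newer artifact became latest; re-read and retry.
            continue
        if 200 <= status < 300:
            return model_id
        raise RuntimeError(f"promotion failed with HTTP {status}")
    raise RuntimeError("model_id was still stale after all attempts")
```

Retrying this way approves whichever artifact is latest at the time of the retry, which may not be the one that was reviewed. Where that matters, pass `max_attempts=1` and surface the 409 to the caller instead.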
Inspect the active V2 policy
Reports which artifact backs RL inference for the company: `approved`, `latest`, or `rule` (rule-based fallback).