V2 Pricing (RL)

Endpoints for the reinforcement-learning (RL) pricing policy: read persisted runs, recompute on demand, retrain, promote a trained artifact, or inspect the active policy.

📄️Read persisted RL recommendations (latest by default)

Returns recommendations from a persisted run, filtered to `[start_date, end_date]`. `runs_ago` selects the run by recency (`0` = latest, the default; `1` = the run before it). Use `GET /v2/runs` to see how many runs exist. Returns `404 no_run_at_offset` when no run exists at that offset.
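
As a rough illustration of the parameters above, the following sketch builds the request URL. It assumes that `start_date`, `end_date`, and `runs_ago` are passed as query parameters and that dates are ISO-formatted strings; neither is confirmed here. `recommendations_url` and the base URL are hypothetical names.

```python
from urllib.parse import urlencode


def recommendations_url(base_url: str, start_date: str, end_date: str,
                        runs_ago: int = 0) -> str:
    """Build a GET /v2/recommendations URL for a date window and run offset.

    Assumption: all three filters travel as query parameters.
    """
    if runs_ago < 0:
        raise ValueError("runs_ago must be 0 (latest) or a positive offset")
    query = urlencode({
        "start_date": start_date,
        "end_date": end_date,
        "runs_ago": runs_ago,  # 0 = latest run, 1 = the run before it
    })
    return f"{base_url.rstrip('/')}/v2/recommendations?{query}"
```

For example, `recommendations_url("https://api.example.com", "2024-01-01", "2024-01-31", runs_ago=1)` targets the run before the latest one.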

📄️Compute V2 (RL) recommendations on demand

Invokes the RL orchestrator synchronously. This call is expensive, so prefer `GET /v2/recommendations` and rely on the nightly batch. Set `dry_run=true` to compute without persisting.
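
One way a client might follow this advice is to read the persisted run first and compute on demand only when no run exists. The sketch below is a minimal illustration, not the service's prescribed flow. The two callables are hypothetical wrappers around your HTTP client, and the `{"error": "no_run_at_offset"}` body shape is an assumption.

```python
from typing import Callable, Tuple

# (HTTP status, decoded JSON body) as returned by the caller's HTTP wrapper.
Response = Tuple[int, dict]


def recommendations_with_fallback(
    read_persisted: Callable[[], Response],
    compute: Callable[[bool], Response],
) -> dict:
    """Prefer the persisted read; compute on demand only if no run exists."""
    status, body = read_persisted()
    if status == 200:
        return body
    # Assumption: the 404 body carries the error code under an "error" key.
    if status == 404 and body.get("error") == "no_run_at_offset":
        # dry_run=True computes without persisting a new run.
        status, body = compute(True)
        if status == 200:
            return body
    raise RuntimeError(f"recommendations unavailable (HTTP {status})")
```

Passing the HTTP calls in as functions keeps the decision logic independent of any particular client library.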

📄️List persisted RL runs (newest first)

Returns run metadata so you can pick a `runs_ago` offset for `GET /v2/recommendations`.
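
Because the list is ordered newest first, a run's position in it corresponds to its `runs_ago` offset. A minimal sketch of that lookup follows; the `run_id` field name is hypothetical, since the metadata schema is not described here.

```python
from typing import Mapping, Sequence


def runs_ago_for(runs: Sequence[Mapping], run_id: str) -> int:
    """Return the runs_ago offset of a run in a newest-first list.

    Assumption: each metadata entry exposes its identifier as "run_id".
    """
    for offset, run in enumerate(runs):
        if run["run_id"] == run_id:
            return offset  # index 0 is the latest run
    raise LookupError(f"run {run_id!r} not found in the listed runs")
```

Offsets are relative to the newest run, so an offset computed before a new run is persisted will point one run later afterwards.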

📄️Train a fresh RL policy artifact for a company

Training runs synchronously (typically 30 seconds to 5 minutes; set your client timeout accordingly). The fresh artifact is persisted to GCS as the **latest** policy. Set `auto_approve=true` to also promote it to **approved** in the same call.
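
The timeout advice could be applied as in the sketch below, which uses only the Python standard library. The training URL is supplied by the caller because its path is not stated here. Sending `company_id` and `auto_approve` in a JSON body is an assumption, as are the names `build_train_request` and `TRAIN_TIMEOUT_SECONDS`.

```python
import json
from urllib.request import Request

# Six minutes: a margin above the 5-minute upper end of typical training time.
TRAIN_TIMEOUT_SECONDS = 6 * 60


def build_train_request(train_url: str, company_id: str,
                        auto_approve: bool = False) -> Request:
    """Prepare a POST request that triggers synchronous training.

    Assumption: the endpoint accepts a JSON body with these two fields.
    """
    payload = json.dumps({
        "company_id": company_id,      # hypothetical field name
        "auto_approve": auto_approve,  # True also promotes to approved
    }).encode("utf-8")
    return Request(
        train_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(request, timeout=TRAIN_TIMEOUT_SECONDS)` so the client does not give up before training finishes.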

📄️Promote a trained policy artifact to the approved slot

Use after training with `auto_approve=false`. The `model_id` must match the current latest artifact; otherwise the call returns `409 model_id_stale` (read `GET /v2/model` for the current id).
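
A client might handle the stale-id case roughly as follows. This is a sketch under assumptions: `fetch_model_id` stands for a wrapper that reads the current id from `GET /v2/model`, and `promote` for a wrapper that calls the promote endpoint and returns its HTTP status. Both are hypothetical helpers.

```python
from typing import Callable, Tuple


def promote_latest(
    fetch_model_id: Callable[[], str],
    promote: Callable[[str], int],
) -> Tuple[str, str]:
    """Try to promote the current latest artifact exactly once.

    Returns ("promoted", model_id) on success, or ("stale", current_id)
    when the server answers 409 because a newer artifact now exists.
    """
    model_id = fetch_model_id()
    status = promote(model_id)
    if 200 <= status < 300:
        return ("promoted", model_id)
    if status == 409:
        # Re-read the id so the caller can decide what to do with it.
        return ("stale", fetch_model_id())
    raise RuntimeError(f"promotion failed (HTTP {status})")
```

The sketch deliberately does not retry on `409`: a stale id means a newer artifact was trained in the meantime, and promoting it automatically would approve a policy nobody has reviewed.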

📄️Inspect the active V2 policy

Reports which artifact backs RL inference for the company: `approved`, `latest`, or `rule` (rule-based fallback).
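
Client code may want to branch on the three reported values, for example to warn when inference has dropped to the rule-based fallback. A minimal sketch, assuming the value arrives as a plain string (the response field that carries it is not specified here); `PolicySource` and `uses_learned_policy` are hypothetical names.

```python
from enum import Enum


class PolicySource(str, Enum):
    """The three values the endpoint is documented to report."""
    APPROVED = "approved"
    LATEST = "latest"
    RULE = "rule"  # rule-based fallback, no trained artifact in use


def uses_learned_policy(source: str) -> bool:
    """True when a trained artifact (approved or latest) backs inference."""
    # PolicySource(...) raises ValueError for any undocumented value.
    return PolicySource(source) is not PolicySource.RULE
```

Parsing into an enum makes an unexpected value fail loudly instead of being silently treated as one of the known cases.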