Manual Evaluation

Interactive playground

This page runs the OpenEnv flow in the browser. Reset starts one evaluation episode, Step submits one agent answer, and the result shows the reward returned by the grader.

1. Start / Reset Environment

Starts a new incident episode and returns the observation. No grading happens yet.

2. Read Observation

Check the incident, expected field, allowed values, and context.

3. Submit Step

Send one answer. The backend grades it and prints a terminal log.
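The three steps above can be sketched as a small in-memory loop. This is only an illustration of the reset-then-step shape, not the playground backend or the OpenEnv API: `ToyIncidentEnv`, `Observation`, `StepResult`, the sample incident, and the rule that one answer ends the episode are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    # Fields mirror what the page says to check after a reset.
    incident: str
    expected_field: str
    allowed_values: List[str]
    context: Dict[str, str] = field(default_factory=dict)


@dataclass
class StepResult:
    submitted: str
    reward: float
    done: bool


class ToyIncidentEnv:
    """Hypothetical stand-in for the backend; all data here is made up."""

    def __init__(self) -> None:
        self._truth: Optional[str] = None  # None means no active episode

    def reset(self) -> Observation:
        # Reset starts an episode and returns the observation. No grading yet.
        self._truth = "high"
        return Observation(
            incident="Checkout API error rate above 20%",
            expected_field="severity",
            allowed_values=["low", "medium", "high"],
            context={"affected_service": "checkout"},
        )

    def step(self, answer: str) -> StepResult:
        # Step submits one answer and returns the graded result.
        if self._truth is None:
            raise RuntimeError("call reset() before step()")
        reward = 1.0 if answer == self._truth else 0.0
        self._truth = None  # assumption: one answer ends the episode
        return StepResult(submitted=answer, reward=reward, done=True)


if __name__ == "__main__":
    env = ToyIncidentEnv()
    obs = env.reset()
    print(obs.expected_field, obs.allowed_values)
    print(env.step("high"))
```

Calling `step()` before `reset()` raises an error in this sketch, which matches the page's ordering of starting a session before submitting an action.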

Step 1

Start a session

Session: Not started

Pick a preset or enter a task and ticket manually.

Step 2

Submit an action

The playground automatically maps your choice to `severity`, `root_cause`, or `action`. If you choose a known ticket, it also sets the matching task type for you.
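One way to picture this mapping is a pair of lookup tables. Only the field names `severity`, `root_cause`, and `action` come from the page; the task-type names, the ticket IDs, and the `build_action` helper below are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical task-type names mapped to the three fields named on the page.
TASK_TO_FIELD: Dict[str, str] = {
    "severity_classification": "severity",
    "root_cause_analysis": "root_cause",
    "remediation": "action",
}

# Hypothetical registry of known tickets and their matching task type.
KNOWN_TICKETS: Dict[str, str] = {
    "TICKET-001": "severity_classification",
    "TICKET-002": "root_cause_analysis",
}


def build_action(
    task_type: str, value: str, ticket_id: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return the effective task type and a one-field action payload."""
    # A known ticket overrides whatever task type was selected.
    if ticket_id in KNOWN_TICKETS:
        task_type = KNOWN_TICKETS[ticket_id]
    if task_type not in TASK_TO_FIELD:
        raise ValueError(f"unknown task type: {task_type!r}")
    return task_type, {TASK_TO_FIELD[task_type]: value}


if __name__ == "__main__":
    print(build_action("remediation", "restart_service"))
    print(build_action("remediation", "high", ticket_id="TICKET-001"))
```

In this sketch the known ticket takes precedence over the manually chosen task, so the payload always carries the field that the ticket expects.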

Incident: --

Expected field: --

Reward: --

Status: Waiting

Observation Brief

What the agent sees

Start a session to load the incident alert.

Task: --

Difficulty: --

Expected field: --

Allowed values

Start a session first

Context signals

No context loaded yet.

Grader Result

What the reward means

Waiting for step

Submitted answer: --

Ground truth: --

Reward: --

Reason

Submit a step to see the deterministic grader explanation.
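A deterministic grader returns the same reward and reason for the same pair of inputs. The sketch below shows one possible rule, an exact match after trimming whitespace and lowercasing, with a reward of 1.0 or 0.0. The page does not state the actual scoring rule, so the `grade` function, the normalization, and the reward values are assumptions.

```python
from typing import Dict, Union


def grade(submitted: str, ground_truth: str) -> Dict[str, Union[str, float]]:
    """Hypothetical exact-match grader: no randomness, no external state."""
    normalized = submitted.strip().lower()
    expected = ground_truth.strip().lower()
    if normalized == expected:
        reward = 1.0
        reason = f"'{normalized}' matches the ground truth."
    else:
        reward = 0.0
        reason = f"'{normalized}' does not match the ground truth '{expected}'."
    # The keys mirror the panel fields: submitted answer, ground truth, reward, reason.
    return {
        "submitted_answer": normalized,
        "ground_truth": expected,
        "reward": reward,
        "reason": reason,
    }


if __name__ == "__main__":
    print(grade(" High ", "high"))
    print(grade("low", "high"))
```

Because the function depends only on its two arguments, repeating a submission reproduces the same result, which is what makes the explanation reproducible.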

Observation

Latest reset payload

No observation yet.

Result

Latest step payload

No step submitted yet.