Trajectory View

Evaluating policy trajectories just got easier with a graph viewer I built on Streamlit (streamlit.io). Storing every environment state is unwieldy and takes a lot of space, so instead each environment is made reproducible from three things: a seed, the environment configuration, and the actions chosen by the policy. This keeps the footprint small and makes it possible to view the environment and adapt visualizations after an experiment has run, without costly re-runs.
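To illustrate the idea, here is a minimal sketch of replaying a trajectory from a seed, a config, and a list of actions. The `ToyEnv` class and the `replay` function are hypothetical stand-ins, not the viewer's actual code; the only point is that a seeded random generator plus the recorded actions is enough to rebuild every intermediate state.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical stand-in environment: a list of nodes holding integers."""
    num_nodes: int
    seed: int

    def __post_init__(self):
        # All randomness flows through one seeded generator.
        self.rng = random.Random(self.seed)
        self.state = [self.rng.randint(0, 9) for _ in range(self.num_nodes)]

    def step(self, action):
        # A randomly chosen node takes the action (its new value).
        node = self.rng.randrange(self.num_nodes)
        self.state[node] = action
        return list(self.state)


def replay(config, seed, actions):
    """Rebuild every state of a trajectory from config, seed, and actions."""
    env = ToyEnv(seed=seed, **config)
    states = [list(env.state)]
    for action in actions:
        states.append(env.step(action))
    return states


# Two replays with the same inputs yield identical state sequences.
first = replay({"num_nodes": 5}, seed=42, actions=[1, 2, 3])
second = replay({"num_nodes": 5}, seed=42, actions=[1, 2, 3])
print(first == second)
```

Only the config, the seed, and the action list need to be stored; the states themselves are recomputed on demand.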

The viewer is hosted at https://dap.bru.lu/view/. The payload of environment config, seed, and actions is submitted as a compressed URL parameter. That is just background, though: you can simply click the links ;)
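For the curious, packing such a payload into a URL could look roughly like the sketch below. The exact encoding the viewer uses is not described here, so everything in it is an assumption: the JSON layout, the zlib plus URL-safe Base64 encoding, and the parameter name `payload` are all hypothetical. Links built this way are not expected to work with the hosted viewer.

```python
import base64
import json
import zlib
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://dap.bru.lu/view/"


def encode_payload(config, seed, actions):
    """Serialize to compact JSON, compress, and make the result URL-safe."""
    raw = json.dumps(
        {"config": config, "seed": seed, "actions": actions},
        separators=(",", ":"),
    ).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw, 9)).decode("ascii")


def decode_payload(token):
    """Inverse of encode_payload."""
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(token)))


def build_link(config, seed, actions):
    # "payload" is a hypothetical parameter name, not the viewer's real one.
    query = urlencode({"payload": encode_payload(config, seed, actions)})
    return f"{BASE_URL}?{query}"


link = build_link({"num_nodes": 5}, seed=42, actions=[1, 2, 3])
token = parse_qs(urlparse(link).query)["payload"][0]
print(decode_payload(token))
```

Compression pays off mainly for long action lists, which tend to be repetitive.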

Coloring scheme

Each node in the graph viewer is colored to help you understand the policy's behavior. Hovering over a node shows its action at the current timestep, along with any other available information.

The border and fill colors have the following meaning:

Nodes with a blue border are candidates for the next acting node. Picking among these candidates introduces randomness and models the asynchronicity of the policy.
A node with a yellow/orange border is the currently active node.
The fill color of a node indicates its correctness, on a gradient from red (incorrect) to green (correct).
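A red-to-green gradient like this can be sketched with a simple linear interpolation. The function below is only an illustration: it assumes correctness is a number between 0 (incorrect) and 1 (correct), and the function name and exact palette are hypothetical, so the viewer's actual colors may differ.

```python
def correctness_to_hex(correctness: float) -> str:
    """Map a correctness score in [0, 1] to a hex color from red to green."""
    # Clamp out-of-range scores so the color is always valid.
    c = min(1.0, max(0.0, correctness))
    red = round(255 * (1 - c))
    green = round(255 * c)
    return f"#{red:02x}{green:02x}00"


print(correctness_to_hex(0.0))  # fully incorrect: pure red
print(correctness_to_hex(1.0))  # fully correct: pure green
```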

Controls

Hovering over a node reveals its action at the current timestep and any other relevant information.
The slider at the top lets you change the timestep.
The slider at the bottom lets you change the speed of the animation.
