Trajectory View

Evaluating policy trajectories just got easier with a graph viewer I built on Streamlit (streamlit.io). Storing every environment state is cumbersome and takes a lot of space, so instead I made every environment reproducible from a seed, the environment configuration, and the actions dictated by the policy. This keeps the footprint small and lets us inspect the environment and adapt visualizations after the experiments have run, without the need for costly re-runs.
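
To make the idea concrete, here is a minimal sketch of replaying a trajectory from a seed, a config, and a list of actions. `ToyEnv`, `Trajectory`, and `replay` are hypothetical names invented for this illustration and are not the viewer's actual code; the only assumption that matters is that all randomness in the environment comes from a generator seeded once at construction.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """Everything needed to reproduce a run (hypothetical container)."""
    seed: int
    env_config: dict
    actions: list


class ToyEnv:
    """Hypothetical stand-in environment: one integer value per node."""

    def __init__(self, config, seed):
        # All randomness flows through this seeded generator.
        self.rng = random.Random(seed)
        self.values = [self.rng.randint(0, 9) for _ in range(config["num_nodes"])]

    def step(self, action):
        # The acting node is drawn at random, but deterministically given the seed.
        node = self.rng.randrange(len(self.values))
        self.values[node] = (self.values[node] + action) % 10
        return list(self.values)


def replay(traj):
    """Rebuild every intermediate state instead of storing it."""
    env = ToyEnv(traj.env_config, traj.seed)
    states = [list(env.values)]
    for action in traj.actions:
        states.append(env.step(action))
    return states


traj = Trajectory(seed=42, env_config={"num_nodes": 5}, actions=[1, 3, 2])
states = replay(traj)
```

Calling `replay(traj)` twice yields identical state sequences, so only the three small inputs need to be kept after an experiment.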

The viewer is hosted at https://dap.bru.lu/view/. The payload (environment configuration, seed, and actions) is submitted as a compressed URL parameter. That is only background, though: you can just click the links ;)
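
For the curious, a payload like this could be packed into a URL-safe string roughly as follows. This is a sketch under assumptions: the viewer's real encoding and parameter name are not described here, so the JSON layout, the use of zlib, and the function names `encode_payload` and `decode_payload` are all hypothetical.

```python
import base64
import json
import zlib


def encode_payload(env_config, seed, actions):
    """Serialize, compress, and base64url-encode a payload (assumed scheme)."""
    raw = json.dumps(
        {"env_config": env_config, "seed": seed, "actions": actions},
        separators=(",", ":"),  # compact JSON keeps the URL short
    ).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_payload(token):
    """Invert encode_payload: base64url-decode, decompress, parse JSON."""
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(token)))


token = encode_payload({"num_nodes": 5}, 42, [1, 3, 2])
payload = decode_payload(token)
```

Because only the seed, config, and actions travel in the link, the string stays short enough to share even for long trajectories.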

Coloring scheme

Each node in the graph viewer carries information that helps you understand the policy's behavior. Hovering over a node shows its action at the current timestep, plus any other information that is available. The border and fill colors mean the following:

  • Nodes with a blue border are candidates for selection as the next acting node. This selection step introduces randomness and models the asynchronicity of the policy.
  • A node with a yellow/orange border represents the currently active node.
  • The node color itself indicates the correctness of a node and is a color gradient ranging from red (incorrect) to green (correct).
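
A red-to-green gradient like the one described for the node fill can be sketched as a linear interpolation. This assumes correctness is a score between 0 (incorrect) and 1 (correct); the function name `correctness_color` and the exact interpolation are illustrative and may differ from what the viewer does.

```python
def correctness_color(score):
    """Map a correctness score in [0, 1] to a hex color from red to green."""
    score = min(1.0, max(0.0, score))  # clamp out-of-range scores
    red = round(255 * (1 - score))
    green = round(255 * score)
    return f"#{red:02x}{green:02x}00"


incorrect = correctness_color(0.0)  # pure red
correct = correctness_color(1.0)    # pure green
```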

Controls

  • Hovering over a node reveals its action at the corresponding timestep and any other relevant information.
  • The slider at the top lets you change the timestep.
  • The slider at the bottom lets you change the speed of the animation.