Trajectory View
Evaluating policy trajectories just got easier with a graph viewer I built with Streamlit (streamlit.io). Storing every environment state quickly becomes unwieldy, so instead each episode is made reproducible from three ingredients: the seed, the environment configuration, and the sequence of actions chosen by the policy. This keeps the footprint small and lets us view the environment and adapt visualizations after an experiment has finished, without costly re-runs.
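The reproducibility idea can be illustrated with a minimal sketch. `ToyEnv` and `replay` are hypothetical names, and the toy dynamics are an assumption, not the viewer's real environment. The sketch shows that if all randomness is drawn from one seeded generator, storing `(config, seed, actions)` is enough to rebuild every state.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a real environment: nodes holding integer values."""

    def __init__(self, num_nodes: int, seed: int):
        # Every random draw goes through this seeded RNG, so the same seed
        # always reproduces the same initial state.
        self.rng = random.Random(seed)
        self.values = [self.rng.randint(0, 9) for _ in range(num_nodes)]

    def step(self, node: int, action: int) -> None:
        # Deterministic transition: the acting node takes the action as its new value.
        self.values[node] = action


def replay(config: dict, seed: int, actions: list) -> list:
    """Rebuild every state of an episode from (config, seed, actions) alone."""
    env = ToyEnv(num_nodes=config["num_nodes"], seed=seed)
    states = [list(env.values)]  # state at timestep 0
    for node, action in actions:
        env.step(node, action)
        states.append(list(env.values))  # copy so later steps don't mutate it
    return states


# Only these three inputs need to be stored; every state can be rebuilt on demand.
config = {"num_nodes": 4}
actions = [(0, 3), (2, 7), (1, 3)]
trajectory = replay(config, seed=42, actions=actions)
assert trajectory == replay(config, seed=42, actions=actions)
```

The approach only works if the environment never draws randomness from elsewhere (the global `random` module, wall-clock time, and so on); any such source would break exact replay.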
The viewer is hosted at https://dap.bru.lu/view/. The payload (environment config, seed, and actions) is passed as a compressed URL parameter. This is just background, though: you can simply click the links ;)
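One plausible way to pack such a payload into a link is sketched below. The exact format the viewer expects is not described here, so the JSON + zlib + URL-safe base64 pipeline and the query parameter name `p` are assumptions chosen for illustration; `encode_payload`, `decode_payload`, and `make_link` are hypothetical helpers.

```python
import base64
import json
import zlib
from urllib.parse import parse_qs, urlencode, urlparse

VIEWER_URL = "https://dap.bru.lu/view/"


def encode_payload(config: dict, seed: int, actions: list) -> str:
    """Serialize, compress, and URL-safe encode the replay inputs."""
    raw = json.dumps({"config": config, "seed": seed, "actions": actions},
                     separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw, 9)).decode("ascii")


def decode_payload(token: str) -> dict:
    """Inverse of encode_payload."""
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8"))


def make_link(config: dict, seed: int, actions: list) -> str:
    # "p" is a hypothetical parameter name, not necessarily the one the viewer reads.
    return VIEWER_URL + "?" + urlencode({"p": encode_payload(config, seed, actions)})


link = make_link({"num_nodes": 4}, 42, [[0, 3], [2, 7], [1, 3]])
token = parse_qs(urlparse(link).query)["p"][0]
assert decode_payload(token)["seed"] == 42
```

Actions are written as lists rather than tuples because JSON round-trips tuples as lists; using lists from the start keeps the decoded payload equal to the input.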
Coloring scheme
The border and fill color of each node encode its role and state at the current timestep, which helps you understand the policy's behavior:
- Nodes with a blue border are candidates for the next acting node. The next actor is drawn at random from this set, which models the asynchronicity of the policy.
- A node with a yellow/orange border is the currently active node.
- The fill color indicates how correct a node is, on a gradient from red (incorrect) to green (correct).
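The coloring rules and the random choice of the next actor can be sketched as follows. This is an illustration under assumptions, not the viewer's code: the linear red-to-green interpolation, the gray default border, the uniform choice among candidates, and the names `correctness_to_color`, `node_style`, and `pick_next_actor` are all hypothetical.

```python
import random


def correctness_to_color(score: float) -> str:
    """Map a correctness score in [0, 1] to a hex color on a red-to-green gradient."""
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    red = round(255 * (1.0 - score))
    green = round(255 * score)
    return f"#{red:02x}{green:02x}00"


def node_style(node: int, correctness: dict, active: int, candidates: set) -> dict:
    """Return border and fill colors for one node at one timestep."""
    if node == active:
        border = "orange"  # the active node takes precedence over candidacy
    elif node in candidates:
        border = "blue"
    else:
        border = "gray"  # assumed default for nodes with no special role
    return {"border": border, "fill": correctness_to_color(correctness[node])}


def pick_next_actor(candidates: set, rng: random.Random) -> int:
    """Draw the next acting node uniformly at random from the candidates."""
    # Sorting first makes the draw depend only on the RNG state, not on set order.
    return rng.choice(sorted(candidates))


rng = random.Random(42)
candidates = {1, 2, 5}
active = pick_next_actor(candidates, rng)
style = node_style(2, {2: 1.0}, active=0, candidates=candidates)
```

Passing a seeded `random.Random` into `pick_next_actor` keeps the asynchronous selection reproducible, which is what allows the replay from seed, config, and actions to reconstruct which node acted at each step.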
Controls
- Hovering over a node shows its action at the current timestep, along with any other available information.
- The slider at the top sets the timestep.
- The slider at the bottom sets the animation speed.