
DAP-Animation

DAPv2 - Distributed Asynchronous Policy

Welcome to the home of DAPv2, a ground-breaking system for learning a distributed asynchronous policy, born out of my master's thesis and fuelled by my passion for Multi-Agent Reinforcement Learning (MARL).

My master's thesis involved building a system that uses MARL in a simulated environment to tackle two tasks: a distance task and graph coloring. These tasks were chosen to explore learning long dependencies and handling situations that lack ground truth labels. However, we ran into fundamental limitations of the computational systems. A major one was the absence of gradients in the messages sent, which made learning more challenging.
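To illustrate what learning without ground truth labels can look like for graph coloring, here is a minimal sketch of a label-free quality signal: the fraction of edges whose endpoints received different colors. The function names `count_conflicts` and `coloring_reward` and the reward definition are assumptions made for illustration, not the reward actually used in DAPv2.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def count_conflicts(edges: Iterable[Edge], colors: Dict[Hashable, int]) -> int:
    """Count edges whose two endpoints were assigned the same color."""
    return sum(1 for u, v in edges if colors[u] == colors[v])


def coloring_reward(edges: Iterable[Edge], colors: Dict[Hashable, int]) -> float:
    """Hypothetical reward: fraction of edges that are conflict-free.

    No reference coloring is needed; the graph structure alone
    determines how good an assignment is.
    """
    edge_list = list(edges)
    if not edge_list:
        return 1.0  # a graph without edges cannot contain a conflict
    return 1.0 - count_conflicts(edge_list, colors) / len(edge_list)


# A 4-cycle: 0-1-2-3-0
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
proper = {0: 0, 1: 1, 2: 0, 3: 1}    # alternating colors, no conflicts
clashing = {0: 0, 1: 0, 2: 1, 3: 1}  # edges (0, 1) and (2, 3) conflict

print(coloring_reward(cycle, proper))
print(coloring_reward(cycle, clashing))
```

A signal like this can be computed on any graph, which is why the task is useful for studying settings where no labeled solution exists.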

Undeterred, I took on this challenge and rewrote everything from scratch, this time with message gradients included. The result is a system that improves sample efficiency and training speed by a factor of approximately 10. We can now learn simple policies in just 2-4 hours! Moreover, the overhaul allows us to evaluate a learned policy on new graphs, irrespective of the graphs used during training. Restrictions such as a limited node degree or a maximum number of nodes are no longer required for evaluation. The only remaining hurdle is how well a learned policy generalizes, a problem I'm currently looking into.

Visualizations for evaluations on different graphs are in the pipeline and will be added soon, so stay tuned!

Visualizations

Part of the overhaul is a new visualization system that provides a graphical representation of the evaluation trajectories of learned policies. It logs every evaluation step during training and makes the logs accessible in your browser. The actions of every node are displayed as well.

Check out an example visualization here: DAP-Visualization

To get an overview, watch a video of the trajectory. For an explanation of the video viewer, please visit the docs.

About the Author

Lucas Brunner

Hello there, I'm Lucas Brunner, the mind behind DAPv2.

I encourage you to reach out to me if you have any questions or suggestions. I'm always eager to discuss my work and learn from others. Looking forward to hearing from you!