
Thread: Creep block Bot using Reinforcement Learning

  1. #1

    Creep block Bot using Reinforcement Learning

    Hi guys
I tried training a creep-block agent using the Deep Deterministic Policy Gradient (DDPG) algorithm. Here is one of the runs (not the best):
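For anyone unfamiliar with DDPG, the core idea is a deterministic actor trained by following the critic's action-gradient. Below is a toy numpy sketch of just that update rule (not my actual agent): the critic here is a fixed, known quadratic with optimum a = 2s, so a linear actor a = theta * s should learn theta close to 2. In real DDPG the critic is a learned network trained by TD on replayed transitions.

```python
import numpy as np

def dq_da(s, a):
    # Gradient of the toy critic Q(s, a) = -(a - 2s)^2 with respect to the action.
    return -2.0 * (a - 2.0 * s)

def train_actor(steps=2000, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0  # actor parameter: mu(s) = theta * s
    for _ in range(steps):
        s = rng.uniform(0.5, 1.5)  # sampled state
        a = theta * s              # deterministic action, no sampling
        # Deterministic policy gradient: dJ/dtheta = dQ/da * dmu/dtheta
        theta += lr * dq_da(s, a) * s
    return theta

print(round(train_actor(), 3))  # converges to ~2.0
```

The key contrast with a stochastic policy gradient is that no action distribution is sampled; the gradient flows straight through the deterministic action.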
I was inspired by BeyondGodlikeBot's post last August and derived much of the training environment setup from him. There are a couple of differences in our approaches:
    • He uses a stochastic policy, whereas I'm using a deterministic policy.
• He uses a hard-coded policy to speed up learning. This hard-coded policy is quite simple (move 100 units in front of the farthest-ahead creep) but still works quite well. What ends up happening is that the agent learns to mimic this hard-coded policy instead of exploring for itself.
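For reference, that hard-coded policy can be sketched in a few lines. The data structures here are placeholders (creep positions and the lane direction would come from the actual game state), but the logic matches the description: target a point 100 units in front of the farthest-ahead creep.

```python
# Hypothetical sketch of the hard-coded blocking policy described above.
# "Farthest ahead" is taken as the largest projection onto the lane direction.

def hardcoded_block_target(creep_positions, lane_direction):
    """creep_positions: list of (x, y); lane_direction: unit vector toward the enemy tower."""
    dx, dy = lane_direction
    lead = max(creep_positions, key=lambda p: p[0] * dx + p[1] * dy)
    # Move 100 units in front of the lead creep, along the lane direction.
    return (lead[0] + 100.0 * dx, lead[1] + 100.0 * dy)

creeps = [(0.0, 0.0), (150.0, 40.0), (90.0, -20.0)]
direction = (1.0, 0.0)  # toy example: lane running along +x
print(hardcoded_block_target(creeps, direction))  # (250.0, 40.0)
```

It's easy to see why an agent trained to imitate this converges fast but never discovers anything beyond it.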

I intentionally did not use any hard-coded policy and instead allowed the agent to explore randomly by itself. This results in slower learning, but at least the agent isn't biased by what we humans might feel is the 'right' way to block creeps.
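Since DDPG's policy is deterministic, exploration has to come from noise added on top of the action. The post doesn't say which noise process was used; a common choice (from the original DDPG paper) is temporally correlated Ornstein-Uhlenbeck noise, sketched below.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: a mean-reverting random walk, so consecutive
    exploration offsets are correlated rather than independent white noise."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(dim, mu)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1)
        dx = self.theta * (self.mu - self.state) + self.sigma * self.rng.standard_normal(self.state.shape)
        self.state = self.state + dx
        return self.state

noise = OUNoise(dim=2)
action = np.array([0.5, -0.3])          # deterministic actor output
noisy_action = action + noise.sample()  # exploratory action sent to the game
print(noisy_action.shape)  # (2,)
```

The correlation matters for a movement task like creep blocking: uncorrelated noise tends to jitter in place, while OU noise produces sustained pushes in one direction.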
This is still my first iteration of the agent, and it was quite slow to train. I plan to speed up learning considerably and try other exploration algorithms, and maybe even other learning algorithms. I also plan to make a clean openai-gym-like environment setup (gym-dota?) that I can put on GitHub, so other people can train their agents on the same environment and even add more environments.
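To make the gym-dota idea concrete, here is a rough sketch of what a gym-style creep-block environment interface could look like. Nothing here is a real API; the class name, observation layout, and reward are all illustrative placeholders, following the standard reset()/step() convention.

```python
# Hypothetical gym-like environment skeleton for creep blocking.
# Real observations/rewards would be wired to the game; these are stubs.

class CreepBlockEnv:
    """Gym-style interface: reset() -> obs, step(action) -> (obs, reward, done, info)."""

    def __init__(self, max_steps=100):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self._observe()

    def step(self, action):
        self.t += 1
        obs = self._observe()
        reward = 0.0  # e.g. how long the wave is held behind its usual position
        done = self.t >= self.max_steps
        return obs, reward, done, {}

    def _observe(self):
        # Placeholder: hero position plus creep positions would go here.
        return [0.0] * 8

env = CreepBlockEnv(max_steps=3)
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(action=[0.0, 0.0])
print(len(obs))  # 8
```

Standardizing on this interface is what would let different people plug different agents (DDPG or otherwise) into the same environment.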

  2. #2

    The result looks pretty good! How did you manage to reset the waves after they get past the tower?

  3. #3
Thanks! Right now I'm spawning each creep manually around the mid-lane spawn point and then force-killing it. You could also issue console commands for spawning and killing creep waves ('dota_spawn_creeps_mid' and 'dota_kill_creeps all', AFAIK).
