
  • pydota2

    While brushing up on Reinforcement Learning and Deep Learning, I have been combing through the approaches used by AlphaGo, AlphaGo Zero, and OpenAI (including their gym and baselines packages), as well as many articles from the MachineLearning subreddit, and I really want to see what I (or we as a group) can achieve.

    I looked at the approach being taken by DeepMind's StarCraft II team and have decided to try and replicate their framework within the Dota 2 universe.

    I only recently started this, so it is a very early work-in-progress project. I am half hesitant to even make this post now, but I decided to because I'm hopeful others will be willing to jump on board and help propel this forward with me.
    You can find it here:

    I have tried to put a lot of thoughts and forward-looking ideas into the README, so please read through it (it shows up as the text below the GitHub project at the link above). One piece is complete so far: the ability to pull down protobuf dumps from a bot game. A single replay (for Dire and Radiant from the same game) is uploaded.

    The initial goal is really to create the appropriate framework for ALL OF US to use in developing RL/ML approaches to Dota 2. Writing the framework is a critical step and a big one, but not an insurmountable one. Yes, we will find faults in the API made available by Valve, and hopefully Valve will be responsive in providing the functionality we need to interact with the game, game elements and replays as we implement these. Also, many members have already started writing, in isolation in their own private/public repos, many of the pieces I hope will become mainstream components of the framework. For actual RL/ML approaches to writing agents, people can do what they want (and collaborate as they want), just hopefully within the framework so that code sharing (if that's what they want) is easy.

    I haven't started a wiki or a task list yet for things that need to be worked on, and I have no time to do so until much later today (it's Halloween, after all).

    My personal plan is to devote at least 4-5 hrs per week to this project (yeah, I know it's not much, but I have been very busy the last few months and will continue to be for many more). Some weeks I might hit 10-20 hrs, but I cannot commit to that.

    Cheers (I'm out of time to write this post).
    Last edited by nostrademous; 10-31-2017, 10:53 AM.

  • #2
    Hint: create an org instead of it being a repository under your personal profile. Then you can easily add contributors and what not, dedicated repositories for various parts of the project etc.
    Troubleshooting crashes
    Dota 2 Resources: autoexec.cfg reference / benchmark.cfg / Tweak it yourself (launch options, cvars) / useful batch files
    No-Bling GlanceValue & FPS ++ NEW! Panorama hotkeys NEW! dota_primary_mm_language_override NEW TOGGLE_QUICKCAST_TP.bat NEW FIX LEGACY KEYS YOURSELF! NEW Toggle UI Animations OFF


    • #3
      Okay, I created an organization named 'pydota2' and updated the OP with the link.

      Added the first few issues with a milestone version 0.2 deadline.


      • #4
        Well, it took me maybe 100 hours to write a basic one in C++ with Python bindings.
        Efficiency is not that important, but it must be multithread-friendly.

        I spent a great deal of time on the algorithms; that's the really hard part QAQ. ----My ML bot work in progress


        • #5
          Okay, I modularized the protocol connections to the CMsgBotWorldState to their own class for easy re-use in self_play and human_play models later on (proto_connector). I tested this morning that the ( script still works and will record the Radiant & Dire replay binaries in the appropriate location. This was tested using Python 3.6, but it should work on 2.7 as well.

          I also started writing the ( to act as the server that Dota2 will connect to when sending CreateHTTPRequest POST messages from live bots. The design here is for them to not actually POST anything (although they could) but rather for the POST to act like a polling method and allow the server to send data to the live bots through responses to the polls (after all, we will be getting our world state information through the CMsgBotWorldState dumps in a separate thread). I can test this code later tonight, although no promises.
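A minimal sketch of the polling idea described above, using only the Python standard library. It is not the pydota2 implementation: the JSON field names (`PlayerID`, `Time`, `Data`), the `PENDING` outbox table and the port number are all assumptions made for illustration. Each POST from a bot is treated purely as a poll, and whatever the agent has queued for that player goes back in the response.

```python
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-player outboxes that an agent thread would fill.
PENDING = {}  # player_id -> queue.Queue of action dicts


def build_reply(post):
    """Build the reply for one poll; the bot's POST body carries no commands."""
    reply = {"status": 200, "Time": post.get("Time"), "Data": {}}
    outbox = PENDING.get(post.get("PlayerID"))
    if outbox is not None:
        try:
            reply["Data"] = outbox.get_nowait()  # hand over one queued action
        except queue.Empty:
            pass  # nothing queued: answer the poll with empty data
    return reply


class PollHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        post = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(build_reply(post)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever answering polls (port is an arbitrary example value)."""
    HTTPServer(("127.0.0.1", port), PollHandler).serve_forever()
```

Keeping `build_reply` separate from the handler means the reply logic can be exercised without opening a socket.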


          • #6
            I have read your write-up, and I think the approach you describe is very comprehensive and feasible, but I still have something to say. To be honest, I am also a novice in ML. I started learning DL two months ago and RL just a few days ago, so forgive me if my understanding is wrong.

            I notice that you want to represent abilities as hand-picked features (isStun, blink, mana, cd, duration, ...), but I don't think that is a good idea. You would need to collect and enter a lot of ability data, and that data can change with any Dota 2 update, so you would have to train again. Moreover, some abilities can't be captured by such simple features, such as Pudge's hook or Io's link. You can't extract every feature you would need to represent the abilities. In my opinion, we can just feed the ability through an embedding layer; about 100 dimensions is probably enough to represent an ability. It is just like Word2Vec in NLP, so we might call it Ability2Vec. In this network the input would be (observation, abilityVector), passed through the embedding layer and the strategy layers, and the output would be the action for the ability (use it or not, and where, on whom, and when). If my understanding is right, such a layer has three advantages:
            1) After good training we get good embedding vectors for abilities, and similar abilities end up close together (AM's blink and QoP's blink may have very similar vectors, just like synonyms in Word2Vec). More importantly, we get a strategy network that can be shared across all abilities (it may overfit to recently used abilities, but some tricks may fix that), so we don't need to train one per ability.
            2) Even if a Dota 2 update changes the ability data, the embedding vectors should in theory change only a little, so we can feed recent games into the old network to fine-tune the embeddings instead of training from scratch.
            3) The time spent extracting features is saved.
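The Ability2Vec idea above can be sketched as a plain lookup table, assuming nothing beyond the standard library. This only shows the data flow (ability name to dense vector, then concatenation with the observation); there is no training here, and the class name, the helper names and the three ability names are illustrative assumptions, not part of any existing code.

```python
import math
import random


class AbilityEmbedding:
    """Lookup table mapping ability names to dense vectors (untrained)."""

    def __init__(self, ability_names, dim=100, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.index = {name: i for i, name in enumerate(ability_names)}
        # Small random initial values; training would later adjust these rows.
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                      for _ in ability_names]

    def lookup(self, name):
        return self.table[self.index[name]]


def cosine(u, v):
    """Cosine similarity, the usual way to compare embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def policy_input(observation, ability_vec):
    """Concatenate observation features with the ability embedding."""
    return list(observation) + list(ability_vec)


emb = AbilityEmbedding(["antimage_blink", "queenofpain_blink", "pudge_meat_hook"])
x = policy_input([0.5, 0.2], emb.lookup("antimage_blink"))
```

After training, one would expect `cosine` between the two blink vectors to be higher than between a blink and the hook; with the random initial values used here that is not yet the case.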

            Actually, I wrote so much about the ability vector just to give an example. We could do Item2Vec and so on. Hero2Vec and Unit2Vec may be possible too; any entity can be turned into an embedding. To be honest, I'd like to work with you on building the RL model if running games quickly and multi-processing are possible. It's really exciting to let a Dota 2 AI learn how to play the game and beat humans.

            One thing I am sure of is that neural networks would be really useful in a game like this. I use something like an NN in my team_desire.lua to adjust the push desire (of course its parameters are tuned by hand, without ML). The inputs are the lane information and hero positions we can see, the heroes' item worth (and whether they have a TP), the tower-pushing items my team has, and the health of towers and barracks. Through just one hidden-layer neuron, the output is the push desire for each of the 3 lanes. Finally, each bot has a last layer that takes its own state and the team desire and produces its own final desire. It is a simple pushing system, but it can beat most of the bot scripts out there right now. My bot script is called "Army Bots"; you can have a look at it.
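As a rough illustration of the "one hidden neuron, three lane desires" structure described above, here is a tiny hand-weighted network. This is not the Army Bots code: the choice of tanh and sigmoid, the feature ordering and every weight value are assumptions picked only to make the shape of the computation concrete.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def push_desires(features, hidden_w, hidden_b, out_w, out_b):
    """One hidden neuron feeding three per-lane outputs (top, mid, bot)."""
    # Weighted sum of all inputs squashed into a single hidden activation.
    hidden = math.tanh(sum(w * x for w, x in zip(hidden_w, features)) + hidden_b)
    # Each lane gets its own scale and bias on that one hidden value.
    return [sigmoid(w * hidden + b) for w, b in zip(out_w, out_b)]


# Hypothetical normalized inputs: lane pressure, enemy presence, tower health.
desires = push_desires(
    features=[0.8, 0.3, 1.0],
    hidden_w=[0.5, -0.2, 0.7],
    hidden_b=0.0,
    out_w=[1.0, 0.5, -1.0],
    out_b=[0.0, 0.0, 0.0],
)
```

With a single hidden neuron the three outputs are all functions of one scalar, so the lanes can only be told apart through the per-lane output weights and biases.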


            • #7
              That's the OpenAI "what does this button do?" self-learning approach. It has greater potential because it mimics human learning more closely, but it requires a shitload of processing resources, to the point of being unviable for this small independent project. Dota is not LoL with its "this is my cone move, this is my... move, this is my throw-stuff-on-the-ground-to-blow-it-up move". After 7.07 it's even harder to come up with "synonyms" in a timely fashion. There is nothing wrong with "hard-coding" abilities / items and focusing on other aspects for starters (the OpenAI team also did that, and they were working on just one hero).


              • #8
                @pilaoda - sorry, "reply with quote" seems to be currently broken on the forums.

                I have no issues with what you suggest and would love to have your help on the project. It's "one" approach, and the whole point is to allow multiple approaches so people can test ideas and things. In Dota, as far as coding is concerned, Items and Abilities are synonymous and you can treat them as the same thing.

                My push right now is to get the various modes of interacting with Dota 2 working so that ML is possible at all. As soon as I have live messaging between the Python back-end and the main client working, I will start considering how to properly extract the world state inputs we want to use. In other words, getting the framework up and running so people can test whatever ML approach they want is priority #1. Doing actual ML is #2.

                Ultimately, my goal is to make the ML approach as hero-agnostic as possible and instead focus on the distinguishing features of heroes, abilities and items and how they relate to local and global objectives. This addresses the huge hero pool by not having to train for each hero. Also, because patches are fairly frequent, a way to quickly re-learn a specific piece of the overall system seems somewhat critical.

                Shoot me a PM with your github user name or request access to the pydota2 org and I will add you as collaborator.
                Last edited by nostrademous; 11-03-2017, 05:40 AM.


                • #9
                  Started some documentation on the pydota2 topic.



                  • #10
                    Okay got the basic framework for pydota2/bin/ working.

                    Run Using:
                    python pydota2\bin\ --team Radiant
                    What It Does:
                    1) Sets up a protobuf listener for the specified team and accepts the dumps as they happen (depending on the dota 2 specified launch options: example `-botworldstatetosocket_frames 10`). It then calls a function callback to process the serialized protobuf dump into a JSON-represented proto object as specified by the CMsgBotWorldState.proto file maintained by Valve.

                    2) Sets up an HTTP POST listener for the bots playing in the actual game to call using a polling model (after authenticating).
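For step 1, a hedged sketch of how the world-state dumps might be read off the socket. It assumes each dump arrives as a 4-byte little-endian length followed by that many bytes of serialized protobuf; that framing is an assumption here (it is at least consistent with the `protoSize` values printed in the output below) and should be checked against the real stream. Decoding into CMsgBotWorldState is left to the callback, and the function names are made up for this example.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes, or return None if the stream ends first."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            return None
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frames(stream, on_frame):
    """Call on_frame(payload) per length-prefixed frame; return the count."""
    count = 0
    while True:
        header = read_exact(stream, 4)
        if header is None:
            return count
        (size,) = struct.unpack("<I", header)  # assumed little-endian uint32
        payload = read_exact(stream, size)
        if payload is None:
            return count  # truncated frame: stop without calling back
        on_frame(payload)
        count += 1


# Example with an in-memory stream standing in for the socket.
demo = io.BytesIO(struct.pack("<I", 3) + b"abc")
read_frames(demo, lambda payload: print("protoSize:", len(payload)))
```

With a real connection, `sock.makefile("rb")` gives a file-like object that can be passed as `stream`.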

                    Example Cmd Shell Output:
                    Radiant protoSize: 68
                    Received Post:  {'Type': 'X', 'Time': 1976.0185546875}                      <====== THIS IS AUTHENTICATION PACKET
                    SENDING RESPONSE:
                     {'status': 200, 'Type': 'X', 'Time': 1976.0185546875}
                    Radiant protoSize: 22424
                    Received Post:  {'Type': 'P', 'Time': 1976.2802734375, 'PlayerID': 3}   <====== THIS IS A POLLING PACKET
                    TODO - fill out with actions for agents
                    SENDING RESPONSE:
                     {'status': 200, 'Type': 'P', 'Data': {}, 'Time': 1976.2802734375}
                    Example Console Output in Dota2 Client:
                    [VScript] 50.16624 [npc_dota_hero_bane]: SENDING SERVER UPDATE
                    [VScript] Connected Successfully to Backend Server
                    [VScript] 50.56624 [npc_dota_hero_bane]: SENDING SERVER UPDATE
                    [VScript] Received Update from Server
                    [VScript] 50.56624 [npc_dota_hero_bane]: Need to Process new Server Reply
                    [VScript] 50.56624 [npc_dota_hero_bane]: Packet RTT: 0.12145996095001
                    [VScript] 50.56624 [npc_dota_hero_bane]: Server Data: table: 0x0024efd0
                    I still need to work out the exact frequency: I'm currently polling every 0.1 seconds, but the fastest round trip from poll to reply that I can get seems to be about 0.12 seconds (0.12145996095001 in the log above), so there may need to be some deconfliction to tell which update got which response.
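One possible way to handle that deconfliction, sketched in Python for readability (the in-game side would be Lua): since the server echoes the poll's `Time` back in its response, the bot can remember which polls are outstanding and drop any reply that is unknown or older than one it has already applied. The `PollTracker` class and its method names are hypothetical.

```python
class PollTracker:
    """Match replies to polls via the echoed Time and drop stale ones."""

    def __init__(self):
        self.outstanding = set()           # poll times still awaiting a reply
        self.last_applied = float("-inf")  # newest poll time already acted on

    def on_send(self, game_time):
        self.outstanding.add(game_time)

    def on_reply(self, reply):
        """Return the reply's Data if it should be applied, else None."""
        t = reply["Time"]
        if t not in self.outstanding:
            return None  # unknown or duplicate reply
        self.outstanding.discard(t)
        if t <= self.last_applied:
            return None  # a newer poll was already answered and applied
        self.last_applied = t
        return reply.get("Data", {})


tracker = PollTracker()
tracker.on_send(50.1)
tracker.on_send(50.2)  # second poll goes out before the first reply arrives
```

If the reply for 50.2 arrives before the one for 50.1, the late 50.1 reply is discarded rather than overwriting the newer action.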

                    Next Work:
                    * model after by creating 2 sub-processes (one for each team) so they are isolated from each other
                    * write a thread-locked queue for putting the JSON-encoded proto data and handing it over to the RL system for processing
                    * write a thread-locked queue for retrieving the actions from the RL system for sending as a response to the HTTP POST message
                    * write some basic movement actions to test the system
                    * a lot more stuff...
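The two thread-locked queues in the list above could be built on `queue.Queue` from the standard library, which already does its own locking. The sketch below is one possible layout, not the planned implementation: `put_latest`, `agent_loop` and the `None` shutdown sentinel are assumptions for illustration.

```python
import queue
import threading


def put_latest(q, item):
    """Put item on a bounded queue, discarding the oldest entry if full."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop a stale world state to make room
            except queue.Empty:
                pass


def agent_loop(states, actions, choose_action):
    """Consume world states and emit one action per state until None."""
    while True:
        state = states.get()
        if state is None:  # sentinel: shut the agent thread down
            break
        actions.put(choose_action(state))


states = queue.Queue()   # protobuf thread -> RL system
actions = queue.Queue()  # RL system -> HTTP POST responder
worker = threading.Thread(
    target=agent_loop,
    args=(states, actions, lambda s: {"action": "no_op", "time": s["time"]}),
)
worker.start()
states.put({"time": 1.0})
states.put(None)
worker.join(timeout=5)
```

Using `put_latest` with `queue.Queue(maxsize=1)` for the world-state side would keep the agent on the newest frame when it cannot keep up, at the cost of silently skipping frames.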


                    • #11
                      Added some documentation about the run_loop and the current thoughts/implementation. It really helped me home in on some bugs that I need to fix (hopefully this week, time permitting).

                      Here is the PDF:

                      The PDF is NOT COMPLETE YET, and will need to be updated when I fix up the bugs I realized exist in the run_loop data flow (remember, much of the framework comes from Starcraft II, and their API is different). I need to move out a lot of stuff in the self._step() code since we will not be "stepping the world forward and returning observations" as they do, we will just be sending actions to the bots in live games and then take a new protobuf frame in the next time-step.
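To make the difference from a step()-driven environment concrete, here is a rough sketch of a loop that is paced by incoming frames instead of stepping the world itself. The function name and its parameters are hypothetical and are not taken from the actual run_loop code.

```python
def run_loop(frames, agent, send_actions, max_frames=None):
    """Drive an agent from live frames rather than stepping a simulator."""
    handled = 0
    for frame in frames:          # each frame is the next observed world state
        actions = agent(frame)    # decide based on the latest observation
        send_actions(actions)     # picked up by the bots on their next poll
        handled += 1
        if max_frames is not None and handled >= max_frames:
            break
    return handled


sent = []
count = run_loop(
    frames=[{"dota_time": -89.8}, {"dota_time": -89.5}],
    agent=lambda frame: {"action": "no_op", "at": frame["dota_time"]},
    send_actions=sent.append,
)
```

The loop never advances the game; the game clock keeps running whether or not the agent has answered, which is why stale frames and late replies need separate handling.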

                      Anyways, lesson learned, white-boarding and documenting the process helps, a lot!


                      • #12
                        With latest push to repo the RL run loop works for

                        We don't do any learning yet: the agent is random and picks a random action for each player on my team from those I have coded in, which is really three: 1) use_glyph, 2) no_op, 3) level ability <ID>. That choice gets sent to the in-game clients as a reply to a polling HTTP POST message. The in-game bots don't do anything with those commands yet, as I haven't written the code to execute the mandated actions. I did, though, attach a function to use_glyph that detects whether it is a "valid" action, meaning it is possible (it checks that dota_time >= glyph_cooldown), before it is offered as a possible action for the bots to take.
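A small sketch of that "only offer valid actions" idea. The `dota_time >= glyph_cooldown` check comes from the description above; the state dictionary layout, the `VALIDATORS` table and the function names are assumptions, and the other two actions are simply treated as always valid here.

```python
import random


def glyph_ready(state):
    """The glyph is usable once game time has passed its cooldown time."""
    return state["dota_time"] >= state["glyph_cooldown"]


def always_valid(state):
    return True


# Action name -> validity check (hypothetical table for this sketch).
VALIDATORS = {
    "use_glyph": glyph_ready,
    "no_op": always_valid,
    "level_ability": always_valid,
}


def valid_actions(state):
    return [name for name, is_valid in VALIDATORS.items() if is_valid(state)]


def choose_random_action(state, rng=random):
    """Pick uniformly among the actions that are currently possible."""
    return rng.choice(valid_actions(state))


state = {"dota_time": 100.0, "glyph_cooldown": 300.0}
print(valid_actions(state))  # use_glyph is filtered out while on cooldown
```

Filtering before sampling keeps a random agent from wasting polls on actions the game would reject anyway.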

                        What's next:
                        1) flesh out a world-model for the agents from protobuf data
                        2) make random ability level up action pick from valid/real ability ID numbers belonging to that player (currently it just picks a random value between 0 - 1500) using the world-model created
                        3) write client bot-code command interpreter for ability leveling to show the system actually working

                        *) separate agent used during "Hero Selection" (game_state == 3) versus "Game Play" (game_state in [4,5]) as they don't belong together or as one entity
                        *) write a lot more "action" commands that we can send to bot-clients (like move_to, attack_unit, purchase_item, etc.)
                        *) write a non-random but rather learning agent
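The split between a hero-selection agent and a game-play agent could be as simple as a dispatch on game_state. The state values (3 for hero selection, 4 and 5 for game play) come from the list above; the function and constant names are made up for this sketch.

```python
HERO_SELECTION_STATES = {3}
GAME_PLAY_STATES = {4, 5}


def pick_agent(game_state, hero_selection_agent, game_play_agent):
    """Return the agent responsible for this game_state, or None."""
    if game_state in HERO_SELECTION_STATES:
        return hero_selection_agent
    if game_state in GAME_PLAY_STATES:
        return game_play_agent
    return None  # e.g. loading or post-game: nobody acts


# Strings stand in for real agent objects in this example.
active = pick_agent(4, "hs_agent", "gp_agent")
```

Keeping the two agents separate lets each have its own action space, since drafting and playing share almost no actions.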

                        You can test it by running:
                        PYTHONPATH=. /usr/local/bin/python3.6 -m pydota2.bin.human_play --team Radiant --agent pydota2.agents.random_agent.RandomAgent
                        Obviously adjust for Windows/Linux and Python versions as appropriate.
                        Last edited by nostrademous; 11-15-2017, 07:20 AM.


                        • #13
                          Got the "ok" from DeepMind to leverage the platform code they developed for SC2 in pydota2. Makes me happy.


                          • #14
                            Getting Going

                            As of my last commit, we have basic working multi-bot random decision making across the few functions I have implemented.

                            Python Side Console View:
                            C:\pydota2>python -m pydota2.bin.human_play --team Radiant --agent pydota2.agents.random_agent.RandomAgent
                            Windows                                                       <---- Still need to implement dota 2 bot code updates on Linux/MacOS
                            Starting Protobuf Thread 1 for Radiant
                            Starting HTTP POST Thread 2 for Radiant
                            Starting Thread for Agent(s)
                            I1122 08:56:49.218526  5164] Environment is ready.
                            IMPLEMENT HS AGENT SETUP                                                                    <------ Still need to implement Hero Selection Agent (for now hardcoded)
                            I1122 08:56:49.219526  5164] Starting episode: 1
                            ....                   <------ skipping a bunch of print-out related to hero-selection stuff (protobufs need to be fixed to support agent hero selection anyways)
                            Current Protobuf Timestamp: -89.866661
                            npc_dota_hero_antimage [Lvl: 1] is able to level 1 abilities
                            npc_dota_hero_bane [Lvl: 1] is able to level 1 abilities
                            npc_dota_hero_pudge [Lvl: 1] is able to level 1 abilities
                            npc_dota_hero_necrolyte [Lvl: 1] is able to level 1 abilities
                            npc_dota_hero_nyx_assassin [Lvl: 1] is able to level 1 abilities
                            Game State: 4
                                1/no_op                                              ()
                                2/clear_action                                       (6/bool [2])
                                3/cmd_level_ability                                  (4/ability_str [''])
                                0/use_glyph                                          ()
                            RandomAgent chose random action: 1 for player_id 2
                            RandomAgent chose random action: 2 for player_id 3
                            RandomAgent chose random action: 1 for player_id 4
                            RandomAgent chose random action: 3 for player_id 5
                            npc_dota_hero_necrolyte [Lvl: 1] is able to level 1 abilities
                            PID: 5, Rand: 2, RandName: necrolyte_heartstopper_aura, AbilityIDS: ['necrolyte_death_pulse', 'necrolyte_sadist', 'necrolyte_heartstopper_aura']
                            RandomAgent chose random action: 3 for player_id 6
                            npc_dota_hero_nyx_assassin [Lvl: 1] is able to level 1 abilities
                            PID: 6, Rand: 1, RandName: nyx_assassin_mana_burn, AbilityIDS: ['nyx_assassin_impale', 'nyx_assassin_mana_burn', 'nyx_assassin_spiked_carapace']
                            RandomAgent chose random action: 0 for the team
                            What you see above is that currently in Game State 4 there are four possible actions that can be taken by the collective group. On the back-end, Actions 1-3 can be taken by each "hero" whereas action 0 can be taken by the "team" (in reality it is taken by one of the heroes on behalf of the team since many actions that are team-based like using the glyph still need a bot handle as they are unit-scoped).
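The hero-versus-team split could be expressed roughly as below. The action ids and names are copied from the console output above; which hero carries the team action is an assumption (the first player id in this sketch), as are the function and table names.

```python
# Action ids and names as printed in the console output above.
ACTION_NAMES = {0: "use_glyph", 1: "no_op", 2: "clear_action", 3: "cmd_level_ability"}
TEAM_ACTIONS = {0}  # everything else is chosen per hero


def route_actions(hero_choices, team_choice, player_ids):
    """Attach each hero's action to its player; one hero carries the team's."""
    routed = {pid: [hero_choices[pid]] for pid in player_ids}
    if team_choice is not None:
        if team_choice not in TEAM_ACTIONS:
            raise ValueError("not a team-scoped action: %r" % (team_choice,))
        # Team actions still need a unit handle, so one hero issues them.
        routed[player_ids[0]].append(team_choice)
    return routed


routed = route_actions(
    hero_choices={2: 1, 3: 2, 4: 1, 5: 3, 6: 3},
    team_choice=0,
    player_ids=[2, 3, 4, 5, 6],
)
```

In this example player 2 ends up with `[1, 0]`: its own no_op plus the team's use_glyph.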

                            Below is the in game console dump corresponding to the above server commands:
                            [VScript] Received Update from Server
                            [VScript] 44.99966 [npc_dota_hero_antimage]: SENDING POLL REQUEST
                            [VScript] 44.99966 [npc_dota_hero_antimage]: Getting Last TEAM Packet Reply
                            [VScript] {
                            	['0'] = {
                            [VScript] 44.99966 [npc_dota_hero_antimage]: <ERROR> [0] does not exist in action table!
                            [VScript] 44.99966 [npc_dota_hero_antimage]: Getting MY Last Packet Reply
                            [VScript] 44.99966 [npc_dota_hero_antimage]: Packet RTT: 0.26298522949242
                            [VScript] {
                            	['1'] = {
                            [VScript] 44.99966 [npc_dota_hero_antimage]: Executing Action: No Action
                            [VScript] 44.99966 [npc_dota_hero_antimage]: No Action
                            [VScript] 44.99966 [npc_dota_hero_bane]: Getting MY Last Packet Reply
                            [VScript] 44.99966 [npc_dota_hero_bane]: Packet RTT: 0.26313018798852
                            [VScript] {
                            	['2'] = {
                            		[1] = {
                            			[1] = 1
                            [VScript] 44.99966 [npc_dota_hero_bane]: Executing Action: Clear Action
                            [VScript] 44.99966 [npc_dota_hero_pudge]: Getting MY Last Packet Reply
                            [VScript] 44.99966 [npc_dota_hero_pudge]: Packet RTT: 0.26328277587914
                            [VScript] {
                            	['1'] = {
                            [VScript] 44.99966 [npc_dota_hero_pudge]: Executing Action: No Action
                            [VScript] 44.99966 [npc_dota_hero_pudge]: No Action
                            [VScript] 44.99966 [npc_dota_hero_necrolyte]: Getting MY Last Packet Reply
                            [VScript] 44.99966 [npc_dota_hero_necrolyte]: Packet RTT: 0.26336669921898
                            [VScript] {
                            	['3'] = {
                            		[1] = {
                            			[1] = 'necrolyte_heartstopper_aura'
                            [VScript] 44.99966 [npc_dota_hero_necrolyte]: Executing Action: Level Ability
                            [VScript] 44.99966 [npc_dota_hero_necrolyte]: Leveling: necrolyte_heartstopper_aura
                            [VScript] 44.99966 [npc_dota_hero_nyx_assassin]: Getting MY Last Packet Reply
                            [VScript] 44.99966 [npc_dota_hero_nyx_assassin]: Packet RTT: 0.26401901245141
                            [VScript] {
                            	['3'] = {
                            		[1] = {
                            			[1] = 'nyx_assassin_mana_burn'
                            [VScript] 44.99966 [npc_dota_hero_nyx_assassin]: Executing Action: Level Ability
                            [VScript] 44.99966 [npc_dota_hero_nyx_assassin]: Leveling: nyx_assassin_mana_burn
                            [VScript] Received Update from Server
                            And yes, I have not written the use_glyph action yet so it just notifies with an error; but the bots do level their abilities in game.

                            Anyways, I'm excited. There is still a lot of Lua and Python (and maybe even C++) code to be written, but at least the basic framework is there.

                            On the Python side:
                            * Still need to write a learning agent (instead of a random one), probably several hierarchical agents (but honestly, this is up to each person to do as they please)
                            * Still need to implement a large amount of actions to define the action space (for the hero and the team)
                            * Still need to decide on world state representation for the learning agent (again, this can be up to each framework user's preference).
                            * Need to figure out how we will command minions/illusions.

                            On the Lua side:
                            * Will need to implement all the hero and team functions that the Python side can select
                            * Implement debugging pane
                            * Implement the HTTP POST polling so it is available during hero selection
                            * Need to figure out and implement how we will command minions/illusions of heroes

                            On the C++ side:
                            There is a high probability that we will need a Dota2 simulator/emulator like lenlrx created to simply be able to train faster in a headless & parallelized manner. To do this, we would circumvent the HTTP POST method to instead talk to the simulator and then simulate the environment step() forward.

                            I'm ALWAYS looking for help from anyone interested.


                            • #15
                              Random movement is in. The bots don't really go anywhere, since movement is just one of several random actions they can take, so they circle in the fountain; but the functionality is there for when we implement a real agent that is rewarded for proper goals. Glyph use is in too (with code to skip the action while it is on cooldown).