
Thread: pydota2

  1. #11
    Basic Member
    Join Date
    Dec 2016
    Posts
    731
    Added some documentation about the run_loop and the current thoughts/implementation. It really helped me home in on some bugs I have that I need to fix (hopefully this week, time permitting).

    Here is the PDF:
    https://github.com/pydota2/pydota2/b..._data_flow.pdf

    The PDF is NOT COMPLETE YET, and will need to be updated when I fix up the bugs I realized exist in the run_loop data flow (remember, much of the framework comes from StarCraft II, and their API is different). I need to move a lot of stuff out of the self._step() code since we will not be "stepping the world forward and returning observations" as they do; we will just be sending actions to the bots in live games and then taking a new protobuf frame in the next time-step.
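
    Roughly, the shape I'm aiming for looks something like this - just an illustrative sketch with placeholder names, not the actual pydota2 code:
    Code:
    # Illustrative sketch only - not the actual pydota2 run_loop code.
    class LiveGameEnv:
        """In SC2 the env steps the world forward and returns observations;
        here we only relay the chosen actions to the live bots and then wait
        for the next protobuf frame from the running game."""

        def step(self, actions):
            self._send_actions_to_bots(actions)          # reply to the bots' HTTP POST poll
            return self._wait_for_next_protobuf_frame()  # observation for the next time-step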

    Anyways, lesson learned, white-boarding and documenting the process helps, a lot!

  2. #12
    Basic Member
    Join Date
    Dec 2016
    Posts
    731
    With latest push to repo the RL run loop works for human_play.py.

    We don't do any learning yet (the agent is random and picks a random action for each player on my team from those I coded in - which is really three: 1) use_glyph, 2) no_op, 3) level ability <ID>). That gets sent to the in-game clients as a reply to a polling HTTP POST message. The in-game bots don't do anything with those commands yet, as I haven't written the code to execute the mandated actions. I did, though, add a function for use_glyph to detect whether it's a "valid" action, meaning it's possible (it checks whether dota_time >= glyph_cooldown), before it recommends it as a possible action for bots to take.
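
    For reference, that validity check is basically just a time comparison - something like this simplified sketch (argument names here are illustrative, not the exact code):
    Code:
    # Simplified sketch of the use_glyph validity check; names are illustrative.
    def glyph_is_valid(dota_time, glyph_cooldown):
        # Only offer the glyph as a possible action once the current dota_time
        # has reached the time at which the cooldown expires.
        return dota_time >= glyph_cooldown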

    What's next:
    1) flesh out a world-model for the agents from protobuf data
    2) make the random ability level-up action pick from valid/real ability ID numbers belonging to that player (currently it just picks a random value between 0 and 1500) using the world-model created
    3) write the client bot-code command interpreter for ability leveling to show the system actually working

    *) separate the agent used during "Hero Selection" (game_state == 3) from the one used during "Game Play" (game_state in [4,5]), as they don't belong together or as one entity
    *) write a lot more "action" commands that we can send to bot-clients (like move_to, attack_unit, purchase_item, etc.)
    *) write a learning agent rather than a random one

    You can test it by running:
    Code:
    PYTHONPATH=. /usr/local/bin/python3.6 -m pydota2.bin.human_play --team Radiant --agent pydota2.agents.random_agent.RandomAgent
    Obviously adjust for Windows/Linux and Python versions as appropriate.
    Last edited by nostrademous; 11-15-2017 at 08:20 AM.

  3. #13
    Basic Member
    Join Date
    Dec 2016
    Posts
    731
    Got the "ok" from DeepMind to leverage the platform code they developed for SC2 for pydota2's use. Makes me happy.

  4. #14
    Basic Member
    Join Date
    Dec 2016
    Posts
    731

    Getting Going

    As of my last commit, we have basic working multi-bot random decision making across the few functions I have implemented.

    Python Side Console View:
    Code:
    C:\pydota2>python -m pydota2.bin.human_play --team Radiant --agent pydota2.agents.random_agent.RandomAgent
    
    Windows                                                       <---- Still need to implement dota 2 bot code updates on Linux/MacOS
    
    Starting Protobuf Thread 1 for Radiant
    Starting HTTP POST Thread 2 for Radiant
    Starting Thread for Agent(s)
    
    I1122 08:56:49.218526  5164 dota2_env.py:116] Environment is ready.
    
    IMPLEMENT HS AGENT SETUP                                                                    <------ Still need to implement Hero Selection Agent (for now hardcoded)
    
    I1122 08:56:49.219526  5164 dota2_env.py:142] Starting episode: 1
    
    ....                   <------ skipping a bunch of print-out related to hero-selection stuff (protobufs need to be fixed to support agent hero selection anyways)
    
    Current Protobuf Timestamp: -89.866661
    npc_dota_hero_antimage [Lvl: 1] is able to level 1 abilities
    npc_dota_hero_bane [Lvl: 1] is able to level 1 abilities
    npc_dota_hero_pudge [Lvl: 1] is able to level 1 abilities
    npc_dota_hero_necrolyte [Lvl: 1] is able to level 1 abilities
    npc_dota_hero_nyx_assassin [Lvl: 1] is able to level 1 abilities
    
    Game State: 4
        1/no_op                                              ()
        2/clear_action                                       (6/bool [2])
        3/cmd_level_ability                                  (4/ability_str [''])
        0/use_glyph                                          ()
    
    RandomAgent chose random action: 1 for player_id 2
    
    RandomAgent chose random action: 2 for player_id 3
    
    RandomAgent chose random action: 1 for player_id 4
    
    RandomAgent chose random action: 3 for player_id 5
    npc_dota_hero_necrolyte [Lvl: 1] is able to level 1 abilities
    PID: 5, Rand: 2, RandName: necrolyte_heartstopper_aura, AbilityIDS: ['necrolyte_death_pulse', 'necrolyte_sadist', 'necrolyte_heartstopper_aura']
    
    RandomAgent chose random action: 3 for player_id 6
    npc_dota_hero_nyx_assassin [Lvl: 1] is able to level 1 abilities
    PID: 6, Rand: 1, RandName: nyx_assassin_mana_burn, AbilityIDS: ['nyx_assassin_impale', 'nyx_assassin_mana_burn', 'nyx_assassin_spiked_carapace']
    
    RandomAgent chose random action: 0 for the team
    What you see above is that currently in Game State 4 there are four possible actions that can be taken by the collective group. On the back-end, actions 1-3 can be taken by each "hero", whereas action 0 can be taken by the "team" (in reality it is taken by one of the heroes on behalf of the team, since many team-based actions like using the glyph still need a bot handle as they are unit-scoped).

    Below is the in game console dump corresponding to the above server commands:
    Code:
    [VScript] Received Update from Server
    [VScript] 44.99966 [npc_dota_hero_antimage]: SENDING POLL REQUEST
    [VScript] 44.99966 [npc_dota_hero_antimage]: Getting Last TEAM Packet Reply
    [VScript] {
    	['0'] = {
    
    	}
    }
    [VScript] 44.99966 [npc_dota_hero_antimage]: <ERROR> [0] does not exist in action table!
    [VScript] 44.99966 [npc_dota_hero_antimage]: Getting MY Last Packet Reply
    [VScript] 44.99966 [npc_dota_hero_antimage]: Packet RTT: 0.26298522949242
    [VScript] {
    	['1'] = {
    
    	}
    }
    [VScript] 44.99966 [npc_dota_hero_antimage]: Executing Action: No Action
    [VScript] 44.99966 [npc_dota_hero_antimage]: No Action
    [VScript] 44.99966 [npc_dota_hero_bane]: Getting MY Last Packet Reply
    [VScript] 44.99966 [npc_dota_hero_bane]: Packet RTT: 0.26313018798852
    [VScript] {
    	['2'] = {
    		[1] = {
    			[1] = 1
    		}
    	}
    }
    [VScript] 44.99966 [npc_dota_hero_bane]: Executing Action: Clear Action
    [VScript] 44.99966 [npc_dota_hero_pudge]: Getting MY Last Packet Reply
    [VScript] 44.99966 [npc_dota_hero_pudge]: Packet RTT: 0.26328277587914
    [VScript] {
    	['1'] = {
    
    	}
    }
    [VScript] 44.99966 [npc_dota_hero_pudge]: Executing Action: No Action
    [VScript] 44.99966 [npc_dota_hero_pudge]: No Action
    [VScript] 44.99966 [npc_dota_hero_necrolyte]: Getting MY Last Packet Reply
    [VScript] 44.99966 [npc_dota_hero_necrolyte]: Packet RTT: 0.26336669921898
    [VScript] {
    	['3'] = {
    		[1] = {
    			[1] = 'necrolyte_heartstopper_aura'
    		}
    	}
    }
    [VScript] 44.99966 [npc_dota_hero_necrolyte]: Executing Action: Level Ability
    [VScript] 44.99966 [npc_dota_hero_necrolyte]: Leveling: necrolyte_heartstopper_aura
    [VScript] 44.99966 [npc_dota_hero_nyx_assassin]: Getting MY Last Packet Reply
    [VScript] 44.99966 [npc_dota_hero_nyx_assassin]: Packet RTT: 0.26401901245141
    [VScript] {
    	['3'] = {
    		[1] = {
    			[1] = 'nyx_assassin_mana_burn'
    		}
    	}
    }
    [VScript] 44.99966 [npc_dota_hero_nyx_assassin]: Executing Action: Level Ability
    [VScript] 44.99966 [npc_dota_hero_nyx_assassin]: Leveling: nyx_assassin_mana_burn
    [VScript] Received Update from Server
    And yes, I have not written the use_glyph action yet so it just notifies with an error; but the bots do level their abilities in game.
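
    For anyone curious about the reply format in the dump above: each reply is a nested table keyed by the action ID, with argument groups inside. A rough Python-side sketch of packing one action that mirrors that structure (illustrative only, not the exact pydota2 code):
    Code:
    # Illustrative only - mirrors the nested structure visible in the VScript dump above.
    def pack_action(action_id, *args):
        # pack_action(1)                                -> {'1': {}}
        # pack_action(3, 'necrolyte_heartstopper_aura') -> {'3': {1: {1: 'necrolyte_heartstopper_aura'}}}
        packed = {str(action_id): {}}
        for i, arg in enumerate(args, start=1):
            packed[str(action_id)][i] = {1: arg}
        return packed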

    Anyways, I'm excited. There is still a lot of Lua and Python (and maybe even C++) code to be written, but at least the basic framework is there.

    On the Python side:
    * Still need to write a learning agent (instead of a random one) - probably several hierarchical agents (but honestly, this is up to each person to do as they please)
    * Still need to implement a large number of actions to define the action space (for the hero and the team)
    * Still need to decide on a world state representation for the learning agent (again, this can be up to each framework user's preference).
    * Need to figure out how we will command minions/illusions.

    On the Lua side:
    * Will need to implement all the hero and team functions that the Python side can select
    * Implement debugging pane
    * Implement the HTTP POST polling so it is available during hero selection
    * Need to figure out and implement how we will command minions/illusions of heroes

    On the C++ side:
    There is a high probability that we will need a Dota 2 simulator/emulator like the one lenlrx created, simply to be able to train faster in a headless & parallelized manner. To do this, we would circumvent the HTTP POST method to instead talk to the simulator and then step() the environment forward.
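
    The idea would simply be to hide the transport behind a common interface so the agent code doesn't care whether it is talking to live bots over HTTP POST or to a simulator. A hypothetical sketch (none of these classes exist in the repo yet):
    Code:
    # Hypothetical sketch of the idea only; none of this exists in the repo yet.
    class LiveGameBackend:
        """Replies to the Lua bots' HTTP POST polls and waits for the next protobuf frame."""
        def step(self, actions):
            raise NotImplementedError  # existing live-game path

    class SimulatorBackend:
        """Would drive a headless Dota 2 simulator/emulator and step it forward directly."""
        def step(self, actions):
            raise NotImplementedError  # simulated path for fast, parallel training

    def run_loop(agent, backend, max_steps):
        obs = None
        for _ in range(max_steps):
            actions = agent.step(obs)    # agent picks actions from the latest observation
            obs = backend.step(actions)  # same call whether live game or simulator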

    I'm ALWAYS looking for help from anyone interested.

  5. #15
    Basic Member
    Join Date
    Dec 2016
    Posts
    731
    Random movement is in... they don't really go anywhere, since it is a random action (1 of several they can do), but rather just circle in the fountain; still, at least the functionality is there for once we implement a real agent that rewards progress toward proper goals. Glyph use is in (with code to not attempt the action while it is on cooldown).

  6. #16
    Basic Member
    Join Date
    Mar 2012
    Posts
    2,012
    I am curious, how do you make the bots move?
    I ask because the most efficient way would be to check if the destination changes and only then assign a new movement order, instead of ordering the bot to do the same thing each frame.
    While throttling has been fixed by Chris, this would also help with performance.

  7. #17
    Basic Member
    Join Date
    Dec 2016
    Posts
    731
    So I have a running average of the last 10 round-trip times for how long it takes between polling the back-end server and getting a reply. I then pick a ("random" for now - using a uniform -1.0 to 1.0 distribution) vector, scaled by the max distance the bot can travel based on its current movement speed and the amount of time between possible movement commands, and add it to the bot's current location. As a result I am not overriding any command with a future command, since all commands are gated to what can be achieved in a single "frame"-worth of time.

    Having said all that, I think I will change this to instead pick a random facing direction which is then scaled by movement speed, so that for ML purposes I can limit the number of choices to a more finite degree delta in terms of where you can go (in other words, I can limit our possible directions to, say, multiples of 45 degrees - so 8 in all - and thus limit the degrees of freedom).
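
    To make both the current approach and the proposed change concrete, here is a simplified sketch (helper names and signatures are made up, not the literal code):
    Code:
    import math
    import random

    # Simplified sketch of both movement approaches described above; names are made up.
    def random_vector_target(bot_pos, move_speed, avg_reply_interval):
        # Current approach: uniform(-1, 1) vector scaled by the distance reachable
        # before the next batch of commands arrives.
        max_dist = move_speed * avg_reply_interval
        dx, dy = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
        return (bot_pos[0] + dx * max_dist, bot_pos[1] + dy * max_dist)

    def discrete_heading_target(bot_pos, move_speed, avg_reply_interval):
        # Proposed change: one of 8 headings (multiples of 45 degrees), walked as
        # far as movement speed allows in the same time window.
        heading = math.radians(random.randrange(0, 360, 45))
        max_dist = move_speed * avg_reply_interval
        return (bot_pos[0] + math.cos(heading) * max_dist,
                bot_pos[1] + math.sin(heading) * max_dist)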

  8. #18
    Basic Member
    Join Date
    Mar 2012
    Posts
    2,012
    Quote Originally Posted by nostrademous View Post
    So I have a running average of the last 10 round-trip times for how long it takes between polling the back-end server and getting a reply. I then pick a ("random" for now - using a uniform -1.0 to 1.0 distribution) vector, scaled by the max distance the bot can travel based on its current movement speed and the amount of time between possible movement commands, and add it to the bot's current location. As a result I am not overriding any command with a future command, since all commands are gated to what can be achieved in a single "frame"-worth of time.
    So you mean you dictated each frame's movement time with the amount it would take in that frame?

    Quote Originally Posted by nostrademous View Post
    Having said all that, I think I will change this to instead pick a random facing direction which is then scaled by movement speed, so that for ML purposes I can limit the number of choices to a more finite degree delta in terms of where you can go (in other words, I can limit our possible directions to, say, multiples of 45 degrees - so 8 in all - and thus limit the degrees of freedom).
    So only one move command, and control when to stop (or turn)?

  9. #19
    Basic Member
    Join Date
    Dec 2016
    Posts
    731
    Eh, more like I dictate the action to fit inside the loop time between getting new actions. This is my Think() for all bots:
    Code:
    function X:Think(hBot)
        -- if we are a human player, don't bother
        if not hBot:IsBot() then return end
    
        if not self.Init then
            self:DoInit(hBot)
            return
        end
    
        if GetGameState() ~= GAME_STATE_GAME_IN_PROGRESS and GetGameState() ~= GAME_STATE_PRE_GAME then return end
    
        -- throttle how often we query the back-end server
        if (GameTime() - X.lastUpdateTime) >= THROTTLE_RATE then
            -- check if bot has updated directives from our AI
            ServerUpdate()
            X.lastUpdateTime = GameTime()
        end
    
        -- process the team commands (called by a bot, doesn't matter which one)
        self:ProcessTeamCommands(hBot)
        
        -- process the commands for this bot
        self:ProcessCommands(hBot)
    end
    As you see, the ServerUpdate() is throttled, meaning several bot-frames might happen before a new update request (polling) is sent to the back-end server for new commands. Currently I set the throttle rate at 0.25 seconds (4 times a second). Now, remember, the HTTP POST function is async, so I cannot count on it returning results immediately, hence the two Process*Commands() calls later, which check whether the last server reply is different from the previous one (which I track); this is so I never process the same actions across multiple frames that all belong to the same POST request and response. As a result, the "movement" action is given the amount of time typically taken between one server reply and the next (although currently I have a bug, which you made me realize because of your question, so thank you), since a new server reply could very well overwrite the previous one's action (I don't support PUSH or QUEUE actions for now). So, the bots can move as far as their movement speed is able to carry them in the average amount of time between server replies.
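
    The de-duplication part is the important bit: each Process*Commands() call just compares the latest reply against the last one it already executed and bails if nothing new came in. Sketched here in Python for brevity (the real check lives in the Lua Process*Commands() functions):
    Code:
    # Concept sketch (in Python for brevity) of the reply de-duplication described
    # above; the actual check is done in the Lua Process*Commands() functions.
    last_processed_reply = None

    def process_commands(reply_id, reply_table, execute_actions):
        """Only act on a reply we have not already executed in an earlier frame."""
        global last_processed_reply
        # Several bot frames can pass between server replies, so skip anything
        # belonging to a POST request/response we already handled.
        if reply_id == last_processed_reply:
            return
        last_processed_reply = reply_id
        execute_actions(reply_table)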

    If I switch it from picking a random location in map bounds and scaling it down to a vector achievable in that time, to picking a random direction and then a random distance that is achievable, I should be able to more accurately calculate the time b/c I can account for the turn-time to face the direction of travel.

  10. #20
    Basic Member
    Join Date
    Dec 2016
    Posts
    731
    So I believe I have implemented a simple Q-learning algorithm for moving to the center of the map. I say "believe" b/c I haven't had a chance to test it yet, but I'm feeling good (at least it compiles... :P)
    Link to the Learning Agent

    So, first of all, let me say I realize there is at least one problem with my implementation, which is that it is blind to the geographic terrain of the map, meaning it knows nothing about not being able to walk up cliffs, through trees, etc.

    What I do is simply calculate the desired facing direction based on the bot's location and the angle to the desired location, the center Location (0,0,0). I initially left it at that, but b/c of the precision of floating-point numbers in terms of "exact" degrees for the facing, the state space was too big for my taste, so now we simplify by rounding to the nearest integer.

    All 5 bots use the same learning table, so it helps fill out the state-action space much quicker. When I get some time I will let it run for 15 min or so and see how dumb/smart it is. I say only 15 min b/c there are only 8 directional actions (+ 3 fluff ones to see if they get filtered out) and initially probably only 20 or so "heading angles" (or what I call desired_facing_direction).

    The rewards are: -1 by default (to force them to learn to go to the location as fast as possible), -0.5 if they take an action that gets them closer to the destination, and +10 for arriving within 50 units of the destination.
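
    In sketch form, it is basically vanilla tabular Q-learning over the rounded heading angle as the state and the 8 directional actions, something like the following (simplified, with illustrative names and constants - the real code is in the linked agent):
    Code:
    import collections
    import random

    # Simplified tabular Q-learning sketch of what's described above; names and
    # constants are illustrative, not the exact code in the linked agent.
    ACTIONS = list(range(8))                # 8 headings, multiples of 45 degrees
    Q = collections.defaultdict(float)      # shared by all 5 bots: (state, action) -> value
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    def choose_action(state):
        # state = desired facing direction (degrees), rounded to the nearest integer
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def reward(old_dist, new_dist):
        if new_dist <= 50.0:                # arrived within 50 units of the destination
            return 10.0
        if new_dist < old_dist:             # moved closer
            return -0.5
        return -1.0                         # default: penalize taking too long

    def update(state, action, r, next_state):
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])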
