
Thread: AlphaGo Zero Article

  1. #1
    Basic Member
    Join Date
    Dec 2016
    Posts
    732

    Cool AlphaGo Zero Article

Linked is an interesting article about AlphaGo Zero (a completely self-trained variant) and how it fared against the supervised-learning variants (those that beat human professionals).

    https://www.nature.com/nature/journa...ature24270.pdf

I found it to be a cool read with some new insights into things I hadn't considered (mind you, I'm still a noob on the domain topic at this point).

Obviously there are differences between Go and Dota 2, the main one being that one is turn-based and the other continuous, but perhaps there are approaches or approximations that can handle that. Anyway, if you think the ideas from the article don't apply to Dota 2, then I apologize for wasting your time; otherwise, enjoy the read.

  2. #2
    Basic Member
    Join Date
    Mar 2012
    Posts
    2,018
    Quote Originally Posted by nostrademous View Post
    Obviously there are differences between Go and Dota2, the main one being one is a turn-based game and the other continuous
I am surely even lower than you on ML. But my first thought as I read this sentence was approximating turn-based play in Dota 2 as action-reaction, such as: CM casts Frostbite on PA, PA activates BKB, CM ults shortly after, PA jumps in and slashes her, Dazzle Shallow Graves CM, Alchemist loads a stun for PA, Sky silences Alch, Alch stuns himself, PA jumps in and slashes Alch after he stunned himself, Sky ults Dazzle and CM.
Not exactly turn-based, obviously, but I assume the action-reaction concept can be used without turning it into a rudimentary conditional SBM like we're using now. Of course, I could be wrong since I know nothing on the ML topic
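The action-reaction idea can be sketched by pairing each event in a fight log with the event that follows it. Everything in this snippet (hero names, abilities, the pairing scheme, the `to_pairs` helper) is illustrative, not any real training format:

```python
# Hypothetical sketch: a fight log as an action-reaction sequence.
log = [
    ("CM", "Frostbite", "PA"),
    ("PA", "BKB", "PA"),
    ("CM", "Freezing Field", None),
    ("PA", "Blink+Attack", "CM"),
    ("Dazzle", "Shallow Grave", "CM"),
]

def to_pairs(log):
    """Pair each event with the event that follows it, yielding
    (action, reaction) examples out of a continuous fight."""
    return list(zip(log, log[1:]))

print(to_pairs(log)[0])
# (('CM', 'Frostbite', 'PA'), ('PA', 'BKB', 'PA'))
```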
    Explanations on the normal, high and very high brackets in replays: here, here & here
    Why maphacks won't work in D2: here

  3. #3
    Basic Member
    Join Date
    Dec 2016
    Posts
    732
I mean sure, when it comes to computer software and anything digital there is no such thing as continuous; there are only discrete events sampled at some rate. What you suggest can surely be achieved, although it's a bit more complicated: there are numerous entities on each team (5 in most cases, not counting Meepo/Arc Warden clones, etc.) that can all take an action (out of a large pool of actions) simultaneously. Still, through simulation and sequencing (in your example, CM casts Frostbite, which has a cast point and so resolves at a discrete future point in time), an ML system can unroll all the simultaneous actions into a sequence of events occurring in some timed order during its prediction stage (the paper refers to this in its MCTS, for Monte Carlo Tree Search, lookahead).
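That unrolling can be sketched with a small event heap; the heroes, abilities, and cast-point numbers below are made up for illustration:

```python
import heapq

# Three "simultaneous" actions issued at t=0, each resolving after its
# own (hypothetical) cast point.
actions = [
    ("CM", "frostbite", 0.0, 0.3),   # (actor, ability, issue_time, cast_point)
    ("PA", "bkb", 0.0, 0.1),
    ("Sky", "silence", 0.0, 0.4),
]

def unroll(actions):
    """Turn simultaneous actions into a sequence ordered by resolve time."""
    heap = [(issue + cast, actor, ability) for actor, ability, issue, cast in actions]
    heapq.heapify(heap)
    order = []
    while heap:
        t, actor, ability = heapq.heappop(heap)
        order.append((round(t, 3), actor, ability))
    return order

print(unroll(actions))
# [(0.1, 'PA', 'bkb'), (0.3, 'CM', 'frostbite'), (0.4, 'Sky', 'silence')]
```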

What's harder to model, though, is that actions will execute in a world with variable network bandwidth/lag, while the system self-trained under different conditions, for example purely in localhost bot lobbies. I would need to scour the web for research and recommendations on how to remove this from training and account for it in world-state approximation. Perhaps the best way to handle it is to discretize time at a fixed frequency (say 0.25-second slices) and assume such a reaction time is "good enough" to be on par with or ahead of professional players.
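A minimal sketch of that time-discretization idea, assuming a 0.25 s tick (the `next_decision_time` helper is hypothetical):

```python
import math

TICK = 0.25  # assumed decision period in seconds (one "turn" per slice)

def next_decision_time(event_time, tick=TICK):
    """Round a continuous event time up to the next fixed decision boundary,
    so timing jitter smaller than one tick cannot reorder decisions."""
    return round(math.ceil(event_time / tick) * tick, 6)

# Events arriving with variable lag all snap onto the same grid:
print([next_decision_time(t) for t in (0.07, 0.26, 0.50, 0.51)])
# [0.25, 0.5, 0.5, 0.75]
```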

PS - When I read the article this morning the full text was available free of charge; apparently you now have to subscribe to the journal / pay to view it.

  4. #4
    Basic Member
    Join Date
    Dec 2016
    Posts
    76
While you guys keep your eyes on the news, I'm keeping at my ML bot project.
    https://github.com/lenLRX/Dota2_DPPO_bots ----My ML bot work in progress

  5. #5
    Basic Member
    Join Date
    Dec 2016
    Posts
    732
    Quote Originally Posted by lenlrx View Post
While you guys keep your eyes on the news, I'm keeping at my ML bot project.
How goes it? Any successes/thoughts to share? I assume it's not the repo in your signature, as no commits have been made to it in 10 months (unless you are just working locally). What I would like to know is what approach you are taking, which facet(s) of the game you are tackling, and what progress/results you have seen.

I do want to start my own ML approach too; unfortunately (and yes, this sounds like an excuse) I have very limited time currently, and it comes in very unpredictable spurts.

  6. #6
    Basic Member
    Join Date
    Dec 2016
    Posts
    76
Updated signature :>

I built a simple Dota 2 simulator for training.
Current objective: get the bots to gain XP at a safe distance,
but they were always either too far away to get XP or close enough to be hurt by creeps.
I just added an LSTM layer, but training is too slow now.
Multi-process optimization is needed; I will try it when I'm free.
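The multi-process idea can be sketched with Python's standard `multiprocessing`; the `rollout` worker below is a made-up stand-in for the real simulator loop:

```python
import multiprocessing as mp
import random

def rollout(seed):
    """Hypothetical worker: run one simulated episode and return a
    trajectory. Stands in for the real simulator step/observe loop."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(4)]  # fake per-step rewards

def collect(n_workers=4):
    # Each worker simulates independently; the learner aggregates the
    # results, the usual way to speed up PPO-style data collection.
    with mp.Pool(n_workers) as pool:
        return pool.map(rollout, range(n_workers))

if __name__ == "__main__":
    trajs = collect()
    print(len(trajs))  # 4
```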

Just left the company an hour ago (2am local time now) and have to go to work as usual tomorrow. Gonna share more details later.

  7. #7
    Basic Member
    Join Date
    Sep 2017
    Posts
    56
Yeah, I'm also thinking of doing some ML, as it should theoretically give much better results. The architecture is an important question, though. How are you guys thinking of tackling it? Are there any ways other than HTTP requests to get the data out? Are there any good ways of running games automatically? Any way of running parallel or headless games? Distributed across multiple machines?
The OpenAI team must have done something to accomplish this. Does anyone know if they used the bot API?
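For the "get the data out" part, here is a minimal sketch of the receiving end, assuming the in-game side can POST JSON somewhere (the port and payload shape are made up):

```python
# Hypothetical collector: an in-game script POSTs JSON game state each
# tick, and this server stores the snapshots for offline training.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SAMPLES = []  # collected game-state snapshots

class StateCollector(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        SAMPLES.append(json.loads(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port=8086):
    # One collector per game instance; run several on different ports
    # to gather data from parallel lobbies.
    HTTPServer(("127.0.0.1", port), StateCollector).serve_forever()
```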

  8. #8
    Basic Member
    Join Date
    Mar 2012
    Posts
    2,018
If you want to limit yourself strictly to teaching, learning and coaching, I guess there's this (haven't tried it, though, and not sure whether VAC qualifies it as a "hack", so use at your own risk).
Another PoC is here, but that's for custom games, not bots.

You won't be able to use it in an actual game though... (I mean, once you distribute the bot online over the workshop.)

  9. #9
    Basic Member
    Join Date
    Dec 2016
    Posts
    76
The first one has no advantage compared to HTTP requests, and HTTP requests themselves suck as well.

  10. #10
    Basic Member
    Join Date
    Mar 2012
    Posts
    2,018
I'd say the C communication PoC has the advantage that it's bidirectional, while HTTP requests can only send, not receive. And I know that's what some of you are looking for, so there's that as an advantage; other than that, probably nothing else. But as I said, use at your own risk.
I don't know how up to date the code is either.
