Originally Posted by **Platinum_dota2**
lenlrx, I actually saw your post the other day but forgot to leave a comment. I think it is unlikely that an MDP (Markov decision process) based algorithm like Q-learning will work on its own for switching between states (though you may be able to use its values as input to your decision making). There are two main problems:

1. MDPs usually only give you the probability of transitioning between states (even though the update rules may differ), and if you think about it, that doesn't make much sense for Dota bots: deciding to retreat with probability x is a bit weird. You can instead compare the numbers against thresholds that you define (rather than treating them as probabilities) to somewhat fix this, but I'm not sure even that will end up being decent.
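To make the distinction concrete, here is a minimal Python sketch contrasting the two ideas: sampling an action in proportion to its value (the "probability" style) versus a deterministic threshold rule. The action names, Q-values, and threshold are all made up for illustration, not taken from lenlrx's bot.

```python
import math
import random

# Hypothetical Q-values for one game state (illustrative numbers only).
q_values = {"retreat": 0.62, "push": 0.55, "farm": 0.40}

RETREAT_THRESHOLD = 0.6  # assumed tuning knob, chosen by hand

def softmax_policy(q, temperature=1.0):
    """Probability-style choice: sample an action with probability
    proportional to exp(Q/T). A bot using this would sometimes retreat
    and sometimes not, in the exact same situation."""
    exps = {a: math.exp(v / temperature) for a, v in q.items()}
    total = sum(exps.values())
    r = random.uniform(0.0, total)
    upto = 0.0
    for action, weight in exps.items():
        upto += weight
        if r <= upto:
            return action
    return action  # fallback for floating-point edge cases

def thresholded_policy(q):
    """Threshold-style choice: retreat only when its value clears a
    fixed cutoff; otherwise take the greedy action among the rest.
    Deterministic, so the bot behaves consistently."""
    if q["retreat"] >= RETREAT_THRESHOLD:
        return "retreat"
    rest = {a: v for a, v in q.items() if a != "retreat"}
    return max(rest, key=rest.get)
```

With these numbers, `thresholded_policy` always returns `"retreat"`, while `softmax_policy` only does so some of the time, which is the "retreating with probability x" oddity described above.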

2. One way you might fix the first problem is by defining many sub-states (for example, making something like laning_highhp_highmana_2enemies_1tower a valid state). The problem with doing this is that your transition matrix can become huge (and will be huge if you actually define all the relevant states). Even if the matrix doesn't create memory or running-time issues, learning/updating it to reach acceptable values can take an insanely long time.
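A quick back-of-the-envelope sketch shows how fast this blows up. The feature buckets below are hypothetical (five small features in the spirit of the laning_highhp_highmana_2enemies_1tower example), yet they already produce hundreds of states and half a million transition entries to learn.

```python
from itertools import product

# Hypothetical feature buckets; names are illustrative, not from any real bot.
phases  = ["laning", "midgame", "lategame"]
hp      = ["lowhp", "midhp", "highhp"]
mana    = ["lowmana", "highmana"]
enemies = range(6)   # 0..5 visible enemies
towers  = range(3)   # 0..2 nearby towers

# Every combination of buckets becomes one sub-state.
states = [f"{p}_{h}_{m}_{e}enemies_{t}towers"
          for p, h, m, e, t in product(phases, hp, mana, enemies, towers)]

n_states = len(states)            # 3 * 3 * 2 * 6 * 3 = 324 states
n_actions = 5                     # assumed action count
# A tabular model needs one n x n transition matrix per action:
n_entries = n_actions * n_states * n_states   # 524,880 values to estimate
```

Each of those entries has to be visited (ideally many times) during play for the estimates to converge, which is why the updating time grows so quickly even when memory is not the bottleneck.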

I haven't tried this (and haven't read your code), so let me know if you are doing something different or if you think I'm wrong about something. If you want to improve your bots, I suggest trying the second approach (without defining too many sub-states) and seeing what happens.