
Thread: Anyone tracking Skill by Match_ID?

  1. #1
    Basic Member
    Join Date
    Mar 2013
    Posts
    28

    Anyone tracking Skill by Match_ID?

    Hi all! Database developer by day, Dota (and math) nerd by night here, making my first post... I've set up a dedicated DOTAnalysis server (i5, 32 GB RAM) to crunch some Dota stats for me, and I've run into some conundrums...

    I'm currently running a full scrape of all historical match data, chugging along at 2.6 s per 100 matches parsed and inserted into the DB (if anyone can do better I'd love to chat; I suspect I'll have to start threading my requests to improve on that much). I've started to realize that I can't get all the data I'd like via GetMatchHistoryBySequenceNum (GMHBSN). The main thing I'm peeved about is the game skill rating, which I THINK is the only thing you can get via GetMatchHistory (GMH) but not via GMHBSN. I'm wondering whether any of you out there happen to be tracking skill by Match_ID and would be able to share that info with me. I find it pretty amusing that, technically, it isn't even available via GetMatchHistory: you have to search by it, because it's never returned.
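A minimal sketch of that kind of sequence-number scrape, assuming each page comes back as a list of match dicts carrying a `match_seq_num` field and that the next request starts one past the highest sequence number seen. The `fetch_page` callable is hypothetical (in practice it would wrap a GetMatchHistoryBySequenceNum request, presumably via its `start_at_match_seq_num` parameter, which is assumed here), so no network code is shown:

```python
from typing import Callable, Iterator

# Hypothetical page fetcher: given a starting sequence number, return up to
# 100 match dicts, each with at least a "match_seq_num" key (assumed shape).
FetchPage = Callable[[int], list]


def scrape_by_sequence(fetch_page: FetchPage, start_seq: int, max_pages: int) -> Iterator[dict]:
    """Walk match history forward by sequence number, one API call per page."""
    next_seq = start_seq
    for _ in range(max_pages):
        page = fetch_page(next_seq)
        if not page:
            break  # nothing newer yet: caught up to live data
        yield from page
        # Each request depends on the previous page's highest sequence number,
        # so the requests themselves are inherently sequential.
        next_seq = max(m["match_seq_num"] for m in page) + 1
```

Because each request needs the previous page's highest sequence number, threading is probably easiest to apply on the parse/insert side (for example, handing each page to a worker) rather than to the requests themselves.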

    If I can't find it anywhere else, I'm thinking I can use my spare API calls (I'm only hitting the API 35,000 times per day right now) to pull that info and parse it. Fortunately you only need to search for Very High/High (VH/H); everything else is medium!
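A quick way to size that spare budget. The daily cap is a parameter because the thread doesn't state it; 100,000 calls per day is used below purely as an assumed figure, and `calls_per_poll=2` assumes a single Very High search and a single High search per poll, with medium never searched:

```python
SECONDS_PER_DAY = 86_400


def min_poll_interval(daily_cap: int, calls_in_use: int, calls_per_poll: int = 2) -> float:
    """Shortest average gap (in seconds) between skill polls that fits the spare calls.

    calls_per_poll=2 assumes one Very High and one High search per poll;
    medium is inferred as "everything else" and is never searched.
    """
    spare = daily_cap - calls_in_use
    if spare < calls_per_poll:
        raise ValueError("no spare calls left for skill polling")
    polls_per_day = spare // calls_per_poll
    return SECONDS_PER_DAY / polls_per_day


# Assumed cap of 100,000 calls/day, with 35,000 already used by the detail scrape.
print(round(min_poll_interval(100_000, 35_000), 2))  # 2.66
```

If one page per bracket turns out not to cover a whole poll interval, raise `calls_per_poll` accordingly.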

  2. #2
    Basic Member
    Join Date
    Feb 2012
    Posts
    57
    I've been using game skill, but I've been settling for partial samples instead of having a complete scrape. Do you know about the issues surrounding GetMatchHistory's date_max and date_min settings and do you have a plan on how to get around them?

  3. #3
    Basic Member MuppetMaster42's Avatar
    Join Date
    Nov 2011
    Location
    Australia
    Posts
    585
    Quote Originally Posted by Aardvarki
    I'm currently running a full scrape of all historical match data (chugging along at 2.6s per 100 matches parsed/db inserted *snip*
    I hope you're not fetching 100 matches every 2.6 s directly from the API...

  4. #4
    Basic Member
    Join Date
    Mar 2013
    Posts
    28
    Quote Originally Posted by Phantasmal
    I've been using game skill, but I've been settling for partial samples instead of having a complete scrape. Do you know about the issues surrounding GetMatchHistory's date_max and date_min settings and do you have a plan on how to get around them?
    You're talking about the fact that date_min and date_max are typecast to a DATE type on Steam's end, so you can only use them to get the first page of results for a given DAY? I just spent an hour learning that the hard way and was about to come on here and post a complaint about it.

    I don't have an answer for it, because Start_At_Match_ID also doesn't work.
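If the server really does cast date_min/date_max to a DATE, as suspected above, every timestamp within the same day collapses to the same bound, so two "different" windows inside one day produce identical queries. A minimal model of that suspected cast, assuming UTC days (the server's actual timezone isn't known); the function name is hypothetical:

```python
from datetime import datetime, timezone


def as_server_date(unix_ts: int) -> int:
    """Model the suspected cast: keep the calendar day (UTC assumed), drop the time."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp())


# Two different times on 12 March 2013 (UTC) become the same query bound.
morning = 1_363_046_400 + 3_600    # 01:00 UTC
evening = 1_363_046_400 + 79_200   # 22:00 UTC
print(as_server_date(morning) == as_server_date(evening))  # True
```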


    Quote Originally Posted by MuppetMaster42
    I hope you're not fetching 100 matches per 2.6s directly from the API..
    Of course I am. How else can I ever catch up to live data? GetMatchHistoryBySequenceNum returns 100 matches (with full detail) from a single API call. I'm only hitting the API 35,000 times per day, but I'm pulling back 3,500,000 matches a day.
    Last edited by Aardvarki; 03-12-2013 at 11:51 PM. Reason: answered MM42 too
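The arithmetic behind those figures, as a quick check: 100 matches per call turns 35,000 calls into 3.5 million matches, and a sustained 2.6 s per page (the rate quoted in the first post) caps a single serial scraper at roughly 33,000 calls per day, the same ballpark as the 35,000 quoted. The helper names are just for illustration:

```python
SECONDS_PER_DAY = 86_400


def matches_per_day(calls_per_day: int, matches_per_call: int = 100) -> int:
    """Matches pulled per day when every call returns a full page."""
    return calls_per_day * matches_per_call


def max_calls_per_day(seconds_per_call: float) -> int:
    """Upper bound on calls per day for one serial worker at a fixed cycle time."""
    return int(SECONDS_PER_DAY / seconds_per_call)


print(matches_per_day(35_000))  # 3500000
print(max_calls_per_day(2.6))   # 33230
```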

  5. #5
    Basic Member
    Join Date
    Feb 2012
    Posts
    57
    Quote Originally Posted by Aardvarki
    You're talking about the fact that date_min and date_max are typecast to a DATE type on Steam's end, so you can only use them to get the first page of results for a given DAY? I just spent an hour learning that the hard way and was about to come on here and post a complaint about it.

    I don't have an answer for it, because Start_At_Match_ID also doesn't work.
    Yeah, that would be the issue. I've been trying to think of a way around it, but I'm not confident there's a way that's guaranteed to grab every match, and I'm worried any possible solution would be very call-intensive.

  6. #6
    Basic Member
    Join Date
    Mar 2013
    Posts
    28
    I agree. Once my match-details scrape has caught up to the present, I think I can just ping the API every few minutes to grab the latest very high/high skill matches. Any other solution for historical data would be very call-intensive.

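    The method I was considering is using date_min/date_max together with an account_id: a query returns at most 500 matches, and no one can play 500 matches in a day, so a per-account, per-day query never gets truncated. That way you can get the skill rating of every game that had at least one public player in it.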

    Of course, this has a lot of potential for wasted calls: there could be up to 10 public players in a game, and that game would be returned in 10 different calls. With some database magic, I could probably get the system pretty good at predicting which account IDs play a lot of very high/high games... But that would tend to skew my results towards higher-skill games, and would still leave gaps where I can't determine a skill rating.
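One way to measure that waste, assuming each date-bounded query is run per account and per skill bracket (since skill is only a search filter, a match's skill is known from which filtered query returned it). The `(account_id, skill)` keying is a hypothetical structure, not something the API returns:

```python
from collections import Counter


def label_from_account_queries(results: dict) -> tuple:
    """Merge per-account, per-skill query results into one skill label per match.

    results maps (account_id, skill) -> list of match IDs returned by one
    date-bounded query filtered on that skill (hypothetical structure).
    Returns (skill_by_match, redundant_rows): redundant_rows counts sightings
    of a match beyond the first, i.e. rows that added no new information.
    """
    skill_by_match = {}
    sightings = Counter()
    for (_account_id, skill), match_ids in results.items():
        for match_id in match_ids:
            sightings[match_id] += 1
            skill_by_match.setdefault(match_id, skill)
    redundant_rows = sum(n - 1 for n in sightings.values())
    return skill_by_match, redundant_rows
```

A game with 10 public players contributes 9 redundant rows, which is the waste described above.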

  7. #7
    Basic Member
    Join Date
    Feb 2012
    Posts
    57
    Running a High and Very High GetMatchHistory search every few minutes, cross-referenced with a complete list of games for the same time period from GetMatchHistoryBySequenceNum, is my best guess too. I'm not 100% sure it won't run into problems, and it's obviously useless for getting skill data for older matches, but it's the only approach I can see that has a chance of getting full coverage in a remotely reasonable number of calls.
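A sketch of that cross-reference, assuming the poller records which time window its High/Very High searches covered and that each sequence-scraped match carries a timestamp measured the same way (for example start_time plus duration, if coverage is measured by completion). Both assumptions, and the function name, are illustrative; matches outside the covered window stay unlabelled rather than being guessed:

```python
from typing import Iterable


def label_window(seq_matches: Iterable, high_ids: set, very_high_ids: set,
                 window_start: int, window_end: int) -> dict:
    """Label sequence-scraped matches using High/Very High poll results.

    seq_matches yields (match_id, timestamp) pairs. Inside the polled window,
    anything not returned by a High or Very High search is assumed medium;
    outside it, the skill is unknown (None).
    """
    labels = {}
    for match_id, ts in seq_matches:
        if not (window_start <= ts < window_end):
            labels[match_id] = None  # poller never covered this period
        elif match_id in very_high_ids:
            labels[match_id] = "very high"
        elif match_id in high_ids:
            labels[match_id] = "high"
        else:
            labels[match_id] = "medium"
    return labels
```

The weak spot is the final "medium" branch: a High or Very High game that the poller missed gets silently labelled medium.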

  8. #8
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    814
    I'll be interested to see what you end up doing. If possible, please do the community a massive favour and make a dump of your data (distributed via torrent or something). I'm more than happy to help seed; my seedbox has pretty shocking speed, but it never turns off. =P

  9. #9
    Basic Member
    Join Date
    Nov 2012
    Posts
    35
    There's a huge dump here which might speed up getting up to date for you somewhat (should have all games until December 2012)

  10. #10
    Basic Member
    Join Date
    Mar 2013
    Posts
    28
    I actually downloaded the dump already, but by the time I finished downloading it (it took me a solid week, averaging under 60 kb/s), my scripts had nearly made it as far as the dump went. Between that and differences in standards (I use MySQL), I decided in the end not to use it. Still, it's awesome that you put it out there, and I wish I could've made use of it.

    So, I coded and deployed my skill scraper (rate-limited to one call per 1.8 seconds, running concurrently with my match-details scraper). However, I've started to run into some issues using "start_at_match_id" to get current matches. The returned dataset is sorted in descending order by Match_ID, but Match_ID alone can't be used to guarantee that every match shows up. Here's why:

    Match_ID is assigned at the START of a match, but a match won't show up in GetMatchHistory until it has completed. Say Match ID 150 (Medium skill) starts at noon and runs for 90 minutes (a VERY long game). Between 12:01 and 13:30, over 500 newer Medium-skill matches start and finish, all with higher Match IDs. By the time match 150 is added to the history, those newer matches have already pushed it past the 500-result limit. At no point can this match be accessed via GetMatchHistory using only the "start_at_match_id" and "skill" search options.
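A toy simulation of that failure mode, assuming (as the example implies) that a skill-filtered GetMatchHistory query only ever exposes the 500 highest Match_IDs among completed matches, and giving the scraper the best case of a snapshot immediately after every completion. The function name and data are illustrative:

```python
import heapq


def missed_matches(matches: list, limit: int = 500) -> set:
    """Return match IDs that never appear in any history snapshot.

    matches is a list of (match_id, finish_time) pairs for one skill bracket.
    After each completion we take a snapshot of the `limit` highest finished
    Match_IDs, mimicking a result set sorted descending by Match_ID.
    """
    finished = []
    seen = set()
    for match_id, _finish_time in sorted(matches, key=lambda m: m[1]):
        finished.append(match_id)
        seen.update(heapq.nlargest(limit, finished))
    return {match_id for match_id, _ in matches} - seen


# Match 150 finishes at minute 90; 550 newer matches (IDs 151-700) finish before it.
long_game = [(150, 90.0)]
newer = [(151 + i, 1.0 + i * 0.1) for i in range(550)]
print(missed_matches(long_game + newer))  # {150}
```

The number of finished matches with higher IDs can only grow, so a match that misses the snapshot taken as it finishes never gets another chance.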

    So, while my skill scraper will be good at tracking current match skill data, it's still not perfect.
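The one-call-per-1.8-seconds limit on the skill scraper can be enforced with a small minimum-interval limiter. This is a generic sketch with a hypothetical class name; the clock and sleep functions are injectable so it can be tested without waiting. At 1.8 s per call, one limiter allows at most 48,000 calls per day, and each concurrently running scraper would need its own limiter (or a shared one, if they draw on the same budget):

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Blocks so that successive calls start at least `interval` seconds apart."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Call before each API request; sleeps only if the last call was too recent."""
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```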
