Page 1 of 3 1 2 3 LastLast
Results 1 to 10 of 29

Thread: A current (for now) MySQL database dump (to April 2013)

  1. #1
    Basic Member
    Join Date
    Mar 2013
    Posts
    28

    Post A current (for now) MySQL database dump (to April 2013)

    Dota_2_Matches_to_April_2013.torrent

    So this dump is big. 175 gigs big. I tried to RAR it but even with "Best" compression, winRar only predicted 18% compression so I opted not to wait the 10 hours it said it would take.

    This is a (mostly) complete dump of all matches, match players, ability level ups, picks and bans, extra units (spirit bear), heroes, items and cluster data.

    There are eight tables, each in their own .sql file. Please, if you're going to download this and you don't think you're going to use all of them, don't download the ones you won't use. For instance, the ability upgrades table file is over 100GB - if you don't need this information, don't download it. I'm starting this as the only seed, so I don't want to piss off my ISP.

    This set of data has been parsed and all matches where I can definitely identify that stats are not being recorded have been removed (bot games, games with <10 players, practice games, and games with <5 min duration).

    Here's a little information about each table and its columns:

    Cluster (Cluster_ID, Location): One column for cluster code, another for the Region the cluster code is located in. Useful to convert from the Cluster # on a Match to US East or China or Australia, etc.
    Hero (Hero_ID, long_Name, short_name): A quick breakdown from each numeric Hero_ID to a full name (npc_dota_hero_antimage) and a short name (antimage) - I use the short name to access hero image files on steam's site.
    Item (Item_ID, full_item_name, short_name, cast_range, cast_point, cooldown, mana_cost, price, quality, side_shop): Information for each item. A lot of the later data was parsed poorly from the internal files, so it may not all be complete/accurate. I don't use a lot of it except the short_name for accessing image files on steam's servers.
    Match(Match_ID, Match_Seq_Num, Season, Radiant_Win, Duration,Start_Time, Tower_Status_Radiant, Tower_Status_Dire, Barracks_Status_Radiant,Barracks_Status_Dire, Cluster, First_Blood_Time, Lobby_Type, Human_Players, LeagueID, Positive_Votes, Negative_Votes, Game_Mode, Complete_Record, Skill): Most of the fields are self-explanatory and are pulled directly from the API. The only two that aren't are "Complete_Record" and "Skill". Complete_Record was for my own records, to tell me if the match was successfully pulled completely. Every match in this database SHOULD be a '1'. If it's not, double check that all the records (in all other tables) for the match have complete data before using the match. "Skill" is the skill rating that the match was played on. 1 is normal, 2 is High, 3 is Very high. This data is very incomplete, but there is still a very large number of matches with skill data.
    Match_Picks_Bans(Match_ID, PickCode, Hero_ID): The only field on this table that needs explanation is PickCode: The order of the pick/ban is the highest six bits (only 5 used) the second lowest bit is the team, and the lowest bit is whether it was a pick (1) or a ban (0). If you want to convert this to the original three pick_ban fields, do the following: Order = int(PickCode / 4); Team = int((PickCode % 4) / 2); isPick = PickCode % 2;
    Match_Player(Match_ID, Player_Slot, Account_ID, Hero_ID, Item_0, Item_1, Item_2, Item_3, Item_4, Item_5, Kills, Deaths, Assists, Leaver_Status, Gold, Last_Hits, Denies, Gold_Per_Min, XP_Per_Min, Gold_Spent, Hero_Damage, Tower_Damage, Hero_Healing, Level, Game_Won): The only field on this table that is not directly from the API is "Game_Won". This field makes it easier to determine from this table alone whether the player won the game or not. Using this field, a query for a player's wins or losses only requires this one table (select count(1) as played, sum(game_won) as wins from match_player where account_id = 9020491; will pull back my games played and wins).
    Match_Player_Ability(Match_ID, Player_Slot, Level, Ability, Time): This is pretty self-explanatory. This table is absolutely enormous - potentially 250 records per match. Don't download it unless you have a plan for it.
    Match_Player_unit(Match_ID, Player_Slot, UnitName, Item_0, Item_1, Item_2, Item_3, Item_4, Item_5): This table should be self-explanatory. All these fields are taken directly from the API. Currently this table is only used for Spirit Bears.

  2. #2
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    814
    Going to try seeding this. This will serve as a good test to break in my new server.

  3. #3
    Basic Member GelioS's Avatar
    Join Date
    Aug 2012
    Posts
    130
    Aardvarki, could you make separated file with Cluster Data (id, location)?

    DotA 2 Web API Processing with PHP
    DotA2 Statistic - Professional leagues statistic
    follow @dota2statistic

  4. #4
    Basic Member
    Join Date
    Mar 2013
    Posts
    28
    Technically, you could always download the torrent and just choose the cluster file, but that would mean downloading a 400KB torrent for a 3KB file.

    I'll be nice and give you a direct link for the cluster file.

    cluster.sql

    If you want to create this table in your own database (the dump file uses my database name 'csdb' (Creep Stats Database) by default), you'll just want to grab the lines from "Create Table" through "Unlock Tables" out of the middle.
    Last edited by Aardvarki; 04-16-2013 at 06:25 AM.

  5. #5
    Basic Member GelioS's Avatar
    Join Date
    Aug 2012
    Posts
    130
    Aardvarki, my respect to you! thx!

    DotA 2 Web API Processing with PHP
    DotA2 Statistic - Professional leagues statistic
    follow @dota2statistic

  6. #6
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    814
    Seed faster!

    =P

    I can't seed until I get a copy, and for some reason... it is grabbing the largest table first even though it is put on low priority.

  7. #7
    Basic Member
    Join Date
    Mar 2013
    Posts
    28
    I've actually noticed that your client is only grabbing one block at a time. It's ramping up from zero to about 1MB/s (I can upload at about 4.5MB/s) before it finishes the block and then winds down to almost zero again (likely as your system hashes the block to make sure it was received correctly). The bandwidth graph looks like a sawtooth pattern, and has for hours. I'm not sure if it's your settings or torrent software (or perhaps mine but I've never seen this before and uTorrent is the only software I use). I've played with my settings all over the place and none of them seem to make much difference.

  8. #8
    Basic Member
    Join Date
    Mar 2013
    Posts
    28
    I made some changes and the sawtooth continues, but disabling "Initial Seeding" seemed to have improved the throughput by about a factor of 3. Still very odd! dl.png

  9. #9
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    814
    Haha. Well it is a very old box (read low RAM and slow CPU), and I have a feeling its connection in Hungary is not very reliable. From my end, you kept disappearing.

    It is using Transmission (so I can control and automate via RPC and have a webUI). It will eventually get there. =P

    As of writing, it is holding 1.5MB/s



    EDIT: At least it is grabbing some of the other tables now.
    Last edited by CvP; 04-17-2013 at 06:59 AM. Reason: removed the pic - CvP

  10. #10
    Basic Member
    Join Date
    Aug 2012
    Posts
    46
    Not sure I would leave the IP and remote port in the screenshot.
    This is a fairly friendly place and I'm sure there is a password.. but relying on the security of whatever box that is and whatever software it contains...

    /shrug.

    Excellent work Aardvarki. Thankyou for attempting to seed jimmy : )

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •