
Thread: Better late than never - complete start->December 2012 data dump!

  1. #1
    Basic Member
    Join Date
    Nov 2012
    Posts
    35

    Better late than never - complete start->December 2012 data dump!

    It's here.

    Very similar format to before (will redo the readme if people really want us to), but all account ids are anonymised, meaning that field on account is a) pretty dull reading and b) now a bigint to support the placeholder value Valve are using. Total size of the compressed data is ~15 GB; the full database is probably ~70 GB (if someone can post exact numbers below I'll update!).
    We should have found and filled in the "quirky" games (matches with no matchplayers), although we're still lacking on some (most) of the skill data - we'll hopefully have come up with a decent way to share large data by the time we can fill these in.

    For now though, an apology for taking so long. Large chunks of data are a massive pain, so we've given up trying to package it all up neatly into manageable chunks, because not doing so means we can release it now. People who only need a portion of it are probably better off using the new API features to get exactly what they're after; this dump is aimed far more at people who'd like a complete copy of the data.
    We do still intend to work out a way to make recent data available en masse, but for now the recent API changes make this considerably less of an issue than it was a few weeks ago.
    Last edited by Sproinknet; 03-04-2013 at 05:32 AM.

  2. #2
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    741
    Thanks for release.

    Too bad it's in PostgreSQL.

    If I can assume that the data is fully anonymised, that would mean no player name, thus no funky characters to "break" MySQL compatibility. Any chance of releasing as a MySQL dump?

    ^_^

    EDIT: Torrent has no seeds. *_* Hope you can get it fully seeded initially... or perhaps try using 7-Zip to compress it and split it into multiple volumes that can be uploaded to filehosts.

    A similarly large backup compressed from 15.2 GB to just 2 GB with the right settings.

    Last edited by jimmydorry; 02-20-2013 at 03:13 PM.

  3. #3
    Any insight on some of the fields? For instance, matches has "tower_status_radiant", which is clearly a 10-bit bitfield for the towers, but there are 11 towers... and which tower is represented by which bit?

  4. #4
    Basic Member
    Join Date
    Nov 2011
    Posts
    113

  5. #5
    Ah, so math fail regarding 10 vs 11 bits. Thanks for the link! :-)
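    For anyone else poking at the tower_status fields, here is a minimal sketch of unpacking an 11-bit bitfield in Python. The function names are made up for illustration, and two things are assumptions rather than anything confirmed in this thread: that bit 0 is the least significant bit, and that a set bit means the tower is still standing. Which bit maps to which tower is not spelled out here, so the sketch only reports bit positions.

```python
def decode_bitfield(value, width=11):
    """Return a list of booleans; index i is True when bit i (LSB = 0) is set.

    width=11 matches the 11 towers per side discussed in this thread.
    """
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in %d bits" % width)
    return [bool((value >> i) & 1) for i in range(width)]


def count_set_bits(value, width=11):
    """Number of set bits (assumed here to mean towers still standing)."""
    return sum(decode_bitfield(value, width))


if __name__ == "__main__":
    # 2047 == 0b11111111111: all 11 bits set
    print(decode_bitfield(2047))
    # 5 == 0b101: only bits 0 and 2 set
    print(count_set_bits(5))
```

    The same helper should work for any other bitfield column by passing a different width.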

    Any similar threads on cluster, lobby_type, season, leagueid?

    A little slow on the uptake, actually checked forums and most of my noob questions are answered now. :-)
    Last edited by mattieshoes; 03-03-2013 at 12:46 AM. Reason: I am dumb.

  6. #6
    Basic Member
    Join Date
    Nov 2012
    Posts
    35
    If you've not found answers to your list yet:
    • Cluster I don't think anyone knows much about
    • lobby_type follows this, and reading at least that first post if not that entire thread is likely very helpful!
    • Season is undocumented, but I believe largely uninteresting
    • leagueid you should be able to get some insight into from the GetLeagueListing API, and the link above has some details on it also

    Hope that helps!

  7. #7
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    741
    • Cluster is the server id that the replay sits on. (Refer to the replay section of that thread.) It's not usable now though...
    • Season is undocumented, but is probably going to be related to the ladder
    Last edited by jimmydorry; 03-04-2013 at 05:43 PM. Reason: forgot my closing tag

  8. #8
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    741
    I've had a copy of the dump for quite a few days now. I finally have some time to try playing with it.

    So where can we download a portable copy of PostgreSQL? What versions are supported by this dump?

    I am getting this using pgAdmin III with PostgreSQL 8.3.3:

    Code:
    F:\PostgreSQLPortable\usbpg\apps\pgAdminIII\pg_restore.exe -h localhost -p 5432 -U postgres -d test -v "C:\Users\jimmydorry\Downloads\Dota_2_matches_to_November_2012.pgbackup"
    pg_restore: [archiver] unsupported version (1.12) in file header
    
    Process returned exit code 1.

  9. #9
    I'm having the same issue. Googling brings up articles about hosting the data on Heroku but I would rather have a local dump. Does anyone have experience with hosting a .pgbackup file locally?

  10. #10
    Basic Member
    Join Date
    Mar 2013
    Posts
    28
    My google-fu indicates that the archive version 1.12 reported by pg_restore corresponds to PostgreSQL 9.0. I think you need to be running at least the version that Sproink was on when he exported.
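    If you want to check what a dump file claims before fighting with pg_restore, here is a rough Python sketch that reads the version bytes from the file header. It assumes the file is a pg_dump custom-format archive, which starts with the magic string "PGDMP" followed by one byte each for the major, minor and revision numbers of the archive format. The function name is hypothetical, and the number it returns is the archive format version (the 1.12 in the error above), not the PostgreSQL server version.

```python
import io


def read_archive_version(stream):
    """Read (major, minor, revision) from a pg_dump custom-format header.

    Assumes the stream is opened in binary mode and positioned at the start.
    """
    magic = stream.read(5)
    if magic != b"PGDMP":
        raise ValueError("not a pg_dump custom-format archive")
    header = stream.read(3)
    if len(header) < 3:
        raise ValueError("file is too short to contain a version header")
    major, minor, revision = header  # iterating bytes yields ints in Python 3
    return major, minor, revision


if __name__ == "__main__":
    # Synthetic header for illustration only; with a real dump you would use
    # open(path_to_dump, "rb") instead of BytesIO.
    fake = io.BytesIO(b"PGDMP\x01\x0c\x00")
    print(read_archive_version(fake))
```

    Comparing that number against what your pg_restore supports should tell you whether you need a newer client before you start a long restore.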

    If there's interest, once I'm caught up a little further, I'll gladly start a torrent for a MySQL version of the backup as well. I'm a little ahead of where this dump stops, but not enough to warrant a separate giant torrent. Maybe once I get up to Match Sequence Number 104,529,208, I can dump everything prior. (For those who care, 104,529,208 is the first match recorded under the new API patch, which includes Ability_upgrades.)
