
Thread: A lot less people share match history than I expected

  1. #11
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    814
    Quote Originally Posted by adrianlegg View Post
    I asked on dotametric, but since it's Your thread, I'll repost.

    What does Your data really mean? As in:
    With 20 883 matches, I'd expect the sum of (anon + total id'ed) to be 208 830 (number of games * 10). 122k + 38k does not add up to 200k, so I assume incomplete games, yet there is no data about them.
    How can You have more unique users than id'ed ones? Aren't they swapped in Your image?
    What methodology do You use to distinguish one anon user from another?

    For example:
    match1: anon,anon,anon,anon,anon vs 1,2,3,4,5
    match2: anon,anon,4,5,6 vs anon,anon,anon,anon,anon

    In this example, do You count 8 id'ed users, 6 unique and 12 anon => 12/(12+8=20)=60% ?
    That's a wrong assumption, as the anons in games 1 and 2 might be the same 5-man stack, so that would be 7 unique anons => 7 / (7 (unique anon) + 6 (unique) = 13) => 54% of players in these 2 matches don't share stats.
    And what if we had 8 more matches with full anons only (so 8 id'ed, 6 unique, 92 anons)? Will You call it 92%? Those as well might be just 10 unique anons, which would mean 10 / (10 + 6) = 62.5%

    Unless You state Your methodology, You cannot say "% of userbase are sharing stats", You might ofc say "% of users in matches are anon", but that's completely different statement.
    The bug was not apparent at lower match numbers, but it is quite obvious now. I'm not quite sure why, but my enum column in MySQL was not working, so players in player slots higher than 4 all somehow ended up NULL in a non-null column, and because my primary key includes the player slot... we were dropping half of the users. *_*

    I'll fix this when I land. I'll probably nuke the table and have it parse by sequence backwards to refill.

    The methodology is simple. NULLs by definition can't be unique, so they are all anon. Unique players are as named.
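
    As a rough sketch of that counting, here are the two example matches from the quoted post run through it. The list-of-slots layout, the use of None for an anon slot and the slot_stats name are all made up for illustration; this is not the parser's actual code.

```python
# Hypothetical layout: one list of 10 player slots per match.
# None stands for an anonymous (NULL) slot, an int for an account id.
matches = [
    [None, None, None, None, None, 1, 2, 3, 4, 5],
    [None, None, 4, 5, 6, None, None, None, None, None],
]


def slot_stats(matches):
    """Count anon slots, id'ed slots and unique id'ed accounts."""
    anon_slots = 0
    identified_slots = 0
    unique_ids = set()
    for slots in matches:
        for account_id in slots:
            if account_id is None:
                # NULLs cannot be told apart, so every one counts as anon.
                anon_slots += 1
            else:
                identified_slots += 1
                unique_ids.add(account_id)
    total = anon_slots + identified_slots
    return {
        "anon_slots": anon_slots,
        "identified_slots": identified_slots,
        "unique_identified": len(unique_ids),
        "anon_slot_share": anon_slots / total if total else 0.0,
    }


stats = slot_stats(matches)
print(stats)
```

    For those two matches it gives 12 anon slots, 8 id'ed slots and 6 unique accounts, i.e. 60% of player slots are anon. Counted this way the figure is a share of slots in matches, not a share of distinct users.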

    Sorry for the confusion. We will see how the complete dataset changes this number, but I have a feeling that it shouldn't swing too far. I am prepared to eat my words though =P

  2. #12
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    814
    Quote Originally Posted by hoveringmover View Post
    I feel like cosmetics (and even that is pretty sloppy most of the time) and preventing Dotabuff from stealing match information (what sensible company would allow this, really?) are the only 2 things they've done right lately.
    I agree that the cosmetics can be awesome.

    I also agree that players have the right to protect their data (even if it is not tied to their real life identity), but I do not agree that someone can make that choice for them by default without telling them. Valve should have done the responsible thing here and had a single prompt on first opening asking if they wanted their data shared. Accusing people of "stealing" data, though, is heavy-handed and narrow-minded. If it is available via the API, then it is available for use. We were also recommended to parse replays if there was additional stuff not available in the API.

    Why are you bringing this up again anyway? It's not particularly related, and I can't help but feel that it is aimed at starting another flame war.

  3. #13
    Basic Member
    Join Date
    Jan 2012
    Posts
    189
    Quote Originally Posted by jimmydorry View Post
    Why are you bringing this up again anyway? It's not particularly related, and I can't help but feel that it is aimed at starting another flame war.
    I agree. Setting the sharing option to default off is a bad idea anyway.

  4. #14
    Basic Member jimmydorry's Avatar
    Join Date
    Dec 2012
    Posts
    814
    I'm pretty happy with my data now. It took a while to go back and re-parse the incomplete matches (missing players) whilst also keeping up with the new matches. There is about a day of data missing (when the parser fell over), but I should pick that up at some later point.
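
    For anyone wanting to check their own tables for the same problem, a minimal sketch of spotting incomplete matches is below. It assumes one stored row per (match_id, player_slot) and 10 players per match; the row layout, the sample ids and the incomplete_matches name are hypothetical, not taken from my schema.

```python
from collections import Counter

EXPECTED_SLOTS = 10  # assumes 10 players per match ("number of games * 10")


def incomplete_matches(rows, expected=EXPECTED_SLOTS):
    """Return {match_id: stored_row_count} for matches with too few rows.

    rows is an iterable of (match_id, player_slot) pairs.
    """
    counts = Counter(match_id for match_id, _slot in rows)
    return {m: n for m, n in counts.items() if n < expected}


# Made-up sample: match 1001 is complete, match 1002 lost half its players.
rows = [(1001, slot) for slot in range(10)] + [(1002, slot) for slot in range(5)]
missing = incomplete_matches(rows)
print(missing)
```

    A match with no stored rows at all will not show up here, so the day the parser was down would still need to be found by looking for gaps in the match sequence.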

    65.9% is still quite a large number. While the sample size was smaller, I have seen this number rise to 80%... but never drop below 50%.

    Any thoughts on how relevant this phenomenon is to the accuracy of the data you can pull from the API?

