More Hockey Stats

Blog Entries posted by More Hockey Stats

  1. More Hockey Stats
    Original post
     
    Nothing derails regular blogging like research you've been eager to start but kept putting off to finish simpler, more mundane things, until you couldn't hold off any longer.

    But then I realized that, in order to do that research, I first needed to do a piece of lemma research. Just like its mathematical namesake, lemma research is done for the sake of a bigger project, yet produces a useful result by itself.

    So I noticed that I needed penalty box data. Who was in the penalty box during a power play goal? More specifically, who was responsible for that goal, i.e. who left the penalty box as a result (or was serving an unmatched 5-minute major penalty at the time)? There once was a feed that had penalty box data, but it only went back to 2010 or so, and it seems to have been discontinued. Therefore I gathered the game data I already had, went through the power play goals from 1987/88 through today, and tried to match them with penalties, while weeding out all the cancelling (coincidental) and all the irrelevant (e.g. misconduct) ones.
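    The matching step described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code behind the dataset: the `Penalty` record and the `box_candidates` function are hypothetical names, times are plain seconds of game time, and the sketch ignores the fact that an earlier power play goal ends a minor penalty early.

```python
from dataclasses import dataclass


@dataclass
class Penalty:
    team: str             # team that was penalized
    player: str
    start: int            # game time in seconds when the penalty began
    minutes: int          # 2, 4, 5 or 10
    kind: str = "minor"   # "minor", "major" or "misconduct"


def box_candidates(goal_time, scoring_team, penalties):
    """Penalties on the defending team that were still running at goal_time."""
    live = [p for p in penalties
            if p.kind != "misconduct"  # misconducts do not change manpower
            and p.start < goal_time <= p.start + 60 * p.minutes]
    own = [p for p in live if p.team == scoring_team]
    opp = [p for p in live if p.team != scoring_team]
    # Cancel coincidental penalties: same start and same length on both sides.
    for o in own:
        match = next((p for p in opp
                      if p.start == o.start and p.minutes == o.minutes), None)
        if match is not None:
            opp.remove(match)
    return opp


# Hypothetical game: BOS and MTL take coincidental minors at 1:40,
# BOS takes another minor at 2:10, and MTL scores at 3:00.
pens = [Penalty("BOS", "A", 100, 2),
        Penalty("MTL", "B", 100, 2),
        Penalty("BOS", "C", 130, 2)]
print([p.player for p in box_candidates(180, "MTL", pens)])  # ['C']
```

    With a helper like this, one remaining candidate means the goal can be assigned automatically, several candidates mean the entry needs a manual decision, and zero candidates mean the goal has no matching penalty at all.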

    I am glad to report that I was able to create a dataset that is consistent, at least at first look. But before that I had to go in and correct penalty box entries for about 160 goals manually. For about 35 of them I simply had to enter the penalty data by hand, because any algorithm assigning the player in the box to the goal would be ambiguous. However, I also discovered about 125 goals (120 before 1999/00, when extra reports were introduced, and only five since) that should not even be marked as power play goals: there was no matching penalty. Of course, the mistake may be in the penalty data of the NHL report: the time of the penalty may be reported wrongly. But until further notice, these goals should not be considered PPG:
     
    +-----------+---+-------+---------------------+-------------------------+
    | GameID    | P | Time  | Scorer              | Team                    |
    +-----------+---+-------+---------------------+-------------------------+
    | 198720125 | 3 |  9:19 | BOB SWEENEY         | Boston Bruins           |
    | 198720134 | 3 | 18:58 | RICK TOCCHET        | Philadelphia Flyers     |
    | 198720239 | 1 |  1:23 | STEPHANE RICHER     | Montreal Canadiens      |
    | 198720367 | 1 |  4:51 | DALE HAWERCHUK      | Winnipeg Jets           |
    | 198720388 | 1 | 17:46 | TROY MURRAY         | Chicago Blackhawks      |
    | 198720389 | 3 |  3:55 | PAUL MACLEAN        | Winnipeg Jets           |
    | 198720431 | 2 |  6:56 | PETER TAGLIANETTI   | Winnipeg Jets           |
    | 198720449 | 2 | 12:34 | MIKKO MAKELA        | New York Islanders      |
    | 198720471 | 3 | 10:51 | ANTON STASTNY       | Quebec Nordiques        |
    | 198720484 | 2 | 13:38 | MARIO LEMIEUX       | Pittsburgh Penguins     |
    | 198720528 | 1 |  6:32 | CHARLIE SIMMER      | Pittsburgh Penguins     |
    | 198720610 | 3 |  3:08 | LAURIE BOSCHMAN     | Winnipeg Jets           |
    | 198720735 | 3 |  8:10 | MIKE FOLIGNO        | Buffalo Sabres          |
    | 198720755 | 1 |  3:06 | GARRY GALLEY        | Washington Capitals     |
    | 198720755 | 2 |  0:29 | GERALD DIDUCK       | New York Islanders      |
    | 198720787 | 1 | 17:02 | AARON BROTEN        | New Jersey Devils       |
    | 198720799 | 2 |  0:49 | JIMMY CARSON        | Los Angeles Kings       |
    | 198720802 | 2 |  3:24 | MIKE FOLIGNO        | Buffalo Sabres          |
    | 198720804 | 1 | 18:03 | PAT VERBEEK         | New Jersey Devils       |
    | 198730134 | 1 |  7:52 | BRUCE DRIVER        | New Jersey Devils       |
    | 198730223 | 3 |  9:47 | MARK JOHNSON        | New Jersey Devils       |
    | 198730314 | 2 | 12:31 | CAM NEELY           | Boston Bruins           |
    | 198820088 | 2 |  6:17 | RANDY MOLLER        | Quebec Nordiques        |
    | 198820088 | 2 |  7:11 | ANTON STASTNY       | Quebec Nordiques        |
    | 198820147 | 1 | 15:58 | JOHN CULLEN         | Pittsburgh Penguins     |
    | 198820150 | 2 |  2:29 | KEVIN DINEEN        | Hartford Whalers        |
    | 198820203 | 3 | 12:45 | JOE MULLEN          | Calgary Flames          |
    | 198820241 | 2 | 18:53 | DAN QUINN           | Pittsburgh Penguins     |
    | 198820307 | 3 | 10:12 | MARIO LEMIEUX       | Pittsburgh Penguins     |
    | 198820318 | 2 |  5:24 | GAETAN DUCHESNE     | Quebec Nordiques        |
    | 198820331 | 3 |  7:36 | PAUL GAGNE          | Toronto Maple Leafs     |
    | 198820489 | 1 | 16:20 | DALE HUNTER         | Washington Capitals     |
    | 198820542 | 1 | 18:13 | DOUG EVANS          | St. Louis Blues         |
    | 198820727 | 3 |  7:49 | PAUL MACLEAN        | Detroit Red Wings       |
    | 198820803 | 2 |  6:58 | DOUG SMITH          | Vancouver Canucks       |
    | 198820821 | 3 | 17:43 | BRENT FEDYK         | Detroit Red Wings       |
    | 198920146 | 1 |  5:14 | KEVIN DINEEN        | Hartford Whalers        |
    | 198920158 | 1 |  5:26 | TROY MURRAY         | Chicago Blackhawks      |
    | 198920220 | 2 | 15:53 | NEAL BROTEN         | Minnesota North Stars   |
    | 198920238 | 1 |  2:39 | GREG PASLAWSKI      | Winnipeg Jets           |
    | 198920303 | 2 |  4:55 | PAT ELYNUIK         | Winnipeg Jets           |
    | 198920475 | 3 |  8:25 | BRIAN MULLEN        | New York Rangers        |
    | 198920543 | 3 | 18:46 | AL MACINNIS         | Calgary Flames          |
    | 198920667 | 1 |  3:04 | JEREMY ROENICK      | Chicago Blackhawks      |
    | 198920749 | 1 |  8:18 | CRAIG JANNEY        | Boston Bruins           |
    | 198920818 | 2 |  1:25 | JOHN OGRODNICK      | New York Rangers        |
    | 198930231 | 2 | 13:43 | TRENT YAWNEY        | Chicago Blackhawks      |
    | 198930322 | 2 | 18:01 | JARI KURRI          | Edmonton Oilers         |
    | 199020005 | 3 | 19:54 | RAY SHEPPARD        | New York Rangers        |
    | 199020056 | 1 | 15:44 | DAVE TAYLOR         | Los Angeles Kings       |
    | 199020262 | 2 | 11:20 | BOBBY HOLIK         | Hartford Whalers        |
    | 199020612 | 3 |  7:18 | JOHN CHABOT         | Detroit Red Wings       |
    | 199020636 | 1 | 15:02 | BRIAN LEETCH        | New York Rangers        |
    | 199020647 | 2 |  9:16 | KENNETH JR HODGE    | Boston Bruins           |
    | 199020704 | 2 | 12:47 | KELLY KISIO         | New York Rangers        |
    | 199020716 | 3 |  0:34 | JOE SAKIC           | Quebec Nordiques        |
    | 199020762 | 2 |  5:16 | MARK RECCHI         | Pittsburgh Penguins     |
    | 199020815 | 2 | 19:28 | KEVIN STEVENS       | Pittsburgh Penguins     |
    | 199030222 | 3 |  2:17 | DINO CICCARELLI     | Washington Capitals     |
    | 199030235 | 3 | 14:04 | BRIAN PROPP         | Minnesota North Stars   |
    | 199120059 | 3 |  2:20 | DOUG GILMOUR        | Calgary Flames          |
    | 199120070 | 2 | 10:26 | DAVE GAGNER         | Minnesota North Stars   |
    | 199120093 | 3 | 18:48 | DARREN TURCOTTE     | New York Rangers        |
    | 199120238 | 3 |  0:18 | PAUL RANHEIM        | Calgary Flames          |
    | 199120313 | 3 | 11:52 | JEREMY ROENICK      | Chicago Blackhawks      |
    | 199120407 | 1 | 10:39 | BOBBY CARPENTER     | Boston Bruins           |
    | 199120504 | 2 | 18:42 | JIMMY CARSON        | Detroit Red Wings       |
    | 199120617 | 2 | 16:43 | MARTY MCSORLEY      | Los Angeles Kings       |
    | 199120625 | 3 | 18:17 | TODD ELIK           | Minnesota North Stars   |
    | 199120638 | 1 | 13:15 | GREG ADAMS          | Vancouver Canucks       |
    | 199120755 | 3 | 10:48 | DOUG GILMOUR        | Toronto Maple Leafs     |
    | 199130143 | 3 | 13:57 | AL IAFRATE          | Washington Capitals     |
    | 199220002 | 2 |  5:17 | SCOTT STEVENS       | New Jersey Devils       |
    | 199220013 | 1 | 12:08 | RICK TOCCHET        | Pittsburgh Penguins     |
    | 199220023 | 2 | 17:32 | MARIO LEMIEUX       | Pittsburgh Penguins     |
    | 199220068 | 2 | 17:22 | MIKE GARTNER        | New York Rangers        |
    | 199220091 | 3 | 19:34 | JOE JUNEAU          | Boston Bruins           |
    | 199220121 | 1 |  6:14 | MIKE GARTNER        | New York Rangers        |
    | 199220149 | 2 | 18:44 | CHRIS KONTOS        | Tampa Bay Lightning     |
    | 199220181 | 3 |  6:21 | ULF DAHLEN          | Minnesota North Stars   |
    | 199220287 | 1 | 16:56 | CHRIS KONTOS        | Tampa Bay Lightning     |
    | 199220337 | 2 |  5:42 | ALEXANDER MOGILNY   | Buffalo Sabres          |
    | 199220388 | 1 | 15:45 | GREG HAWGOOD        | Edmonton Oilers         |
    | 199220401 | 2 | 12:48 | JEFF NORTON         | New York Islanders      |
    | 199220562 | 3 |  9:19 | CHRIS KONTOS        | Tampa Bay Lightning     |
    | 199220563 | 1 |  9:25 | TEPPO NUMMINEN      | Winnipeg Jets           |
    | 199220589 | 3 |  9:18 | ROD BRIND'AMOUR     | Philadelphia Flyers     |
    | 199220595 | 1 |  7:20 | CRAIG JANNEY        | St. Louis Blues         |
    | 199220908 | 2 | 16:28 | VALERI KAMENSKY     | Quebec Nordiques        |
    | 199220986 | 1 | 15:44 | JIRI SLEGR          | Vancouver Canucks       |
    | 199320065 | 1 | 15:37 | DENIS SAVARD        | Tampa Bay Lightning     |
    | 199320092 | 3 | 11:22 | SERGEI FEDOROV      | Detroit Red Wings       |
    | 199320636 | 1 | 16:40 | TIM SWEENEY         | Mighty Ducks Of Anaheim |
    | 199320643 | 3 | 16:50 | MARTIN LAPOINTE     | Detroit Red Wings       |
    | 199320740 | 2 |  7:53 | KEITH TKACHUK       | Winnipeg Jets           |
    | 199320840 | 1 | 17:17 | SERGEI ZUBOV        | New York Rangers        |
    | 199320905 | 3 | 18:16 | KEVIN STEVENS       | Pittsburgh Penguins     |
    | 199420025 | 2 |  1:32 | KELLY BUCHBERGER    | Edmonton Oilers         |
    | 199420072 | 2 |  7:27 | STEVE THOMAS        | New York Islanders      |
    | 199420087 | 1 |  9:52 | KEITH TKACHUK       | Winnipeg Jets           |
    | 199420332 | 2 |  2:35 | GRANT LEDYARD       | Dallas Stars            |
    | 199420332 | 2 |  7:11 | RAY SHEPPARD        | Detroit Red Wings       |
    | 199520189 | 2 | 15:41 | LUC ROBITAILLE      | New York Rangers        |
    | 199520226 | 3 | 17:34 | CHRIS GRATTON       | Tampa Bay Lightning     |
    | 199520490 | 2 |  9:41 | TODD BERTUZZI       | New York Islanders      |
    | 199520539 | 3 | 13:25 | MARK MESSIER        | New York Rangers        |
    | 199520560 | 1 |  9:03 | VYACHESLAV KOZLOV   | Detroit Red Wings       |
    | 199520694 | 3 | 15:43 | RON FRANCIS         | Pittsburgh Penguins     |
    | 199520744 | 2 |  6:41 | SCOTT MELLANBY      | Florida Panthers        |
    | 199520750 | 2 |  9:05 | NICKLAS LIDSTROM    | Detroit Red Wings       |
    | 199520766 | 1 | 16:00 | BENOIT HOGUE        | Dallas Stars            |
    | 199520776 | 2 |  5:53 | MARIO LEMIEUX       | Pittsburgh Penguins     |
    | 199520809 | 1 | 12:47 | ADAM GRAVES         | New York Rangers        |
    | 199520847 | 2 | 16:40 | KEITH PRIMEAU       | Detroit Red Wings       |
    | 199620127 | 3 | 12:03 | ALEXEI ZHAMNOV      | Chicago Blackhawks      |
    | 199620211 | 2 | 13:08 | GREG ADAMS          | Dallas Stars            |
    | 199620466 | 3 | 14:50 | JOZEF STUMPEL       | Boston Bruins           |
    | 199620792 | 1 | 11:50 | BRENDAN SHANAHAN    | Detroit Red Wings       |
    | 199621058 | 2 |  1:41 | MARTIN GELINAS      | Vancouver Canucks       |
    | 199720028 | 2 | 18:01 | TREVOR LINDEN       | Vancouver Canucks       |
    | 199720067 | 3 | 16:29 | TERRY YAKE          | St. Louis Blues         |
    | 199920756 | 1 |  6:46 | JAMIE LANGENBRUNNER | Dallas Stars            |
    | 200520486 | 1 | 17:28 | ROB COLLINS         | New York Islanders      |
    | 200620150 | 1 |  7:59 | NATHAN HORTON       | Florida Panthers        |
    | 200620627 | 1 |  5:00 | RYANE CLOWE         | San Jose Sharks         |
    | 200821010 | 1 | 16:44 | MIKAEL SAMUELSSON   | Detroit Red Wings       |
    | 201620276 | 2 | 12:03 | JOHAN LARSSON       | Buffalo Sabres          |
    +-----------+---+-------+---------------------+-------------------------+
    The resulting penalty box data is available on our website, in the Request Analysis section.
    Hopefully, tomorrow, I'll blog about another useful dataset the lemma research has produced.
  2. More Hockey Stats
    Original post.
     
    Part I
    After completing the first part of the lemma research (the penalty box), the second part was shorter and easier, but just as useful. I decided to find out the share of time teams spend, on average, at even strength, on the power play/shorthanded, and with an empty net. Then, given this number and the number of goals scored in each such situation, I was able to calculate the frequency of EVG/PPG/SHG/ENG, or its inverse, which I called the difficulty of such a goal.

    I scanned the database of all games between the 1999/00 season and today, and all the goals extracted from these games. Penalty shot goals were ignored, whether scored during the game itself or in the post-game shootout. The EN time was calculated as total game time minus goaltender TOI. PP/SH time was deduced from the recorded PP TOI of the players. The EV time then naturally becomes the total game time minus the EN time minus the PP time of both teams.

    Then I calculated the difficulty of scoring a goal in each of these situations with the following formula:

    Diff_{TYPE} = (GOALS_{EV} / GOALS_{TYPE}) \times (TOI_{TYPE} / TOI_{EV})

    where the difficulty of an EV goal is taken as 1. Here are the combined results in a table:

    +--------+-------+-------+-------+-------+
    | Season | EV    | PP    | SH    | EN    |
    +--------+-------+-------+-------+-------+
    | 1999   | 1.000 | 0.502 | 3.506 | 0.162 |
    | 2000   | 1.000 | 0.473 | 3.387 | 0.146 |
    | 2001   | 1.000 | 0.492 | 3.635 | 0.153 |
    | 2002   | 1.000 | 0.468 | 3.585 | 0.167 |
    | 2003   | 1.000 | 0.445 | 3.127 | 0.221 |
    | 2005   | 1.000 | 0.535 | 4.247 | 0.272 |
    | 2006   | 1.000 | 0.506 | 4.000 | 0.228 |
    | 2007   | 1.000 | 0.458 | 3.597 | 0.187 |
    | 2008   | 1.000 | 0.438 | 3.745 | 0.183 |
    | 2009   | 1.000 | 0.456 | 4.044 | 0.192 |
    | 2010   | 1.000 | 0.450 | 3.517 | 0.177 |
    | 2011   | 1.000 | 0.460 | 3.568 | 0.169 |
    | 2012   | 1.000 | 0.430 | 3.890 | 0.169 |
    | 2013   | 1.000 | 0.453 | 3.209 | 0.198 |
    | 2014   | 1.000 | 0.427 | 3.564 | 0.171 |
    | 2015   | 1.000 | 0.415 | 3.284 | 0.158 |
    | 2016   | 1.000 | 0.419 | 3.252 | 0.178 |
    | 2017   | 1.000 | 0.427 | 3.052 | 0.168 |
    +--------+-------+-------+-------+-------+
    If you divide 1 by these values you get the relative frequency of goals scored in each situation.

    The dataset containing this data is available on the website, on the Request Analysis page.

    So why did I need these two lemmas? That blog post won't be ready any time soon, and I had better resume the "Page A Day" series.
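    For readers who want to reproduce the difficulty numbers on their own data, the formula from this post can be sketched in a few lines. The function names and the season totals below are made up for illustration only; they are not taken from the dataset.

```python
def difficulty(goals_ev, toi_ev, goals_type, toi_type):
    """Difficulty of a goal type relative to even strength (EV difficulty = 1)."""
    return (goals_ev / goals_type) * (toi_type / toi_ev)


def even_strength_time(total_time, en_time, pp_time_home, pp_time_away):
    """EV time = total game time minus EN time minus the PP time of both teams."""
    return total_time - en_time - pp_time_home - pp_time_away


# Made-up season totals (goals, minutes of ice time), purely illustrative.
goals = {"EV": 5000, "PP": 1500}
toi = {"EV": 60000.0, "PP": 9000.0}

d_pp = difficulty(goals["EV"], toi["EV"], goals["PP"], toi["PP"])
print(round(d_pp, 3))      # difficulty of a PP goal: 0.5
print(round(1 / d_pp, 3))  # relative frequency of PP goals: 2.0
```

    A difficulty below 1 means the goal type is scored more often per minute than an even-strength goal, which is why the PP and EN columns sit below 1 and the SH column well above it.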
  3. More Hockey Stats
    whoops, got the title wrong
    Original post.
    A rule change suggestion
      There are no irreplaceable people. (I.V. Stalin)

    I'm rushing this one out because the idea had already come to my mind before, but I forgot about it. Age is taking its toll.

    Anyways. Everyone is talking these days about rule changes. I've already expressed a few thoughts on the scoring systems, but I am not original there. Now, however, I want to make a suggestion I haven't seen mentioned yet.

    Allow soccer (baseball, too)-like substitutions in hockey. Allow the coaches to replace players in the original lineup at the start of the game with one of the "healthy scratches", as submitted in the roster sheet, like the one Peter DeBoer recently messed up in the game against Edmonton.

    The substitution goes ONE-WAY. That means that the player that was substituted cannot return to the game. The substitutions may occur:
      * During the intermissions
      * During the commercial breaks
      * During a time-out

    First and foremost, this will allow teams to handle early injuries much better. Your D-man got injured at the 7:04 mark of the 1st period? Around 10:00 there will be a commercial break, and you can substitute him with one of the scratches!

    Second, it may allow coaches to send stronger messages to players they deem to be slacking. Rather than shortening the roster by benching that guy, you can send an eager healthy scratch in. Of course, the "slacking" player is then benched for the whole remainder of the game.

    Third (oh, I did military service, so I have a natural obsession with providing three reasons for everything), it may give the coaches some extra flexibility if a designated roster player gets slightly injured in the warm-ups. Then a scratch takes his place as usual, but if the original player is fixed up by the 1st intermission, he can substitute for the starting scratch.

    The substitutes will have to come from the "scratch" list, with the exception of emergency goaltending contracts.

    Oh, and I am sure the NHL website will make a mess of it in their game reports.
  4. More Hockey Stats
    Original post.
     
    One of the greatest chess methodologists, if not the greatest one, the sixth World Champion, Mikhail Botvinnik, wrote in one of his books (about the 1948 World Chess Championship Tournament):

    A tournament must go on a uniform schedule, so that the participants would get used to a certain pace of competition. ...

    The Dutch organizers neglected that. They didn't take into account that plenty of free days (because of the holidays, and because the number of the participants was odd) may break that rhythm and take the participant out of the equilibrium.

    When I found out that one of the participants is going to "rest" for six days before the last gameday of the second round, I suggested to my colleagues Mr. Keres and Mr. Smyslov that we would submit a protest together. Alas, they didn't support me! Angrily, I told them then: "You'll see, one of us is going to rest six days in a row at the Hague, and on the seventh day he'll lose without putting up any resistance..."

    And here came true the first part of my prophecy: after the six-day rest, Keres, pale as a sheet, sat at the chess table across from me, worrying, probably, that the second part of it will also come true...

    Keres lost a rather short and lopsided game.
  5. More Hockey Stats
    Original post.
     
    While the series "Website - A Page A Day" is being delayed by all kinds of things, here comes a short post on a different topic.

    Last year, the accuracy shooting competition, which included shooting the puck from the goal line into a small hole, was, in my opinion, a total failure. Mike Smith's spectacular score from across the rink did a disservice by creating the false impression that this skill contest was any good. Otherwise, the competition was not exciting, to say the least.

    Therefore, here's a suggestion to replace it: reverse shootouts.

    Let the goaltenders shed their equipment for once, and let the skaters don it instead. Let's have a competition where the goaltenders skate and attempt to score in shootout, while the skaters try to stop them. I am sure that somewhere in the back of their minds that would fulfill a little dream both parties would have!    
  6. More Hockey Stats
    Original post.
     
    Better less, but better. (V.I. Lenin)

    I've got another rule change suggestion, this one even simpler:

    Allow teams to decline penalty shot awards in favor of a regular power-play.

    I think it adds more tactical variety to the game and discourages fouling, on breakaways, players who are weaker penalty-shot takers.

    As a side matter, I think a player who is charged with the offense for which the penalty shot is awarded should still be credited with a minor penalty (2 minutes) in the statistics.
  7. More Hockey Stats
    I read about this idea on HFBoards and finally got to implement it.
     
    Faceoff stats and Elo ratings.
     
    The available views are:
    * career/specific season
    * per zone/per stick first
     
    Here's a sample table: By stick first on the ice, 2016 season.
    FW - First Stick Wins, FL - First Stick Losses, FP - First Stick Win %
    LW - Last Stick Wins, LL - Last Stick Losses, LP - Last Stick Win %
    TW - Total Faceoff Wins, TL - Total Faceoff Losses, TP - Total Faceoff W %
    FR - Faceoff Rating (TW*TP/100), Elo - Elo rating.
     
    Matt Duchene really surged this season.
     
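    +----+---------------------+-----+-----+-------+-----+-----+-------+-----+-----+-------+--------+---------+
    | #  | Player              | FW  | FL  | FP    | LW  | LL  | LP    | TW  | TL  | TP    | FR     | Elo     |
    +----+---------------------+-----+-----+-------+-----+-----+-------+-----+-----+-------+--------+---------+
    |  1 | MATT DUCHENE        | 239 | 142 | 62.73 | 214 | 125 | 63.13 | 453 | 267 | 62.92 | 285.01 | 2099.16 |
    |  2 | PATRICE BERGERON    | 349 | 262 | 57.12 | 392 | 254 | 60.68 | 741 | 516 | 58.95 | 436.82 | 2078.38 |
    |  3 | RYAN O'REILLY       | 383 | 286 | 57.25 | 315 | 220 | 58.88 | 698 | 506 | 57.97 | 404.65 | 2070.72 |
    |  4 | ANTOINE VERMETTE    | 350 | 198 | 63.87 | 265 | 172 | 60.64 | 615 | 370 | 62.44 | 383.98 | 2067.26 |
    |  5 | RYAN KESLER         | 442 | 320 | 58.01 | 300 | 217 | 58.03 | 742 | 537 | 58.01 | 430.46 | 2065.49 |
    |  6 | MARTIN HANZAL       | 291 | 254 | 53.39 | 274 | 185 | 59.69 | 565 | 439 | 56.27 | 317.95 | 2063.27 |
    |  7 | CLAUDE GIROUX       | 281 | 218 | 56.31 | 433 | 350 | 55.30 | 714 | 568 | 55.69 | 397.66 | 2057.67 |
    |  8 | JORDAN STAAL        | 217 | 170 | 56.07 | 215 | 130 | 62.32 | 432 | 300 | 59.02 | 254.95 | 2057.40 |
    |  9 | JONATHAN TOEWS      | 268 | 216 | 55.37 | 333 | 227 | 59.46 | 601 | 443 | 57.57 | 345.98 | 2056.26 |
    | 10 | PAUL STASTNY        | 341 | 267 | 56.09 | 310 | 253 | 55.06 | 651 | 520 | 55.59 | 361.91 | 2054.49 |
    | 11 | KYLE TURRIS         | 226 | 181 | 55.53 | 272 | 261 | 51.03 | 498 | 442 | 52.98 | 263.83 | 2049.56 |
    | 12 | TYLER BOZAK         | 201 | 161 | 55.52 | 311 | 232 | 57.27 | 512 | 393 | 56.57 | 289.66 | 2049.46 |
    | 13 | BRANDON SUTTER      | 314 | 261 | 54.61 | 273 | 211 | 56.40 | 587 | 472 | 55.43 | 325.37 | 2049.30 |
    | 14 | MIKKO KOIVU         | 394 | 360 | 52.25 | 295 | 203 | 59.24 | 689 | 563 | 55.03 | 379.17 | 2047.14 |
    | 15 | FRANS NIELSEN       | 187 | 165 | 53.12 | 269 | 214 | 55.69 | 456 | 379 | 54.61 | 249.03 | 2046.67 |
    | 16 | DEREK RYAN          |  86 |  62 | 58.11 | 153 | 110 | 58.17 | 239 | 172 | 58.15 | 138.98 | 2046.05 |
    | 17 | RYAN JOHANSEN       | 236 | 224 | 51.30 | 297 | 244 | 54.90 | 533 | 468 | 53.25 | 283.81 | 2043.37 |
    | 18 | TRAVIS ZAJAC        | 325 | 293 | 52.59 | 267 | 216 | 55.28 | 592 | 509 | 53.77 | 318.31 | 2042.99 |
    | 19 | JEAN-GABRIEL PAGEAU | 211 | 178 | 54.24 | 147 | 111 | 56.98 | 358 | 289 | 55.33 | 198.09 | 2040.48 |
    | 20 | SEAN COUTURIER      | 199 | 162 | 55.12 | 170 | 160 | 51.52 | 369 | 322 | 53.40 | 197.05 | 2038.48 |
    | 21 | BRYAN LITTLE        | 190 | 168 | 53.07 | 219 | 157 | 58.24 | 409 | 325 | 55.72 | 227.90 | 2038.16 |
    | 22 | JAY BEAGLE          | 297 | 220 | 57.45 | 144 | 114 | 55.81 | 441 | 334 | 56.90 | 250.94 | 2035.44 |
    | 23 | MIKE FISHER         | 300 | 249 | 54.64 | 240 | 200 | 54.55 | 540 | 449 | 54.60 | 294.84 | 2034.64 |
    | 24 | CODY EAKIN          | 165 | 153 | 51.89 | 138 | 126 | 52.27 | 303 | 279 | 52.06 | 157.75 | 2034.28 |
    | 25 | SEAN MONAHAN        | 214 | 189 | 53.10 | 346 | 310 | 52.74 | 560 | 499 | 52.88 | 296.13 | 2031.43 |
    | 26 | ERIK HAULA          | 163 | 141 | 53.62 | 156 | 129 | 54.74 | 319 | 270 | 54.16 | 172.77 | 2030.95 |
    | 27 | TOMAS HERTL         |  67 |  55 | 54.92 |  55 |  34 | 61.80 | 122 |  89 | 57.82 |  70.54 | 2029.31 |
    | 28 | TORREY MITCHELL     | 218 | 209 | 51.05 | 120 |  81 | 59.70 | 338 | 290 | 53.82 | 181.92 | 2028.69 |
    | 29 | JOHN MITCHELL       | 113 | 105 | 51.83 | 128 |  90 | 58.72 | 241 | 195 | 55.28 | 133.21 | 2027.15 |
    | 30 | HENRIK ZETTERBERG   | 207 | 194 | 51.62 | 275 | 248 | 52.58 | 482 | 442 | 52.16 | 251.43 | 2026.42 |
    +----+---------------------+-----+-----+-------+-----+-----+-------+-----+-----+-------+--------+---------+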
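    The derived columns of the table (everything except Elo) follow directly from the four raw counts. Here is a small sketch of that arithmetic; the function names are hypothetical, and it assumes FR is computed from the unrounded TP, which is what reproduces the 285.01 shown for Matt Duchene.

```python
def pct(wins, losses):
    """Win percentage on a 0-100 scale."""
    return 100.0 * wins / (wins + losses)


def faceoff_line(fw, fl, lw, ll):
    """Derive FP, LP, TW, TL, TP and FR from first/last stick wins and losses."""
    tw, tl = fw + lw, fl + ll
    tp = pct(tw, tl)
    return {
        "FP": round(pct(fw, fl), 2),
        "LP": round(pct(lw, ll), 2),
        "TW": tw,
        "TL": tl,
        "TP": round(tp, 2),
        "FR": round(tw * tp / 100, 2),  # Faceoff Rating = TW * TP / 100
    }


# Matt Duchene's raw counts from the table above.
print(faceoff_line(239, 142, 214, 125))
```

    The Faceoff Rating rewards volume as well as percentage, so a center who wins 59% of 1250 draws ends up ahead of one who wins 63% of 720.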
  8. More Hockey Stats
    After a period of deep dormancy we welcome the 2018/19 season on our website with three major changes:
     
    1. MoreHockeyStats is now a network of sites:
    * MoreHockeyStats.com - the unusual statistics from the NHL for the league, the teams, the players, the coaches and the draft.
    * HockeyEloRatings.com - the Elo ratings for teams, coaches and players, in general and for particular stats and situations.
    * NHLErrata.com - the errors we discovered while crawling and testing the NHL data.

    2. All our data is now available through a sort of API. When a table is displayed, there is a set of links showing how you can access the data:
    * JSON - receive the data displayed in the table as JSON via a direct link.
    * CSV - receive the data as a CSV table.
    * Table - display the data as a pure HTML table with a direct link.
    * Link - direct link to the results displayed.
    These links have a systematic structure and thus can be crawled by a bot.

    3. Most of our tables now feature links to personal cards for teams, players, coaches and even the rinks. These cards visualize the changes of the stat displayed in the table for the particular team, player etc. We welcome ideas for cards on the pages that do not have them yet. Here's a sample playercard:

    [Image: sample player card]

    Welcome, and have a great 2018/19 hockey season.
  9. More Hockey Stats
    Ever since I started collecting the NHL data it was very important to me to validate the collected information. So I created a set of checks that finally formed a whole library to test both the NHL boxscore feeds and the HTML reports.

    I managed to fish out quite a few errors and inconsistencies. Some were systemic and could be fixed in software; some required manual intervention, sometimes by editing the source file, sometimes by providing a correct value that overrides what the parser would read. These interventions, classified into a variety of types, formed another library.

    But I also wanted to share the errors that I found and the fixes I figured out with the analytics community. So I decided to create a website dedicated to it. It took a while, but finally a couple of days ago I was able to open NHLErrata.com. There you can find:
      * An overview of data sources
      * Information on missing players and events
      * Information on broken reports, players and events
      * Systemic problems encountered with the reports

    Both mentioned libraries, Test.pm and Errors.pm, are part of my scrape-to-database package on CPAN.
  10. More Hockey Stats
    Original post.
     
    Just to not let the month of January slip away without another post, I got sentimental and decided to tell a small story about how my website came to life.

    There was a void. People on hockey boards would often wonder whether specific statistics of players and teams were available, and they weren't, although the raw data seemed to be there. Then there was the fantasy hockey world, with its pizzazz, asking for a predictive tool, and again the raw data seemed to be there.

    Now, I am a sysadmin by trade, with occasional forays into software development, and since I've been doing Perl for all of my career, I got some exposure to the Web development process and to databases. I've got a college degree in Engineering, so that gave me some idea about statistics.

    So I got a look at the publicly available NHL reports, but was unsure of how to use them. I tried some standard database approach, but it wasn't working.

    The turning point came when I attended a lecture on MongoDB. It turned out to be a perfect fit for the loosely structured NHL stats documents: just spill them into the Mongo database, then extract data from them and summarize it into tables, and store the tables in an SQL database for quick serving on the website. And along came more luck: a lecture on the Mojolicious Perl Web framework, which equipped me with an easy solution for running a website.

    Thus, I was able to actually implement what I had in mind. First came the spider part, to crawl and collect the data available on NHL.com. Fortunately, I was able to scrape everything before the website's design changed drastically, and the box scores prior to 2002 stopped being available. I got everything from the 1987/88 season on.

    Then I started writing the parsers... and had to take a step back. There were quite a lot of inconsistent and missing reports. Therefore I had to a) add thorough testing of every report I scraped to ensure it all came together, and b) look for complementary sources for whatever data was missing. So before I was done with the parsers, I had a large testing framework, and had also visited all corners of the hockey-related web, even the online archives of newspapers such as USA Today, to get the missing or conflicting data resolved. Some of the downloaded reports had to be edited manually. Then NHL.com landed another blow, dropping draft pick information from their player info pages. Luckily, the German version of the website still had it, so I began to scrape the German NHL website too.

    I was able to produce the unusual statistics tables relatively quickly and easily. However, I decided that the website would not open without the prediction models I had in mind. Being a retired chess arbiter and a big chess enthusiast, I decided to try to apply the chess Elo rating model to the performances of hockey teams and players. Whether it really works or not, I don't know yet. I guess by the end of the season I can make a judgement on that.

    In October 2016 I opened my website using a free design I found somewhere online. Unfortunately, I quickly realized it was not a good fit for the content the site was serving, so I sighed, took a deep breath, opened w3schools.com in my browser, and created my own design. And a CMS too. At least I am happy with the way the site looks now, and even happier that when someone asks - on Twitter, Reddit or hockey forums - whether it's possible to measure a specific metric, I am able to answer, 'Already done! Welcome to my website!'

    In the end, I'm a software developer, a web designer, a DBA, a sysadmin, a statistician and an SEO amateur. Oh, and a journalist too, since I'm writing a blog.
  11. More Hockey Stats
    Original post.
     
    The practice of chess tournaments provides two traditional metrics that are used to rank participants beyond their mere score. Their names are the Buchholz coefficient and the Sonneborn-Berger coefficient (often called just Berger). They are frequently used as tie-breakers in chess events; however, I arrived at a completely different application for them in National Hockey League seasons.

    1. The Buchholz coefficient

    The Buchholz coefficient is simply the sum of the points of your opponents.
     
    B = \sum_{n=1}^{N} P_n

    So, if you played five games, and your opponents currently have 5, 3, 8, 6 and 6 points, your Buchholz value will be 28. Please note that the current number of points is always used, not the number of points at the moment of meeting. The outcome of the game does not matter (for that one, see the Sonneborn-Berger).

    At first, the usefulness of such a criterion may raise an eyebrow. Indeed, it's not used as a final tie-break in round-robin (all-play-all) tournaments because, naturally, the coefficient would be the same for all tied parties. It's used in a special format of chess events called the Swiss tournament, which is not very popular outside the realm of board games for purely logistical reasons. But then consider, first, an NFL season. The list of opponents each team plays over the 16-game season may be quite different. And whoever ends up with a larger Buchholz coefficient clearly had stronger opposition along the way.

    Now let's go back to hockey. First of all, at the end of the season, although everyone has played everyone, they did so a different number of times. Thus, the sum of opponents' points at the end of the season could be different between teams - including within the same division, if they had a different schedule. So, this could still be a very valid tiebreak. Secondly, the season is so long (82 games, unlike a chess Swiss which is rarely longer than 11 rounds), and that gives us a lot of midway points in time, when the all-play-all has not been completed yet! Here the Buchholz coefficient can clearly show, who has had the stronger opposition up until a certain moment.

    Then, if we look at the remainder of the schedule for each team, and for every game we add the opponent's points we get an excellent remaining schedule strength estimator.

    Wait... there's a caveat.

    Unlike in a chess tournament, where every round takes place for everyone at the same time and, barring very rare circumstances, every participant has played an equal number of games at any point of the tournament, NHL teams may differ significantly in the number of games played, so simply summing the opponents' points will not work very well. And these opponents have also played different numbers of games, so their total points are not a very good indicator either.

    Fortunately, it's not a big deal. Instead of totals, let's operate with per-game numbers. So the NHL Buchholz Coefficient for a team after N games becomes:
     
    B = \frac{1}{N} \sum_{n=1}^{N} PPG_n

    The same applies to the remaining schedule strength, where the per-game numbers of the remaining opposition are summed and averaged.

    So, if the team played three games against opponents who currently are:
    A) 6 points in 4 games, B) 3 points in 3 games, C) 2 points in 5 games, then the team's Buchholz value would be (6/4 + 3/3 + 2/5) / 3 = 2.9/3 ≈ 0.967 pts.
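    The per-game version of the coefficient is easy to compute. The sketch below uses a hypothetical function name and reproduces the three-opponent example above; the same function gives the remaining schedule strength when it is fed the opponents of the games still to be played.

```python
def nhl_buchholz(opponents):
    """Average points-per-game of the opponents, one entry per game.

    opponents: list of (current_points, games_played) pairs. An opponent
    faced twice appears twice in the list.
    """
    return sum(points / games for points, games in opponents) / len(opponents)


# Opponents A, B and C from the example: 6 pts in 4 GP, 3 in 3, 2 in 5.
played = [(6, 4), (3, 3), (2, 5)]
print(round(nhl_buchholz(played), 3))  # 0.967
```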

    Here are the current (Mar 12th 2017) Buchholz coefficients and remaining schedule strengths for all 30 teams (and note how the Blues stand out with plenty of matchups vs Colorado and Arizona remaining).

    +-----------------------+-----------+-------+-------+
    | Team Name             | PPG       | Buch  | RStr  |
    +-----------------------+-----------+-------+-------+
    | Washington Capitals   | 1.4179105 | 1.119 | 1.133 |
    | Pittsburgh Penguins   | 1.4029851 | 1.117 | 1.127 |
    | Minnesota Wild        | 1.3939394 | 1.090 | 1.070 |
    | Columbus Blue Jackets | 1.3731343 | 1.125 | 1.132 |
    | Chicago Blackhawks    | 1.3283582 | 1.088 | 1.096 |
    | San Jose Sharks       | 1.2985075 | 1.106 | 1.106 |
    | New York Rangers      | 1.2941176 | 1.120 | 1.184 |
    | Ottawa Senators       | 1.2537313 | 1.105 | 1.169 |
    | Montreal Canadiens    | 1.2352941 | 1.122 | 1.097 |
    | Edmonton Oilers       | 1.1791044 | 1.121 | 1.040 |
    | Anaheim Ducks         | 1.1764706 | 1.102 | 1.150 |
    | Calgary Flames        | 1.1764706 | 1.099 | 1.140 |
    | Boston Bruins         | 1.1470588 | 1.115 | 1.151 |
    | Toronto Maple Leafs   | 1.1343284 | 1.114 | 1.150 |
    | Nashville Predators   | 1.1323529 | 1.105 | 1.116 |
    | St. Louis Blues       | 1.1194030 | 1.144 | 0.943 |
    | New York Islanders    | 1.1194030 | 1.142 | 1.103 |
    | Tampa Bay Lightning   | 1.0895522 | 1.121 | 1.134 |
    | Los Angeles Kings     | 1.0746269 | 1.118 | 1.104 |
    | Philadelphia Flyers   | 1.0447761 | 1.122 | 1.179 |
    | Florida Panthers      | 1.0298507 | 1.118 | 1.175 |
    | Carolina Hurricanes   | 1.0000000 | 1.138 | 1.136 |
    | Buffalo Sabres        | 0.9855072 | 1.127 | 1.158 |
    | Winnipeg Jets         | 0.9565217 | 1.110 | 1.143 |
    | Vancouver Canucks     | 0.9558824 | 1.115 | 1.152 |
    | Dallas Stars          | 0.9552239 | 1.119 | 1.100 |
    | Detroit Red Wings     | 0.9545455 | 1.151 | 1.059 |
    | New Jersey Devils     | 0.9117647 | 1.148 | 1.132 |
    | Arizona Coyotes       | 0.8358209 | 1.133 | 1.098 |
    | Colorado Avalanche    | 0.6119403 | 1.128 | 1.164 |
    +-----------------------+-----------+-------+-------+
      In the next installment we're going to talk about the application of the Sonneborn-Berger coefficient to the NHL regular season.
  12. More Hockey Stats
    Original post.
    2. The Sonneborn-Berger coefficient.

    This stranger beast is a metric extensively used for tie-breaks in chess round-robins and as an auxiliary tie-break tool to the Buchholz coefficient in non-round-robins. Let's start with the definition.
     
    SB = \sum_{n=1}^{N} f(R_n, P_n)

    where R_n is the result against the n-th opponent, and P_n is the opponent's points score.
    The function f(R_n, P_n) is defined as:

    f(Win, P_n)  = P_n
    f(Tie, P_n)  = P_n / 2
    f(Loss, P_n) = 0

    The resulting value evaluates whether the participant performed better against stronger or weaker opposition. Actually, I do have a problem with this criterion as a tie-breaker: in my opinion ALL points are created equal, and it doesn't matter whether they came from a contender or a bottom feeder. However, this metric does address notorious statements like "This team only shows up for big games" and "This team is only good against garbage opposition."

    So, first of all, for the NHL application, we will modify the function f(R_n, P_n) to:

    f(W, P_n)  = P_n
    f(OW, P_n) = 2 * P_n / 3
    f(OL, P_n) = P_n / 3
    f(L, P_n)  = 0

    to account for the overtime point.
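    As a quick illustration of the modified function, here is a minimal sketch that sums the weighted opponent values over a list of games. The names `WEIGHTS` and `sonneborn_berger` are hypothetical, the four games are made up, and opponent strength is expressed in points per game, as in the Buchholz post.

```python
# Result weights from the modified f(R_n, P_n): regulation win, overtime win,
# overtime loss, regulation loss.
WEIGHTS = {"W": 1.0, "OW": 2 / 3, "OL": 1 / 3, "L": 0.0}


def sonneborn_berger(games):
    """games: list of (result, opponent_points_per_game) pairs, one per game."""
    return sum(WEIGHTS[result] * opp_ppg for result, opp_ppg in games)


# Four made-up games: W vs a 1.2 PPG team, OW vs 0.9, OL vs 1.5, L vs 1.4.
games = [("W", 1.2), ("OW", 0.9), ("OL", 1.5), ("L", 1.4)]
print(round(sonneborn_berger(games), 2))  # 1.2 + 0.6 + 0.5 + 0 = 2.3
```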

    Then, we can calculate the minimal possible value SBmin for a team with the given schedule so far this season, by assigning the Wins to the weakest teams played, and the OW/OL to the weakest of the remainder, until the sum of W, OW and OL points adds up to the number of points the team currently has.

    Similarly we shall calculate the maximal possible SBmax value by assigning Wins to be against the strongest teams played, and the OW/OL against the strongest of the remainder, assuming OT wins are about 1/4 of the whole.

    The closer the actual SB is to SBmin or to SBmax, the more we may be able to say whether the team is more successful against the bottom feeders or against the top guns, or whether it collects its points from the whole spectrum available.
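
    A rough sketch of the two bounds follows. It makes a simplifying assumption that is not part of the procedure described above: the team's actual counts of W, OW, OL and L are kept fixed, and only their assignment to opponents is rearranged. The function name sb_bounds and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

WEIGHTS = {"W": 1.0, "OW": 2.0 / 3.0, "OL": 1.0 / 3.0, "L": 0.0}


def sb_bounds(results: Sequence[str], opp_points: Sequence[float]) -> Tuple[float, float]:
    """Return (sb_min, sb_max) for fixed result counts and a fixed opponent list."""
    if len(results) != len(opp_points):
        raise ValueError("one result per opponent is required")
    # Best results first: W, then OW, then OL, then L.
    weights_desc: List[float] = sorted((WEIGHTS[r] for r in results), reverse=True)
    weakest_first = sorted(opp_points)
    strongest_first = weakest_first[::-1]
    # Minimum: best results paired with the weakest opponents.
    sb_min = sum(w * p for w, p in zip(weights_desc, weakest_first))
    # Maximum: best results paired with the strongest opponents.
    sb_max = sum(w * p for w, p in zip(weights_desc, strongest_first))
    return sb_min, sb_max


# Hypothetical sample: results and the opponents' points per game.
low, high = sb_bounds(["W", "OW", "OL", "L"], [1.2, 0.9, 1.5, 1.0])
print(low, high)
```

    Under this assumption the actual SB of a team always falls between the two values, which is what makes its position between them informative.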

    Here is the table describing how this season's teams have their SB positioned between SBmin and SBmax.
     
    +-----------------------+--------+-------+-------+-------+-------+
    | Team                  | Points | SBmin | SBopt | SB    | SBmax |
    +-----------------------+--------+-------+-------+-------+-------+
    | Pittsburgh Penguins   | 1.40   | 44.28 | 46.48 | 46.24 | 53.06 |
    | Washington Capitals   | 1.40   | 44.70 | 46.74 | 47.77 | 52.89 |
    | Minnesota Wild        | 1.37   | 42.25 | 44.36 | 46.63 | 50.66 |
    | Columbus Blue Jackets | 1.37   | 43.10 | 45.36 | 46.44 | 52.15 |
    | Chicago Blackhawks    | 1.34   | 41.61 | 43.90 | 43.79 | 50.80 |
    | San Jose Sharks       | 1.31   | 40.68 | 42.97 | 44.16 | 49.84 |
    | New York Rangers      | 1.30   | 41.25 | 43.67 | 45.55 | 50.92 |
    | Ottawa Senators       | 1.25   | 37.84 | 40.07 | 41.79 | 46.78 |
    | Montreal Canadiens    | 1.25   | 39.37 | 41.74 | 41.05 | 48.87 |
    | Anaheim Ducks         | 1.19   | 36.86 | 39.43 | 40.12 | 47.15 |
    | Calgary Flames        | 1.18   | 35.97 | 38.49 | 38.20 | 46.05 |
    | Edmonton Oilers       | 1.16   | 35.86 | 38.32 | 37.43 | 45.70 |
    | Boston Bruins         | 1.15   | 34.73 | 37.23 | 37.74 | 44.72 |
    | Nashville Predators   | 1.13   | 33.28 | 36.14 | 38.04 | 44.72 |
    | Toronto Maple Leafs   | 1.13   | 34.64 | 36.99 | 35.66 | 44.02 |
    | St. Louis Blues       | 1.12   | 34.69 | 37.14 | 38.52 | 44.50 |
    | New York Islanders    | 1.12   | 34.36 | 36.94 | 37.94 | 44.71 |
    | Tampa Bay Lightning   | 1.09   | 32.62 | 34.98 | 35.41 | 42.06 |
    | Los Angeles Kings     | 1.07   | 32.10 | 34.66 | 33.56 | 42.34 |
    | Philadelphia Flyers   | 1.04   | 31.26 | 33.56 | 32.01 | 40.48 |
    | Florida Panthers      | 1.03   | 30.89 | 33.12 | 30.95 | 39.82 |
    | Carolina Hurricanes   | 1.00   | 29.43 | 31.78 | 32.41 | 38.85 |
    | Buffalo Sabres        | 0.99   | 30.09 | 32.49 | 33.43 | 39.68 |
    | Winnipeg Jets         | 0.96   | 27.55 | 30.35 | 31.48 | 38.75 |
    | Vancouver Canucks     | 0.96   | 28.48 | 30.91 | 29.02 | 38.21 |
    | Dallas Stars          | 0.94   | 28.05 | 30.62 | 31.16 | 38.34 |
    | Detroit Red Wings     | 0.94   | 29.12 | 31.12 | 30.02 | 37.13 |
    | New Jersey Devils     | 0.91   | 27.78 | 30.15 | 28.63 | 37.27 |
    | Arizona Coyotes       | 0.84   | 25.13 | 27.24 | 25.86 | 33.56 |
    | Colorado Avalanche    | 0.61   | 17.90 | 19.74 | 19.98 | 25.25 |
    +-----------------------+--------+-------+-------+-------+-------+
    Once again, we use points-per-game values because the teams and their opponents have played different numbers of games at most moments within a season.

    We would dare to take one more step forward and claim that a team that performs closer to SBmax seems to have a coach problem (notable differences highlighted in green in the table above). The roster is there to compete against the best, but the points aren't trickling in at a good enough pace against the fodder. Similarly, a team whose SB value is closer to SBmin is more likely to have a GM problem (notable differences highlighted in blue in the table above): its roster is not good enough to compete, but the coach is able to squeeze close to the maximum out of it. However, it is natural to win more games against the weaker teams, so we set the balance point at SBopt = (SBmax + 3*SBmin) / 4.
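
    The balance point can be checked against the table with a short sketch. The helper sb_position is a hypothetical addition (not a metric from this post) that expresses SB as a share of the SBmin-SBmax range, so that the balance point always sits at 0.25.

```python
def sb_opt(sb_min: float, sb_max: float) -> float:
    """Balance point SBopt = (SBmax + 3*SBmin) / 4."""
    return (sb_max + 3.0 * sb_min) / 4.0


def sb_position(sb: float, sb_min: float, sb_max: float) -> float:
    """Hypothetical helper: 0.0 means SB sits at SBmin, 1.0 at SBmax."""
    return (sb - sb_min) / (sb_max - sb_min)


# Pittsburgh row of the table: SBmin = 44.28, SB = 46.24, SBmax = 53.06.
print(sb_opt(44.28, 53.06))
print(sb_position(46.24, 44.28, 53.06))
```

    A team whose position is well above 0.25 leans towards SBmax, and one well below it leans towards SBmin.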

    Wrapping up the talk about the Buchholz and the Sonneborn-Berger coefficients, we would like to state that these values are almost entirely descriptive and have no predictive capability, with the small exception of the Buchholz-based remaining schedule strength metric. And even then, it's sort of a 'descriptive prediction'.

    Please see more Buchholz and Sonneborn-Berger data on the website!
  13. More Hockey Stats
    Original post.
     
    One frequently hears talk about the importance of carrying momentum over an intermission. I thought it might be possible to measure this harmony with algebra, so I tried to do that. I chose to analyze a very specific question:

    If the regulation of a game ends in a tie, other than 0-0, how frequently does the team that tied the game with the last regulation goal win in overtime?

    We define the team that tied the game as the one having the momentum, and the other team as the one trying to show resilience. To answer the question, we analyzed the outcomes of games of seasons 2007/08-2016/17 (including the ongoing playoffs). We discard the games that end in a shootout, because their outcome depends more on the skill of the shooters and goaltenders than on whatever momentum might have been accrued.

    The results of the analysis are displayed in the table below, per season and per the time frame during which the last tying goal was scored: in the last two, five, or ten minutes, in the last period, or in one of the first two. The numbers show the percentage of wins by the team having the momentum and the number of games falling into that specific segment. We also display a separate column and a separate row for playoff games, although a finer granularity is not really possible there because of the sample size (as of 5/1/17).
     
    +---------+----------+----------+----------+----------+----------+-----------+----------+
    | Season  | 2        | 5        | 10       | 20       | 1st/2nd  | total     | totalPO  |
    +---------+----------+----------+----------+----------+----------+-----------+----------+
    | 2007    | 54.2/24  | 57.9/19  | 52.9/34  | 53.8/13  | 52.6/38  | 53.9/128  | 31.2/16  |
    | 2008    | 43.5/23  | 48.1/27  | 45.2/31  | 53.8/13  | 40.0/40  | 44.8/134  | 25.0/16  |
    | 2009    | 42.9/28  | 56.5/23  | 72.7/22  | 64.7/17  | 53.7/41  | 56.5/131  | 58.8/17  |
    | 2010    | 48.6/37  | 54.2/24  | 47.1/34  | 40.7/27  | 56.8/44  | 50.0/166  | 59.1/22  |
    | 2011    | 50.0/24  | 45.8/24  | 43.5/23  | 72.0/25  | 47.7/44  | 51.4/140  | 37.5/24  |
    | 2012    | 62.5/16  | 33.3/15  | 50.0/22  | 50.0/14  | 57.9/19  | 51.2/86   | 53.8/26  |
    | 2013    | 58.1/43  | 43.5/23  | 34.6/26  | 45.5/22  | 44.1/34  | 46.6/148  | 70.8/24  |
    | 2014    | 51.7/29  | 65.2/23  | 55.3/38  | 46.7/15  | 60.5/43  | 56.8/148  | 57.9/19  |
    | 2015    | 60.0/40  | 46.7/30  | 44.4/36  | 45.8/24  | 39.6/48  | 47.2/178  | 52.6/19  |
    | 2016    | 43.6/39  | 50.0/28  | 60.5/38  | 48.1/27  | 61.8/68  | 54.5/200  | 63.2/19  |
    | totalPO | 61.4/44  | 46.7/30  | 55.8/43  | 68.4/19  | 40.9/66  | 52.0/202  | 52.0/202 |
    | total   | 51.5/303 | 50.4/236 | 50.7/304 | 51.8/197 | 51.8/419 | 51.3/1459 | 52.0/202 |
    +---------+----------+----------+----------+----------+----------+-----------+----------+

    We see that there is no specific "momentum" nor "resilience" capability overall, there is practically no indication on how the OT would end based on which team scored the last GTG. The only two moderate exceptions with decent sample sizes are the second and the sixth columns of the penultimate row. The GTG-scoring team is 27-17 (61.4%) in case it scored the tying goal in the last two minutes, however if the GTG was scored before the last period, as it happened in 66 games, the momentum would obviously not carry over two or more intermissions, and the tying team is 27-39 (40.9%) in these games.

    Here is how it looks on a graph:

    We can see all lines wobbling slightly above the 50 mark. Insufficiently above. Even if we observe the extra 1.3% chance overall (2.0% in playoffs) - wouldn't it be more related to the home/away advantage? I haven't looked at this aspect yet. Maybe another time.
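
    As a rough way to judge how much weight the two playoff "exceptions" (27-17 and 27-39) can carry, one could compare them against a fair coin. The sketch below is not part of the original analysis; the function name two_sided_binomial_p is hypothetical, and it assumes every overtime game is an independent 50/50 event.

```python
from math import comb


def two_sided_binomial_p(wins: int, games: int) -> float:
    """Exact two-sided p-value of `wins` out of `games` against p = 0.5."""
    expected = games / 2
    deviation = abs(wins - expected)
    # Count every outcome at least as far from the expected value.
    tail = sum(
        comb(games, k)
        for k in range(games + 1)
        if abs(k - expected) >= deviation
    )
    return tail / 2 ** games


# Playoff games tied in the last two minutes: 27 wins out of 44.
print(two_sided_binomial_p(27, 44))
# Playoff games tied before the last period: 27 wins out of 66.
print(two_sided_binomial_p(27, 66))
```

    The larger the printed value, the easier it is to explain the record by chance alone.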
  14. More Hockey Stats
    Why does the cat lick his balls? Because it can.

    Recently I saw a request for stats on goal posts / crossbars hit per game. While I do have that statistic per player, I don't have one per game, so - since I can - why shouldn't I produce one?

    About half an hour of Perl-ing created the following summary:
    Irons altogether, top (P = posts, C = crossbars, T = total):

    AWAY    HOME                P C T
    OTT  vs BUF  on 2011/12/31: 8 0 8
    VAN  vs FLA  on 2010/02/11: 7 0 7
    WPG  vs FLA  on 2009/12/05: 6 1 7
    TOR  vs BUF  on 2007/10/15: 6 1 7
    TBL  vs FLA  on 2006/04/01: 6 1 7
    PHI  vs PIT  on 2006/03/12: 7 0 7
    COL  vs NYI  on 2005/12/17: 7 0 7
    NSH  vs DAL  on 2016/03/29: 4 2 6
    PIT  vs NSH  on 2014/03/04: 5 1 6
    NYI  vs TBL  on 2014/01/16: 3 3 6
    DAL  vs VAN  on 2013/02/15: 5 1 6
    STL  vs CAR  on 2012/03/15: 5 1 6
    WPG  vs MTL  on 2011/01/02: 6 0 6
    OTT  vs VAN  on 2011/02/07: 6 0 6
    MTL  vs CAR  on 2011/11/23: 6 0 6
    LAK  vs DAL  on 2010/03/12: 4 2 6
    NJD  vs TBL  on 2009/10/08: 6 0 6
    LAK  vs DAL  on 2009/10/19: 5 1 6
    DAL  vs CBJ  on 2009/01/31: 5 1 6
    COL  vs CHI  on 2009/11/11: 6 0 6
    PIT  vs WPG  on 2008/01/30: 5 1 6
    NYR  vs NJD  on 2008/04/09: 4 2 6
    STL  vs ARI  on 2007/01/15: 5 1 6

    followed by 109 games with 5 irons hit.

    Crossbars, top:

    AWAY    HOME                P C T
    CGY  vs CBJ  on 2008/11/08: 1 4 5
    NYR  vs FLA  on 2007/11/23: 0 4 4
    PHI  vs FLA  on 2006/12/27: 1 4 5
    BUF  vs DAL  on 2017/01/26: 1 3 4
    EDM  vs DAL  on 2016/01/21: 2 3 5
    TOR  vs STL  on 2015/01/17: 1 3 4
    CHI  vs ANA  on 2015/05/19: 1 3 4
    BOS  vs VAN  on 2015/02/13: 1 3 4
    NYI  vs TBL  on 2014/01/16: 3 3 6
    CHI  vs ANA  on 2008/01/04: 2 3 5
    CAR  vs FLA  on 2007/11/12: 1 3 4

    followed by 50 games with 2 crossbars hit.

    The data is extracted from the PBP files of NHL.com, from the year 2005 on.

    However I consider this a one-time effort and will not add this to the website itself.  
  15. More Hockey Stats
    Original post.
     
    One "intangible" being tossed around is the "motivation" of the players. Which brings back memories of an episode I witnessed.
     
    In 2003/04, in the Israeli Top Tier Chess League (which is indeed no slouch) our club managed to assemble an outstanding team, featuring, among others, a former Champion of Russia and a former Champion of Europe. I was part of the management team, and orchestrated bringing the first of the two, who also happened to be my childhood friend back in Leningrad, Soviet Union.
     
    And so, in round III we were to face our main rival for the title, and the club's GM (also a pedestrian chess player) gathered the team and delivered an emphatic motivational speech about how we have to beat the team we're facing, and so on, and so on.
     
    We lost 1½-4½ without winning a single game and lost any chance for the championship we might have had.
  16. More Hockey Stats
    Original post.
     
    Often the general managers, the coaches and the players talk about "intangible values". Sometimes it's about the "locker room contributions". Sometimes it's about "passion". In my opinion, these two are actually negligible and in certain cases even harmful. I remember such references, especially the latter one, made about Israeli soccer players, and that usually meant that the player didn't have a lot of talent to go with it, but contributed a lot of passion to the game. While passionate play can indeed ignite the game and carry the team along, more often it indicated dumb, physical, low-talent execution that actually harmed the team.

    However, there is one intangible that I take my hat off to. It's the one that I always admired, and did not have enough of myself in my chess career. It's the ability to go for the throat of the opposition at even a momentary display of weakness by it, or as Terry Pratchett put it in one of his books, 'Carpe Jugulum1'.

    So what is it, in my understanding? It is the situation when your opponent puts itself into an inferior position in a volatile situation (for example, in a close score), such as by a penalty, or by an icing at the end of a long shift, or by allowing an odd-man rush, and you are able to capitalize on it, yanking whatever remains of the carpet of security from under the feet of the opposition. And then, you continue to hammer the blows on the opposition until it collapses completely. Some also call it the 'killer instinct'. This blog (and this article too) is guilty of an abundance of examples from chess, so let me plant one from tennis... Before the match between Lleyton Hewitt and Taylor Dent at the New York Open, 2005, the latter complained: 'He displays poor sportsmanship: taking joy in double faults on the opponent's serve as well as in unforced errors.' 'I don't care what Dent thinks about it', parried Hewitt, 'I always go for a win, and on the way to it many things are allowed.'

    Machiavelli advised the rulers and the politicians, 'Don't be kind'. Winston Churchill also knew something about achieving the goals when he was recommending: 'If you want to get to your goal, don't be delicate or kind. Be rough. Hit the target immediately. Come back and hit again. Then hit again with the strongest swing you can...'

    All the chess champions had it, the extremes going to Alexander Alekhine, Robert J. Fischer and Garry Kasparov. Many wonderful players that never got the title complained that they couldn't commit themselves to going for the throat of the opponent time after time.

    These qualities were elevated to perfection by the two best teams of the first half of the 2010s, the Los Angeles Kings and the Chicago Blackhawks, who split between themselves five Cups out of six from 2010 to 2015. Even when both teams seemed to be struggling and wobbling, they were able to instill some kind of uncertainty into their opponents - and certainty into the spectators that these teams would make a fist out of themselves and hammer their opponents once they displayed any kind of weakness, however minimal. That capability was championed by their leaders, Anze Kopitar, Drew Doughty and Jeff Carter for the Kings, and Jonathan Toews, Patrick Kane and Duncan Keith for the Hawks. When a playoff series between the Blackhawks and their opponents was tied 3-3, Chicago was always the favorite to win game 7 because of its Carpe Jugulum reputation. The Kings gained even more notoriety, first by burying their sword to the hilt into each and every opponent in 2012 en route from the #8 seed to their first Stanley Cup, and then by the reverse sweep they managed against the Sharks that started their 2014 Cup run - which included two more comebacks, from 2-3 and from 1-3. And even in 2016, with the Kings down 1-3 to the Sharks in the first round of the playoffs, fans around the league were somehow not ready to commit to the Sharks as the favorites to win the series, because the Kings were a hair away from the Sharks' throat in game 4, coming back from 0-3 to 2-3 in the 3rd period, and then in game 5 they were indeed able to turn a 0-3 deficit into a 3-3 tie.

    Well, that tie didn't hold, the Sharks broke the stranglehold and got a boost that carried them all the way to their own first ever Stanley Cup Finals, and that outcome damaged the Kings' reputation as the Carpe Jugulum team to a degree. So was the Blackhawks' one, after they lost their game 7 to a team that - along with the Sharks and, for instance, the Washington Capitals - had a reputation of a somewhat nonplussed one - the St. Louis Blues.

    It would be entertaining to see whether the Carpe Jugulum landscape changes this year in the league, and whether the teams who were able to overcome their "benign" reputation will be able to go all the way to the Cup Finals - through their opponents' throats.

    Chess Grandmaster Gennady Sosonko wrote, 'A real professional, having thought about the situation on the board, acts most decisively. He knows, that during the game, there should be no place either for doubt, nor for compassion, because a thought which is not converted into action, isn't worth much, and an action that does not come from a thought isn't worth anything at all.'

    And it's important to remember, Carpe Jugulum is a necessary key to success in a competitive environment only. Albert Einstein used to say that chess "are foreign to me due to their suppression of intellect and the spirit of rivalry."

    1Carpe Jugulum (lat.) - seize the throat
  17. More Hockey Stats
    Original post.
     
      Wild thing, you make my heart sing
      You make everything groovy, wild thing

    Also inspired by Twitter, and because I can, I decided to gather statistics on games with the most lead changes* and the most lead swings**.

    Here, for the 2016/17 season:

    By most lead swings:

    AWAY    HOME   Date        Sco  LC LS
    CHI  vs DAL  on 2017/02/04: 5-3   7  3
    CBJ  vs OTT  on 2017/01/22: 7-6  11  3
    PHI  vs STL  on 2016/12/28: 3-6   7  3
    MTL  vs PIT  on 2016/12/31: 3-4   7  3
    CHI  vs NYI  on 2016/12/15: 5-4   7  3
    ARI  vs PHI  on 2016/10/27: 5-4   9  3

    with 60 games at 2 lead swings. Dallas leads the way with 8 games with at least two swings, and Carolina, Chicago, NY Islanders and Winnipeg follow with 7 each.

    By most lead changes:

    AWAY    HOME   Date        Sco  LC LS
    CBJ  vs OTT  on 2017/01/22: 7-6  11  3
    TOR  vs WSH  on 2017/01/03: 5-6   9  2
    TOR  vs NYI  on 2017/02/06: 5-6   9  2
    NYI  vs DET  on 2017/02/03: 4-5   9  1
    CHI  vs COL  on 2017/01/17: 6-4   9  2
    CAR  vs NYI  on 2017/02/04: 5-4   9  2
    CHI  vs STL  on 2016/12/17: 6-4   9  1
    BUF  vs OTT  on 2016/11/29: 5-4   9  1
    ARI  vs PHI  on 2016/10/27: 5-4   9  3

    with 31 games with at least 7 lead changes. Here we've got Carolina, Chicago and NY Islanders in the lead with at least 6 games with 7 or more lead changes.

    And what do we get historically?

    The wildest games, regular season, by lead swings:

    AWAY    HOME   Date        Sco   LC LS
    PHI  vs BOS  on 2011/01/13: 5-7   11  5
    COL  vs CGY  on 1991/02/23: 8-10  11  5
    ARI  vs CGY  on 1991/01/15: 5-7   11  5
    PHI  vs COL  on 1988/11/19: 5-6   11  5

    with 30 games at 4 lead swings.

    The wildest games, regular season, by lead changes:

    AWAY    HOME   Date        Sco  LC LS
    DET  vs SJS  on 2005/11/26: 7-6  13  4
    MTL  vs COL  on 2002/12/06: 6-7  13  2
    COL  vs SJS  on 1997/04/04: 6-7  13  2
    ARI  vs PHI  on 1990/01/25: 6-8  13  1
    TOR  vs PIT  on 1989/10/25: 8-6  13  3
    COL  vs WSH  on 1997/11/18: 6-6  12  3
    PIT  vs NJD  on 1993/04/14: 6-6  12  1
    BUF  vs CAR  on 1991/12/07: 6-6  12  4
    CAR  vs TOR  on 1990/02/14: 6-6  12  2
    VAN  vs TOR  on 1988/01/04: 7-7  12  3

    with 65 games at 11 lead changes (even numbers can only occur in the ties era).

    The wildest games, playoffs, by lead swings:

    AWAY    HOME   Date        Sco  LC LS
    STL  vs DAL  on 1999/05/08: 4-5   9  4
    MTL  vs COL  on 1993/04/26: 5-4   9  4
    EDM  vs LAK  on 1992/04/20: 5-8   9  4

    with 33 games at 3 lead swings.

    The wildest games, playoffs, by lead changes:

    AWAY    HOME   Date        Sco  LC LS
    BUF  vs OTT  on 2006/05/05: 7-6  13  2
    PHI  vs CHI  on 2010/05/29: 5-6  11  3
    COL  vs SJS  on 2010/04/16: 5-6  11  1
    PHI  vs WSH  on 1989/04/11: 8-5  11  3

    with 42 games at 9 lead changes (only odd numbers can occur).

    The data is presented since the year 1987 - the earliest boxscores from NHL.com. Now this one is going to make it into the website, I just haven't decided in which form.

    *  Lead change is defined as when a team loses the lead, even if only temporarily to a tied score.
    ** Lead swing is defined as when a team takes the lead after the other team had it.
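
    The two counters can be sketched as below. This rests on an assumption inferred from the tables rather than stated in the definitions: a lead change is counted every time the lead state (away ahead, tied, home ahead) changes, which is what makes the count odd for any game that ends with a winner. The function name lead_stats and the goal orders are hypothetical; they are not the actual scoring sequences of any listed game.

```python
from typing import Iterable, Tuple


def lead_stats(goals: Iterable[str]) -> Tuple[int, int]:
    """Count (lead_changes, lead_swings) for goals given as 'A' (away) or 'H' (home)."""
    away = home = 0
    state = 0        # +1 away leads, 0 tied, -1 home leads
    last_leader = 0  # last side that held a lead, 0 if nobody has led yet
    changes = swings = 0
    for goal in goals:
        if goal == "A":
            away += 1
        elif goal == "H":
            home += 1
        else:
            raise ValueError(f"unknown scorer: {goal!r}")
        new_state = (away > home) - (away < home)
        if new_state != state:
            changes += 1
            if new_state != 0:
                # A swing: the lead passes to the side that did not hold it last.
                if last_leader not in (0, new_state):
                    swings += 1
                last_leader = new_state
            state = new_state
    return changes, swings


# Hypothetical goal order that ends 5-3 for the away team.
print(lead_stats("HAAHHAAA"))
# Hypothetical goal order that ends 4-5 for the home team.
print(lead_stats("AHAHAHAHH"))
```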
  18. More Hockey Stats
    Original post.
     
    In the previous post we mentioned Goodhart's Law and how it threatens any evaluation of an object. We said that it traps the Corsi/Fenwick approach because it substitutes the complex function of evaluating a hockey player with a remarkably simple stat - shots.

    Goodhart's law is not alone. In any research it is preceded by two pillars: Popper's law of falsifiability and Occam's razor. A theory aspiring to any scientific value must comply with both, i.e. produce hypotheses that can be overthrown by experiment or observation (and then relegated to the trashcan), and avoid introducing new parameters beyond the already existing ones. Add Granger causality into the mix and we see that the four Brits presented the hockey analytics society with pretty tough questions that the society - at least the public one - seems to be trying to avoid.

    The avoidance will not help. No evaluation system will be able to claim credibility unless it complies with the four postulates above and, within that compliance, issues measurable projections.

    To be continued...
  19. More Hockey Stats
    Happy New Year everyone!
    Original post.
     
    The Elo rating system is a system used for the evaluation and comparison of competitors. Up until today it's been mostly applied in the domain of board games, most famously in chess, but also in disciplines such as draughts or go. The Elo system, named after its inventor, Prof. Arpad Elo, who first published it in the 1950s in the US, is capable of producing a reliable score expectation for an encounter between two competitors.

    For those who are not familiar with chess or draughts, let's take a look at how the Elo ratings work:

    1) In an encounter between two competitors, A and B, assume they have ratings Ra and Rb.

    2) There is a function that maps the expected result for each player given the opponent:
    Ea = F(Ra, Rb)
    Eb = F(Rb, Ra)
    where F is a monotonic non-decreasing function bounded between the minimum and maximum possible scores, such as 0 and 1 in chess. An example of such a function would be arctan(x)/π + 0.5.

    Ea+Eb should be equal to maximum possible score.

    In practice a non-analytical, table-defined function is used that depends only on the difference between Ra and Rb, and not on their actual values. The function can be reliably approximated by the following expression:
    E = 1 / [ 1 + 10^((Rb-Ra) / 400) ]
    which works well with ratings in low 4-digit numbers and rating changes per game in 0-20 range.

    3) After the encounter, when real scores Sa and Sb have been registered, the ratings are adjusted:
    Ra1 = Ra + K*(Sa-Ea)
    Rb1 = Rb + K*(Sb-Eb)
    where K is a volatility coefficient, which is usually higher for participants with a shorter history, but ideally it should be equal for both participants. The new ratings are used to produce the new expected results and so on.
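
    Steps 1-3 can be sketched as follows, using the approximation formula rather than the rating tables. The function names, the ratings 1600 and 1500 and the value K = 20 are hypothetical choices for illustration.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """E = 1 / (1 + 10^((Rb - Ra) / 400)), the expectation for player A."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update(r_a: float, r_b: float, s_a: float, s_b: float, k: float = 20.0) -> Tuple[float, float]:
    """Return the new ratings (Ra1, Rb1) after actual scores Sa and Sb."""
    e_a = expected_score(r_a, r_b)
    e_b = expected_score(r_b, r_a)
    return r_a + k * (s_a - e_a), r_b + k * (s_b - e_b)


# Hypothetical encounter: a 1600-rated player beats a 1500-rated one.
print(expected_score(1600.0, 1500.0))
print(update(1600.0, 1500.0, 1.0, 0.0))
```

    With equal K for both players and scores that sum to 1, whatever one player gains the other loses.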
    The Elo rating has several highly important properties:

    1) It gravitates to the center. As rating R of a participant climbs higher, so does the expected result E, which becomes difficult to maintain, and a failure to maintain it usually results in a bigger drop in the rating.

    2) It's approximately distributive. If we gather N performances, average the opponents' ratings as Rav, compute the expected average performance as Eav = F(Ra, Rav), and take the actual average performance as Sav, then the new rating RaN' = Ra + N*K*(Sav-Eav) will be relatively close to the RaN obtained by updating Ra directly after each of the N games.

    3) It reflects tendencies, but overall performance still trumps it. Given the three players with ten encounters against other players with the same rating, when the performances are (W - win, L - loss):
     
    For player 1: L,L,L,L,L,W,W,W,W,W For player 2: L,W,L,W,L,W,L,W,L,W For player 3: W,W,W,W,W,L,L,L,L,L
    player 1 will end up with the highest rating of the three, player 2 will be in the middle, and player 3 will have the lowest one - but not by a very big margin. Only when the streaks become really long may the Elo of a lower performance overcome the Elo of a higher one.
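
    Properties 2 and 3 can be tried out on the three players above. The sketch assumes a starting rating of 1500 for everyone, opponents who all stay at 1500, K = 20 and the approximation formula for F; all names are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Approximate Elo expectation for a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def sequential_rating(results: str, start: float = 1500.0,
                      opp: float = 1500.0, k: float = 20.0) -> float:
    """Update the rating after every game ('W' or 'L') against equal opponents."""
    rating = start
    for res in results:
        score = 1.0 if res == "W" else 0.0
        rating += k * (score - expected_score(rating, opp))
    return rating


def batch_rating(results: str, start: float = 1500.0,
                 opp: float = 1500.0, k: float = 20.0) -> float:
    """Property 2: one update from the average performance over N games."""
    n = len(results)
    s_av = results.count("W") / n
    e_av = expected_score(start, opp)
    return start + n * k * (s_av - e_av)


p1 = sequential_rating("LLLLLWWWWW")
p2 = sequential_rating("LWLWLWLWLW")
p3 = sequential_rating("WWWWWLLLLL")
print(p1, p2, p3)
print(batch_rating("LLLLLWWWWW"))
```

    All three players score 50%, so the batch estimate is the same for each of them, while the game-by-game ratings spread slightly around it in the order described above.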

    And how does Elo stack against the four Brits?

    * Goodhart's Law: pass. It measures the same thing it indicates.
    * Granger's Causality: pass. It is a consequence of a performance by definition, and a prediction of future performance, by definition.
    * Occam's Razor: pass. The ratings revolve around the same parameter they measure.
    * Popper's Falsifiability: partial pass. The predictions of Elo sometimes fail, because they are probabilistic. However, the test of time and the wide acceptance indicate that the confidence level holds. Elo was even used for "paleostatistics", when the ratings were calculated backwards to the middle of the 19th century, and the resulting calculations are well-received by the chess historians' community.

    The only well-known drawback of Elo is that top chess players avoid competition against much weaker opposition, especially when facing White, as such a game can be drawn relatively easily by the opponent, and the Elo rating of the top player could take a significant hit, resulting in a drop of several places in the rating list.

    Now, to the question of the chicken and the egg - where do the initial Elo ratings come from? Well, they can be set to an arbitrary low four-digit value. Currently a FIDE beginner starts with the rating of 1300. If the newcomer is recognized as being more skilled than a beginner, then a higher rating is assigned based on rating grades for each skill level, sort of a historical average of the newcomer's peers.

    And... What does all this have to do with hockey?

    To be continued...
  20. More Hockey Stats
    Part I. Part II.

    Sherlock Holmes and Dr. Watson are camping in the countryside. In the middle of the night Holmes wakes up Watson:
    'Watson, what do you think these stars are telling us?'
    'Geez, Holmes, I don't know, maybe it's going to be nice weather tomorrow?'
    'Elementary, Watson! They are telling us our tent has been stolen!'
      Iconic Soviet joke.

    Estimating a hockey player via Elo ratings is a highly complex task. Therefore, we shall wield the dialectic approach of getting from the simpler to the more complicated, and will tackle a seemingly simplistic task first. Let's work out the Elo ratings for the NHL teams as a whole. After all, it's the teams who compete against each other, and the outcome of this competition is a straightforward result.

    So, let's examine a match between Team A and Team B. They have ratings Ra and Rb. These ratings, or, more precisely, their difference Ra-Rb, define the expected results Ea and Eb on the scale from 0 to 1. The teams play, one wins (S=1), the other loses (S=0). To adapt this to the Elo scale, let's consider a win 1 point and a loss 0 points. The new ratings Ra' and Rb' will be (K is the volatility coefficient):

    +-------------+----+----+-------+-------+--------+--------+-----------+-----------+
    | Outcome     | Sa | Sb | Sa-Ea | Sb-Eb | dRa    | dRb    | Ra'       | Rb'       |
    +-------------+----+----+-------+-------+--------+--------+-----------+-----------+
    | Team A Wins | 1  | 0  | 1-Ea  | -Eb   | K-K*Ea | -K*Eb  | Ra+K-K*Ea | Rb-K*Eb   |
    | Team B Wins | 0  | 1  | -Ea   | 1-Eb  | -K*Ea  | K-K*Eb | Ra-K*Ea   | Rb+K-K*Eb |
    +-------------+----+----+-------+-------+--------+--------+-----------+-----------+

    and the teams enter their next games with their new ratings Ra' and Rb', respectively.

    'Wait!', the attentive reader will ask, 'Not all possible outcomes are listed above! What about the OT/SO wins where both teams get some points?' And he will be correct. In these cases we must admit that the losing team scores 0.5 points, so unlike a chess game where the sum of the results is always 1, in NHL hockey the total sum of results varies and can be either 1 or 1.5. Note, were the scoring system 3-2-1-0, then we could scale the scores by 3 rather than by two and get the range 1-⅔-⅓-0 where every result sums to 1.

    Alas, with the existing system we must swallow the ugly fact that the total result may exceed 1, and as a result the ratings get inflated. Which is a bad thing, sure.

    Or is it? Remember, the Elo expectation function only cares about the differences between ratings, not their absolute values. And all teams' ratings get inflated, so all absolute values shift up from where they would've been without the loser's point. Whom would it really hurt? The new teams. Naturally, we must assign an initial rating to every team at the starting point. One way could be assigning the average rating of the previous season to the new team. But we prefer a different and a much more comprehensive solution. We claim that the teams at the start of the next season are different enough beasts from those that ended the previous one, so the Elo ratings should not carry over from season to season at all! Therefore all the teams start each season with a clean slate and an identical Elo rating Ro.

    Once again, the attentive reader might argue, 'What about mid-season trades and other movements?' Well, dear reader, now you have a tool to evaluate the impact of the moves on the team. If there is a visible change in tendency, you can quite safely associate it with that move. Overall, the 82-game span is long enough to smooth out any bends and curves in the progression of the Elo ratings along the season.

    Speaking of game spans, we must note one more refinement being done to the ratings. In the chess world, the ratings of the participants are not updated throughout the length of the event, which is usually 3-11 games. The ratings of the participants are deemed constant for the calculation of rating changes, which accumulate, and the accumulation is actually the rating change of each participant. We apply a similar technique for the teams' Elo calculations: we accumulate the changes for the ratings for 5 games for each team and "commit" the changes after the five-game span. The remainder of the games is committed regardless of its length, from 1 to 5. Why 5? We tried all kinds of spans, and 5 gave the smoothest look and the best projections.

    Now, as a demonstration, let's show how we calculate the possible rating changes in the much anticipated game where the Minnesota Wild are hosting the Columbus Blue Jackets on December 31st, 2016:

    Rcbj = 2250, Rmin = 2196, Ecbj = 0.577, Emin = 0.423, K = 32 (standard USCF).

    +-----------+------+------+--------+--------+--------+--------+---------+---------+
    | Outcome   | Scbj | Smin | S-Ecbj | S-Emin | dRcbj  | dRmin  | Rcbj'   | Rmin'   |
    +-----------+------+------+--------+--------+--------+--------+---------+---------+
    | CBJ W Reg | 1    | 0    | 0.423  | -0.423 | +13.53 | -13.53 | 2263.53 | 2182.47 |
    | CBJ W OT  | 1    | 0.5  | 0.423  | 0.077  | +13.53 | +2.47  | 2263.53 | 2198.47 |
    | MIN W OT  | 0.5  | 1    | -0.077 | 0.577  | -2.47  | +18.47 | 2247.53 | 2214.47 |
    | MIN W Reg | 0    | 1    | -0.577 | 0.577  | -18.47 | +18.47 | 2231.53 | 2214.47 |
    +-----------+------+------+--------+--------+--------+--------+---------+---------+

    Note: MIN gains rating when it gets a loser's point.

    Here is a dynamic of Elo changes (without five game accumulation) for the Metropolitan Division, as an example.

    See more detailed tables on our website: http://morehockeystats.com/teams/elo

    Ok, we got the ratings, we got the expected results, can we get something more out of it?

    To be continued...
  21. More Hockey Stats
    Original post. Catching up...
    We left our reader at the point where we demonstrated how to produce Elo ratings for hockey teams over a season (and over the postseason too, if anyone wondered) and how to apply them to the upcoming games of the rated teams.
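
    As a recap, the per-game rating change and the five-game accumulation from the previous post can be sketched like this. The names rating_deltas and TeamElo are hypothetical, and the sketch uses the approximation formula for the expectation; the numbers are those of the Columbus-Minnesota example (ratings 2250 and 2196, K = 32, 0.5 points for an OT/SO loss).

```python
from typing import Tuple

K = 32.0  # volatility coefficient used in the worked example


def expected_score(r_a: float, r_b: float) -> float:
    """Approximate Elo expectation for a team rated r_a against r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def rating_deltas(r_a: float, r_b: float, s_a: float, s_b: float,
                  k: float = K) -> Tuple[float, float]:
    """Rating changes (dRa, dRb) for one game with actual scores Sa, Sb."""
    return (k * (s_a - expected_score(r_a, r_b)),
            k * (s_b - expected_score(r_b, r_a)))


class TeamElo:
    """Keeps the rating constant inside a span and commits the sum afterwards."""

    def __init__(self, rating: float, span: int = 5) -> None:
        self.rating = rating
        self.pending = 0.0
        self.games = 0
        self.span = span

    def add_game(self, delta: float) -> None:
        self.pending += delta
        self.games += 1
        if self.games % self.span == 0:
            self.commit()

    def commit(self) -> None:
        """Apply accumulated changes; also call at season end for a short span."""
        self.rating += self.pending
        self.pending = 0.0


# Columbus wins in overtime: CBJ scores 1, MIN scores 0.5.
d_cbj, d_min = rating_deltas(2250.0, 2196.0, 1.0, 0.5)
print(round(d_cbj, 2), round(d_min, 2))
```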

    However, in its main domain, chess, Elo is rarely used to produce single match outcome projections. It's much more popular when used to create a long-term projection, such as for a whole tournament, which in chess usually lasts between five and thirteen rounds.

    Therefore, the question arises: shouldn't we try to use our newborn Elo ratings for long-term projections? And the answer is an unambiguous 'Yes!' We can and should create projections for the teams over longer spans, such as seven days ahead, thirty days, or even through the end of the season!

    How do we do it? Since we computed the Elo ratings for all teams, and we know the remaining schedule of all teams, we can run the Elo expectation on all matchups during the requested span and sum them. And since we assume that each team performs according to the expectation, their Elo ratings do not change during the evaluation span.

    Eteam = Σ(Ematch1, Ematch2, ... , Ematchn)

    All good? No. There is one more finesse to add. The produced expectations are all calculated on a 2-0 scale per game, assuming only 2 points are in play in each matchup. However, due to the loser's point this is not so. On average, 2 + NOT/SO / Ntotal points are handed out in every match during the season (where NOT/SO is the number of games that get decided in OT or SO, and Ntotal is the total number of games played). So we need to compute this average, divide it by two (because there are two teams in each match) and multiply the expectation of each team by this factor. By doing so we receive a reliable Elo expectation, such as the one in the table below, as of Jan 2nd, 2017. Spans of 7 days, 30 days and through the end of the season are displayed (games, expected points and total).
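    Elo ratings for season 2016

    +----+-----------------------+-----+---------+-----+------+------+------+-------+-------+-------+------+------+------+
    | #  | Team                  | Div | Elo     | Pts | Gin7 | Pin7 | Tin7 | Gin30 | Pin30 | Tin30 | GinS | PinS | TinS |
    +----+-----------------------+-----+---------+-----+------+------+------+-------+-------+-------+------+------+------+
    | 1  | Columbus Blue Jackets | MET | 2265.22 | 56  | 4    | 6    | 62   | 14    | 23    | 79    | 47   | 79   | 135  |
    | 2  | Pittsburgh Penguins   | MET | 2186.57 | 55  | 1    | 2    | 57   | 11    | 16    | 71    | 44   | 65   | 120  |
    | 3  | Minnesota Wild        | CEN | 2180.88 | 50  | 3    | 4    | 54   | 14    | 21    | 71    | 46   | 68   | 118  |
    | 4  | San Jose Sharks       | PAC | 2137.87 | 47  | 3    | 4    | 51   | 14    | 20    | 67    | 45   | 62   | 109  |
    | 5  | Washington Capitals   | MET | 2135.54 | 49  | 4    | 4    | 53   | 15    | 18    | 67    | 46   | 59   | 108  |
    | 6  | Montreal Canadiens    | ATL | 2117.99 | 50  | 4    | 5    | 55   | 14    | 18    | 68    | 45   | 58   | 108  |
    | 7  | New York Rangers      | MET | 2135.43 | 53  | 3    | 4    | 57   | 11    | 14    | 67    | 43   | 54   | 107  |
    | 8  | Chicago Blackhawks    | CEN | 2103.27 | 51  | 3    | 4    | 55   | 12    | 15    | 66    | 42   | 52   | 103  |
    | 9  | Anaheim Ducks         | PAC | 2105.41 | 46  | 3    | 4    | 50   | 13    | 18    | 64    | 43   | 55   | 101  |
    | 10 | Edmonton Oilers       | PAC | 2092.89 | 45  | 4    | 4    | 49   | 14    | 16    | 61    | 44   | 53   | 98   |
    | 11 | Ottawa Senators       | ATL | 2088.34 | 44  | 2    | 2    | 46   | 11    | 11    | 55    | 45   | 52   | 96   |
    | 12 | Toronto Maple Leafs   | ATL | 2097.27 | 41  | 3    | 4    | 45   | 12    | 14    | 55    | 46   | 54   | 95   |
    | 13 | St. Louis Blues       | CEN | 2066.58 | 43  | 2    | 2    | 45   | 12    | 12    | 55    | 44   | 51   | 94   |
    | 14 | Boston Bruins         | ATL | 2079.41 | 44  | 4    | 5    | 49   | 15    | 17    | 61    | 43   | 49   | 93   |
    | 15 | Carolina Hurricanes   | MET | 2093.06 | 39  | 4    | 5    | 44   | 13    | 13    | 52    | 46   | 53   | 92   |
    | 16 | Los Angeles Kings     | PAC | 2066.68 | 40  | 4    | 4    | 44   | 14    | 16    | 56    | 45   | 52   | 92   |
    | 17 | Philadelphia Flyers   | MET | 2079.35 | 45  | 3    | 3    | 48   | 12    | 13    | 58    | 43   | 46   | 91   |
    | 18 | Calgary Flames        | PAC | 2076.79 | 42  | 4    | 5    | 47   | 14    | 16    | 58    | 43   | 49   | 91   |
    | 19 | Tampa Bay Lightning   | ATL | 2068.90 | 42  | 4    | 4    | 46   | 13    | 14    | 56    | 44   | 48   | 90   |
    | 20 | New York Islanders    | MET | 2070.87 | 36  | 2    | 3    | 39   | 12    | 14    | 50    | 46   | 51   | 87   |
    | 21 | Florida Panthers      | ATL | 2059.66 | 40  | 4    | 5    | 45   | 13    | 14    | 54    | 44   | 46   | 86   |
    | 22 | Nashville Predators   | CEN | 2055.15 | 38  | 4    | 4    | 42   | 14    | 14    | 52    | 46   | 48   | 86   |
    | 23 | Dallas Stars          | CEN | 2052.77 | 39  | 3    | 3    | 42   | 13    | 13    | 52    | 44   | 46   | 85   |
    | 24 | Vancouver Canucks     | PAC | 2049.05 | 37  | 4    | 5    | 42   | 12    | 15    | 52    | 44   | 46   | 83   |
    | 25 | Detroit Red Wings     | ATL | 2033.62 | 37  | 3    | 3    | 40   | 13    | 12    | 49    | 45   | 43   | 80   |
    | 26 | Winnipeg Jets         | CEN | 2017.50 | 37  | 4    | 4    | 41   | 14    | 14    | 51    | 43   | 40   | 77   |
    | 27 | Buffalo Sabres        | ATL | 2009.45 | 34  | 3    | 3    | 37   | 13    | 12    | 46    | 46   | 41   | 75   |
    | 28 | New Jersey Devils     | MET | 1994.66 | 35  | 5    | 4    | 39   | 14    | 12    | 47    | 45   | 37   | 72   |
    | 29 | Arizona Coyotes       | PAC | 1921.41 | 27  | 3    | 2    | 29   | 12    | 8     | 35    | 45   | 30   | 57   |
    | 30 | Colorado Avalanche    | CEN | 1910.42 | 25  | 3    | 2    | 27   | 12    | 7     | 32    | 46   | 29   | 54   |
    +----+-----------------------+-----+---------+-----+------+------+------+-------+-------+-------+------+------+------+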
    This per-team factor is currently about 1.124 (i.e. about a quarter of all games are decided past regulation).
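    As a minimal sketch of the loser's point correction described above (the function names and the game counts are hypothetical and chosen only for illustration; the factor itself is read here as the average points per game divided by two), the scaling could look like this:

```python
def loser_point_factor(n_otso: int, n_total: int) -> float:
    """Per-team multiplier: average points handed out per game, divided by two."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    points_per_game = 2 + n_otso / n_total
    return points_per_game / 2


def adjust_expected_points(raw_expected: float, factor: float) -> float:
    """Scale an expectation computed on the 2-0 scale by the loser-point factor."""
    return raw_expected * factor


# Hypothetical counts: 149 of 600 games decided in OT or SO (about a quarter).
factor = loser_point_factor(149, 600)
print(round(factor, 3))
print(round(adjust_expected_points(70.0, factor), 1))
```

    With no overtime games at all the factor is exactly 1, so the expectation stays on the plain 2-0 scale.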
     
    So you know what's good for the people? But the people consists of men... Iconic Soviet movie
    The team projection leaves us wanting more. After all, don't we want to be able to evaluate individual players and factor it somehow in the projection to reflect the injuries and other reasons that force top players out of the lineups? Stay tuned.

    To be continued...
  22. More Hockey Stats
    Original post.
    The goalkeeper is half of the whole team
      Soviet proverb from Lev Yashin's times.
    After a foray into the calmer lands of teams' evaluation using the Elo rating, it's time to turn our attention to the really juicy stuff - the evaluation of a single player. And we'll start with the most important one - the goaltender. DISCLAIMER: this evaluation concept is still a work in progress and one of several possible implementations of the idea.

    By coincidence, it's also the simplest evaluation to make. While many stats describe the performance of a skater (goals, assists, shots, hits, blocks, faceoff wins, etc. - and even one that is usually tracked for goaltenders), only one stat truly describes the goalie's performance: the save percentage. Usually, four stats are used to compare goalies: wins (W), save percentage (SVP), goals against average (GAA) and shutouts (SHO), but we will first show you why three of them are mostly unnecessary. Also, the name "save percentage" is a bit of a misnomer, since the values of SVP are usually not multiplied by 100 to look like a real percentage, but are more frequently shown between 0 and 1, and would therefore be more properly named 'Save Ratio' or 'Save Share'.

    Wins are truly the result of team effort. I always cringe when I read that a goaltender "outdueled" his opponent, when the two barely got to see each other. The GAA is much more an indication of how well the defense operates in front of the goalie. Shutouts are, first and foremost, a very rare thing, and secondly, a 15-save shutout should not count the same as a 40-save shutout, although in any of the four stats listed above they create two identical entries.

    Therefore we feel we are on firm ground evaluating a goalie's performance through SVP only (with a slight input from shutouts, as described below) - and the Elo function, of course. To start, each goaltender is assigned an Elo rating of 2000 for his first career appearance. We discard performances in which goalies faced fewer than four shots, because these usually are late relief appearances in garbage time, not really evidence of goaltending in a true hockey game. We only account for them to display the real SVP accrued in the season so far, and we are considering dropping these appearances completely.

    After the game we get the pure SVP from the real-time stats. We adjust it in two ways:
    • If, in a very rare case, the performance is below 0.7, we set it to 0.7.
    • If there was a shutout (not the shutout as defined by the NHL, but a performance where a goaltender was on the ice for at least 3420 seconds and did not let a single goal in during that time), we add a shutout bonus for the performance:
      $Bonus = (Saves - 10) / 200$
      If there were fewer than fifteen saves in the shutout, the bonus is assigned the minimum value of 0.025.
    We consider adding this bonus necessary, because the opposing team usually gives an extra effort to avoid being shut out, even during garbage time.
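    The two adjustments above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, and a shutout is simplified here to "no goals against over the whole appearance with at least 3420 seconds on ice".

```python
SVP_FLOOR = 0.7              # floor for very poor outings, as in the text
SHUTOUT_MIN_SECONDS = 3420   # minimum time on ice for a shutout (57 minutes)
MIN_SHUTOUT_BONUS = 0.025    # bonus for shutouts with fewer than fifteen saves


def adjusted_svp(saves: int, goals_against: int, toi_seconds: int) -> float:
    """Return the SVP after applying the floor and the shutout bonus."""
    shots = saves + goals_against
    if shots == 0:
        raise ValueError("no shots faced")
    svp = max(saves / shots, SVP_FLOOR)
    # Assumption: goals_against covers the goalie's whole time on ice.
    if goals_against == 0 and toi_seconds >= SHUTOUT_MIN_SECONDS:
        svp += max((saves - 10) / 200, MIN_SHUTOUT_BONUS)
    return svp


print(adjusted_svp(40, 0, 3600))   # 40-save shutout
print(adjusted_svp(12, 0, 3600))   # small shutout, minimum bonus applies
print(adjusted_svp(10, 10, 1200))  # bad outing, floored
```

    Note that a 15-save shutout yields exactly the minimum bonus, since (15 - 10) / 200 = 0.025, so the two rules join without a jump.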

    Then, given the actual performance we can calculate the "Elo performance rating":
      $R_{perf} = 2000 + (SVP - SVP_{vsopp}) \cdot 5000$
    where $SVP_{vsopp}$ is the SVP against the opponent the goalie is facing - effectively one minus the shooting % of that team, excluding the shots resulting in empty-net goals - a sort of "expected SVP against that opponent". That means that for every thousandth of SVP above the expectation, the performance is five points above 2000 (the absolute average).

    Wait, there seems to be an inconsistency. Don't we need ratings of opponents for Elo changes calculation? Actually, no. Given an Elo performance of a player, we can calculate the rating change as a "draw" against a virtual opponent with that Elo performance, i.e.
      $\Delta R = K \cdot (0.5 - 1 / (1 + 10^{(R_{perf} - R_g)/400}))$
    where K is the volatility factor mentioned in the earlier posts. Right now we are using a volatility factor of 32, but that may change - including introducing a dependency of this factor on the goaltender's experience.

    And the new rating is, naturally,
      $R_g' = R_g + \Delta R$
    Now we can calculate the expected remaining SVP:
      $SVP_{rem} = SVP_{avg} + (R_g' - 2000) / 5000$
    where $SVP_{avg}$ is the league average SVP. It would be more correct to substitute that value with the weighted average of the remaining teams to face (in accordance with the matches remaining), and we'll be switching to this index soon.
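    Putting the goalie formulas together, a rough sketch of one rating update might look as follows. The function names and the sample numbers are hypothetical; the constants 2000, 5000, 400 and K = 32 are the ones quoted in the text.

```python
BASE_RATING = 2000.0   # rating of an average performance
SVP_SCALE = 5000.0     # Elo points per unit of SVP
K_FACTOR = 32.0        # volatility factor currently used


def performance_rating(svp: float, svp_vs_opp: float) -> float:
    """Elo performance rating of a single (adjusted) SVP."""
    return BASE_RATING + (svp - svp_vs_opp) * SVP_SCALE


def rating_change(r_goalie: float, r_perf: float, k: float = K_FACTOR) -> float:
    """Treat the game as a draw against a virtual opponent rated r_perf."""
    expected = 1.0 / (1.0 + 10 ** ((r_perf - r_goalie) / 400.0))
    return k * (0.5 - expected)


def expected_remaining_svp(r_new: float, svp_avg: float) -> float:
    """Unwind a rating back into an expected SVP."""
    return svp_avg + (r_new - BASE_RATING) / SVP_SCALE


# Hypothetical game: a 2000-rated goalie posts 0.930 against a team
# that goalies usually stop at 0.910; the league average SVP is 0.913.
r_perf = performance_rating(0.930, 0.910)
r_new = 2000.0 + rating_change(2000.0, r_perf)
print(round(r_perf, 1), round(r_new, 2), round(expected_remaining_svp(r_new, 0.913), 4))
```

    A performance rated above the goalie's current rating always pushes the rating up, and one exactly at the current rating leaves it unchanged, which is the "draw against a virtual opponent" idea.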

    We can also calculate the SVP expected from the goalie at the start of the season:
      $SVP_{exp} = SVP_{avg0} + (R_{g0} - 2000) / 5000$
    where $SVP_{avg0}$ is the average SVP of the league during the previous season and $R_{g0}$ is the rating of the goalie at the conclusion of the previous season (including playoffs), or the initial rating of 2000.

    We post a weekly update on our Elo ratings for goaltenders, and their actual and expected SVPs on our Twitter feed. You can also access our daily stats on our website page.

    It looks like we're ready to try to take on the skaters' performances. But I'm not sure it's going to fit into one posting.

    To be continued...
  23. More Hockey Stats
    Original Post

    The most important conclusion of the last chapter, which dealt with goalies' Elos, is that the rating is defined by the actual performance of a goaltender versus the expected performance against the team he is facing. That is the approach we are going to inherit for evaluating skaters.

    To start, we compute the average stats of the league for each season. We do that for most of the stats that are measured, from goals and assists to faceoffs taken, up to the time on ice for the goaltenders. This is a trivial calculation. Thus we obtain the season stat averages $S_{avg}$.

    Now we can begin to work with the skaters. We assign them a rating of 2000 in each stat. The first and the most difficult step is to coerce the actual performance of a skater in each stat to a chess-like result, on the scale from 0 to 1. This is a real problem, since the distribution of results across players looks something like a chi-square distribution, i.e. it is heavily skewed toward the low end with a long tail.

    Therefore we need to rebalance it somehow while preserving the following rules:
    • The results should be more or less distributive, i.e. scoring 1 goal in each of three games in a row should produce approximately the same performance as scoring a hat trick in one game and going scoreless in the other two.
    • The rebalanced distribution should still have the same shape as the original one.
    • The average rating of the league in each stat should remain 2000 at the end of the season.

    So first, we do not apply rating changes after a single game. We take a committing period, for example, five games, and average the players' performance in every rated stat over that period. Second, we apply the following transformation to the performance:
      $P'_{player} = (P_{player} - S_{avg}) / S_{avg}$
    where $S_{avg}$ is the season average for that stat. It would be more precise to compute against the "against" averages of the teams played (see the first paragraph), but we decided to take a simpler route at this stage.

    Then we scale the performance by the Adjustment Factor A:
      $P'_{player,adj} = P'_{player} / A$
    The adjustment factor brings the result between -0.5 and 0.5, more or less. There are still outliers, but they very infrequently go beyond 0.5. The A factor depends on the rarity of scoring in the stat and varies from 6 (shots on goal) to 90 (shorthanded goals). The adjustment for goals is, for example, 9. The adjustment for faceoffs won is 20. The latter might look a bit surprising, but remember that many players, e.g. defensemen, never take faceoffs. Naturally, only skater stats are computed for skaters, and only goalie stats for goaltenders.

    The final result $R_{player}$ is then:
      $R_{player} = P'_{player,adj} + 0.5$
    So for the rare events we have a lot of results in the 0.48-0.5 area and a few going to 1. For the frequent events (shots, blocks, hits), the distribution is more even.
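    The transformation above can be sketched as follows. The helper names are hypothetical, and the league average of 0.15 goals per game is an assumed number used only to make the example concrete; the adjustment factor of 9 for goals is the one quoted in the text.

```python
def committed_average(per_game_values):
    """Average a stat over one committing period (e.g. five games)."""
    return sum(per_game_values) / len(per_game_values)


def chess_result(perf_avg, season_avg, adjustment):
    """Map an averaged performance to a chess-like result centred on 0.5."""
    relative = (perf_avg - season_avg) / season_avg   # P'
    return relative / adjustment + 0.5                # R = P' / A + 0.5


SEASON_AVG_GOALS = 0.15   # assumed league average, goals per game
A_GOALS = 9               # adjustment factor for goals

steady = chess_result(committed_average([1, 0, 1, 0, 1]), SEASON_AVG_GOALS, A_GOALS)
streaky = chess_result(committed_average([3, 0, 0, 0, 0]), SEASON_AVG_GOALS, A_GOALS)
print(steady == streaky)
print(round(steady, 3))
```

    Because the result is computed from the period average, three goals spread over the period and a single hat trick produce the same result, which is the "distributive" property asked for above. A player exactly at the league average gets 0.5.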
    Now that we have the player's "result" R, we can compute the Elo change through the familiar formula:
      $\Delta Elo = K \cdot (R - 1 / (1 + 10^{(2000 - Elo_{player})/400}))$
    where K is the volatility coefficient, which we define as:
      $K = 16 \cdot \sqrt{A} \cdot \sqrt{4 / (C + 1)}$
    A is the aforementioned Adjustment Factor and C is the career year: 1 for rookies, 2 for sophomores, and 3 for all other players.

    "What is 2000?", an attentive reader would ask. 2000 is the average rating of the league in each stat. We use it because the "result" of the player was "against" the league average. If we used team averages, we would put the average "Elo against" of the teams faced instead.

    After we have the ΔElo, the new Elo' of a player in a specific stat becomes:
      $Elo' = Elo + \Delta Elo$
    And from that we can derive the expected average performance of a player in each stat, per game:
      $R_{exp} = 1 / (1 + 10^{(2000 - Elo')/400})$
      $P_{exp} = (R_{exp} - 0.5) \cdot A \cdot S_{avg} + S_{avg}$
    which is an "unwinding" of the calculations that brought us from the actual performance to the new rating.

    The calculation differs for the three following stats:
    • SVP - processed as described in Part V.
    • Win/Loss - processed as a chess game against a 2000 opponent, where the result over the committing period is $R_w = P_w / (P_w + P_l)$, $R_l = P_l / (P_w + P_l)$. The only subtlety here is that sometimes a hockey game may result in a goalie win without a goalie loss.
    • PlusMinus - $R_{+/-} = 0.5 + (P_{+/-} - S_{avg,+/-}) / 10$ (10 skaters on ice on average). Then, via the regular route, we get the Elo' and the expected "result" $R_{exp,+/-}$, and the expected performance is: $P_{exp,+/-} = (R_{exp,+/-} - 0.5) \cdot 10 + S_{avg,+/-}$

    Please note that we do not compute "derived" stats, i.e. the number of points (or SHP, or PPP), or the GAA, given the GA and TOI, or GA, given SA and SV.

    An example of the computed expected performances, listing the expectations of the top 30 centers in assists (Adjustment Factor 9), can be seen below:
       #  Player              Pos  Team  Games   A    a/g  Avg. g.  Avg.a  E a/g  E a/fs
       1  CONNOR MCDAVID      C    EDM      43  34  0.791    44.00  33.00  0.706   61.54
       2  JOE THORNTON        C    SJS      41  24  0.585    74.11  52.00  0.665   51.27
       3  NICKLAS BACKSTROM   C    WSH      40  24  0.600    69.20  50.10  0.663   51.85
       4  EVGENI MALKIN       C    PIT      39  27  0.692    62.09  44.73  0.659   55.33
       5  SIDNEY CROSBY       C    PIT      33  18  0.545    61.67  51.50  0.655   46.15
       6  RYAN GETZLAF        C    ANA      36  25  0.694    68.58  45.42  0.648   50.26
       7  EVGENY KUZNETSOV    C    WSH      40  22  0.550    54.75  27.75  0.605   47.43
       8  ANZE KOPITAR        C    LAK      36  16  0.444    72.73  41.55  0.594   40.33
       9  ALEXANDER WENNBERG  C    CBJ      40  28  0.700    59.00  25.67  0.583   52.50
      10  CLAUDE GIROUX       C    PHI      43  25  0.581    61.70  37.60  0.579   47.56
      11  TYLER SEGUIN        C    DAL      42  26  0.619    66.86  31.14  0.566   48.65
      12  RYAN O'REILLY       C    BUF      30  16  0.533    66.00  26.38  0.553   39.23
      13  DAVID KREJCI        C    BOS      44  18  0.409    60.64  32.36  0.528   38.05
      14  RYAN JOHANSEN       C    NSH      41  22  0.537    65.33  27.00  0.523   43.43
      15  JOE PAVELSKI        C    SJS      41  23  0.561    69.64  29.09  0.517   44.21
      16  HENRIK SEDIN        C    VAN      43  17  0.395    75.56  47.81  0.517   37.17
      17  DEREK STEPAN        C    NYR      42  22  0.524    68.00  30.86  0.508   42.31
      18  VICTOR RASK         C    CAR      41  19  0.463    67.00  22.67  0.497   39.37
      19  MARK SCHEIFELE      C    WPG      40  20  0.500    44.50  17.83  0.493   39.23
      20  JASON SPEZZA        C    DAL      35  18  0.514    62.71  37.79  0.490   37.60
      21  JOHN TAVARES        C    NYI      38  16  0.421    68.50  35.00  0.488   37.46
      22  MITCHELL MARNER     C    TOR      39  21  0.538    39.00  21.00  0.484   41.82
      23  STEVEN STAMKOS      C    TBL      17  11  0.647    65.11  29.00  0.474   29.97
      24  ALEKSANDER BARKOV   C    FLA      36  18  0.500    56.75  21.00  0.463   36.51
      25  MIKAEL GRANLUND     C    MIN      39  21  0.538    55.80  24.40  0.460   40.80
      26  PAUL STASTNY        C    STL      40  13  0.325    65.09  34.55  0.457   31.74
      27  JEFF CARTER         C    LAK      41  15  0.366    69.67  24.33  0.448   33.35
      28  MIKE RIBEIRO        C    NSH      41  18  0.439    62.88  33.06  0.447   36.32
      29  MIKKO KOIVU         C    MIN      39  16  0.410    66.83  34.25  0.445   35.14
      30  ERIC STAAL          C    MIN      39  22  0.564    74.46  36.77  0.442   40.99

    You can see more such expectation evaluations on our website: http://morehockeystats.com/fantasy/evaluation
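    The Elo update and the "unwinding" into an expected per-game performance, as described before the table, can be sketched like this. The function names and the sample inputs are hypothetical; the constants (16, 400, the league average rating of 2000, and A = 9 for assists) are the ones given in the text.

```python
import math

LEAGUE_AVG_ELO = 2000.0


def volatility(adjustment: float, career_year: int) -> float:
    """K = 16 * sqrt(A) * sqrt(4 / (C + 1)), with C capped at 3 for veterans."""
    c = min(career_year, 3)
    return 16.0 * math.sqrt(adjustment) * math.sqrt(4.0 / (c + 1))


def expected_result(elo: float, opponent_elo: float = LEAGUE_AVG_ELO) -> float:
    """Standard Elo expectation against the league average rating."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - elo) / 400.0))


def updated_elo(elo: float, result: float, adjustment: float, career_year: int) -> float:
    """Apply one committing-period result R to the player's rating."""
    return elo + volatility(adjustment, career_year) * (result - expected_result(elo))


def expected_performance(elo: float, adjustment: float, season_avg: float) -> float:
    """Unwind a rating into an expected per-game value of the stat."""
    return (expected_result(elo) - 0.5) * adjustment * season_avg + season_avg


# Hypothetical veteran (career year 5) rated 2000 in assists, with a period
# result of 0.6 and an assumed league average of 0.3 assists per game.
new_elo = updated_elo(2000.0, 0.6, 9, 5)
print(round(new_elo, 1), round(expected_performance(new_elo, 9, 0.3), 4))
```

    Rookies get a larger K than veterans for the same stat, so their ratings move faster while there is little history to rely on.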
Now, we ask ourselves, how can we use these stats evaluations to produce an overall evaluation of a player?    
    To be concluded...
  24. More Hockey Stats
    Original post.
     
    Now that we have obtained a way to estimate players' performances for a season, we can move on to estimating their performances for a specific game.

    For the season of interest, we compute the "against" averages for each team, just like we computed the season averages. I.e. we calculate how many goals, shots, hits, blocks and saves are made on average against each team. Thus we obtain the team against averages $T_{avg}$. The averages are then further divided by the number of skaters and goalies (for the respective stats) the team has faced.

    After that we can calculate the "result" $R_t$ of each season average stat in a chess sense, i.e. the actual performance on the scale from 0 to 1.
    For Goalie Wins/Losses:
      $R_{t,wins} = 0.5 + T_{avg,wins} / (T_{avg,wins} + T_{avg,losses})$
    For Plus-Minus:
      $R_{t,+/-} = 0.5 + (T_{avg,+/-} - S_{avg,+/-}) / 10$ (10 skaters on ice on average)
    For the rest:
      $R_{t,stat} = 0.5 + (T_{avg,stat} - S_{avg,stat}) / K$
    where K is a special adjustment coefficient that is explained in Part VI (and which, as a reminder, describes the rarity of each event).

    From the result $R_t$ we can produce the teams' Elo against in each stat, just like we computed the players' Elos.

    Then, the expected result $R_p$ of a player against a specific team in a given stat is given by:
      $R_p = 1 / (1 + 10^{(E_t - E_p)/400})$
      where $E_t$ is the team's Elo Against and $E_p$ is the player's Elo in that stat.

    From the expected result $R_p$, we can compute the expected performance $P_{exp}$ just like in the previous article:
      $P_{exp} = (R_p - 0.5) \cdot A \cdot S_{avg} + S_{avg}$
    (See the exceptions to that formula there.)

    Please note that we do not compute "derived" stats, i.e. the number of points (or SHP, or PPP), or the GAA, given the GA and TOI, or GA, given SA and SV.

    Thus, if we want to project the expected result of a game between two teams, since it is the expected number of goals on each side, we compute the sum of the expected goals of each lineup (the 12 forwards and 6 defensemen with the highest expected goals):
      $S_{home} = \sum_{F_1..F_{12}} \max(P_{exp,G}) + \sum_{D_1..D_6} \max(P_{exp,G})$ for the home team
      $S_{away} = \sum_{F_1..F_{12}} \max(P_{exp,G}) + \sum_{D_1..D_6} \max(P_{exp,G})$ for the away team
    while filtering out the players that are marked as not available or on injured reserve. Please note that we assume the top goal-scoring cadre is expected to play; if we knew the lineups precisely, we would substitute the exact lineup for the expected one.

    You can see the projections at our Daily Summary page. So far we have correctly predicted the outcome of 408 out of 661 games, i.e. about 61.7%. Yes, we still have a long way to go.

    Now to the other side of the question. Given that a player's overall expectation is a vector $[E_1, E_2, ... E_n]$ for all the stats, what is the overall value of that player? And the answer depends, first and foremost, on who's asking.

    If it's a statistician, or a fantasy player, then the value V is simply:
      $V = \sum_{i=1}^{n} W_i E_i$
      where $W_i$ are the weights of the stats in the model that you are using to compare players. Fantasy points games (such as daily fantasy) even give you the weights of the stats - this is how we compute our daily fantasy projections.

    Now, if you're a coach or a GM asking, then the answer is more complicated. Well, not really, mathematically, because it's still something of the form
      $V = \sum_{i=1}^{n} f_i(E_i)$
      where $f_i$ is an "importance function", which is a simple weight coefficient for a fantasy player. But what are these "importance functions"?

    Well, these are the styles of the coaches, their visions of how the team should play, highlighting the stats of the game that are more important to them. These functions can be approximated sufficiently by surveying the coaches and finding which components are of a higher priority to them, for example, by paired-comparison analysis. Unfortunately, there are two obstacles that we may run into: the "intangibles", and the "perception gap".

    But that's a completely different story.
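    As a closing illustration of the per-game projection and the weighted player value discussed in this post, here is a minimal sketch. All names, sample lineups and fantasy weights are hypothetical, and the Elo divisor of 400 is assumed to be the same one used in the earlier formulas.

```python
def expected_result_vs_team(player_elo, team_elo_against):
    """Expected chess-like result of a player against a team's Elo Against."""
    return 1.0 / (1.0 + 10 ** ((team_elo_against - player_elo) / 400.0))


def expected_stat(player_elo, team_elo_against, adjustment, season_avg):
    """Unwind the expected result into an expected per-game stat value."""
    r_p = expected_result_vs_team(player_elo, team_elo_against)
    return (r_p - 0.5) * adjustment * season_avg + season_avg


def projected_goals(forwards, defensemen):
    """Sum expected goals of the top 12 available forwards and top 6 defensemen.

    Each player is a tuple (expected_goals, is_available).
    """
    def top_sum(players, count):
        available = [goals for goals, is_available in players if is_available]
        return sum(sorted(available, reverse=True)[:count])

    return top_sum(forwards, 12) + top_sum(defensemen, 6)


def player_value(expectations, weights):
    """V = sum of W_i * E_i over the stats present in the expectation vector."""
    return sum(weights[stat] * value for stat, value in expectations.items())


# Hypothetical lineup: one injured star forward is filtered out.
forwards = [(0.2, True)] * 13 + [(0.9, False)]
defensemen = [(0.05, True)] * 7
print(round(projected_goals(forwards, defensemen), 2))

# Hypothetical fantasy weights for two stats.
print(round(player_value({"G": 0.3, "A": 0.5}, {"G": 3.0, "A": 2.0}), 2))
```

    Replacing the constant weights with per-stat functions would turn the same sum into the "importance function" form, but what those functions should be is exactly the open question raised above.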