NHLErrata launch
Ever since I started collecting NHL data, validating the collected information has been very important to me. So I created a set of checks that eventually grew into a whole library for testing both the NHL boxscore feeds and the HTML reports.
I managed to fish out quite a few errors and inconsistencies. Some were systemic and could be fixed in software; others required manual intervention, sometimes by editing the source file, sometimes by providing a correct value that overrides what the parser would read. These interventions, classified into a variety of types, formed another library.
But I also wanted to share the errors I found and the fixes I figured out with the analytics community, so I decided to create a website dedicated to them. It took a while, but a couple of days ago I was finally able to open NHLErrata.com. There you can find:
- An overview of data sources
- Information on missing players and events
- Information on broken reports, players and events
- Systemic problems encountered with the reports
Both libraries mentioned above, Test.pm and Errors.pm, are part of my scrape-to-database package on CPAN.