
Scorekeeping and Data Collection

November 20, 2014

Scorekeeping matters in the world, and I think it might matter to me more than it does to most people. Whether the record is a notebook full of handwritten board game results or an information system tuned for throughput, I have some general thoughts that apply to both.

There are a few tradeoffs to weigh when starting a new database that you expect to be long-lived. The datasets I’m referring to here hold results, observations, or events: for example, the results of sporting events, weather measurements, or the way I imagine courtroom activity might be recorded. The key topics I will try to define and discuss are passive vs. active recording, progressive vs. reflective recording, and the role of the data recorder.

Terms

Data Collection Tradeoff

[Figure: granularity vs. time/effort tradeoff graph]

Passive recording is always preferred, since its cost on the Time, Effort axis of the graph above is zero. It requires either a recording system embedded within the activity or an external system tightly coupled to the activity’s outputs.
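As a rough illustration of an embedded, passive recorder, here is a minimal Python sketch. The `ScoreBoard` class and its `score` method are hypothetical names, not taken from any real system; the point is only that the activity’s own state change and the recording happen in the same call, so no participant spends extra time or effort writing anything down.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScoreBoard:
    """Hypothetical scoreboard that records every scoring event as a side effect."""

    scores: dict[str, int] = field(default_factory=dict)
    # Each entry is (UTC timestamp, player, points).
    log: list[tuple[datetime, str, int]] = field(default_factory=list)

    def score(self, player: str, points: int) -> None:
        # Updating the game state is the activity itself...
        self.scores[player] = self.scores.get(player, 0) + points
        # ...and the recording is embedded in the same step, so it costs
        # the participants nothing extra.
        self.log.append((datetime.now(timezone.utc), player, points))


board = ScoreBoard()
board.score("alice", 3)
board.score("bob", 2)
board.score("alice", 1)
print(board.scores)    # {'alice': 4, 'bob': 2}
print(len(board.log))  # 3
```

The same idea applies to an external system tightly coupled to the activity’s outputs: the record is produced by the activity, not by someone remembering to write it down.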

If active, progressive recording at the optimal granularity makes it impossible to use a participant-recorder, you may need a dedicated recorder, coarser granularity, reflective recording, or some combination of these.
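The definitions of progressive and reflective recording did not survive in this copy of the post, so the sketch below assumes that progressive means writing each event down as it happens and reflective means writing a summary after the activity is over. Under that assumption, it shows why coarser, reflective records are cheaper but lossy: the final totals can be derived from a progressive event log, but not the other way around. The `reflect` function and the sample events are hypothetical.

```python
from collections import Counter

# Progressive (assumed meaning): one record per event, written as it happens.
# Each event is (turn, player, points).
progressive_log = [
    (1, "alice", 3),
    (1, "bob", 2),
    (2, "alice", 1),
    (2, "bob", 4),
]


def reflect(events):
    """Collapse a fine-grained event log into a coarse, after-the-fact summary."""
    totals = Counter()
    for _turn, player, points in events:
        totals[player] += points
    return dict(totals)


# Reflective (assumed meaning): only the final result, written afterwards.
reflective_record = reflect(progressive_log)
print(reflective_record)  # {'alice': 4, 'bob': 6}
```

From `{'alice': 4, 'bob': 6}` alone you cannot tell who led after turn 1; that is the granularity you give up when a participant-recorder falls back to reflective recording.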

Notes on Active Collection Systems

Bad

Good

Conclusion

Data collection should never impact the activity itself.

The best time to start recording is at t₀, when history begins, even if it is an active system that needs to be migrated later. The second-best time to start is right now. If today is t₀, all the better.



I am a software developer in Los Angeles.