After all that foofaraw about purging my backlog of posts, here’s an actual new one.

Ever since we lived in Syracuse, Yvonne and I have participated in the Cornell Lab of Ornithology’s Great Backyard Bird Count, which took place the weekend before last. I was thinking about the GBBC the other day, and realized that it, and Project FeederWatch, are examples of distributed data collection. Now all of you regular PomeRantz readers out there (can you be a regular reader of so bursty a publication?) know that I’m obsessed with and confused by the notion of community-created resources. The GBBC and FeederWatch are certainly community-driven, though you couldn’t really call them resources. The data collection relies entirely on contributions from the public, though, which seems to me to be the salient point.

And they’re quite successful projects, both having been around for several years. They must generate good data, too, or presumably the Cornell Lab would have stopped running them years ago. So why do these projects work? How do they generate good data? Do they have some kind of quality filter on the data? From the point of view of a data submitter, it seems not (though I’ll sketch below what I imagine such a filter might look like). And after all, we know that self-reported data is notoriously unreliable.

Is it just that birders are a sufficiently committed (some might even say obsessive) community of interest that they feel compelled to provide good data? If so, does that imply that spam can be avoided simply by having a sufficiently civic-minded community? And at what point does a community become so diffuse that not all of its members are civic-minded? It can’t be a matter of size, since as of this moment the GBBC has collected 79,402 checklists, and presumably there is approximately a 1:1 relationship between checklists and contributors. Is it how well focused the community is? And define focus, anyway. I think I may have just proposed attempting to measure the geekiness of community members.
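Just to make the quality-filter question concrete, here is a minimal sketch of what such a filter on submitted checklists might look like. This is purely hypothetical (I have no idea whether the Cornell Lab does anything of the sort), and the species names and count ranges here are invented for illustration. The idea is simply to flag any reported count that falls outside a plausible range for that species.

```python
# Hypothetical plausibility filter for submitted checklists.
# The per-species ranges are made up for illustration; nothing
# here reflects what the Cornell Lab actually does with GBBC data.

HISTORICAL_RANGE = {
    # species: (min, max) counts plausibly reported at a single site
    "Northern Cardinal": (1, 40),
    "American Crow": (1, 5000),
    "Ivory-billed Woodpecker": (0, 0),  # presumed extinct: always flag
}

def flag_suspect_entries(checklist):
    """Return the entries whose counts fall outside the plausible
    range for that species, or whose species is unrecognized."""
    suspect = []
    for species, count in checklist.items():
        lo, hi = HISTORICAL_RANGE.get(species, (None, None))
        if lo is None or not (lo <= count <= hi):
            suspect.append((species, count))
    return suspect

# A checklist claiming 10,000 cardinals gets flagged for review;
# a plausible crow count passes through untouched.
print(flag_suspect_entries({"Northern Cardinal": 10000, "American Crow": 12}))
```

Of course, a filter like this only catches the wildly implausible; it says nothing about the merely mistaken, which is maybe exactly where the civic-mindedness of the community has to come in.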