At the 2002 VRD conference, my now-colleague Phil Edwards (at the time we were both still mere doctoral students) presented a paper titled Characterization of Volunteer Expertise Within the Internet Public Library Reference Service. In that study he found what he called “expertise swell.” Novice IPL volunteers answered questions in a limited range of subjects (users submitting questions via the IPL’s Ask A Librarian webform can assign their question to one of 25 or so subject categories); Phil hypothesized that novices answer questions on subjects they already know well. The longer an answerer had been volunteering for the IPL, however, the wider the range of subjects in which they answered questions; Phil hypothesized that as answerers gain experience, they “develop an infosphere ‘map’ such that they no longer have to rely solely upon previous knowledge in order to answer wider varieties of questions.”
I wonder whether a similar phenomenon occurs among Wikipedia editors. Do people who edit Wikipedia start out editing articles on a fairly narrow range of topics, and does that range expand over time? Of course, whether Wikipedia editors (or IPL answerers) start out in topics they already know well is itself an empirical question.
It seems to me that this would be a fairly easy bit of data analysis. First, you’d need the complete set of edits ever made to Wikipedia, along with the usernames associated with those edits. This could probably be done with a crawler, though I’m not sure how a crawler would handle page histories. Alternatively, maybe one could convince Virgil Griffith to share his WikiScanner data. Second, eliminate all anonymous edits and all users who have made only one edit ever; that should significantly reduce the size of the dataset to be analyzed. Third, figure out how to cluster articles by subject. The IPL has its own subject categories, but how would one do this for Wikipedia? Could it be done with WordNet? Do sets of words form clusters in WordNet that could be used for this? What is the shape of the English language, according to WordNet?
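Step two, plus the measurement the question actually calls for, could look something like the minimal Python sketch below. It assumes step one has already produced a list of edits and that step three has somehow assigned each article a single topic label; the `Edit` record, the `filter_edits` and `breadth_curves` functions, and the sample data are all hypothetical. Breadth is operationalized here as the cumulative number of distinct topics a user has edited after each successive edit, which is only one of several reasonable ways to measure it.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple, Optional


class Edit(NamedTuple):
    """One revision. All field names are hypothetical, for illustration only."""
    username: Optional[str]  # None for anonymous (IP-only) edits
    timestamp: str           # ISO 8601 in one fixed format, so it sorts chronologically
    topic: str               # assumed topic label produced by step three


def filter_edits(edits: Iterable[Edit]) -> list:
    """Step two: drop anonymous edits, then drop users with only one edit."""
    named = [e for e in edits if e.username is not None]
    counts = Counter(e.username for e in named)
    return [e for e in named if counts[e.username] > 1]


def breadth_curves(edits: Iterable[Edit]) -> dict:
    """Map each user to the cumulative count of distinct topics after each of their edits."""
    by_user = defaultdict(list)
    for e in edits:
        by_user[e.username].append(e)
    curves = {}
    for user, user_edits in by_user.items():
        seen = set()
        curve = []
        for e in sorted(user_edits, key=lambda e: e.timestamp):
            seen.add(e.topic)
            curve.append(len(seen))
        curves[user] = curve
    return curves


# Made-up sample data, not drawn from Wikipedia.
EDITS = [
    Edit("alice", "2005-01-01T00:00:00Z", "Biology"),
    Edit("alice", "2005-02-01T00:00:00Z", "Biology"),
    Edit("alice", "2006-01-01T00:00:00Z", "History"),
    Edit(None, "2005-03-01T00:00:00Z", "Music"),
    Edit("bob", "2005-04-01T00:00:00Z", "Music"),
]

if __name__ == "__main__":
    # bob has a single edit and the anonymous edit has no user, so only alice remains.
    print(breadth_curves(filter_edits(EDITS)))  # {'alice': [1, 1, 2]}
```

If expertise swell happens among Wikipedia editors, one would expect these curves to keep rising well after a user's first few edits rather than flattening out early. Indexing the curve by edit count rather than by calendar time is a design choice; plotting against time since first edit would be equally defensible.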
Who cares? I don’t know. This was just an idea that occurred to me over Thanksgiving while I was reading Here Comes Everybody; I thought it was an interesting question, and I wasn’t thinking about practical applications. But it seems to me that one of the issues surrounding Wikipedia has always been how to foster involvement in collaborative projects. This seems never to have been a problem for Wikipedia itself, which has plenty of involvement, but it was a problem for Nupedia, and as far as I can tell it still is for Citizendium. If someone could develop a model of how “expertise swell” happens, that is, of how people decide to expand the range of subjects they contribute to in collaborative information production, then maybe we could do a better job (indeed, any job at all) of encouraging people to contribute and of directing contributions. Maybe.
OR you could go to http://download.wikimedia.org/enwiki/latest/ and download the data…
Actually, Wikipedia does categorize topics, in two ways, both linked from the Contents page: http://en.wikipedia.org/wiki/Portal:Contents
The Outline of knowledge and Overviews pages outline 12 major subjects, and the List of academic disciplines page divides topics into more traditional, university-department-like divisions. The question is how to group articles into topics. Can an article fall into more than one topic?
And then I suppose there’s a third way: the list of categories at the bottom of each article (http://en.wikipedia.org/wiki/Special:Categories), though there are thousands of these.
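One way to bridge the thousands of bottom-of-page categories and a handful of broad subjects is a bounded breadth-first search up the category hierarchy, collecting every broad subject the search reaches. That also answers the multiple-topics question: an article whose categories lead up to several broad subjects gets all of them. The minimal Python sketch below assumes the parent links have already been extracted somehow (from the dumps or otherwise); the category names, the `PARENTS` and `TOP` data, and the `topics_for_article` function are hypothetical, and `TOP` merely stands in for whichever set of major subjects one picks.

```python
from collections import deque


def topics_for_article(article_categories, parent_categories, top_level, max_depth=5):
    """Return every top-level subject reachable from an article's categories.

    article_categories: categories listed at the bottom of the article.
    parent_categories:  dict mapping a category to a list of its parent categories.
    top_level:          set of category names treated as broad subjects.
    max_depth:          how many levels to climb; the search is bounded, and the
                        visited set keeps cycles in the graph from looping forever.
    """
    found = set()
    visited = set(article_categories)
    queue = deque((cat, 0) for cat in article_categories)
    while queue:
        cat, depth = queue.popleft()
        if cat in top_level:
            found.add(cat)
            continue  # stop climbing once a broad subject is reached
        if depth == max_depth:
            continue
        for parent in parent_categories.get(cat, ()):
            if parent not in visited:
                visited.add(parent)
                queue.append((parent, depth + 1))
    return found


# Hypothetical category graph, not taken from Wikipedia.
PARENTS = {
    "Cats in art": ["Felines", "Visual arts"],
    "Felines": ["Mammals"],
    "Mammals": ["Animals"],
    "Animals": ["Biology"],
    "Visual arts": ["Arts"],
}
TOP = {"Biology", "Arts", "History"}

if __name__ == "__main__":
    print(sorted(topics_for_article(["Cats in art"], PARENTS, TOP)))  # ['Arts', 'Biology']
```

The depth bound matters because, as far as I know, the real category graph is not a clean tree: it can contain cycles and very long chains, so with a large `max_depth` almost every article would eventually reach almost every subject.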