I want to start this by saying that this was completely Kathleen Kern’s idea. We were talking over dinner at LIDA last week, and she asked me, essentially, “Why isn’t there an equivalent of WorldCat for digital libraries?” I gave some lame answer: no standardization across DLs, complexity of dealing with item- and collection-level description simultaneously, cost, I don’t even remember what all I said. (In my defense, we were well into a rather good bottle of Pelješac red.) As I was thinking about it, though, I realized that it would be completely possible to build such a database. All of the technology is in place; it wouldn’t even require developing new tools. It would just take time, money, and grad students. It’s a completely tractable problem.

And because I just can’t help myself sometimes, my first thought about what to call this database was: worDLcat. It’s a dumb name, really, and, as Kathleen pointed out, completely unpronounceable. Also maybe trademark infringement. But that’s what I’m going to call it here, because I don’t have any better ideas.

So anyway, this was Kathleen’s idea. But I’ve started to run with it a bit, at least in my head, to think through what might be involved.

Point the first: WorldCat is based on MARC. But is there more to WorldCat than that? WorldCat’s search interface disguises the MARC back-end, happily for those of us who don’t think like a cataloger (i.e., most of humanity). Has OCLC added fields or values beyond what’s in MARC? I don’t know. I have an email out to a colleague at OCLC to see if I can get access to this info. I recently discovered the MCDU Project, for which Bill Moen (a fellow Syracuse alum… it’s a small discipline) analyzed a subset of WorldCat records. So that subset might be a place to start to answer this question.

Point the first, sub-point A: Would a database of DL items require fields and values not utilized in WorldCat? That don’t even exist in MARC? Maybe. I’d bet yes. It’s an empirical question.

Point the second: Item- vs. collection-level description. I think the way to go on this is the way WorldCat has gone: have a record for every item, and have each record contain a field that is essentially a pointer to the collection(s) that item resides in. In WorldCat, of course, each item is likely to reside in many (or at least a few) collections. There are unique items in most libraries’ collections, and those may have records in WorldCat, but on the whole there’s a lot of duplication between library collections. In DLs this is not so much the case. Projects like Google Books and the Internet Archive Text Archive are digitizing, well, books, which are going to be held by many collections. But perhaps linking to those books is a job for the Open Library? At this point the Open Library does not list or link to the library collections that hold books, but this would be a useful future feature. Anyway, to date DLs have by and large been developed to highlight and disseminate unique materials held by libraries and other organizations. And that is as it should be, frankly. As the Library of Congress’ Final Report of the Working Group on the Future of Bibliographic Control says: “Enhance access to rare, unique, and other special hidden materials.” So there is likely to be a much higher percentage of unique items in a database of DL materials than in WorldCat.
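
To make the item-with-a-pointer idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the class names, the field names, and the helper function are hypothetical and do not reflect how WorldCat or any existing DL actually stores records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    # Hypothetical collection-level record.
    collection_id: str
    title: str
    institution: str


@dataclass
class ItemRecord:
    # Hypothetical item-level record. The collection_ids field is the
    # "pointer" to every collection the item resides in.
    item_id: str
    title: str
    collection_ids: List[str] = field(default_factory=list)


def unique_item_share(items: List[ItemRecord]) -> float:
    """Return the fraction of items held by exactly one collection."""
    if not items:
        return 0.0
    unique = sum(1 for item in items if len(item.collection_ids) == 1)
    return unique / len(items)
```

With records shaped like this, the guess that DLs hold a higher share of unique items than WorldCat becomes something you could simply count: run `unique_item_share` over each database and compare the two numbers.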

Point the third: Sustainability. Well, not really sustainability… let’s say updating. If a library is a growing organism, then a DL is doubly so (whatever that would mean). New items are added to DLs all the time, items change, etc. So how could worDLcat keep up? OAI-PMH is the obvious mechanism. Which means that all items in all DLs must be OAI compliant. I would hope that this would be a critical mass / tipping point thing: worDLcat would start with the low-hanging fruit, the existing OAI compliant stuff. Then as worDLcat gains use, there would be an incentive for more DLs to become OAI compliant, just as there’s an incentive for libraries to participate in WorldCat. When a new DL comes online though, its developers would have to submit it to worDLcat, as one submits a URL to a search engine to crawl. I’m not sure we could operationalize DLs well enough to build a specialized bot to go out & crawl the web & harvest only materials from DLs. I’ve given up trying to define DLs even in my DL course: I give my students some papers to read, have a discussion about definition, & then tell them that I think it really doesn’t matter how you define what a DL is.
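
For the curious, here is roughly what one step of an OAI-PMH harvest could look like in Python, using only the standard library. This is a sketch under assumptions, not a real harvester: the function names are invented for this post, it assumes the DL exposes unqualified Dublin Core (`oai_dc`), and it does no networking, error handling, or handling of deleted records beyond returning a missing title as `None`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and by Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_list_records_url(base_url, from_date=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    When continuing a partial list, resumptionToken is sent on its own;
    otherwise we ask for oai_dc records, optionally only those changed
    since from_date (an incremental update).
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        # Take the first dc:title, if the record has one.
        title = next((el.text for el in record.iter(DC_NS + "title")), None)
        records.append((identifier, title))
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token
```

A harvester would fetch the first URL, parse the response, and keep requesting with the returned token until none comes back. Passing a `from_date` on later runs is what would let worDLcat pick up only the items added or changed since its last visit.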

Point the fourth: Accessibility & discoverability. If we want to enhance access to rare, unique, and other special hidden materials, then worDLcat should be freely available, not by subscription like WorldCat. Like the Open Library, or LibraryThing. Which suddenly brings Web 2.0 into the conversation. If WorldCat were suddenly opened up and Web 2.0-ified, what would / should that look like? Not quite like LibraryThing, since the assumption is not that you own these materials yourself. But some of the user-contributed and automatic features might be appropriate: tags, reviews, recommendations. Flexibility to define complex objects within, maybe even across collections? Functionality to save items in a shopping cart-like space: MyworDLcat? Oy, that’s even more unpronounceable.

Ok, I’ve spent enough time on this post, when I should be doing other things. If anyone ever reads this, I’m looking for feedback, thoughts, ideas, etc. This is a completely tractable problem, and I believe even grant fundable. So, My Beloved Audience, if you have any ideas about this, I’d like to hear them.