worDLcat? – Jeffrey Pomerantz

I want to start this by saying that this was completely Kathleen Kern’s idea. We were talking over dinner at LIDA last week, and she asked me, essentially, “Why isn’t there an equivalent of WorldCat for digital libraries?” I gave some lame answer: no standardization across DLs, complexity of dealing with item- and collection-level description simultaneously, cost, I don’t even remember what all I said. (In my defense, we were well into a rather good bottle of Pelješac red.) As I was thinking about it though, I realized that it would be completely possible to build such a database. All of the technology is in place; it wouldn’t even require developing new tools. It would just take time, money, and grad students. It’s a completely tractable problem.

And because I just can’t help myself sometimes, my first thought about what to call this database was: worDLcat. It’s a dumb name, really, and as Kathleen pointed out, completely unpronounceable. Also maybe trademark infringement. But that’s what I’m going to call it here, because I don’t have any better ideas.

So anyway, this was Kathleen’s idea. But I’ve started to run with it a bit, at least in my head, to start to think through what might be involved.

Point the first: WorldCat is based on MARC. But is there more to WorldCat than that? WorldCat’s search interface disguises the MARC back-end, happily for those of us who don’t think like a cataloger (i.e., most of humanity). Has OCLC added fields or values beyond what’s in MARC? I don’t know. I have an email out to a colleague at OCLC to see if I can get access to this info. I recently discovered the MCDU Project, for which Bill Moen (a fellow Syracuse alum… it’s a small discipline) analyzed a subset of WorldCat records. So that subset might be a place to start to answer this question.

Point the first, sub-point A: Would a database of DL items require fields and values not utilized in WorldCat? That don’t even exist in MARC? Maybe. I’d bet yes. It’s an empirical question.

Point the second: Item- vs. collection-level description. I think the way to go on this is the way WorldCat has gone: have a record for every item, and have each record contain a field that is essentially a pointer to the collection(s) that item resides in. In WorldCat, of course, each item is likely to reside in many (or at least a few) collections. There are unique items in most libraries’ collections, and those may have records in WorldCat, but on the whole there’s a lot of duplication between library collections. In DLs this is not so much the case. Projects like Google Books and the Internet Archive Text Archive are digitizing, well, books, which are going to be held by many collections. But perhaps linking to those books is a job for the Open Library? At this point the Open Library does not list or link to the library collections that hold books, but this would be a useful future feature. Anyway, to date DLs have by and large been developed to highlight and disseminate unique materials held by libraries and other organizations. And that is as it should be, frankly. As the Library of Congress’ Final Report of the Working Group on the Future of Bibliographic Control says: “Enhance access to rare, unique, and other special hidden materials.” So there is likely to be a much higher percentage of unique items in a database of DL materials than in WorldCat.
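
To make the record-with-a-pointer idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class name, the field names, and the sample records are all hypothetical, and a real record would carry far more description than a title. It only shows one record per item, each pointing at the collection(s) that hold it, plus a way to measure how much of the database is unique to a single collection.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    """One record per item. All field names here are hypothetical."""
    item_id: str
    title: str
    # Pointers to the collection(s) this item resides in.
    collection_ids: list[str] = field(default_factory=list)


def unique_share(records: list[ItemRecord]) -> float:
    """Return the fraction of records held by exactly one collection."""
    if not records:
        return 0.0
    unique = sum(1 for r in records if len(set(r.collection_ids)) == 1)
    return unique / len(records)


# Made-up sample data: two unique items and one widely held book.
records = [
    ItemRecord("a1", "Digitized local newspaper", ["dl-one"]),
    ItemRecord("a2", "Oral history recording", ["dl-two"]),
    ItemRecord("a3", "Widely held novel", ["dl-one", "dl-two", "dl-three"]),
]
print(unique_share(records))
```

If the guess above is right, that number would come out much higher for a database of DL materials than for WorldCat.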

Point the third: Sustainability. Well, not really sustainability… let’s say updating. If a library is a growing organism, then a DL is doubly so (whatever that would mean). New items are added to DLs all the time, items change, etc. So how could worDLcat keep up? OAI-PMH is the obvious mechanism. Which means that all items in all DLs must be OAI compliant. I would hope that this would be a critical mass / tipping point thing: worDLcat would start with the low-hanging fruit, the existing OAI compliant stuff. Then as worDLcat gains use, there would be an incentive for more DLs to become OAI compliant, just as there’s an incentive for libraries to participate in WorldCat. When a new DL comes online though, its developers would have to submit it to worDLcat, as one submits a URL to a search engine to crawl. I’m not sure we could operationalize DLs well enough to build a specialized bot to go out & crawl the web & harvest only materials from DLs. I’ve given up trying to define DLs even in my DL course: I give my students some papers to read, have a discussion about definition, & then tell them that I think it really doesn’t matter how you define what a DL is.
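
For what it's worth, the updating step could look something like the following sketch. It assumes a DL exposes a standard OAI-PMH endpoint (the base URL below is made up) and that worDLcat remembers the date of its last harvest. The `ListRecords` verb, the `from` and `resumptionToken` arguments, the `oai_dc` metadata prefix, and the XML namespace come from the OAI-PMH specification; the function names and the sample response are mine. The sketch only builds request URLs and parses a response, so the actual fetching and storing are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            # Ask only for records added or changed since the last harvest.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, datestamp, deleted)], next token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        deleted = header.get("status") == "deleted"
        records.append((identifier, datestamp, deleted))
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    # An empty or missing token means the list is complete.
    return records, (token.strip() if token and token.strip() else None)


# Made-up response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item-1</identifier>
        <datestamp>2008-06-02</datestamp>
      </header>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:item-2</identifier>
        <datestamp>2008-06-05</datestamp>
      </header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://dl.example.org/oai", from_date="2008-06-01"))
print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester would keep requesting with each returned token until none comes back, then record the date for the next run. None of this answers the harder question of which endpoints count as DLs in the first place.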

Point the fourth: Accessibility & discoverability. If we want to enhance access to rare, unique, and other special hidden materials, then worDLcat should be freely available, not by subscription like WorldCat. Like the Open Library, or LibraryThing. Which suddenly brings Web 2.0 into the conversation. If WorldCat were suddenly opened up and Web 2.0-ified, what would / should that look like? Not quite like LibraryThing, since the assumption is not that you own these materials yourself. But some of the user-contributed and automatic features might be appropriate: tags, reviews, recommendations. Flexibility to define complex objects within, maybe even across collections? Functionality to save items in a shopping cart-like space: MyworDLcat? Oy, that’s even more unpronounceable.

Ok, I’ve spent enough time on this post, when I should be doing other things. If anyone ever reads this, I’m looking for feedback, thoughts, ideas, etc. This is a completely tractable problem, and I believe even grant fundable. So, My Beloved Audience, if you have any ideas about this, I’d like to hear them.

3 Comments

Joan
11 June 2008
Jeff, I agree that this is a great idea. I’ve thought before that many of our database problems (i.e., multiple databases) would be solved if we had something like WorldCat for journal articles, something that would make articles (whether print or electronic) as searchable from one database as WorldCat makes books.

Doing this for digital libraries makes absolute sense. It’s brilliant, really. One of the reasons we reference librarians may send people down the hall to special collections (as Kathleen noted) is because we simply have no notion of the content of these libraries.

Last year, at the few events I attended in honor of Fred Kilgour, I was struck repeatedly by Fred’s unique combination of vision and talent. He could visualize something, and then he had the technical skills to make it happen. I think we need a Fred Kilgour for our current library needs.
Jeffrey Pomerantz
9 March 2010
I don’t know why I never thought of this before, but of course OAIster is kind of like what I’m envisioning here: http://www.oclc.org/oaister/ It’s a union catalog of digital library materials, anyway. As of this writing, it claims “more than 23 million records representing digital resources from more than 1,100 contributors.” That’s hardly a complete set of all DL holdings in the world, but then WorldCat isn’t complete either. OAIster is also integrated with WorldCat, though I don’t know what the back end looks like, or what the record structure is. So there you are.
George Oates
1 July 2010
“But perhaps linking to those books is a job for the Open Library? At this point the Open Library does not list or link to the library collections that hold books, but this would be a useful future feature.”

Yes, absolutely. Open Library is there to connect people with books, whether they’re physical or digital. You might have seen the recent launch of a small set of ebooks available to be borrowed:

http://blog.openlibrary.org/2010/06/29/small-moves-open-library-integrates-digital-lending/

We created brand new edition records in Open Library for all of the ebook titles from OverDrive, and are trying something small with the books that have been scanned by the Internet Archive. You can see that there are 3 libraries participating in the small lending collection, as well as the Internet Archive itself. For any book from one of those libraries in the lending collection, we note them as a “contributor” (Internet Archive-specific term), as in the library that contributed the book to the scanning program. It’s not a big jump to imagine any/all libraries that have ever had a book scanned through the Archive to become contributors like this, lending e-versions of books from their collections through Open Library. A natural way to enhance this would be to start treating libraries as objects (with a searchable presence) on Open Library, which we’d really like to do, no matter what happens with the lending launch.

We’re also planning to build a way for people to add a link to an electronic version of a book somewhere else on the web (and not in the Internet Archive, though we’d be happy to store a copy, if desired). People are already beginning to do that organically in book or author descriptions. Would be nice to try and catch it in a structured way too, if we can. No ETA yet, but it’s definitely moving up the To Do list 🙂

Thanks for the mention!

Cheers,
george

(P.S. I work at Open Library.)
