Tuesday, July 31, 2007

A library bigger than a building

An ambitious project to create an online catalogue of every book in every language ever published is under way. Public goodwill is not in doubt, but some libraries remain to be convinced.

A few years ago, the idea of getting random people around the world to write their own encyclopaedia would have been madness - but that didn't stop the founders of Wikipedia doing just that, and it has turned out to be one of the most successful web projects of recent years.
With that in mind, does it sound mad to want to try and build an online catalogue of every book ever published, anywhere in the world?

The Open Library, newly launched in the USA but global in scope, is designed to make that happen.

In the words of its creators, the idea is to build a virtual library that stores details of not just "every book on sale, or every important book, or even every book in English; but simply every book."

That would include The Curious Incident of the Dog in the Night-time, The Koran, the full text of The Adventures of Huckleberry Finn, and of course Harry Potter.

But what's the Open Library really for? Aaron Swartz, leader of the technical team working on Open Library, suggests that every book ever published needs a single authoritative page on the internet, a bit like a personal homepage.

"Right now, if you want to link to a book on the web, the main place people go is Amazon. It's kind of a bad idea for one commercial site to be the definitive source for book information on the internet, so we want to have a site that brings together information from commercial publishers, reviewers, users, libraries, everywhere.

"This site will become the place where you can find interesting books and information about them, whether they're in print, out of print, out of copyright or whatever."

Such a library has to be virtual. No building would ever be large enough to house all books; no single group or government could afford to build it, or employ the necessary staff. If the Open Library is to succeed, it has to be a virtual space, and open to everyone, Wikipedia-style.

"There are tons of books out there and tons of information about those books. There's no way even a large group of librarians is going to be able to collect it all. We think of it as an analogue to Wikipedia. There are some great encyclopaedias written by small groups of experts, but to get something as wide-ranging and varied as Wikipedia, you need to let everyone in."

To start things off, the Open Library is calling on other libraries to donate their catalogues. This alone presents huge technical challenges, since the data sets come in different formats and different languages, and each set comes with its own quirks, repetitions and errors.
What's important is keeping the data in a structured form, so that the database working behind the scenes knows the difference between an author, a title and a publisher.
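
To make the idea of structured data concrete, here is a minimal sketch in Python. It is not the Open Library's actual code: the `BookRecord` class, the `FIELD_MAPS` table, the two source names and their field names are all assumptions made up for illustration. It shows one way donated catalogues with different field names could be mapped onto a single record that keeps author, title and publisher apart, with obvious repetitions collapsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookRecord:
    """One structured catalogue entry (hypothetical schema)."""
    title: str
    author: str
    publisher: str


# Hypothetical field names used by two donated catalogues.
FIELD_MAPS = {
    "library_a": {"title": "Title", "author": "Author", "publisher": "Publisher"},
    "library_b": {"title": "titre", "author": "auteur", "publisher": "editeur"},
}


def normalise(raw: dict, source: str) -> BookRecord:
    """Map a raw entry onto BookRecord and tidy stray whitespace."""
    mapping = FIELD_MAPS[source]
    cleaned = {field: " ".join(raw[key].split()) for field, key in mapping.items()}
    return BookRecord(**cleaned)


def merge(records):
    """Keep the first record seen for each (title, author) pair, ignoring case."""
    seen = {}
    for record in records:
        key = (record.title.casefold(), record.author.casefold())
        seen.setdefault(key, record)
    return list(seen.values())


a = normalise(
    {"Title": "  Adventures of  Huckleberry Finn", "Author": "Mark Twain",
     "Publisher": "Example Press"},
    "library_a",
)
b = normalise(
    {"titre": "adventures of huckleberry finn", "auteur": "MARK TWAIN",
     "editeur": "Example Press"},
    "library_b",
)
catalogue = merge([a, b])
```

Here the two entries differ only in spacing and capitalisation, so `catalogue` ends up holding a single record. Real catalogue data would need far more careful matching than this case-insensitive comparison.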

"We had to build this new type of wiki software which was an exciting challenge, because you had to set it up so that instead of just having one kind of page people can edit, we have lots of different kinds.

"People can edit authors, they can edit books, they can edit text pages, and so on. So there's a lot of new stuff we had to build. And that's just the infrastructure - there were also lots of things to import, and book data to merge and make searchable."
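
As a rough illustration of a wiki with several kinds of page, the Python sketch below gives each page type its own set of editable fields. The `PAGE_TYPES` table, the `Page` class and the field names are hypothetical and are not taken from the Open Library software.

```python
# Hypothetical page types and the fields each one allows.
PAGE_TYPES = {
    "author": {"name", "bio"},
    "book": {"title", "author", "publisher"},
    "text": {"body"},
}


class Page:
    """A wiki page whose editable fields depend on its type."""

    def __init__(self, page_type, **fields):
        if page_type not in PAGE_TYPES:
            raise ValueError(f"unknown page type: {page_type}")
        self.page_type = page_type
        self.fields = {}
        self.history = []  # earlier versions, oldest first
        self.edit(**fields)

    def edit(self, **changes):
        """Apply an edit, rejecting fields this page type does not have."""
        unknown = set(changes) - PAGE_TYPES[self.page_type]
        if unknown:
            raise ValueError(f"{self.page_type} pages have no fields {sorted(unknown)}")
        self.history.append(dict(self.fields))  # snapshot before changing
        self.fields.update(changes)


book = Page("book", title="Harry Potter")
book.edit(publisher="Example Press")
```

An attempt such as `book.edit(bio="...")` raises `ValueError`, because `bio` belongs to author pages in this sketch, not book pages.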
