British Library sets out to archive the web

The NZ Herald reports that the British Library is going to archive the web for future historians. The British Library has always tried to keep a copy of everything published in the UK: every book, newspaper, magazine, newsletter, and pamphlet. In 2013, however, possibly the majority of information is published online, and webpages are notoriously ephemeral: here today and gone tomorrow. This could leave a gaping void for future historians, which the British Library now intends to fill by archiving the web. An automated “web harvester” will scan and record 4.8 million sites ending with the suffix “.uk” at least once a year, a total of 1 billion web pages. Rapidly changing websites, like those of newspapers, will be harvested more frequently, as often as daily.
The US-based Internet Archive has been doing just this since 1996 on a slightly ad hoc basis. Its Wayback Machine lets you browse through over 240 billion web pages from 1996 to the present.

from The Universal Machine http://universal-machine.blogspot.com/2013/04/british-library-sets-out-to-archive-web.html

About driwatson
I'm a New Zealand author, computer scientist and blogger specialising in Artificial Intelligence. I also have an interest in the history of computing and have just written a popular science book called "The Universal Machine - from the dawn of computing to digital consciousness."
