Making Knowledge Work

October 28, 2012

LIKE 39 – Archiving the Web

Filed under: Archiving, Information Management, LIKE — virginiahenry @ 3:31 pm

In a professional landscape increasingly populated by vendor cheerleaders, one-trick product ponies and garrulous ‘gurus’, it’s refreshing to spend some time with LIKE professionals.

It was great to gather at our new home for dining and learning, the upstairs room of The Castle (just by Farringdon station), and explore the monumental task of creating a web archive.

The debate was timely – a recent Economist article drew attention to the danger of cultural amnesia as the contemporary record, in the form of web content, disappears into cyberspace.

Dr Peter Webster is the British Library’s Engagement and Liaison Officer for the Web Archive.  LIKE’s new dinner venue has the great luxury of a projection screen, so Peter was able to show us slides of some of the sites his team are capturing for posterity.  These included the late Robin Cook’s website, and David Cameron’s 2005 election site.

He told us about the “lost web” – sites that fall victim to the disorderly disappearance of organisations and campaigns, and the “orphaned web” – sites that have served their purpose, and are abandoned. There was a nice example of a once lovingly tended site dedicated to Charles Darwin’s house, not updated since 2006 because English Heritage had taken custody of the house and, in turn, its online representation.

Since 2004 the Web Archive team have been fulfilling their brief of archiving websites of cultural and scholarly importance from the UK domain, capturing 11,000 sites (16 terabytes’ worth) so far. They are collaborating with other libraries, archives and collectors to get the job done, but it’s still a daunting task. Automated domain harvesting helps, and there are collections we can all agree future historians will be glad to have: the Credit Crunch, the Jubilee, the Olympics… However, at this stage, predicting the exponential growth of the archive, and how easy it will be to browse, is challenging to say the least.

Some questions are very hard to answer: how do you decide what is published in the UK?  The URL doesn’t necessarily give you a clue.  How do you find the owners of content to verify copyright?   What are the full implications of the non-print Legal Deposit Regulations?

As the discussion continued, I was very glad not to have Peter Webster’s job! But I was delighted he’s doing it, and that he and other historians and archivists are on the case. It would be horrendous if our collective neglect caused late 20th and early 21st century culture to become a growing historical black hole.

I say collective neglect because Peter made it clear that the content our organisations are generating now will be of importance to historians in the future. So his message, to all of us, was: plan your digital archiving strategy. And if you want to nominate a website for inclusion in the archive – do it.
