March 8th, 2010

The UK Web Archive

The British Library and IBM are working together on the UK Web Archive, which will store all accessible UK web pages. It will give researchers a rich data source on British academia, opinion and popular culture, preserving pages that might otherwise change radically or disappear without notice.

IBM is providing software expertise and using the archive as a testbed for text-mining Big Data, estimating that it will amount to 220 terabytes per year as of 2011. BigSheets (presumably a pun on Google's Bigtable) includes both open and closed source software. IBM has shown various interfaces, including spreadsheets, tag clouds, and multi-bubble charts.

I wrote an article about it for InfoToday: British Library and IBM Team Up on Web Archiving Project.
