10 000 000 000 000 000 bytes archived

On October 25, activists and staff of the Internet Archive held a ceremony to mark a significant milestone: the Internet Archive has exceeded 10 petabytes (10^16 bytes). Thanks to this archive and its Wayback Machine, we can see how famous sites looked many years ago, find saved copies of web pages, or simply restore our own site from a "free backup".
The Internet Archive also announced that it is making an 80-terabyte crawl sample from 2011 available to everyone for research. The files, in WARC format, contain about 2.7 billion URIs. They include all the text content and everything else that could be saved, including images, video, Flash, etc.
Sample details:
Start date: 09 March 2011
End date: 23 December 2011
Number of unique URLs: 2 273 840 159
Number of hosts: 29 032 069
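
To put these figures in perspective, here is a small back-of-the-envelope calculation. It assumes decimal (SI) units, so 1 petabyte = 10^15 bytes and 1 terabyte = 10^12 bytes, and it treats "about 2.7 billion URIs" as exactly 2.7 billion, so the results are rough estimates only. The variable names are made up for this illustration.

```python
# Figures quoted in the post; decimal (SI) units are an assumption.
PETABYTE = 10**15
TERABYTE = 10**12

archive_bytes = 10 * PETABYTE      # total size announced at the ceremony
sample_bytes = 80 * TERABYTE       # size of the 2011 research sample
sample_uris = 2_700_000_000        # "about 2.7 billion URIs", rounded

# Rough average size of one captured URI in the sample.
avg_bytes_per_uri = sample_bytes / sample_uris

# Fraction of the whole archive that the sample represents.
sample_share = sample_bytes / archive_bytes

print(archive_bytes == 10**16)
print(f"average per URI: about {avg_bytes_per_uri / 1000:.0f} kB")
print(f"sample share of archive: {sample_share:.1%}")
```

Under these assumptions the sample averages a few tens of kilobytes per URI and makes up less than one percent of the whole archive.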
The Heritrix crawler first downloaded the 1 million most popular sites according to Alexa (Habr was among them), and then followed the links it found on them.
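
The "seeds first, then follow the links" strategy can be sketched as a breadth-first traversal. This is not Heritrix's code, only a minimal illustration: the `crawl` function, the `get_links` callback and the tiny in-memory `LINKS` graph are all hypothetical stand-ins for a real crawler that would fetch pages over the network.

```python
from collections import deque


def crawl(seeds, get_links, max_pages):
    """Breadth-first crawl: visit the seeds first, then the pages they link to."""
    queue = deque(seeds)
    seen = set(seeds)          # URLs already queued, to avoid revisiting
    order = []                 # URLs in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical link graph standing in for the real web.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}

# The seed list plays the role of the "most popular sites" list.
visited = crawl(["a.example", "b.example"], lambda u: LINKS.get(u, []), max_pages=10)
print(visited)
```

Using a queue means every seed is visited before any page discovered through links, which matches the order described above. A real crawler would add politeness delays, robots.txt handling and persistent storage.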

Another interesting fact was announced at the ceremony: for the first time, the entire literary heritage of a whole people has been completely digitized and posted on the Internet. That people is the Balinese.
The Internet Archive's celebration was graced by the legendary computer scientist and programming ideologist Donald Knuth. He played the organ to open the ceremony.
