Wednesday, 6 July 2016

How to Save Webpages to the Internet Archive's Wayback Machine

If you don't know it yet, the Internet Archive is a non-profit digital library "offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format". Taking inspiration from the Library of Alexandria, the Internet Archive includes texts, audio, moving images, and software (DOS games!) as well as archived web pages, and provides accessible services for people with disabilities.
We can use the Internet Archive to see what the web was like in the early days - including Cyberlife's Welcome Mat, circa January 1997. You might even have a browser extension such as Resurrect (Firefox) or Go Back in Time (Google Chrome) that makes it easier to call up the Internet Archive when faced with a 404.
You may not know that, in addition to browsing archived websites, we can also proactively save webpages and downloads to the Internet Archive, as follows:
  1. Open the Internet Archive in a new tab and go to the bottom left corner.
  2. Paste the URL of the page you want to archive into the Save Page Now box.

A box will appear on the screen saying that the Internet Archive is saving the page, and you will then be redirected to the newly archived copy. From there you can browse around the site and direct the Internet Archive to save any other pages (adoptions, information) that you see.
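For anyone who would rather script this step, here is a minimal sketch in Python. It assumes that Save Page Now can also be reached by putting `https://web.archive.org/save/` in front of the page address; that prefix and the helper name `build_save_url` are assumptions made for illustration, not something stated in the steps above.

```python
# Assumption: the Wayback Machine accepts save requests at this URL prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the address that would ask the Wayback Machine to save page_url."""
    page_url = page_url.strip()
    if not page_url:
        raise ValueError("page_url must not be empty")
    # The page address is appended as-is after the prefix.
    return SAVE_PREFIX + page_url


print(build_save_url("http://example.com/creatures/"))
```

Passing the result to `webbrowser.open` from the standard library should load it in a browser tab, much like pasting the address into the Save Page Now box by hand.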
Another easy way to save pages is to add a JavaScript bookmarklet to your browser, available at Marklets.com. To install it, simply drag the blue button onto your browser toolbar.

A link item will appear saying 'Save Page to Wayback Machine'. From then on, you can simply click that link to send any webpage you are on to the Internet Archive.
Not every page can be archived: some webmasters block the Internet Archive using the robots exclusion standard, also known as robots.txt. Archiving is the default, so a webmaster who does not want a site archived must explicitly opt out.
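To check in advance whether a site has opted out, something like the following sketch could help. It uses Python's standard `urllib.robotparser` module to read the text of a robots.txt file. The user-agent name `ia_archiver` is an assumption about how the Internet Archive's crawler identifies itself, and `is_archivable` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def is_archivable(robots_txt: str, page_url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch page_url.

    Assumption: "ia_archiver" is the crawler name the site would block.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)


# A site that opts out of archiving entirely.
blocked = "User-agent: ia_archiver\nDisallow: /\n"
# A site with no rules at all, which is the default (archivable) case.
open_site = ""

print(is_archivable(blocked, "http://example.com/page.html"))
print(is_archivable(open_site, "http://example.com/page.html"))
```

The robots.txt text itself normally lives at the root of the site, for example `http://example.com/robots.txt`.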
Note also that saving a page to the Internet Archive does not save the pages it links to (such as downloads). You must visit each of those links yourself for it to be saved.
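Since each linked page has to be handled separately, it can help to list a page's links first. The sketch below collects them from a page's HTML with the standard `html.parser` module and turns relative links into full addresses. The names `LinkCollector` and `linked_urls` are hypothetical, and the sample HTML is invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> tag, resolved against base_url."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def linked_urls(html: str, base_url: str) -> list:
    """Return the absolute URLs linked from the given HTML, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


sample = '<a href="files/norn.zip">Download</a> <a href="http://other.example/">Elsewhere</a>'
for url in linked_urls(sample, "http://example.com/creatures/"):
    print(url)
```

Each address in the list could then be visited and saved in turn.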
Go forth, and happy archiving!

2 comments:

  1. Nice tutorial, I didn't know about the bookmarklet and addons! Might have to go on a Creatures site safari and see which ones have been archived.

    1. Resurrect makes a world of difference to me when I encounter a 404. It alters the 404 options page so that with one more click, I can access the Internet Archive's version of the page.

      I'm hoping that by publicising the 'Save Page to Wayback Machine' bookmarklet, when people find a great website, they won't just bookmark it, they'll save it to the Wayback Machine.
