The Internet Archive is a non-profit digital library that holds one of the largest collections of online assets. It was founded in 1996 and claims to have archived more than 600 billion webpages through its Wayback Machine. You can use these archived pages in different ways for your project. In this article, we will explain how to use Internet Archive content and how to submit your website for archiving.
Internet Archive Content
Many people think archive.org hosts only archived webpages. However, in addition to webpages, you can find books, audio, video, software and images on the website. Here are some of the ways you can use the Internet Archive website.
1. Find Deleted and Unavailable Content of Your Webpage
The simplest and most useful way to use the Internet Archive is to find content that is no longer available on the web. Let us explain this with an example. Some website builders like Weebly do not offer an option to keep your articles in “Trash”. If you delete a page by mistake, it is permanently gone from your site. The problem is that the blog page is the index page, and deleting that single blog page permanently removes all the blog posts you have created over the years. One of our readers sent us an email asking how to retrieve more than 100 Weebly blog posts after he mistakenly deleted the blog index page.
Looking at the Internet Archive is the simplest way to retrieve deleted content. Though the Internet Archive will not provide a quick solution, at least you can view and retrieve your content from the archived pages.
- Go to the Wayback Machine section of the Internet Archive website.
- Enter the URL of the site or page whose history you want to see and click the “Browse History” button.
- You will see a calendar with highlighted dates, indicating that snapshots are available for those dates.
- Click on a date and select the snapshot you want to view.
- You can see the webpage’s content as it was on the selected date. Use the date in the top bar to switch to a different snapshot.
- Now, you can copy and reuse the content if you have wrongly deleted or modified it on your live site.
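If you need to check many pages, the manual lookup above can also be scripted. The following is a minimal sketch, assuming the Wayback Machine's public availability endpoint at `https://archive.org/wayback/available` and its JSON layout (`archived_snapshots` → `closest`). The function names are our own, for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL; `timestamp` is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Query the endpoint over the network and return the closest snapshot."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup("example.com", "20200101")` should return the snapshot nearest to that date, or `None` when the page was never archived.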
In addition to the calendar view, you can switch to the Collections, Changes, Summary, Site Map and URL views. You will be surprised by the amount of information the Internet Archive holds about your site. Below is how the “Site Map” view looks; you can hover over the chart to select a URL and view its snapshot.
Snapshots are also useful for documentation when you want to see how a particular site looked a decade ago. For example, below is how the Google site looked in 1999.
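Snapshots can also be addressed directly by URL. The sketch below assumes the commonly seen replay pattern `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`. If no capture exists at that exact moment, the Wayback Machine typically redirects to the nearest one. The helper name and the date used are illustrative.

```python
from datetime import datetime


def snapshot_url(page_url, when):
    """Build a Wayback Machine replay URL for `page_url` near the moment `when`."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp used in replay URLs
    return f"https://web.archive.org/web/{stamp}/{page_url}"


if __name__ == "__main__":
    # Illustrative date only; any date in 1999 leads to a nearby capture, if one exists.
    print(snapshot_url("http://www.google.com/", datetime(1999, 1, 25)))
```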
2. Submit Your Site Snapshot
It is also possible to save a webpage’s content to the Internet Archive. You can submit your own site, or any page on the web that you find is not yet available on archive.org.
- Go to the web section of the Internet Archive and scroll down a bit.
- You will see a “Save Page Now” option as shown below.
- Enter your URL and click the “Save Page” button to capture the current snapshot of the page.
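The same submission can be triggered from a script. This is a rough sketch, assuming that requesting `https://web.archive.org/save/<page URL>` starts a capture, as it does when the address is opened in a browser. The service may rate-limit or reject automated requests, and the helper names and User-Agent string are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Assumed "Save Page Now" URL prefix.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request_url(page_url):
    """Return the capture-request URL; require an absolute http(s) URL."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("page_url must be an absolute http(s) URL")
    return SAVE_ENDPOINT + page_url


def save_page(page_url):
    """Ask the archive to capture the page; returns (HTTP status, final URL)."""
    request = Request(
        save_request_url(page_url),
        headers={"User-Agent": "snapshot-example/0.1"},  # hypothetical identifier
    )
    with urlopen(request, timeout=120) as response:
        return response.status, response.geturl()
```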
3. View and Listen from Collections
As mentioned, webpages are only one part of the Internet Archive website. There is also a large collection of eBooks, audio and video that you can read or listen to online.
- On the Internet Archive homepage, scroll down and click on your favorite collection.
- For example, you can find “European Libraries” and click on it.
- You will find more than 700K digital books; click on the book you want to read or listen to.
- The book opens in an eBook reader interface. You can zoom in or switch to one-page view to enlarge the book and read it online. It is also possible to have the book read aloud, so you can listen while you do another task.
You can even find books published in the 1900s that are difficult to find in physical libraries.
4. Check Internet Archive Projects
The Internet Archive has many useful projects, and you can make use of them depending on your needs.
- Organizations can use archiving as a subscription service through the archive-it.org project, which is part of the Internet Archive.
- Borrow books from the openlibrary.org project.
- Get archived versions of your favorite software.
You can check out the projects page for more details about current projects.
5. Rebuild Your Website from Archive
Running a website needs a lot of patience. Many bloggers delete their site and quit blogging partway through, frustrated by not getting sufficient traffic. After some time, however, they regret the decision and find no way to continue their blogging journey. If you are one of those who deleted your site, do not worry! There are many third-party service providers who can help you rebuild your site from Internet Archive content. You have to pay a nominal fee for retrieving the content and restoring it in the required format. For example, you can rebuild your original WordPress blog for just $45 and continue from where you left off.
Check out the list of rebuild service providers on this Internet Archive page.
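Before paying for a rebuild, it can help to see which of your pages were actually archived. The sketch below assumes the Wayback Machine's CDX search endpoint and the parameters `matchType`, `output`, `fl`, `collapse` and `limit`. With `output=json`, the response is assumed to be a list of rows whose first row holds the column names. The function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(site_prefix, limit=50):
    """Build a query listing one capture per unique URL under `site_prefix`."""
    params = {
        "url": site_prefix,
        "matchType": "prefix",       # everything that starts with site_prefix
        "output": "json",
        "fl": "timestamp,original",  # only the fields needed below
        "collapse": "urlkey",        # one row per distinct URL
        "limit": limit,
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def rows_to_replay_urls(rows):
    """Turn CDX JSON rows (header row first) into replay URLs."""
    if not rows:
        return []
    header, *data = rows
    records = [dict(zip(header, row)) for row in data]
    return [
        f"https://web.archive.org/web/{r['timestamp']}/{r['original']}"
        for r in records
    ]


def list_archived_pages(site_prefix, limit=50):
    """Fetch the listing over the network and return replay URLs."""
    with urlopen(build_cdx_url(site_prefix, limit), timeout=60) as response:
        return rows_to_replay_urls(json.load(response))
```

For a small blog, looping over the returned URLs and saving each page by hand may be enough. Larger sites are where the paid services mentioned above become practical.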
Blocking WaybackMachine Crawler
Finally, there are good reasons why you may not want your website’s content to be part of the Internet Archive. Perhaps you want to keep the site private, or you have found that sensitive information you deleted from your site is still archived. The easy option is to use the robots.txt file to block the Internet Archive’s crawler. Add the following lines to your robots.txt file to block the entire site from being archived.
User-agent: ia_archiver
Disallow: /
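Before deploying the rule, you may want to confirm that it parses the way you expect. The sketch below uses Python's standard `urllib.robotparser` to check that the two directives above deny the `ia_archiver` user agent while leaving other crawlers unaffected. It only checks the syntax locally and says nothing about how any crawler actually behaves. The example domain and helper name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The two directives from this section, one per line.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/any/page"  # placeholder URL
    print("ia_archiver allowed:", is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```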
The alternative is to contact them by email and request exclusion.
FAQ on Using Internet Archive
Yes, if your page was previously archived.
Yes, you can find a historical version, called a snapshot, if one is available.
Yes, you can simply go to the Wayback Machine section and save your page content.
No, for viewing snapshots. However, you need one for uploading your assets.
Block the site or page with robots.txt or contact them through email for site exclusion.
Bad idea. Even a simple plagiarism checker will compare your content against what is available in the Internet Archive. Most probably, after spending a lot of time, you will get a copyright infringement (DMCA) notice down the line or be penalized by search engines for stealing others’ content. If it is your own site, you can rebuild it yourself or use third-party services. For SEO purposes, you may need to set up redirects if you still hold the old domain name.
A large collection of eBooks, audio, video, software and much more.
Archived pages are simply snapshots, like screenshots. You can’t log in, access the database, or view content behind password protection.