Content management platforms that run on paid hosting, such as WordPress, offer easy ways to back up your content. You can use plugins, an FTP account or download the content directly from the hosting server. Unfortunately, many free hosting platforms do not offer a backup option because you can’t access their hosting server. This means you cannot get a backup of your site and must rely on the service provider forever. Apps like SiteSucker are very useful in this situation for creating a backup that you can later use for restoring. In this article, we will explain how to download an entire website using the SiteSucker app on Mac.
How Will Downloading a Site Help?
Normally, you can view the source code or use the browser’s developer tools to inspect a webpage online. Alternatively, you can save a page directly from the browser. However, downloading an entire site with the same structure is impractical without third-party apps. A locally downloaded website can help in the following situations:
- You can open the HTML pages in a code editor like Visual Studio Code. This makes it easier to find elements than viewing the source code in the browser or using developer tools.
- It is easy to update the content and test the results without an internet connection.
- You can keep a backup of your website and use it for restoring when needed.
- You can get a copy of the entire website and use it for migrating to another server.
Example Case for Using SiteSucker
Let us explain with an example case.
You created an HTML website using the Bootstrap 3 framework and uploaded it to your web server years ago. Over time, you changed computers and lost the local copy of your original content. Now you want to update the content to the latest Bootstrap 5 version and re-upload it to your server. This is a hassle without a backup. The only option is to download the entire content using FTP, modify it locally and then upload it again. However, you need to spend a long time manually recreating the same structure of CSS, JS, HTML and image folders.
The easier option is to use SiteSucker and download the entire website with the same structure. You can use local code editor apps to modify the content without even having an internet connection. It is also possible to retain all hyperlinks unchanged so that it is easy to re-upload the content after modification. This saves a lot of time and avoids unnecessary mistakes in downloading and editing through FTP.
SiteSucker App for Mac
There are many free and premium apps available for downloading an entire website’s content. We recommend the SiteSucker app since it is reliable and very easy to use. Open the App Store on your Mac and search for “sitesucker”.
The app costs $4.99, which is among the cheapest options for this kind of task. Purchase the app and install it on your Mac.
- SiteSucker will localize the entire website and replicate its structure on your Mac.
- It asynchronously downloads the entire content, including stylesheets, PDF files and images on the site.
There is also a Pro version which you can download from the developer’s site. The Pro version helps you download embedded third-party content on the website, like YouTube videos. You can try the 14-day free trial and purchase the Pro version if your primary objective is to download the site with embedded content. Otherwise, the App Store version is sufficient for downloading the site’s content.
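To make “replicate the structure” more concrete, here is a minimal Python sketch of how a URL could be mapped to a local path that mirrors the server’s folders. This is not SiteSucker’s actual code. The function name `local_path_for`, the `Downloads` root and the `index.html` default for folder-style URLs are all assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path_for(url: str, root: str = "Downloads") -> str:
    """Map a URL to a local path that mirrors the server's folder layout."""
    parts = urlparse(url)
    path = parts.path or "/"
    # Assumption: a folder-style URL is stored as that folder's index page.
    if path.endswith("/"):
        path += "index.html"
    # The host name becomes the top-level folder under the download root.
    return str(PurePosixPath(root) / parts.netloc / path.lstrip("/"))


print(local_path_for("https://site.com/css/style.css"))
print(local_path_for("https://site.com/"))
```

With this mapping, a stylesheet at “site.com/css/style.css” ends up in a “css” folder inside a “site.com” folder, which is the kind of layout you would expect from a mirrored site.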
Configuring SiteSucker App
Before you start using the app, check the default configuration and change the settings if needed. If you are new to this activity, go through the help documents on the developer’s site to understand each setting. First, open the app from the “Applications” folder in “Finder”. Alternatively, press “Command + Space” and search for SiteSucker using Spotlight Search. Once the app is open, click “Settings” to open the settings screen.
The app offers plenty of settings that you can configure. These control the way the app downloads content from the site.
Important Settings for SiteSucker App
Here are some of the important options you need to check before using the app.
- Robots.txt – you can choose to ignore the robots.txt exclusions under the “General” section. This is useful for sites that block user agents with robots.txt.
- Localize – the app will download the site and convert all hyperlinks to relative URLs. If a link has no relative equivalent, the app keeps it pointing to the server. However, if you do not want to localize the URLs and structure, choose “None” for the “File Modification” option under the “General” tab.
- Change destination folder – by default, the app downloads the content to your “Downloads” folder. You can change this location by browsing and selecting a folder under the “General” tab.
- Logging – enable or disable error reporting and logging functions under the “Log” tab.
- Limiting files, size and levels – restrict the number of files, the size and the depth levels for a download under the “Limit” tab.
- Change user agent string – each web browser has its own user agent string. Some websites may block access from unknown user agents. By default, the app uses “SiteSucker” as its user agent, and you may not be able to download the content if the website owner blocks this user agent. In such a case, try changing the user agent to Firefox, Chrome, Edge or Safari and download the content.
- Change path – this is an advanced option for when you want the local folder structure to differ from the source structure. For example, the images folder on the server may sit under an uploads directory, like “site.com/uploads/images/”. You can change this to “local-folder/site.com/images/” to move the images directly under the root of the site. Set up the required rules under the “Path” section.
- Select file types – restrict the file types you want to download or ignore under the “File Type” section.
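If you are wondering how robots.txt and the user agent string interact, the following Python sketch may help. It uses the standard library’s `urllib.robotparser` to evaluate a made-up robots.txt file. The file content and the helper name `is_allowed` are hypothetical, and this only illustrates the general robots.txt mechanism, not how SiteSucker implements it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the "SiteSucker" user agent entirely
# and blocks everyone else only from the /private/ folder.
ROBOTS_TXT = """\
User-agent: SiteSucker
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("SiteSucker", "https://site.com/index.html"))   # False
print(is_allowed("Mozilla/5.0", "https://site.com/index.html"))  # True
```

In this example the same page is refused for the “SiteSucker” user agent but allowed for a browser-style one, which is why changing the user agent or ignoring robots.txt exclusions can make a blocked download work. Use these options only on sites you own or have permission to copy.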
Make sure to save the settings for each project you download so that you can reuse them next time. If you are not sure about an option, click the question mark icon to open the help page from the developer’s site.
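The “Localize” setting described above is easier to picture with a small example. The Python sketch below rewrites an absolute link as a path relative to the page that contains it and leaves links to other hosts untouched. The function name `to_relative` and the sample URLs are assumptions for illustration, and the app’s real rewriting rules may differ.

```python
import posixpath
from urllib.parse import urlparse


def to_relative(link: str, page_url: str) -> str:
    """Rewrite an absolute link as a path relative to the page containing it."""
    link_parts, page_parts = urlparse(link), urlparse(page_url)
    # Links to a different host are left pointing at the server.
    if link_parts.netloc != page_parts.netloc:
        return link
    # Relative paths are computed from the folder that holds the page.
    page_dir = posixpath.dirname(page_parts.path) or "/"
    return posixpath.relpath(link_parts.path or "/", start=page_dir)


print(to_relative("https://site.com/css/style.css", "https://site.com/blog/post.html"))
print(to_relative("https://cdn.example.com/lib.js", "https://site.com/index.html"))
```

The first call gives “../css/style.css”, so the page keeps working when opened from a local folder. The second link belongs to another host, so it is returned unchanged.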
Downloading Entire Website
After finishing the configuration, enter your site’s URL in the input box. Click the “Download” button to start exporting the entire website’s content to your Mac.
- You can view the number of files downloaded and remaining, along with the download progress of each file.
- It is possible to pause or stop the action after starting the download.
- Click “History” to select any previously downloaded site.
- Go to the destination folder and find the downloaded content.
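Once the download finishes, you may want a quick sanity check of what landed in the destination folder. The Python sketch below counts the downloaded files by extension. The helper name `count_by_extension` and the example folder path are assumptions, so replace the path with your own destination folder.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder: str) -> Counter:
    """Count files under `folder` (recursively) by lowercase extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical location; adjust to the folder you chose in the settings.
    site_folder = Path.home() / "Downloads" / "site.com"
    for extension, total in count_by_extension(str(site_folder)).most_common():
        print(f"{extension}: {total}")
```

If the totals for HTML, CSS, JS and image files look far lower than you expect, check the limits and file type settings before downloading again.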
Caution When Using SiteSucker
SiteSucker will work on any type of site to download the entire content. However, remember the following when extracting the content:
- You can easily download HTML websites with a few hundred pages. For larger sites, make sure to set a limit to restrict the number of pages.
- It is easy to download plain HTML sites with the correct structure. However, you may lose the structure on certain PHP-based sites like WordPress.
- Avoid downloading larger sites with thousands of pages or running multiple sessions in parallel. Your Mac may run out of memory or crash. Unfortunately, the app does not show the resources needed or the time left to complete the download.
- It is not possible to extract a single page using the app even if you enter a specific page’s URL. The app follows certain rules, such as checking robots.txt and the XML Sitemap, and starts downloading all pages from the given domain. Though you can pause or stop the download, you can’t control the download sequence to get a specific page first.
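To see why file and level limits keep a download manageable, consider this small Python sketch. It walks a made-up link graph breadth-first and stops at a depth limit or a file-count limit, whichever comes first. The graph, the function name `crawl` and the way levels are counted are assumptions for illustration, and SiteSucker may count levels differently.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/team": [],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
}


def crawl(start, links, max_levels, max_files):
    """Breadth-first walk that stops at a depth limit or a file-count limit."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue and len(order) < max_files:
        page, level = queue.popleft()
        order.append(page)
        if level == max_levels:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return order


print(crawl("/", LINKS, max_levels=1, max_files=100))  # ['/', '/about', '/blog']
print(crawl("/", LINKS, max_levels=5, max_files=4))    # ['/', '/about', '/blog', '/team']
```

Without either limit, the walk continues until every reachable page is visited, which on a site with thousands of pages is exactly the situation described in the cautions above.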
SiteSucker is one of the best apps for duplicating your own site on a local Mac. You can create a backup and browse through the site’s pages just like the online website. However, the app may fail if you use it without proper limits and configuration. Make sure to go through the configuration documentation and try to download the content in smaller chunks.