How to use httrack

Module 15: Website Hacking

Hackers look for vulnerabilities in websites. Sometimes they find them and sometimes they don't. Security researchers, on the other hand, use techniques to catch hackers red-handed while they probe a live site. So hackers first copy the website to their local computer and then study its code for vulnerabilities. For this they use a tool named HTTrack. In this tutorial I will describe how to use the HTTrack website copier.

HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility.
It downloads a website to a directory on your computer for offline use, for example for testing. It copies the web pages one by one into the local directory.

If you want to test a website for security purposes, it is recommended that you first download it to your local machine. Set up a local web server and access the copy through it just as you would the real site. You can then use this local copy to look for vulnerabilities. Good luck!
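One way to get such a local server, as a minimal sketch: it assumes Python 3.7 or later is installed and uses its built-in http.server module as a stand-in web server (this is not part of HTTrack). The serve_mirror helper name and the default port 8000 are hypothetical choices.

```bash
#!/usr/bin/env bash
# Sketch: serve a downloaded mirror on localhost so you can browse and test it
# like a real site. Assumes python3 3.7+ (needed for --directory).
# serve_mirror and port 8000 are illustrative choices, not part of HTTrack.
serve_mirror() {
  local dir=${1:?usage: serve_mirror DIR [PORT]}
  local port=${2:-8000}
  # Bind to 127.0.0.1 so the copy is reachable only from this machine.
  python3 -m http.server "$port" --bind 127.0.0.1 --directory "$dir"
}

# Usage (runs until you press Ctrl+C; the path is a placeholder):
#   serve_mirror "$HOME/websites/myproject"
```

Binding to 127.0.0.1 keeps the copy off the network, which matters if the mirror contains anything you would not want exposed.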

One more thing I want to share with you: HTTrack does not come preinstalled with Kali Linux. If you are a Kali Linux user, this is a problem, but not a big one. HTTrack is available for Linux, so you can easily install it on Kali Linux.

You are probably familiar with the Advanced Package Tool (APT). It is used to install, remove, and reinstall packages on Debian-based operating systems such as Kali Linux and Ubuntu. Here I am using apt-get to install httrack. Before installing, update APT's package lists so that it knows about the latest available packages. Execute the following commands:

#apt-get update

#apt-get install httrack

Now that httrack is installed, it's time for a new project. Here I am going to download my own website, https://www.cyberpratibha.com/blog/blog, so first create a directory with the project name.

How to use httrack website copier – step by step guide

Run the following command in the terminal and answer its prompts:

  • #httrack
  • Enter Project Name: cyberpratibha.com/blog
  • Enter base path: the directory where you want to download the website
  • Enter URL: the website URL
  • Select an option to start mirroring the website.
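The same three answers can also be given on the command line. This is a minimal sketch that assumes the wizard stores each project in <base path>/<project name>, which is why that directory is passed to -O. The mirror_project function and the DRY_RUN switch are hypothetical names used only here, and by default the function only prints the command.

```bash
#!/usr/bin/env bash
# Sketch: non-interactive version of the wizard's questions
# (project name, base path, URL). Assumption: the wizard keeps each project in
# <base path>/<project name>, so that directory is passed to -O here.
# mirror_project and DRY_RUN are hypothetical names used only in this sketch.
mirror_project() {
  local project=$1 base=$2 url=$3
  local out="${base%/}/$project"   # strip a trailing slash from the base path
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    # Default: only show the command that would run.
    printf 'httrack %s -O %s\n' "$url" "$out"
  else
    mkdir -p "$out" && httrack "$url" -O "$out"
  fi
}

# Prints the httrack command. Run with DRY_RUN=0 to actually mirror.
mirror_project myproject "$HOME/websites" https://example.com/
```

Printing by default lets you check the target URL and output path before anything is downloaded.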

How to clone a website with httrack

Jack Wallen walks you through the process of cloning a website with both the httrack command line tool and the webhttrack GUI.

If you’re a website developer, a business owner, or a student in the field of IT, you’ve probably come across an instance where you needed to quickly clone a website. Say, for instance, you have a site you administer that is having problems; you might want to clone that site and then start debugging the clone. Working with the clone certainly beats the possibility of further breaking your original site.

There are plenty of ways to clone a site. One solution I have used offers both command line and GUI options, and is called httrack. The application can be used on Linux, Windows, macOS, and Android, and does a full copy of a website for local browsing. The command line tool is, as you’d expect, called httrack. There is also a GUI version for Linux called webhttrack. I’m going to demonstrate how to use both the command line and GUI tools, so you can make quick clones of your websites. I’ll be demonstrating both versions from Ubuntu Linux.

Installation

Both command line and GUI tools can be found in the standard repositories. To install them, open up a terminal window and issue the command:

sudo apt install httrack webhttrack

When prompted, type your sudo password, accept the installation, and allow it to complete. That’s all there is to installing the tools.

Command line usage

First we’ll be copying a site using the command line tool. This can take some time, depending on how large your site is. The command for making the copy is:

httrack http://SITE_URL -O LOCALDIRECTORY

Where SITE_URL is the actual URL of the site you want to copy and LOCALDIRECTORY is the directory on your local drive to store the copy. Once the command completes, you’ll see the newly created clone in LOCALDIRECTORY. You can then start working with that clone without affecting your production site.

The biggest caveat to using this tool is that, depending on how the site was written and how it is housed, your results may vary. If you find httrack downloads little more than an index file, chances are, it won’t work on your site. I have found, so far, that WordPress sites, especially those housed on third-party hosts, are next to impossible to clone using this tool.
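A quick way to spot this failure after a run, as a sketch: count the HTML pages that were actually saved. HTTrack typically writes its own project index.html, logs, and an hts-cache folder at the top of the output directory, so the check only counts pages one level down or deeper. The check_mirror name and the "fewer than 2 pages" threshold are arbitrary assumptions.

```bash
#!/usr/bin/env bash
# Sketch: after a run, count the HTML pages HTTrack actually saved. HTTrack
# writes its own project index.html (plus logs and an hts-cache folder) at the
# top of the output directory, so only files one level down or deeper count.
# check_mirror and the "fewer than 2 pages" threshold are illustrative choices.
check_mirror() {
  local dir=$1 pages
  pages=$(find "$dir" -mindepth 2 -type f \( -name '*.html' -o -name '*.htm' \) \
            ! -path '*/hts-cache/*' | wc -l)
  pages=${pages// /}   # some wc implementations pad the count with spaces
  if (( pages < 2 )); then
    echo "warning: only $pages page(s) saved in $dir; the site may not clone well"
    return 1
  fi
  echo "ok: $pages page(s) saved in $dir"
}

# Usage (placeholder path):
#   check_mirror "$HOME/websites/myproject"
```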

GUI usage

The GUI tool gives you a bit more user-friendly power. To start the GUI, open up a terminal window and issue the command webhttrack. This will open a browser window with the GUI at the ready. In the first screen, select your language, and click Next >>. In the next window (Figure A), enter a new project name, and select one of the pre-defined categories. Type in a base path to house the downloaded files, and click Next >>.

Figure A

In the next window (Figure B), select the action to be used (since this is a first download, the default will be fine) and type the URL for the target site in the address box.

Figure B

Click on the Set Options button and you can configure options like browser ID, scan rules, robot rules, number of connections, proxy, limits, and more (Figure C).

Figure C

If you run a clone, and find it fails, chances are you’ll need to revisit the Options section and make adjustments. This, of course, will depend upon the type of site you’re attempting to clone. Once you’ve set the options exactly how you need them, click OK, then click Next >>, then click the Start >> button. At this point (Figure D), the site will start cloning (saving it in the configured local directory).

Figure D

When the download completes, you can begin working with your clone.

Not perfect, but helpful

So long as you’re not trying to download a WordPress site, either httrack or webhttrack will do a great job of downloading a clone of your site, so you can debug it, back it up, or do whatever else you need. As I said, depending upon the type of site you’re working with, you’ll have to experiment with the settings to get this to work. Give this handy tool a try and see if it doesn’t wind up as your go-to site clone tool.

Atomic Object’s blog on everything we find fascinating.

I’ve recently been experimenting with HTTrack, an open-source utility that makes it possible to download a full copy of any website. HTTrack is essentially a web crawler, allowing users to retrieve every page of a website merely by pointing the tool to the site’s homepage.

From the HTTrack homepage:

“[HTTrack] allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site’s relative link-structure.”

I thought I’d share my experience with it.

Installing HTTrack

There are a couple of different ways to install HTTrack:

  • HTTrack Website: Download and install HTTrack manually. The download contains a README with detailed directions.
  • Homebrew: Users of Homebrew can easily install HTTrack with the formula `brew install httrack`.

Basic Syntax

The syntax of HTTrack is quite simple. You specify the URLs you wish to start the process from, any options you might want to add ([-option]), any filters specifying places you should ([+]) and should not ([-]) go, and end the command line by pressing Enter. HTTrack then goes off and does your bidding.

At its most basic, HTTrack can be run by specifying just a single URL:

httrack http://example.com

This will unleash the program on the http://example.com domain with default settings. HTTrack retrieves this URL, then parses the page for more links. Any links found within the page are downloaded next and parsed for additional links. The process continues on until the crawler cannot find any links it hasn’t already downloaded.
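The crawl loop just described can be modeled in a few lines. This is a toy sketch, not HTTrack's actual implementation: the "site" is a hard-coded, made-up link graph instead of real HTTP, and it ignores depth limits and filters. Each page is "downloaded" once, its unseen links are queued, and the loop stops when nothing new is left. It needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Toy model of the crawl loop (bash 4+ for associative arrays).
# The "site" is a hard-coded link graph, not real HTTP: each page maps to the
# links found on it. Page names are made up for illustration.
declare -A links=(
  [/]="/about /blog"
  [/about]="/ /team"
  [/blog]="/blog/post1 /about"
  [/team]=""
  [/blog/post1]="/blog"
)

crawl() {
  local start=$1 page link
  local -A seen=([$start]=1)   # pages already queued or downloaded
  local queue=("$start")
  while (( ${#queue[@]} > 0 )); do
    page=${queue[0]}
    queue=("${queue[@]:1}")    # pop the front of the queue
    echo "download $page"
    for link in ${links[$page]}; do
      if [[ -z ${seen[$link]:-} ]]; then
        seen[$link]=1          # never queue the same page twice
        queue+=("$link")
      fi
    done
  done
}

crawl /
```

The seen set is what guarantees termination: pages link back to each other (/about links to /), but each page is queued at most once.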

You can also add options to the basic command to customize HTTrack’s behavior. For example, you can specify forbidden URLs and directories, alter download speeds, and limit downloads to a certain filetype. HTTrack has a huge number of options, accessible via `httrack --help` and at the project website.
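Several of these option types can be combined in one command. This is a minimal sketch: the flag meanings (-rN link depth, -AN maximum transfer rate in bytes per second, -cN number of connections, -O output directory) are from `httrack --help`, while the URL, the excluded directory, the output path, and the DRY_RUN switch are placeholders. By default it only prints the command.

```bash
#!/usr/bin/env bash
# Sketch: a few of the option types mentioned above, kept in an array so each
# filter is a single, unexpanded argument. URL, paths and DRY_RUN are
# placeholders/hypothetical.
args=(
  "http://example.com/"
  "-example.com/private/*"   # forbidden directory (an exclude filter)
  "-*.zip"                   # skip one file type
  -r3                        # follow links at most 3 levels deep
  -A50000                    # cap the transfer rate at about 50 KB/s
  -c2                        # use 2 connections to go easy on the server
  -O "$HOME/mirrors/example" # output directory
)
if [[ ${DRY_RUN:-1} == 1 ]]; then
  printf '%q ' httrack "${args[@]}"; echo   # show the command only
else
  httrack "${args[@]}"
fi
```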

Custom Options

My goal for HTTrack was to create a static copy of the Atomic Object marketing website. To speed up my download and decrease the load on the server, I wanted to download only HTML, CSS, and JavaScript files. Images and other file types like videos and PDFs tend to be the largest files, so I intentionally omitted them.

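Through trial and error, I came up with the following formula (broken out by line to make it more readable):

httrack https://atomicobject.com \
  -atomicobject.com/assets/* -atomicobject.com/documents/* \
  -atomicobject.com/uploadedImages/* \
  +atomicobject.com/*.css +atomicobject.com/*.js \
  --path "/httrack-copies/atomicobject/" \
  --verbose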

Let’s take a detailed look at what each option in the command does:

httrack https://atomicobject.com
As we saw in the basic syntax above, this points HTTrack at the site we want to copy.

-atomicobject.com/assets/* -atomicobject.com/documents/*
-atomicobject.com/uploadedImages/*

A rule that begins with a minus sign indicates something that we don’t want HTTrack to download. In this case, we’ve specified three URLs not to download, because this is where all of our image and other non-HTML assets are located.

Note that each URL includes a wildcard symbol (“*”) at the end of the path. The use of the wildcard means that any file located within these three directories will match the rule, effectively disallowing the crawler from the entire directory.

+atomicobject.com/*.css +atomicobject.com/*.js
A rule preceded by a plus (+) sign indicates something we do want to download.

It’s important to understand that HTTrack determines rule precedence from left to right: when several rules match the same URL, the last one wins. Because these rules come after (i.e., to the right of) the rule telling HTTrack to ignore the `/assets` directory, they override it. That means the assets directory will be ignored unless the filename ends in .css or .js. This allows us to retrieve any CSS and JavaScript files while still excluding other asset types, like images and videos.
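The "last matching rule wins" behavior can be checked with a toy model. This is a sketch under stated assumptions: bash glob matching (where * also matches "/") stands in for HTTrack's own pattern matcher, and a URL that matches no rule is treated as allowed. Real HTTrack additionally applies its default scope rules. The allowed function is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Toy model of HTTrack scan-rule precedence: rules are checked left to right
# and the LAST rule that matches decides. Assumption: a URL matching no rule is
# allowed (real HTTrack also applies its own default scope rules).
allowed() {
  local url=$1; shift
  local verdict=allow rule sign pattern
  for rule in "$@"; do
    sign=${rule:0:1}
    pattern=${rule:1}
    # $pattern is deliberately unquoted so [[ ]] treats it as a glob pattern.
    if [[ $url == $pattern ]]; then
      if [[ $sign == "+" ]]; then verdict=allow; else verdict=deny; fi
    fi
  done
  echo "$verdict"
}

rules=(
  '-atomicobject.com/assets/*'
  '-atomicobject.com/documents/*'
  '-atomicobject.com/uploadedImages/*'
  '+atomicobject.com/*.css'
  '+atomicobject.com/*.js'
)

for url in atomicobject.com/assets/site.css \
           atomicobject.com/assets/logo.png \
           atomicobject.com/about/index.html; do
  printf '%s -> %s\n' "$url" "$(allowed "$url" "${rules[@]}")"
done
```

The stylesheet in /assets is first denied by the exclude rule and then re-allowed by the later +*.css rule. The PNG matches only the exclude rule, so it stays denied.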

--path "/httrack-copies/atomicobject/"
The --path option specifies where we want HTTrack to save downloaded files. Without this option, files are downloaded to the current working directory.

--verbose
The verbose option tells HTTrack to output its log to the Terminal, allowing us to monitor the program as it runs.
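When scripting this command, it can help to keep the arguments in a bash array. This is a sketch, not the author's script. Quoting each filter keeps the shell from glob-expanding the * patterns. Unquoted, +atomicobject.com/*.css could expand to real file names if a matching atomicobject.com/ directory already exists in the current folder, for example when re-running inside a previous mirror. OUT and DRY_RUN are placeholder names, and by default the script only prints the command.

```bash
#!/usr/bin/env bash
# Sketch: the Atomic Object command kept in a bash array so every filter stays
# one literal argument and the shell never glob-expands the "*" patterns.
# OUT and DRY_RUN are placeholders; substitute your own output directory.
OUT="${OUT:-$HOME/httrack-copies/atomicobject}"

args=(
  "https://atomicobject.com"
  "-atomicobject.com/assets/*"
  "-atomicobject.com/documents/*"
  "-atomicobject.com/uploadedImages/*"
  "+atomicobject.com/*.css"
  "+atomicobject.com/*.js"
  --path "$OUT"
  --verbose
)

if [[ ${DRY_RUN:-1} == 1 ]]; then
  # Default: only print the command so it can be reviewed first.
  printf '%q ' httrack "${args[@]}"; echo
else
  httrack "${args[@]}"
fi
```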

Conclusion

With the above settings, I can create a full copy of all HTML, CSS, and JS files on the Atomic website in just under four minutes. If you’re looking for an efficient tool to create a copy of a website, make sure to check out HTTrack.

Contents

  1. How to log into the Blitz Research website
    1. New project
    2. Find login page to start from
    3. Proxy settings
      1. Firefox
      2. Internet Explorer
    4. Capture the login details
    5. Defining filters
    6. Success

How to log into the Blitz Research website

For this step-by-step guide, I am going to make HTTrack log into the Blitz Research website and download selected pages: the homepage and the Blitz3D manual.

The steps used here can be applied to many websites with login forms!

New project

Create a new project in WinHTTrack:

Click the Next button.

Click the Add URL button. A window will open.

Click the Capture URL button. A window will open with instructions for changing your browser’s proxy settings (these settings will be different for you).

Find login page to start from

Leave HTTrack alone for now and open your web browser. Go to the website you wish to mirror, and find a page where you would normally log in. If you are already logged in then you will need to log out.

For the Blitz Research website, there is a “Login” link from the homepage to another page where a login form is presented. This will be the starting point for the HTTrack project.

Proxy settings

Now you need to temporarily change your browser’s proxy settings to those shown in HTTrack. If you already have proxy settings defined you will want to write them down so that you can restore them later.

Firefox

In Firefox, go to the Tools menu and choose Options, then click the General icon and click the Connection Settings button. Choose Manual proxy configuration, copy the Proxy’s address from the HTTrack window to the HTTP Proxy box in Firefox, and copy the Proxy’s port from HTTrack to the Port box in Firefox.

Click OK. Click OK again.

Internet Explorer

In Internet Explorer (IE), go to the Tools menu and choose Internet Options. Click the Connections tab and click the LAN Settings button. In the window that opens, tick the box for Use a proxy server for your LAN. Copy the Proxy’s address from the HTTrack window to the Address box in IE, and copy the Proxy’s port from HTTrack to the Port box in IE.

Click OK. Click OK again.

Capture the login details

With the proxy settings now in place, HTTrack is ready to capture the details of the login form. Type in your username and password.

Submit the form (in this case I click the Login button) and you should now see a page telling you that HTTrack has caught the link.

Return to HTTrack and you will notice that the URL field is now populated with a URL (do not edit this). Click the OK button and that new URL will now show in the Web Addresses box.

Defining filters

At this point you could click Next and run the project. However, because the starting URL is within the “Account/” directory (www.blitzbasic.com/Account/_login.php), the project would be scoped to download only the contents of Account and below.

Because my purpose is to mirror the homepage and the Blitz3D manual, I will add some Filters to control where HTTrack crawls.

Click the Set options button and select the Scan Rules tab. Set the filters to:

Line-by-line this means:

  1. Exclude all files and links
  2. Allow the homepage
  3. Allow the Manuals index page
  4. Allow all pages in the b3ddocs directory
  5. Allow these filetypes (from any server)

Click OK to accept the options.

Click the Next button.

Click Finish to begin the HTTrack mirror.

Success

Assuming everything was fine with the username/password and proxy settings, HTTrack should successfully log in and mirror everything needed. I browse my project and see success!

Kali Linux Tools Listing

HTTrack Description

HTTrack is an easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server to your computer. HTTrack preserves the original site’s relative link structure. Simply open a page of the “mirrored” website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site and resume interrupted downloads. HTTrack is fully configurable and has an integrated help system.

HTTrack itself is the command-line version for Linux.

WinHTTrack is the Windows release of HTTrack (Windows 2000/XP/Vista/7/10 and later), and WebHTTrack is the graphical interface for the Linux/Unix/BSD versions.

Author: Xavier Roche

HTTrack Help

Run httrack --help to see the full list of options (* marks the default value).

HTTrack Usage Example

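Mirror site www.someweb.com/bob/ and only this site:

httrack www.someweb.com/bob/

Mirror the two sites together (with shared links) and accept any .jpg files on .com sites:

httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg

Get all files starting from bobby.html, with a link depth of 6 and the possibility of going anywhere on the web:

httrack www.someweb.com/bob/bobby.html +* -r6

Run the spider on www.someweb.com/bob/bobby.html using a proxy:

httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080

Update a mirror in the current folder:

httrack --update

Bring up the interactive mode:

httrack

Continue a mirror in the current folder:

httrack --continue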

How to install HTTrack

Installation on Kali Linux, Debian, Mint, Ubuntu

sudo apt install httrack

Installation on BlackArch

sudo pacman -S httrack