Bypass reCAPTCHA And Prevent IP Blocking Using Tor Proxy

When we run web crawlers, the target site sometimes blocks our IP address or serves a reCAPTCHA challenge, and the crawl gets interrupted. Rotating the IP address with each request can help avoid both problems, because the site no longer sees every request coming from the same address.

This post is the written version of a video I published on YouTube. If you prefer watching videos to reading, you can watch the video instead.

We will use a Tor proxy to rotate the IP address with each HTTP request. The code below talks to the Tor service (the tor daemon), which accepts SOCKS connections on port 9050 and reads its configuration from /etc/tor/torrc. On Ubuntu, open a terminal and install the Tor Browser launcher and the Tor service:

```shell
sudo add-apt-repository ppa:micahflee/ppa
sudo apt install torbrowser-launcher
sudo apt install tor
```

Now, you should have a torrc file in your /etc/tor/ directory.
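Before going further, it can be worth confirming that the Tor service is actually running. The sketch below uses only the standard library; is_port_open is a hypothetical helper that simply tries a TCP connection. Port 9050 is Tor's default SOCKS port, and 9051 is the control port you will enable in the next step, so expect it to show as closed until then.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, ...) if nothing listens
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9050: Tor SOCKS port (default); 9051: control port (after editing torrc)
    for port in (9050, 9051):
        state = "open" if is_port_open("127.0.0.1", port) else "closed"
        print(f"127.0.0.1:{port} is {state}")
```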

Edit torrc file in your /etc/tor/ directory

Open the torrc file with sudo nano /etc/tor/torrc and uncomment the following lines (they are commented out by default):

```
ControlPort 9051
HashedControlPassword 16:2D99FRCE35858C6F608DB3122A6C8DA4C35BE5E105B9B54A7E438B122F
CookieAuthentication 1
```
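If you would rather script this step, here is a minimal sketch, assuming the three directives appear in torrc as commented-out lines that begin with #. The helper name enable_control_port is hypothetical; it only transforms the file's text, so reading /etc/tor/torrc and writing it back (which needs root) is left to you.

```python
import re

# Directives that the renewal code relies on (matched by name only)
DIRECTIVES = ("ControlPort", "HashedControlPassword", "CookieAuthentication")


def enable_control_port(torrc_text: str) -> str:
    """Uncomment ControlPort, HashedControlPassword and CookieAuthentication."""
    out = []
    for line in torrc_text.splitlines():
        # Matches "#ControlPort 9051" or "# ControlPort 9051", but not "## comments"
        m = re.match(r"^\s*#\s*(\w+)(\s.*)?$", line)
        if m and m.group(1) in DIRECTIVES:
            line = m.group(1) + (m.group(2) or "")
        out.append(line)
    return "\n".join(out) + "\n"
```

The uncommented HashedControlPassword line still carries the sample hash, so it must be replaced with your own hash afterwards.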

The HashedControlPassword line holds a hash of the control-port password. We will replace it with the hash of a password you choose. Open a terminal and generate the hash with:

```shell
tor --hash-password <password key>
```

For example,

```shell
tor --hash-password mypass
```

This prints a hashed password (a string starting with 16:) for the password mypass. Keep both: the hash goes into torrc, and the plain-text password goes into the Python code later.
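For the curious, Tor's control-port specification describes this hash as the salted, iterated SHA-1 "secret-to-key" scheme from OpenPGP (RFC 2440): 16: followed by hex for an 8-byte random salt, one byte encoding the iteration count, and the 20-byte digest. The sketch below reproduces that scheme with hashlib under those assumptions; the function names are hypothetical, and for your real torrc you should still use tor --hash-password.

```python
import hashlib
import hmac
import os
from typing import Optional

# Count byte written by tor --hash-password: 0x60 means 65536 bytes are hashed
DEFAULT_INDICATOR = 0x60


def _s2k_digest(secret: bytes, salt: bytes, indicator: int) -> bytes:
    """RFC 2440 iterated and salted S2K with SHA-1."""
    # Number of bytes to feed into SHA-1, decoded from the indicator byte
    count = (16 + (indicator & 15)) << ((indicator >> 4) + 6)
    block = salt + secret
    # Repeat salt+secret and truncate to exactly `count` bytes
    data = (block * (count // len(block) + 1))[:count]
    return hashlib.sha1(data).digest()


def hash_password(password: str, salt: Optional[bytes] = None,
                  indicator: int = DEFAULT_INDICATOR) -> str:
    """Return a '16:'-prefixed hash in the format torrc expects."""
    if salt is None:
        salt = os.urandom(8)  # fresh random 8-byte salt
    digest = _s2k_digest(password.encode(), salt, indicator)
    return "16:" + (salt + bytes([indicator]) + digest).hex().upper()


def check_password(password: str, hashed: str) -> bool:
    """Recompute the digest from the stored salt and compare."""
    raw = bytes.fromhex(hashed.split(":", 1)[1])
    salt, indicator, digest = raw[:8], raw[8], raw[9:]
    candidate = _s2k_digest(password.encode(), salt, indicator)
    return hmac.compare_digest(candidate, digest)
```

Because the salt is random, running tor --hash-password twice gives two different hashes for the same password; if this reading of the format is right, check_password("mypass", h) should return True for either of them.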

Now replace the existing value after HashedControlPassword in /etc/tor/torrc with your new hash, using nano or any other editor, and save the file. Then restart Tor so it picks up the change, for example with sudo service tor restart.

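Next, we will use the mypass password to renew the Tor circuit before each request. First, install the stem library, and requests with SOCKS support (the socks extra installs PySocks, which requests needs to use socks5:// proxies):

```shell
pip install stem
pip install "requests[socks]"
```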

Now create a new Python file and add the following code, which sets up a Tor-proxied session and a function that asks Tor for a new IP address:

```python
from stem import Signal
from stem.control import Controller
import requests


def get_tor_session():
    """Return a requests Session that routes traffic through Tor."""
    session = requests.Session()
    # Requires a running Tor service listening on port 9050 (the default).
    # socks5h (rather than socks5) also resolves DNS through Tor.
    session.proxies = {
        "http": "socks5h://127.0.0.1:9050",
        "https": "socks5h://127.0.0.1:9050",
    }
    return session


def renew_connection():
    """Ask Tor for new circuits (and usually a new exit IP) via the control port."""
    with Controller.from_port(port=9051) as controller:
        # Plain-text password whose hash is HashedControlPassword in torrc
        controller.authenticate(password="mypass")
        # NEWNYM: new application requests will use fresh circuits
        controller.signal(Signal.NEWNYM)
```

Note that renew_connection authenticates with the plain-text password (mypass), not with the hash stored in torrc.
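One practical caveat: Tor rate-limits NEWNYM, so a signal sent within about ten seconds of the previous one is postponed, and the request that follows may still leave through the old exit. Stem exposes this through Controller.is_newnym_available() and Controller.get_newnym_wait(). The sketch below is a library-free alternative, a hypothetical NewnymThrottle class that enforces a minimum interval between renewals; the 10-second default reflects Tor's usual limit but should be treated as an assumption.

```python
import time
from typing import Callable, Optional


class NewnymThrottle:
    """Block until at least `min_interval` seconds have passed since the last renewal."""

    def __init__(self, min_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval  # assumed NEWNYM rate limit, in seconds
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # time of the previous renewal, if any

    def wait(self) -> float:
        """Sleep if needed before the next NEWNYM; return the seconds slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        self._last = self.clock()
        return waited
```

To use it, create one throttle at module level and call throttle.wait() at the start of renew_connection, before opening the controller. Note that each process in a multiprocessing pool would get its own throttle.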

Now let's use the Tor session to send HTTP requests to a list of URLs:

```python
# Continues the same file as the code above.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11"
}


def send_request(url_list):
    for url in url_list:
        try:
            # ask Tor for new circuits before each request
            renew_connection()
            # a fresh session, so no keep-alive connection from an old circuit is reused
            session = get_tor_session()
            html_content = session.get(url, headers=headers, timeout=30).text
            # (parse html_content here)
            # ident.me returns the public IP address the request came from
            print("IP rotated to:",
                  session.get("https://ident.me", headers=headers, timeout=30).text)
        except Exception as e:
            # log the error and continue with the next URL
            print(e)


if __name__ == "__main__":
    # IP address before IP rotation
    print("Your Public IP:", requests.get("https://ident.me", timeout=30).text)
    urls = [
        "https://www.google.com",
        "https://www.facebook.com",
        "https://www.youtube.com",
        "https://www.amazon.com",
    ] * 10

    send_request(urls)
```

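We use https://ident.me to print the public IP address seen for each request. You should see a different address on most iterations, though not necessarily on every one: Tor can pick the same exit node again, and it rate-limits NEWNYM signals sent in quick succession.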
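To check how often the IP actually changes, you could collect the addresses that ident.me returns (for example, by appending them to a list inside send_request) and summarize them. A minimal sketch with a hypothetical rotation_summary helper:

```python
from collections import Counter
from typing import Dict, List


def rotation_summary(ips: List[str]) -> Dict[str, int]:
    """Summarize a sequence of observed exit IP addresses."""
    counts = Counter(ips)
    # back-to-back requests that left through the same exit IP
    repeats = sum(1 for prev, cur in zip(ips, ips[1:]) if prev == cur)
    return {
        "requests": len(ips),
        "distinct_ips": len(counts),
        "consecutive_repeats": repeats,
        "most_reused": counts.most_common(1)[0][1] if ips else 0,
    }
```

A high consecutive_repeats count is a hint that renewals are being rate-limited.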

Routing every request through Tor, and renewing the circuit each time, makes this loop slow. Running the requests in parallel with multiprocessing or multithreading speeds it up. For example, with multiprocessing:

```python
from multiprocessing import Pool

# send_request and requests come from the code above (same file);
# this replaces the earlier __main__ block.

if __name__ == "__main__":
    # IP address before IP rotation
    print("Your Public IP:", requests.get("https://ident.me", timeout=30).text)

    urls = [
        "https://www.google.com",
        "https://www.facebook.com",
        "https://www.youtube.com",
        "https://www.amazon.com",
    ] * 10

    # split the URLs into batches of 10; each worker process handles one batch
    batches = [urls[i : i + 10] for i in range(0, len(urls), 10)]

    # send requests in parallel; map() blocks until every batch is done
    with Pool(processes=len(batches)) as pool:
        pool.map(send_request, batches)
```
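Multithreading was also mentioned as an option. Since the work here is mostly waiting on the network, threads are a reasonable fit. Below is a minimal sketch using concurrent.futures.ThreadPoolExecutor; chunk and run_in_threads are hypothetical helpers, and send_request from the earlier code would be passed in as fn.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def run_in_threads(fn: Callable[[List[str]], T], urls: List[str],
                   batch_size: int = 10, max_workers: int = 4) -> List[T]:
    """Run fn on each batch of URLs in a thread pool; results keep batch order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fn, chunk(urls, batch_size)))
```

For example, run_in_threads(send_request, urls) would process the URL list in four threads. All threads share one Tor instance, so a NEWNYM sent by any thread switches the circuits used by new connections from all of them.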

The code is available in this GitHub repository: https://github.com/sksoumik/rotate_IP

Thanks for the read.
