How to Automate Reconnaissance in Python

In cybersecurity, information is power, and gathering it quickly can make all the difference. With a few lines of Python you can resolve a domain, probe for subdomains, and run basic checks for common weaknesses, turning tedious manual steps into one repeatable script.

Today, you’ll learn how to automate reconnaissance using Python. We’ll use socket, whois, nmap, and BeautifulSoup to gather key information about a target domain. By the end of this tutorial, you’ll have a working script that performs a first-pass reconnaissance from a simple GUI. Let’s get started!

Disclaimer

Please note: This tutorial is intended for educational purposes only. The techniques demonstrated are meant to help you understand how to improve cybersecurity practices, not to exploit vulnerabilities. Please ensure you have proper authorization before conducting any reconnaissance or scanning activities on domains that you do not own. Use this knowledge responsibly.

Getting Started

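Make sure to install these libraries for the code to function properly:

$ pip install python-whois
$ pip install requests
$ pip install python-nmap
$ pip install beautifulsoup4
$ pip install validators

Note that tkinter ships with most Python installations, so it does not need a pip install; the tk package on PyPI is an unrelated project. On some Linux distributions you may need to install tkinter through the system package manager (for example, the python3-tk package on Debian and Ubuntu).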

Since we’ll be using Nmap, you’ll need to install it first, because python-nmap is only a wrapper around the nmap executable. Download the version that corresponds to your operating system and run the installer. On Windows, make sure the Nmap directory is on your PATH (under “Environment Variables”) so that Python can find the executable; on Linux and macOS, installing Nmap through the package manager usually takes care of this.

If you want to learn more about how to install Nmap, check out this tutorial.

Imports

import socket
import whois
import requests
import nmap
import threading
import tkinter as tk
from tkinter import filedialog, messagebox, scrolledtext
from bs4 import BeautifulSoup
import validators

As every coder knows, you can’t kick off any project without first gathering your trusty tools. So, let’s dive into what we’ll be working with:

  • Starting with socket: This will handle all our network operations.
  • Next up is whois: Perfect for digging up details about domain owners.
  • Then there’s requests: Our go-to for making HTTP requests to web servers.
  • nmap joins the party: Essential for scanning network services and detecting operating systems.
  • We also have threading: To keep things smooth by running tasks in the background, so our main window doesn’t freeze up.
  • For the interface, we’ll use tkinter: It makes creating a graphical user interface a breeze.
  • BeautifulSoup steps in: Ready to parse HTML and help us extract all the info we need from web pages.
  • And last but not least, validators: Ensuring our URLs are properly formatted and good to go.

Browse for Wordlist

# Function to browse for subdomain wordlist
def browse_wordlist():
   global wordlist_path
   wordlist_path = filedialog.askopenfilename(title="Select Subdomain Wordlist")
   if wordlist_path:
       output_text.insert(tk.END, f"Selected wordlist: {wordlist_path}\n")

If you’re familiar with our website, you know we’re all about making our programs user-friendly, and this one is no different. The browse_wordlist() function exemplifies this by using filedialog to let users choose their preferred subdomain wordlist file.

The function declares wordlist_path as global so that the rest of the script, in particular start_scan(), can read the selected path. It then confirms the selection by displaying the path in the output_text widget with output_text.insert(). If the user cancels the dialog, the returned path is empty and nothing is printed.
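Wordlists downloaded from the web often contain blank lines, comments, or duplicates. As a minimal sketch, the selected file could be cleaned before it is used. The load_wordlist() helper and its '#' comment convention are assumptions, not part of the original script:

```python
from pathlib import Path


def load_wordlist(path):
    """Return unique, lowercase, non-empty entries from a wordlist file (hypothetical helper)."""
    seen = set()
    words = []
    # errors="ignore" skips undecodable bytes that some public wordlists contain
    for line in Path(path).read_text(encoding="utf-8", errors="ignore").splitlines():
        word = line.strip().lower()
        # Assumption: lines starting with '#' are comments
        if not word or word.startswith("#"):
            continue
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words
```

The returned list preserves the order of the file, so the most common names at the top of a wordlist are still tried first.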

Threaded Domain and Subdomain Scan: Initiation and Execution

We can’t call a program user-friendly if it freezes while doing its job, which is why we created the start_scan_thread() function. This function starts a new thread to run the start_scan() function without freezing the main application.

# Function to start the scan in a new thread
def start_scan_thread():
   # daemon=True lets the program exit even if a scan is still running
   thread = threading.Thread(target=start_scan, daemon=True)
   thread.start()
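One caveat: start_scan() writes to output_text from the worker thread, and Tkinter widgets are generally only safe to touch from the thread that runs mainloop(). A common workaround is to have the worker put messages on a queue.Queue and let the main thread drain it periodically. The sketch below shows the pattern without any GUI code. The names worker() and drain() and the message strings are illustrative; in the real script, drain() would be scheduled with root.after() and would call output_text.insert().

```python
import queue
import threading

messages = queue.Queue()  # thread-safe channel from the worker to the GUI thread


def worker(domain):
    """Stand-in for start_scan(): report progress via the queue, not the widget."""
    messages.put(f"Starting scan for {domain}...\n")
    messages.put("Scan finished.\n")


def drain(sink):
    """Move every pending message into sink (a list here, a Text widget in the GUI)."""
    while True:
        try:
            sink.append(messages.get_nowait())
        except queue.Empty:
            break


thread = threading.Thread(target=worker, args=("example.com",), daemon=True)
thread.start()
thread.join()  # only for this demo; a GUI would poll with root.after() instead

collected = []
drain(collected)
print("".join(collected), end="")
```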

The start_scan() function starts by checking if the user has input a domain. If not, it reminds them to enter one.

domain = domain_entry.get()
if not domain:
    messagebox.showwarning("Input Error", "Please enter a domain.")
    return
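Users often paste a full URL rather than a bare domain, which would break the http://{domain} strings built later in the script. A small sketch of input normalization follows; normalize_domain() is a hypothetical helper, not part of the original code.

```python
from urllib.parse import urlsplit


def normalize_domain(raw):
    """Reduce user input such as 'https://Example.com/path' to 'example.com'."""
    raw = raw.strip()
    # urlsplit only fills in the host when a scheme separator is present
    if "://" not in raw:
        raw = "http://" + raw
    host = urlsplit(raw).hostname  # lowercased, without port or credentials
    return host or ""
```

In start_scan(), this could be applied as domain = normalize_domain(domain_entry.get()) before the empty check.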

It then resolves the domain name to an IPv4 address with socket.gethostbyname() and displays it. If the lookup fails, it lets the user know and stops the scan.

output_text.insert(tk.END, f"Starting scan for {domain}...\n")

# Get the IP address
try:
    ip = socket.gethostbyname(domain)
    output_text.insert(tk.END, f"IP Address of {domain}: {ip}\n")
except Exception as e:
    output_text.insert(tk.END, f"Error: Could not retrieve IP for {domain}: {e}\n")
    return
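socket.gethostbyname() returns a single IPv4 address. If you also want IPv6 addresses, or every address behind a round-robin DNS name, socket.getaddrinfo() can be used instead. This is only a sketch; resolve_all() is a name introduced here for illustration.

```python
import socket


def resolve_all(host):
    """Return the unique IP addresses (IPv4 and IPv6) that host resolves to."""
    # Restricting to TCP avoids one duplicate entry per socket type
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) or (ip, port, flow, scope)
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# An IP literal resolves to itself, so this call needs no DNS lookup
print(resolve_all("127.0.0.1"))
```

Like gethostbyname(), getaddrinfo() raises socket.gaierror when the name cannot be resolved, so the existing try/except would still apply.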

For the next step, it fetches registration details about the domain and shows these details in the output area. If there’s an error in retrieving this information, it provides an error message.

# WHOIS Information
output_text.insert(tk.END, f"\nFetching WHOIS information for {domain}...\n")
try:
    domain_info = whois.whois(domain)
    output_text.insert(tk.END, f"{domain_info}\n")
except Exception as e:
    output_text.insert(tk.END, f"Error: Could not retrieve WHOIS information: {e}\n")
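Printing the whole WHOIS record can be noisy. Assuming the object returned by whois.whois() can be read like a dictionary, a short summary could be built as sketched below. The field names are ones python-whois commonly exposes, but they vary by registry, so treat them as assumptions. summarize_whois() is a hypothetical helper, and a plain dict stands in for a real lookup.

```python
def summarize_whois(info, fields=("domain_name", "registrar", "creation_date",
                                  "expiration_date", "name_servers")):
    """Build a compact, line-per-field summary from a dict-like WHOIS record."""
    lines = []
    for field in fields:
        value = info.get(field)
        if not value:
            continue  # skip fields the registry did not return
        if isinstance(value, (list, tuple, set)):
            # Some fields come back as lists; show each unique entry once
            value = ", ".join(sorted({str(v) for v in value}))
        lines.append(f"{field}: {value}")
    return "\n".join(lines)


# Stand-in data; a real call would be summarize_whois(whois.whois(domain))
sample = {"domain_name": ["EXAMPLE.COM", "example.com"], "registrar": "Example Registrar"}
print(summarize_whois(sample))
```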

Following that, if a wordlist path is available, it moves on to enumerate potential subdomains.

# Subdomain Enumeration
if wordlist_path:
    output_text.insert(tk.END, "\nEnumerating subdomains...\n")
    enumerate_subdomains(domain, wordlist_path)

It then performs web technology fingerprinting to identify the technologies behind the website.

# Web Technology Fingerprinting (Alternative)
output_text.insert(tk.END, "\nFingerprinting web technologies...\n")
fingerprint_web_technologies(domain)

The function continues by checking for XSS vulnerabilities.

# XSS Detection
output_text.insert(tk.END, "\nDetecting XSS vulnerabilities...\n")
detect_xss(domain)

Next, it looks for any signs of open redirects.

# Open Redirect Detection
output_text.insert(tk.END, "\nDetecting Open Redirect vulnerabilities...\n")
detect_open_redirect(domain)

Lastly, it performs OS and service detection to uncover the underlying operating system and services running on the IP address.

# OS Fingerprinting and Service Detection
output_text.insert(tk.END, "\nRunning OS and service detection...\n")
os_fingerprinting(ip)

Subdomain Enumeration

So, you might be wondering why we needed that subdomain wordlist earlier: it’s essential for the enumerate_subdomains() function. This function opens the wordlist file and reads it line by line, then combines each word with the main domain to build a candidate subdomain URL. Each URL is checked with validators.url(), and malformed ones are reported and skipped.

Next, the function sends a GET request to each valid URL. A “200 OK” response is reported as a found subdomain, and any other status code is printed alongside the URL. Keep in mind that any HTTP response, even a 403 or 404, means the host resolved and answered, so it may deserve a second look. A connection error usually means the name doesn’t resolve or nothing is listening on port 80. Any other error is reported as well.

# Function to enumerate subdomains
def enumerate_subdomains(domain, wordlist_path):
   with open(wordlist_path, 'r') as file:
       subdomains = file.read().splitlines()

   for subdomain in subdomains:
       url = f"http://{subdomain}.{domain}"

       # Validate the URL before making the request
       if not validators.url(url):
           output_text.insert(tk.END, f"Invalid URL: {url}\n")
           continue

       try:
           # A timeout keeps one unresponsive host from stalling the whole scan
           response = requests.get(url, timeout=5)
           if response.status_code == 200:
               output_text.insert(tk.END, f"Found subdomain: {url}\n")
           else:
               output_text.insert(tk.END, f"No subdomain found at: {url} (Status code: {response.status_code})\n")
       except requests.ConnectionError:
           output_text.insert(tk.END, f"Connection error: Could not connect to {url}\n")
       except requests.exceptions.InvalidURL:
           output_text.insert(tk.END, f"Error: Invalid URL detected for {url}\n")
       except Exception as e:
           output_text.insert(tk.END, f"Unexpected error with {url}: {e}\n")
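Sending an HTTP request to every candidate is slow, and most names in a wordlist will not even resolve. One option is to do a DNS lookup first and only probe the names that resolve. The sketch below assumes this approach; resolvable_subdomains() and its resolve parameter are hypothetical, and the parameter exists so the logic can be exercised without network access.

```python
import socket


def resolvable_subdomains(domain, words, resolve=socket.gethostbyname):
    """Return {hostname: ip} for each candidate that resolves in DNS."""
    found = {}
    for word in words:
        host = f"{word}.{domain}"
        try:
            found[host] = resolve(host)
        except OSError:
            # socket.gaierror (a subclass of OSError) means the name did not resolve
            continue
    return found


# Offline demonstration with a fake resolver
def fake_resolve(host):
    known = {"www.example.com": "192.0.2.10"}
    if host not in known:
        raise socket.gaierror("not found")
    return known[host]


print(resolvable_subdomains("example.com", ["www", "nope"], resolve=fake_resolve))
```

Domains with wildcard DNS records will resolve every candidate, so the results of a check like this would still need the HTTP request as confirmation.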

Web Technology Fingerprinting

Now that we’ve tackled scanning for potential subdomains, it’s time to shift gears and uncover the technologies behind a website. This is where the fingerprint_web_technologies() function comes into play. It starts by forming the domain URL and sending a GET request to fetch the raw HTTP response from the server. This response holds clues about the technologies in use.

Once we get the response, the function reads the Server and X-Powered-By headers, which often name the web server and framework, and shows them in the output_text widget. Next, it looks for telltale strings in the Set-Cookie header to guess the Content Management System (CMS) running on the site. Knowing the CMS helps narrow down which known issues are worth investigating.

Finally, the function uses BeautifulSoup to parse the HTML and look for a meta generator tag, which many CMSs use to advertise their name and version. If something goes wrong during this whole process, don’t worry: the function will let you know with an error message.

# Function to fingerprint web technologies (Alternative Approach)
def fingerprint_web_technologies(domain):
   try:
       url = f"http://{domain}"
       # A timeout keeps the scan from hanging on an unresponsive server
       response = requests.get(url, timeout=10)

       # Check headers for web technologies
       server = response.headers.get('Server')
       x_powered_by = response.headers.get('X-Powered-By')
       set_cookie = response.headers.get('Set-Cookie')

       output_text.insert(tk.END, f"Server: {server}\n")
       output_text.insert(tk.END, f"X-Powered-By: {x_powered_by}\n")

       # Simple check for popular CMS by cookies
       if set_cookie:
           if "wp" in set_cookie.lower():
               output_text.insert(tk.END, "Detected CMS: WordPress\n")
           elif "drupal" in set_cookie.lower():
               output_text.insert(tk.END, "Detected CMS: Drupal\n")
           elif "joomla" in set_cookie.lower():
               output_text.insert(tk.END, "Detected CMS: Joomla\n")

       # Parsing HTML to detect the meta generator tag
       soup = BeautifulSoup(response.text, 'html.parser')
       generator_tag = soup.find("meta", {"name": "generator"})
       # Some pages have the tag but no content attribute
       if generator_tag and generator_tag.get('content'):
           output_text.insert(tk.END, f"Detected by meta generator tag: {generator_tag['content']}\n")

   except Exception as e:
       output_text.insert(tk.END, f"Error: Could not fingerprint technologies: {e}\n")
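The substring test on Set-Cookie is loose: "wp" would also match an unrelated cookie such as swp_session. Matching on cookie names is a little tighter. In the sketch below, the pattern table is an illustrative assumption rather than a complete or authoritative list, and guess_cms() is a hypothetical helper.

```python
import re

# Illustrative cookie-name patterns (assumptions); extend or correct for your targets
CMS_COOKIE_PATTERNS = {
    "WordPress": re.compile(r"^(wordpress_|wp-settings-)"),
    "Drupal": re.compile(r"^S?SESS[0-9a-f]{32}$"),
}


def guess_cms(cookie_names):
    """Return the first CMS whose pattern matches a cookie name, or None."""
    for name in cookie_names:
        for cms, pattern in CMS_COOKIE_PATTERNS.items():
            if pattern.search(name):
                return cms
    return None
```

With requests, the names could be taken from response.cookies.keys() instead of parsing the raw Set-Cookie header.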

Detecting XSS Vulnerabilities

Let’s dive into testing if the website has any vulnerabilities that might allow malicious script injections. This is where the detect_xss() function comes into play. It uses a simple JavaScript alert as a payload. The script never opens a browser, so no alert box actually appears; instead, the function checks whether the payload comes back unmodified in the response body. In a real browser, a reflected payload like this would pop up an alert box with the message “XSS”.

Once the payload is ready, the function constructs a URL by adding the payload as the q query parameter to the domain you provided. It then sends a GET request to this URL and waits for the response. If the response includes the XSS payload, it’s a sign that the site may be vulnerable. If not, no reflection was detected for this one parameter, which does not prove the site is safe. And if any issues come up during this process, the function will let you know with an error message.

# Function to detect XSS vulnerability
def detect_xss(domain):
   xss_payload = "<script>alert('XSS')</script>"
   test_url = f"http://{domain}/?q={xss_payload}"
   try:
       # A timeout keeps the scan from hanging on an unresponsive server
       response = requests.get(test_url, timeout=10)
       # The payload counts as reflected only if it comes back unescaped
       if xss_payload in response.text:
           output_text.insert(tk.END, f"Potential XSS vulnerability found at {test_url}\n")
       else:
           output_text.insert(tk.END, f"No XSS vulnerability detected at {test_url}\n")
   except Exception as e:
       output_text.insert(tk.END, f"Error testing XSS: {e}\n")
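A payload that comes back HTML-escaped (for example as &lt;script&gt;) is reflected but not executable. The check above already ignores that case because it looks for the raw string, but it can be useful to tell the outcomes apart. The sketch below makes the three cases explicit; classify_reflection() is a hypothetical helper and the sample response bodies are made up.

```python
import html


def classify_reflection(body, payload):
    """Classify how payload appears in body: 'raw', 'escaped', or 'absent'."""
    if payload in body:
        return "raw"       # echoed unmodified: worth manual investigation
    if html.escape(payload) in body:
        return "escaped"   # reflected, but neutralized by HTML encoding
    return "absent"


payload = "<script>alert('XSS')</script>"
print(classify_reflection(f"<p>You searched for {payload}</p>", payload))
print(classify_reflection(f"<p>You searched for {html.escape(payload)}</p>", payload))
print(classify_reflection("<p>No results</p>", payload))
```

Servers escape in slightly different ways, so an "absent" result from this sketch can still hide a reflected value.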

Detecting Open Redirect Vulnerabilities

Next, we’ll explore open redirect vulnerabilities. Imagine it as setting a little trap with a URL and seeing if the website takes the bait. The detect_open_redirect() function starts by crafting a redirect payload, which we add as a query parameter to the domain. On a vulnerable site, such a URL would send the visitor to an attacker-controlled page.

We then send a GET request to this URL with automatic redirect following turned off, so we can inspect the first response ourselves. If we get a status code of 302 and the Location header matches our payload exactly, the site may have a vulnerability, and we’ll let you know if that’s the case. Otherwise, we report that nothing was detected. This is a narrow check: it tries only a parameter named redirect and only a 302 response, so a clean result does not rule out the problem. And of course, if anything goes wrong along the way, you’ll get a heads-up about that too.

# Function to detect open redirect vulnerability
def detect_open_redirect(domain):
   open_redirect_payload = "http://evil.com"
   test_url = f"http://{domain}/?redirect={open_redirect_payload}"
   try:
       # allow_redirects=False returns the first response instead of following it
       response = requests.get(test_url, allow_redirects=False, timeout=10)
       if response.status_code == 302 and response.headers.get('Location') == open_redirect_payload:
           output_text.insert(tk.END, f"Potential Open Redirect vulnerability found at {test_url}\n")
       else:
           output_text.insert(tk.END, f"No Open Redirect vulnerability detected at {test_url}\n")
   except Exception as e:
       output_text.insert(tk.END, f"Error testing Open Redirect: {e}\n")
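Servers may answer with other redirect codes, or append a path or trailing slash to the target. A slightly more tolerant check compares hostnames instead of whole strings. This is a sketch under those assumptions; is_open_redirect() and REDIRECT_CODES are names introduced here.

```python
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_open_redirect(status_code, location, payload):
    """True if a redirect response points at the host named in payload."""
    if status_code not in REDIRECT_CODES or not location:
        return False
    target_host = urlsplit(location).hostname
    # A relative Location such as '/login?next=...' has no host and is ignored
    return target_host is not None and target_host == urlsplit(payload).hostname
```

Inside detect_open_redirect(), it could be called as is_open_redirect(response.status_code, response.headers.get('Location'), open_redirect_payload).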

Performing OS Fingerprinting and Service Detection

For this step, we’re getting into some pretty cool reconnaissance: uncovering what operating system the target is running, with the help of Nmap and our os_fingerprinting() function.

Here’s how it works: We start by creating a scanner with nmap.PortScanner(). It then scans the target IP with the -O option, which enables Nmap’s OS detection. We print each operating system Nmap proposes, along with how confident it is in the match. Note that -O normally requires root or administrator privileges, and that the function as written reports only the OS guesses; listing service versions would need an additional Nmap option such as -sV. If there’s any hiccup along the way, we’ll make sure you’re informed about it.

# Function to perform OS fingerprinting and service detection
def os_fingerprinting(ip):
   try:
       scanner = nmap.PortScanner()
       # -O enables OS detection and normally requires root/administrator privileges
       scan_result = scanner.scan(ip, arguments='-O')
       # The host entry is missing when Nmap considers the host down
       os_fingerprint = scan_result['scan'].get(ip, {}).get('osmatch', [])
       if not os_fingerprint:
           output_text.insert(tk.END, "No OS match found.\n")
       for os_info in os_fingerprint:
           output_text.insert(tk.END, f"Detected OS: {os_info['name']} - Accuracy: {os_info['accuracy']}%\n")
   except Exception as e:
       output_text.insert(tk.END, f"Error during OS fingerprinting: {e}\n")
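The scan result is a nested dictionary, and Nmap often returns several candidate matches. Below is a defensive sketch of picking the most likely one, assuming the osmatch entries carry name and accuracy keys as in the code above, with accuracy given as a numeric string. best_os_guess() and the sample data are hypothetical.

```python
def best_os_guess(scan_result, ip):
    """Return (name, accuracy) for the highest-accuracy OS match, or None."""
    host = scan_result.get("scan", {}).get(ip, {})
    matches = host.get("osmatch", [])
    if not matches:
        return None
    # Assumption: accuracy is a numeric string such as "96"
    best = max(matches, key=lambda m: int(m["accuracy"]))
    return best["name"], int(best["accuracy"])


# Made-up data in the same shape as the result used above
sample = {"scan": {"192.0.2.1": {"osmatch": [
    {"name": "Linux 4.15", "accuracy": "93"},
    {"name": "Linux 5.0", "accuracy": "96"},
]}}}
print(best_os_guess(sample, "192.0.2.1"))
print(best_os_guess(sample, "192.0.2.2"))
```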

Setting Up the Main Window for Automated Reconnaissance

Now that all the scanning functions are in place, all that’s left is a graphical interface to control the whole operation.

# Main GUI setup
root = tk.Tk()
root.title("Reconnaissance and Vulnerability Scanner - The Pycodes")

Here we create the main window using tk and set its title. Next, we add a labeled entry box for the domain, as well as a labeled “Browse” button that calls the browse_wordlist() function:

# GUI Elements
domain_label = tk.Label(root, text="Enter the Domain:")
domain_label.pack()

domain_entry = tk.Entry(root, width=50)
domain_entry.pack()

subdomain_label = tk.Label(root, text="Select Subdomain Wordlist:")
subdomain_label.pack()

subdomain_button = tk.Button(root, text="Browse", command=browse_wordlist)
subdomain_button.pack()

Next, we create the “Start Scan” button that calls the start_scan_thread() function, as well as the scrollable output_text widget that will display the results:

scan_button = tk.Button(root, text="Start Scan", command=start_scan_thread)
scan_button.pack()

# Adding the scrolled text widget for output with a scrollbar
output_text = scrolledtext.ScrolledText(root, height=20, width=80, wrap=tk.WORD)
output_text.pack()

After that, we create the global variable wordlist_path to store the path of the selected subdomain wordlist:

# Global variable to store file path for subdomain wordlist
wordlist_path = None

And finally, we start the main event loop and keep the main window running and responsive to the user with the mainloop() method.

# Run the GUI loop
root.mainloop()

Example

I executed this script on my Linux system as shown below in the image:

I used this wordlist in my example: https://github.com/danTaler/WordLists/blob/master/Subdomain.txt

If you want to create your own wordlist, check out this tutorial.

And also on my friend’s Windows system:

Full Code

import socket
import whois
import requests
import nmap
import threading
import tkinter as tk
from tkinter import filedialog, messagebox, scrolledtext
from bs4 import BeautifulSoup
import validators


# Function to browse for subdomain wordlist
def browse_wordlist():
   global wordlist_path
   wordlist_path = filedialog.askopenfilename(title="Select Subdomain Wordlist")
   if wordlist_path:
       output_text.insert(tk.END, f"Selected wordlist: {wordlist_path}\n")


# Function to start the scan in a new thread
def start_scan_thread():
   # daemon=True lets the program exit even if a scan is still running
   thread = threading.Thread(target=start_scan, daemon=True)
   thread.start()


# Function to start the scan process
def start_scan():
   domain = domain_entry.get()
   if not domain:
       messagebox.showwarning("Input Error", "Please enter a domain.")
       return


   output_text.insert(tk.END, f"Starting scan for {domain}...\n")


   # Get the IP address
   try:
       ip = socket.gethostbyname(domain)
       output_text.insert(tk.END, f"IP Address of {domain}: {ip}\n")
   except Exception as e:
       output_text.insert(tk.END, f"Error: Could not retrieve IP for {domain}: {e}\n")
       return


   # WHOIS Information
   output_text.insert(tk.END, f"\nFetching WHOIS information for {domain}...\n")
   try:
       domain_info = whois.whois(domain)
       output_text.insert(tk.END, f"{domain_info}\n")
   except Exception as e:
       output_text.insert(tk.END, f"Error: Could not retrieve WHOIS information: {e}\n")


   # Subdomain Enumeration
   if wordlist_path:
       output_text.insert(tk.END, "\nEnumerating subdomains...\n")
       enumerate_subdomains(domain, wordlist_path)


   # Web Technology Fingerprinting (Alternative)
   output_text.insert(tk.END, "\nFingerprinting web technologies...\n")
   fingerprint_web_technologies(domain)


   # XSS Detection
   output_text.insert(tk.END, "\nDetecting XSS vulnerabilities...\n")
   detect_xss(domain)


   # Open Redirect Detection
   output_text.insert(tk.END, "\nDetecting Open Redirect vulnerabilities...\n")
   detect_open_redirect(domain)


   # OS Fingerprinting and Service Detection
   output_text.insert(tk.END, "\nRunning OS and service detection...\n")
   os_fingerprinting(ip)


# Function to enumerate subdomains
def enumerate_subdomains(domain, wordlist_path):
   with open(wordlist_path, 'r') as file:
       subdomains = file.read().splitlines()

   for subdomain in subdomains:
       url = f"http://{subdomain}.{domain}"

       # Validate the URL before making the request
       if not validators.url(url):
           output_text.insert(tk.END, f"Invalid URL: {url}\n")
           continue

       try:
           # A timeout keeps one unresponsive host from stalling the whole scan
           response = requests.get(url, timeout=5)
           if response.status_code == 200:
               output_text.insert(tk.END, f"Found subdomain: {url}\n")
           else:
               output_text.insert(tk.END, f"No subdomain found at: {url} (Status code: {response.status_code})\n")
       except requests.ConnectionError:
           output_text.insert(tk.END, f"Connection error: Could not connect to {url}\n")
       except requests.exceptions.InvalidURL:
           output_text.insert(tk.END, f"Error: Invalid URL detected for {url}\n")
       except Exception as e:
           output_text.insert(tk.END, f"Unexpected error with {url}: {e}\n")


# Function to fingerprint web technologies (Alternative Approach)
def fingerprint_web_technologies(domain):
   try:
       url = f"http://{domain}"
       # A timeout keeps the scan from hanging on an unresponsive server
       response = requests.get(url, timeout=10)

       # Check headers for web technologies
       server = response.headers.get('Server')
       x_powered_by = response.headers.get('X-Powered-By')
       set_cookie = response.headers.get('Set-Cookie')

       output_text.insert(tk.END, f"Server: {server}\n")
       output_text.insert(tk.END, f"X-Powered-By: {x_powered_by}\n")

       # Simple check for popular CMS by cookies
       if set_cookie:
           if "wp" in set_cookie.lower():
               output_text.insert(tk.END, "Detected CMS: WordPress\n")
           elif "drupal" in set_cookie.lower():
               output_text.insert(tk.END, "Detected CMS: Drupal\n")
           elif "joomla" in set_cookie.lower():
               output_text.insert(tk.END, "Detected CMS: Joomla\n")

       # Parsing HTML to detect the meta generator tag
       soup = BeautifulSoup(response.text, 'html.parser')
       generator_tag = soup.find("meta", {"name": "generator"})
       # Some pages have the tag but no content attribute
       if generator_tag and generator_tag.get('content'):
           output_text.insert(tk.END, f"Detected by meta generator tag: {generator_tag['content']}\n")

   except Exception as e:
       output_text.insert(tk.END, f"Error: Could not fingerprint technologies: {e}\n")


# Function to detect XSS vulnerability
def detect_xss(domain):
   xss_payload = "<script>alert('XSS')</script>"
   test_url = f"http://{domain}/?q={xss_payload}"
   try:
       # A timeout keeps the scan from hanging on an unresponsive server
       response = requests.get(test_url, timeout=10)
       # The payload counts as reflected only if it comes back unescaped
       if xss_payload in response.text:
           output_text.insert(tk.END, f"Potential XSS vulnerability found at {test_url}\n")
       else:
           output_text.insert(tk.END, f"No XSS vulnerability detected at {test_url}\n")
   except Exception as e:
       output_text.insert(tk.END, f"Error testing XSS: {e}\n")


# Function to detect open redirect vulnerability
def detect_open_redirect(domain):
   open_redirect_payload = "http://evil.com"
   test_url = f"http://{domain}/?redirect={open_redirect_payload}"
   try:
       # allow_redirects=False returns the first response instead of following it
       response = requests.get(test_url, allow_redirects=False, timeout=10)
       if response.status_code == 302 and response.headers.get('Location') == open_redirect_payload:
           output_text.insert(tk.END, f"Potential Open Redirect vulnerability found at {test_url}\n")
       else:
           output_text.insert(tk.END, f"No Open Redirect vulnerability detected at {test_url}\n")
   except Exception as e:
       output_text.insert(tk.END, f"Error testing Open Redirect: {e}\n")


# Function to perform OS fingerprinting and service detection
def os_fingerprinting(ip):
   try:
       scanner = nmap.PortScanner()
       # -O enables OS detection and normally requires root/administrator privileges
       scan_result = scanner.scan(ip, arguments='-O')
       # The host entry is missing when Nmap considers the host down
       os_fingerprint = scan_result['scan'].get(ip, {}).get('osmatch', [])
       if not os_fingerprint:
           output_text.insert(tk.END, "No OS match found.\n")
       for os_info in os_fingerprint:
           output_text.insert(tk.END, f"Detected OS: {os_info['name']} - Accuracy: {os_info['accuracy']}%\n")
   except Exception as e:
       output_text.insert(tk.END, f"Error during OS fingerprinting: {e}\n")




# Main GUI setup
root = tk.Tk()
root.title("Reconnaissance and Vulnerability Scanner - The Pycodes")


# GUI Elements
domain_label = tk.Label(root, text="Enter the Domain:")
domain_label.pack()


domain_entry = tk.Entry(root, width=50)
domain_entry.pack()


subdomain_label = tk.Label(root, text="Select Subdomain Wordlist:")
subdomain_label.pack()


subdomain_button = tk.Button(root, text="Browse", command=browse_wordlist)
subdomain_button.pack()


scan_button = tk.Button(root, text="Start Scan", command=start_scan_thread)
scan_button.pack()


# Adding the scrolled text widget for output with a scrollbar
output_text = scrolledtext.ScrolledText(root, height=20, width=80, wrap=tk.WORD)
output_text.pack()


# Global variable to store file path for subdomain wordlist
wordlist_path = None


# Run the GUI loop
root.mainloop()

Happy Coding!
