How to Generate Fake User Data with Python

Creating realistic test data is crucial for developing and testing applications, and it’s especially handy for ethical hacking. By using fake user data, developers and ethical hackers can simulate various scenarios, making sure their applications can handle different types of input smoothly. Whether you’re testing a new feature, conducting a security audit, or just need sample data for a database, generating fake user data can save you a lot of time and effort.

In today’s article, we’re going to build a graphical user interface with Python’s tkinter library that generates fake user data with the Faker library. You’ll learn how to create realistic user profiles in a standard or a detailed variant, choose the gender used for the names, and save the results as CSV or TXT files.

Let’s get started!

Disclaimer

Please note: This tutorial is meant for educational and testing purposes only. Please use the fake user data responsibly and ethically. Avoid using it for any harmful or illegal activities. Let’s keep our coding practices ethical and beneficial for everyone!

Necessary Libraries

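First, install the Faker library from your terminal or command prompt:

$ pip install faker

tkinter is part of Python’s standard library and is included with the python.org installers, so it does not need to be installed with pip (the tk package on PyPI is not tkinter). On some Linux distributions it is packaged separately; on Debian or Ubuntu, for example, you can install it with:

$ sudo apt install python3-tk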

Imports

import os
from tkinter import Tk, Label, Entry, Button, StringVar, messagebox, IntVar, Checkbutton, filedialog, Text, OptionMenu, \
   Radiobutton
from faker import Faker
from faker.providers import internet
import csv

Before starting any journey, it’s important to equip yourself with the right tools. That’s what we’ll do now by importing the necessary modules and libraries:

  • os: This module allows us to interact with the operating system.
  • tkinter: This library lets us build a graphical user interface (GUI) from widgets such as buttons, labels, entry fields, checkboxes, and message boxes.
  • Faker: This library is used to generate fake data, including names, addresses, emails, and even internet-related data such as IP addresses and URLs.
  • csv: This module handles CSV files, making it easy to read from and write to CSV files.

Creating Fake User Data

Now, this is the fun part where we get to create fake identities for users! You have the option to generate either standard user data or more detailed user data, depending on what you need. Let’s dive in and see what kind of interesting profiles we can come up with!

Standard User Data

To bring this to life, we wrote the create_standard_user() function. It generates basic user information: a name that matches the selected gender (fake.name_male(), fake.name_female(), or fake.name() when no specific gender is chosen), an email address with fake.email(), and a phone number, address, city, and country.

# Function to create a standard set of fake user information
def create_standard_user(fake, gender):
   if gender == 'male':
       name = fake.name_male()
   elif gender == 'female':
       name = fake.name_female()
   else:
       name = fake.name()


   return {
       'Full Name': name,
       'Email Address': fake.email(),
       'Phone': fake.phone_number(),
       'Street Address': fake.address(),
       'City Name': fake.city(),
       'Country Name': fake.country(),
   }
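
Both profile builders repeat the same gender branch, so it could be pulled into a small helper. The sketch below is one possible way to do that, not part of the tutorial's code: pick_name() and the StubFaker class are hypothetical names introduced for illustration, and the stub only stands in for a real Faker instance so the snippet runs without the library installed.

```python
# Hypothetical helper: choose the name method that matches the gender value.
def pick_name(fake, gender):
    if gender == 'male':
        return fake.name_male()
    if gender == 'female':
        return fake.name_female()
    # Any other value ('both', None, ...) falls back to a name of either gender.
    return fake.name()


# Stand-in for a Faker instance, exposing only the three methods used above.
class StubFaker:
    def name_male(self):
        return "John Doe"

    def name_female(self):
        return "Jane Doe"

    def name(self):
        return "Alex Doe"


stub = StubFaker()
print(pick_name(stub, 'male'))
print(pick_name(stub, 'female'))
print(pick_name(stub, 'both'))
```

With the real library, you would pass a Faker() instance instead of the stub.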

Detailed User Data

For added realism, we also created the create_detailed_user() function. It generates a richer profile, including fields you would normally treat as private, while still respecting the chosen gender. On top of the standard fields, it adds details such as a date of birth, occupation, company, private IP address, credit card number, and social security number, all generated by the Faker library.

# Function to create a detailed set of fake user information
def create_detailed_user(fake, gender):
   if gender == 'male':
       name = fake.name_male()
   elif gender == 'female':
       name = fake.name_female()
   else:
       name = fake.name()


   return {
       'Full Name': name,
       'Email Address': fake.email(),
       'Phone': fake.phone_number(),
       'Birth Date': fake.date_of_birth(),
       'Street Address': fake.address(),
       'City Name': fake.city(),
       'Country Name': fake.country(),
       'Postal Code': fake.zipcode(),
       'Occupation': fake.job(),
       'Company Name': fake.company(),
       'Private IP': fake.ipv4_private(),
       'Card Number': fake.credit_card_number(),
       'User ID': fake.user_name(),
       'Website URL': fake.url(),
       'Social Security Number': fake.ssn()
   }

Generating User Data

# Function to generate user data
def generate_user_data(count, detailed=False, gender=None):
   fake = Faker()
   fake.add_provider(internet)
   user_list = []
   for _ in range(count):
       if detailed:
           user_list.append(create_detailed_user(fake, gender))
       else:
           user_list.append(create_standard_user(fake, gender))
   return user_list

We have seen how standard and detailed fake identities are created through their respective functions. Now, let’s look at how they are triggered. Spoiler alert: it’s all thanks to the generate_user_data() function:

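  • This function starts by creating an instance of the Faker class and then registers the internet provider with fake.add_provider(internet). Faker’s default providers already cover internet-related methods such as fake.ipv4_private() and fake.url(), so this call mainly makes the dependency explicit.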
  • Next, it creates an empty user list to store the fake identities.

With these preparations complete, the function runs a loop for the specified number of times (the count of fake user profiles we want to create). In each loop iteration, it checks if the detailed flag is set to True.

  • If it is, the function calls create_detailed_user() to create a detailed user profile and store it in the user list.
  • If the detailed flag is False, it calls create_standard_user() to create a standard profile and store it in the user list. The function also ensures that the gender is factored in when calling the two functions. Once the loop is over, it returns the user list.
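
The loop described above can also be written as a list comprehension once the builder has been chosen. This is a minimal sketch rather than the tutorial's code: build_users() and the two stub builders are hypothetical, and the stubs return fixed dictionaries in place of Faker output so the snippet runs on its own.

```python
# Stub builders standing in for create_standard_user / create_detailed_user.
def standard_stub(gender):
    return {'Full Name': 'Sample Name', 'Gender Option': gender}


def detailed_stub(gender):
    profile = standard_stub(gender)
    profile['Occupation'] = 'Sample Job'
    return profile


# Hypothetical variant of the generation loop: pick the builder once,
# then call it `count` times and collect the results in a list.
def build_users(count, detailed=False, gender=None):
    builder = detailed_stub if detailed else standard_stub
    return [builder(gender) for _ in range(count)]


users = build_users(3, detailed=True, gender='female')
print(len(users))         # number of generated profiles
print(sorted(users[0]))   # field names of the first profile
```

Either way, the result has the same shape: a list with one dictionary per user, which is what the saving and display functions expect.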

Saving Data to Files

This part is where we get to decide how to save our generated fake identities. You can choose to save them in a CSV format or as a TXT document:

Saving to CSV

If you want to save those fake identities in CSV format, we’ve created the save_to_csv() function specifically for that. Here’s how it works:

  • First, it extracts the column names from the first record. Then, it checks whether the target file already exists using os.path.exists(). If it does, a message box asks whether you want to overwrite it with the new fake identities. If you choose not to overwrite, the function returns without writing anything.
  • Otherwise (the file does not exist yet, or you agreed to overwrite it), the function opens the file for writing and creates a csv.DictWriter. It writes the column names with writer.writeheader() and then records the data with writer.writerows().

# Function to save data to CSV
def save_to_csv(data, filename):
   keys = data[0].keys()
   if os.path.exists(filename):
       overwrite = messagebox.askyesno("Overwrite?", f"{filename} already exists. Overwrite?")
       if not overwrite:
           return
   with open(filename, 'w', newline='') as file:
       writer = csv.DictWriter(file, fieldnames=keys)
       writer.writeheader()
       writer.writerows(data)
   messagebox.showinfo("Success", f"Data successfully saved to {filename}")
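
To see what csv.DictWriter produces without touching the disk or opening a message box, the same header-then-rows sequence can be run against an in-memory buffer. This is only an illustrative sketch: the two sample records are made up, and io.StringIO replaces the real file.

```python
import csv
import io

# Two made-up records with the same keys, mimicking the generator's output shape.
records = [
    {'Full Name': 'Jane Doe', 'City Name': 'Springfield'},
    {'Full Name': 'John Doe', 'City Name': 'Shelbyville'},
]

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
writer.writeheader()
writer.writerows(records)

# Read the text back to confirm that the header and rows survived.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]['Full Name'])
print(len(rows))
```

Passing newline='' mirrors the open(filename, 'w', newline='') call in save_to_csv(), which lets the csv module control line endings itself.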

Saving to TXT

On the other hand, if you want to save your data in a text document, we’ve designed the save_to_text() function just for you. It works much like the previous function. It first checks whether the TXT file already exists. If it does, it asks whether you want to overwrite it, and if you decline, the function returns without writing anything. Otherwise, it opens the file for writing and loops over the records, writing each field as a "key: value" line and leaving a blank line between users.

# Function to save data to text file
def save_to_text(data, filename):
   if os.path.exists(filename):
       overwrite = messagebox.askyesno("Overwrite?", f"{filename} already exists. Overwrite?")
       if not overwrite:
           return
   with open(filename, 'w') as file:
       for entry in data:
           for key, value in entry.items():
               file.write(f"{key}: {value}\n")
           file.write("\n")
   messagebox.showinfo("Success", f"Data successfully saved to {filename}")
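
The "key: value" layout is built the same way for the text file and for the display window, so it could live in one place. The sketch below assumes a hypothetical helper named format_records() that returns the whole text as a single string; it is an illustration, not the tutorial's implementation.

```python
# Hypothetical helper: render a list of dicts as "key: value" lines,
# followed by a blank line after each record.
def format_records(data):
    lines = []
    for entry in data:
        for key, value in entry.items():
            lines.append(f"{key}: {value}")
        lines.append("")  # blank separator line after each record
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


sample = [
    {'Full Name': 'Jane Doe', 'Phone': '555-0100'},
    {'Full Name': 'John Doe', 'Phone': '555-0101'},
]
print(format_records(sample), end="")
```

The returned string could then be written with file.write() or inserted into a Text widget in a single call.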

Displaying the Data

Let’s get ready to display our data in a brand-new window! First, we create the window using the display_data() function. We set a title and define its size. Then, we create the text_widget that will showcase our results.

Finally, we insert the generated data into this widget for a clear and organized display.

# Function to display data in a new window
def display_data(data):
   result_window = Tk()
   result_window.title("Generated Data - The Pycodes")
   result_window.geometry("600x400")
   text_widget = Text(result_window)
   text_widget.pack(expand=True, fill='both')
   for entry in data:
       for key, value in entry.items():
           text_widget.insert('end', f"{key}: {value}\n")
       text_widget.insert('end', "\n")
   result_window.mainloop()

Handling User Input and Generating Data

Now that we have created all these functions with their specific purposes, we need a new function to manage and handle them. This is where the handle_generate() function comes in. You might be wondering how it works. Well, here’s the scoop:

  • First, it reads the number of fake identities the user wants to generate and makes sure it is a positive integer; otherwise, it shows an error box and stops. Then, it uses detailed_var.get() to check whether detailed information is needed and gender_var.get() to retrieve the selected gender. Next, it calls the generate_user_data() function to generate the specified number of fake identities.
  • After that, it checks whether the user wants to save these identities to a file. If so, it reads the file format chosen in the drop-down menu (CSV, TXT, or both), opens a save dialog for each selected format, and calls save_to_csv() and save_to_text() accordingly. If the user opts not to save the identities, display_data() opens a new window and shows them there.

# Function to handle generation and saving of user data
def handle_generate():
   try:
       count = int(user_count_var.get())
       if count <= 0:
           raise ValueError("Number of users must be a positive integer.")
   except ValueError as e:
       messagebox.showerror("Error", str(e))
       return


   detailed = detailed_var.get()
   gender = gender_var.get()
   users = generate_user_data(count, detailed, gender)


   if save_var.get():
       file_type = file_type_var.get()
       if file_type in ['csv', 'csv and txt']:
           csv_filename = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")])
           if csv_filename:
               save_to_csv(users, csv_filename)


       if file_type in ['txt', 'csv and txt']:
           txt_filename = filedialog.asksaveasfilename(defaultextension=".txt", filetypes=[("Text files", "*.txt")])
           if txt_filename:
               save_to_text(users, txt_filename)


       if file_type not in ['csv', 'txt', 'csv and txt']:
           messagebox.showerror("Error", "Invalid file type. Data not saved.")
   else:
       display_data(users)
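
One detail worth noting in the validation step: when the entry contains something like "abc", the error box shows Python's own int() message. A possible refinement, sketched here under the assumption of a hypothetical helper called parse_count(), is to validate the text separately and raise a friendlier message for both failure cases.

```python
# Hypothetical helper: turn the raw entry text into a positive integer,
# raising ValueError with a readable message otherwise.
def parse_count(raw):
    try:
        count = int(raw.strip())
    except ValueError:
        raise ValueError("Number of users must be a whole number.") from None
    if count <= 0:
        raise ValueError("Number of users must be a positive integer.")
    return count


for text in ["3", " 10 ", "abc", "0"]:
    try:
        print(repr(text), "->", parse_count(text))
    except ValueError as error:
        print(repr(text), "->", error)
```

Inside handle_generate(), such a helper would be called with user_count_var.get(), and the except branch would pass the message to messagebox.showerror() as before.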

Setting Up the Main Window

This is the part where we create the main window, set its title, and define its size. We then create a label to describe the entry field, followed by the actual entry field itself. Next, we add checkboxes for the user to choose whether they want detailed information and if they want to save the data to a file. We also include radio buttons so the user can select the desired gender.

Following that, we create a drop-down menu for the user to choose the file type for saving the data. Then, we create the “Generate” button that triggers the handle_generate() function.

Finally, we start the main event loop to ensure that the main window keeps running and is responsive to the user, thanks to root.mainloop().

# Set up the Tkinter window
root = Tk()
root.title("User Data Generator - The Pycodes")
root.geometry("400x450")


# User count entry
Label(root, text="Number of users:").pack(pady=5)
user_count_var = StringVar()
Entry(root, textvariable=user_count_var).pack(pady=5)


# Detailed information checkbox
detailed_var = IntVar()
Checkbutton(root, text="Generate detailed information", variable=detailed_var).pack(pady=5)


# Gender selection radio buttons
gender_var = StringVar(value="both")
Label(root, text="Gender:").pack(pady=5)
Radiobutton(root, text="Both", variable=gender_var, value="both").pack(pady=5)
Radiobutton(root, text="Male", variable=gender_var, value="male").pack(pady=5)
Radiobutton(root, text="Female", variable=gender_var, value="female").pack(pady=5)


# Save to file checkbox
save_var = IntVar()
Checkbutton(root, text="Save to file", variable=save_var).pack(pady=5)


# File type selection using drop-down menu
Label(root, text="File type (if saving):").pack(pady=5)
file_type_var = StringVar(value="csv")
file_type_menu = OptionMenu(root, file_type_var, "csv", "txt", "csv and txt")
file_type_menu.pack(pady=5)


# Generate button
Button(root, text="Generate", command=handle_generate).pack(pady=20)


# Start the Tkinter event loop
root.mainloop()

Example

I ran this code on a Windows system and generated data for three female users, as shown in the image below:

Next, I ran the same example but this time for male users:

Lastly, I saved the file as CSV and TXT:

For Linux users, I also ran this script and it worked just fine as shown in the image below:

Full Code

import os
from tkinter import Tk, Label, Entry, Button, StringVar, messagebox, IntVar, Checkbutton, filedialog, Text, OptionMenu, \
   Radiobutton
from faker import Faker
from faker.providers import internet
import csv




# Function to create a standard set of fake user information
def create_standard_user(fake, gender):
   if gender == 'male':
       name = fake.name_male()
   elif gender == 'female':
       name = fake.name_female()
   else:
       name = fake.name()


   return {
       'Full Name': name,
       'Email Address': fake.email(),
       'Phone': fake.phone_number(),
       'Street Address': fake.address(),
       'City Name': fake.city(),
       'Country Name': fake.country(),
   }




# Function to create a detailed set of fake user information
def create_detailed_user(fake, gender):
   if gender == 'male':
       name = fake.name_male()
   elif gender == 'female':
       name = fake.name_female()
   else:
       name = fake.name()


   return {
       'Full Name': name,
       'Email Address': fake.email(),
       'Phone': fake.phone_number(),
       'Birth Date': fake.date_of_birth(),
       'Street Address': fake.address(),
       'City Name': fake.city(),
       'Country Name': fake.country(),
       'Postal Code': fake.zipcode(),
       'Occupation': fake.job(),
       'Company Name': fake.company(),
       'Private IP': fake.ipv4_private(),
       'Card Number': fake.credit_card_number(),
       'User ID': fake.user_name(),
       'Website URL': fake.url(),
       'Social Security Number': fake.ssn()
   }




# Function to generate user data
def generate_user_data(count, detailed=False, gender=None):
   fake = Faker()
   fake.add_provider(internet)
   user_list = []
   for _ in range(count):
       if detailed:
           user_list.append(create_detailed_user(fake, gender))
       else:
           user_list.append(create_standard_user(fake, gender))
   return user_list




# Function to save data to CSV
def save_to_csv(data, filename):
   keys = data[0].keys()
   if os.path.exists(filename):
       overwrite = messagebox.askyesno("Overwrite?", f"{filename} already exists. Overwrite?")
       if not overwrite:
           return
   with open(filename, 'w', newline='') as file:
       writer = csv.DictWriter(file, fieldnames=keys)
       writer.writeheader()
       writer.writerows(data)
   messagebox.showinfo("Success", f"Data successfully saved to {filename}")




# Function to save data to text file
def save_to_text(data, filename):
   if os.path.exists(filename):
       overwrite = messagebox.askyesno("Overwrite?", f"{filename} already exists. Overwrite?")
       if not overwrite:
           return
   with open(filename, 'w') as file:
       for entry in data:
           for key, value in entry.items():
               file.write(f"{key}: {value}\n")
           file.write("\n")
   messagebox.showinfo("Success", f"Data successfully saved to {filename}")




# Function to display data in a new window
def display_data(data):
   result_window = Tk()
   result_window.title("Generated Data - The Pycodes")
   result_window.geometry("600x400")
   text_widget = Text(result_window)
   text_widget.pack(expand=True, fill='both')
   for entry in data:
       for key, value in entry.items():
           text_widget.insert('end', f"{key}: {value}\n")
       text_widget.insert('end', "\n")
   result_window.mainloop()




# Function to handle generation and saving of user data
def handle_generate():
   try:
       count = int(user_count_var.get())
       if count <= 0:
           raise ValueError("Number of users must be a positive integer.")
   except ValueError as e:
       messagebox.showerror("Error", str(e))
       return


   detailed = detailed_var.get()
   gender = gender_var.get()
   users = generate_user_data(count, detailed, gender)


   if save_var.get():
       file_type = file_type_var.get()
       if file_type in ['csv', 'csv and txt']:
           csv_filename = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")])
           if csv_filename:
               save_to_csv(users, csv_filename)


       if file_type in ['txt', 'csv and txt']:
           txt_filename = filedialog.asksaveasfilename(defaultextension=".txt", filetypes=[("Text files", "*.txt")])
           if txt_filename:
               save_to_text(users, txt_filename)


       if file_type not in ['csv', 'txt', 'csv and txt']:
           messagebox.showerror("Error", "Invalid file type. Data not saved.")
   else:
       display_data(users)




# Set up the Tkinter window
root = Tk()
root.title("User Data Generator - The Pycodes")
root.geometry("400x450")


# User count entry
Label(root, text="Number of users:").pack(pady=5)
user_count_var = StringVar()
Entry(root, textvariable=user_count_var).pack(pady=5)


# Detailed information checkbox
detailed_var = IntVar()
Checkbutton(root, text="Generate detailed information", variable=detailed_var).pack(pady=5)


# Gender selection radio buttons
gender_var = StringVar(value="both")
Label(root, text="Gender:").pack(pady=5)
Radiobutton(root, text="Both", variable=gender_var, value="both").pack(pady=5)
Radiobutton(root, text="Male", variable=gender_var, value="male").pack(pady=5)
Radiobutton(root, text="Female", variable=gender_var, value="female").pack(pady=5)


# Save to file checkbox
save_var = IntVar()
Checkbutton(root, text="Save to file", variable=save_var).pack(pady=5)


# File type selection using drop-down menu
Label(root, text="File type (if saving):").pack(pady=5)
file_type_var = StringVar(value="csv")
file_type_menu = OptionMenu(root, file_type_var, "csv", "txt", "csv and txt")
file_type_menu.pack(pady=5)


# Generate button
Button(root, text="Generate", command=handle_generate).pack(pady=20)


# Start the Tkinter event loop
root.mainloop()

Happy Coding!
