Ever come across a webpage or HTML file that you needed to save as a PDF? Whether it’s for sharing, archiving, or just making your content more accessible, converting HTML to PDF can be incredibly useful. In this tutorial, we’ll walk through how to build a simple but powerful tool in Python that converts HTML into PDFs. We’ll use Tkinter for an intuitive GUI and pdfkit for handling the conversion itself.
Today, you’ll learn how to set up pdfkit, link it with wkhtmltopdf, and build an interface that allows you to enter raw HTML, select files, or even paste URLs for direct conversion into PDF format. Let’s dive in!
Table of Contents
- Getting Started
- Fine-Tuning Our PDF Output
- Converting HTML to PDF
- File Selection Helper
- Starting Conversion in a New Thread
- Setting Up the Main Window
- Running the Code
- Full Code
Getting Started
Installing wkhtmltopdf
To turn HTML content into a PDF, we need a rendering engine, and that’s where wkhtmltopdf comes in. It’s a command-line tool that converts HTML into PDF by rendering the page much as a browser would (it is built on the Qt WebKit engine). This makes it suitable for everything from simple HTML pages to pages that rely on CSS and JavaScript, although very modern front-end features may not render exactly as they do in a current browser.
We’ll install wkhtmltopdf and link it with Python’s pdfkit to handle the conversion and get PDF output that looks like the web page. Download the wkhtmltopdf installer that suits your operating system, and remember the path where you install it, as you’ll need to include it in the code below.
To get started, let’s make sure we have all the tools we need. Open your command prompt or terminal, and run these commands to install the necessary packages:
$ pip install pdfkit
$ pip install validators
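Tkinter ships with the standard Python installers, so it does not need to be installed with pip (the tk package on PyPI is not Tkinter). On some Linux distributions it is packaged separately, for example as python3-tk on Debian and Ubuntu. You can check that it works by running python -m tkinter, which should open a small test window.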
Imports
import tkinter as tk
from tkinter import filedialog, messagebox
import pdfkit
import threading
import validators
As usual, let’s kick things off by bringing out our toolbox of libraries:
- Tkinter: This is the heart and soul of our program’s visual side. We use tk to craft the interface, filedialog to select files and choose where to save, and messagebox to provide users with helpful pop-up messages for feedback.
- Pdfkit: The star of our program! It acts as the bridge that links HTML to PDF, using wkhtmltopdf (a tool for creating PDFs) behind the scenes to handle the conversion process.
- Threading: Worried that the program might freeze during conversion? No need to! threading allows us to run the process in the background.
- Validators: This library checks that the URL is valid before proceeding, so we know we’re working with usable input.
Configuring the wkhtmltopdf Path
Now it’s time to bring our conversion powerhouse into the scene! Since wkhtmltopdf isn’t built into Python, we need to help pdfkit locate it by specifying the full path to its executable (not just the installation directory). This setup allows us to use wkhtmltopdf as a sort of PDF factory that reads HTML, CSS, and JavaScript and produces the rendered PDF.
# Configure path to wkhtmltopdf
# This is my path to wkhtmltopdf\bin\wkhtmltopdf.exe, so be sure to put your own
config = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
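If you would rather not hard-code the path, one option is to look for the executable first. The sketch below uses only the standard library; the helper name find_wkhtmltopdf and its extra_candidates parameter are made up for this illustration and are not part of pdfkit.

```python
import os
import shutil


def find_wkhtmltopdf(extra_candidates=()):
    """Return a path to the wkhtmltopdf executable, or None if not found.

    Hypothetical helper for illustration; not part of pdfkit.
    """
    # First, see whether the executable is already on the PATH.
    on_path = shutil.which("wkhtmltopdf")
    if on_path:
        return on_path
    # Otherwise, try the explicit locations supplied by the caller.
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# The Windows path below is the one used in this tutorial; adjust it to your system.
path = find_wkhtmltopdf([r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"])
print(path or "wkhtmltopdf not found; install it or pass its full path")
```

When a path is found, it can be passed to pdfkit.configuration(wkhtmltopdf=path) in place of the hard-coded string.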
Fine-Tuning Our PDF Output
Next, we define an options dictionary that customizes the appearance of the PDF output. Here’s what we’ve done:
- We set the pages to A4 size to ensure our PDF looks great on any printer.
- We added margins on each side to keep the layout balanced and neat.
- We chose UTF-8 encoding to correctly display special characters, symbols, and various languages.
- We included a 5-second delay (5000 milliseconds) so that JavaScript on the page has time to run before the page is captured.
- We passed no-outline so that no PDF outline (bookmark tree) is generated.
- To prevent automatic resizing, we disabled smart shrinking.
All of these settings work together to ensure our PDF is well-structured and polished.
# Define custom options for better webpage capturing
options = {
'page-size': 'A4',
'margin-top': '0.75in',
'margin-right': '0.75in',
'margin-bottom': '0.75in',
'margin-left': '0.75in',
'encoding': "UTF-8",
'no-outline': None,
'javascript-delay': 5000, # Wait 5 seconds for JavaScript to load
'no-background': False,
'disable-smart-shrinking': True,
}
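If you need several variants of these settings (for example, a different page size or margin), a small helper can build the dictionary for you. This is only a sketch: build_options and its parameters are hypothetical names, and it uses a subset of the option keys shown above.

```python
def build_options(page_size="A4", margin="0.75in", js_delay_ms=5000):
    """Return a pdfkit-style options dict with the same margin on all sides.

    Hypothetical helper for illustration; the keys mirror the dictionary above.
    """
    options = {
        "page-size": page_size,
        "encoding": "UTF-8",
        "javascript-delay": js_delay_ms,  # milliseconds
        "disable-smart-shrinking": True,
    }
    # Apply one margin value to every side of the page.
    for side in ("top", "right", "bottom", "left"):
        options[f"margin-{side}"] = margin
    return options


letter_options = build_options(page_size="Letter", margin="1in")
print(letter_options["page-size"], letter_options["margin-left"])
```

The resulting dictionary can be passed as the options argument in the same way as the hand-written one.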
Converting HTML to PDF
At this point, we’re at the main act: the very heart of our program, the convert_to_pdf() function. This is where the magic happens! First, it retrieves the user input from the text area using text_input.get() and strips surrounding whitespace, including the trailing newline that a Tkinter Text widget always appends.
content = text_input.get("1.0", tk.END).strip()
Next, filedialog steps in to ask the user where to save the PDF and what to name it. This is important because we want to make sure our users can easily find their files later on!
pdf_path = filedialog.asksaveasfilename(defaultextension=".pdf", filetypes=[("PDF files", "*.pdf")])
If the user doesn’t choose a file, we simply exit the function to avoid any issues.
if not pdf_path:
    return  # Exit if no file is chosen
Now, we kick off the conversion process inside a try block, so that any failure can be reported to the user later. The function updates the loading label to let the user know that the conversion is in progress. We also refresh the GUI so the loading message is visible.
loading_label.config(text="Converting to PDF...")
root.update_idletasks() # Refresh GUI to show loading message
Then, we check what type of input the user selected: an HTML string, an HTML file, or a URL. Depending on their choice, the function calls the matching pdfkit function to handle the conversion.
if var.get() == "html_string":
    pdfkit.from_string(content, pdf_path, configuration=config, options=options)
elif var.get() == "html_file":
    pdfkit.from_file(content, pdf_path, configuration=config, options=options)
elif var.get() == "url":
    if not validators.url(content):
        raise ValueError("Invalid URL")
    pdfkit.from_url(content, pdf_path, configuration=config, options=options)
Note that when the user selects a URL, we first check that it’s valid using the validators library. If it isn’t, we raise a ValueError, which is caught by the error handler shown further down and reported to the user.
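The same idea can be extended to all three input types, so that obviously bad input is rejected before wkhtmltopdf is even started. The sketch below is a standard-library stand-in, not the validators library: looks_like_http_url is a much rougher check than validators.url, and both function names are hypothetical.

```python
import os
from urllib.parse import urlparse


def looks_like_http_url(text):
    """Rough check: an http(s) scheme and a non-empty host. Not a full validator."""
    parts = urlparse(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_input(input_type, content):
    """Return an error message for unusable input, or None if it looks fine.

    input_type is one of the radio button values used in this tutorial:
    "html_string", "html_file", or "url".
    """
    if not content:
        return "Input is empty."
    if input_type == "html_file" and not os.path.isfile(content):
        return f"File not found: {content}"
    if input_type == "url" and not looks_like_http_url(content):
        return "Invalid URL"
    return None


print(check_input("url", "https://www.wikipedia.org/"))
print(check_input("url", "not a url"))
```

A caller could show the returned message with messagebox.showerror() and skip the conversion whenever the result is not None.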
Finally, if everything goes smoothly and the PDF is created successfully, a message box pops up to celebrate this success!
messagebox.showinfo("Success", "PDF created successfully!")
But if anything goes wrong during the conversion, we catch the error and show a message box to provide feedback to the user.
except Exception as e:
    messagebox.showerror("Error", f"An error occurred: {e}")
In the end, we clear the loading message so everything is tidy and ready for the next operation.
finally:
    loading_label.config(text="")  # Clear loading message
And that’s the rundown of the convert_to_pdf() function! It ensures a smooth conversion experience for our users.
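The "show a message, do the work, always clear the message" pattern used above can also be packaged as a context manager. This is just one possible way to write it; busy_message and set_text are hypothetical names, and set_text stands for any callable that updates the status text.

```python
from contextlib import contextmanager


@contextmanager
def busy_message(set_text, message):
    """Show a status message for the duration of a block, then clear it."""
    set_text(message)
    try:
        yield
    finally:
        # Runs whether the block succeeded or raised, like the finally clause above.
        set_text("")


# Stand-in for loading_label.config(text=...): record every status update.
history = []
with busy_message(history.append, "Converting to PDF..."):
    pass  # the conversion call would go here
print(history)
```

In the GUI, set_text could be something like lambda text: loading_label.config(text=text).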
File Selection Helper
If you choose HTML File Path from the three input options when running the code, the select_file() function saves you the trouble of typing the path to the HTML file. Instead, it opens a file dialog that allows you to choose any HTML file. It also clears any existing content from the input box before inserting the selected file path.
# Function to open file dialog for HTML file selection
def select_file():
    file_path = filedialog.askopenfilename(filetypes=[("HTML files", "*.html"), ("All files", "*.*")])
    if file_path:
        text_input.delete("1.0", tk.END)
        text_input.insert(tk.END, file_path)
Starting Conversion in a New Thread
To avoid freezing the main window, we use the start_conversion() function, which starts a new thread to run the convert_to_pdf() function in the background.
# Function to start PDF conversion in a separate thread
def start_conversion():
    threading.Thread(target=convert_to_pdf).start()
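One caveat: Tkinter widgets and dialogs are generally meant to be used from the thread that runs mainloop(), and convert_to_pdf() opens dialogs and updates a label from the worker thread, which may behave unreliably on some platforms. A common alternative, sketched below under that assumption, is to let the worker put its outcome on a queue and have the GUI thread poll the queue with root.after(). The names run_in_worker and poll_results are hypothetical, and this is not the code used in the rest of the tutorial.

```python
import queue
import threading


def run_in_worker(job, results):
    """Run job() on a background thread and post the outcome to a queue."""
    def target():
        try:
            results.put(("ok", job()))
        except Exception as exc:  # hand any failure over to the GUI thread
            results.put(("error", exc))

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    return worker


def poll_results(root, results, on_done, interval_ms=100):
    """Call from the GUI thread: deliver a result, or check again later."""
    try:
        status, payload = results.get_nowait()
    except queue.Empty:
        # Nothing yet: ask Tkinter to call this function again after a delay.
        root.after(interval_ms, poll_results, root, results, on_done, interval_ms)
    else:
        on_done(status, payload)
```

With this approach, the save dialog and the message boxes stay on the main thread: the button handler asks for the output path, starts the worker with run_in_worker(), and calls poll_results() with a callback that shows the success or error message.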
Setting Up the Main Window
In this section, we build our interface using tk, set its title, and define its geometry. Next, we create a frame that contains three radio buttons, each linked to the variable var and each storing a different value in it (html_string, html_file, or url), so that the code can recognize the user’s choice and act accordingly. The HTML File Path button also calls select_file() when it is selected. After that, we create another labeled frame, where we add the text widget for user input.
# Set up the main window
root = tk.Tk()
root.title("HTML to PDF Converter - The Pycodes")
root.geometry("500x450")
# Frame for radio buttons
input_type_frame = tk.LabelFrame(root, text="Choose Input Type", padx=10, pady=10)
input_type_frame.pack(padx=10, pady=5, fill="x")
var = tk.StringVar(value="html_string")
tk.Radiobutton(input_type_frame, text="HTML String", variable=var, value="html_string").pack(anchor="w", padx=10,
pady=2)
tk.Radiobutton(input_type_frame, text="HTML File Path", variable=var, value="html_file", command=select_file).pack(
anchor="w", padx=10, pady=2)
tk.Radiobutton(input_type_frame, text="URL", variable=var, value="url").pack(anchor="w", padx=10, pady=2)
# Text widget for HTML input
text_frame = tk.LabelFrame(root, text="HTML Content or Path/URL", padx=10, pady=10)
text_frame.pack(padx=10, pady=5, fill="both", expand=True)
text_input = tk.Text(text_frame, wrap="word", height=10)
text_input.pack(fill="both", expand=True)
Following this, we create the “Convert to PDF” button, which calls the start_conversion() function when clicked. We also add a label to provide user feedback during the conversion process, with the text displayed in blue. Lastly, we start the main event loop with the mainloop() method, which keeps the main window running and responsive to user interactions.
# Convert button and loading indicator
convert_button = tk.Button(root, text="Convert to PDF", command=start_conversion)
convert_button.pack(pady=10)
loading_label = tk.Label(root, text="", fg="blue")
loading_label.pack()
root.mainloop()
Running the Code
HTML String
This is the HTML I’ve entered in the input box:
<html>
<head><title>Sample PDF</title></head>
<body>
<h1>Hello, World!</h1>
<p>This is a sample PDF created from HTML content.</p>
</body>
</html>
[Screenshot: the resulting PDF]
HTML File Path
I had this HTML file saved on my PC, so I tried it:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Sample HTML for PDF Conversion</title>
<style>
body {
font-family: Arial, sans-serif;
margin: 20px;
background-color: #f5f5f5;
}
h1 {
color: #333;
text-align: center;
}
p {
line-height: 1.6;
color: #666;
}
.content {
max-width: 800px;
margin: 0 auto;
padding: 20px;
background-color: white;
border-radius: 8px;
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
}
</style>
</head>
<body>
<div class="content">
<h1>Welcome to PDF Conversion Testing</h1>
<p>This is a sample HTML file created to test the HTML-to-PDF conversion functionality.</p>
<p>It includes basic HTML elements such as headings, paragraphs, and some CSS styling for layout and colors.</p>
<h2>Features of this HTML File:</h2>
<ul>
<li>Responsive design with CSS</li>
<li>Formatted text and heading elements</li>
<li>A box shadow effect around the content</li>
</ul>
<p>Try using this file with your HTML-to-PDF converter to check how well it captures the layout and styles.</p>
</div>
</body>
</html>
[Screenshot: the resulting PDF]
URL
For the URL, I tried Wikipedia:
https://www.wikipedia.org/
[Screenshot: the Wikipedia page rendered as a PDF]
Full Code
import tkinter as tk
from tkinter import filedialog, messagebox
import pdfkit
import threading
import validators
# Configure path to wkhtmltopdf
# This is my path to wkhtmltopdf\bin\wkhtmltopdf.exe, so be sure to put your own
config = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
# Define custom options for better webpage capturing
options = {
    'page-size': 'A4',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'no-outline': None,
    'javascript-delay': 5000,  # Wait 5 seconds for JavaScript to load
    'no-background': False,
    'disable-smart-shrinking': True,
}
# Function to convert HTML to PDF
def convert_to_pdf():
    content = text_input.get("1.0", tk.END).strip()
    pdf_path = filedialog.asksaveasfilename(defaultextension=".pdf", filetypes=[("PDF files", "*.pdf")])
    if not pdf_path:
        return  # Exit if no file is chosen
    try:
        loading_label.config(text="Converting to PDF...")
        root.update_idletasks()  # Refresh GUI to show loading message
        # Check selected input type and convert
        if var.get() == "html_string":
            pdfkit.from_string(content, pdf_path, configuration=config, options=options)
        elif var.get() == "html_file":
            pdfkit.from_file(content, pdf_path, configuration=config, options=options)
        elif var.get() == "url":
            if not validators.url(content):
                raise ValueError("Invalid URL")
            pdfkit.from_url(content, pdf_path, configuration=config, options=options)
        messagebox.showinfo("Success", "PDF created successfully!")
    except Exception as e:
        messagebox.showerror("Error", f"An error occurred: {e}")
    finally:
        loading_label.config(text="")  # Clear loading message
# Function to open file dialog for HTML file selection
def select_file():
    file_path = filedialog.askopenfilename(filetypes=[("HTML files", "*.html"), ("All files", "*.*")])
    if file_path:
        text_input.delete("1.0", tk.END)
        text_input.insert(tk.END, file_path)
# Function to start PDF conversion in a separate thread
def start_conversion():
    threading.Thread(target=convert_to_pdf).start()
# Set up the main window
root = tk.Tk()
root.title("HTML to PDF Converter - The Pycodes")
root.geometry("500x450")
# Frame for radio buttons
input_type_frame = tk.LabelFrame(root, text="Choose Input Type", padx=10, pady=10)
input_type_frame.pack(padx=10, pady=5, fill="x")
var = tk.StringVar(value="html_string")
tk.Radiobutton(input_type_frame, text="HTML String", variable=var, value="html_string").pack(anchor="w", padx=10,
pady=2)
tk.Radiobutton(input_type_frame, text="HTML File Path", variable=var, value="html_file", command=select_file).pack(
anchor="w", padx=10, pady=2)
tk.Radiobutton(input_type_frame, text="URL", variable=var, value="url").pack(anchor="w", padx=10, pady=2)
# Text widget for HTML input
text_frame = tk.LabelFrame(root, text="HTML Content or Path/URL", padx=10, pady=10)
text_frame.pack(padx=10, pady=5, fill="both", expand=True)
text_input = tk.Text(text_frame, wrap="word", height=10)
text_input.pack(fill="both", expand=True)
# Convert button and loading indicator
convert_button = tk.Button(root, text="Convert to PDF", command=start_conversion)
convert_button.pack(pady=10)
loading_label = tk.Label(root, text="", fg="blue")
loading_label.pack()
root.mainloop()
Happy Coding!