How to Convert HTML to PDF in Python Using Tkinter and pdfkit

Ever come across a webpage or HTML file that you needed to save as a PDF? Whether it’s for sharing, archiving, or just making your content more accessible, converting HTML to PDF can be incredibly useful. In this tutorial, we’ll walk through how to build a simple but powerful tool in Python that converts HTML into PDFs. We’ll use Tkinter for an intuitive GUI and pdfkit for handling the conversion itself.

Today, you’ll learn how to set up pdfkit, link it with wkhtmltopdf, and build an interface that allows you to enter raw HTML, select files, or even paste URLs for direct conversion into PDF format. Let’s dive in!

Getting Started

Installing wkhtmltopdf

To turn HTML content into a PDF, we need a rendering engine, and that’s where wkhtmltopdf comes in. It’s a command-line tool that converts HTML into PDFs by rendering the content the way a browser would. It handles simple HTML pages well and can also run JavaScript, although its older WebKit-based engine may not render every modern page exactly like a current browser.

We’ll install wkhtmltopdf and link it with Python’s pdfkit to handle the conversion smoothly and get our PDF output looking just like a web page. So, make sure to download the wkhtmltopdf installer that suits your operating system. Remember the path where you install it, as you’ll need to include it in the code below.

For example, on my Windows machine the executable lives at C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe, and this is the path you need to add to the code.

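Next, let’s install the Python packages we need. Open your command prompt or terminal, and run these commands:

$ pip install pdfkit
$ pip install validators

Tkinter ships with most Python installations, so it normally doesn’t need a separate pip install (the package named tk on PyPI is not Tkinter). On some Linux distributions it comes as a separate system package, such as python3-tk.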
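Before going further, it can help to confirm that the modules are actually importable. The snippet below is a minimal sketch of such a check; the helper name missing_modules() is my own and not part of any library.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # These are the three modules this tutorial imports besides threading.
    missing = missing_modules(["tkinter", "pdfkit", "validators"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If anything is reported as missing, install it before running the converter.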

Imports

import tkinter as tk
from tkinter import filedialog, messagebox
import pdfkit
import threading
import validators  

As usual, let’s kick things off by bringing out our toolbox of libraries:

  • Tkinter: This is the heart and soul of our program’s visual side. We use tk to craft the interface, filedialog to select files and choose where to save, and messagebox to provide users with helpful pop-up messages for feedback.
  • Pdfkit: The star of our program! It acts as the bridge that links HTML to PDF, using wkhtmltopdf (a tool for creating PDFs) behind the scenes to handle the conversion process.
  • Threading: Worried that the program might freeze during conversion? No need to! Threading allows us to run the process smoothly in the background.
  • Validators: This library ensures the URL is valid before proceeding, so we know we’re working with the right input every time.

Configuring the wkhtmltopdf Path

Now it’s time to bring our conversion powerhouse into the scene! Since wkhtmltopdf isn’t built into Python, we need to help pdfkit locate it by specifying the full path to its executable. This setup allows us to use wkhtmltopdf as a sort of PDF factory that reads HTML, CSS, and JavaScript and produces a rendered PDF.

# Configure path to wkhtmltopdf.
# This is my path to wkhtmltopdf\bin\wkhtmltopdf.exe, so be sure to put your own.
config = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
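If you’d rather not hard-code the path, one option is to look for the executable on the system PATH first and only then fall back to a list of known locations. This is just a sketch under that assumption; find_wkhtmltopdf() and its candidates parameter are names I made up for illustration.

```python
import os
import shutil


def find_wkhtmltopdf(candidates=()):
    """Return a path to the wkhtmltopdf executable, or None if not found.

    Checks the system PATH first, then each path in `candidates` in order.
    """
    found = shutil.which("wkhtmltopdf")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # The fallback below is the Windows path used in this tutorial.
    path = find_wkhtmltopdf(
        [r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"]
    )
    print(path or "wkhtmltopdf not found; install it or adjust the path")
    # With a path in hand, the configuration line would become:
    # config = pdfkit.configuration(wkhtmltopdf=path)
```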

Fine-Tuning Our PDF Output

Next, we use an options dictionary to customize the appearance of the PDF output. Here’s what we’ve done:

  • We set the pages to A4 size to ensure our PDF looks great on any printer.
  • We added margins on each side to keep the layout balanced and neat.
  • We chose UTF-8 encoding to correctly display special characters, symbols, and various languages.
  • We included a 5-second delay to give JavaScript time to run, so that dynamically generated content is in place before the page is captured.
  • To prevent automatic resizing, we disabled smart shrinking.

All of these settings work together to ensure our PDF is well-structured and polished.

# Define custom options for better webpage capturing
options = {
   'page-size': 'A4',
   'margin-top': '0.75in',
   'margin-right': '0.75in',
   'margin-bottom': '0.75in',
   'margin-left': '0.75in',
   'encoding': "UTF-8",
   'no-outline': None,
   'javascript-delay': 5000,  # Wait 5 seconds for JavaScript to load
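   'background': None,  # Print backgrounds (wkhtmltopdf default); a 'no-background' key may be passed as a flag even when set to False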
   'disable-smart-shrinking': True,
}
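If you want to experiment with different margins or delays, you could wrap the dictionary in a small function. The sketch below assumes the same option keys as above; build_options() and its parameter names are hypothetical, not part of pdfkit.

```python
def build_options(margin="0.75in", js_delay_ms=5000, page_size="A4"):
    """Build a pdfkit-style options dict with the same margin on all sides."""
    options = {
        "page-size": page_size,
        "encoding": "UTF-8",
        "no-outline": None,               # value-less flag
        "javascript-delay": js_delay_ms,  # milliseconds
        "disable-smart-shrinking": True,
    }
    for side in ("top", "right", "bottom", "left"):
        options[f"margin-{side}"] = margin
    return options


if __name__ == "__main__":
    # Wider margins and a shorter JavaScript delay for a quick test run.
    print(build_options(margin="1in", js_delay_ms=1000))
```

The returned dictionary can be passed as the options argument in the same way as the hand-written one.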

Converting HTML to PDF

At this point, we’re at the main act: the very heart of our program, the convert_to_pdf() function. This is where the magic happens! First, it retrieves the user input from the text area using text_input.get() and strips any leading or trailing whitespace.

content = text_input.get("1.0", tk.END).strip()

Next, filedialog steps in to ask the user where to save the PDF and what to name it. This is important because we want to make sure our users can easily find their files later on!

pdf_path = filedialog.asksaveasfilename(defaultextension=".pdf", filetypes=[("PDF files", "*.pdf")])

If the user doesn’t choose a file, we simply exit the function to avoid any issues.

if not pdf_path:
    return  # Exit if no file is chosen

Now, we kick off the conversion process inside a try block, so that any failure can be caught and reported later. The function updates the loading label to let the user know that the conversion is in progress. We also make sure to refresh the GUI so the loading message is visible.

loading_label.config(text="Converting to PDF...")
root.update_idletasks()  # Refresh GUI to show loading message

Then, we check what type of input the user selected: an HTML string, an HTML file, or a URL. Depending on their choice, the function calls the matching pdfkit function (from_string, from_file, or from_url).

if var.get() == "html_string":
    pdfkit.from_string(content, pdf_path, configuration=config, options=options)
elif var.get() == "html_file":
    pdfkit.from_file(content, pdf_path, configuration=config, options=options)
elif var.get() == "url":
    if not validators.url(content):
        raise ValueError("Invalid URL")
    pdfkit.from_url(content, pdf_path, configuration=config, options=options)

For a URL, we first make sure it’s valid using the validators library. If it isn’t, we raise a ValueError, which is then caught and shown to the user as an error message.
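To get a feel for what a URL check involves, here is a rough standard-library sketch. It is much looser than the validators library and is not a replacement for it; the function name looks_like_web_url() is my own.

```python
from urllib.parse import urlparse


def looks_like_web_url(text):
    """Return True if text has an http(s) scheme and a network location."""
    try:
        parts = urlparse(text.strip())
    except ValueError:
        # urlparse can reject some malformed inputs outright.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ("https://www.wikipedia.org/", "not a url", "http://"):
        print(repr(candidate), looks_like_web_url(candidate))
```

A check like this only looks at the shape of the string; it doesn’t tell you whether the page actually exists.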

Finally, if everything goes smoothly and the PDF is created successfully, a message box pops up to celebrate this success!

messagebox.showinfo("Success", "PDF created successfully!")

But if anything goes wrong during the conversion, we catch the error and show a message box to provide feedback to the user.

except Exception as e:
    messagebox.showerror("Error", f"An error occurred: {e}")

In the end, we clear the loading message so everything is tidy and ready for the next operation.

finally:
    loading_label.config(text="")  # Clear loading message

And that’s the rundown of the convert_to_pdf() function! It ensures a smooth conversion experience for our users.
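The if/elif chain works fine, but the same idea can also be written as a lookup table that maps each input type to a conversion function. The sketch below is one possible arrangement, not the tutorial’s actual code: run_conversion() and the converters table are hypothetical, and the demo uses a fake converter so it runs without pdfkit.

```python
def run_conversion(input_type, content, pdf_path, converters):
    """Pick the converter for input_type and run it on content.

    `converters` maps an input type name to a callable taking
    (content, pdf_path). Returns pdf_path on success.
    """
    if input_type not in converters:
        raise ValueError(f"Unknown input type: {input_type!r}")
    if not content:
        raise ValueError("Nothing to convert: the input is empty")
    converters[input_type](content, pdf_path)
    return pdf_path


# In the real program, the table could wrap the pdfkit calls, for example:
# converters = {
#     "html_string": lambda c, p: pdfkit.from_string(c, p, configuration=config, options=options),
#     "html_file": lambda c, p: pdfkit.from_file(c, p, configuration=config, options=options),
#     "url": lambda c, p: pdfkit.from_url(c, p, configuration=config, options=options),
# }

if __name__ == "__main__":
    calls = []
    fake_converters = {"html_string": lambda c, p: calls.append((c, p))}
    print(run_conversion("html_string", "<h1>Hi</h1>", "out.pdf", fake_converters))
    print(calls)
```

Keeping the dispatch separate from the GUI code also makes it easier to test without opening a window.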

File Selection Helper

If you choose HTML File Path from the three input options when running the code, the select_file() function saves you the trouble of typing the path to the HTML file. Instead, it opens a file dialog that allows you to choose any HTML file. It also clears any existing content from the input box before inserting the selected file path.

# Function to open file dialog for HTML file selection
def select_file():
   file_path = filedialog.askopenfilename(filetypes=[("HTML files", "*.html"), ("All files", "*.*")])
   if file_path:
       text_input.delete("1.0", tk.END)
       text_input.insert(tk.END, file_path)
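Since the path ends up in an editable text box, it may be worth checking it before handing it to the converter. Here is a small sketch of such a check; html_path_problem() is a hypothetical helper, not part of Tkinter or pdfkit.

```python
from pathlib import Path


def html_path_problem(path_text):
    """Return a short description of what is wrong with the path,
    or None if it points to an existing file."""
    cleaned = path_text.strip()
    if not cleaned:
        return "No file path entered"
    path = Path(cleaned)
    if not path.is_file():
        return f"File not found: {path}"
    return None


if __name__ == "__main__":
    print(html_path_problem(""))
    print(html_path_problem("definitely/missing/file.html"))
```

In the converter, a non-None result could be raised as a ValueError so it shows up in the same error message box as other failures.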

Starting Conversion in a New Thread

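To avoid freezing the main window, we use the start_conversion() function, which starts a new thread to run the convert_to_pdf() function in the background. Keep in mind that Tkinter is generally not thread-safe, so opening dialogs and updating widgets from a worker thread, as convert_to_pdf() does, may misbehave on some platforms.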

# Function to start PDF conversion in a separate thread
def start_conversion():
   threading.Thread(target=convert_to_pdf).start()
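A common way to keep all GUI work on the main thread is to let the worker thread put its result on a queue and have the main thread poll that queue with root.after(). The sketch below shows that pattern under my own assumptions: run_in_background() and poll_results() are hypothetical helpers, and the demo runs a trivial task instead of a real conversion.

```python
import queue
import threading


def run_in_background(task, results):
    """Run task() in a daemon thread and report the outcome on `results`.

    Puts ("ok", return_value) on success or ("error", exception) on failure.
    """
    def worker():
        try:
            results.put(("ok", task()))
        except Exception as exc:  # hand any failure back to the caller
            results.put(("error", exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_results(root, results, on_done, interval_ms=100):
    """Call from the main thread. Checks the queue once; if nothing has
    arrived yet, reschedules itself with root.after()."""
    try:
        status, payload = results.get_nowait()
    except queue.Empty:
        root.after(interval_ms, poll_results, root, results, on_done, interval_ms)
        return
    on_done(status, payload)


if __name__ == "__main__":
    # Demo without a GUI: run a trivial task and read the result directly.
    results = queue.Queue()
    run_in_background(lambda: "done", results).join()
    print(results.get_nowait())
```

In the converter, the save dialog would be opened in start_conversion() on the main thread, the pdfkit call would go into the task, and on_done would show the success or error message box.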

Setting Up the Main Window

In this section, we build our interface using tk, set its title, and define its geometry. Next, we create a frame that contains three radio buttons, each linked to the variable var (html_string, html_file, url) so that the code can recognize the user’s choice and act accordingly. After that, we create another frame and label, where we add the input text widget for user input.

# Set up the main window
root = tk.Tk()
root.title("HTML to PDF Converter - The Pycodes")
root.geometry("500x450")


# Frame for radio buttons
input_type_frame = tk.LabelFrame(root, text="Choose Input Type", padx=10, pady=10)
input_type_frame.pack(padx=10, pady=5, fill="x")


var = tk.StringVar(value="html_string")
tk.Radiobutton(input_type_frame, text="HTML String", variable=var, value="html_string").pack(anchor="w", padx=10,
                                                                                            pady=2)
tk.Radiobutton(input_type_frame, text="HTML File Path", variable=var, value="html_file", command=select_file).pack(
   anchor="w", padx=10, pady=2)
tk.Radiobutton(input_type_frame, text="URL", variable=var, value="url").pack(anchor="w", padx=10, pady=2)


# Text widget for HTML input
text_frame = tk.LabelFrame(root, text="HTML Content or Path/URL", padx=10, pady=10)
text_frame.pack(padx=10, pady=5, fill="both", expand=True)
text_input = tk.Text(text_frame, wrap="word", height=10)
text_input.pack(fill="both", expand=True)

Following this, we create the “Convert to PDF” button, which calls the start_conversion() function when clicked. We also add a label to provide user feedback during the conversion process, with the text displayed in blue. Lastly, we start the main event loop with the mainloop() method, which keeps the main window running and responsive to user interactions.

# Convert button and loading indicator
convert_button = tk.Button(root, text="Convert to PDF", command=start_conversion)
convert_button.pack(pady=10)
loading_label = tk.Label(root, text="", fg="blue")
loading_label.pack()

root.mainloop()

Running the Code

HTML String

This is the HTML I’ve entered in the input box:

<html>
<head><title>Sample PDF</title></head>
<body>
   <h1>Hello, World!</h1>
   <p>This is a sample PDF created from HTML content.</p>
</body>
</html>

As you can see, the result is here:

HTML File Path

I had this HTML saved on my PC, so I tried it:

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   <title>Sample HTML for PDF Conversion</title>
   <style>
       body {
           font-family: Arial, sans-serif;
           margin: 20px;
           background-color: #f5f5f5;
       }
       h1 {
           color: #333;
           text-align: center;
       }
       p {
           line-height: 1.6;
           color: #666;
       }
       .content {
           max-width: 800px;
           margin: 0 auto;
           padding: 20px;
           background-color: white;
           border-radius: 8px;
           box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
       }
   </style>
</head>
<body>
   <div class="content">
       <h1>Welcome to PDF Conversion Testing</h1>
       <p>This is a sample HTML file created to test the HTML-to-PDF conversion functionality.</p>
       <p>It includes basic HTML elements such as headings, paragraphs, and some CSS styling for layout and colors.</p>
       <h2>Features of this HTML File:</h2>
       <ul>
           <li>Responsive design with CSS</li>
           <li>Formatted text and heading elements</li>
           <li>A box shadow effect around the content</li>
       </ul>
       <p>Try using this file with your HTML-to-PDF converter to check how well it captures the layout and styles.</p>
   </div>
</body>
</html>

And this is the result:

URL

For the URL, I tried Wikipedia, as you can see:
https://www.wikipedia.org/
And this was the result:

Full Code

import tkinter as tk
from tkinter import filedialog, messagebox
import pdfkit
import threading
import validators  


# Configure path to wkhtmltopdf.
# This is my path to wkhtmltopdf\bin\wkhtmltopdf.exe, so be sure to put your own.
config = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')


# Define custom options for better webpage capturing
options = {
   'page-size': 'A4',
   'margin-top': '0.75in',
   'margin-right': '0.75in',
   'margin-bottom': '0.75in',
   'margin-left': '0.75in',
   'encoding': "UTF-8",
   'no-outline': None,
   'javascript-delay': 5000,  # Wait 5 seconds for JavaScript to load
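   'background': None,  # Print backgrounds (wkhtmltopdf default); a 'no-background' key may be passed as a flag even when set to False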
   'disable-smart-shrinking': True,
}




# Function to convert HTML to PDF
def convert_to_pdf():
   content = text_input.get("1.0", tk.END).strip()
   pdf_path = filedialog.asksaveasfilename(defaultextension=".pdf", filetypes=[("PDF files", "*.pdf")])


   if not pdf_path:
       return  # Exit if no file is chosen


   try:
       loading_label.config(text="Converting to PDF...")
       root.update_idletasks()  # Refresh GUI to show loading message


       # Check selected input type and convert
       if var.get() == "html_string":
           pdfkit.from_string(content, pdf_path, configuration=config, options=options)
       elif var.get() == "html_file":
           pdfkit.from_file(content, pdf_path, configuration=config, options=options)
       elif var.get() == "url":
           if not validators.url(content):
               raise ValueError("Invalid URL")
           pdfkit.from_url(content, pdf_path, configuration=config, options=options)


       messagebox.showinfo("Success", "PDF created successfully!")
   except Exception as e:
       messagebox.showerror("Error", f"An error occurred: {e}")
   finally:
       loading_label.config(text="")  # Clear loading message




# Function to open file dialog for HTML file selection
def select_file():
   file_path = filedialog.askopenfilename(filetypes=[("HTML files", "*.html"), ("All files", "*.*")])
   if file_path:
       text_input.delete("1.0", tk.END)
       text_input.insert(tk.END, file_path)




# Function to start PDF conversion in a separate thread
def start_conversion():
   threading.Thread(target=convert_to_pdf).start()




# Set up the main window
root = tk.Tk()
root.title("HTML to PDF Converter - The Pycodes")
root.geometry("500x450")


# Frame for radio buttons
input_type_frame = tk.LabelFrame(root, text="Choose Input Type", padx=10, pady=10)
input_type_frame.pack(padx=10, pady=5, fill="x")


var = tk.StringVar(value="html_string")
tk.Radiobutton(input_type_frame, text="HTML String", variable=var, value="html_string").pack(anchor="w", padx=10,
                                                                                            pady=2)
tk.Radiobutton(input_type_frame, text="HTML File Path", variable=var, value="html_file", command=select_file).pack(
   anchor="w", padx=10, pady=2)
tk.Radiobutton(input_type_frame, text="URL", variable=var, value="url").pack(anchor="w", padx=10, pady=2)


# Text widget for HTML input
text_frame = tk.LabelFrame(root, text="HTML Content or Path/URL", padx=10, pady=10)
text_frame.pack(padx=10, pady=5, fill="both", expand=True)
text_input = tk.Text(text_frame, wrap="word", height=10)
text_input.pack(fill="both", expand=True)


# Convert button and loading indicator
convert_button = tk.Button(root, text="Convert to PDF", command=start_conversion)
convert_button.pack(pady=10)
loading_label = tk.Label(root, text="", fg="blue")
loading_label.pack()


root.mainloop()

Happy Coding!
