How to Convert PDF to Docx in Python

In today’s fast-paced digital landscape, who hasn’t faced the hassle of trying to edit a PDF document? It’s pretty common to need a file in a more flexible format like DOCX, which is easier to tweak and update. Whether you’re collaborating on a project, updating your resume, or preparing a presentation, being able to convert PDFs to DOCX can save you a ton of time and trouble.

So, let’s get right into it! Today, I’ll show you how to convert PDF to Docx in Python. We’ll create a graphical user interface (GUI) that makes it super easy to pick a PDF file, transform it into a DOCX file, and save it wherever you want on your computer. Not only will this skill make your life easier, but it’ll also add a cool project to your coding portfolio.

Let’s get started!

Necessary Libraries

Install the following libraries so the code can run. tkinter ships with the standard Python installers from python.org, so it does not need pip (on some Linux distributions it comes as a separate system package such as python3-tk). From your terminal or command prompt, run:

$ pip install pdf2docx 
$ pip install PyPDF2 
$ pip install python-docx
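
Before moving on, it can help to confirm that everything is importable. The following is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are made up for this example and are not part of the converter itself.

```python
from importlib.util import find_spec

# Import names, which can differ from the pip package names
# (python-docx, for instance, is imported as "docx").
REQUIRED_MODULES = ["tkinter", "pdf2docx", "docx", "PyPDF2"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```

If the script reports a missing module, install the matching package and run it again.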

Imports

We want the program to be user-friendly, so we start by importing the tkinter library to build the graphical user interface. From it we also import filedialog for choosing files and save locations, messagebox for displaying message boxes, and simpledialog for asking for a password when dealing with encrypted PDF files.

In the next step, we import Converter from pdf2docx to convert PDF files into DOCX format. We also use the os module to interact with the operating system. Additionally, we work with Word documents by importing docx, and manage PDF files by importing PyPDF2.

import tkinter as tk
from tkinter import filedialog, messagebox, simpledialog
from pdf2docx import Converter
import os
from docx import Document
import PyPDF2

Functions for Converting PDF to DOCX

Now, let’s define our functions – the heartbeat of our code:

create_widgets Function

We start by defining our function with four parameters:

  • root: which is the main window where we will place the widgets.
  • input_file: the selected PDF file.
  • output_file: the resulting DOCX file.
  • status_label: which displays the progress of the conversion.

After defining our parameters, we proceed to create a button labeled “Select PDF File” that triggers the load_input_file() function when clicked. This button allows users to select the PDF they wish to convert. Next, we create another button labeled “Select Output DOCX File” that, when clicked, triggers the select_output_file() function. This button enables users to specify where to save the converted DOCX file.

Finally, we add a button labeled “Convert” to initiate the convert_pdf2docx() function. This one starts the process of converting the selected PDF into a DOCX file.

def create_widgets(root, input_file, output_file, status_label):
   # Button to select the input PDF file
   input_button = tk.Button(root, text="Select PDF File", command=lambda: load_input_file(input_file, status_label))
   input_button.pack(pady=20)


   # Button to select the output DOCX file location
   output_button = tk.Button(root, text="Select Output DOCX File", command=lambda: select_output_file(output_file, status_label))
   output_button.pack(pady=20)


   # Button to start the conversion process
   convert_button = tk.Button(root, text="Convert", command=lambda: convert_pdf2docx(input_file, output_file, status_label))
   convert_button.pack(pady=20)


   # Label to display the status of the conversion
   status_label.pack(pady=10)
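
Each button's command must be a callable that takes no arguments, which is why the calls above are wrapped in lambda. Here is a small sketch of that idea, alongside functools.partial as an alternative. The names fake_load and holder are stand-ins invented for this example, so it runs without opening a window.

```python
from functools import partial


def fake_load(input_file, new_path):
    """Stand-in for a callback that stores a path in the shared dictionary."""
    input_file['path'] = new_path


holder = {'path': None}

# Both callables take no arguments, which is what a Button's command expects.
with_lambda = lambda: fake_load(holder, "a.pdf")
with_partial = partial(fake_load, holder, "b.pdf")

with_lambda()
print(holder['path'])   # a.pdf
with_partial()
print(holder['path'])   # b.pdf
```

Because the dictionary is shared rather than copied, every callback sees the latest path, which is the reason the program passes dictionaries around instead of plain strings.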

load_input_file Function

This function, unlike the previous one, takes only two parameters: input_file and status_label. Once triggered, it opens a file-selection dialog using filedialog.askopenfilename(), configured through the filetypes option to show only PDF files.

After the user selects a file, its path is stored in input_file, and then the status_label widget displays this path.

def load_input_file(input_file, status_label):
   input_file['path'] = filedialog.askopenfilename(filetypes=[("PDF files", "*.pdf")])
   if input_file['path']:
       status_label.config(text=f"Selected PDF: {input_file['path']}")
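
One thing to keep in mind: when the user cancels the dialog, askopenfilename() returns an empty value, and the function above stores it, which replaces any earlier selection. A possible refinement, sketched here with a hypothetical helper called remember_selection, is to update the dictionary only when something was actually chosen.

```python
def remember_selection(holder, chosen):
    """Store `chosen` in holder['path'] only when it is non-empty.

    Returns True when the stored path was updated.
    """
    if chosen:
        holder['path'] = chosen
        return True
    return False


input_file = {'path': None}
remember_selection(input_file, "/tmp/report.pdf")
remember_selection(input_file, "")  # simulates a cancelled dialog
print(input_file['path'])  # /tmp/report.pdf
```

In the GUI, the value passed as chosen would be the result of filedialog.askopenfilename().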

select_output_file Function

This function takes two parameters: output_file, which will store the path of our converted DOCX file, and status_label. Once triggered, it opens a save dialog where the user can name the converted DOCX file and choose where to store it. Thanks to defaultextension=".docx", the .docx extension is added when the user types a name without one.

Additionally, the dialog only lists DOCX files, as specified by the filetypes option. As in the previous function, once the user confirms the name and location of the DOCX document, the path is stored in output_file and then displayed by the status_label widget.

def select_output_file(output_file, status_label):
   output_file['path'] = filedialog.asksaveasfilename(defaultextension=".docx", filetypes=[("DOCX files", "*.docx")])
   if output_file['path']:
       status_label.config(text=f"Output will be saved as: {output_file['path']}")
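
The defaultextension option only helps when no extension is typed at all. If you want to be sure the saved path always ends in .docx, one option is a small check like the sketch below. The helper name ensure_docx_extension is an assumption for this example, and it simply appends the extension rather than replacing whatever the user typed.

```python
def ensure_docx_extension(path):
    """Return `path` unchanged if it ends with .docx, else append .docx."""
    if path.lower().endswith(".docx"):
        return path
    return path + ".docx"


print(ensure_docx_extension("out/report"))       # out/report.docx
print(ensure_docx_extension("out/Report.DOCX"))  # out/Report.DOCX
print(ensure_docx_extension("out/report.pdf"))   # out/report.pdf.docx
```

Appending keeps the behavior predictable for names that contain dots, at the cost of results like report.pdf.docx.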

convert_pdf2docx Function

Since this function converts our selected PDF to DOCX, it takes three parameters: input_file, which stores the path to the selected PDF; output_file, which contains the path where we want our DOCX file saved; and status_label, which displays the conversion progress. Let’s discuss how the function works:

  • First, the function verifies that the user has provided both the input and output file paths. Once confirmed, it opens the PDF file from the input_file path and checks whether it is encrypted. If it is not, the function goes straight to the conversion steps described next. If it is, a message box reports the encryption and a simpledialog box prompts the user for the password. If no password is entered, or PyPDF2 rejects it, an exception is raised.
  • Next, the function initializes a Converter object from the pdf2docx library, passing along the password when there is one, and converts the selected file. The DOCX file is saved to the path specified in output_file, and the converter is then closed.
  • The function then checks that the DOCX file exists and is not empty. If so, it updates the status_label widget to display ‘Conversion Completed Successfully’, shows a message box confirming the conversion, and calls check_docx_file() for a final validation. If anything fails along the way, an error message appears and the status changes to ‘Conversion Failed’.

def convert_pdf2docx(input_file, output_file, status_label):
   if input_file['path'] and output_file['path']:
       try:
           password = None
           with open(input_file['path'], 'rb') as pdf_file:
               pdf_reader = PyPDF2.PdfReader(pdf_file)
               if pdf_reader.is_encrypted:
                   messagebox.showinfo("Info", "The selected PDF file is encrypted. Please enter the password.")
                   password = simpledialog.askstring("Password", "Enter Password:", show='*')
                   if not password:
                       raise Exception("Password not provided.")
                   # PdfReader.decrypt() returns a falsy value when the password is wrong
                   if not pdf_reader.decrypt(password):
                       raise Exception("Incorrect password.")


           # Pass the password on so pdf2docx can open the encrypted file as well
           cv = Converter(input_file['path'], password=password)
           cv.convert(output_file['path'], start=0, end=None)
           cv.close()


           if os.path.exists(output_file['path']) and os.path.getsize(output_file['path']) > 0:
               status_label.config(text="Conversion Completed Successfully")
               messagebox.showinfo("Success", "PDF successfully converted to DOCX!")
               check_docx_file(output_file['path'])
           else:
               raise Exception("The file appears to be empty or missing.")
       except Exception as e:
           messagebox.showerror("Error", f"Failed to convert PDF: {e}")
           status_label.config(text="Conversion Failed")
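
Note that if convert() raises an error, the call to close() is skipped. A common way around this is try/finally. The sketch below shows the pattern on its own; the function name convert_with_cleanup and the converter_cls parameter are hypothetical additions for this example, included so the cleanup path can be demonstrated with a stand-in class instead of a real PDF.

```python
def convert_with_cleanup(pdf_path, docx_path, converter_cls=None):
    """Convert pdf_path to docx_path and always close the converter.

    converter_cls is a hypothetical hook for this sketch; when omitted,
    pdf2docx.Converter is imported and used.
    """
    if converter_cls is None:
        from pdf2docx import Converter as converter_cls
    cv = converter_cls(pdf_path)
    try:
        cv.convert(docx_path, start=0, end=None)
    finally:
        # Runs on success and on failure, so the converter is always closed
        cv.close()


class FailingConverter:
    """Stand-in that fails during convert() to show the cleanup path."""
    closed = False

    def __init__(self, pdf_path):
        self.pdf_path = pdf_path

    def convert(self, docx_path, start=0, end=None):
        raise RuntimeError("simulated conversion failure")

    def close(self):
        FailingConverter.closed = True


try:
    convert_with_cleanup("in.pdf", "out.docx", converter_cls=FailingConverter)
except RuntimeError as error:
    print(error, "| closed:", FailingConverter.closed)
```

The error still reaches the caller, so the GUI function can keep showing its error message box as before.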

check_docx_file Function

This function verifies that the output DOCX file is readable. It attempts to open the file with the Document class from the docx library and counts the paragraphs in the document. If the file opens successfully, a message box reports success along with the number of paragraphs found. If the file cannot be opened or read, an error message is displayed instead.

def check_docx_file(path):
   try:
       doc = Document(path)
       messagebox.showinfo("File Check", f"Successfully opened the DOCX file. It contains {len(doc.paragraphs)} paragraphs.")
   except Exception as e:
       messagebox.showerror("File Check Error", f"Failed to open the DOCX file: {e}")
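
A DOCX file is a ZIP package whose main content lives in word/document.xml, so a lighter structural check is possible with the standard library alone. This is only a sketch under that assumption, with a hypothetical helper name looks_like_docx. It does not replace the python-docx check above, since it only confirms the container layout.

```python
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a ZIP archive containing word/document.xml."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "word/document.xml" in archive.namelist()
```

For example, looks_like_docx(output_file['path']) could run before check_docx_file() to catch files that are not DOCX packages at all.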

Main Function

This function creates the main window, sets its title to “PDF to DOCX Converter - The Pycodes”, and sets its size to 400×200 pixels. It initializes the input_file and output_file dictionaries, each with a ‘path’ key set to None, to store the paths of the PDF and DOCX files, respectively.

Additionally, the function creates a status_label widget with empty text in green to display progress. Finally, it calls the create_widgets() function to create the GUI elements within the main window.

def main():
   root = tk.Tk()
   root.title("PDF to DOCX Converter - The Pycodes")
   root.geometry("400x200")
   input_file = {'path': None}
   output_file = {'path': None}
   status_label = tk.Label(root, text="", fg="green")
   create_widgets(root, input_file, output_file, status_label)
   root.mainloop()

Main Loop

Lastly, this block calls main() only when the script is run directly rather than imported. Inside main(), root.mainloop() starts the main loop, which keeps the main window responsive to user interactions until it is closed.

if __name__ == "__main__":
   main()

Full Code

import tkinter as tk
from tkinter import filedialog, messagebox, simpledialog
from pdf2docx import Converter
import os
from docx import Document
import PyPDF2


def create_widgets(root, input_file, output_file, status_label):
   # Button to select the input PDF file
   input_button = tk.Button(root, text="Select PDF File", command=lambda: load_input_file(input_file, status_label))
   input_button.pack(pady=20)


   # Button to select the output DOCX file location
   output_button = tk.Button(root, text="Select Output DOCX File", command=lambda: select_output_file(output_file, status_label))
   output_button.pack(pady=20)


   # Button to start the conversion process
   convert_button = tk.Button(root, text="Convert", command=lambda: convert_pdf2docx(input_file, output_file, status_label))
   convert_button.pack(pady=20)


   # Label to display the status of the conversion
   status_label.pack(pady=10)


def load_input_file(input_file, status_label):
   input_file['path'] = filedialog.askopenfilename(filetypes=[("PDF files", "*.pdf")])
   if input_file['path']:
       status_label.config(text=f"Selected PDF: {input_file['path']}")


def select_output_file(output_file, status_label):
   output_file['path'] = filedialog.asksaveasfilename(defaultextension=".docx", filetypes=[("DOCX files", "*.docx")])
   if output_file['path']:
       status_label.config(text=f"Output will be saved as: {output_file['path']}")


def convert_pdf2docx(input_file, output_file, status_label):
   if input_file['path'] and output_file['path']:
       try:
           password = None
           with open(input_file['path'], 'rb') as pdf_file:
               pdf_reader = PyPDF2.PdfReader(pdf_file)
               if pdf_reader.is_encrypted:
                   messagebox.showinfo("Info", "The selected PDF file is encrypted. Please enter the password.")
                   password = simpledialog.askstring("Password", "Enter Password:", show='*')
                   if not password:
                       raise Exception("Password not provided.")
                   # PdfReader.decrypt() returns a falsy value when the password is wrong
                   if not pdf_reader.decrypt(password):
                       raise Exception("Incorrect password.")


           # Pass the password on so pdf2docx can open the encrypted file as well
           cv = Converter(input_file['path'], password=password)
           cv.convert(output_file['path'], start=0, end=None)
           cv.close()


           if os.path.exists(output_file['path']) and os.path.getsize(output_file['path']) > 0:
               status_label.config(text="Conversion Completed Successfully")
               messagebox.showinfo("Success", "PDF successfully converted to DOCX!")
               check_docx_file(output_file['path'])
           else:
               raise Exception("The file appears to be empty or missing.")
       except Exception as e:
           messagebox.showerror("Error", f"Failed to convert PDF: {e}")
           status_label.config(text="Conversion Failed")


def check_docx_file(path):
   try:
       doc = Document(path)
       messagebox.showinfo("File Check", f"Successfully opened the DOCX file. It contains {len(doc.paragraphs)} paragraphs.")
   except Exception as e:
       messagebox.showerror("File Check Error", f"Failed to open the DOCX file: {e}")


def main():
   root = tk.Tk()
   root.title("PDF to DOCX Converter - The Pycodes")
   root.geometry("400x200")
   input_file = {'path': None}
   output_file = {'path': None}
   status_label = tk.Label(root, text="", fg="green")
   create_widgets(root, input_file, output_file, status_label)
   root.mainloop()


if __name__ == "__main__":
   main()

Happy Coding!
