Ultimate Guide to Building a PDF to Text Converter with Python

Learn how to build a PDF to Text Converter using Python and Streamlit. Step-by-step tutorial with code, preview, and download options.

Ever needed to pull the text out of a PDF without wasting time copy-pasting? You’re not alone. From contracts to academic papers, PDFs are everywhere, but editing them is not so easy.

In this guide, I’ll show you exactly how to build a PDF to Text converter using Python and Streamlit. It’s fast, free, and a handy way to extract the text from text-based PDF files, right from a web browser.

Let’s dive in!


🛠️ Tools You’ll Need

Before we start, make sure you’ve got the following installed:

  • Python 3.x
  • Streamlit – For the UI
  • PyPDF2 – To read and extract text from PDFs

Use this command to install the packages:

```bash
pip install streamlit PyPDF2
```

🧩 Step-by-Step Implementation

🔹 Step 1: Import Required Libraries

We’ll start by importing our tools. Add this at the top of your app.py file:

```python
import PyPDF2
import os
import streamlit as st
```

  • PyPDF2 handles PDF reading and text extraction.
  • os helps manage temporary file cleanup.
  • streamlit powers the web UI.

🔹 Step 2: Create a Function to Convert PDF to Text

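Let’s define the core function. It reads a PDF page by page, writes the extracted text to a .txt file, and returns the same text as a string.

```python
def pdf_to_text(pdf_path, output_path):
    """Extract text from every page of a PDF, save it to output_path, and return it."""
    with open(pdf_path, 'rb') as pdfobj:
        pdfreader = PyPDF2.PdfReader(pdfobj)
        text = ""
        for page in pdfreader.pages:
            # Add a newline after each page so pages don't run together;
            # `or ""` is a defensive guard in case a page yields None
            text += (page.extract_text() or "") + "\n"

    # UTF-8 avoids UnicodeEncodeError when the platform's default encoding
    # (e.g. cp1252 on many Windows setups) can't represent a character
    with open(output_path, 'w', encoding='utf-8') as txtfile:
        txtfile.write(text)

    return text
```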

📌 Note: PyPDF2 can only extract text that is stored as text inside the PDF. Scanned documents are images, so their pages come back empty or nearly empty.
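Since scanned pages fail silently (you just get little or no text), it can help to tell the user which pages came back blank. Here is a minimal sketch: find_blank_pages is a hypothetical helper, not part of PyPDF2, and it only assumes each page object has an extract_text() method, so it works with PdfReader(...).pages. The FakePage class exists only so the example runs without a real PDF.

```python
from typing import Iterable, List


def find_blank_pages(pages: Iterable, min_chars: int = 1) -> List[int]:
    """Return the 1-based numbers of pages with fewer than min_chars characters
    of text (ignoring surrounding whitespace).

    pages can be any iterable of objects with an extract_text() method,
    such as PyPDF2.PdfReader(...).pages.
    """
    blank = []
    for number, page in enumerate(pages, start=1):
        # `or ""` is a defensive guard in case a page yields None
        text = page.extract_text() or ""
        if len(text.strip()) < min_chars:
            blank.append(number)
    return blank


# Stand-in page objects so the example runs without a real PDF
class FakePage:
    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


print(find_blank_pages([FakePage("Hello"), FakePage("  \n"), FakePage("World")]))  # → [2]
```

Wherever you have a PdfReader in the app, you could pass its pages to this helper and show st.warning(...) when the returned list isn’t empty, hinting that those pages may need OCR.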


🔹 Step 3: Build the Streamlit Interface

Now, let’s make things interactive using Streamlit.

```python
st.title('PDF to Text Converter')

uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")
```

This creates a nice file upload widget. Once a file is uploaded, we’ll process it.
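The type="pdf" argument restricts uploads by file extension, so a non-PDF file renamed to .pdf can still get through and make PyPDF2 raise an error. As a sketch, a hypothetical looks_like_pdf helper can check for the "%PDF-" header that PDF files start with before the bytes reach PyPDF2:

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check that data contains a PDF header near the start.

    PDF files begin with "%PDF-" followed by a version (e.g. %PDF-1.7).
    This sketch searches the first 1024 bytes to tolerate a little leading junk.
    """
    return PDF_MAGIC in data[:1024]


print(looks_like_pdf(b"%PDF-1.7\n"))  # → True
print(looks_like_pdf(b"PK\x03\x04"))  # → False (ZIP header, e.g. a renamed .docx)
```

In the app, you might call looks_like_pdf(uploaded_file.getvalue()) at the top of the if block and use st.error(...) followed by st.stop() to halt on a mismatch. Pass bytes (getvalue()), not the memoryview from getbuffer(), because the in check needs a bytes object.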


🔹 Step 4: Save and Process the Uploaded PDF

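```python
if uploaded_file is not None:
    # Create temp/ if it doesn't exist yet; open() won't create folders
    os.makedirs("temp", exist_ok=True)
    pdf_path = f"temp/{uploaded_file.name}"
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())
```

This saves the uploaded PDF into a folder named temp/ in your project root. The os.makedirs() call creates that folder on the first run, so writing the file doesn’t fail with a FileNotFoundError.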
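Building the path from temp/{uploaded_file.name} means two uploads with the same file name overwrite each other, and the path depends on a user-supplied name. One alternative, sketched below with Python’s standard tempfile module, gives every upload a random name instead. save_upload is a hypothetical helper; its directory default simply mirrors the temp/ folder used above.

```python
import os
import tempfile


def save_upload(data: bytes, directory: str = "temp") -> str:
    """Write data to a new, randomly named .pdf file in directory and return its path."""
    # Create the folder on first use
    os.makedirs(directory, exist_ok=True)
    # delete=False keeps the file after the with block closes it,
    # so PyPDF2 can open it by path later
    with tempfile.NamedTemporaryFile(dir=directory, suffix=".pdf", delete=False) as tmp:
        tmp.write(data)
    return tmp.name
```

With this in place, pdf_path = save_upload(uploaded_file.getvalue()) would replace the manual open()/write() lines, and the later pdf_to_text() and os.remove(pdf_path) calls stay the same.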


🔹 Step 5: Extract and Preview the Text

```python
    # Still inside the if block from Step 4
    output_text = pdf_to_text(pdf_path, "temp/converted_text.txt")
    preview_text = output_text[:1000]

    st.subheader('Text Preview:')
    st.text(preview_text)
```

You’ll get a quick preview of the extracted content — super helpful before downloading.
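Slicing with [:1000] can cut the preview off in the middle of a word. If that bothers you, here is a small sketch of a hypothetical make_preview helper that backs up to the last whitespace before the limit and appends "..." to show the text continues:

```python
def make_preview(text: str, limit: int = 1000) -> str:
    """Return roughly the first `limit` characters of text, cut at a word boundary.

    If text is longer than limit, the cut moves back to the last space, tab,
    or newline before the limit and "..." is appended. With no whitespace to
    back up to, it falls back to a hard cut at limit.
    """
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Position of the last whitespace character inside the cut (-1 if none)
    boundary = max(cut.rfind(ch) for ch in " \t\n")
    if boundary > 0:
        cut = cut[:boundary]
    return cut.rstrip() + "..."


print(make_preview("hello world foo", limit=8))  # → hello...
```

In the app, preview_text = make_preview(output_text) would replace the slice.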


🔹 Step 6: Enable Text File Download

```python
    st.download_button(
        label="Download full text as .txt",
        data=output_text,
        file_name="converted_text.txt",
        mime="text/plain"
    )
```

With one click, users can download the converted text as a .txt file.
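The download is always named converted_text.txt. To name it after the uploaded PDF instead, you could derive the file name with pathlib. txt_filename below is a hypothetical helper, and its fallback is the fixed name used above:

```python
from pathlib import PurePath


def txt_filename(pdf_name: str) -> str:
    """Turn an uploaded file name like 'report.pdf' into 'report.txt'.

    .name drops any directory part; with_suffix swaps the extension
    (or adds one if there is none).
    """
    base = PurePath(pdf_name).name
    if not base:
        # Empty name: fall back to the tutorial's fixed file name
        return "converted_text.txt"
    return str(PurePath(base).with_suffix(".txt"))


print(txt_filename("report.pdf"))  # → report.txt
```

Then pass file_name=txt_filename(uploaded_file.name) to st.download_button.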


🔹 Step 7: Clean Up Temporary Files

```python
    os.remove(pdf_path)
```

This keeps things tidy by deleting the uploaded file after processing.
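One caveat: if pdf_to_text() raises an exception (for example on a damaged file), the script stops before it reaches os.remove(), and the uploaded PDF stays in temp/. A try/finally block guarantees the cleanup runs either way. process_then_delete is a hypothetical helper sketching that pattern:

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def process_then_delete(path: str, process: Callable[[str], T]) -> T:
    """Call process(path) and delete the file at path afterwards.

    The finally clause runs whether process() returns normally or raises,
    so the file is removed either way (the exception still propagates).
    """
    try:
        return process(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

In the app, output_text = process_then_delete(pdf_path, lambda p: pdf_to_text(p, "temp/converted_text.txt")) would replace both the pdf_to_text() call and the separate os.remove(pdf_path) line.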


💻 Full Working Code

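Here’s the complete script. Save it as app.py and start it with streamlit run app.py from your terminal; Streamlit serves the app on a local URL (http://localhost:8501 by default) and usually opens it in your browser.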

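```python
import PyPDF2
import os
import streamlit as st


def pdf_to_text(pdf_path, output_path):
    """Extract text from every page of a PDF, save it to output_path, and return it."""
    with open(pdf_path, 'rb') as pdfobj:
        pdfreader = PyPDF2.PdfReader(pdfobj)
        text = ""
        for page in pdfreader.pages:
            # Newline between pages; `or ""` is a defensive guard against None
            text += (page.extract_text() or "") + "\n"

    # UTF-8 avoids UnicodeEncodeError on platforms whose default encoding
    # (e.g. cp1252 on many Windows setups) can't represent a character
    with open(output_path, 'w', encoding='utf-8') as txtfile:
        txtfile.write(text)
    return text


st.title('PDF to Text Converter')

uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")

if uploaded_file is not None:
    # Save the upload into temp/, creating the folder if it doesn't exist yet
    os.makedirs("temp", exist_ok=True)
    pdf_path = f"temp/{uploaded_file.name}"
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

    # Extract the text and show the first 1,000 characters as a preview
    output_text = pdf_to_text(pdf_path, "temp/converted_text.txt")
    preview_text = output_text[:1000]

    st.subheader('Text Preview:')
    st.text(preview_text)

    # Offer the full text as a .txt download
    st.download_button(
        label="Download full text as .txt",
        data=output_text,
        file_name="converted_text.txt",
        mime="text/plain"
    )

    # Remove the uploaded PDF once we're done with it
    os.remove(pdf_path)
```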

🔄 Bonus Ideas for Enhancement

Want to take it further? Try these:

  • Add OCR with Tesseract for scanned PDFs.
  • Support multiple files at once.
  • Enable language detection for multilingual documents.
  • Auto-clean formatting or remove line breaks intelligently.
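For the last idea, here’s a rough starting point that uses only Python’s re module. tidy_text is a hypothetical helper and a heuristic: it joins words hyphenated across a line break, turns single line breaks into spaces, and keeps blank lines as paragraph breaks. Expect it to also merge some genuine line breaks (lists, tables) and to join real compounds that happen to be split at a hyphen (well- plus a newline plus known becomes wellknown).

```python
import re


def tidy_text(raw: str) -> str:
    """Heuristically re-flow text extracted from a PDF.

    Joins words hyphenated across a line break, turns single line breaks
    into spaces, and keeps blank lines as paragraph breaks.
    """
    text = raw.replace("\r\n", "\n")
    # Drop spaces/tabs that sit right before or after a newline
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    # Re-join a word split by a hyphen at the end of a line
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A single newline (not part of a blank line) becomes a space
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse repeated spaces/tabs and runs of 3+ newlines
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample = "This is a conver-\nter that\nworks.\n\nNew para."
print(repr(tidy_text(sample)))  # → 'This is a converter that works.\n\nNew para.'
```

You could apply it to output_text right after extraction, or offer it as an optional checkbox so users can compare the raw and cleaned versions.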

🧠 Conclusion

And just like that, you’ve built a fully functional PDF to Text Converter using Python and Streamlit!

This tool can be a real time-saver — whether you’re processing legal docs, student handouts, or business PDFs.

👉 Try it out, customize it, and let me know what you’d add next.
Drop your questions in the comments or explore more Python tools on the Ossels AI Blog.

Posted by Ananya Rajeev

Ananya Rajeev is a Kerala-born data scientist and AI enthusiast who simplifies generative and agentic AI for curious minds. B.Tech grad, code lover, and storyteller at heart.