Ultimate Guide to Building a PDF to Text Converter with Python

Learn how to build a PDF to Text Converter using Python and Streamlit. Step-by-step tutorial with code, preview, and download options.

Ever needed to convert a PDF to text without losing formatting or wasting time copy-pasting? You’re not alone. From contracts to academic papers, PDFs are everywhere, but editing them? Not so easy.

In this guide, I’ll show you exactly how to build a PDF to Text converter using Python and Streamlit. It’s fast, free, and perfect for anyone looking to extract clean text from text-based PDF files, right from a web browser.

Let’s dive in!


🛠️ Tools You’ll Need

Before we start, make sure you’ve got the following installed:

  • Python 3.x
  • Streamlit – For the UI
  • PyPDF2 – To read and extract text from PDFs

Use this command to install the packages:

pip install streamlit PyPDF2
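
A quick way to confirm the install worked is to ask Python whether it can locate both packages. The helper below is a minimal sketch; `missing_packages` is a hypothetical name, and it only checks that the modules can be found, not that their versions are compatible.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


# Module names as they are imported in app.py
missing = missing_packages(["streamlit", "PyPDF2"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages are installed.")
```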

🧩 Step-by-Step Implementation

🔹 Step 1: Import Required Libraries

We’ll start by importing our tools. Add this at the top of your app.py file:

import PyPDF2
import os
import streamlit as st

  • PyPDF2 handles PDF reading and text extraction.
  • os helps manage temporary file cleanup.
  • streamlit powers the web UI.

🔹 Step 2: Create a Function to Convert PDF to Text

Let’s define the core function that reads a PDF and returns plain text.

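def pdf_to_text(pdf_path, output_path):
    # Open the PDF in binary mode and read it page by page
    with open(pdf_path, 'rb') as pdfobj:
        pdfreader = PyPDF2.PdfReader(pdfobj)
        num_pages = len(pdfreader.pages)
        text = ""
        for i in range(num_pages):
            pageObj = pdfreader.pages[i]
            # Fall back to "" if a page yields no text
            text += pageObj.extract_text() or ""

    # Save a copy of the extracted text to disk
    with open(output_path, 'w', encoding='utf-8') as txtfile:
        txtfile.write(text)

    return text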

📌 Note: PyPDF2 works best with text-based PDFs (not scanned images).
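
One way to see the scanned-PDF limitation in practice is to collect the text page by page and flag the pages that come back empty. The sketch below assumes only that the reader object exposes `.pages` and that each page has an `extract_text()` method, as `PyPDF2.PdfReader` does. `extract_pages` and `pages_without_text` are hypothetical helper names, and the stand-in classes exist only so the example runs without a real PDF.

```python
def extract_pages(reader):
    """Return a list with the extracted text of each page.

    `reader` is any object with a `.pages` sequence whose items have an
    `extract_text()` method (PyPDF2.PdfReader fits this shape).
    """
    # Treat a None result as an empty string so later joins never fail
    return [(page.extract_text() or "") for page in reader.pages]


def pages_without_text(page_texts):
    """Return 1-based page numbers whose extracted text is blank."""
    return [number for number, text in enumerate(page_texts, start=1)
            if not text.strip()]


# Stand-ins for a real reader, used only to demonstrate the helpers
class FakePage:
    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


class FakeReader:
    def __init__(self, texts):
        self.pages = [FakePage(text) for text in texts]


page_texts = extract_pages(FakeReader(["Intro", None, "   ", "Summary"]))
blank_pages = pages_without_text(page_texts)
print(blank_pages)  # [2, 3]
```

In the app, a non-empty result could be used to show an `st.warning` telling the user that some pages look like scanned images.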


🔹 Step 3: Build the Streamlit Interface

Now, let’s make things interactive using Streamlit.

st.title('PDF to Text Converter')

uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")

This creates a nice file upload widget. Once a file is uploaded, we’ll process it.
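
The `type="pdf"` argument filters by file extension, so a renamed file can still slip through. Here is a minimal sketch of an extra check, assuming that a well-formed PDF normally begins with the `%PDF-` marker. `looks_like_pdf` is a hypothetical helper, not part of Streamlit or PyPDF2.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data):
    """Return True if the bytes begin with the usual PDF header marker."""
    # bytes() accepts bytes, bytearray, and memoryview inputs alike
    return bytes(data[:len(PDF_MAGIC)]) == PDF_MAGIC


print(looks_like_pdf(b"%PDF-1.7\n..."))   # True
print(looks_like_pdf(b"PK\x03\x04 zip"))  # False
```

In the app this could run on `uploaded_file.getbuffer()` before the file is saved, with `st.error` shown when it returns False.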


🔹 Step 4: Save and Process the Uploaded PDF

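if uploaded_file is not None:
    # Make sure the temp/ folder exists, then save the upload there
    os.makedirs("temp", exist_ok=True)
    pdf_path = f"temp/{uploaded_file.name}"
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

This saves the PDF to a temporary folder named temp/ in your project root. The os.makedirs call creates that folder if it doesn’t exist yet.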
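
Writing to `temp/<original name>` means two uploads with the same name overwrite each other. One alternative, sketched here with Python's standard `tempfile` module, lets the operating system pick a unique path. `save_upload` is a hypothetical helper name.

```python
import os
import tempfile


def save_upload(data, suffix=".pdf"):
    """Write uploaded bytes to a uniquely named temp file; return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so the caller is responsible for removing it later
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


path = save_upload(b"%PDF-1.4 example bytes")
print(os.path.exists(path))  # True
os.remove(path)              # clean up, as in Step 7
```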


🔹 Step 5: Extract and Preview the Text

    output_text = pdf_to_text(pdf_path, "temp/converted_text.txt")
    preview_text = output_text[:1000]

    st.subheader('Text Preview:')
    st.text(preview_text)

You’ll get a quick preview of the first 1,000 characters of extracted content, which is super helpful before downloading.
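
Slicing at exactly 1000 characters can cut the preview in the middle of a word. A small sketch of a gentler cut follows, with `make_preview` as a hypothetical helper name.

```python
def make_preview(text, limit=1000):
    """Return the first `limit` characters cut back to a whole word.

    Text that already fits is returned unchanged; shortened text gets
    a trailing "..." so the reader can tell it was truncated.
    """
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last space or newline so no word is split
    last_space = max(cut.rfind(" "), cut.rfind("\n"))
    if last_space > 0:
        cut = cut[:last_space]
    return cut.rstrip() + "..."


print(make_preview("alpha beta gamma delta", limit=12))  # alpha beta...
```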


🔹 Step 6: Enable Text File Download

    st.download_button(
        label="Download full text as .txt",
        data=output_text,
        file_name="converted_text.txt",
        mime="text/plain"
    )

With one click, users can download the converted text as a .txt file.
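
The download is always named converted_text.txt. To name it after the uploaded PDF instead, something like the following could work. `txt_name_for` is a hypothetical helper that relies only on the standard `pathlib` module.

```python
from pathlib import PurePath


def txt_name_for(pdf_name):
    """Turn 'report.pdf' into 'report.txt', dropping any directory part."""
    stem = PurePath(pdf_name).stem
    # Fall back to a generic name if the upload has no usable stem
    return f"{stem or 'converted_text'}.txt"


print(txt_name_for("report.pdf"))        # report.txt
print(txt_name_for("docs/Q3 plan.PDF"))  # Q3 plan.txt
```

The result could then be passed as `file_name=txt_name_for(uploaded_file.name)` in `st.download_button`.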


🔹 Step 7: Clean Up Temporary Files

    os.remove(pdf_path)

This keeps things tidy by deleting the uploaded file after processing.
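
If text extraction raises an error, the `os.remove` call is never reached and the upload stays on disk. Here is a sketch of one way to guarantee cleanup, using a small context manager. `removed_afterwards` is a hypothetical name, and the throwaway file stands in for the uploaded PDF.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def removed_afterwards(path):
    """Yield `path`, then delete the file even if the block raised."""
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


# Demonstration with a throwaway file standing in for the uploaded PDF
fd, demo_path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)

try:
    with removed_afterwards(demo_path):
        raise ValueError("simulated extraction failure")
except ValueError:
    pass

print(os.path.exists(demo_path))  # False
```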


💻 Full Working Code

Here’s the complete script:

import PyPDF2
import os
import streamlit as st

def pdf_to_text(pdf_path, output_path):
    # Read every page of the PDF and collect its text
    with open(pdf_path, 'rb') as pdfobj:
        pdfreader = PyPDF2.PdfReader(pdfobj)
        num_pages = len(pdfreader.pages)
        text = ""
        for i in range(num_pages):
            pageObj = pdfreader.pages[i]
            # Fall back to "" if a page yields no text
            text += pageObj.extract_text() or ""
    # Save a copy of the extracted text to disk
    with open(output_path, 'w', encoding='utf-8') as txtfile:
        txtfile.write(text)
    return text

st.title('PDF to Text Converter')

uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")

if uploaded_file is not None:
    # Make sure the temp/ folder exists, then save the upload there
    os.makedirs("temp", exist_ok=True)
    pdf_path = f"temp/{uploaded_file.name}"
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

    output_text = pdf_to_text(pdf_path, "temp/converted_text.txt")
    preview_text = output_text[:1000]

    st.subheader('Text Preview:')
    st.text(preview_text)

    st.download_button(
        label="Download full text as .txt",
        data=output_text,
        file_name="converted_text.txt",
        mime="text/plain"
    )

    # Delete the uploaded PDF once processing is done
    os.remove(pdf_path)

🔄 Bonus Ideas for Enhancement

Want to take it further? Try these:

  • Add OCR with Tesseract for scanned PDFs.
  • Support multiple files at once.
  • Enable language detection for multilingual documents.
  • Auto-clean formatting or remove line breaks intelligently.
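
For the last idea, a rough starting point is to treat blank lines as paragraph breaks and join everything else. The sketch below is one heuristic among many, not a general solution. `clean_line_breaks` is a hypothetical helper, and its rule for rejoining hyphenated words will also merge genuine hyphenated compounds that happen to break across lines.

```python
import re


def clean_line_breaks(text):
    """Join wrapped lines into paragraphs, keeping blank-line breaks."""
    # Rejoin words that were hyphenated across a line break: "exam-\nple"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on one or more blank lines to find paragraphs
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, collapse every run of whitespace to one space
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


sample = "This is an exam-\nple of wrapped\ntext.\n\nSecond para-\ngraph here."
print(clean_line_breaks(sample))
```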

🧠 Conclusion

And just like that, you’ve built a fully functional PDF to Text Converter using Python and Streamlit!

This tool can be a real time-saver, whether you’re processing legal docs, student handouts, or business PDFs.

👉 Try it out, customize it, and let me know what you’d add next.
Drop your questions in the comments or explore more Python tools on the Ossels AI Blog.

Posted by Ananya Rajeev

Ananya Rajeev is a Kerala-born data scientist and AI enthusiast who simplifies generative and agentic AI for curious minds. B.Tech grad, code lover, and storyteller at heart.