Ever needed to pull the text out of a PDF without wasting time copy-pasting? You’re not alone. From contracts to academic papers, PDFs are everywhere, but editing them is not so easy.
In this guide, I’ll show you how to build a PDF to Text converter using Python and Streamlit. It’s fast, free, and lets you extract plain text from text-based PDF files right in a web browser.
Let’s dive in!
🛠️ Tools You’ll Need
Before we start, make sure you’ve got the following installed:
- Python 3.x
- Streamlit – For the UI
- PyPDF2 – To read and extract text from PDFs
Use this command to install the packages:
pip install streamlit PyPDF2
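Before moving on, it can help to confirm that both packages are importable from the interpreter you plan to use. The snippet below is one possible check, not a required step; missing_packages is a hypothetical helper name and is not part of either library.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names exactly as they are imported later in this guide.
    missing = missing_packages(["streamlit", "PyPDF2"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are available.")
```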
🧩 Step-by-Step Implementation
🔹 Step 1: Import Required Libraries
We’ll start by importing our tools. Add this at the top of your app.py file:
import PyPDF2
import os
import streamlit as st
- PyPDF2 handles PDF reading and text extraction.
- os helps manage temporary file cleanup.
- streamlit powers the web UI.
🔹 Step 2: Create a Function to Convert PDF to Text
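Let’s define the core function that reads a PDF, writes the extracted text to a file, and returns it as a string.
def pdf_to_text(pdf_path, output_path):
    # Read every page of the PDF and collect its text.
    with open(pdf_path, 'rb') as pdfobj:
        pdfreader = PyPDF2.PdfReader(pdfobj)
        text = ""
        for page in pdfreader.pages:
            # Fall back to "" so a page without extractable text cannot break the join.
            text += page.extract_text() or ""
    # Save a copy as UTF-8 so non-ASCII characters are written reliably.
    with open(output_path, 'w', encoding='utf-8') as txtfile:
        txtfile.write(text)
    return text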
📌 Note: PyPDF2 only extracts text that is embedded in the PDF. Scanned, image-only pages will typically come back empty unless you run OCR first.
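Because some pages may yield no text at all, it can be useful to skip them and keep a clear break between the pages that do. The sketch below shows one way to do that. join_page_texts and FakePage are hypothetical names introduced for illustration; the only assumption about a page object is that it has an extract_text() method.

```python
def join_page_texts(pages, separator="\n\n"):
    """Join the text of each page, skipping pages that yield nothing."""
    chunks = []
    for page in pages:
        # Treat None or whitespace-only results as "no text on this page".
        page_text = (page.extract_text() or "").strip()
        if page_text:
            chunks.append(page_text)
    return separator.join(chunks)


class FakePage:
    """Stand-in for a PDF page object, used only to demonstrate the helper."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


if __name__ == "__main__":
    pages = [FakePage("First page."), FakePage(None), FakePage("Third page.")]
    print(join_page_texts(pages))
```

In the converter, you could pass pdfreader.pages in place of the fake pages.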
🔹 Step 3: Build the Streamlit Interface
Now, let’s make things interactive using Streamlit.
st.title('PDF to Text Converter')
uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")
This creates a nice file upload widget. Once a file is uploaded, we’ll process it.
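The type="pdf" argument filters uploads by file extension, so a renamed file can still get through. A light extra check, sketched here as an optional addition, is to look for the marker that PDF files begin with. looks_like_pdf is a hypothetical helper name.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes begin with the PDF header marker."""
    # PDF files start with "%PDF-" followed by a version number, e.g. "%PDF-1.7".
    return data.startswith(b"%PDF-")


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.7\n..."))
    print(looks_like_pdf(b"plain text, not a PDF"))
```

In the app, this could be called with uploaded_file.getvalue(), which returns the upload as bytes, before any processing starts.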
🔹 Step 4: Save and Process the Uploaded PDF
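if uploaded_file is not None:
    # Make sure the temp/ folder exists before writing into it.
    os.makedirs("temp", exist_ok=True)
    # Use only the base name so the upload cannot point outside temp/.
    pdf_path = os.path.join("temp", os.path.basename(uploaded_file.name))
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())
This saves the PDF to a temporary folder named temp/ in your project root. The os.makedirs call creates the folder if it does not exist yet.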
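Saving under the uploaded file's name means two uploads with the same name would overwrite each other. One alternative, sketched here on the assumption that the original name does not need to be kept on disk, is to let Python's tempfile module pick a unique path. save_bytes_to_temp_pdf is a hypothetical helper name.

```python
import os
import tempfile


def save_bytes_to_temp_pdf(data: bytes) -> str:
    """Write `data` to a uniquely named temporary .pdf file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so the caller is responsible for removing it later.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(data)
        return tmp.name


if __name__ == "__main__":
    path = save_bytes_to_temp_pdf(b"%PDF-1.4 demo bytes")
    print("saved to", path)
    os.remove(path)  # clean up the demo file
```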
🔹 Step 5: Extract and Preview the Text
    # Still inside the "if uploaded_file is not None:" block.
    output_text = pdf_to_text(pdf_path, "temp/converted_text.txt")
    preview_text = output_text[:1000]
    st.subheader('Text Preview:')
    st.text(preview_text)
This shows the first 1,000 characters of the extracted text, which is handy for checking the result before downloading.
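A fixed slice can cut the preview in the middle of a word. As a small refinement, the sketch below trims back to the last whitespace and marks the cut. make_preview is a hypothetical helper name, and the " ..." marker is just one choice.

```python
def make_preview(text: str, limit: int = 1000) -> str:
    """Return the start of `text`, cut at a word boundary at or before `limit`.

    Text that already fits is returned unchanged; otherwise " ..." is appended.
    """
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last whitespace so the preview does not end mid-word.
    last_space = max(cut.rfind(" "), cut.rfind("\n"))
    if last_space > 0:
        cut = cut[:last_space]
    return cut.rstrip() + " ..."


if __name__ == "__main__":
    print(make_preview("hello world again", limit=13))
```

In the app, st.text(make_preview(output_text)) could replace the plain slice.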
🔹 Step 6: Enable Text File Download
    st.download_button(
        label="Download full text as .txt",
        data=output_text,
        file_name="converted_text.txt",
        mime="text/plain"
    )
With one click, users can download the converted text as a .txt file.
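Every download is named converted_text.txt, which gets confusing after a few conversions. One option, sketched below, is to derive the download name from the uploaded file's name. text_file_name and its default_stem parameter are hypothetical names.

```python
from pathlib import PurePath


def text_file_name(pdf_name: str, default_stem: str = "converted_text") -> str:
    """Turn an uploaded file name such as 'report.pdf' into 'report.txt'."""
    # .stem drops any directory part and the final extension.
    stem = PurePath(pdf_name).stem or default_stem
    return f"{stem}.txt"


if __name__ == "__main__":
    print(text_file_name("report.pdf"))
```

The result could then be passed as file_name, for example file_name=text_file_name(uploaded_file.name).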
🔹 Step 7: Clean Up Temporary Files
    os.remove(pdf_path)
This keeps things tidy by deleting the uploaded file after processing. Note that temp/converted_text.txt is left on disk and is overwritten by the next conversion.
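If extraction raises an error, a cleanup call placed at the end of the script is never reached. A sketch of one way to make cleanup unconditional is a small context manager built on try/finally. cleanup_files is a hypothetical helper name, and the demo uses a throwaway file in place of a real upload.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def cleanup_files(*paths):
    """Delete every path in `paths` when the `with` block exits."""
    try:
        yield
    finally:
        # This runs whether the block finished normally or raised.
        for path in paths:
            try:
                os.remove(path)
            except FileNotFoundError:
                # The file was never created or is already gone.
                pass


if __name__ == "__main__":
    # Demo with a throwaway file standing in for the uploaded PDF.
    fd, demo_path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    with cleanup_files(demo_path):
        print("exists during processing:", os.path.exists(demo_path))
    print("exists afterwards:", os.path.exists(demo_path))
```

In the app, the extraction, preview, and download steps could sit inside with cleanup_files(pdf_path, "temp/converted_text.txt"): so that both files are removed afterwards.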
💻 Full Working Code
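Here’s the complete script. Save it as app.py and start it with streamlit run app.py:
import os

import PyPDF2
import streamlit as st


def pdf_to_text(pdf_path, output_path):
    # Read every page of the PDF and collect its text.
    with open(pdf_path, 'rb') as pdfobj:
        pdfreader = PyPDF2.PdfReader(pdfobj)
        text = ""
        for page in pdfreader.pages:
            # Fall back to "" so a page without extractable text cannot break the join.
            text += page.extract_text() or ""
    # Save a copy as UTF-8 so non-ASCII characters are written reliably.
    with open(output_path, 'w', encoding='utf-8') as txtfile:
        txtfile.write(text)
    return text


st.title('PDF to Text Converter')
uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")

if uploaded_file is not None:
    # Make sure the temp/ folder exists before writing into it.
    os.makedirs("temp", exist_ok=True)
    # Use only the base name so the upload cannot point outside temp/.
    pdf_path = os.path.join("temp", os.path.basename(uploaded_file.name))
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

    output_text = pdf_to_text(pdf_path, "temp/converted_text.txt")
    preview_text = output_text[:1000]
    st.subheader('Text Preview:')
    st.text(preview_text)

    st.download_button(
        label="Download full text as .txt",
        data=output_text,
        file_name="converted_text.txt",
        mime="text/plain"
    )

    # Delete the uploaded PDF once the text has been extracted.
    os.remove(pdf_path)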

🔄 Bonus Ideas for Enhancement
Want to take it further? Try these:
- Add OCR with Tesseract for scanned PDFs.
- Support multiple files at once.
- Enable language detection for multilingual documents.
- Auto-clean formatting or remove line breaks intelligently.
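As a starting point for the last idea, here is a minimal sketch that rejoins hard-wrapped lines while keeping paragraph breaks. unwrap_lines is a hypothetical helper name, and the rules are deliberately simple: a blank line is treated as a paragraph break, and a hyphen at the end of a line is assumed to split a single word, which will wrongly merge genuine compounds such as "well-known" when they happen to break there.

```python
import re


def unwrap_lines(text: str) -> str:
    """Rejoin hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Rejoin words split by a hyphen at the end of a line: "conver-\nsion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A blank line (possibly containing spaces) separates paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, collapse every run of whitespace to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "This is a conver-\nsion test\nacross lines.\n\nSecond\nparagraph."
    print(unwrap_lines(sample))
```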
🧠 Conclusion
And just like that, you’ve built a fully functional PDF to Text Converter using Python and Streamlit!
This tool can be a real time-saver — whether you’re processing legal docs, student handouts, or business PDFs.
👉 Try it out, customize it, and let me know what you’d add next.
Drop your questions in the comments or explore more Python tools on the Ossels AI Blog.