Ever needed to convert a PDF to text without losing formatting or wasting time copy-pasting? You’re not alone. From contracts to academic papers, PDFs are everywhere, but editing them? Not so easy.
In this guide, I’ll show you exactly how to build a PDF to Text converter using Python and Streamlit. It’s fast, free, and a handy way to extract clean text from text-based PDF files, right from a web browser.
Let’s dive in!
Tools You’ll Need
Before we start, make sure you’ve got the following installed:
- Python 3.x
- Streamlit: for the UI
- PyPDF2: to read and extract text from PDFs
Use this command to install the packages:
pip install streamlit PyPDF2
Step-by-Step Implementation
Step 1: Import Required Libraries
We’ll start by importing our tools. Add this at the top of your app.py file:
import PyPDF2
import os
import streamlit as st
- PyPDF2 handles PDF reading and text extraction.
- os helps manage temporary file cleanup.
- streamlit powers the web UI.
Step 2: Create a Function to Convert PDF to Text
Let’s define the core function that reads a PDF and returns plain text.
def pdf_to_text(pdf_path, output_path):
    # Read every page of the PDF and concatenate the extracted text.
    with open(pdf_path, 'rb') as pdfobj:
        pdfreader = PyPDF2.PdfReader(pdfobj)
        text = ""
        for page in pdfreader.pages:
            text += page.extract_text()
    # Save a copy to disk; UTF-8 avoids encoding errors on non-ASCII text.
    with open(output_path, 'w', encoding='utf-8') as txtfile:
        txtfile.write(text)
    return text
Note: PyPDF2 works best with text-based PDFs (not scanned images).
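Since scanned PDFs usually yield little or no text, one way to warn users is to check how much text each page produced. The sketch below is an assumption on my part and not part of the app itself: `looks_scanned` and its `min_chars` threshold are hypothetical names, and 20 characters is an arbitrary starting point you would want to tune.

```python
def looks_scanned(page_texts, min_chars=20):
    """Return True if most pages produced almost no text.

    page_texts: list of strings (or None), one per page, for example
    collected from each page's extract_text() call.
    min_chars: hypothetical threshold below which a page counts as empty.
    """
    if not page_texts:
        return True  # no pages at all, so there is nothing to extract
    # Count pages whose stripped text is shorter than the threshold.
    empty = sum(1 for t in page_texts if len((t or "").strip()) < min_chars)
    # Flag the document when more than half of its pages look empty.
    return empty > len(page_texts) / 2
```

If this returns True, the app could show a message suggesting OCR instead of presenting an empty preview.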
Step 3: Build the Streamlit Interface
Now, let’s make things interactive using Streamlit.
st.title('PDF to Text Converter')
uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")
This creates a file upload widget. Once a file is uploaded, we’ll process it.
Step 4: Save and Process the Uploaded PDF
if uploaded_file is not None:
    pdf_path = f"temp/{uploaded_file.name}"
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())
This saves the PDF to a temporary folder named temp/. Create that folder in your project root before running the app; otherwise open() raises FileNotFoundError.
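If you would rather not create the folder by hand, a small helper can do it for you. This is a minimal sketch under my own assumptions: `save_upload` is a hypothetical name, and stripping directory parts from the uploaded filename is an extra precaution that the original code does not take.

```python
import os


def save_upload(data, filename, temp_dir="temp"):
    """Write uploaded bytes into temp_dir and return the saved path."""
    # Create the folder on first use instead of requiring it up front.
    os.makedirs(temp_dir, exist_ok=True)
    # Keep only the final path component so a name like "../x.pdf"
    # cannot place the file outside temp_dir.
    safe_name = os.path.basename(filename) or "upload.pdf"
    path = os.path.join(temp_dir, safe_name)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

In the app, this could stand in for the manual open() call, for example `pdf_path = save_upload(uploaded_file.getbuffer(), uploaded_file.name)`.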
Step 5: Extract and Preview the Text
    # Still inside the `if uploaded_file is not None:` block.
    output_text = pdf_to_text(pdf_path, "temp/converted_text.txt")
    preview_text = output_text[:1000]
    st.subheader('Text Preview:')
    st.text(preview_text)
You’ll get a quick preview of the extracted content (the first 1,000 characters), which is helpful before downloading.
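Slicing at exactly 1,000 characters can cut a word in half. Here is one possible way to trim the preview at a word boundary instead; `make_preview` is a hypothetical helper name, not something the app or Streamlit provides.

```python
def make_preview(text, limit=1000):
    """Return at most `limit` characters, trimmed back to a whole word."""
    if len(text) <= limit:
        return text  # short documents are shown in full
    cut = text[:limit]
    # Back up to the last space or newline so a word is not split.
    last_break = max(cut.rfind(" "), cut.rfind("\n"))
    if last_break > 0:
        cut = cut[:last_break]
    # The trailing dots signal that the preview is truncated.
    return cut.rstrip() + "..."
```

You could then call `st.text(make_preview(output_text))` in place of the plain slice.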
Step 6: Enable Text File Download
    st.download_button(
        label="Download full text as .txt",
        data=output_text,
        file_name="converted_text.txt",
        mime="text/plain"
    )
With one click, users can download the converted text as a .txt file.
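Every download is currently named converted_text.txt. If you want the download to match the uploaded file, a small helper along these lines could derive the name. `txt_name_for` is a hypothetical function I am introducing for illustration only.

```python
import os


def txt_name_for(pdf_name, fallback="converted_text.txt"):
    """Turn 'report.pdf' into 'report.txt'; use fallback if no name."""
    # Drop any directory part, then strip the extension.
    stem = os.path.splitext(os.path.basename(pdf_name))[0]
    return f"{stem}.txt" if stem else fallback
```

It would slot into the button as `file_name=txt_name_for(uploaded_file.name)`.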
Step 7: Clean Up Temporary Files
    os.remove(pdf_path)
This keeps things tidy by deleting the uploaded file after processing.
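One caveat: if extraction raises an error, the os.remove line is never reached and the upload stays on disk. A possible pattern, sketched here with a hypothetical `run_then_remove` helper, is to wrap the work in try/finally so the file is deleted either way.

```python
import os


def run_then_remove(path, process):
    """Call process(path), then delete path even if process fails."""
    try:
        return process(path)
    finally:
        # Runs on success and on error; skip if the file is already gone.
        if os.path.exists(path):
            os.remove(path)
```

In the app, that might look like `output_text = run_then_remove(pdf_path, lambda p: pdf_to_text(p, "temp/converted_text.txt"))`, with the separate os.remove call dropped.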
Full Working Code
Here’s the complete script:
import PyPDF2
import os
import streamlit as st


def pdf_to_text(pdf_path, output_path):
    # Read every page of the PDF and concatenate the extracted text.
    with open(pdf_path, 'rb') as pdfobj:
        pdfreader = PyPDF2.PdfReader(pdfobj)
        text = ""
        for page in pdfreader.pages:
            text += page.extract_text()
    # Save a copy to disk; UTF-8 avoids encoding errors on non-ASCII text.
    with open(output_path, 'w', encoding='utf-8') as txtfile:
        txtfile.write(text)
    return text


st.title('PDF to Text Converter')
uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")

if uploaded_file is not None:
    # Save the upload to temp/ (the folder must already exist).
    pdf_path = f"temp/{uploaded_file.name}"
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

    # Extract the text and show the first 1,000 characters.
    output_text = pdf_to_text(pdf_path, "temp/converted_text.txt")
    preview_text = output_text[:1000]
    st.subheader('Text Preview:')
    st.text(preview_text)

    st.download_button(
        label="Download full text as .txt",
        data=output_text,
        file_name="converted_text.txt",
        mime="text/plain"
    )

    # Delete the uploaded PDF now that it has been processed.
    os.remove(pdf_path)

Bonus Ideas for Enhancement
Want to take it further? Try these:
- Add OCR with Tesseract for scanned PDFs.
- Support multiple files at once.
- Enable language detection for multilingual documents.
- Auto-clean formatting or remove line breaks intelligently.
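To give a feel for the last idea, here is a rough sketch of line-break cleanup using only the standard library. `clean_line_breaks` is a hypothetical helper, and the rules are deliberately simple: it also joins genuinely hyphenated compounds that happen to wrap (for example "well-" followed by "known"), so treat it as a starting point and not a finished cleaner.

```python
import re


def clean_line_breaks(text):
    """Unwrap hard-wrapped lines while keeping paragraph breaks."""
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Replace single newlines with a space; blank lines (two newlines
    # in a row) are left alone so paragraphs stay separated.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs left over from the joins.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()
```

You could run the extracted text through it before showing the preview or offering the download.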
Conclusion
And just like that, you’ve built a fully functional PDF to Text Converter using Python and Streamlit!
This tool can be a real time-saver, whether you’re processing legal docs, student handouts, or business PDFs.
Try it out, customize it, and let me know what you’d add next.
Drop your questions in the comments or explore more Python tools on the Ossels AI Blog.