How to Predict Your Salary Using Python and Machine Learning

Learn how to predict your salary using Python and machine learning. Build a full salary estimator with Streamlit, scikit-learn, and real-world data.

Want to predict your salary using Python? In this hands-on machine learning project, weโ€™ll build a salary prediction model using scikit-learn, deploy it with Streamlit, and walk through every line of code โ€” no fluff, just results. In this hands-on project, weโ€™ll train a machine learning model using demographic data, evaluate its performance, and deploy it as a Streamlit app.

This tutorial includes complete code + output for every step โ€” no placeholders, no skipping. Letโ€™s roll. ๐ŸŽฏ


๐Ÿ”ง Part 1: Income Estimator Model in Jupyter Notebook

๐Ÿ› ๏ธ 1. Import Required Libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

import joblib

๐Ÿ“ฅ 2. Load and Inspect the Dataset

df = pd.read_csv("adult_data.csv")
df.head()

Output:

ageworkclassfnlwgteducationeducation-nummarital-statusoccupationrelationshipracesexcapital-gaincapital-losshours-per-weeknative-countrysalary
39State-gov77516Bachelors13Never-marriedAdm-clericalNot-in-familyWhiteMale2174040United-States<=50K

๐Ÿงผ 3. Preprocess the Data

# Drop rows with missing values
df.dropna(inplace=True)

# Encode categorical variables
label_encoders = {}
for column in df.select_dtypes(include='object').columns:
le = LabelEncoder()
df[column] = le.fit_transform(df[column])
label_encoders[column] = le

# Feature matrix and target
X = df.drop('salary', axis=1)
y = df['salary']

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

โœ‚๏ธ 4. Train-Test Split

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

๐Ÿค– 5. Train Logistic Regression Model

model = LogisticRegression()
model.fit(X_train, y_train)

๐Ÿ“Š 6. Evaluate the Model

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

Output:

              precision    recall  f1-score   support

0 0.82 0.91 0.86 7421
1 0.72 0.53 0.61 2339

accuracy 0.80 9760
macro avg 0.77 0.72 0.74 9760
weighted avg 0.79 0.80 0.79 9760

๐Ÿ’พ 7. Save the Model and Scaler

joblib.dump(model, 'salary_prediction_model.pkl')
joblib.dump(scaler, 'scaler.pkl')

Saved in the same directory โ€” ready to be used in the web app.


๐Ÿ–ฅ๏ธ Part 2: Streamlit Web App for Salary Prediction

Create a Python file named income_estimator.py with the following complete code:

import streamlit as st
import joblib
import numpy as np

# Load model and scaler
model = joblib.load("salary_prediction_model.pkl")
scaler = joblib.load("scaler.pkl")

# Define categorical options
workclass_options = ['State-gov', 'Self-emp-not-inc', 'Private', 'Federal-gov', 'Local-gov',
'Self-emp-inc', 'Without-pay', 'Never-worked']
education_options = ['Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college', 'Assoc-acdm',
'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school', '5th-6th', '10th', '1st-4th', 'Preschool', '12th']
marital_status_options = ['Never-married', 'Married-civ-spouse', 'Divorced', 'Married-spouse-absent', 'Separated',
'Married-AF-spouse', 'Widowed']
occupation_options = ['Adm-clerical', 'Exec-managerial', 'Handlers-cleaners', 'Prof-specialty', 'Other-service',
'Sales', 'Craft-repair', 'Transport-moving', 'Farming-fishing', 'Machine-op-inspct']
relationship_options = ['Not-in-family', 'Husband', 'Wife', 'Own-child', 'Unmarried', 'Other-relative']
race_options = ['White', 'Black', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other']
sex_options = ['Male', 'Female']
native_country_options = ['United-States', 'Cuba', 'Jamaica', 'India', 'Mexico', 'South', 'Puerto-Rico', 'Honduras', 'England']

# UI
st.title("๐Ÿ’ผ Salary Prediction App")

age = st.number_input("Age", min_value=18, max_value=90, value=30)
workclass = st.selectbox("Workclass", workclass_options)
education = st.selectbox("Education", education_options)
education_num = st.number_input("Education Number", min_value=1, max_value=16, value=10)
marital_status = st.selectbox("Marital Status", marital_status_options)
occupation = st.selectbox("Occupation", occupation_options)
relationship = st.selectbox("Relationship", relationship_options)
race = st.selectbox("Race", race_options)
sex = st.selectbox("Sex", sex_options)
capital_gain = st.number_input("Capital Gain", min_value=0, value=0)
capital_loss = st.number_input("Capital Loss", min_value=0, value=0)
hours_per_week = st.number_input("Hours Per Week", min_value=1, max_value=100, value=40)
native_country = st.selectbox("Native Country", native_country_options)
fnlwgt = st.number_input("Final Weight (fnlwgt)", min_value=1000, max_value=1000000, value=100000)

# Encode input
input_data = np.array([[age, workclass_options.index(workclass), education_options.index(education),
fnlwgt, education_num, marital_status_options.index(marital_status),
occupation_options.index(occupation), relationship_options.index(relationship),
race_options.index(race), sex_options.index(sex), np.log1p(capital_gain),
np.log1p(capital_loss), hours_per_week, native_country_options.index(native_country)]])

# Scale
input_data = scaler.transform(input_data)

# Predict
if st.button("Predict Salary"):
prediction = model.predict(input_data)
salary_result = "<=50K" if prediction[0] == 0 else ">50K"
st.success(f"๐Ÿงพ Predicted Salary: **{salary_result}**")

๐Ÿ“ฆ How to Run the App

streamlit run income_estimator.py

This will launch your app in a browser like this:


๐Ÿง  Final Thoughts

Youโ€™ve now:

  • Cleaned and preprocessed tabular data
  • Trained a logistic regression model
  • Deployed it in a production-ready Streamlit app

This project makes a great starting point for more complex income prediction or HR analytics systems.


๐Ÿ’ฌ Whatโ€™s Next?

๐Ÿ‘‰ Want to take this further with XGBoost, SHAP explanations, or database integration?

Comment below or contact Ossels AI โ€” we build ML tools that solve real problems.

Posted by Ananya Rajeev

Ananya Rajeev is a Kerala-born data scientist and AI enthusiast who simplifies generative and agentic AI for curious minds. B.Tech grad, code lover, and storyteller at heart.