Want to predict salary with Python? In this hands-on machine learning project, we'll train a scikit-learn model on demographic data to predict whether a person's income falls above or below the 50K threshold, evaluate its performance, and deploy it as a Streamlit app.
This tutorial includes complete code and output for every step: no placeholders, no skipping. Let's roll.
Part 1: Income Estimator Model in Jupyter Notebook
1. Import Required Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import joblib
2. Load and Inspect the Dataset
df = pd.read_csv("adult_data.csv")
df.head()
Output:
| age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
3. Preprocess the Data
# Drop rows with missing values
df.dropna(inplace=True)
# Encode categorical variables
label_encoders = {}
for column in df.select_dtypes(include='object').columns:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    # Keep each fitted encoder so the same mapping can be reused later
    label_encoders[column] = le
# Feature matrix and target
X = df.drop('salary', axis=1)
y = df['salary']
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
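One caveat on the missing-value step: `dropna` only removes values pandas already recognizes as missing. Some copies of the Adult census data mark unknowns with a `?` and pad each field with a leading space; whether `adult_data.csv` does so is an assumption worth checking. A minimal sketch of how that case could be handled, using a tiny made-up inline sample rather than the real file:

```python
import io
import pandas as pd

# Hypothetical three-row sample imitating a file that marks unknowns with "?"
sample_csv = io.StringIO(
    "age,workclass,salary\n"
    "39, State-gov, <=50K\n"
    "54, ?, >50K\n"
    "28, Private, <=50K\n"
)

# skipinitialspace strips the padding after each comma;
# na_values tells pandas to read "?" as NaN so dropna can see it
sample = pd.read_csv(sample_csv, na_values="?", skipinitialspace=True)
cleaned = sample.dropna()

print(len(sample), len(cleaned))
print(cleaned["workclass"].tolist())
```

If the real file uses this convention, passing the same two arguments to the `pd.read_csv` call in step 2 would let `dropna` remove the affected rows.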
4. Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
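Because the scaler in step 3 is fitted on every row before the split, the test rows contribute to the scaling statistics. The effect is usually small, but one common way to rule it out is to let a scikit-learn `Pipeline` fit the scaler on the training rows only. A minimal sketch on synthetic data (the `make_classification` data and the `*_demo` names are stand-ins, not the article's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 14 encoded census features
X_demo, y_demo = make_classification(n_samples=500, n_features=14, random_state=42)

# Split first; stratify keeps the class ratio the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on X_tr only, then reuses it when scoring X_te
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_tr, y_tr)

test_accuracy = pipeline.score(X_te, y_te)
print(f"{test_accuracy:.2f}")
```

A fitted pipeline can also be saved with `joblib.dump` as a single object, which keeps the scaler and the model together.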
5. Train Logistic Regression Model
model = LogisticRegression()
model.fit(X_train, y_train)
6. Evaluate the Model
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
Output:
              precision    recall  f1-score   support

           0       0.82      0.91      0.86      7421
           1       0.72      0.53      0.61      2339

    accuracy                           0.80      9760
   macro avg       0.77      0.72      0.74      9760
weighted avg       0.79      0.80      0.79      9760
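The report's columns are linked by simple arithmetic, which makes it easy to sanity-check. The sketch below recomputes F1 from the printed precision and recall, and works out from the support counts what accuracy you would get by always predicting the majority class. The helper name `f1_from` is only for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall copied from the report above
f1_class_0 = round(f1_from(0.82, 0.91), 2)
f1_class_1 = round(f1_from(0.72, 0.53), 2)

# Accuracy of a model that always predicts class 0 (<=50K)
support_0, support_1 = 7421, 2339
majority_baseline = round(support_0 / (support_0 + support_1), 2)

print(f1_class_0, f1_class_1, majority_baseline)  # 0.86 0.61 0.76
```

Against a majority-class baseline of about 0.76, an accuracy of 0.80 is a modest gain, and the 0.53 recall on class 1 means the model misses nearly half of the >50K cases.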
7. Save the Model, Scaler, and Encoders
joblib.dump(model, 'salary_prediction_model.pkl')
joblib.dump(scaler, 'scaler.pkl')
# The app needs the fitted encoders to reproduce the training-time category codes
joblib.dump(label_encoders, 'label_encoders.pkl')
All three files are saved in the same directory as the notebook, ready to be used in the web app.
Part 2: Streamlit Web App for Salary Prediction
Create a Python file named income_estimator.py with the following complete code:
import streamlit as st
import joblib
import pandas as pd

# Load the model, scaler, and label encoders saved in Part 1
model = joblib.load("salary_prediction_model.pkl")
scaler = joblib.load("scaler.pkl")
label_encoders = joblib.load("label_encoders.pkl")

def category_select(label, column):
    """Offer exactly the categories the encoder saw during training."""
    return st.selectbox(label, label_encoders[column].classes_.tolist())

# UI
st.title("Salary Prediction App")
age = st.number_input("Age", min_value=18, max_value=90, value=30)
workclass = category_select("Workclass", "workclass")
education = category_select("Education", "education")
education_num = st.number_input("Education Number", min_value=1, max_value=16, value=10)
marital_status = category_select("Marital Status", "marital-status")
occupation = category_select("Occupation", "occupation")
relationship = category_select("Relationship", "relationship")
race = category_select("Race", "race")
sex = category_select("Sex", "sex")
capital_gain = st.number_input("Capital Gain", min_value=0, value=0)
capital_loss = st.number_input("Capital Loss", min_value=0, value=0)
hours_per_week = st.number_input("Hours Per Week", min_value=1, max_value=100, value=40)
native_country = category_select("Native Country", "native-country")
fnlwgt = st.number_input("Final Weight (fnlwgt)", min_value=1000, max_value=1000000, value=100000)

# Build one row in the same column order the scaler and model were trained on
input_df = pd.DataFrame([{
    "age": age, "workclass": workclass, "fnlwgt": fnlwgt, "education": education,
    "education-num": education_num, "marital-status": marital_status,
    "occupation": occupation, "relationship": relationship, "race": race, "sex": sex,
    "capital-gain": capital_gain, "capital-loss": capital_loss,
    "hours-per-week": hours_per_week, "native-country": native_country,
}])

# Encode categories with the training-time encoders. Capital gain and loss stay
# untransformed because the model was trained on their raw values.
for column, encoder in label_encoders.items():
    if column in input_df.columns:
        input_df[column] = encoder.transform(input_df[column])

# Scale
input_scaled = scaler.transform(input_df)

# Predict
if st.button("Predict Salary"):
    prediction = model.predict(input_scaled)
    # Map the predicted class code back to its original label
    salary_result = str(label_encoders["salary"].inverse_transform(prediction)[0]).strip()
    st.success(f"Predicted Salary: **{salary_result}**")
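The integer a category maps to in the app has to be the one it was given during training. `LabelEncoder` numbers classes in sorted order, not in order of appearance, so a position in a hand-written list is not a safe substitute for the encoder's own mapping. A small sketch with made-up values (the variable names are illustrative):

```python
from sklearn.preprocessing import LabelEncoder

# Values in the order they might appear in a training column
seen_in_data = ["State-gov", "Private", "Self-emp-not-inc", "Private"]
encoder = LabelEncoder().fit(seen_in_data)

# classes_ is sorted, so the code for a label is its position in sorted order
classes = encoder.classes_.tolist()
code_for_state_gov = int(encoder.transform(["State-gov"])[0])

# A hand-written option list in a different order gives a different index
hand_written_options = ["State-gov", "Self-emp-not-inc", "Private"]
list_position = hand_written_options.index("State-gov")

print(classes)
print(code_for_state_gov, list_position)
```

Here the encoder assigns `State-gov` the code 2 while the list position is 0, which is why reusing the fitted encoders is the safer route.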
How to Run the App
streamlit run income_estimator.py
This will launch your app in a browser tab.

Final Thoughts
You've now:
- Cleaned and preprocessed tabular data
- Trained a logistic regression model
- Deployed it as an interactive Streamlit app
This project makes a great starting point for more complex income prediction or HR analytics systems.
What's Next?
Want to take this further with XGBoost, SHAP explanations, or database integration?
Comment below or contact Ossels AI. We build ML tools that solve real problems.