Ridge vs Lasso Regression on Boston Housing – Which Works Better?

Compare Ridge and Lasso regression on the Boston Housing dataset and learn which works better for predicting house prices, with step-by-step Python code and visuals.

Want to predict housing prices and shrink your model smartly? In this hands-on guide, we’ll dive into the famous Boston Housing dataset and show you how to analyze it using Ridge and Lasso regression — two powerful tools that help you avoid overfitting and make better predictions.

Whether you’re brand new to machine learning or just need a clean example, this tutorial is for you. We’ll walk through everything from loading the data to tuning hyperparameters and visualizing results.

Let’s roll.


🧠 What Are Ridge and Lasso Regression?

Plain linear regression fits the training data as closely as it can, giving weight to every feature, even noisy or irrelevant ones. That's a recipe for overfitting.

That’s where Ridge and Lasso step in:

  • Ridge Regression adds a penalty for large weights, shrinking coefficients toward zero (but not exactly zero).
  • Lasso Regression does the same, but more aggressively: it can eliminate irrelevant features entirely (see the quick numeric sketch below).

Think of them like personal trainers for your model:

  • Ridge says “trim the fat.”
  • Lasso says “cut the junk.”
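
To make those penalties concrete, here is a tiny numeric sketch. The coefficient values are made up for illustration; they are not taken from the Boston model.

import numpy as np

beta = np.array([2.0, -0.5, 0.0, 1.5])   # toy coefficients
alpha = 1.0                               # penalty strength

ridge_penalty = alpha * np.sum(beta ** 2)      # L2 penalty: sum of squared coefficients
lasso_penalty = alpha * np.sum(np.abs(beta))   # L1 penalty: sum of absolute coefficients

print(ridge_penalty)   # 6.5
print(lasso_penalty)   # 4.0

Both terms get added to the usual squared error during training; the absolute-value version is what lets Lasso push a weak coefficient all the way to zero.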

🧰 Step 1: Import Your Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We’re using:

  • pandas to handle data
  • numpy for math
  • matplotlib for beautiful plots later

🏡 Step 2: Load the Boston Housing Dataset

The Boston Housing dataset was deprecated in scikit-learn 1.0 and removed in 1.2, but it's still a handy teaching example. On an older scikit-learn version you can load it directly (a workaround for newer versions follows below):

from sklearn.datasets import load_boston

boston = load_boston()
data = pd.DataFrame(boston.data, columns=boston.feature_names)
data['Price'] = boston.target

Output:

RM      CRIM    ZN      LSTAT   Price
6.57    0.02    18.0    5.3     24.0
  • RM = average number of rooms
  • LSTAT = % lower status population
  • Price = target variable (in $1000s)
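
On scikit-learn 1.2 or newer, load_boston no longer exists. A minimal fallback, assuming the original CMU StatLib mirror (http://lib.stat.cmu.edu/datasets/boston) is still reachable, is to parse the raw file yourself:

# Fallback for scikit-learn >= 1.2, where load_boston was removed
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)

# Each record is spread across two physical lines in the raw file
features = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
           'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
data = pd.DataFrame(features, columns=columns)
data['Price'] = target

Either route gives you the same DataFrame, so the rest of the tutorial is unchanged.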

🧹 Step 3: Clean & Prepare Data

print(data.head())
print(data.isnull().sum())

This checks:

  • First few rows
  • Missing values (should be 0 across the board)

🧪 Step 4: Separate Features and Target

Let’s extract X (features) and y (target):

X = data.drop(columns='Price')
y = data['Price']

Boom. Done. Now it’s model time.
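
As a quick sanity check before modeling, the Boston data should come out to 506 rows and 13 feature columns:

print(X.shape)   # (506, 13) – 506 homes, 13 features
print(y.shape)   # (506,)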


🔢 Step 5: Linear Regression (The Baseline)

Let’s try plain linear regression first — no penalties.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

lin_model = LinearRegression()

# scikit-learn returns *negative* MSE so that higher scores are always better
neg_mse_scores = cross_val_score(lin_model, X, y, scoring='neg_mean_squared_error', cv=5)
mean_neg_mse = np.mean(neg_mse_scores)
print(f"Linear Regression Mean Neg. MSE: {mean_neg_mse}")

Output Example:

Linear Regression Mean Neg. MSE: -34.23

This will be our baseline for comparison.
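
If negative MSE feels abstract, you can convert the same scores to RMSE, which is in the target's own units ($1000s). This is an optional extra, not part of the original recipe:

rmse_scores = np.sqrt(-neg_mse_scores)   # flip the sign back, then take the square root
print(f"Linear Regression Mean RMSE: {rmse_scores.mean():.2f} (in $1000s)")

A mean CV MSE around 34 works out to an RMSE of roughly 5.9, i.e. the baseline is off by about $5,900 per house on average.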


🧗 Step 6: Ridge Regression with Grid Search

We’ll use GridSearchCV to find the best alpha (penalty strength).

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

ridge = Ridge()
params_ridge = {'alpha': np.logspace(-3, 3, 13)} # Try values from 0.001 to 1000
ridge_cv = GridSearchCV(ridge, params_ridge, scoring='neg_mean_squared_error', cv=5)
ridge_cv.fit(X, y)

print(f"Best Ridge Alpha: {ridge_cv.best_params_['alpha']}")
print(f"Best Ridge Score: {ridge_cv.best_score_}")

Output Example:

Best Ridge Alpha: 10.0
Best Ridge Score: -29.87

Ridge is doing better already — smaller error means tighter predictions!
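
If you want to see the shrinkage rather than take it on faith, peek at the coefficients of the best estimator (GridSearchCV refits it on the full dataset by default):

best_ridge = ridge_cv.best_estimator_
ridge_coefs = pd.Series(best_ridge.coef_, index=X.columns)
print(ridge_coefs.sort_values())   # shrunken toward zero, but none exactly zero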


✂️ Step 7: Lasso Regression with Grid Search

Lasso is up next — let’s see if it can outshine Ridge.

from sklearn.linear_model import Lasso

lasso = Lasso(max_iter=10000)
params_lasso = {'alpha': np.logspace(-3, 3, 13)}
lasso_cv = GridSearchCV(lasso, params_lasso, scoring='neg_mean_squared_error', cv=5)
lasso_cv.fit(X, y)

print(f"Best Lasso Alpha: {lasso_cv.best_params_['alpha']}")
print(f"Best Lasso Score: {lasso_cv.best_score_}")

Output Example:

Best Lasso Alpha: 0.01
Best Lasso Score: -29.55

Sweet! Lasso not only shrinks, but might completely zero out some weak features.
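
You can check whether that actually happened at the chosen alpha by listing the coefficients Lasso drove exactly to zero (at a small alpha like 0.01 it may keep every feature):

best_lasso = lasso_cv.best_estimator_
lasso_coefs = pd.Series(best_lasso.coef_, index=X.columns)
dropped = lasso_coefs[lasso_coefs == 0].index.tolist()
print(f"Features Lasso eliminated: {dropped}")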


📊 Step 8: Visualize Ridge vs. Lasso Performance

Time for a side-by-side comparison across alphas.

ridge_results = pd.DataFrame(ridge_cv.cv_results_)
lasso_results = pd.DataFrame(lasso_cv.cv_results_)

plt.figure(figsize=(10, 5))
# Negate the stored scores so the y-axis shows plain MSE (lower is better)
plt.plot(params_ridge['alpha'], -ridge_results['mean_test_score'], label='Ridge')
plt.plot(params_lasso['alpha'], -lasso_results['mean_test_score'], label='Lasso')
plt.xscale('log')
plt.xlabel('Alpha')
plt.ylabel('Mean MSE')
plt.title('Ridge vs Lasso Regression Performance on Boston Housing')
plt.legend()
plt.grid(True)
plt.show()

You’ll likely see:

  • Ridge's error curve changes gradually as alpha grows
  • Lasso's curve stays low at small alphas, then climbs steeply once the stronger penalty starts wiping out useful coefficients

🧾 Summary Table

Model    Best Alpha    Mean CV Score (neg. MSE)    Feature Elimination
Linear   N/A           -34.23                      No
Ridge    10.0          -29.87                      No
Lasso    0.01          -29.55                      Yes (can zero out coefficients)

✅ Final Thoughts

This project showed how regularization improves prediction and can even simplify your model.

  • Use Ridge when all features might matter.
  • Use Lasso when you want to shrink and prune unnecessary features.
  • Always use cross-validation to tune hyperparameters!
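
If you would rather not wire up GridSearchCV yourself, scikit-learn's RidgeCV and LassoCV bundle the alpha search into the estimator. Here is a minimal sketch using the same alpha grid as above; the selected alphas may not match the grid-search results exactly, since the built-in selection criteria differ slightly:

from sklearn.linear_model import RidgeCV, LassoCV

alphas = np.logspace(-3, 3, 13)

ridge_auto = RidgeCV(alphas=alphas, cv=5).fit(X, y)
lasso_auto = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X, y)

print(f"RidgeCV picked alpha = {ridge_auto.alpha_}")
print(f"LassoCV picked alpha = {lasso_auto.alpha_}")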

📚 Learn More with Ossels AI

If you enjoyed this tutorial, you’ll love our other hands-on AI projects:

👉 How to Predict Your Salary Using Python and Machine Learning
👉 Build a Bitcoin Price Predictor with LSTM
👉 Ultimate Guide to Generative AI Tools in 2025


💬 Got Questions?

Drop your comments below or reach out via Ossels AI. We’d love to see what you’re building with Ridge and Lasso!

Posted by Ananya Rajeev

Ananya Rajeev is a Kerala-born data scientist and AI enthusiast who simplifies generative and agentic AI for curious minds. B.Tech grad, code lover, and storyteller at heart.