EDA Case Study: House Price

Task Description

House Prices is a classic Kaggle competition. The task is to predict the final sale price of each house. For more detail, refer to https://www.kaggle.com/c/house-prices-advanced-regression-techniques/.

Goal of this notebook

Because this is a well-known competition, there are many excellent analyses of how to do EDA and build models for this task. See https://www.kaggle.com/khandelwallaksya/house-prices-eda for a reference. In this notebook, we show how dataprep.eda can simplify the EDA process to a few lines of code.

The analysis follows these steps:

* Understand the problem. We’ll look at each variable and reason about its meaning and importance for this problem.
* Univariate study. We’ll focus on the dependent variable (‘SalePrice’) and try to learn a little more about it.
* Multivariate study. We’ll try to understand how the dependent variable and the independent variables relate.
* Basic cleaning. We’ll clean the dataset and handle missing data, outliers, and categorical variables.
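To make these steps concrete, here is a minimal pandas-only sketch of the quantities each one looks at. It is an illustration, not the workflow used in this notebook: the `houses` frame is a toy stand-in with made-up values (only the column names come from the competition data), and filling gaps with the median and one-hot encoding are assumed choices for the cleaning step. The dataprep.eda functions imported in the next cell are meant to produce this kind of summary as plots.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the House Prices training data; all values are made up.
houses = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
    "GrLivArea": [1710, 1262, 1786, 1717, 2198, 1362],
    "LotFrontage": [65.0, 80.0, 68.0, 60.0, np.nan, 85.0],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor", "NoRidge", "Mitchel"],
})

# Univariate study: summary statistics and skewness of the dependent variable.
target_summary = houses["SalePrice"].describe()
target_skew = houses["SalePrice"].skew()

# Multivariate study: Pearson correlation of each numeric feature with the target.
numeric = houses.select_dtypes(include="number")
correlations = numeric.corr()["SalePrice"].drop("SalePrice")

# Basic cleaning, part 1: share of missing values per column.
missing_share = houses.isna().mean()

# Basic cleaning, part 2 (assumed strategy): fill numeric gaps with the median
# and one-hot encode the categorical column.
cleaned = houses.copy()
cleaned["LotFrontage"] = cleaned["LotFrontage"].fillna(cleaned["LotFrontage"].median())
encoded = pd.get_dummies(cleaned, columns=["Neighborhood"])

print(target_summary)
print("skewness:", target_skew)
print(correlations)
print(missing_share)
```

On the real data, the same calls apply unchanged once `houses` holds the competition's training set.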

Import libraries

[1]:
# dataprep.eda plotting helpers and the bundled dataset loader
from dataprep.eda import plot, plot_correlation, plot_missing
from dataprep.datasets import load_dataset

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
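# Configure seaborn in a single call: a second sns.set() call would reset
# any argument it does not receive (such as style) to its default.
sns.set(style="whitegrid", color_codes=True, font_scale=1)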