Datasets

DataPrep provides a collection of datasets. You can load any of them with one line of code and use it to explore DataPrep's functionality.

List Available Datasets

You can list the names of all available datasets by calling get_dataset_names, as shown below.

[1]:
from dataprep.datasets import get_dataset_names
get_dataset_names()
[1]:
['covid19',
 'wine-quality-red',
 'iris',
 'waste_hauler',
 'countries',
 'patient_info',
 'house_prices_train',
 'adult',
 'house_prices_test',
 'titanic']
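
Judging by the output above, get_dataset_names returns a plain Python list of strings, so it can be filtered like any other list. The following is a minimal sketch under that assumption: find_datasets is a hypothetical helper (not part of DataPrep), and names is copied from the output above rather than fetched live.

```python
# Names copied from the output above; in a live session you would
# obtain them with get_dataset_names() instead.
names = [
    "covid19", "wine-quality-red", "iris", "waste_hauler", "countries",
    "patient_info", "house_prices_train", "adult", "house_prices_test",
    "titanic",
]


def find_datasets(names, keyword):
    """Hypothetical helper: return the names containing `keyword`, sorted."""
    return sorted(name for name in names if keyword in name)


house_sets = find_datasets(names, "house_prices")
print(house_sets)
```

This can be handy for picking out related datasets, such as a train/test pair.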

Load Dataset

Once you know the available dataset names from get_dataset_names, you can load a dataset by calling load_dataset.

[2]:
from dataprep.datasets import load_dataset
df = load_dataset("titanic")
df
[2]:
     PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0    1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1    2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2    3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3    4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4    5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
...  ...          ...       ...     ...                                                ...     ...   ...    ...    ...               ...      ...    ...
886  887          0         2       Montvila, Rev. Juozas                              male    27.0  0      0      211536            13.0000  NaN    S
887  888          1         1       Graham, Miss. Margaret Edith                       female  19.0  0      0      112053            30.0000  B42    S
888  889          0         3       Johnston, Miss. Catherine Helen "Carrie"           female  NaN   1      2      W./C. 6607        23.4500  NaN    S
889  890          1         1       Behr, Mr. Karl Howell                              male    26.0  0      0      111369            30.0000  C148   C
890  891          0         3       Dooley, Mr. Patrick                                male    32.0  0      0      370376            7.7500   NaN    Q

891 rows × 12 columns
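
This page does not say what load_dataset does when given a name that is not in the list, so one option is to check the name first. The sketch below is an illustration, not DataPrep's own behavior: load_checked is a hypothetical wrapper, and the listing and loading functions are passed in as arguments so the helper itself has no dependencies.

```python
def load_checked(name, list_names, loader):
    """Hypothetical wrapper: validate `name` against the available names
    before delegating to `loader`.

    list_names: a zero-argument callable returning the available names
                (for example, get_dataset_names).
    loader:     a callable that loads a dataset by name
                (for example, load_dataset).
    """
    available = list_names()
    if name not in available:
        options = ", ".join(sorted(available))
        raise ValueError(f"Unknown dataset {name!r}; choose one of: {options}")
    return loader(name)


# Usage with DataPrep (requires the dataprep package to be installed):
# from dataprep.datasets import get_dataset_names, load_dataset
# df = load_checked("titanic", get_dataset_names, load_dataset)
```

Failing early with the list of valid names makes a typo in the dataset name easier to spot.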

Analyze Dataset

Once the dataset is loaded, you can explore it with DataPrep. For example, you can create a profiling report of the dataset using dataprep.eda.

[3]:
from dataprep.eda import create_report
report = create_report(df)
report