clean_ml()
The function clean_ml() cleans a dataset for downstream machine learning tasks using commonly used operators. It handles categorical columns and numerical columns separately. The default cleaning pipeline follows the conventions of existing tools.
Currently, the supported components and operators are listed below:

* cat_encoding: encoding categorical columns
    * no_encoding
    * one_hot
* cat_imputation: imputing missing values in categorical columns
    * constant
    * most_frequent
    * drop
* num_imputation: imputing missing values in numerical columns
    * mean
    * median
    * most_frequent
    * drop
* num_scaling: scaling numerical columns
    * standardize
    * minmax
    * maxabs
* variance_threshold: dropping numerical columns with low variance
Users can also specify include_operators and exclude_operators to include or exclude specific operators from the list above. Users can also customize the pipeline with user-defined operators.
The example dataset is the classic adult dataset. It has 48842 rows and 15 columns. In this dataset, '?' denotes a missing value.
[4]:
import pandas as pd

pd.set_option('display.min_rows', 30)
df = pd.read_csv('adult.csv')
df
48842 rows × 15 columns
[5]:
training_rate = 0.7
index = df.index
number_of_rows = len(index)
training_df = df.iloc[:int(training_rate * number_of_rows), :]
test_df = df.iloc[int(training_rate * number_of_rows):, :]
[6]:
training_df
34189 rows × 15 columns
[7]:
test_df
14653 rows × 15 columns
By default, the cleaning pipeline of the clean_ml() function is:

* For categorical columns: constant imputation -> one-hot encoding
* For numerical columns: mean imputation -> standardization
The default NULL values are: {np.nan, float("NaN"), "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null", "", None}
The default filling value for categorical columns is 'missing_value'.
[8]:
from dataprep.clean import clean_ml

cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class")
[9]:
cleaned_training_df
[10]:
cleaned_test_df
cat_imputation and cat_null_value
There are three choices for the cat_imputation parameter:

* constant: fill missing values with a constant value. The default is 'missing_value'.
* most_frequent: fill missing values with the most frequent value of the column.
* drop: drop a column if it contains missing values.
The cat_null_value parameter is a list of user-specified null values. The elements of this list can be of any type. For example:

* ['?']
* ['abc', np.nan, '?', 1265]
By default, the specified missing values are replaced by "missing_value".
[18]:
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    target="class",
    cat_imputation="constant",
    cat_encoding="no_encoding",
    cat_null_value=['?'],
)
[19]:
cleaned_training_df
[20]:
cleaned_test_df
[21]:
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    target="class",
    cat_imputation="most_frequent",
    cat_encoding="no_encoding",
    cat_null_value=['?'],
)
[22]:
cleaned_training_df
[23]:
cleaned_test_df
[24]:
# with cat_imputation="drop", categorical columns containing null values are removed
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    target="class",
    cat_imputation="drop",
    cat_encoding="no_encoding",
    cat_null_value=['?'],
)
[25]:
cleaned_training_df
34189 rows × 12 columns
[26]:
cleaned_test_df
14653 rows × 12 columns
fill_val
By default, the filling value for categorical missing values is "missing value". However, users can specify any string they like, such as "missing", "NaN", "I'm a cat.", or "Fyodor Dostoyevsky".
[30]:
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    target="class",
    cat_null_value=['?'],
    cat_encoding="no_encoding",
    fill_val="AHAHAHAHAHA!!!",
)
[31]:
cleaned_training_df
[32]:
cleaned_test_df
num_imputation and num_null_value
There are four choices for the num_imputation parameter:

* mean: fill missing values with the mean value of the column.
* median: fill missing values with the median value of the column.
* most_frequent: fill missing values with the most frequent value of the column.
* drop: drop a column if it contains missing values.
The default null values are the same as the null values mentioned in the cat_imputation parameter section. The imputation process is also quite similar to that of the cat_imputation parameter, so we don't repeat those examples here.
The num_null_value parameter is a list of user-specified null values. The elements of this list can be of any type. For example:

* ['?']
* ['abc', np.nan, '?', 1265]

The usage of the num_null_value parameter is the same as that of cat_null_value; a minimal sketch is shown below.
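A minimal sketch, by analogy with the cat_imputation examples above (the parameter values here are illustrative, not taken from the original guide):

# hypothetical example: median imputation with a custom null token
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    target="class",
    num_imputation="median",
    num_null_value=['?'],
    cat_encoding="no_encoding",
)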
cat_encoding
There are two choices for the cat_encoding parameter:

* no_encoding: do not apply any encoding to categorical columns.
* one_hot: apply one-hot encoding to categorical columns.

The default value is one_hot.
[36]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class", cat_encoding="no_encoding")
[37]:
cleaned_training_df
[38]:
cleaned_test_df
[39]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class", cat_encoding="one_hot")
[40]:
cleaned_training_df
[41]:
cleaned_test_df
variance_threshold and variance
There are two choices for the variance_threshold parameter:

* True: filter out numerical columns whose variance is less than the variance value.
* False: do nothing.
The default variance_threshold is False.
The default variance is 0.0.
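Conceptually, this step drops every numerical column whose variance falls below the variance value. A minimal pandas sketch of the idea (an illustration only, not the actual clean_ml implementation):

# select the numerical columns, then keep those whose variance is at least 6.0
numeric = training_df.select_dtypes("number")
kept = numeric.loc[:, numeric.var() >= 6.0]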
[42]:
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    target="class",
    variance_threshold=True,
    variance=6.0,
)
[43]:
cleaned_training_df
34189 rows × 11 columns
[44]:
cleaned_test_df
14653 rows × 11 columns
num_scaling
There are three choices for the num_scaling parameter:

* standardize: standardize each numerical column using the mean and standard deviation of the column. The transformation is (x - mean) / std.
* minmax: scale each numerical column using the min and max values of the column. The transformation is (x - min) / (max - min).
* maxabs: scale each numerical column using the maximum absolute value of the column. The transformation is x / maxabs.
The default num_scaling is standardize.
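To make the three formulas concrete, here is a small standalone pandas sketch (independent of clean_ml) that applies each transformation to a toy column:

import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

standardized = (s - s.mean()) / s.std()       # (x - mean) / std
minmax = (s - s.min()) / (s.max() - s.min())  # (x - min) / (max - min)
maxabs = s / s.abs().max()                    # x / maxabs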
[55]:
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    target="class",
    cat_encoding='no_encoding',
    num_scaling="minmax",
)
[56]:
cleaned_training_df
[57]:
cleaned_test_df
include_operators and exclude_operators
The include_operators parameter indicates which operators must be included in the cleaning pipeline. It is a list. For example:

* ['one_hot', 'minmax', 'median', 'most_frequent']
The exclude_operators parameter indicates which operators must be excluded from the cleaning pipeline. It has the same format as include_operators.
The valid choices for include_operators and exclude_operators are:

* one_hot
* constant
* most_frequent
* drop
* mean
* median
* standardize
* minmax
* maxabs
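A rough sketch of how these parameters might be passed, following the call pattern of the earlier examples (the operator list here is illustrative):

# hypothetical example: restrict the pipeline to these operators
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    target="class",
    include_operators=['one_hot', 'minmax', 'median', 'most_frequent'],
)

exclude_operators takes a list in the same format.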
customized_cat_pipeline and customized_num_pipeline
Experienced users can specify their own customized_cat_pipeline and customized_num_pipeline. The two parameters are lists of dictionaries, one per component. Each component is a dictionary containing the name of the specified operator and its related parameters. For example:

[
    {"cat_imputation": {"operator": 'constant', "cat_null_value": ['?'], "fill_val": "Hahahaha!!!!!"}},
]
Users can also specify their own operators. They just need to define a class with an __init__ function and fit, transform, and fit_transform methods. The name of the class can then be put in the operator's position.
[58]:
from typing import Any

import pandas as pd

class MaxAbsScaler:
    """Scale a column by its maximum absolute value."""

    def __init__(self) -> None:
        self.name = "maxabsScaler"

    def fit(self, df: pd.Series) -> Any:
        # remember the maximum absolute value of the column
        self.maxabs = df.abs().max()
        return self

    def transform(self, df: pd.Series) -> pd.Series:
        return df.map(self.compute_val)

    def fit_transform(self, df: pd.Series) -> pd.Series:
        return self.fit(df).transform(df)

    def compute_val(self, val: float) -> float:
        return val / self.maxabs

customized_cat_pipeline = [
    {"cat_imputation": {"operator": 'constant', "cat_null_value": ['?'], "fill_val": "Hahahaha!!!!!"}},
]
customized_num_pipeline = [
    {"num_scaling": {"operator": MaxAbsScaler}},
]
cleaned_training_df, cleaned_test_df = clean_ml(
    training_df,
    test_df,
    customized_cat_pipeline=customized_cat_pipeline,
    customized_num_pipeline=customized_num_pipeline,
)
[59]:
cleaned_training_df
[60]:
cleaned_test_df