clean_ml(): Clean a dataset for downstream machine learning tasks
Introduction
The function clean_ml() cleans a dataset for downstream machine learning tasks using commonly used operators. It handles categorical columns and numerical columns separately. The default cleaning pipeline follows the conventions of existing tools.
Currently, the supported components and operators are listed below:
* cat_encoding: encoding categorical columns
  * no_encoding
  * one_hot
* cat_imputation: imputing missing values in categorical columns
  * constant
  * most_frequent
  * drop
* num_imputation: imputing missing values in numerical columns
  * mean
  * median
  * most_frequent
  * drop
* num_scaling: scaling numerical columns
  * standardize
  * minmax
  * maxabs
* variance_threshold: dropping numerical columns with low variance
Users can also specify include_operators and exclude_operators to include or exclude specific operators from the list above. Users can also customize the pipeline with user-defined operators.
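As a rough illustration of what the three num_scaling operators compute, the sketch below uses the usual textbook definitions in plain Python. It is an assumption that clean_ml() follows these exact formulas (for example, a population standard deviation for standardize); the helper functions here are hypothetical and are not part of dataprep.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


def minmax(values):
    """Rescale linearly so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def maxabs(values):
    """Divide by the largest absolute value so results lie in [-1, 1]."""
    largest = max(abs(v) for v in values)
    return [v / largest for v in values]


column = [-4.0, 2.0, 8.0]
standardized = standardize(column)  # zero mean, unit standard deviation
minmax_scaled = minmax(column)      # [0.0, 0.5, 1.0]
maxabs_scaled = maxabs(column)      # [-0.5, 0.25, 1.0]
```

A constant column would make standardize and minmax divide by zero in this sketch, which is one reason dropping low-variance columns can be useful.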
An example dataset
The example dataset is the classic adult dataset. It has 48842 rows and 15 columns. In this dataset, ‘?’ denotes a missing value.
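Before cleaning, it can help to see where the ‘?’ marker occurs. The sketch below counts it per column on a tiny inline excerpt (rows 0, 14, and 48838 of the dataset, restricted to four columns) using only the standard library. The excerpt and the variable names are illustrative; the same counting idea would apply to the full file.

```python
import csv
import io

# Three rows copied from the adult dataset, limited to four columns.
excerpt = """age,workclass,occupation,native-country
2,State-gov,Adm-clerical,United-States
2,Private,Craft-repair,?
4,?,?,United-States
"""

rows = list(csv.DictReader(io.StringIO(excerpt)))

# Count how many times the '?' marker appears in each column.
missing_per_column = {
    column: sum(1 for row in rows if row[column] == "?")
    for column in rows[0]
}
```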
[4]:
import pandas as pd
pd.set_option('display.min_rows', 30)
df = pd.read_csv('adult.csv')
df
[4]:
 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 1 | 0 | 2 | United-States | <=50K |
1 | 3 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 0 | United-States | <=50K |
2 | 2 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 2 | United-States | <=50K |
3 | 3 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 2 | United-States | <=50K |
4 | 1 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 2 | Cuba | <=50K |
5 | 2 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 2 | United-States | <=50K |
6 | 3 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 0 | Jamaica | <=50K |
7 | 3 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 2 | United-States | >50K |
8 | 1 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 4 | 0 | 3 | United-States | >50K |
9 | 2 | Private | 159449 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 2 | 0 | 2 | United-States | >50K |
10 | 2 | Private | 280464 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | 0 | 0 | 4 | United-States | >50K |
11 | 1 | State-gov | 141297 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0 | 0 | 2 | India | >50K |
12 | 0 | Private | 122272 | Bachelors | 13 | Never-married | Adm-clerical | Own-child | White | Female | 0 | 0 | 1 | United-States | <=50K |
13 | 1 | Private | 205019 | Assoc-acdm | 12 | Never-married | Sales | Not-in-family | Black | Male | 0 | 0 | 3 | United-States | <=50K |
14 | 2 | Private | 121772 | Assoc-voc | 11 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | 0 | 0 | 2 | ? | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 3 | Private | 224655 | HS-grad | 9 | Separated | Priv-house-serv | Not-in-family | White | Female | 0 | 0 | 1 | United-States | <=50K |
48828 | 2 | Private | 247547 | Assoc-voc | 11 | Never-married | Adm-clerical | Unmarried | Black | Female | 0 | 0 | 2 | United-States | <=50K |
48829 | 4 | Private | 292710 | Assoc-acdm | 12 | Divorced | Prof-specialty | Not-in-family | White | Male | 0 | 0 | 2 | United-States | <=50K |
48830 | 1 | Private | 173449 | HS-grad | 9 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
48831 | 3 | Private | 285570 | HS-grad | 9 | Married-civ-spouse | Adm-clerical | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
48832 | 4 | Private | 89686 | HS-grad | 9 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 3 | United-States | <=50K |
48833 | 1 | Private | 440129 | HS-grad | 9 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
48834 | 0 | Private | 350977 | HS-grad | 9 | Never-married | Other-service | Own-child | White | Female | 0 | 0 | 2 | United-States | <=50K |
48835 | 3 | Local-gov | 349230 | Masters | 14 | Divorced | Other-service | Not-in-family | White | Male | 0 | 0 | 2 | United-States | <=50K |
48836 | 1 | Private | 245211 | Bachelors | 13 | Never-married | Prof-specialty | Own-child | White | Male | 0 | 0 | 2 | United-States | <=50K |
48837 | 2 | Private | 215419 | Bachelors | 13 | Divorced | Prof-specialty | Not-in-family | White | Female | 0 | 0 | 2 | United-States | <=50K |
48838 | 4 | ? | 321403 | HS-grad | 9 | Widowed | ? | Other-relative | Black | Male | 0 | 0 | 2 | United-States | <=50K |
48839 | 2 | Private | 374983 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 3 | United-States | <=50K |
48840 | 2 | Private | 83891 | Bachelors | 13 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 2 | 0 | 2 | United-States | <=50K |
48841 | 1 | Self-emp-inc | 182148 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 3 | United-States | >50K |
48842 rows × 15 columns
Split the dataset into a training dataframe and a test dataframe
[5]:
training_rate = 0.7
index = df.index
number_of_rows = len(index)
training_df = df.iloc[:int(training_rate * number_of_rows), :]
test_df = df.iloc[int(training_rate * number_of_rows):, :]
[6]:
training_df
[6]:
 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 1 | 0 | 2 | United-States | <=50K |
1 | 3 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 0 | United-States | <=50K |
2 | 2 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 2 | United-States | <=50K |
3 | 3 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 2 | United-States | <=50K |
4 | 1 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 2 | Cuba | <=50K |
5 | 2 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 2 | United-States | <=50K |
6 | 3 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 0 | Jamaica | <=50K |
7 | 3 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 2 | United-States | >50K |
8 | 1 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 4 | 0 | 3 | United-States | >50K |
9 | 2 | Private | 159449 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 2 | 0 | 2 | United-States | >50K |
10 | 2 | Private | 280464 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | 0 | 0 | 4 | United-States | >50K |
11 | 1 | State-gov | 141297 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0 | 0 | 2 | India | >50K |
12 | 0 | Private | 122272 | Bachelors | 13 | Never-married | Adm-clerical | Own-child | White | Female | 0 | 0 | 1 | United-States | <=50K |
13 | 1 | Private | 205019 | Assoc-acdm | 12 | Never-married | Sales | Not-in-family | Black | Male | 0 | 0 | 3 | United-States | <=50K |
14 | 2 | Private | 121772 | Assoc-voc | 11 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | 0 | 0 | 2 | ? | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 2 | Private | 173651 | Some-college | 10 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 2 | United-States | >50K |
34175 | 3 | Private | 149337 | HS-grad | 9 | Separated | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 2 | United-States | <=50K |
34176 | 4 | Private | 146674 | HS-grad | 9 | Married-civ-spouse | Other-service | Husband | Black | Male | 0 | 0 | 2 | ? | >50K |
34177 | 4 | Private | 173483 | Bachelors | 13 | Divorced | Prof-specialty | Not-in-family | White | Female | 0 | 0 | 1 | United-States | <=50K |
34178 | 0 | Private | 223669 | 11th | 7 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 0 | United-States | <=50K |
34179 | 3 | Private | 182177 | Some-college | 10 | Divorced | Protective-serv | Unmarried | White | Female | 0 | 0 | 1 | United-States | <=50K |
34180 | 0 | Private | 109414 | Some-college | 10 | Never-married | Sales | Other-relative | Asian-Pac-Islander | Male | 0 | 0 | 0 | India | <=50K |
34181 | 3 | Self-emp-inc | 150917 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 3 | 2 | United-States | >50K |
34182 | 4 | Self-emp-not-inc | 39128 | HS-grad | 9 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 1 | United-States | <=50K |
34183 | 3 | Local-gov | 103540 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 3 | United-States | >50K |
34184 | 4 | Private | 110212 | HS-grad | 9 | Married-civ-spouse | Other-service | Husband | Black | Male | 0 | 0 | 2 | United-States | <=50K |
34185 | 2 | Private | 222450 | HS-grad | 9 | Never-married | Sales | Not-in-family | White | Male | 0 | 4 | 2 | El-Salvador | <=50K |
34186 | 0 | ? | 113760 | HS-grad | 9 | Never-married | ? | Own-child | White | Female | 0 | 0 | 2 | United-States | <=50K |
34187 | 2 | ? | 253717 | 11th | 7 | Married-civ-spouse | ? | Wife | White | Female | 0 | 0 | 0 | United-States | <=50K |
34188 | 0 | Private | 306908 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
34189 rows × 15 columns
[7]:
test_df
[7]:
 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 2 | Self-emp-not-inc | 263871 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 3 | United-States | <=50K |
34190 | 2 | State-gov | 55294 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 2 | United-States | >50K |
34191 | 0 | Private | 174063 | Assoc-voc | 11 | Never-married | Other-service | Own-child | White | Female | 0 | 0 | 0 | United-States | <=50K |
34192 | 3 | State-gov | 258735 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 2 | United-States | >50K |
34193 | 3 | Private | 275867 | HS-grad | 9 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
34194 | 0 | Private | 154235 | Some-college | 10 | Never-married | Sales | Own-child | White | Female | 0 | 0 | 1 | United-States | <=50K |
34195 | 1 | Local-gov | 210448 | Some-college | 10 | Married-civ-spouse | Craft-repair | Other-relative | White | Male | 0 | 0 | 2 | United-States | <=50K |
34196 | 1 | Private | 337908 | Some-college | 10 | Divorced | Adm-clerical | Unmarried | Black | Female | 0 | 0 | 1 | United-States | <=50K |
34197 | 1 | State-gov | 205333 | Bachelors | 13 | Never-married | Prof-specialty | Not-in-family | White | Female | 0 | 0 | 0 | United-States | <=50K |
34198 | 0 | Private | 187447 | Some-college | 10 | Separated | Other-service | Own-child | White | Male | 0 | 0 | 2 | United-States | <=50K |
34199 | 1 | Private | 153589 | 9th | 5 | Separated | Craft-repair | Not-in-family | White | Male | 0 | 0 | 2 | United-States | <=50K |
34200 | 1 | Local-gov | 149988 | Some-college | 10 | Divorced | Adm-clerical | Unmarried | White | Female | 0 | 0 | 2 | United-States | <=50K |
34201 | 2 | Private | 398959 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
34202 | 0 | ? | 194096 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 1 | United-States | <=50K |
34203 | 2 | Private | 44041 | Assoc-acdm | 12 | Married-spouse-absent | Adm-clerical | Other-relative | White | Male | 0 | 0 | 3 | United-States | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 3 | Private | 224655 | HS-grad | 9 | Separated | Priv-house-serv | Not-in-family | White | Female | 0 | 0 | 1 | United-States | <=50K |
48828 | 2 | Private | 247547 | Assoc-voc | 11 | Never-married | Adm-clerical | Unmarried | Black | Female | 0 | 0 | 2 | United-States | <=50K |
48829 | 4 | Private | 292710 | Assoc-acdm | 12 | Divorced | Prof-specialty | Not-in-family | White | Male | 0 | 0 | 2 | United-States | <=50K |
48830 | 1 | Private | 173449 | HS-grad | 9 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
48831 | 3 | Private | 285570 | HS-grad | 9 | Married-civ-spouse | Adm-clerical | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
48832 | 4 | Private | 89686 | HS-grad | 9 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 3 | United-States | <=50K |
48833 | 1 | Private | 440129 | HS-grad | 9 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 2 | United-States | <=50K |
48834 | 0 | Private | 350977 | HS-grad | 9 | Never-married | Other-service | Own-child | White | Female | 0 | 0 | 2 | United-States | <=50K |
48835 | 3 | Local-gov | 349230 | Masters | 14 | Divorced | Other-service | Not-in-family | White | Male | 0 | 0 | 2 | United-States | <=50K |
48836 | 1 | Private | 245211 | Bachelors | 13 | Never-married | Prof-specialty | Own-child | White | Male | 0 | 0 | 2 | United-States | <=50K |
48837 | 2 | Private | 215419 | Bachelors | 13 | Divorced | Prof-specialty | Not-in-family | White | Female | 0 | 0 | 2 | United-States | <=50K |
48838 | 4 | ? | 321403 | HS-grad | 9 | Widowed | ? | Other-relative | Black | Male | 0 | 0 | 2 | United-States | <=50K |
48839 | 2 | Private | 374983 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 3 | United-States | <=50K |
48840 | 2 | Private | 83891 | Bachelors | 13 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 2 | 0 | 2 | United-States | <=50K |
48841 | 1 | Self-emp-inc | 182148 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 3 | United-States | >50K |
14653 rows × 15 columns
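The row counts of the two dataframes follow directly from the slicing arithmetic. The short check below reproduces them without loading the data. Note that the split is positional: the first 70% of rows go to training and nothing is shuffled.

```python
training_rate = 0.7
number_of_rows = 48842  # size of the full adult dataset

# int() truncates 0.7 * 48842 (about 34189.4) down to 34189,
# which is the position where the iloc slices are cut.
split_at = int(training_rate * number_of_rows)

training_rows = split_at                  # rows 0 .. split_at - 1
test_rows = number_of_rows - split_at     # rows split_at .. end
```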
1. Default clean_ml()
By default, the cleaning pipeline of the clean_ml() function is:
* For categorical columns: constant imputation -> one-hot encoding
* For numerical columns: mean imputation -> standardization
The default NULL values are: {np.nan, float("NaN"), "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null", "", None}
The default filling value for categorical columns is ‘missing_value’.
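The two default pipelines can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not dataprep's implementation: the helper names are hypothetical, categories are ordered by first appearance, and standardization uses the population standard deviation.

```python
from statistics import fmean, pstdev

# String tokens from the default NULL list; None and NaN are handled separately.
NULL_TOKENS = {
    "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a",
    "nan", "null", "",
}


def is_null(value):
    """Return True for None, float NaN, or one of the default null tokens."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is not equal to itself
        return True
    return value in NULL_TOKENS


def clean_categorical(column, fill_value="missing_value"):
    """Constant imputation followed by one-hot encoding."""
    filled = [fill_value if is_null(v) else v for v in column]
    categories = list(dict.fromkeys(filled))  # unique values, first-seen order
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in filled]
    return encoded, categories


def clean_numerical(column):
    """Mean imputation followed by standardization."""
    observed = [v for v in column if not is_null(v)]
    mean = fmean(observed)
    filled = [mean if is_null(v) else v for v in column]
    std = pstdev(filled)
    return [(v - mean) / std for v in filled]


encoded, categories = clean_categorical(["State-gov", "?", None, "Private"])
scaled = clean_numerical([1.0, float("nan"), 3.0])
```

Note that ‘?’ is not among the default null tokens, so in this sketch it is encoded as an ordinary category rather than imputed.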
[8]:
from dataprep.clean import clean_ml
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class")
[9]:
cleaned_training_df
[9]:
 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.181564 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.064247 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | 1.054765 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
1 | 0.955953 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.009237 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
2 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.246964 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
3 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.428035 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
4 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.412302 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
5 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.901345 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
6 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.279485 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | -1.968313 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -2.185441 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
7 | 0.955953 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.189970 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
8 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.365494 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | 5.032415 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
9 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.286491 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | 2.380648 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
10 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.862254 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 2.292380 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
11 | -0.592825 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.458800 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
12 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.639397 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
13 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.146086 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
14 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.644143 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.151677 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34175 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.382480 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34176 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.407759 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34177 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.153272 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34178 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.323123 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34179 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.070743 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34180 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.761452 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | -2.185441 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34181 | 0.955953 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | -0.367482 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | 5.199568 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34182 | 1.730342 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.428648 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34183 | 0.955953 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | -0.817212 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34184 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.753877 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34185 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.311551 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | 7.001429 | 0.053470 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34186 | -1.367214 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | -0.720198 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34187 | 0.181564 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 0.608356 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34188 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.113276 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34189 rows × 15 columns
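One way to sanity-check the standardized output is to recover the scaling parameters from the displayed values. The sketch below assumes the age column was transformed as z = (x - mean) / std. It uses two displayed pairs (age 2 maps to 0.181564, age 3 maps to 0.955953) to solve for mean and std, then compares the other displayed age values against that fit. The recovered numbers are estimates from rounded output, not values reported by the library.

```python
# (raw age, standardized age) pairs read from the training tables.
x1, z1 = 2, 0.181564
x2, z2 = 3, 0.955953

# Two equations z = (x - mean) / std give std and mean directly.
std = (x2 - x1) / (z2 - z1)
mean = x1 - z1 * std


def z_score(x):
    """Standardize a raw age value with the recovered parameters."""
    return (x - mean) / std


# Remaining age values that appear in the displayed rows.
displayed = {0: -1.367214, 1: -0.592825, 4: 1.730342}
max_error = max(abs(z_score(x) - z) for x, z in displayed.items())
```

A small max_error indicates that a single linear transformation explains every displayed age value, which is what one would expect from standardization.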
[10]:
cleaned_test_df
[10]:
 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.181564 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.704744 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34190 | 0.181564 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.275191 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34191 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.147766 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34192 | 0.955953 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.655990 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34193 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.818617 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34194 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.335985 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34195 | -0.592825 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | 0.197621 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34196 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.407546 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34197 | -0.592825 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.149067 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34198 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.020718 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34199 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.342117 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | -1.968313 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34200 | -0.592825 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | -0.376300 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34201 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.987078 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34202 | -1.367214 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 0.042399 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34203 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.382011 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.332483 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48828 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.549787 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48829 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.978500 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48830 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.153595 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48831 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.910723 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48832 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.948722 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48833 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 2.377888 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48834 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.531605 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48835 | 0.955953 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | 1.515021 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48836 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.527612 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48837 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.244809 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48838 | 1.730342 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 1.250871 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48839 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.759484 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48840 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.003732 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | 2.380648 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48841 | -0.592825 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | -0.071019 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
14653 rows × 15 columns
2. cat_imputation and cat_null_value parameters¶
There are three choices for the cat_imputation parameter:
* constant: fill missing values with a constant value. The default is 'missing_value'.
* most_frequent: fill missing values with the most frequent value of the column.
* drop: drop the column if it contains missing values.
The cat_null_value parameter is a list of user-specified null values. The elements of this list can be of any type. For example:
* ['?']
* ['abc', np.nan, '?', 1265]
By default, the specified null values are replaced by "missing_value".
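To make the three choices concrete, the following is a minimal pure-Python sketch of the behavior described above. It is not the implementation used by clean_ml(): the helper name impute_categorical, its fill_value argument, and the toy data are hypothetical and exist only for illustration.

```python
from collections import Counter


def impute_categorical(columns, null_values, strategy="constant",
                       fill_value="missing_value"):
    """Hypothetical helper: apply one imputation strategy to a dict of columns.

    `columns` maps a column name to a list of categorical values, and
    `null_values` plays the role of cat_null_value.
    """
    result = {}
    for name, values in columns.items():
        # Mark every entry that matches one of the user-specified null values.
        missing = [v in null_values for v in values]
        if not any(missing):
            result[name] = list(values)
        elif strategy == "drop":
            # Drop the whole column because it contains missing values.
            continue
        elif strategy == "constant":
            result[name] = [fill_value if m else v
                            for v, m in zip(values, missing)]
        elif strategy == "most_frequent":
            observed = [v for v, m in zip(values, missing) if not m]
            if not observed:
                raise ValueError(f"column {name!r} has no observed values")
            mode = Counter(observed).most_common(1)[0][0]
            result[name] = [mode if m else v for v, m in zip(values, missing)]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return result


# Toy data: '?' marks a missing value, as in the adult dataset.
columns = {
    "workclass": ["Private", "?", "Private", "State-gov"],
    "race": ["White", "Black", "White", "White"],
}

print(impute_categorical(columns, ["?"], "constant")["workclass"])
# ['Private', 'missing_value', 'Private', 'State-gov']
print(impute_categorical(columns, ["?"], "most_frequent")["workclass"])
# ['Private', 'Private', 'Private', 'State-gov']
print(list(impute_categorical(columns, ["?"], "drop")))
# ['race']
```

The sketch handles only null values that compare equal with `in`, so a value such as np.nan would need an explicit check.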
[18]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class",
                                                cat_imputation="constant",
                                                cat_encoding="no_encoding", cat_null_value=['?'])
[19]:
cleaned_training_df
[19]:
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.181564 | State-gov | -1.064247 | Bachelors | 1.132573 | Never-married | Adm-clerical | Not-in-family | White | Male | 1.054765 | -0.206016 | 0.053470 | United-States | <=50K |
1 | 0.955953 | Self-emp-not-inc | -1.009237 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
2 | 0.181564 | Private | 0.246964 | HS-grad | -0.417870 | Divorced | Handlers-cleaners | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
3 | 0.955953 | Private | 0.428035 | 11th | -1.193092 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
4 | -0.592825 | Private | 1.412302 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | -0.271118 | -0.206016 | 0.053470 | Cuba | <=50K |
5 | 0.181564 | Private | 0.901345 | Masters | 1.520184 | Married-civ-spouse | Exec-managerial | Wife | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
6 | 0.955953 | Private | -0.279485 | 9th | -1.968313 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | -0.271118 | -0.206016 | -2.185441 | Jamaica | <=50K |
7 | 0.955953 | Self-emp-not-inc | 0.189970 | HS-grad | -0.417870 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
8 | -0.592825 | Private | -1.365494 | Masters | 1.520184 | Never-married | Prof-specialty | Not-in-family | White | Female | 5.032415 | -0.206016 | 1.172925 | United-States | >50K |
9 | 0.181564 | Private | -0.286491 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 2.380648 | -0.206016 | 0.053470 | United-States | >50K |
10 | 0.181564 | Private | 0.862254 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | -0.271118 | -0.206016 | 2.292380 | United-States | >50K |
11 | -0.592825 | State-gov | -0.458800 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | India | >50K |
12 | -1.367214 | Private | -0.639397 | Bachelors | 1.132573 | Never-married | Adm-clerical | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
13 | -0.592825 | Private | 0.146086 | Assoc-acdm | 0.744962 | Never-married | Sales | Not-in-family | Black | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
14 | 0.181564 | Private | -0.644143 | Assoc-voc | 0.357352 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | missing_value | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.181564 | Private | -0.151677 | Some-college | -0.030259 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34175 | 0.955953 | Private | -0.382480 | HS-grad | -0.417870 | Separated | Handlers-cleaners | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34176 | 1.730342 | Private | -0.407759 | HS-grad | -0.417870 | Married-civ-spouse | Other-service | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | missing_value | >50K |
34177 | 1.730342 | Private | -0.153272 | Bachelors | 1.132573 | Divorced | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34178 | -1.367214 | Private | 0.323123 | 11th | -1.193092 | Never-married | Other-service | Own-child | White | Male | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34179 | 0.955953 | Private | -0.070743 | Some-college | -0.030259 | Divorced | Protective-serv | Unmarried | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34180 | -1.367214 | Private | -0.761452 | Some-college | -0.030259 | Never-married | Sales | Other-relative | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | -2.185441 | India | <=50K |
34181 | 0.955953 | Self-emp-inc | -0.367482 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | 5.199568 | 0.053470 | United-States | >50K |
34182 | 1.730342 | Self-emp-not-inc | -1.428648 | HS-grad | -0.417870 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34183 | 0.955953 | Local-gov | -0.817212 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | >50K |
34184 | 1.730342 | Private | -0.753877 | HS-grad | -0.417870 | Married-civ-spouse | Other-service | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34185 | 0.181564 | Private | 0.311551 | HS-grad | -0.417870 | Never-married | Sales | Not-in-family | White | Male | -0.271118 | 7.001429 | 0.053470 | El-Salvador | <=50K |
34186 | -1.367214 | missing_value | -0.720198 | HS-grad | -0.417870 | Never-married | missing_value | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34187 | 0.181564 | missing_value | 0.608356 | 11th | -1.193092 | Married-civ-spouse | missing_value | Wife | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34188 | -1.367214 | Private | 1.113276 | HS-grad | -0.417870 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34189 rows × 15 columns
[20]:
cleaned_test_df
[20]:
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.181564 | Self-emp-not-inc | 0.704744 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
34190 | 0.181564 | State-gov | -1.275191 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34191 | -1.367214 | Private | -0.147766 | Assoc-voc | 0.357352 | Never-married | Other-service | Own-child | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34192 | 0.955953 | State-gov | 0.655990 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34193 | 0.955953 | Private | 0.818617 | HS-grad | -0.417870 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34194 | -1.367214 | Private | -0.335985 | Some-college | -0.030259 | Never-married | Sales | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34195 | -0.592825 | Local-gov | 0.197621 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Other-relative | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34196 | -0.592825 | Private | 1.407546 | Some-college | -0.030259 | Divorced | Adm-clerical | Unmarried | Black | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34197 | -0.592825 | State-gov | 0.149067 | Bachelors | 1.132573 | Never-married | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34198 | -1.367214 | Private | -0.020718 | Some-college | -0.030259 | Separated | Other-service | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34199 | -0.592825 | Private | -0.342117 | 9th | -1.968313 | Separated | Craft-repair | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34200 | -0.592825 | Local-gov | -0.376300 | Some-college | -0.030259 | Divorced | Adm-clerical | Unmarried | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34201 | 0.181564 | Private | 1.987078 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34202 | -1.367214 | missing_value | 0.042399 | Some-college | -0.030259 | Never-married | missing_value | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34203 | 0.181564 | Private | -1.382011 | Assoc-acdm | 0.744962 | Married-spouse-absent | Adm-clerical | Other-relative | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.955953 | Private | 0.332483 | HS-grad | -0.417870 | Separated | Priv-house-serv | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
48828 | 0.181564 | Private | 0.549787 | Assoc-voc | 0.357352 | Never-married | Adm-clerical | Unmarried | Black | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48829 | 1.730342 | Private | 0.978500 | Assoc-acdm | 0.744962 | Divorced | Prof-specialty | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48830 | -0.592825 | Private | -0.153595 | HS-grad | -0.417870 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48831 | 0.955953 | Private | 0.910723 | HS-grad | -0.417870 | Married-civ-spouse | Adm-clerical | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48832 | 1.730342 | Private | -0.948722 | HS-grad | -0.417870 | Married-civ-spouse | Sales | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
48833 | -0.592825 | Private | 2.377888 | HS-grad | -0.417870 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48834 | -1.367214 | Private | 1.531605 | HS-grad | -0.417870 | Never-married | Other-service | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48835 | 0.955953 | Local-gov | 1.515021 | Masters | 1.520184 | Divorced | Other-service | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48836 | -0.592825 | Private | 0.527612 | Bachelors | 1.132573 | Never-married | Prof-specialty | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48837 | 0.181564 | Private | 0.244809 | Bachelors | 1.132573 | Divorced | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48838 | 1.730342 | missing_value | 1.250871 | HS-grad | -0.417870 | Widowed | missing_value | Other-relative | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48839 | 0.181564 | Private | 1.759484 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
48840 | 0.181564 | Private | -1.003732 | Bachelors | 1.132573 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 2.380648 | -0.206016 | 0.053470 | United-States | <=50K |
48841 | -0.592825 | Self-emp-inc | -0.071019 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | >50K |
14653 rows × 15 columns
[21]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class",
                                                cat_imputation="most_frequent",
                                                cat_encoding="no_encoding", cat_null_value=['?'])
[22]:
cleaned_training_df
[22]:
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.181564 | State-gov | -1.064247 | Bachelors | 1.132573 | Never-married | Adm-clerical | Not-in-family | White | Male | 1.054765 | -0.206016 | 0.053470 | United-States | <=50K |
1 | 0.955953 | Self-emp-not-inc | -1.009237 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
2 | 0.181564 | Private | 0.246964 | HS-grad | -0.417870 | Divorced | Handlers-cleaners | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
3 | 0.955953 | Private | 0.428035 | 11th | -1.193092 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
4 | -0.592825 | Private | 1.412302 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | -0.271118 | -0.206016 | 0.053470 | Cuba | <=50K |
5 | 0.181564 | Private | 0.901345 | Masters | 1.520184 | Married-civ-spouse | Exec-managerial | Wife | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
6 | 0.955953 | Private | -0.279485 | 9th | -1.968313 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | -0.271118 | -0.206016 | -2.185441 | Jamaica | <=50K |
7 | 0.955953 | Self-emp-not-inc | 0.189970 | HS-grad | -0.417870 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
8 | -0.592825 | Private | -1.365494 | Masters | 1.520184 | Never-married | Prof-specialty | Not-in-family | White | Female | 5.032415 | -0.206016 | 1.172925 | United-States | >50K |
9 | 0.181564 | Private | -0.286491 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 2.380648 | -0.206016 | 0.053470 | United-States | >50K |
10 | 0.181564 | Private | 0.862254 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | -0.271118 | -0.206016 | 2.292380 | United-States | >50K |
11 | -0.592825 | State-gov | -0.458800 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | India | >50K |
12 | -1.367214 | Private | -0.639397 | Bachelors | 1.132573 | Never-married | Adm-clerical | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
13 | -0.592825 | Private | 0.146086 | Assoc-acdm | 0.744962 | Never-married | Sales | Not-in-family | Black | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
14 | 0.181564 | Private | -0.644143 | Assoc-voc | 0.357352 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.181564 | Private | -0.151677 | Some-college | -0.030259 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34175 | 0.955953 | Private | -0.382480 | HS-grad | -0.417870 | Separated | Handlers-cleaners | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34176 | 1.730342 | Private | -0.407759 | HS-grad | -0.417870 | Married-civ-spouse | Other-service | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34177 | 1.730342 | Private | -0.153272 | Bachelors | 1.132573 | Divorced | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34178 | -1.367214 | Private | 0.323123 | 11th | -1.193092 | Never-married | Other-service | Own-child | White | Male | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34179 | 0.955953 | Private | -0.070743 | Some-college | -0.030259 | Divorced | Protective-serv | Unmarried | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34180 | -1.367214 | Private | -0.761452 | Some-college | -0.030259 | Never-married | Sales | Other-relative | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | -2.185441 | India | <=50K |
34181 | 0.955953 | Self-emp-inc | -0.367482 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | 5.199568 | 0.053470 | United-States | >50K |
34182 | 1.730342 | Self-emp-not-inc | -1.428648 | HS-grad | -0.417870 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34183 | 0.955953 | Local-gov | -0.817212 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | >50K |
34184 | 1.730342 | Private | -0.753877 | HS-grad | -0.417870 | Married-civ-spouse | Other-service | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34185 | 0.181564 | Private | 0.311551 | HS-grad | -0.417870 | Never-married | Sales | Not-in-family | White | Male | -0.271118 | 7.001429 | 0.053470 | El-Salvador | <=50K |
34186 | -1.367214 | Private | -0.720198 | HS-grad | -0.417870 | Never-married | Prof-specialty | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34187 | 0.181564 | Private | 0.608356 | 11th | -1.193092 | Married-civ-spouse | Prof-specialty | Wife | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34188 | -1.367214 | Private | 1.113276 | HS-grad | -0.417870 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34189 rows × 15 columns
[23]:
cleaned_test_df
[23]:
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.181564 | Self-emp-not-inc | 0.704744 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
34190 | 0.181564 | State-gov | -1.275191 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34191 | -1.367214 | Private | -0.147766 | Assoc-voc | 0.357352 | Never-married | Other-service | Own-child | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34192 | 0.955953 | State-gov | 0.655990 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34193 | 0.955953 | Private | 0.818617 | HS-grad | -0.417870 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34194 | -1.367214 | Private | -0.335985 | Some-college | -0.030259 | Never-married | Sales | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34195 | -0.592825 | Local-gov | 0.197621 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Other-relative | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34196 | -0.592825 | Private | 1.407546 | Some-college | -0.030259 | Divorced | Adm-clerical | Unmarried | Black | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34197 | -0.592825 | State-gov | 0.149067 | Bachelors | 1.132573 | Never-married | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34198 | -1.367214 | Private | -0.020718 | Some-college | -0.030259 | Separated | Other-service | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34199 | -0.592825 | Private | -0.342117 | 9th | -1.968313 | Separated | Craft-repair | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34200 | -0.592825 | Local-gov | -0.376300 | Some-college | -0.030259 | Divorced | Adm-clerical | Unmarried | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34201 | 0.181564 | Private | 1.987078 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34202 | -1.367214 | Private | 0.042399 | Some-college | -0.030259 | Never-married | Prof-specialty | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34203 | 0.181564 | Private | -1.382011 | Assoc-acdm | 0.744962 | Married-spouse-absent | Adm-clerical | Other-relative | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.955953 | Private | 0.332483 | HS-grad | -0.417870 | Separated | Priv-house-serv | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
48828 | 0.181564 | Private | 0.549787 | Assoc-voc | 0.357352 | Never-married | Adm-clerical | Unmarried | Black | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48829 | 1.730342 | Private | 0.978500 | Assoc-acdm | 0.744962 | Divorced | Prof-specialty | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48830 | -0.592825 | Private | -0.153595 | HS-grad | -0.417870 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48831 | 0.955953 | Private | 0.910723 | HS-grad | -0.417870 | Married-civ-spouse | Adm-clerical | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48832 | 1.730342 | Private | -0.948722 | HS-grad | -0.417870 | Married-civ-spouse | Sales | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
48833 | -0.592825 | Private | 2.377888 | HS-grad | -0.417870 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48834 | -1.367214 | Private | 1.531605 | HS-grad | -0.417870 | Never-married | Other-service | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48835 | 0.955953 | Local-gov | 1.515021 | Masters | 1.520184 | Divorced | Other-service | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48836 | -0.592825 | Private | 0.527612 | Bachelors | 1.132573 | Never-married | Prof-specialty | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48837 | 0.181564 | Private | 0.244809 | Bachelors | 1.132573 | Divorced | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48838 | 1.730342 | Private | 1.250871 | HS-grad | -0.417870 | Widowed | Prof-specialty | Other-relative | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48839 | 0.181564 | Private | 1.759484 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
48840 | 0.181564 | Private | -1.003732 | Bachelors | 1.132573 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 2.380648 | -0.206016 | 0.053470 | United-States | <=50K |
48841 | -0.592825 | Self-emp-inc | -0.071019 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | >50K |
14653 rows × 15 columns
[24]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class",
cat_imputation="drop",
cat_encoding="no_encoding", cat_null_value=['?'])
[25]:
cleaned_training_df
[25]:
age | fnlwgt | education | education-num | marital-status | relationship | race | sex | capitalgain | capitalloss | hoursperweek | class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.181564 | -1.064247 | Bachelors | 1.132573 | Never-married | Not-in-family | White | Male | 1.054765 | -0.206016 | 0.053470 | <=50K |
1 | 0.955953 | -1.009237 | Bachelors | 1.132573 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | -2.185441 | <=50K |
2 | 0.181564 | 0.246964 | HS-grad | -0.417870 | Divorced | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
3 | 0.955953 | 0.428035 | 11th | -1.193092 | Married-civ-spouse | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
4 | -0.592825 | 1.412302 | Bachelors | 1.132573 | Married-civ-spouse | Wife | Black | Female | -0.271118 | -0.206016 | 0.053470 | <=50K |
5 | 0.181564 | 0.901345 | Masters | 1.520184 | Married-civ-spouse | Wife | White | Female | -0.271118 | -0.206016 | 0.053470 | <=50K |
6 | 0.955953 | -0.279485 | 9th | -1.968313 | Married-spouse-absent | Not-in-family | Black | Female | -0.271118 | -0.206016 | -2.185441 | <=50K |
7 | 0.955953 | 0.189970 | HS-grad | -0.417870 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | >50K |
8 | -0.592825 | -1.365494 | Masters | 1.520184 | Never-married | Not-in-family | White | Female | 5.032415 | -0.206016 | 1.172925 | >50K |
9 | 0.181564 | -0.286491 | Bachelors | 1.132573 | Married-civ-spouse | Husband | White | Male | 2.380648 | -0.206016 | 0.053470 | >50K |
10 | 0.181564 | 0.862254 | Some-college | -0.030259 | Married-civ-spouse | Husband | Black | Male | -0.271118 | -0.206016 | 2.292380 | >50K |
11 | -0.592825 | -0.458800 | Bachelors | 1.132573 | Married-civ-spouse | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | >50K |
12 | -1.367214 | -0.639397 | Bachelors | 1.132573 | Never-married | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | <=50K |
13 | -0.592825 | 0.146086 | Assoc-acdm | 0.744962 | Never-married | Not-in-family | Black | Male | -0.271118 | -0.206016 | 1.172925 | <=50K |
14 | 0.181564 | -0.644143 | Assoc-voc | 0.357352 | Married-civ-spouse | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.181564 | -0.151677 | Some-college | -0.030259 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | >50K |
34175 | 0.955953 | -0.382480 | HS-grad | -0.417870 | Separated | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
34176 | 1.730342 | -0.407759 | HS-grad | -0.417870 | Married-civ-spouse | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | >50K |
34177 | 1.730342 | -0.153272 | Bachelors | 1.132573 | Divorced | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | <=50K |
34178 | -1.367214 | 0.323123 | 11th | -1.193092 | Never-married | Own-child | White | Male | -0.271118 | -0.206016 | -2.185441 | <=50K |
34179 | 0.955953 | -0.070743 | Some-college | -0.030259 | Divorced | Unmarried | White | Female | -0.271118 | -0.206016 | -1.065986 | <=50K |
34180 | -1.367214 | -0.761452 | Some-college | -0.030259 | Never-married | Other-relative | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | -2.185441 | <=50K |
34181 | 0.955953 | -0.367482 | Some-college | -0.030259 | Married-civ-spouse | Husband | White | Male | -0.271118 | 5.199568 | 0.053470 | >50K |
34182 | 1.730342 | -1.428648 | HS-grad | -0.417870 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | -1.065986 | <=50K |
34183 | 0.955953 | -0.817212 | Bachelors | 1.132573 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | >50K |
34184 | 1.730342 | -0.753877 | HS-grad | -0.417870 | Married-civ-spouse | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
34185 | 0.181564 | 0.311551 | HS-grad | -0.417870 | Never-married | Not-in-family | White | Male | -0.271118 | 7.001429 | 0.053470 | <=50K |
34186 | -1.367214 | -0.720198 | HS-grad | -0.417870 | Never-married | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | <=50K |
34187 | 0.181564 | 0.608356 | 11th | -1.193092 | Married-civ-spouse | Wife | White | Female | -0.271118 | -0.206016 | -2.185441 | <=50K |
34188 | -1.367214 | 1.113276 | HS-grad | -0.417870 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
34189 rows × 12 columns
[26]:
cleaned_test_df
[26]:
age | fnlwgt | education | education-num | marital-status | relationship | race | sex | capitalgain | capitalloss | hoursperweek | class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.181564 | 0.704744 | Some-college | -0.030259 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | <=50K |
34190 | 0.181564 | -1.275191 | Bachelors | 1.132573 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | >50K |
34191 | -1.367214 | -0.147766 | Assoc-voc | 0.357352 | Never-married | Own-child | White | Female | -0.271118 | -0.206016 | -2.185441 | <=50K |
34192 | 0.955953 | 0.655990 | Some-college | -0.030259 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | >50K |
34193 | 0.955953 | 0.818617 | HS-grad | -0.417870 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
34194 | -1.367214 | -0.335985 | Some-college | -0.030259 | Never-married | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | <=50K |
34195 | -0.592825 | 0.197621 | Some-college | -0.030259 | Married-civ-spouse | Other-relative | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
34196 | -0.592825 | 1.407546 | Some-college | -0.030259 | Divorced | Unmarried | Black | Female | -0.271118 | -0.206016 | -1.065986 | <=50K |
34197 | -0.592825 | 0.149067 | Bachelors | 1.132573 | Never-married | Not-in-family | White | Female | -0.271118 | -0.206016 | -2.185441 | <=50K |
34198 | -1.367214 | -0.020718 | Some-college | -0.030259 | Separated | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
34199 | -0.592825 | -0.342117 | 9th | -1.968313 | Separated | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
34200 | -0.592825 | -0.376300 | Some-college | -0.030259 | Divorced | Unmarried | White | Female | -0.271118 | -0.206016 | 0.053470 | <=50K |
34201 | 0.181564 | 1.987078 | Some-college | -0.030259 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
34202 | -1.367214 | 0.042399 | Some-college | -0.030259 | Never-married | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | <=50K |
34203 | 0.181564 | -1.382011 | Assoc-acdm | 0.744962 | Married-spouse-absent | Other-relative | White | Male | -0.271118 | -0.206016 | 1.172925 | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.955953 | 0.332483 | HS-grad | -0.417870 | Separated | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | <=50K |
48828 | 0.181564 | 0.549787 | Assoc-voc | 0.357352 | Never-married | Unmarried | Black | Female | -0.271118 | -0.206016 | 0.053470 | <=50K |
48829 | 1.730342 | 0.978500 | Assoc-acdm | 0.744962 | Divorced | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
48830 | -0.592825 | -0.153595 | HS-grad | -0.417870 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
48831 | 0.955953 | 0.910723 | HS-grad | -0.417870 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
48832 | 1.730342 | -0.948722 | HS-grad | -0.417870 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | <=50K |
48833 | -0.592825 | 2.377888 | HS-grad | -0.417870 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
48834 | -1.367214 | 1.531605 | HS-grad | -0.417870 | Never-married | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | <=50K |
48835 | 0.955953 | 1.515021 | Masters | 1.520184 | Divorced | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
48836 | -0.592825 | 0.527612 | Bachelors | 1.132573 | Never-married | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
48837 | 0.181564 | 0.244809 | Bachelors | 1.132573 | Divorced | Not-in-family | White | Female | -0.271118 | -0.206016 | 0.053470 | <=50K |
48838 | 1.730342 | 1.250871 | HS-grad | -0.417870 | Widowed | Other-relative | Black | Male | -0.271118 | -0.206016 | 0.053470 | <=50K |
48839 | 0.181564 | 1.759484 | Bachelors | 1.132573 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | <=50K |
48840 | 0.181564 | -1.003732 | Bachelors | 1.132573 | Divorced | Own-child | Asian-Pac-Islander | Male | 2.380648 | -0.206016 | 0.053470 | <=50K |
48841 | -0.592825 | -0.071019 | Bachelors | 1.132573 | Married-civ-spouse | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | >50K |
14653 rows × 12 columns
3. fill_val
parameter¶
By default, missing values in categorical columns are filled with the string "missing value". Users can replace this with any string they like, such as "missing", "NaN", "I'm a cat.", or "Fyodor Dostoyevsky".
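As a rough illustration of what constant filling amounts to, here is a minimal pandas sketch. It is not the clean_ml() implementation: the helper fill_categorical and the toy DataFrame are hypothetical, and only the idea (treat the listed null markers as missing, then substitute the fill string) is taken from the description above.

```python
import pandas as pd


def fill_categorical(df, columns, null_values, fill_val="missing value"):
    """Hypothetical helper: replace null markers in `columns` with `fill_val`."""
    out = df.copy()
    for col in columns:
        # A cell counts as missing if it is NaN/None or one of the listed markers.
        is_null = out[col].isna() | out[col].isin(null_values)
        out[col] = out[col].mask(is_null, fill_val)
    return out


# Toy data standing in for two categorical columns of the adult dataset.
toy = pd.DataFrame(
    {
        "workclass": ["Private", "?", "State-gov"],
        "occupation": ["Sales", "?", None],
    }
)
filled = fill_categorical(
    toy, ["workclass", "occupation"], null_values=["?"], fill_val="AHAHAHAHAHA!!!"
)
print(filled)
```

Every cell that held '?' or was empty now carries the chosen string, which is the same pattern visible in the workclass, occupation and native-country columns of the output below.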
[30]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class",
cat_null_value=['?'], cat_encoding="no_encoding",
fill_val="AHAHAHAHAHA!!!")
[31]:
cleaned_training_df
[31]:
age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.181564 | State-gov | -1.064247 | Bachelors | 1.132573 | Never-married | Adm-clerical | Not-in-family | White | Male | 1.054765 | -0.206016 | 0.053470 | United-States | <=50K |
1 | 0.955953 | Self-emp-not-inc | -1.009237 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
2 | 0.181564 | Private | 0.246964 | HS-grad | -0.417870 | Divorced | Handlers-cleaners | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
3 | 0.955953 | Private | 0.428035 | 11th | -1.193092 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
4 | -0.592825 | Private | 1.412302 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | -0.271118 | -0.206016 | 0.053470 | Cuba | <=50K |
5 | 0.181564 | Private | 0.901345 | Masters | 1.520184 | Married-civ-spouse | Exec-managerial | Wife | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
6 | 0.955953 | Private | -0.279485 | 9th | -1.968313 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | -0.271118 | -0.206016 | -2.185441 | Jamaica | <=50K |
7 | 0.955953 | Self-emp-not-inc | 0.189970 | HS-grad | -0.417870 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
8 | -0.592825 | Private | -1.365494 | Masters | 1.520184 | Never-married | Prof-specialty | Not-in-family | White | Female | 5.032415 | -0.206016 | 1.172925 | United-States | >50K |
9 | 0.181564 | Private | -0.286491 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 2.380648 | -0.206016 | 0.053470 | United-States | >50K |
10 | 0.181564 | Private | 0.862254 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | -0.271118 | -0.206016 | 2.292380 | United-States | >50K |
11 | -0.592825 | State-gov | -0.458800 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | India | >50K |
12 | -1.367214 | Private | -0.639397 | Bachelors | 1.132573 | Never-married | Adm-clerical | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
13 | -0.592825 | Private | 0.146086 | Assoc-acdm | 0.744962 | Never-married | Sales | Not-in-family | Black | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
14 | 0.181564 | Private | -0.644143 | Assoc-voc | 0.357352 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | AHAHAHAHAHA!!! | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.181564 | Private | -0.151677 | Some-college | -0.030259 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34175 | 0.955953 | Private | -0.382480 | HS-grad | -0.417870 | Separated | Handlers-cleaners | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34176 | 1.730342 | Private | -0.407759 | HS-grad | -0.417870 | Married-civ-spouse | Other-service | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | AHAHAHAHAHA!!! | >50K |
34177 | 1.730342 | Private | -0.153272 | Bachelors | 1.132573 | Divorced | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34178 | -1.367214 | Private | 0.323123 | 11th | -1.193092 | Never-married | Other-service | Own-child | White | Male | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34179 | 0.955953 | Private | -0.070743 | Some-college | -0.030259 | Divorced | Protective-serv | Unmarried | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34180 | -1.367214 | Private | -0.761452 | Some-college | -0.030259 | Never-married | Sales | Other-relative | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | -2.185441 | India | <=50K |
34181 | 0.955953 | Self-emp-inc | -0.367482 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | 5.199568 | 0.053470 | United-States | >50K |
34182 | 1.730342 | Self-emp-not-inc | -1.428648 | HS-grad | -0.417870 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34183 | 0.955953 | Local-gov | -0.817212 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | >50K |
34184 | 1.730342 | Private | -0.753877 | HS-grad | -0.417870 | Married-civ-spouse | Other-service | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34185 | 0.181564 | Private | 0.311551 | HS-grad | -0.417870 | Never-married | Sales | Not-in-family | White | Male | -0.271118 | 7.001429 | 0.053470 | El-Salvador | <=50K |
34186 | -1.367214 | AHAHAHAHAHA!!! | -0.720198 | HS-grad | -0.417870 | Never-married | AHAHAHAHAHA!!! | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34187 | 0.181564 | AHAHAHAHAHA!!! | 0.608356 | 11th | -1.193092 | Married-civ-spouse | AHAHAHAHAHA!!! | Wife | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34188 | -1.367214 | Private | 1.113276 | HS-grad | -0.417870 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34189 rows × 15 columns
[32]:
cleaned_test_df
[32]:
age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.181564 | Self-emp-not-inc | 0.704744 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
34190 | 0.181564 | State-gov | -1.275191 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34191 | -1.367214 | Private | -0.147766 | Assoc-voc | 0.357352 | Never-married | Other-service | Own-child | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34192 | 0.955953 | State-gov | 0.655990 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34193 | 0.955953 | Private | 0.818617 | HS-grad | -0.417870 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34194 | -1.367214 | Private | -0.335985 | Some-college | -0.030259 | Never-married | Sales | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34195 | -0.592825 | Local-gov | 0.197621 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Other-relative | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34196 | -0.592825 | Private | 1.407546 | Some-college | -0.030259 | Divorced | Adm-clerical | Unmarried | Black | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34197 | -0.592825 | State-gov | 0.149067 | Bachelors | 1.132573 | Never-married | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34198 | -1.367214 | Private | -0.020718 | Some-college | -0.030259 | Separated | Other-service | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34199 | -0.592825 | Private | -0.342117 | 9th | -1.968313 | Separated | Craft-repair | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34200 | -0.592825 | Local-gov | -0.376300 | Some-college | -0.030259 | Divorced | Adm-clerical | Unmarried | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34201 | 0.181564 | Private | 1.987078 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34202 | -1.367214 | AHAHAHAHAHA!!! | 0.042399 | Some-college | -0.030259 | Never-married | AHAHAHAHAHA!!! | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34203 | 0.181564 | Private | -1.382011 | Assoc-acdm | 0.744962 | Married-spouse-absent | Adm-clerical | Other-relative | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.955953 | Private | 0.332483 | HS-grad | -0.417870 | Separated | Priv-house-serv | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
48828 | 0.181564 | Private | 0.549787 | Assoc-voc | 0.357352 | Never-married | Adm-clerical | Unmarried | Black | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48829 | 1.730342 | Private | 0.978500 | Assoc-acdm | 0.744962 | Divorced | Prof-specialty | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48830 | -0.592825 | Private | -0.153595 | HS-grad | -0.417870 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48831 | 0.955953 | Private | 0.910723 | HS-grad | -0.417870 | Married-civ-spouse | Adm-clerical | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48832 | 1.730342 | Private | -0.948722 | HS-grad | -0.417870 | Married-civ-spouse | Sales | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
48833 | -0.592825 | Private | 2.377888 | HS-grad | -0.417870 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48834 | -1.367214 | Private | 1.531605 | HS-grad | -0.417870 | Never-married | Other-service | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48835 | 0.955953 | Local-gov | 1.515021 | Masters | 1.520184 | Divorced | Other-service | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48836 | -0.592825 | Private | 0.527612 | Bachelors | 1.132573 | Never-married | Prof-specialty | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48837 | 0.181564 | Private | 0.244809 | Bachelors | 1.132573 | Divorced | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48838 | 1.730342 | AHAHAHAHAHA!!! | 1.250871 | HS-grad | -0.417870 | Widowed | AHAHAHAHAHA!!! | Other-relative | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48839 | 0.181564 | Private | 1.759484 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
48840 | 0.181564 | Private | -1.003732 | Bachelors | 1.132573 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 2.380648 | -0.206016 | 0.053470 | United-States | <=50K |
48841 | -0.592825 | Self-emp-inc | -0.071019 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | >50K |
14653 rows × 15 columns
4. num_imputation
and num_null_value
parameter¶
There are four choices for the num_imputation parameter:
* mean: fill missing values with the mean of the column.
* median: fill missing values with the median of the column.
* most_frequent: fill missing values with the most frequent value of the column.
* drop: drop the column if it contains missing values.
The default null values are the same as those mentioned in the cat_imputation parameter section.
The imputation process closely mirrors the one shown in the cat_imputation parameter section, so we do not repeat the examples here.
The num_null_value parameter is a list of user-specified null values. Its elements can be of any type. For example:
* ['?']
* ['abc', np.nan, '?', 1265]
The num_null_value parameter is used in the same way as the cat_null_value parameter, so we do not repeat the examples here.
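Since no example is shown for the numerical case, the following pandas sketch illustrates the four strategies on a toy column. It is an assumption-laden illustration, not the clean_ml() implementation: the helper impute_numeric is hypothetical, and the choices to compute the fill value from the training data only and to decide on dropping from the training data only are assumptions of this sketch.

```python
import pandas as pd


def impute_numeric(train_df, test_df, column, strategy="mean", null_values=()):
    """Hypothetical helper: impute one numerical column of a train/test pair."""
    train_df, test_df = train_df.copy(), test_df.copy()
    markers = list(null_values)

    def to_numeric(series):
        # Treat user-specified markers as missing, then coerce to floats.
        return pd.to_numeric(series.mask(series.isin(markers)), errors="coerce")

    train_col, test_col = to_numeric(train_df[column]), to_numeric(test_df[column])

    if strategy == "drop":
        # Assumption: drop the column when the training data has missing values.
        if train_col.isna().any():
            return train_df.drop(columns=[column]), test_df.drop(columns=[column])
        train_df[column], test_df[column] = train_col, test_col
        return train_df, test_df

    fillers = {
        "mean": train_col.mean,
        "median": train_col.median,
        "most_frequent": lambda: train_col.mode().iloc[0],
    }
    if strategy not in fillers:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Assumption: the statistic comes from the training column and is reused for test.
    fill = fillers[strategy]()
    train_df[column] = train_col.fillna(fill)
    test_df[column] = test_col.fillna(fill)
    return train_df, test_df


train = pd.DataFrame({"age": [20, "?", 40, 40]})
test = pd.DataFrame({"age": ["?", 30]})

median_train, median_test = impute_numeric(train, test, "age", "median", null_values=["?"])
dropped_train, dropped_test = impute_numeric(train, test, "age", "drop", null_values=["?"])
print(median_train["age"].tolist(), median_test["age"].tolist())
print(list(dropped_train.columns))
```

Reusing the training statistic for the test set keeps information from the test data out of the fitted pipeline; whether clean_ml() does the same is not stated in this guide.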
5. cat_encoding
parameter¶
There are two choices for the cat_encoding parameter:
* no_encoding: do not encode categorical columns.
* one_hot: apply one-hot encoding to categorical columns.
The default value is one_hot.
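For readers unfamiliar with one-hot encoding, the sketch below shows the general technique with pandas.get_dummies on a toy train/test pair. The helper one_hot is hypothetical and the alignment of test columns to the training columns is an assumption of this sketch; the exact column names and layout that clean_ml() produces may differ.

```python
import pandas as pd


def one_hot(train_df, test_df, columns):
    """Hypothetical helper: one-hot encode `columns` in a train/test pair."""
    train_enc = pd.get_dummies(train_df, columns=columns)
    test_enc = pd.get_dummies(test_df, columns=columns)
    # Align the test frame to the training columns: categories unseen in
    # training are discarded, categories absent from test become all-zero.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc


train = pd.DataFrame({"age": [1, 2, 3], "sex": ["Male", "Female", "Male"]})
test = pd.DataFrame({"age": [4, 5], "sex": ["Female", "Female"]})

train_enc, test_enc = one_hot(train, test, ["sex"])
print(list(train_enc.columns))
print(test_enc)
```

Each category becomes its own indicator column, so the encoded frames are wider than the originals. With no_encoding, as in the cell below, the categorical columns are left as strings.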
[36]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class", cat_encoding="no_encoding")
[37]:
cleaned_training_df
[37]:
age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.181564 | State-gov | -1.064247 | Bachelors | 1.132573 | Never-married | Adm-clerical | Not-in-family | White | Male | 1.054765 | -0.206016 | 0.053470 | United-States | <=50K |
1 | 0.955953 | Self-emp-not-inc | -1.009237 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
2 | 0.181564 | Private | 0.246964 | HS-grad | -0.417870 | Divorced | Handlers-cleaners | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
3 | 0.955953 | Private | 0.428035 | 11th | -1.193092 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
4 | -0.592825 | Private | 1.412302 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | -0.271118 | -0.206016 | 0.053470 | Cuba | <=50K |
5 | 0.181564 | Private | 0.901345 | Masters | 1.520184 | Married-civ-spouse | Exec-managerial | Wife | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
6 | 0.955953 | Private | -0.279485 | 9th | -1.968313 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | -0.271118 | -0.206016 | -2.185441 | Jamaica | <=50K |
7 | 0.955953 | Self-emp-not-inc | 0.189970 | HS-grad | -0.417870 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
8 | -0.592825 | Private | -1.365494 | Masters | 1.520184 | Never-married | Prof-specialty | Not-in-family | White | Female | 5.032415 | -0.206016 | 1.172925 | United-States | >50K |
9 | 0.181564 | Private | -0.286491 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 2.380648 | -0.206016 | 0.053470 | United-States | >50K |
10 | 0.181564 | Private | 0.862254 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | -0.271118 | -0.206016 | 2.292380 | United-States | >50K |
11 | -0.592825 | State-gov | -0.458800 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | India | >50K |
12 | -1.367214 | Private | -0.639397 | Bachelors | 1.132573 | Never-married | Adm-clerical | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
13 | -0.592825 | Private | 0.146086 | Assoc-acdm | 0.744962 | Never-married | Sales | Not-in-family | Black | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
14 | 0.181564 | Private | -0.644143 | Assoc-voc | 0.357352 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | 0.053470 | ? | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.181564 | Private | -0.151677 | Some-college | -0.030259 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34175 | 0.955953 | Private | -0.382480 | HS-grad | -0.417870 | Separated | Handlers-cleaners | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34176 | 1.730342 | Private | -0.407759 | HS-grad | -0.417870 | Married-civ-spouse | Other-service | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | ? | >50K |
34177 | 1.730342 | Private | -0.153272 | Bachelors | 1.132573 | Divorced | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34178 | -1.367214 | Private | 0.323123 | 11th | -1.193092 | Never-married | Other-service | Own-child | White | Male | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34179 | 0.955953 | Private | -0.070743 | Some-college | -0.030259 | Divorced | Protective-serv | Unmarried | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34180 | -1.367214 | Private | -0.761452 | Some-college | -0.030259 | Never-married | Sales | Other-relative | Asian-Pac-Islander | Male | -0.271118 | -0.206016 | -2.185441 | India | <=50K |
34181 | 0.955953 | Self-emp-inc | -0.367482 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | 5.199568 | 0.053470 | United-States | >50K |
34182 | 1.730342 | Self-emp-not-inc | -1.428648 | HS-grad | -0.417870 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34183 | 0.955953 | Local-gov | -0.817212 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | >50K |
34184 | 1.730342 | Private | -0.753877 | HS-grad | -0.417870 | Married-civ-spouse | Other-service | Husband | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34185 | 0.181564 | Private | 0.311551 | HS-grad | -0.417870 | Never-married | Sales | Not-in-family | White | Male | -0.271118 | 7.001429 | 0.053470 | El-Salvador | <=50K |
34186 | -1.367214 | ? | -0.720198 | HS-grad | -0.417870 | Never-married | ? | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34187 | 0.181564 | ? | 0.608356 | 11th | -1.193092 | Married-civ-spouse | ? | Wife | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34188 | -1.367214 | Private | 1.113276 | HS-grad | -0.417870 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34189 rows × 15 columns
[38]:
cleaned_test_df
[38]:
age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.181564 | Self-emp-not-inc | 0.704744 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
34190 | 0.181564 | State-gov | -1.275191 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34191 | -1.367214 | Private | -0.147766 | Assoc-voc | 0.357352 | Never-married | Other-service | Own-child | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34192 | 0.955953 | State-gov | 0.655990 | Some-college | -0.030259 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | >50K |
34193 | 0.955953 | Private | 0.818617 | HS-grad | -0.417870 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34194 | -1.367214 | Private | -0.335985 | Some-college | -0.030259 | Never-married | Sales | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34195 | -0.592825 | Local-gov | 0.197621 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Other-relative | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34196 | -0.592825 | Private | 1.407546 | Some-college | -0.030259 | Divorced | Adm-clerical | Unmarried | Black | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34197 | -0.592825 | State-gov | 0.149067 | Bachelors | 1.132573 | Never-married | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | -2.185441 | United-States | <=50K |
34198 | -1.367214 | Private | -0.020718 | Some-college | -0.030259 | Separated | Other-service | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34199 | -0.592825 | Private | -0.342117 | 9th | -1.968313 | Separated | Craft-repair | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34200 | -0.592825 | Local-gov | -0.376300 | Some-college | -0.030259 | Divorced | Adm-clerical | Unmarried | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34201 | 0.181564 | Private | 1.987078 | Some-college | -0.030259 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
34202 | -1.367214 | ? | 0.042399 | Some-college | -0.030259 | Never-married | ? | Own-child | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
34203 | 0.181564 | Private | -1.382011 | Assoc-acdm | 0.744962 | Married-spouse-absent | Adm-clerical | Other-relative | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.955953 | Private | 0.332483 | HS-grad | -0.417870 | Separated | Priv-house-serv | Not-in-family | White | Female | -0.271118 | -0.206016 | -1.065986 | United-States | <=50K |
48828 | 0.181564 | Private | 0.549787 | Assoc-voc | 0.357352 | Never-married | Adm-clerical | Unmarried | Black | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48829 | 1.730342 | Private | 0.978500 | Assoc-acdm | 0.744962 | Divorced | Prof-specialty | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48830 | -0.592825 | Private | -0.153595 | HS-grad | -0.417870 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48831 | 0.955953 | Private | 0.910723 | HS-grad | -0.417870 | Married-civ-spouse | Adm-clerical | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48832 | 1.730342 | Private | -0.948722 | HS-grad | -0.417870 | Married-civ-spouse | Sales | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
48833 | -0.592825 | Private | 2.377888 | HS-grad | -0.417870 | Married-civ-spouse | Craft-repair | Husband | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48834 | -1.367214 | Private | 1.531605 | HS-grad | -0.417870 | Never-married | Other-service | Own-child | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48835 | 0.955953 | Local-gov | 1.515021 | Masters | 1.520184 | Divorced | Other-service | Not-in-family | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48836 | -0.592825 | Private | 0.527612 | Bachelors | 1.132573 | Never-married | Prof-specialty | Own-child | White | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48837 | 0.181564 | Private | 0.244809 | Bachelors | 1.132573 | Divorced | Prof-specialty | Not-in-family | White | Female | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48838 | 1.730342 | ? | 1.250871 | HS-grad | -0.417870 | Widowed | ? | Other-relative | Black | Male | -0.271118 | -0.206016 | 0.053470 | United-States | <=50K |
48839 | 0.181564 | Private | 1.759484 | Bachelors | 1.132573 | Married-civ-spouse | Prof-specialty | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | <=50K |
48840 | 0.181564 | Private | -1.003732 | Bachelors | 1.132573 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 2.380648 | -0.206016 | 0.053470 | United-States | <=50K |
48841 | -0.592825 | Self-emp-inc | -0.071019 | Bachelors | 1.132573 | Married-civ-spouse | Exec-managerial | Husband | White | Male | -0.271118 | -0.206016 | 1.172925 | United-States | >50K |
14653 rows × 15 columns
[39]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class", cat_encoding="one_hot")
[40]:
cleaned_training_df
[40]:
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.181564 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.064247 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | 1.054765 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
1 | 0.955953 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.009237 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
2 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.246964 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
3 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.428035 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
4 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.412302 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
5 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.901345 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
6 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.279485 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | -1.968313 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -2.185441 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
7 | 0.955953 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.189970 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
8 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.365494 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | 5.032415 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
9 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.286491 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | 2.380648 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
10 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.862254 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 2.292380 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
11 | -0.592825 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.458800 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
12 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.639397 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
13 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.146086 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
14 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.644143 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.151677 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34175 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.382480 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34176 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.407759 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34177 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.153272 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34178 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.323123 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34179 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.070743 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34180 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.761452 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | -2.185441 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34181 | 0.955953 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | -0.367482 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | 5.199568 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34182 | 1.730342 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.428648 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34183 | 0.955953 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | -0.817212 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34184 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.753877 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34185 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.311551 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | 7.001429 | 0.053470 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34186 | -1.367214 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | -0.720198 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34187 | 0.181564 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 0.608356 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34188 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.113276 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34189 rows × 15 columns
[41]:
cleaned_test_df
[41]:
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.181564 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.704744 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34190 | 0.181564 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.275191 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34191 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.147766 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34192 | 0.955953 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.655990 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34193 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.818617 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34194 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.335985 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34195 | -0.592825 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | 0.197621 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34196 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.407546 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34197 | -0.592825 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.149067 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -2.185441 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34198 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.020718 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34199 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.342117 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | -1.968313 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34200 | -0.592825 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | -0.376300 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34201 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.987078 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34202 | -1.367214 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 0.042399 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34203 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.382011 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.332483 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | -1.065986 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48828 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.549787 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48829 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.978500 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48830 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.153595 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48831 | 0.955953 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.910723 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48832 | 1.730342 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.948722 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48833 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 2.377888 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48834 | -1.367214 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.531605 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48835 | 0.955953 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | 1.515021 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48836 | -0.592825 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.527612 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48837 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.244809 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48838 | 1.730342 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 1.250871 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48839 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.759484 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48840 | 0.181564 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.003732 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | 2.380648 | -0.206016 | 0.053470 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48841 | -0.592825 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | -0.071019 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | -0.271118 | -0.206016 | 1.172925 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
14653 rows × 15 columns
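In the one-hot output above, each categorical cell holds its whole indicator vector as a list. The following is a minimal sketch of how such list-valued cells can be built and then expanded into separate columns with plain pandas. It is not the `clean_ml()` implementation: the helper `one_hot`, the toy data, the category ordering (first appearance in the training split) and the all-zero vector for categories unseen in training are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for a single categorical column in each split (hypothetical data).
train = pd.DataFrame({"sex": ["Male", "Female", "Male"]})
test = pd.DataFrame({"sex": ["Female", "Other"]})

# Assumption: categories are learned from the training split only,
# in order of first appearance.
categories = list(pd.unique(train["sex"]))


def one_hot(value, categories):
    """Return an indicator vector; values unseen in training map to all zeros."""
    return [1.0 if value == c else 0.0 for c in categories]


# One list per cell, mirroring the layout of the output tables above.
train_enc = train["sex"].map(lambda v: one_hot(v, categories))
test_enc = test["sex"].map(lambda v: one_hot(v, categories))

# Expand the list cells into one column per category for estimators
# that expect a flat numerical matrix.
expanded = pd.DataFrame(
    train_enc.tolist(),
    index=train.index,
    columns=[f"sex_{c}" for c in categories],
)
print(expanded)
```

Learning the categories from the training split alone keeps the test split from influencing the encoding, which is why the same `categories` list is reused for both.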
6. variance_threshold and variance parameter¶
There are two choices for the variance_threshold parameter:
* True: filter out numerical columns whose variance is less than the variance value.
* False: do nothing.
The default variance_threshold is False.
The default variance is 0.0.
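The filtering rule described here can be sketched with plain pandas. This is only an illustration of the idea, not the `clean_ml()` implementation: the helper name `drop_low_variance`, the use of population variance (`ddof=0`), and computing the variance on the training split only are assumptions.

```python
import pandas as pd


def drop_low_variance(train, test, target, variance=0.0):
    """Drop numerical columns whose training variance is below `variance`."""
    # Only numerical feature columns are candidates; the target is never dropped.
    num_cols = train.drop(columns=[target]).select_dtypes(include="number").columns
    # Assumption: population variance, measured on the training split only.
    variances = train[num_cols].var(ddof=0)
    to_drop = variances[variances < variance].index.tolist()
    # Remove the same columns from both splits so they stay aligned.
    return train.drop(columns=to_drop), test.drop(columns=to_drop), to_drop


# Hypothetical toy data: "a" is constant, "b" varies, "c" is categorical.
train = pd.DataFrame({
    "a": [1.0, 1.0, 1.0, 1.0],
    "b": [0.0, 10.0, 0.0, 10.0],
    "c": ["x", "y", "x", "y"],
    "class": ["<=50K", ">50K", "<=50K", ">50K"],
})
test = train.copy()

new_train, new_test, dropped = drop_low_variance(
    train, test, target="class", variance=6.0
)
print(dropped)
print(list(new_train.columns))
```

With the default `variance` of 0.0 nothing would be dropped in this sketch, because no variance is strictly less than zero; a positive value is needed to remove even constant columns.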
[42]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class",
                                                variance_threshold=True, variance=6.0)
[43]:
cleaned_training_df
[43]:
| | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.064247 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
1 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.009237 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
2 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.246964 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
3 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.428035 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
4 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.412302 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
5 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.901345 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
6 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.279485 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | -1.968313 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
7 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.189970 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
8 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.365494 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
9 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.286491 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
10 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.862254 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
11 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.458800 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
12 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.639397 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
13 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.146086 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
14 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.644143 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.151677 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34175 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.382480 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34176 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.407759 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34177 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.153272 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34178 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.323123 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34179 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.070743 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34180 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.761452 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34181 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | -0.367482 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34182 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.428648 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34183 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | -0.817212 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34184 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.753877 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34185 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.311551 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34186 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | -0.720198 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34187 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 0.608356 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -1.193092 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34188 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.113276 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34189 rows × 11 columns
[44]:
cleaned_test_df
[44]:
workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | native-country | class | |
---|---|---|---|---|---|---|---|---|---|---|---|
34189 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.704744 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34190 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.275191 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34191 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.147766 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34192 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.655990 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
34193 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.818617 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34194 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.335985 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34195 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | 0.197621 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34196 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.407546 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34197 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.149067 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34198 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.020718 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34199 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.342117 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | -1.968313 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34200 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | -0.376300 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34201 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.987078 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34202 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 0.042399 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | -0.030259 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
34203 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.382011 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.332483 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48828 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.549787 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ... | 0.357352 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48829 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.978500 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | 0.744962 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48830 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.153595 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48831 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.910723 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48832 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -0.948722 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48833 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 2.377888 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48834 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.531605 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48835 | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | 1.515021 | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.520184 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48836 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.527612 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48837 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.244809 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48838 | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] | 1.250871 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | -0.417870 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48839 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 1.759484 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48840 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -1.003732 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | <=50K |
48841 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] | -0.071019 | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 1.132573 | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | [0.0, 1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | >50K |
14653 rows × 11 columns
7. num_scaling parameter¶
There are three choices for the num_scaling parameter:
* standardize: standardizes each numerical column using the mean and standard deviation of that column. The transformation is (x - mean) / std.
* minmax: scales each numerical column using the minimum and maximum of that column. The transformation is (x - min) / (max - min).
* maxabs: scales each numerical column by the maximum absolute value of that column. The transformation is x / maxabs.
The default num_scaling is standardize.
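The three transformations above can be sketched directly in pandas. This is only an illustration of the formulas, not the code clean_ml() runs internally; the function names are hypothetical, and using the population standard deviation (ddof=0) for standardize is an assumption.

```python
import pandas as pd


def standardize(col: pd.Series) -> pd.Series:
    # (x - mean) / std; ddof=0 (population std) is an assumption of this sketch.
    return (col - col.mean()) / col.std(ddof=0)


def minmax(col: pd.Series) -> pd.Series:
    # (x - min) / (max - min): maps the column onto [0, 1].
    return (col - col.min()) / (col.max() - col.min())


def maxabs(col: pd.Series) -> pd.Series:
    # x / max(|x|): maps the column onto [-1, 1] and keeps the sign.
    return col / col.abs().max()


col = pd.Series([-4.0, 0.0, 2.0, 4.0])
print(standardize(col).round(3).tolist())
print(minmax(col).tolist())   # [0.0, 0.5, 0.75, 1.0]
print(maxabs(col).tolist())   # [-1.0, 0.0, 0.5, 1.0]
```

Note that minmax shifts the data so the smallest value becomes 0, while maxabs only rescales, so zeros stay zeros.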
[55]:
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df, target="class",
                                                cat_encoding='no_encoding',
                                                num_scaling="minmax")
[56]:
cleaned_training_df
[56]:
age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.50 | State-gov | 0.044302 | Bachelors | 0.800000 | Never-married | Adm-clerical | Not-in-family | White | Male | 0.25 | 0.00 | 0.50 | United-States | <=50K |
1 | 0.75 | Self-emp-not-inc | 0.048238 | Bachelors | 0.800000 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.00 | 0.00 | United-States | <=50K |
2 | 0.50 | Private | 0.138113 | HS-grad | 0.533333 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
3 | 0.75 | Private | 0.151068 | 11th | 0.400000 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
4 | 0.25 | Private | 0.221488 | Bachelors | 0.800000 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0.00 | 0.00 | 0.50 | Cuba | <=50K |
5 | 0.50 | Private | 0.184932 | Masters | 0.866667 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0.00 | 0.00 | 0.50 | United-States | <=50K |
6 | 0.75 | Private | 0.100448 | 9th | 0.266667 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0.00 | 0.00 | 0.00 | Jamaica | <=50K |
7 | 0.75 | Self-emp-not-inc | 0.134036 | HS-grad | 0.533333 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.00 | 0.50 | United-States | >50K |
8 | 0.25 | Private | 0.022749 | Masters | 0.866667 | Never-married | Prof-specialty | Not-in-family | White | Female | 1.00 | 0.00 | 0.75 | United-States | >50K |
9 | 0.50 | Private | 0.099947 | Bachelors | 0.800000 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.50 | 0.00 | 0.50 | United-States | >50K |
10 | 0.50 | Private | 0.182135 | Some-college | 0.600000 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | 0.00 | 0.00 | 1.00 | United-States | >50K |
11 | 0.25 | State-gov | 0.087619 | Bachelors | 0.800000 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0.00 | 0.00 | 0.50 | India | >50K |
12 | 0.00 | Private | 0.074698 | Bachelors | 0.800000 | Never-married | Adm-clerical | Own-child | White | Female | 0.00 | 0.00 | 0.25 | United-States | <=50K |
13 | 0.25 | Private | 0.130896 | Assoc-acdm | 0.733333 | Never-married | Sales | Not-in-family | Black | Male | 0.00 | 0.00 | 0.75 | United-States | <=50K |
14 | 0.50 | Private | 0.074359 | Assoc-voc | 0.666667 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | 0.00 | 0.00 | 0.50 | ? | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.50 | Private | 0.109592 | Some-college | 0.600000 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.00 | 0.00 | 0.50 | United-States | >50K |
34175 | 0.75 | Private | 0.093079 | HS-grad | 0.533333 | Separated | Handlers-cleaners | Not-in-family | White | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
34176 | 1.00 | Private | 0.091271 | HS-grad | 0.533333 | Married-civ-spouse | Other-service | Husband | Black | Male | 0.00 | 0.00 | 0.50 | ? | >50K |
34177 | 1.00 | Private | 0.109478 | Bachelors | 0.800000 | Divorced | Prof-specialty | Not-in-family | White | Female | 0.00 | 0.00 | 0.25 | United-States | <=50K |
34178 | 0.00 | Private | 0.143562 | 11th | 0.400000 | Never-married | Other-service | Own-child | White | Male | 0.00 | 0.00 | 0.00 | United-States | <=50K |
34179 | 0.75 | Private | 0.115383 | Some-college | 0.600000 | Divorced | Protective-serv | Unmarried | White | Female | 0.00 | 0.00 | 0.25 | United-States | <=50K |
34180 | 0.00 | Private | 0.065966 | Some-college | 0.600000 | Never-married | Sales | Other-relative | Asian-Pac-Islander | Male | 0.00 | 0.00 | 0.00 | India | <=50K |
34181 | 0.75 | Self-emp-inc | 0.094152 | Some-college | 0.600000 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.75 | 0.50 | United-States | >50K |
34182 | 1.00 | Self-emp-not-inc | 0.018231 | HS-grad | 0.533333 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.00 | 0.00 | 0.25 | United-States | <=50K |
34183 | 0.75 | Local-gov | 0.061976 | Bachelors | 0.800000 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.00 | 0.00 | 0.75 | United-States | >50K |
34184 | 1.00 | Private | 0.066508 | HS-grad | 0.533333 | Married-civ-spouse | Other-service | Husband | Black | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
34185 | 0.50 | Private | 0.142734 | HS-grad | 0.533333 | Never-married | Sales | Not-in-family | White | Male | 0.00 | 1.00 | 0.50 | El-Salvador | <=50K |
34186 | 0.00 | ? | 0.068917 | HS-grad | 0.533333 | Never-married | ? | Own-child | White | Female | 0.00 | 0.00 | 0.50 | United-States | <=50K |
34187 | 0.50 | ? | 0.163970 | 11th | 0.400000 | Married-civ-spouse | ? | Wife | White | Female | 0.00 | 0.00 | 0.00 | United-States | <=50K |
34188 | 0.00 | Private | 0.200094 | HS-grad | 0.533333 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
34189 rows × 15 columns
[57]:
cleaned_test_df
[57]:
age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.50 | Self-emp-not-inc | 0.170866 | Some-college | 0.600000 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.0 | 0.0 | 0.75 | United-States | <=50K |
34190 | 0.50 | State-gov | 0.029210 | Bachelors | 0.800000 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | >50K |
34191 | 0.00 | Private | 0.109872 | Assoc-voc | 0.666667 | Never-married | Other-service | Own-child | White | Female | 0.0 | 0.0 | 0.00 | United-States | <=50K |
34192 | 0.75 | State-gov | 0.167378 | Some-college | 0.600000 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | >50K |
34193 | 0.75 | Private | 0.179013 | HS-grad | 0.533333 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34194 | 0.00 | Private | 0.096406 | Some-college | 0.600000 | Never-married | Sales | Own-child | White | Female | 0.0 | 0.0 | 0.25 | United-States | <=50K |
34195 | 0.25 | Local-gov | 0.134583 | Some-college | 0.600000 | Married-civ-spouse | Craft-repair | Other-relative | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34196 | 0.25 | Private | 0.221148 | Some-college | 0.600000 | Divorced | Adm-clerical | Unmarried | Black | Female | 0.0 | 0.0 | 0.25 | United-States | <=50K |
34197 | 0.25 | State-gov | 0.131109 | Bachelors | 0.800000 | Never-married | Prof-specialty | Not-in-family | White | Female | 0.0 | 0.0 | 0.00 | United-States | <=50K |
34198 | 0.00 | Private | 0.118962 | Some-college | 0.600000 | Separated | Other-service | Own-child | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34199 | 0.25 | Private | 0.095967 | 9th | 0.266667 | Separated | Craft-repair | Not-in-family | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34200 | 0.25 | Local-gov | 0.093522 | Some-college | 0.600000 | Divorced | Adm-clerical | Unmarried | White | Female | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34201 | 0.50 | Private | 0.262611 | Some-college | 0.600000 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34202 | 0.00 | ? | 0.123478 | Some-college | 0.600000 | Never-married | ? | Own-child | White | Female | 0.0 | 0.0 | 0.25 | United-States | <=50K |
34203 | 0.50 | Private | 0.021567 | Assoc-acdm | 0.733333 | Married-spouse-absent | Adm-clerical | Other-relative | White | Male | 0.0 | 0.0 | 0.75 | United-States | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.75 | Private | 0.144232 | HS-grad | 0.533333 | Separated | Priv-house-serv | Not-in-family | White | Female | 0.0 | 0.0 | 0.25 | United-States | <=50K |
48828 | 0.50 | Private | 0.159779 | Assoc-voc | 0.666667 | Never-married | Adm-clerical | Unmarried | Black | Female | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48829 | 1.00 | Private | 0.190452 | Assoc-acdm | 0.733333 | Divorced | Prof-specialty | Not-in-family | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48830 | 0.25 | Private | 0.109455 | HS-grad | 0.533333 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48831 | 0.75 | Private | 0.185603 | HS-grad | 0.533333 | Married-civ-spouse | Adm-clerical | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48832 | 1.00 | Private | 0.052567 | HS-grad | 0.533333 | Married-civ-spouse | Sales | Husband | White | Male | 0.0 | 0.0 | 0.75 | United-States | <=50K |
48833 | 0.25 | Private | 0.290572 | HS-grad | 0.533333 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48834 | 0.00 | Private | 0.230024 | HS-grad | 0.533333 | Never-married | Other-service | Own-child | White | Female | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48835 | 0.75 | Local-gov | 0.228838 | Masters | 0.866667 | Divorced | Other-service | Not-in-family | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48836 | 0.25 | Private | 0.158193 | Bachelors | 0.800000 | Never-married | Prof-specialty | Own-child | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48837 | 0.50 | Private | 0.137959 | Bachelors | 0.800000 | Divorced | Prof-specialty | Not-in-family | White | Female | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48838 | 1.00 | ? | 0.209939 | HS-grad | 0.533333 | Widowed | ? | Other-relative | Black | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48839 | 0.50 | Private | 0.246328 | Bachelors | 0.800000 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.0 | 0.0 | 0.75 | United-States | <=50K |
48840 | 0.50 | Private | 0.048632 | Bachelors | 0.800000 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 0.5 | 0.0 | 0.50 | United-States | <=50K |
48841 | 0.25 | Self-emp-inc | 0.115363 | Bachelors | 0.800000 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.0 | 0.0 | 0.75 | United-States | >50K |
14653 rows × 15 columns
8. include_operators and exclude_operators parameters¶
The include_operators parameter is a list of the operators that must be included in the cleaning pipeline. For example:
* ['one_hot', 'minmax', 'median', 'most_frequent']
The exclude_operators parameter is a list of the operators that must be excluded from the cleaning pipeline. It has the same format as include_operators.
The valid choices for include_operators and exclude_operators are:
* one_hot
* constant
* most_frequent
* drop
* mean
* median
* standardize
* minmax
* maxabs
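Because both parameters are plain lists of strings, a misspelled operator name is easy to miss. The sketch below checks a list against the valid choices listed above before it is passed on. The helper check_operators and the constant VALID_OPERATORS are hypothetical and not part of the library; how clean_ml() itself reacts to an unknown name is not covered here.

```python
# Valid operator names, copied from the list above.
VALID_OPERATORS = {
    "one_hot", "constant", "most_frequent", "drop",
    "mean", "median", "standardize", "minmax", "maxabs",
}


def check_operators(operators):
    """Return the names in `operators` that are not valid choices (hypothetical helper)."""
    return [op for op in operators if op not in VALID_OPERATORS]


include_operators = ['one_hot', 'minmax', 'median', 'most_frequent']
exclude_operators = ['drop', 'standarize']  # misspelled on purpose

print(check_operators(include_operators))  # []
print(check_operators(exclude_operators))  # ['standarize']
```

Once both lists pass the check, they would be given to clean_ml() through the include_operators and exclude_operators keyword arguments.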
9. customized_cat_pipeline and customized_num_pipeline parameters¶
Experienced users can specify their own customized_cat_pipeline and customized_num_pipeline. Each of the two parameters is a list of dictionaries, one per component. Each component is itself a dictionary that maps the component name to the name of the chosen operator and its related parameters. For example:
* [ {"cat_imputation": {"operator": 'constant', "cat_null_value": ['?'], "fill_val": "Hahahaha!!!!!"}}, ]
Users can also specify their own operators. They only need to define a class with an __init__ function and the fit, transform and fit_transform functions. To use such an operator, put the name of the class in the operator's position.
[58]:
from typing import Any
import pandas as pd

class MaxAbsScaler:
    """User-defined operator: scales a column by its maximum absolute value."""

    def __init__(self) -> None:
        self.name = "maxabsScaler"

    def fit(self, df: pd.Series) -> Any:
        # Learn the maximum absolute value of the column.
        self.maxabs = df.abs().max()
        return self

    def transform(self, df: pd.Series) -> pd.Series:
        # Apply x / maxabs to every value.
        result = df.map(self.compute_val)
        return result

    def fit_transform(self, df: pd.Series) -> pd.Series:
        return self.fit(df).transform(df)

    def compute_val(self, val):
        return val / self.maxabs

# Replace '?' in categorical columns with a constant fill value.
customized_cat_pipeline = [
    {"cat_imputation": {"operator": 'constant', "cat_null_value": ['?'], "fill_val": "Hahahaha!!!!!"}},
]
# Scale numerical columns with the user-defined class above.
customized_num_pipeline = [
    {"num_scaling": {"operator": MaxAbsScaler}},
]
cleaned_training_df, cleaned_test_df = clean_ml(training_df, test_df,
                                                customized_cat_pipeline=customized_cat_pipeline,
                                                customized_num_pipeline=customized_num_pipeline)
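A user-defined operator can be tried on its own before it is placed in a pipeline. The sketch below uses a second, hypothetical operator (MedianCenterer, not part of the library) with the same four functions and runs it on two small series. It assumes, as the fit/transform split suggests, that statistics learned from one series are then applied to another; whether clean_ml() reuses the fitted operator on the test DataFrame in this way is an assumption.

```python
import pandas as pd


class MedianCenterer:
    """Hypothetical operator: subtracts the median learned in fit()."""

    def __init__(self) -> None:
        self.median = None

    def fit(self, col: pd.Series) -> "MedianCenterer":
        # Learn the statistic from the given column.
        self.median = col.median()
        return self

    def transform(self, col: pd.Series) -> pd.Series:
        # Apply the previously learned statistic.
        return col - self.median

    def fit_transform(self, col: pd.Series) -> pd.Series:
        return self.fit(col).transform(col)


train = pd.Series([1.0, 2.0, 3.0, 10.0])
test = pd.Series([2.5, 4.5])

op = MedianCenterer()
train_out = op.fit_transform(train)  # the median 2.5 is learned here
test_out = op.transform(test)        # and reused here

print(train_out.tolist())  # [-1.5, -0.5, 0.5, 7.5]
print(test_out.tolist())   # [0.0, 2.0]
```

Such a class would be placed in a pipeline the same way as MaxAbsScaler, for example {"num_scaling": {"operator": MedianCenterer}}.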
[59]:
cleaned_training_df
[59]:
| age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.50 | State-gov | 0.052210 | Bachelors | 0.8125 | Never-married | Adm-clerical | Not-in-family | White | Male | 0.25 | 0.00 | 0.50 | United-States | <=50K |
1 | 0.75 | Self-emp-not-inc | 0.056113 | Bachelors | 0.8125 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.00 | 0.00 | United-States | <=50K |
2 | 0.50 | Private | 0.145245 | HS-grad | 0.5625 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
3 | 0.75 | Private | 0.158093 | 11th | 0.4375 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
4 | 0.25 | Private | 0.227930 | Bachelors | 0.8125 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0.00 | 0.00 | 0.50 | Cuba | <=50K |
5 | 0.50 | Private | 0.191676 | Masters | 0.8750 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0.00 | 0.00 | 0.50 | United-States | <=50K |
6 | 0.75 | Private | 0.107891 | 9th | 0.3125 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0.00 | 0.00 | 0.00 | Jamaica | <=50K |
7 | 0.75 | Self-emp-not-inc | 0.141201 | HS-grad | 0.5625 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.00 | 0.50 | United-States | >50K |
8 | 0.25 | Private | 0.030835 | Masters | 0.8750 | Never-married | Prof-specialty | Not-in-family | White | Female | 1.00 | 0.00 | 0.75 | United-States | >50K |
9 | 0.50 | Private | 0.107394 | Bachelors | 0.8125 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.50 | 0.00 | 0.50 | United-States | >50K |
10 | 0.50 | Private | 0.188902 | Some-college | 0.6250 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | 0.00 | 0.00 | 1.00 | United-States | >50K |
11 | 0.25 | State-gov | 0.095168 | Bachelors | 0.8125 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0.00 | 0.00 | 0.50 | India | >50K |
12 | 0.00 | Private | 0.082354 | Bachelors | 0.8125 | Never-married | Adm-clerical | Own-child | White | Female | 0.00 | 0.00 | 0.25 | United-States | <=50K |
13 | 0.25 | Private | 0.138087 | Assoc-acdm | 0.7500 | Never-married | Sales | Not-in-family | Black | Male | 0.00 | 0.00 | 0.75 | United-States | <=50K |
14 | 0.50 | Private | 0.082018 | Assoc-voc | 0.6875 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | 0.00 | 0.00 | 0.50 | Hahahaha!!!!! | >50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34174 | 0.50 | Private | 0.116960 | Some-college | 0.6250 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.00 | 0.00 | 0.50 | United-States | >50K |
34175 | 0.75 | Private | 0.100584 | HS-grad | 0.5625 | Separated | Handlers-cleaners | Not-in-family | White | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
34176 | 1.00 | Private | 0.098790 | HS-grad | 0.5625 | Married-civ-spouse | Other-service | Husband | Black | Male | 0.00 | 0.00 | 0.50 | Hahahaha!!!!! | >50K |
34177 | 1.00 | Private | 0.116847 | Bachelors | 0.8125 | Divorced | Prof-specialty | Not-in-family | White | Female | 0.00 | 0.00 | 0.25 | United-States | <=50K |
34178 | 0.00 | Private | 0.150649 | 11th | 0.4375 | Never-married | Other-service | Own-child | White | Male | 0.00 | 0.00 | 0.00 | United-States | <=50K |
34179 | 0.75 | Private | 0.122702 | Some-college | 0.6250 | Divorced | Protective-serv | Unmarried | White | Female | 0.00 | 0.00 | 0.25 | United-States | <=50K |
34180 | 0.00 | Private | 0.073694 | Some-college | 0.6250 | Never-married | Sales | Other-relative | Asian-Pac-Islander | Male | 0.00 | 0.00 | 0.00 | India | <=50K |
34181 | 0.75 | Self-emp-inc | 0.101648 | Some-college | 0.6250 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.75 | 0.50 | United-States | >50K |
34182 | 1.00 | Self-emp-not-inc | 0.026354 | HS-grad | 0.5625 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.00 | 0.00 | 0.25 | United-States | <=50K |
34183 | 0.75 | Local-gov | 0.069738 | Bachelors | 0.8125 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.00 | 0.00 | 0.75 | United-States | >50K |
34184 | 1.00 | Private | 0.074232 | HS-grad | 0.5625 | Married-civ-spouse | Other-service | Husband | Black | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
34185 | 0.50 | Private | 0.149828 | HS-grad | 0.5625 | Never-married | Sales | Not-in-family | White | Male | 0.00 | 1.00 | 0.50 | El-Salvador | <=50K |
34186 | 0.00 | Hahahaha!!!!! | 0.076621 | HS-grad | 0.5625 | Never-married | Hahahaha!!!!! | Own-child | White | Female | 0.00 | 0.00 | 0.50 | United-States | <=50K |
34187 | 0.50 | Hahahaha!!!!! | 0.170887 | 11th | 0.4375 | Married-civ-spouse | Hahahaha!!!!! | Wife | White | Female | 0.00 | 0.00 | 0.00 | United-States | <=50K |
34188 | 0.00 | Private | 0.206713 | HS-grad | 0.5625 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0.00 | 0.00 | 0.50 | United-States | <=50K |
34189 rows × 15 columns
[60]:
cleaned_test_df
[60]:
| age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34189 | 0.50 | Self-emp-not-inc | 0.177726 | Some-college | 0.6250 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.0 | 0.0 | 0.75 | United-States | <=50K |
34190 | 0.50 | State-gov | 0.037242 | Bachelors | 0.8125 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | >50K |
34191 | 0.00 | Private | 0.117237 | Assoc-voc | 0.6875 | Never-married | Other-service | Own-child | White | Female | 0.0 | 0.0 | 0.00 | United-States | <=50K |
34192 | 0.75 | State-gov | 0.174267 | Some-college | 0.6250 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | >50K |
34193 | 0.75 | Private | 0.185806 | HS-grad | 0.5625 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34194 | 0.00 | Private | 0.103883 | Some-college | 0.6250 | Never-married | Sales | Own-child | White | Female | 0.0 | 0.0 | 0.25 | United-States | <=50K |
34195 | 0.25 | Local-gov | 0.141744 | Some-college | 0.6250 | Married-civ-spouse | Craft-repair | Other-relative | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34196 | 0.25 | Private | 0.227593 | Some-college | 0.6250 | Divorced | Adm-clerical | Unmarried | Black | Female | 0.0 | 0.0 | 0.25 | United-States | <=50K |
34197 | 0.25 | State-gov | 0.138299 | Bachelors | 0.8125 | Never-married | Prof-specialty | Not-in-family | White | Female | 0.0 | 0.0 | 0.00 | United-States | <=50K |
34198 | 0.00 | Private | 0.126252 | Some-college | 0.6250 | Separated | Other-service | Own-child | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34199 | 0.25 | Private | 0.103447 | 9th | 0.3125 | Separated | Craft-repair | Not-in-family | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34200 | 0.25 | Local-gov | 0.101022 | Some-college | 0.6250 | Divorced | Adm-clerical | Unmarried | White | Female | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34201 | 0.50 | Private | 0.268713 | Some-college | 0.6250 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
34202 | 0.00 | Hahahaha!!!!! | 0.130730 | Some-college | 0.6250 | Never-married | Hahahaha!!!!! | Own-child | White | Female | 0.0 | 0.0 | 0.25 | United-States | <=50K |
34203 | 0.50 | Private | 0.029663 | Assoc-acdm | 0.7500 | Married-spouse-absent | Adm-clerical | Other-relative | White | Male | 0.0 | 0.0 | 0.75 | United-States | <=50K |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48827 | 0.75 | Private | 0.151313 | HS-grad | 0.5625 | Separated | Priv-house-serv | Not-in-family | White | Female | 0.0 | 0.0 | 0.25 | United-States | <=50K |
48828 | 0.50 | Private | 0.166731 | Assoc-voc | 0.6875 | Never-married | Adm-clerical | Unmarried | Black | Female | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48829 | 1.00 | Private | 0.197150 | Assoc-acdm | 0.7500 | Divorced | Prof-specialty | Not-in-family | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48830 | 0.25 | Private | 0.116824 | HS-grad | 0.5625 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48831 | 0.75 | Private | 0.192341 | HS-grad | 0.5625 | Married-civ-spouse | Adm-clerical | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48832 | 1.00 | Private | 0.060407 | HS-grad | 0.5625 | Married-civ-spouse | Sales | Husband | White | Male | 0.0 | 0.0 | 0.75 | United-States | <=50K |
48833 | 0.25 | Private | 0.296442 | HS-grad | 0.5625 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48834 | 0.00 | Private | 0.236395 | HS-grad | 0.5625 | Never-married | Other-service | Own-child | White | Female | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48835 | 0.75 | Local-gov | 0.235218 | Masters | 0.8750 | Divorced | Other-service | Not-in-family | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48836 | 0.25 | Private | 0.165158 | Bachelors | 0.8125 | Never-married | Prof-specialty | Own-child | White | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48837 | 0.50 | Private | 0.145092 | Bachelors | 0.8125 | Divorced | Prof-specialty | Not-in-family | White | Female | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48838 | 1.00 | Hahahaha!!!!! | 0.216476 | HS-grad | 0.5625 | Widowed | Hahahaha!!!!! | Other-relative | Black | Male | 0.0 | 0.0 | 0.50 | United-States | <=50K |
48839 | 0.50 | Private | 0.252564 | Bachelors | 0.8125 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0.0 | 0.0 | 0.75 | United-States | <=50K |
48840 | 0.50 | Private | 0.056503 | Bachelors | 0.8125 | Divorced | Adm-clerical | Own-child | Asian-Pac-Islander | Male | 0.5 | 0.0 | 0.50 | United-States | <=50K |
48841 | 0.25 | Self-emp-inc | 0.122683 | Bachelors | 0.8125 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.0 | 0.0 | 0.75 | United-States | >50K |
14653 rows × 15 columns