dataprep.eda.missing.
plot_missing
df (Union[DataFrame, DataFrame]) – the DataFrame for which plots are calculated for each column.
col1 (Optional[str]) – a valid column name of the data frame.
col2 (Optional[str]) – a valid column name of the data frame.
config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations. E.g. config={"spectrum.bins": 20}
display (Optional[List[str]]) – A list containing the names of the visualizations to display. E.g. display=["Stats", "Spectrum"]
dtype (str or DType or dict of str or dict of DType, default None) – Specifies the data type for designated columns or for all columns. E.g. dtype = {"a": Continuous, "b": "Nominal"}, dtype = {"a": Continuous(), "b": "nominal"}, dtype = Continuous() or dtype = "Continuous".
progress (bool) – Enable the progress bar.
Examples
>>> from dataprep.eda.missing import plot_missing
>>> import pandas as pd
>>> df = pd.read_csv("suicide-rate.csv")
>>> plot_missing(df, "HDI_for_year")
>>> plot_missing(df, "HDI_for_year", "population")
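The config, display and progress parameters described above can be passed together in one call. The following is only a minimal sketch: the helper names build_plot_missing_kwargs and plot_with_options are hypothetical, the option values are the ones quoted in the parameter descriptions, and the import path dataprep.eda.missing is assumed from the module name documented on this page.

```python
from typing import Any, Dict, List


def build_plot_missing_kwargs(bins: int = 20) -> Dict[str, Any]:
    """Hypothetical helper: collect the keyword arguments described above."""
    # Visualization names quoted in the description of `display`.
    display: List[str] = ["Stats", "Spectrum"]
    return {
        # Option key quoted in the description of `config`.
        "config": {"spectrum.bins": bins},
        "display": display,
        # Turn the progress bar off.
        "progress": False,
    }


def plot_with_options(df):
    """Hypothetical wrapper: call plot_missing with the options above."""
    # Imported lazily so that this sketch can be loaded without dataprep.
    from dataprep.eda.missing import plot_missing

    return plot_missing(df, **build_plot_missing_kwargs())


print(build_plot_missing_kwargs())
```

With dataprep installed, plot_with_options(df) forwards these options to plot_missing for a pandas DataFrame df.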
Return type: Container
This module implements the intermediate computation part of the plot_missing(df) function.
dataprep.eda.missing.compute.
compute_missing
This function is designed to deal with missing values. It supports three call forms: plot_missing(df), plot_missing(df, x) and plot_missing(df, x, y).
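How the optional column arguments select one of the three forms listed above can be sketched as follows. The function call_form is hypothetical and is not part of dataprep. Treating col2 without col1 as an error is an assumption, made only because that combination is not among the listed forms.

```python
from typing import Optional


def call_form(col1: Optional[str] = None, col2: Optional[str] = None) -> str:
    """Return which of the three documented call forms the arguments select."""
    if col1 is None and col2 is None:
        return "plot_missing(df)"
    if col1 is not None and col2 is None:
        return "plot_missing(df, x)"
    if col1 is not None and col2 is not None:
        return "plot_missing(df, x, y)"
    # Assumption: col2 without col1 is not one of the documented forms.
    raise ValueError("col2 was given without col1")


print(call_form())
print(call_form("HDI_for_year"))
print(call_form("HDI_for_year", "population"))
```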
df (Union[DataFrame, DataFrame, EDAFrame]) – the DataFrame for which plots are calculated for each column.
col1 (Optional[str]) – a valid column name of the data frame
col2 (Optional[str]) – a valid column name of the data frame
cfg (Union[Config, Dict[str, Any], None], default None) – When a user calls plot_missing(), the Config object it creates is passed to compute_missing(). When a user calls compute_missing() directly, cfg can be a dictionary that customizes the output. If cfg is None, default values are used for all parameters.
display (Optional[List[str]], default None) – A list containing the names of the visualizations to display. It applies only when a user calls compute_missing() directly and wants to customize the output.
dtype (str or DType or dict of str or dict of DType, default None) – Specifies the data type for designated columns or for all columns. E.g. dtype = {"a": Continuous, "b": "Nominal"}, dtype = {"a": Continuous(), "b": "nominal"}, dtype = Continuous() or dtype = "Continuous".
Return type: Intermediate
This module implements the visualization part of the plot_missing(df, x, y) function.
dataprep.eda.missing.render.
render_missing
Render the visualizations for plot_missing.
Return type: Dict[str, Any]