dataprep.eda.missing

plot_missing

dataprep.eda.missing.plot_missing(df, col1=None, col2=None, *, config=None, display=None, dtype=None, progress=True)
Parameters
  • df (Union[pandas.DataFrame, dask.dataframe.DataFrame]) – The data frame for which plots are calculated for each column.

  • col1 (Optional[str]) – A valid column name of the data frame.

  • col2 (Optional[str]) – A valid column name of the data frame.

  • config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations. E.g. config={"spectrum.bins": 20}

  • display (Optional[List[str]]) – A list containing the names of the visualizations to display. E.g. display=["Stats", "Spectrum"]

  • dtype (str or DType or dict of str or dict of DType, default None) – Specify data types for designated columns or for all columns. E.g. dtype={"a": Continuous, "b": "Nominal"}, dtype={"a": Continuous(), "b": "nominal"}, dtype=Continuous(), or dtype="Continuous".

  • progress (bool) – Enable the progress bar.
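The config and display parameters take plain Python values. The sketch below illustrates one way a flat dictionary with dotted keys, such as the documented {"spectrum.bins": 20}, could be merged over defaults, and how a display list could switch visualizations on or off. It is not dataprep's implementation: the names DEFAULTS, ALL_VISUALS and resolve_options, the "<name>.enable" keys, and all default values are hypothetical; only "Stats", "Spectrum" and "spectrum.bins" come from the parameter descriptions above.

```python
from typing import Any, Dict, List, Optional

# Hypothetical defaults for illustration only, NOT dataprep's real values.
# Keys use the dotted "<visualization>.<option>" form seen in config={"spectrum.bins": 20}.
DEFAULTS: Dict[str, Any] = {
    "spectrum.bins": 10,
    "spectrum.enable": True,
    "stats.enable": True,
}

# Visualization names taken from the documented display example.
ALL_VISUALS: List[str] = ["Stats", "Spectrum"]


def resolve_options(
    config: Optional[Dict[str, Any]] = None,
    display: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Merge a user config over the defaults, then apply a display filter."""
    options = dict(DEFAULTS)  # copy so the defaults are never mutated
    for key, value in (config or {}).items():
        if key not in options:
            # Reject typos instead of silently ignoring them.
            raise KeyError(f"unknown config key: {key!r}")
        options[key] = value
    if display is not None:
        unknown = set(display) - set(ALL_VISUALS)
        if unknown:
            raise ValueError(f"unknown visualizations: {sorted(unknown)}")
        # A visualization is enabled only if it is listed in display.
        for name in ALL_VISUALS:
            options[f"{name.lower()}.enable"] = name in display
    return options
```

For example, resolve_options({"spectrum.bins": 20}, ["Spectrum"]) keeps the Spectrum visualization with 20 bins and disables Stats, while resolve_options() returns a copy of the defaults.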

Examples

>>> from dataprep.eda.missing import plot_missing
>>> import pandas as pd
>>> df = pd.read_csv("suicide-rate.csv")
>>> plot_missing(df)
>>> plot_missing(df, "HDI_for_year")
>>> plot_missing(df, "HDI_for_year", "population")
Return type

Container

compute_missing

This module implements the computation part of plot_missing(df): it calculates the intermediate results used for visualization.

dataprep.eda.missing.compute.compute_missing(df, col1=None, col2=None, *, cfg=None, display=None, dtype=None)

This function computes the intermediate results for missing-value analysis. It supports the three call forms of plot_missing: plot_missing(df), plot_missing(df, x), and plot_missing(df, x, y).
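Which of the three call forms applies depends only on which column arguments are given. The sketch below shows that dispatch in plain Python, including the check that each given name is a valid column. The function select_call_form and its columns argument are hypothetical illustrations, not part of dataprep; rejecting col2 without col1 is an assumption of this sketch.

```python
from typing import Optional, Sequence


def select_call_form(
    columns: Sequence[str],
    col1: Optional[str] = None,
    col2: Optional[str] = None,
) -> str:
    """Return which documented plot_missing call form the arguments correspond to."""
    # Each given column argument must name an existing column of the data frame.
    for col in (col1, col2):
        if col is not None and col not in columns:
            raise ValueError(f"{col!r} is not a column of the data frame")
    if col1 is None and col2 is None:
        return "plot_missing(df)"
    if col1 is not None and col2 is None:
        return "plot_missing(df, x)"
    if col1 is not None and col2 is not None:
        return "plot_missing(df, x, y)"
    # Assumption of this sketch: a second column without a first one is an error.
    raise ValueError("col2 was given without col1")
```

With columns ["HDI_for_year", "population"], passing only "HDI_for_year" selects plot_missing(df, x), and passing both names selects plot_missing(df, x, y).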

Parameters
  • df (Union[pandas.DataFrame, dask.dataframe.DataFrame, EDAFrame]) – The data frame for which plots are calculated for each column.

  • col1 (Optional[str]) – A valid column name of the data frame.

  • col2 (Optional[str]) – A valid column name of the data frame.

  • cfg (Union[Config, Dict[str, Any], None], default None) – When plot_missing() is called, the Config object it creates is passed to compute_missing(). When compute_missing() is called directly, cfg may be a dictionary that customizes the output; if cfg is None, default values are used for all parameters.

  • display (Optional[List[str]], default None) – A list containing the names of the visualizations to display. Only used when compute_missing() is called directly and the output should be customized.

  • dtype (str or DType or dict of str or dict of DType, default None) – Specify data types for designated columns or for all columns. E.g. dtype={"a": Continuous, "b": "Nominal"}, dtype={"a": Continuous(), "b": "nominal"}, dtype=Continuous(), or dtype="Continuous".
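The dtype parameter accepts several equivalent forms: a string (in any letter case), a DType class, a DType instance, or a dictionary mapping column names to any of these. The sketch below expands all of them into one per-column mapping. The classes DType, Continuous and Nominal here are hypothetical stand-ins, not dataprep's classes, and treating an unspecified column as "infer the type" is an assumption.

```python
from typing import Dict, List, Optional, Type, Union


class DType:
    """Hypothetical stand-in for a data-type marker class (not dataprep's)."""

    def __eq__(self, other: object) -> bool:
        return type(self) is type(other)

    def __hash__(self) -> int:
        return hash(type(self))

    def __repr__(self) -> str:
        return f"{type(self).__name__}()"


class Continuous(DType):
    pass


class Nominal(DType):
    pass


# Case-insensitive lookup so "Nominal" and "nominal" mean the same type.
_BY_NAME: Dict[str, Type[DType]] = {"continuous": Continuous, "nominal": Nominal}

Spec = Union[str, DType, Type[DType]]


def to_dtype(spec: Spec) -> DType:
    """Convert one specification (string, DType subclass, or instance) to an instance."""
    if isinstance(spec, DType):
        return spec
    if isinstance(spec, type) and issubclass(spec, DType):
        return spec()
    if isinstance(spec, str):
        try:
            return _BY_NAME[spec.lower()]()
        except KeyError:
            raise ValueError(f"unknown data type: {spec!r}") from None
    raise TypeError(f"unsupported dtype specification: {spec!r}")


def normalize_dtype(
    dtype: Union[None, Spec, Dict[str, Spec]], columns: List[str]
) -> Dict[str, Optional[DType]]:
    """Expand any accepted dtype form into a per-column mapping.

    None for a column means no type was specified (assumed: infer it).
    """
    if dtype is None:
        return {col: None for col in columns}
    if isinstance(dtype, dict):
        unknown = set(dtype) - set(columns)
        if unknown:
            raise ValueError(f"dtype names unknown columns: {sorted(unknown)}")
        # Only the designated columns receive an explicit type.
        return {col: to_dtype(dtype[col]) if col in dtype else None for col in columns}
    # A single specification applies to every column.
    single = to_dtype(dtype)
    return {col: single for col in columns}
```

Under these assumptions, {"a": Continuous, "b": "Nominal"} and {"a": Continuous(), "b": "nominal"} normalize to the same mapping, and "Continuous", Continuous and Continuous() all assign the continuous type to every column.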

Examples

>>> from dataprep.eda.missing.compute import compute_missing
>>> import pandas as pd
>>> df = pd.read_csv("suicide-rate.csv")
>>> itmdt = compute_missing(df, "HDI_for_year")
>>> itmdt = compute_missing(df, "HDI_for_year", "population")
Return type

Intermediate

render_missing

This module implements the visualization part of the plot_missing function.

dataprep.eda.missing.render.render_missing(itmdt, cfg)

Render the visualizations for plot_missing from the intermediate result itmdt, using the configuration cfg.

Return type

Dict[str, Any]
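The three entries above split the work into a compute step (compute_missing returns an Intermediate) and a render step (render_missing returns a Dict[str, Any]), with plot_missing returning a Container. The minimal sketch below illustrates that compute-then-render pattern on plain Python rows. Everything in it is hypothetical: the Intermediate dataclass, the two *_sketch functions, the "missing_overview" tag, the "stats.decimals" option, and the choice of per-column missing percentages as the rendered output. Keeping the compute step free of plotting is what makes it usable on its own, as the cfg description of compute_missing suggests.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Intermediate:
    """Hypothetical stand-in for the result of the compute step."""
    visual_type: str
    data: Dict[str, Any] = field(default_factory=dict)


def compute_missing_sketch(rows: List[Dict[str, Any]]) -> Intermediate:
    """Compute step: count missing values per column, without any plotting."""
    columns = sorted({key for row in rows for key in row})
    # A value of None, or an absent key, counts as missing in this sketch.
    counts = {col: sum(1 for row in rows if row.get(col) is None) for col in columns}
    return Intermediate("missing_overview", {"nrows": len(rows), "missing": counts})


def render_missing_sketch(
    itmdt: Intermediate, cfg: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Render step: turn the intermediate into display-ready pieces keyed by name."""
    cfg = cfg or {}
    if itmdt.visual_type != "missing_overview":
        raise ValueError(f"cannot render {itmdt.visual_type!r}")
    nrows = itmdt.data["nrows"]
    decimals = cfg.get("stats.decimals", 1)  # hypothetical option
    # Percentage of missing values per column.
    stats = {
        col: round(100 * n / nrows, decimals) if nrows else 0.0
        for col, n in itmdt.data["missing"].items()
    }
    return {"Stats": stats}
```

For four rows in which HDI_for_year is missing twice and population once, the render step reports 50.0 and 25.0 percent respectively.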