dataprep.eda.diff.
plot_diff
Computes and visualizes the differences between two or more (up to 5) datasets.
df (Union[List[Union[DataFrame, DataFrame]], DataFrame, DataFrame]) – The DataFrame(s) to be compared.
x (Optional[str]) – The column to be emphasized in the comparison.
config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations, e.g. config={"hist.bins": 20}.
display (Optional[List[str]]) – A list containing the names of the visualizations to display, e.g. display=["Histogram"].
dtype (str or DType or dict of str or dict of DType, default None) – Specify data types for designated columns or for all columns, e.g. dtype = {"a": Continuous, "b": "Nominal"} or dtype = {"a": Continuous(), "b": "nominal"} or dtype = Continuous() or dtype = "Continuous".
progress (bool) – Whether to show the progress bar.
Examples
>>> from dataprep.datasets import load_dataset
>>> from dataprep.eda import plot_diff
>>> df_train = load_dataset('house_prices_train')
>>> df_test = load_dataset('house_prices_test')
>>> plot_diff([df_train, df_test])
Return type: Container
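The config and display parameters described above can be combined in a single call. The following is a minimal sketch, not part of the library: `build_plot_diff_kwargs` and `run_example` are hypothetical helper names, the 2-to-5 limit is taken from the function description, and the only config key and visualization name used are the ones quoted in the parameter entries.

```python
from typing import Any, Dict, Sequence

# Limits taken from the description above: between 2 and 5 datasets.
MIN_DATASETS, MAX_DATASETS = 2, 5


def build_plot_diff_kwargs(
    dfs: Sequence[Any],
    bins: int = 20,
    visualizations: Sequence[str] = ("Histogram",),
) -> Dict[str, Any]:
    """Hypothetical helper: check the dataset count, then assemble keyword arguments."""
    if not MIN_DATASETS <= len(dfs) <= MAX_DATASETS:
        raise ValueError(
            f"expected {MIN_DATASETS} to {MAX_DATASETS} datasets, got {len(dfs)}"
        )
    return {
        "config": {"hist.bins": bins},    # key quoted in the config entry above
        "display": list(visualizations),  # name quoted in the display entry above
        "progress": False,                # turn the progress bar off
    }


def run_example() -> Any:
    """Call plot_diff on the bundled house-price datasets (requires dataprep)."""
    # Imported lazily so the helper above stays usable without dataprep installed.
    from dataprep.datasets import load_dataset
    from dataprep.eda import plot_diff

    dfs = [load_dataset("house_prices_train"), load_dataset("house_prices_test")]
    return plot_diff(dfs, **build_plot_diff_kwargs(dfs))
```

Calling `run_example()` in an environment with dataprep installed should return the Container for the two datasets. Checking the dataset count up front is a design choice of this sketch so that an out-of-range list fails before any computation starts.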
Computations for plot_diff([df…]).
dataprep.eda.diff.compute.
compute_diff
All in one compute function.
df (Union[List[Union[DataFrame, DataFrame]], DataFrame, DataFrame]) – DataFrame from which visualizations are generated
cfg (Union[Config, Dict[str, Any], None], default None) – When a user calls plot(), the created Config object is passed to compute(). When a user calls compute() directly and wants to customize the output, cfg is a dictionary of configuration options. Otherwise, cfg is None and default parameter values are used.
display (Optional[List[str]], default None) – A list containing the names of the visualizations to display. Only used when a user calls compute() directly and wants to customize the output.
x (Optional[str], default None) – A valid column name from the dataframe
dtype (str or DType or dict of str or dict of DType, default None) – Specify data types for designated columns or for all columns, e.g. dtype = {"a": Continuous, "b": "Nominal"} or dtype = {"a": Continuous(), "b": "nominal"} or dtype = Continuous() or dtype = "Continuous".
Return type: Intermediate
This module implements the visualization for the plot_diff function.
dataprep.eda.diff.render.
render_diff
Render a basic plot
itmdt (Intermediate) – The Intermediate containing results from the compute function.
cfg (Config) – Config instance
Return type: Dict[str, Any]
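Taken together, the entries above describe a two-stage flow: compute_diff produces an Intermediate, and render_diff turns that Intermediate plus a Config into a dictionary. The sketch below only illustrates that chaining. `compute_then_render` and `run_with_dataprep` are hypothetical names, the import paths are assumed from the module names listed above, and constructing the Config instance is left to the caller because this page does not document it.

```python
from typing import Any, Callable, Dict, Sequence


def compute_then_render(
    dfs: Sequence[Any],
    cfg: Any,
    compute: Callable[..., Any],
    render: Callable[[Any, Any], Dict[str, Any]],
) -> Dict[str, Any]:
    """Chain a compute step and a render step that share one configuration."""
    itmdt = compute(list(dfs), cfg=cfg)  # intermediate results of the comparison
    return render(itmdt, cfg)            # dictionary produced by the renderer


def run_with_dataprep(dfs: Sequence[Any], cfg: Any) -> Dict[str, Any]:
    """Wire in the documented functions (requires dataprep; paths are assumed)."""
    from dataprep.eda.diff.compute import compute_diff
    from dataprep.eda.diff.render import render_diff

    return compute_then_render(dfs, cfg, compute_diff, render_diff)
```

Passing the two stages in as callables keeps the chaining logic easy to exercise with stand-in functions, without computing on real datasets.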