dataprep.eda.correlation

plot_correlation

dataprep.eda.correlation.plot_correlation(df, col1=None, col2=None, *, value_range=None, k=None, config=None, display=None, progress=True)

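Computes and plots the correlations between the columns of a DataFrame: for the whole frame, for a single column (col1), or for a pair of columns (col1 and col2). The optional parameters k and value_range narrow which correlations are shown.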

Parameters

df (Union[pandas.DataFrame, dask.dataframe.DataFrame]) – The DataFrame whose columns are analyzed and plotted.
col1 (Optional[str]) – A valid column name of the data frame.
col2 (Optional[str]) – A valid column name of the data frame.
value_range (Optional[Tuple[float, float]]) – Range of correlation values to show; values outside the range are not shown.
k (Optional[int]) – Show only the top-k elements.
config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations, e.g. config={"scatter.sample_size": 5000}.
display (Optional[List[str]]) – A list containing the names of the visualizations to display, e.g. display=["Pearson"].
progress (bool) – Enable the progress bar.

Examples

>>> from dataprep.eda.correlation import plot_correlation
>>> import pandas as pd
>>> df = pd.read_csv("suicide-rate.csv")
>>> plot_correlation(df)
>>> plot_correlation(df, k=6)
>>> plot_correlation(df, "suicides")
>>> plot_correlation(df, "suicides", k=3)
>>> plot_correlation(df, "suicides", value_range=[-1, 0.3])
>>> plot_correlation(df, "suicides", value_range=[-1, 0.3], k=2)
>>> plot_correlation(df, col1="population", col2="suicides_no")
>>> plot_correlation(df, col1="population", col2="suicides", k=5)
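
To make the effect of value_range and k concrete, the following sketch reproduces the filtering idea with plain pandas on toy data. It is not dataprep's implementation: the column names are made up, and reading "top-k" as the k correlations with the largest absolute value is an assumption that the documentation here does not spell out.

```python
import pandas as pd

# Toy data; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
    "d": [3, 1, 2, 5, 4],
})

# Pearson correlation of every other column with "a".
with_a = df.corr(method="pearson")["a"].drop("a")

# value_range-style filter: keep only correlations inside [low, high].
low, high = -1.0, 0.3
in_range = with_a[(with_a >= low) & (with_a <= high)]

# k-style filter (assumed meaning): keep the k correlations
# with the largest absolute value.
k = 2
top_k = with_a.loc[with_a.abs().nlargest(k).index]

print(in_range)
print(top_k)
```

With this data only "c" falls inside the range, and "b" and "c" are the two strongest correlations with "a".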

Note

This function only supports numerical or categorical data, and it is better to drop None, NaN, and null values before using it.
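
Following the note above, missing values can be dropped with pandas before the frame is passed to plot_correlation. This is a minimal sketch on toy data: the column names and values are illustrative, and dropping whole rows (rather than imputing) is just one possible choice.

```python
import numpy as np
import pandas as pd

# Toy frame with missing entries in different columns.
df = pd.DataFrame({
    "population": [1000.0, np.nan, 3000.0, 4000.0, 5000.0],
    "suicides": [3.0, 5.0, None, 8.0, 9.0],
    "country": ["A", "B", "C", "D", "E"],
})

# Drop every row that contains at least one missing value,
# then renumber the remaining rows.
clean = df.dropna().reset_index(drop=True)

print(clean)
# `clean` is the frame that would then be passed to plot_correlation.
```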

Return type: Container

compute_correlation

This module implements the intermediate computations for the plot_correlation(df) function.

dataprep.eda.correlation.compute.compute_correlation(df, col1=None, col2=None, *, cfg=None, display=None, value_range=None, k=None)

Parameters

df (Union[pandas.DataFrame, dask.dataframe.DataFrame, EDAFrame]) – The DataFrame whose columns are analyzed.
col1 (Optional[str]) – A valid column name of the DataFrame.
col2 (Optional[str]) – A valid column name of the DataFrame.
cfg (Union[Config, Dict[str, Any], None], default None) – Config instance. When plot_correlation() is called, the Config object it creates is passed to compute_correlation(). When compute_correlation() is called directly, cfg can be a dictionary that customizes the output; if it is None, default values are used for the parameters.
display (Optional[List[str]], default None) – A list containing the names of the visualizations to display. Only used when compute_correlation() is called directly and the output should be customized.
value_range (Optional[Tuple[float, float]]) – Correlation values outside this range are not shown.
k (Optional[int]) – Show only the top-k elements.

Return type

Intermediate

render_correlation

This module implements the visualization for the plot_correlation(df) function.

dataprep.eda.correlation.render.render_correlation(itmdt, cfg)

Render a correlation plot

Parameters

itmdt (Intermediate) – Intermediate computations
cfg (Config) – Config instance

Return type

Any