dataprep.eda.correlation

plot_correlation

dataprep.eda.correlation.plot_correlation(df, col1=None, col2=None, *, value_range=None, k=None, config=None, display=None, progress=True)

Plot the correlations between the columns of a data frame. Optional parameters such as k and value_range narrow down which correlations are shown.

Parameters
  • df (Union[DataFrame, DataFrame]) – The pandas DataFrame whose columns the plots are calculated for.

  • col1 (Optional[str]) – A valid column name of the data frame.

  • col2 (Optional[str]) – A valid column name of the data frame.

  • value_range (Optional[Tuple[float, float]]) – Range of correlation values to show. Correlation values outside this range are not shown.

  • k (Optional[int]) – Choose the top-k elements.

  • config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations, e.g. config={"scatter.sample_size": 5000}.

  • display (Optional[List[str]]) – A list containing the names of the visualizations to display, e.g. display=["Pearson"].

  • progress (bool) – Enable the progress bar.

Examples

>>> from dataprep.eda.correlation import plot_correlation
>>> import pandas as pd
>>> df = pd.read_csv("suicide-rate.csv")
>>> plot_correlation(df)
>>> plot_correlation(df, k=6)
>>> plot_correlation(df, "suicides")
>>> plot_correlation(df, "suicides", k=3)
>>> plot_correlation(df, "suicides", value_range=(-1, 0.3))
>>> plot_correlation(df, "suicides", value_range=(-1, 0.3), k=2)
>>> plot_correlation(df, col1="population", col2="suicides_no")
>>> plot_correlation(df, col1="population", col2="suicides", k=5)

Note

This function supports only numerical and categorical data. It is best to drop None, NaN, and null values before calling it.
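
As a minimal sketch of the clean-up step mentioned in the note, the missing values can be dropped with pandas before plotting. The column names and values below are made up for illustration; they are not taken from any real dataset.

```python
import numpy as np
import pandas as pd

# Illustrative data only: two numerical columns with missing entries.
df = pd.DataFrame(
    {
        "population": [1000.0, 2500.0, np.nan, 4000.0],
        "suicides": [3.0, None, 7.0, 9.0],
    }
)

# dropna() removes every row that contains at least one missing value.
# pandas treats None and NaN in a float column as the same missing marker.
clean_df = df.dropna()

print(len(df), "rows before,", len(clean_df), "rows after")

# The cleaned frame is what would then be passed on, e.g.
# plot_correlation(clean_df)
```

Dropping whole rows is the simplest option. Whether it is appropriate depends on how much data is lost; imputing values instead is a separate design choice that this sketch does not cover.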

Return type

Container

compute_correlation

This module implements the intermediate computations for the plot_correlation(df) function.

dataprep.eda.correlation.compute.compute_correlation(df, col1=None, col2=None, *, cfg=None, display=None, value_range=None, k=None)

Parameters
  • df (Union[DataFrame, DataFrame, EDAFrame]) – The pandas DataFrame whose columns the plots are calculated for.

  • col1 (Optional[str]) – A valid column name of the DataFrame.

  • col2 (Optional[str]) – A valid column name of the DataFrame.

  • cfg (Union[Config, Dict[str, Any], None], default None) – Config instance. When a user calls plot_correlation(), the Config object it creates is passed to compute_correlation(). When a user calls compute_correlation() directly and wants to customize the output, cfg is a dictionary of configuration options. Otherwise cfg is None and default values are used for the parameters.

  • display (Optional[List[str]], default None) – A list containing the names of the visualizations to display. Only used when a user calls compute_correlation() directly and wants to customize the output.

  • value_range (Optional[Tuple[float, float]]) – Correlation values outside this range are not shown.

  • k (Optional[int]) – Choose the top-k elements.

Return type

Intermediate
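
To make the effect of value_range and k more concrete, here is a small pure-Python sketch. It is not dataprep's implementation: pearson and filter_correlations are hypothetical helpers written for this illustration, and ranking the surviving correlations by absolute value is an assumption about what "top-k" means.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def filter_correlations(columns, value_range=None, k=None):
    """Hypothetical helper: pairwise correlations, range-filtered, then top-k.

    Assumption: "top-k" ranks by absolute correlation, strongest first.
    """
    pairs = {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
    if value_range is not None:
        low, high = value_range
        # Keep only correlations that fall inside the closed range.
        pairs = {p: r for p, r in pairs.items() if low <= r <= high}
    ranked = sorted(pairs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked if k is None else ranked[:k]


# Illustrative data only.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # rises with "a"
    "c": [4.0, 3.0, 2.0, 1.0],  # falls as "a" rises
    "d": [1.0, 3.0, 2.0, 4.0],
}

for pair, r in filter_correlations(columns, value_range=(-1.0, 0.3), k=2):
    print(pair, round(r, 3))
```

With value_range=(-1.0, 0.3), every positively correlated pair in this toy data is filtered out before the top-k selection is applied, so k only ranks the pairs that survive the range filter.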

render_correlation

This module implements the visualization for the plot_correlation(df) function.

dataprep.eda.correlation.render.render_correlation(itmdt, cfg)

Render a correlation plot

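Parameters
  • itmdt (Intermediate) – The intermediate result returned by compute_correlation().

  • cfg (Config) – The Config instance that configures the rendering.
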
Return type

Any