dataprep.eda.correlation.plot_correlation
Optional parameters such as k and value_range narrow down which correlations are shown.
df (Union[DataFrame, DataFrame]) – The DataFrame for which plots are calculated for each column.
col1 (Optional[str]) – A valid column name of the data frame.
col2 (Optional[str]) – A valid column name of the data frame.
value_range (Optional[Tuple[float, float]]) – Only correlations whose value falls within this range are shown.
k (Optional[int]) – Show only the top-k elements.
config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations, e.g. config={"scatter.sample_size": 5000}
display (Optional[List[str]]) – A list containing the names of the visualizations to display, e.g. display=["Pearson"]
progress (bool) – Enable the progress bar.
Examples
>>> from dataprep.eda.correlation import plot_correlation
>>> import pandas as pd
>>> df = pd.read_csv("suicide-rate.csv")
>>> plot_correlation(df)
>>> plot_correlation(df, k=6)
>>> plot_correlation(df, "suicides")
>>> plot_correlation(df, "suicides", k=3)
>>> plot_correlation(df, "suicides", value_range=(-1, 0.3))
>>> plot_correlation(df, "suicides", value_range=(-1, 0.3), k=2)
>>> plot_correlation(df, col1="population", col2="suicides_no")
>>> plot_correlation(df, col1="population", col2="suicides", k=5)
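To make the roles of value_range and k concrete, the sketch below applies the same two filters to pairwise Pearson correlations in plain Python. It is not the library's implementation: the helper name filter_correlations is hypothetical, and ranking the top-k pairs by absolute correlation is an assumption made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def filter_correlations(columns, value_range=None, k=None):
    """Hypothetical helper: keep pairs inside value_range, then the k strongest.

    Assumption: "top-k" is ranked by absolute correlation; the library may
    rank differently.
    """
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    if value_range is not None:
        low, high = value_range
        pairs = [p for p in pairs if low <= p[2] <= high]
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs if k is None else pairs[:k]


data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # rises with "a"
    "c": [4.0, 3.0, 2.0, 1.0],  # falls as "a" rises
}

# Keep only correlations in [-1, 0.3], then the single strongest pair.
top = filter_correlations(data, value_range=(-1.0, 0.3), k=1)
```

Here the strongly positive pair ("a", "b") falls outside the range and is dropped before the top-k cut is applied.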
Note
This function only supports numerical or categorical data. It is best to drop None, NaN, and null values before calling it.
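One way to follow this advice is to remove incomplete rows with pandas before plotting. The snippet below is a minimal sketch using DataFrame.dropna; the column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data containing both None and NaN entries.
df = pd.DataFrame(
    {
        "population": [100.0, None, 300.0, 400.0, 500.0],
        "suicides_no": [1.0, 2.0, float("nan"), 4.0, 5.0],
    }
)

# Drop every row that has at least one missing value.
clean = df.dropna()
```

The resulting clean frame contains only complete rows and can then be passed on for plotting.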
Container
This module implements the intermediate computations for the plot_correlation(df) function.
dataprep.eda.correlation.compute.compute_correlation
df (Union[DataFrame, DataFrame, EDAFrame]) – The pandas dataframe for which plots are calculated for each column.
cfg (Union[Config, Dict[str, Any], None], default None) – Config instance
col1 (Optional[str]) – A valid column name of the dataframe
col2 (Optional[str]) – A valid column name of the dataframe
value_range (Optional[Tuple[float, float]]) – Correlations whose value falls outside this range are not shown.
cfg – When a user calls plot_correlation(), the Config object it creates is passed to compute_correlation(). When a user calls compute_correlation() directly and wants to customize the output, cfg is a dictionary of configuration options. Otherwise cfg is None and default values are used for all parameters.
display (Optional[List[str]], default None) – A list containing the names of the visualizations to display. Only used when a user calls compute_correlation() directly and wants to customize the output.
k (Optional[int]) – Choose top-k element
Intermediate
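The fallback described for cfg (a user-supplied dictionary overrides the defaults, and None means all defaults apply) can be sketched as follows. This is only an illustration of the pattern: resolve_config, HYPOTHETICAL_DEFAULTS, the default values, and the key "example.other_option" are hypothetical and do not come from the library.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults for illustration only; the library's real defaults may differ.
HYPOTHETICAL_DEFAULTS: Dict[str, Any] = {
    "scatter.sample_size": 1000,
    "example.other_option": "auto",
}


def resolve_config(cfg: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the defaults, overridden by any keys the caller supplied."""
    resolved = dict(HYPOTHETICAL_DEFAULTS)  # copy so the defaults stay untouched
    if cfg is not None:
        resolved.update(cfg)
    return resolved


default_cfg = resolve_config()  # cfg is None: defaults only
custom_cfg = resolve_config({"scatter.sample_size": 5000})  # one key overridden
```

Keys the caller does not mention keep their default values, so a partial dictionary is enough to customize a single option.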
This module implements the visualization for the plot_correlation(df) function.
dataprep.eda.correlation.render.render_correlation
Render a correlation plot
itmdt (Intermediate) – Intermediate computations
cfg (Config) – Config instance