This module implements the create_report(df) function.
dataprep.eda.create_report.
create_report
Generate and render the elements of a report object.
df (DataFrame) – The DataFrame for which data are calculated.
config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations, e.g. config={"hist.bins": 20}.
display (Optional[List[str]]) – The names of the plots the user wants to display, e.g. display=["bar", "hist"]. If not specified, the default is "auto".
title (Optional[str], default "DataPrep Report") – The title of the report, which will be shown on the navigation bar.
mode (Optional[str], default "basic") – Controls which type of report is generated. Currently only "basic" is fully implemented.
progress (bool) – Whether to show the progress bar.
Examples
>>> import pandas as pd
>>> from dataprep.eda import create_report
>>> df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
>>> report = create_report(df)
>>> report # show report in notebook
>>> report.save('My Fantastic Report') # save report to local disk
>>> report.show_browser() # show report in the browser
Report
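The optional parameters of create_report can be combined in a single call. The following is a minimal sketch, assuming the keyword names match the parameter list in this section. `build_report_kwargs` and `make_report` are hypothetical helpers introduced only for illustration, and the option values are the ones used in the parameter descriptions.

```python
from typing import Any, Dict, List, Optional


def build_report_kwargs(
    bins: int = 20,
    plots: Optional[List[str]] = None,
    title: str = "DataPrep Report",
    progress: bool = True,
) -> Dict[str, Any]:
    """Collect keyword arguments for create_report (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "config": {"hist.bins": bins},  # key taken from the documented example
        "title": title,
        "mode": "basic",  # the only mode documented as fully implemented
        "progress": progress,
    }
    if plots is not None:
        # Leave `display` unset to keep the documented default behaviour.
        kwargs["display"] = plots
    return kwargs


def make_report(df: Any, **overrides: Any) -> Any:
    """Build a report for df; dataprep is imported lazily (hypothetical helper)."""
    from dataprep.eda import create_report

    return create_report(df, **{**build_report_kwargs(), **overrides})
```

Under these assumptions, `make_report(df, title="Iris")` would forward the default options with only the title replaced.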
This module implements the formatting for the create_report(df) function.
dataprep.eda.create_report.formatter.
basic_computations
Computations for the basic version.
df (EDAFrame) – The DataFrame for which data are calculated.
df_num – The DataFrame of numerical columns (used for correlation). It is separated from df because numerical columns in df with few distinct values are treated as categorical: they are converted to str and then used for the other plots. In df_num they are still treated as numerical and used for correlation. This is a temporary fix; in the future, numerical columns with few distinct values should be treated as ordinary columns in both the correlation plots and the other plots.
cfg (Config) – The config dict the user passed in, e.g. config={"hist.bins": 20}. If not specified, the default is "auto".
Tuple[Dict[str, Any], Optional[Dict[str, Any]]]
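The split between df and df_num described for basic_computations can be illustrated without any dependencies. This is only a sketch of the idea, not the library's implementation: plain dictionaries of lists stand in for DataFrames, and `split_for_plots` and its `max_distinct` threshold are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Column = List[Union[int, float, str]]
Table = Dict[str, Column]


def is_numeric(values: Column) -> bool:
    """True if every value is an int or float (bools are excluded)."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def split_for_plots(table: Table, max_distinct: int = 3) -> Tuple[Table, Table]:
    """Return (plot_table, numeric_table).

    numeric_table keeps every numerical column unchanged, for correlation.
    plot_table converts numerical columns with at most `max_distinct`
    distinct values to strings so they are handled as categorical.
    """
    numeric_table: Table = {
        name: list(values) for name, values in table.items() if is_numeric(values)
    }
    plot_table: Table = {}
    for name, values in table.items():
        if name in numeric_table and len(set(values)) <= max_distinct:
            plot_table[name] = [str(v) for v in values]  # treat as categorical
        else:
            plot_table[name] = list(values)
    return plot_table, numeric_table
```

The point of returning two tables is that the same low-cardinality column can appear as strings in one and as numbers in the other, which is the situation the df_num description calls a temporary fix.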
format_basic
Format the basic version of the report.
A dictionary in which formatted data is stored. This variable acts like an API in passing data to the template engine.
Dict[str, Any]
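The role of the returned dictionary as the interface to the template engine can be sketched as follows. This section does not name the template engine, so the standard library's `string.Template` is used here as a stand-in; the template text and the keys `title`, `n_rows`, and `n_cols` are hypothetical.

```python
from string import Template
from typing import Any, Dict

# A tiny stand-in template; a real report template would be much larger.
PAGE = Template("<title>$title</title><p>$n_rows rows, $n_cols columns</p>")


def render(context: Dict[str, Any]) -> str:
    """Fill the template using only the keys found in the context dictionary."""
    return PAGE.substitute(context)


# The formatter's output plays the role of `context` (keys are hypothetical).
context = {"title": "DataPrep Report", "n_rows": 150, "n_cols": 5}
html = render(context)
```

Because the template only sees the dictionary, the computation and formatting code can change freely as long as the keys stay stable.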
format_report
Format the data and figures needed by the report.
df (Union[DataFrame, DataFrame]) – The DataFrame for which data are calculated.
cfg (Config) – The config instance.
mode (Optional[str]) – Controls which type of report is generated. Currently only "basic" is fully implemented.
A dictionary in which formatted data will be stored. This variable acts like an API in passing data to the template engine.