DataPrep lets you customize plot(), plot_missing(), plot_correlation() and create_report(). There are two main settings: display and config.
display is a list of names that controls which Tabs, Sections and Sessions are shown.
config is a dictionary that maps the customizable parameters to their designated values.
For convenience, the input for display can be copied directly from the Tab names. Displaying less content also saves computation.
For config, we developed a how-to guide to help you manage the frequently used parameters. Click the question mark icon in the upper right corner of each plot; the pop-up lists the customizable parameters for that plot, along with brief descriptions and default settings. Use the Copy All Parameters button to copy the parameters with their default settings as a dictionary, then customize the settings and pass the dictionary to the config argument.
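As a minimal sketch of that workflow, the dictionary below stands in for what the Copy All Parameters button might produce. The keys and values are borrowed from the bar chart example later on this page and are assumed, not confirmed, to match the actual defaults. Only the entries you want to change need to be overridden before the result is passed to config.

```python
# Stand-in for a dictionary pasted from "Copy All Parameters".
# The values mirror the example on this page; treat them as illustrative.
copied_params = {
    "bar.bars": 10,
    "bar.sort_descending": True,
    "bar.yscale": "linear",
    "height": 400,
    "width": 450,
}

# Override only the settings you care about; the rest stay as copied.
config = {**copied_params, "bar.bars": 5, "height": 300}

print(config)

# With DataPrep installed and a DataFrame df loaded, the dictionary
# would then be passed on like this:
# plot(df, "Pclass", config=config)
```

Building a new dictionary rather than editing the pasted one in place keeps the copied defaults available for comparison.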
There are two types of parameters, global and local.
Local parameters are plot-specific, and the parts of their names are separated by a period (.). The portion before the first . is the plot name and the portion after it is the parameter name, e.g. bar.bars.
A global parameter applies to every plot that has that parameter. Its name is a single word, e.g. ngroups.
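The naming rule can be illustrated with a few lines of plain Python. The helper split_param below is hypothetical and not part of DataPrep; it only shows how a key divides at the first period into a plot name and a parameter name, and how a key without a period is treated as global.

```python
def split_param(name):
    """Split a config key into (plot_name, parameter_name).

    Hypothetical helper for illustration only. A key without a "."
    is treated as global and returns None as the plot name.
    """
    plot_name, sep, param = name.partition(".")
    if not sep:
        return None, name
    return plot_name, param


for key in ["bar.bars", "ngroups", "insight.missing.threshold"]:
    print(key, "->", split_param(key))
```

Because the split happens only at the first period, a key such as insight.missing.threshold yields the plot name insight and the parameter name missing.threshold.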
When both a global and a local parameter are given, the local parameter overrides the global one for that specific plot. You can find more details about parameters in parameter_configurations.
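The precedence rule can be sketched as a simple lookup. This is not DataPrep's internal implementation: the resolve function is a hypothetical helper, and the key names in the sample dictionary are illustrative and may not correspond to actual DataPrep parameters.

```python
def resolve(config, plot_name, param, default=None):
    """Return the effective value of param for one plot.

    Hypothetical helper: a local key ("plot.param") wins over a
    global key ("param"), which in turn wins over the default.
    """
    local_key = f"{plot_name}.{param}"
    if local_key in config:
        return config[local_key]
    if param in config:
        return config[param]
    return default


# Illustrative keys: one global setting and one local override.
config = {"ngroups": 5, "bar.ngroups": 20}

print(resolve(config, "bar", "ngroups"))  # local value applies to bar
print(resolve(config, "pie", "ngroups"))  # other plots fall back to the global value
print(resolve(config, "pie", "slices", default=10))  # neither given, default is used
```

Checking the local key first means a single global setting can cover most plots while individual plots are still tuned separately.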
[1]:
from dataprep.eda import plot, create_report
from dataprep.datasets import load_dataset

df = load_dataset('titanic')
plot(df, 'Pclass', display=['Stats', 'Bar Chart', 'Pie Chart'])
[2]:
create_report(df, display=["Overview", "Interactions"])
[3]:
plot(df, display=["Stats", "Insights"])
[4]:
plot(df, "Pclass", config={
    'bar.bars': 10,
    'bar.sort_descending': True,
    'bar.yscale': 'linear',
    'height': 400,
    'width': 450,
})
[5]:
plot(df, config={'insight.missing.threshold': 20, 'insight.duplicates.threshold': 20})