dataprep.eda.distribution.plot
Generates plots for exploratory data analysis.
If no columns are specified, the distribution of each column is plotted. A histogram is plotted if the column contains numerical values, a bar chart if it contains categorical values, and a line chart if it is of type datetime.
If one column (x) is specified, the distribution of x is plotted in various ways. If x contains categorical values, a bar chart and pie chart are plotted. If x contains numerical values, a histogram, kernel density estimate plot, box plot, and Q-Q plot are plotted. If x contains datetime values, a line chart is plotted.
If two columns (x and y) are specified, plots depicting the relationship between the variables are displayed. If x and y contain numerical values, a scatter plot, hexbin plot, and binned box plot are plotted. If one of x and y contains categorical values and the other contains numerical values, a box plot and multiline histogram are plotted. If x and y contain categorical values, a nested bar chart, stacked bar chart, and heat map are plotted. If one of x and y contains datetime values and the other contains numerical values, a line chart and a box plot are shown. If one of x and y contains datetime values and the other contains categorical values, a multiline chart and a stacked box plot are shown.
If x, y, and z are specified, they must be one each of type datetime, numerical, and categorical. A multiline chart containing an aggregate on the numerical column grouped by the categorical column over time is plotted.
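The rules above amount to a lookup from the semantic types of the selected columns to a list of plots. The sketch below restates them as a plain Python table; it is not how dataprep implements the dispatch, and the names `PLOTS_BY_TYPES` and `plots_for` are hypothetical.

```python
# Plot lists restated from the description above. Keys are the sorted
# semantic types of the selected columns. PLOTS_BY_TYPES and plots_for are
# hypothetical names used for illustration; they are not part of dataprep.
PLOTS_BY_TYPES = {
    ("categorical",): ["bar chart", "pie chart"],
    ("numerical",): [
        "histogram",
        "kernel density estimate plot",
        "box plot",
        "Q-Q plot",
    ],
    ("datetime",): ["line chart"],
    ("numerical", "numerical"): ["scatter plot", "hexbin plot", "binned box plot"],
    ("categorical", "numerical"): ["box plot", "multiline histogram"],
    ("categorical", "categorical"): [
        "nested bar chart",
        "stacked bar chart",
        "heat map",
    ],
    ("datetime", "numerical"): ["line chart", "box plot"],
    ("categorical", "datetime"): ["multiline chart", "stacked box plot"],
    ("categorical", "datetime", "numerical"): ["multiline chart"],
}


def plots_for(*column_types):
    """Return the plots described above for the given column types."""
    # Sorting makes the lookup independent of argument order.
    key = tuple(sorted(column_types))
    if key not in PLOTS_BY_TYPES:
        raise ValueError(f"combination not covered by the description: {key}")
    return PLOTS_BY_TYPES[key]
```

For instance, `plots_for("numerical", "categorical")` and `plots_for("categorical", "numerical")` return the same list, because the text does not distinguish which of x and y holds which type.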
df (Union[pandas.DataFrame, dask.dataframe.DataFrame]) – DataFrame from which visualizations are generated
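col1 (Optional[str], default None) – A valid column name from the dataframe (the column called x in the description above)
col2 (Optional[str], default None) – A valid column name from the dataframe (the column called y in the description above)
col3 (Optional[str], default None) – A valid column name from the dataframe (the column called z in the description above)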
config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations, e.g. config={"hist.bins": 20}
display (Optional[List[str]]) – A list containing the names of the visualizations to display, e.g. display=["Histogram"]
dtype (str or DType or dict of str or dict of DType, default None) – Specifies the data type for designated columns or for all columns, e.g. dtype = {"a": Continuous, "b": "Nominal"}, dtype = {"a": Continuous(), "b": "nominal"}, dtype = Continuous(), or dtype = "Continuous".
progress (bool) – Enable the progress bar.
Examples
>>> import pandas as pd
>>> from dataprep.eda import *
>>> iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
>>> plot(iris)
>>> plot(iris, "petal_length")
>>> plot(iris, "petal_width", "species")
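The config and display parameters can be combined in one call. The sketch below reuses the example values from the parameter descriptions; `PLOT_OPTIONS` and `plot_histogram_only` are hypothetical names, and whether "hist.bins" and "Histogram" apply to a given column depends on its type and on the installed dataprep version.

```python
# Keyword arguments copied from the parameter descriptions above.
# They are illustrative values, not a complete list of supported options.
PLOT_OPTIONS = {
    "config": {"hist.bins": 20},  # number of histogram bins
    "display": ["Histogram"],     # show only the histogram
}


def plot_histogram_only(df, column):
    """Hypothetical helper: plot one column using the options above."""
    # Imported lazily so this snippet loads even when dataprep is absent;
    # calling the function still requires dataprep to be installed.
    from dataprep.eda import plot

    return plot(df, column, **PLOT_OPTIONS)
```

With the iris frame loaded as in the examples, `plot_histogram_only(iris, "petal_length")` would request only the histogram for that column, assuming the option names match the installed version.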
Return type: Container
Computations for plot(df, …)
dataprep.eda.distribution.compute.compute
All-in-one compute function.
cfg (Union[Config, Dict[str, Any], None], default None) – When a user calls plot(), the Config object it creates is passed to compute(). When a user calls compute() directly, cfg can be a dictionary that customizes the output. If cfg is None, default values are used for all parameters.
display (Optional[List[str]], default None) – A list containing the names of the visualizations to display. Used only when a user calls compute() directly and wants to customize the output.
dtype (str or DType or dict of str or dict of DType, default None) – Specifies the data type for designated columns or for all columns, e.g. dtype = {"a": Continuous, "b": "Nominal"}, dtype = {"a": Continuous(), "b": "nominal"}, dtype = Continuous(), or dtype = "Continuous".
Return type: Intermediate
This module implements the visualization for the plot(df) function.
dataprep.eda.distribution.render.render
Render a basic plot
itmdt (Intermediate) – The Intermediate containing results from the compute function.
cfg (Config) – Config instance
Return type: Union[LayoutDOM, Dict[str, Any]]
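The compute and render entries describe a two-stage design: compute produces an Intermediate holding results, and render turns that Intermediate into something displayable. The toy sketch below mirrors the split with standard-library code only. `ToyIntermediate`, `toy_compute`, and `toy_render` are hypothetical stand-ins and do not reproduce dataprep's classes or signatures.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyIntermediate:
    """Stand-in for an intermediate result: data only, no drawing."""

    visual_type: str
    data: dict = field(default_factory=dict)


def toy_compute(values):
    """Compute stage: count how often each value occurs."""
    counts = Counter(values)
    return ToyIntermediate("bar_chart", {"counts": dict(counts)})


def toy_render(itmdt, width=20):
    """Render stage: turn the intermediate into a text bar chart."""
    counts = itmdt.data["counts"]
    if not counts:
        return ""
    largest = max(counts.values())
    lines = []
    for label, n in sorted(counts.items()):
        # Scale each bar relative to the most frequent value.
        bar = "#" * max(1, round(width * n / largest))
        lines.append(f"{label:<12}{bar} {n}")
    return "\n".join(lines)


species = ["setosa", "setosa", "virginica", "setosa", "versicolor"]
chart = toy_render(toy_compute(species))
```

Keeping the stages separate means the computed data can be inspected or reused without drawing anything, which appears to be why compute() can also be called directly.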