Functions for downloading and maintaining config files.
dataprep.connector.config_manager.
config_directory
Returns the config directory path
Path
download_config
Download the config from GitHub into the temp directory.
None
ensure_config
Ensure the config for impdb is downloaded
bool
get_git_branch_hash
Get the current hash of the config files repository.
str
initialize_path
Determines if the given config_path is local or in GitHub. Fetches the full path.
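The local-versus-hosted decision can be pictured with a small sketch. It assumes, hypothetically, that a path is local when it already exists on disk or starts with "./", "../", "/" or "~", and that anything else (such as "yelp") names a config hosted on GitHub. classify_config_path is an illustrative name, not part of dataprep.

```python
from pathlib import Path
from typing import Tuple


def classify_config_path(config_path: str) -> Tuple[str, str]:
    """Hypothetical rule: return ("local", absolute path) or ("hosted", db name)."""
    looks_local = config_path.startswith(("./", "../", "/", "~"))
    if looks_local or Path(config_path).exists():
        # Expand "~" and make the path absolute so later reads do not depend on the cwd.
        return "local", str(Path(config_path).expanduser().resolve())
    # Anything else is treated as the name of a config hosted in the GitHub repo.
    return "hosted", config_path
```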
is_obsolete
Test if the implicit db config files are obsolete and need to be re-downloaded.
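One way to picture the obsolescence check: record the commit hash of the config branch next to the downloaded files, then compare it with the hash reported by get_git_branch_hash. The sketch assumes a hypothetical file named _hash inside the config directory; the real storage layout is not described in this reference.

```python
import tempfile
from pathlib import Path


def is_obsolete_sketch(config_dir: Path, remote_hash: str) -> bool:
    """Hypothetical check: obsolete if no hash was recorded locally or it differs."""
    hash_file = config_dir / "_hash"  # hypothetical name for the stored commit hash
    if not hash_file.is_file():
        return True  # nothing downloaded yet, so a download is needed
    return hash_file.read_text().strip() != remote_hash


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp)
    print(is_obsolete_sketch(cfg, "abc123"))  # True: no local hash yet
    (cfg / "_hash").write_text("abc123\n")
    print(is_obsolete_sketch(cfg, "abc123"))  # False: hashes match
    print(is_obsolete_sketch(cfg, "def456"))  # True: the remote branch moved on
```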
separate_branch
Separate the config path into db name and branch
Tuple[str, str]
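A minimal sketch of separate_branch, assuming the branch is appended to the config name with "@" (for example "yelp@develop") and that a bare name falls back to a default branch. Both the separator and the default branch name "master" are assumptions, not taken from this reference.

```python
from typing import Tuple

DEFAULT_BRANCH = "master"  # assumption: the default branch is not named in this reference


def separate_branch_sketch(config_path: str) -> Tuple[str, str]:
    """Split "db@branch" into ("db", "branch"), assuming "@" as the separator."""
    parts = config_path.split("@")
    if len(parts) == 1:
        return parts[0], DEFAULT_BRANCH  # no branch given: use the default
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"more than one branch in config path: {config_path!r}")
```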
This module contains the Connector class. Every data fetching action should begin with instantiating this Connector class.
dataprep.connector.connector.
Connector
Bases: object
This is the main class of the connector component. Initialize the Connector class as in the example below.
config_path (str) – The path to the config. It can be hosted, e.g. “yelp”, or from local filesystem, e.g. “./yelp”
_auth (Optional[Dict[str, Any]] = None) – The parameters for authentication, e.g. OAuth2
_concurrency (int = 5) – The concurrency setting, in requests per second.
update (bool = True) – Force an update of the config file even if a local version exists.
**kwargs – Parameters that are shared by different queries.
Example
>>> from dataprep.connector import Connector
>>> dc = Connector("yelp", _auth={"access_token": access_token})
info
Show the basic information and provide guidance for users to issue queries.
query
Query the API to get a table.
table (str) – The table name.
_q (Optional[str] = None) – Search string to be matched in the response.
_auth (Optional[Dict[str, Any]] = None) – The parameters for authentication. Authentication parameters are usually defined when instantiating the Connector. If some tables use different authentication options, different parameters can be passed here; they override the ones given to the Connector.
_count (Optional[int] = None) – The number of records to return.
**where – The additional parameters required for the query.
Union[Awaitable[DataFrame], DataFrame]
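The Union[Awaitable[DataFrame], DataFrame] return type suggests that query hands back an awaitable when an event loop is already running (as in Jupyter) and a finished result otherwise. The sketch below illustrates that pattern with asyncio; _fetch_rows is a hypothetical stand-in for the HTTP request, and it returns plain lists of dicts instead of a DataFrame to stay self-contained.

```python
import asyncio
from typing import Any, Awaitable, Dict, List, Union

Rows = List[Dict[str, Any]]


async def _fetch_rows(table: str, count: int) -> Rows:
    """Hypothetical stand-in for the network request."""
    await asyncio.sleep(0)  # yield to the event loop, as a real HTTP call would
    return [{"table": table, "row": i} for i in range(count)]


def query_sketch(table: str, count: int = 3) -> Union[Awaitable[Rows], Rows]:
    coro = _fetch_rows(table, count)
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script): drive the coroutine to completion here.
        return asyncio.run(coro)
    # A loop is already running (e.g. Jupyter): return the awaitable to the caller.
    return coro
```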
populate_field
Populate a dict based on the field definitions and the provided variables.
Dict[str, str]
validate_fields
Check that the required fields are provided.
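A sketch of how populate_field and validate_fields could cooperate, under a hypothetical field-definition format: a boolean marks whether a field is required and is copied from the provided variables, while a string is a template filled from them. str.format is a stand-in templating engine here; the library may use a different one.

```python
from typing import Any, Dict, Union

FieldDef = Union[bool, str]  # hypothetical: bool = required flag, str = template


def validate_fields_sketch(fields: Dict[str, FieldDef], params: Dict[str, Any]) -> None:
    """Raise KeyError if a field marked as required is missing from params."""
    missing = [k for k, d in fields.items() if d is True and k not in params]
    if missing:
        raise KeyError(f"missing required fields: {missing}")


def populate_field_sketch(fields: Dict[str, FieldDef], params: Dict[str, Any]) -> Dict[str, str]:
    """Copy plain fields from params and render template fields with str.format."""
    validate_fields_sketch(fields, params)
    out: Dict[str, str] = {}
    for key, definition in fields.items():
        if isinstance(definition, str):
            out[key] = definition.format(**params)  # template filled from params
        elif key in params:
            out[key] = str(params[key])  # provided value passed through as text
    return out
```

Optional fields that were not supplied are simply left out of the result.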
This module contains back-end functions that help developers use the data connector.
dataprep.connector.info.
get_schema
This method returns the schema of the table that will be returned, so that the user knows what information to expect.
schema (Dict[str, Any]) – The schema for the table from the config file.
The returned data’s schema.
pandas.DataFrame
Note
The schema is defined in the configuration file. The user can either use the default one or change it by editing the configuration file.
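Because get_schema returns one row per column, a schema dict can be flattened into records that pandas.DataFrame accepts directly. The sketch assumes a hypothetical layout in which each column entry has "type" and "target" keys; the actual config format may differ.

```python
from typing import Any, Dict, List


def schema_rows(schema: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten {column: {"type": ..., "target": ...}} into one record per column."""
    return [
        {"column_name": name, "data_type": spec.get("type"), "target": spec.get("target")}
        for name, spec in schema.items()
    ]
```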
update (bool) – Force an update of the config file even if a local version exists.
websites
Display the names of the websites supported by the data connector.
This module handles displaying information on how to connect and query.
dataprep.connector.info_ui.
info_ui
Fill out the info.txt template file and render it to an HTML file.
dbname (str) – Name of the website
tbs (Dict[str, Any]) – Table containing info to be displayed.
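A minimal sketch of the fill-and-render step using string.Template from the standard library. The template text and the render_info_sketch name are hypothetical stand-ins for the real info.txt template. Values are HTML-escaped before substitution so that table names cannot inject markup.

```python
import html
from string import Template
from typing import Any, Dict

PAGE = Template("<h1>$dbname</h1>\n<ul>\n$items\n</ul>")  # hypothetical template


def render_info_sketch(dbname: str, tbs: Dict[str, Any]) -> str:
    """Render one list item per table name under a heading for the website."""
    items = "\n".join(f"<li>{html.escape(name)}</li>" for name in tbs)
    return PAGE.substitute(dbname=html.escape(dbname), items=items)
```

Writing the returned string to a .html file would complete the step described above.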
This module contains the loaded config schema.
This module defines ImplicitDatabase and ImplicitTable, where ImplicitDatabase is a conceptual model that describes a website and ImplicitTable describes an API endpoint.
dataprep.connector.implicit_database.
ImplicitDatabase
A website that provides data can be treated as a database, represented as ImplicitDatabase in DataConnector.
name
tables
ImplicitTable
The ImplicitTable class abstracts the request to and the response from a RESTful API, so that the remote API can be treated as a database table.
config
from_json
Create rows from a JSON string.
Dict[str, List[Any]]
from_response
Create a DataFrame from an HTTP body payload.
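from_json produces column-oriented data (Dict[str, List[Any]]), which pandas.DataFrame accepts directly; that is presumably how from_response turns an HTTP body into a DataFrame. The sketch assumes a hypothetical config in which root names the key holding the list of records and columns maps each output column to a key in a record; real configs may use richer paths than a flat key.

```python
import json
from typing import Any, Dict, List


def rows_from_json(payload: str, columns: Dict[str, str], root: str) -> Dict[str, List[Any]]:
    """Turn a JSON body into one list of values per output column."""
    records = json.loads(payload)[root]  # the list of records inside the body
    # Missing keys become None so every column list has the same length.
    return {col: [rec.get(key) for rec in records] for col, key in columns.items()}
```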
This module defines the errors used in this library.
dataprep.connector.errors.
InvalidAuthParams
Bases: ValueError
The parameters used for Authorization are invalid.
params
InvalidParameterError
Bases: Exception
The parameter used in the query is invalid.
param
MissingRequiredAuthParams
Some parameters for Authorization are missing.
RequestError
Bases: dataprep.errors.DataprepError
An error indicating that the status code of the API response is not 200.
message
status_code
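A sketch of where RequestError fits: any response whose status code is not 200 becomes an exception carrying status_code and message. DataprepError and RequestError are redefined locally so the example runs without dataprep, and the constructor signature is an assumption.

```python
class DataprepError(Exception):
    """Local stand-in for dataprep.errors.DataprepError."""


class RequestError(DataprepError):
    """Stand-in mirroring the documented attributes status_code and message."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def check_status(status_code: int, body: str) -> str:
    """Return the body of a 200 response; raise RequestError for any other status."""
    if status_code != 200:
        raise RequestError(status_code, body)
    return body
```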
UniversalParameterOverridden
The parameter is overridden by the universal parameter.
uparam
dataprep.connector.
read_sql
Run the SQL query and download the data from the database into a dataframe. Please check out https://github.com/sfu-db/connector-x for more details.
conn (str) – the connection string.
query (Union[List[str], str]) – a SQL query or a list of SQL queries.
return_type (str) – the return type of this function. It can be “arrow”, “pandas”, “modin”, “dask” or “polars”.
protocol (str) – the protocol used to fetch data from source. Valid protocols are database dependent (https://github.com/sfu-db/connector-x/blob/main/Types.md).
partition_on (Optional[str]) – the column to partition the result.
partition_range (Optional[Tuple[int, int]]) – the value range of the partition column.
partition_num (Optional[int]) – how many partitions to generate.
>>> from dataprep.connector import read_sql
>>> db_url = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> read_sql(db_url, query, partition_on="partition_col", partition_num=10)
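partition_on, partition_range and partition_num work together: the value range of the partition column is split into partition_num pieces, and each piece is fetched by its own range-filtered query, which connector-x can run in parallel. The sketch below shows only that splitting step. The exact SQL that connector-x generates may differ, and partition_bounds and partition_queries are illustrative names.

```python
from typing import List, Tuple


def partition_bounds(lo: int, hi: int, num: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lo, hi] into num half-open ranges [start, end)."""
    if num < 1 or hi < lo:
        raise ValueError("need num >= 1 and lo <= hi")
    width = hi - lo + 1  # number of integer values in the range
    cuts = [lo + width * i // num for i in range(num + 1)]  # last cut is hi + 1
    return list(zip(cuts[:-1], cuts[1:]))


def partition_queries(query: str, col: str, lo: int, hi: int, num: int) -> List[str]:
    """Wrap the original query once per partition with a range filter on col."""
    return [
        f"SELECT * FROM ({query}) AS t WHERE {col} >= {start} AND {col} < {end}"
        for start, end in partition_bounds(lo, hi, num)
    ]
```

Integer division keeps every value in [lo, hi] in exactly one partition even when the range does not divide evenly; the last partition absorbs the remainder.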