connect() function

The connect function loads the configurations and initiates the connection with the website.

In this section, we show what parameters are available and representative examples of how to connect to different websites.


  • config_path is the path to the configuration file folder. There are two ways to load it. Details can be found in the previous configuration file section.

  • update specifies whether to pull the newest configuration files from the repo where the existing configuration files are hosted.

  • _auth is a dictionary holding the credentials needed to access the data. Details can be found in the authorization section.

  • _concurrency is how many queries you want to issue concurrently. The default is one request per second. Setting it to 5 means you want to issue five queries per second.
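
To make the "N queries per second" reading of _concurrency concrete, here is a minimal sketch of a client-side throttle that spaces calls evenly. It is not how connect is implemented, which this section does not describe. The Throttle class, its wait method, and the clock and sleep parameters are all hypothetical names introduced only for illustration.

```python
import time


class Throttle:
    """Hypothetical helper: let at most `per_second` calls start each second."""

    def __init__(self, per_second=1, clock=time.monotonic, sleep=time.sleep):
        if per_second <= 0:
            raise ValueError("per_second must be positive")
        self.interval = 1.0 / per_second  # minimum gap between two calls
        self._clock = clock               # injectable so the logic can be tested
        self._sleep = sleep
        self._next_allowed = None         # earliest start time of the next call

    def wait(self):
        """Block until the next call may start and return that start time."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        # Reserve the following slot one interval later.
        self._next_allowed = now + self.interval
        return now
```

With per_second=5, consecutive calls to wait() are spaced 0.2 seconds apart, and with the default of 1 they are spaced one second apart, which mirrors the default described above.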


Below are some possible ways to call the connect function.

To learn how to get the API credentials for some websites, please refer to our step-by-step tutorials here.

  • dblp

dblp_connector = connect('./dblp', _concurrency = 5)

dblp is the simplest case, where you do not need any credentials. Here, we load the configuration from a local folder and issue five queries per second.

  • YouTube

auth_token = '<insert API key>'
youtube_connector = connect('youtube', _auth={'access_token':auth_token}, _concurrency = 3)

YouTube requires an API key to access its APIs. In this case, you first need to go to the YouTube website and get an API key before you can connect. For the configuration files, we use the existing ones in our repo, and we issue three queries per second.

  • Twitter

client_id = '<insert Consumer Key>'
client_secret = '<insert Consumer Secret Key>'
twitter_connector = connect("twitter", _auth={"client_id":client_id, "client_secret":client_secret})

Twitter uses OAuth 2.0 for authentication, which requires both a client_id and a client_secret. For the configuration files, we use the existing ones in our repo.
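
For readers wondering why both values are needed, the sketch below builds the token request defined by the standard OAuth 2.0 client-credentials grant (RFC 6749, Section 4.4), in which the client_id and client_secret are combined into an HTTP Basic Authorization header. This section does not show how connect performs the exchange, so treat this as an assumption about the general flow and not as the library's code. The function name build_client_credentials_request and the sample credentials are hypothetical, and the sketch assumes plain ASCII credentials.

```python
import base64
from urllib.parse import urlencode


def build_client_credentials_request(client_id, client_secret):
    """Return (headers, body) for an OAuth 2.0 client-credentials token request."""
    # HTTP Basic authentication: base64 of "client_id:client_secret".
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(raw).decode("ascii")
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # The grant type is sent as a form-encoded request body.
    body = urlencode({"grant_type": "client_credentials"})
    return headers, body


# Placeholder credentials, for illustration only.
headers, body = build_client_credentials_request("my-id", "my-secret")
```

The headers and body would be sent in a POST to the provider's token endpoint, which returns the access token used for later queries. Nothing is sent over the network in this sketch.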