query() function

The query function sends a query to a specific API endpoint and returns the result as a pandas DataFrame. Call it only after you have connected to the website with the connect function.

This section lists the available parameters and gives examples of how to call the query function.

The query function is asynchronous (it is built on Python's asyncio), so its result must be awaited rather than used directly.

In a Jupyter notebook, which already runs an event loop, you can await the call directly:

df = await conn.query(...)

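In a regular Python script, run it with asyncio.run (this does not work inside a notebook, where an event loop is already running):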

import asyncio
df = asyncio.run(conn.query(...))
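Because query is a coroutine function, several queries can be awaited together with asyncio.gather. The sketch below runs offline: StubConnector is a hypothetical stand-in for a real connector, and its query method only simulates a network wait and returns a list of dicts instead of a DataFrame.

```python
import asyncio


class StubConnector:
    """Hypothetical stand-in for a connector; it makes no network requests."""

    async def query(self, table, _q=None, _count=10, **params):
        # Pretend to wait for an HTTP response without blocking the event loop.
        await asyncio.sleep(0.01)
        # Return plain dicts instead of a pandas DataFrame to keep the sketch dependency-free.
        return [{"table": table, "q": _q, "rank": i, **params} for i in range(_count)]


async def main():
    conn = StubConnector()
    # gather schedules both coroutines at once, so their waits overlap.
    return await asyncio.gather(
        conn.query("videos", _q="Data Science", _count=3),
        conn.query("tweets", _q="COVID-19", _count=2),
    )


# In a script; inside a Jupyter notebook use `videos, tweets = await main()` instead.
videos, tweets = asyncio.run(main())
```

The same pattern should carry over to real connectors, since each query call returns an awaitable; whether a particular API tolerates concurrent requests under its rate limits is worth checking first.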

Parameters

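  • table is the name of the table (API endpoint) to query, as defined in the configuration files loaded by connect, for example "publication" for dblp. The configuration files can be loaded in two ways; details can be found in the previous configuration file section.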

  • _q (optional) is the search keyword.

  • _auth (optional) passes the authorization credentials. Authentication parameters are usually defined when instantiating the Connector. If some tables use different authentication options, you can pass different credentials here; when given, they override the ones passed to the Connector.

  • _count is the number of results to return. The auto-pagination scheme fetches results spread over multiple pages in a single function call. See details in later sections.

  • Other parameters are defined in the configuration file. You can view them using info().
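The _count parameter hides pagination from the caller. The sketch below illustrates one common scheme, offset-based paging, under assumed names: fetch_page is a hypothetical function returning at most limit records starting at offset, and collect keeps requesting pages until it holds count records or the API runs out. The connector's real pagination is driven by the configuration file and may work differently (for example with page tokens).

```python
from typing import Callable, Dict, List

Record = Dict[str, int]


def collect(fetch_page: Callable[[int, int], List[Record]], count: int, page_size: int) -> List[Record]:
    """Gather up to `count` records by requesting consecutive pages."""
    results: List[Record] = []
    offset = 0
    while len(results) < count:
        # Never ask for more than the number of records still missing.
        limit = min(page_size, count - len(results))
        page = fetch_page(offset, limit)
        if not page:
            break  # the API has no more results
        results.extend(page)
        offset += len(page)
    return results


def make_fake_api(total: int) -> Callable[[int, int], List[Record]]:
    """Hypothetical API holding `total` records, used only for illustration."""
    def fetch_page(offset: int, limit: int) -> List[Record]:
        return [{"id": i} for i in range(offset, min(offset + limit, total))]
    return fetch_page


records = collect(make_fake_api(2500), count=2000, page_size=1000)
```

Advancing the offset by the number of records actually returned, rather than by page_size, keeps the loop correct when an API silently returns shorter pages than requested.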

Examples

The examples below show some ways to call the query function.

  • dblp

df = await conn.query("publication", _q="CVPR 2020", _count=2000)

This query fetches the top 2000 search results for the keyword “CVPR 2020” from the dblp publication API.
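To make the paging concrete: the public dblp publication search API (https://dblp.org/search/publ/api) takes h, the number of hits to return (capped at 1000 per request), and f, the index of the first hit. Fetching 2000 results therefore needs two requests. The sketch below only builds those request URLs; the way the connector maps _q and _count onto q, h, and f is an assumption here, not taken from its configuration file.

```python
from typing import List
from urllib.parse import urlencode

DBLP_PUBL_API = "https://dblp.org/search/publ/api"
MAX_HITS = 1000  # dblp caps the number of hits per request at 1000


def dblp_page_urls(q: str, count: int) -> List[str]:
    """Build one dblp search URL per page needed to fetch `count` hits (assumed mapping)."""
    urls = []
    for first in range(0, count, MAX_HITS):
        hits = min(MAX_HITS, count - first)
        params = {"q": q, "format": "json", "h": hits, "f": first}
        urls.append(f"{DBLP_PUBL_API}?{urlencode(params)}")
    return urls


for url in dblp_page_urls("CVPR 2020", 2000):
    print(url)
```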

  • YouTube

df = await dc.query("videos", _q="Data Science", part="snippet", type="video", _count=40)

This call queries the YouTube video search API for 40 results matching the keywords “Data Science”. The arguments part="snippet" and type="video" are additional parameters defined by the YouTube API: part selects which resource properties the response includes, and type restricts the results to videos.
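These keyword arguments correspond to query parameters of the YouTube Data API v3 search endpoint (https://www.googleapis.com/youtube/v3/search), where q carries the search terms and maxResults is capped at 50 per page, so 40 results fit in one request. The sketch below builds the kind of request such a call might produce; the mapping of _q to q and of _count to maxResults is an assumption, and "API_KEY" is a placeholder for your own credential.

```python
from urllib.parse import parse_qs, urlencode, urlparse

YOUTUBE_SEARCH = "https://www.googleapis.com/youtube/v3/search"


def youtube_search_url(q: str, count: int, api_key: str, **params: str) -> str:
    """Build a search URL; extra keyword arguments are passed through unchanged."""
    # One page holds at most 50 results, so larger counts would need pageToken paging.
    query = {"q": q, "maxResults": min(count, 50), "key": api_key, **params}
    return f"{YOUTUBE_SEARCH}?{urlencode(query)}"


url = youtube_search_url("Data Science", 40, "API_KEY", part="snippet", type="video")
print(parse_qs(urlparse(url).query))
```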

  • Twitter

df = await dc.query("tweets", _q="COVID-19", _count=50)

This query retrieves 50 tweets related to COVID-19 from the Twitter search API.