The API
How to use the cQuery API
The cQuery API requires only a simple HTTP request. There are three basic ways to retrieve live web page content from the cQuery API, as detailed below.
API fetch #1: Raw CSS Selector content retrieval
The cQuery API can be called with an explicit 'css_selector' parameter and an optional 'data_type' parameter. cQuery will attempt to extract the content identified by the CSS selector 'css_selector' from the web page at 'source_url', and will attempt to format any content found as specified by 'data_type'.
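A minimal sketch of building such a request in Python. The endpoint URL is a placeholder (this page does not state the real one), and sending the parameters as a GET query string is an assumption; only the parameter names come from the table below.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the real cQuery API URL.
CQUERY_ENDPOINT = "https://cquery.example.com/api"


def build_selector_request(api_key, source_url, css_selector, data_type="text"):
    """Return a request URL for a raw CSS selector fetch."""
    params = {
        "api_key": api_key,
        "source_url": source_url,
        "css_selector": css_selector,
        "data_type": data_type,
    }
    # urlencode percent-encodes every value, which satisfies the
    # 'url encoded' requirement for source_url and css_selector.
    return CQUERY_ENDPOINT + "?" + urlencode(params)


selector_url = build_selector_request(
    "YOUR_API_KEY",
    "https://www.example.com/article",
    "h1.story-header",
)
print(selector_url)
```

The resulting URL can then be fetched with any HTTP client.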
API fetch #2: Content profile based content retrieval
A more sophisticated and powerful way to retrieve content from cQuery is to use 'content profiles'. A content profile is a JSON data structure that holds multiple content definitions, so that numerous and complex pieces of content can be extracted from a web page with a single cQuery API call. The structure of the JSON 'content profile' is detailed below.
API fetch #3: Name-based profile content retrieval
Name-based content profile retrieval is essentially identical to normal content profile based retrieval, except that the content profile has already been defined and stored in your cQuery account. This allows the retrieval of complex web page content with only two pieces of information: the name of the stored content profile to use, and the URL of the source web page.
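As a rough illustration, a name-based request might be assembled as follows. The endpoint URL, the use of a GET query string, and the helper name are assumptions; the "[profile_group_name]/[profile_name]/[data_item_name]" format follows the 'content_profile' parameter described in the table below.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the real cQuery API URL.
CQUERY_ENDPOINT = "https://cquery.example.com/api"


def build_named_profile_request(api_key, source_url, group, profile, data_item=None):
    """Return a request URL that uses a stored content profile."""
    # The data item name is optional; when given, only that item is returned.
    name_parts = [group, profile]
    if data_item:
        name_parts.append(data_item)
    params = {
        "api_key": api_key,
        "source_url": source_url,
        "content_profile": "/".join(name_parts),
    }
    return CQUERY_ENDPOINT + "?" + urlencode(params)


named_url = build_named_profile_request(
    "YOUR_API_KEY",
    "https://www.example.com/news/article",
    "bbc_website",
    "news_article",
    "headline",
)
print(named_url)
```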
Request Property Name | Description | Allowable Values | Mandatory? |
---|---|---|---|
api_key | The API key associated with the cQuery account. | A valid API key | Yes |
source_url | The URL of the web page containing the text/content you wish to extract. | A valid, 'url encoded' URL | Yes |
css_selector | A 'CSS Selector' that targets the part of the page you wish to extract. | A valid, 'url encoded' CSS selector | No |
data_type | If set to 'number', cQuery will attempt to extract a number from the selected content. Future releases may support the extraction of multiple numbers from content. | 'text' or 'number' | No - defaults to 'text' |
response_format | Specifies the format in which you wish cQuery to respond. cQuery responses are detailed below. | 'raw', 'json' or 'xml' | No - defaults to 'raw' |
retain_attributes | If the 'retain_attributes' parameter is present and has the value 'true', cQuery will return the HTML attributes associated with the selected content. This does not apply to content profile based content retrieval. | true | No - defaults to 'false' |
inline_content_profile | A content profile definition supplied inline with the request. The easiest way to explain the content profile data structure is by example. Below is a content profile that extracts key content from a BBC news article: { "title":{ "selector":"H1.story-header", "content":"innerText", "dataType":"text" }, "subtitle":{ "selector":"P.introduction", "content":"innerText", "dataType":"text" }, "photo_captions":"DIV.story-body DIV.caption.body-narrow-width SPAN", "text":"DIV.story-body P" } | A valid 'url encoded' JSON content profile definition | No |
content_profile | The name of the cQuery stored content profile. The format of this parameter should be: "[profile_group_name]/[profile_name]/[data_item_name]", for example: "bbc_website/news_article/headline". The 'data_item_name' part is optional but if supplied cQuery will only return the content for that specific item. | A valid stored content profile name | No |
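The 'url encoded' JSON requirement for 'inline_content_profile' can be sketched as below, using part of the BBC profile from the table. The endpoint URL and the GET query string are assumptions, not something this page confirms.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the real cQuery API URL.
CQUERY_ENDPOINT = "https://cquery.example.com/api"

# Two entries from the BBC example: a full definition and a bare selector.
content_profile = {
    "title": {
        "selector": "H1.story-header",
        "content": "innerText",
        "dataType": "text",
    },
    "text": "DIV.story-body P",
}


def build_inline_profile_request(api_key, source_url, profile):
    """Return a request URL that carries the profile as url encoded JSON."""
    params = {
        "api_key": api_key,
        "source_url": source_url,
        # Serialise compactly; urlencode then percent-encodes the JSON text.
        "inline_content_profile": json.dumps(profile, separators=(",", ":")),
        "response_format": "json",
    }
    return CQUERY_ENDPOINT + "?" + urlencode(params)


inline_url = build_inline_profile_request(
    "YOUR_API_KEY",
    "https://www.example.com/news/article",
    content_profile,
)
print(inline_url)
```

Serialising with `json.dumps` before encoding avoids hand-escaping the braces, quotes and spaces in the profile.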
The cQuery API responses
JSON
(overview of JSON success and error responses to go here...)
XML
(overview of XML success and error responses to go here...)
Raw
When 'response_format' is set to 'raw', cQuery will simply return the raw content. If multiple content items are found, cQuery will concatenate them using the '|' pipe character as a delimiter.
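A minimal sketch of splitting a raw response back into its items. The helper name is illustrative. Note that this page describes no escaping, so an item that itself contains '|' cannot be told apart from two items; the 'json' or 'xml' formats may be safer in that case.

```python
def split_raw_response(body):
    """Split a raw cQuery response into its individual content items."""
    # An empty body is treated as "no items" rather than one empty item.
    if body == "":
        return []
    return body.split("|")


items = split_raw_response("First caption|Second caption")
print(items)  # ['First caption', 'Second caption']
```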