The API

How to use the cQuery API

The cQuery API requires only a simple HTTP request. There are three basic ways to retrieve live web page content from the cQuery API, as detailed below.

API fetch #1: Raw CSS Selector content retrieval

The cQuery API can be called with an explicit 'css_selector' parameter and an optional 'data_type' parameter. cQuery will attempt to extract the content matched by the CSS selector 'css_selector' from the web page at 'source_url', and will attempt to format any content found according to 'data_type'.

http://cquery.com/api/fetch?api_key=XXX&css_selector=XXX&data_type=XXX&response_format=XXX&source_url=XXX
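A minimal Python sketch of building and sending a fetch #1 request with only the standard library. It assumes the endpoint above answers plain GET requests at that host, and `YOUR_API_KEY` stands in for a real key; the helper names `build_fetch_url` and `fetch_raw` are hypothetical. `urlencode` percent-encodes every value, which covers the 'url encoded' requirement for `css_selector` and `source_url`.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the fetch #1 URL above (assumed to accept GET requests).
FETCH_ENDPOINT = "http://cquery.com/api/fetch"


def build_fetch_url(api_key, source_url, css_selector,
                    data_type="text", response_format="raw"):
    """Build a fetch #1 URL; urlencode percent-encodes every parameter value."""
    params = {
        "api_key": api_key,
        "css_selector": css_selector,
        "data_type": data_type,              # 'text' or 'number'
        "response_format": response_format,  # 'raw', 'json' or 'xml'
        "source_url": source_url,
    }
    return FETCH_ENDPOINT + "?" + urlencode(params)


def fetch_raw(url, timeout=10):
    """Hypothetical helper: send the request and return the body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_raw(url) to actually send it.
    print(build_fetch_url("YOUR_API_KEY", "http://example.com/article", "H1.story-header"))
```

Keeping URL construction separate from the network call makes the encoding easy to inspect before any request is sent.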

API fetch #2: Content profile based content retrieval

A more sophisticated and powerful way to retrieve content from cQuery is to use 'content profiles'. A content profile is a JSON data structure containing multiple content definitions. This allows many complex content items to be extracted from a web page with a single cQuery API call. The structure of the JSON 'content profile' is detailed below.

http://cquery.com/api/content-profile/1.0/fetch_content?api_key=XXX&inline_content_profile=XXX&source_url=XXX
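The following Python sketch shows one way to send an inline content profile: build the profile as a dictionary, serialise it to JSON, and let `urlencode` URL-encode it as the `inline_content_profile` value. The profile reuses the BBC news example from the request table, which mixes full definition objects with bare CSS selector strings. The helper name `build_inline_profile_url` is hypothetical, and the endpoint is assumed to accept GET requests.

```python
import json
from urllib.parse import urlencode

PROFILE_ENDPOINT = "http://cquery.com/api/content-profile/1.0/fetch_content"

# BBC news article profile from the request table: entries are either full
# definition objects or bare CSS selector strings.
profile = {
    "title": {"selector": "H1.story-header", "content": "innerText", "dataType": "text"},
    "subtitle": {"selector": "P.introduction", "content": "innerText", "dataType": "text"},
    "photo_captions": "DIV.story-body DIV.caption.body-narrow-width SPAN",
    "text": "DIV.story-body P",
}


def build_inline_profile_url(api_key, source_url, content_profile):
    """Serialise the profile to compact JSON and URL-encode it into the query string."""
    params = {
        "api_key": api_key,
        "inline_content_profile": json.dumps(content_profile, separators=(",", ":")),
        "source_url": source_url,
    }
    return PROFILE_ENDPOINT + "?" + urlencode(params)
```

Compact separators keep the encoded query string shorter; the JSON is encoded as a whole, so selectors containing spaces survive the round trip.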

API fetch #3: Name-based profile content retrieval

Name-based content profile retrieval is essentially identical to inline content profile retrieval, except that the content profile has already been defined and stored in your cQuery account. This allows complex web page content to be retrieved with only two pieces of information: the name of the stored content profile to use, and the URL of the source web page.

http://cquery.com/api/content-profile/1.0/fetch_content?api_key=XXX&content_profile=XXX&source_url=XXX
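A short Python sketch of a name-based request, assuming a profile named `bbc_website/news_article` is already stored in the account (the naming format is described in the `content_profile` row of the table). The helper `build_named_profile_url` is hypothetical; note that `urlencode` encodes the `/` separators as `%2F`, which assumes the server decodes the parameter before splitting it.

```python
from urllib.parse import urlencode

PROFILE_ENDPOINT = "http://cquery.com/api/content-profile/1.0/fetch_content"


def build_named_profile_url(api_key, source_url, content_profile):
    """Build a fetch #3 URL from a stored profile name and a source page URL."""
    params = {
        "api_key": api_key,
        "content_profile": content_profile,  # e.g. 'group/profile' or 'group/profile/item'
        "source_url": source_url,
    }
    return PROFILE_ENDPOINT + "?" + urlencode(params)
```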

Request Property Name	Description	Allowable Values	Mandatory?
api_key	The API key associated with the cQuery account.	A valid api key	Yes
source_url	The webpage URL containing the text/content you wish to extract.	A valid, 'url encoded' url	Yes
css_selector	A 'CSS Selector' that targets the part of the page you wish to extract.	A valid, 'url encoded' css selector	No
data_type	If set to 'number', cQuery will attempt to extract a number from the selected content. Future releases may support the extraction of multiple numbers from content.	text | number	No - defaults to 'text'
response_format	The format in which cQuery should respond. The response formats are described below.	raw | json | xml	No - defaults to 'raw'
retain_attributes	If the 'retain_attributes' parameter is present and has the value 'true', cQuery will return the HTML attributes associated with the selected content. This does not apply to content profile based content retrieval.	true	No - defaults to 'false'
inline_content_profile	A JSON content profile passed inline with the request. The content profile data structure is easiest to explain by example; below is a content profile that extracts key content from a BBC news article: { "title":{ "selector":"H1.story-header", "content":"innerText", "dataType":"text" }, "subtitle":{ "selector":"P.introduction", "content":"innerText", "dataType":"text" }, "photo_captions":"DIV.story-body DIV.caption.body-narrow-width SPAN", "text":"DIV.story-body P" }	A valid 'url encoded' JSON content profile definition	No
content_profile	The name of a content profile stored in your cQuery account. The format of this parameter should be: "[profile_group_name]/[profile_name]/[data_item_name]", for example: "bbc_website/news_article/headline". The 'data_item_name' part is optional, but if it is supplied cQuery will only return the content for that specific item.	A valid stored content profile name	No
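To catch malformed profile names before a request is sent, a client could split the `content_profile` value into its parts. This is a minimal, hypothetical client-side check, not part of the cQuery API; it assumes that the individual names never contain `/` and that the two-part form `group/profile` is valid because `data_item_name` is optional.

```python
from typing import NamedTuple, Optional


class ProfileName(NamedTuple):
    group: str
    profile: str
    data_item: Optional[str]  # None when the optional part is omitted


def parse_profile_name(name: str) -> ProfileName:
    """Split '[profile_group_name]/[profile_name]/[data_item_name]' into its parts."""
    parts = name.split("/")
    # Require two or three non-empty parts (the data item name is optional).
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected group/profile[/item], got {name!r}")
    data_item = parts[2] if len(parts) == 3 else None
    return ProfileName(parts[0], parts[1], data_item)
```

For example, `parse_profile_name("bbc_website/news_article/headline")` yields the group `bbc_website`, the profile `news_article` and the data item `headline`.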

The cQuery API responses

JSON

(overview of JSON success and error responses to go here...)

XML

(overview of XML success and error responses to go here...)

Raw

When 'response_format' is set to 'raw', cQuery will simply return the raw content. If multiple content items are found, cQuery will concatenate them using the '|' (pipe) character as a delimiter.
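On the client side, a raw response can be split back into its items on the pipe character. A minimal Python sketch with a hypothetical helper name; it assumes an empty body means no content was matched. Because the description mentions no escaping, an item that itself contains '|' cannot be distinguished from two separate items, so the 'json' or 'xml' format may be the safer choice when the content might include pipes.

```python
from typing import List


def split_raw_response(body: str) -> List[str]:
    """Split a 'raw' cQuery response into its content items on the '|' delimiter."""
    if body == "":
        return []  # assumption: an empty body means nothing was matched
    return body.split("|")
```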