OpenSearch:CreateSpider

Last Updated: Mar 11, 2024

Creates a website import task.

URL

POST /v4/openapi/app-groups/[appGroupIdentity]/chatos/spiders
  • [appGroupIdentity] specifies the OpenSearch instance that you want to access. Set it to the name of an instance that is in service.

  • The sample URL omits information such as the request headers and encoding method.

  • The sample URL also omits the endpoint that is used to connect to an OpenSearch instance.

Protocol

HTTP

HTTP request method

POST

Supported format

JSON

Request parameters

| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| url | STRING | Yes | | | The website URL. The URL must be unique within an OpenSearch instance. |
| category | STRING | Yes | | | The category of the data to be imported from the website. The value must be consistent with the value of the category field in the main table. The category must be unique within an OpenSearch instance. |
| urlRegex | List<STRING> | No | | | Regular expressions that are used as filter conditions for web page URLs. Multiple filter conditions are supported. By default, the filter matches URLs that start with the website URL. For example, if the website URL is http://www.abc.com/, the default regular expression is http://www\.abc\.com/.*. |
| xpathSelectors | List<STRING> | No | | | XPath selectors that are used to query the specified content on web pages. Multiple XPath selectors are supported. For example, to query content in div tags on web pages, set this parameter to //div. |
| cssSelectors | List<STRING> | No | | | CSS selectors that are used to query the specified content on web pages. Multiple CSS selectors are supported. For example, to query content in the <div class="content">Web Page Content</div> format on web pages, set this parameter to div.content. |
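
The default urlRegex filter described above can be reproduced locally as a rough check of which page URLs it would include. This is a minimal sketch, assuming Python 3.7 or later (where re.escape leaves ":" and "/" unescaped). The helper name default_url_regex is hypothetical and not part of the OpenSearch API, and the regular expression engine used by the service may differ from Python's re module.

```python
import re


def default_url_regex(site_url: str) -> str:
    """Hypothetical helper: escape the site URL and allow any suffix."""
    return re.escape(site_url) + ".*"


# Website URL taken from the example in the parameter description.
pattern = default_url_regex("http://www.abc.com/")
print(pattern)

# A page under the site matches the filter.
print(bool(re.fullmatch(pattern, "http://www.abc.com/docs/intro.html")))

# The escaped dots are literal, so a look-alike host does not match.
print(bool(re.fullmatch(pattern, "http://wwwXabc.com/docs/intro.html")))
```

Escaping the dots matters: an unescaped "." would match any character, so the filter would also accept unrelated hosts.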

Sample request

{
 "category": "OpenSearch documentation",
 "url": "http://xxx"
}
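
A request like the sample can be assembled programmatically. The following is a minimal sketch that only builds the request path and the JSON body; it does not send anything, because the endpoint, request headers, and authentication are omitted from this page. The function name build_create_spider_request and the instance name my_app are hypothetical.

```python
import json
from urllib.parse import quote


def build_create_spider_request(app_group, url, category,
                                url_regex=None,
                                xpath_selectors=None,
                                css_selectors=None):
    """Hypothetical helper: return (path, JSON body) for CreateSpider."""
    # Percent-encode the instance name so it is safe inside the path.
    path = f"/v4/openapi/app-groups/{quote(app_group, safe='')}/chatos/spiders"

    # url and category are the two required parameters.
    body = {"url": url, "category": category}

    # Optional list parameters are included only when provided.
    optional = {
        "urlRegex": url_regex,
        "xpathSelectors": xpath_selectors,
        "cssSelectors": css_selectors,
    }
    for name, values in optional.items():
        if values:
            body[name] = list(values)

    return path, json.dumps(body)


path, payload = build_create_spider_request(
    "my_app",                      # hypothetical instance name
    "http://www.abc.com/",
    "OpenSearch documentation",
    css_selectors=["div.content"],
)
print("POST", path)
print(payload)
```

Leaving out urlRegex keeps the default filter, which includes only pages whose URLs start with the website URL.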

Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| errors | LIST | The error details. |
| status | STRING | The execution result of the request. Valid values: OK and FAIL. A value of OK indicates that the request is successful. A value of FAIL indicates that the request fails. In this case, troubleshoot errors based on the error code. |
| request_id | STRING | The ID of the request. |
| code | STRING | The error code. |
| message | STRING | The error message. |
| latency | STRING | The latency of the request. |

Sample response

{
 "status": "OK",
 "requestId": "",
 "httpCode": 200,
 "code": "",
 "message": "",
 "latency": 123
}
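
A caller would typically check the status field before treating the task as created. The sketch below parses a response body and raises an exception when status is not OK. The exception class and function name are hypothetical. Because the parameter table lists request_id while the sample shows requestId, the code does not depend on either key.

```python
import json


class SpiderRequestError(Exception):
    """Hypothetical exception raised when the response status is not OK."""


def parse_create_spider_response(body: str) -> dict:
    """Decode the JSON response and fail loudly on a non-OK status."""
    data = json.loads(body)
    if data.get("status") != "OK":
        # Surface the error code and message for troubleshooting.
        code = data.get("code", "")
        message = data.get("message", "")
        raise SpiderRequestError(f"{code}: {message}")
    return data


sample = (
    '{"status": "OK", "requestId": "", "httpCode": 200, '
    '"code": "", "message": "", "latency": 123}'
)
result = parse_create_spider_response(sample)
print(result["status"], result["latency"])
```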

Usage notes

  • The website import task crawls content from the website at the specified URL. By default, only web pages whose URLs start with the specified URL are included.

  • If the website URL is valid but the robots.txt file of the website disallows crawling, an error is returned.

  • Only one website import task can be running in an OpenSearch instance at a time.
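
Because a robots.txt file that disallows crawling causes the request to fail, it can help to check the rules locally before creating a task. This is a minimal sketch that uses Python's standard urllib.robotparser on robots.txt text that has already been downloaded. The user agent sent by the OpenSearch crawler is not stated on this page, so the wildcard "*" is an assumption, and the service may interpret the rules differently.

```python
from urllib.robotparser import RobotFileParser


def allows_crawling(robots_txt: str, page_url: str,
                    user_agent: str = "*") -> bool:
    """Hypothetical pre-check: may user_agent fetch page_url?"""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines (no network access).
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# A site that blocks all crawlers, and one that allows everything.
blocked_site = "User-agent: *\nDisallow: /\n"
open_site = "User-agent: *\nDisallow:\n"

page = "http://www.abc.com/docs/index.html"
print(allows_crawling(blocked_site, page))
print(allows_crawling(open_site, page))
```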