You can build a real-time data warehouse by using the real-time write capability of Elasticsearch.
DataWorks allows you to add Alibaba Cloud Elasticsearch V5.X, V6.X, and V7.X clusters as data sources. Self-managed Elasticsearch clusters are not supported.
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- Select the region where the required workspace resides, find the workspace, and then click Data Analytics.
- Move the pointer over the icon and choose . Alternatively, click the required workflow, right-click Data Integration, and then choose .
- In the Create Node dialog box, set the Node Name and Location parameters. Note: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
- Click Commit.
- On the configuration tab of the real-time synchronization node, drag Elasticsearch from the Output section to the canvas and connect it to the configured reader or conversion node.
- Click the Elasticsearch node. In the panel that appears, configure the parameters.
Data source: The Elasticsearch data source that you configured. Only Elasticsearch data sources can be selected here. If no data source is available, click New data source on the right to add one on the Data Source page. For more information, see Add an Elasticsearch data source.
Index: The name of the index to which you want to write data. You can click Create Index on the right to create an index, either by using the default index information directly or after modifying the Index Name, Index Type, Dynamic Mapping Status, Shards, Replicas, and Statement for Creating Index settings.
- Index Type: This parameter is available only for Elasticsearch V6.X, V5.X, or earlier.
- Dynamic Mapping Status: Specifies the value of the dynamic mapping parameter, which determines how the index handles new fields that appear in the documents written by Elasticsearch Writer.
  - If your Elasticsearch cluster version is earlier than V7.10, the valid values are true, false, and strict.
  - If your Elasticsearch cluster version is V7.10 or later, the valid values are true, false, strict, and runtime.
  - true: New fields are added to the mappings of the index and are indexed, so they can be searched.
  - false: New fields are not added to the mappings of the index and are not indexed, so they cannot be searched. Their values are still stored in the _source field of each document.
  - strict: If a document contains new fields, the write fails with an error and the document is rejected. The fields are not added to the mappings of the index.
  - runtime: New fields are added to the mappings of the index as runtime fields. They are not indexed; their values are loaded from _source at query time.
- Shards: the number of primary shards. An index can be divided into multiple primary shards. These primary shards can be distributed among different nodes to support distributed searches. When you create an index, you must specify the number of primary shards for the index. After the index is created, you cannot change the number. For more information, see shard.
- Replicas: the number of replica shards for each primary shard. Replica shards provide fault tolerance and help serve the read workload of the cluster. Set Replicas to 1 if the cluster has insufficient capacity, if a single backup of each primary shard is sufficient, or if the cluster encounters write performance bottlenecks.
- Statement for Creating Index: The field configurations are defined in the properties section of the statement, where you can modify the types of the fields.
Enable Partitioning for Elasticsearch Indexes: Specifies whether to enable the routing mechanism, which lets you customize the value of the routing parameter. By default, the value of routing is the ID of the document. Elasticsearch applies a hash function to the routing value, divides the resulting number by the number of primary shards, and uses the remainder to determine the primary shard in which the document is stored.
Set Primary Key (By_Id): Specifies how values are assigned to the document IDs in the Elasticsearch index during data synchronization. Valid values:
- Primary Key: uses one of the columns in the source table as the primary key.
- Composite Primary Key: combines multiple columns in the source table to form the primary key.
Field Mapping: Configure the field mappings between the source and the destination. The synchronization node synchronizes data based on these field mappings.
- Click the icon in the top toolbar.
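The node naming rule from the Create Node dialog box can be checked before you commit the node. The following is a minimal sketch, assuming that "letters" means ASCII letters only; `is_valid_node_name` and `NODE_NAME_PATTERN` are hypothetical names used for illustration, not part of DataWorks.

```python
import re

# Documented rule: 1 to 128 characters; letters, digits, underscores (_), and
# periods (.). Assumption: "letters" means ASCII letters A-Z and a-z only.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if the whole name satisfies the node naming rule."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["es_realtime.sync", "es-realtime", "", "a" * 129]:
    # Print a shortened form of each candidate and whether it is valid.
    print(repr(candidate[:20]), is_valid_node_name(candidate))
```

`fullmatch` is used so that a name containing a disallowed character anywhere (such as the hyphen in `es-realtime`) is rejected, rather than accepting a valid prefix.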
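The Create Index settings (Index Type, Dynamic Mapping Status, Shards, Replicas, and the properties in Statement for Creating Index) can be pictured as one create-index request body. The sketch below assumes that the statement follows the shape of the Elasticsearch create index API body (`settings` plus `mappings`); `build_create_index_body`, `allowed_dynamic_values`, and `total_shard_copies` are hypothetical helpers, and the field names in the usage example are illustrative only.

```python
import json
from typing import Optional

# Dynamic Mapping Status values documented for clusters earlier than V7.10.
BASE_DYNAMIC_VALUES = {"true", "false", "strict"}


def allowed_dynamic_values(version: tuple) -> set:
    """Return the documented dynamic values for a (major, minor) version."""
    if version >= (7, 10):
        return BASE_DYNAMIC_VALUES | {"runtime"}
    return set(BASE_DYNAMIC_VALUES)


def build_create_index_body(version: tuple, shards: int, replicas: int,
                            dynamic: str, properties: dict,
                            index_type: Optional[str] = None) -> dict:
    """Assemble a create-index body (hypothetical helper, not a DataWorks API)."""
    if dynamic not in allowed_dynamic_values(version):
        raise ValueError(f"dynamic={dynamic!r} is not supported on {version}")
    # true/false are sent as JSON booleans; strict/runtime stay strings.
    dynamic_value = {"true": True, "false": False}.get(dynamic, dynamic)
    mapping = {"dynamic": dynamic_value, "properties": properties}
    if version < (7, 0):
        # V5.X and V6.X nest the mapping under a type (the Index Type parameter).
        if not index_type:
            raise ValueError("index_type is required for clusters earlier than V7.X")
        mapping = {index_type: mapping}
    return {
        # number_of_shards cannot be changed after the index is created.
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": mapping,
    }


def total_shard_copies(shards: int, replicas: int) -> int:
    """Each primary shard plus its replicas: shards * (1 + replicas)."""
    return shards * (1 + replicas)


body = build_create_index_body(
    version=(7, 10), shards=5, replicas=1, dynamic="false",
    properties={"order_id": {"type": "keyword"}, "amount": {"type": "double"}},
)
print(json.dumps(body, indent=2))
print(total_shard_copies(5, 1))
```

With 5 primary shards and Replicas set to 1, the cluster stores 10 shard copies in total, which is worth keeping in mind when capacity is the reason for lowering Replicas.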
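Field Mapping, Set Primary Key (By_Id), and the routing rule under Enable Partitioning for Elasticsearch Indexes together describe how a source row becomes a stored document. The sketch below illustrates that flow under stated assumptions: the `_` separator for composite keys is a hypothetical choice, and `zlib.crc32` stands in for the Murmur3-based hash that Elasticsearch actually uses, so the shard numbers will not match a real cluster. All function names are hypothetical.

```python
import zlib


def apply_field_mapping(row: dict, field_mapping: dict) -> dict:
    """Build the destination document: source column name -> destination field."""
    return {dest: row[src] for src, dest in field_mapping.items()}


def build_doc_id(row: dict, key_columns: list, separator: str = "_") -> str:
    """Primary Key: pass one column. Composite Primary Key: pass several
    columns, joined with a separator (the "_" default is hypothetical)."""
    return separator.join(str(row[col]) for col in key_columns)


def pick_primary_shard(routing: str, num_primary_shards: int) -> int:
    """shard = hash(routing) % number of primary shards.

    zlib.crc32 is only a stand-in for the Murmur3-based hash used by
    Elasticsearch, so results differ from a real cluster."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


row = {"order_id": 1001, "region": "hz", "amount": 9.5}
doc = apply_field_mapping(row, {"order_id": "id", "amount": "total"})
single_id = build_doc_id(row, ["order_id"])
composite_id = build_doc_id(row, ["region", "order_id"])
# By default, the routing value is the document ID.
shard = pick_primary_shard(composite_id, num_primary_shards=5)
print(doc, single_id, composite_id, shard)
```

Because the shard is derived from the routing value, a custom routing value sends every document that shares it to the same primary shard, which is also why the number of primary shards cannot change after the index is created.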