
Configure Table Store(OTS) reader

Last Updated: Jan 12, 2019

The Table Store (OTS) Reader plug-in reads data from Table Store and supports incremental extraction within a specified range. Currently, the following three extraction methods are supported:

  • Full table extraction
  • Specified range extraction
  • Specified partition extraction

Table Store is a NoSQL database service built upon Alibaba Cloud’s Apsara distributed system, enabling you to store and access massive structured data in real time. Table Store organizes data into instances and tables. Using data partition and server load balancing technology, it provides seamless scaling.

In short, Table Store Reader connects to the Table Store server using the official Table Store Java SDK, reads the data, converts it into the field format defined by the data synchronization protocol, and passes it to the downstream Writer.

Based on the specified Table Store table range, Table Store Reader divides the range into multiple tasks according to the data synchronization concurrency N. Each task is executed by a Table Store Reader thread.

Currently, Table Store Reader supports all Table Store data types. They are converted as follows:

Category       | Table Store data type
---------------|----------------------
Integer        | Integer
Floating point | Double
String         | String
Boolean        | Boolean
Binary         | Binary
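
The conversion table can be read as a simple lookup; the sketch below mirrors it for post-processing extracted records (the dictionary and function are illustrative only, not part of any SDK):

```python
# Hypothetical lookup mirroring the conversion table above:
# each Table Store data type maps to the reader-side category.
OTS_TYPE_TO_CATEGORY = {
    "Integer": "Integer",
    "Double": "Floating point",
    "String": "String",
    "Boolean": "Boolean",
    "Binary": "Binary",
}

def category_for(ots_type: str) -> str:
    """Return the reader-side category for a Table Store data type."""
    try:
        return OTS_TYPE_TO_CATEGORY[ots_type]
    except KeyError:
        raise ValueError(f"Unsupported Table Store type: {ots_type}")
```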

Note:

Table Store itself does not support a date type. The application layer generally stores dates as long values (Unix timestamps).
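
For example, the application layer can store a date as a long Unix timestamp and restore it after extraction; a minimal Python sketch:

```python
from datetime import datetime, timezone

# Store a date as a long Unix timestamp (seconds since the epoch, UTC)...
dt = datetime(2019, 1, 12, tzinfo=timezone.utc)
ts = int(dt.timestamp())  # long value written to an Integer column

# ...and convert it back when reading.
restored = datetime.fromtimestamp(ts, tz=timezone.utc)
assert restored == dt
```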

Parameter description

  • endpoint

    • Description: The endpoint (service address) of the Table Store server. For details, see RAM.

    • Required: Yes

    • Default value: None

  • accessId

    • Description: The accessId of Table Store

    • Required: Yes

    • Default value: None

  • accessKey

    • Description: The accessKey of Table Store

    • Required: Yes

    • Default value: None

  • instanceName

    • Description: The name of the Table Store instance. An instance is an entity for using and managing the Table Store service.

      After you activate the Table Store service, you can create an instance in the console to create and manage tables. The instance is the basic unit of Table Store resource management; all access control and resource metering for applications are performed at the instance level.

    • Required: Yes

    • Default value: None

  • table

    • Description: The name of the table to be extracted. Only one table can be specified; Table Store does not require multi-table synchronization.

    • Required: Yes

    • Default value: None

  • column

    • Description: The set of column names to be synchronized from the configured table, described as a JSON array. Because Table Store is a NoSQL system, Table Store Reader requires the field names to be specified explicitly when extracting data.

      • Ordinary column reading is supported. For example: {"name":"col1"}
      • Partial column reading is supported. Table Store Reader does not read unconfigured columns.
      • Constant column reading is supported. For example: {"type":"STRING", "value":"DataX"}. "type" describes the constant type. Currently supported types are STRING, INT, DOUBLE, BOOL, BINARY (the value must be entered Base64-encoded), INF_MIN (the minimum system limit value of Table Store; do not specify a value attribute with it, otherwise an error occurs), and INF_MAX (the maximum system limit value of Table Store; do not specify a value attribute with it, otherwise an error occurs).
      • Functions and custom expressions are not supported. Because Table Store does not provide SQL-like functions or expressions, Table Store Reader does not provide them either.
    • Required: Yes

    • Default value: None
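
As noted above, a BINARY constant must be Base64-encoded before it is placed in the configuration. A quick sketch of building such a column entry (the byte string is only an example):

```python
import base64
import json

raw = b"\x00\x01\x02"  # example binary payload
column_entry = {
    "type": "BINARY",
    # Base64-encode the bytes into the string the plug-in expects
    "value": base64.b64encode(raw).decode("ascii"),
}
print(json.dumps(column_entry))
```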
  • begin/end

    • Description: These configuration items must be used in pairs and define the extraction range over the Table Store table. begin/end describe the PrimaryKey range to extract and must cover all PrimaryKeys within it. For an unbounded range, use {"type":"INF_MIN"} and {"type":"INF_MAX"}. For example, to extract data from a Table Store table whose primary key is [DeviceID, SellerID], configure begin/end as follows:

      "range": {
        "begin": [
          {"type":"INF_MIN"},         // The minimum DeviceID
          {"type":"INT", "value":"0"} // The minimum SellerID
        ],
        "end": [
          {"type":"INF_MAX"},            // The maximum DeviceID
          {"type":"INT", "value":"9999"} // The maximum SellerID
        ]
      }

      To extract data from the entire table, use the following configuration:

      "range": {
        "begin": [
          {"type":"INF_MIN"}, // The minimum DeviceID
          {"type":"INF_MIN"}  // The minimum SellerID
        ],
        "end": [
          {"type":"INF_MAX"}, // The maximum DeviceID
          {"type":"INF_MAX"}  // The maximum SellerID
        ]
      }
    • Required: Yes

    • Default value: None
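
Because begin and end must each describe every primary-key column, a quick length check before submitting a job can catch common mistakes. The helper below is hypothetical, not part of the plug-in:

```python
def check_range(primary_key_len, begin, end):
    """Verify that begin/end each supply one entry per primary-key column."""
    for name, bound in (("begin", begin), ("end", end)):
        if len(bound) != primary_key_len:
            raise ValueError(
                f"'{name}' has {len(bound)} entries, "
                f"expected {primary_key_len} (one per primary-key column)"
            )

# A [DeviceID, SellerID] table has two primary-key columns.
check_range(2,
            begin=[{"type": "INF_MIN"}, {"type": "INT", "value": "0"}],
            end=[{"type": "INF_MAX"}, {"type": "INT", "value": "9999"}])
```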
  • split

    • Description: An advanced configuration item for custom splitting, which is generally not needed.

      Application scenario: a custom splitting rule is typically used when Table Store Reader's automatic splitting policy is ineffective, for example when the Table Store data contains hotspots. split specifies splitting points inside the begin/end range, and only for the partitionKey; that is, only the partitionKey is configured in split, and the other PrimaryKeys do not need to be specified.

      If you want to extract data from a Table Store table with the primary key of [DeviceID, SellerID], the configuration is as follows:

      "range": {
        "begin": [
          {"type":"INF_MIN"}, // The minimum DeviceID
          {"type":"INF_MIN"}  // The minimum SellerID
        ],
        "end": [
          {"type":"INF_MAX"}, // The maximum DeviceID
          {"type":"INF_MAX"}  // The maximum SellerID
        ],
        // The specified splitting points. If splitting points are specified, the Job splits Tasks according to begin, end, and split.
        // Only the Partition Key (the first PrimaryKey column) can be split.
        // INF_MIN, INF_MAX, STRING, and INT are supported.
        "split": [
          {"type":"STRING", "value":"1"},
          {"type":"STRING", "value":"2"},
          {"type":"STRING", "value":"3"},
          {"type":"STRING", "value":"4"},
          {"type":"STRING", "value":"5"}
        ]
      }
    • Required: No

    • Default value: None
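
Conceptually, the Job turns begin, the split points, and end into consecutive sub-ranges over the partition key, one per Task. The sketch below illustrates that division only; `to_task_ranges` is a hypothetical helper, and the real splitting happens inside the plug-in:

```python
def to_task_ranges(begin, end, split):
    """Build consecutive (start, stop) sub-ranges from begin, split points, end."""
    bounds = [begin] + list(split) + [end]
    # Each adjacent pair of bounds becomes one task's range.
    return list(zip(bounds[:-1], bounds[1:]))

# With the five STRING split points from the example above,
# the full range is divided into six consecutive task ranges.
ranges = to_task_ranges("INF_MIN", "INF_MAX", ["1", "2", "3", "4", "5"])
```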

Development in wizard mode

Wizard mode is not supported currently.

Development in script mode

Configure a job to synchronously extract data from an entire Table Store table to the local machine.

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "reader": {
        "plugin": "ots",
        "parameter": {
          "datasource": "datasourceName",
          "table": "",
          "column": [
            {"name": "col1"},
            {"name": "col2"},
            {"name": "col3"},
            {"type": "STRING", "value": "yunshi"},
            {"type": "INT", "value": ""},
            {"type": "DOUBLE", "value": ""},
            {"type": "BOOL", "value": ""},
            {"type": "BINARY", "value": "Base64(bin)"}
          ],
          "range": {
            "begin": [
              {"type": "INF_MIN"}
            ],
            "end": [
              {"type": "INF_MAX"}
            ]
          }
        }
      },
      "writer": {}
    }
  }
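
Configurations like the one above can also be generated programmatically, which avoids hand-editing mistakes such as unbalanced braces or trailing commas. A minimal sketch using Python's standard json module (all field values are placeholders):

```python
import json

# Build the job configuration as a plain dict, then serialize it.
job = {
    "type": "job",
    "version": "1.0",
    "configuration": {
        "reader": {
            "plugin": "ots",
            "parameter": {
                "datasource": "datasourceName",
                "table": "",
                "column": [{"name": "col1"}, {"name": "col2"}],
                "range": {
                    "begin": [{"type": "INF_MIN"}],
                    "end": [{"type": "INF_MAX"}],
                },
            },
        },
        "writer": {},
    },
}
text = json.dumps(job, indent=2)
# Round-tripping proves the emitted configuration is valid JSON.
assert json.loads(text) == job
```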

Configure a Table Store Reader to define the range for data extraction.

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "reader": {
        "plugin": "ots",
        "parameter": {
          "endpoint": "",
          "accessId": "",
          "accessKey": "",
          "instanceName": "",
          // The name of the table to be exported
          "table": "",
          // The names (case-sensitive) of the columns to be exported. Duplicate columns and constant columns are supported.
          // Constant columns: supported types are STRING, INT, DOUBLE, BOOL, and BINARY.
          // Note: a BINARY value must be Base64-encoded into the corresponding string before it is passed to the plug-in.
          "column": [
            {"name":"col1"}, // Ordinary column
            {"name":"col2"}, // Ordinary column
            {"name":"col3"}, // Ordinary column
            {"type":"STRING", "value":""}, // Constant column (String)
            {"type":"INT", "value":""}, // Constant column (Integer)
            {"type":"DOUBLE", "value":""}, // Constant column (Floating point)
            {"type":"BOOL", "value":""}, // Constant column (Boolean)
            {"type":"BINARY", "value":"Base64(bin)"} // Constant column (Binary)
          ],
          "range": {
            // The start of the export range
            // INF_MIN, INF_MAX, STRING, and INT are supported.
            "begin": [
              {"type":"INF_MIN"},
              {"type":"INF_MAX"},
              {"type":"STRING", "value":"hello"},
              {"type":"INT", "value":"2999"}
            ],
            // The end of the export range
            // INF_MIN, INF_MAX, STRING, and INT are supported.
            "end": [
              {"type":"INF_MAX"},
              {"type":"INF_MIN"},
              {"type":"STRING", "value":"hello"},
              {"type":"INT", "value":"2999"}
            ]
          }
        }
      },
      "writer": {}
    }
  }