
Configure OTS Reader

Last Updated: Dec 04, 2017

The OTS Reader plug-in reads data from OTS and supports incremental extraction within a specified extraction range. The following three extraction methods are currently supported:

  • Full table extraction
  • Specified range extraction
  • Specified partition extraction

OTS is a NoSQL database service built upon Alibaba Cloud’s Apsara distributed system, enabling you to store and access massive structured data in real time. OTS organizes data into instances and tables. Using data partition and server load balancing technology, it provides seamless scaling.

In short, OTS Reader connects to the OTS server through the official OTS Java SDK, reads the data, converts it into data synchronization fields according to the official data synchronization protocol, and passes it to the downstream Writer.

OTS Reader divides the OTS table range into multiple tasks according to the data synchronization concurrency N. Each task is executed by one OTS Reader thread.
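The idea of dividing a range into N tasks can be sketched as follows. This is only an illustration on a plain integer key range: the source does not describe the actual splitting algorithm used by OTS Reader, and the function name `split_int_range` is hypothetical.

```python
from typing import List, Tuple


def split_int_range(begin: int, end: int, n: int) -> List[Tuple[int, int]]:
    """Split the half-open range [begin, end) into at most n contiguous sub-ranges.

    Illustrative only: real OTS primary keys may be strings or unbounded
    (INF_MIN / INF_MAX), which this sketch does not handle.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if end <= begin:
        return []
    total = end - begin
    n = min(n, total)  # never create empty sub-ranges
    base, extra = divmod(total, n)
    ranges = []
    start = begin
    for i in range(n):
        # The first `extra` sub-ranges take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each returned pair would correspond to one task, and therefore to one reader thread, under the assumptions above.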

OTS Reader currently supports all OTS data types. It converts OTS types as follows:

Category          OTS data type
Integer           Integer
Floating point    Double
String            String
Boolean           Boolean
Binary            Binary

Note:

OTS itself does not support a "date" type. At the application layer, a date is generally stored as a Long value holding a Unix timestamp.
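A minimal sketch of that convention is shown below, assuming timestamps are stored as whole seconds in UTC. The source does not say whether seconds or milliseconds are used, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def to_unix_seconds(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a Unix timestamp in whole seconds."""
    if dt.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them instead of guessing a zone.
        raise ValueError("datetime must be timezone-aware")
    return int(dt.timestamp())


def from_unix_seconds(ts: int) -> datetime:
    """Convert a Unix timestamp in seconds back to a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```

The integer returned by `to_unix_seconds` is what would be written to an Integer column, and `from_unix_seconds` recovers the date after extraction.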

Parameter description

  • endpoint

    • Description: The endpoint (service address) of the OTS server. For details, see RAM.

    • Required: Yes

    • Default value: None

  • accessId

    • Description: The accessId of OTS

    • Required: Yes

    • Default value: None

  • accessKey

    • Description: The accessKey of OTS

    • Required: Yes

    • Default value: None

  • instanceName

    • Description: The name of the OTS instance. An instance is the entity through which you use and manage the OTS service.

      After you enable the OTS service, you can create an instance in the Console and use it to create and manage tables. The instance is the basic unit of OTS resource management: OTS performs all access control and resource metering for applications at the instance level.

    • Required: Yes

    • Default value: None

  • table

    • Description: The name of the table to be extracted. Only one table can be specified. Multi-table synchronization is not required for OTS.

    • Required: Yes

    • Default value: None

  • column

    • Description: The set of columns to synchronize from the configured table, described as a JSON array. Because OTS is a NoSQL system, the name of each field must be specified when OTS Reader extracts data.

      • Ordinary columns can be read. For example: {"name":"col1"}
      • A subset of columns can be read. OTS Reader does not read columns that are not configured.
      • Constant columns can be read. For example: {"type":"STRING", "value":"DataX"}. "type" specifies the constant type. The supported types are STRING, INT, DOUBLE, BOOL, BINARY (the value must be Base64-encoded), INF_MIN (the minimum system limit value for OTS), and INF_MAX (the maximum system limit value for OTS). Do not set the value attribute for INF_MIN or INF_MAX; otherwise, an error may occur.
      • Functions and custom expressions are not supported. OTS provides no SQL-like functions or expressions, so OTS Reader provides none either.
    • Required: Yes

    • Default value: None
  • begin/end

    • Description: These two items must be used as a pair and define the range of the OTS table from which data is extracted. begin and end give the lower and upper bounds of the PrimaryKey range, and each must specify a value for every PrimaryKey column of the table. For an unbounded limit, use {"type":"INF_MIN"} or {"type":"INF_MAX"}. For example, to extract data from an OTS table whose primary key is [DeviceID, SellerID], configure begin/end as follows:

      "range": {
        "begin": [
          {"type":"INF_MIN"}, //Specify minimum deviceID
          {"type":"INT", "value":"0"} //Specify minimum SellerID
        ],
        "end": [
          {"type":"INF_MAX"}, //Specify maximum deviceID for extraction
          {"type":"INT", "value":"9999"} //Specify maximum SellerID for extraction
        ]
      }

      To extract data from the entire table, use the following configuration:

      "range": {
        "begin": [
          {"type":"INF_MIN"}, //Specify minimum deviceID
          {"type":"INF_MIN"} //Specify minimum SellerID
        ],
        "end": [
          {"type":"INF_MAX"}, //Specify maximum deviceID for extraction
          {"type":"INF_MAX"} //Specify maximum SellerID for extraction
        ]
      }
    • Required: Yes

    • Default value: null
  • split

    • Description: This is an advanced configuration item for custom splitting. Using it is generally not recommended.

      Application scenario: Custom splitting rules are generally used when hotspots in the stored OTS data make OTS Reader's automatic splitting policy ineffective. "split" specifies splitting points within the range between begin and end. Only the partitionKey is configured for each splitting point; the remaining PrimaryKey columns do not need to be specified.

      If you want to extract data from an OTS table with the primary key of [DeviceID, SellerID], the configuration is as follows:

      "range": {
        "begin": [
          {"type":"INF_MIN"}, //Specify minimum deviceID
          {"type":"INF_MIN"} //Specify minimum SellerID
        ],
        "end": [
          {"type":"INF_MAX"}, //Specify maximum deviceID for extraction
          {"type":"INF_MAX"} //Specify maximum SellerID for extraction
        ],
        // The specified splitting point. If a splitting point is specified, Job splits Task according to begin, end, and split.
        // The column to be split is only Partition Key (first column of PrimaryKey).
        // INF_MIN, INF_MAX, STRING, INT are supported.
        "split":[
          {"type":"STRING", "value":"1"},
          {"type":"STRING", "value":"2"},
          {"type":"STRING", "value":"3"},
          {"type":"STRING", "value":"4"},
          {"type":"STRING", "value":"5"}
        ]
      }
    • Required: No

    • Default value: None
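The column, begin/end, and split parameters described above can also be assembled programmatically. The following is a minimal sketch, not part of the plug-in: the helper names (`constant`, `build_range`, `partition_key_intervals`) are hypothetical, values are simply converted to strings with `str()`, and the way split points are paired into intervals is an assumption about how a job would turn begin, split, and end into tasks.

```python
import base64

VALUE_TYPES = {"STRING", "INT", "DOUBLE", "BOOL", "BINARY"}
LIMIT_TYPES = {"INF_MIN", "INF_MAX"}


def constant(col_type, value=None):
    """Build one typed entry, usable as a constant column or a range/split point."""
    if col_type in LIMIT_TYPES:
        # INF_MIN / INF_MAX must not carry a value attribute.
        if value is not None:
            raise ValueError(f"{col_type} must not carry a value")
        return {"type": col_type}
    if col_type not in VALUE_TYPES:
        raise ValueError(f"unsupported type: {col_type}")
    if value is None:
        raise ValueError(f"{col_type} requires a value")
    if col_type == "BINARY":
        # BINARY values (bytes) are passed as Base64 text.
        value = base64.b64encode(value).decode("ascii")
    return {"type": col_type, "value": str(value)}


def build_range(begin, end, primary_key_count, split=None):
    """Assemble a "range" object after checking begin/end cover every primary key column."""
    if len(begin) != primary_key_count or len(end) != primary_key_count:
        raise ValueError("begin and end need one entry per primary key column")
    result = {"begin": begin, "end": end}
    if split:
        result["split"] = split
    return result


def partition_key_intervals(range_cfg):
    """List the (lower, upper) partition key bounds implied by begin, split, and end."""
    bounds = [range_cfg["begin"][0], *range_cfg.get("split", []), range_cfg["end"][0]]
    return list(zip(bounds, bounds[1:]))


# Usage: a full-table range on [DeviceID, SellerID] with five split points.
columns = [{"name": "col1"}, constant("STRING", "DataX"), constant("BINARY", b"\x00\x01\x02")]
full_range = build_range(
    begin=[constant("INF_MIN"), constant("INF_MIN")],
    end=[constant("INF_MAX"), constant("INF_MAX")],
    primary_key_count=2,
    split=[constant("STRING", str(i)) for i in range(1, 6)],
)
```

Note that range and split entries accept only INF_MIN, INF_MAX, STRING, and INT according to the description above; the sketch does not enforce that restriction.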

Development in wizard mode

Wizard mode is not supported currently.

Development in script mode

Configure a job that synchronizes data from an entire OTS table to the local machine.

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "reader": {
        "plugin": "ots",
        "parameter": {
          "datasource": "datasourceName",
          "table": "",
          "column": [
            {
              "name": "col1"
            },
            {
              "name": "col2"
            },
            {
              "name": "col3"
            },
            {
              "type": "STRING",
              "value": "yunshi"
            },
            {
              "type": "INT",
              "value": ""
            },
            {
              "type": "DOUBLE",
              "value": ""
            },
            {
              "type": "BOOL",
              "value": ""
            },
            {
              "type": "BINARY",
              "value": "Base64(bin)"
            }
          ],
          "range": {
            "begin": [
              {
                "type": "INF_MIN"
              }
            ],
            "end": [
              {
                "type": "INF_MAX"
              }
            ]
          }
        }
      },
      "writer": {}
    }
  }

Configure an OTS Reader to define the range for data extraction.

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "reader": {
        "plugin": "ots",
        "parameter": {
          "endpoint":"",
          "accessId":"",
          "accessKey":"",
          "instanceName":"",
          // Name of the table to export
          "table":"",
          // Names (case-sensitive) of the columns to export. Duplicate columns and constant columns are supported.
          // Constant column: Supported types include STRING, INT, DOUBLE, BOOL, and BINARY.
          // Note: BINARY must be converted using Base64 to the corresponding string before it is passed into the plug-in.
          "column":[
            {"name":"col1"}, // Ordinary
            {"name":"col2"}, // Ordinary
            {"name":"col3"}, // Ordinary
            {"type":"STRING","value" : ""}, // Constant (String)
            {"type":"INT","value" : ""}, // Constant (Integer)
            {"type":"DOUBLE","value" : ""}, // Constant (Floating point)
            {"type":"BOOL","value" : ""}, // Constant (Boolean)
            {"type":"BINARY","value" : "Base64(bin)"} // Constant (Binary)
          ],
          "range":{
            // Start of the export range
            // INF_MIN, INF_MAX, STRING, INT are supported.
            "begin":[
              {"type":"INF_MIN"},
              {"type":"INF_MAX"},
              {"type":"STRING", "value":"hello"},
              {"type":"INT", "value":"2999"}
            ],
            // End of the export range
            // INF_MIN, INF_MAX, STRING, INT are supported.
            "end":[
              {"type":"INF_MAX"},
              {"type":"INF_MIN"},
              {"type":"STRING", "value":"hello"},
              {"type":"INT", "value":"2999"}
            ]
          }
        }
      },
      "writer": {
      }
    }
  }
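Because the samples above contain // comments, they are not strict JSON and may need cleaning before a JSON parser accepts them. One way around this is to generate the job file from code. The sketch below is an illustration only: `build_ots_job` is a hypothetical helper, the field names are taken from the samples above, and it assumes that "writer" sits next to "reader" inside "configuration".

```python
import json


def build_ots_job(reader_parameter, writer=None):
    """Wrap OTS Reader parameters in the job skeleton used by the samples above."""
    return {
        "type": "job",
        "version": "1.0",
        "configuration": {
            "reader": {"plugin": "ots", "parameter": reader_parameter},
            # The writer side is left empty here, as in the samples.
            "writer": writer if writer is not None else {},
        },
    }


# Full-table extraction for a table with a single primary key column.
job = build_ots_job(
    {
        "datasource": "datasourceName",
        "table": "",  # left empty as in the sample; fill in the real table name
        "column": [{"name": "col1"}, {"type": "STRING", "value": "yunshi"}],
        "range": {
            "begin": [{"type": "INF_MIN"}],
            "end": [{"type": "INF_MAX"}],
        },
    }
)
job_text = json.dumps(job, indent=2)
```

The resulting `job_text` is plain JSON without comments, so it can be checked with any JSON parser before it is submitted.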