
Configure OTSReader-Internal

Last Updated: Apr 17, 2018

Table Store (OTS) is a NoSQL database service built on Alibaba Cloud’s Apsara distributed system that lets you store and access massive amounts of structured data in real time. OTS organizes data into instances and tables, and it scales seamlessly through data partitioning and server load balancing.

OTSReader-Internal exports table data for the OTS Internal model, while OTS Reader exports data for the OTS Public model.

The OTS Internal model supports multi-version columns, so OTSReader-Internal provides two data export modes:

  • Multi-version mode: Because OTS can store multiple versions of each column, this mode exports data of all versions. The Reader plug-in expands each OTS cell into one row of a one-dimensional table made of a four-tuple: PrimaryKey (one to four PK columns), ColumnName, Timestamp, and Value. The principle is similar to the multi-version mode of HBase Reader. The elements of the tuple are passed to the Writer as the columns of a DataX record.

  • Normal mode: Similar to the normal mode of HBase Reader, this mode exports only the latest version of each column in each row. For more information, see the description of normal mode in HBase Reader Configuration.
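The difference between the two modes can be sketched in plain Java. The CellExpansion class below is a hypothetical in-memory model, not the plug-in's real code: a row is represented as a PK map (in PK order) plus, for each attribute column, a map from timestamp to value. In multi-version mode every stored version becomes its own record; in normal mode each row yields one record holding only the newest value of each requested column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one OTS row (not the plug-in's real classes):
//   primaryKey: PK column name -> value, in PK order
//   columns:    attribute column name -> (timestamp -> value) for every stored version
class CellExpansion {

    // Multi-version mode sketch: each version of each column becomes one record
    // laid out as [pk values..., columnName, timestamp, value].
    static List<List<Object>> expand(Map<String, Object> primaryKey,
                                     Map<String, TreeMap<Long, Object>> columns) {
        List<List<Object>> records = new ArrayList<>();
        for (Map.Entry<String, TreeMap<Long, Object>> column : columns.entrySet()) {
            for (Map.Entry<Long, Object> version : column.getValue().entrySet()) {
                List<Object> record = new ArrayList<>(primaryKey.values());
                record.add(column.getKey());
                record.add(version.getKey());
                record.add(version.getValue());
                records.add(record);
            }
        }
        return records;
    }

    // Normal mode sketch: one record per row. PK columns appear only if listed,
    // and attribute columns contribute just their newest version (null if absent).
    static List<Object> latest(Map<String, Object> primaryKey,
                               Map<String, TreeMap<Long, Object>> columns,
                               List<String> requested) {
        List<Object> record = new ArrayList<>();
        for (String name : requested) {
            if (primaryKey.containsKey(name)) {
                record.add(primaryKey.get(name));
            } else {
                TreeMap<Long, Object> versions = columns.get(name);
                record.add(versions == null || versions.isEmpty()
                        ? null : versions.lastEntry().getValue());
            }
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> pk = new LinkedHashMap<>();
        pk.put("pk1", "a");
        pk.put("pk2", 1L);
        TreeMap<Long, Object> attr1 = new TreeMap<>();
        attr1.put(1000L, "old");
        attr1.put(2000L, "new");
        Map<String, TreeMap<Long, Object>> columns = new TreeMap<>();
        columns.put("attr1", attr1);

        // [[a, 1, attr1, 1000, old], [a, 1, attr1, 2000, new]]
        System.out.println(expand(pk, columns));
        // [a, 1, new]
        System.out.println(latest(pk, columns, List.of("pk1", "pk2", "attr1")));
    }
}
```

Note how multi-version mode produces two records for the two stored versions of attr1, while normal mode collapses them into the single newest value.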

In short, OTS Reader connects to the OTS server and reads data through the official OTS Java SDK. It makes reads more robust with features such as retries on read timeouts and retries on read exceptions.
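The exact retry policy of the plug-in is not described on this page. As a hedged illustration of the idea only, the following Java sketch retries a failing read a fixed number of times with exponential backoff between attempts. The RetryingReader class, its readWithRetry method, and the attempt and delay values are hypothetical, not OTSReader-Internal's actual settings.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical retry helper illustrating "retry on read timeout / read exception".
// It is not the plug-in's implementation; the limits used below are illustrative only.
class RetryingReader {

    static <T> T readWithRetry(Callable<T> read, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.call();           // e.g. one range read against OTS
            } catch (Exception e) {           // timeouts and other read errors
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);      // back off before the next attempt
                    delay *= 2;               // exponential backoff
                }
            }
        }
        throw last;                           // every attempt failed: surface the last error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated read that times out twice and then succeeds.
        String result = readWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TimeoutException("simulated timeout");
            }
            return "row batch";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Rethrowing the last exception after the final attempt keeps a persistent failure visible to the caller instead of silently returning partial data.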

OTS Reader currently supports all OTS data types. OTSReader-Internal converts OTS types as follows:

| Data Integration internal type | OTS data type |
|---|---|
| Long | Integer |
| Double | Double |
| String | String |
| Boolean | Boolean |
| Bytes | Binary |

Parameter description

  • mode

    • Description: The operation mode of the plug-in. Valid values: normal (normal mode) and multiVersion (multi-version mode).

    • Required: Yes

    • Default value: None

  • endpoint

    • Description: The endpoint of the OTS server.

    • Required: Yes

    • Default value: None

  • accessId

    • Description: The AccessKey ID used to access OTS.

    • Required: Yes

    • Default value: None

  • accessKey

    • Description: The AccessKey secret used to access OTS.

    • Required: Yes

    • Default value: None

  • instanceName

    • Description: The name of the OTS instance. An instance is the entity through which you use and manage the OTS service.

      After you activate OTS, you can create an instance in the console and then create and manage tables in it. The instance is the basic unit of OTS resource management: OTS performs all access control and resource metering for applications at the instance level.

    • Required: Yes

    • Default value: None

  • table

    • Description: The name of the table to export. Only one table can be specified, because multi-table synchronization is not required for OTS.

    • Required: Yes

    • Default value: None

  • range

    • Description: The range of primary keys to export, as the half-open interval [begin, end).

      • If begin < end, data is read in forward (ascending) order.
      • If begin > end, data is read in reverse (descending) order.
      • begin and end cannot be equal.
      • Supported PK types: string, int, and binary. Binary values are passed in as Base64-encoded strings. INF_MIN represents a value smaller than any other value, and INF_MAX represents a value larger than any other value.
    • Required: No

    • Default value: The entire table, from the first row to the last.

  • range:{"begin"}

    • Description: The start of the export range. It can be an empty array, a PK prefix, or a complete PK. When data is read in forward order, omitted trailing PK columns default to INF_MIN; when data is read in reverse order, they default to INF_MAX. For example:

      If your table has two PK columns of types string and int, begin can be specified in any of the following three ways:

      • [] -> Read from the beginning of the table (in forward order).
      • [{"type":"string", "value":"a"}] -> Equivalent to [{"type":"string", "value":"a"},{"type":"INF_MIN"}].
      • [{"type":"string", "value":"a"},{"type":"INF_MIN"}] -> A complete PK.

      Binary PK columns need special handling. JSON cannot carry raw binary data, so a binary value must first be converted into a printable string with the Java method Base64.encodeBase64String, and that string is then entered as value. For example (Java):

      • byte[] bytes = "hello".getBytes(); -> Create binary data; here, the bytes of the string "hello".
      • String inputValue = Base64.encodeBase64String(bytes); -> Convert the binary data into a printable Base64 string.
      • Running the preceding code yields an inputValue of "aGVsbG8=".
      • Finally, write the value into the configuration: {"type":"binary", "value":"aGVsbG8="}.
    • Required: No

    • Default value: Read from the beginning of the table.

  • range:{"end"}

    • Description: The end of the export range (exclusive). It can be an empty array, a PK prefix, or a complete PK. When data is read in forward order, omitted trailing PK columns default to INF_MAX; when data is read in reverse order, they default to INF_MIN. For example:

      If your table has two PK columns of types string and int, end can be specified in any of the following three ways:

      • [] -> Read to the end of the table (in forward order).
      • [{"type":"string", "value":"a"}] -> In forward order, equivalent to [{"type":"string", "value":"a"},{"type":"INF_MAX"}].
      • [{"type":"string", "value":"a"},{"type":"INF_MIN"}] -> A complete PK.

      Binary PK columns are handled the same way as for begin: encode the binary value with Base64 (for example, the bytes of "hello" become "aGVsbG8=") and enter the result as {"type":"binary", "value":"aGVsbG8="}.
    • Required: No

    • Default value: Read to the end of the table.

  • range:{"split"}

    • Description: If a large amount of data needs to be exported, you can enable concurrent export. split divides the data in the current range into multiple concurrent tasks at the given split points.

      Note:

      • Split values apply to the first PK column (the partition key), and their type must match the type of the partition key.
      • Each split value must lie between begin and end.
      • Split values must be in increasing order when data is read in forward order (begin < end) and in decreasing order when data is read in reverse order (begin > end).
    • Required: No

    • Default value: None (no split points).

  • column

    • Description (multi-version mode): The columns to export.

      Format in multi-version mode:

      Regular column format: {"name":"{your column name}"}

      Note:

      • Constant columns are not supported in multi-version mode.
      • PK columns cannot be specified; each exported four-tuple always includes the complete PK.
      • Each column can be specified only once.
    • Required (multi-version mode): No

    • Default value (multi-version mode): All versions of all columns are exported.

    • Description (normal mode): The columns to export. Both regular columns and constant columns are supported.

      Format in normal mode:

      Regular column format: {"name":"{your column name}"}

      Constant column format: {"type":"", "value":""}, where type is one of string, int, binary, bool, and double.

      Binary constants must be Base64-encoded before being passed in.

      Note: PK columns must be specified explicitly, in the same way as regular columns (see pk1 and pk2 in the normal-mode sample below).

    • Required (normal mode): Yes

    • Default value: None

  • timeRange (multi-version mode only)

    • Description: The time range of the versions to read, as the half-open interval [begin, end).

      Note: begin must be smaller than end.

    • Required: No

    • Default value: All versions are read.

  • timeRange:{"begin"} (multi-version mode only)

    • Description: The start of the time range (inclusive). Valid values: 0 to LONG_MAX.

    • Required: No

    • Default value: 0

  • timeRange:{"end"} (multi-version mode only)

    • Description: The end of the time range (exclusive). Valid values: 0 to LONG_MAX.

    • Required: No

    • Default value: Long Max(9223372036854775806L)

  • maxVersion (multi-version mode only)

    • Description: The maximum number of versions to read for each column. Valid values: 1 to INT32_MAX.

    • Required: No

    • Default value: All versions are read by default
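The range rules in the parameter descriptions above can be summarized in a short Java sketch: PK values are ordered INF_MIN < concrete value < INF_MAX, PK prefixes are padded according to the documented defaults, begin and end must be ordered consistently with the read direction, and split points on the partition key must be monotonic and lie between begin and end. The RangeSketch class and its helpers are hypothetical. For brevity the sketch supports only string PK values, takes the read direction as an input instead of inferring it, and treats begin and end as exclusive bounds for split points; all three are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented range rules; not the plug-in's real code.
class RangeSketch {

    // One PK column value, ordered as INF_MIN < any concrete value < INF_MAX.
    static final class Pk implements Comparable<Pk> {
        static final Pk INF_MIN = new Pk(0, null);
        static final Pk INF_MAX = new Pk(2, null);

        private final int rank;       // 0 = INF_MIN, 1 = concrete value, 2 = INF_MAX
        private final String value;   // string PKs only, to keep the sketch short

        private Pk(int rank, String value) {
            this.rank = rank;
            this.value = value;
        }

        static Pk of(String value) {
            return new Pk(1, value);
        }

        @Override
        public int compareTo(Pk other) {
            if (rank != other.rank) {
                return Integer.compare(rank, other.rank);
            }
            return rank == 1 ? value.compareTo(other.value) : 0;
        }
    }

    // Lexicographic comparison of two complete PKs of equal width.
    static int compare(List<Pk> a, List<Pk> b) {
        for (int i = 0; i < a.size(); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Pads a PK prefix ([] is allowed) to the full width: forward reads pad begin
    // with INF_MIN and end with INF_MAX; reverse reads do the opposite.
    static List<Pk> pad(List<Pk> prefix, int width, boolean isBegin, boolean forward) {
        Pk filler = (isBegin == forward) ? Pk.INF_MIN : Pk.INF_MAX;
        List<Pk> full = new ArrayList<>(prefix);
        while (full.size() < width) {
            full.add(filler);
        }
        return full;
    }

    // Checks begin/end against the read direction and validates split points,
    // which apply to the partition key (the first PK column).
    static void validate(List<Pk> begin, List<Pk> end, List<String> splits, boolean forward) {
        int order = compare(begin, end);
        if (order == 0) {
            throw new IllegalArgumentException("begin and end must not be equal");
        }
        if ((order < 0) != forward) {
            throw new IllegalArgumentException("begin/end order does not match the read direction");
        }
        Pk previous = begin.get(0);
        for (String s : splits) {
            Pk current = Pk.of(s);
            int step = previous.compareTo(current);
            if (forward ? step >= 0 : step <= 0) {
                throw new IllegalArgumentException("split point out of order or out of range: " + s);
            }
            previous = current;
        }
        int last = previous.compareTo(end.get(0));
        if (!splits.isEmpty() && (forward ? last >= 0 : last <= 0)) {
            throw new IllegalArgumentException("last split point must lie before end");
        }
    }

    public static void main(String[] args) {
        // Mirrors the sample jobs: begin ["a", INF_MIN], end ["g", INF_MAX], split ["b", "c"].
        List<Pk> begin = pad(List.of(Pk.of("a")), 2, true, true);
        List<Pk> end = pad(List.of(Pk.of("g")), 2, false, true);
        validate(begin, end, List.of("b", "c"), true);
        System.out.println("range is valid");
    }
}
```

Swapping the split list to ["c", "b"] in a forward read makes validate throw, matching the rule that split points must follow the read direction.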

Development in wizard mode

Currently, development in wizard mode is not supported.

Development in script mode

Multi-version mode

```json
{
  "type": "job",
  "version": "1.0",
  "configuration": {
    "reader": {
      "plugin": "otsreader-internalreader",
      "parameter": {
        "mode": "multiVersion",
        "endpoint": "",
        "accessId": "",
        "accessKey": "",
        "instanceName": "",
        "table": "",
        "range": {
          "begin": [
            {
              "type": "string",
              "value": "a"
            },
            {
              "type": "INF_MIN"
            }
          ],
          "end": [
            {
              "type": "string",
              "value": "g"
            },
            {
              "type": "INF_MAX"
            }
          ],
          "split": [
            {
              "type": "string",
              "value": "b"
            },
            {
              "type": "string",
              "value": "c"
            }
          ]
        },
        "column": [
          {
            "name": "attr1"
          }
        ],
        "timeRange": {
          "begin": 1400000000,
          "end": 1600000000
        },
        "maxVersion": 10
      }
    },
    "writer": {}
  }
}
```
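The timeRange and maxVersion settings in the multi-version job above select which versions of each cell are exported. The following Java sketch applies the documented rules to the versions of a single cell: it keeps timestamps in the half-open interval [begin, end) and returns at most maxVersion of them. The VersionFilter class is hypothetical, and keeping the newest versions when more than maxVersion match is an assumption of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of timeRange + maxVersion; not the plug-in's code.
class VersionFilter {

    // versions: timestamp -> value for one cell. Keeps timestamps in [begin, end)
    // and returns at most maxVersion of them, newest first (an assumption).
    static List<Long> select(TreeMap<Long, String> versions, long begin, long end, int maxVersion) {
        if (begin >= end) {
            throw new IllegalArgumentException("timeRange begin must be smaller than end");
        }
        if (maxVersion < 1) {
            throw new IllegalArgumentException("maxVersion must be at least 1");
        }
        // Inclusive lower bound, exclusive upper bound.
        NavigableMap<Long, String> inRange = versions.subMap(begin, true, end, false);
        List<Long> selected = new ArrayList<>();
        for (Long ts : inRange.descendingKeySet()) {
            if (selected.size() == maxVersion) {
                break;
            }
            selected.add(ts);
        }
        return selected;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> versions = new TreeMap<>();
        versions.put(1300000000L, "v0");   // before begin: excluded
        versions.put(1400000000L, "v1");   // equal to begin: included
        versions.put(1500000000L, "v2");
        versions.put(1600000000L, "v3");   // equal to end: excluded
        // Same bounds as the sample job; maxVersion lowered to 1 to show the cap.
        System.out.println(select(versions, 1400000000L, 1600000000L, 1)); // [1500000000]
    }
}
```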

Normal mode

```json
{
  "type": "job",
  "version": "1.0",
  "configuration": {
    "reader": {
      "plugin": "otsreader-internalreader",
      "parameter": {
        "mode": "normal",
        "endpoint": "",
        "accessId": "",
        "accessKey": "",
        "instanceName": "",
        "table": "",
        "range": {
          "begin": [
            {
              "type": "string",
              "value": "a"
            },
            {
              "type": "INF_MIN"
            }
          ],
          "end": [
            {
              "type": "string",
              "value": "g"
            },
            {
              "type": "INF_MAX"
            }
          ],
          "split": [
            {
              "type": "string",
              "value": "b"
            },
            {
              "type": "string",
              "value": "c"
            }
          ]
        },
        "column": [
          {
            "name": "pk1"
          },
          {
            "name": "pk2"
          },
          {
            "name": "attr1"
          },
          {
            "type": "string",
            "value": ""
          },
          {
            "type": "int",
            "value": ""
          },
          {
            "type": "double",
            "value": ""
          },
          {
            "type": "binary",
            "value": "aGVsbG8="
          }
        ]
      }
    },
    "writer": {}
  }
}
```
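The normal-mode job above passes a binary constant as "aGVsbG8=", the Base64 form of the bytes of "hello". The parameter descriptions use Apache Commons Codec's Base64.encodeBase64String for this; the following sketch swaps in the JDK's built-in java.util.Base64 encoder, which produces the same standard Base64 string for this input and needs no extra dependency. The BinaryValue class and its helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds the JSON fragment for a binary PK value or binary constant column.
// Uses the JDK's java.util.Base64 in place of Apache Commons Codec.
class BinaryValue {

    // Hypothetical helper: encodes raw bytes as a standard Base64 string.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Hypothetical helper: wraps the encoded value in the {"type":"binary", ...} form.
    static String binaryColumn(byte[] bytes) {
        return "{\"type\":\"binary\", \"value\":\"" + encode(bytes) + "\"}";
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(encode(bytes));        // aGVsbG8=
        System.out.println(binaryColumn(bytes));  // {"type":"binary", "value":"aGVsbG8="}
        // Decoding the configured string recovers the original bytes.
        System.out.println(new String(Base64.getDecoder().decode("aGVsbG8="), StandardCharsets.UTF_8));
    }
}
```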