
Configure MaxCompute reader

Last Updated: Apr 17, 2018

The MaxCompute Reader plug‑in provides the ability to read data from MaxCompute. For more information about MaxCompute, see MaxCompute Overview.

Internally, the MaxCompute Reader plug‑in uses Tunnel to read data from the MaxCompute system, based on the source project, table, partition, table fields, and other information that you configure. For common Tunnel commands, see Tunnel Command Operations.

MaxCompute Reader can read both partitioned and non-partitioned tables, but cannot read virtual views. To read a partitioned table, you must specify the partition configuration. For example, to read the partition "pt=1, ds=hangzhou" of table t0, you must set that partition value in the configuration. For a non-partitioned table, leave the partition configuration empty. For table fields, you can specify all or some of the columns in order, rearrange the columns, and specify constant fields and partition columns. (A partition column is not a table field.)
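
As a hedged illustration of the partition rule above, the following Python sketch assembles the reader parameters for table t0 and prints them as JSON. The helper name `build_reader_parameter` and the column names `id` and `name` are hypothetical and chosen only for illustration; the path-style value `pt=1/ds=hangzhou` follows the notation used in the parameter description.

```python
import json


def build_reader_parameter(datasource, table, columns, partition_spec):
    """Assemble the "parameter" object of the reader.

    Illustrative helper only; it is not part of any MaxCompute or DataWorks SDK.
    """
    if not columns:
        # The column list must not be blank.
        raise ValueError("column must list at least one field")
    # Join the partition levels in order, for example pt=1/ds=hangzhou.
    partition = "/".join(f"{key}={value}" for key, value in partition_spec)
    return {
        "datasource": datasource,
        "table": table,
        "column": list(columns),
        "partition": partition,
    }


# Table t0 and its partition come from the text; the column names are assumed.
parameter = build_reader_parameter(
    datasource="datasourceName",
    table="t0",
    columns=["id", "name"],
    partition_spec=[("pt", "1"), ("ds", "hangzhou")],
)
print(json.dumps(parameter, indent=2))
```

The printed object can be pasted into the `parameter` section of a script-mode job.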

MaxCompute Reader supports the following data types in MaxCompute.

| Type | MaxCompute data type |
| --- | --- |
| Integer | bigint |
| Floating point | double, decimal |
| String | string |
| Date | datetime |
| Boolean | boolean |

Parameter description

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The data source name. It must be identical to the name of the data source that you added. Data sources can be added in script mode. | Yes | None |
| table | The name of the table to read (case-insensitive). | Yes | None |
| partition | The partitions to read. Linux shell wildcards are allowed: `*` represents zero or more characters, and `?` represents any character.<br>For example, a partitioned table named "test" has four partitions: pt=1/ds=hangzhou, pt=1/ds=shanghai, pt=2/ds=hangzhou, and pt=2/ds=beijing.<br>- To read data from partition pt=1/ds=shanghai, set `"partition":"pt=1/ds=shanghai"`.<br>- To read data from all the partitions under pt=1, set `"partition":"pt=1/ds=*"`.<br>- To read data from all the partitions of the "test" table, set `"partition":"pt=*/ds=*"`. | Required for partitioned tables; leave empty for non-partitioned tables | None |
| column | The columns to read from the MaxCompute source table. For example, a table named "test" has the fields id, name, and age.<br>- To read the fields in order, set `"column":["id","name","age"]` or `"column":["*"]`. Setting the extracted field to `*` is not recommended, because it reads every field in the table in order. If the order or types of the table fields change, or fields are added or deleted, the source columns may no longer align with the target columns, causing incorrect results or even failure.<br>- To read name and id in that order, set `"column":["name","id"]`.<br>- To add a constant field among the extracted fields (to match the field order of the target table), enclose the constant in single quotes. For example, to extract age, name, the constant date "1988-08-08 08:08:08", and id, set `"column":["age","name","'1988-08-08 08:08:08'","id"]`. Internally, any field enclosed in single quotes is treated as a constant field whose value is the content between the quotes.<br>Notes:<br>- MaxCompute Reader does not use MaxCompute SELECT SQL statements to extract data from a table, so you cannot specify functions in fields.<br>- column must explicitly specify the set of columns to synchronize and cannot be blank. | Yes | None |
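
The constant-field rule in the column description can be sketched in a few lines of Python. This is a minimal illustration of the stated rule (an entry enclosed in single quotes is a constant, anything else is a column name), not the plug-in's actual implementation; the function name `classify_columns` is hypothetical.

```python
def classify_columns(columns):
    """Split a "column" list into (kind, value) pairs.

    An entry wrapped in single quotes is treated as a constant whose value is
    the text between the quotes; every other entry is treated as the name of a
    source column. Illustrative only, not the plug-in's code.
    """
    if not columns:
        # The column list cannot be blank.
        raise ValueError("column must not be blank")
    result = []
    for entry in columns:
        if len(entry) >= 2 and entry.startswith("'") and entry.endswith("'"):
            result.append(("constant", entry[1:-1]))
        else:
            result.append(("column", entry))
    return result


# The example from the column description: age, name, a constant date, id.
columns = ["age", "name", "'1988-08-08 08:08:08'", "id"]
for kind, value in classify_columns(columns):
    print(f"{kind:8} {value}")
```

Because the constant is matched purely by its quotes, a function call such as `upper(name)` would be read as a column name, which is consistent with the note that functions cannot be specified in fields.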

Development in wizard mode

[Figure: MaxCompute Reader configuration in wizard mode]

Note:

  • Data Sources: corresponds to datasource in the preceding parameter description. Select odps.
  • Table: corresponds to table in the preceding parameter description. Select the table to be synchronized.
  • Zoning Information: corresponds to partition in the preceding parameter description. Configure the partitions to be read.
  • Field Mapping: corresponds to column in the preceding parameter description.

Development in script mode

The following is a script configuration sample. For more information about parameters, see Parameter description.

    {
      "type": "job",
      "version": "1.0",
      "configuration": {
        "reader": {
          "plugin": "odps",
          "parameter": {
            "datasource": "datasourceName",
            "table": "table",
            "column": [
              "id",
              "name"
            ],
            "partition": "pt=20140501/ds=20140502"
          }
        },
        "writer": {
        }
      }
    }
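
Before submitting a script like the one above, it can help to check that the parameters marked as required are present. The following Python sketch does this for the sample job; the function name `missing_reader_parameters` is hypothetical, and the check reflects only the Required column of the parameter description, not any validation that DataWorks itself performs.

```python
import json

# The sample job from this section, with the column list on one line.
JOB_JSON = """
{
  "type": "job",
  "version": "1.0",
  "configuration": {
    "reader": {
      "plugin": "odps",
      "parameter": {
        "datasource": "datasourceName",
        "table": "table",
        "column": ["id", "name"],
        "partition": "pt=20140501/ds=20140502"
      }
    },
    "writer": {}
  }
}
"""

# Parameters marked "Yes" in the Required column of the parameter description.
REQUIRED = ("datasource", "table", "column")


def missing_reader_parameters(job):
    """Return the required reader parameters that are absent or empty."""
    parameter = job["configuration"]["reader"].get("parameter", {})
    return [name for name in REQUIRED if not parameter.get(name)]


job = json.loads(JOB_JSON)
print(missing_reader_parameters(job))  # [] for the sample above
```

partition is left out of the check on purpose, because it is required only for partitioned tables.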

Note:

To read all columns, you can set the column list to "*", for example "column": ["*"]. The partition value can match multiple partitions and can contain multiple wildcards.

  • "partition": "pt=20140501/ds=*" matches all ds partitions under pt=20140501.
  • In "partition": "pt=top?", the "?" indicates that the preceding character is optional, so the value matches both pt=top and pt=to.
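
To see which partitions a `*` pattern would select, the following Python sketch expands patterns against the four partitions of the example table "test" from the parameter description. It assumes shell-style matching and uses Python's standard `fnmatch` module as a stand-in; the reader's own matching logic is not shown in this document and may differ, and `?` is deliberately not exercised here. The function name `expand_partitions` is hypothetical.

```python
from fnmatch import fnmatchcase

# Partitions of the example table "test" from the parameter description.
PARTITIONS = [
    "pt=1/ds=hangzhou",
    "pt=1/ds=shanghai",
    "pt=2/ds=hangzhou",
    "pt=2/ds=beijing",
]


def expand_partitions(pattern, partitions):
    """Return the partitions that match a shell-style pattern (assumed semantics)."""
    return [p for p in partitions if fnmatchcase(p, pattern)]


print(expand_partitions("pt=1/ds=shanghai", PARTITIONS))  # a single partition
print(expand_partitions("pt=1/ds=*", PARTITIONS))         # everything under pt=1
print(expand_partitions("pt=*/ds=*", PARTITIONS))         # every partition
```

Checking a pattern against a known partition list in this way is a cheap guard against a wildcard that silently selects more partitions than intended.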