The MaxCompute Reader plug-in reads data from MaxCompute. For more information about MaxCompute, see MaxCompute Overview.
Under the hood, MaxCompute Reader reads data from MaxCompute through Tunnel, using the source project, table, partition, and column information that you configure. For common Tunnel commands, see Tunnel Command Operations.
MaxCompute Reader can read both partitioned and non-partitioned tables, but it cannot read virtual views. To read a partitioned table, you must specify the partition configuration. For example, to read the partition `pt=1, ds=hangzhou` of table t0, set that value as the partition configuration. For a non-partitioned table, leave the partition configuration empty. For table fields, you can specify all columns or a subset of them, change the order of the columns, and add constant fields and partition columns. (A partition column is not a table field.)
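To make the column rules above concrete, here is a minimal Python sketch that resolves a column list against one record read from partition `pt=1, ds=hangzhou`. It is an illustration only, not the plug-in's implementation: `project_row`, the dictionary layout of the record, and the convention of writing constants in single quotes are all assumptions made for this example.

```python
from typing import Dict, List


def project_row(column_spec: List[str],
                record: Dict[str, object],
                partition_values: Dict[str, str]) -> List[object]:
    """Build one output row from a column list (hypothetical helper).

    Entries are resolved in order, so columns can be subset and reordered:
    - a table field name selects that field's value;
    - a partition column name (e.g. "ds") selects the partition value,
      because partition columns are not table fields;
    - a single-quoted entry such as "'x'" is emitted as a constant
      (this quoting convention is an assumption of the sketch).
    """
    if not column_spec:
        raise ValueError("column cannot be empty")
    row: List[object] = []
    for item in column_spec:
        if len(item) >= 2 and item[0] == "'" and item[-1] == "'":
            row.append(item[1:-1])              # constant field
        elif item in record:
            row.append(record[item])            # table field
        elif item in partition_values:
            row.append(partition_values[item])  # partition column
        else:
            raise KeyError(f"unknown column: {item}")
    return row


record = {"id": 1, "name": "alice", "age": 30}
partition = {"pt": "1", "ds": "hangzhou"}
print(project_row(["age", "id", "ds", "'const'"], record, partition))
# → [30, 1, 'hangzhou', 'const']
```

Resolving entries strictly in list order is what allows reordering and mixing table fields, partition columns, and constants in one list.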
MaxCompute Reader supports the following data types in MaxCompute.
| Type | MaxCompute data type |
| --- | --- |
| Floating point | double, decimal |

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The data source name. It must be identical to the name of a data source that you have added. Data sources can be added in script mode. | Yes | None |
| table | The name of the table to read (case-insensitive). | Yes | None |
| partition | The partition from which to read data. Linux shell wildcards are allowed: `*` represents zero or more characters, and `?` represents any one character. For example, a partitioned table named `test` has four partitions: `pt=1/ds=hangzhou`, `pt=1/ds=shanghai`, `pt=2/ds=hangzhou`, and `pt=2/ds=beijing`. To read data from partition `pt=1/ds=shanghai`, set `"partition":"pt=1/ds=shanghai"`. To read data from all partitions under `pt=1`, set `"partition":"pt=1/ds=*"`. | Required for partitioned tables; leave empty for non-partitioned tables. | None |
| column | The columns to read from the MaxCompute source table. For example, if table `test` has the fields `id`, `name`, and `age`, set `"column":["id","name","age"]` to read them in that order. MaxCompute Reader does not extract data with a MaxCompute SELECT statement, so you cannot apply functions to fields. `column` must list the columns to synchronize and cannot be empty. | Yes | None |
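The partition examples in the table above can be checked with a small Python sketch. It assumes partition names are compared as plain strings such as `pt=1/ds=hangzhou` and uses the standard-library `fnmatch.fnmatchcase` for shell-style matching; `select_partitions` is a hypothetical helper, not the plug-in's own matching code. Under `fnmatch`, `?` matches exactly one character, so only `*` patterns are exercised here.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_partitions(all_partitions: Iterable[str], pattern: str) -> List[str]:
    """Return the partitions that match a shell-style wildcard pattern.

    Hypothetical helper: partitions are plain "key=value/key=value" strings,
    and "*" matches zero or more characters (including "/").
    """
    return [p for p in all_partitions if fnmatchcase(p, pattern)]


# The four partitions of the example table "test".
partitions = ["pt=1/ds=hangzhou", "pt=1/ds=shanghai",
              "pt=2/ds=hangzhou", "pt=2/ds=beijing"]

print(select_partitions(partitions, "pt=1/ds=shanghai"))
# → ['pt=1/ds=shanghai']
print(select_partitions(partitions, "pt=1/ds=*"))
# → ['pt=1/ds=hangzhou', 'pt=1/ds=shanghai']
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.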
When you configure the reader in the UI rather than in script mode, the UI fields correspond to the parameters above:
- Data source: `datasource`. Select odps.
- Table: `table`. Select the table to synchronize.
- Partition information: `partition`. Enter the partition to read.
- Field mapping: `column`.
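As a rough illustration of how the four UI fields line up with the script parameters, the following Python sketch collects them into one dictionary and applies the two rules stated for `column` (not empty, no functions). `build_reader_parameter`, the data source name `my_odps_source`, the simple regular expression used to spot function calls, and the use of an empty string for an empty partition are all assumptions of this sketch; the surrounding structure of a full DataWorks script is not modeled.

```python
import json
import re
from typing import List, Optional

# Rough check for SQL-style calls such as "upper(name)"; the reader does not
# evaluate functions in column entries.
_FUNCTION_CALL = re.compile(r"^\s*[A-Za-z_]\w*\s*\(")


def build_reader_parameter(datasource: str, table: str, column: List[str],
                           partition: Optional[str] = None) -> dict:
    """Collect the four documented reader parameters into one dict.

    Hypothetical helper; only the key names come from the parameter table.
    """
    if not column:
        raise ValueError("column cannot be empty")
    for col in column:
        if _FUNCTION_CALL.match(col):
            raise ValueError(f"functions are not supported in column: {col}")
    return {
        "datasource": datasource,      # Data source field in the UI
        "table": table,                # Table field
        # Partition information field; an empty string stands in for
        # "left empty" on non-partitioned tables (assumed representation).
        "partition": partition or "",
        "column": column,              # Field mapping
    }


param = build_reader_parameter("my_odps_source", "test",
                               ["id", "name", "age"],
                               partition="pt=1/ds=shanghai")
print(json.dumps(param))
# → {"datasource": "my_odps_source", "table": "test", "partition": "pt=1/ds=shanghai", "column": ["id", "name", "age"]}
```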
The following is a script configuration sample. For more information about parameters, see Parameter description.
- To read all columns, set `"column": ["*"]`.
- Multiple partitions and multiple wildcards are supported. For example:
  - `"partition": "pt=20140501/ds=*"` reads all `ds` partitions under `pt=20140501`.
  - `"partition": "pt=top?"` reads the partitions `pt=top` and `pt=to`; here `?` indicates that the preceding character may or may not be present.