You can use the TableWriter API to write data to MaxCompute tables.
TableWriter is available as tensorflow.python_io.TableWriter and runs outside the graph execution logic of PAI-TensorFlow, so you can operate on MaxCompute tables without going through a TensorFlow graph.
Note If you use TableWriter to write data to a MaxCompute table while a PAI-TensorFlow task is running, the data can be accessed only after the task is completed. If the task is still running or exits abnormally, the data cannot be accessed.
Create a writer and open a table
Initializing TableWriter opens a MaxCompute table and returns a writer object. Definition:
writer = tf.python_io.TableWriter(table, slice_id=0)
- table: the MaxCompute table that you want to access, for example odps://project/tables/test_write. This parameter is of the STRING type and must be the same as the output table specified by -Doutputs in the PAI command. Otherwise, the error table xxx not predefined is returned.
- slice_id: the ID of the slice to which you want to write data. This parameter is used to avoid write conflicts and is of the INT type. In standalone mode, use the default value 0. In distributed mode, the write operation fails if multiple workers, including parameter server (PS) nodes, write data to the same slice_id, so assign a different slice_id to each of them.
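The section does not prescribe how to choose slice_id in distributed mode. As a minimal sketch, assuming each task knows its job name ("worker" or "ps") and its task index, as in typical distributed TensorFlow setups, the hypothetical helper pick_slice_id below gives every task a distinct slice_id so that no two writers share one:

```python
def pick_slice_id(job_name: str, task_index: int, num_workers: int) -> int:
    """Return a slice_id that is unique across all worker and PS tasks.

    Hypothetical scheme (not part of TableWriter): workers use
    0..num_workers-1 and PS nodes are offset by num_workers.
    """
    if task_index < 0:
        raise ValueError("task_index must be non-negative")
    if job_name == "worker":
        if task_index >= num_workers:
            raise ValueError("task_index is out of range for workers")
        return task_index
    if job_name == "ps":
        return num_workers + task_index
    raise ValueError(f"unknown job_name: {job_name!r}")


# Standalone mode: the single worker gets the default value 0.
print(pick_slice_id("worker", 0, num_workers=1))            # 0
# Distributed mode with 2 workers and 2 PS nodes.
print([pick_slice_id("worker", i, 2) for i in range(2)])    # [0, 1]
print([pick_slice_id("ps", i, 2) for i in range(2)])        # [2, 3]
```

In a PAI-TensorFlow script, you might then pass the result as tf.python_io.TableWriter(table, slice_id=pick_slice_id(job_name, task_index, num_workers)), where job_name, task_index, and num_workers come from however your job receives its cluster configuration (an assumption here).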
Notice Each time you open a table, the existing data in the table is cleared, so the opened table is empty.
Write data to a table
Write data to the specified columns of an opened table. The written data can be read only after the table is closed. Definition:
writer.write(values, indices)
- values: one or more rows of data that you want to write.
- If you write a single row of data, set this parameter to a tuple, list, or one-dimensional array of scalars. If you use a list or one-dimensional array, all columns to which you write data must have the same data type.
- If you write N rows of data (N ≥ 1), set this parameter to a list or one-dimensional array of N elements. Each element corresponds to a single row of data and is a tuple, a list, or a structure.
- indices: the columns to which you want to write data. The value is a tuple, list, or one-dimensional array of integer column indexes. Each number in indices corresponds to a column in the table. Column indexes start from 0.
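The rules for values and indices can be illustrated with a small pure-Python check. normalize_rows is a hypothetical helper, not part of TableWriter: it accepts either a single row of scalars or a list of rows, converts both to a list of tuples, and verifies that every row supplies one value per column index. It covers only Python tuples and lists, not NumPy arrays.

```python
from typing import List, Sequence, Tuple


def normalize_rows(values, indices: Sequence[int]) -> List[Tuple]:
    """Return `values` as a list of row tuples, checked against `indices`.

    Hypothetical helper, not part of TableWriter. A single row is a flat
    tuple or list of scalars; N rows are a list whose elements are tuples
    or lists.
    """
    cols = list(indices)
    # indices: integer column indexes, starting from column 0.
    if not cols or any(not isinstance(c, int) or c < 0 for c in cols):
        raise ValueError("indices must be non-negative integer column indexes")

    items = list(values)
    if not items:
        return []
    if isinstance(items[0], (tuple, list)):
        rows = [tuple(r) for r in items]  # N rows of data
    else:
        rows = [tuple(items)]  # a single row of scalars

    # Each row must supply exactly one value per listed column.
    for row in rows:
        if len(row) != len(cols):
            raise ValueError(
                f"row {row!r} has {len(row)} values for {len(cols)} columns")
    return rows


print(normalize_rows((25, "Apple", 5.0, False), indices=[0, 1, 2, 3]))
# [(25, 'Apple', 5.0, False)]
print(normalize_rows([(38, "Pear"), [17, "Watermelon"]], indices=(0, 1)))
# [(38, 'Pear'), (17, 'Watermelon')]
```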
Close a table
Close the table and the writer. Definition:
writer.close()
Note If you use a WITH statement block, you do not need to explicitly call the close() method to close a table.
Notice If you call the open() method to open a table again after the table is closed, the original data in the table is cleared.
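Because tf.python_io.TableWriter is available only inside PAI-TensorFlow, the sketch below models the open, write, and close rules described above with a hypothetical in-memory class, InMemoryTableWriter. It assumes, for illustration only, that columns not listed in indices are left as None; the real behavior for unlisted columns is not described here.

```python
class InMemoryTableWriter:
    """Hypothetical in-memory stand-in for TableWriter (not the real API).

    Models two documented rules: opening a table clears its data, and
    written rows become readable only after close().
    """

    def __init__(self, storage, table, num_columns):
        self._storage = storage  # dict: table name -> list of rows
        self._table = table
        self._num_columns = num_columns
        self._pending = []
        self._closed = True
        self.open()

    def open(self):
        self._storage[self._table] = []  # opening clears the table
        self._pending = []
        self._closed = False

    def write(self, rows, indices):
        """Buffer a list of rows; unlisted columns are filled with None (assumption)."""
        if self._closed:
            raise RuntimeError("table is closed")
        for row in rows:
            if len(row) != len(indices):
                raise ValueError("each row needs one value per index")
            full = [None] * self._num_columns
            for col, value in zip(indices, row):
                full[col] = value
            self._pending.append(tuple(full))

    def close(self):
        if not self._closed:
            # Buffered rows are published to the table only when it is closed.
            self._storage[self._table].extend(self._pending)
            self._pending = []
            self._closed = True


tables = {}
writer = InMemoryTableWriter(tables, "test_write", num_columns=4)
writer.write([(25, "Apple", 5.0, False)], indices=[0, 1, 2, 3])
print(tables["test_write"])  # []: not readable before close()
writer.close()
print(tables["test_write"])  # [(25, 'Apple', 5.0, False)]
writer.open()                # reopening clears the table
print(tables["test_write"])  # []
```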
Use TableWriter in the WITH statement
You can use the WITH statement to manage the context of TableWriter. Sample code:
with tf.python_io.TableWriter(table) as writer:
    # Prepare values for writing.
    writer.write(values, indices)
# The table is closed automatically outside this block.
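A WITH block guarantees that close() runs when the block exits, even if a write raises an exception. The hypothetical RecordingWriter below is not TableWriter; it only records calls so that you can see the order in which the context-manager protocol invokes them:

```python
class RecordingWriter:
    """Hypothetical writer used only to show when close() runs."""

    def __init__(self):
        self.events = []

    def write(self, values, indices):
        if len(values) != len(indices):
            raise ValueError("values and indices differ in length")
        self.events.append("write")

    def close(self):
        self.events.append("close")

    # Context-manager protocol: `with` calls __enter__ on entry and
    # __exit__ on exit, including when the block raises.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions


writer = RecordingWriter()
try:
    with writer as w:
        w.write((25, "Apple"), indices=[0, 1])
        w.write((38,), indices=[0, 1])  # raises ValueError
except ValueError:
    pass
print(writer.events)  # ['write', 'close']
```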
Example
- Create a MaxCompute table named test_write that has four columns. The following table lists the column names and data types.
Column name   Column type
uid           bigint
name          string
price         double
virtual       bool
- Write a Python script that writes the data in the following table to the test_write table. When you submit the script, set -Doutputs to odps://project/tables/test_write.
uid   name          price   virtual
25    "Apple"       5.0     False
38    "Pear"        4.5     False
17    "Watermelon"  2.2     False
# table_writer_test.py
import tensorflow as tf

# Prepare data.
values = [(25, "Apple", 5.0, False),
          (38, "Pear", 4.5, False),
          (17, "Watermelon", 2.2, False)]

# Open a table and return the writer object.
# The table path must match the value of -Doutputs.
writer = tf.python_io.TableWriter("odps://project/tables/test_write")

# Write data to columns 0, 1, 2, and 3 of the table.
writer.write(values, indices=[0, 1, 2, 3])

# Close the table and the writer.
writer.close()
- Submit the task to PAI-TensorFlow and run the task.
$ odpscmd -e "pai -name tensorflow140 -Dscript=<absolute_path_of_script>/table_writer_test.py -Doutputs=odps://project/tables/test_write ;"
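Before you submit the task, it can help to check locally that each row matches the test_write schema. The sketch below assumes a conservative mapping from MaxCompute column types to Python types (bigint to int, string to str, double to float, bool to bool); whether TableWriter itself converts, for example, an int to a double is not stated in this section, so check_rows and SCHEMA are hypothetical helpers.

```python
# Hypothetical schema of test_write, taken from the column table above.
SCHEMA = [("uid", "bigint"), ("name", "string"),
          ("price", "double"), ("virtual", "bool")]


def matches(value, column_type):
    """Assumed, conservative mapping from column type to Python type."""
    if column_type == "bigint":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if column_type == "string":
        return isinstance(value, str)
    if column_type == "double":
        return isinstance(value, float)
    if column_type == "bool":
        return isinstance(value, bool)
    raise ValueError(f"unsupported column type: {column_type}")


def check_rows(rows, schema=SCHEMA):
    """Return (row_number, column_name) pairs whose values do not match."""
    problems = []
    for r, row in enumerate(rows):
        if len(row) != len(schema):
            problems.append((r, "<row length>"))
            continue
        for value, (name, ctype) in zip(row, schema):
            if not matches(value, ctype):
                problems.append((r, name))
    return problems


values = [(25, "Apple", 5.0, False),
          (38, "Pear", 4.5, False),
          (17, "Watermelon", 2.2, False)]
print(check_rows(values))                      # []
print(check_rows([(25, "Apple", 5, False)]))   # [(0, 'price')]
```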