Platform For AI:TableWriter API

Last Updated:Mar 04, 2024

You can use the TableWriter API to read data from and write data to MaxCompute tables.

The API is exposed as tensorflow.python_io.TableWriter. It operates on MaxCompute tables directly from Python code, without going through the graph execution logic of Platform for AI (PAI)-TensorFlow.

Note

If you write data to a MaxCompute table while a PAI-TensorFlow task is running, you can access the data written by TableWriter only after the task is completed. If the task is still running or exits abnormally, you cannot access the data.

Create a writer and open a table

When you initialize TableWriter, a MaxCompute table is opened and a writer object is returned. Definition:

writer = tf.python_io.TableWriter(table, slice_id=0)
  • table: the name of the MaxCompute table that you want to access. This parameter is of the STRING type. The value must match the name of the output table specified by -Doutputs in the PAI command. Otherwise, the error "table xxx not predefined" is returned.

  • slice_id: the partition to which you want to write data. This parameter is used to avoid write conflicts. This parameter is of the INT type. In standalone mode, use the default value 0. In distributed mode, if multiple workers, including parameter server (PS) nodes, write data to the same slice_id, the write operation fails.

Important

Each time you open a table, the existing data in the table is cleared, so the table that you open is always empty.
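
In distributed mode, one way to keep every writer on its own slice_id is to derive the value from the position of the process in the cluster. The following sketch is only an illustration. It assumes that each process knows its job name and task index, for example from command-line flags. The helper name unique_slice_id and its parameters are hypothetical and are not part of the TableWriter API.

```python
def unique_slice_id(job_name, task_index, num_workers):
    """Map a (job_name, task_index) pair to a distinct slice_id.

    Hypothetical helper: workers take ids 0..num_workers-1, and PS nodes
    (if they write at all) take ids starting at num_workers.
    """
    if task_index < 0:
        raise ValueError("task_index must be non-negative")
    if job_name == "worker":
        if task_index >= num_workers:
            raise ValueError("task_index out of range for workers")
        return task_index
    if job_name == "ps":
        return num_workers + task_index
    raise ValueError("unknown job name: %r" % (job_name,))


# Two workers and one PS node all receive different slice ids.
ids = [unique_slice_id("worker", 0, 2),
       unique_slice_id("worker", 1, 2),
       unique_slice_id("ps", 0, 2)]
print(ids)  # prints [0, 1, 2]
```

Each process would then pass its own result as the slice_id argument when it creates the writer.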

Write data

Write data to the required columns of an open table. You can read the data only after the table is closed. Definition:

writer.write(values, indices)
  • values: one or more rows of data that you want to write.

    • If you write a single row of data, set this parameter to a tuple, list, or one-dimensional array that consists of scalars. A list or one-dimensional array indicates that the data types of the columns to which you write data are the same.

    • If you write N rows of data (N is greater than or equal to 1), set this parameter to a list or one-dimensional array in which each element corresponds to one row of data. Each row is a tuple, a list, or an array that contains structures.

  • indices: the columns to which you want to write data. The value can be a tuple, list, or one-dimensional array that consists of indexes of the INTEGER type. Each number in indices corresponds to a column in the table. Column numbering starts at 0.
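
The two accepted shapes of values can be sketched in plain Python. The helper below is hypothetical and is not part of TableWriter. It assumes that each row supplies exactly one value for each entry in indices, which is how the example at the end of this topic uses the API, and it handles only tuples and lists, not arrays.

```python
def check_rows(values, indices):
    """Return values as a list of rows, verifying each row matches indices.

    Hypothetical helper for illustration only.
    """
    indices = list(indices)
    # A single row is a flat sequence of scalars. Wrap it so that the
    # rest of the function can always iterate over rows.
    is_single_row = not isinstance(values[0], (tuple, list))
    rows = [values] if is_single_row else list(values)
    for row in rows:
        if len(row) != len(indices):
            raise ValueError("row %r has %d fields, expected %d"
                             % (row, len(row), len(indices)))
    return rows


# A single row: one tuple of scalars, written to columns 0 to 3.
single = check_rows((25, "Apple", 5.0, False), indices=[0, 1, 2, 3])

# Several rows: a list of tuples, written to columns 0 and 1 only.
many = check_rows([(25, "Apple"), (38, "Pear")], indices=[0, 1])

print(len(single), len(many))  # prints 1 2
```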

Close a table

Definition:

writer.close()

In the WITH statement block, you do not need to explicitly call the close() method to close a table.

Important

If you open a table again after the table is closed, for example by creating a new TableWriter for it, the original data in the table is cleared.

Use TableWriter in the WITH statement

You can use the WITH statement to manage the context of TableWriter. Sample code:

with tf.python_io.TableWriter(table) as writer:
    # Prepare values for writing.
    writer.write(values, indices)
    # The table is closed automatically when this block exits.
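
To show what the WITH statement does for you, the following sketch uses a stand-in class instead of the real writer, so that it can run outside PAI-TensorFlow. StandInWriter is hypothetical. It mimics only the open, write, and close life cycle and says nothing about how TableWriter is actually implemented.

```python
class StandInWriter(object):
    """Stand-in that mimics only the open/write/close life cycle."""

    def __init__(self, table):
        self.table = table
        self.rows = []
        self.closed = False

    def write(self, values, indices):
        if self.closed:
            raise ValueError("write to a closed writer")
        self.rows.append((values, tuple(indices)))

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # close() runs even if the block raised an exception.
        self.close()
        return False  # do not swallow exceptions


with StandInWriter("odps://project/tables/test_write") as writer:
    writer.write((25, "Apple", 5.0, False), indices=[0, 1, 2, 3])

print(writer.closed)  # prints True
```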

Examples

  1. Create a MaxCompute table named test_write that has four columns. The following table lists the column names and data types.

    For more information, see Create and manage MaxCompute tables.

    ColumnName    ColumnType
    uid           bigint
    name          string
    price         double
    virtual       bool

  2. Use a Python script to write the data described in the following table to the test_write table. The script is run with -Doutputs set to odps://project/tables/test_write.

    uid    name            price    virtual
    25     "Apple"         5.0      False
    38     "Pear"          4.5      False
    17     "Watermelon"    2.2      False

    # table_writer_test.py file. 
    
    import tensorflow as tf
    
    # Prepare data. 
    values = [(25, "Apple", 5.0, False),
              (38, "Pear", 4.5, False),
              (17, "Watermelon", 2.2, False)]
    
    # Open a table and return the writer object. 
    writer = tf.python_io.TableWriter("odps://project/tables/test_write")
    
    # Write data to the columns 0, 1, 2, and 3 of the table. 
    records = writer.write(values, indices=[0, 1, 2, 3])
    
    # Close the table and writer. 
    writer.close()
  3. Submit the task to PAI-TensorFlow and run the task.

    $ odpscmd -e "pai -name tensorflow140 -Dscript=<absolute_path_of_script>/table_writer_test.py -Doutputs=odps://project/tables/test_write ;"
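
If you submit the task from Python instead of a shell, the same command can be assembled as an argument list, which avoids shell quoting problems around the quoted pai statement. The following sketch only builds and prints the arguments and does not run odpscmd. The helper build_pai_command is hypothetical, and the script path is the same placeholder that the command above uses.

```python
def build_pai_command(script_path, output_table, algo_name="tensorflow140"):
    """Build the odpscmd argument list for the pai command shown above.

    Hypothetical helper: it uses only the flags that appear in this topic.
    """
    pai = "pai -name %s -Dscript=%s -Doutputs=%s;" % (
        algo_name, script_path, output_table)
    # Passing a list to subprocess (instead of one shell string) keeps the
    # whole pai statement as a single argument to -e.
    return ["odpscmd", "-e", pai]


cmd = build_pai_command(
    "<absolute_path_of_script>/table_writer_test.py",  # placeholder path
    "odps://project/tables/test_write")
print(cmd)
```

The resulting list could be handed to subprocess.check_call(cmd) on a machine where odpscmd is installed and configured.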