PyODPS supports MaxCompute SQL queries and provides methods to obtain execution results.

Execute SQL statements

You can call the execute_sql() or run_sql() method of a MaxCompute entry object to execute SQL statements. Both methods create and return a task instance. For more information, see Task instances.

Parameters
  • statement: the SQL statement to execute.
    Note Not all SQL statements can be executed by using the methods of the MaxCompute entry object. Some SQL statements that are executable on the MaxCompute client may fail when they are submitted through execute_sql() or run_sql(). Statements other than DDL and DML statements must be executed by using other methods. For example, you must use the run_security_query method to execute GRANT or REVOKE statements, and use the run_xflow or execute_xflow method to execute XFlow tasks.
  • hints: the runtime parameters. The hints parameter is of the DICT type.
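For example, the note above states that GRANT and REVOKE statements must go through run_security_query instead of execute_sql(). A minimal sketch, assuming an existing entry object o and a placeholder user name:

```python
# GRANT / REVOKE statements are not accepted by execute_sql(); submit
# them with run_security_query instead. The table and user names below
# are placeholders.
o.run_security_query('GRANT SELECT ON TABLE table_name TO USER test_user')
```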
Examples
  • Execute SQL statements.
    o.execute_sql('select * from table_name')  # Execute the statement in synchronous mode. The call blocks until the execution of the SQL statement is complete.
    instance = o.run_sql('select * from table_name')  # Execute the statement in asynchronous mode.
    print(instance.get_logview_address())  # Obtain the Logview URL of the instance.
    instance.wait_for_success()  # Block until the execution of the SQL statement is complete.
  • Set runtime parameters for SQL statements.
    o.execute_sql('select * from pyodps_iris', hints={'odps.sql.mapper.split.size': 16})
    If you set options.sql.settings globally, the specified runtime parameters are automatically added each time you execute an SQL statement.
    from odps import options
    options.sql.settings = {'odps.sql.mapper.split.size': 16}
    o.execute_sql('select * from pyodps_iris')  # The hints parameter is automatically set based on global settings. 

Obtain the execution results of SQL statements

You can call the open_reader method to obtain the execution results of SQL statements. Depending on the statement, the results are read in one of the following ways:
  • The SQL statements return structured data.
    with o.execute_sql('select * from table_name').open_reader() as reader:
        for record in reader:
            print(record)  # Process each record.
  • The SQL statements return unstructured data, such as the output of a DESC command. In this case, you can use reader.raw to obtain the raw query results.
    with o.execute_sql('desc table_name').open_reader() as reader:
        print(reader.raw)
If you set options.tunnel.use_instance_tunnel to True, PyODPS automatically calls InstanceTunnel when you call the open_reader method. If your MaxCompute version is outdated or an error occurs when PyODPS calls InstanceTunnel, PyODPS generates an alert and automatically falls back to the old Result interface. You can identify the cause of the fallback based on the alert information. If the results returned by InstanceTunnel do not meet your expectations, set options.tunnel.use_instance_tunnel to False. You can also choose between InstanceTunnel and the Result interface for a single call by setting the tunnel parameter of the open_reader method.
# Call InstanceTunnel.
with o.execute_sql('select * from table_name').open_reader(tunnel=True) as reader:
    for record in reader:
        print(record)  # Process each record.
# Call the Result interface.
with o.execute_sql('select * from table_name').open_reader(tunnel=False) as reader:
    for record in reader:
        print(record)  # Process each record.

By default, PyODPS does not limit the amount of data that can be read from an instance. However, for a protected project, the amount of data that can be downloaded by using Tunnel is limited. If you do not specify options.tunnel.limit_instance_tunnel, the limit is automatically enabled, and the number of records that can be downloaded is determined by the project configuration. In most cases, a maximum of 10,000 records can be downloaded at a time. If you want to limit the number of downloadable records manually, you can add a limit option to the open_reader call. For example, open_reader(tunnel=True, limit=1000) limits the download to 1,000 records. Alternatively, you can set options.tunnel.limit_instance_tunnel to True in the Python script.

If your MaxCompute version supports only the old Result interface and you want to read all the execution results of SQL statements, you can write the execution results to another table and then use open_reader to read data from the table. This operation is subject to the security configurations of your MaxCompute project.
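This workaround can be sketched as follows. The table names are placeholders, and whether the read succeeds depends on the security configurations of your project:

```python
# Materialize the query results into a table. LIFECYCLE 1 drops the
# temporary table automatically after one day.
o.execute_sql('create table result_table lifecycle 1 as select * from table_name')

# Read the table directly; table reads are not subject to the instance
# result size limits.
with o.get_table('result_table').open_reader() as reader:
    for record in reader:
        print(record)
```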

In PyODPS V0.7.7.1 or later, you can use the open_reader method to call InstanceTunnel to obtain all data. For more information, see InstanceTunnel.
instance = o.execute_sql('select * from movielens_ratings limit 20000')
with instance.open_reader(tunnel=True) as reader:
    print(reader.count)
    # You can iterate over all 20,000 records with `for record in reader`.
    # In this example, slicing is used to read only the first 10 records.
    for record in reader[:10]:
        print(record)

Configure resource aliases

If the resources referenced by a user-defined function (UDF) dynamically change, you can configure an alias for the old resource and use the alias as the name of the new resource. This way, you do not need to delete the UDF or create another UDF.

from odps.models import Schema
myfunc = '''\
from odps.udf import annotate
from odps.distcache import get_cache_file

@annotate('bigint->bigint')
class Example(object):
    def __init__(self):
        self.n = int(get_cache_file('test_alias_res1').read())

    def evaluate(self, arg):
        return arg + self.n
'''
res1 = o.create_resource('test_alias_res1', 'file', file_obj='1')
o.create_resource('test_alias.py', 'py', file_obj=myfunc)
o.create_function('test_alias_func',
                  class_type='test_alias.Example',
                  resources=['test_alias.py', 'test_alias_res1'])

table = o.create_table(
    'test_table',
    schema=Schema.from_lists(['size'], ['bigint']),
    if_not_exists=True
)

data = [[1, ], ]
# Write a row of data that contains only one value: 1. 
o.write_table(table, 0, [table.new_record(it) for it in data])

with o.execute_sql(
    'select test_alias_func(size) from test_table').open_reader() as reader:
    print(reader[0][0])
res2 = o.create_resource('test_alias_res2', 'file', file_obj='2')
# Use an alias to replace resource test_alias_res1 with test_alias_res2. You do not need to modify the UDF or the function.
with o.execute_sql(
    'select test_alias_func(size) from test_table',
    aliases={'test_alias_res1': 'test_alias_res2'}).open_reader() as reader:
    print(reader[0][0])

Execute SQL statements in an interactive environment

You can use SQL plug-ins to execute SQL statements or run parameterized queries in IPython and Jupyter. For more information, see IPython and Jupyter Notebook.

Set biz_id

In some cases, you must specify biz_id when you submit an SQL statement; otherwise, an error occurs. If this error occurs, set biz_id in the global options.
from odps import options
options.biz_id = 'my_biz_id'
o.execute_sql('select * from pyodps_iris')