PyODPS provides an IPython plug-in to simplify the operations on MaxCompute.
IPython enhancement
Some commands are provided for command line enhancement.
- Run the following commands to load the plug-in:
The following result is returned:%load_ext odps %enter
<odps.inter.Room at 0x11341df10>
- In this case, the
o
andodps
global variables can be retrieved. You can run theo.get_table
orodps.get_table
command to call a table.
The following result is returned:o.get_table('table_name') odps.get_table('table_name')
odps.Table name: odps_test_sqltask_finance.`table_name` schema: c_int_a : bigint c_int_b : bigint c_double_a : double c_double_b : double c_string_a : string c_string_b : string c_bool_a : boolean c_bool_b : boolean c_datetime_a : datetime c_datetime_b : datetime
- Run the following command to display the stored objects as a table:
%stores
default name desc iris Iris dataset
Object name completion
PyODPS enhances the code completion feature that is provided by IPython. When you
write a statement such as
o.get_xxx
, the object name is automatically completed. In the following examples, <tab> is
used to denote pressing the Tab key. When you enter the statement and encounter <tab>,
press the Tab key.
- You can use the Tab key to complete the object name.
o.get_table(<tab>
- You can enter the first few characters of an object name and press the Tab key to
complete it.
IPython auto-completes the table name that starts witho.get_table('tabl<tab>
tabl
. - This feature also completes the names of objects in different projects. Syntax:
o.get_table(project='project_name', name='tabl<tab> o.get_table('tabl<tab>', project='project_name')
- If multiple matching objects exist, IPython provides a list.
options.completion_size
specifies the maximum number of objects in the list. The default value is 10.
SQL statements
PyODPS provides an SQL plug-in to execute MaxCompute SQL statements.
- You can use %sql to execute a single-line SQL statement.
In [37]: %sql select * from pyodps_iris limit 5 |==========================================| 1 / 1 (100.00%) 3s Out[37]: sepallength sepalwidth petallength petalwidth name 0 5.1 3.5 1.4 0.2 Iris-setosa 1 4.9 3.0 1.4 0.2 Iris-setosa 2 4.7 3.2 1.3 0.2 Iris-setosa 3 4.6 3.1 1.5 0.2 Iris-setosa 4 5.0 3.6 1.4 0.2 Iris-setosa
- You can use
%%sql
to execute a multiple-line SQL statement.In [38]: %%sql ....: select * from pyodps_iris ....: where sepallength < 5 ....: limit 5 ....: |==========================================| 1 / 1 (100.00%) 15s Out[38]: sepallength sepalwidth petallength petalwidth name 0 4.9 3.0 1.4 0.2 Iris-setosa 1 4.7 3.2 1.3 0.2 Iris-setosa 2 4.6 3.1 1.5 0.2 Iris-setosa 3 4.6 3.4 1.4 0.3 Iris-setosa 4 4.4 2.9 1.4 0.2 Iris-setosa
- To execute parameterized SQL statements, you can use
:parameter
to specify the parameter.In [1]: %load_ext odps In [2]: mytable = 'table_name' In [3]: %sql select * from :mytable |==========================================| 1 / 1 (100.00%) 2s Out[3]: c_int_a c_int_b c_double_a c_double_b c_string_a c_string_b c_bool_a \ 0 0 0 -1203 0 0 -1203 True c_bool_b c_datetime_a c_datetime_b 0 False 2012-03-30 23:59:58 2012-03-30 23:59:59
- For SQL runtime parameters, you can use
%set
to set a global parameter or use SET within an SQLCell to set a local parameter. The following example sets a local parameter, which does not affect the global settings.
The following example sets a global parameter. Global settings are applied to all subsequent SQL statements.In [17]: %%sql set odps.sql.mapper.split.size = 16; select * from pyodps_iris;
In [18]: %set odps.sql.mapper.split.size = 16
Upload pandas DataFrame objects to MaxCompute tables
PyODPS provides the commands to upload pandas DataFrame objects to MaxCompute tables.
You only need to run the
%persist
command. The first parameter df
is the variable name. The second parameter pyodps_pandas_df
is the MaxCompute table name. import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))
%persist df pyodps_pandas_df