Use a PyODPS node to read data from a partitioned table - MaxCompute

This topic describes how to use a PyODPS node to read data from a partitioned table.

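Prerequisites

Before you begin, make sure that the following operations are complete:

MaxCompute is activated. For more information, see "Activate MaxCompute and DataWorks".
DataWorks is activated. For more information, see "Activate DataWorks".
A workflow is created in the DataWorks console. In this example, the workflow is created in a DataWorks workspace in basic mode. For more information, see "Create a workflow".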

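Procedure

Prepare test data.
1. Create a partitioned table and a source table, and import data into the source table. For more information, see "Create tables and upload data".
  This example uses the following table creation statements and source data.
  - Execute the following statement to create a partitioned table named user_detail.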
```
create table if not exists user_detail
(
userid    BIGINT comment 'user ID',
job       STRING comment 'job type',
education STRING comment 'education level'
) comment 'user information table'
partitioned by (dt STRING comment 'date',region STRING comment 'region');
```
  - Execute the following statement to create a source table named user_detail_ods.
```
create table if not exists user_detail_ods
(
  userid    BIGINT comment 'user ID',
  job       STRING comment 'job type',
  education STRING comment 'education level',
  dt STRING comment 'date',
  region STRING comment 'region'
);
```
  - Create a source data file named user_detail.txt and save the following data to the file. Then import the data into the user_detail_ods table.
```
0001,Internet,bachelor,20190715,beijing
0002,education,junior college,20190716,beijing
0003,finance,master,20190715,shandong
0004,Internet,master,20190715,beijing
```
2. Right-click the workflow and choose Create Node > MaxCompute > ODPS SQL.
3. In the [Create Node] dialog box, specify a name and click Confirm.
4. On the [Configuration] tab of the ODPS SQL node, enter the following code in the code editor.
```
insert overwrite table user_detail partition (dt,region)
select userid,job,education,dt,region from user_detail_ods;
```
5. Click the [Run] icon in the toolbar to insert the data from the user_detail_ods table into the user_detail partitioned table.
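To make the effect of the dynamic-partition insert concrete, the following sketch groups the sample rows from user_detail.txt by their dt and region values. This mirrors how the INSERT OVERWRITE statement above assigns each row to a partition. It is plain Python with no MaxCompute connection. The function name `group_by_partition` is hypothetical and only illustrates the mapping, not how MaxCompute implements it.

```python
import csv
import io
from collections import OrderedDict

# Sample rows copied from user_detail.txt in this topic.
SAMPLE_DATA = """\
0001,Internet,bachelor,20190715,beijing
0002,education,junior college,20190716,beijing
0003,finance,master,20190715,shandong
0004,Internet,master,20190715,beijing
"""


def group_by_partition(text):
    """Group rows by partition spec, e.g. 'dt=20190715,region=beijing'.

    Hypothetical helper for illustration only.
    """
    partitions = OrderedDict()
    for userid, job, education, dt, region in csv.reader(io.StringIO(text)):
        spec = "dt=%s,region=%s" % (dt, region)
        # userid is declared as BIGINT, so the sketch converts '0001' to 1.
        partitions.setdefault(spec, []).append((int(userid), job, education))
    return partitions


if __name__ == "__main__":
    for spec, rows in group_by_partition(SAMPLE_DATA).items():
        print(spec, rows)
```

With the sample data, the rows fall into three partitions. The partition dt=20190715,region=beijing, which the PyODPS node reads in the next step, holds two rows (user IDs 1 and 4).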
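Use a PyODPS node to read data from the partitioned table.
1. Log on to the DataWorks console.
2. In the left-side navigation pane, click Workspaces.
3. Find your workspace and choose Shortcuts > Data Development in the Actions column.
4. On the DataStudio page, right-click the workflow that you created and choose Create Node > MaxCompute > PyODPS 2.
5. In the [Create Node] dialog box, specify a name and click Confirm.
6. On the configuration tab of the PyODPS 2 node, enter the following code in the code editor.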
```
import sys
from odps import ODPS
reload(sys)
print('dt=' + args['dt'])
# Set UTF-8 as the default encoding format.
sys.setdefaultencoding('utf8')
# Obtain the partitioned table.
t = o.get_table('user_detail')
# Check whether the specified partition exists.
print t.exist_partition('dt=20190715,region=beijing')
# View all partitions in the partitioned table.
for partition in t.partitions:
    print partition.name
# You can use one of the following methods to query data in the partitioned table:
# Method 1
print("Query data in the partitioned table by using Method 1:")
with t.open_reader(partition='dt=20190715,region=beijing') as reader1:
    count = reader1.count
    # Iterate inside the with block, while the reader is still open.
    for record in reader1:
        print record[0],record[1],record[2]
# Method 2
print("Query data in the partitioned table by using Method 2:")
reader2 = t.open_reader(partition='dt=20190715,region=beijing')
for record in reader2:
    print record["userid"],record["job"],record["education"]
# Method 3
print("Query data in the partitioned table by using Method 3:")
for record in o.read_table('user_detail', partition='dt=20190715,region=beijing'):
    print record["userid"],record["job"],record["education"]
```
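7. Click the [Run with Parameters] icon in the toolbar.
8. In the Parameters dialog box, configure the parameters and click Run.
  Parameter description:
  - Resource group name: select [Common scheduler resource group].
  - dt: set this parameter to dt=20190715.
9. View the execution result of the PyODPS 2 node on the Run Log tab.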
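The node code above receives a dt parameter but hard-codes the partition 'dt=20190715,region=beijing' when it reads data. The following is a minimal sketch of how the parameter could drive the read instead. It assumes the `o` and `args` globals that the node code above relies on. The helpers `build_partition_spec` and `read_partition` are hypothetical names, not part of PyODPS. The sketch calls only `exist_partition` and `open_reader`, the same table methods used in the original code, and is written to run on both Python 2 and Python 3.

```python
def build_partition_spec(dt, region):
    """Return a partition spec string such as 'dt=20190715,region=beijing'."""
    return "dt=%s,region=%s" % (dt, region)


def read_partition(table, spec):
    """Return (userid, job, education) tuples from one partition.

    `table` is expected to be a PyODPS table object, for example the
    result of o.get_table('user_detail'). An empty list is returned if
    the partition does not exist, so open_reader is never called on a
    missing partition.
    """
    if not table.exist_partition(spec):
        return []
    rows = []
    # Iterate inside the with block, while the reader is still open.
    with table.open_reader(partition=spec) as reader:
        for record in reader:
            rows.append((record["userid"], record["job"], record["education"]))
    return rows


# Inside a PyODPS node (assumption: `o` and `args` are provided by DataWorks):
#   spec = build_partition_spec(args["dt"], "beijing")
#   for row in read_partition(o.get_table("user_detail"), spec):
#       print(row)
```

Checking for the partition first is a design choice for this sketch: a run with a dt value that has no data then yields an empty result rather than an error from the reader.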