EMR Workflow lets you configure data sources so that workflow nodes can query Hive, Impala, or Presto databases. This topic describes how to create, modify, and delete a data source.
Prerequisites
Before you begin, make sure that:
- The cluster hosting the data source and the cluster running the workflow are deployed in the same virtual private cloud (VPC).
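One quick way to check this prerequisite is to test whether the data source's IP address and port accept TCP connections from a node in the workflow cluster. The following is a minimal sketch rather than an EMR tool: the function name `can_reach` and the address in the usage comment are hypothetical placeholders. A successful connection does not prove that both clusters share a VPC, but a failure usually points to a network or security group problem worth fixing before you create the data source.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the with-block closes it again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example call with placeholder values (replace with your own IP and port):
# can_reach("192.0.2.10", 10000)
```

Run the check from a node of the cluster that executes the workflow, because reachability from your own workstation says nothing about the path between the two clusters.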
Create a data source
- Log on to the EMR console.
- In the left-side navigation pane, choose EMR Studio > Workflow.
- Click the Datasource tab, then click Create DataSource.
- In the CreateDataSource dialog box, configure the parameters described in the following table.
| Parameter | Required | Description |
| --- | --- | --- |
| DataSource | Yes | The type of the data source. Valid values: HIVE/IMPALA and PRESTO. |
| Datasource Name | Yes | A name for the data source. |
| Description | No | A description of the data source. |
| IP | Yes | The IP address of the data source. |
| Port | Yes | The port of the data source. Default value: 10000. |
| User Name | Yes | The username used to connect to the data source. |
| Password | No | The password used to connect to the data source. |
| Catalog Name | Conditional | The catalog name used to connect to the data source. Required only when DataSource is set to PRESTO. |
| Database Name | Yes | The name of the database to connect to. |
| jdbc connect parameters | No | Additional JDBC connection parameters in JSON format: {"key1":"value1","key2":"value2"...}. |

- Click Confirm.
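The required and conditional rules in the parameter table can be checked before you fill in the dialog box. The following sketch is illustrative only: the dictionary keys (`type`, `name`, `ip`, `port`, `user_name`, `database`, `catalog`, `jdbc_params`) and the function `validate_data_source` are hypothetical names that mirror the table, not an EMR API, and the sample values are placeholders.

```python
import json

# Hypothetical field names that mirror the parameter table; not an EMR API.
REQUIRED = ("type", "name", "ip", "port", "user_name", "database")
VALID_TYPES = ("HIVE/IMPALA", "PRESTO")


def validate_data_source(cfg):
    """Return a list of problems found in a data source definition."""
    problems = [f"missing required field: {key}" for key in REQUIRED if not cfg.get(key)]

    ds_type = cfg.get("type")
    if ds_type and ds_type not in VALID_TYPES:
        problems.append(f"unsupported type: {ds_type}")
    # Catalog Name is required only when the type is PRESTO.
    if ds_type == "PRESTO" and not cfg.get("catalog"):
        problems.append("missing required field: catalog")

    port = cfg.get("port")
    if port and not (isinstance(port, int) and 1 <= port <= 65535):
        problems.append(f"invalid port: {port}")

    # The JDBC connect parameters are optional but must be a JSON object.
    raw = cfg.get("jdbc_params")
    if raw:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            problems.append("jdbc_params is not valid JSON")
        else:
            if not isinstance(parsed, dict):
                problems.append("jdbc_params must be a JSON object")
    return problems


# Placeholder values: a PRESTO definition that omits the catalog.
presto = {"type": "PRESTO", "name": "presto_sales", "ip": "192.0.2.10",
          "port": 10000, "user_name": "hadoop", "database": "default"}
print(validate_data_source(presto))  # ['missing required field: catalog']
```

Returning a list of problems instead of raising on the first one lets you correct every field in a single pass.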
Modify or delete a data source
After creating a data source, you can modify or delete it from the Datasource tab.
- Modify: Click the edit icon in the Operation column of the data source.
- Delete: Click the delete icon in the Operation column of the data source.