All Products
Search
Document Center

Dataphin:Query and download data

Last Updated:Mar 05, 2025

Through ad hoc queries, you can customize, execute, and download data based on current business needs. For instance, after developing a compute task, ad hoc queries can help verify if it meets expectations. This topic explains how to create an ad hoc query and download the results.

Background information

Ad hoc query tasks utilize the compute source specified in your project. When using a Hadoop compute source, you can enable Impala tasks. Once enabled, you can create both HIVE_SQL and IMPALA_SQL ad hoc query tasks. Impala, being memory-based, offers faster query responses than HIVE_SQL. Dataphin allows easy switching between IMPALA_SQL and HIVE_SQL task types without rewriting query code for historical tasks. For details, see Appendix: Switch task types.

Prerequisites

To create an IMPALA_SQL ad hoc query, you must first enable Impala Tasks in the Hadoop compute source. For details, see Create a Hadoop compute source.

To download ad hoc query results, ensure you have the necessary download permissions for the project. For details, see Manage project permissions and compute sources.

Create an ad hoc query

  1. On the Dataphin home page, select Development > Data Development from the top menu bar.

  2. In the top menu bar, select Project and, if in Dev-Prod mode, also select Environment.

  3. In the left-side navigation pane, select Ad Hoc Query, and click the image icon to create a new query. You can choose between Compute Engine Ad Hoc Query and Database SQL Ad Hoc Query.

  4. In the Create Ad Hoc Query dialog box, configure the necessary parameters.

    Note

    If your compute engine is Hadoop and you have enabled Impala tasks, you can create both HIVE_SQL Ad Hoc Queries and IMPALA_SQL ad hoc queries.

    Parameter

    Description

    Name

    Enter the name of the ad hoc query.

    The length must not exceed 256 characters and cannot contain vertical lines (|), forward slashes (/), backslashes (\), colons (:), question marks (?), angle brackets (<>), asterisks (*), or quotation marks (").

    Description

    Enter a brief description of the ad hoc query.

    Select Directory

    Select the directory where the task is stored. The default is Temporary Code.

    If no directory is created, you can Create a Folder as follows:

    1. Click the image icon above the compute task list on the left side of the page to open the Create Folder dialog box.

    2. In the Create Folder dialog box, enter the folder Name and select the Directory location as needed.

    3. Click Confirm.

    Data Source Type

    Supports MySQL, Oracle, AnalyticDB for PostgreSQL, AnalyticDB for MySQL 2.0, AnalyticDB for MySQL 3.0, PostgreSQL, ClickHouse, Presto, StarRocks, Doris, SelectDB, and GaussDB (DWS) data source types.

    Note

    This parameter is only configurable when creating Database SQL Ad Hoc Queries.

    Datasource

    Select the data source for the ad hoc query. If there is no available data source, you can click +create Data Source to create one.

    Note

    This parameter is only configurable when creating Database SQL Ad Hoc Queries.

  5. Click Confirm to finalize the creation of the ad hoc query file.

  6. Write your query code according to the project's compute source or database SQL type. Once complete, click Run at the top of the editor.

  7. View the successful query results in the Console after execution.

Download result data

Once the query is executed, click the image icon in the console's upper right corner to download the results.

You can download either the complete dataset or a sample.

  • Complete data download: The complete dataset is downloaded in CSV format and may take some time. You can track the progress through the Message Center or the running record.

    Important

    Database SQL ad hoc queries only support downloading sample data, not the full dataset.

  • Sample Data Download: The format and running record for sample data downloads are set according to the data download approval settings. For more information, see Data Download Configuration.

Note
  • If download approval is required in the specification settings, you must submit a Data Download Request before you can proceed with the download. Once approved, you can check the approval task, view the current running result, or retrieve data from the running record. For more information, see Data Download Approval Configuration.

  • When the data download approval watermark feature is enabled, a watermark is automatically applied to the downloaded Excel file. The data download file remains available for 30 days. For instance, if the data query date is 2023-04-12 and the request date is 2023-04-13, the file will expire on 2023-05-13. For more information, see Data Download Approval Configuration.

Appendix: switch task types

With a Hadoop compute source and enabled Impala tasks, you can switch between IMPALA_SQL and HIVE_SQL task types.

  1. In the ad hoc query list, select the desired task, click the image icon next to the task name, and choose Modify Type.

  2. In the Modify Type dialog box, select the new type. The following image illustrates switching from an IMPALA_SQL to a HIVE_SQL task type. image.png

  3. Click Confirm to complete the switch.