Dataphin: Create a general project

Last Updated: Jan 21, 2025

A project is the basic organizational unit within Dataphin, serving as the primary boundary for multi-user isolation and access control. Once the Dataphin service is activated, you must use Dataphin through a project. This topic outlines the steps to create a project.

Prerequisites

Ensure the following conditions are met before beginning the operation:

  • To develop stream-batch integrated tasks, select and create a compute source that supports stream-batch integration based on your business requirements. For detailed instructions, see the topic for the corresponding compute source.

  • If Hadoop is your chosen compute engine for Dataphin and you need to use features like standard modeling, ad hoc queries, and Hive SQL compute tasks during data development, create a Hadoop compute source before establishing a project space. For detailed instructions, see Create a Hadoop compute source.

  • If TDH is your chosen compute engine for Dataphin and you need to use features like standard modeling, ad hoc queries, and Inceptor SQL compute tasks during data development, create a TDH Inceptor compute source before establishing a project space. For detailed instructions, see Create a TDH Inceptor compute source.

  • If MaxCompute is your chosen compute engine for Dataphin and you need to use features like standard modeling, ad hoc queries, and MaxCompute SQL compute tasks during data development, create a MaxCompute compute source before establishing a project space. For detailed instructions, see Create a MaxCompute compute source.

  • If Hologres is your chosen compute engine for Dataphin and you need to use features like ad hoc queries and Hologres SQL compute tasks during data development, create a Hologres compute source before establishing a project space. For detailed instructions, see Create a Hologres compute source.

  • If ArgoDB is your chosen compute engine for Dataphin and you need to use features like ad hoc queries and ArgoDB SQL compute tasks during data development, create an ArgoDB compute source before establishing a project space. For detailed instructions, see Create an ArgoDB compute source.

  • If StarRocks is your chosen compute engine for Dataphin and you need to use features like ad hoc queries and StarRocks SQL compute tasks during data development, create a StarRocks compute source before establishing a project space. For detailed instructions, see Create a StarRocks compute source.

Background information

Dataphin supports two development modes for projects:

  • Dev-Prod mode: This mode automatically generates a development environment (Dev project) and a production environment (Prod project) after project creation. The Prod project ensures data security in the production environment. Recommended for organizations with strong management requirements, clear division of labor among many data developers, and a higher budget for data computing and storage.

  • Basic mode: This mode automatically generates a Basic project that integrates development and production after project creation. The data production process is stable and convenient. Recommended for organizations prioritizing data development efficiency, with blurred functional boundaries among developers, and limited budgets for computing and storage.

    Important

    A Basic mode project cannot be upgraded to Dev-Prod mode, and in Basic mode changes are made directly to the production environment, which carries risk. Choose carefully.

    If you opt for the Basic mode, manage project members carefully to maintain data production stability.

Permission description

  • Super administrators, system administrators, and domain architects are authorized to create projects.

  • Super administrators, system administrators, and domain architects can manage permissions for data table read and write operations.

Procedure

  1. Navigate to the Dataphin home page and click Planning > Project in the top menu bar.

  2. On the Project Management page, click Create A General Project to open the Create Project dialog box.

  3. In the Create Project dialog box, select the desired project mode and click Next.

  4. In the Create Project dialog box, configure the necessary parameters. The parameters are the same for Dev-Prod mode and Basic mode. The following example uses Dev-Prod mode.

    Belonging domain

    Data domain

    Select the data domain to which the project belongs.

    Basic information

    Common English name

    Enter the common English name of the project, adhering to the following naming conventions:

    • Can contain only letters, digits, and underscores (_).

    • Do not start with LD_.

    • Limit to a maximum of 64 characters.

    The English name for the development environment project will default to having a _dev suffix.

    When using MaxCompute as the compute engine, it is recommended that the common English name of the project align with the corresponding MaxCompute project name.

    Common name

    Enter the common name of the project, following these naming conventions:

    • Can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

    • Do not start with LD_.

    • Limit to a maximum of 64 characters.
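
    The naming rules for both names can be checked before you submit the form. The following minimal Python sketch is an assumption-based illustration of the rules listed above; the exact character ranges and the case sensitivity of the LD_ prefix check are not specified by Dataphin and are assumed here:

      import re

      def valid_english_name(name: str) -> bool:
          # Letters, digits, and underscores only; must not start with LD_; at most 64 characters.
          return (
              len(name) <= 64
              and not name.startswith("LD_")
              and re.fullmatch(r"[A-Za-z0-9_]+", name) is not None
          )

      def valid_common_name(name: str) -> bool:
          # Chinese characters, letters, digits, underscores, and hyphens;
          # \u4e00-\u9fff approximates the Chinese character range.
          return (
              len(name) <= 64
              and not name.startswith("LD_")
              and re.fullmatch(r"[\u4e00-\u9fffA-Za-z0-9_-]+", name) is not None
          )

      print(valid_english_name("sales_dw"))     # True
      print(valid_english_name("LD_sales_dw"))  # False: reserved prefix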

    Compute source type

    Select the compute source type and then choose the corresponding compute source.

    Important
    • A compute source associated with a project cannot be reassigned to another project.

    • Compute sources for Dev and Prod projects must be identical.

    Depending on the compute engines initialized by Dataphin, the available compute engine types for selection will vary. The details are as follows:

    • When Dataphin's compute engine is initialized to MaxCompute, select the compute engine type. Supported offline engines include MaxCompute and Hologres, while supported real-time engines include Alibaba Blink and Ververica Flink. When the offline compute engine is MaxCompute, Hologres can be used to configure the Platform for AI.

      Dataphin integrates with the Platform for AI, offering algorithm scheduling capabilities based on the Platform for AI. When creating a workspace for visual modeling on the Platform for AI, resource selection is based on the MaxCompute resource group. For more information, see Platform for AI compute resource group overview. If you enable the Platform for AI, configure the following parameters:

      • PAI region: Select the region where the Dataphin instance is located.

      • Access method: Choose the access method for the Platform for AI. Options include VPC and Public Network access methods.

      • AccessKey ID, AccessKey Secret: Provide the AccessKey ID and AccessKey Secret required for accessing the PAI account.

      • PAI project name: Choose the PAI project.

        It is recommended that the MaxCompute project associated with the current Dataphin project be consistent with the MaxCompute project linked to PAI.

    • When Dataphin's compute engine is initialized to ArgoDB or StarRocks, it supports real-time engines such as Ververica Flink, Alibaba Blink, and Flink.

    • When Dataphin's compute engine is initialized to Hadoop (non-FusionInsight), it supports the real-time engine Flink.

    • When Dataphin's compute engine is initialized to FusionInsight, it supports the real-time engine FusionInsight Flink.

    Project default resource group

    Tasks created within this project will default to using the configured project default resource group for scheduling. This is only configurable when the project's offline compute engine is enabled. You can also customize and modify the resource group for individual tasks during task configuration.

    • Only resource groups that are in a normal state, whose usage scenarios include daily task scheduling, and that are associated with the current project can be selected.

    • Modifying the default resource group here will automatically update the resource group for tasks configured with the project default resource group. If automatic updates are not desired, specify a separate custom resource group for the task. For more information, see Compute task resource configuration.

    Note
    • Configuration is only supported if the tenant has enabled the custom resource group feature. For more information, see Resource group overview.

    • You can also use the current tenant's shared scheduling resource group (the tenant default resource group); resource contention may occur during peak scheduling periods.

    Description information

    Provide a brief description of the project, not exceeding 128 characters.

    Business information

    Space type

    Identifies the nature of the project's development tasks and output data. The default is the application layer. Space types include:

    • Intermediate layer: Typically used to store and process data to provide consistent, accurate, and clean data.

    • Source layer: Typically used to store raw data from business systems, serving as a foundation for subsequent processing and development.

    • Application layer: Focused on business needs, it defines personalized and diverse data metrics applicable to various scenarios.

    • General layer: Commonly used to store general aggregated data, such as aggregated data of a certain dimension within a specific subject area.

    Security settings

    Global security settings

    Security settings enable fine-grained control over data security and access, including setting the switch and authentication mode for Spark tasks to safeguard data. For more information, see Security settings.

    Data result download (download approval)

    Dataphin supports the download of business data. Configure whether project-level data can be downloaded. Once data is downloaded, it falls outside the system's control. To emphasize data security and discourage casual sharing, consider setting a watermark. For more information, see Data download configuration.

    Important

    Only non-guest role users are permitted to download data results to their local devices.

    Data permission approval

    The data permission approval policy allows for the specification of different approval rules based on data sensitivity levels, enabling approvers to concentrate on highly sensitive data. For public data, approval can be waived, reducing the administrative load. For more information, see Data permission configuration.

    Asset security policy

    After a security policy is installed, it protects sensitive data in the project. You can modify it under Administration > Data Security > Project Security Policy. For more information, see Project Security Policy.

    Release settings

    Release approval

    Enable this setting to require release approval for the object release process under this project.

    • Approver: Specify the approvers for object releases under this project. You can select up to 10 custom approvers, excluding project administrators.

    • Approval policy: Dataphin currently supports only Parallel Approval, which is the default selection and cannot be altered. Approval is granted if any approver consents, and the process is terminated if any approver declines.
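
    To make the Parallel Approval behavior concrete, the following minimal Python sketch (hypothetical; the function and response values are illustrative only) models one plausible reading in which the first decisive response ends the process:

      # Hypothetical model of Parallel Approval: the first consent approves the
      # release and the first decline terminates it; otherwise the request stays pending.
      def parallel_approval(responses):
          for approver, decision in responses:
              if decision == "consent":
                  return f"approved by {approver}"
              if decision == "decline":
                  return f"terminated by {approver}"
          return "pending"

      # Example: the release is approved as soon as any approver consents.
      print(parallel_approval([("approver_a", "no response"), ("approver_b", "consent")]))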

    Task parameter configuration

    Flink task default parameter configuration

    • Upon enabling the Real-time Engine, enter the default Flink-related parameter configurations in the text box. These parameters will be pre-configured for future Flink tasks created under this project.

    • Parameters must be formatted as key-value pairs: key:value. For example: taskmanager.numberOfTaskSlots:1.
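
    For illustration, a default parameter block for this field might look like the following. The keys shown are standard open-source Flink configuration options; which keys are actually honored depends on the real-time engine and version configured for your tenant:

      taskmanager.numberOfTaskSlots:1
      parallelism.default:2
      taskmanager.memory.process.size:4g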

    More settings

    Default feature menu

    After selecting the data domain for the project, the system will automatically choose the appropriate feature menu based on the selected space type. You can adjust the selection according to your business needs.

    Note

    The default feature menu is not available when selecting the Hologres compute engine.

    Production environment periodic scheduling

    Note

    Basic projects are Periodically Scheduled.

    • Enable (tasks are scheduled automatically): If this setting is enabled, the status of new instances of periodic tasks matches the task status (historical instances are unaffected).

    • Disable (task scheduling is paused): If this setting is disabled, new instances of periodic tasks are set to a paused status (historical instances are unaffected). Disabling periodic scheduling may have serious repercussions; proceed with caution.

    Note

    In the Dev environment, the default instance status will change from not running to paused.

  5. Click Confirm to finalize the creation of the project.

What to do next

Once the project is created, you can proceed to the data development module to begin data development activities. For more information, see Data development overview.