A project is the basic organizational unit in Dataphin and serves as the primary boundary for multi-user isolation and access control. After you activate the Dataphin service, you must create a project to use the platform. This topic describes how to create a project.
Prerequisites
Before you start, make sure that the following requirements are met:
To develop tasks that integrate stream and batch processing, you must create a compute source that supports stream-batch integration based on your business needs. For more information, see:
If you select MaxCompute as the compute engine for Dataphin and need to use features such as standardized modeling, ad hoc queries, or MaxCompute SQL compute tasks, you must create a MaxCompute compute source before you create a project. For more information, see Create a MaxCompute compute source.
If you select MaxCompute as the compute engine, you can also create a Hologres compute source. After you attach the Hologres compute source to a project, you can use features such as ad hoc queries and HOLOGRES_SQL compute tasks. For more information, see Create a Hologres compute source.
If you select Hadoop as the compute engine for Dataphin and need to use features such as standardized modeling, ad hoc queries, or Hive SQL compute tasks, you must create a Hadoop compute source before you create a project. For more information, see Create a Hadoop compute source.
If you select Transwarp TDH as the compute engine for Dataphin and need to use features such as standardized modeling, ad hoc queries, or INCEPTOR_SQL compute tasks, you must create a TDH Inceptor compute source before you create a project. For more information, see Create a TDH Inceptor compute source.
If you select ArgoDB as the compute engine for Dataphin and need to use features such as ad hoc queries or ARGODB_SQL compute tasks, you must create an ArgoDB compute source before you create a project. For more information, see Create an ArgoDB compute source.
If you select StarRocks as the compute engine for Dataphin and need to use features such as ad hoc queries or STARROCKS_SQL compute tasks, you must create a StarRocks compute source before you create a project. For more information, see Create a StarRocks compute source.
If you select Amazon EMR as the compute engine for Dataphin and need to use features such as ad hoc queries or compute tasks, you must create an Amazon EMR compute source before you create a project. For more information, see Create an Amazon EMR compute source.
If you select SelectDB or Doris as the compute engine for Dataphin and need to use features such as ad hoc queries or SELECTDB_SQL or DORIS_SQL compute tasks, you must create a SelectDB or Doris compute source before you create a project. For more information, see Create a SelectDB or Doris compute source.
Background information
Dataphin supports projects in the following two development modes:
Dev-Prod mode: When you create a project in this mode, the system automatically generates a development environment (Dev project) and a production environment (Prod project). This separation ensures data security in the production environment. This mode is recommended if you have complex management needs, a large team of data developers with clear roles, and a sufficient budget for computing and storage.
Basic mode: When you create a project in this mode, the system automatically generates a Basic project, which integrates the development and production environments. This mode simplifies the data production process and is recommended if you prioritize development efficiency, have developers with flexible roles, and have a limited budget for computing and storage.
Permission description
Super administrators, system administrators, and organizational unit architects can create projects.
Super administrators, system administrators, and organizational unit architects can enable or disable permission requests for reading from and writing to data tables.
Procedure
On the Dataphin home page, choose Planning > Projects from the top menu bar.
On the Project Management page, click Create General Project to open the Create Project dialog box.
In the Create Project dialog box, select Dev-Prod Mode or Basic Mode, and then click Next.
Important: You cannot upgrade a project from Basic mode to Dev-Prod mode. Basic mode also presents the risk of direct changes to the production environment. Therefore, choose the mode carefully.
If you choose Basic mode, you must carefully manage project members to ensure the stability of data production.
In the Create Project dialog box, configure the parameters.
The parameters for Dev-Prod Mode and Basic Mode are the same. The following example uses Dev-Prod Mode.
Parameter
Description
Business Unit
Data Section
Select the business unit to which the project belongs.
Basic Information
Common English Name
Enter the common English name for the project. The naming convention is as follows:
Can contain letters, digits, and underscores (_).
Cannot start with LD_.
Cannot exceed 64 characters in length.
The English name of a development environment project has the `_dev` suffix by default.
Note: If the compute engine is MaxCompute, we recommend that you set the Common English Name of the project to be the same as the corresponding MaxCompute project name.
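The naming rules for the Common English Name can be checked before you open the dialog box. The following is a minimal sketch, not part of Dataphin: the function names are hypothetical, and it assumes that the `LD_` check is case-sensitive and that the 64-character limit applies to the name you enter rather than to the name with the `_dev` suffix.

```python
import re

# Assumption: rules are applied exactly as listed in the documentation.
# The LD_ check is treated as case-sensitive, which is not specified.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
MAX_LENGTH = 64
RESERVED_PREFIX = "LD_"


def validate_english_name(name: str) -> list:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("must contain only letters, digits, and underscores")
    if name.startswith(RESERVED_PREFIX):
        problems.append("must not start with LD_")
    if len(name) > MAX_LENGTH:
        problems.append("must not exceed 64 characters")
    return problems


def dev_project_name(name: str) -> str:
    """Name of the Dev project, using the default _dev suffix."""
    return f"{name}_dev"
```

For example, `validate_english_name("sales_dw")` returns an empty list, and `dev_project_name("sales_dw")` returns `sales_dw_dev`.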
Common Name
Enter the common name for the project. The naming convention is as follows:
Can contain Chinese characters, digits, letters, underscores (_), and hyphens (-).
Cannot start with LD_.
Cannot exceed 64 characters in length.
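The Common Name rules differ from the English name rules only in the allowed character set. The following sketch is illustrative and uses a hypothetical function name. It assumes that "Chinese characters" can be approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF); Dataphin may accept a wider range.

```python
import re

# Assumption: Chinese characters are approximated by U+4E00 to U+9FFF.
# The trailing hyphen inside the character class is a literal hyphen.
_COMMON_NAME = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")


def is_valid_common_name(name: str) -> bool:
    """Check the allowed characters, the LD_ prefix rule, and the length limit."""
    return (
        _COMMON_NAME.fullmatch(name) is not None
        and not name.startswith("LD_")
        and len(name) <= 64
    )
```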
Compute Source Type
Select a compute source type, and then select the corresponding compute source.
Important: A compute source that is attached to a project cannot be attached to another project.
The compute sources for the Dev project and the Prod project must be the same.
If the Dataphin compute engine is initialized as MaxCompute, you can select MaxCompute or Hologres as the offline engine. If you select MaxCompute, you can also enable Machine Learning PAI.
Dataphin integrates with Platform for AI (PAI) to provide basic algorithm scheduling. In PAI, when you create a workspace for visual modeling, select a MaxCompute-based computing resource group. For more information, see the AI computing resource group overview in Platform for AI. If you enable PAI, configure the following parameters.
PAI Region: Select the same region as the Dataphin instance.
Access Method: Select the access method for PAI. The VPC and Internet access methods are supported.
AccessKey ID, AccessKey Secret: Configure the AccessKey ID and AccessKey secret of the account that needs to access PAI.
PAI Project Name: Select a PAI project.
We recommend that the MaxCompute project attached to the current Dataphin project be the same as the MaxCompute project attached to PAI.
If you select StarRocks as the offline engine and the engine is from a database in an External catalog, the following features are not supported: standardized modeling, writing data using Data Integration, and downloading the complete results of ad hoc queries.
Project Default Resource Group
Tasks created in this project use the default resource group configured here for scheduling. This parameter is available only if the offline compute engine is enabled for the project. You can also customize the resource group for a specific task in the task configuration.
You can select only resource groups that are in the Normal status, are used for Daily Task Scheduling, and are associated with the current project.
After you change the default resource group, tasks whose schedule resource is set to Project Default Resource Group automatically use the new resource group. If you do not want the resource group to be automatically updated, specify a separate custom resource group for the task. For more information, see Configure computing resources for a task.
Note: This parameter is available only if the custom resource group feature is enabled for the tenant. For more information, see Overview of resource groups.
The default resource group for the tenant uses the shared resource groups of the current tenant for scheduling. Resource contention may occur during peak scheduling hours.
Description
Enter a brief description of the project. The description cannot exceed 128 characters in length.
Business Information
Workspace Type
Distinguishes the development tasks and output data features of the project. The default value is Application Layer. The following workspace types are available:
Intermediate layer: Stores and processes data to provide consistent, accurate, and clean data.
Source layer: Stores raw data integrated from business systems to provide data sources for subsequent processing and development.
Application layer: Defines personalized and diverse data metrics for different business scenarios.
Common layer: Stores common summary data, such as summary data for a dimension in a subject area.
Security Settings
Global Security Settings
Security settings let you apply fine-grained control over data security and access. You can also configure settings and authentication modes for Spark tasks to ensure data security. For more information, see Security settings.
Data Result Download (Download Approval)
Dataphin supports business data download. You can configure whether project-level data can be downloaded. After data is downloaded, it is no longer under system control. You can add watermarks to promote data security and prevent unauthorized sharing. For more information, see Configure data download.
Important: Only users who are not assigned the Visitor role can download data results to a local device.
Database Permission Approval
The database permission approval policy lets you specify different approval rules for different data sensitivity levels. This allows approvers to focus on highly sensitive data and exempts public data from approval, which reduces the workload of permission approval. For more information, see Configure database permissions.
Asset Security Policy
After installation, you can use data security policies to protect sensitive data. You can modify the policies in the Administration > Data Security > Project Security Policy module. For more information, see Project security policy.
Submit Settings
Code Review
This feature is disabled by default. If you enable it, you must also configure a Code Reviewer. After code review is enabled, the code of compute tasks in this project must be reviewed before submission.
By default, Administrator is selected as the Code Reviewer. You can also select Custom to select multiple members for approval.
Publish Settings
Publish Approval
If you enable this feature, you must configure the Approval Settings. The publishing process for objects in this project must then go through release approval.
Specify Approvers: The process is approved if any approver approves it and is stopped if any approver rejects it. You can select Administrator or Custom. If you select Custom, you must select one to ten Approvers.
Specify Approval Template: Approval is based on the selected approval template. If no suitable template is available, click +Add Template to go to the Approval Workflow Template page and create a template. For more information, see Create and manage approval templates.
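The Specify Approvers rule can be read as "the first decision settles the request". The following sketch models that rule together with the one-to-ten limit for Custom approvers. It is only an illustration: the function names and the `"approve"` and `"reject"` strings are hypothetical, and it assumes that decisions are processed in the order they arrive.

```python
from typing import Iterable, List

MIN_APPROVERS, MAX_APPROVERS = 1, 10


def validate_custom_approvers(approvers: List[str]) -> None:
    """Raise ValueError unless one to ten distinct approvers are selected."""
    unique = set(approvers)
    if not MIN_APPROVERS <= len(unique) <= MAX_APPROVERS:
        raise ValueError("select between 1 and 10 approvers")


def approval_outcome(decisions: Iterable[str]) -> str:
    """Return the outcome for decisions given in arrival order.

    Assumption: the first "approve" or "reject" settles the request.
    """
    for decision in decisions:
        if decision == "approve":
            return "approved"
        if decision == "reject":
            return "rejected"
    return "pending"
```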
Task Parameter Configuration
Flink Task Default Parameter Configuration
After you enable the Real-time Engine, enter Flink parameter configurations in the text box. When you create Flink tasks in this project, these parameters are used by default.
The parameters must be in the key:value format. Example: `taskmanager.numberOfTaskSlots:1`.
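To check a block of default parameters before you paste it into the text box, you can parse it locally. The following is a minimal sketch with a hypothetical function name. It assumes one `key:value` pair per line and splits on the first colon so that values can contain colons; the documentation does not state how Dataphin separates multiple parameters.

```python
def parse_flink_defaults(text: str) -> dict:
    """Parse key:value lines into a dict, splitting on the first colon."""
    params = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"line {line_no}: expected key:value, got {line!r}")
        params[key.strip()] = value.strip()
    return params


defaults = parse_flink_defaults(
    "taskmanager.numberOfTaskSlots:1\n"
    "parallelism.default:2\n"
)
```

Here `defaults` maps `taskmanager.numberOfTaskSlots` to `"1"` and `parallelism.default` to `"2"`. Values are kept as strings.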
More Settings
Default Feature Menu
After you select the business unit for the project, the system selects the corresponding feature menus based on the workspace type that you selected. You can change the selection as needed.
Note: Default feature menus are not supported if you select Hologres as the compute engine.
Production Environment Periodic Scheduling
Note: For Basic projects, this parameter is named Periodic Scheduling.
Enable, Tasks Are Automatically Scheduled: If enabled, new instances of periodic tasks in this project have the same status as the task. Historical instances are not affected.
Disable, Tasks Skip Execution: If disabled, new instances of periodic tasks in this project are paused. Historical instances are not affected. Disabling periodic scheduling may have serious consequences. Proceed with caution.
Note: In the Dev environment, the instance status changes from Not Run to Paused by default.
Click OK to create the project.
What to do next
After you create a project, you can navigate to the Data Development module to develop data. For more information, see Overview of Data Development.