This topic describes how to activate Dataphin.
Purchase description
You cannot purchase Dataphin instances with finance cloud accounts.
Dataphin supports auto-renewal. To enable this feature, select Auto-renewal in the software billing configuration section. Auto-renewal applies only to the Dataphin software. You must renew the underlying resources separately. The auto-renewal term is monthly and can be canceled at any time. For more information, see Renewal (Semi-managed).
After you purchase a Dataphin instance, service activation typically takes 2 to 3 hours. If the activation fails, contact the Dataphin operations and maintenance (O&M) and deployment team.
You can purchase and add value-added modules. For more information, see Billing description.
Notes
Before you activate Dataphin, take note of the following:
Before you make a purchase, contact Alibaba Cloud pre-sales consulting and provide your enterprise data construction requirements and background information. The pre-sales consulting team will confirm whether the current Dataphin version and its features meet your needs. After confirmation, they will grant you purchase permission.
Dataphin does not support unconditional refunds. Before you make a purchase, carefully confirm the Dataphin version.
To request a refund in special circumstances after the purchase, submit a ticket and contact your account manager. If your request meets the refund conditions, the system deducts fees based on your usage and processes the refund. Refunds are not provided for issues that are unrelated to the product.
Dataphin supports the subscription billing method.
If a Dataphin instance expires and is suspended, you can renew the instance only within the 15-day retention period.
Purchase a Dataphin instance
Log on to the Alibaba Cloud official website using your Alibaba Cloud account.
On the Alibaba Cloud website, hover over Products, then over Big Data Computing in the navigation pane, and click Intelligent Data Construction And Governance Dataphin under the Data Development And Service column.
On the Dataphin product page, click Management Console or Activate Now (Semi-managed) to open the purchase page.
On the purchase page, configure parameters such as the instance name, region, billing method, and subscription duration. You can also select value-added feature packages as needed.
Parameter
Description
Service Instance Name
Enter a name for the Dataphin instance. The name is displayed in the Dataphin console. The name cannot be changed after the instance is created. Enter the name with caution.
The name can contain digits, letters, hyphens (-), and underscores (_). The name can be up to 64 characters long.
The name must follow the naming convention for resources in the service.
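The character and length rules above can be checked before you submit the purchase form. The following is a minimal sketch, not part of Dataphin: the function name `is_valid_instance_name` is hypothetical, and it assumes that "letters" means ASCII letters and that an empty name is rejected.

```python
import re

# Pattern derived from the stated rule: digits, letters, hyphens (-),
# and underscores (_), up to 64 characters.
# Assumptions: letters are ASCII only, and the name must not be empty.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_instance_name(name: str) -> bool:
    """Return True if the name satisfies the stated character and length rules."""
    return _NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_instance_name("dataphin_prod-01")` passes, whereas a name that contains a space or is longer than 64 characters does not. Because the name cannot be changed after the instance is created, a check like this is worth running before you place the order.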
Region
Select the Region where you want to purchase the Dataphin instance. The supported Regions are China (Shanghai), China (Shenzhen), China (Beijing), China (Hangzhou), Singapore, and Germany (Frankfurt).
Software Billing Configuration
Billing Method
Only the Subscription billing method is supported.
Subscription Duration
The available subscription durations are 1 Month, 1 Year, 2 Years, and 3 Years. The default is 1 Month.
Auto-renewal
If you enable this feature, only the Dataphin software is automatically renewed. You must renew the underlying resources separately.
Dataphin Feature Selection
Data Processing Unit
The default value is 500. You can select a value from 500 to 10000 in increments of 500.
Real-time Development (Optional)
The Standard Edition specification is available. If you do not need this feature, select Do Not Select.
Artificial Intelligence For IT Operations (Optional)
This feature includes baseline monitoring and throttling configuration to ensure the timeliness of business data and system stability. This feature also reduces manual operations and maintenance (O&M) costs and improves O&M efficiency.
The default offering is 3 Baselines + 1 Throttling Rule (Free). The Standard Edition is available for higher specifications.
Data Standard (Optional)
The available specification is Standard Edition. If you do not need this feature, select Do Not Select.
Asset Quality (Optional)
The available editions are Domain Edition, Global Edition, and Domain Edition + Global Edition. If you do not need this feature, select Do Not Select.
Asset Security (Optional)
The available specification is Standard Edition. If you do not need this feature, select Do Not Select.
Resource Administration (Optional)
The available specification is Standard Edition. If you do not need this feature, select Do Not Select.
OpenAPI
This feature is available in the Standard Edition. If you do not need this feature, select Do Not Select.
DataService Studio (Optional)
The available specification is Api.base (maximum 500 QPS/50 Concurrent). If you do not need this feature, select Do Not Select.
Note: QPS stands for queries per second. This is the average number of API requests that can be processed per second.
Concurrency is the number of API requests that can be processed at the same time.
QPS, concurrency, and response time (RT) are closely related. If the concurrency is fixed, the QPS decreases as the API RT increases. The following formula expresses this relationship:
QPS = Concurrency / RT (in seconds).
Real-time Integration (Optional)
Standard Edition is the available specification. If you do not need this feature, select Do Not Select.
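The QPS formula in the DataService Studio note above (QPS = Concurrency / RT) can be explored with a short calculation. This is an illustrative sketch only: the function name `max_qps` is hypothetical, and the response times used in the usage note are assumed values, not measured Dataphin figures.

```python
def max_qps(concurrency: int, rt_seconds: float) -> float:
    """Apply QPS = Concurrency / RT, with RT expressed in seconds."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    if rt_seconds <= 0:
        raise ValueError("rt_seconds must be positive")
    return concurrency / rt_seconds
```

With the 50 concurrent requests of the Api.base specification, an assumed RT of 0.1 seconds gives about 500 QPS, which is the specification's stated maximum. If the RT rises to 0.2 seconds at the same concurrency, the result drops to about 250 QPS.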
Tag Factory (Optional)
The available specifications are Offline Edition, Real-time Edition, Offline Edition + Audience Group Selection, Offline Edition + Real-time Edition, or Offline Edition + Audience Group Selection + Audience Group Permissions. If you do not need this feature, select Do Not Select.
Row-level Permissions (Optional)
The specification for this feature is Standard Edition. If you do not need this feature, select Do Not Select.
Number Of Tenants
You can create 1 to 10 tenants in a single Dataphin instance, and each tenant can use a different compute engine.
Register Scheduling Cluster (Optional)
The available specification is Standard Edition. If you do not need this feature, select Do Not Select.
Metadata Acquisition (Optional)
Supported specifications include Default Version and Big Data Engine.
Metadata Management (Optional)
This feature is available in the Standard Edition. If you do not need this feature, select Do Not Select.
Asset Operation (Optional)
The Standard Edition specification is available. If you do not need this feature, select Do Not Select.
Dataphin Domain Name Settings
Product Endpoint
The endpoint used to access the Dataphin instance. The endpoint cannot be the same as the OpenAPI endpoint or the DataService Studio endpoint. For example, dataphin.yourcompany.com.
The endpoint can contain lowercase letters, digits, hyphens (-), and periods (.). A hyphen cannot be used alone or consecutively. A hyphen cannot be placed at the beginning or end of the endpoint. The endpoint can be up to 63 characters long.
OpenAPI Endpoint
This configuration item is available only when the OpenAPI feature is enabled.
This is the endpoint used to access the Dataphin instance through OpenAPI. The endpoint cannot be the same as the product endpoint or the DataService Studio endpoint. For example, dataphin-openapi.yourcompany.com.
The naming convention is the same as that for the product endpoint.
DataService Studio Endpoint
This configuration item is available only when the DataService Studio feature is enabled.
The endpoint that points to the DataService Studio application. The endpoint cannot be the same as the product endpoint or the OpenAPI endpoint. For example, dataphin-dataservice.yourcompany.com.
The naming convention is the same as that for the product endpoint.
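The endpoint rules above (allowed characters, hyphen placement, length, and distinct values for the three endpoints) can be sketched as a local pre-check. The function names `endpoint_problems` and `check_endpoints` are hypothetical. The sketch also assumes that "a hyphen cannot be used alone or consecutively" means the endpoint must not consist of a single hyphen or contain `--`. It does not replace the validation performed on the purchase page.

```python
import re

# Allowed characters and length from the stated rule:
# lowercase letters, digits, hyphens (-), and periods (.), up to 63 characters.
_ALLOWED_RE = re.compile(r"[a-z0-9.-]{1,63}")


def endpoint_problems(endpoint: str) -> list[str]:
    """Return the list of rule violations found in one endpoint."""
    problems = []
    if not _ALLOWED_RE.fullmatch(endpoint):
        problems.append("must be 1 to 63 characters from a-z, 0-9, '-', and '.'")
    if endpoint.startswith("-") or endpoint.endswith("-"):
        problems.append("must not start or end with a hyphen")
    if "--" in endpoint:  # assumed reading of "consecutively"
        problems.append("must not contain consecutive hyphens")
    return problems


def check_endpoints(endpoints: dict[str, str]) -> dict[str, list[str]]:
    """Validate each endpoint and flag values that repeat an earlier one."""
    report = {label: endpoint_problems(value) for label, value in endpoints.items()}
    seen: dict[str, str] = {}
    for label, value in endpoints.items():
        if value in seen:
            report[label].append(f"must differ from the {seen[value]} endpoint")
        else:
            seen[value] = label
    return report
```

Passing the three example endpoints from this table, keyed as `product`, `openapi`, and `dataservice`, yields an empty problem list for each key. Reusing the product endpoint for OpenAPI is reported under the `openapi` key.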
Enable Public Network Access
If you enable this feature, the system automatically creates a pay-by-data-transfer Elastic IP address (EIP) for the Ingress load balancer (LB) instance. After you bind the EIP to the domain name in your local hosts file, you can access the Dataphin instance from outside your office network.
If you want to allow access to the Dataphin instance only from your office network, disable this feature.
Key Configuration
Key Pair Name
The key pair used to log on to the Elastic Compute Service (ECS) instances.
Network Configuration
Zone 1, Zone 2
All underlying resources of Dataphin are deployed across multiple zones to provide a disaster recovery solution.
VPC ID
Select a Virtual Private Cloud (VPC) in which you want to deploy the Dataphin instance. Select the VPC with caution. You cannot change the VPC after the instance is created.
VSwitch ID 1, VSwitch ID 2
Select two vSwitches that belong to the selected VPC.
Pod CIDR Block, Service CIDR Block
Container Service for Kubernetes (ACK) is one of the underlying resources on which Dataphin depends. When you deploy an ACK cluster, the Flannel network plug-in is used, and you must specify a CIDR block. For more information, see Comparison between Terway and Flannel.
The Pod CIDR block and the Service CIDR block are virtual network segments. These CIDR blocks cannot overlap with the vSwitch CIDR block of the VPC or with each other. For example, if the VPC uses the 172.16.0.0/12 CIDR block, you cannot use CIDR blocks such as 172.16.0.0/16 or 172.17.0.0/16 for Kubernetes pods because these CIDR blocks are included in 172.16.0.0/12.
The number of reserved IP addresses in the Pod and Service CIDR blocks affects the concurrency of Dataphin nodes. We recommend that you reserve at least 2,048 IP addresses in each block, which means the subnet mask length should not be greater than 21. Select the CIDR blocks with caution because you cannot change them after the instance is created. For more information, see Flannel network mode.
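The overlap and size rules for the Pod and Service CIDR blocks can be checked with Python's standard `ipaddress` module before you submit the order, which is useful because the blocks cannot be changed later. This is a minimal sketch under stated assumptions: the function name `cidr_problems` is hypothetical, the CIDR values in the usage note are illustrative, and `network_cidrs` stands for whichever VPC or vSwitch blocks you need to avoid.

```python
import ipaddress

# From the text: the subnet mask length should not be greater than 21
# (a /21 block contains 2,048 addresses).
MAX_PREFIX_LEN = 21


def cidr_problems(network_cidrs: list[str], pod_cidr: str, service_cidr: str) -> list[str]:
    """Return rule violations for the proposed Pod and Service CIDR blocks."""
    pod = ipaddress.ip_network(pod_cidr)
    service = ipaddress.ip_network(service_cidr)
    blocks = (("Pod", pod), ("Service", service))
    problems = []

    # The Pod and Service blocks must not overlap with each other.
    if pod.overlaps(service):
        problems.append("Pod and Service CIDR blocks overlap")

    # Neither block may overlap with the VPC or vSwitch blocks.
    for raw in network_cidrs:
        network = ipaddress.ip_network(raw)
        for label, block in blocks:
            if block.overlaps(network):
                problems.append(f"{label} CIDR block {block} overlaps {network}")

    # Each block should be large enough (prefix length of 21 or shorter).
    for label, block in blocks:
        if block.prefixlen > MAX_PREFIX_LEN:
            problems.append(f"{label} CIDR block {block} is smaller than /{MAX_PREFIX_LEN}")
    return problems
```

With a VPC block of 172.16.0.0/12, a Pod block of 172.17.0.0/16 is reported as overlapping, which matches the example above. A Pod block of 10.0.0.0/16 combined with a Service block of 192.168.0.0/20 produces no problems.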
Auto-configure NAT Gateway
Allows cluster nodes and applications to access the public network.
If you enable this feature and a NAT Gateway already exists in the selected VPC, ACK uses that gateway by default and automatically configures a source network address translation (SNAT) rule. If no NAT Gateway exists in the selected VPC, ACK automatically creates a NAT Gateway and configures an SNAT rule.
If you disable this feature, you must ensure that the ACK cluster can access the public network. Dataphin deployment requires public network access to pull images. Otherwise, the deployment may fail.
Advanced configuration
Application Node Pool Instance Type
Select the appropriate node specifications and quantity based on the application deployment mode. Avoid mixing instance types of different sizes. All supported instance types provide 16 vCPUs and 128 GiB of memory: ecs.r9i.4xlarge, ecs.r8i.4xlarge, ecs.u2i-c1m8.4xlarge, ecs.r7.4xlarge, ecs.u1-c1m8.4xlarge, and ecs.hfr7.4xlarge.
Note: The maximum resource requirement for high-availability mode is 40 vCPUs and 320 GiB of memory. The maximum resource requirement for non-high-availability mode is 20 vCPUs and 160 GiB of memory.
Initial Number Of Nodes In The Application Node Pool
Configure the initial number of nodes in the node pool based on the application deployment mode. The recommended number is 3, and the minimum is 2. If the number of nodes is 2, high availability is not guaranteed.
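The node counts above can be related to the stated maximum resource requirements with simple arithmetic. The following sketch compares raw pool totals against the two modes. The names `Resources`, `REQUIREMENTS`, and `pool_covers` are hypothetical, and the comparison is an assumption for illustration: it ignores operating system and Kubernetes overhead, so treat the result as a rough lower bound rather than a sizing guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpus: int
    memory_gib: int


# Each supported application node type provides 16 vCPUs and 128 GiB.
NODE = Resources(vcpus=16, memory_gib=128)

# Maximum resource requirements stated in the note above.
REQUIREMENTS = {
    "ha": Resources(vcpus=40, memory_gib=320),
    "non_ha": Resources(vcpus=20, memory_gib=160),
}


def pool_covers(node_count: int, mode: str, node: Resources = NODE) -> bool:
    """Return True if the raw pool totals meet the requirement for the mode."""
    need = REQUIREMENTS[mode]
    return (
        node_count * node.vcpus >= need.vcpus
        and node_count * node.memory_gib >= need.memory_gib
    )
```

Three nodes provide 48 vCPUs and 384 GiB, which covers the high-availability requirement. Two nodes provide 32 vCPUs and 256 GiB, which covers only the non-high-availability requirement. This is consistent with the recommendation of 3 nodes and the caveat for 2 nodes.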
Scheduling Node Pool Instance Type
Select an instance type for the node pool that meets your scheduling task requirements. All supported instance types provide 24 vCPUs and 96 GiB of memory: ecs.g9i.6xlarge, ecs.g8i.6xlarge, ecs.g7.6xlarge, ecs.hfg7.6xlarge, and ecs.hfg6.6xlarge.
Initial Number Of Nodes In The Scheduling Node Pool
Configure the initial number of nodes in the node pool based on the number of scheduling tasks required. The recommended number is 2, and the minimum is 1. If the number of nodes is 1, high availability is not guaranteed.
Number Of Replicas For Dataphin Application Deployment
Before increasing the number of replicas, ensure that the application node pool has sufficient resources. The default value is 2.
PostgreSQL Specifications
Select a specification to set the maximum number of connections for the PostgreSQL database. The supported specifications are 4 VCPU, 16 GiB (1,600 Maximum Connections), 8 VCPU, 16 GiB (1,600 Maximum Connections), and 16 VCPU, 32 GiB (3,200 Maximum Connections).
Initial Disk Size Of The PostgreSQL Database (GB)
This is mainly used to store scheduling instances and business metadata. The default size is 400 GB. You can configure a storage space between 200 GB and 400 GB, with an increment of 5 GB.
After carefully reviewing the purchase information, click Next: Confirm Order.
On the Confirm Order page, confirm the instance type for the Dataphin instance. Then, click Intelligent Data Construction and Governance Service Agreement next to Terms of Service and read the service agreement. After you agree to the terms, select I have read and agree to the Intelligent Data Construction and Governance Service Agreement and click Pay.
What to do next
After you activate Dataphin, you must obtain the initial Alibaba Cloud account and the IP address, and then bind the host before you start data development. For more information, see Perform a cold start after deployment.