This topic describes how to activate the Dataphin service so you can use its features.
Purchase description
Dataphin instances cannot be purchased with finance cloud accounts.
Dataphin supports auto-renewal. To enable this feature, select Auto-renewal in the Software Billing Configuration section. Auto-renewal applies only to the Dataphin software. You must renew the underlying resources separately. The auto-renewal cycle is monthly and can be canceled at any time. For more information, see Renewal (Semi-managed).
After you purchase a Dataphin instance, service activation typically takes 2 to 3 hours. If the activation fails, contact the Dataphin operations and maintenance (O&M) and deployment team.
You can purchase and add various value-added modules. For more information, see Billing description.
Notes
Before you activate Dataphin, note the following items:
Before you make a purchase, contact Alibaba Cloud pre-sales consulting and provide your enterprise's data infrastructure requirements and background information. The pre-sales consulting team will confirm whether the current Dataphin version and its features meet your needs and then grant you purchase permission.
Dataphin does not support unconditional refunds. Before you make a purchase, carefully confirm the Dataphin version.
To request a refund in special circumstances after the purchase, submit a ticket and contact your account manager. If your request meets the refund conditions, the system deducts fees based on your usage and processes the refund. Refunds are not provided for issues unrelated to the product.
Dataphin supports the subscription billing method.
If a Dataphin instance expires and is suspended, you can renew the instance only within the 15-day retention period.
Purchase a Dataphin instance
Log on to the Alibaba Cloud official website with your Alibaba Cloud account.
On the Alibaba Cloud website, hover over Products, then over Big Data Computing in the navigation pane on the left, and click Data Development And Service under Intelligent Data Construction And Governance Dataphin.
On the Dataphin product page, click Management Console or Activate Now (Semi-managed) to go to the Dataphin purchase page.
On the purchase page, configure parameters such as the instance name, region, billing method, and subscription duration. You can also select value-added feature packages as needed.
Parameter
Description
Service Instance Name
Enter a name for the Dataphin instance. The name is displayed in the Dataphin console. The name cannot be changed after the instance is created. We recommend that you choose the name with caution.
The name can contain digits, letters, hyphens (-), and underscores (_). The name can be up to 64 characters long.
The name must follow the naming convention for resources in the service.
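The two explicit rules for the instance name (allowed characters and maximum length) can be checked before you submit the order. The following is a minimal Python sketch, not part of Dataphin. The function name `is_valid_instance_name` is hypothetical. The sketch assumes that "letters" means ASCII letters and that the name must contain at least one character, and it does not cover any additional service-wide naming convention.

```python
import re

# Hypothetical pre-check covering only the stated rules:
# digits, ASCII letters, hyphens (-), and underscores (_), 1 to 64 characters.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_instance_name(name: str) -> bool:
    """Return True if name uses only allowed characters and is 1 to 64 characters long."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["dataphin_prod-01", "dataphin prod", "x" * 65]:
        print(repr(candidate[:20]), is_valid_instance_name(candidate))
```

A name that passes this check can still be rejected if the service applies further conventions.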
Region
Select the region where you want to purchase the Dataphin instance. The supported regions are China (Hangzhou), China (Shanghai), China (Shenzhen), China (Guangzhou), China (Beijing), China (Hohhot), China (Ulanqab), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Indonesia (Jakarta), and US (Virginia).
Software Billing Configuration
Billing Method
The only supported billing method is Subscription.
Subscription Duration
The available subscription durations are 1 Month, 1 Year, 2 Years, and 3 Years. The default duration is 1 Month.
Auto-renewal
If you enable this feature, only the Dataphin software is automatically renewed. You must renew the underlying resources separately.
Dataphin Feature Selection
Data Processing Unit
The default data processing unit (DPU) is 500. The available specifications are 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, and 10000.
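The DPU options form a sequence from 500 to 10000 in steps of 500, which gives 20 specifications. As a hypothetical sizing aid (the purchase page does not provide this), the following Python sketch lists the allowed values and picks the smallest option that covers an estimated requirement. How you estimate the required DPU is an assumption that you supply.

```python
from typing import Optional

# The 20 DPU specifications listed above: 500, 1000, ..., 10000.
DPU_OPTIONS = list(range(500, 10001, 500))


def is_valid_dpu(dpu: int) -> bool:
    """Return True if dpu is one of the listed specifications."""
    return dpu in DPU_OPTIONS


def smallest_dpu_for(required: int) -> Optional[int]:
    """Hypothetical helper: smallest listed specification >= required, or None if none fits."""
    for option in DPU_OPTIONS:
        if option >= required:
            return option
    return None
```

For example, an estimated requirement of 1,200 DPU maps to the 1500 specification.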
Real-time Development (Optional)
Standard Edition is available. If you do not need this feature, select Do Not Select.
Artificial Intelligence For IT Operations (Optional)
This feature includes baseline monitoring and throttling configuration to ensure the timeliness of business data and system stability. This feature also reduces manual O&M costs and improves O&M efficiency.
The default offering is 3 Baselines + 1 Throttling Rule (Free). The Standard Edition offers higher specifications.
Data Standard (Optional)
Standard Edition is the only available specification. If you do not need this feature, select Do Not Select.
Asset Quality (Optional)
Available editions include Domain Edition, Global Edition, and Domain Edition + Global Edition. Select Do Not Select if you do not need this feature.
Asset Security (Optional)
This feature is available in the Standard Edition. If you do not need this feature, you can select Do Not Select.
Resource Management (Optional)
The Standard Edition specification is available. If you do not need this feature, you can select Do Not Select.
OpenAPI (Optional)
This feature is available in the Standard Edition. If you do not need it, select Do Not Select.
DataService Studio (Optional)
The available specification is api.base (maximum 500 QPS and 50 concurrent requests). If you do not need this feature, select Do Not Select.
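The api.base limits interact through the relationship QPS = Concurrency / RT (in seconds). As a worked example, with 50 concurrent requests the average RT must be 50 / 500 = 0.1 seconds or less to reach 500 QPS, and an average RT of 0.2 seconds yields at most 50 / 0.2 = 250 QPS. The following Python sketch applies that formula. It assumes that the 500 QPS figure acts as a hard ceiling, and the function names are hypothetical.

```python
# api.base limits from the specification above.
API_BASE_MAX_QPS = 500
API_BASE_MAX_CONCURRENCY = 50


def achievable_qps(rt_seconds: float,
                   concurrency: int = API_BASE_MAX_CONCURRENCY,
                   qps_cap: float = API_BASE_MAX_QPS) -> float:
    """Estimate QPS as concurrency / RT, capped at the specification's QPS limit."""
    if rt_seconds <= 0:
        raise ValueError("RT must be a positive number of seconds")
    return min(qps_cap, concurrency / rt_seconds)


def max_rt_for_cap(concurrency: int = API_BASE_MAX_CONCURRENCY,
                   qps_cap: float = API_BASE_MAX_QPS) -> float:
    """Largest average RT (in seconds) at which the QPS cap is still reachable."""
    return concurrency / qps_cap
```

If your APIs respond more slowly than `max_rt_for_cap()`, the concurrency limit, not the QPS limit, is what bounds throughput.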
Note: QPS stands for queries per second. It represents the average number of API requests that can be processed per second.
Concurrency is the number of API requests that can be processed at the same time.
QPS, concurrency, and response time (RT) are closely related. If the concurrency is fixed, the QPS decreases as the API RT increases. The following formula shows this relationship:
QPS = Concurrency / RT (in seconds)
Real-time Integration (Optional)
This feature is available in the Standard Edition. If you do not need this feature, select Do Not Select.
Tag Factory (Optional)
The following specifications are available: Offline Edition, Real-time Edition, Offline Edition + Audience Group Selection, Offline Edition + Real-time Edition, and Offline Edition + Audience Group Selection + Audience Group Permissions. If you do not need this feature, select Do Not Select.
Row-level Permissions (Optional)
This feature is available in the Standard Edition. If you do not want this feature, select Do Not Select.
Number Of Tenants
You can create 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 tenants in a single Dataphin instance, and each tenant can use a different compute engine.
Register Scheduling Cluster (Optional)
This feature is available in the Standard Edition. If you do not need this feature, you can select Do Not Select.
Metadata Acquisition (Optional)
The supported specifications are Default Version and Big Data Engine.
Metadata Management (Optional)
The Standard Edition is the available specification. If you do not need this feature, select Do Not Select.
Asset Operation (Optional)
The only available specification is Standard Edition. To opt out of this feature, select Do Not Select.
Dataphin Domain Name Settings
Product Endpoint
The endpoint used to access the Dataphin product instance. The endpoint cannot be the same as the OpenAPI endpoint or the DataService Studio endpoint. For example, dataphin.yourcompany.com.
The endpoint can contain lowercase letters, digits, hyphens (-), and periods (.). A hyphen cannot be used alone or consecutively. A hyphen cannot be placed at the beginning or end of the endpoint. The endpoint can be up to 63 characters long.
OpenAPI Endpoint
This configuration item is available only if the OpenAPI feature is enabled.
The endpoint used to access the Dataphin product instance through OpenAPI. The endpoint cannot be the same as the product endpoint or the DataService Studio endpoint. For example, dataphin-openapi.yourcompany.com.
The naming convention is the same as that for the product endpoint.
Data Service Endpoint
You can use this configuration item only when the DataService Studio feature is enabled.
The endpoint that points to the DataService Studio application. The endpoint cannot be the same as the product endpoint or the OpenAPI endpoint. For example, dataphin-dataservice.yourcompany.com.
The naming convention is the same as that for the product endpoint.
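The endpoint naming rules, together with the requirement that the product, OpenAPI, and DataService Studio endpoints differ from each other, can be checked with a short script. This Python sketch is illustrative only. The function names are hypothetical, and the sketch applies the leading and trailing hyphen rule to the whole endpoint string, as the text states, rather than to each dot-separated label.

```python
from __future__ import annotations

import re

# Lowercase letters, digits, hyphens, and periods; 1 to 63 characters.
_ALLOWED = re.compile(r"[a-z0-9.-]{1,63}")


def endpoint_errors(endpoint: str) -> list[str]:
    """Return the naming rules that endpoint violates (empty list if none)."""
    errors = []
    if not _ALLOWED.fullmatch(endpoint):
        errors.append("only a-z, 0-9, '-' and '.' are allowed, up to 63 characters")
    if "--" in endpoint:
        errors.append("consecutive hyphens are not allowed")
    if endpoint.startswith("-") or endpoint.endswith("-"):
        errors.append("a hyphen cannot start or end the endpoint")
    return errors


def validate_endpoints(endpoints: dict[str, str]) -> list[str]:
    """Check each configured endpoint and ensure no two roles share an endpoint."""
    problems = []
    seen: dict[str, str] = {}
    for role, endpoint in endpoints.items():
        problems.extend(f"{role}: {err}" for err in endpoint_errors(endpoint))
        if endpoint in seen:
            problems.append(f"{role}: same endpoint as {seen[endpoint]}")
        else:
            seen[endpoint] = role
    return problems


print(validate_endpoints({
    "product": "dataphin.yourcompany.com",
    "openapi": "dataphin-openapi.yourcompany.com",
    "dataservice": "dataphin-dataservice.yourcompany.com",
}))  # prints []
```

Only include the OpenAPI and DataService Studio entries if those features are enabled.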
Enable Public Network Access
If you enable this feature, the system automatically creates a pay-by-data-transfer elastic IP address (EIP) for the Ingress load balancer (LB) instance. After you bind the EIP to the domain name in your local hosts file, you can access the Dataphin instance from outside your office network.
If you want to allow access to the Dataphin instance only from your office network, disable this feature.
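Binding the EIP in a hosts file means adding one line per endpoint that maps the IP address to the domain name. The hosts file is typically /etc/hosts on Linux and macOS and C:\Windows\System32\drivers\etc\hosts on Windows. The following Python sketch only formats those lines. The helper name is hypothetical, and 203.0.113.10 is a placeholder from the documentation address range, not a real EIP.

```python
import ipaddress
from typing import List


def hosts_entries(eip: str, endpoints: List[str]) -> str:
    """Format hosts-file lines that map each endpoint to the given EIP."""
    ipaddress.ip_address(eip)  # raises ValueError if eip is not a valid IP address
    return "\n".join(f"{eip} {endpoint}" for endpoint in endpoints)


print(hosts_entries("203.0.113.10", [
    "dataphin.yourcompany.com",
    "dataphin-openapi.yourcompany.com",
]))
```

Replace the placeholder with the EIP that the system creates for your Ingress LB instance, and list the endpoints you actually configured.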
Key Configuration
Key Pair Name
The key pair used to log on to the ECS instance.
Network Configuration
Zone 1, Zone 2
All underlying resources of Dataphin are deployed across multiple zones to provide a disaster recovery solution.
VPC ID
Select a VPC in which to deploy the Dataphin instance. Choose the VPC with caution. You cannot change the VPC after the instance is created.
VSwitch ID 1, VSwitch ID 2
Select two vSwitches that belong to the selected VPC.
Pod CIDR Block, Service CIDR
Container Service for Kubernetes (ACK) is one of the underlying resources on which Dataphin depends. When you deploy an ACK cluster, the Flannel network plug-in is used, so you must specify the Pod CIDR block and the Service CIDR block. For more information, see Comparison between Terway and Flannel.
The Pod CIDR block and Service CIDR block are virtual CIDR blocks. These CIDR blocks cannot overlap with the vSwitch CIDR block of the VPC or with each other. For example, if the VPC uses the 172.16.0.0/12 CIDR block, you cannot use CIDR blocks such as 172.16.0.0/16 or 172.17.0.0/16 for Kubernetes pods because these CIDR blocks are subnets of 172.16.0.0/12.
The number of IP addresses reserved in the Pod CIDR and Service CIDR blocks affects the concurrency of Dataphin tasks. We recommend that you reserve at least 2,048 IP addresses, which means the subnet mask length must not be greater than 21 (a /21 block contains exactly 2,048 addresses). Select the CIDR blocks with caution because they cannot be changed after the instance is created. For more information, see Flannel network mode.
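The overlap and size rules for the Pod and Service CIDR blocks can be checked with Python's standard `ipaddress` module before you place the order. This is a minimal sketch with a hypothetical function name. It checks against the VPC CIDR block, which also covers every vSwitch in that VPC, and it assumes the 2,048-address recommendation applies to each block separately.

```python
import ipaddress
from typing import List

MIN_ADDRESSES = 2048  # a /21 block or larger


def cidr_problems(vpc_cidr: str, pod_cidr: str, service_cidr: str) -> List[str]:
    """Return a list of rule violations for the proposed Pod and Service CIDR blocks."""
    vpc = ipaddress.ip_network(vpc_cidr)
    pod = ipaddress.ip_network(pod_cidr)
    svc = ipaddress.ip_network(service_cidr)
    problems = []
    if pod.overlaps(vpc):
        problems.append(f"Pod CIDR {pod} overlaps VPC CIDR {vpc}")
    if svc.overlaps(vpc):
        problems.append(f"Service CIDR {svc} overlaps VPC CIDR {vpc}")
    if pod.overlaps(svc):
        problems.append(f"Pod CIDR {pod} overlaps Service CIDR {svc}")
    for label, net in (("Pod", pod), ("Service", svc)):
        if net.num_addresses < MIN_ADDRESSES:
            problems.append(f"{label} CIDR {net} has only {net.num_addresses} addresses")
    return problems


# 172.17.0.0/16 lies inside 172.16.0.0/12, so the first call reports an overlap.
print(cidr_problems("172.16.0.0/12", "172.17.0.0/16", "192.168.0.0/20"))
print(cidr_problems("172.16.0.0/12", "10.0.0.0/16", "192.168.0.0/20"))
```

`ip_network` raises `ValueError` for a malformed block or one with host bits set, which is a useful early signal as well.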
Auto-configure NAT Gateway
Allows cluster nodes and applications to access the public network.
If you enable this feature and a NAT Gateway already exists in the selected VPC, ACK uses that gateway by default and automatically configures an SNAT rule. If no NAT Gateway exists in the selected VPC, ACK automatically creates a NAT Gateway and configures an SNAT rule.
If you disable this feature, you must ensure that the ACK cluster can access the public network. Dataphin deployment requires public network access to pull images; without it, the deployment may fail.
Advanced configuration
Application Node Pool Instance Type
Select the appropriate node specifications and number of nodes based on the application deployment mode. Avoid mixing instance types of different sizes. The supported specifications are 16 vCPUs, 128 GiB (ecs.r9i.4xlarge), 16 vCPUs, 128 GiB (ecs.r8i.4xlarge), 16 vCPUs, 128 GiB (ecs.u2i-c1m8.4xlarge), 16 vCPUs, 128 GiB (ecs.r7.4xlarge), 16 vCPUs, 128 GiB (ecs.u1-c1m8.4xlarge), and 16 vCPUs, 128 GiB (ecs.hfr7.4xlarge).
Note: The maximum resource requirement for high-availability mode is 40 vCPUs and 320 GiB of memory. The maximum resource requirement for non-high-availability mode is 20 vCPUs and 160 GiB of memory.
Initial Number Of Nodes In The Application Node Pool
Configure the initial number of nodes in the node pool based on the application deployment mode. The recommended number is 3, and the minimum is 2. If the number of nodes is 2, high availability is not guaranteed.
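The recommended and minimum node counts follow from dividing the peak resource requirement by the per-node size of the supported 16 vCPU, 128 GiB instance types: high-availability mode needs ceil(40 / 16) = 3 nodes by CPU and ceil(320 / 128) = 3 by memory, and non-high-availability mode needs 2 by either measure. The Python sketch below reproduces that arithmetic. It is an approximation that ignores the CPU and memory that each node reserves for the operating system and Kubernetes components, so treat the result as a lower bound.

```python
import math

# Per-node size of the supported application node pool instance types.
NODE_VCPUS = 16
NODE_MEMORY_GIB = 128

# Peak requirements (vCPUs, GiB of memory) from the note above.
PEAK_REQUIREMENTS = {
    "high-availability": (40, 320),
    "non-high-availability": (20, 160),
}


def nodes_needed(mode: str) -> int:
    """Lower bound on application nodes: the larger of the CPU-based and memory-based counts."""
    vcpus, memory_gib = PEAK_REQUIREMENTS[mode]
    return max(math.ceil(vcpus / NODE_VCPUS), math.ceil(memory_gib / NODE_MEMORY_GIB))


for mode in PEAK_REQUIREMENTS:
    print(mode, nodes_needed(mode))
```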
Scheduling Node Pool Instance Type
Select an instance type for the node pool that meets your scheduling task requirements. The supported instance types are 24 vCPUs, 96 GiB (ecs.g9i.6xlarge), 24 vCPUs, 96 GiB (ecs.g8i.6xlarge), 24 vCPUs, 96 GiB (ecs.g7.6xlarge), 24 vCPUs, 96 GiB (ecs.hfg7.6xlarge), and 24 vCPUs, 96 GiB (ecs.hfg6.6xlarge).
Initial Number Of Nodes In The Scheduling Node Pool
Configure the initial number of nodes in the node pool based on the number of scheduling tasks required. The recommended number is 2, and the minimum is 1. If the number of nodes is 1, high availability is not guaranteed.
Number Of Replicas For Dataphin Application Deployment
Before increasing the number of replicas, ensure that the application node pool has sufficient resources. The default value is 2.
PostgreSQL Specifications
Select one of the following specifications, which determine the maximum number of connections to the PostgreSQL database: 4 vCPUs, 16 GiB (1,600 maximum connections), 8 vCPUs, 16 GiB (1,600 maximum connections), or 16 vCPUs, 32 GiB (3,200 maximum connections).
Initial Disk Size Of The PostgreSQL Database (GB)
The database mainly stores scheduling instances and business metadata. The default size is 400 GB. You can set a size from 200 GB to 400 GB in 5 GB increments.
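The allowed disk sizes are 200, 205, 210, and so on up to 400 GB. A minimal Python check for a planned value might look like the following; the function name is hypothetical and the sketch covers only the stated range and increment.

```python
# Stated range and increment for the initial PostgreSQL disk size.
PG_DISK_MIN_GB = 200
PG_DISK_MAX_GB = 400
PG_DISK_STEP_GB = 5


def is_valid_pg_disk_size(size_gb: int) -> bool:
    """Return True if size_gb is within 200-400 GB and on a 5 GB increment."""
    return (PG_DISK_MIN_GB <= size_gb <= PG_DISK_MAX_GB
            and (size_gb - PG_DISK_MIN_GB) % PG_DISK_STEP_GB == 0)
```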
Carefully review the purchase information. If the information is correct, click Next: Confirm Order.
On the Confirm Order page, confirm the specifications of the Dataphin instance. Then, click Terms of Service and then Intelligent Data Construction and Governance Service Agreement. Carefully read the service agreement. If you agree to the terms, select I have read and agree to the Intelligent Data Construction and Governance Service Agreement and click Pay.
What to do next
After you activate Dataphin, you must obtain the initial Alibaba Cloud account and IP address and bind the host for subsequent data development. For more information, see Perform a cold start after deployment.