Session clusters are suitable for development and test environments, not production environments. You can deploy or debug jobs in a session cluster to improve the resource utilization of the JobManager and accelerate job startup.

Background information

Fully managed Flink supports per-job clusters and session clusters. The two types of clusters have the following differences:
  • Per-Job Clusters: By default, fully managed Flink deploys or debugs jobs in a per-job cluster. Each job requires a separate JobManager to achieve resource isolation between jobs. Therefore, the resource utilization of JobManagers for jobs that process a small amount of data is low. This type of cluster is suitable for jobs that consume a large number of resources or jobs that run in a continuous and stable manner.
  • Session Clusters: This type of cluster allows multiple jobs to use the same JobManager, which increases the resource utilization of the JobManager. If multiple jobs run on the same JobManager, the stability of jobs is affected. Session clusters do not support the monitoring and alerting feature for a single job. Therefore, session clusters are suitable only when you test jobs.
    Note
    • You can configure multiple session clusters for each project. However, you can enable Use for SQL Editor Previews for only one session cluster. For more information about this configuration item, see the description of Use for SQL Editor Previews in this topic.
    • When you create a session cluster, the cluster resources are consumed regardless of whether you use the session cluster. The resource consumption is based on the configurations that you select when you create the cluster.
    • An additional 0.5 compute units (CUs) is consumed when a session cluster that uses Ververica Runtime (VVR) 3.0.4 or later runs.

Limits

Session clusters have the following limits:
  • Metrics of session clusters cannot be displayed.
  • Session clusters do not support the monitoring and alerting feature.
  • Session clusters do not support the Autopilot feature.

Precautions

  • Session clusters are suitable for development and test environments. We recommend that you do not use session clusters in production environments. If you use session clusters in production environments, the following stability issues may occur:
    • If a JobManager is faulty, all jobs of a cluster that runs on the JobManager are affected.
    • If a TaskManager is faulty, the jobs that have tasks running on the TaskManager are affected.
    • If processes are not isolated for tasks that run on the same TaskManager, the tasks may be affected by each other.
  • We recommend that you do not run jobs in a session cluster for which Use for SQL Editor Previews is enabled. If you run jobs in a session cluster for which Use for SQL Editor Previews is enabled, you must stop the session cluster and manually configure the Flink engine when a new version is released. Otherwise, an error is returned during local debugging.
  • If the session cluster uses the default configurations, take note of the following points:
    • For small jobs, we recommend that a cluster run no more than 100 such jobs.
    • For complex jobs, we recommend that the parallelism of a single job be no more than 512 and that a cluster run no more than 32 medium-sized jobs that each have a parallelism of 64. Otherwise, issues such as heartbeat timeouts may occur and affect the stability of the cluster. In this case, you must increase the heartbeat interval and heartbeat timeout period.
    • If you want to run more tasks at the same time, you must increase the resource configuration of the session cluster.
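As a rough illustration of increasing the heartbeat interval and heartbeat timeout period, the following Python sketch renders the two settings as key: value lines that could be pasted into Additional Configuration. The helper name heartbeat_config is hypothetical. The sketch assumes the Apache Flink convention that heartbeat.interval and heartbeat.timeout are given in milliseconds, and it uses the 10-second and 50-second minimums recommended in the JobManager settings of this topic as defaults.

```python
def heartbeat_config(interval_s: float = 10, timeout_s: float = 50) -> str:
    """Render heartbeat settings as `key: value` lines.

    Hypothetical helper for illustration only. Assumes that
    heartbeat.interval and heartbeat.timeout take milliseconds.
    """
    if timeout_s <= interval_s:
        # A timeout that is not longer than the interval would expire
        # before the next heartbeat could arrive.
        raise ValueError("heartbeat timeout must be longer than the heartbeat interval")
    lines = [
        f"heartbeat.interval: {int(interval_s * 1000)}",
        f"heartbeat.timeout: {int(timeout_s * 1000)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values for a busier cluster; tune them to your workload.
    print(heartbeat_config(interval_s=20, timeout_s=100))
```

The values passed in the example are placeholders, not recommendations from this topic. Increase them gradually as the number of TaskManagers and jobs grows.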

Create a session cluster

  1. Log on to the Realtime Compute for Apache Flink console.
  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
  3. In the left-side navigation pane, choose Administration > Session Clusters.
  4. In the upper-right corner of the page, click Create Session Cluster.
  5. Configure the parameters.
    The following table describes the parameters.
    Section Parameter Description
    Standard Name The name of the cluster.
    State The desired state of the cluster. Valid values:
    • STOPPED: The cluster is stopped after it is configured, and the jobs in the cluster are also stopped.
    • RUNNING: The cluster keeps running after it is configured.
    Use for SQL Editor Previews Specifies whether to use this session cluster for SQL previews. For more information, see Debug a job.
    Note You can enable Use for SQL Editor Previews for only one session cluster in a project. If you turn on this switch for the current cluster, the setting of another cluster for which this feature is enabled becomes invalid.
    Label key You can configure labels for jobs in the Labels section. This allows you to find a job on the Overview page in an efficient manner.
    Label value N/A.
    Configuration Engine Version The version of the Flink engine that is used by the current job.
    Note For Python API jobs, you must select vvr-2.1.4-flink 1.11 or later.
    Flink Restart Policy Valid values:
    • No Restarts: No jobs are restarted.
    • Fixed Delay: Jobs are restarted with a delay. The delay period is fixed. If you select this option, you must also configure Number of Restart Attempts and Delay between Restart Attempts.
    • Failure Rate: Jobs are restarted based on the failure rate. If you select this option, you must also configure Failure Rate Interval, Max Failures per Interval, and Delay between Restart Attempts.
    Note If you leave this parameter empty, the default Apache Flink restart policy is used. In this case, if a task fails and checkpointing is disabled, the JobManager is not restarted. If you enable checkpointing, the JobManager is restarted.
    Additional Configuration Configure other Flink settings, such as taskmanager.numberOfTaskSlots: 1.
    Resources Number of TaskManagers By default, the value is the same as the parallelism.
    Job Manager CPUs Default value: 1.
    JobManager Memory The minimum value is 1 GiB. We recommend that you set this parameter to 2 GiB. We recommend that you use GiB or MiB as the unit. For example, you can set this parameter to 1024 MiB or 1.5 GiB.
    We recommend that you configure JobManager resources and heartbeat-related parameters for the JobManager. When you configure the JobManager, take note of the following points:
    • The JobManager provides features such as TaskManager heartbeat, task serialization, and resource scheduling. Therefore, we recommend that the resource configuration for the JobManager be no less than the default configuration. Adjust the configuration based on the workload of your cluster.
    • To ensure cluster stability, you must prevent heartbeat timeout that is caused by the busy main thread of the JobManager. Therefore, we recommend that you set the heartbeat interval to at least 10 seconds and the heartbeat timeout period to at least 50 seconds. The heartbeat interval is specified by the heartbeat.interval parameter and the heartbeat timeout period is specified by the heartbeat.timeout parameter. You can increase the values of these parameters based on the number of TaskManagers and the increase in the number of jobs.
    Task Manager CPUs Default value: 2.
    TaskManager Memory The minimum value is 1 GiB. We recommend that you set this parameter to 8 GiB. We recommend that you use GiB or MiB as the unit. For example, you can set this parameter to 1024 MiB or 1.5 GiB.
    We recommend that you specify the number of slots for each TaskManager and the amount of resources that are available for TaskManagers. The number of slots is specified by the taskmanager.numberOfTaskSlots parameter. When you configure this parameter, take note of the following points:
    • For a single small job, we recommend that you set the CPU-to-memory ratio of a single slot to 1:4 and configure at least 1 CPU core and 2 GiB of memory for each slot.
    • For a complex job, we recommend that you configure at least 1 CPU core and 4 GiB of memory for each slot. If you use the default resource configuration, you can configure two slots for each TaskManager.
    • We recommend that you use the default resource configuration for each TaskManager and set the number of slots to 2.
      Note
      • If the resources configured for a TaskManager are insufficient, the stability of the jobs that run on the TaskManager is affected. In addition, because the TaskManager has only a few slots, its fixed overhead cannot be shared effectively across slots. As a result, the resource utilization is reduced.
      • If you configure a large number of resources for a TaskManager, a large number of jobs run on the TaskManager. If the TaskManager is faulty, all the jobs are affected.
    Logging Root Log Level Valid values: TRACE, DEBUG, INFO, WARN, and ERROR.
    Log levels The name and level of the log.
    Logging Profile The log template. You can use the system template or configure a custom template.
    Note For more information about the options related to the integration between Flink and resource orchestration frameworks such as Kubernetes and Yarn, see Resource Orchestration Frameworks.
  6. Click Create Session Cluster.
    After a session cluster is created, click Start in the Actions column on the Session Clusters page. After the session cluster enters the Running state, you can select the session cluster from the Deployment Target drop-down list in the New Draft dialog box when you create a job.
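
The slot sizing guidance in the TaskManager settings can be checked with a little arithmetic before you create the cluster. The following Python sketch is a minimal illustration, not part of the console or of any Flink API: the TaskManagerSpec class and both functions are hypothetical names. Its defaults mirror the values in this topic (2 CPU cores, the recommended 8 GiB of memory, and 2 slots), and it assumes that CPU and memory are divided evenly among the slots of a TaskManager.

```python
from dataclasses import dataclass


@dataclass
class TaskManagerSpec:
    """Hypothetical container for the TaskManager settings described in this topic."""
    cpus: float = 2.0        # Task Manager CPUs (default value)
    memory_gib: float = 8.0  # TaskManager Memory (recommended value)
    slots: int = 2           # taskmanager.numberOfTaskSlots


def per_slot(spec: TaskManagerSpec) -> tuple[float, float]:
    """Return (CPU cores, GiB of memory) per slot, assuming an even split."""
    return spec.cpus / spec.slots, spec.memory_gib / spec.slots


def meets_recommendation(spec: TaskManagerSpec, complex_job: bool = False) -> bool:
    """Check the per-slot minimums: at least 1 core, and 2 GiB (small job) or 4 GiB (complex job)."""
    cpu, mem = per_slot(spec)
    min_mem_gib = 4.0 if complex_job else 2.0
    return cpu >= 1.0 and mem >= min_mem_gib


if __name__ == "__main__":
    default = TaskManagerSpec()
    print(per_slot(default), meets_recommendation(default, complex_job=True))
    # Four slots on the same resources leave only half a core per slot.
    print(meets_recommendation(TaskManagerSpec(slots=4)))
```

With the default resources and two slots, each slot gets 1 CPU core and 4 GiB of memory, which satisfies both the small-job and the complex-job minimums and matches the 1:4 CPU-to-memory ratio. Raising the slot count without adding resources drops each slot below 1 core.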