This topic describes how to run a Flink DataStream job to read data from DataHub.
Prerequisites
- Java Development Kit (JDK) 8 is installed on your on-premises machine.
- Maven 3.x is installed on your on-premises machine.
- An integrated development environment (IDE) for Java or Scala is installed on your on-premises machine, and the JDK and Maven are configured in the IDE. We recommend that you use IntelliJ IDEA.
- A topic is created in DataHub, and test data exists in the topic.
Note The test data must contain four fields, whose data types are STRING, STRING, DOUBLE, and BIGINT in sequence.
- The datahub-demo-master package is downloaded.
Background information
Notice This demo is supported only in Blink 3.X.
Develop a job
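The job code comes from the datahub-demo-master package that you downloaded. Modify the DataHub connection information (endpoint, project, topic, and AccessKey pair) in the demo code, and then build the JAR package with Maven, for example by running mvn clean package, so that you can upload it as a resource. The following Java code is only a structural sketch of the main class: a placeholder source stands in for the DataHub source that ships with the demo, and the exact source class and its constructor arguments are defined in the demo code.

package com.alibaba.blink.datastreaming;

import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

/**
 * Structural sketch of the demo job: read records from DataHub and print them.
 * The real DatahubDemo.java in datahub-demo-master uses the DataHub source
 * shipped with the demo instead of the placeholder source below.
 */
public class DatahubDemo {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env
            // Replace the placeholder with the DataHub source from the demo,
            // configured with your endpoint, project, topic, and AccessKey pair.
            .addSource(new PlaceholderDatahubSource())
            // Print sink: the records are written to taskmanager.out.
            .print();

        env.execute("datahub_demo");
    }

    /**
     * Placeholder that stands in for the DataHub source. It emits records with
     * the four test fields (STRING, STRING, DOUBLE, BIGINT).
     */
    private static class PlaceholderDatahubSource
            implements SourceFunction<Tuple4<String, String, Double, Long>> {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Tuple4<String, String, Double, Long>> ctx) throws Exception {
            while (running) {
                // The real source fetches records from the DataHub topic instead.
                ctx.collect(Tuple4.of("field1", "field2", 1.0, System.currentTimeMillis()));
                Thread.sleep(1000);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}

The print sink writes each record to the taskmanager.out file, which is the file that you check in the verification step.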
Publish a job
For more information about how to publish a job, see Publish a job.
The following example shows the job content:
Notice Before you publish a job, set the Parallelism parameter for the source table on the Configurations tab of the Development page. The parallelism of the source table cannot be greater than the number of shards in the source table. Otherwise, a JobManager error occurs after the job starts.
-- The complete main class name, such as com.alibaba.realtimecompute.DatastreamExample. This field is required.
blink.main.class=com.alibaba.blink.datastreaming.DatahubDemo
-- The name of the job.
blink.job.name=datahub_demo
-- The resource name of the JAR package that contains the complete main class name, such as blink_datastream.jar.
blink.main.jar=${Resource name of the JAR package that contains the complete main class name}
-- The default state backend configuration. It takes effect only when the state backend is not explicitly configured in the job code.
state.backend.type=niagara
state.backend.niagara.ttl.ms=129600000
-- The default checkpoint configuration. It takes effect only when checkpointing is not explicitly configured in the job code.
blink.checkpoint.interval.ms=180000
Note
- Modify blink.main.class and blink.job.name as required.
- You can configure custom parameters. For more information, see Set custom parameters.
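The state backend and checkpoint settings in the job content are defaults that apply only when the job code does not configure them. As a sketch using the standard Flink API, enabling checkpointing in the code overrides blink.checkpoint.interval.ms; the 180000 ms below simply mirrors the default above.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Checkpoint every 180,000 ms (3 minutes). This explicit setting takes
// precedence over blink.checkpoint.interval.ms in the job content.
env.enableCheckpointing(180000);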
Verify the test results
On the Container Log tab of the Job Administration page, view information in the taskmanager.out file of the sink node. In this example, the type of the sink node is print.
If the test data from the DataHub topic appears in the taskmanager.out file, Realtime Compute for Apache Flink has read data from DataHub.

FAQ
If an error similar to the following one appears when a job is running, a JAR package conflict has occurred. What do I do?
java.lang.AbstractMethodError: com.alibaba.fastjson.support.jaxrs.FastJsonAutoDiscoverable.configure(Lcom/alibaba/blink/shaded/datahub/javax/ws/rs/core/FeatureContext;)

We recommend that you use the relocation feature of maven-shade-plugin to resolve the JAR package conflict, as shown in the following example:
<relocations combine.self="override">
    <relocation>
        <pattern>org.glassfish.jersey</pattern>
        <shadedPattern>com.alibaba.blink.shaded.datahub.org.glassfish.jersey</shadedPattern>
    </relocation>
</relocations>
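For reference, the relocations element belongs in the maven-shade-plugin configuration of your pom.xml. The following sketch shows one possible placement; the plugin version is an example and should match your project.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <!-- Insert the relocations block shown above here. -->
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>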