After you create an E-MapReduce (EMR) cluster, you can create a project in Data Platform. Data Platform is a workflow platform on which you can develop, schedule, and monitor jobs and workflows. You can define a set of interdependent jobs as a directed acyclic graph (DAG) and run the jobs in the order that the dependencies require. You can manage jobs, schedule tasks, and monitor job status in the EMR console.
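The DAG-based ordering described above can be illustrated with a minimal sketch. This is a conceptual model only, not an EMR API: the job names and dependencies below are hypothetical, and Python's standard-library `graphlib` stands in for the Data Platform scheduler.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}

# A valid run order places every job after all of its dependencies,
# which is exactly what DAG-based scheduling guarantees.
order = list(TopologicalSorter(workflow).static_order())
```

Because the dependency graph is acyclic, a topological order always exists; a cycle would make the workflow unschedulable, which is why Data Platform requires a DAG rather than an arbitrary graph.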

Notice: If your high-security EMR cluster is connected to an external MIT key distribution center (KDC), Data Platform is unavailable.
Data Platform provides the following features:
  • Project management: You can associate cluster resources with projects and add project members. For more information, see Manage projects.
  • Development and editing of big data jobs: You can develop various types of jobs, such as Hive, Hive SQL, MapReduce, Spark, and Shell. For more information, see Edit jobs.
  • Workflow development and scheduling: You can perform drag-and-drop operations to build a workflow. You can also configure time-based scheduling policies and dependencies among workflows. For more information, see Edit a workflow.
  • Ad hoc query: Four types of ad hoc query jobs are supported: Hive SQL, Spark SQL, Spark, and Shell. For more information, see Implement ad hoc queries.
  • Information viewing: You can view the running records and logs of tasks and workflows, and rerun failed jobs and workflows. You can also view the operation history of project members in a project. For more information, see Scheduling center.
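The time-based scheduling policies and inter-workflow dependencies listed above combine into a simple start rule, which can be sketched as follows. This is a hypothetical model of the behavior, not EMR code: the `ready` helper and its parameters are illustrative, and the real policy is configured in the console.

```python
from datetime import datetime

def ready(scheduled: datetime, now: datetime, upstream_done: bool) -> bool:
    """Decide whether a workflow instance can start.

    Hypothetical model: an instance starts only after its scheduled time
    has passed AND the upstream workflows it depends on have succeeded.
    """
    return now >= scheduled and upstream_done

# A daily 02:00 instance stays blocked past 02:00 until upstream succeeds.
t = datetime(2024, 1, 1, 2, 0)
```

Both conditions must hold: a workflow whose scheduled time has arrived still waits for its upstream workflows, and a workflow whose upstream has finished still waits for its scheduled time.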