This topic describes best practices for using Requirements Management in an enterprise data team. The data team in this example has the following members:
- Jack (username: Jack_PD): the data product manager who plans and implements data products. Jack has a deep understanding of the business logic of the enterprise.
- Alice and Rose (usernames: Alice_DEV and Rose_DEV): the data developers who jointly design models, develop code, and test code.
- Mike (username: Mike_DEV_TL): the data development director who is responsible for the stability and reliability of the production environment.
This enterprise has improved efficiency by using DataWorks. With rapid business growth and various changes in business demands, the data team needs an effective tool to plan daily work.
- As a salesperson, Sales_01 does not need to be added to any DataWorks workspace. Sales_01 can go directly to the Requirements Management page and create requirements.
- Jack is the data product manager who supervises the entire process from requirement creation to implementation. Jack does not design or implement specific code. Therefore, adding Jack to a DataWorks workspace is optional.
- Mike is assigned the workspace administrator role to configure resources, review code, and deploy code in a DataWorks workspace. Mike is also responsible for the O&M, deployment, and security management of the workspace.
- Alice and Rose are assigned the developer role. They are responsible only for developing nodes and creating deploy tasks.
- Improved project efficiency and quality
The responsibility division of staff and the input and output of each phase are clarified to guarantee the integrity of key information and reduce invalid or repeated communication. This ensures steady project progress and improves overall efficiency.
- Optimized daily work plan
Each business requirement is strictly reviewed and scheduled, and development tasks are properly assigned with clear deadlines.
- Well-established internal communication system
An internal communication system is established within the enterprise based on the standard data development specifications. This avoids internal conflicts caused by poor communication.
Requirement development process
- Review: Evaluate the feasibility of a requirement and the technology and data required to implement the requirement.
- Design: Design data models, code, and dependencies based on the data form, including the data quality and distribution.
- Develop: Develop code in a standard and efficient way based on the output of the Design phase.
- Test: Accurately locate bugs and risks in the code to improve the quality of output data.
- Publish and acceptance check: Deploy code that meets the publishing conditions to the production environment, and perform an acceptance check to make sure that the code runs stably.
- Submit a requirement.
Sales_01 from the sales department has a data requirement. Sales_01 logs on to DataWorks with a Resource Access Management (RAM) user account and goes to the Requirements Management page to create a requirement.
- Sales_01 logs on to the DataWorks console, finds the target workspace, and then clicks Data Analytics.
- On the DataStudio page that appears, Sales_01 clicks the icon in the upper-left corner and chooses .
- Sales_01 clicks Create Request.
- Sales_01 enters the requirement name and content. In the Basic Information section on the right, Sales_01 sets Assign To to Jack_PD to assign the requirement to Jack.
- Sales_01 clicks Save.
- Review the requirement.
- Jack organizes relevant persons to evaluate the necessity, feasibility, risks, and implementation details of the requirement based on the development specifications. Jack sets the status of the requirement to Reviewing.
- If the requirement passes the review, Jack goes to the Requirements Management page and sets the status of the requirement to To Be Designed.
If the requirement fails to pass the review, Jack goes to the Requirements Management page and sets the status of the requirement to Rejected.
- Jack sets the owner in each phase based on the responsibilities of the staff.
- If the team has sufficient staff, we recommend that you do not assign a tester to serve as a developer or designer at the same time.
- The code must be reviewed in the Publish phase to make sure that it is stable. Therefore, an experienced person other than the developer and designer must be specified as the owner of the Publish phase. Sufficient smoke tests must be performed before the code is published.
- The person who submits the requirement must be specified as the owner of the Acceptance Check phase.
- If the expected publishing date specified by Sales_01 is unrealistic, Jack changes it to a date agreed on by both parties.
- Jack uploads the reviewed requirement document as a reference for owners of the subsequent phases.
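The owner-assignment rules in the review step can be checked mechanically. The following is a minimal sketch, not a DataWorks feature: the function name `check_phase_owners`, the dictionary layout, and the message texts are hypothetical, and the example assignment is inferred from who performs each phase in this walkthrough.

```python
def check_phase_owners(owners: dict[str, str], submitter: str) -> list[str]:
    """Return a list of findings for a phase-to-owner assignment.

    `owners` maps a phase name (for example "Design") to a username.
    The phase names follow this topic; the data layout is an assumption.
    """
    findings = []
    # People who design or develop the code.
    builders = {owners.get("Design"), owners.get("Develop")}
    builders.discard(None)

    # Recommendation: a tester should not also be a developer or designer.
    if owners.get("Test") in builders:
        findings.append("advisory: the tester is also a designer or developer")

    # Rule: the Publish owner must not be the developer or designer.
    if owners.get("Publish") in builders:
        findings.append("error: the Publish owner is also a designer or developer")

    # Rule: the submitter must own the Acceptance Check phase.
    if owners.get("Acceptance Check") != submitter:
        findings.append("error: the Acceptance Check owner is not the submitter")

    return findings


# Example assignment (an assumption based on this walkthrough).
example_owners = {
    "Design": "Alice_DEV",
    "Develop": "Rose_DEV",
    "Test": "Alice_DEV",
    "Publish": "Mike_DEV_TL",
    "Acceptance Check": "Sales_01",
}

for finding in check_phase_owners(example_owners, submitter="Sales_01"):
    print(finding)
```

With this example assignment, the only finding is the advisory about the tester, because Alice both designs and tests. The recommendation applies only if the team has sufficient staff, so the sketch treats it as an advisory rather than an error.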
- According to the requirement document, Alice explores, analyzes, and designs data required to implement the requirement based on the development specifications for the Design phase. Meanwhile, Alice changes the status of the requirement to Designing to advance the progress.
- After completing the design, Alice uploads the data exploration report, extract-transform-load (ETL) document, and scheduling design document, and changes the status of the requirement to To Be Developed.
- Develop the code.
- According to the documents generated in the Design phase, Rose develops nodes in DataWorks based on the code development specifications. Meanwhile, Rose changes the status of the requirement to Developing to advance the progress.
- Rose clicks Associated Nodes.
- In the Select Associated Nodes dialog box, Rose selects required nodes from DataStudio or experiments from Machine Learning Platform for AI (PAI), and then clicks OK.
On the Requirements Management page, Rose verifies that all the required nodes are associated with the requirement. Requirements Management automatically calculates the overall deployment progress based on the status of the associated nodes and displays the progress as a percentage. The percentage reaches 100% when all nodes are deployed.
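To make the progress percentage concrete, the following sketch computes it as the share of associated nodes that are deployed. The exact formula that Requirements Management uses is not documented here, so the rounding, the handling of a requirement with no nodes, and the names `AssociatedNode` and `deployment_progress` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AssociatedNode:
    name: str       # hypothetical node name
    deployed: bool  # True once the node is deployed


def deployment_progress(nodes: list[AssociatedNode]) -> int:
    """Return the share of deployed nodes as a whole-number percentage."""
    if not nodes:
        return 0  # assumption: no associated nodes means 0% progress
    deployed = sum(1 for node in nodes if node.deployed)
    # Integer division rounds down, so 100 is reached only when
    # every associated node is deployed.
    return deployed * 100 // len(nodes)


nodes = [
    AssociatedNode("ods_orders", deployed=True),
    AssociatedNode("dwd_orders", deployed=True),
    AssociatedNode("ads_sales_report", deployed=False),
]
print(deployment_progress(nodes))  # 66
```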
- Test the code.
Rose tests the nodes, and then prepares and uploads the unit test report, publish operation document, and code review report. Meanwhile, Rose changes the status of the requirement to To Be Tested to advance the progress.
- Alice uses test cases to perform delivery and data tests on the nodes generated in the Develop phase based on the test specifications. Meanwhile, Alice changes the status of the requirement to Testing to advance the progress.
DataWorks workspaces in standard mode isolate the development environment from the production environment. You can develop code and perform smoke testing in the development environment before publishing the code to the production environment.
- On the DataStudio page, Alice clicks the icon in the upper-left corner and chooses .
- Alice double-clicks the target workflow. On the editing page of the workflow, Alice clicks the Submit icon to submit the developed nodes to Operation Center of the development environment.
- Alice clicks Run Smoke Test in Development Environment for each node to simulate code running in the production environment. Alice can also click View Log to check whether the start time and result of each node are as expected.
- After the nodes are tested, Alice prepares and uploads the delivery test report, quality assessment report, and acceptance check report. Meanwhile, Alice changes the status of the requirement to To Be Published to advance the progress.
- Based on the development specifications for the Publish phase, Alice submits a code publishing application with the documents generated in the Test phase. Mike verifies that the code is standard-compliant and appropriate, and publishes the code.
The procedure for publishing code in DataWorks is as follows:
- Alice submits a publishing application.
Alice clicks Deploy in the upper-right corner to create a deploy task for the nodes that have been run successfully in the Test phase. After the deploy task is created, it needs to be reviewed by the workspace administrator, that is, Mike.
- Mike reviews and deploys the nodes.
Mike goes to the View Deploy Tasks page and reviews the code to be published. If the code is correct, Mike clicks Deploy to deploy the nodes to the production environment for scheduling.
Mike can click the icon in the upper-left corner of the DataStudio page and choose. In Operation Center, Mike can choose to view all the nodes that have been deployed to the production environment for scheduling.
Mike can also click Cycle Instance to view the instances generated by scheduled nodes every day and the operational logs of each instance.
- Mike configures rules to monitor the data quality.
After the nodes are deployed, Mike can configure rules to monitor the data quality for the deployed nodes. This guarantees the reliability of output data.
- After the code is published, Alice changes the status of the requirement to To Be Checked.
Alice can click Upload to upload any description document generated in the Publish phase for archiving.
Sales_01 and Jack check whether the data tables or APIs that are developed meet the expectations based on the initial business requirement. If they meet the expectations, Sales_01 changes the status of the requirement to Acceptance Checked.
Sales_01 can click Upload to upload any description document generated in the Acceptance Check phase for archiving.
At this point, a data requirement is implemented based on a standardized process. As the requirement manager of the data team, Jack can use the advanced search and view features of Requirements Management to supervise the work of each member in the team. Jack can also set the priorities of requirements to manage the work of team members.
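The status changes in this walkthrough can be summarized as a small state machine. The sketch below encodes only the transitions that this topic mentions. Requirements Management may allow others, such as moving a requirement back to an earlier status. The names `TRANSITIONS` and `advance` are hypothetical.

```python
# Allowed status changes, taken from the walkthrough in this topic.
TRANSITIONS: dict[str, set[str]] = {
    "Reviewing": {"To Be Designed", "Rejected"},
    "To Be Designed": {"Designing"},
    "Designing": {"To Be Developed"},
    "To Be Developed": {"Developing"},
    "Developing": {"To Be Tested"},
    "To Be Tested": {"Testing"},
    "Testing": {"To Be Published"},
    "To Be Published": {"To Be Checked"},
    "To Be Checked": {"Acceptance Checked"},
    "Rejected": set(),            # final status in this sketch
    "Acceptance Checked": set(),  # final status in this sketch
}


def advance(current: str, new: str) -> str:
    """Return `new` if the change is allowed, otherwise raise ValueError."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot change status from {current!r} to {new!r}")
    return new


# Walk a requirement through the successful path.
status = "Reviewing"
for next_status in [
    "To Be Designed", "Designing", "To Be Developed", "Developing",
    "To Be Tested", "Testing", "To Be Published", "To Be Checked",
    "Acceptance Checked",
]:
    status = advance(status, next_status)
print(status)  # Acceptance Checked
```

Keeping the transitions in a single table makes it easy to see which phase owner is expected to act next and to reject a status change that skips a phase.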