njagwani

[Share] How to Pass the ACA Big Data Exam: Check Out This Study Guide!

Posted time: Aug 4, 2020 21:02
Hi,

Below is the ACA Big Data Study Guide!

If you are new to ACA Big Data, this guide should help you pass the exam. But first, I highly recommend buying the course material
for the ACA Big Data exam, available at a discounted price of $0.99 > https://edu.alibabacloud.com/certification/clouder_acpbigdata?spm=a3c0i.11600350.7786225450.6.1cdc272cf3betF.

Make sure you go through all the videos before appearing for the test, especially the ones that give a practical demo (hands-on experience gives a better understanding).

The points highlighted in red below are the wrong answers, so pay close attention to them. It is very important
that you know everything in this study guide in order to pass the exam.
 
This exam has a very low passing rate; however, this guide should help you pass it. A few sample questions have been
added to this guide. All the best in your preparation!

Summary:
MaxCompute:
Use cases for MaxCompute:


Data warehouse, social networking analysis, user profiling, ad-hoc queries by end users. (The OLTP-style options, i.e. transaction processing, fast real-time response, high concurrent user requests, and order management, are the wrong answers: MaxCompute is a batch-processing service.)


MaxCompute Commands & SQL:
show tables; -> lists all tables in a project
desc table_a; -> shows the table schema and the storage space the table occupies
datediff -> calculates the difference between two timestamps
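A minimal odpscmd session illustrating these commands (the table name and date values are placeholders, not from the original guide):

```sql
-- List all tables in the current project
show tables;

-- Show the schema of table_a and the space it occupies
desc table_a;

-- datediff(date1, date2, datepart): difference between two timestamps;
-- 'dd' measures the difference in days
select datediff(to_date('2020-08-04', 'yyyy-mm-dd'),
                to_date('2020-08-01', 'yyyy-mm-dd'), 'dd');
```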


MaxCompute SQL uses a Cost-Based/History-Based Optimizer (CBO/HBO), not a Rule-Based Optimizer (RBO).


MaxCompute Pricing (Pay-As-You-Go):
MaxCompute bills for data download, computing, and storage (data upload is not billed).


MaxCompute Client (odpscmd):

Tunnel: used for data administration. MaxCompute provides two data import and export
methods: running Tunnel commands directly on the console, or using the Tunnel SDK written in Java.
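As a sketch, the console route looks like this in odpscmd (file, project, and table names are placeholders):

```
-- Upload a local file into a MaxCompute table
tunnel upload log.txt test_project.test_table;

-- Download a table into a local file
tunnel download test_project.test_table log.txt;

-- Purge the tunnel session directory (last 3 days by default)
tunnel purge 3;
```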


MaxCompute itself does not provide a SQL query function as an external interface.


IntelliJ IDEA: used for development (lowers the entry threshold).
DataWorks: used to configure workflows and scheduling (recommended).
Purge: clears the session directory (not the table directory); by default, this command clears information from the last three days.


MaxCompute security: enable project data protection (set ProjectProtection=true;)


AccessKey pair: Access Key ID / Access Key Secret


ACL:
1. use prj1;
2. add user aliyun$alice@aliyun.com;
3. grant List, CreateTable, CreateInstance on project prj1 to user aliyun$alice@aliyun.com;
4. flush privileges;


ACL Objects:
Project, Resource, Procedure


MaxCompute Resource: Files, Tables, Jar package, Archive


DataWorks:
- The data development mode in DataWorks has been upgraded to a three-level structure comprising project, solution, and business flow.


- DataWorks can be used to create all types of tasks and configure scheduling cycles as needed. The supported
scheduling-cycle granularities include day, week, month, hour, and 5 minutes.


- In a DataWorks workflow, inner nodes of a flow task can NOT be depended on by other flow or node tasks.


- All Task edits can be performed in the Development Environment, and the Production Environment Code can also be directly modified.


- Phase of the scheduling process: Not running, Running, Run successfully


- Work node types: Data Synchronization, SHELL, MaxCompute (ODPS) SQL, MaxCompute MR; no Scala node.
- The SHELL node supports standard SHELL syntax and the interactive syntax. The SHELL task can run on the default resource group.
- Connect with PAI: available after PAI experiments are created on the PAI platform.


- How DataWorks uses Python: use a PyODPS-type node.


- DataService: to meet the personalized query requirements of advanced users, DataService Studio provides a custom Python script
mode that allows you to compile the API query yourself. It also supports multi-table joins, complex query conditions,
and aggregate functions.


- Function Studio supported languages: Python, Java, real-time computing; no Scala.


- O&M:
Alert notification methods: email, text message, webhook (DingTalk); not telephone or other IM tools.


You are the administrator of a MaxCompute project that involves a large volume of sensitive data, such as user IDs
and shopping records. Project users may only access data within the project, and all data must flow only
within the project. What should you do?
Enable the data protection mechanism in the project (set ProjectProtection=true;)
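A minimal sketch in odpscmd (the project name is a placeholder; note that current odpscmd versions use `setproject` for project-level flags, while the exam material writes `set ProjectProtection=true`):

```
use prj1;
setproject ProjectProtection=true;  -- data can flow into the project but not out of it
```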


DataV:
- Data Portal: when a DataV screen is ready, it can be embedded into the enterprise's existing portal through a URL after release.
- DataV data sources: Alibaba Cloud AnalyticDB, ApsaraDB, static data in CSV and JSON formats, and Oracle Database. DataV can NOT make full use of MaxCompute for data processing.
- DataV visual screen types: presentation, analysis, and monitoring.


Quick BI: a BI tool; to query MaxCompute it needs Lightning.
Data sources: MaxCompute / local Excel files / MySQL (RDS)
Data storage: data is stored in the Exploration Space built into Quick BI.
Security:
Different users can view different data in the same report in Quick BI:
1. Set a row-level permission.
2. Only Quick BI Pro provides the row-level permission function.


E-MapReduce:
Log collection: Flume
Structured data: Sqoop
DTS: migrate data with virtually no downtime
Streaming: Flume + Kafka + Spark Streaming (or Storm, Flink/Blink) + HDFS (or Redis, HBase…)
It supports the Pay-As-You-Go payment method: the cost of each task is
measured according to ECS usage (not data input size).


Pay attention to the traps (wrong information):
- DataWorks provides a convenient way for users to analyze and process big data.
Users can analyze big data without being concerned with the details of distributed computing.
- Deployment personnel or Operation & Management (O&M) personnel can generate
release packages based on the latest development results
- MaxCompute SQL is 100% equivalent to Hive SQL(It is 99% equivalent and not 100%)
- MaxCompute SQL can complete a query in minutes or even seconds, and it can return results in milliseconds (the millisecond claim is the trap).
- Tunnel command parameter Purge: clears the table directory; by default, this command clears information from the last three days.
(Trap: it clears the session directory, not the table directory.
https://www.alibabacloud.com/help/doc-detail/27833.htm)

- MaxCompute can identify the RAM account system; the claim that it can also identify the RAM permission system is wrong (x).
- MaxCompute partition only supports string type and the conversion of any other types is not allowed(x) (MaxCompute 2.0 extends the support for partitioning types, currently, MaxCompute supports tinyint, smallint,
Int, bigint, varchar, and string partition types.)
- The table name and column name are both case sensitive (x: they are case-insensitive).
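A minimal sketch of a table definition using a non-string (MaxCompute 2.0) partition column; all names are placeholders:

```sql
create table sale_detail (
    shop_name   string,
    total_price double
)
partitioned by (
    sale_year int,     -- non-string partition type, supported since MaxCompute 2.0
    region    string
);
```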

Single Selection:
1. Which of the following is not proper for granting the permission on an L4 MaxCompute
table to a user? (L4 is a level in MaxCompute label-based security (LabelSecurity), a mandatory MaxCompute Access Control (MAC)
policy at the project level. It allows project administrators to control user access to column-level sensitive data with
improved flexibility.)

A. If no permissions have been granted to the user and the user does not belong to the project, add the user to the project.
The user does not have any permissions before they are added to the project.
B. Grant a specific operation permission to the user.
C. If the user manages resources that have labels, such as datasheets and packages with datasheets, grant label permissions
to the user.
D. The user needs to create a project in simple mode

Correct Answer: D


2. DataWorks provides scheduling capabilities, including time-based and dependency-based task trigger functions, to perform tens of millions of tasks accurately and on time each day based on DAG relationships. Which of the following descriptions about scheduling and dependency in DataWorks is INCORRECT?


A. Users can configure an upstream dependency for a task. In this way, even if the current task instance reaches its
scheduled time, the task only runs after its upstream task's instance is completed.
B. If no upstream task is configured, the current task is by default triggered by the project. As a result, the
default upstream task of the current task is project_start in the scheduling system. By default, a project_start task is
created as the root task for each project.
C. If the task is submitted after 23:30, the scheduling system automatically generates periodic instances starting from the second day
and runs them on time.
D. Only after a task is submitted to the scheduling system does the system automatically generate an instance for the task at each
time point according to the scheduling attribute configuration and periodically run the task from the second day.

Correct Answer: A

3. A dataset includes the following items (time, region, sales amount). If you want to present the information above in a chart, ______ is
applicable.

A. Bubble Chart
B. Tree Chart
C. Pie Chart
D. Radar Chart

Correct Answer: A

True or False:
1. If a task node of DataWorks is deleted from the recycle bin, it can still be restored.
A. True
B. False
 
Correct Answer: B
 
2. In each release of E-MapReduce, the software and software version are flexible. You can select multiple software versions.
A. True
B. False
 
Correct Answer: B
 
3. Label Security is a workspace-level mandatory access control (MAC) policy that enables workspace administrators to control
user access to row-level sensitive data more flexibly.
A. True
B. False
 
Correct Answer: B
 
4. Different versions of Spark can run in MaxCompute at the same time.
A. True
B. False
 
Correct Answer: B
 
5. A start-up company wants to use Alibaba Cloud MaxCompute to provide product recommendation services for its
users. However, the company does not have many users at the initial stage, and the charge for MaxCompute is higher than
that of ApsaraDB RDS, so the company should be recommended to use the MaxCompute service only after the number of its users
increases to a certain size.
A. True
B. False
 
Correct Answer: B
 
Multiple Selection:
The cost of an E-MapReduce product consists of the cost of _____, the cost of _____, and the cost of _____.
(Number of correct answers: 3)
A. ECS
B. E-MapReduce
C. Download
D. HDFS storage (local storage)


Correct Answer: A, B and C


GAVASKAR
1st Reply, posted Aug 5, 2020 13:57
excellent guide to complete Bigdata certification

fauzann
2nd Reply, posted Jun 16, 2021 2:05 AM
Thanks, really help us all

alisun
3rd Reply, posted Jun 16, 2021 17:24
thanks for sharing