Big Data Q&A - Friday Blog, Week 65 - Alibaba Cloud Community

By: Jeremy Pedersen

Sorry folks, have to keep it short this week!

Let's take a look at some of the questions I have been asked in recent Alibaba Cloud training sessions. That's right, "Friday Q&A" is back! We'll focus on Big Data today.

Q: Can DataV display data pulled directly from MaxCompute?

Unfortunately, no. DataV dashboards are designed to display data in real time, while MaxCompute is designed for offline batch processing. MaxCompute queries take anywhere from tens of seconds to tens of minutes to run (depending on the size of the data processing job), while DataV expects data sources to respond in under a second.

However, you can connect MaxCompute to Hologres, which is a much faster OLAP database system designed to work with MaxCompute. DataV can read data directly from Hologres without any issues.
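If you want to check whether a data source fits that sub-second expectation before wiring it into a dashboard, a rough timing helper like the one below can help. This is only a sketch: `fetch_with_latency`, the `orders` table, and the 1-second budget are illustrative names and values, and `sqlite3` is used as a local stand-in so the example runs anywhere. Since Hologres is PostgreSQL-compatible, a standard PostgreSQL driver connection should be usable in its place, but that swap is an assumption and is not shown here.

```python
import sqlite3
import time


def fetch_with_latency(conn, sql):
    """Run sql on a DB-API 2.0 connection and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed


# Local stand-in for a fast OLAP source (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("west", 5.5), ("east", 2.5)],
)

rows, elapsed = fetch_with_latency(
    conn,
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region",
)

# A dashboard-style source should answer well inside the budget.
LATENCY_BUDGET_SECONDS = 1.0
within_budget = elapsed < LATENCY_BUDGET_SECONDS
print(rows, within_budget)
```

A query that regularly blows past the budget is a sign the data belongs behind a faster serving layer rather than being queried from the batch engine directly.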

Q: Can QuickBI pull data directly from MaxCompute?

Yes. Unlike DataV, QuickBI is not designed for real-time data display: it's a traditional BI reporting tool. Further, it has a built-in cache which allows it to work well with batch processing tools like MaxCompute.
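To see why a cache makes a slow batch source tolerable for reporting, here is a minimal time-to-live cache sketch. It is not how QuickBI is implemented internally; `TTLCache`, `slow_batch_query`, and the one-hour TTL are hypothetical names and values chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache query results for a fixed time-to-live (in seconds)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh hit: no batch query is issued
        value = compute()  # miss or stale entry: run the slow query once
        self._entries[key] = (now, value)
        return value


calls = []


def slow_batch_query():
    # Hypothetical stand-in for a long-running batch job.
    calls.append(1)
    return [("2023-01-01", 42)]


cache = TTLCache(ttl_seconds=3600)
first = cache.get_or_compute("daily_sales", slow_batch_query)
second = cache.get_or_compute("daily_sales", slow_batch_query)
print(first == second, len(calls))
```

The second report request is served from memory, so the batch engine only pays the query cost once per refresh window. The trade-off is staleness, which is usually acceptable for traditional BI reports but not for a real-time dashboard.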

Q: Can MaxCompute run Spark jobs?

Yes. MaxCompute can directly run Spark code. See this documentation.

Q: Can MaxCompute run Hadoop Hive jobs?

MaxCompute's SQL dialect is mostly compatible with Hive, so sometimes, yes. If you're curious how different SQL dialects (Hive, MySQL, Oracle) map onto MaxCompute's SQL language, take a look here.

Q: We use Airflow for job scheduling. Can we migrate our Airflow jobs to DataWorks?

Actually, yes! As it turns out, you can translate Airflow jobs into DataWorks workflows. See this post on the developer forums. Unfortunately, the article is in Chinese, so you might have to use Google Translate!

Q: I'm happy using Airflow as a scheduler. Can I use it to schedule jobs on MaxCompute, like I do on my Hadoop cluster?

Yes. See this document.

Q: I have a "data lake" consisting of unstructured and semi-structured data in Alibaba Cloud OSS. Can I access it from MaxCompute directly?

Yes. MaxCompute allows you to treat files in OSS as an "external table". See here.
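As a rough idea of what an external table definition looks like, the sketch below builds a `CREATE EXTERNAL TABLE` statement for CSV files using MaxCompute's built-in CSV storage handler. The helper `build_oss_external_table_ddl`, the table name, the columns, and the bucket path are all hypothetical, and authorization settings (such as a RAM role) are left out, so check the linked documentation before running anything like this.

```python
from typing import Sequence, Tuple


def build_oss_external_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    oss_location: str,
) -> str:
    """Return a CREATE EXTERNAL TABLE statement for CSV files stored in OSS.

    Values are interpolated directly into the statement, so only pass
    trusted, hard-coded names (never user input).
    """
    column_lines = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "STORED BY 'com.aliyun.odps.CsvStorageHandler'\n"
        f"LOCATION '{oss_location}';"
    )


# Hypothetical table, columns, and bucket path for illustration only.
ddl = build_oss_external_table_ddl(
    table="sensor_readings_ext",
    columns=[("device_id", "STRING"), ("reading", "DOUBLE")],
    oss_location="oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/readings/",
)
print(ddl)
```

Once a table like this exists, it can be queried with ordinary MaxCompute SQL, and the files stay in OSS rather than being copied into MaxCompute storage.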

That's it for this week! See you next time.
