MaxCompute is a fast and fully managed data warehouse that can process terabytes, petabytes, or even exabytes of data. This topic describes the open source features of MaxCompute.

SDK

MaxCompute provides SDK for Java and SDK for Python. You can call these SDKs from your own code to create, view, and delete MaxCompute tables and to manage MaxCompute.
  • SDK for Java

    For more information about how to use SDK for Java, see SDK for Java.

    Technical support: View the official documentation.

  • SDK for Python
    PyODPS is the MaxCompute SDK for Python. It supports basic operations on MaxCompute objects and provides a DataFrame framework that you can use to analyze data in MaxCompute. For more information, see aliyun-odps-python-sdk on GitHub and the PyODPS documentation, which describes all related interfaces and classes in detail.
    • Before you use PyODPS, you must install it. For more information, see Installation guide and limits.
    • For more information about how to use PyODPS in DataWorks, see Use PyODPS in DataWorks. For more information about the DataFrame APIs that PyODPS provides, see Overview.
    • You are welcome to contribute to the PyODPS ecosystem. To help accelerate its development, share your feedback and suggestions in aliyun-odps-python-sdk on GitHub.

    Technical support: View the official documentation.
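
The table operations mentioned in this section (create, view, and delete) can be sketched with PyODPS as follows. This is a minimal sketch rather than an official sample. It assumes that PyODPS is installed and that the account is allowed to create and drop tables in the project. The environment variable names, the column definition, and the helper names connect_from_env and table_round_trip are placeholders chosen for illustration.

```python
import os


def connect_from_env():
    """Build a PyODPS entry object from environment variables.

    The variable names are placeholders chosen for this sketch.
    """
    # Imported lazily so that table_round_trip has no hard dependency on PyODPS.
    from odps import ODPS

    return ODPS(
        os.environ["ODPS_ACCESS_ID"],
        os.environ["ODPS_SECRET_ACCESS_KEY"],
        project=os.environ["ODPS_PROJECT"],
        endpoint=os.environ["ODPS_ENDPOINT"],
    )


def table_round_trip(o, name, columns="id bigint, name string"):
    """Create a table, look it up, then drop it. Returns the steps performed."""
    steps = []

    # Create the table; if_not_exists avoids an error when it is already there.
    o.create_table(name, columns, if_not_exists=True)
    steps.append("created")

    # View the table by checking for it and fetching its metadata object.
    if o.exist_table(name):
        table = o.get_table(name)
        steps.append("viewed " + table.name)

    # Delete the table; if_exists avoids an error when it is already gone.
    o.delete_table(name, if_exists=True)
    steps.append("deleted")
    return steps
```

With valid credentials in the environment, a call such as table_round_trip(connect_from_env(), "pyodps_demo_table") runs the three steps against a real project. The table name here is hypothetical. Use a name that does not collide with an existing table, because the last step drops it.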

MaxCompute RODPS

RODPS is a plug-in that MaxCompute provides for R. For more information, see ODPS Plugin for R on GitHub.

How to obtain service support: Leave a message or create an issue in ODPS Plugin for R on GitHub.

MaxCompute JDBC

MaxCompute JDBC is the official JDBC driver for MaxCompute. It provides a set of interfaces that Java programs can use to run SQL tasks. The project is hosted in ODPS JDBC on GitHub.

How to obtain service support: Leave a message or create an issue in ODPS JDBC on GitHub.

Mars

Mars is a tensor-based unified distributed computing framework. With Mars, you can write a large-scale scientific computing task in a few lines of code, whereas an equivalent MapReduce implementation can require hundreds of lines. Mars also improves computing performance.

The source code of Mars is available in Mars on GitHub. You are welcome to contribute to Mars.

For more information about Mars, see Mars Documentation.

How to obtain service support: Leave a message or create an issue in Mars on GitHub.

Data collector

MaxCompute provides a set of open source data collectors for the following services:
  • Flume
  • Oracle GoldenGate (OGG)
  • Sqoop
  • Kettle
  • Hive Data Transfer UDTF

The Flume and OGG data collectors are implemented based on the DataHub SDK. DataHub is a real-time data transfer channel, so these collectors are used to transfer data in real time. The data collectors for Sqoop, Kettle, and Hive Data Transfer UDTF are implemented based on the Tunnel SDK. Tunnel is a batch data transfer channel, so these collectors are used to transfer data in batches in offline mode.

For more information about the source code, see Aliyun MaxCompute Data Collectors on GitHub. For more information about data collectors, see the wiki.

How to obtain service support: Leave a message or create an issue in Aliyun MaxCompute Data Collectors on GitHub.