MaxCompute is a fast and fully managed data warehouse that can process terabytes, petabytes, or even exabytes of data. This topic describes the open source features of MaxCompute.
SDK
- SDK for Java
For more information about how to use SDK for Java, see SDK for Java.
Technical support: View the official documentation.
- SDK for Python
PyODPS is the MaxCompute SDK for Python. It supports the DataFrame framework and basic operations on MaxCompute objects, and you can use it to analyze data in MaxCompute. For more information, see aliyun-odps-python-sdk on GitHub and the PyODPS documentation, which describes all related interfaces and classes in detail.
- Before you use PyODPS, you must install it. For more information, see Installation guide and limits.
- For more information about how to use PyODPS in DataWorks, see Use PyODPS in DataWorks. For more information about the DataFrame APIs that PyODPS provides, see Overview.
- Contributions to the PyODPS ecosystem are welcome. To share feedback and suggestions, use aliyun-odps-python-sdk on GitHub.
Technical support: View the official documentation.
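The following is a minimal sketch of how PyODPS might be used to run a SQL statement, not an official sample. The environment variable names and the helper functions `load_settings` and `first_column` are hypothetical and exist only for this illustration. The `ODPS` entry object, `execute_sql`, and `open_reader` come from PyODPS. The sketch assumes that PyODPS is installed and that you have valid credentials, a project, and an endpoint.

```python
import os

# Hypothetical environment variable names; adapt them to your own setup.
ENV_KEYS = {
    "access_id": "MAXCOMPUTE_ACCESS_ID",
    "secret_access_key": "MAXCOMPUTE_ACCESS_KEY",
    "project": "MAXCOMPUTE_PROJECT",
    "endpoint": "MAXCOMPUTE_ENDPOINT",
}


def load_settings(env=None):
    """Read connection settings from environment variables.

    Raises KeyError naming the missing variables if any are unset or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_KEYS.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_KEYS.items()}


def first_column(sql, settings):
    """Run a SQL statement and return the first column of every result row."""
    # Imported lazily so that load_settings works without PyODPS installed.
    from odps import ODPS

    o = ODPS(
        settings["access_id"],
        settings["secret_access_key"],
        project=settings["project"],
        endpoint=settings["endpoint"],
    )
    # execute_sql waits for the SQL instance to finish before returning it.
    with o.execute_sql(sql).open_reader() as reader:
        return [record[0] for record in reader]
```

With the variables set, a call such as `first_column("SELECT COUNT(*) FROM my_table", load_settings())` would return the query result, where `my_table` is a placeholder table name. Reading credentials from the environment instead of hard-coding them is a design choice of this sketch, not a PyODPS requirement.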
MaxCompute RODPS
RODPS is a plug-in that MaxCompute provides for R. For more information, see ODPS Plugin for R on GitHub.
How to obtain service support: Leave a message or create an issue in ODPS Plugin for R on GitHub.
MaxCompute JDBC
MaxCompute JDBC is the official JDBC driver provided by MaxCompute. It provides a set of interfaces that Java programs can use to run SQL tasks. The project is hosted in ODPS JDBC on GitHub.
How to obtain service support: Leave a message or create an issue in ODPS JDBC on GitHub.
Mars
Mars is a tensor-based unified distributed computing framework. With Mars, you can run a large-scale scientific computing task in a few lines of code, whereas an equivalent MapReduce implementation can require hundreds of lines. Mars also improves computing performance.
The source code of Mars is available in Mars on GitHub. Contributions to Mars are welcome.
For more information about Mars, see Mars Documentation.
How to obtain service support: Leave a message or create an issue in Mars on GitHub.
Data collector
MaxCompute provides the following open source data collectors:
- Flume
- Oracle GoldenGate (OGG)
- Sqoop
- Kettle
- Hive Data Transfer UDTF
The Flume and OGG data collectors are implemented based on the DataHub SDK. DataHub is a real-time data transfer channel, so these data collectors transfer data in real time. The data collectors for Sqoop, Kettle, and Hive Data Transfer UDTF are implemented based on the Tunnel SDK. Tunnel is a batch data transfer channel, so these data collectors transfer data in batches in offline mode.
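The relationship described above can be sketched as a small lookup table, which may help when you choose a data collector by transfer mode. The table contents come from the paragraph above. The names `COLLECTORS` and `collectors_for` and the mode labels `"real-time"` and `"batch"` are hypothetical and are not part of any MaxCompute SDK.

```python
# Data collector -> (underlying SDK, transfer mode), as described in this topic.
COLLECTORS = {
    "Flume": ("DataHub SDK", "real-time"),
    "Oracle GoldenGate (OGG)": ("DataHub SDK", "real-time"),
    "Sqoop": ("Tunnel SDK", "batch"),
    "Kettle": ("Tunnel SDK", "batch"),
    "Hive Data Transfer UDTF": ("Tunnel SDK", "batch"),
}


def collectors_for(mode):
    """Return the data collectors that use the given transfer mode."""
    return [name for name, (_sdk, m) in COLLECTORS.items() if m == mode]


print(collectors_for("real-time"))  # ['Flume', 'Oracle GoldenGate (OGG)']
```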
How to obtain service support: Leave a message or create an issue in Aliyun MaxCompute Data Collectors on GitHub.