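MaxCompute is a fast and fully managed data warehouse that can process terabytes, petabytes, or even exabytes of data. This topic describes the open source projects that are available for MaxCompute, including SDKs, a plug-in for R, a JDBC driver, a distributed computing framework, and data collectors.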
SDK
- Java SDK
For more information about how to use the SDK for Java, see SDK for Java.
How to obtain service support: See the official documentation or submit a ticket.
- Python SDK
PyODPS is the MaxCompute SDK for Python. It provides the DataFrame framework and lets you perform basic operations on MaxCompute objects, which helps you analyze data in MaxCompute. For more information, see aliyun-odps-python-sdk on GitHub and the PyODPS documentation, which describes all related interfaces and classes in detail.
- Install PyODPS before you use it. For more information, see Installation guide and limits.
- For more information about how to use PyODPS in DataWorks, see Use PyODPS in DataWorks. For more information about the DataFrame API, see PyODPS DataFrame overview.
- You are welcome to help build the PyODPS ecosystem by sharing your feedback and suggestions. For more information, see aliyun-odps-python-sdk on GitHub.
How to obtain service support: See the official documentation or submit a ticket.
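As an illustration of the basic operations described above, the following sketch connects to MaxCompute with PyODPS and runs one SQL statement. It assumes PyODPS is installed (`pip install pyodps`) and that credentials are stored in environment variables. `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` follow the usual Alibaba Cloud convention, while `MAXCOMPUTE_PROJECT`, `MAXCOMPUTE_ENDPOINT`, and the helper names `odps_settings` and `run_sql` are hypothetical. The `odps` import is deferred so that the settings helper can be checked without PyODPS installed.

```python
import os
from typing import Any, Dict, List, Mapping, Optional

# Maps ODPS(...) keyword arguments to the environment variables that hold them.
# The two ALIBABA_CLOUD_* names follow the usual Alibaba Cloud convention;
# MAXCOMPUTE_PROJECT and MAXCOMPUTE_ENDPOINT are hypothetical names for this sketch.
ENV_VARS = {
    "access_id": "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "secret_access_key": "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    "project": "MAXCOMPUTE_PROJECT",
    "endpoint": "MAXCOMPUTE_ENDPOINT",
}


def odps_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read connection settings for the PyODPS ODPS entry object (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {param: env[var] for param, var in ENV_VARS.items()}


def run_sql(sql: str, env: Optional[Mapping[str, str]] = None) -> List[Any]:
    """Run one SQL statement on MaxCompute and return its result records (hypothetical helper)."""
    # Imported here so that odps_settings() can be used without PyODPS installed.
    from odps import ODPS

    o = ODPS(**odps_settings(env))
    # execute_sql() waits for the SQL task to finish; open_reader() reads its result.
    with o.execute_sql(sql).open_reader() as reader:
        # Each item is a PyODPS Record; fields can be read by position or column name.
        return [record for record in reader]
```

With valid credentials set, `run_sql("SELECT 1")` would return a list of PyODPS records. Check the PyODPS documentation for the options that `ODPS` and `open_reader` accept in your version, for example when reading large result sets.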
MaxCompute RODPS
RODPS is a plug-in that MaxCompute provides for R. For more information, see ODPS Plugin for R on GitHub.
How to obtain service support: Leave a message or create an issue in ODPS Plugin for R on GitHub.
MaxCompute JDBC
MaxCompute JDBC is the official JDBC driver for MaxCompute. It allows Java programs to run SQL tasks through the standard JDBC interfaces. The project is hosted in ODPS JDBC on GitHub.
How to obtain service support: Leave a message or create an issue in ODPS JDBC on GitHub.
Mars
Mars is a tensor-based, unified distributed computing framework. With Mars, a large-scale scientific computing task can be written in only a few lines of code, whereas the same task implemented with MapReduce can require hundreds of lines. Mars is also designed to improve computing performance for such tasks.
The source code of Mars is available on GitHub, and contributions are welcome. To obtain the source code, visit Mars on GitHub.
For more information about Mars, see Mars Documentation.
How to obtain service support: Leave a message or create an issue in Mars on GitHub.
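Conceptually, Mars splits a large tensor into chunks, processes the chunks in parallel, and combines the partial results, so that you write one array expression instead of hand-coding map and reduce steps. The following sketch imitates that chunk-and-combine pattern with only the Python standard library. It is a conceptual illustration, not the Mars API; the function names and the sum-of-squares workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def make_chunks(n: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split the index range [0, n) into (start, stop) chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n)) for start in range(0, n, chunk_size)]


def chunk_sum_of_squares(bounds: Tuple[int, int]) -> int:
    """Per-chunk work: compute a partial result for one chunk only."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def chunked_sum_of_squares(n: int, chunk_size: int, workers: int = 4) -> int:
    """Sum i*i for i in [0, n) by processing chunks in parallel and combining the partials."""
    chunks = make_chunks(n, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs the per-chunk work concurrently and keeps results in chunk order.
        partials = list(pool.map(chunk_sum_of_squares, chunks))
    # Combine step: merge the partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    print(chunked_sum_of_squares(1_000_000, chunk_size=250_000))
```

Threads keep the sketch self-contained, but CPU-bound pure-Python work like this does not run faster under threads because of the global interpreter lock. A framework such as Mars instead schedules chunks across workers, and in Mars itself the computation would be expressed as a single tensor expression (through `mars.tensor`) rather than as explicit chunk and combine functions.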
Data collector
MaxCompute provides the following open source data collectors:
- Flume
- Oracle GoldenGate (OGG)
- Sqoop
- Kettle
- Hive Data Transfer UDTF
The Flume and OGG data collectors are implemented based on the DataHub SDK and transfer data in real time. DataHub is a real-time data transfer channel. The data collectors for Sqoop, Kettle, and Hive Data Transfer UDTF are implemented based on the Tunnel SDK and transfer data in batches in offline mode. Tunnel is a batch data transfer channel.
How to obtain service support: Leave a message or create an issue in Aliyun MaxCompute Data Collectors on GitHub.
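To make the real-time versus batch distinction concrete, the following sketch contrasts two delivery strategies in plain Python: one forwards each record as soon as it arrives (like the DataHub-based collectors), and one buffers records and uploads them in blocks (like the Tunnel-based collectors). The class names, the `batch_size` parameter, and the `send` and `upload` callbacks are hypothetical stand-ins; they are not the DataHub SDK or Tunnel SDK APIs.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class StreamingCollector:
    """Real-time style (hypothetical): forwards every record as soon as it arrives."""

    def __init__(self, send: Callable[[List[Record]], None]) -> None:
        self._send = send

    def collect(self, record: Record) -> None:
        # One write per record keeps latency low at the cost of more requests.
        self._send([record])

    def close(self) -> None:
        pass  # Nothing is buffered, so there is nothing to flush.


class BatchCollector:
    """Batch style (hypothetical): buffers records and uploads them in blocks."""

    def __init__(self, upload: Callable[[List[Record]], None], batch_size: int = 1000) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._upload = upload
        self._batch_size = batch_size
        self._buffer: List[Record] = []

    def collect(self, record: Record) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand the full buffer to the uploader, then start a new, empty buffer.
        if self._buffer:
            self._upload(self._buffer)
            self._buffer = []

    def close(self) -> None:
        # Upload any partial batch that is still buffered.
        self.flush()


if __name__ == "__main__":
    streamed: List[List[Record]] = []
    uploaded: List[List[Record]] = []
    streaming = StreamingCollector(streamed.append)
    batch = BatchCollector(uploaded.append, batch_size=2)
    for i in range(5):
        streaming.collect({"id": i})
        batch.collect({"id": i})
    batch.close()
    print("streaming requests:", len(streamed))
    print("batch uploads:", [len(block) for block in uploaded])
```

The trade-off is latency versus per-request overhead: the streaming path delivers each record immediately, while the batch path makes fewer, larger transfers and only delivers data when a batch fills or the collector is closed.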