PyODPS provides PyODPS DataFrame, a pandas-like API that makes full use of the computing power of MaxCompute. You can also switch the data source from a MaxCompute table to a pandas DataFrame, so that the same code runs on pandas.

  • Quick start: describes how to create and manage a DataFrame object and how to use DataFrame to process data.
  • Create a DataFrame object: describes how to create a DataFrame object to reference a data source.
  • Sequence: introduces sequence objects in DataFrame. SequenceExpr represents a column in a two-dimensional dataset. You cannot create SequenceExpr objects manually. You can only retrieve one from a collection object.
  • Collection: introduces collection objects in DataFrame. CollectionExpr supports various operations on two-dimensional datasets, such as column operations, data filtering, and data transformation.
  • Execution: introduces the execution methods that you can call to perform operations in DataFrame.
  • MapReduce API: describes how to use the MapReduce API in DataFrame.
  • Column operations: describes the column operations supported by DataFrame.
  • Aggregation: describes the aggregation operations supported by DataFrame. It also describes how to implement group aggregation and write aggregate functions.
  • Sort, deduplicate, sample, and transform data: describes how to perform sorting, deduplication, sampling, and data transformation on DataFrame objects.
  • Data merging: describes the data merge operations supported by DataFrame. These operations include the JOIN and UNION operations.
  • Window functions: describes the window functions supported by DataFrame.
  • Plotting: describes the plotting methods provided by DataFrame.
  • Debugging: describes how to debug DataFrame operations. DataFrame optimizes the entire execution and can display it, so you can visualize how an expression is executed.