This topic provides examples of how to use the SDK for Python to perform operations on MaxCompute resources in typical scenarios.
Background information
In most cases, MaxCompute resources are used in user-defined functions (UDFs) and MapReduce. You can use the following methods to perform basic operations on the resources:
- list_resources: queries all resources in a project.
- exist_resource: checks whether a resource exists.
- delete_resource: deletes a resource. You can also call the drop method of a Resource object to delete the resource.
- create_resource: creates a resource.
- open_resource: reads a resource.
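As a rough sketch of how these methods can be combined, the helpers below replace a file resource if one with the same name already exists and list the resource names in a project. They assume that `o` is an already initialized MaxCompute entry object, as in the other examples in this topic. The function names `replace_file_resource` and `resource_names` are hypothetical and are not part of PyODPS.

```python
def replace_file_resource(o, name, content, res_type='file'):
    """Drop the resource `name` if it exists, then create it from `content`.

    `o` is assumed to be an initialized MaxCompute entry object.
    `content` can be a string or a file-like object.
    """
    if o.exist_resource(name):
        o.delete_resource(name)
    return o.create_resource(name, res_type, file_obj=content)


def resource_names(o):
    """Return the names of all resources in the current project."""
    return [res.name for res in o.list_resources()]
```

Checking with exist_resource before deleting avoids an error when the resource is not present yet. The check and the delete are separate calls, so this sketch does not protect against concurrent changes to the same resource.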
PyODPS supports two types of resources: file resources and table resources.
- File resources: File resources include files of the FILE, PY, JAR, and ARCHIVE types. For more information about how to perform operations on file resources by using the SDK for Python, see Manage file resources in this topic.
  Note: When you upload a PY file in DataWorks, you must set the resource type to file. For more information, see Python UDF documentation.
- Table resources: For more information about how to perform operations on table resources by using the SDK for Python, see Manage table resources in this topic.
Manage file resources
Common operations on file resources:
Create a file resource
You can call the create_resource method with a resource name, a file type, and a file-like object or string to create a file resource.
# Use a file-like object to create a file resource. Files, such as compressed packages, must be read in binary mode.
with open('/to/path/file', 'rb') as file_obj:
    resource = o.create_resource('test_file_resource', 'file', file_obj=file_obj)
# Use a string to create a file resource.
resource = o.create_resource('test_py_resource', 'py', file_obj='import this')
Read and modify a file resource
You can use one of the following methods to open a resource:
- Call the open method of a file resource to open it.
- Call the open_resource method at the MaxCompute entry point to open a file resource.
Both methods work in a similar way to the open method predefined in Python. The following example demonstrates the opening modes of file resources:
with resource.open('r') as fp:  # Open a resource in read mode.
    content = fp.read()  # Read all content.
    fp.seek(0)  # Return to the beginning of the resource.
    lines = fp.readlines()  # Read multiple lines.
    fp.write('Hello World')  # An error is returned. Data cannot be written in read mode.
with o.open_resource('test_file_resource', mode='r+') as fp:  # Open the file in read/write mode.
    fp.read()
    fp.tell()  # Return the current position.
    fp.seek(10)
    fp.truncate()  # Discard the content after the current position.
    fp.writelines(['Hello\n', 'World\n'])  # Write multiple lines into the file.
    fp.write('Hello World')
    fp.flush()  # Manually call the method to submit the update to MaxCompute.
PyODPS supports the following opening modes:
- r: read mode. The file can be opened, but data cannot be written to it.
- w: write mode. Data can be written to the file, but data in the file cannot be read. If a file is opened in write mode, the file content is cleared first.
- a: append mode. Data can be added to the end of the file.
- r+: read/write mode. You can read data from and write data to the file.
- w+: This mode is similar to the r+ mode. The only difference is that the file content is cleared first.
- a+: This mode is similar to the r+ mode. The only difference is that data can be written only to the end of the file.
Binary modes are also supported:
- rb: binary read mode.
- r+b: binary read/write mode.
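Because the opening modes follow the conventions of the open method predefined in Python, the difference between r+, w+, and a+ can be tried out on a local file first. The following sketch uses only the Python standard library and does not call PyODPS. The function name `demo_modes` and the file name `demo.txt` are made up for illustration, and file resources in MaxCompute are assumed, not verified here, to behave analogously.

```python
import os
import tempfile


def demo_modes():
    """Write 'AB' to the same starting content in r+, w+, and a+ mode."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo.txt')
        for mode in ('r+', 'w+', 'a+'):
            # Start every run from the same known content.
            with open(path, 'w') as fp:
                fp.write('0123456789')
            # Write two characters in the mode under test.
            with open(path, mode) as fp:
                fp.write('AB')
            with open(path, 'r') as fp:
                results[mode] = fp.read()
    return results


print(demo_modes())
# {'r+': 'AB23456789', 'w+': 'AB', 'a+': '0123456789AB'}
```

In r+ mode the write overwrites the first characters, in w+ mode the existing content is cleared before the write, and in a+ mode the write is appended to the end of the file.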
Manage table resources
Create a table resource
o.create_resource('test_table_resource', 'table', table_name='my_table', partition='pt=test')
Update a table resource
table_resource = o.get_resource('test_table_resource')
table_resource.update(partition='pt=test2', project_name='my_project2')
Obtain information about a table and a partition
table_resource = o.get_resource('test_table_resource')
table = table_resource.table
print(table.name)
partition = table_resource.partition
print(partition.spec)
Read and write data
table_resource = o.get_resource('test_table_resource')
with table_resource.open_writer() as writer:
    writer.write([0, 'aaaa'])
    writer.write([1, 'bbbbb'])
with table_resource.open_reader() as reader:
    for rec in reader:
        print(rec)
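The update and read steps shown above can be combined into one helper. The following is a minimal sketch, assuming that `o` is an initialized MaxCompute entry object and that update accepts the partition argument on its own. The function name `read_partition` is hypothetical and is not part of PyODPS.

```python
def read_partition(o, resource_name, partition):
    """Point a table resource at `partition` and return its records as a list.

    `o` is assumed to be an initialized MaxCompute entry object.
    Note that update changes the table resource itself, so other users of
    the same resource see the new partition as well.
    """
    table_resource = o.get_resource(resource_name)
    table_resource.update(partition=partition)
    with table_resource.open_reader() as reader:
        return [rec for rec in reader]
```

For example, `read_partition(o, 'test_table_resource', 'pt=test2')` would return the records of the pt=test2 partition. Collecting all records into a list is convenient for small tables only. For large tables, iterate over the reader instead.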