This topic describes how to perform operations on Mars clusters, read and write MaxCompute tables, and obtain the URLs of Mars UI, Logview, and Jupyter Notebook.
For more information about how to develop Mars jobs, see Mars.
Mars cluster operations
- Create a Mars cluster
Run the following commands to create a Mars cluster. This process takes a while to complete.
    from odps import options
    options.verbose = True
    # If the two preceding commands have been configured on the DataWorks PyODPS 3 node, you do not need to run them.
    client = o.create_mars_cluster(5, 4, 16, min_worker_num=3)
where:
- 5: the number of worker nodes in the cluster. In this example, the cluster consists of five worker nodes.
- 4: the number of CPU cores for each worker node. In this example, each worker node has four CPU cores.
- 16: the memory size of each worker node. In this example, each worker node has 16 GB of memory.
Note
- The memory size that you request for each worker node must be greater than 1 GB. The optimal ratio of CPU cores to the memory size is 1:4. For example, configure a worker node with 4 CPU cores and 16 GB of memory.
- You can create a maximum of 30 worker nodes. If the number of worker nodes exceeds the upper limit, the image server may be overloaded. If you want to create more than 30 worker nodes, submit a ticket.
- min_worker_num: the minimum number of worker nodes that must be started before the system returns a client object. If this parameter is set to 3, the system returns a client object as soon as three worker nodes are started.
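The sizing rules in the preceding note can be checked before you call create_mars_cluster. The following is a minimal sketch in plain Python. The helper name check_worker_spec and its messages are hypothetical and are not part of PyODPS or Mars; the helper only encodes the limits stated in this topic and assumes that memory is given in GB.

```python
def check_worker_spec(worker_num, cpu_num, mem_gb):
    """Return a list of problems found in a planned Mars cluster configuration.

    Hypothetical helper: it does not call any PyODPS or Mars API.
    """
    problems = []
    if mem_gb <= 1:
        problems.append("memory per worker node must be greater than 1 GB")
    if worker_num > 30:
        problems.append("more than 30 worker nodes requires a ticket")
    if mem_gb != cpu_num * 4:
        problems.append("recommended ratio of CPU cores to memory is 1:4")
    return problems


# The configuration used in the example above (5 workers, 4 cores, 16 GB).
print(check_worker_spec(5, 4, 16))
# A configuration that exceeds the worker limit and misses the recommended ratio.
print(check_worker_spec(40, 4, 8))
```

An empty list means that the configuration stays within the documented limits. The ratio check reflects a recommendation, not a hard requirement, so you may prefer to treat that message as a warning.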
If you set options.verbose to True when you create a Mars cluster, the URLs of Logview, Mars UI, and Jupyter Notebook of the MaxCompute instance are displayed in the command output. You can use the Mars UI to connect to Mars clusters and query the status of clusters and jobs.
- Submit a job
When you create a Mars cluster, the cluster creates a default session that connects to the cluster. You can call the .execute() method to submit a job to the cluster and run the job in the default session.
    import mars.dataframe as md
    import mars.tensor as mt
    md.DataFrame(mt.random.rand(10, 3)).execute()  # Call the .execute() method to submit the job to the created cluster.
- Stop and release a cluster
A Mars cluster is automatically released three days after it is created. If you no longer require a Mars cluster, you can call the client.stop_server() method to release the cluster.
    client.stop_server()
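To make sure that a cluster is released even when a job fails, the call to stop_server() can be wrapped in a context manager. The following is a minimal sketch, not part of PyODPS or Mars: the names mars_cluster and FakeClient are hypothetical, and FakeClient only stands in for the real client so that the sketch runs without a MaxCompute connection.

```python
from contextlib import contextmanager


@contextmanager
def mars_cluster(create_cluster):
    """Create a cluster client and always release the cluster on exit.

    create_cluster is any zero-argument callable that returns a client
    with a stop_server() method, for example:
    lambda: o.create_mars_cluster(5, 4, 16, min_worker_num=3)
    """
    client = create_cluster()
    try:
        yield client
    finally:
        # Runs even if the body of the with block raises an exception.
        client.stop_server()


class FakeClient:
    """Hypothetical stand-in for the real client object."""

    def __init__(self):
        self.stopped = False

    def stop_server(self):
        self.stopped = True


fake = FakeClient()
with mars_cluster(lambda: fake) as client:
    pass  # Submit jobs with .execute() here.
print(fake.stopped)
```

With a real entry object, you would pass a callable that calls o.create_mars_cluster instead of the stand-in.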
Read and write operations on MaxCompute tables
Mars can directly read and write MaxCompute tables.
- Read MaxCompute tables
Call the o.to_mars_dataframe method to read a MaxCompute table. The method returns a Mars DataFrame.
    In [1]: df = o.to_mars_dataframe('test_mars')
    In [2]: df.head(6).execute()
    Out[2]:
       col1  col2
    0     0     0
    1     0     1
    2     0     2
    3     1     0
    4     1     1
    5     1     2
- Write MaxCompute tables
Call the o.persist_mars_dataframe(df, 'table_name') method to save a Mars DataFrame as a MaxCompute table.
    In [3]: df = o.to_mars_dataframe('test_mars')
    In [4]: df2 = df + 1
    In [5]: o.persist_mars_dataframe(df2, 'test_mars_persist')  # Save the Mars DataFrame as a MaxCompute table.
    In [6]: o.get_table('test_mars_persist').to_df().head(6)  # Call the PyODPS DataFrame API operation to query data.
       col1  col2
    0     1     1
    1     1     2
    2     1     3
    3     2     1
    4     2     2
    5     2     3
- Use the Jupyter Notebook of a Mars cluster
Note: The Jupyter Notebook can be used only if with_notebook=True is specified in create_mars_cluster.
When you create a Jupyter Notebook document, a session is automatically created to submit jobs to the Mars cluster. Therefore, you do not need to create a session explicitly in the Jupyter Notebook document.
    import mars.dataframe as md
    import mars.tensor as mt  # Required for mt.random.rand in the next line.
    md.DataFrame(mt.random.rand(10, 3)).sum().execute()  # Call the .execute() method in the Jupyter Notebook to submit the job to the current cluster.
Note
- The Jupyter Notebook document is not automatically saved. We recommend that you manually save the Jupyter Notebook document as required.
- You can connect your Jupyter Notebook to an existing Mars cluster. For more information, see Use an existing Mars cluster.
Other operations
- Use an existing Mars cluster
- Recreate the client of an existing Mars cluster based on the instance ID.
    client = o.create_mars_cluster(instance_id=**instance-id**)
- To use an existing Mars cluster, create a Mars session that connects to the URL of the Mars UI.
    from mars.session import new_session
    new_session('**URL of the Mars UI**').as_default()  # Set the created session as the default session.
- Obtain the URL of the Mars UI
If you set options.verbose to True when you create a Mars cluster, the URL of the Mars UI is automatically displayed in the command output. You can also use client.endpoint to obtain the URL of the Mars UI.
    print(client.endpoint)
- Obtain the Logview URL of an instance
If you set options.verbose to True when you create a Mars cluster, the Logview URL is automatically displayed in the command output. You can also use client.get_logview_address() to obtain the Logview URL.
    print(client.get_logview_address())
- Obtain the Jupyter Notebook URL
If you set options.verbose to True when you create a Mars cluster, the Jupyter Notebook URL is automatically displayed in the command output. You can also use client.get_notebook_endpoint() to obtain the Jupyter Notebook URL.
    print(client.get_notebook_endpoint())
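The three lookups described above can be gathered in one place, for example to log them when a cluster starts. The following is a minimal sketch: the helper cluster_urls and the FakeClient stand-in, including its example.invalid URLs, are hypothetical. Only client.endpoint, client.get_logview_address(), and client.get_notebook_endpoint() come from this topic. The sketch assumes that the cluster was created with with_notebook=True so that a Jupyter Notebook URL exists.

```python
def cluster_urls(client):
    """Collect the Mars UI, Logview, and Jupyter Notebook URLs of a cluster."""
    return {
        "mars_ui": client.endpoint,
        "logview": client.get_logview_address(),
        "notebook": client.get_notebook_endpoint(),
    }


class FakeClient:
    """Hypothetical stand-in so the sketch runs without a MaxCompute connection."""

    endpoint = "http://mars-ui.example.invalid"

    def get_logview_address(self):
        return "http://logview.example.invalid"

    def get_notebook_endpoint(self):
        return "http://notebook.example.invalid"


urls = cluster_urls(FakeClient())
for name, url in urls.items():
    print(f"{name}: {url}")
```

With a real cluster, pass the client object returned by o.create_mars_cluster instead of the stand-in.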