
OSS usage instructions

Last Updated: Apr 19, 2017

OSS URI

When using E-MapReduce, you can use two types of OSS URIs:

  • native URI: oss://[accessKeyId:accessKeySecret@]bucket[.endpoint]/object/path

    This URI specifies the input and output data sources of a job, much like an hdfs:// URI. When operating on OSS data, you can either set accessKeyId, accessKeySecret, and endpoint in the Configuration, or specify them directly in the URI.

  • ref URI: ossref://bucket/object/path

    This URI is valid only in E-MapReduce job configuration, where it specifies the resources needed to run the job. For example:

    --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 1G --num-executors 2 ossref://my-bucket/spark-examples-0.1-SNAPSHOT.jar 1000

The “oss” and “ossref” prefixes are called schemes. Take care to use the correct scheme for each URI: “oss” for job input and output data, “ossref” for job resources.
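
As a minimal sketch of the native URI pattern described above, the following Python helper assembles an oss:// URI from its parts. The function name build_oss_uri and the sample bucket, endpoint, and key values are illustrative assumptions, not part of E-MapReduce. The helper does no escaping, so it assumes the credentials contain no characters such as "/" or "@" that would confuse URI parsing.

```python
def build_oss_uri(bucket, path, endpoint=None,
                  access_key_id=None, access_key_secret=None):
    """Build oss://[accessKeyId:accessKeySecret@]bucket[.endpoint]/object/path.

    Hypothetical helper for illustration only.
    """
    # The key ID and secret only make sense as a pair.
    if (access_key_id is None) != (access_key_secret is None):
        raise ValueError("provide both access_key_id and access_key_secret, or neither")

    credentials = ""
    if access_key_id is not None:
        credentials = f"{access_key_id}:{access_key_secret}@"

    # The endpoint, when given, is appended to the bucket name with a dot.
    host = f"{bucket}.{endpoint}" if endpoint else bucket

    return f"oss://{credentials}{host}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Credentials and endpoint taken from Configuration instead of the URI.
    print(build_oss_uri("my-bucket", "data/input"))
    # Everything carried in the URI itself (placeholder values).
    print(build_oss_uri("my-bucket", "data/output",
                        endpoint="oss-cn-hangzhou.aliyuncs.com",
                        access_key_id="ID", access_key_secret="SECRET"))
```

Leaving the credentials out of the URI and setting them in the Configuration keeps secrets out of job arguments and logs, which is usually the safer choice.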

Clear files in OSS Fragment Management

E-MapReduce writes data to OSS using the OSS multipart upload method. If a job is interrupted abnormally, some generated data may be left in OSS, and you must delete it manually. The same applies to jobs that write to HDFS: an abnormal interruption may leave data behind that you also must delete manually. There is one difference, however: OSS keeps the parts uploaded through multipart upload in Fragment Management. You therefore need to delete the leftover files from the output directory in OSS File Management and also clear the fragments in OSS Fragment Management. Otherwise, you may be charged for storing that data.
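
The fragment cleanup can also be scripted rather than done in the console. The sketch below is one possible approach, not the documented E-MapReduce procedure: it assumes the OSS Python SDK (oss2) is installed and relies on its MultipartUploadIterator and Bucket.abort_multipart_upload calls. The function name abort_leftover_uploads and the "output/" prefix are hypothetical. Aborting an upload discards its parts, so run this only when no job is still writing under the prefix.

```python
def abort_leftover_uploads(bucket, uploads, prefix=""):
    """Abort unfinished multipart uploads whose object key starts with prefix.

    bucket  -- object exposing abort_multipart_upload(key, upload_id),
               for example an oss2.Bucket
    uploads -- iterable of items with .key and .upload_id attributes,
               for example oss2.MultipartUploadIterator(bucket)
    Returns the list of (key, upload_id) pairs that were aborted.
    """
    aborted = []
    for upload in uploads:
        if not upload.key.startswith(prefix):
            continue  # leave uploads outside the job's output path alone
        bucket.abort_multipart_upload(upload.key, upload.upload_id)
        aborted.append((upload.key, upload.upload_id))
    return aborted


# Example wiring with oss2 (all values are placeholders):
#   import oss2
#   auth = oss2.Auth("<accessKeyId>", "<accessKeySecret>")
#   bucket = oss2.Bucket(auth, "<endpoint>", "<bucket>")
#   abort_leftover_uploads(bucket, oss2.MultipartUploadIterator(bucket),
#                          prefix="output/")
```

This removes only the fragments. Files already completed in the output directory still need to be deleted separately.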
