This topic provides answers to frequently asked questions about TensorFlow.
- How do I enable machine learning?
- How do I reference multiple Python files?
- How do I upload data to OSS?
- How do I read data from OSS?
- How do I write data to OSS?
- Why does an OOM error occur?
- What use cases of TensorFlow are available?
- What is the role of model_average_iter_interval when two GPUs are configured?
How do I enable machine learning?
Machine Learning Platform for AI (PAI) provides TensorFlow, Caffe, and MXNet components for machine learning. To enable machine learning, you must be granted permissions for graphics processing unit (GPU) resources and Object Storage Service (OSS) access. For more information, see Authorization.
How do I reference multiple Python files?
If your project consists of multiple Python files, package them into a .tar.gz archive and specify the following parameters when you create the task:
- Python Code Files: the .tar.gz package that contains all of your Python files.
- Primary Python File: the program entry file within the package.
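As an illustration, the archive can be built with Python's standard tarfile module. The file names main.py and utils.py below are hypothetical; replace them with your own modules.

```python
import os
import tarfile

def package_project(files, archive_name="project.tar.gz"):
    """Bundle the given Python files into a .tar.gz archive
    suitable for the Python Code Files parameter."""
    with tarfile.open(archive_name, "w:gz") as tar:
        for path in files:
            # Store files at the archive root so the entry file is easy to locate.
            tar.add(path, arcname=os.path.basename(path))
    return archive_name

# Hypothetical project layout; main.py would be the Primary Python File.
project_files = ["main.py", "utils.py"]
```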
How do I upload data to OSS?
Before you upload data to OSS, you must create an OSS bucket, where the data of machine learning algorithms is stored. We recommend that you create the OSS bucket in the same region as the GPU cluster that you use for machine learning. This way, you can transmit data in a classic network of Alibaba Cloud, where traffic generated by algorithms is free. After you create an OSS bucket, you can create folders, organize data directories, and upload data in the OSS console.
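Besides the OSS console, data can also be uploaded programmatically. The following is a minimal sketch using the oss2 Python SDK for Alibaba Cloud OSS; the bucket name, endpoint, and credential placeholders are assumptions, and the SDK must be installed separately (pip install oss2).

```python
def upload_to_oss(local_path, object_key,
                  bucket_name="my-ml-bucket",                # placeholder bucket name
                  endpoint="oss-cn-hangzhou.aliyuncs.com",   # placeholder region endpoint
                  access_key_id="<AccessKeyId>",
                  access_key_secret="<AccessKeySecret>"):
    """Upload a local file to an OSS bucket using the oss2 SDK."""
    import oss2  # imported lazily so the sketch can be read without the SDK installed
    auth = oss2.Auth(access_key_id, access_key_secret)
    bucket = oss2.Bucket(auth, "https://" + endpoint, bucket_name)
    bucket.put_object_from_file(object_key, local_path)
```

For large datasets, the ossutil command-line tool or the console's batch upload may be more convenient than per-file SDK calls.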
How do I read data from OSS?
Standard Python file I/O cannot read data from OSS. Code that calls built-in functions such as open() or os.path.exists() to operate on files and folders fails, as does code that calls functions such as scipy.misc.imread() and numpy.load(). Use the tf.gfile module instead, as described below.
- Use functions of the tf.gfile module to read images or text. The following sample functions are supported:
tf.gfile.Copy(oldpath, newpath, overwrite=False)  # Copies a file.
tf.gfile.DeleteRecursively(dirname)  # Recursively deletes all files in a directory.
tf.gfile.Exists(filename)  # Checks whether a file exists.
tf.gfile.FastGFile(name, mode='r')  # Reads a file in non-blocking mode.
tf.gfile.GFile(name, mode='r')  # Reads a file.
tf.gfile.Glob(filename)  # Queries all files in a directory. You can filter the files by pattern.
tf.gfile.IsDirectory(dirname)  # Checks whether an item is a directory.
tf.gfile.ListDirectory(dirname)  # Queries all files in a directory.
tf.gfile.MakeDirs(dirname)  # Creates a folder in a directory. If parent directories are missing, they are automatically created. If the folder you want to create already exists and is writable, a success response is returned.
tf.gfile.MkDir(dirname)  # Creates a folder in a directory.
tf.gfile.Remove(filename)  # Deletes a file.
tf.gfile.Rename(oldname, newname, overwrite=False)  # Renames a file.
tf.gfile.Stat(dirname)  # Queries statistical data about a directory.
tf.gfile.Walk(top, in_order=True)  # Queries the file tree of a directory.
- Use the tf.gfile.Glob, tf.gfile.FastGFile, tf.WholeFileReader(), and tf.train.shuffle_batch() functions to batch read files. Before you batch read files, you must obtain the list of files and create a batch.
import os
import tensorflow as tf
FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_string('buckets', 'oss://{OSS Bucket}/', 'Folder of the training image files')
tf.flags.DEFINE_integer('batch_size', 15, 'Size of the batch')
files = tf.gfile.Glob(os.path.join(FLAGS.buckets, '*.jpg'))  # Queries the paths of all JPG files in buckets.
- Use the tf.gfile.FastGFile() function to batch read a small number of files.
for path in files:
    file_content = tf.gfile.FastGFile(path, 'rb').read()  # Specify 'rb' when you call this function. Otherwise, errors may occur.
    image = tf.image.decode_jpeg(file_content, channels=3)  # In this example, JPG images are used.
- Use the tf.WholeFileReader() function to batch read a large number of files.
reader = tf.WholeFileReader()  # Instantiates a reader.
fileQueue = tf.train.string_input_producer(files)  # Creates a queue of files for the reader to read.
file_name, file_content = reader.read(fileQueue)  # Uses the reader to read a file from the queue.
image_content = tf.image.decode_jpeg(file_content, channels=3)  # Decodes the file content into an image.
label = XXX  # In this example, label processing operations are omitted.
batch = tf.train.shuffle_batch([label, image_content], batch_size=FLAGS.batch_size, num_threads=4, capacity=1000 + 3 * FLAGS.batch_size, min_after_dequeue=1000)
sess = tf.Session()  # Creates a session.
tf.train.start_queue_runners(sess=sess)  # Starts the queue. If this function is not called, the thread remains blocked.
labels, images = sess.run(batch)  # Obtains the results.
The following description explains the code:
- tf.train.string_input_producer: converts the files into a queue. You must call tf.train.start_queue_runners to start the queue.
- tf.train.shuffle_batch includes the following parameters:
- batch_size: the amount of data to return each time a batch is dequeued.
- num_threads: the number of threads to run. The value is usually set to 4.
- capacity: the number of data records from which data is randomly sampled. For example, if a dataset has 10,000 records and you want to randomly sample from 5,000 of them for training, set capacity to 5000.
- min_after_dequeue: the minimum queue length to maintain after a dequeue operation. The value must be less than or equal to the value of capacity.
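To build intuition for how batch_size, capacity, and min_after_dequeue interact, the following pure-Python sketch (not TensorFlow code, and a simplification of the real multi-threaded queue) mimics the buffering behavior: the buffer is filled up to capacity, and each shuffled batch is only dequeued if at least min_after_dequeue elements remain for later shuffling, until the input is exhausted.

```python
import random

def shuffle_batches(data, batch_size, capacity, min_after_dequeue, seed=0):
    """Toy single-threaded sketch of shuffle-batch semantics."""
    rng = random.Random(seed)
    it = iter(data)
    buffer, exhausted = [], False
    while True:
        # Fill the buffer up to capacity from the input stream.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(it))
            except StopIteration:
                exhausted = True
        if len(buffer) < batch_size:
            break  # not enough data left to form a batch
        if not exhausted and len(buffer) - batch_size < min_after_dequeue:
            break  # would drop below the minimum; the real queue blocks here instead
        rng.shuffle(buffer)
        batch, buffer = buffer[:batch_size], buffer[batch_size:]
        yield batch
```

A larger min_after_dequeue gives better shuffling at the cost of memory, which is why the snippet above sets capacity = 1000 + 3 * batch_size with min_after_dequeue = 1000.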
How do I write data to OSS?
- Use the tf.gfile.FastGFile() function to write a file. The following code shows the sample function.
tf.gfile.FastGFile(FLAGS.checkpointDir + 'example.txt', 'wb').write('hello world')
- Use the tf.gfile.Copy() function to copy a file. The following code shows the sample function.
tf.gfile.Copy('./example.txt', FLAGS.checkpointDir + 'example.txt')
Why does an OOM error occur?
The out-of-memory (OOM) error occurs because your memory usage reaches the maximum of 30 GB. We recommend that you use gfile functions to read data from OSS. For more information, see How do I read data from OSS?.
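The underlying principle is to stream data in chunks rather than load an entire dataset into memory at once. The following is a minimal illustration with Python's built-in open on a local file; for an OSS path you would read through tf.gfile as described above.

```python
def count_lines_streaming(path, chunk_size=1 << 20):
    """Count lines by reading the file in fixed-size chunks,
    so peak memory stays near chunk_size instead of the full file size."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count
```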
What use cases of TensorFlow are available?
- Use TensorFlow to classify images. For more information, see Use PAI-TensorFlow to build an image classification model, and Code for use cases of TensorFlow.
- Use TensorFlow to write songs. For more information, see Lyric writing.
What is the role of model_average_iter_interval when two GPUs are configured?
If model_average_iter_interval is not set, parallel stochastic gradient descent (SGD) is used: the two GPUs exchange and update gradients in every iteration. If model_average_iter_interval is set to a value greater than 1, the model averaging method is used instead: each GPU trains independently, and the model parameters of the two GPUs are averaged once every model_average_iter_interval iterations. In other words, model_average_iter_interval specifies the number of training iterations between two averaging operations.
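The difference can be sketched in pure Python with a toy one-dimensional loss; this is an illustration of the averaging schedule, not PAI's actual implementation. With avg_interval=1 the workers stay synchronized every step, which mimics per-iteration updates; with a larger interval each worker drifts independently between averaging points.

```python
def model_average_train(workers_init, grad_fn, lr, iters, avg_interval):
    """Toy sketch: each 'GPU' keeps its own parameter copy; copies are
    averaged across workers every avg_interval iterations."""
    params = list(workers_init)
    for step in range(1, iters + 1):
        # Each worker takes an independent SGD step on its own copy.
        params = [p - lr * grad_fn(p) for p in params]
        if step % avg_interval == 0:
            mean = sum(params) / len(params)
            params = [mean] * len(params)  # broadcast the averaged model
    return params

# Toy loss L(w) = (w - 3)^2, so grad_fn(w) = 2 * (w - 3);
# both workers converge toward the minimum at w = 3.
```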