This topic describes how to evaluate the performance of a model by using metrics such as accuracy and recall. It also describes how to view and compare model evaluation results.
Prerequisites
A model is created and associated with a training job. For more information, see Manage models.
A persistent volume claim (PVC) is created. For more information, see Mount a statically provisioned NAS volume in the console or Use the console to mount a statically provisioned OSS volume.
Create an evaluation job
- Log on to AI Developer Console.
- In the left-side navigation pane of AI Developer Console, click Model Manage.
- In the Model Manage List section, click New Model Evaluate in the Operation column of the model that you want to evaluate.
- In the EvaluateJob Message and EvaluateJob Config sections, configure the parameters.
| Section | Parameter | Description |
| --- | --- | --- |
| EvaluateJob Message | EvaluateJob Name | The name must be 1 to 256 characters in length, and can contain digits, letters, and hyphens (-). |
| EvaluateJob Message | EvaluateJob Image | The base image used by the evaluation job. |
| EvaluateJob Message | Namespace | The namespace to which the evaluation job belongs. |
| EvaluateJob Message | Image Pull Secrets | The credentials for pulling private images. This parameter is optional. |
| EvaluateJob Message | Data Configuration | Specify a data source. This parameter is optional. To use a PVC, you need to specify a data source. |
| EvaluateJob Message | Model Path | The path of the model that you want to evaluate. |
| EvaluateJob Message | Dataset Path | The path of the datasets that are used by the evaluation job. |
| EvaluateJob Message | Metrics Path | The path of the evaluation results generated by the evaluation job. |
| EvaluateJob Message | Execution Command | The command that you want the pods of the evaluation job to run. |
| EvaluateJob Message | Code Configuration | Specify a Git repository. To pull code from Git, you need to specify a Git repository. |
| EvaluateJob Config | CPU (Cores) | The amount of resources requested by the evaluation job. |
| EvaluateJob Config | Memory (GB) | The amount of resources requested by the evaluation job. |
| EvaluateJob Config | GPU (Card Numbers) | The amount of resources requested by the evaluation job. |
The preceding parameters correspond to the parameters for submitting evaluation jobs in Arena. You can use the default evaluation job code or customize the code. For more information about how to write evaluation job code, see Write evaluation job code.
- Click Submit Evaluation Job.
You can view information about the evaluation job on the Evaluate Jobs page.
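The naming rule for EvaluateJob Name in the table above can be checked before you fill in the form. The following is a minimal sketch, assuming that "letters" means ASCII letters. The function name is_valid_job_name is hypothetical and is not part of the console or Arena.

```python
import re

# Assumption: "letters" means ASCII letters (a-z, A-Z). The console may apply
# additional rules that are not stated in this topic.
_JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]{1,256}")


def is_valid_job_name(name: str) -> bool:
    """Return True if name is 1 to 256 characters of digits, letters, and hyphens."""
    return _JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["mnist-eval-01", "", "bad_name", "a" * 257]:
        # Print only the first 20 characters to keep long names readable.
        print(repr(candidate[:20]), is_valid_job_name(candidate))
```

A check like this only catches naming mistakes early. The console remains the authority on which names are accepted.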
Write evaluation job code
Procedure
Perform the following steps to write custom evaluation job code. For more information about the sample code, see Sample evaluation job code.
Run the following command to install the latest KubeAI package:
pip install kubeai
Add the following statements to import the Evaluator abstract class, ABC, and KubeAI:
from kubeai.evaluate.evaluator import Evaluator
from abc import ABC
from kubeai.api import KubeAI
Define a class that inherits from the Evaluator abstract class and override the preprocess_dataset, load_model, evaluate_model, and report_metrics methods. These methods preprocess the dataset, load the model, evaluate the model, and export the evaluation report, respectively.
class CustomerEvaluatorDemo(Evaluator, ABC):
    def preprocess_dataset(self):
        # Preprocess the dataset.
        pass

    def load_model(self):
        # Load the model.
        pass

    def evaluate_model(self, dataset):
        # Evaluate the model.
        pass

    def report_metrics(self, metrics):
        # Export the evaluation report.
        pass
Create an instance of your Evaluator class and pass it to the KubeAI.evaluate method to run the evaluation job.
customer_evaluator = CustomerEvaluatorDemo()
KubeAI.evaluate(customer_evaluator)
To test the code on your on-premises machine, pass dataset_dir, model_dir, and report_dir to the KubeAI.test method:
customer_evaluator = CustomerEvaluatorDemo()
KubeAI.test(customer_evaluator, model_dir, dataset_dir, report_dir)
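The division of work among the four methods can be illustrated without a cluster or the kubeai package. The following is a minimal sketch, not the KubeAI implementation. SketchEvaluator, MajorityClassEvaluator, and run_local are hypothetical stand-ins, and the call order (preprocess, load, evaluate, report) and the directory attributes are assumptions inferred from the method signatures in this topic.

```python
from abc import ABC, abstractmethod


class SketchEvaluator(ABC):
    """Hypothetical stand-in for the Evaluator abstract class (not the kubeai class)."""

    model_dir = None
    dataset_dir = None
    report_dir = None

    @abstractmethod
    def preprocess_dataset(self): ...

    @abstractmethod
    def load_model(self): ...

    @abstractmethod
    def evaluate_model(self, dataset): ...

    @abstractmethod
    def report_metrics(self, metrics): ...


def run_local(evaluator, model_dir, dataset_dir, report_dir):
    """Assumed driver: set the directories, then call the four hooks in order."""
    evaluator.model_dir = model_dir
    evaluator.dataset_dir = dataset_dir
    evaluator.report_dir = report_dir
    dataset = evaluator.preprocess_dataset()
    evaluator.load_model()
    metrics = evaluator.evaluate_model(dataset)
    evaluator.report_metrics(metrics)
    return metrics


class MajorityClassEvaluator(SketchEvaluator):
    """Toy evaluator whose "model" always predicts one fixed label."""

    def preprocess_dataset(self):
        # Illustrative in-memory data; a real job would read from self.dataset_dir.
        return {"labels": [1, 1, 0, 1]}

    def load_model(self):
        # A real job would load weights from self.model_dir.
        self.predict = lambda n: [1] * n

    def evaluate_model(self, dataset):
        labels = dataset["labels"]
        predictions = self.predict(len(labels))
        correct = sum(p == y for p, y in zip(predictions, labels))
        return {"Accuracy": correct / len(labels)}

    def report_metrics(self, metrics):
        # A real job would write files under self.report_dir.
        print(metrics)


if __name__ == "__main__":
    run_local(MajorityClassEvaluator(), "model/", "data/", "report/")
```

In this sketch the value returned by preprocess_dataset is passed to evaluate_model, and the value returned by evaluate_model is passed to report_metrics, which mirrors the parameters of the methods described above.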
Sample evaluation job code
The following sample code evaluates a model that is trained on the MNIST dataset by using TensorFlow 1.15.
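from kubeai.evaluate.evaluator import Evaluator
from abc import ABC
from kubeai.api import KubeAI
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers, models
# Assumption: mt, which is used in evaluate_model, refers to sklearn.metrics.
from sklearn import metrics as mt
# Note: Utils, which is used below, is not imported in the original sample.
# Import it from the kubeai package; its module path is not given in this topic.
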
class CNN(object):
    def __init__(self):
        model = models.Sequential()
        model.add(layers.Conv2D(
            32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(layers.Flatten())
        model.add(layers.Dense(64, activation='relu'))
        model.add(layers.Dense(10, activation='softmax'))
        model.summary()
        self.model = model

class TensorflowEvaluatorDemo(Evaluator, ABC):
    def preprocess_dataset(self):
        # Preprocess the dataset.
        with np.load(self.dataset_dir) as f:
            x_train, y_train = f['x_train'], f['y_train']
            x_test, y_test = f['x_test'], f['y_test']
        train_images = x_train.reshape((60000, 28, 28, 1))
        test_images = x_test.reshape((10000, 28, 28, 1))
        train_images, test_images = train_images / 255.0, test_images / 255.0
        train_images, train_labels = train_images, y_train
        test_images, test_labels = test_images, y_test
        test_loader = {
            "test_images": test_images,
            "test_labels": test_labels
        }
        return test_loader

    def load_model(self):
        # Load the model.
        latest = tf.train.latest_checkpoint(self.model_dir)
        self.cnn = CNN()
        self.model = self.cnn.model
        self.model.load_weights(latest)

    def evaluate_model(self, dataset):
        # Evaluate the model.
        metrics = Utils.evaluate_function_classification_tensorflow1(
            model=self.model,
            evaluate_x=dataset["test_images"],
            evaluate_y=dataset["test_labels"])
        predictions = self.model.predict(dataset["test_images"])
        # Convert each row of class probabilities to the most likely label.
        pred = np.argmax(predictions, axis=1)
        confusion_matrix = mt.confusion_matrix(dataset["test_labels"], pred)
        metrics["Confusion_matrix"] = str(confusion_matrix)
        return metrics

    def report_metrics(self, metrics):
        # Export the evaluation report.
        print(metrics)
        Utils.ROC_plot(fpr=metrics["ROC"]["fpr"], tpr=metrics["ROC"]["tpr"], report_dir=self.report_dir)
        print("Here is the customer-defined report method")

if __name__ == '__main__':
    # Create the Evaluator object and pass it to the evaluate method to run the evaluation job.
    tensorflow_evaluator = TensorflowEvaluatorDemo()
    KubeAI.evaluate(tensorflow_evaluator)
    # Perform the test on your on-premises machine.
    # KubeAI.test(tensorflow_evaluator, model_dir, dataset_dir, report_dir)
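The evaluate_model method in the sample converts each row of class probabilities to a predicted label and then builds a confusion matrix. The following dependency-free sketch shows what those two operations compute. The helper names argmax_labels and confusion_matrix are hypothetical, the probabilities are made-up illustrative values, and the layout (rows are true labels, columns are predicted labels) follows the scikit-learn convention, which this sketch assumes the sample relies on.

```python
def argmax_labels(probabilities):
    """Return the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples for each (true label, predicted label) pair."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[truth][predicted] += 1
    return matrix


if __name__ == "__main__":
    # Illustrative values only; not the output of a real model.
    probabilities = [
        [0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.3, 0.3, 0.4],
        [0.6, 0.3, 0.1],
    ]
    true_labels = [0, 1, 2, 1]
    predicted = argmax_labels(probabilities)
    print(predicted)
    for row in confusion_matrix(true_labels, predicted, num_classes=3):
        print(row)
```

Entries on the diagonal count correct predictions. Every off-diagonal entry counts samples of one class that were predicted as another class.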
View evaluation metrics
In the left-side navigation pane of AI Developer Console, click Evaluate Jobs.
In the Job List section, click the name of an evaluation job to view its metrics.
The preceding figure shows the following metrics: Accuracy, Precision, Recall, F1_score, ROC, and AUC. F1_score is the harmonic mean of Precision and Recall. The Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate at different classification thresholds. The Area Under the ROC Curve (AUC) summarizes the ROC curve as a single value between 0 and 1. A value closer to 1 indicates a better classifier.
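To make these metrics concrete, the following sketch computes precision, recall, the F1 score, and the area under the ROC curve for a binary classifier in plain Python. It is an illustration under stated assumptions, not the code that the console or the kubeai package uses. The function names and the sample labels and scores are hypothetical. The F1 score is computed by its conventional definition, the harmonic mean of precision and recall, and the area under the ROC curve is computed as the fraction of positive and negative pairs in which the positive sample receives the higher score (ties count as half).

```python
def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(true_labels, scores, positive=1):
    """Area under the ROC curve, computed by comparing all positive/negative pairs."""
    pos = [s for t, s in zip(true_labels, scores) if t == positive]
    neg = [s for t, s in zip(true_labels, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute the area.")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative labels and scores only; not real evaluation results.
    true_labels = [1, 1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]
    print(precision_recall_f1(true_labels, predicted))
    print(roc_auc(true_labels, scores))
```

In this example precision is 1.0 and recall is 0.5, so the F1 score is about 0.667, which is lower than the arithmetic mean of 0.75. The harmonic mean penalizes a large gap between precision and recall.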
Compare evaluation metrics
- In the left-side navigation pane of AI Developer Console, click Evaluate Jobs.
- In the Job List section, select two or more evaluation jobs and click Metrics Compare in the upper-right corner of the section.
On the page that appears, the metrics of the selected evaluation jobs are displayed in column charts. This visual comparison helps you choose among multiple models.
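The same kind of comparison can be made offline if you collect the metrics of several evaluation jobs yourself. The following is a minimal sketch under stated assumptions: the job names and metric values are hypothetical, the dictionary layout is not a format defined by the console, and higher values are assumed to be better for every metric that is compared.

```python
def best_job(results, metric):
    """Return the job with the highest value for metric (higher is assumed better)."""
    return max(results, key=lambda job: results[job][metric])


def format_comparison(results, metrics):
    """Render one row per metric and one column per job as plain text."""
    jobs = sorted(results)
    header = "{:<10}".format("Metric") + "".join("{:>12}".format(j) for j in jobs)
    lines = [header]
    for metric in metrics:
        cells = "".join("{:>12.3f}".format(results[j][metric]) for j in jobs)
        lines.append("{:<10}".format(metric) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical metric values for illustration only; not real evaluation results.
    results = {
        "eval-job-a": {"Accuracy": 0.91, "Recall": 0.88},
        "eval-job-b": {"Accuracy": 0.93, "Recall": 0.85},
    }
    print(format_comparison(results, ["Accuracy", "Recall"]))
    print("Best by Recall:", best_job(results, "Recall"))
```

Which metric to rank by depends on the use case. For example, a model with slightly lower accuracy can still be preferable if recall matters more.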