Basic Machine Learning: How to Recognize a Cat

This article explains the main steps for training a machine and obtaining a model, provides some simple practices, and shares a basic principle of machine learning.

By Xixia

1) What Is Machine Learning and Why Do We Need It?

What is machine learning? First, let's look at two examples.

How Do We Learn to Recognize an Animal as a "Cat"?

Imagine people who have never seen a cat, such as little babies. They do not even have the word "cat" in their vocabulary.

One day they see a furry animal like this:


They do not know what it is, and you tell them that it is a "cat". At this time, they may remember that it is a cat.

After some time, they see another animal like this:


You tell them that it is also a cat. They remember that it is also a cat.

Later on, they see another animal:


At this time, they tell you directly that they see a "cat".

This is the basic way we come to understand the world, and it is called pattern recognition: we conclude that the animal is a cat based on accumulated experience.

In this process, we learn the characteristics of cats through exposure to samples, that is, various cats. By observing how they mew and what they look like (two ears, four legs, a tail, and whiskers), we draw a conclusion and come to know what a cat is.

How Do We Identify an npm Package as a Test Package?

The following is a piece of code written by one of my colleagues:

  pt = TO_CHAR(DATEADD(GETDATE(), - 1, 'dd'), 'yyyymmdd')
  AND name NOT LIKE '%test%'
  AND name NOT LIKE '%demo%'
  AND name NOT LIKE '%测试%'
  AND keywords NOT LIKE '%test%'
  AND keywords NOT LIKE '%测试%'
  AND keywords NOT LIKE '%demo%'

Our criterion is whether the module's name or keywords contain the strings test, demo, or 测试 (Chinese for "test"). If they do, we consider the module a test module. We tell the database our rules, and the database filters out the test modules for us.

The identification of a cat is essentially the same as the identification of a test module. Both involve looking for characteristics:

  • Characteristics of a cat: mewing, two ears, four legs, a tail, and whiskers
  • Characteristics of a test module: test or demo

The characteristics can be expressed in a more program-like form as follows:

  • Characteristics of a cat: mewing: true, ears:2, legs:4, tail:1, and whiskers:10
  • Characteristics of a test module: test:count>0 or demo: count>0

With these characteristics, both people and machines can recognize a cat or a test module.
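As a rough illustration, the characteristics of a test module could be computed in JavaScript along the following lines. This is only a sketch: the function names `countOccurrences`, `extractCharacteristics`, and `isTestModule` and the sample packages are made up for this example and are not part of my colleague's code.

```javascript
// Hypothetical sketch: turn a package's name and keywords into countable characteristics.
function countOccurrences(text, word) {
  // split() yields one more piece than the number of matches
  return text.toLowerCase().split(word).length - 1;
}

function extractCharacteristics(pkg) {
  const text = [pkg.name, ...(pkg.keywords || [])].join(' ');
  return {
    test: countOccurrences(text, 'test') + countOccurrences(text, '测试'),
    demo: countOccurrences(text, 'demo'),
  };
}

// Same rule as the SQL filter: any "test" or "demo" hit marks a test module.
function isTestModule(pkg) {
  const c = extractCharacteristics(pkg);
  return c.test > 0 || c.demo > 0;
}

// Hypothetical sample packages
const packages = [
  { name: 'my-demo-app', keywords: ['example'] },
  { name: 'left-pad', keywords: ['string', 'pad'] },
];
console.log(packages.map(isTestModule)); // [ true, false ]
```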

To put it simply, machine learning uses characteristics and their weights to implement data classification. This simplified statement is for your ease of understanding. For more information, see AiLearning/1. Basic Machine Learning.md at master apachecn/AiLearning at GitHub.
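To make "characteristics and their weights" a little more concrete, here is a minimal sketch of a weighted score for the cat example. The weights and the threshold are invented by hand for illustration; in real machine learning, finding good weights is exactly what training does.

```javascript
// Hypothetical weights: how strongly each characteristic suggests "cat".
const weights = { mewing: 3, ears: 0.5, legs: 0.25, tail: 1, whiskers: 0.1 };
const threshold = 5; // assumed cut-off, chosen by hand for this sketch

function score(characteristics) {
  // Weighted sum: each characteristic value times its weight (true counts as 1)
  return Object.keys(weights).reduce(
    (sum, key) => sum + weights[key] * Number(characteristics[key] || 0),
    0
  );
}

function isCat(characteristics) {
  return score(characteristics) >= threshold;
}

const cat = { mewing: true, ears: 2, legs: 4, tail: 1, whiskers: 10 };
const bird = { mewing: false, ears: 0, legs: 2, tail: 1, whiskers: 0 };
console.log(score(cat), isCat(cat));   // 7 true
console.log(score(bird), isCat(bird)); // 1.5 false
```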

Why Do We Need Machine Learning?

The reason is that when a classification task involves a large number of characteristics, it becomes difficult to classify with simple "if-else" rules. Take a common product recommendation algorithm as an example. To decide whether a certain product should be recommended to someone, the algorithm may involve hundreds of characteristics.

2) How Can We Train a Machine and Obtain a Model?

Prepare Data

Data preparation may account for more than 75% of the time consumed by the entire machine learning task. Therefore, it is the most important and most difficult part. The main steps are as follows:

1) Collect basic data.
2) Remove outliers.
3) Select possible characteristics (this step is commonly known as feature engineering).
4) Tag the data.
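The four steps above can be sketched in a few lines of JavaScript. The records, the "plausible range" used to spot outliers, and the variable names are all assumptions made for this example; real data preparation is usually far messier.

```javascript
// Hypothetical raw records: [temperature, chirpsPerMinute]; null marks a missing reading.
const raw = [[20, 88], [22, 96], [21, null], [25, 110], [23, 9000], [27, 118]];

// 1) Collect basic data: drop incomplete records.
const complete = raw.filter(([x, y]) => x !== null && y !== null);

// 2) Remove outliers: here, anything outside an assumed plausible range.
const cleaned = complete.filter(([, y]) => y >= 0 && y <= 300);

// 3) Select characteristics: the temperature is the only characteristic (x).
const xs = cleaned.map(([x]) => x);

// 4) Tag the data: the chirp count is the tag (y) for each sample.
const ys = cleaned.map(([, y]) => y);

console.log(xs, ys);
```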

Prepare an Algorithm

Fit your data with a function: y = f(x)

For example, use a linear function: y = ax + b

Evaluate the Algorithm

Use an evaluation function to find out whether you have found the proper 'a' and 'b' values.

An evaluation function describes the difference between the values estimated with the trained parameters and the actual values. This difference is also called the loss value. The following figure shows an example:


The blue line on the right is closer to the actual data points.

The most common loss function is the mean squared error. It measures the average squared difference between the estimated values and the actual values, and we use it to judge the quality of the estimates.

As shown in the preceding figure, the coordinates of the small yellow circles in the sample are as follows:

[x1, y1],
[x2, y2],
[x3, y3],
[x4, y4],
[x5, y5],
[x6, y6]

The estimated coordinates on the blue line are as follows:

[x1, y'1],
[x2, y'2],
[x3, y'3],
[x4, y'4],
[x5, y'5],
[x6, y'6]

Therefore, the loss value is:

const cost = ((y'1-y1)^2 + (y'2-y2)^2 + (y'3-y3)^2 + (y'4-y4)^2 + (y'5-y5)^2 + (y'6-y6)^2 )/6
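The formula above is written as notation rather than as runnable code (in real JavaScript, `^` is the bitwise XOR operator, and squaring is written with `**`). A runnable sketch for any number of points might look like this; the function name `meanSquaredError` and the sample numbers are made up for illustration.

```javascript
// Mean squared error between estimated values and actual values.
function meanSquaredError(estimated, actual) {
  if (estimated.length !== actual.length || actual.length === 0) {
    throw new Error('Both arrays must have the same non-zero length');
  }
  const sum = actual.reduce((acc, y, i) => acc + (estimated[i] - y) ** 2, 0);
  return sum / actual.length;
}

// Hypothetical sample: actual y values and the estimates read off a line.
const actualY = [1, 2, 3];
const estimatedY = [1, 2, 5];
console.log(meanSquaredError(estimatedY, actualY)); // (0 + 0 + 4) / 3
```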

Train the Algorithm

Find the Proper Values of a and b Based on the Lowest Point of a Parabola

Take the preceding linear function as an example. Training an algorithm actually means looking for the proper values of 'a' and 'b'. If we search at random in the vast ocean of numbers, we are unlikely ever to find the proper values. Instead, we use a gradient descent algorithm to find them.

To clarify the goal, substitute y = ax + b into the preceding formula for calculating the loss value:

// Function 2
const cost = (((a*x1+b)-y1)^2 + ((a*x2+b)-y2)^2 + ((a*x3+b)-y3)^2 + ((a*x4+b)-y4)^2 + ((a*x5+b)-y5)^2 + ((a*x6+b)-y6)^2 )/ 6

Our goal is to find the 'a' and 'b' values that minimize the cost. With this goal, we may go straight to find a solution.

Do you still remember the quadratic functions you learned in middle school, of the form y = ax^2 + bx + c?

Although the preceding cost function looks long, it is also a quadratic function of 'a' and 'b'. With 'b' held fixed, its graph against 'a' looks roughly as follows:


As long as we can find the 'a' and 'b' values for the lowest point, we can achieve our goal.
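To see the parabola without a graph, we can hold 'b' fixed and evaluate the cost for several values of 'a'. The data points below are hypothetical and lie exactly on y = 2x + 1, so the lowest point is known in advance; `costOf` is a name made up for this sketch.

```javascript
// Hypothetical data points that lie exactly on y = 2x + 1.
const points = [[1, 3], [2, 5], [3, 7]];

// Cost (mean squared error) of the line y = a*x + b on the points.
function costOf(a, b) {
  const sum = points.reduce((acc, [x, y]) => acc + (a * x + b - y) ** 2, 0);
  return sum / points.length;
}

// Hold b at 1 and sweep a: the cost falls, reaches 0 at a = 2, then rises again.
for (const a of [0, 1, 2, 3, 4]) {
  console.log(a, costOf(a, 1));
}
```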

Identify the Lowest Point of a Parabola by the Slope, Which Is Zero at the Lowest Point

Let's assume that we randomly initialize the value of 'a' as 1. The point is then on the upper-left part of the parabola, still far away from the lowest point with the minimum cost.

As shown in the preceding figure, we only need to increase the value of 'a' to approach the lowest point. However, machines are unable to read the graph. This is where we need the most complicated mathematical knowledge in this article: the derivative. The derivative of the parabola at a point is the slope of the tangent line at that point. At the lowest point in the preceding graph, for example, the slope is 0.

We can use the derivative to calculate the slope of the tangent line (the oblique red line) at the current point. If the slope is negative, the value of 'a' is too small and needs to be increased to get closer to the bottom. Conversely, if the slope is positive, the value of 'a' has passed the lowest point and needs to be reduced to get closer to the bottom.
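The rule "negative slope: increase 'a'; positive slope: reduce 'a'" can be tried out numerically. The sketch below approximates the slope with a small step to either side (a central difference) instead of an exact derivative; the data and the names `lossAt`, `slopeAt`, and `direction` are assumptions for this example only.

```javascript
// Hypothetical data on the line y = 2x + 1, so the lowest cost is at a = 2, b = 1.
const pts = [[1, 3], [2, 5], [3, 7]];

function lossAt(a, b) {
  const sum = pts.reduce((acc, [x, y]) => acc + (a * x + b - y) ** 2, 0);
  return sum / pts.length;
}

// Approximate the slope of the cost curve at a (central difference, b held fixed).
function slopeAt(a, b, h = 1e-4) {
  return (lossAt(a + h, b) - lossAt(a - h, b)) / (2 * h);
}

function direction(a, b) {
  const slope = slopeAt(a, b);
  if (slope < 0) return 'increase a';
  if (slope > 0) return 'decrease a';
  return 'stay';
}

console.log(direction(1, 1)); // left of the bottom: the slope is negative
console.log(direction(3, 1)); // right of the bottom: the slope is positive
```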

How Can We Find the Derivative of a Cost Function?

Let's look at the following code. To understand it, first review two pieces of mathematical knowledge: partial derivatives and how to find the derivative of a composite function (the chain rule).

// Function 3
// Partial derivative of a
const costDaoA = (((a*x1+b)-y1)*2*x1 + ((a*x2+b)-y2)*2*x2 + ((a*x3+b)-y3)*2*x3 + ((a*x4+b)-y4)*2*x4 + ((a*x5+b)-y5)*2*x5 + ((a*x6+b)-y6)*2*x6 )/ 6

// Partial derivative of b
const costDaoB = (((a*x1+b)-y1)*2 + ((a*x2+b)-y2)*2 + ((a*x3+b)-y3)*2 + ((a*x4+b)-y4)*2 + ((a*x5+b)-y5)*2 + ((a*x6+b)-y6)*2 )/ 6

If we substitute the current 'a' and 'b' values into costDaoA, we get a slope, which determines how to adjust the parameter 'a' to get closer to the bottom.

Similarly, costDaoB determines how to adjust the parameter 'b' to get closer to the bottom.
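For more than a handful of points, the same partial derivatives are easier to write as loops. The following is a sketch under the assumption that the samples are stored as [x, y] pairs; the sample data is hypothetical and lies on y = 2x + 1.

```javascript
// Partial derivatives of the mean squared error for y = a*x + b,
// written as loops over the samples instead of six hand-written terms.
function costDaoA(a, b, samples) {
  const sum = samples.reduce((acc, [x, y]) => acc + (a * x + b - y) * 2 * x, 0);
  return sum / samples.length;
}

function costDaoB(a, b, samples) {
  const sum = samples.reduce((acc, [x, y]) => acc + (a * x + b - y) * 2, 0);
  return sum / samples.length;
}

// Hypothetical samples on y = 2x + 1.
const samples = [[1, 3], [2, 5], [3, 7]];
console.log(costDaoA(2, 1, samples), costDaoB(2, 1, samples)); // both slopes are 0 at the bottom
console.log(costDaoA(1, 1, samples) < 0); // true: a is too small, so increase it
```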

Run for 500 Cycles

If you run for 500 cycles in this way, you can get very close to the bottom and obtain the proper 'a' and 'b' values.
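Putting the pieces together, a complete training loop might look like the sketch below. The training data, the learning rate of 0.1, and the function names are assumptions chosen so that the example converges quickly; they are not taken from the browser demo later in this article.

```javascript
// Hypothetical samples on y = 2x + 1, so training should approach a = 2, b = 1.
const trainingData = [[1, 3], [2, 5], [3, 7], [4, 9]];

// Partial derivatives of the mean squared error with respect to a and b.
function gradients(a, b) {
  let gradA = 0;
  let gradB = 0;
  for (const [x, y] of trainingData) {
    const error = a * x + b - y;
    gradA += (2 * error * x) / trainingData.length;
    gradB += (2 * error) / trainingData.length;
  }
  return [gradA, gradB];
}

function train(cycles, learningRate) {
  let a = 0;
  let b = 0;
  for (let i = 0; i < cycles; i++) {
    const [gradA, gradB] = gradients(a, b);
    // Move against the slope: a negative slope increases the parameter.
    a -= learningRate * gradA;
    b -= learningRate * gradB;
  }
  return { a, b };
}

console.log(train(500, 0.1)); // values close to a = 2, b = 1
```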

Obtain a Model

Obtain a model like y = ax + b, which we can use to make estimations.
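In code, such a model is nothing more than the learned parameters plus the function that uses them. A minimal sketch, with `createModel` as a made-up name and the parameters standing in for values found by training:

```javascript
// A trained model is just the learned parameters plus the function that uses them.
function createModel(a, b) {
  return (x) => a * x + b;
}

// Hypothetical parameters, standing in for the values found by training.
const estimate = createModel(2, 1);
console.log(estimate(10)); // 2 * 10 + 1 = 21
```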

3) Start with a Simple Practice: Linear Regression

What Is Linear Regression?

It is well known that crickets chirp more frequently in hot weather than in cool weather. We have recorded the temperatures and the number of cricket chirps per minute in a table and graphed the table in Excel as follows (the example is from the official tutorial of Google TF):


It is clear that these red dots are almost on a straight line:


Therefore, we regard the relationship in the data as linear, and the process of finding this straight line is called linear regression. With this straight line, we can estimate the cricket chirps per minute at a given temperature.

Use a Browser to Demonstrate Linear Regression

Address: test gradient descent



We use highcharts for data visualization and directly use the default data points of highcharts, which saves the 75% of the time that data preparation would otherwise consume.

When the training is completed, a blue line is overlaid on the graph, and a curve of the loss value for the 'a' and 'b' values in each training cycle is added.

Code Description


/**
 * Cost function: half of the mean squared error.
 * Note that here the model is y = a + b * x (a is the intercept, b is the slope),
 * and each data point is [x, y].
 */
function cost(a, b) {
    let sum = data.reduce((pre, current) => {
        return pre + ((a + current[0] * b) - current[1]) * ((a + current[0] * b) - current[1]);
    }, 0);
    return sum / 2 / data.length;
}

/**
 * Calculate the gradient
 * @param a
 * @param b
 */
function gradientA(a, b) {
    // Partial derivative of the cost with respect to a (the intercept)
    let sum = data.reduce((pre, current) => {
        return pre + ((a + current[0] * b) - current[1]);
    }, 0);
    return sum / data.length;
}

function gradientB(a, b) {
    // Partial derivative of the cost with respect to b (the slope)
    let sum = data.reduce((pre, current) => {
        return pre + ((a + current[0] * b) - current[1]) * current[0];
    }, 0);
    return sum / data.length;
}

// Number of training cycles
let batch = 200;

// Learning rate: the size of the step toward the bottom in each cycle. If it is too high, the result value bounces around the lowest point. If it is too low, learning becomes slow.
let alpha = 0.001;

let args = [0, 0]; // Initialized a and b values
function step() {
    let costNumber = cost(args[0], args[1]);
    console.log('cost', costNumber);
    chartLoss.series[0].addPoint(costNumber, true, false, false);

    // Compute both gradients from the same a and b before updating either one
    let gradA = gradientA(args[0], args[1]);
    let gradB = gradientB(args[0], args[1]);
    args[0] -= alpha * gradA;
    args[1] -= alpha * gradB;

    if (--batch > 0) {
        // Schedule the next training cycle
        window.requestAnimationFrame(() => {
            step();
        });
    } else {
        drawLine(args[0], args[1]);
    }
}
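The comment about the learning speed can be checked with a small experiment that runs outside the browser. The sketch below uses the same model as the listing (y = a + b * x, with half of the mean squared error as the cost) but with hypothetical data and made-up names, and compares the final cost after 200 cycles for three learning rates.

```javascript
// Hypothetical samples on y = 1 + 2x (a = intercept, b = slope).
const demoData = [[1, 3], [2, 5], [3, 7], [4, 9]];

function halfMse(a, b) {
  const sum = demoData.reduce((acc, [x, y]) => acc + (a + x * b - y) ** 2, 0);
  return sum / 2 / demoData.length;
}

// Run plain gradient descent from a = 0, b = 0 and return the final cost.
function finalCost(alpha, cycles) {
  let a = 0;
  let b = 0;
  for (let i = 0; i < cycles; i++) {
    let gradA = 0;
    let gradB = 0;
    for (const [x, y] of demoData) {
      const error = a + x * b - y;
      gradA += error / demoData.length;
      gradB += (error * x) / demoData.length;
    }
    a -= alpha * gradA;
    b -= alpha * gradB;
  }
  return halfMse(a, b);
}

// Too low learns slowly, a moderate value gets close, too high overshoots and blows up.
for (const alpha of [0.001, 0.1, 0.3]) {
  console.log(alpha, finalCost(alpha, 200));
}
```

For this particular data set, 0.1 ends with a much lower cost than 0.001, while 0.3 overshoots the bottom on every cycle and the cost grows instead of shrinking. The right value depends on the data, so these numbers should not be read as general recommendations.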




4) What's Next

When there are more characteristics, we need to perform more calculations and invest more time in training to obtain a training model.

After reading these simple descriptions, machine learning should no longer seem mysterious. For a more rigorous treatment, refer to the more professional introductory articles listed below.

If you're interested in starting your AI/ML journey on Alibaba Cloud, please visit the Machine Learning Platform for AI (PAI) page to learn more.


1) GitHub - apachecn/AiLearning: AiLearning: Machine Learning - MachineLearning - ML, Deep Learning - DeepLearning - DL, Natural Language Processing - NLP
2) https://developers.google.com/machine-learning/crash-course/descending-into-ml/video-lecture?hl=zh-cn
3) Learn Machine Learning from Zero: A Step-by-step Guide to the Implementation of Gradient Descent with Python

The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
