A Zero-Based Introduction to Machine Learning
One: What is machine learning, and why do we need it?
What is machine learning
Let's look at two examples:
How did we learn about the animal "cat"?
Imagine a person who has never seen a cat (a small baby, say) and doesn't even have the word "cat" in his vocabulary. One day he sees a furry animal.
He doesn't know what it is, so you tell him it's a "cat", and the baby may remember that this is a cat.
After a while, he sees this kind of animal again.
You tell him it's also a cat, and he remembers that this one is a cat too.
Some time later, he sees another such animal.
This time, he tells you on his own that he saw a "cat".
This is the basic way we come to understand the world, known as pattern recognition: people draw conclusions from a lot of experience and use them to judge that something is a cat.
In this process, we learn the characteristics of cats from samples (many different cats): we observe that a cat can meow and has two ears, four legs, a tail, and whiskers, and from these observations we conclude what a cat is.
How do we judge whether an npm package is a test package?
Here is a piece of SQL from a friend:
SELECT * FROM
tianma.module_xx
WHERE
pt = TO_CHAR(DATEADD(GETDATE(), -1, 'dd'), 'yyyymmdd')
AND name NOT LIKE '%test%'
AND name NOT LIKE '%demo%'
AND name NOT LIKE '%测试%'
AND keywords NOT LIKE '%test%'
AND keywords NOT LIKE '%测试%'
AND keywords NOT LIKE '%demo%'
Clearly, our rule is to check whether the module's name or keywords contain any of three strings: test, demo, or 测试 (Chinese for "test"). If they do, we consider it a test module. We handed these rules to the database, and the database filtered out the test modules for us, leaving only the non-test ones.
Identifying a cat and identifying a test module are essentially the same task: both come down to looking for features.
Features of cats: meowing, two ears, four legs, one tail, whiskers
Features of the test module: test, demo, 测试
Expressing the features more programmatically:
Features of cats: Meows: true, Ears: 2, Legs: 4, Tail: 1, Whiskers: 10
Features of the test module: test: count > 0, demo: count > 0, 测试: count > 0
With these features, both humans and machines can correctly identify cats or test modules.
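The test-module rule can also be written as ordinary code: turn a package into feature counts, then apply the "count > 0" check. This is a minimal sketch; the input shape `{ name, keywords }` and the helper names `extractFeatures` and `isTestModule` are hypothetical, and matching is case-sensitive, which is a simplifying assumption.

```javascript
// Keyword list mirroring the rule above ("测试" is Chinese for "test").
const TEST_KEYWORDS = ["test", "demo", "测试"];

// Hypothetical helper: count occurrences of each keyword in a package's name and keywords.
function extractFeatures(pkg) {
  const text = `${pkg.name} ${pkg.keywords}`;
  const features = {};
  for (const word of TEST_KEYWORDS) {
    features[word] = text.split(word).length - 1; // number of non-overlapping matches
  }
  return features;
}

// The hand-written rule: any keyword with count > 0 marks a test package.
function isTestModule(pkg) {
  const features = extractFeatures(pkg);
  return TEST_KEYWORDS.some((word) => features[word] > 0);
}

console.log(isTestModule({ name: "my-demo-app", keywords: "react" })); // true
console.log(isTestModule({ name: "date-utils", keywords: "date,format" })); // false
```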
Put simply, machine learning classifies data using features and feature weights. (For a more accessible explanation, see "1. Machine Learning Basics.md" in the apachecn/AiLearning repository on GitHub.)
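To make "features and feature weights" concrete, here is a sketch of a weighted score: each feature value is multiplied by its weight, the products are summed, and the sum is compared with a threshold. The weights and the threshold below are made-up numbers for illustration only; in machine learning they would be learned from data rather than written by hand.

```javascript
// Made-up weights: how strongly each feature suggests "test module".
const weights = { test: 2.0, demo: 1.5, "测试": 2.0 };
const threshold = 1.0; // hypothetical decision threshold

// Weighted sum of feature values (missing features count as 0).
function score(features) {
  let total = 0;
  for (const [name, weight] of Object.entries(weights)) {
    total += weight * (features[name] ?? 0);
  }
  return total;
}

function classify(features) {
  return score(features) > threshold ? "test module" : "normal module";
}

console.log(classify({ test: 0, demo: 1, "测试": 0 })); // "test module", since the score is 1.5
console.log(classify({ test: 0, demo: 0, "测试": 0 })); // "normal module", since the score is 0
```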
Why let machines do the recognition?
When a classification task has a huge number of features, it becomes impractical to classify with simple if/else rules. For example, in a typical product recommendation algorithm, hundreds or thousands of features may go into deciding whether a product should be recommended to someone.
Two: How do we train the machine and obtain a model?
Prepare the data
Data preparation can take more than 75% of the time in a machine learning task; it is both the most important and the most difficult part. It mainly involves:
Collecting the basic data
Cleaning outliers
Picking possible features: feature engineering
Labeling the data
Prepare the algorithm
Choose a function to fit your data: y = f(x)
For example, a linear function of one variable: y = ax + b
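In code, this model is just a function of x with two parameters. A minimal sketch (the name `predict` is illustrative):

```javascript
// Linear model y = a*x + b: a is the slope, b is the intercept.
function predict(a, b, x) {
  return a * x + b;
}

console.log(predict(2, 1, 3)); // 7
```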
Evaluate the algorithm
To judge whether the values we found for a and b are appropriate, we need an evaluation function.
The evaluation function describes the gap between the model's predictions and the actual values (the loss value). For example, in the accompanying figure, the blue line on the right is closer to the actual data points.
The most common loss function is the mean squared error, which judges the quality of the predictions by averaging the squared differences between the predicted and actual values.
As shown above, the sample points (the small yellow circles) have the coordinates:
[
[x1, y1],
[x2, y2],
[x3, y3],
[x4, y4],
[x5, y5],
[x6, y6]
],
The coordinates predicted by the blue line are:
[
[x1, y'1],
[x2, y'2],
[x3, y'3],
[x4, y'4],
[x5, y'5],
[x6, y'6]
],
Then the loss value is:
const cost = ((y'1-y1)^2 + (y'2-y2)^2 + (y'3-y3)^2 + (y'4-y4)^2 + (y'5-y5)^2 + (y'6-y6)^2) / 6
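The same mean squared error can be computed for any number of points, not just six. A sketch, assuming the actual and predicted values arrive as two arrays of equal length (the name `meanSquaredError` is illustrative). Note that in JavaScript `^` is bitwise XOR, not exponentiation, so the square is written as a multiplication.

```javascript
// Mean squared error: the average of (predicted - actual)^2 over all samples.
function meanSquaredError(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error("arrays must be non-empty and of equal length");
  }
  let sum = 0;
  for (let i = 0; i < actual.length; i++) {
    const diff = predicted[i] - actual[i];
    sum += diff * diff; // square the difference (not diff ^ 2, which is XOR)
  }
  return sum / actual.length;
}

console.log(meanSquaredError([1, 2, 3], [1, 2, 5])); // 1.3333333333333333
```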
Train the algorithm
How to find appropriate values of a and b: the bottom of the parabola
Taking the linear function above as an example, training means finding suitable values of a and b. If we searched for a and b at random in the vast ocean of numbers, we would almost never find them. Instead, we use the gradient descent algorithm to find them.
To restate the goal, substitute the model y' = ax + b into the loss formula above:
// function 2
const cost = (((a*x1+b)-y1)^2 + ((a*x2+b)-y2)^2 + ((a*x3+b)-y3)^2 + ((a*x4+b)-y4)^2 + ((a*x5+b)-y5)^2 + ((a*x6+b)-y6)^2) / 6
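Function 2 can be written for an arbitrary list of points by computing each prediction as `a * x + b` inside a loop. A sketch; the sample points are hypothetical and lie exactly on y = 2x + 1, so the cost at a = 2, b = 1 is 0.

```javascript
// Cost (mean squared error) of the line y = a*x + b on points [[x, y], ...].
function cost(a, b, points) {
  let sum = 0;
  for (const [x, y] of points) {
    const diff = a * x + b - y;
    sum += diff * diff;
  }
  return sum / points.length;
}

// Hypothetical sample points on the line y = 2x + 1.
const samplePoints = [[1, 3], [2, 5], [3, 7], [4, 9], [5, 11], [6, 13]];
console.log(cost(2, 1, samplePoints)); // 0
console.log(cost(2, 0, samplePoints)); // 1 (every prediction is off by exactly 1)
```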
The goal is to find the values of a and b that minimize the cost. Stated this way, the problem becomes much easier to handle.
You may remember the parabola from junior high school, that is, the quadratic function of one variable: y = ax^2 + bx + c (here a, b, and c are just the parabola's coefficients, not our model parameters).
Although our cost function above looks very long, it is in fact a quadratic function of a (and likewise of b), so its graph is roughly a parabola.
Once we find the values of a and b at the lowest point, our goal is achieved.
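Why the cost is a parabola: expanding each term of function 2 shows that, with b held fixed, the cost is a quadratic in a (and, symmetrically, a quadratic in b when a is held fixed). Here n is the number of samples (n = 6 above):

```latex
\mathrm{cost}(a,b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(a x_i + b - y_i\bigr)^2
= \underbrace{\Bigl(\frac{1}{n}\sum_{i=1}^{n} x_i^2\Bigr)}_{A} a^2
+ \underbrace{\Bigl(\frac{2}{n}\sum_{i=1}^{n} x_i\,(b - y_i)\Bigr)}_{B} a
+ \underbrace{\frac{1}{n}\sum_{i=1}^{n} (b - y_i)^2}_{C}
```

This has the junior-high form A a^2 + B a + C. Because A is an average of squares, it is never negative (and is positive unless every x_i is 0), so the parabola opens upward and has a lowest point.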
How do we know we have reached the bottom of the parabola? The slope there is 0.
Suppose we randomly initialize a to 1. Our point is then on the upper left of the parabola, still far from the lowest point (the minimum cost).
Looking at the picture, we can see that increasing a brings us closer to the lowest point, but a machine cannot read pictures. This is where we bring out the most complicated mathematics in this article: derivatives. The derivative of the parabola at a point is the slope of the tangent line at that point; at the lowest point in the figure above, the slope is 0.
From the derivative, we can calculate the slope of the tangent line (the red line) at the current position. If the slope is negative, a is too small and needs to be increased to get closer to the bottom. Conversely, if the slope is positive, we have already passed the lowest point, and a needs to be decreased to get closer to the bottom.
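The rule in the paragraph above (negative slope, increase a; positive slope, decrease a) is exactly what you get by moving each parameter against its slope. Written as one update step with a small positive step size η (usually called the learning rate; the symbol is an addition, not used elsewhere in this article):

```latex
a_{\text{new}} = a - \eta \cdot \frac{\partial\,\mathrm{cost}}{\partial a},
\qquad
b_{\text{new}} = b - \eta \cdot \frac{\partial\,\mathrm{cost}}{\partial b},
\qquad \eta > 0
```

If the slope with respect to a is negative, subtracting η times it increases a; if it is positive, a decreases; at the bottom, where the slope is 0, a stops changing. The same reasoning applies to b.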
How do we find the derivative of the cost function?
We won't go through the derivation; just look at the code. Keywords: partial derivative, chain rule (derivative of a composite function).
// function 3
// Partial derivative of the cost with respect to parameter a
const costDaoA = (((a*x1+b)-y1)*2*x1 + ((a*x2+b)-y2)*2*x2 + ((a*x3+b)-y3)*2*x3 + ((a*x4+b)-y4)*2*x4 + ((a*x5+b)-y5)*2*x5 + ((a*x6+b)-y6)*2*x6) / 6
// Partial derivative of the cost with respect to parameter b
const costDaoB = (((a*x1+b)-y1)*2 + ((a*x2+b)-y2)*2 + ((a*x3+b)-y3)*2 + ((a*x4+b)-y4)*2 + ((a*x5+b)-y5)*2 + ((a*x6+b)-y6)*2) / 6
That is, plugging the current values of a and b into costDaoA gives a slope, which tells us how to adjust a to get closer to the bottom.
In the same way, costDaoB tells us how to adjust b to get closer to the bottom.
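Function 3 generalizes to any number of points. The sketch below computes both partial derivatives in one pass and, as a sanity check, compares them with a numerical estimate from a central finite difference (a checking technique that the original article does not use). The helper names and the hypothetical points on y = 2x + 1 are illustrative assumptions.

```javascript
// Partial derivatives of cost(a, b) = mean((a*x + b - y)^2) over points [[x, y], ...].
function gradients(a, b, points) {
  let dA = 0;
  let dB = 0;
  for (const [x, y] of points) {
    const diff = a * x + b - y;
    dA += 2 * diff * x; // chain rule: d/da (diff^2) = 2 * diff * x
    dB += 2 * diff;     // chain rule: d/db (diff^2) = 2 * diff
  }
  return { costDaoA: dA / points.length, costDaoB: dB / points.length };
}

// Same cost as function 2, used only for the numerical check below.
function costAt(a, b, points) {
  let sum = 0;
  for (const [x, y] of points) sum += (a * x + b - y) ** 2;
  return sum / points.length;
}

const pts = [[1, 3], [2, 5], [3, 7], [4, 9], [5, 11], [6, 13]]; // hypothetical, on y = 2x + 1
const h = 1e-6;
const analytic = gradients(1, 0, pts);
const numericA = (costAt(1 + h, 0, pts) - costAt(1 - h, 0, pts)) / (2 * h);
const numericB = (costAt(1, h, pts) - costAt(1, -h, pts)) / (2 * h);
console.log(analytic.costDaoA, numericA); // both close to -37.33
console.log(analytic.costDaoB, numericB); // both close to -9
```

Both slopes are negative at a = 1, b = 0, which matches the rule above: both parameters need to increase.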
Loop 500 times
Repeat this adjustment 500 times, and the parameters will typically end up very close to the bottom, giving appropriate values of a and b.
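Putting the pieces together, here is a minimal gradient descent loop that runs 500 iterations, as described above. The learning rate of 0.05, the starting values a = 1 and b = 0, and the sample points (which lie exactly on y = 2x + 1) are all assumptions for illustration. With real, noisy data the result lands near the best-fitting line rather than on an exact formula, and a learning rate that is too large can make the loop diverge.

```javascript
// Hypothetical training data lying on y = 2x + 1.
const data = [[1, 3], [2, 5], [3, 7], [4, 9], [5, 11], [6, 13]];

function train(points, { learningRate = 0.05, iterations = 500 } = {}) {
  let a = 1; // arbitrary starting values
  let b = 0;
  const n = points.length;
  for (let step = 0; step < iterations; step++) {
    // Partial derivatives of the mean squared error (function 3, for n points).
    let dA = 0;
    let dB = 0;
    for (const [x, y] of points) {
      const diff = a * x + b - y;
      dA += (2 * diff * x) / n;
      dB += (2 * diff) / n;
    }
    // Move each parameter against its slope.
    a -= learningRate * dA;
    b -= learningRate * dB;
  }
  return { a, b };
}

const model = train(data);
console.log(model.a.toFixed(3), model.b.toFixed(3)); // 2.000 1.000
// The trained model can now make predictions, e.g. for x = 10:
console.log((model.a * 10 + model.b).toFixed(2)); // 21.00
```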
Get the model
Once we have the values of a and b, we have a model, y = ax + b, that can help us make predictions.
Three: Put it into practice, starting with the simplest case: linear regression
What is linear regression
It has long been known that crickets chirp more frequently in hot weather than in cool weather. We recorded a table of temperature versus chirps per minute and plotted it in Excel (the example comes from Google's official TensorFlow tutorial).
It is quite clear that these little red dots lie almost on a straight line.
So we consider the distribution of these data to be linear, and the process of drawing this straight line is linear regression. With this line, we can predict the number of chirps at any temperature.
A linear regression demo in the browser
Address: Test Gradient Descent
https://jshare.com.cn/feeqi/CtGy0a/share?spm=ata.13261165.0.0.6d8c3ebfIOhvAq
The demo uses Highcharts for visualization, and, to save the 75% of the time that data preparation would take, it directly reuses the default data points of the Highcharts scatter demo:
https://www.highcharts.com.cn/demo/highcharts/scatter
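The chart can be described as a Highcharts options object with two series: the raw points as a scatter series and the fitted line as a line series drawn between two end points. This is a sketch, not the demo's actual code; `buildChartOptions` is a hypothetical helper, and only common Highcharts options (`title`, `xAxis`, `yAxis`, `series` with `type`, `name`, `data`, and `marker`) are used. The function only builds the object, so it runs without Highcharts; in a page that loads Highcharts, the result would be passed to `Highcharts.chart("container", options)`, where "container" is an assumed element id.

```javascript
// Build Highcharts options: a scatter series for the samples, a line series for y = a*x + b.
function buildChartOptions(points, a, b) {
  const xs = points.map(([x]) => x);
  const xMin = Math.min(...xs);
  const xMax = Math.max(...xs);
  return {
    title: { text: "Linear regression by gradient descent" },
    xAxis: { title: { text: "x" } },
    yAxis: { title: { text: "y" } },
    series: [
      { type: "scatter", name: "samples", data: points },
      {
        type: "line",
        name: "fitted line",
        // Two end points are enough to draw a straight line.
        data: [[xMin, a * xMin + b], [xMax, a * xMax + b]],
        marker: { enabled: false },
      },
    ],
  };
}

const options = buildChartOptions([[1, 3], [2, 5], [4, 9]], 2, 1);
console.log(options.series[1].data); // [ [ 1, 3 ], [ 4, 9 ] ]
```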
When training is complete, the fitted blue line is drawn over the scatter plot, and a curve of the loss for the a and b values at each training iteration is added as well.
… commodity and other field-related algorithm work. We look forward to students with backgrounds in machine learning, natural language processing, image processing, or data mining joining us. Interested students can send their resumes to my email: dehong.gdh@alibabainc.com.