generally refers to the concept of learning without a teacher. In the context of machine learning (ML), unsupervised learning techniques are those that can detect patterns in data without any information about which groupings are “right” or “wrong”. Unsupervised learning is typically used to find features shared by items in a dataset, so that the items can be grouped into categories or “clusters”. This type of clustering can reveal common patterns in an input dataset even if very little is known about the data.
Let’s try an example. Imagine that you run a factory which makes computer chips, and you have very wisely set up sensors throughout your factory which track humidity, temperature, and vibration, because you know these factors have some impact on how many of your chips fail.
On some days, your factory produces more faulty chips than other days, and you want to see if there is some key combination of factors that causes this. Simple data analysis has not turned up any single key factor that causes your chips to fail: it could be multiple factors acting together.
Using an unsupervised clustering algorithm such as K-means clustering, you can group data points together based on how similar they are. Here, “similar” means the data points are “near” each other when plotted on a graph. After examining the output, you see there are 4 distinct “clusters” of data points:
- “High humidity + high temperature + low vibration”
- “Low humidity + high temperature + high vibration”
- “Low humidity + low temperature + low vibration”
- “High humidity + high temperature + high vibration”
Comparing these conditions with the number of failed chips produced each day, you quickly realize that days with both “high temperature” and “high vibration” (the second and fourth clusters in the list) have the most chip failures. After installing a new air conditioning system and making changes to your equipment to reduce vibration, your chip yield increases significantly, and so do your profits.
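To make the factory example concrete, here is a minimal sketch of K-means in plain Python. The sensor readings and failure counts are invented for illustration, the readings are assumed to be rescaled to the range 0 to 1 so that no single sensor dominates the distance, and the starting centroids are picked with a simple farthest-point rule (an assumption made here so the result is repeatable; K-means is often started from random points instead). Note that the failure counts are only consulted after the clustering is finished.

```python
import math

# Hypothetical daily records: (humidity, temperature, vibration, failed_chips).
# Sensor values are assumed to be rescaled to 0..1.
days = [
    (0.82, 0.78, 0.15, 12), (0.88, 0.85, 0.20, 14), (0.79, 0.81, 0.12, 11),
    (0.18, 0.84, 0.86, 41), (0.22, 0.79, 0.81, 38), (0.15, 0.88, 0.90, 44),
    (0.20, 0.17, 0.14, 9),  (0.12, 0.22, 0.19, 10), (0.17, 0.13, 0.11, 8),
    (0.85, 0.83, 0.88, 47), (0.80, 0.87, 0.82, 43), (0.90, 0.80, 0.85, 45),
]

def farthest_point_init(points, k):
    """Pick k starting centroids that are spread far apart."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Take the point whose nearest existing centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Return (centroids, labels) after alternating assign/update steps."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:  # nothing moved, so we have converged
            break
        centroids = updated
    return centroids, labels

# Cluster on the sensor readings only; the algorithm never sees failures.
points = [d[:3] for d in days]
centroids, labels = kmeans(points, k=4)

# Afterwards, compare each cluster with the failure counts.
mean_failures = {}
for i, (h, t, v) in enumerate(centroids):
    failures = [d[3] for d, lab in zip(days, labels) if lab == i]
    mean_failures[i] = sum(failures) / len(failures)
    print(f"cluster {i}: humidity={h:.2f} temperature={t:.2f} "
          f"vibration={v:.2f} mean failures={mean_failures[i]:.1f}")
```

Real sensor data is rarely this cleanly separated, and the number of clusters `k` is something you have to choose yourself, so treat this as an illustration of the idea rather than a recipe.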
This is an example of the type of task unsupervised learning systems are ideal for. You can contrast this with supervised learning (see the article “What is Supervised Learning?”), in which you also need to know the expected outcome (the “right answers”) for a given set of input data. For instance, if we were using supervised learning we might not be able to easily determine which factors cause chips to fail, but we *could* use sensor data to predict *how many* chips we would expect to fail. This could be useful for risk management.
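For contrast, here is a minimal sketch of the supervised version of the task. It uses k-nearest-neighbour regression, chosen here only because it is short; it is one of many supervised methods. The labelled history below is invented for illustration, the readings are again assumed to be rescaled to the range 0 to 1, and `predict_failures` is a hypothetical helper name.

```python
import math

# Hypothetical labelled history: ((humidity, temperature, vibration), failed_chips).
# Unlike clustering, every past day comes with its "right answer".
history = [
    ((0.82, 0.78, 0.15), 12), ((0.88, 0.85, 0.20), 14), ((0.79, 0.81, 0.12), 11),
    ((0.18, 0.84, 0.86), 41), ((0.22, 0.79, 0.81), 38), ((0.15, 0.88, 0.90), 44),
    ((0.20, 0.17, 0.14), 9),  ((0.12, 0.22, 0.19), 10), ((0.17, 0.13, 0.11), 8),
    ((0.85, 0.83, 0.88), 47), ((0.80, 0.87, 0.82), 43), ((0.90, 0.80, 0.85), 45),
]

def predict_failures(reading, labelled_days, k=3):
    """Average the failure counts of the k most similar past days."""
    nearest = sorted(labelled_days, key=lambda day: math.dist(reading, day[0]))[:k]
    return sum(failures for _, failures in nearest) / len(nearest)

# Today's sensor readings, before any chips have been tested.
humid_hot_shaky_day = predict_failures((0.84, 0.82, 0.86), history)
dry_cool_calm_day = predict_failures((0.15, 0.18, 0.15), history)
print(humid_hot_shaky_day, dry_cool_calm_day)
```

The prediction says how many failures to expect on a day like today, but on its own it does not explain which combination of factors is responsible, which is the question the clustering approach helped answer.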