Artificial Intelligence (AI) and Machine Learning (ML) are two central disciplines in the world of Big Data. They are closely linked, and many believe they will be among the defining technologies of the century ahead.
Artificial Intelligence emerged in the 1950s and refers to software or hardware that people have given seemingly intelligent behaviour. Its progress was long held back because programming every algorithm by hand quickly became exhausting. That’s why Machine Learning (ML) was created.
Machine Learning, present in almost every AI system, lets a program learn from data instead of following hand-written rules. It is a real revolution because it can process far more data, far faster, than a person could, and it has proved valuable in fields as different as scientific research and online commerce.
What are the main types of ML algorithms? Supervised or unsupervised? What is Deep Learning for? How are they used? The rest of this article answers these questions for anyone starting out in Machine Learning.
Different types of ML algorithms
To date, there are three types of ML algorithms.
The first type is supervised learning. In this type of learning, the algorithm is commonly described in terms of three components: representation, evaluation and optimization. The aim is to generalize associations learned from known cases to cases that have not yet been seen. The known output values supplied during training act as the feedback that supervises the learning.
There are two types of problems in supervised learning: classification, when the output variable is a category, and regression, when the output variable is a real value. Algorithms such as Linear Regression, Logistic Regression, Naive Bayes, KNN and CART belong to supervised learning, as do Random Forest and XGBoost.
The second type is unsupervised learning. Here the goal is, above all, to recover hidden structures in the data. The only assumption made is that similar observations found in different datasets can have similar meanings.
Three families of techniques exist: association rule mining, clustering, and factor analysis (dimensionality reduction), which reduces the number of variables in a sample while preserving the most important information. Examples from these families include the Apriori algorithm, K-means and PCA.
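To make clustering more concrete, here is a minimal sketch of K-means in plain Python. The sample points, the choice of two clusters and the starting centroids are assumptions made for the illustration, and a real project would more likely use an established library.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments

# Hypothetical data: two visibly separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

No labels are given to the algorithm: the two groups emerge only from the distances between points, which is what makes the learning unsupervised.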
The last type of learning is reinforcement learning. Here the algorithm decides the best action to take given its current state, learning by trial and error which behaviours maximize its reward. These algorithms are widely used in robotics and in video game development.
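The trial-and-error idea can be sketched with an epsilon-greedy strategy on a multi-armed bandit, which is a deliberately simplified setting with no state, only a choice between actions. The three actions, their average rewards and the noise range are hypothetical values chosen for the illustration.

```python
import random

def epsilon_greedy(reward_fn, n_arms, steps=500, epsilon=0.1, seed=0):
    """Learn the average reward of each action by trial and error."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for t in range(steps):
        if t < n_arms:
            arm = t                          # try every action once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)      # explore: pick a random action
        else:
            # exploit: pick the action with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        # Incremental update of the running mean reward for this action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Hypothetical environment: each action has a fixed mean reward plus noise.
TRUE_MEANS = [1.0, 5.0, 3.0]

def noisy_reward(arm, rng):
    return TRUE_MEANS[arm] + rng.uniform(-0.5, 0.5)

estimates, counts = epsilon_greedy(noisy_reward, n_arms=3)
best_arm = max(range(3), key=lambda a: estimates[a])
print(best_arm, counts)
```

The parameter `epsilon` sets the balance between exploring new actions and exploiting the best one found so far. Full reinforcement learning methods extend the same idea to situations where the best action depends on the current state.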
The 10 ML algorithms to master for beginners
We can group the algorithms as follows, each group serving a purpose that may interest you depending on your problem. First, let’s look at those whose learning is supervised:
- Decision trees: an algorithm well suited to Data Mining.
- Bayesian networks: they make it possible, for example, to calculate the probability of an illness from its symptoms.
- Least squares: very useful for forecasting sales and carrying out seasonal analyses.
- Logistic regression
- SVM (Support Vector Machines): a versatile classification technique for reasonably sized samples.
- Ensemble methods: they combine several predictive models to obtain more accurate predictions.
Next, let’s look at those whose learning is unsupervised:
- Clustering algorithms: they group similar observations together. This is also where we come across Deep Learning, which we return to a little further in the article.
- PCA (Principal Component Analysis): it reduces the number of variables while keeping most of the information.
- SVD (Singular Value Decomposition): it is used in facial recognition, for example.
- ICA (Independent Component Analysis): it is useful in image processing.
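To make the PCA entry in the list above more concrete, here is a minimal sketch restricted to two variables, where the first principal component can be computed in closed form from the covariance matrix. The function name `pca_2d` and the sample data are assumptions made for the illustration, and the sketch assumes the points are not all identical.

```python
import math

def pca_2d(points):
    """First principal component of 2-D data, from the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample variances and covariance.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the axis along which the variance is largest.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # Largest eigenvalue of the covariance matrix = variance on that axis.
    top_variance = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    explained_ratio = top_variance / (sxx + syy)
    # Coordinates of each point on the new single axis.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, explained_ratio, scores

# Hypothetical data: two strongly correlated variables.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, ratio, scores = pca_2d(data)
print(direction, ratio)
```

When `ratio` is close to 1, the single new variable in `scores` carries almost all of the information contained in the two original variables.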
Let’s go further: how supervised ML algorithms work, with examples
- CART (Classification And Regression Trees): an algorithm built on the decision tree model, in which the different possibilities are laid out as a tree. Each internal node represents an input variable (x) and a split point on that variable. Each leaf holds a value of the output variable (y). To make a prediction, we walk down the tree following its splits until we reach a leaf.
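- Naïve Bayes: this algorithm is a simple Bayesian network based on Bayes’ theorem, which gives the probability of a hypothesis in light of observed data. We compute the probability of the hypothesis (h) given the data (d) as follows: P(h|d) = P(d|h) * P(h) / P(d). Here P(h|d) is the probability we are looking for, P(d|h) is the probability of the data if the hypothesis is true, P(h) is the prior probability of the hypothesis and P(d) is the probability of the data. The method is called naive because it treats the input variables as independent of one another given the class.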
- Linear regression: this is a least squares method. It models the relationship between the input variable and the output variable with the equation y = a + bx. The goal is to find the values of a (the intercept) and b (the slope) that minimize the sum of squared differences between predicted and observed values. The result can then be drawn as a straight line through the data.
- Logistic regression: whereas linear regression predicts continuous values, logistic regression predicts the probability of a category. For example, to determine whether a person will be sick or not, where 1 denotes the sick instances, we use the function h(x) = 1 / (1 + e^(-x)), which traces an S-shaped curve between 0 and 1. A threshold is then chosen to turn this probability into a binary classification.
- KNN (k-nearest neighbours): this is the nearest-neighbour algorithm. Its purpose is to classify data. It represents each instance as a vector and measures the similarity between instances with a distance such as the Euclidean distance or the Hamming distance. A new instance receives the class held by the majority of its k closest neighbours.
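Several of the items above come down to short formulas, sketched here in plain Python: Bayes’ theorem, the least squares fit of y = a + bx, the standard logistic function 1 / (1 + e^(-z)) with a threshold, and a nearest-neighbour vote using the Euclidean distance. The function names, the numbers and the sample data are assumptions made for the illustration.

```python
import math
from collections import Counter

def bayes_posterior(p_d_given_h, p_h, p_d):
    """Bayes' theorem: P(h|d) = P(d|h) * P(h) / P(d)."""
    return p_d_given_h * p_h / p_d

def fit_line(xs, ys):
    """Least squares estimates of a and b in y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def knn_predict(train, query, k=3):
    """Majority class among the k training points closest to the query."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical numbers: probability of an illness given a symptom.
print(bayes_posterior(p_d_given_h=0.8, p_h=0.1, p_d=0.2))

# Hypothetical data lying exactly on the line y = 1 + 2x.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))

# A probability turned into a binary decision with a 0.5 threshold.
probability = sigmoid(1.2)
print(probability, int(probability >= 0.5))

# Hypothetical labelled points for the nearest-neighbour vote.
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (0.5, 0.5), k=3))
```

In practice the coefficients fed to the logistic function are learned from data. Here the value 1.2 simply stands in for the output of such a model.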
Let’s go further: how unsupervised ML algorithms work, with examples
- PCA: Principal Component Analysis makes data easier to explore by reducing the number of variables. It captures the maximum variance in the data and expresses it in a new coordinate system whose axes, called principal components, are linear combinations of the original variables. Keeping only the first few components retains most of the information with fewer variables.
- Ensemble methods: three techniques for combining models exist, and they are most often applied to supervised learners. The first is Bagging, found in Random Forest. It trains several decision trees on random resamples of the data and combines their predictions, so the model relies on randomness. The second is Boosting, used by AdaBoost, which is more of a corrective model: each new learner concentrates on the errors made by the previous ones.
This is where Deep Learning has its place. It matters a great deal to the developer because it removes much of the need to specify features by hand, that is to say, to define explicitly which characteristics of the data will be useful to the analysis: the network learns them from the data. Of all these techniques, Deep Learning is the one most closely inspired by how our brain works. Many AI systems are built on an artificial neural network whose connections strengthen or weaken as it learns, in a way that resembles our own neural networks.
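The last technique is Stacking. It layers models on top of one another: the predictions of several base models become the inputs of a final model that learns how best to combine them.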
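The Bagging idea described above can be sketched as follows: several simple models are each trained on a random resample of the data, and the final prediction is a majority vote. The threshold classifier used here stands in for a full decision tree and assumes that class 1 has the larger values. The data, the number of models and the seed are also assumptions made for the illustration.

```python
import random
from collections import Counter

def train_stump(sample):
    """1-D threshold classifier: midpoint between the two class means.

    Assumes class 1 has larger x values than class 0.
    """
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:
        # The resample contains a single class: always predict it.
        only = 1 if ones else 0
        return lambda x: only
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > threshold)

def bagging_fit(data, n_models=25, seed=0):
    """Train each model on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))
    return models

def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical data: (value, class) pairs with two well-separated classes.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
        (11, 1), (12, 1), (13, 1), (14, 1), (15, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 3), bagging_predict(models, 12))
```

Boosting would instead train the models one after another, giving more weight to the examples the previous models got wrong, and Stacking would replace the majority vote with a model trained on the individual predictions.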
A discipline in its own right, and essential if you want to progress on a Big Data platform, Machine Learning relies on algorithms that are diverse, practical and intuitive. They may seem obscure at first, but with diligent practice you will learn to combine them to get the best results from your ML projects.