Machine learning is an analytical approach to solving problems through identification, classification, or prediction. Algorithms learn from input data and then use that knowledge to draw conclusions from new data. It is a branch of artificial intelligence that uses mathematical algorithms to let machines learn.
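To make the idea of "learning from input data and drawing conclusions from new data" concrete, here is a minimal sketch in Python. It uses a nearest-neighbor rule, chosen here only because it is short; the customer data, the labels, and the function name are all hypothetical and do not come from a real project.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, new_point):
    """Return the label of the training point closest to new_point."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], new_point),
    )
    return train_labels[best_index]


# Hypothetical labeled examples: (hours of use per week, support tickets opened)
train_points = [(1.0, 5.0), (2.0, 4.0), (9.0, 0.0), (8.0, 1.0)]
train_labels = ["churn", "churn", "stay", "stay"]

# A new, unseen customer is classified by its similarity to past examples.
prediction = nearest_neighbor_predict(train_points, train_labels, (7.5, 0.5))
print(prediction)
```

The "knowledge" here is nothing more than the stored examples: the prediction for a new customer is whatever label its most similar past customer had, which is why the quality and relevance of the input data matter so much.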
Today there are many companies that want to take advantage of the opportunities offered by machine learning and, in particular, prediction. For this purpose, they carry out predictive projects that allow them to have a solid base on which to make better business decisions.
The approach required for a predictive machine learning project consists of three main phases.
When we set out to start a predictive project we must ask ourselves a few questions:
For that to be the case, there must be a large amount of relevant data. A machine learning project should not be built on data that carries no information or is of poor quality, because it will be a waste of time.
We must keep in mind that machine learning algorithms abstract patterns from data, but they don't reason. Their output should therefore be used as a solid basis on which to make decisions, not as a replacement for the decision itself.
Although machine learning algorithms can learn by themselves, there must always be a human guide. The machine reads graphs, numbers, etc., but it always takes a person to give the results value and logic from a business point of view.
We have to be careful with the expectations placed on machine learning. Most of the time, the results offered by the algorithms are a basis for later decisions or actions; they do not translate automatically into benefits. There are some exceptions, such as Netflix: its results are displayed directly on the platform in the form of a recommendation, with no employee making decisions based on the output of the algorithm. In such cases it is important to assess the impact of an error. Getting a television series recommendation wrong is not the same as getting wrong the likelihood of an accident on the road.
Machine learning is not good at handling chance events, because it always works by discovering patterns. Faced with a random occurrence, the algorithm will not know what to do, because it has no reference to relate it to. So, if the problem we want to solve depends heavily on chance, perhaps we should consider another way to solve it.
Even if the solution is good and the algorithm works perfectly, the results cannot always be interpreted. With some algorithms, especially decision trees, it is a little easier to see which variables carry more weight, but others simply offer a result that cannot be interpreted, even if it is correct and valid for the objective. This is due to the very high complexity of the algorithm's internal calculations. In some cases, it would take years for a human to understand why the algorithm reached a given conclusion.
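As a rough illustration of why decision trees are easier to read, the sketch below scores each variable by how much a single yes/no split on it reduces Gini impurity, the criterion many tree learners use to choose a split. It is a simplified stand-in for a full tree, and the variables (age, income), the data, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split_gain(rows, labels, feature):
    """Largest impurity reduction achievable by one threshold on `feature`."""
    parent = gini(labels)
    best_gain = 0.0
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best_gain = max(best_gain, parent - weighted)
    return best_gain


# Hypothetical customers: did they buy the product ("yes") or not ("no")?
rows = [
    {"age": 25, "income": 20},
    {"age": 52, "income": 25},
    {"age": 31, "income": 60},
    {"age": 47, "income": 65},
]
labels = ["no", "no", "yes", "yes"]

# Score every variable; the highest gain is the one a tree would split on first.
gains = {feature: best_split_gain(rows, labels, feature) for feature in rows[0]}
most_informative = max(gains, key=gains.get)
print(gains, most_informative)
```

In this toy data a split on income separates the two classes completely, while no split on age does, so a person can read the result as "income carries more weight than age". Models that combine thousands of such calculations at once do not offer that kind of direct reading.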
Machine learning is not the best option for our problem if we do not have enough data or if the data is not labeled. We need labeled data so that the algorithm has references to learn from, can find patterns and, later, can offer predictions.
Keep in mind that a machine learning project can be slow, so it may not be the right technology if we need to put it into production quickly. In addition, such a project requires a high tolerance for error. The machine can be wrong, and the goal is always to reduce the margin of error as much as possible.
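Reducing the margin of error presupposes measuring it. One common way, sketched below in Python, is to compare the predictions against labeled examples that were held back from training and count the mismatches. The labels and the two model versions are hypothetical, and the error rate is only one of several possible measures.

```python
def error_rate(predictions, actual):
    """Fraction of predictions that differ from the known true labels."""
    if len(predictions) != len(actual):
        raise ValueError("predictions and actual must have the same length")
    if not actual:
        raise ValueError("at least one labeled example is required")
    wrong = sum(1 for p, a in zip(predictions, actual) if p != a)
    return wrong / len(actual)


# Hypothetical held-out labels and the outputs of two versions of a model.
actual = ["yes", "no", "no", "yes", "no", "yes", "no", "no"]
version_a = ["yes", "yes", "no", "no", "no", "yes", "yes", "no"]
version_b = ["yes", "no", "no", "no", "no", "yes", "no", "no"]

rate_a = error_rate(version_a, actual)
rate_b = error_rate(version_b, actual)
print(rate_a, rate_b)
```

Tracking a number like this from one iteration to the next is what makes it possible to say whether the project is actually getting closer to an acceptable margin of error.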
Text by: Maria Gorini with the collaboration of Raquel García | Data Scientist