How To Approach A Machine Learning Project

Machine learning projects open up opportunities for businesses, but they must be well defined if they are to succeed.

Machine learning is a branch of artificial intelligence in which mathematical algorithms allow machines to learn. It is an analytical way of solving problems through identification, classification or prediction: algorithms learn from the data they are given and then use that knowledge to draw conclusions from new data.

Today many companies want to take advantage of the opportunities offered by machine learning and, in particular, prediction. To do so, they run predictive projects that give them a solid basis for making better business decisions.

The approach required for a predictive machine learning project consists of three main phases.

Data collection. The data can come from any source: the web, a database, voice recordings converted into text...
Interpretation by algorithms. People collect the data, analyze it and run it through the algorithm, which extracts information from it.
Decision making. The algorithm produces a result that is used as a basis for making business decisions according to the company's criteria.
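
The three phases above can be sketched in a few lines of Python. This is only a minimal illustration, not a real project: the figures are invented, the "algorithm" is a plain least-squares line, and the function names (collect_data, fit_line, decide) and the reorder rule are hypothetical.

```python
from statistics import mean


def collect_data():
    """Phase 1: data collection. Hard-coded here; in practice this would
    read from the web, a database, transcribed audio, and so on."""
    # (advertising spend, units sold) pairs, invented for illustration
    return [(1.0, 12.0), (2.0, 19.0), (3.0, 31.0), (4.0, 42.0), (5.0, 48.0)]


def fit_line(pairs):
    """Phase 2: interpretation. Fit y = slope * x + intercept by least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def decide(predicted_units, stock_on_hand):
    """Phase 3: decision making. A business rule applied to the prediction."""
    return "reorder" if predicted_units > stock_on_hand else "hold"


slope, intercept = fit_line(collect_data())
forecast = slope * 6.0 + intercept          # predicted units for a spend of 6.0
decision = decide(forecast, stock_on_hand=50.0)
print(f"forecast={forecast:.1f} units, decision={decision}")
```

Note that the decision rule lives outside the model: the algorithm only produces the forecast, and the company decides what to do with it.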

When we set out to start a predictive project we must ask ourselves a few questions:

Is it a problem that requires artificial intelligence?

For the answer to be yes, there must be a very large amount of relevant data. A machine learning project should not be built on data that is uninformative or of poor quality, because it will be a waste of time.

We must keep in mind that machine learning algorithms abstract patterns from data, but they do not reason. Their results should therefore be used as a solid basis for making decisions, not as a replacement for them.

Although machine learning algorithms can learn by themselves, there must always be a human guide. The machine processes graphs, numbers and so on, but it always takes a human to give its results value and logic from a business point of view.

When should machine learning be used to solve the problem?

When rule-based software is difficult to write. Given the very high complexity of the process and the number of cases, it is not feasible for a person to write line after line of code covering every possibility. Doing so is also expensive: very complex software written by hand consumes a lot of money, time and people.
When there are large amounts of data.
When the problem fits the machine learning structure. A machine learning problem follows very clear guidelines: it must have an objective variable, for example the class a client belongs to or the number of claims. We can identify it by answering the question "what do I want to predict?" Once it is defined, we need to know whether the data we have are sufficient and adequate to predict it.
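
As a rough illustration of the last point, once we have named the objective variable we can check how much of our data actually carries a value for it. The sketch below is only an assumption about what such a check might look like: the records, the "churned" variable and the summarize_target helper are all hypothetical.

```python
from collections import Counter

# Hypothetical client records; "churned" plays the role of the objective variable.
records = [
    {"age": 34, "monthly_spend": 50.0, "churned": "no"},
    {"age": 51, "monthly_spend": 20.0, "churned": "yes"},
    {"age": 27, "monthly_spend": 75.0, "churned": None},  # no label available
    {"age": 45, "monthly_spend": 30.0, "churned": "yes"},
    {"age": 39, "monthly_spend": 60.0, "churned": "no"},
]


def summarize_target(rows, target):
    """Report what share of rows has a value for `target`
    and how those values are distributed."""
    labels = [row.get(target) for row in rows]
    known = [label for label in labels if label is not None]
    return {
        "labeled_fraction": len(known) / len(rows),
        "class_counts": dict(Counter(known)),
    }


summary = summarize_target(records, "churned")
print(summary)
```

A low labeled fraction, or a class that barely appears, is an early warning that the data may not be adequate for the prediction we want.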

When can we not solve a problem with machine learning?

We have to be careful with the expectations placed on machine learning. Most of the time the results offered by the algorithms are a basis for later decisions or actions; they do not translate automatically into benefits. There are some exceptions, such as Netflix: its results are displayed directly on the platform in the form of a recommendation, with no employee making decisions based on the output of the algorithm. In such cases it is important to assess the impact of an error. Getting a television series recommendation wrong is not the same as misjudging the likelihood of a road accident.
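
One simple way to reason about the impact of an error is to multiply how often the algorithm is wrong by what each mistake costs. The figures below are invented purely for illustration, and the expected_error_cost helper is hypothetical; it assumes every error costs the same, which is rarely true in practice.

```python
def expected_error_cost(n_predictions, error_rate, cost_per_error):
    """Expected total cost of wrong predictions, assuming each error
    is equally costly."""
    return n_predictions * error_rate * cost_per_error


# Same volume and same error rate, very different cost per mistake
# (all numbers are made up).
series_recommendation = expected_error_cost(10_000, 0.05, 0.01)
accident_risk = expected_error_cost(10_000, 0.05, 5_000.0)

print(f"series recommendation: {series_recommendation:.2f}")
print(f"accident risk:         {accident_risk:.2f}")
```

With identical accuracy, the second use case is far more expensive to get wrong, which is why it calls for closer human supervision.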

Machine learning is not good at handling chance events, because it always works by discovering patterns. Faced with a coincidence, the algorithm will not know what to do, because it has no reference to relate it to. If the problem we want to solve involves a lot of chance, perhaps we should consider another way to solve it.

Even if the solution is good and the algorithm works perfectly, the results cannot always be interpreted. With some algorithms, especially decision trees, it is a little easier to see which variables carry more weight, but others simply offer a result that cannot be interpreted, even if it is correct and valid for the objective. This is due to the very high complexity of what the algorithm computes. In some cases, it would take a human years to understand why the algorithm reached that conclusion.
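
To give an idea of why decision trees are easier to read, the sketch below scores two variables by how much a single split on each one reduces Gini impurity, which is one common splitting criterion for trees. It is a simplification (one split, not a full tree), and the data and variable names are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def impurity_reduction(rows, labels, feature):
    """Drop in Gini impurity obtained by splitting on a boolean feature."""
    left = [label for row, label in zip(rows, labels) if row[feature]]
    right = [label for row, label in zip(rows, labels) if not row[feature]]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted


# Invented toy data: did the client file a claim?
rows = [
    {"young_driver": True, "urban": True},
    {"young_driver": True, "urban": False},
    {"young_driver": True, "urban": True},
    {"young_driver": False, "urban": False},
    {"young_driver": False, "urban": True},
    {"young_driver": False, "urban": False},
]
labels = ["claim", "claim", "claim", "no_claim", "no_claim", "no_claim"]

scores = {f: impurity_reduction(rows, labels, f) for f in ("young_driver", "urban")}
best_feature = max(scores, key=scores.get)
print(scores, best_feature)
```

In this toy data the split on young_driver separates the two classes completely, so it gets the higher score. A person can read that directly, which is not the case for the internal weights of more complex models.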

Machine learning is not the best option for our problem if we do not have enough data or if the data are not labeled. We need labeled data so that the algorithm has references to learn from, can find patterns and, later, can offer predictions.

A machine learning project can also be slow, so it may not be the right technology if we need to go into production quickly. In addition, such a project requires a high tolerance for error: the machine can be wrong, and the goal is always to reduce the margin of error as far as possible.

When is machine learning the best solution?

When scaling is difficult. Machine learning works very well with millions and millions of records, which is not the case with many other technologies.
When we need personalized results. With machine learning we work with our own data, so the results obtained are specific to us and help us make decisions in our own business.

Machine learning has received a lot of attention in recent years and will probably continue to do so in the future. Algorithms offer us many opportunities to make business decisions, but it is important to define our machine learning projects properly in order to lead them to success.

Text by: Maria Gorini with the collaboration of Raquel García | Data Scientist
