While ETL is still a primary process in business, in recent years a new trend has emerged: ELT. In this article we go through the differences between the two and try to decipher which is the best option according to your data assets and what you want to do with them.
In this blog we have previously talked about ETL processes, what they do and what are their benefits. Extract, Transform and Load is a very common data integration and transformation process in the business world for, among other things, it supports data security, data quality and traceability.
Recently, the process is being approached from a new perspective that alters the usual pipeline sequence, exchanging the order of transform and load. Thus, in an ELT process, data is first extracted, then loaded into a single data warehouse —this is where the relevance of data lakes comes in— and, once integrated into the data lake, data transformations are performed on demand (filtering, aggregations, normalizations, combinations, data cleansing and validation, etc.).
Is ELT the Future of ETL?
ELT should not be understood as a replacement. It has not arrived to completely replace ETL —a process that continues to be fundamental and imperative in the business world—. Both processes are complementary and will be more or less beneficial than the other depending on the type and the amount of data to be extracted, transformed and loaded and its use. In fact, experts are already speaking of the ETLT process (extract, transform, load and transform) as a combination of both.
Undoubtedly, one of the main contributors of ELT's growth is the gradual expansion of cloud data warehouses, which, among other things, do not require the purchase of hardware or the installation of software, allow dynamic scalability of the number of nodes, separate storage from computation, support parallel queries, offer unlimited data storage and facilitate data retrieval. All this makes cloud warehouses the ideal choice for transforming data without deteriorating query performance.
ETL vs ELT: Differences
At first glance the difference between ELT and ETL is the order in which the process operations take place, but in practice, in most cases, data extraction, transformation and loading are carried out in parallel to speed up the process. A priori, the difference between ELT and ETL is the order in which the process operations occur, but in practice, in most cases, data extraction, transformation and loading are carried out in parallel to speed up the process. The main distinction, therefore, lies in the location in which the data transformations are performed.
In ELT data is extracted from the data source and fed directly into the back-end data warehouse in a staging area where the transformation is then started. The transformation process works with raw data and finally copies the already processed data into a separate area. The raw data is then loaded into the back-end data warehouse, where data security must be ensured.
For ELT to work efficiently having a data warehouse with the processing and computational capabilities to perform transformations is a must, which in practice implies the use of technologies such as Azure Data Lake, Azure Databricks or Azure Synapse Analytics. In addition, a scalable environment is required so that companies can acquire more space as they need it. In this sense, Azure is the optimal choice, as tools such as Azure Data Factory are pay-per-use.
All this sharpens one of the great strengths of ELT and the reason why many organizations are betting on this new approach: greater speed. In ETL, the speed of data insertion or update is not high and data transformations are mainly done through manual or batch processes. In ELT, on the other hand, the speed of data ingestion is higher since the pipeline is not burdened with heavy operations and bypasses an essential step in ETL: performing data copies.
Advantages of ELT
Apart from making the process lighter and faster, ELT performs all processing in the native language of the back-end data warehouse, rather than in ETL's native language. Also, by choosing ELT, many of the requirements of ETL tools disappear, which means cost savings.
On the other hand, loading data before transforming it (ELT) provides flexibility to data analysts who do not have to decide what they want to do with the data before loading and can instead perform the transformations at the time they are required. In practice, this means that analysts can load the data without having to determine its structure beforehand. In contrast, ETL is a more rigid process in which transformations often involve the use of temporary storage tables and retrieving the original data is more complicated, which could be detrimental to business intelligence if varying the operations performed on the data, the focus of the analysis or the use of the data is needed.
What Is the Best Option?
The performance of the process will depend on the characteristics of each organization and its data assets.
If your company already owns an ETL system that works optimally and smoothly, there is no need to replace it. If, on the other hand, the company is going to work with new data warehouses, starting to play with ELT tools and cloud data warehouses could be interesting.
ELT's strategy is not associated with a specific tool, but rather with an architecture. Classic data integration tools such as SSIS can be integrated into the ELT cycle. This eases migration between the two. Exploring the parallel use of both tools (ETLT) can also add value. In this case, an ELT tool can extract data from various source systems and store it in a data lake —composed of Data Factory, Databricks and SQL—. The ETL process, in turn, can extract data from the data lake, transform it and load it into a data warehouse for reporting.
It is apparent that ELT solves some of ETL's problems and that this new perspective is here to stay. However, to say that ELT is the substitute for ETL would be a bold statement, since both options have their pros and cons and their performance will depend on the attributes and particularities of each organization and its data assets. Only the future can tell what will be the next step for ETL, ELT and ETLT.