Exploramos los distintos tipos y ejemplos de data warehouse que existen y hablamos de su relación con OLAP y OLTP.
A data warehouse is the centrepiece of any data integration process and a key component in the process of extracting, transforming and loading (ETL) data from online transaction processing (OLTP) systems and other operational sources. In this article we explore the differences between ETL data warehousing, OLAP and OLTP.
Data Warehouse, ETL, OLAP and OLTP are terms related to the processing and storage of data in the field of database management and business intelligence.
The ETL process involves extracting data from various sources, transforming it to cleanse, structure and enrich the data, and loading it into a centralised data warehouse. OLTP, on the other hand, focuses on real-time transaction management and is used to manage day-to-day business operations. Meanwhile, OLAP focuses on the analysis and exploration of large data sets stored in a data warehouse.
These three areas are interconnected and work together to facilitate the management and analysis of data in an enterprise environment.
What is a data warehouse?
As the name suggests, broadly speaking, a data warehouse is simply a data store that differs from other databases in its capabilities when working with large amounts of data and in terms of integration with BI and analytics tools.
In terms of Online Transaction Processing (OLTP), a data warehouse generally refers to a secondary or auxiliary data storage that is used to extract, transform and load (ETL) data from OLTP systems and other operational sources. In this context, the Data Warehouse acts as a separate database specifically designed to support business analysis and reporting.
Data Warehousing: ETL, OLAP and OLTP
What is ETL?
The acronym ETL —Extract, Transform, Load— is used to refer to a fundamental process in data management and analysis in the field of data integration.
The ETL process is used to move data from various sources, transform it into the appropriate format and then load it into a centralised database or data warehouse where it can be used for analysis, reporting and informed decision making. Each stage of the process serves a specific function:
-
Extract: At this stage, data is obtained from external sources, which can be databases, flat files, web APIs, CRM systems, spreadsheets, among others. Extraction may involve the selection of specific data to be processed.
-
Transform: Once data has been extracted, it is often necessary to perform certain transformations to clean, structure or enrich the data. This may involve removing duplicate values, replacing missing values, converting data formats, aggregating and calculating new metrics, among other actions.
-
Load: In this phase, the transformed data is introduced into a centralised data warehouse, which can be a relational database, a data warehouse, a big data platform, etc. The goal is to make the data ready for analysis and reporting.
The ETL process is crucial to ensuring data quality and integrity, as it allows data to be properly prepared prior to analysis. It also facilitates the consolidation of data from multiple sources into a centralised location, which improves efficiency in business decision making and reporting.
In recent years, with the evolution of technology and the emergence of more advanced tools, the term ETL has evolved into "ELT" ("Extract, Load, Transform"), where data transformation is performed after loading into the data warehouse. This is particularly relevant in the context of modern data warehouses and big data solutions.
What is OLTP?
OLTP (Online Transaction Processing) is an approach to database management that focuses on the management and processing of real-time transactions. Unlike OLAP (Online Analytical Processing), which focuses on the analysis and exploration of large data sets, OLTP is used to manage and control day-to-day business operations involving individual transactions.
OLTP main features:
-
Real-time transactions: OLTP is designed to handle individual transactions in real time, which means that data is updated and processed immediately as transactions occur.
-
Frequent queries and updates: OLTP systems are optimised to perform read and write operations with high frequency and efficiency. They are designed to allow multiple users to simultaneously access and make changes to data concurrently.
-
Database structure: OLTP databases generally have a highly normalised design to minimise redundancy and ensure data integrity. This helps to maintain consistency of information in environments where there are many transactions affecting the same data.
-
Size of the database: Although OLTP databases can contain large amounts of data, their size is generally smaller compared to Data Warehouses used in OLAP.
-
Focus on availability and integrity: OLTP systems place significant emphasis on ensuring the availability and integrity of data at all times. This involves maintaining high system availability and ensuring that transactions are error-free and maintain data consistency.
Examples of OLTP applications include online database management systems for financial transactions, online reservation and sales systems, inventory control systems, patient registration systems in hospitals and any other application requiring immediate and efficient processing of individual transactions.
What is OLAP?
OLAP (Online Analytical Processing) is a technology and approach to advanced data analysis that allows users to explore and derive meaningful information from large data sets from multiple perspectives. Unlike OLTP (Online Transaction Processing), which focuses on real-time transaction management, OLAP focuses on the analysis and discovery of patterns, trends and relationships in data stored in a data warehouse or other multidimensional databases.
OLAP main features:
-
Multidimensional data: OLAP organises data into multidimensional structures known as 'OLAP cubes'. These cubes contain data in different dimensions, such as time, product, location and category, allowing users to analyse data from different perspectives.
-
Complex analytical operations: OLAP systems support complex analytical operations such as drill-down/drill-up, filtering, rotation, segmentation and roll-up. These operations allow users to explore data at different levels of detail and gain deeper insights.
-
Data aggregation: OLAP allows summarising and aggregating data to obtain a more general and comprehensive view of the data. This is useful for aggregate analysis such as totals, averages, peaks and troughs.
-
Fast response: OLAP systems are optimised to provide fast responses to complex queries. They use efficient processing and pre-calculation techniques to speed up data analysis and retrieval.
-
Decision support: OLAP is used to analyse large volumes of business data and provide valuable information for informed decision-making. Users can gain a deeper understanding of trends and patterns, enabling them to make better strategic and tactical decisions.
OLAP applications are wide-ranging and can include business intelligence tools, dashboards, analytical reports and decision support systems. These tools and systems allow analysts and executives to interact with data and explore it in a flexible way to obtain relevant and valuable information for the organisation
What is the difference between OLAP and OLTP?
OLAP and OLTP are two different approaches to database management. While OLTP focuses on real-time transaction processing, OLAP focuses on the analysis and exploration of large data sets to obtain meaningful information.
OLTP is used to manage day-to-day business operations involving individual transactions, such as reservation systems and online sales. OLAP, on the other hand, is used to analyse large volumes of data from multiple perspectives, allowing users to obtain valuable information for business decision making.
Data warehousing, ETL, OLAP and OLTP: Where do they meet?
Data warehousing, OLAP (online analytical processing) and OLTP (online transactional processing) are concepts related to the management and processing of data within an organisation's information systems. Although they serve different purposes, they are interconnected in several ways.
As we have already seen, a data warehouse is a centralised database used to store historical and consolidated data from various operational sources, including OLTP systems. On the other hand, OLTP (Online Transaction Processing) systems are databases that are used to manage and process real-time transactions, such as sales, orders, reservations and other daily business operations. These systems are responsible for performing and maintaining routine business operations, ensuring that data is up-to-date and available in real time.
The relationship between data warehouse and OLTP is that the data used for OLAP analysis comes from OLTP systems and other operational sources. Data is periodically extracted from OLTP systems and transformed in the data warehouse to make it available for historical and multidimensional analysis in OLAP. The data warehouse acts as a consolidated and optimised repository for analysis, allowing OLTP systems to focus on performance and day-to-day transaction management without the additional burden of complex analysis.
OLAP is the technology and approach used to analyse data stored in the Data Warehouse and other multidimensional databases. It uses specialised data structures, such as OLAP cubes, to allow users to perform multidimensional queries and explore large amounts of data from different perspectives.
On the other hand, ETL processes are used to extract data from OLTP systems, transform it into a suitable format, and load it into the data warehouse. This transformation involves aggregating, cleaning, and structuring the data for analytical purposes.
In short, a data warehouse acts as the centralised repository for historical and consolidated data, and is the primary source for analytical operations performed through OLAP. OLTP systems, on the other hand, are responsible for managing and processing the day-to-day operations of the business and providing up-to-date data that will eventually be used in the data warehouse and, subsequently, in OLAP analysis. Together, these three concepts enable effective data management and analysis in a business environment.
Thus, the relationship between a data warehouse, OLAP, and OLTP can be understood as a pipeline: OLTP systems generate transactional data, ETL processes transform and load this data into the data warehouse, and OLAP systems leverage the data warehouse to provide powerful analytical capabilities to users for decision-making and business insights.