Data as a Product (DaaP) is one of the foundations of a data mesh, but what does treating data as a product actually mean?
The concept "Data as a Product (DaaP)" has become more popular as data-driven companies commit to building flexible data architectures. However, there is still considerable confusion about what it means to treat data as a product, and the term is often confused with "data product".
The term "Data as a Product (DaaP)" has gained relevance alongside the concept of the "data mesh", because treating corporate data as a product is one of the principles, and one of the cornerstones, of an enterprise data mesh.
Although it is not a recent concept and data professionals have been treating datasets as products since the first data warehouses, the relevance of the "Data as a Product (DaaP)" approach has been accentuated by the rise of flexible data architectures such as data meshes.
According to IDC, by 2026, only 10% of the data produced annually will be completely new. The remaining 90% will be the result of reusing data already generated. This scenario only increases the importance of starting to treat data as a product, rather than as a tool for building data products.
Since the term DaaP, coined by Zhamak Dehghani, became popular, many people have struggled to tell "data as a product" (DaaP) apart from "data products".
Below, we try to clarify what data as a product is, what the DaaP perspective actually implies, and how it differs from data products.
What is Data as a Product (DaaP)?
Data as a Product (DaaP) is a perspective that consists of understanding and treating data as a product. Under this approach, data is an asset that can be reused to supply information when business processes need it, to support the analysis of specific aspects of business activity, and to inform strategic decisions.
It involves ensuring that an organisation's data assets have a set of fundamental characteristics: they must be easily discoverable, secure, addressable, understandable and reliable. The Chief Data Officer is seen as playing a key role in achieving this.
The concept of "Data as a Product (DaaP)" originally appeared in Zhamak Dehghani's 2019 article "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh", and she later developed it in the book "Data Mesh: Delivering Data-Driven Value at Scale". In the article, the author explains why it is necessary to consider data as products: "Data teams must apply product thinking [...] to the data sets they provide; considering their data assets as their products and the rest of the organisation's data scientists, machine learning and data engineers as their customers".
In short, the notion of "data as a product" comes from applying a product development mindset around datasets, ensuring that they possess several fundamental qualities.
What is the difference between Data as a Product (DaaP) and a data product?
The term "Data as a Product (DaaP)" has sometimes been mistranslated or misquoted as "data product", which creates confusion between two terms that do not mean the same thing.
The first recognised definition of "data product" was given by DJ Patil in the book "Data Jujitsu: The Art of Turning Data into Product" (2012): a data product is "a product that facilitates the achievement of an objective through the use of data".
Therefore, it is any product that relies on data to achieve a goal. In this sense, any online newspaper could be considered a data product if the news items presented on the homepage are dynamically selected based on browsing history data.
In 2018, Simon O'Regan published an article titled "Designing Data Products" in which he gave concrete examples of data products, categorising them by type: raw data, derived data, algorithms, decision support and automated decisions.
In short, a "data product" is a generic concept that encompasses any data-driven product. In contrast, "Data as a Product (DaaP)" is a mindset of treating data as a product.
Specific examples of data products
Some examples of data products are listed below to clarify the difference between "data product" and "Data as a Product (DaaP)":
- A data warehouse is a data product that is itself a mixture of raw data, derived data and also a decision support system.
- A business dashboard that visually represents the company's key performance indicators (KPIs) is a data product of the decision support system type, and the interface to access it is a visualisation.
- A list of nearby recommended restaurants created for a particular user is a data product of the automated decision type.
- An autonomous car can also be considered a data product. Because the car drives automatically thanks to data, it is an automated decision-making data product.
The key features of Data as a Product (DaaP)
How does the idea of "data as a product" materialise? A dataset treated as a product encompasses the code, the data itself together with its associated metadata, and the infrastructure required to run it.
Data authorities outline a number of characteristics that data and its management must meet in order to be considered "Data as a Product (DaaP)".
Data as a product (DaaP) must be:
1) Discoverable
To make data as a product easy to find, it is essential to have a search engine in which users can register datasets and request access to them when needed.
The first phase of this capability could involve simply having a list of datasets on the company's internal network, and building and enhancing it incrementally from there.
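As an illustration of that first phase, the sketch below keeps a simple in-memory list of datasets that can be registered and searched. It is a minimal example under stated assumptions, not a description of any particular catalogue tool: the class names `CatalogEntry` and `DatasetCatalog`, their fields and the sample datasets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One registered dataset (all field names are illustrative)."""
    name: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)


class DatasetCatalog:
    """In-memory stand-in for an internal list of datasets."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Reject duplicates so every dataset name stays unique.
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[CatalogEntry]:
        # Case-insensitive substring match on name, description or tags.
        needle = term.lower()
        return [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]


catalog = DatasetCatalog()
catalog.register(CatalogEntry("sales_orders", "sales-team",
                              "Daily order lines", {"sales", "orders"}))
catalog.register(CatalogEntry("web_sessions", "web-team",
                              "Clickstream sessions", {"web"}))
print([e.name for e in catalog.search("order")])  # ['sales_orders']
```

A list like this can later be replaced by a real search engine without changing how teams register their datasets.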
2) Addressable
Having addressable and easily findable datasets dramatically improves the productivity of the teams working with them. Analysts and data scientists gain the ability to be independent in finding and using the data they need to do their work. Also, data engineers' workflows are less interrupted by queries from third parties who want to know where they can access data linked to a specific topic.
3) Self-describing and interoperable
In a world where companies are accumulating more and more data, it is essential that datasets include metadata that provides clarity and follows uniform naming guidelines (which, in turn, promotes the interoperability of datasets).
In order for consumers of datasets to be able to find and use them appropriately for the purpose for which they were created, it is essential that datasets include descriptions with, at a minimum, the following parameters:
- Data location
- Data provenance and data mapping
- Sample data
- Execution time and freshness
- Input preconditions
- Example notebook or SQL queries using the dataset
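The parameters listed above could be captured in a small, structured description attached to each dataset. The following sketch is one possible shape, not a standard: the class `DatasetDescription`, its field names, the sample values and the idea of expressing freshness as an expected refresh interval are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class DatasetDescription:
    """Self-describing metadata for one dataset (hypothetical field names)."""
    location: str                      # where the data lives
    provenance: str                    # where it comes from and how it is mapped
    sample_rows: list[dict]            # a few example records
    typical_run_time: timedelta        # how long a refresh usually takes
    last_refreshed: datetime           # when the data was last updated
    expected_refresh_interval: timedelta
    input_preconditions: list[str]     # what must hold before a refresh runs
    example_queries: list[str] = field(default_factory=list)

    def is_fresh(self, now: datetime) -> bool:
        # Fresh means the last refresh is within the expected interval.
        return now - self.last_refreshed <= self.expected_refresh_interval

    def missing_fields(self) -> list[str]:
        # Names of descriptive fields that were left empty.
        checked = {
            "location": self.location,
            "provenance": self.provenance,
            "sample_rows": self.sample_rows,
            "input_preconditions": self.input_preconditions,
            "example_queries": self.example_queries,
        }
        return [name for name, value in checked.items() if not value]


desc = DatasetDescription(
    location="warehouse.sales.orders",
    provenance="Loaded nightly from a (hypothetical) order system export",
    sample_rows=[{"order_id": 1, "amount": 19.99}],
    typical_run_time=timedelta(minutes=12),
    last_refreshed=datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
    expected_refresh_interval=timedelta(days=1),
    input_preconditions=["order system export has completed"],
)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(desc.is_fresh(now))      # True
print(desc.missing_fields())   # ['example_queries']
```

Reporting missing fields makes it possible to flag datasets whose documentation is incomplete before they are published.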
At Bismart we have a solution that self-documents Power BI datasets and enriches them with functional and business descriptions. Power BI Data Catalog encourages the proper use of data and empowers business users, regardless of their technical skills, to generate their own reports without technical assistance.
4) Trustworthy and secure
Regular, automated data quality checks are essential to meet the expectation of reliability that data as a product should offer. Those responsible for each dataset must act on the results of these checks.
Data quality assessments should be carried out at both the data input and data consumption stages. In addition, it is desirable to provide context about data quality to those who consume the data.
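A minimal sketch of this idea follows: the same check runs at the input stage and again at the consumption stage, and each run produces a small report that can be shown to consumers as quality context. The function `check_rows`, the class `QualityReport`, the completeness rule (required fields must not be empty) and the sample rows are illustrative assumptions, not a description of any specific tool.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Quality context that can be shared with data consumers."""
    stage: str          # "input" or "consumption"
    total_rows: int
    failed_rows: int

    @property
    def pass_rate(self) -> float:
        # An empty dataset has nothing failing, so treat it as passing.
        if self.total_rows == 0:
            return 1.0
        return 1 - self.failed_rows / self.total_rows


def check_rows(stage: str, rows: list[dict], required: list[str]) -> QualityReport:
    # A row fails when any required field is missing or None.
    failed = sum(
        1 for row in rows
        if any(row.get(name) is None for name in required)
    )
    return QualityReport(stage, len(rows), failed)


rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 5.00},
    {"order_id": 4, "amount": 7.50},
]

# Check at the input stage, before the data is stored.
input_report = check_rows("input", rows, ["order_id", "amount"])
print(input_report.failed_rows, input_report.pass_rate)  # 1 0.75

# Check again at the consumption stage, after incomplete rows are dropped.
clean = [r for r in rows if r["amount"] is not None]
consumption_report = check_rows("consumption", clean, ["order_id", "amount"])
print(consumption_report.pass_rate)  # 1.0
```

In practice the checks would cover more than completeness, but the pattern of checking at both stages and publishing the result stays the same.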
Bismart also has a solution designed to support the quality of an organisation's data. The tool evaluates, validates, documents and performs profiling on the data, ensuring an optimal level of quality.
Finally, datasets that have been registered and whose quality has already been assessed should not be automatically accessible to everyone if data security is to be guaranteed. Instead, it is recommended that users request access individually for each dataset and that access is granted or denied by those responsible for each dataset.
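The per-dataset request flow described above can be sketched as follows. This is only an illustration under assumptions: the names `AccessControl`, `AccessRequest` and `Status`, and the rule that only the registered owner of a dataset may decide on a request, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    GRANTED = "granted"
    DENIED = "denied"


@dataclass
class AccessRequest:
    user: str
    dataset: str
    status: Status = Status.PENDING


class AccessControl:
    """Access is requested per dataset and decided by that dataset's owner."""

    def __init__(self, owners: dict[str, str]) -> None:
        self._owners = owners                      # dataset name -> owner
        self._requests: list[AccessRequest] = []

    def request(self, user: str, dataset: str) -> AccessRequest:
        if dataset not in self._owners:
            raise KeyError(f"unknown dataset: {dataset}")
        req = AccessRequest(user, dataset)
        self._requests.append(req)
        return req

    def decide(self, req: AccessRequest, decided_by: str, grant: bool) -> None:
        # Only the owner of the dataset may grant or deny access to it.
        if decided_by != self._owners[req.dataset]:
            raise PermissionError(f"{decided_by} does not own {req.dataset}")
        req.status = Status.GRANTED if grant else Status.DENIED

    def can_read(self, user: str, dataset: str) -> bool:
        # No access by default: a granted request is required.
        return any(
            r.user == user and r.dataset == dataset and r.status is Status.GRANTED
            for r in self._requests
        )


access = AccessControl({"sales_orders": "sales-team"})
granted = access.request("ana", "sales_orders")
access.decide(granted, decided_by="sales-team", grant=True)
denied = access.request("ben", "sales_orders")
access.decide(denied, decided_by="sales-team", grant=False)
print(access.can_read("ana", "sales_orders"))  # True
print(access.can_read("ben", "sales_orders"))  # False
```

The important design choice is the default: a user who has never requested access, or whose request is still pending, cannot read the dataset.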
Conclusion
The concept of "Data as a Product (DaaP)" is fundamental to building an enterprise data mesh. Treating data as a product means recognising its value as a reusable asset that provides information and supports strategic decisions.
Unlike data products, data as a product focuses on ensuring fundamental characteristics such as discoverability, security, addressability, understandability and reliability.