In a word — yes. But before delving further into this, let’s first address a few questions.
What is Data Virtualization?
Data virtualization is a set of processes that enable the business to access, use and trust its siloed datasets more fully. It logically brings together datasets in different formats from disparate data warehouses, data marts and data lakes and makes them readily available to the organization without copying the data to a new physical layer.
Data Virtualization does not typically pose a threat to existing data platforms; it neither seeks to eliminate existing infrastructure resources, nor to impose a data model. Rather, existing resources continue to operate as before while maintaining their own specific data formats and schemas. Data Virtualization merely taps their data and makes it more easily accessed and consumed.
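The core mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation; all names here are hypothetical. Two differently-shaped sources are exposed through one logical schema, with rows mapped on demand rather than copied into a new physical layer.

```python
# Two siloed "systems", each keeping its own schema, as in the text above.
warehouse_rows = [{"cust_id": 1, "cust_name": "Acme"}]
crm_rows = [{"id": 2, "name": "Globex", "owner": "pat"}]

# Metadata mapping each source's fields onto one unified logical schema.
source_mappings = [
    (warehouse_rows, {"id": "cust_id", "name": "cust_name"}),
    (crm_rows, {"id": "id", "name": "name"}),
]

def virtual_customers():
    """Yield rows in the unified schema, pulled lazily from each source."""
    for rows, mapping in source_mappings:
        for row in rows:
            yield {field: row[src_col] for field, src_col in mapping.items()}

print(list(virtual_customers()))
```

Both sources appear as a single dataset, while the originals keep operating with their own formats, untouched.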
What are the benefits of Data Virtualization?
Data virtualization enables data consumers to access a wider array of datasets in a consistent and predictable manner. Data consumers are freed from searching, navigating, learning and using the disparate systems that hold their data, and they do not need to understand how data is defined and handled in each source system, or where it is physically stored.
Similarly, data consumers can collaborate on the integrated data and jointly construct the organization’s single source of truth. Users can understand how data is defined and derived, and gain more confidence in the data.
Data virtualization accelerates time-to-value on business data. With data virtualization, users can manipulate data artifacts on their own without relying on technical staff to design and implement the necessary data pipelines. This enables the business to respond faster to new data needs.
Data virtualization enables data stewards to increase control over access to and use of sensitive data. It offers data owners the ability to ensure that compliance and enforcement rules defined in the source systems are adhered to in the output data.
Once data is virtualized and formally published in a single location, businesses experience fewer data-related errors, since the virtualized data can be more consistently studied and understood. Data lineage enables users to understand how virtual data is derived. Similarly, since data isn't copied, there is less risk of users accessing out-of-date data.
What are some of the capabilities of Data Virtualization?
Typically, a data virtualization solution includes the following capabilities:
- Data connectors: Connect the data virtualization solution to various types of source systems such as relational sources, Hadoop sources, marketing automation systems, web services, CRM, and more.
- Data catalog: Search, browse, tag, categorize, comment, profile and share data elements.
- Data drill-down: Examine data elements, identify specific entities, connect data across sources, build and display ERDs and data models.
- Data transformation: Model the data; design virtual data columns and tables; join, transform and reshape the data.
- Data quality: Provide the necessary services to ensure that data consumers access the most accurate and comprehensive data possible.
- Data security: Provide the necessary services to authenticate and authorize users, and protect data in compliance with existing security benchmarks of source systems.
- Data governance: Provide the necessary services to offer lineage, track usage, log activities, and provide traceability and control over the data.
- Metadata management: Control all relevant metadata including data columns, tables and views.
- Query optimization: Build smart queries based on data needs and system performance to pull data from source systems as fast as possible.
- Caching: Locally persist some data to accelerate query performance further.
- Scaling: Spread queries over nodes and clusters as needed to accelerate performance while minimizing costs.
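The caching capability listed above can be illustrated with a small sketch (all names are hypothetical, and real solutions layer this with query optimization and invalidation): results are persisted locally keyed by query, so repeated queries skip the round-trip to the source system.

```python
import time

class CachingLayer:
    def __init__(self, source_query_fn, ttl_seconds=300):
        self.source_query_fn = source_query_fn  # function that hits the source system
        self.ttl_seconds = ttl_seconds
        self._cache = {}  # query string -> (timestamp, result)

    def query(self, sql):
        entry = self._cache.get(sql)
        if entry and time.time() - entry[0] < self.ttl_seconds:
            return entry[1]                 # cache hit: no source round-trip
        result = self.source_query_fn(sql)  # cache miss: fetch and persist locally
        self._cache[sql] = (time.time(), result)
        return result

calls = []
def fake_source(sql):
    calls.append(sql)
    return [("row",)]

layer = CachingLayer(fake_source)
layer.query("SELECT 1")
layer.query("SELECT 1")
print(len(calls))  # the source was only queried once
```

The same pattern generalizes: the virtualization layer decides per query whether to serve from cache, push the query down to the source, or fan it out across nodes.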
How does Lore IO use Data Virtualization?
Lore IO relies on some aspects of data virtualization mentioned above to enable its proprietary Collaborative Transformations. While Lore IO caches some data for performance gains, most transformed data isn’t persisted. Typically, virtual columns, tables, views and data transformations are declared with metadata definitions. Lore IO converts the definitions to actual data pipelines automatically as needed.
To populate the virtual data layer, Lore IO customers do not need to worry about how to join the disparate datasets. SQL analysts explore the catalog and identify the datasets they wish to join using pre-built functions. Once these are declared, Lore IO automatically determines the best way to join the data. Advanced users can define their own custom expressions to join the data.
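The declarative idea above can be sketched as follows. The metadata shape here is hypothetical, not Lore IO's actual format: the point is that a join is declared once as metadata, and the platform derives the executable SQL rather than the analyst hand-writing the pipeline.

```python
# A join declared as metadata (hypothetical schema for illustration).
join_declaration = {
    "left": "crm.accounts",
    "right": "billing.invoices",
    "on": [("account_id", "acct_id")],
    "type": "left",
}

def to_sql(decl):
    """Translate a join declaration into an executable SQL statement."""
    conditions = " AND ".join(
        f"l.{lcol} = r.{rcol}" for lcol, rcol in decl["on"]
    )
    return (
        f"SELECT * FROM {decl['left']} AS l "
        f"{decl['type'].upper()} JOIN {decl['right']} AS r ON {conditions}"
    )

print(to_sql(join_declaration))
# SELECT * FROM crm.accounts AS l LEFT JOIN billing.invoices AS r ON l.account_id = r.acct_id
```

Because only the metadata is stored, the declaration can be revised or re-targeted without rebuilding any persisted pipeline.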
Lore IO's Data Virtualization features empower collaboration within the context of managing and transforming data in several ways:
- Single source of truth, but many schemas: Individual contributors can experiment on new versions of the schema without disrupting the Production version. Likewise, multiple versions of a schema can be customized and maintained for various consumers.
- Collaboratively define transformations: Virtualization of the schema allows a degree of flexibility that is otherwise not possible. A virtual schema may be constructed using partially defined transformations. In this way, individual contributors can complete the parts that they understand, or are responsible for, while delegating other parts to their teammates. This enables distribution of effort as well as communication of intent.
- Quickly iterate on transformation rules: Requirements for data projects, as well as understanding of the underlying data, can evolve rapidly within an organization. A virtualization framework allows transformation rules to be changed quickly without having to overcome data gravity, since only metadata is manipulated. Moreover, these changes are self-documented (schema lineage) and tracked (change log).
In summary, Lore IO does rely on data virtualization, but as part of the bigger construct of Collaborative Data Transformations that do much more than merely virtualize the data.