
What is Data Standardization?

Data Standardization is a data processing workflow that converts the structure of disparate datasets into a Common Data Format. As part of the Data Preparation field, Data Standardization deals with the transformation of datasets after the data is pulled from source systems and before it's loaded into target systems. Because of that, Data Standardization can also be thought of as the transformation rules engine in Data Exchange operations.

Data Standardization enables the data consumer to analyze and use data in a consistent manner. Typically, when data is created and stored in the source system, it's structured in a particular way that is often unknown to the data consumer. Moreover, semantically related datasets may be stored and represented differently, making it difficult for a data consumer to aggregate or compare them.

Data Standardization Use Cases

Data Standardization has two main use case categories: Source-to-Target Mapping and Complex Reconciliation. We typically divide the former into two sub-categories, which yields three use cases:

  • Simple mapping from external sources: This use case handles on-boarding data from systems that are external to the organization, and mapping its keys and values to an output schema.
  • Simple mapping from internal sources: This use case involves handling internal datasets that are based on inconsistent definitions and transforming them into a single trustworthy dataset for the entire organization.
  • Complex reconciliation: This use case involves the creation of complex calculated metrics that provide their own semantics based on defined business logic.
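The mapping use cases above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's rules engine. The field names (`cust_nm`, `ctry`, `amt`), the `KEY_MAP` and `VALUE_MAP` tables, and the helpers `standardize_record` and `average_order_value` are all hypothetical, and the metric is a deliberately simple stand-in for the kind of business logic a complex reconciliation would encode.

```python
# Hypothetical source-to-target mapping: rename keys and translate values
# so that records from different sources share one output schema.

# Source key -> target key (assumed names, for illustration only).
KEY_MAP = {
    "cust_nm": "customer_name",
    "ctry": "country",
    "amt": "amount",
}

# Per-target-field value translations (assumed values).
VALUE_MAP = {
    "country": {"USA": "US", "United States": "US", "Deutschland": "DE"},
}


def standardize_record(record: dict) -> dict:
    """Map one source record onto the target schema."""
    out = {}
    for src_key, value in record.items():
        target_key = KEY_MAP.get(src_key)
        if target_key is None:
            continue  # drop fields that have no place in the target schema
        translations = VALUE_MAP.get(target_key, {})
        # Fall back to the original value when no translation is defined.
        out[target_key] = translations.get(value, value)
    return out


def average_order_value(records: list[dict]) -> float:
    """A calculated metric defined over already-standardized records."""
    if not records:
        return 0.0
    return sum(r["amount"] for r in records) / len(records)
```

Keeping the mappings in plain dictionaries, separate from the function that applies them, means a new source can usually be on-boarded by adding entries rather than changing code.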

Data Standardization Examples

Below are a few Data Exchange scenarios that require Data Standardization:

  • Consumer Packaged Goods brands sell their products through a retail channel. In support of that, brands exchange product sales and inventory data with the retailers. These exchanges involve standardization of inconsistencies in data formats, schemas, and values.
  • Travel and hospitality aggregators receive property descriptions and availability data from their airline, car rental, and hotel chain partners. Each data provider may have its own data schema and structure that must be standardized before it can be used.
  • Holding companies with independent subsidiaries, franchisees, business units, global offices, and external partners receive inconsistent financial data that again must be standardized before it's used. These companies can use Lore IO to map and standardize subsidiary sales and finance information to the corporate model.
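As a rough illustration of the format and value inconsistencies these scenarios involve, the sketch below normalizes sales rows from two hypothetical partners that agree on field names but not on how dates, amounts, and SKUs are written. The row layout, the list of accepted date formats, and the function names are assumptions made for this example only.

```python
from datetime import datetime
from decimal import Decimal

# Date layouts assumed to appear in the partner feeds.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known layout."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this layout, try the next one
    raise ValueError(f"unrecognized date format: {text!r}")


def parse_amount(text: str) -> Decimal:
    """Strip a leading dollar sign and thousands separators."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return Decimal(cleaned)


def standardize_sale(row: dict) -> dict:
    """Bring one sales row into a common format for SKU, date, and revenue."""
    return {
        "sku": row["sku"].strip().upper(),
        "sale_date": parse_date(row["date"]),
        "revenue": parse_amount(row["revenue"]),
    }
```

With this in place, a row such as `{"sku": " ab-100 ", "date": "2024-03-05", "revenue": "1234.50"}` and a row such as `{"sku": "AB-100", "date": "03/05/2024", "revenue": "$1,234.50"}` standardize to the same record, so the two feeds can be aggregated or compared directly. Raising an error on an unknown date layout, rather than guessing, keeps ambiguous inputs from silently entering the target system.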