Technical Challenges

Technical Challenges

1. Strategy

Usually, the data being integrated resides entirely in a Data Warehouse and is integrated as part of the ETL (Extract, Transform and Load) process. In the case of syndicated data however, the volume can be so great as to make this infeasible. In this case, data is integrated (or ‘harmonized’) in real time, that is, at the time an end user makes a query. Ironbridge’s Net-Bench product has this capability. This, then is the first question to answer: is the data entirely resident, or a mixture with the external data remaining at their sources.

2. Data Acquisition

Acquiring data from its original source, the company’s transaction systems or external services requires cooperation from the data provider whether internal or external. Large scale ERP systems are notoriously difficult to extract data from, they are complex in design, require many tables and abstract the company’s data into surrogate keys that are meaningless for data recognition. This step requires specialized skills in the ERP technologies and even more specialized if external data is being brought in house. (Ironbridge has a long history of successful data extraction projects from both internal and external data sources).

3. Timing

The different data sources become available at different times of the day, week or month. This is especially true when data is from different divisions of the company. It is important to not make data available from one source before all integrated sources are ready.

4. Harmonizing

This is the most significant challenge of data integration. Harmonizing is the process of matching data items from each source. This breaks down into four separate tasks for the four common dimensions of the data: geographies, products, time and measures:


The geography dimension includes the company’s regions, districts, sales territories all the way down to individual customers. The “Global Location Number” is an international standard for all physical locations. If this is known for the company’s customer ship-to locations, this is a very useful concept for standardizing geographies across systems. When integrating different internal systems that have no geographical overlap, the only challenge is to apply consistent coding to the levels above customer. Zip code or Postal Code is a good starting point.

When different systems contain the same customer but with different coding schemes, there is no choice but to make a harmonizing relationship table. With luck, the customer name and address is spelled consistently between systems. If not, there must be manual maintenance.

The process of integrating syndicated or any external data depends entirely on the levels available in the external data. For a DSD business that has syndicated store level data available, matching the retail chain and store number is straightforward. If the syndicated data is at trading area or retail market area or metro-market, then we must know the stores that make up the area. This is available from the syndicated supplier. We then aggregate our data from customer up to these levels.

The same concept applies to a Distribution Center based business.


Harmonizing time is not always straightforward. Perhaps our company closes its week at Saturday midnight but IRI (for example) closes at Sunday midnight. Individual time-stamped transactions can always be aggregated to any time period.


Products refers to the company’s manufactured items and in the case of external data, competitors’ items. If internal systems use different SKU or Item ID schemes, then the most effective harmonizing method is to use product attributes and descriptions to match items. As for customers, there will be a need for manual matching to achieve 100% coverage.

External (syndicated) data has only the UPC/NDC/EAN for matching since this is what the retailer’s point-of-sale scanners provide. If our product system does not contain an equivalent UPC for retail items (it may be a case or pallet identifier), then we harmonize using product attributes. External data, of course, includes competitor information. When harmonizing using attributes, we must decide whether to use the syndicated provider’s coding system or our own. (Since competitors may have products outside our range, the syndicated system is usually preferred).


Generally, companies ‘equivalize’ product volumes when they are sold in different packages so that volume measures are available in ‘equivalent pounds/kilos/gallons/liters’ in addition to the unit quantity sales. This is the only reliable way to compare internal and external sales volumes. ‘Unit volume’ will be unreliable when our company counts a shipped case of 24 items a ‘unit’ whereas at the point of sale, each item counts as one. ‘Sales dollars’ from internal data are wholesale, whereas point of sale ‘sales dollars’ are retail.

Learn more about what Ironbridge Software can do for your business.