Top Five Data Integration Patterns

  • By Yokesh Shankar
  • 04-04-2024
  • Big Data

Data is a valuable business asset, but it is often difficult to access, compile and interpret.

When data is moved between systems, the move does not always occur in a standard format. Integration means that the data does not depend on any specific model and can be easily accessed and managed.

To achieve greater efficiency and speed, developers can use patterns to standardize the integration process.

These patterns, like mountain trails, are discovered and established through use. They come with varying degrees of refinement, but they can be optimized or adapted to the business needs that call for a solution. We can think of a software development services use case as an execution of the pattern, that is, one use of the generic process of moving and managing data.

There are five data integration patterns based on business use cases and cloud integration patterns.

Data Integration Pattern 1: Migration

Migration is the act of transferring data from one system to another. A migration involves a source system where the data resides before execution, the criteria that determine the scope of the data to be migrated, the transformation the data set will go through, and the target system where it is migrated, as well as the ability to save the results so that the state reached with respect to the desired state can be known.
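
The parts listed above (source, scope criteria, transformation, target, and saved results) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: in-memory lists stand in for real systems, and the names `migrate`, `in_scope`, `transform`, and `MigrationResult` are hypothetical, not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class MigrationResult:
    """Saved outcome of a run, so the state reached can be compared with the desired state."""
    migrated: int = 0
    skipped: int = 0
    failed: list = field(default_factory=list)


def migrate(source: Iterable[dict],
            in_scope: Callable[[dict], bool],
            transform: Callable[[dict], dict],
            target: List[dict]) -> MigrationResult:
    result = MigrationResult()
    for record in source:
        if not in_scope(record):          # criteria that define the scope
            result.skipped += 1
            continue
        try:
            target.append(transform(record))
            result.migrated += 1
        except Exception as exc:
            # Isolate the problematic record and keep going.
            result.failed.append((record, str(exc)))
    return result


# Hypothetical sample data: the third record has no name and fails to transform.
source_rows = [
    {"id": 1, "name": "ada", "active": True},
    {"id": 2, "name": "bob", "active": False},
    {"id": 3, "name": None, "active": True},
]
target_rows: List[dict] = []
result = migrate(
    source_rows,
    in_scope=lambda r: r["active"],
    transform=lambda r: {"id": r["id"], "name": r["name"].title()},
    target=target_rows,
)
```

Here one record is migrated, one falls outside the scope, and one is recorded as a failure, which is the kind of result a real migration would persist for later review.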

Why does migration add value?

Migration is essential for all data systems. Creating and maintaining data involves a large investment of time, and migration is key to ensuring the independence of data from the tools used for its creation, visualization and management. Without it, all the accumulated data would be lost every time the tool was changed, which would harm productivity in a digitalized world.

When is migration useful?

Migration occurs when you transfer data from one system to a different system or to a new or different instance of the original system, when you create a new system that extends the current infrastructure, when you back up a data set, when you add nodes to database clusters, when you replace database hardware, and when you consolidate systems, among other processes.

Data Integration Pattern 2: Streaming

Streaming can also be called "one-way synchronization from one system to many" and refers to the transfer of data from a single source system to multiple destination systems continuously and in real time (or near real time).

Where there is a need to keep data up-to-date across multiple systems over time, a streaming pattern, two-way synchronization, or correlation will be required. The streaming pattern, like migration, only transfers data in one direction, from the source to the destination. The difference lies in its transactional nature.

Under this approach, the message processor logic is not executed for every element within the scope; it is executed only for recently modified elements. The stream can be understood as a sliding window that captures only those elements whose field values have changed since the last run of the stream.

Another important difference is how the pattern implementation is designed. Migration can be tuned to accommodate large volumes of data, process numerous records in parallel, and effectively isolate problematic items in the event of failures. Streaming patterns are optimized to process records quickly and demonstrate high reliability to avoid loss of essential data during transit.
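
The sliding window described above can be sketched as a function that delivers only the records modified since the last run to every destination. This is a minimal sketch, not a reference implementation: the function name `stream_changes`, the `modified_at` field, and the integer timestamps are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def stream_changes(source: List[Dict],
                   last_run: int,
                   destinations: List[Callable[[Dict], None]]) -> int:
    """Push records modified after `last_run` to every destination.

    Returns the new watermark to use as `last_run` on the next call.
    """
    watermark = last_run
    for record in source:
        if record["modified_at"] > last_run:
            for deliver in destinations:   # one source, many destinations
                deliver(record)
            watermark = max(watermark, record["modified_at"])
    return watermark


# Hypothetical source data with integer timestamps.
orders = [
    {"id": "A", "status": "new", "modified_at": 100},
    {"id": "B", "status": "paid", "modified_at": 205},
    {"id": "C", "status": "shipped", "modified_at": 230},
]
dashboard: List[Dict] = []
warehouse: List[Dict] = []

watermark = stream_changes(
    orders, last_run=200, destinations=[dashboard.append, warehouse.append]
)
```

Only orders B and C cross the window, and calling the function again with the returned watermark delivers nothing until a record changes. A real stream would be triggered by a push notification or a scheduled job rather than called by hand.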

Why does streaming add value?

The streaming pattern is especially valuable when system B needs real-time access to information that originates or resides on system A. For example, you may want to create a real-time reporting dashboard: a destination for different streaming applications that receives immediate updates on what is happening in different systems.

You may want to immediately start fulfilling orders received through CRM systems, e-commerce or internal tools where the order processing system is centralized, regardless of the channel from which the order comes. Other examples include sending a notification about the temperature of a steam turbine to the monitoring system every 100 ms, or transmitting data to a GP's patient management system when one of their regular patients is admitted to the emergency room. The cases in which data must be transmitted from a source system to a destination system are countless.

When is streaming useful?

The need for a streaming pattern can be easily identified based on the following criteria:

  • Should system B know immediately if an event occurs? - Yes
  • Should data automatically flow from A to B without human intervention? - Yes
  • Should system A know what happens to the object in system B? - No

The first question helps to decide whether to use the migration or streaming pattern based on whether the data should be real-time or not. Any case in which the data must arrive in less than approximately one hour will require a streaming pattern. However, there are always exceptions based on data volumes.

The second question typically rules out on-demand applications. Streaming patterns are usually initiated by a push notification or a scheduled job, so there is no human intervention.

The last question asks if it is necessary to unify the two sets of data so that they are synchronized on both systems, a process known as two-way synchronization. Different needs require different integration patterns, but the streaming pattern offers much greater flexibility in connecting applications, so it is recommended that you use two streaming applications instead of one two-way synchronization application.

Data Integration Pattern 3: Two-Way Synchronization

A two-way synchronization data integration pattern involves combining two sets of data in two different systems so that they act as one, respecting their existence as separate entities. This need for integration arises from having different tools or systems perform different functions on the same set of data.

For example, you may have one system to receive and manage orders and another for customer service. You may also feel that both systems are best-of-breed and important to use, rather than one package that does both and has a shared database. Using two-way synchronization to share the data set will allow you to use both systems while maintaining a consistent, real-time view of the data in them.

Why does two-way synchronization add value?

Two-way synchronization can be both a catalyst and a lifesaver, depending on the circumstances that justify its need. If there are two or more independent or isolated representations of the same reality, two-way synchronization can be used to keep them consistent and to optimize the processes that depend on them.

On the other hand, two-way synchronization allows you to go from a set of products that work well together (although perhaps not the best in their specific functions) to a set of solutions that can be selected separately and integrated with each other using an enterprise integration platform.

When is two-way sync useful?

The need for an integration application based on two-way synchronization is synonymous with wanting complete and coherent representations of real-world objects. For example, if a single view of the customer is desired, it can be achieved by manually granting everyone access to every system that holds a representation of a customer. However, a more efficient solution is to define which fields of the customer object should be visible on which systems, and to determine which system owns each of them.

Most enterprise systems have a way to extend objects that allows you to modify the data structure of customer objects to include such fields. Thus, it is possible to create integration applications that are either point-to-point (using a common integration platform), for a simple solution, or built on a more advanced routing model based on publish/subscribe or queues, if several systems come into play.

For example, a seller needs to know the status of a delivery but does not need to know which warehouse it is in. Likewise, the person responsible for dispatching a delivery needs to know the customer's name but not how much the customer paid. Two-way sync gives both people a real-time view of the same customer, tailored to their needs.
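
The idea of deciding which system owns each shared field can be sketched as follows. This is a simplified illustration under stated assumptions: the two systems are plain dictionaries keyed by customer ID, and the `FIELD_OWNER` map, the field names, and the function `two_way_sync` are hypothetical.

```python
from typing import Dict

# Hypothetical ownership map: the system that is the source of truth for each shared field.
FIELD_OWNER = {
    "name": "orders",
    "delivery_status": "orders",
    "support_tier": "support",
}


def two_way_sync(systems: Dict[str, Dict[str, dict]]) -> None:
    """Copy every shared field from its owning system to the other systems.

    Works on the union of records: a record missing from one system is created there.
    """
    all_ids = set().union(*(records.keys() for records in systems.values()))
    for record_id in all_ids:
        for field_name, owner in FIELD_OWNER.items():
            owner_record = systems[owner].get(record_id)
            if owner_record is None or field_name not in owner_record:
                continue  # the owner has no value to share for this field
            for system_name, records in systems.items():
                if system_name != owner:
                    records.setdefault(record_id, {})[field_name] = owner_record[field_name]


# Hypothetical sample data: customer c2 exists only in the support system.
systems = {
    "orders": {"c1": {"name": "Ada", "delivery_status": "shipped"}},
    "support": {"c1": {"support_tier": "gold"}, "c2": {"support_tier": "basic"}},
}
two_way_sync(systems)
```

After the call, both systems hold the same values for customer c1, and customer c2 now also exists in the orders system. Assigning a single owner per field is one simple way to avoid conflicting writes; real platforms may resolve conflicts differently.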

Data Integration Pattern 4: Correlation

The correlation data integration pattern is a design that identifies the intersection of two data sets and performs a two-way synchronization only if the item is found in both systems naturally. Whereas the two-way synchronization pattern synchronizes the union of the relevant data sets, correlation synchronizes the intersection.

In the case of the correlation pattern, the elements that reside in both systems may have been created manually in each of them, for example when two sales representatives have each entered the same contact in their own CRM system. Or they may have been included as part of a different integration. With the correlation pattern, it doesn't matter where the objects come from: it synchronizes them regardless, as long as they are in both systems.

Why does correlation add value?

The correlation data integration pattern is useful when you have two groups or systems that want to share data only if they both have a record that represents the same item or person in reality. Take the example of a hospital group that has two hospitals in the same city. The group may want the two hospitals to share data so that if a patient attends either hospital, an up-to-date history of the patient's treatment at both hospitals can be accessed.

To achieve such an integration, two streaming pattern integrations can be created: one from hospital A to hospital B and another from B to A. This ensures data synchronization. However, two integration applications will now need to be managed.

To eliminate the need to manage two different applications, you can simply use the two-way sync pattern between hospital A and hospital B. However, to increase efficiency, it would be advisable not to pull records from hospital B for patients that have no relationship with hospital A, and to synchronize only the relevant records, in real time and as they are generated. The correlation pattern is useful because it only synchronizes the relevant objects bidirectionally, rather than transferring entire data sets in both directions.
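
The hospital example can be sketched in Python as a synchronization over the intersection of the two record sets. This is only an illustration: the patient IDs, the visit strings, and the function `correlate` are hypothetical, and merging histories as a sorted set of entries is an assumed rule, not the only possible one.

```python
from typing import Dict, List, Set


def correlate(hospital_a: Dict[str, List[str]],
              hospital_b: Dict[str, List[str]]) -> Set[str]:
    """Synchronize visit histories only for patients present in both systems.

    Returns the IDs of the patients that were synchronized.
    """
    shared = hospital_a.keys() & hospital_b.keys()   # the intersection, not the union
    for patient_id in shared:
        history = sorted(set(hospital_a[patient_id]) | set(hospital_b[patient_id]))
        hospital_a[patient_id] = list(history)
        hospital_b[patient_id] = list(history)
    return set(shared)


# Hypothetical sample data: only patient p1 is known to both hospitals.
hospital_a = {"p1": ["2024-01-05 checkup"], "p2": ["2024-02-11 x-ray"]}
hospital_b = {"p1": ["2024-03-02 emergency"], "p3": ["2024-03-09 checkup"]}

shared = correlate(hospital_a, hospital_b)
```

Patient p1 ends up with the same combined history in both hospitals, while p2 and p3 stay where they were created. That is the difference from two-way synchronization, which would have copied them across.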

When is correlation useful?

The correlation data integration pattern is most useful when having additional data is more of a cost than a benefit, as it allows unnecessary data to be discarded. Another example would be a university that is part of a larger university system and seeks to generate reports on its students.

The university does not need those students who have never studied there to appear in the reports. But it must include the subjects that these students have taken in other universities of the same university system. In this case, the correlation pattern would save a lot of work, either in integration or in reporting, because it allows you to synchronize only the information of students who have attended both universities.

Data Integration Pattern 5: Aggregation

Aggregation is the act of collecting or receiving data from multiple systems and inserting it into a single system. For example, a customer's data may reside across three systems, and a data analyst may want to generate a report with data from all of them. It would be possible to create a daily migration from each of those systems to a data repository and then query that database. However, another database would need to be monitored and synchronized.

Additionally, the repository would need to be constantly updated as changes occur in the other three systems. Another disadvantage is that the data would be one day old, so to get real-time reporting, the analyst would have to initiate migrations manually or wait another day. One option would be to configure three streaming applications so that the reporting database is always up-to-date and reflects the most recent changes occurring on each system. However, this database would still have to be maintained, even though it only stores replicated data so that it can be queried regularly. Additionally, a series of API calls would be wasted just to ensure that the database is never more than a certain number of minutes behind reality.

This is where the aggregation pattern comes into play. When creating custom software development services or using one of our templates, you will notice that you can query many systems on demand, combine the data set, and apply it to whatever purpose you see fit.

For example, you can create an integration application that queries multiple systems, combines the data, and then generates a report. In this way, you avoid having an additional database, and you can obtain the report in .csv format or another format of your choice. The report can be stored directly in the location where reports are saved.
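
An on-demand aggregation of this kind can be sketched in Python as follows. This is a minimal sketch under simple assumptions: each system is represented by a hypothetical fetch function (here a lambda returning fixed data), and the function name `aggregate_report` and the column naming scheme are illustrative only.

```python
import csv
import io
from typing import Callable, Dict


def aggregate_report(customer_id: str,
                     sources: Dict[str, Callable[[str], dict]]) -> str:
    """Query each system on demand, merge the answers, and return a CSV report."""
    row = {"customer_id": customer_id}
    for system_name, fetch in sources.items():
        for field_name, value in fetch(customer_id).items():
            # Prefix each column with its system so that names never collide.
            row[f"{system_name}.{field_name}"] = value

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row), lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


# Hypothetical stand-ins for three real systems.
sources = {
    "crm": lambda customer_id: {"name": "Ada"},
    "billing": lambda customer_id: {"balance": 42},
    "support": lambda customer_id: {"open_tickets": 1},
}
report = aggregate_report("c1", sources)
```

The report is built at the moment it is requested, so no replicated database has to be kept in sync. In practice each fetch function would call the API of the system it represents.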

Why does aggregation provide value?

The aggregation pattern is valuable because it allows data to be extracted and processed from multiple systems into a unified application. In this way, the data is updated at the right time, is not replicated, and can be processed or combined to generate the desired data set.

When is aggregation useful?

The aggregation pattern is valuable if you are creating orchestration APIs to "modernize" legacy systems, especially an API that receives data from multiple systems and processes it into a single response. Another use case is creating reports or dashboards to pull data from multiple systems and create an experience from this data.
