X
SnapLogic May 2020 Release is here: Dramatically improve productivity with the latest Iris AI innovations!
Learn More

Detecting Schema Change

Even though the SnapLogic Intelligent Integration platform can handle schema change effortlessly with schema-on-read functionality, the impact of schema changes can be big and needs to be carefully evaluated. For example, if you have a database or a data warehouse table that is based on an object or a source database, and if the source object or database schema changes, the schema change (wherever appropriate) should be replicated with your target data warehouse automatically and in a timely manner.

In this example, the goal was to create a solution that would capture the schema of a third-party controlled database on a daily basis then alert the team if the schema had changed. Using the SnapLogic Intelligent Integration Platform, you can solve this problem with two pipelines and a scheduled task. One thing to note, a vast majority of the Snaps used are core Snaps and the solution can be extended to other endpoints where schema or metadata can be read.

For the example discussed in this post, we will use SQL Server. As shown in the short pipeline below, a second output can be exposed for the SQL Server Select Snap that will pull the metadata of the target table.

The first pipeline writes SQL Server metadata as JSON file
Figure 1: The first pipeline writes SQL Server metadata as JSON file.

To expose that metadata output of the SQL Server Select snap, open the snap properties, click the ‘Views’ tab, then click the ‘+’ sign to the right of ‘Output’. Your result should like similar to the screenshot below. (Please note that I’ve also renamed the output views to “rows” and “metadata” for clarity).

Figure 2: Exposing the metadata view on SQL Server Select snap
Figure 2: Exposing the metadata view on SQL Server Select snap.

In the File Writer Snap, we make use of SnapLogic’s expression language functionality to concatenate a timestamp with a static piece of the filename. For this example, the expression used is ‘Date.now().toLocaleDateTimeString().substring(0,19) + “_original_schema.json”’ which results in a dynamic filename like “2020-03-26T16:03:00_original_schema.json”. The next two screenshots show the File Writer and an example of the output preview from the expression.

Figure 3: File Writer expression to generate timestamped filenames
Figure 3: File Writer expression to generate timestamped filenames.
Figure 4: Example result of filename expression
Figure 4: Example result of filename expression.

With the original metadata saved, it is time to take a look at the pipeline that will compare and notify us if it detects a change. TheDiff Snap plays the key role here in letting us know which columns fall into four buckets, each with a separate output view: deleted, inserted, modified, and unmodified. In the downstream of the Diff Snap we prep the data primarily with Mapper Snaps to build a structured email.

Figure 5: Second pipeline that identifies deleted, inserted and modified columns.
Figure 5: Second pipeline that identifies deleted, inserted and modified columns.

If you take a close look at the Diff Snap, you can see it has two input views – one for the original data to compare, which in our case is the file, and a second input for the new data, which can also be thought of as the current metadata. In our first pipeline, we stored the metadata as raw schema, which looks like the preview data shown in the screenshot below. You could compare all this data via the Diff Snap after cleaning it up, though for this example we will only be focusing on the “columns.”

Figure 6: Preview of metadata from SQL Server Select snap
Figure 6: Preview of metadata from SQL Server Select snap.

Downstream to both the File Reader and SQL Server Select Snap we use JSON Splitter and Mapper Snaps to break out the columns into their own array and remove keys that are not needed for comparison. Immediately before the Diff Snap the JSON data for both new and original inputs will look something like this:

Figure 7: Cleaned up columns array as seen just before reaching the Diff snap
Figure 7: Cleaned up columns array as seen just before reaching the Diff snap.

One other thing to point out is that the original path upstream of the Diff snap also has a Copy Snap which sends the cleaned up original data to a Join Snap on the other side of the Diff snap. We do that because it makes the data available for ternary expressions like

‘$[‘Column Size’] == $Original_COLUMN_SIZE ? “Unchanged (” + $[‘Column Size’] + “)” : “Changed (now ” + $[‘Column Size’] + ” was ” + $Original_COLUMN_SIZE + “)”’

in the “Map Modified” Mapper, which allows us to list the changed aspect of the metadata in the final email.

For the test table used in the demo, we deleted two columns named “isActive” and “read,” inserted one column named “newField,” and modified one column named “email” by changing the field size limit from 8,000 to 7,000. You can see an example of the email message below and how it is organized by status.

Figure 8: Example of the email received.
Figure 8: Example of the email received.

Try out the ‘Detecting Schema Change’ pipelines for yourself by downloading them from our community post. If you don’t have access to the SnapLogic platform, start your free trial today!

Roger Sramkoski

Roger Sramkoski is a Solutions Engineer at SnapLogic.


Subscribe to Blog Updates

Quickly connect apps, data, and devices

Start Free Trial
Contact Us Free Trial