Detecting Schema Change
Even though the SnapLogic Intelligent Integration platform can handle schema change effortlessly with schema-on-read functionality, the impact of schema changes can be big and needs to be carefully evaluated. For example, if you have a database or a data warehouse table that is based on an object or a source database, and if the source object or database schema changes, the schema change (wherever appropriate) should be replicated with your target data warehouse automatically and in a timely manner.
In this example, the goal was to create a solution that would capture the schema of a third-party controlled database on a daily basis then alert the team if the schema had changed. Using the SnapLogic Intelligent Integration Platform, you can solve this problem with two pipelines and a scheduled task. One thing to note, a vast majority of the Snaps used are core Snaps and the solution can be extended to other endpoints where schema or metadata can be read.
For the example discussed in this post, we will use SQL Server. As shown in the short pipeline below, a second output can be exposed for the SQL Server Select Snap that will pull the metadata of the target table.
To expose that metadata output of the SQL Server Select snap, open the snap properties, click the ‘Views’ tab, then click the ‘+’ sign to the right of ‘Output’. Your result should like similar to the screenshot below. (Please note that I’ve also renamed the output views to “rows” and “metadata” for clarity).
In the File Writer Snap, we make use of SnapLogic’s expression language functionality to concatenate a timestamp with a static piece of the filename. For this example, the expression used is ‘Date.now().toLocaleDateTimeString().substring(0,19) + “_original_schema.json”’ which results in a dynamic filename like “2020-03-26T16:03:00_original_schema.json”. The next two screenshots show the File Writer and an example of the output preview from the expression.
With the original metadata saved, it is time to take a look at the pipeline that will compare and notify us if it detects a change. TheDiff Snap plays the key role here in letting us know which columns fall into four buckets, each with a separate output view: deleted, inserted, modified, and unmodified. In the downstream of the Diff Snap we prep the data primarily with Mapper Snaps to build a structured email.
If you take a close look at the Diff Snap, you can see it has two input views – one for the original data to compare, which in our case is the file, and a second input for the new data, which can also be thought of as the current metadata. In our first pipeline, we stored the metadata as raw schema, which looks like the preview data shown in the screenshot below. You could compare all this data via the Diff Snap after cleaning it up, though for this example we will only be focusing on the “columns.”
Downstream to both the File Reader and SQL Server Select Snap we use JSON Splitter and Mapper Snaps to break out the columns into their own array and remove keys that are not needed for comparison. Immediately before the Diff Snap the JSON data for both new and original inputs will look something like this:
One other thing to point out is that the original path upstream of the Diff snap also has a Copy Snap which sends the cleaned up original data to a Join Snap on the other side of the Diff snap. We do that because it makes the data available for ternary expressions like
‘$[‘Column Size’] == $Original_COLUMN_SIZE ? “Unchanged (” + $[‘Column Size’] + “)” : “Changed (now ” + $[‘Column Size’] + ” was ” + $Original_COLUMN_SIZE + “)”’
in the “Map Modified” Mapper, which allows us to list the changed aspect of the metadata in the final email.
For the test table used in the demo, we deleted two columns named “isActive” and “read,” inserted one column named “newField,” and modified one column named “email” by changing the field size limit from 8,000 to 7,000. You can see an example of the email message below and how it is organized by status.