Learn how you can see and store your organization’s metadata in an S3 environment.
Learn more about the Data Catalog in the blog post, “SnapLogic November 2018 Release: Revolutionize your business with intelligent integration.”
Hi! In this video, I will walk you through the SnapLogic Data Catalog-as-a-Service. At a high level, the Data Catalog provides a set of relational tables that contain information about how data can be converted from a non-relational data format to a relational data format. The metadata also describes the data and captures data-related details, such as attribute names, relationships, column data types, and more.
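To make the idea of describing non-relational data in relational terms concrete, here is a minimal Python sketch. It is not SnapLogic code: the `infer_columns` function, the sample `person` record, and the dotted column naming are all assumptions made for illustration. It flattens a nested record into column names and data types, which is the kind of detail the metadata captures.

```python
from typing import Any


def infer_columns(record: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Flatten a nested record into {column_name: type_name} (illustrative only)."""
    columns: dict[str, str] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become dotted column names (an assumed convention).
            columns.update(infer_columns(value, prefix=f"{name}."))
        else:
            # Record the Python type name as a stand-in for a column data type.
            columns[name] = type(value).__name__
    return columns


# Hypothetical sample record; not taken from the video.
person = {"name": "Ada", "age": 36, "address": {"city": "London", "zip": "N1"}}
schema = infer_columns(person)
```

Here `schema` maps each flattened attribute name to a type name, so a nested document ends up described as a flat list of typed columns.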
The SnapLogic Data Catalog adds a new asset type called table, which holds data location and schema information for files that reside in an external file system, such as S3. To use the Data Catalog, you interact with the metadata through two Snaps: the Catalog Insert Snap and the Catalog Query Snap. These Snaps are part of the Core Snap Pack.
The Catalog Insert Snap allows you to insert metadata into the Data Catalog so that all the metadata-related information for your data is available in one place. The Catalog Query Snap queries metadata from the Data Catalog, meaning that you can use it to retrieve metadata from tables that already exist in the Data Catalog.
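The insert and query roles described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `ToyCatalog`, `TableMetadata`, the table name `people`, and the S3 path are hypothetical, and none of this is the SnapLogic API. In the product, these operations are configured as Snaps in a pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical stand-in for a catalog table asset."""
    location: str            # where the data file lives, e.g. an S3 URI
    schema: dict             # column name -> data type name


class ToyCatalog:
    """In-memory illustration of inserting and querying table metadata."""

    def __init__(self) -> None:
        self._tables: dict[str, TableMetadata] = {}

    def insert(self, name: str, location: str, schema: dict) -> None:
        # Store a copy of the schema so later changes by the caller do not leak in.
        self._tables[name] = TableMetadata(location, dict(schema))

    def query(self, name: str) -> Optional[TableMetadata]:
        # Return the stored metadata, or None if the table does not exist.
        return self._tables.get(name)


catalog = ToyCatalog()
catalog.insert(
    "people",                                   # hypothetical table name
    "s3://example-bucket/people.parquet",       # hypothetical location
    {"name": "str", "age": "int"},
)
found = catalog.query("people")
```

Note that the catalog stores only the location and schema, not the data itself. The file stays in the external file system.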
Now, let me show you how to use these two Snaps. The first pipeline focuses on the Catalog Insert Snap, and the second highlights the Catalog Query Snap.
In this pipeline, the JSON Generator Snap is used to generate data about a person, shown in this table.
Next, using the Parquet Writer Snap, we write the data and its metadata as a Parquet file into an S3 bucket. We then use the Catalog Insert Snap to create a catalog table by specifying a location on the SnapLogic file system.
The Catalog Insert Snap can also keep the data organized for easy search by assigning partition keys.
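As a rough picture of why partition keys make searching easier, the sketch below groups metadata entries by the values of chosen keys. Everything in it is assumed for illustration: the `group_by_partition` helper, the `year` and `month` keys, and the S3 locations are hypothetical and do not reflect how the Snap stores partitions internally.

```python
from collections import defaultdict


def group_by_partition(entries: list[dict], keys: list[str]) -> dict[tuple, list[str]]:
    """Group file locations by the values of the given partition keys."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for entry in entries:
        # Build a tuple of key values; a missing key is recorded as None.
        partition = tuple(entry["partition"].get(key) for key in keys)
        groups[partition].append(entry["location"])
    return dict(groups)


# Hypothetical metadata entries; the paths and keys are made up.
entries = [
    {"location": "s3://example-bucket/people/a.parquet",
     "partition": {"year": "2018", "month": "10"}},
    {"location": "s3://example-bucket/people/b.parquet",
     "partition": {"year": "2018", "month": "11"}},
    {"location": "s3://example-bucket/people/c.parquet",
     "partition": {"year": "2017", "month": "11"}},
]

by_year = group_by_partition(entries, ["year"])
by_year_month = group_by_partition(entries, ["year", "month"])
```

Looking up `by_year[("2018",)]` returns only the files for that year, so a search can skip everything outside the partition it needs.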
Once the catalog table is successfully created, we can switch over to SnapLogic Manager to review the available schema.
Now that we are in SnapLogic Manager, we can review and search the schema based on specific fields.
Now let’s look at the pipeline with the Catalog Query Snap. As part of this Snap, you can specify the catalog table name and the path to retrieve the metadata stored in this table.
Thank you for watching this video. For more information, please visit docs-snaplogic.atlassian.net.