Heavy SNOW: Is Snowflake Right for Your Environment?
The hottest data warehouse technology to arrive in a long while is Snowflake.
With the company’s IPO projected to be one of the largest software IPOs on record, if not the largest, Snowflake and its achievements are garnering significant attention. Deservedly so. Over the last five years, the company has injected new life into the otherwise staid data warehouse and analytics technology industry—an area that has not seen game-changing innovation since Bill Inmon first introduced the data warehouse concept over 40 years ago.
If the company’s pre-IPO revenue (according to its S-1 filing, $408M for the trailing four quarters) and growth rate (133% for the first half of 2020) hold, Snowflake will reach $1B in annual revenue within two years. To put this in context, none of the still-thriving Hadoop-based companies that launched in the Hadoop era (the last big buzz in the “analytics” space) over ten years ago has ever reached $1B in revenue. Further, within a few years, the only independent database/data warehouse company likely to be bigger than Snowflake is Oracle.
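A back-of-the-envelope check of that projection, assuming revenue compounds annually from the $408M trailing figure (a simplification, since 133% is a half-year growth rate rather than a guaranteed annual one):

```latex
\begin{aligned}
\text{At } 133\% \text{ growth:}\quad & \$408\text{M} \times (1 + 1.33) \approx \$951\text{M after one year} \\
\text{Growth needed for } \$1\text{B in two years:}\quad & \$408\text{M} \times (1 + g)^2 = \$1000\text{M}
  \;\Rightarrow\; g = \sqrt{1000/408} - 1 \approx 0.57
\end{aligned}
```

In other words, annual growth would only need to average about 57%, less than half the reported rate, for the $1B mark to be reached within two years.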
If you haven’t adopted Snowflake yet and have heard the buzz about the company, you may be asking yourself whether Snowflake’s technology is the right fit for your data environment.
As a data and application integration platform provider, SnapLogic offers pre-built connectors for Snowflake and for many of the popular databases. From our integration experience, here are a few macro factors to consider:
- Do you plan to implement a data warehouse solution on-premises or in the cloud? Snowflake is a cloud-only, software-as-a-service (SaaS) solution. If you are seeking a cloud-based data warehouse, Snowflake is a potential fit; if you need an on-premises deployment, it is not.
- Do you have corporate policies against multi-tenant environments?
If not, and you want to be in the cloud, Snowflake is right up your alley. If, however, you have corporate policies against multi-tenancy, Snowflake offers a single-tenant option: Virtual Private Snowflake (VPS).
- What’s the nature, or the format, of the data you wish to warehouse and analyze?
Snowflake can ingest (load) semi-structured data such as JSON and XML, operate on it natively, and query it in a fully relational manner, just as easily as structured data. This capability and the separation of compute and storage (which makes scaling easy) are Snowflake’s claims to fame. Note that a single semi-structured value is limited to 16 MB. Parquet, Avro, and ORC files can also be loaded into Snowflake; the optimal maximum size for these files is ~1 GB, and you’re encouraged to split larger files.
Unstructured data such as PDF files, images, and audio cannot be loaded into Snowflake; you’ll need a separate storage platform for these types of files. However, if such files are converted to, or represented as, binary values or character strings (e.g., VARCHAR) for analysis purposes, they can be loaded into Snowflake. Note that binary values are limited to 8 MB and character strings to 16 MB, both uncompressed.
- What is the write or transactional performance required for your specific use case?
Snowflake is not a transactional (OLTP) database. While it may be able to handle a modest transactional workload, you’ll need to test Snowflake to determine its limits for your particular use case. For demanding write or transactional requirements, you may need a NoSQL database in front of Snowflake or a separate OLTP database alongside it.
- Do you require streaming data support for your data warehouse?
Snowflake supports streaming data, with a latency of about 1 minute. Test Snowflake to ensure your latency requirements can be met.
- Do you require machine learning and AI capabilities?
Snowflake does not natively offer a machine learning library. SnapLogic’s ML/AI Snap can provide this enhancement as a complement to Snowflake.
- Do your data teams prefer SQL, Python, or Java?
SQL is the native data access and query language in Snowflake. Python, Java, and other languages are supported through Snowflake’s connectors and drivers (for example, its Python connector and JDBC driver).
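The size limits in the data-format answers above lend themselves to a simple pre-load check. Below is a minimal Python sketch, assuming the limits quoted in this post (Snowflake may have changed them since) and measuring strings as UTF-8 bytes. `fits_limit` and `chunk_records` are hypothetical helper names, not part of any Snowflake API, and the splitter only applies to newline-delimited text formats.

```python
import json

# Assumed limits, taken from the figures quoted in this post.
MB = 1024 * 1024
SEMI_STRUCTURED_LIMIT = 16 * MB   # per JSON/XML value
VARCHAR_LIMIT = 16 * MB           # per character string, uncompressed (UTF-8 bytes assumed)
BINARY_LIMIT = 8 * MB             # per binary value, uncompressed
TARGET_FILE_SIZE = 1024 * MB      # ~1 GB suggested maximum per file


def fits_limit(value) -> bool:
    """Return True if a single value is within the assumed limit for its type."""
    if isinstance(value, (bytes, bytearray)):
        return len(value) <= BINARY_LIMIT
    if isinstance(value, str):
        return len(value.encode("utf-8")) <= VARCHAR_LIMIT
    # Anything else is treated as semi-structured and measured as serialized JSON.
    return len(json.dumps(value).encode("utf-8")) <= SEMI_STRUCTURED_LIMIT


def chunk_records(lines, max_bytes=TARGET_FILE_SIZE):
    """Group newline-delimited records into chunks of at most max_bytes each.

    Records are never split, so a single record larger than max_bytes ends up
    in a chunk of its own. Text formats only (e.g. NDJSON, CSV); Parquet, Avro
    and ORC files would need to be rewritten with their own tooling instead.
    """
    chunk, size = [], 0
    for line in lines:
        n = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if chunk and size + n > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += n
    if chunk:
        yield chunk


# Usage: a 9 MB binary value exceeds the 8 MB limit; a small JSON object fits.
print(fits_limit(b"\x00" * (9 * MB)))          # False
print(fits_limit({"id": 1, "tags": ["a"]}))    # True

# Five 10-byte records (9 characters + newline) split with a 25-byte budget.
records = ['{"id": %d}' % i for i in range(5)]
print([len(c) for c in chunk_records(records, max_bytes=25)])  # [2, 2, 1]
```

Keeping records whole is the main design choice here: a loader can then treat each chunk as an independent file without having to repair records cut in half.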
If you’re new to Snowflake, these are the macro questions to consider when evaluating whether, from an architecture perspective, it is the right fit for your environment. As a data warehouse, Snowflake excels in ease of use and can perform extremely well, hence its popularity. For extremely large tables, or for tables that are not naturally sorted by a timestamp (as data loaded in time order usually is), you may need to define clustering keys to optimize performance.
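To see why sort order and clustering keys matter, consider a simplified model of partition pruning. The Python sketch below assumes each partition records only the min and max of one column. Snowflake does keep per-micro-partition value ranges, but its real storage and pruning are more involved, and all names here are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Partition:
    lo: int  # smallest value of the column in this partition
    hi: int  # largest value of the column in this partition


def build_partitions(values, rows_per_partition):
    """Split column values into fixed-size partitions, keeping only min/max metadata."""
    parts = []
    for i in range(0, len(values), rows_per_partition):
        block = values[i:i + rows_per_partition]
        parts.append(Partition(min(block), max(block)))
    return parts


def partitions_scanned(parts, lo, hi):
    """Count partitions whose [min, max] range overlaps the query range [lo, hi].

    Partitions that cannot contain matching rows are skipped (pruned).
    """
    return sum(1 for p in parts if p.hi >= lo and p.lo <= hi)


values = list(range(10_000))
clustered = build_partitions(values, 100)  # rows arrive sorted by the key

shuffled_values = values[:]
random.Random(42).shuffle(shuffled_values)
unclustered = build_partitions(shuffled_values, 100)  # same rows, no ordering

print(partitions_scanned(clustered, 500, 999))  # 5 (of 100 partitions)
print(partitions_scanned(unclustered, 500, 999))
```

With sorted rows, the range query touches only 5 of 100 partitions. With shuffled rows, each partition’s min/max span covers most of the key range, so far more partitions must be scanned. Clustering on the filtered column restores the sorted-like layout.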
For your data loading and application integration requirements, SnapLogic is a perfect complement to Snowflake. Try SnapLogic for free today.