In our previous blog post on how to build a data warehouse in 6-8 weeks, we showed you how to get lightning-fast results and effectively create a working data warehouse with Snowflake. Future state integrations and governance needs are coming, though. This is why 2nd Watch highly recommends executing a data strategy and governance project in parallel with your Snowflake proof-of-concept. Knowing how to leverage Snowflake’s strengths to avoid common pitfalls will save you time, money, and re-work.
Consider one company that spent a year using the data discovery layer-only approach. With data sources all centralized in the data warehouse and all transformations occurring at run-time in the BI tool, the data team was able to deliver a full analytical platform to its users in less time than ever before. Users were happy, at first, until the logic became more mature and more complex and ultimately required more compute power (translating to higher cost) to keep the same performance expectations. For some, however, this might not be a problem but an expected outcome.
For this company, enabling analytics and reporting was the only need for the first year, but integration of data across applications was coming full steam ahead. The primary line of business applications needed to get near-real-time updates from the others. For example, marketing automation didn’t rely 100% on humans; it needed data to execute its rules, from creating ad campaigns to sending email blasts based on events occurring in other systems.
This one use case poked a big hole in the architecture – you can’t just have a data warehouse in your enterprise data platform. There’s more to it. Even if it’s years away, you need to effectively plan for it or you’ll end up in a similar, costly scenario. That starts with data strategy and governance.
ETL vs. ELT in Snowflake
Identify where your transformations occur and how they impact your downstream systems.
The new paradigm is that you no longer need ETL (Extract, Transform, Load) – you need ELT (Extract, Load, Transform). This is true, but sometimes misleading. Some will interpret ELT as no longer needing to build and manage the expensive pipelines and business logic that delay speed-to-insight, are costly to maintain, and require constant upkeep for changing business rules. In effect, it’s interpreted as removing the “T” and letting Snowflake solve for this. Unfortunately, someone has to write the code and business logic, and it’s best to not have your business users trying to do this when they’re better served working on your organization’s core goals.
In reality, you are not removing the “T” – you are moving it to a highly scalable and performant database after the data has been loaded. This is still going to require someone to understand how your customer data in Salesforce ties to a customer in Google Analytics that corresponds to a sale in your ERP. You still need someone who knows both the data structures and the business rules. Unfortunately, the “T” will always need a place to go – you just need to find the right place.
Ensure your business logic is defined only once in the entire flow. If you’ve written complex transformation code to define what “customer” means, when that business logic inevitably changes, you’ll be guaranteed that this definition of “customer” will flow the same way to your BI users as it does to your ERP and CRM. When data science and machine learning enter the mix, you’ll also avoid time spent in data prep and instead focus on delivering predictive insights.
You might be thinking that this all sounds even more similar to the data warehouse you’ve already built and are trying to replace. There’s some good news: Snowflake does make this easier, and ELT is still exactly the right approach.
Defining and Adjusting the Business Logic and Views
Snowflake enables an iterative process of data discovery, proof-of-concept, business value, and long-term implementation.
Perhaps you’ve defined a sales hierarchy and a salesperson compensation metric. The developer can take that logic, put it into SQL against the raw data, and refresh the dashboard, all while the business user is sitting next to them. Is the metric not quite what the user expected, or is the hierarchy missing something they hadn’t thought of in advance? Tweak the SQL in Snowflake and refresh. Iterate like this until the user is happy and signs off, excited to start using the new dashboard in their daily routine.
By confirming the business logic in the salesperson compensation example above, you’ve removed a major part of what made ETL so painful in the past: developing, waiting for a load to finish, and showing business users. That gap between load finishing and the next development cycle is a considerable amount of lost time and money. With this approach, however, you’ve confirmed the business logic is correct and you have the SQL already written in Snowflake’s data discovery views.
Developing your initial logic in views in Snowflake’s data discovery layer allows you to validate and “certify” it for implementation into the physical model. When you’ve completed the physical path, you can change the BI tool for each completed subject area to point to the physical layer instead of the data discovery layer.
If you have any questions about data strategy and governance, or if you want to learn more about how Snowflake can fit into your organization, contact us today.
This blog originally appeared as a section of our eBook, “Snowflake Deployment Best Practices: A CTO’s Guide to a Modern Data Platform.” Click here to download the full eBook.
What is Snowflake, How is it Different, and Where Does it Fit in Your Ecosystem?
How to Build a Data Warehouse in 6-8 Weeks
Methods to Implement a Snowflake Project