What Is the Difference Between Snowflake and Amazon Redshift?

The modern business world is data-centric. As more businesses turn to cloud computing, they must evaluate and choose the right data warehouse to support their digital modernization efforts and business outcomes. Data warehouses can increase the bottom line, improve analytics, enhance the customer experience, and optimize decision-making. 

A data warehouse is a large repository of data businesses utilize for deep analytical insights and business intelligence. This data is collected from multiple data sources. A high-performing data warehouse can collect data from different operational databases and apply a uniform format for better analysis and quicker insights.

Two of the most popular data warehouse solutions are Snowflake and Amazon Web Services (AWS) Redshift. Let’s look at how these two data warehouses stack up against one another. 

What is Snowflake?

Snowflake is a cloud-based data warehousing solution that runs on third-party cloud infrastructure, such as Azure, Google Cloud Platform, or Amazon Web Services (AWS). It is designed to provide users with a fully managed, cloud-native database solution that can scale up or down as needed for different workloads. Snowflake separates compute from storage, a non-traditional approach to data warehousing: data remains in a central repository while compute instances are managed, sized, and scaled independently. 

Snowflake is a good choice for companies that are conscious of their operational overhead and need to deploy applications into production quickly without worrying about managing hardware or software. It is also an ideal platform when query loads are lighter and the workload requires frequent scaling. 

The benefits of Snowflake include:

  • Easy integration with most components of data ecosystems
  • Minimal operational overhead: companies are not responsible for installing, configuring, or managing the underlying warehouse platform
  • Simple setup and use
  • Abstracted configuration for storage and compute instances
  • Robust and intuitive SQL interface

What is Amazon Redshift?

Amazon Redshift is an enterprise data warehouse built on Amazon Web Services (AWS). It provides organizations with a scalable, secure, and cost-effective way to store and analyze large amounts of data in the cloud. Its cloud-based compute nodes enable businesses to perform large-scale data analysis and storage. 

Amazon Redshift is ideal for enterprises that require quick query outputs on large data sets. Additionally, Redshift offers several options for efficiently managing its clusters, including the AWS CLI, the Amazon Redshift console, the Amazon Redshift Query API, and the AWS SDKs. Redshift is a great solution for companies already using AWS services and running applications with a high query load. 

The benefits of Amazon Redshift include:

  • Seamless integration with the AWS ecosystem
  • Multiple data output formatting support
  • Easy console to extract analytics and run queries
  • Customizable data and security models

Comparing Data Warehouse Solutions

Snowflake and Amazon Redshift both offer impressive performance capabilities, like scalability across multiple servers and high availability with minimal downtime. There are some differences between the two that will determine which one is the best fit for your business.

Performance

Both data warehouse solutions harness massively parallel processing (MPP) and columnar storage, which enables advanced analytics and efficiency on massive jobs. Snowflake boasts a unique architecture that supports structured and semi-structured data; storage, compute, and cloud services are abstracted so each layer can be scaled and optimized independently. Redshift has since introduced concurrency scaling, aided by machine learning, to compete with Snowflake’s elastic approach to concurrency. 

Maintenance

Snowflake is a pure SaaS platform: all software and hardware maintenance is handled by Snowflake, so there is nothing for your team to patch, tune, or upgrade at the infrastructure level. Amazon Redshift clusters require manual maintenance from the user.

Data and Security Customization

Snowflake supports fewer customization choices in data and security; its always-on encryption enforces strict security checks by default. Redshift offers more data flexibility through partitioning and distribution. Additionally, Redshift allows you to tailor its end-to-end encryption and set up your own identity management system to handle user authentication and authorization.
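As an illustration of that flexibility, here is a minimal sketch of a Redshift table definition; the table, columns, and key choices are hypothetical examples, not a prescribed schema. The distribution key co-locates rows that are joined on the same column, and the sort key speeds up the date-range filters most reports rely on.

    -- Hypothetical Redshift fact table: distribution and sort keys are chosen by the designer.
    CREATE TABLE sales (
        sale_id      BIGINT,
        customer_id  BIGINT,
        sale_date    DATE,
        amount       DECIMAL(12,2)
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)   -- co-locate rows by the most common join column
    SORTKEY (sale_date);    -- order rows on disk for fast date-range scans

Snowflake, by contrast, handles micro-partitioning and maintenance automatically, which is part of why it exposes fewer of these knobs.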

Pricing

Both platforms offer on-demand pricing but are packaged differently. Snowflake doesn’t bundle usage and storage in its pricing structure and treats them as separate entities. Redshift bundles the two in its pricing. Snowflake tiers its pricing based on what features you need. Your company can select a tier that best fits your feature needs. Redshift rewards businesses with discounts when they commit to longer-term contracts. 

Which data warehouse is best for my business?

To determine the best fit for your business, ask yourself the following questions in these specific areas:

  • Do I want to bundle my features? Snowflake splits compute and storage, and its tiered pricing provides more flexibility to your business to purchase only the features you require. Redshift bundles compute and storage to unlock the immediate potential to scale for enterprise data warehouses. 
  • Do I want a customizable security model? Snowflake grants security and compliance options geared toward each tier, so your company’s level of protection is appropriate to your data strategy. Redshift provides fully customizable encryption solutions, so you can build a highly tailored security model. 
  • Do I need JSON storage? Snowflake’s JSON storage support wins over Redshift’s. With Snowflake, you can store and query JSON with native functions (see the sketch after this list). With Redshift, JSON is typically stored and parsed as strings, making it more difficult to query and work with. 
  • Do I need more automation? Snowflake automates issues like data vacuuming and compression. Redshift requires hands-on maintenance for these sorts of tasks. 
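As a minimal sketch of the difference on the Snowflake side (the table and field names below are hypothetical), JSON can be stored in a VARIANT column and queried with path notation and FLATTEN:

    -- Hypothetical Snowflake table holding raw JSON events in a VARIANT column.
    CREATE TABLE events (payload VARIANT);

    -- Query nested fields with path notation and casts, and unnest arrays with FLATTEN.
    SELECT
        payload:customer.id::NUMBER    AS customer_id,
        payload:customer.name::STRING  AS customer_name,
        item.value:sku::STRING         AS sku
    FROM events,
         LATERAL FLATTEN(input => payload:items) item;

In Redshift, the equivalent typically involves storing the document as text and extracting fields with JSON functions at query time, which is workable but more cumbersome.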

Conclusion

A data warehouse is necessary to stay competitive in the modern business world. The two major data warehouse players – Snowflake and Amazon Redshift – are both best-in-class solutions. One product is not superior to the other, so choosing the right one for your business means identifying the one best for your data strategy.

2nd Watch is an AWS Certified Partner and an Elite Snowflake Consulting Partner. We can help you choose the right data warehouse solution and support your business regardless of which data warehouse you choose.

We have been recognized by AWS as a Premier Partner since 2012, as well as an audited and approved Managed Service Provider and Data and Analytics Competency partner for our outstanding customer experiences, depth and breadth of our products and services, and our ability to scale to meet customer demand. Our engineers and architects are 100% certified on AWS, holding more than 200 AWS certifications.

Our full team of certified SnowPros has proven expertise to help businesses implement modern data solutions using Snowflake. From creating a simple proof of concept to developing an enterprise data warehouse to customized Snowflake training programs, 2nd Watch will help you to utilize Snowflake’s powerful cloud-based data warehouse for all of your data needs.

Contact 2nd Watch today to help you choose the right data warehouse for your business!


A CTO’s Guide to a Modern Data Platform: Data Strategy and Governance

In our previous blog post on how to build a data warehouse in 6-8 weeks, we showed you how to get lightning-fast results and effectively create a working data warehouse with Snowflake. Future state integrations and governance needs are coming, though. This is why 2nd Watch highly recommends executing a data strategy and governance project in parallel with your Snowflake proof-of-concept. Knowing how to leverage Snowflake’s strengths to avoid common pitfalls will save you time, money, and re-work.

Consider one company that spent a year using a data-discovery-layer-only approach. With data sources all centralized in the data warehouse and all transformations occurring at run-time in the BI tool, the data team was able to deliver a full analytical platform to its users in less time than ever before. Users were happy at first, until the logic became more mature and complex and ultimately required more compute power (translating to higher cost) to meet the same performance expectations. For some, however, this might not be a problem but an expected outcome.

For this company, enabling analytics and reporting was the only need for the first year, but integration of data across applications was coming full steam ahead. The primary line-of-business applications needed near-real-time updates from one another. For example, marketing automation didn’t rely 100% on humans; it needed data to execute its rules, from creating ad campaigns to sending email blasts based on events occurring in other systems.

This one use case poked a big hole in the architecture – you can’t just have a data warehouse in your enterprise data platform. There’s more to it. Even if it’s years away, you need to effectively plan for it or you’ll end up in a similar, costly scenario. That starts with data strategy and governance.

ETL vs. ELT in Snowflake

Identify where your transformations occur and how they impact your downstream systems.

The new paradigm is that you no longer need ETL (Extract, Transform, Load) – you need ELT (Extract, Load, Transform). This is true, but sometimes misleading. Some will interpret ELT as no longer needing to build and manage the expensive pipelines and business logic that delay speed-to-insight, are costly to maintain, and require constant upkeep for changing business rules. In effect, it’s interpreted as removing the “T” and letting Snowflake solve for this. Unfortunately, someone has to write the code and business logic, and it’s best to not have your business users trying to do this when they’re better served working on your organization’s core goals.

In reality, you are not removing the “T” – you are moving it to a highly scalable and performant database after the data has been loaded. This is still going to require someone to understand how your customer data in Salesforce ties to a customer in Google Analytics that corresponds to a sale in your ERP. You still need someone who knows both the data structures and the business rules. Unfortunately, the “T” will always need a place to go – you just need to find the right place.

Ensure your business logic is defined only once in the entire flow. If the complex transformation code that defines what “customer” means lives in a single place, then when that business logic inevitably changes, the updated definition of “customer” will flow the same way to your BI users as it does to your ERP and CRM. When data science and machine learning enter the mix, you’ll also avoid time spent on data prep and instead focus on delivering predictive insights.
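To make that concrete, here is a minimal sketch of the “define it once” idea, assuming hypothetical raw tables landed from Salesforce, Google Analytics, and an ERP; the schema, table, and join columns are illustrative assumptions, not a prescribed model.

    -- Hypothetical single definition of "customer", written once as a Snowflake view
    -- over raw tables that the EL step has already loaded.
    CREATE OR REPLACE VIEW discovery.vw_customer AS
    SELECT
        sf.account_id                    AS customer_id,
        sf.account_name                  AS customer_name,
        ga.client_id                     AS web_client_id,
        COALESCE(erp.lifetime_sales, 0)  AS lifetime_sales
    FROM raw.salesforce_account       sf
    LEFT JOIN raw.ga_client_mapping   ga  ON ga.account_id  = sf.account_id
    LEFT JOIN raw.erp_customer_sales  erp ON erp.account_id = sf.account_id;

When the definition of “customer” changes, only this view changes; BI dashboards, data science feature preparation, and downstream integrations all inherit the update.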

You might be thinking that this all sounds even more similar to the data warehouse you’ve already built and are trying to replace. There’s some good news: Snowflake does make this easier, and ELT is still exactly the right approach.

Defining and Adjusting the Business Logic and Views

Snowflake enables an iterative process of data discovery, proof-of-concept, business value, and long-term implementation.

Perhaps you’ve defined a sales hierarchy and a salesperson compensation metric. The developer can take that logic, put it into SQL against the raw data, and refresh the dashboard, all while the business user is sitting next to them. Is the metric not quite what the user expected, or is the hierarchy missing something they hadn’t thought of in advance? Tweak the SQL in Snowflake and refresh. Iterate like this until the user is happy and signs off, excited to start using the new dashboard in their daily routine.

By confirming the business logic in the salesperson compensation example above, you’ve removed a major part of what made ETL so painful in the past: developing, waiting for a load to finish, and only then showing business users the result. That gap between the load finishing and the next development cycle is a considerable amount of lost time and money. With this approach, however, you’ve confirmed the business logic is correct and you already have the SQL written in Snowflake’s data discovery views.

Developing your initial logic in views in Snowflake’s data discovery layer allows you to validate and “certify” it for implementation into the physical model. Once the physical model for a subject area is complete, you can point the BI tool for that subject area to the physical layer instead of the data discovery layer.
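Under the same assumptions as the earlier sketch (placeholder schema and object names), “certifying” that logic can be as simple as materializing the validated view into the physical layer and repointing the BI tool:

    -- Materialize the validated discovery view into the governed physical layer.
    -- In practice this refresh would be scheduled, for example with a Snowflake task.
    CREATE OR REPLACE TABLE reporting.dim_customer AS
    SELECT * FROM discovery.vw_customer;

The BI tool for that subject area is then pointed at reporting.dim_customer instead of the discovery view.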

If you have any questions about data strategy and governance, or if you want to learn more about how Snowflake can fit into your organization, contact us today.

This blog originally appeared as a section of our eBook, “Snowflake Deployment Best Practices: A CTO’s Guide to a Modern Data Platform.” Click here to download the full eBook.

Related Content:

What is Snowflake, How is it Different, and Where Does it Fit in Your Ecosystem?

How to Build a Data Warehouse in 6-8 Weeks

Methods to Implement a Snowflake Project


HVR, Riversand, ALTR, and Sisu | Snowflake-Adjacent Technologies You Need to Know About

Snowflake is a powerhouse in the data world, but it can become even more powerful when paired with other technologies that suit your organization’s needs. 2nd Watch is a tech-agnostic firm, meaning we only recommend tools we truly believe are the best for your business. Because Snowflake partners with such a wide range of technologies, we have lots of options to create a tech stack uniquely suited to you.

Among the many tools that work in conjunction with Snowflake, the following Snowflake-adjacent technologies enhance Snowflake in different ways and are all noteworthy in their own right.

HVR (a Fivetran company)

HVR, which became part of Fivetran as of late 2021, is a cloud data replication tool that natively supports Snowflake. It executes real-time data replication by reading directly from database transaction logs, which allows for high performance and low impact on database servers.

As one of the earliest technologies to join the Snowflake Partner Network, HVR understands how analytics efforts can be impacted by real-time data streaming to Snowflake. Using HVR to offer immediate access to your data in Snowflake’s Data Cloud, you can make timely decisions based on current, accurate data.

Riversand

Riversand is a master data management (MDM) tool that “goldenizes” your data. In other words, it consolidates each entity into one current, complete, and accurate golden record.

After connecting your organization with the right MDM tool for your situation, like Riversand, 2nd Watch would oversee the MDM implementation and provide guidance on next steps. This includes sharing insights on best-in-class data warehouses, like Snowflake’s Data Cloud. Together, Riversand and Snowflake can prepare you for downstream analytics and long-term data governance.

ALTR

ALTR is a cloud-native Data Security as a Service (DSaaS) platform that helps companies optimize data consumption governance. With ALTR, you’re able to track data consumption patterns and limit how much data can be consumed at various levels across your organization.

ALTR and Snowflake work together in two ways:

  1. ALTR seamlessly integrates with Snowflake so you can view your securely shared data consumption information.
  2. With data security at the SQL layer, ALTR improves upon Snowflake’s built-in security to stop and prevent threats within Snowflake.

Sisu

Sisu accelerates and expands your analytics capabilities. Using Sisu, business users are able to self-serve and find the answers to increasingly complex questions.

Sisu connects directly to Snowflake’s Data Cloud, so you always have access to your most up-to-date data for comprehensive, actionable analytics. Sisu and Snowflake partner to empower you to make quick, decisive judgments.

These four technologies are just a taste of the many tools that work with Snowflake to improve your organization’s data and analytics capabilities. To discuss your specific data and analytics needs – and to determine the tools Snowflake partners with that will help you meet those needs – contact 2nd Watch today.


A High-Level Overview of Looker: An Excerpt from Our BI Tool Comparison Guide

Looker is one of several leading business intelligence (BI) tools that can help your organization harness the power of your data and glean impactful insights that allow you to make the best decisions for your business.

Keep reading for a high-level overview of Looker’s key features, pros and cons of Looker versus competitors, and a list of tools and technologies that easily integrate with Looker to augment your reporting.

Overview of Looker

Looker is a powerful BI tool that can help a business develop insightful visualizations. Among other benefits, users can create interactive and dynamic dashboards, schedule and automate the distribution of reports, set custom parameters to receive alerts, and utilize embedded analytics.

Why Use Looker

If you’re looking for a single source of truth, customized visuals, collaborative dashboards, and top-of-the-line customer support, Looker might be the best BI platform for you. Being fully browser-based cuts down on confusion as your team gets going, and customized pricing means you get exactly what you need.

Pros of Looker

  • Looker offers performant and scalable analytics on a near-real-time basis.
  • Because you need to define logic before creating visuals, it enforces a single-source-of-truth semantic layer.
  • Looker is completely browser-based, eliminating the need for desktop software.
  • It facilitates dashboard collaboration, allowing parallel development and publishing with out-of-the-box git integration.

Cons of Looker

  • Looker can be more expensive than competitors like Microsoft Power BI, so while adding Looker to an existing BI ecosystem can be beneficial, you will need to take costs into consideration.
  • Compared to Tableau, visuals aren’t as elegant and the platform isn’t as intuitive.
  • Coding in LookML is unavoidable, which may present a roadblock for report developers who have minimal experience with SQL.

Select Complementary Tools and Technologies for Looker

  • Any SQL database
  • Amazon Redshift
  • AWS
  • Azure
  • Fivetran
  • Google Cloud
  • Snowflake

Was this high-level overview of Looker helpful? If you’d like to learn more about Looker reporting or discuss how other leading BI tools, like Tableau and Power BI, may best fit your organization, contact us to learn more.

The content of this blog is an excerpt of our Business Intelligence Tool Comparison Guide. Click here to download a copy of the guide.


How Snowflake Fits Into Your Existing Azure Environment

To get your source data ingested and loaded, or for a deep dive into how you can build a fully automated data integration process in Snowflake on Azure, schedule a Snowflake whiteboarding session with our team of data architects.

If you have a Microsoft ecosystem but have been wanting to take advantage of more tools on the market, Snowflake on Azure gives you additional opportunities to upgrade your analytics platform without throwing away your investment in, or the uniformity of, your current environment.

For those who are not familiar, Snowflake is a cloud-based, massively parallel processing (MPP), columnar storage database. It’s a newer option for data warehousing that is set up for more efficient querying of large data volumes. It also consumes structured and semi-structured data in a way that traditional relational databases are not designed to do as effectively – think “big data” without the “big overhead.”

With this release, Microsoft-centric companies can evaluate using Snowflake to increase the performance and flexibility of their analytics environment by adding it on top of their existing data integration process. To determine where Snowflake might be a good fit, our 2nd Watch consultants took a deep dive into where Snowflake could sit in our current and prospective Microsoft clients’ ecosystems.

Where does Snowflake fit in your current Azure environment?

Snowflake is best used as the home of an analytical layer (or dimensional model, for the more technical) that enables reporting. Think of this as the substitute for products like SQL Server Analysis Services (SSAS).

While we still recommend that you maintain a data integration hub and related process for all of your consuming application needs (i.e., sending consolidated and cleansed data back to each source system to keep them in sync), Snowflake can sit right at the end of that process. Because it’s optimized for read activities, it complements the needs of business intelligence tools like Power BI, Looker, Tableau, etc., making it faster for business users to grab the information they need via those tools. Integration and ETL are possible with many tools and services, including Azure Data Factory.

Example architecture for adding Snowflake to the end of your data integration process 

When would you want to use Snowflake?

There are two primary ideas behind Snowflake’s competitive advantage when it comes to data warehousing platforms: its automatic optimization of query execution and the hands-off nature of its maintenance.

We recommend Snowflake for two main use cases:

  1. Developing a more efficient analytics platform
  2. Creating a platform for flexible data discovery

Our clients that fit within the above use cases usually had:

  • Multiple business users – The more people you have querying your database at one time, the more the database has to be configured to handle that load so as not to lock up other processes. Traditional databases can be scaled up to handle larger reads (think of a query that produces data), but this takes a decent amount of time and effort to achieve, and they are more often optimized for writes (think of loading data into a table). In Snowflake, a user can spin up a warehouse for just the one query, then spin it back down right after (a sketch of this pattern follows this list). This allows for more modular use of higher-powered resources.
  • Lots of data – If you’ve ever tried to perform a huge query on your current system, you likely noticed a slowdown from your usual processing. Traditional databases are not as optimized for read activities as columnar databases are. This makes options like Snowflake more attractive to those performing heavier analytical queries on a regular basis.

“Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform row-based systems.” – MIT Computer Science and Artificial Intelligence Laboratory

  • A mix of structured and semi-structured data – Though many traditional databases offer options for consuming semi-structured data (e.g., JSON, XML, etc.), they aren’t optimized to do so. If you have a mix of structured and semi-structured data, options like Snowflake might be more efficient for your process.
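As a reference point, here is a minimal sketch of the “spin up, query, spin down” pattern mentioned in the first bullet above; the warehouse name, size, and auto-suspend window are illustrative assumptions you would tune for your own workloads.

    -- Hypothetical dedicated warehouse for one heavy analytical query.
    CREATE WAREHOUSE IF NOT EXISTS adhoc_wh
        WAREHOUSE_SIZE      = 'XLARGE'
        AUTO_SUSPEND        = 60      -- suspend after 60 seconds of inactivity
        AUTO_RESUME         = TRUE
        INITIALLY_SUSPENDED = TRUE;

    USE WAREHOUSE adhoc_wh;
    -- ... run the large read query here ...

    -- Optionally force the spin-down immediately rather than waiting for auto-suspend.
    ALTER WAREHOUSE adhoc_wh SUSPEND;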

What’s the tradeoff?

Snowflake can be a great option for clients who need a true analytical platform where they can perform data discovery or need to read out a lot of data at once. That said, the following are situations where an alternative to Snowflake might make more sense:

  • You need a data hub to keep your various applications in sync (application integration and API endpoints).
  • You aren’t trying to perform complex aggregations or data discovery.
  • You’re invested in an existing solution that doesn’t have many performance or management overhead concerns.

So what’s the gist?

Snowflake makes it easy to kick off a data warehouse project with its ease of use and start-small, scale-big approach. Despite this innovative architecture, we still recommend applying the same data warehousing fundamentals that you would have in the past.

Yes, Snowflake will allow you to create views on top of messy raw source data, but you will ultimately lose the battle on performance as your data grows in complexity. We suggest approaching a new Snowflake deployment with a vision of your data platform as a whole – not just the analytics layer. Performing all your complex business logic in Snowflake is possible, but due to the columnar architecture, integration use cases are better served in complementary cloud database platforms, such as Azure SQL.

With a roadmap and strategy in place for both your integration and analytics needs, you’ll be better able to select the right mix of tools and technologies for your solution. Learn more about Snowflake deployment best practices in our eBook.


A CTO’s Guide to a Modern Data Platform: How to Build a Data Warehouse in 6-8 Weeks

It’s true that in a very short period of time, you can get an operational data warehouse loaded with all your source data. Wondering how to build a data warehouse that yields these results? Snowflake and its technology partner ecosystem, including partners like Fivetran, allow you to take your databases and SaaS applications, replicate them to Snowflake, and see results in a significantly shorter timeframe than we’ve seen before. Write some SQL views in Snowflake against this data, point your favorite BI tool at it, and get lightning-fast results.

With the right plan in place, you can (aggressively) deliver that first business “wow” in six to eight weeks. 2nd Watch typically recommends at least two analytical virtual warehouses in enterprise deployments: one for data discovery and one for more structured and governed reporting.
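As a sketch of that recommendation (warehouse names, sizes, and schema names are illustrative assumptions), each workload gets its own virtual warehouse so data discovery queries never contend with governed reporting, and each layer gets its own schema:

    -- Hypothetical pair of analytical warehouses, one per workload.
    CREATE WAREHOUSE IF NOT EXISTS discovery_wh WAREHOUSE_SIZE = 'SMALL'  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

    -- Matching layers: raw landed data, data discovery views, and the governed model.
    CREATE DATABASE IF NOT EXISTS analytics;
    CREATE SCHEMA IF NOT EXISTS analytics.raw;
    CREATE SCHEMA IF NOT EXISTS analytics.discovery;
    CREATE SCHEMA IF NOT EXISTS analytics.reporting;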

The practice of having both a data discovery layer and a governed layer for your Snowflake deployment not only saves you time in the initial build, but it creates a continuous integration and deployment pattern. It finally makes a dent in the notion that a data warehouse cannot be “agile.”

Future State Snowflake Data Architecture

With this approach, you not only achieve governance and speed-to-insight, but you’ve also cut down your Snowflake consumption costs. Running complex queries at run-time, every time, can get expensive. Snowflake’s caching can help here, but if you’re constantly running a complex join across more than 20 tables, that physical layer might help you move from an XL-sized Snowflake warehouse to an L or M. In the long run, those cost savings will add up. When “best practice” or risk avoidance isn’t enough to justify this approach, the dollar savings might speak for themselves.

The initial six- to eight-week path assumes a focus on the data discovery layer, as depicted below. Loading all your data into a data discovery layer should be the primary development activity in your Snowflake proof of concept (POC) or pilot. Here are some tips:

  • Find several source systems that have the data your stakeholders need.
  • Begin the process of rapidly loading into your Snowflake data discovery layer.
  • Write iterative SQL in Snowflake views to build your business logic.
  • Connect your BI tool to Snowflake and build a very simple dashboard.
  • Get feedback from the business, using the new dashboard as a facilitation tool.
  • Repeat this process – more feedback is better.

You’ve not only increased the speed to market, but you’ve also enabled developers and business users to execute an analytics project in a completely new way.

Highlighted: Speed-to-insight approach, with transformations in Snowflake

This all comes with a caveat: Yes, you can write all your logic in Snowflake and do everything you need from an analytics standpoint, but it will limit you in the long run. Every data project has sacrifices to be made, whether it’s time, features, cost, or flexibility. You need to balance these with a long-term vision for how your data will be used across the organization. Snowflake will get you part of the way there, but a good plan and strategy will take you even further.

In the meantime, if you have any questions or want to learn more about how to build a data warehouse using Snowflake, contact us today.


A High-Level Overview of Snowflake

Using a modern data warehouse, like Snowflake, can give your organization improved access to your data and dramatically improved analytics. When paired with a BI tool, like Tableau, or a data science platform, like Dataiku, you can gain even faster access to impactful insights that help your organization fuel innovation and drive business decisions.

In this post, we’ll provide a high-level overview of Snowflake, including a description of the tool, why you should use it, pros and cons, and complementary tools and technologies.

Overview of Snowflake

Snowflake was built from the ground up for the cloud, initially starting on AWS and scaling to Azure and GCP. With no servers to manage and near-unlimited scale in compute, Snowflake separates compute from storage and charges based on the size and length of time that compute clusters (known as “virtual warehouses”) are running queries.

Value Prop:

  • Cross-cloud support lets organizations choose which cloud provider to use
  • Dynamic compute scaling saves on cost
  • Micro-partitioned storage with automatic maintenance

Scalability:

  • Rapid auto-scaling of compute allows for increased cost savings and high concurrency on demand; compute and storage are separated

Performance:

  • Built for MPP (massively parallel processing)
  • Optimized for read via a columnar backend
  • Dedicated compute means no concurrency issues

Features:

  • Ability to assign dedicated compute
  • High visibility into spend
  • Native support for JSON, XML, Avro, Parquet, and ORC semi-structured data formats
  • Snowflake’s SQL dialect (SnowSQL) has slight syntax differences
  • Introduction of Snowpark for Snowflake native development

Security:

  • Full visibility into queries executed, by whom, and how long they ran
  • Precision point-in-time restore available via “time-travel” feature
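For example, both points can be exercised directly in SQL; the table name and offset below are hypothetical, and the query-history example assumes access to the SNOWFLAKE.ACCOUNT_USAGE share.

    -- Who ran what, and for how long.
    SELECT user_name, query_text, total_elapsed_time
    FROM snowflake.account_usage.query_history
    ORDER BY start_time DESC
    LIMIT 100;

    -- Point-in-time read via time travel: the table as it looked one hour ago.
    SELECT * FROM orders AT (OFFSET => -3600);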

Why Use Snowflake

Decoupled from cloud vendors, it allows a true multi-cloud experience. You can deploy on Azure, AWS, GCP, or any combination of those cloud services. With near-unlimited scale and minimal management, it offers a best-in-class data platform but with a pay-for-what-you-use consumption model.

Pros of Snowflake

  • Allows for a multi-cloud experience built on top of existing AWS, Azure, or GCP resources, depending on your preferred platform
  • Highly performant queries utilizing uniquely provisioned pay-as-you-go compute and automatically derived partitioning
  • Easy implementation of security and role definitions for a less frustrating user experience and easier delineation of cost while keeping data secure
  • Integrated ability to share data to partners or other consumers outside of an organization and supplement data with publicly available datasets within Snowflake

Cons of Snowflake

  • Ecosystem of tooling continues to grow as adoption expands, but some features are not readily available
  • Due to the paradigm shift in a cloud-born architecture, taking full advantage of Snowflake’s advanced features requires a good understanding of cloud data architecture

Select Complementary Tools and Technologies for Snowflake

  • Apache Kafka
  • AWS Lambda
  • Azure Data Factory
  • Dataiku
  • Power BI
  • Tableau

We hope you found this high-level overview of Snowflake helpful. If you’re interested in learning more about Snowflake or other modern data warehouse tools like Amazon Redshift, Azure Synapse, and Google BigQuery, contact us to learn more.

The content of this blog is an excerpt of our Modern Data Warehouse Comparison Guide. Click here to download a copy of that guide.


A CTO’s Guide to a Modern Data Platform: Methods to Implement a Snowflake Project

Now that you’re prepared to build your modern data platform with Snowflake, you need to determine the best path to get there. Here are two recommended methods to implement a Snowflake project that will get you to the same end state:

  1. End-to-End Path: Build your end-to-end solution, from raw layer to data hub to data warehouse.
  2. Analytics-First Path: Build your Snowflake data warehouse and analytics environment first, and work your way back in future phases.

This blog post originally appeared as a section of our eBook, “Snowflake Deployment Best Practices: A CTO’s Guide to a Modern Data Platform.” Click here to download the full eBook.

End-to-End Path

The end-to-end path will be a higher initial investment, but you will enable both integration and analytics by the time you are finished. When you know what you want and have a fairly stable business environment, this path will ensure you avoid accruing technical debt by continuously putting off work that needs to be done.

Identify a key subject area, such as financial reporting, and bring in only that data throughout the data flow. By the time you finish your first phase, you will have enabled financial analytics, integration of financial data into other applications, and a full end-to-end solution that can be expanded upon with more data in future iterative phases.

Analytics-First Path

The analytics-first path takes a different approach and is likely the most common one you will see in Snowflake POCs. This path is a lower initial cost from a services and work standpoint, and it can produce business value in a much shorter time frame. If you need to show results to get buy-in to a larger project, the analytics-first approach will allow you to load raw data from your source applications and leverage Snowflake views for analytics tools to consume.

The trade-off with this approach comes in the form of an incomplete solution. By identifying several key subject areas, such as both financial and sales reporting, you will be able to get buy-in from multiple stakeholders in the organization. The temptation will be high to continue to add more functionality and more data as business demand grows. Working your way back to create a full end-to-end solution, however, is the right next step. The implementation costs will be higher, but you will be able to create a better-governed data platform, reduce Snowflake costs, and allow for application integration by following the plan.

Both options will lead you to success. If you have buy-in from business leaders and feel comfortable with your vision, building the end-to-end path will allow you to create a strong foundation at the beginning. If you need to show fast results or need to pivot quickly, the analytics-first path will help you sell the value of the solution by showing stakeholders results in dashboards before making a bigger investment up front. Both require a plan to see the entire vision executed but take different routes to get to the end state.

Conclusion

Knowing where to go in your journey is worth the investment.

Snowflake implementation partners, such as 2nd Watch, know how to leverage Snowflake for its strengths and how to use other systems to complement and augment it. We know the ecosystems, such as Azure and AWS, and we know how to help you put a strategy in place so Snowflake doesn’t become an analytics silo. We will help you select the right path that creates success for both you and your organization.

If you’re in the process of implementing Snowflake, wondering how it might fit in your organization, or just dreading your next daily ETL load, we would be happy to schedule a free whiteboarding session to brainstorm some ideas together. Click here to get started.


Snowflake Summit 2022 Keynote Recap: Disrupting Application Development

Dating back to 2014, Snowflake disrupted analytics by introducing the Snowflake Elastic Data Warehouse, the first data warehouse built from the ground up for the cloud, with patented architecture that would revolutionize the data platform landscape. Four years later, Snowflake continued to disrupt the data warehouse industry by introducing Snowflake Data Sharing, an innovation for data collaboration that eliminated the barriers of traditional data sharing methods in favor of enabling enterprises to easily share live data in real time without moving the shared data. This year, in 2022, under the bright Sin City lights, Snowflake intends to disrupt application development by introducing a unified platform for building data applications, from coding to monetization.

Currently, data teams looking to add value, whether by improving their team’s analytical efficiency or reducing costs in their enterprise’s processes, develop internal data products such as ML-powered product categorization models or C-suite dashboards in whichever stack their team is most comfortable with. However, to produce an external data product that brings value to their enterprise, there is only one metric executives truly care about: revenue.

To bridge the value gap between internal and external data products comes the promise of the Snowflake Native Application Framework. This framework will now enable developers to build, distribute, and deploy applications natively in the Data Cloud landscape through Snowflake. Moreover, these applications can be monetized on the Snowflake Marketplace, where consumers can securely purchase, install, and run these applications natively in their Snowflake environments, with no data movement required. It’s important to note that Snowflake’s goal is not to compete with OLTP Oracle DB workloads, but rather to disrupt how cloud applications are built by seamlessly blending the transactional and analytical capabilities Snowflake has to offer.

To round out the Snowflake Native Application Framework, a series of product announcements were made at the Summit:

  • Unistore (Powered by Hybrid Tables): To bridge transactional and analytical data in a single platform, Snowflake developed a new workload called Unistore. At its core, the new workload enables customers to unify their datasets across multiple solutions and streamline application development while keeping all the simplicity, performance, security, and governance customers expect from the Snowflake Data Cloud platform. To power the core, Snowflake developed Hybrid Tables, a new table type that supports fast single-row operations driven by a ground-breaking row-based storage engine, allowing transactional applications to be built entirely in Snowflake. Hybrid Tables will also support primary key enforcement to protect against duplicate record inserts (a hedged sketch follows this list).
  • Snowpark for Python: Snowpark is a development framework designed to bridge the skill sets of engineers, data scientists, and developers. Previously, Snowpark only supported Java and Scala, but Snowflake knew what the people wanted – Python, the language of choice across data engineers, data scientists, and application developers. Bringing Python workloads into Snowflake also removes much of the security burden developers often face within their enterprises when data has to be moved out to external environments.
  • Snowflake Worksheets for Python: Though currently in private preview, Snowflake will support Python development natively within Snowsight Worksheets for building pipelines, ML models, and applications. This brings streamlined development features like auto-complete and the ability to write custom Python logic in seconds.
  • Streamlit Acquisition: To democratize access to data, a vision both Snowflake and Streamlit share, Snowflake acquired Streamlit, an open-source Python project for building data-based applications. Streamlit helps fill the void of bringing a simplified app-building experience for data scientists who want to quickly translate an ML model into a visual application that anyone within their enterprise can access.
  • Large Memory Warehouses: Still in development (out for preview in AWS in EU-Ireland), Snowflake will soon allow consumers to access 5x and 6x larger warehouses. These larger warehouses will enable developers to execute memory-intensive operations such as training ML models on large datasets through open-source Python libraries that will be natively available through Anaconda integration.
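As a hedged sketch only, based on the Hybrid Tables syntax announced at the Summit (the feature was in preview at the time, so details may change, and the table below is hypothetical):

    -- Hypothetical Hybrid Table: row-based storage with an enforced primary key,
    -- intended for fast single-row reads and writes from an application.
    CREATE HYBRID TABLE app_orders (
        order_id     NUMBER PRIMARY KEY,
        customer_id  NUMBER NOT NULL,
        status       VARCHAR(20),
        updated_at   TIMESTAMP_NTZ
    );

    -- Transactional point lookup served by the row store, while the same table
    -- remains queryable alongside standard Snowflake tables for analytics.
    SELECT status FROM app_orders WHERE order_id = 1001;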

On top of all those features released for application development, Snowflake also released key innovations to improve data accessibility, such as:

  • Snowpipe Streaming: To eliminate the boundaries between batch and streaming pipelines, Snowflake introduced Snowpipe Streaming. This new feature will simplify stitching together real-time and batch data in one single system. Users can now ingest aggregated log data from IoT devices via a client API endpoint without adding event hubs, and can even ingest CDC streams at lower latency.
  • External Apache Iceberg Tables: Apache Iceberg, originally developed at Netflix, is an open-source table format that supports a variety of file formats (e.g., Parquet, ORC, Avro). Snowflake will now allow consumers to query Iceberg tables in place, without moving the table data or existing metadata. This translates to being able to access customer-supplied storage buckets containing Iceberg tables without compromising on security, while taking advantage of the consistent governance of the Snowflake platform.
  • External On-Prem Storage Tables: For many enterprises, moving data into the Data Cloud is not a reality due to a variety of reasons, including size, security concerns, cost, etc. To overcome this setback, Snowflake has released in private preview the ability to create External Stages and External Tables on storage systems such as Dell or Pure Storage that can expose a highly compliant S3 API. This will allow customers to access a variety of storage devices using Snowflake without worrying about concurrency issues or the effort of maintaining compute platforms.

Between the Native Application Framework and the new additions for data accessibility, Snowflake has taken a forward-thinking approach on how to effectively disrupt the application framework. Developers should be keen to take advantage of all the new features this year while understanding that some key features such as Unistore and Snowpipe Streaming will have bumps along the road as they are still under public/private preview.

Wondering if Snowflake is the right choice for your organization? Read our high-level overview of Snowflake here or contact us to discuss your organization’s data and analytics needs.


Modern Data Management: On-Premise Data Warehouse vs Modern Data Warehouse

Regardless of your industry or function, the ability to access, analyze, and make use of your data is essential. For many organizations, however, data is scattered throughout the organization in various applications (data silos), often in a format that’s unique to that system. The result is inconsistent access to data and unreliable insights. Some organizations may have a data management solution in place, such as a legacy or on-premise data warehouse, that is not able to keep up with the volume of data and processing speeds required for modern analytics tools or data science initiatives. For organizations striving to become data-driven, these limitations are a major roadblock.

The solution for many leading companies is a modern data warehouse.

Over the course of several blogs, we tap into our extensive data warehouse experience across industry, function, and company sizes to guide you through this powerful data management solution.

Download Now: Modern Data Warehouse Comparison Guide [Amazon Redshift, Google BigQuery, Azure Synapse, and Snowflake]

In this series of blogs, we:

  1. Define the modern data warehouse.
  2. Outline the different types of modern data warehouses.
  3. Illustrate how the modern data warehouse fits in the big picture.
  4. Share options on how to get started.

A modern data warehouse, implemented correctly, will allow your organization to unlock data-driven benefits, from improved operations through data insights to machine learning that optimizes sales pipelines. It will not only improve the way you access your data but will also be instrumental in fueling innovation and driving business decisions in all facets of your organization.

Part 1: What Is a Data Warehouse?

At its most basic level, a data warehouse stores data from various applications and combines it together for analytical insights. The integrated data is then evaluated for quality issues, cleansed, organized, and modeled to represent the way a business uses the information – not the source system definition. With each business subject area integrated into the system, this data can be used for upstream applications, reporting, advanced analytics, and most importantly, for providing the insights necessary to make better, faster decisions.

Mini Case Study:

A great example of this is JD Edwards data integration. 2nd Watch worked with a client that had multiple source systems, including several JDE instances (both Xe and 9.1), Salesforce, TM1, custom data flows, and a variety of flat files they wanted to visualize in a dashboard report. The challenge was the source system definitions from JDE, with table names like “F1111”, Julian-style dates, and complex column mapping; it was nearly impossible to create the desired reports and visualizations.

2nd Watch solved this by creating a custom data architecture to reorganize the transactional data; centralize it; and structure the data for high-performance reporting, visualizations, and advanced analytics.
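To illustrate the kind of transformation involved, here is a sketch of converting a JDE “Julian” date (CYYDDD format: centuries past 1900, year within century, day of year) into a calendar date in Snowflake; the table and column names are placeholders, with F4211 standing in for a JDE sales order detail staging table.

    -- Convert a JDE Julian date such as 118032 (February 1, 2018) into a calendar date.
    SELECT
        f.order_date_jde,
        DATEADD(
            DAY,
            MOD(f.order_date_jde, 1000) - 1,                               -- day of year, zero-based
            DATE_FROM_PARTS(1900 + FLOOR(f.order_date_jde / 1000), 1, 1)   -- January 1 of the derived year
        ) AS order_date
    FROM raw.jde_f4211 f;   -- hypothetical staging copy of a JDE sales order table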

Image 1: The image above illustrates a retailer with multiple locations, each with a different point-of-sale system. When they try to run a report on the number of units sold by state directly from data housed in these systems, the result is inaccurate due to data formatting inconsistencies. While this is a very simple example, imagine this on an enterprise scale.

Image 2: The image above shows the same data being run through an ETL process into a data warehouse. The result is a clear and accurate chart that meets the business users’ needs.

Data warehouses then . . . and now

There was a time when a data warehouse architecture consisted of a few source systems, a collection of ETL/ELT (extract, transform, load) processes, and several databases, all running on one or two machines in an organization’s own data center. Companies would spend years building out this architecture with custom data processes used to copy and transform data from one database to another.

Times have changed and traditional on-premise data warehousing has hit its limits for most organizations. Enterprises have built data warehouse solutions in an era where they had limited data sources, infrequent changes, fewer transactions, and low competition. Now, the same systems that have been the backbone of an organization’s analytical environment are being rendered obsolete and ineffective.

Today’s organizations have to analyze data from many data sources to remain competitive. In addition, they are also addressing an increased volume of data coming from those data sources. Beyond this, in today’s fast-changing landscape, access to near real-time or instantaneous insights from data is necessary. Simply put, the legacy warehouse was not designed for the volume, velocity, and variety of data and analytics demanded by modern organizations.

If you depend on your data to better serve your customers, streamline your operations, and lead (or disrupt) your industry, a modern data warehouse built on the cloud is a must-have for your organization. In our next blog, we’ll dive deeper into the modern data warehouse and explore some of the options for deployment.

Contact us to learn what a modern data warehouse would look like for your organization.

Read Part 2: Modern Data Management: Comparing Modern Data Warehouse Options