Large language models are making waves across all industries and Document AI is becoming common as organizations look to unlock even greater business potential. But where do you begin? (Hint: Strategy is essential, and 2nd Watch has been working through the implications of LLMs and Document AI for more than a year to help you navigate through the hype.)
Beyond the continued splash of LLM and Document AI discussions, this year's Snowflake Summit focused on a couple of practical but still substantial announcements: an embrace of open source (both in applications and in AI/LLM models) and – maybe most impactful in the long run – the native first-party Microsoft Azure integration and expanded partnership. I'll start there and work backwards to set the stage, then dig into what some of the transformative LLM and Document AI use cases actually are across industries and share which use cases are trending toward the greatest and most immediate impact, according to participants in 2nd Watch's LLM industry use case battle, which ran throughout Snowflake Summit.
Snowflake + Microsoft Azure: Simplifying Integration and Enabling Native Snowflake Apps
The Snowflake and Microsoft Azure integration and expanded partnership is a big deal. Snowflake and Azure have paved the path for their customers, freeing them up from making difficult integration decisions.
2nd Watch has worked with both Microsoft and Snowflake since 2015, so seeing a roadmap that integrates Snowflake with Azure's core data services immediately brought to mind a customer value proposition that will drive real and immediate decisions throughout enterprises. With a stronger partnership, Azure customers will reap benefits from both a technology standpoint and an overall go-to-market effort between the two organizations, from data governance via Azure Purview to AI via Cognitive Services.
Running your workloads where you want, how you want, has always been central to Snowflake's long-term vision, especially since the introduction of Snowpark. While the Microsoft announcement expanded on that roadmap, Snowflake continued to push even further with performance upgrades and new features for both Snowpark and Apache Iceberg (allowing data to be stored as Parquet files in your own storage buckets). Customers will be able to build and run applications and AI models in containers, natively on Snowflake, whether that's using Streamlit, Snowflake's Native App Framework, or all of the above. With all your data in a centralized place and Apache Iceberg allowing for portability, there's a compelling reason to consider building and deploying more apps directly in Snowflake, thereby avoiding the need to sync data, buy middleware, or build custom integrations between apps.
Snowflake + NVIDIA: Embracing Open Source for AI and LLM Modeling
Another major theme throughout Summit was an embrace of openness and open source. One of the first major cornerstones of the event was the announcement of NVIDIA and Snowflake’s partnership, an integration that unlocks the ability for customers to leverage open-source models.
What does this mean for you? This integration opens up the ability to both run and train your own AI and LLM models directly where your data lives – ensuring both privacy and security, as the data no longer needs to be pushed to an external, third-party API. From custom Document AI models to open-source, fine-tuned LLMs, the ability to take advantage of NVIDIA's GPU cloud reduces latency both in training/feedback loops and in document- and embedding-based retrieval (such as document question answering across vast amounts of data).
Document AI: Introducing Snowflake’s Native Features
The 2nd Watch team was excited to see how spot-on our 2023 data and AI predictions were, as we even went so far as to feature Document AI in our exhibit booth design and to host an LLM industry use case battle during expo hours. Document AI will be key to transformative industry use cases in insurance, private equity, legal, manufacturing – you name it. From contract analysis and risk modeling to competitive intelligence and marketing personalization, Document AI can have far-reaching impacts; and Snowflake is primed to be a major player in the Document AI space.
Many organizations are just beginning to identify use cases for their AI and LLM workloads, but we’ve already spent the past year combining our existing offerings of Document AI with LLM capabilities. (This was the starting point of our previously mentioned industry use case battle, which we’ll discuss in more detail below.) With Snowflake’s announcement of native Document AI features, organizations now have the ability to tap into valuable unstructured data that’s been sitting across content management systems, largely unused, due to the incredibly costly and time-consuming efforts it takes to manually parse or extract data from documents – particularly when the formats or templates differ across documents.
Snowflake's Document AI capabilities allow organizations to extract structured data from PDFs via natural language and, by combining what is likely a vision transformer with an LLM, build automations to do this at scale. Data labeling is by far the most crucial step in any AI workload: if your model isn't trained on enough high-quality examples, it will produce equally low-quality results once the workload is automated. Third-party software products, such as Snorkel AI, allow for automated data labeling using your existing data, but one of the key findings in nearly every AI-related research paper is the same: high-quality data is what matters, and the effort you put into building that source of truth will yield exponential benefits downstream via Document AI, LLMs, and other data-centric applications.
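To make that pattern concrete, below is a vendor-neutral, illustrative sketch of structured extraction from a document. This is not Snowflake's Document AI API; parse_pdf() and call_llm() are placeholders for whatever OCR/vision pipeline and LLM you use, and the field names are hypothetical.

```python
# Illustrative only: extract structured fields from a parsed document with an LLM.
# parse_pdf() and call_llm() are placeholders for your own pipeline and model.
import json

FIELDS = ["policy_number", "effective_date", "premium_amount", "insured_name"]

def parse_pdf(path: str) -> str:
    """Placeholder: return text/layout extracted from the PDF by your OCR or vision model."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder: call your LLM and return its text response."""
    raise NotImplementedError

def extract_fields(path: str) -> dict:
    document_text = parse_pdf(path)
    prompt = (
        "Extract the following fields from the document and respond with JSON only.\n"
        f"Fields: {', '.join(FIELDS)}\n\nDocument:\n{document_text}"
    )
    # Human-reviewed outputs double as the high-quality labeled examples described above.
    return json.loads(call_llm(prompt))
```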
Leveraging Snowflake’s Data Cloud, the end-to-end process can be managed entirely within Snowflake, streamlining governance and privacy capabilities for mitigating the risk of both current and future regulations across the globe, particularly when it comes to assessing what’s in the training data you feed into your AI models.
Retrieval Augmented Generation: Exploring Those Transformative Industry Use Cases
It's likely become clear how widely applicable Document AI and retrieval-augmented generation are. (Retrieval-augmented generation, or RAG, retrieves data from various sources – image processors, auto-generated SQL, documents, etc. – to augment your prompts.) But to show how great an impact they can have on your organization's ability to harness the full breadth and depth of your data, let's talk through specific use cases across a selection of industries.
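Before diving into the industries, here is a minimal RAG sketch in Python. The embed() and generate() functions are placeholders for whatever embedding model and LLM you run (including open-source models hosted alongside your data); in production you would query a vector database rather than scoring documents in a loop.

```python
# A minimal, illustrative RAG loop: retrieve relevant passages, augment the prompt, generate.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a vector embedding for `text` from your embedding model."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: call your LLM with the augmented prompt."""
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(question: str, documents: list, top_k: int = 3) -> str:
    # 1. Retrieve: rank documents by embedding similarity to the question.
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    # 2. Augment: stuff the most relevant passages into the prompt.
    context = "\n\n".join(ranked[:top_k])
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    # 3. Generate: let the LLM produce a grounded answer.
    return generate(prompt)
```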
AI for Insurance
According to 2nd Watch’s LLM industry use case battle, contract analytics (particularly in reinsurance) reigned supreme as the most impactful use case. Unsurprisingly, policy and quote insights also stayed toward the top, followed by personalized carrier and product recommendations.
Insurance organizations can utilize both Document AI and LLMs to capture key details from different carriers and products, generating personalized insurance policies while understanding pricing trends. LLMs can also alert policy admins or automate administration tasks, such as renewals, changes, and cancellations. These alerts can allow for human-in-the-loop feedback and review, and feed into workflow and process improvement initiatives.
AI in Private Equity Firms
In the private equity sector, firms can leverage Document AI and question-answering features to securely analyze their financial and research documents. This “research analyst co-pilot” can answer queries across all documents and structured data in one place, enabling analysts to make informed decisions rapidly. Plus, private equity firms can use LLMs to analyze company reports, financial and operational data, and market trends for M&A due diligence and portfolio company benchmarking.
However, according to the opinions shared by Snowflake Summit attendees who stopped by our exhibit booth, benchmarking is the least interesting application of AI in private equity, with its ranking dropping throughout the event. Instead, Document AI question answering was the top-ranked use case, with AI-assisted opportunity and deal sourcing coming in second.
Legal Industry LLM Insights
Like both insurance and private equity, the legal industry can benefit from LLM document review and analysis; and this was the highest-ranked LLM use case within legal. Insights from complex legal documents, contracts, and court filings can be stored as embeddings in a vector database for retrieval and comparison, helping to speed up the review process and reduce the workload on legal professionals.
Case law research made a big comeback in our LLM battle, coming from sixth position to briefly rest in second and finally land in third place, behind talent acquisition and HR analytics. Of course, those LLM applications are not unique to law firms and legal departments, so it comes as no surprise that they rank highly.
Manufacturing AI Use Cases
Manufacturers proved to have widely ranging opinions on the most impactful LLM use cases, with rankings swinging wildly throughout Snowflake Summit. Predictive maintenance did hold on to the number one spot, as LLMs can analyze machine logs and maintenance records, identify similar past instances, and incorporate historical machine performance metrics to enable a predictive maintenance system.
Otherwise, use cases like brand perception insights, quality control checks, and advanced customer segmentation repeatedly swapped positions. Ultimately, competitive intelligence landed in a tie with supply chain optimization and demand forecasting. By gleaning insights from unstructured sources like news articles, social media, and company reports, and coupling them with structured data like market statistics and company performance figures, LLMs can produce well-rounded competitive intelligence outputs. It's no wonder this use case tied with supply chain and demand forecasting – in which LLMs analyze supply chain data and imaging at ports and other supply chain hubs for potential risks, then combine that data with traditional time-series demand forecasting to find optimization opportunities. Both use cases focus on how manufacturers can optimally position themselves for an advantage within the market.
Even More LLM Use Cases
Not to belabor the point, but Document AI and LLMs have such broad applications across industries that we had to call out several more:
Regulatory and Risk Compliance: LLMs can help monitor and ensure compliance with financial regulations. These compliance checks can be stored as embeddings in a vector database for auditing and internal insights.
Copyright Violation Detection: LLMs can analyze media content for potential copyright violations, allowing for automated retrieval of similar instances or known copyrighted material and flagging.
Personalized Healthcare: LLMs can analyze patient symptoms and medical histories from unstructured notes and EHRs alongside the latest medical research and findings, enabling more effective treatment plans.
Medical Imaging Analysis: LLMs can help interpret medical imaging alongside diagnoses, treatment plans, and medical history, suggesting potential diagnoses and drug therapies based on the latest research and historical data.
Automated Content Tagging: Multimodal models and LLMs can analyze media content across video, audio, and text to generate relevant tags and keywords for automated content classification, search, and discovery.
Brand Perception Insights: LLMs can analyze social media and online reviews to assess brand perception.
Customer Support Copilots: LLMs can function as chatbots and copilots for customer service representatives, letting customers ask questions and upload photos of products while the CSR quickly retrieves relevant information, such as product manuals, warranty information, or other internal knowledge base data that is typically pulled up manually. By storing past customer interactions in a vector database, the system can retrieve relevant solutions based on similarity and improve over time, making the CSR more effective and creating a better customer experience.
More broadly, LLMs can be utilized to analyze company reports, research documents, news articles, financial data, and market trends, storing these relationships natively in Snowflake, side-by-side with structured data warehouse data and unstructured documents, images, or audio.
Snowflake Summit 2023 ended with the same clear focus that I’ve always found most compelling within their platform – giving customers simplicity, flexibility, and choice for running their data-centric workloads. That’s now been expanded to Microsoft, to the open-source community, to unstructured data and documents, and to AI and LLMs. Across every single industry, there’s a practical workload that can be applied today to solve high-value, complex business problems.
I was struck by not only the major (and pleasantly unexpected) announcements and partnerships, but also the magnitude of the event itself. Some of the most innovative minds in the data ecosystem came together to engage in curiosity-driven conversation, sharing what they’re working on, what’s worked, and what hasn’t worked. And that last part – especially as we continue to push forward on the frontier of LLMs – is what made the week so compelling and memorable.
With 2nd Watch’s experience, research, and findings in these new workloads, combined with our history working with Snowflake, we look forward to having more discussions like those we held throughout Summit to help identify and solve long-standing business problems in new, innovative ways. If you’d like to talk through Document AI and LLM use cases specific to your organization, please get in touch.
The Snowflake Data Cloud’s utility expanded further with the introduction of its Snowpark API in June of 2021. Snowflake has staked its claim as a significant player in cloud data storage and accessibility, enabling workloads including data engineering, data science, data sharing, and everything in between.
Snowflake provides a unique single engine with instant elasticity that is interoperable across different clouds and regions so users can focus on getting value out of their data, rather than trying to manage it. In today’s data-driven world, businesses must be able to quickly analyze, process, and derive insights from large volumes of data. This is where Snowpark comes in.
Snowpark expands Snowflake’s functionality, enabling users to leverage the full power of programming languages and libraries within the Snowflake environment. The Snowpark API provides a new framework for developers to bring DataFrame-style programming to common programming languages like Python, Java, and Scala. By integrating Snowpark into Snowflake, users can perform advanced data transformations, build complex data pipelines, and execute machine learning algorithms seamlessly.
This interoperability empowers organizations to extract greater value from their data, accelerating their speed of innovation.
What is Snowpark?
Snowpark's API enables data scientists, data engineers, and software developers to perform complex data processing tasks efficiently and seamlessly. It eliminates the need to move data out of Snowflake by providing a high-level programming interface that allows users to write and execute code in their preferred programming language, all within the Snowflake platform. Snowpark comprises a client-side library and a server-side sandbox that together enable users to work with their preferred tools and languages while leveraging the benefits of Snowflake virtual warehouses.
When developing applications, users can leverage Snowpark's DataFrame API to process and analyze complex data structures, with support for operations such as filtering, aggregation, and sorting. In addition, users can create user-defined functions (UDFs); the Snowpark library uploads the UDF code to an internal stage and, when the function is called, executes it on the server side.
This enables the creation of custom functions that process and transform data according to specific needs, providing greater flexibility and customization in data processing and analysis. DataFrames are evaluated lazily, meaning they only run when an action to retrieve, store, or view the data they represent is invoked. Users write code against the client-side API, and that code is executed in Snowflake, so no data leaves the platform unless the application explicitly asks for it.
Moreover, users can build queries within the DataFrame API, providing an easy way to work with data in the Structured Query Language (SQL) framework while using common languages like Python, Java, and Scala. Snowpark converts those queries to SQL and distributes the computation through Snowflake's elastic performance engine, which works across multiple clouds and regions.
With its support for the DataFrame API and UDFs and its seamless integration with data already in Snowflake, Snowpark is an ideal tool for data scientists, data engineers, and software developers who need to work with big data quickly and efficiently.
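To illustrate the shape of the API, here is a minimal Snowpark for Python sketch. The connection parameters, table, and column names are hypothetical, and exact options may vary by Snowpark version.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_, udf
from snowflake.snowpark.types import FloatType

# Hypothetical connection parameters for your Snowflake account.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "ANALYTICS_WH",
    "database": "SALES_DB",
    "schema": "PUBLIC",
}).create()

# DataFrame operations are built lazily on the client and pushed down to Snowflake.
orders = session.table("ORDERS")
regional_totals = (
    orders
    .filter(col("STATUS") == "SHIPPED")
    .group_by("REGION")
    .agg(sum_("AMOUNT").alias("TOTAL_AMOUNT"))
)

# A simple UDF; Snowpark uploads its code to an internal stage and runs it server-side.
@udf(name="apply_discount", return_type=FloatType(), input_types=[FloatType()], replace=True)
def apply_discount(amount: float) -> float:
    return amount * 0.9

discounted = regional_totals.select(
    col("REGION"),
    apply_discount(col("TOTAL_AMOUNT")).alias("DISCOUNTED_TOTAL"),
)

# Nothing executes in Snowflake until an action such as show() or collect() is called.
discounted.show()
```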
Snowpark for Python
With the growth in data science and machine learning (ML) in past years, Python is closing the gap on SQL as a popular choice for data processing. Both are powerful in their own right, but they’re most valuable when they’re able to work together. Knowing this, Snowflake built Snowpark for Python “to help modern analytics, data engineering, data developers, and data science teams generate insights without complex infrastructure management for separate languages” (Snowflake, 2022). Snowpark for Python enables users to build scalable data pipelines and machine-learning workflows while utilizing the performance, elasticity, and security benefits of Snowflake.
Furthermore, Snowpark-optimized Snowflake virtual warehouses make machine learning training possible by providing the additional CPU, memory, and temporary storage needed to process larger data sets. These warehouses support core Snowpark functions, including the execution of SQL statements that require compute resources (e.g., retrieving rows from tables) and Data Manipulation Language (DML) operations such as updating rows in tables, loading data into tables, and unloading data from tables.
With the compute infrastructure to execute memory-intensive operations, data scientists and teams can further streamline ML pipelines at scale with the interoperability of Snowpark and Snowflake.
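As a rough sketch of what "training where the data lives" can look like, the example below registers a Snowpark stored procedure that trains a scikit-learn model server-side. The table name, feature columns, and package list are hypothetical, and it reuses the session object from the earlier sketch; consult Snowflake's Snowpark documentation for current registration options.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.types import StringType

def train_churn_model(session: Session) -> str:
    # Runs server-side inside the Snowpark-optimized warehouse.
    from sklearn.linear_model import LogisticRegression

    pdf = session.table("CUSTOMER_FEATURES").to_pandas()  # hypothetical feature table
    X, y = pdf.drop(columns=["CHURNED"]), pdf["CHURNED"]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return f"trained on {len(pdf)} rows, in-sample accuracy {model.score(X, y):.3f}"

# Register and call the stored procedure; training happens inside Snowflake,
# so the data never leaves the platform.
session.sproc.register(
    train_churn_model,
    return_type=StringType(),
    packages=["snowflake-snowpark-python", "scikit-learn", "pandas"],
    name="TRAIN_CHURN_MODEL",
    replace=True,
)
print(session.call("TRAIN_CHURN_MODEL"))
```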
Snowpark and Apache Spark
If you’re familiar with the world of big data, you may know a thing or two about Apache Spark. In short, Spark is a distributed system used for big data processing and analysis.
While Apache Spark and Snowpark share similar utilities, there are some distinct differences and advantages to leveraging Snowpark over Apache Spark. With Snowpark, users manage all data within Snowflake rather than transferring it to Spark. This not only streamlines workflows but also eliminates the risk that comes with moving sensitive data out of the databases you're working in and into a new ecosystem.
Additionally, the ability to remain in the Snowflake ecosystem simplifies processing by reducing the complexity of setup and management. While Spark requires significant hands-on time due to its more complicated setup, Snowpark requires essentially none: you simply choose a warehouse and are ready to run commands within the database of your choosing.
Another major advantage Snowpark offers over its more complex counterpart is simplified security. Leveraging the security architecture already in place within Snowflake eliminates the need to build out the kind of complex, bespoke security protocol that Spark requires.
The interoperability of Snowpark within the Snowflake ecosystem provides an assortment of advantages when compared with Apache Spark. As a stand-alone processing engine, Spark brings significant complexity across setup, ongoing management, data transfer, and security configuration. By choosing Snowpark, you opt out of that unnecessary complexity and into a streamlined process that can improve the efficiency and accuracy of whatever you do with the big data you're handling – two things that are front of mind for any business, in any industry, whose decisions depend on its ability to process and analyze complex data.
Why It Matters
Regardless of the industry, there is a growing need to process big data and understand how to leverage it for maximum value. Snowpark's API, with its simplified programming interface and support for UDFs, makes it easier to process large data volumes in the user's programming language of choice. Uniting that simplified process with all the benefits of the Snowflake Data Cloud creates a unique opportunity for businesses to take advantage of.
As a proud strategic Snowflake consulting partner, 2nd Watch recognizes the unique value that Snowflake provides. We have a team of certified SnowPros to help businesses implement and utilize their powerful cloud-based data warehouse and all the possibilities that their Snowpark API has to offer.
In a data-rich world, the ability to democratize data across your organization and make data-driven decisions can accelerate your continued growth. To learn more about implementing the power of Snowflake with the help of the 2nd Watch team, contact us and start extracting all the value your data has to offer.
The modern business world is data-centric. As more businesses turn to cloud computing, they must evaluate and choose the right data warehouse to support their digital modernization efforts and business outcomes. Data warehouses can increase the bottom line, improve analytics, enhance the customer experience, and optimize decision-making.
A data warehouse is a large repository of data businesses utilize for deep analytical insights and business intelligence. This data is collected from multiple data sources. A high-performing data warehouse can collect data from different operational databases and apply a uniform format for better analysis and quicker insights.
Two of the most popular data warehouse solutions are Snowflake and Amazon Web Services (AWS) Redshift. Let’s look at how these two data warehouses stack up against one another.
What is Snowflake?
Snowflake is a cloud-based data warehousing solution that uses third-party cloud compute resources, such as Azure, Google Cloud Platform, or Amazon Web Services (AWS). It is designed to provide users with a fully managed, cloud-native database solution that can scale up or down as needed for different workloads. Snowflake separates compute from storage: a non-traditional approach to data warehousing. With this method, data remains in a central repository while compute instances are managed, sized, and scaled independently.
Snowflake is a good choice for companies that are conscious about their operational overhead and need to quickly deploy applications into production without worrying about managing hardware or software. It is also the ideal platform to use when query loads are lighter, and the workload requires frequent scaling.
The benefits of Snowflake include:
Easy integration with most components of data ecosystems
Minimal operational overhead: companies are not responsible for installing, configuring, or managing the underlying warehouse platform
Simple setup and use
Abstracted configuration for storage and compute instances
Robust and intuitive SQL interface
What is Amazon Redshift?
Amazon Redshift is an enterprise data warehouse built on Amazon Web Services (AWS). It provides organizations with a scalable, secure, and cost-effective way to store and analyze large amounts of data in the cloud. Its cloud-based compute nodes enable businesses to perform large-scale data analysis and storage.
Amazon Redshift is ideal for enterprises that require quick query outputs on large data sets. Additionally, Redshift has several options for efficiently managing its clusters using AWS CLI/Amazon Redshift Console, Amazon Redshift Query API, and AWS Software Development Kit. Redshift is a great solution for companies already using AWS services and running applications with a high query load.
The benefits of Amazon Redshift include:
Seamless integration with the AWS ecosystem
Multiple data output formatting support
Easy console to extract analytics and run queries
Customizable data and security models
Comparing Data Warehouse Solutions
Snowflake and Amazon Redshift both offer impressive performance capabilities, like scalability across multiple servers and high availability with minimal downtime. There are some differences between the two that will determine which one is the best fit for your business.
Performance
Both data warehouse solutions harness massively parallel processing (MPP) and columnar storage, which enables advanced analytics and efficiency on massive jobs. Snowflake boasts a unique architecture that supports structured and semi-structured data, with storage, compute, and cloud services abstracted so each can perform independently. Redshift recently unveiled concurrency scaling, coupled with machine learning, to compete with Snowflake's elastic approach to concurrency.
Maintenance
Snowflake is a pure SaaS platform that doesn’t require any maintenance. All software and hardware maintenance is handled by Snowflake. Amazon Redshift’s clusters require manual maintenance from the user.
Data and Security Customization
Snowflake supports fewer customization choices in data and security. Snowflake’s security utilizes always-on encryption enforcing strict security checks. Redshift supports data flexibility via partitioning and distribution. Additionally, Redshift allows you to tailor its end-to-end encryption and set up your own identity management system to manage user authentication and authorization.
Pricing
Both platforms offer on-demand pricing but are packaged differently. Snowflake doesn’t bundle usage and storage in its pricing structure and treats them as separate entities. Redshift bundles the two in its pricing. Snowflake tiers its pricing based on what features you need. Your company can select a tier that best fits your feature needs. Redshift rewards businesses with discounts when they commit to longer-term contracts.
Which data warehouse is best for my business?
To determine the best fit for your business, ask yourself the following questions in these specific areas:
Do I want to bundle my features? Snowflake splits compute and storage, and its tiered pricing provides more flexibility to your business to purchase only the features you require. Redshift bundles compute and storage to unlock the immediate potential to scale for enterprise data warehouses.
Do I want a customizable security model? Snowflake grants security and compliance options geared toward each tier, so your company’s level of protection is relevant to your data strategy. Redshift provides fully customizable encryption solutions, so you can build a highly tailored security model.
Do I need JSON storage? Snowflake's JSON storage support wins over Redshift's. With Snowflake, you can store and query JSON with native functions (see the short sketch after this list). With Redshift, JSON is split into strings, making it difficult to query and work with.
Do I need more automation? Snowflake automates issues like data vacuuming and compression. Redshift requires hands-on maintenance for these sorts of tasks.
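For a flavor of what native JSON support looks like in practice, here is a minimal, hypothetical sketch using the snowflake-connector-python package. The connection parameters, table, and column names are made up, but the colon path syntax into a VARIANT column is standard Snowflake SQL.

```python
# Query semi-structured JSON natively in Snowflake via the Python connector.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="ANALYTICS_WH", database="SALES_DB", schema="PUBLIC",
)

query = """
    SELECT
        raw:customer.name::string AS customer_name,  -- dot-path into a VARIANT column
        raw:order.total::number   AS order_total
    FROM raw_events
    WHERE raw:order.status::string = 'shipped'
"""

for customer_name, order_total in conn.cursor().execute(query):
    print(customer_name, order_total)
```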
Conclusion
A data warehouse is necessary to stay competitive in the modern business world. The two major data warehouse players – Snowflake and Amazon Redshift – are both best-in-class solutions. One product is not superior to the other, so choosing the right one for your business means identifying the one best for your data strategy.
2nd Watch is an AWS Certified Partner and an Elite Snowflake Consulting Partner. We can help you choose the right data warehouse solution and support your business regardless of which data warehouse you choose.
We have been recognized by AWS as a Premier Partner since 2012, as well as an audited and approved Managed Service Provider and Data and Analytics Competency partner for our outstanding customer experiences, depth and breadth of our products and services, and our ability to scale to meet customer demand. Our engineers and architects are 100% certified on AWS, holding more than 200 AWS certifications.
Our full team of certified SnowPros has proven expertise to help businesses implement modern data solutions using Snowflake. From creating a simple proof of concept to developing an enterprise data warehouse to customized Snowflake training programs, 2nd Watch will help you to utilize Snowflake’s powerful cloud-based data warehouse for all of your data needs.
In our previous blog post on how to build a data warehouse in 6-8 weeks, we showed you how to get lightning-fast results and effectively create a working data warehouse with Snowflake. Future state integrations and governance needs are coming, though. This is why 2nd Watch highly recommends executing a data strategy and governance project in parallel with your Snowflake proof-of-concept. Knowing how to leverage Snowflake’s strengths to avoid common pitfalls will save you time, money, and re-work.
Consider one company that spent a year using the data discovery layer-only approach. With data sources all centralized in the data warehouse and all transformations occurring at run-time in the BI tool, the data team was able to deliver a full analytical platform to its users in less time than ever before. Users were happy, at first, until the logic became more mature and more complex and ultimately required more compute power (translating to higher cost) to keep the same performance expectations. For some, however, this might not be a problem but an expected outcome.
For this company, enabling analytics and reporting was the only need for the first year, but integration of data across applications was coming full steam ahead. The primary line of business applications needed to get near-real-time updates from the others. For example, marketing automation didn’t rely 100% on humans; it needed data to execute its rules, from creating ad campaigns to sending email blasts based on events occurring in other systems.
This one use case poked a big hole in the architecture – you can’t just have a data warehouse in your enterprise data platform. There’s more to it. Even if it’s years away, you need to effectively plan for it or you’ll end up in a similar, costly scenario. That starts with data strategy and governance.
ETL vs. ELT in Snowflake
Identify where your transformations occur and how they impact your downstream systems.
The new paradigm is that you no longer need ETL (Extract, Transform, Load) – you need ELT (Extract, Load, Transform). This is true, but sometimes misleading. Some will interpret ELT as no longer needing to build and manage the expensive pipelines and business logic that delay speed-to-insight, are costly to maintain, and require constant upkeep for changing business rules. In effect, it’s interpreted as removing the “T” and letting Snowflake solve for this. Unfortunately, someone has to write the code and business logic, and it’s best to not have your business users trying to do this when they’re better served working on your organization’s core goals.
In reality, you are not removing the “T” – you are moving it to a highly scalable and performant database after the data has been loaded. This is still going to require someone to understand how your customer data in Salesforce ties to a customer in Google Analytics that corresponds to a sale in your ERP. You still need someone who knows both the data structures and the business rules. Unfortunately, the “T” will always need a place to go – you just need to find the right place.
Ensure your business logic is defined only once in the entire flow. If the complex transformation code that defines what "customer" means lives in a single place, then when that business logic inevitably changes, you're guaranteed the updated definition of "customer" will flow the same way to your BI users as it does to your ERP and CRM. When data science and machine learning enter the mix, you'll also avoid time spent in data prep and instead focus on delivering predictive insights.
You might be thinking that this all sounds even more similar to the data warehouse you’ve already built and are trying to replace. There’s some good news: Snowflake does make this easier, and ELT is still exactly the right approach.
Defining and Adjusting the Business Logic and Views
Snowflake enables an iterative process of data discovery, proof-of-concept, business value, and long-term implementation.
Perhaps you’ve defined a sales hierarchy and a salesperson compensation metric. The developer can take that logic, put it into SQL against the raw data, and refresh the dashboard, all while the business user is sitting next to them. Is the metric not quite what the user expected, or is the hierarchy missing something they hadn’t thought of in advance? Tweak the SQL in Snowflake and refresh. Iterate like this until the user is happy and signs off, excited to start using the new dashboard in their daily routine.
By confirming the business logic in the salesperson compensation example above, you've removed a major part of what made ETL so painful in the past: developing, waiting for a load to finish, and only then showing business users. That gap between the load finishing and the next development cycle is a considerable amount of lost time and money. With this approach, however, you've confirmed the business logic is correct and the SQL is already written in Snowflake's data discovery views.
Developing your initial logic in views in Snowflake’s data discovery layer allows you to validate and “certify” it for implementation into the physical model. When you’ve completed the physical path, you can change the BI tool for each completed subject area to point to the physical layer instead of the data discovery layer.
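As a rough, hypothetical sketch of that workflow (the schema, tables, and draft metric are all made up), the discovery-layer view might be created and iterated like this, here via the snowflake-connector-python package:

```python
# Define the business logic once as a view in a data discovery schema, refresh the
# dashboard with the user in the room, tweak the SQL, and repeat until sign-off.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="DISCOVERY_WH", database="ANALYTICS", schema="DISCOVERY",
)

conn.cursor().execute("""
    CREATE OR REPLACE VIEW DISCOVERY.SALES_COMP AS
    SELECT
        s.salesperson_id,
        h.region,                                   -- sales hierarchy joined in from raw data
        SUM(s.amount)        AS total_sales,
        SUM(s.amount) * 0.05 AS compensation        -- draft metric; adjust and re-run with the user
    FROM RAW.SALES s
    JOIN RAW.SALES_HIERARCHY h ON h.salesperson_id = s.salesperson_id
    GROUP BY s.salesperson_id, h.region
""")
# Once the user signs off, this "certified" logic moves into the physical model,
# and the BI tool is re-pointed from the discovery layer to the physical layer.
```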
If you have any questions about data strategy and governance, or if you want to learn more about how Snowflake can fit into your organization, contact us today.
This blog originally appeared as a section of our eBook, “Snowflake Deployment Best Practices: A CTO’s Guide to a Modern Data Platform.” Click here to download the full eBook.
Snowflake is a powerhouse in the data world, but it can become even more powerful when paired with other technologies that suit your organization’s needs. 2nd Watch is a tech-agnostic firm, meaning we only recommend tools we truly believe are the best for your business. Because Snowflake partners with such a wide range of technologies, we have lots of options to create a tech stack uniquely suited to you.
Among the many tools that work in conjunction with Snowflake, the following Snowflake-adjacent technologies enhance Snowflake in different ways and are all noteworthy in their own right.
HVR (a Fivetran company)
HVR, which became part of Fivetran as of late 2021, is a cloud data replication tool that natively supports Snowflake. It executes real-time data replication by reading directly from database transaction logs, which allows for high performance and low impact on database servers.
As one of the earliest technologies to join the Snowflake Partner Network, HVR understands how analytics efforts can be impacted by real-time data streaming to Snowflake. Using HVR to offer immediate access to your data in Snowflake’s Data Cloud, you can make timely decisions based on current, accurate data.
Riversand
Riversand is a master data management (MDM) tool that “goldenizes” your data. In other words, it creates one current, complete, and accurate record.
After connecting your organization with the right MDM tool for your situation, like Riversand, 2nd Watch would oversee the MDM implementation and provide guidance on next steps. This includes sharing insights on best-in-class data warehouses, like Snowflake’s Data Cloud. Together, Riversand and Snowflake can prepare you for downstream analytics and long-term data governance.
ALTR
ALTR is a cloud-native Data Security as a Service (DSaaS) platform that helps companies optimize data consumption governance. With ALTR, you’re able to track data consumption patterns and limit how much data can be consumed at various levels across your organization.
ALTR and Snowflake work together in two ways:
ALTR seamlessly integrates with Snowflake so you can view your securely shared data consumption information.
Sisu
Sisu accelerates and expands your analytics capabilities. Using Sisu, business users are able to self-serve and find the answers to increasingly complex questions.
Sisu connects directly to Snowflake’s Data Cloud, so you always have access to your most up-to-date data for comprehensive, actionable analytics. Sisu and Snowflake partner to empower you to make quick, decisive judgments.
These four technologies are just a taste of the many tools that work with Snowflake to improve your organization’s data and analytics capabilities. To discuss your specific data and analytics needs – and to determine the tools Snowflake partners with that will help you meet those needs – contact 2nd Watch today.
To get your source data ingested and loaded, or for a deep dive into how you can build a fully automated data integration process in Snowflake on Azure, schedule a Snowflake whiteboarding session with our team of data architects.
If you have a Microsoft ecosystem but have been wanting to take advantage of more tools on the market, Snowflake on Azure means additional opportunities to upgrade your analytics platform without throwing away your investment in your current environment or sacrificing its uniformity.
For those who are not familiar, Snowflake is a cloud-based, massively parallel processing (MPP), columnar storage database. It’s a newer option for data warehousing that is set up for more efficient querying of large data volumes. It also consumes structured and semi-structured data in a way that traditional relational databases are not designed to do as effectively – think “big data” without the “big overhead.”
With this release, Microsoft companies can evaluate using Snowflake to increase the performance and flexibility of their analytics environment by adding it on top of their existing data integration process. To determine where Snowflake might be a good fit, our 2nd Watch consultants took a dive into where Snowflake could sit in our current and prospective Microsoft clients’ ecosystems.
Where does Snowflake fit in your current Azure environment?
Snowflake is best used as the home of an analytical layer (or dimensional model, for the more technical) that enables reporting. Think of this as the substitute for products like SQL Server Analysis Services (SSAS).
While we still recommend that you maintain a data integration hub and related process for all of your consuming application needs (i.e., sending consolidated and cleansed data back to each source system to keep them in sync), Snowflake can sit right at the end of that process. Because it’s optimized for read activities, it complements the needs of business intelligence tools like Power BI, Looker, Tableau, etc., making it faster for business users to grab the information they need via those tools. Integration and ETL are possible with many tools and services, including Azure Data Factory.
Example architecture for adding Snowflake to the end of your data integration process
When would you want to use Snowflake?
There are two primary ideas behind Snowflake’s competitive advantage when it comes to data warehousing platforms: its automatic optimization of query execution and the hands-off nature of its maintenance.
We recommend Snowflake for two main use cases:
Developing a more efficient analytics platform
Creating a platform for flexible data discovery
Our clients that fit within the above use cases usually had:
Multiple business users – The more people you have querying your database at one time, the more the database has to be configured to handle that load so as not to lock up other processes. Traditional databases can be scaled up to handle larger reads (think of a query that produces data), but this takes a decent amount of time and effort to achieve, and they are more often optimized for writes (think of loading data into a table). In Snowflake, a user can spin up resources for just the one query, then spin them back down right after, allowing for a more modular use of higher-powered resources (see the short sketch after this list).
Lots of data – If you’ve ever tried to perform a huge query on your current system, you likely noticed a slowdown from your usual processing. Traditional databases are not as optimized for read activities as columnar databases are. This makes options like Snowflake more attractive to those performing heavier analytical queries on a regular basis.
“Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform row-based systems. ” – MIT Computer Science and Artificial Intelligence Laboratory
A mix of structured and semi-structured data – Though many traditional databases offer options for consuming semi-structured data (e.g., JSON, XML, etc.), they aren’t optimized to do so. If you have a mix of structured and semi-structured data, options like Snowflake might be more efficient for your process.
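To make the "spin up, spin down" point concrete, here is a minimal, hypothetical sketch using standard Snowflake warehouse DDL through the snowflake-connector-python package; the warehouse, database, and table names are made up.

```python
# Create a dedicated warehouse for one heavy analytical query, run it, then release the compute.
import snowflake.connector

conn = snowflake.connector.connect(account="<account>", user="<user>", password="<password>")
cur = conn.cursor()

# Warehouse sized for this job; it suspends itself after 60 idle seconds.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS ADHOC_WH
    WITH WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE
""")
cur.execute("USE WAREHOUSE ADHOC_WH")
cur.execute("SELECT region, COUNT(*) FROM SALES_DB.PUBLIC.ORDERS GROUP BY region")
print(cur.fetchall())

# Explicitly suspend (or simply let AUTO_SUSPEND kick in) so you stop paying for compute.
cur.execute("ALTER WAREHOUSE ADHOC_WH SUSPEND")
```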
What’s the tradeoff?
Snowflake can be a great option for clients who need a true analytical platform where they can perform data discovery or need to read out a lot of data at once. That said, the following are situations where an alternative to Snowflake might make more sense:
You need a data hub to keep your various applications in sync (application integration and API endpoints).
You aren’t trying to perform complex aggregations or data discovery.
You’re invested in an existing solution that doesn’t have many performance or management overhead concerns.
So what’s the gist?
Snowflake makes it easy to kick off a data warehouse project with its ease of use and start-small, scale-big approach. Despite this innovative architecture, we still recommend applying the same data warehousing fundamentals that you would have in the past.
Yes, Snowflake will allow you to create views on top of messy raw source data, but you will ultimately lose the battle on performance as your data grows in complexity. We suggest approaching a new Snowflake deployment with a vision of your data platform as a whole – not just the analytics layer. Performing all your complex business logic in Snowflake is possible, but due to the columnar architecture, integration use cases are better served in complementary cloud database platforms, such as Azure SQL.
With a roadmap and strategy in place for both your integration and analytics needs, you’ll be better able to select the right mix of tools and technologies for your solution. Learn more about Snowflake deployment best practices in our eBook.