Large language models are making waves across all industries and Document AI is becoming common as organizations look to unlock even greater business potential. But where do you begin? (Hint: Strategy is essential, and 2nd Watch has been working through the implications of LLMs and Document AI for more than a year to help you navigate through the hype.)
Beyond the continued splash of LLM and Document AI discussions, this year’s Snowflake Summit focused on a couple of practical but still substantial announcements: an embrace of open source (both in applications and in AI/LLM models) and – maybe most impactful in the long run – the native first-party Microsoft Azure integration and expanded partnership. I’ll start there and work backwards to set the stage. Then I’ll dig into what some of the transformative LLM and Document AI use cases actually are across industries and share which ones are trending to have the greatest and most immediate impact, according to participants in 2nd Watch’s LLM industry use case battle, which ran through Snowflake Summit.
Snowflake + Microsoft Azure: Simplifying Integration and Enabling Native Snowflake Apps
The Snowflake and Microsoft Azure integration and expanded partnership is a big deal. Snowflake and Azure have paved the path for their customers, freeing them up from making difficult integration decisions.
For 2nd Watch, a leader that has worked with both Microsoft and Snowflake since as early as 2015, seeing a roadmap that integrates Snowflake with Azure’s core data services immediately brought to mind a customer value prop that will drive real and immediate decisions throughout enterprises. With a stronger partnership, Azure customers will reap benefits from both the technology and the joint go-to-market effort between the two organizations, from data governance via Azure Purview to AI via Cognitive Services.
Running your workloads where you want, how you want, has always been a key vision of Snowflake’s long-term roadmap, especially since the introduction of Snowpark. While the Microsoft announcement expanded on that roadmap, Snowflake continued to push even further with performance upgrades and new features for both Snowpark and Apache Iceberg (allowing for data to be stored as Parquet files in your storage buckets). Customers will be able to build and run applications and AI models in containers, natively on Snowflake, whether those are built with Streamlit, built with Snowflake’s Native App Framework, or all of the above. With all your data in a centralized place and Apache Iceberg allowing for portability, there’s a compelling reason to consider building and deploying more apps directly in Snowflake, thereby avoiding the need to sync data, buy middleware, or build custom integrations between apps.
Snowflake + NVIDIA: Embracing Open Source for AI and LLM Modeling
Another major theme throughout Summit was an embrace of openness and open source. One of the first major cornerstones of the event was the announcement of NVIDIA and Snowflake’s partnership, an integration that unlocks the ability for customers to leverage open-source models.
What does this mean for you? This integration opens up the ability to both run and train your own AI and LLM models directly where your data lives – ensuring both privacy and security as the data no longer needs to be pushed to an external, third-party API. From custom Document AI models to open-source, fine-tuned LLMs, the ability to take advantage of NVIDIA’s GPU cloud reduces latency both in training and feedback loops and in document and embedding-based retrieval (such as document question answering across vast amounts of data).
Document AI: Introducing Snowflake’s Native Features
The 2nd Watch team was excited to see how spot on our 2023 data and AI predictions were, as we even went so far as to feature Document AI in our exhibit booth design and hosted an LLM industry use case battle during expo hours. Document AI will be key to transformative industry use cases in insurance, private equity, legal, manufacturing – you name it. From contract analysis and risk modeling to competitive intelligence and marketing personalization, Document AI can have far-reaching impacts; and Snowflake is primed to be a major player in the Document AI space.
Many organizations are just beginning to identify use cases for their AI and LLM workloads, but we’ve already spent the past year combining our existing offerings of Document AI with LLM capabilities. (This was the starting point of our previously mentioned industry use case battle, which we’ll discuss in more detail below.) With Snowflake’s announcement of native Document AI features, organizations can now tap into valuable unstructured data that’s been sitting across content management systems, largely unused, because of the incredibly costly and time-consuming effort it takes to manually parse or extract data from documents – particularly when the formats or templates differ across documents.
Snowflake’s Document AI capabilities allow organizations to extract structured data from PDFs via natural language and, by combining what is likely a vision transformer with an LLM, build automations to do this at scale. The data labeling process is by far the most crucial step in every AI workload. If your model doesn’t have enough high-quality examples, it will produce correspondingly low-quality results in automated workloads. Third-party software products, such as Snorkel AI, allow for automated data labeling by using your existing data, but one of the key findings in nearly every AI-related research paper is the same: high-quality data is what matters, and the effort you put into building that source of truth will result in compounding benefits downstream via Document AI, LLMs, and other data-centric applications.
With Snowflake’s Data Cloud, the end-to-end process can be managed entirely within Snowflake. That streamlines governance and privacy controls and helps mitigate the risk posed by both current and future regulations across the globe, particularly when it comes to assessing what’s in the training data you feed into your AI models.
Retrieval Augmented Generation: Exploring Those Transformative Industry Use Cases
It’s likely become clear how widely applicable Document AI and retrieval augmented generation are. (Retrieval augmented generation, or RAG: retrieving data from various sources, including image processors, auto-generated SQL, documents, etc., to augment your prompts.) But to show how great an impact they can have on your organization’s ability to harness the full bulk and depth of your data, let’s talk through specific use cases across a selection of industries.
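To make the RAG definition above a little more concrete, here is a minimal sketch of the "retrieve, then augment the prompt" step. Everything in it is an assumption for illustration: the `Snippet` type, the word-overlap retriever, and the sample texts are made up, and a real system would likely use embeddings and an actual LLM call rather than this toy ranking.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str  # where the text came from, e.g. "document" or "sql"
    text: str


def retrieve(question: str, snippets: list[Snippet], top_k: int = 1) -> list[Snippet]:
    """Rank snippets by how many question words they share (toy retriever)."""
    question_words = set(question.lower().split())

    def overlap(snippet: Snippet) -> int:
        return len(question_words & set(snippet.text.lower().split()))

    # Highest overlap first; keep only the top_k snippets.
    ranked = sorted(snippets, key=overlap, reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[Snippet]) -> str:
    """Prepend the retrieved context to the question before it goes to an LLM."""
    context_lines = [f"[{s.source}] {s.text}" for s in context]
    return "Context:\n" + "\n".join(context_lines) + f"\n\nQuestion: {question}"


# Made-up sample data standing in for documents and SQL results.
snippets = [
    Snippet("document", "The reinsurance treaty renewal date is January 1."),
    Snippet("sql", "Total written premium last quarter was 4.2 million."),
    Snippet("document", "The office cafeteria opens at 8 am."),
]
question = "What is the renewal date of the reinsurance treaty?"
prompt = build_prompt(question, retrieve(question, snippets, top_k=1))
print(prompt)
```

The point of the sketch is the shape of the flow: the model never sees the whole corpus, only the few snippets that the retrieval step judged relevant to the question.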
AI for Insurance
According to 2nd Watch’s LLM industry use case battle, contract analytics (particularly in reinsurance) reigned supreme as the most impactful use case. Unsurprisingly, policy and quote insights also stayed toward the top, followed by personalized carrier and product recommendations.
Insurance organizations can utilize both Document AI and LLMs to capture key details from different carriers and products, generating personalized insurance policies while understanding pricing trends. LLMs can also alert policy admins or automate administration tasks, such as renewals, changes, and cancellations. These alerts can allow for human-in-the-loop feedback and review, and feed into workflow and process improvement initiatives.
AI in Private Equity Firms
In the private equity sector, firms can leverage Document AI and question-answering features to securely analyze their financial and research documents. This “research analyst co-pilot” can answer queries across all documents and structured data in one place, enabling analysts to make informed decisions rapidly. Plus, private equity firms can use LLMs to analyze company reports, financial and operational data, and market trends for M&A due diligence and portfolio company benchmarking.
However, according to the opinions shared by Snowflake Summit attendees who stopped by our exhibit booth, benchmarking is the least interesting application of AI in private equity, with its ranking dropping throughout the event. Instead, Document AI question answering was the top-ranked use case, with AI-assisted opportunity and deal sourcing coming in second.
Legal Industry LLM Insights
Like both insurance and private equity, the legal industry can benefit from LLM document review and analysis; and this was the highest-ranked LLM use case within legal. Insights from complex legal documents, contracts, and court filings can be stored as embeddings in a vector database for retrieval and comparison, helping to speed up the review process and reduce the workload on legal professionals.
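As a rough illustration of storing insights as embeddings and retrieving them by similarity, here is a small sketch. It rests on stated assumptions: the `embed` function is a toy bag-of-words stand-in for a real embedding model, `VectorStore` is a hypothetical in-memory class rather than any particular vector database, and the clause texts are invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Store the original text next to its embedding.
        self._rows.append((text, embed(text)))

    def search(self, query: str, top_k: int = 1) -> list[tuple[str, float]]:
        # Score every stored row against the query and return the best matches.
        query_vec = embed(query)
        scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = VectorStore()
store.add("indemnification clause limits liability to direct damages")
store.add("termination clause requires ninety days written notice")
store.add("governing law clause selects the state of delaware")

best_text, best_score = store.search("how much notice does termination require")[0]
print(best_text)
```

A production setup would swap the toy pieces for a learned embedding model and an indexed store, but the retrieve-by-nearest-vector idea is the same.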
Case law research made a big comeback in our LLM battle, coming from sixth position to briefly rest in second and finally land in third place, behind talent acquisition and HR analytics. Of course, those LLM applications are not unique to law firms and legal departments, so it comes as no surprise that they rank highly.
Manufacturing AI Use Cases
Manufacturers proved to have widely ranging opinions on the most impactful LLM use cases, with rankings swinging wildly throughout Snowflake Summit. Predictive maintenance did hold on to the number one spot, as LLMs can analyze machine logs and maintenance records, identify similar past instances, and incorporate historical machine performance metrics to enable a predictive maintenance system.
Otherwise, use cases like brand perception insights, quality control checks, and advanced customer segmentation repeatedly swapped positions. Ultimately, competitive intelligence landed in a tie with supply chain optimization and demand forecasting. By gleaning insights from unstructured data within sources like news articles, social media, and company reports, and coupling them with structured data like market statistics and company performance data, LLMs can produce well-rounded competitive intelligence outputs. It’s no wonder this use case tied with supply chain and demand forecasting – in which LLMs analyze supply chain data and imaging at ports and other supply chain hubs for potential risks, then combine that data with traditional time-series demand forecasting to find optimization opportunities. Both use cases focus on how manufacturers can optimally position themselves for an advantage within the market.
Even More LLM Use Cases
Not to belabor the point, but Document AI and LLMs have such broad applications across industries that we had to call out several more:
- Regulatory and Risk Compliance: LLMs can help monitor and ensure compliance with financial regulations. These compliance checks can be stored as embeddings in a vector database for auditing and internal insights.
- Copyright Violation Detection: LLMs can analyze media content for potential copyright violations, automatically retrieving similar instances or known copyrighted material and flagging likely matches.
- Personalized Healthcare: LLMs can analyze patient symptoms and medical histories from unstructured data and EHRs, alongside the latest medical research and findings, enabling more effective treatment plans.
- Medical Imaging Analysis: LLMs can help interpret medical imaging alongside diagnoses, treatment plans, and medical history, so that patient imaging can inform suggested diagnoses and drug therapies based on the latest research and historical data.
- Automated Content Tagging: Multimodal models and LLMs can analyze media content across video, audio, and text to generate relevant tags and keywords for automated content classification, search, and discovery.
- Brand Perception Insights: LLMs can analyze social media and online reviews to assess brand perception.
- Customer Support Copilots: LLMs can function as chatbots and copilots for customer service representatives (CSRs), enabling customers to ask questions and upload photos of products, and allowing the CSR to quickly retrieve relevant information, such as product manuals, warranty information, or other internal knowledge base data that is typically retrieved manually. By storing past customer interactions in a vector database, the system can retrieve relevant solutions based on similarity and improve over time, making the CSR more effective and creating a better customer experience.
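The "improve over time" idea in the customer support bullet above can be sketched as a memory of past interactions that grows as cases are resolved. This is only an illustration under assumptions: `InteractionMemory` is a hypothetical class, word-set (Jaccard) overlap stands in for embedding similarity, the `min_score` threshold is arbitrary, and the sample question and resolution are invented.

```python
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two texts have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class InteractionMemory:
    """Hypothetical store of past customer questions and their resolutions."""

    def __init__(self) -> None:
        self._history: list[tuple[set[str], str]] = []

    def record(self, question: str, resolution: str) -> None:
        # Each resolved case becomes retrievable for future questions.
        self._history.append((tokens(question), resolution))

    def suggest(self, question: str, min_score: float = 0.2) -> Optional[str]:
        """Return the resolution of the most similar past question, if any."""
        query = tokens(question)
        best_score, best_resolution = 0.0, None
        for past_tokens, resolution in self._history:
            score = jaccard(query, past_tokens)
            if score > best_score:
                best_score, best_resolution = score, resolution
        return best_resolution if best_score >= min_score else None


memory = InteractionMemory()
question = "blender lid will not lock"
first_answer = memory.suggest(question)  # nothing stored yet, so no suggestion
memory.record("blender lid does not lock", "See manual section on lid alignment.")
second_answer = memory.suggest(question)  # now matches the recorded case
print(first_answer, second_answer)
```

The same question gets no suggestion before the first case is recorded and a useful one afterward, which is the feedback loop the CSR benefits from.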
More broadly, LLMs can be utilized to analyze company reports, research documents, news articles, financial data, and market trends, storing these relationships natively in Snowflake, side-by-side with structured data warehouse data and unstructured documents, images, or audio.
Snowflake Summit 2023 ended with the same clear focus that I’ve always found most compelling within their platform – giving customers simplicity, flexibility, and choice for running their data-centric workloads. That’s now been expanded to Microsoft, to the open-source community, to unstructured data and documents, and to AI and LLMs. Across every single industry, there’s a practical workload that can be applied today to solve high-value, complex business problems.
I was struck by not only the major (and pleasantly unexpected) announcements and partnerships, but also the magnitude of the event itself. Some of the most innovative minds in the data ecosystem came together to engage in curiosity-driven conversation, sharing what they’re working on, what’s worked, and what hasn’t worked. And that last part – especially as we continue to push forward on the frontier of LLMs – is what made the week so compelling and memorable.
With 2nd Watch’s experience, research, and findings in these new workloads, combined with our history working with Snowflake, we look forward to having more discussions like those we held throughout Summit to help identify and solve long-standing business problems in new, innovative ways. If you’d like to talk through Document AI and LLM use cases specific to your organization, please get in touch.