Data & AI Predictions in 2023

As we reveal our data and AI predictions for 2023, join us at 2nd Watch to stay ahead of the curve and propel your business towards innovation and success. How do we know that artificial intelligence (AI) and large language models (LLMs) have reached a tipping point? It was the hot topic at most families’ dinner tables during the 2022 holiday break.

AI has become mainstream and accessible. Most notably, OpenAI’s ChatGPT took the internet by storm, so much so that even our parents (and grandparents!) are talking about it. Since AI is here to stay beyond the Christmas Eve dinner discussion, we put together a list of 2023 predictions we expect to see regarding AI and data.

#1. Proactively handling data privacy regulations will become a top priority.

Regulatory changes can have a significant impact on how organizations handle data privacy: businesses must adapt to new policies to ensure their data is secure. Modifications to regulatory policies require governance and compliance teams to understand data within their company and the ways in which it is being accessed. 

To stay ahead of regulatory changes, organizations will need to prioritize their data governance strategies. This will mitigate the risks surrounding data privacy and potential regulations. As a part of their data governance strategy, data privacy and compliance teams must increase their usage of privacy, security, and compliance analytics to proactively understand how data is being accessed within the company and how it’s being classified. 

#2. AI and LLMs will require organizations to consider their AI strategy.

The rise of AI and LLM technologies will require businesses to adopt a broad AI strategy. AI and LLMs will open opportunities in automation, efficiency, and knowledge distillation. But, as the saying goes, “With great power comes great responsibility.” 

There is disruption and risk that comes with implementing AI and LLMs, and organizations must respond with a people- and process-oriented AI strategy. As more AI tools and start-ups crop up, companies should consider how to thoughtfully approach the disruptions that will be felt in almost every industry. Rather than being reactive to new and foreign territory, businesses should aim to educate, create guidelines, and identify ways to leverage the technology. 

Moreover, without a well-thought-out AI roadmap, enterprises will find themselves plateauing technologically, with teams unable to adapt to the new landscape and no return on investment: they won’t be able to scale or support the initiatives they put in place. Poor road mapping will lead to siloed and fragmented projects that don’t contribute to a cohesive AI ecosystem.

#3. AI technologies, like Document AI (or information extraction), will be crucial to tap into unstructured data.

According to IDC, 80% of the world’s data will be unstructured by 2025, and 90% of this unstructured data is never analyzed. Integrating unstructured and structured data opens up new use cases for organizational insights and knowledge mining.

Massive amounts of unstructured data – such as Word and PDF documents – have historically been a largely untapped data source for data warehouses and downstream analytics. New deep learning technologies, like Document AI, have addressed this issue and are now more widely accessible. Document AI can extract previously unused data from PDF and Word documents, ranging from insurance policies to legal contracts to clinical research to financial statements. Additionally, vision and audio AI unlock real-time video transcription and search, image classification, and call center insights.
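As a simplified illustration of the unstructured-to-structured transformation that information extraction performs – a hypothetical sketch, not any specific Document AI product (real systems use deep learning models, not regexes) – consider pulling key fields out of raw policy text:

```python
import re

def extract_policy_fields(text: str) -> dict:
    """Toy information extraction: pull named fields from raw policy text.
    Illustrative only; production Document AI uses trained models that
    generalize across layouts, while this handles one fixed format."""
    patterns = {
        "policy_number": r"Policy Number:\s*([A-Z0-9-]+)",
        "effective_date": r"Effective Date:\s*([\d/]+)",
        "premium": r"Premium:\s*\$([\d,.]+)",
    }
    record = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record

raw = "Policy Number: POL-2023-0042\nEffective Date: 01/15/2023\nPremium: $1,250.00"
print(extract_policy_fields(raw))
```

The point is the output shape: a structured record that can be loaded into a warehouse table and joined with existing structured data, which is exactly what makes document-derived data usable downstream.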

Organizations can unlock brand-new use cases by integrating these AI outputs with existing data warehouses. Fine-tuning general-purpose models on domain data adapts them to a wide variety of organization-specific use cases.

#4. “Data is the new oil.” Data will become the fuel for turning general-purpose AI models into domain-specific, task-specific engines for automation, information extraction, and information generation.

Snorkel AI coined the term “data-centric AI,” which is an accurate paradigm to describe our current AI lifecycle. The last time AI received this much hype, the focus was on building new models. Now, very few businesses need to develop novel models and algorithms. What will set their AI technologies apart is the data strategy.

Data-centric AI enables us to leverage existing models and calibrate them to an organization’s data. Applying an enterprise’s data to this new paradigm will accelerate a company’s time to market, especially for those who have modernized their data and analytics platforms and data warehouses.

#5. The popularity of data-driven apps will increase.

Snowflake recently acquired Streamlit, which makes application development more accessible to data engineers. Additionally, Snowflake introduced Unistore and hybrid tables (OLTP) to allow data science and app teams to work together off a single source of truth in Snowflake, eliminating silos and data replication.

Snowflake’s big moves demonstrate that companies are looking to fill gaps that traditional business intelligence (BI) tools leave behind. With tools like Streamlit, teams can automate data sharing and deployment, which are traditionally manual and Excel-driven. Most importantly, Streamlit can become the conduit that allows business users to work directly with AI-native and data-driven applications across the enterprise.

#6. AI-native and cloud-native applications will win.

Customers will start expecting AI capabilities to be embedded into cloud-native applications. Harnessing domain-specific data, companies should prioritize building upon modular, data-driven application building blocks with AI and machine learning. AI-native applications will win over AI-retrofitted applications.

When applications are custom-built for AI, analytics, and data, they are more accessible to data and AI teams, enabling business users to interact with models and data warehouses in a new way. Teams can classify and label data in a centralized, data-driven way, rather than manually and repeatedly in Excel, and can feed the results into a human-in-the-loop review system that improves the overall accuracy and quality of models. Traditional BI tools like dashboards, on the other hand, often limit business users to consuming and viewing data in a “what happened?” manner, rather than a more interactive, targeted one.

#7. There will be technology disruption and market consolidation.

The AI race has begun. Microsoft’s strategic partnership with OpenAI and its integration into “everything,” Google’s introduction of Bard and investment in foundation model startup Anthropic, AWS’s own native models and partnership with Stability AI, and a wave of new AI startups are just a few of the major signals that the market is changing. These emerging AI technologies are driving market consolidation: smaller companies are being acquired by incumbents looking to take advantage of the developing technology.

Mergers and acquisitions are key growth drivers, with larger enterprises leveraging their existing resources to acquire smaller, nimbler players to expand their reach in the market. This emphasizes the importance of data, AI, and application strategy. Organizations must stay agile and quickly consolidate data across new portfolios of companies. 

Conclusion

The AI ball is rolling. At this point, you’ve probably dabbled with AI or engaged in high-level conversations about its implications. The next step in the AI adoption process is to actually integrate AI into your work and understand the changes (and challenges) it will bring. We hope that our data and AI predictions for 2023 prime you for the ways it can have an impact on your processes and people.

Think you’re ready to get started? Find out with 2nd Watch’s data science readiness assessment.


Data Clean Rooms: Share Your Corporate Data Fearlessly

Data sharing has become more complex, both in its application and our relationship to it. There is a tension between the need for personalization and the need for privacy. Businesses must share data to be effective and ultimately provide tailored customer experiences. However, legislation and practices regarding data privacy have tightened, and data sharing is tougher and fraught with greater compliance constraints than ever before. The challenge for enterprises is reconciling the increased demand for data with increased data protection.

The modern world runs on data. Companies share data to facilitate their daily operations. Data distribution occurs between business departments and external third parties. Even something as innocuous as exchanging Microsoft Excel and Google Sheets spreadsheets is data sharing!

Data collaboration is entrenched in our business processes. Therefore, rather than avoiding it, we must find the tools and frameworks to support secure and privacy-compliant data sharing. So how do we govern the flow of sensitive information from our data platforms to other parties?

The answer: data clean rooms. Data clean rooms are the modern vehicle for various data sharing and data governance workflows. Across industries – including media and entertainment, advertising, insurance, private equity, and more – a data clean room can be the difference-maker in your data insights.

Ready to get started with a data clean room solution? Schedule time to talk with a 2nd Watch data expert.

What is a data clean room?

There is a classic thought experiment, Yao’s “Millionaires’ Problem,” wherein two millionaires want to find out who is richer without actually revealing how much money each is worth. The data clean room solves this class of problem by allowing parties to ask approved questions, which require external data to answer, without actually sharing the sensitive information itself!

In other words, a data clean room is a framework that allows two parties to securely share and analyze data by granting both parties control over when, where, and how said data is used. The parties involved can pool together data in a secure environment that protects private details. With data clean rooms, brands can access crucial and much-needed information while maintaining compliance with data privacy policies.

Data clean rooms have been around for about five years; Google was the first company to launch a data clean room solution (Google Ads Data Hub) in 2017. The era of user privacy kicked off in 2018, when data protection and privacy became law, most notably with the General Data Protection Regulation (GDPR).

This was a huge shake-up for most brands. Businesses had to adapt their data collection and sharing models to operate within the scope of the new legislation and the walled gardens that became popular amongst all tech giants. With user privacy becoming a priority, data sharing has become stricter and more scrutinized, which makes marketing campaign measurements and optimizations in the customer journey more difficult than ever before.

Data clean rooms are crucial for brands navigating the era of consumer protection and privacy. Brands can still gain meaningful marketing insights and operate within data privacy laws in a data clean room.

Data clean rooms work because the parties involved have full control over their data. Each party agrees upon access, availability, and data usage, while a trusted data clean room offering oversees data governance. This yields the secure framework needed to ensure that one party cannot access the other’s data, and it upholds the foundational rule that individual- or user-level data cannot be shared between different parties without consent.

Personally identifiable information (PII) remains anonymized and is processed and stored in a way that is not exposed to any of the parties involved. Thus, data sharing within a data clean room complies with privacy regulations such as the GDPR and the California Consumer Privacy Act (CCPA).
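One simple way this anonymization is commonly achieved is salted hashing of identifiers before any data leaves a party’s environment. The sketch below is a hypothetical minimal illustration (the salt name and normalization are assumptions); production clean rooms use stronger cryptographic schemes, but the principle is the same: both parties apply the same one-way transform, so records can be matched without either side ever seeing the other’s raw PII.

```python
import hashlib

def pseudonymize(email: str, shared_salt: str) -> str:
    """One-way hash of an identifier using a salt both parties agree on.
    Each party runs this locally before sharing anything, so only
    irreversible tokens ever enter the clean room."""
    normalized = email.strip().lower()  # normalize so formats still match
    return hashlib.sha256((shared_salt + normalized).encode()).hexdigest()

# Each party hashes its own records locally.
token_a = pseudonymize("Jane.Doe@example.com", "2023-campaign-salt")
token_b = pseudonymize("jane.doe@example.com ", "2023-campaign-salt")
assert token_a == token_b  # same person matches; the raw email is never shared
```

Because the hash is one-way, a matched token lets the parties count and measure overlapping users without recovering the underlying email address.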

How does a data clean room work?

Let’s take a deeper dive into the functionality of a data clean room. Four components are involved with a data clean room:

#1 – Data ingestion
Data is funneled into the data clean room. This can be first-party data (generated from websites, applications, CRMs, etc.) or second-party data from collaborating parties (such as ad networks, partners, publishers, etc.).

#2 – Connection and enrichment
The ingested data sets are matched at the user level. Tools like third-party data enrichment complement the data sets.

#3 – Analytics
The data is analyzed for intersections/overlaps, measurement and attribution, and propensity scoring. Data will only be shared where the data points intersect between the two parties.

#4 – Application
Once the data has finished its data clean room journey, each party will have aggregated data outputs. These outputs create the business insights needed to accomplish crucial tasks such as optimizing the customer experience, performing reach and frequency measurements, building effective cross-platform journeys, and conducting deep marketing campaign analyses.

What are the benefits of a data clean room?

Data clean rooms can benefit businesses in any industry, including media, retail, and advertising. In summary, data clean rooms are beneficial for the following reasons:

You can enrich your partner’s data set.
With data clean rooms, you can collaborate with your partners to produce and consume data regarding overlapping customers. You can pool common customer data with your partners, find the intersection between your business and your partners, and share the data upstream without sharing sensitive information with competitors. An example would be sharing demand and sales information with an advertising partner for better-targeted marketing campaigns.

You can create governance within your enterprise.
Data clean rooms provide the framework to achieve the elusive “single source of truth.” You can create a golden record encompassing all the data in every system of record within your organization. This includes sensitive PII such as social security numbers, passport numbers, financial account numbers, transactional data, etc.

You can remain policy compliant.
In a data clean room environment, you can monitor where the data lives, who has access to it, and how it is used. Think of the clean room as an automated middleman that validates requests for data. This allows you to share data while remaining compliant with all the important acronyms: GDPR, HIPAA, CCPA, FCRA, ECPA, etc.

But you have to do it right…

With every data security and analytics initiative, there is a set of risks if the implementation is not done correctly. A truly “clean” data clean room will allow you to unlock data for your users while remaining privacy compliant. You can maintain role-based access, tokenized columns, and row-level security – which typically lock down particular data objects – and share these sensitive data sets quickly and in a governed way. Data clean rooms satisfy the need for efficient access and the need for the data producer to limit the consumer to relevant information for their use case.
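As a hypothetical illustration of the column tokenization mentioned above (names and scheme are assumptions, not a production design): sensitive values are swapped for opaque tokens before sharing, while the mapping back to real values – the “vault” – never leaves the data owner. Consumers can still join and aggregate on the tokens.

```python
import secrets

class ColumnTokenizer:
    """Replace sensitive column values with opaque tokens.
    The vault (token -> value) stays with the data owner; data consumers
    can group and join on tokens without ever seeing the real values.
    Illustrative sketch only, not a production tokenization scheme."""

    def __init__(self):
        self._vault = {}     # token -> original value (owner-side only)
        self._by_value = {}  # value -> token (stable tokens enable joins)

    def tokenize(self, value: str) -> str:
        if value not in self._by_value:
            token = secrets.token_hex(8)  # random, so not reversible by guessing
            self._by_value[value] = token
            self._vault[token] = value
        return self._by_value[value]

    def detokenize(self, token: str) -> str:
        """Owner-side reversal; never exposed to the data consumer."""
        return self._vault[token]

tok = ColumnTokenizer()
rows = [{"ssn": tok.tokenize(s), "amount": a}
        for s, a in [("111-22-3333", 100), ("111-22-3333", 250), ("999-88-7777", 40)]]
# Same SSN -> same token, so a consumer can group by it without seeing it.
assert rows[0]["ssn"] == rows[1]["ssn"] != rows[2]["ssn"]
```

Combined with role-based access and row-level security, this is what lets a clean room grant fast access to relevant data while keeping the sensitive columns themselves locked down.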

Of course, there are consequences if your data clean room is actually “dirty.” Your data must be federated, and you need clarity on how your data is stored; otherwise, you risk:

  • Loss of customer trust
  • Fines from government agencies
  • Inadvertently oversharing proprietary information
  • Locking out valuable data requests due to a lack of process

Despite the potential risks of utilizing a data clean room, it is the most promising solution to the challenges of data-sharing in a privacy-compliant way.

Conclusion

To get the most out of your data, your business needs to create secure processes to share data and decentralize your analytics. This means pooling together common data with your partners and distributing the work to create value for all parties involved.

However, you must govern your data. It is imperative to treat your data like an asset, especially in the era of user privacy and data protection. With data clean rooms, you can reconcile the need for data collaboration with the need for data ownership and privacy.

2nd Watch can be your data clean room guide, helping you to establish a data mesh that enables sharing and analyzing distributed pools of data, all while maintaining centralized governance. Schedule time to get started with a data clean room.

Fred Bliss – CTO Data Insights 2nd Watch