Data & AI Predictions in 2023

As we reveal our data and AI predictions for 2023, join us at 2nd Watch to stay ahead of the curve and propel your business toward innovation and success. How do we know that artificial intelligence (AI) and large language models (LLMs) have reached a tipping point? They were the hot topic at most families’ dinner tables during the 2022 holiday break.

AI has become mainstream and accessible. Most notably, OpenAI’s ChatGPT took the internet by storm, so much so that even our parents (and grandparents!) are talking about it. Since AI is here to stay beyond the Christmas Eve dinner discussion, we put together a list of 2023 predictions we expect to see regarding AI and data.

#1. Proactively handling data privacy regulations will become a top priority.

Regulatory changes can have a significant impact on how organizations handle data privacy: businesses must adapt to new policies to ensure their data is secure. Modifications to regulatory policies require governance and compliance teams to understand data within their company and the ways in which it is being accessed. 

To stay ahead of regulatory changes, organizations will need to prioritize their data governance strategies. This will mitigate the risks surrounding data privacy and potential regulations. As a part of their data governance strategy, data privacy and compliance teams must increase their usage of privacy, security, and compliance analytics to proactively understand how data is being accessed within the company and how it’s being classified. 
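As a concrete (and purely hypothetical) illustration, compliance analytics can start as simply as reporting on who is touching sensitive data. The sketch below assumes an access_log audit table and a data_catalog table that records sensitivity classifications; your own platform’s audit views and names will differ.

```sql
-- Hypothetical sketch: surface users who accessed columns classified as sensitive
-- in the last 30 days. Table and column names are illustrative assumptions.
SELECT
    l.user_name,
    c.table_name,
    c.column_name,
    c.sensitivity_label,              -- e.g., 'PII', 'PHI', 'CONFIDENTIAL'
    COUNT(*) AS access_count
FROM access_log AS l
JOIN data_catalog AS c
    ON l.table_name = c.table_name
   AND l.column_name = c.column_name
WHERE c.sensitivity_label IN ('PII', 'PHI')
  AND l.accessed_at >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3, 4
ORDER BY access_count DESC;
```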

#2. AI and LLMs will require organizations to consider their AI strategy.

The rise of AI and LLM technologies will require businesses to adopt a broad AI strategy. AI and LLMs will open opportunities in automation, efficiency, and knowledge distillation. But, as the saying goes, “With great power comes great responsibility.” 

There is disruption and risk that comes with implementing AI and LLMs, and organizations must respond with a people- and process-oriented AI strategy. As more AI tools and start-ups crop up, companies should consider how to thoughtfully approach the disruptions that will be felt in almost every industry. Rather than being reactive to new and foreign territory, businesses should aim to educate, create guidelines, and identify ways to leverage the technology. 

Moreover, without a well-thought-out AI roadmap, enterprises will find themselves technologically plateauing, with teams unable to adapt to the new landscape and no return on investment: they won’t be able to scale or support the initiatives they put in place. Poor road mapping will lead to siloed and fragmented projects that don’t contribute to a cohesive AI ecosystem.

#3. AI technologies, like Document AI (or information extraction), will be crucial to tap into unstructured data.

According to IDC, 80% of the world’s data will be unstructured by 2025, and 90% of this unstructured data is never analyzed. Integrating unstructured and structured data opens up new use cases for organizational insights and knowledge mining.

Massive amounts of unstructured data – such as Word and PDF documents – have historically been a largely untapped data source for data warehouses and downstream analytics. New deep learning technologies, like Document AI, have addressed this issue and are more widely accessible. Document AI can extract previously unused data from PDF and Word documents, ranging from insurance policies to legal contracts to clinical research to financial statements. Additionally, vision and audio AI unlock real-time video transcription and search, image classification, and call center insights.

Organizations can unlock brand-new use cases by integrating these extracted outputs with their existing data warehouses. Fine-tuning general-purpose models on domain data then adapts them to a wide variety of domain-specific use cases.
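To make that integration concrete, here is a hedged sketch of joining extracted document fields to structured warehouse data, assuming a Document AI pipeline has already landed its output in a document_extracts table (table and column names are illustrative):

```sql
-- Illustrative only: join fields extracted from unstructured contracts
-- to structured policy data already in the warehouse.
SELECT
    p.policy_id,
    p.annual_premium,
    d.effective_date          AS extracted_effective_date,
    d.coverage_limit          AS extracted_coverage_limit,
    d.extraction_confidence
FROM policies AS p
JOIN document_extracts AS d        -- output of a Document AI / information extraction step
    ON p.policy_id = d.policy_id
WHERE d.extraction_confidence >= 0.90;   -- keep only high-confidence extractions
```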

#4. “Data is the new oil.” Data will become the fuel for turning general-purpose AI models into domain-specific, task-specific engines for automation, information extraction, and information generation.

Snorkel AI coined the term “data-centric AI,” which is an accurate paradigm to describe our current AI lifecycle. The last time AI received this much hype, the focus was on building new models. Now, very few businesses need to develop novel models and algorithms. What will set their AI technologies apart is the data strategy.

Data-centric AI enables us to leverage existing models and calibrate them to an organization’s own data. Applying an enterprise’s data to this new paradigm will accelerate a company’s time to market, especially for companies that have modernized their data and analytics platforms and data warehouses.

#5. The popularity of data-driven apps will increase.

Snowflake recently acquired Streamlit, which makes application development more accessible to data engineers. Additionally, Snowflake introduced Unistore and hybrid tables (OLTP) to allow data science and app teams to work together off of a single source of truth in Snowflake, eliminating silos and data replication.

Snowflake’s big moves demonstrate that companies are looking to fill gaps that traditional business intelligence (BI) tools leave behind. With tools like Streamlit, teams can automate data sharing and deployment, which has traditionally been manual and Excel-driven. Most importantly, Streamlit can become the conduit that allows business users to work directly with AI-native and data-driven applications across the enterprise.

#6. AI-native and cloud-native applications will win.

Customers will start expecting AI capabilities to be embedded into cloud-native applications. Harnessing domain-specific data, companies should prioritize building upon modular, data-driven application blocks with AI and machine learning. AI-native applications will win over AI-retrofitted applications.

When applications are custom-built for AI, analytics, and data, they are more accessible to data and AI teams, enabling business users to interact with models and data warehouses in a new way. Teams can begin classifying and labeling data in a centralized, data-driven way, rather than manually and repeatedly in Excel, and that work can feed into a human-in-the-loop system for review, improving the overall accuracy and quality of models. Traditional BI tools like dashboards, on the other hand, often limit business users to viewing data in a “what happened?” manner rather than interacting with it in a more targeted way.

#7. There will be technology disruption and market consolidation.

The AI race has begun. Microsoft’s strategic partnership with OpenAI and integration of its models into “everything,” Google’s introduction of Bard and funding of foundation model startup Anthropic, AWS’s own native models and partnership with Stability AI, and a wave of new AI-related startups are just a few of the major signals that the market is changing. The emerging AI technologies are driving market consolidation: smaller companies are being acquired by incumbents looking to take advantage of the developing technologies.

Mergers and acquisitions are key growth drivers, with larger enterprises leveraging their existing resources to acquire smaller, nimbler players to expand their reach in the market. This emphasizes the importance of data, AI, and application strategy. Organizations must stay agile and quickly consolidate data across new portfolios of companies. 

Conclusion

The AI ball is rolling. At this point, you’ve probably dabbled with AI or engaged in high-level conversations about its implications. The next step in the AI adoption process is to actually integrate AI into your work and understand the changes (and challenges) it will bring. We hope that our data and AI predictions for 2023 prime you for the ways it can have an impact on your processes and people.

Think you’re ready to get started? Find out with 2nd Watch’s data science readiness assessment.


3 Data Priorities for Organic Value Creation

Organic value creation focuses on a few main areas, including improving current performance (both financial and operational) of your companies, establishing a pattern of consistent growth, strengthening your organizational leadership team, and building the potential for a brighter future through product and competitive positioning. All of these are supported by and/or partially based on the data foundation you create in your companies. At exit, your buyers want to see and feel confident that the created organic value is sustainable and will endure. Data and analytics are key to proving that. 

Companies that solely focus on competition will ultimately die. Those that focus on value creation will thrive. — Edward de Bono

To organically create and drive value, there are a few key data priorities you should consider:

  1. A starting point is data quality, which underpins all you will ever do and achieve with data in your organization. Achieving better-quality data is an unrelenting task, one that many organizations overlook.
  2. Data monetization is a second priority and is also not top-of-mind for many organizations. The adage that “data is the new oil” is at least partially true, and most companies have ways and means to leverage the data they already possess to monetize and grow revenue for improved financial returns.
  3. A third data priority is to focus on user adoption. Having ready data and elite-level analytical tools is not sufficient. You need to be sure the data and tools you have invested in are broadly used – and not just in the short term. You also need to continue to evolve and enhance both your data and your tools to grow that adoption for future success.

Data Quality

Data quality is a complicated topic worthy of a separate article. Let’s focus our data quality discussion on two things: trust and the process of data quality.

If you are organically growing your companies and increasing the use of and reliance upon your data, you had better make sure you trust your data. The future of your analytics solutions and broad adoption across your operational management teams depend on your data being trustworthy. That trust means the data is accurate, consistent across the organization, and timely, and that a process exists to maintain that trust over time. There is also an assumption that your data aligns with external data sources. You can measure the accuracy of your portfolio company’s data in many ways, but the single best measure is going to be how your operating executives answer the question, “How much do you trust your data?”

Data quality is never stagnant. There are always new data sources, changes in the data itself, outside influences on the data, etc. You cannot just clean the data once and expect it to stay clean. The best analogy is a stream that can get polluted from any source that feeds into the stream. To maintain high data quality over time, you need to build and incorporate processes and organizational structures that monitor, manage, and own the quality of your company’s data.

One “buzzwordy” term often applied to good data governance is data stewardship – the idea being that someone within your enterprise has the authority and responsibility to keep your data of the highest quality. There are efficient and effective ways to dramatically improve your company data and to keep it of the highest quality as you grow the organization. Simply put, do something about data quality, make sure that someone or some group is responsible for data quality, and find ways to measure your overall data quality over time.
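Measuring data quality does not have to be elaborate to be useful. Below is a minimal sketch, assuming a customers table with the fields shown; run something like it on a schedule and store the results so you can trend quality over time.

```sql
-- Simple, repeatable data quality snapshot for a hypothetical customers table.
SELECT
    CURRENT_DATE()                                        AS snapshot_date,
    COUNT(*)                                              AS total_rows,
    COUNT(*) - COUNT(email)                               AS missing_email_count,
    COUNT(*) - COUNT(DISTINCT customer_id)                AS duplicate_id_count,
    SUM(CASE WHEN updated_at < DATEADD(month, -12, CURRENT_DATE()) THEN 1 ELSE 0 END)
                                                          AS stale_row_count
FROM customers;
```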

A leading equipment distributor found new revenue sources and increased competitive edge by leveraging the cloud data warehouse that 2nd Watch built for their growing company to share data on parts availability in their industry. Using the centralized data, they can grow revenue, increase customer service levels, and have more industry leverage from data that they already owned. Read this private equity case study here.

Data Monetization

Organic value creation can also come from creating value out of the data your portfolio companies already own. Data monetization for you can mean such options as:

Enriching your internal data – Seek ways to make your data more valuable internally. This most often comes from cross-functional data creation (e.g., taking costing data and marrying it with sales/marketing data to infer lifetime customer value; a sketch of this appears after this list). The unique view that this enriched internal data offers will often lead to better internal decision-making and will drive more profitable analytics as you grow your analytics solutions library.

Finding private value buyers – Your data, cleansed and anonymized, is highly valuable. Your suppliers will pay for access to more data and information that helps them customize their offerings and prices to create value for customers. Your own customers would pay for enhanced information about your products and services if you can add value to them in the process. Within your industry, there are many ways to anonymize and sell the data that your portfolio companies create.

Finding public value buyers – Industry trade associations, consultancies, conference organizations, and the leading advisory firms are all eager to access unique insights and statistics they can use and sell to their own clients to generate competitive advantage.

Building a data factory mindset – Modern cloud data warehouse solutions make the technology to monetize your data quite easy. There are simple ways to make the data accessible and a marketplace for selling such data from each of the major cloud data warehouse vendors. The hardest part is not finding buyers or getting them the data; it is building an internal mindset that your internal data is a valuable asset that can be easily monetized. 
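Returning to the first option above (enriching your internal data), here is a hedged sketch of that cross-functional enrichment: the query marries hypothetical sales and costing tables to approximate per-customer lifetime value. Table and column names are assumptions, and your own definitions of revenue, cost, and customer will differ.

```sql
-- Illustrative lifetime value calculation combining sales and costing data.
SELECT
    s.customer_id,
    MIN(s.order_date)                                       AS first_purchase,
    MAX(s.order_date)                                       AS most_recent_purchase,
    SUM(s.order_amount)                                     AS lifetime_revenue,
    SUM(c.unit_cost * s.quantity)                           AS lifetime_cost,
    SUM(s.order_amount) - SUM(c.unit_cost * s.quantity)     AS lifetime_margin
FROM sales AS s
JOIN product_costs AS c
    ON s.product_id = c.product_id
GROUP BY s.customer_id
ORDER BY lifetime_margin DESC;
```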

User Adoption

Our firm works with many private equity clients to design, build, and implement leading analytics solutions. A consistent learning across our project work is that user adoption is a critical success factor in our work.

Having more accurate, more timely, or more enriched data won’t by itself increase the adoption of advanced analytical solutions in your portfolio companies. Not all of your operating executives are data-driven, nor are they all analytically driven. Just because they capably produce their monthly reporting package and get it to you on time does not mean they are acting on the issues and opportunities they should be able to discern from the data. Better training, organizational change techniques, internal data sharing, and many other approaches can dramatically increase the speed and depth of user adoption in your companies.

You know how to seek value when you invest. You know how to grow your companies post-close. Growing organically during your hold period will drive increased exit valuations and let you outperform your investment thesis. Focus on data quality, data monetization, and broad user adoption as core analytics priorities for strong organic value creation across your portfolio.

Contact us today to set up a complimentary private equity data whiteboarding session. Our analytics experts have a template for data monetization and data quality assessments that we can run through with you and your team.


A CTO’s Guide to a Modern Data Platform: Data Strategy and Governance

In our previous blog post on how to build a data warehouse in 6-8 weeks, we showed you how to get lightning-fast results and effectively create a working data warehouse with Snowflake. Future state integrations and governance needs are coming, though. This is why 2nd Watch highly recommends executing a data strategy and governance project in parallel with your Snowflake proof-of-concept. Knowing how to leverage Snowflake’s strengths to avoid common pitfalls will save you time, money, and re-work.

Consider one company that spent a year using the data discovery layer-only approach. With data sources all centralized in the data warehouse and all transformations occurring at run-time in the BI tool, the data team was able to deliver a full analytical platform to its users in less time than ever before. Users were happy, at first, until the logic became more mature and more complex and ultimately required more compute power (translating to higher cost) to keep the same performance expectations. For some, however, this might not be a problem but an expected outcome.

For this company, enabling analytics and reporting was the only need for the first year, but integration of data across applications was coming full steam ahead. The primary line of business applications needed to get near-real-time updates from the others. For example, marketing automation didn’t rely 100% on humans; it needed data to execute its rules, from creating ad campaigns to sending email blasts based on events occurring in other systems.

This one use case poked a big hole in the architecture – you can’t just have a data warehouse in your enterprise data platform. There’s more to it. Even if it’s years away, you need to effectively plan for it or you’ll end up in a similar, costly scenario. That starts with data strategy and governance.

ETL vs. ELT in Snowflake

Identify where your transformations occur and how they impact your downstream systems.

The new paradigm is that you no longer need ETL (Extract, Transform, Load) – you need ELT (Extract, Load, Transform). This is true, but sometimes misleading. Some will interpret ELT as no longer needing to build and manage the expensive pipelines and business logic that delay speed-to-insight, are costly to maintain, and require constant upkeep for changing business rules. In effect, it’s interpreted as removing the “T” and letting Snowflake solve for this. Unfortunately, someone has to write the code and business logic, and it’s best to not have your business users trying to do this when they’re better served working on your organization’s core goals.

In reality, you are not removing the “T” – you are moving it to a highly scalable and performant database after the data has been loaded. This is still going to require someone to understand how your customer data in Salesforce ties to a customer in Google Analytics that corresponds to a sale in your ERP. You still need someone who knows both the data structures and the business rules. Unfortunately, the “T” will always need a place to go – you just need to find the right place.

Ensure your business logic is defined only once in the entire flow. If the complex transformation code that defines what “customer” means lives in a single place, then when that business logic inevitably changes, you’re guaranteed the updated definition flows the same way to your BI users as it does to your ERP and CRM. When data science and machine learning enter the mix, you’ll also avoid time spent in data prep and instead focus on delivering predictive insights.
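A minimal sketch of what “defined only once” can look like in Snowflake: the business definition of a customer lives in a single view over the raw, loaded data, and every downstream consumer (BI, ERP sync, data science) reads from that view. The schema, table, and column names here are assumptions.

```sql
-- One place where "customer" is defined after the raw data has been loaded (ELT).
-- Downstream consumers query this view instead of re-implementing the logic.
CREATE OR REPLACE VIEW analytics.dim_customer AS
SELECT
    sf.account_id                                   AS customer_id,
    COALESCE(sf.account_name, erp.customer_name)    AS customer_name,
    ga.client_id                                    AS web_client_id,
    erp.customer_since_date
FROM raw.salesforce_accounts  AS sf
LEFT JOIN raw.erp_customers   AS erp
    ON sf.erp_customer_number = erp.customer_number
LEFT JOIN raw.ga_clients      AS ga
    ON sf.account_id = ga.crm_account_id
WHERE sf.is_deleted = FALSE;    -- business rule: exclude soft-deleted accounts
```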

You might be thinking that this all sounds even more similar to the data warehouse you’ve already built and are trying to replace. There’s some good news: Snowflake does make this easier, and ELT is still exactly the right approach.

Defining and Adjusting the Business Logic and Views

Snowflake enables an iterative process of data discovery, proof-of-concept, business value, and long-term implementation.

Perhaps you’ve defined a sales hierarchy and a salesperson compensation metric. The developer can take that logic, put it into SQL against the raw data, and refresh the dashboard, all while the business user is sitting next to them. Is the metric not quite what the user expected, or is the hierarchy missing something they hadn’t thought of in advance? Tweak the SQL in Snowflake and refresh. Iterate like this until the user is happy and signs off, excited to start using the new dashboard in their daily routine.

By confirming the business logic in the salesperson compensation example above, you’ve removed a major part of what made ETL so painful in the past: developing, waiting for a load to finish, and showing business users. That gap between load finishing and the next development cycle is a considerable amount of lost time and money. With this approach, however, you’ve confirmed the business logic is correct and you have the SQL already written in Snowflake’s data discovery views.

Developing your initial logic in views in Snowflake’s data discovery layer allows you to validate and “certify” it for implementation into the physical model. When you’ve completed the physical path, you can change the BI tool for each completed subject area to point to the physical layer instead of the data discovery layer.
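In practice, “certifying” a discovery view into the physical layer can be as simple as materializing the validated SQL into a table and repointing the BI tool. A hedged sketch, with illustrative schema and column names:

```sql
-- Step 1: validated logic lives in the data discovery layer as a view.
CREATE OR REPLACE VIEW discovery.sales_comp AS
SELECT salesperson_id, region, SUM(commission_amount) AS total_commission
FROM raw.commissions
GROUP BY salesperson_id, region;

-- Step 2: once the business signs off, materialize the logic into the physical layer
-- and point the BI tool at analytics.sales_comp instead of the discovery view.
CREATE OR REPLACE TABLE analytics.sales_comp AS
SELECT * FROM discovery.sales_comp;
```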

If you have any questions about data strategy and governance, or if you want to learn more about how Snowflake can fit into your organization, contact us today.

This blog originally appeared as a section of our eBook, “Snowflake Deployment Best Practices: A CTO’s Guide to a Modern Data Platform.” Click here to download the full eBook.

Related Content:

What is Snowflake, How is it Different, and Where Does it Fit in Your Ecosystem?

How to Build a Data Warehouse in 6-8 Weeks

Methods to Implement a Snowflake Project


Get to Know ALTR: Optimizing Data Consumption Governance

In this age of ever-expanding data security challenges, which have only increased with the mass move to remote workforces, data-centric organizations need to access data easily but securely. Enter ALTR: a cloud-native platform delivering Data Security as a Service (DSaaS) and helping companies optimize data consumption governance.

Not sure you need another tool in your toolkit? We’ll dive into ALTR’s benefits so you can see for yourself how this platform can help you get ahead of the next changes in data security, simplify processes and enterprise collaboration, and maximize your technology capabilities, all while staying in control of your budget.

How Does ALTR Work?

With ALTR, you’re able to track data consumption patterns and limit how much data can be consumed. Even better, it’s simple to implement, immediately adds value, and is easily scalable. You’ll be able to see data consumption patterns from day one and optimize your analytics while keeping your data secure.

ALTR delivers security across three key stages:

  • Observe – ALTR’s DSaaS platform offers critical visibility into your organization’s data consumption, including an audit record for each request for data. Observability is especially critical as you determine new levels of operational risk in today’s largely remote world.
  • Detect and Respond – You can use ALTR’s observability to understand typical data consumption for your organization and then determine areas of risk. With that baseline, you’re able to create highly specific data consumption policies. ALTR’s cloud-based policy engine then analyzes data requests to prevent security incidents in real time.
  • Protect – ALTR can tokenize data at its inception to secure data throughout its lifecycle. This ensures adherence to your governance policies. Plus, ALTR’s data consumption reporting can minimize existing compliance scope by assuring auditors that your policies are solid.

What Other Benefits Does ALTR Offer?

ALTR offers various integrations to enhance your data consumption governance:

  • Share data consumption records and security events with your favorite security information and event management (SIEM) software.
  • View securely shared data consumption information in Snowflake.
  • Analyze data consumption patterns in Domo.

ALTR delivers undeniable value through seamless integration with technologies like these, which you may already have in place; paired with the right consultant, the ROI is even more immediate. ALTR may be new to you, but an expert data analytics consulting firm like 2nd Watch is always investigating new technologies and can ease the implementation process. (And if you need more convincing, ALTR was selected as a finalist for Bank Director’s 2020 Best of FinXTech Awards.)

Dedicated consultants can more quickly integrate ALTR into your organization while your staff stays on top of daily operations. Consultants can then put the power in the hands of your business users to run their own reports, analyze data, and make data-driven decisions. Secure in the knowledge your data is protected, you can encourage innovation by granting more access to data when needed.

As a tech-agnostic company, 2nd Watch helps you find the right tools for your specific needs. Our consultants have a vast range of product expertise to make the most of the technology investments you’ve already made, to implement new solutions to improve your team’s function, and to ultimately help you compete with the companies of tomorrow. Reach out to us directly to find out if ALTR, or another DSaaS platform, could be right for your organization.


Data Clean Rooms: Share Your Corporate Data Fearlessly

Data sharing has become more complex, both in its application and our relationship to it. There is a tension between the need for personalization and the need for privacy. Businesses must share data to be effective and ultimately provide tailored customer experiences. However, legislation and practices regarding data privacy have tightened, and data sharing is tougher and fraught with greater compliance constraints than ever before. The challenge for enterprises is reconciling the increased demand for data with increased data protection.

The modern world runs on data. Companies share data to facilitate their daily operations. Data distribution occurs between business departments and external third parties. Even something as innocuous as exchanging Microsoft Excel and Google Sheets spreadsheets is data sharing!

Data collaboration is entrenched in our business processes. Therefore, rather than avoiding it, we must find the tools and frameworks to support secure and privacy-compliant data sharing. So how do we govern the flow of sensitive information from our data platforms to other parties?

The answer: data clean rooms. Data clean rooms are the modern vehicle for various data sharing and data governance workflows. Across industries – including media and entertainment, advertising, insurance, private equity, and more – a data clean room can be the difference-maker in your data insights.

Ready to get started with a data clean room solution? Schedule time to talk with a 2nd Watch data expert.

What is a data clean room?

There is a classic thought experiment wherein two millionaires want to find out who is richer without actually sharing how much money they are individually worth. The data clean room solves this issue by allowing parties to ask approved questions, which require external data to answer, without actually sharing the sensitive information itself!

In other words, a data clean room is a framework that allows two parties to securely share and analyze data by granting both parties control over when, where, and how said data is used. The parties involved can pool together data in a secure environment that protects private details. With data clean rooms, brands can access crucial and much-needed information while maintaining compliance with data privacy policies.

Data clean rooms have been around for about five years, with Google being the first company to launch a data clean room solution (Google Ads Data Hub) in 2017. The era of user privacy kicked off in 2018 when data protection and privacy became law, most notably with the General Data Protection Regulation (GDPR).

This was a huge shake-up for most brands. Businesses had to adapt their data collection and sharing models to operate within the scope of the new legislation and the walled gardens that became popular amongst all tech giants. With user privacy becoming a priority, data sharing has become stricter and more scrutinized, which makes marketing campaign measurements and optimizations in the customer journey more difficult than ever before.

Data clean rooms are crucial for brands navigating the era of consumer protection and privacy. Brands can still gain meaningful marketing insights and operate within data privacy laws in a data clean room.

Data clean rooms work because the parties involved have full control over their data. Each party agrees upon access, availability, and data usage, while a trusted data clean room offering oversees data governance. This yields the secure framework needed to ensure that one party cannot access the other’s data and upholds the foundational rule that individual- or user-level data cannot be shared between parties without consent.

Personally identifiable information (PII) remains anonymized and is processed and stored in a way that is not exposed to any of the parties involved. Thus, data sharing within a data clean room complies with privacy regulations such as the GDPR and the California Consumer Privacy Act (CCPA).

How does a data clean room work?

Let’s take a deeper dive into the functionality of a data clean room. Four components are involved with a data clean room:

#1 – Data ingestion
Data is funneled into the data clean room. This can be first-party data (generated from websites, applications, CRMs, etc.) or second-party data from collaborating parties (such as ad networks, partners, publishers, etc.).

#2 – Connection and enrichment
The ingested data sets are matched at the user level. Tools like third-party data enrichment complement the data sets.

#3 – Analytics
The data is analyzed for intersections/overlaps, measurement and attribution, and propensity scoring. Data will only be shared where the data points intersect between the two parties.

#4 – Application
Once the data has finished its data clean room journey, each party receives aggregated data outputs. These outputs provide the business insights needed to accomplish crucial tasks such as optimizing the customer experience, performing reach and frequency measurements, building effective cross-platform journeys, and conducting deep marketing campaign analyses.
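As a simplified illustration of the analytics and application steps, the query below sketches an overlap analysis between two parties’ hashed customer lists, returning only aggregated counts above a privacy threshold. Real clean room products enforce this through approved query templates; the table names and the threshold here are assumptions.

```sql
-- Illustrative overlap analysis: only aggregate counts leave the clean room,
-- and small segments are suppressed to protect individual users.
SELECT
    a.customer_segment,
    COUNT(DISTINCT a.hashed_email) AS overlapping_users
FROM brand_a_customers   AS a
JOIN brand_b_customers   AS b
    ON a.hashed_email = b.hashed_email        -- match on a shared, hashed identifier
GROUP BY a.customer_segment
HAVING COUNT(DISTINCT a.hashed_email) >= 50;  -- aggregation threshold (assumed)
```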

What are the benefits of a data clean room?

Data clean rooms can benefit businesses in any industry, including media, retail, and advertising. In summary, data clean rooms are beneficial for the following reasons:

You can enrich your partner’s data set.
With data clean rooms, you can collaborate with your partners to produce and consume data regarding overlapping customers. You can pool common customer data with your partners, find the intersection between your business and your partners, and share the data upstream without sharing sensitive information with competitors. An example would be sharing demand and sales information with an advertising partner for better-targeted marketing campaigns.

You can create governance within your enterprise.
Data clean rooms provide the framework to achieve the elusive “single source of truth.” You can create a golden record encompassing all the data in every system of record within your organization. This includes sensitive PII such as social security numbers, passport numbers, financial account numbers, transactional data, etc.

You can remain policy compliant.
In a data clean room environment, you can monitor where the data lives, who has access to it, and how it is used. Think of it as an automated middleman that validates requests for data. This allows you to share data and remain compliant with all the important acronyms: GDPR, HIPAA, CCPA, FCRA, ECPA, etc.

But you have to do it right…

With every data security and analytics initiative, there is a set of risks if the implementation is not done correctly. A truly “clean” data clean room will allow you to unlock data for your users while remaining privacy compliant. You can maintain role-based access, tokenized columns, and row-level security – which typically lock down particular data objects – and share these sensitive data sets quickly and in a governed way. Data clean rooms satisfy the need for efficient access and the need for the data producer to limit the consumer to relevant information for their use case.

Of course, there are consequences if your data clean room is actually “dirty.” Your data must be federated, and you need clarity on how your data is stored; otherwise, you risk:

  • Loss of customer trust
  • Fines from government agencies
  • Inadvertently oversharing proprietary information
  • Locking out valuable data requests due to a lack of process

Despite the potential risks of utilizing a data clean room, it is the most promising solution to the challenges of data-sharing in a privacy-compliant way.

Conclusion

To get the most out of your data, your business needs to create secure processes to share data and decentralize your analytics. This means pooling together common data with your partners and distributing the work to create value for all parties involved.

However, you must govern your data. It is imperative to treat your data like an asset, especially in the era of user privacy and data protection. With data clean rooms, you can reconcile the need for data collaboration with the need for data ownership and privacy.

2nd Watch can be your data clean room guide, helping you to establish a data mesh that enables sharing and analyzing distributed pools of data, all while maintaining centralized governance. Schedule time to get started with a data clean room.

Fred Bliss – CTO Data Insights 2nd Watch 


3 Ways Poor Data Governance Holds Your Healthcare Organization Back

The widespread adoption of the value-based care model is encouraging more healthcare organizations to revisit the management of their data. Increased emphasis on the quality of service, elevating care outcomes along the way, means that organizations depend more than ever on consistent, accessible, and high-quality data.

The problem is that the current state of data management is inconsistent and disorganized. Less than half of healthcare CIOs trust the current quality of their clinical, operational, and financial data. In turn, the low credibility of their data sources calls into question their reporting and analytics, which ripples outward, inhibiting the entirety of their decision-making. Clinical diagnoses, operational assessments, insurance policy designs, and patient/member satisfaction reports all suffer under poor data governance.

Fortunately, most healthcare organizations can take straightforward steps to improve their data governance – if they are aware of what’s hindering their reporting and analytics. With that goal in mind, here are some of the most common challenges and oversights for data governance and what your organization can do to overcome them.

Data Silos

Most healthcare organizations are now aware of the idea of data silos. As a whole, the industry has made commendable progress breaking down these barriers and unifying large swaths of raw data into centralized repositories. Yet the ongoing addition of new data sources can lead to the return of analytical blind spots if your organization doesn’t create permanent protocols to prevent them.

Consider this situation: Your billing department just implemented a live chat feature on your website or app, providing automated answers to a variety of patient or member questions. If there is not an established protocol automatically integrating data from these interactions into your unified view, then you’ll miss valuable pieces of each patient or member’s overall story. The lack of data might result in missed opportunities for outreach campaigns or even expanded services.

Adding any new technology (e.g., live chat, healthcare diagnostic devices, virtual assistants) creates a potential threat to the comprehensiveness of your insights. Yet by creating a data pipeline and a data-centric culture, you can prevent data siloing from reasserting itself. Remember that your data ecosystem is dynamic, and your data governance practices should be too.

Lack of Uniformity

None of the data within a healthcare organization exists in a vacuum. Even if the data within your EHR or medical practice management (MPM) software is held to the highest quality standards, a lack of consistency between these or other platforms can diminish the overall accuracy of analytics. Worst of all, this absence of standardization can impact your organization in a number of ways.

When most people think of inconsistencies, it probably relates to the accuracy of the data itself. There are the obviously harmful clinical inconsistencies (e.g., a pathology report indicates cancerous cells are acute while a clinical report labels them chronic) and less glaring but damaging organizational inconsistencies (e.g., two or more different contact numbers that hamper communication). In these examples and others, data inaccuracies muddy the waters and impair the credibility of your analytics. The other issue is more subtle, sneaking under the radar: mismatched vocabulary, terminology, or representations.

Here’s an example. Let’s say a healthcare provider is trying to analyze data from two different sources, their MPM and their EHR. Both deal with patient demographics, but might have different definitions of what constitutes their demographics. Their age brackets might vary (one might set a limit at ages 18 to 29 and another might draw the line at 18 to 35), which can prevent seamless integration for demographic analysis. Though less harmful, this lack of uniformity can curtail the ability of departments to have a common understanding and derive meaningful business intelligence from their shared data.

In all of the above instances, establishing a single source of truth with standardized information and terminology is essential if you’re going to extract accurate and meaningful insights during your analyses.

To combat these problems, your organization needs to decide upon a standardized representation of the core data entities that cause trouble during analysis. Then, rather than cleansing the data in each source system, you can use an ELT process to extract and load structured and unstructured data into a centralized repository. Once the data has been centralized, you can evaluate it for inaccuracies, standardize it by applying your data governance rules, and finally normalize it so your organization can analyze it with greater uniformity.
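A minimal sketch of that standardization step, assuming EHR and MPM patient data have already been loaded into a raw schema; the age-bracket rule and all object names are illustrative.

```sql
-- Standardize demographics from two source systems into one conformed view
-- after the raw data has been loaded (ELT).
CREATE OR REPLACE VIEW conformed.patient_demographics AS
SELECT
    patient_id,
    'EHR' AS source_system,
    CASE
        WHEN age BETWEEN 18 AND 29 THEN '18-29'
        WHEN age BETWEEN 30 AND 49 THEN '30-49'
        ELSE '50+'
    END AS age_bracket          -- one agreed-upon bracket definition for everyone
FROM raw.ehr_patients
UNION ALL
SELECT
    patient_id,
    'MPM' AS source_system,
    CASE
        WHEN age BETWEEN 18 AND 29 THEN '18-29'
        WHEN age BETWEEN 30 AND 49 THEN '30-49'
        ELSE '50+'
    END AS age_bracket
FROM raw.mpm_patients;
```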

Poor Accessibility

Even when your data is high-quality and consistent, your organization might still fall short of data governance best practices. The reason why? The accessibility of your data might not thread the needle between HIPAA compliance and appropriate end-user authorization.

Some organizations, dedicated to protecting the protected health information (PHI) of their patients or members, clip their own wings when the time comes to analyze data. In an attempt to avoid expensive HIPAA violations, they restrict stakeholders, analysts, or other covered entities from accessing the data. Though it’s essential to remain HIPAA compliant, data analysis can be conducted in ways that safeguard PHI while also improving treatment quality or reducing the cost of care.

Your organization can de-identify records (removing names, geographic indicators, contact info, social security numbers, etc.) in a specific data warehouse. Presenting scrubbed files to authorized users can help them gain a wide range of insights that can transform care outcomes, reduce patient outmigration, reduce waste, and more.
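One hedged way to present those scrubbed files is a de-identified view that drops or hashes direct identifiers before analysts ever see the data; the schema and column names below are illustrative.

```sql
-- De-identified view for analysts: direct identifiers are removed or hashed,
-- while clinical fields needed for analysis are kept.
CREATE OR REPLACE VIEW analytics.encounters_deidentified AS
SELECT
    SHA2(patient_id)            AS patient_key,     -- pseudonym; consider salting in practice
    encounter_date,
    diagnosis_code,
    procedure_code,
    total_charge,
    LEFT(zip_code, 3)           AS zip3             -- coarsened geography
FROM phi.encounters;
-- Names, phone numbers, SSNs, and full addresses are simply never selected.
```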

Elevating Your Overall Data Governance

With all of these challenges in sight, it’s easy to get overwhelmed about the next steps. Though we’ve provided some actions your organization can utilize, it’s important to recognize effective data governance is as much a change in your mindset as it is a series of best practices. Here are some additional considerations to keep in mind as you work to improve your data governance:

You Need a Defined Data Governance Strategy.

An ad hoc approach to data governance will fail in the long run. There needs to be agreement among your data stakeholders about data availability, consistency, and quality. Often, it helps to start with a pilot project on a single line of business or department to ensure that all of the kinks of the transition are ironed out before your data governance strategy is taken enterprise-wide.

Even then, compromise between standardization and distributed action is important so users within your organization are following the same best practices as they conduct dispersed analytics.

Your Culture Likely Needs to Change.

Eliminating data inconsistencies or adjusting inaccuracies are temporary fixes if only your executives are committed to making a change. Employees across your organization need to embrace the ideals of effective data governance if your organization is going to gain useful and accurate intelligence from your data.

Is your organization suffering from poor data governance? Find out the ways you can improve your data management by scheduling a whiteboard session with a member of the 2nd Watch team.

Jim Anfield – Principal, Healthcare Practice Leader 2nd Watch


Snowflake’s Role in Data Governance for Insurance: Data Masking and Object Tagging Features

Data governance is a broad-ranging discipline that affects everyone in an organization, whether directly or indirectly. It is most often employed to improve and consistently manage data through deduplication and standardization, among other activities, and can have a significant and sustained effect on reducing operational costs, increasing sales, or both.

Data governance can also be part of a more extensive master data management (MDM) program. The MDM program an organization chooses and how they implement it depends on the issues they face and both their short- and long-term visions.

For example, in the insurance industry, many companies sell various types of insurance policies renewing annually over a number of years, such as industrial property coverages and workers’ compensation casualty coverages. Two sets of underwriters will more than likely underwrite the business. Having two sets of underwriters using data systems specific to their lines of business is an advantage when meeting the coverage needs of their customers but often becomes a disadvantage when considering all of the data — but it doesn’t have to be.

The disadvantage arises when an agent or account executive needs to know the overall status of a client, including long-term profitability during all the years of coverage. This involves pulling data from policy systems, claims systems, and customer support systems. An analyst may be tasked with producing a client report for the agent or account executive to truly understand their client and make better decisions on both the client and company’s behalf. But the analyst may not know where the data is stored, who owns the data, or how to link clients across disparate systems.

Fifteen years ago, this task was very time-consuming and even five years ago was still quite cumbersome. Today, however, this issue can be mitigated with the correct data governance plan. We will go deeper into data governance and MDM in upcoming posts; but for this one, we want to show you how innovators like Snowflake are helping the cause.

What is data governance?

Data governance ensures that data is consistent, accurate, and reliable, which allows for informed and effective decision-making. This can be achieved by centralizing the data into one location from few or many siloed locations. Ensuring that data is accessible in one location enables data users to understand and analyze the data to make effective decisions. One way to accomplish this centralization of data is to implement the Snowflake Data Cloud.

Snowflake not only enables a company to store their data inexpensively and query the data for analytics, but it can foster data governance. Dynamic data masking and object tagging are two new features from Snowflake that can supplement a company’s data governance initiative.

What is dynamic data masking?

Dynamic data masking is a Snowflake security feature that selectively masks plain-text data in table or view columns based on predefined masking policies. The purpose of masking, or hiding data in specific columns, is to ensure that data is accessed on a need-to-know basis. This kind of data is most likely sensitive and doesn’t need to be accessed by every user.

When is dynamic data masking used?

Data masking is usually implemented to protect personally identifiable information (PII), such as a person’s social security number, phone number, home address, or date of birth. An insurance company would likely want to reduce risk by hiding data pertaining to sensitive information if they don’t believe access to the data is necessary for conducting analysis.

However, data masking can also be used for non-production environments where testing needs to be conducted on an application. The users testing the environment wouldn’t need to know specific data if their role is just to test the environment and application. Additionally, data masking may be used to adhere to compliance requirements like HIPAA.
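For reference, here is a small sketch of what Snowflake’s masking policy syntax looks like when applied to a hypothetical policyholder table. The role and object names are illustrative, and availability of these governance features depends on your Snowflake edition, so check the documentation.

```sql
-- Define a masking policy: only a privileged role sees the real value.
CREATE OR REPLACE MASKING POLICY ssn_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('COMPLIANCE_ADMIN') THEN val
        ELSE 'XXX-XX-' || RIGHT(val, 4)       -- partial mask for everyone else
    END;

-- Attach the policy to the sensitive column.
ALTER TABLE policyholders MODIFY COLUMN ssn SET MASKING POLICY ssn_mask;
```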

What is object tagging?

Another resource for data governance within Snowflake is object tagging. Object tagging enables data stewards to track sensitive data for compliance and discovery, as well as grouping desired objects such as warehouses, databases, tables or views, and columns.

When a tag is created for a table, view, or column, data stewards can determine if the data should be fully masked, partially masked, or unmasked. When tags are associated with a warehouse, a user with the tag role can view the resource usage of the warehouse to determine what, when, and how this object is being utilized.

When is object tagging used?

There are several instances where object tagging can be useful. One would be tagging a column as “PII” and adding text to describe the type of PII data located there; another would be creating a tag for a warehouse dedicated to the sales department, enabling you to track usage and deduce why a specific warehouse is being used.
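A brief sketch of the tagging syntax with illustrative object names; the tag names and allowed values are whatever your governance team defines, and as with masking, feature availability depends on your Snowflake edition.

```sql
-- Create a tag and attach it to a sensitive column (names are illustrative).
CREATE TAG IF NOT EXISTS pii_type COMMENT = 'Describes the kind of PII stored in the tagged object';

ALTER TABLE customers MODIFY COLUMN ssn
    SET TAG pii_type = 'social_security_number';

-- Tag a warehouse so its usage can be tracked by department.
CREATE TAG IF NOT EXISTS cost_center;
ALTER WAREHOUSE sales_wh SET TAG cost_center = 'sales';
```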

Where can data governance be applied?

Data governance applies to many industries that maintain a vast amount of data from their systems, including healthcare, supply chain and logistics, and insurance; and an effective data governance strategy may use data masking and object tagging in conjunction with each other.

As previously mentioned, one common use case for data masking is for insurance customers’ PII. Normally, analysts wouldn’t need to analyze the personal information of a customer to uncover useful information leading to key business decisions. Therefore, the administrator would be able to mask columns for the customer’s name, phone number, address, social security number, and account number without interfering with analysis.

Object tagging is also valuable within the insurance industry as there is such a vast amount of data collected and consumed. A strong percentage of that data is sensitive information. Because there is so much data and it can be difficult to track those individual pieces of information, Snowflake’s object tagging feature can help with identifying and tracking the usage of those sensitive values for the business user.

Using dynamic data masking and object tagging together, you will be able to gain insight into where your sensitive data lives and how heavily specific warehouses, tables, or columns are being used.

Think back to the situation we mentioned earlier, where the property coverage sales department is on legacy system X while the workers’ compensation sales department is on legacy system Y. How are you supposed to create a report to understand the profitability of these two departments?

One option is to use Snowflake to store all of the data from both legacy systems. Once the information is in the Snowflake environment, object tagging would allow you to tag the databases or tables that involve data about their respective departments. One tag can be specified for property coverage and another tag can be set for workers’ compensation data. When you’re tasked with creating a report of profitability involving these two departments, you can easily identify which information can be used. Because the tag was applied to the database, it will also be applied to all of the tables and their respective columns. You would be able to understand what columns are being used. After the data from both departments is accessible within Snowflake, data masking can then be used to ensure that the new data is only truly accessible to those who need it.

This was just a small introduction to data governance and the new features that Snowflake has available to enable this effort. Don’t forget that this data governance effort can be a part of a larger, more intricate MDM initiative. In other blog posts, we touch more on MDM and other data governance capabilities to maintain and standardize your data, helping you make the most accurate and beneficial business decisions. If you have any questions in the meantime, feel free to get in touch.


The Critical Role of Data Governance in the Insurance Industry

Insurers are privy to large amounts of data, including personally identifying information. Your business requires you to store information about your policyholders and your employees, putting lots of people at risk if your data isn’t well-secured.

However, data governance in insurance goes beyond insurance data security. An enterprise-wide data governance strategy ensures data is consistent, accurate, and reliable, allowing for informed and effective decision-making.

If you aren’t convinced that your insurance data standards need a second look, read on to learn about the impact data governance has on insurance, the challenges you may face, and how to develop and implement a data governance strategy for your organization.

Why Data Governance Is Critical in the Insurance Industry

As previously mentioned, insurance organizations handle a lot of data; and the amount of data you’re storing likely grows day by day. Data is often siloed as it comes in, making it difficult to use at an enterprise level. With growing regulatory compliance concerns – such as the impact of the EU’s General Data Protection Regulation (GDPR) in insurance and other regulations stateside – as well as customer demands and competitive pressure, data governance can’t be ignored.

Having quality, actionable data is a crucial competitive advantage in today’s insurance industry. If your company lacks a “single source of truth” in your data, you’ll have trouble accurately defining key performance indicators, efficiently and confidently making business decisions, and using your data to increase profitability and lower your business risks.

Data Governance Challenges in Insurance

Data governance is critical in insurance, but it isn’t without its challenges. While these data governance challenges aren’t insurmountable, they’re important to keep in mind:

  • Many insurers lack the people, processes, and technology to properly manage their data in-house.
  • As the amount of data you collect grows and new technologies emerge, insurance data governance becomes increasingly complicated – but also increasingly critical.
  • New regulatory challenges require new data governance strategies or at least a fresh look at your existing plan. Data governance isn’t a “one-and-done” pursuit.
  • Insurance data governance efforts require cross-company collaboration. Data governance isn’t effective when data is siloed within your product lines or internal departments.
  • Proper data governance may require investments you didn’t budget for, and red tape can be difficult to overcome, but embarking on a data governance project sooner rather than later will only benefit you.

How to Create and Implement a Data Governance Plan

Creating a data governance plan can be overwhelming, especially when you take regulatory and auditing concerns into account. Working with a company like 2nd Watch can take some of the pressure off as our expert team members have experience crafting and implementing data management strategies customized to our clients’ situations.

Regardless of whether you work with a data consulting firm or go it alone, the process should start with a review of the current state of data governance in your organization and a determination of your needs. 2nd Watch’s data consultants can help with a variety of data governance needs, including data governance strategy; master data management; data profiling, cleansing, and standardization; and data security.

The next step is to decide who will have ultimate responsibility for your data governance program. 2nd Watch can help you establish a data governance council and program, working with you to define roles and responsibilities and then create and document policies, processes, and standards.

Finally, through the use of technologies chosen for your particular situation, 2nd Watch can help automate your chosen processes to improve your data governance maturity level and facilitate the ongoing effectiveness of your data governance program.

If you’re interested in discussing how insurance data governance could benefit your organization, get in touch with a 2nd Watch data consultant for a no-cost, no-risk dialogue.