Data mesh was once an abstract concept, but thanks to modern technology it has become a viable and accessible data management approach for enterprises. The framework offers a decentralized, domain-driven data platform architecture that empowers organizations to leverage their data assets more efficiently and effectively.
In this article, we’ll dive deeper into data mesh by exploring how it works, understanding its use cases, and differentiating it from traditional data management approaches, such as data lakes.
What is Data Mesh?
Data mesh is an emerging approach to data management: a data platform architecture that capitalizes on the abundance of data within the enterprise through a domain-oriented, self-serve design. Traditionally, organizations have relied on a centralized data architecture, such as a data lake, but data mesh advocates for a decentralized approach in which data is organized into domain-oriented data products managed by domain teams. This model breaks down silos, empowering domain teams to take ownership of their data, collaborate efficiently, and ultimately drive innovation.
There are four core principles of data mesh architecture:
Domain Ownership: Domain teams own their data, making business units responsible for building and maintaining their own data products.
Self-Service Architecture: Data mesh provides tools and capabilities that abstract away the complexity of building data products.
Data Products: Data mesh facilitates interoperability, trust, and discoverability of data products.
Federated Governance: Data mesh allows policies to be applied to data products at both global and local levels.
These principles make data mesh an intriguing prospect for industries like financial services, retail, and legal. Organizations in these industries contend with significant data challenges: massive data volumes, highly siloed data, and strict compliance requirements. Any company facing these challenges needs an approach that creates flexibility and cohesion across its entire data ecosystem.
The Benefits of Data Mesh
Data mesh supports a domain-specific, distributed data architecture that treats data as a product, with each domain handling its own data pipelines. This federates data ownership among domain teams, who are accountable for delivering their data as products and for facilitating communication between data sets distributed across different locations.
Within this domain-driven process, the central infrastructure provides the solutions domains need to process data effectively. Domains are tasked with managing, ingesting, cleaning, and aggregating data to generate assets that business intelligence applications can consume. Each domain owns its ETL pipelines, which move data from source systems into the domain's data store; once that work is complete, domain owners can use the data for the analytics and operational needs of the enterprise.
Data mesh's self-serve functionality reduces technical complexity so that domains can focus on their individual use cases for the data they collect. Data infrastructure capabilities, such as data pipeline engines, are abstracted into a central platform, while domains remain responsible for using those components to run their own ETL pipelines. This gives domains the support they need to serve data efficiently and the autonomy to own every step of the process.
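To make that division of responsibility concrete, here is a minimal sketch in Python. It assumes a hypothetical central platform that exposes a generic pipeline runner, while the domain team supplies only its own extract, transform, and load logic; the function and table names are illustrative, not part of any specific product.

```python
from typing import Callable, Iterable

# Hypothetical central platform capability: a generic pipeline runner that the
# platform team owns and every domain reuses (names are illustrative).
def run_pipeline(extract: Callable[[], Iterable[dict]],
                 transform: Callable[[dict], dict],
                 load: Callable[[list], None]) -> None:
    raw = list(extract())                      # pull records from the source system
    cleaned = [transform(row) for row in raw]  # domain-specific cleaning/aggregation
    load(cleaned)                              # write the data product to storage

# Domain-owned logic: the sales domain decides what its data product looks like.
def extract_sales() -> Iterable[dict]:
    # In practice this would read from the domain's operational system.
    return [{"order_id": 1, "amount": "120.50", "region": " East "}]

def transform_sales(row: dict) -> dict:
    return {"order_id": row["order_id"],
            "amount": float(row["amount"]),
            "region": row["region"].strip().lower()}

def load_sales(rows: list) -> None:
    print(f"loading {len(rows)} rows into the sales data product")

run_pipeline(extract_sales, transform_sales, load_sales)
```

The platform team maintains the runner once; each domain plugs in its own logic without rebuilding shared infrastructure.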
Additionally, a universal set of standards across domains helps facilitate collaboration when necessary. Data mesh standardizes formatting, governance, discoverability, and metadata fields, enabling cross-domain collaboration. With this interoperability and standardized communication, data mesh overcomes the ungovernability of data lakes and the bottlenecks that monolithic data warehouses can present.
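The cross-domain standards described above can be expressed as a shared data product contract. The sketch below, with hypothetical field names, shows one way a platform team might standardize the metadata every domain publishes so that products remain discoverable and interoperable.

```python
from dataclasses import dataclass, field

@dataclass
class DataProductMetadata:
    """Metadata fields every domain publishes, regardless of its internal tooling."""
    name: str                     # discoverable product name, e.g. "sales.orders"
    owner_domain: str             # accountable domain team
    schema_version: str           # formatting/compatibility contract
    classification: str           # governance label, e.g. "internal" or "pii"
    tags: list = field(default_factory=list)  # extra discovery hints

catalog = [
    DataProductMetadata("sales.orders", "sales", "1.2.0", "internal", ["daily"]),
    DataProductMetadata("claims.events", "claims", "0.9.1", "pii"),
]

# Any consumer can discover products the same way across domains.
pii_products = [p.name for p in catalog if p.classification == "pii"]
print(pii_products)  # ['claims.events']
```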
Another benefit of data mesh architecture is that it allows end users to easily access and query data without moving or transforming it beforehand. As data teams take ownership of domain-specific data products, those products stay aligned with business needs. By treating data as a product, organizations can unlock its true value, driving innovation and agility across the enterprise.
Functions of Data Mesh
In a data mesh ecosystem, data products become the building blocks of data consumption. These tailored data solutions cater to the unique requirements of data consumers, allowing them to access domain-specific datasets seamlessly. With self-serve capabilities, data consumers can make data-driven decisions independently, freeing the IT team from repetitive tasks and fostering a culture of data-driven autonomy.
Modern data lake architecture falls short of these benefits: it provides less control as data volumes grow, and it places a heavy load on the central platform as incoming data requires different transformations for different use cases. Data mesh addresses these shortcomings by giving data owners greater autonomy and flexibility, which encourages experimentation and innovation while reducing the burden on central data teams trying to serve every consumer's needs through a single pipeline.
Organizations can create a more efficient and scalable data ecosystem with data mesh architecture. Its method of distributing data ownership and responsibilities to domain-oriented teams fosters data collaboration and empowers data consumers to access and utilize data directly for specific use cases. Adopting an event-driven approach makes real-time data collaboration possible across the enterprise, notifying relevant stakeholders as events occur. The event-driven nature supports seamless integration and synchronization of data between different domains.
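As a rough illustration of the event-driven pattern described above, the sketch below implements a tiny in-process publish/subscribe hub. Real data mesh deployments would typically use a streaming platform or cloud eventing service instead; the topic and handler names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Minimal in-process pub/sub: domains publish events, stakeholders subscribe."""
    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()

# The marketing domain wants to know when the sales data product is refreshed.
bus.subscribe("sales.orders.updated",
              lambda e: print(f"marketing notified: {e['rows']} new rows"))

# The sales domain publishes an event as soon as its pipeline finishes.
bus.publish("sales.orders.updated", {"rows": 1342, "as_of": "2023-06-01"})
```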
DataOps plays a significant role within the data mesh environment, streamlining data pipelines, automating data processing, and ensuring smooth data flow from source to destination. By adopting the principles of this fusion between data engineering and DevOps practices, organizations can accelerate data delivery more effectively, minimize data errors, and optimize the overall data management process. Federated governance becomes a large factor as it unites data teams, business units, and IT departments to manage data assets collaboratively. This further ensures data quality, security, and compliance while empowering domain experts to take ownership of their data. Federated governance ultimately bridges data management and consumption, encouraging data collaboration across the enterprise.
The Difference Between Data Mesh and Data Lakes
The primary differentiator between data mesh and a central data lake is the architecture and data management approach. A data lake is a centralized repository that stores raw, unprocessed data from various sources. Data mesh, by contrast, takes a domain-driven approach in which data is partitioned into domain-specific data products owned and managed by individual domain teams. Data mesh emphasizes decentralization, data observability, and federated governance, allowing greater flexibility, scalability, and collaboration in managing data across the organization.
Data Ownership: Unlike traditional data lake approaches that rely on centralized data storage, data mesh promotes distribution. Data mesh creates domain-specific data lakes where teams manage their data products independently. This distribution enhances data autonomy while reducing the risk of data bottlenecks and scalability challenges.
Data Observability: Data observability is an essential component of data mesh and provides visibility into the performance and behavior of data products. Data teams can monitor, troubleshoot, and optimize their data pipelines effectively. By ensuring transparency, data observability empowers data teams to deliver high-quality data products and enables continuous improvement.
Data mesh is an architecture for analytical data management that enables end users to easily access and query data where it lives, without first transporting it to a data lake or data warehouse. Its self-serve capabilities change how data is consumed: with access to domain-specific data products, data scientists can extract insights from rich, decentralized data sources and drive innovation. Analytics takes center stage in value creation within data mesh environments. With domain-specific data readily available, organizations can perform detailed analysis, identify growth opportunities, and optimize operational processes, maximizing the potential of data products and driving better decision-making.
Mesh Without the Mess
Data is at the heart of innovation, and building a data mesh architecture could be crucial to leveling up your enterprise's data strategy. At 2nd Watch, we can help you every step of the way: from assessing the ROI of implementing data mesh to planning and executing the implementation. 2nd Watch's data strategy services will help you drive data-driven insights for the long haul.
Schedule a whiteboard session with the 2nd Watch team, and we can help you weigh all options and make the most fitting decision for you, your business, and your data usage. Start defining your organization’s data strategy today!
In a data-driven world, implementing robust data solutions is essential for organizations to thrive and stay competitive. However, as data becomes increasingly valuable and interconnected, ensuring its security and protection is of the utmost importance. Data breaches and cyber threats can have far-reaching consequences, ranging from financial losses to irreparable damage to an organization’s reputation. Therefore, before embarking on any data solution implementation journey, it’s vital for organizations to ask themselves critical security questions that will lay the groundwork for a secure and trusted data environment.
In this blog post, we’ll explore five fundamental security questions that every organization should address prior to implementing data solutions. By proactively addressing these questions, organizations can fortify their data security measures, protect sensitive information, and establish a robust foundation for the successful implementation of data-driven initiatives.
1. What sensitive data do you possess, and why is it important?
Identify the sensitive data you possess and understand its significance to your organization and objectives. This may require classifying data into categories such as customer information, financial records, intellectual property, or other relevant subject areas. Sensitive data may also include protected health information (PHI), research and development data, or account holder data, depending on the nature of your organization’s operations.
The loss or exposure of such data can lead to severe financial losses, damage to research efforts, and potential legal disputes. By recognizing the importance of your organization’s sensitive data, you can prioritize its protection and allocate appropriate security measures.
2. Who should have access to data, and how will you control it?
Determine who should have access to your sensitive data and consider implementing role-based access control (RBAC) or column-level security so data access is granted based on personnel roles and responsibilities. By carefully managing data access, you can mitigate the risk of internal data breaches and prevent unauthorized exposure of sensitive information. With column-level security on Snowflake, Google BigQuery, or Amazon Redshift, dynamic data masking can be applied to protect sensitive data from unauthorized access as data is queried.
In addition, implementing the principle of least privilege ensures that individuals are granted only the minimum level of access required to perform their specific job functions. Adhering to this principle limits the potential damage caused by compromised accounts or insider threats, as employees only have access to the data necessary for their tasks, reducing the overall attack surface and enhancing data protection.
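To illustrate role-based access with column-level masking, here is a small Python sketch. It is not tied to Snowflake, BigQuery, or Redshift (each has its own native syntax for this); the roles and column names are hypothetical.

```python
# Columns each role may see in clear text; everything else is masked.
ROLE_COLUMN_ACCESS = {
    "analyst": {"customer_id", "region", "order_total"},
    "support": {"customer_id", "email", "region"},
}

def mask_row(row: dict, role: str) -> dict:
    """Return the row with any column the role is not entitled to replaced by a mask."""
    allowed = ROLE_COLUMN_ACCESS.get(role, set())  # least privilege: unknown roles see nothing
    return {col: (val if col in allowed else "***MASKED***") for col, val in row.items()}

record = {"customer_id": 42, "email": "jane@example.com", "ssn": "123-45-6789",
          "region": "midwest", "order_total": 250.0}

print(mask_row(record, "analyst"))
# {'customer_id': 42, 'email': '***MASKED***', 'ssn': '***MASKED***',
#  'region': 'midwest', 'order_total': 250.0}
```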
3. How will you encrypt data to ensure its confidentiality?
Encrypt your data to safeguard it from unauthorized access and theft. Implementing encryption at rest ensures that data stored on servers or devices remains unreadable without the proper decryption keys. Likewise, encryption in transit secures data as it travels over networks, preventing interception by malicious actors. Proper key management and protection are essential to maintaining the confidentiality of encrypted data.
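As a minimal illustration of encryption at rest, the sketch below uses the Python cryptography library's Fernet recipe (symmetric, authenticated encryption). Key management is deliberately simplified; in practice the key would live in a managed key store, never alongside the data.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# In production, generate and store this key in a managed KMS or secrets manager.
key = Fernet.generate_key()
cipher = Fernet(key)

sensitive = b"ssn=123-45-6789;dob=1990-01-01"

# Encryption at rest: only ciphertext is ever written to disk.
ciphertext = cipher.encrypt(sensitive)
with open("customer_record.enc", "wb") as f:
    f.write(ciphertext)

# Reading the data back requires the key; without it the file is unreadable.
with open("customer_record.enc", "rb") as f:
    restored = cipher.decrypt(f.read())
assert restored == sensitive
```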
Snowflake’s Data Cloud platform employs a comprehensive approach to encryption, ensuring that data remains encrypted throughout its entire lifecycle, from the moment it enters the system to the moment it leaves. Snowflake’s end-to-end encryption approach provides organizations with a high level of confidence in the confidentiality and security of their sensitive data every step of the way.
4. Where and how will you securely store the data?
Leading cloud service providers such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) all offer advanced security features like data encryption, multi-factor authentication, and robust access controls. These providers employ industry-leading security practices, including compliance certifications, regular security audits, and continuous monitoring, to safeguard data from a wide range of threats.
5. How will you establish security governance and ensure compliance?
Build a robust security governance framework that will support data security at your organization. Organization leaders and/or data governance boards should define roles and responsibilities, establish security policies, and work to foster a culture of security awareness and data literacy across the organization.
Regular security assessments and audits are essential to identify areas for improvement and address potential weaknesses. Data managers must also stay up to date with industry best practices, maintain comprehensive documentation, and ensure compliance with relevant data protection regulations to preserve a secure and resilient data environment. Furthermore, data retention policies, multi-factor authentication (MFA), and regularly tested incident response plans contribute to the organization’s data security resilience.
Data governance is not a one-time management decision, but rather an ongoing and evolving process that will support an organization’s long-term data strategy. As a result, it’s crucial for leaders to be on board with data initiatives to balance the overhead required for data governance with the size and scope of the organization.
By asking yourself these five crucial security questions, you can develop a comprehensive data security strategy that protects sensitive information and effectively mitigates potential risks. Prioritizing data security in the early stages of implementing data solutions will help you build a solid foundation for a safe and trusted data environment that you can build upon for more advanced data, analytics, and AI ventures.
ALTR is a cloud-native platform delivering Data Security as a Service (DSaaS) and helping companies optimize data consumption governance. In this age of ever-expanding data security challenges, which have only increased with the mass move to remote workforces, data-centric organizations need to access data easily but securely; that is exactly the problem ALTR was built to solve.
Not sure you need another tool in your toolkit? We’ll dive into ALTR’s benefits so you can see for yourself how this platform can help you get ahead of the next changes in data security, simplify processes and enterprise collaboration, and maximize your technology capabilities, all while staying in control of your budget.
How Does ALTR Work?
With ALTR, you’re able to track data consumption patterns and limit how much data can be consumed. Even better, it’s simple to implement, immediately adds value, and is easily scalable. You’ll be able to see data consumption patterns from day one and optimize your analytics while keeping your data secure.
ALTR delivers security across three key stages:
Observe – ALTR’s DSaaS platform offers critical visibility into your organization’s data consumption, including an audit record for each request for data. Observability is especially critical as you determine new levels of operational risk in today’s largely remote world.
Detect and Respond – You can use ALTR’s observability to understand typical data consumption for your organization and then determine areas of risk. With that baseline, you’re able to create highly specific data consumption policies. ALTR’s cloud-based policy engine then analyzes data requests to prevent security incidents in real time.
Protect – ALTR can tokenize data at its inception to secure data throughout its lifecycle. This ensures adherence to your governance policies. Plus, ALTR’s data consumption reporting can minimize existing compliance scope by assuring auditors that your policies are solid.
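To show conceptually what tokenizing data "at its inception" can look like, here is a simplified Python sketch. It is not ALTR's implementation; a vault-style token map stands in for the service, and the field names are hypothetical.

```python
import secrets

class TokenVault:
    """Swap sensitive values for random tokens; only the vault can reverse the mapping."""
    def __init__(self) -> None:
        self._token_to_value = {}

    def tokenize(self, value: str) -> str:
        token = f"tok_{secrets.token_hex(8)}"
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]

vault = TokenVault()

# Tokenize at ingestion so downstream systems only ever see the token.
record = {"name": "Jane Doe", "card_number": vault.tokenize("4111111111111111")}
print(record)  # {'name': 'Jane Doe', 'card_number': 'tok_...'}

# An authorized workflow can resolve the token back to the original value.
print(vault.detokenize(record["card_number"]))  # 4111111111111111
```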
What Other Benefits Does ALTR Offer?
ALTR offers various integrations to enhance your data consumption governance:
Share data consumption records and security events with your favorite security information and event management (SIEM) software.
View securely shared data consumption information in Snowflake.
Analyze data consumption patterns in Domo.
ALTR delivers undeniable value through seamless integration with technologies like these, which you may already have in place; paired with the right consultant, the ROI is even more immediate. ALTR may be new to you, but an expert data analytics consulting firm like 2nd Watch is always investigating new technologies and can ease the implementation process. (And if you need more convincing, ALTR was selected as a finalist for Bank Director’s 2020 Best of FinXTech Awards.)
Dedicated consultants can more quickly integrate ALTR into your organization while your staff stays on top of daily operations. Consultants can then put the power in the hands of your business users to run their own reports, analyze data, and make data-driven decisions. Secure in the knowledge your data is protected, you can encourage innovation by granting more access to data when needed.
Here are some additional benefits of ALTR:
1. Data Security Measures: ALTR employs robust data security measures to ensure the protection of sensitive information. The platform utilizes industry-standard encryption techniques to safeguard data both in transit and at rest. Data access controls are implemented to ensure that only authorized personnel can view or modify data. ALTR also offers data masking capabilities to further anonymize sensitive information, limiting exposure to unauthorized individuals. These security measures work together to provide a comprehensive defense against data breaches and unauthorized access.
2. Scalability and Performance: ALTR is designed to handle large-scale data volumes and growing user demands with ease. The platform is built on a cloud-native architecture that leverages scalable infrastructure to accommodate increasing data consumption needs. It utilizes distributed computing and parallel processing techniques to optimize performance, enabling efficient data analysis and processing. ALTR’s scalability ensures that organizations can seamlessly scale their data consumption governance as their data ecosystem expands, without compromising performance or data security.
3. Integration Methods: ALTR offers a variety of integration methods to seamlessly connect with different data systems, APIs, and data pipelines. The platform provides robust APIs and SDKs that enable organizations to integrate ALTR functionalities directly into their existing data infrastructure. ALTR also supports integration with popular data platforms, such as data lakes, data warehouses, and cloud storage solutions. This allows organizations to leverage their existing data systems and workflows while incorporating ALTR’s data consumption governance capabilities into their processes.
4. User Interface and Ease of Use: ALTR features an intuitive and user-friendly interface that simplifies data consumption governance tasks. The platform provides a visually appealing and customizable dashboard that allows users to monitor data consumption patterns, set consumption limits, and configure policies. The user interface offers easy navigation, streamlined workflows, and comprehensive data visualizations to empower users to effectively manage and optimize their data analytics. ALTR’s focus on usability ensures that organizations can quickly adopt the platform and derive value from its features without extensive training or technical expertise.
5. Compliance and Regulations: ALTR supports various compliance frameworks and regulations, enabling organizations to maintain regulatory compliance while optimizing data consumption governance. The platform incorporates features that align with data protection regulations such as the General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and others. ALTR’s data consumption reporting provides comprehensive insights and auditing capabilities, facilitating compliance audits and assuring auditors of policy adherence. By leveraging ALTR, organizations can confidently navigate the complexities of data compliance and ensure their data governance practices align with industry regulations.
As a tech-agnostic company, 2nd Watch helps you find the right tools for your specific needs. Our consultants have a vast range of product expertise to make the most of the technology investments you’ve already made, to implement new solutions to improve your team’s function, and to ultimately help you compete with the companies of tomorrow. Reach out to us directly to find out if ALTR, or another DSaaS platform, could be right for your organization.
Data sharing has become more complex, both in its application and our relationship to it. There is a tension between the need for personalization and the need for privacy. Businesses must share data to be effective and ultimately provide tailored customer experiences. However, legislation and practices regarding data privacy have tightened, and data sharing is tougher and fraught with greater compliance constraints than ever before. The challenge for enterprises is reconciling the increased demand for data with increased data protection.
The modern world runs on data. Companies share data to facilitate their daily operations. Data distribution occurs between business departments and external third parties. Even something as innocuous as exchanging Microsoft Excel and Google Sheets spreadsheets is data sharing!
Data collaboration is entrenched in our business processes. Therefore, rather than avoiding it, we must find the tools and frameworks to support secure and privacy-compliant data sharing. So how do we govern the flow of sensitive information from our data platforms to other parties?
The answer: data clean rooms. Data clean rooms are the modern vehicle for various data sharing and data governance workflows. Across industries – including media and entertainment, advertising, insurance, private equity, and more – a data clean room can be the difference-maker in your data insights.
There is a classic thought experiment wherein two millionaires want to find out who is richer without actually sharing how much money they are individually worth. The data clean room solves this issue by allowing parties to ask approved questions, which require external data to answer, without actually sharing the sensitive information itself!
In other words, a data clean room is a framework that allows two parties to securely share and analyze data by granting both parties control over when, where, and how said data is used. The parties involved can pool together data in a secure environment that protects private details. With data clean rooms, brands can access crucial and much-needed information while maintaining compliance with data privacy policies.
Data clean rooms have been around for about five years with Google being the first company to launch a data clean room solution (Google Ads Data Hub) in 2017. The era of user privacy kicked off in 2018 when data protection and privacy became law, most notably with the General Data Protection Regulation (GDPR).
This was a huge shake-up for most brands. Businesses had to adapt their data collection and sharing models to operate within the scope of the new legislation and the walled gardens that became popular amongst all tech giants. With user privacy becoming a priority, data sharing has become stricter and more scrutinized, which makes marketing campaign measurements and optimizations in the customer journey more difficult than ever before.
Data clean rooms are crucial for brands navigating the era of consumer protection and privacy. Brands can still gain meaningful marketing insights and operate within data privacy laws in a data clean room.
Data clean rooms work because the parties involved retain full control over their data. Each party agrees on access, availability, and data usage, while a trusted data clean room provider oversees data governance. This yields the secure framework needed to ensure that one party cannot access the other's data, upholding the foundational rule that individual (user-level) data cannot be shared between parties without consent.
Personally identifiable information (PII) remains anonymized and is processed and stored in a way that is never exposed to any of the parties involved. Thus, data sharing within a data clean room complies with privacy regulations such as the GDPR and the California Consumer Privacy Act (CCPA).
How does a data clean room work?
Let’s take a deeper dive into the functionality of a data clean room. Four components are involved with a data clean room:
#1 – Data ingestion
Data is funneled into the data clean room. This can be first-party data (generated from websites, applications, CRMs, etc.) or second-party data from collaborating parties (such as ad networks, partners, publishers, etc.)
#2 – Connection and enrichment
The ingested data sets are matched at the user level. Tools like third-party data enrichment complement the data sets.
#3 – Analytics
The data is analyzed for intersections and overlaps between the parties' data sets, as well as for measurement, attribution, and propensity scoring. Data is only shared where the data points intersect between the two parties (a simplified sketch of this matching step appears after this list).
#4 – Application
Once the data has finished its clean room journey, each party receives aggregated data outputs. These outputs provide the business insights needed to accomplish crucial tasks such as optimizing the customer experience, performing reach and frequency measurements, building effective cross-platform journeys, and conducting deep marketing campaign analyses.
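As a simplified illustration of the matching step in #3, the sketch below hashes each party's customer identifiers and shares only the intersection plus aggregate counts, never the raw records. Production clean rooms add salting, governed query approval, and minimum aggregation thresholds on top of this idea; the data here is invented.

```python
import hashlib

def hashed_ids(emails: set) -> set:
    """Each party hashes its identifiers locally before anything is compared."""
    return {hashlib.sha256(e.strip().lower().encode()).hexdigest() for e in emails}

brand_customers = {"a@example.com", "b@example.com", "c@example.com"}
publisher_audience = {"b@example.com", "c@example.com", "d@example.com"}

overlap = hashed_ids(brand_customers) & hashed_ids(publisher_audience)

# Only aggregated outputs leave the clean room, not user-level rows.
print(f"overlapping users: {len(overlap)}")                                   # 2
print(f"overlap rate for brand: {len(overlap) / len(brand_customers):.0%}")   # 67%
```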
What are the benefits of a data clean room?
Data clean rooms can benefit businesses in any industry, including media, retail, and advertising. In summary, data clean rooms are beneficial for the following reasons:
You can enrich your partner’s data set.
With data clean rooms, you can collaborate with your partners to produce and consume data regarding overlapping customers. You can pool common customer data with your partners, find the intersection between your business and your partners, and share the data upstream without sharing sensitive information with competitors. An example would be sharing demand and sales information with an advertising partner for better-targeted marketing campaigns.
You can create governance within your enterprise.
Data clean rooms provide the framework to achieve the elusive "single source of truth." You can create a golden record encompassing all the data in every system of record within your organization, including sensitive PII such as social security numbers, passport numbers, financial account numbers, and transactional data.
You can remain policy compliant.
In a data clean room environment, you can monitor where the data lives, who has access to it, and how it is used. Think of the clean room as an automated middleman that validates requests for data. This allows you to share data while remaining compliant with all the important acronyms: GDPR, HIPAA, CCPA, FCRA, ECPA, etc.
But you have to do it right…
With every data security and analytics initiative, there is a set of risks if the implementation is not done correctly. A truly “clean” data clean room will allow you to unlock data for your users while remaining privacy compliant. You can maintain role-based access, tokenized columns, and row-level security – which typically lock down particular data objects – and share these sensitive data sets quickly and in a governed way. Data clean rooms satisfy the need for efficient access and the need for the data producer to limit the consumer to relevant information for their use case.
Of course, there are consequences if your data clean room is actually "dirty." Your data must be federated, and you need clarity on how it is stored. If your room is dirty, you risk:
Loss of customer trust
Fines from government agencies
Inadvertently oversharing proprietary information
Locking out valuable data requests due to a lack of process
Despite the potential risks of utilizing a data clean room, it is the most promising solution to the challenges of data-sharing in a privacy-compliant way.
Conclusion
To get the most out of your data, your business needs to create secure processes to share data and decentralize your analytics. This means pooling together common data with your partners and distributing the work to create value for all parties involved.
However, you must govern your data. It is imperative to treat your data like an asset, especially in the era of user privacy and data protection. With data clean rooms, you can reconcile the need for data collaboration with the need for data ownership and privacy.
2nd Watch can be your data clean room guide, helping you to establish a data mesh that enables sharing and analyzing distributed pools of data, all while maintaining centralized governance. Schedule time to get started with a data clean room.
Data governance is a broad-ranging discipline that affects everyone in an organization, whether directly or indirectly. It is most often employed to improve and consistently manage data through deduplication and standardization, among other activities, and can have a significant and sustained effect on reducing operational costs, increasing sales, or both.
Data governance can also be part of a more extensive master data management (MDM) program. The MDM program an organization chooses and how they implement it depends on the issues they face and both their short- and long-term visions.
For example, in the insurance industry, many companies sell various types of policies that renew annually over a number of years, such as industrial property coverage and workers' compensation casualty coverage. Two sets of underwriters will more than likely underwrite this business. Having two sets of underwriters using data systems specific to their lines of business is an advantage when meeting customers' coverage needs, but it often becomes a disadvantage when considering all of the data together, though it doesn't have to be.
The disadvantage arises when an agent or account executive needs to know the overall status of a client, including long-term profitability during all the years of coverage. This involves pulling data from policy systems, claims systems, and customer support systems. An analyst may be tasked with producing a client report for the agent or account executive to truly understand their client and make better decisions on both the client and company’s behalf. But the analyst may not know where the data is stored, who owns the data, or how to link clients across disparate systems.
Fifteen years ago, this task was very time-consuming, and even five years ago it was still quite cumbersome. Today, however, this issue can be mitigated with the correct data governance plan. We will go deeper into data governance and MDM in upcoming posts, but for this one, we want to show you how innovators like Snowflake are helping the cause.
What is data governance?
Data governance ensures that data is consistent, accurate, and reliable, which allows for informed and effective decision-making. This can be achieved by centralizing data from siloed locations into one place. Making data accessible in a single location lets data users understand and analyze it so they can make effective decisions. One way to accomplish this centralization is to implement the Snowflake Data Cloud.
Snowflake not only enables a company to store their data inexpensively and query the data for analytics, but it can foster data governance. Dynamic data masking and object tagging are two new features from Snowflake that can supplement a company’s data governance initiative.
What is dynamic data masking?
Dynamic data masking is a Snowflake security feature that selectively masks plain-text data in table or view columns at query time, based on predefined masking policies. The purpose of masking, or hiding data in specific columns, is to ensure that data is accessed on a need-to-know basis; such data is typically sensitive and shouldn't be accessible to every user.
When is dynamic data masking used?
Data masking is usually implemented to protect personally identifiable information (PII), such as a person’s social security number, phone number, home address, or date of birth. An insurance company would likely want to reduce risk by hiding data pertaining to sensitive information if they don’t believe access to the data is necessary for conducting analysis.
However, data masking can also be used for non-production environments where testing needs to be conducted on an application. The users testing the environment wouldn’t need to know specific data if their role is just to test the environment and application. Additionally, data masking may be used to adhere to compliance requirements like HIPAA.
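As an illustration, the snippet below sketches what a Snowflake masking policy might look like, wrapped as SQL strings in Python. The policy, role, table, and column names are hypothetical; consult Snowflake's documentation for the full syntax before applying anything like this.

```python
# Hypothetical example of applying a Snowflake dynamic data masking policy.
# These statements would be executed through a Snowflake session
# (e.g. a snowflake-connector-python cursor); they are shown here as strings.

create_policy = """
CREATE MASKING POLICY ssn_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('COMPLIANCE_ANALYST') THEN val   -- authorized roles see clear text
    ELSE '***MASKED***'                                      -- everyone else sees a mask
  END;
"""

apply_policy = """
ALTER TABLE customers MODIFY COLUMN ssn
  SET MASKING POLICY ssn_mask;
"""

for statement in (create_policy, apply_policy):
    print(statement.strip())  # in practice: cursor.execute(statement)
```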
What is object tagging?
Another resource for data governance within Snowflake is object tagging. Object tagging enables data stewards to track sensitive data for compliance and discovery, as well as grouping desired objects such as warehouses, databases, tables or views, and columns.
When a tag is created for a table, view, or column, data stewards can determine if the data should be fully masked, partially masked, or unmasked. When tags are associated with a warehouse, a user with the tag role can view the resource usage of the warehouse to determine what, when, and how this object is being utilized.
When is object tagging used?
There are several instances where object tagging can be useful. One is tagging a column as "PII" and adding extra text to describe the type of PII data located there. Another is creating a tag for a warehouse dedicated to the sales department, enabling you to track usage and deduce why a specific warehouse is being used.
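Here is a similarly hedged sketch of Snowflake object tagging, again as SQL strings in Python. The tag, table, column, and warehouse names are invented for illustration; verify the exact syntax against Snowflake's documentation.

```python
# Hypothetical example of creating and applying Snowflake object tags.
tagging_statements = [
    # A tag for classifying sensitive columns, with a note about the PII type.
    "CREATE TAG pii_type COMMENT = 'Type of personally identifiable information';",
    "ALTER TABLE customers MODIFY COLUMN ssn SET TAG pii_type = 'social_security_number';",

    # A tag on a warehouse so the sales department's usage can be tracked.
    "CREATE TAG cost_center;",
    "ALTER WAREHOUSE sales_wh SET TAG cost_center = 'sales';",
]

for statement in tagging_statements:
    print(statement)  # in practice: cursor.execute(statement)
```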
Where can data governance be applied?
Data governance applies to many industries that maintain vast amounts of data in their systems, including healthcare, supply chain and logistics, and insurance. An effective data governance strategy may use data masking and object tagging in conjunction with each other.
As previously mentioned, one common use case for data masking is for insurance customers’ PII. Normally, analysts wouldn’t need to analyze the personal information of a customer to uncover useful information leading to key business decisions. Therefore, the administrator would be able to mask columns for the customer’s name, phone number, address, social security number, and account number without interfering with analysis.
Object tagging is also valuable within the insurance industry as there is such a vast amount of data collected and consumed. A strong percentage of that data is sensitive information. Because there is so much data and it can be difficult to track those individual pieces of information, Snowflake’s object tagging feature can help with identifying and tracking the usage of those sensitive values for the business user.
Using dynamic data masking and object tagging together, you can gain insight into where your sensitive data lives and how heavily specific warehouses, tables, or columns are being used.
Think back to the situation mentioned earlier, where the property coverage sales department is on legacy system X while the workers' compensation sales department is on legacy system Y. How are you supposed to create a report to understand the profitability of these two departments?
One option is to use Snowflake to store all of the data from both legacy systems. Once the information is in the Snowflake environment, object tagging would allow you to tag the databases or tables that involve data about their respective departments. One tag can be specified for property coverage and another tag can be set for workers’ compensation data. When you’re tasked with creating a report of profitability involving these two departments, you can easily identify which information can be used. Because the tag was applied to the database, it will also be applied to all of the tables and their respective columns. You would be able to understand what columns are being used. After the data from both departments is accessible within Snowflake, data masking can then be used to ensure that the new data is only truly accessible to those who need it.
This was just a small introduction to data governance and the new features that Snowflake has available to enable this effort. Don’t forget that this data governance effort can be a part of a larger, more intricate MDM initiative. In other blog posts, we touch more on MDM and other data governance capabilities to maintain and standardize your data, helping you make the most accurate and beneficial business decisions. If you have any questions in the meantime, feel free to get in touch.
It has been said that the "hero of a successful digital transformation is GRC." The ISACA website states, "to successfully manage the risk in digital transformation you need a modern approach to governance, risk and regulatory compliance." For GRC program development, it is important to understand the health information technology resources and tools available to enable long-term success.
What is GRC, and why is it important?
According to the HIPAA Journal, the average cost of a healthcare data breach is now $9.42 million. In the first half of 2021, 351 significant data breaches were reported, affecting nearly 28 million individuals. The need for effective information security and controls has never been more acute among healthcare providers, insurers, biotechnology companies, and health research organizations. Protecting sensitive data and establishing a firm security posture is essential. Improving health care and reducing cost rely on structured approaches and thoughtful implementation of available technologies to help govern data and mitigate risk across the enterprise.
Effective and efficient management of governance, risk, and compliance, or GRC, is fast becoming a business priority across industries. Leaders at hospitals and health systems of all sizes are looking for ways to build operating strategies that harmonize and enhance efforts for GRC. Essential to that mission are effective data governance, risk management, regulatory compliance, business continuity management, project governance, and security. But rather than stand-alone or siloed security or compliance efforts, a cohesive program coupled with GRC solutions allow for organizational leaders to address the multitude of challenges more effectively and efficiently.
What are the goals for I.T. GRC?
For GRC efforts, leaders are looking to:
Safeguard protected healthcare data
Meet and maintain compliance with evolving regulatory mandates and standards
Identify, mitigate, and prevent risk
Reduce operational friction
Build in and utilize best practices
Managing governance, risk, and compliance in healthcare enterprises is a daunting task. GRC implementation for healthcare risk managers can be difficult, especially during this time of rapid digital and cloud transformation. But relying on internal legacy methods and tools leads to the same issues that have been seen on-premises, stifling innovation and improvement. As organizations adapt to cloud environments as a key element of digital transformation and integrated health care, leaders are realizing that now is the time to leverage the technology to implement GRC frameworks that accelerate their progress toward positive outcomes. What’s needed is expertise and a clear roadmap to success.
Cloud Automation of GRC
The road to success starts with a framework, aligned to business objectives, that provides cloud automation of governance, risk, and compliance. Broken into three distinct phases, this would ideally involve:
Building a Solid Foundation – within the cloud environment, ensuring infrastructure and applications are secured before they are deployed.
Image/Operating System hardening automation pipelines.
Infrastructure Deployment Automation Pipelines, including Policy as Code to meet governance requirements (see the sketch after this list).
CI/CD Pipelines including Code Quality and Code Security.
Disaster Recovery as a Service (DRaaS) meeting the organization’s Business Continuity Planning requirements.
Configuration Management to allow automatic remediation of your applications and operating systems.
Cost Management strategies with showback and chargeback implementation.
Automatic deployment and enforcement of standard security tools, including file integrity monitoring (FIM), IDS/IPS, and antivirus/anti-malware tooling.
IAM integration for authentication and authorization with platforms such as Active Directory, Okta, and PingFederate, allowing more granular control over users and elevated privileges in the cloud.
Pre-approved Reference Architectures, with security baked in, created for the majority of the organization's needs and used in the infrastructure pipelines.
Self-service CMDB integration with tools such as ServiceNow, Remedy, and Jira Service Desk, allowing business units to provision their own infrastructure while providing the proper governance guardrails.
Resilient Architecture designs
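To make the Policy as Code idea concrete, here is a minimal Python sketch of a pre-deployment check that fails an infrastructure pipeline when a proposed resource violates a governance rule. Real implementations usually rely on dedicated tooling (for example, Open Policy Agent or cloud-native policy services); the resource shape and rules here are hypothetical.

```python
# Governance rules expressed as code so the pipeline, not a human, enforces them.
def check_storage_policy(resource: dict) -> list:
    violations = []
    if not resource.get("encryption_at_rest", False):
        violations.append("storage must have encryption at rest enabled")
    if resource.get("public_access", True):
        violations.append("storage must not allow public access")
    if not resource.get("tags", {}).get("owner"):
        violations.append("storage must carry an 'owner' tag")
    return violations

proposed = {
    "name": "claims-archive",
    "encryption_at_rest": True,
    "public_access": True,          # would be rejected before deployment
    "tags": {"owner": "claims-team"},
}

problems = check_storage_policy(proposed)
if problems:
    raise SystemExit("policy check failed: " + "; ".join(problems))
```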
Proper Configuration and Maintenance – Infrastructure misconfiguration is the leading cause of data breaches in the cloud, and a big reason misconfiguration happens is configuration "drift": change that occurs in a cloud environment post-provisioning. Using automation to monitor and self-remediate the environment keeps the cloud environment in its proper configuration, eliminating the largest cause of incidents (a simplified drift-detection sketch follows this list). Since workloads live most of their life in this phase, it is important to ensure there isn't any drift from the original secure deployment. An effective program will need:
Cloud Integrity Monitoring using cloud native tooling.
Log Management and Monitoring with centralized logging, critical in a well-designed environment.
Application Monitoring
Infrastructure Monitoring
Managed Services including patching to resolve issues.
SLAs to address incidents and quickly get them resolved.
Cost Management to ensure that budgets are met and there are no runaway costs.
Perimeter security utilizing cloud-native and third-party security appliances and services.
Data Classification
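The drift problem described above can be reduced to comparing the configuration approved at deployment with what is actually running. The Python sketch below shows that comparison and a self-remediation hook in the simplest possible form; the configuration keys and the remediation call are hypothetical stand-ins for cloud-provider APIs.

```python
# Desired state captured at deployment time (e.g. from the infrastructure pipeline).
desired = {"encryption_at_rest": True, "public_access": False, "logging": "enabled"}

# Actual state as reported by the cloud provider's API (hypothetical values).
actual = {"encryption_at_rest": True, "public_access": True, "logging": "enabled"}

def detect_drift(desired: dict, actual: dict) -> dict:
    """Return the settings whose live value no longer matches the approved value."""
    return {k: (v, actual.get(k)) for k, v in desired.items() if actual.get(k) != v}

def remediate(setting: str, approved_value) -> None:
    # Stand-in for a call to the provider API or configuration management tool.
    print(f"re-applying {setting} = {approved_value}")

drift = detect_drift(desired, actual)
for setting, (approved, live) in drift.items():
    print(f"drift detected: {setting} is {live}, expected {approved}")
    remediate(setting, approved)
```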
Use of Industry-Leading Tools – for risk assessment, reporting, verification, and remediation. These tools thwart future problems and provide evidence to stakeholders that the cloud environment is rock solid. Tools and verification components would include:
Compliance reporting
Risk Registry integration into tools
Future attestations (BAAs)
Audit evidence generation
Where do you go from here?
Your organization needs to innovate faster and drive value with the confidence of remaining in compliance. You need to get to a proactive state instead of being reactive. Consider an assessment to help you evaluate your organization’s place in the cloud journey and how the disparate forms of data in the organization are collected, controlled, processed, stored, and protected.
Start with an assessment that includes:
Identification of security gaps
Identification of foundational gaps
Remediation plans
Managed service provider onboarding plan
A Phase Two (Foundational/Remediation) proposal and Statement of Work
About 2nd Watch
2nd Watch is a trusted and proven partner, providing deep skills and advisory to leading organizations for over a decade. We earned a client Net Promoter Score of 85, a good way of telling you that our customers nearly always recommend us to others. We can help your organization with cloud native solutions. We offer skills in the following areas:
Developing cloud first strategies
Migration of workloads to the cloud
Implementing automation for governance and security guardrails
Implementing compliance controls and processes
Pipelines for data, infrastructure and application deployment