When using a modern data warehouse, your organization is likely to see improved access to your data and more impactful analytics. One such data warehouse is Azure Synapse, a Microsoft service. When paired with a powerful BI tool, like Looker, or a data science platform, like Dataiku, your organization can more quickly gain access to impactful insights that will help you drive business decisions across the enterprise.
In this post, we’ll provide a high-level overview of Azure Synapse, including a description of the tool, why you should use it, pros and cons, and complementary tools and technologies.
Overview of Azure Synapse
Azure Synapse is Microsoft’s service that puts an umbrella around various existing and new offerings, including Azure DW, Azure Databricks, and on-demand SQL querying, but it lacks tight integration across these services. Similar to Redshift, Azure DW is charged by instance size and running time, while other Synapse services offer more of a consumption-based model.
Once data is stored within Azure Data Lake, no need to stage data again within the warehouse
Easy to scale up or down on the fly with use of Azure
Increases in pricing tiers only increase concurrent queries by 4 at each level
Built for MPP (massively parallel processing)
Performance optimal for data volumes larger than 1TB
Not suitable for running high volumes of concurrent queries (four concurrent requests per service level)
Requires active performance tuning (indexes, etc.)
Native connection with Power BI
Can select either a serverless SQL pool or a dedicated SQL pool based on the needs of the organization
Supports the ability to run Spark on Databricks
Core product still relies on Azure DW, an older technology
Supports row-level and column-level security, multi-factor authentication, and Azure AD integration
Why Use Azure Synapse
The Microsoft SQL Server ecosystem is familiar, with tighter integrations into Azure’s data ecosystem, including Azure Databricks and the MPP version of SQL Server, Azure DW – just don’t expect a turnkey solution quite yet.
Pros of Azure Synapse
Can be easily provisioned with existing Azure subscription and provides pay-as-you-go pricing
Integration with Azure Active Directory and Azure Purview can provide an easy way to manage user roles and insights into data
Transferable knowledge from an on-premises Microsoft SQL Server background
Cons of Azure Synapse
“Synapse” is largely a marketing umbrella of technologies, with Azure DW at its core, requiring management of disparate services
Difficulty managing high volumes of concurrent queries due to tuning and cost of higher service tiers
Requires complex database administration tasks, including performance tuning, which other cloud data solutions have made more turnkey
Serverless capabilities are limited to newer Azure services; Azure DW itself lacks on-demand, frictionless sizing of compute
Select Complementary Tools and Technologies for Azure Synapse
Azure Analysis Services
Azure Data Factory
We hope you found this high-level overview of Azure Synapse helpful. If you’re interested in learning more about Azure Synapse or other modern data warehouse tools like Amazon Redshift, Google BigQuery, and Snowflake, contact us.
Master data management, commonly called MDM, is an increasingly hot topic. You may have heard the term thrown around and been wondering, “What is MDM?” and, “Does my business need it?” We’re sharing a crash course in MDM to cover those questions, and we may even answer some you haven’t thought of asking yet.
What is MDM?
MDM allows for successful downstream analytics as well as synchronization of data to systems across your business. The process involves three major steps:
Ingest all relevant data in a repository.
Use an MDM tool (such as Riversand or Semarchy) to “goldenize” the data. In other words, create one current, complete, and accurate record.
Send the goldenized data downstream for analytics and back to the original source.
Let’s say one part of your business stores customer data in Oracle and a different area has customer data in Salesforce. And maybe you’re acquiring a business that stores customer data in HubSpot in some instances. You want to be able to access all of this information, understand what you’re accessing, and then analyze the data to help you make better business decisions. This is when you would turn to MDM.
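The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Riversand or Semarchy work: the record fields (`email`, `name`, `phone`, `updated`), the match key (lowercased email), and the survivorship rule (the most recently updated non-empty value wins) are all assumptions made for the example.

```python
from datetime import date

# Hypothetical customer records ingested from three source systems.
records = [
    {"source": "oracle", "email": "Ann.Lee@example.com", "name": "Ann Lee",
     "phone": "", "updated": date(2021, 3, 1)},
    {"source": "salesforce", "email": "ann.lee@example.com", "name": "Ann M. Lee",
     "phone": "555-0100", "updated": date(2022, 6, 15)},
    {"source": "hubspot", "email": "bo.kim@example.com", "name": "Bo Kim",
     "phone": "555-0199", "updated": date(2020, 1, 10)},
]

def goldenize(records):
    """Merge records sharing a match key into one golden record each."""
    golden = {}
    # Process oldest first so newer non-empty values overwrite older ones.
    for rec in sorted(records, key=lambda r: r["updated"]):
        key = rec["email"].strip().lower()  # assumed match key
        merged = golden.setdefault(key, {"email": key, "sources": []})
        merged["sources"].append(rec["source"])
        for field in ("name", "phone", "updated"):
            if rec[field]:  # skip empty values so they never erase data
                merged[field] = rec[field]
    return list(golden.values())

golden_records = goldenize(records)
for rec in golden_records:
    print(rec)
```

Real MDM tools use far richer matching (fuzzy names, addresses, multiple keys) and configurable survivorship rules, but the shape is the same: group candidate duplicates, then decide field by field which value survives.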
Do we need an MDM solution?
First, answer a few questions:
This chart is a bit of an oversimplification but serves as a starting point. If you want to better understand how MDM could impact your business and ensure your MDM solution is customized to your needs, you can work with a data and analytics consulting firm like 2nd Watch.
When should I get help implementing an MDM solution?
An MDM company will be skilled at creating the “golden record” of your data. (Reminder: This is one current, complete, and accurate record.) However, they typically lack the ability to provide guidance going forward.
A data and analytics consulting firm like 2nd Watch will take a broad view of the current state of your business and its individual entities, while also considering where you want your organization to go. We partner with leading MDM providers like Riversand to ensure you get the best possible MDM solution. We can then guide the MDM implementation and set you up for next steps like creating reports and developing a data governance strategy.
Having an advocate for your future state goals during an MDM implementation is particularly important because there are four MDM architecture styles: registry, consolidation, coexistence, and centralized. You don’t need to understand the nuances of these MDM styles; that’s our job. 2nd Watch’s MDM experience has helped us develop recommendations on which style works best for different needs, so we can quickly generate value while still following data best practices.
Even more important is how an MDM solution evolves over time. As your business changes, your data will change with it. 2nd Watch can develop an MDM solution that keeps up with your needs, moving through the progressively complex styles of MDM architecture as your organization grows and expands.
Data governance is a broad-ranging discipline that affects everyone in an organization, whether directly or indirectly. It is most often employed to improve and consistently manage data through deduplication and standardization, among other activities, and can have a significant and sustained effect on reducing operational costs, increasing sales, or both.
Data governance can also be part of a more extensive master data management (MDM) program. The MDM program an organization chooses and how they implement it depends on the issues they face and both their short- and long-term visions.
For example, in the insurance industry, many companies sell various types of insurance policies that renew annually over a number of years, such as industrial property coverages and workers’ compensation casualty coverages. Two sets of underwriters will more than likely underwrite the business. Having two sets of underwriters using data systems specific to their lines of business is an advantage when meeting the coverage needs of their customers, but it often becomes a disadvantage when all of the data must be considered together. It doesn’t have to be.
The disadvantage arises when an agent or account executive needs to know the overall status of a client, including long-term profitability during all the years of coverage. This involves pulling data from policy systems, claims systems, and customer support systems. An analyst may be tasked with producing a client report for the agent or account executive to truly understand their client and make better decisions on both the client and company’s behalf. But the analyst may not know where the data is stored, who owns the data, or how to link clients across disparate systems.
Fifteen years ago, this task was very time-consuming, and even five years ago it was still quite cumbersome. Today, however, this issue can be mitigated with the correct data governance plan. We will go deeper into data governance and MDM in upcoming posts, but for this one, we want to show you how innovators like Snowflake are helping the cause.
What is data governance?
Data governance ensures that data is consistent, accurate, and reliable, which allows for informed and effective decision-making. This can be achieved by centralizing the data into one location from few or many siloed locations. Ensuring that data is accessible in one location enables data users to understand and analyze the data to make effective decisions. One way to accomplish this centralization of data is to implement the Snowflake Data Cloud.
Snowflake not only enables a company to store its data inexpensively and query that data for analytics, but it can also foster data governance. Dynamic data masking and object tagging are two new features from Snowflake that can supplement a company’s data governance initiative.
What is dynamic data masking?
Dynamic data masking is a Snowflake security feature that selectively masks plain-text data in table or view columns at query time, based on predefined masking policies. The purpose of masking or hiding data in specific columns is to ensure that data is accessed on a need-to-know basis. This kind of data is most likely sensitive and doesn’t need to be accessed by every user.
When is dynamic data masking used?
Data masking is usually implemented to protect personally identifiable information (PII), such as a person’s social security number, phone number, home address, or date of birth. An insurance company would likely want to reduce risk by hiding data pertaining to sensitive information if they don’t believe access to the data is necessary for conducting analysis.
However, data masking can also be used for non-production environments where testing needs to be conducted on an application. The users testing the environment wouldn’t need to know specific data if their role is just to test the environment and application. Additionally, data masking may be used to adhere to compliance requirements like HIPAA.
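Conceptually, a masking policy is a function of the column value and the role of whoever runs the query. The Python sketch below illustrates that idea only. It is not Snowflake’s API (in Snowflake, policies are defined in SQL and attached to columns), and the role names and the mask format are assumptions made for the example.

```python
# Hypothetical roles allowed to see social security numbers in plain text.
AUTHORIZED_ROLES = {"CLAIMS_ADMIN", "COMPLIANCE"}

def mask_ssn(value: str, role: str) -> str:
    """Return the SSN unchanged for authorized roles, else show only the last four digits."""
    if role in AUTHORIZED_ROLES:
        return value
    return "***-**-" + value[-4:]

def query_customers(rows, role):
    """Apply the policy at read time; the stored rows are never modified."""
    return [{**row, "ssn": mask_ssn(row["ssn"], role)} for row in rows]

customers = [{"name": "Ann Lee", "ssn": "123-45-6789"}]

print(query_customers(customers, "ANALYST"))       # masked view
print(query_customers(customers, "CLAIMS_ADMIN"))  # plain-text view
```

The point of the design is that the same stored data yields different results for different roles, so there is no need to maintain separate masked copies of a table.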
What is object tagging?
Another resource for data governance within Snowflake is object tagging. Object tagging enables data stewards to track sensitive data for compliance and discovery, and to group related objects such as warehouses, databases, tables or views, and columns.
When a tag is created for a table, view, or column, data stewards can determine if the data should be fully masked, partially masked, or unmasked. When tags are associated with a warehouse, a user with the tag role can view the resource usage of the warehouse to determine what, when, and how this object is being utilized.
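One way to picture the link between tags and masking is a lookup from a column’s tag value to a masking level. The sketch below is an illustration in Python rather than Snowflake syntax; the tag values (`pii_high`, `pii_low`), the column names, and the three masking functions are hypothetical.

```python
# Hypothetical tag values assigned to columns by a data steward.
column_tags = {
    "ssn": "pii_high",
    "phone": "pii_low",
    "policy_type": None,  # untagged column
}

# Each tag value maps to a masking level: full, partial, or none.
def full_mask(value):
    return "*" * len(value)

def partial_mask(value):
    # Keep the last four characters visible and hide the rest.
    return "*" * (len(value) - 4) + value[-4:]

def no_mask(value):
    return value

MASK_BY_TAG = {"pii_high": full_mask, "pii_low": partial_mask, None: no_mask}

def apply_tag_masks(row):
    """Mask each column according to the tag on that column."""
    return {col: MASK_BY_TAG[column_tags.get(col)](val) for col, val in row.items()}

row = {"ssn": "123-45-6789", "phone": "555-867-5309", "policy_type": "property"}
masked = apply_tag_masks(row)
print(masked)
```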
When is object tagging used?
There are several instances where object tagging can be useful. One use would be tagging a column as “PII” and adding extra text to describe the type of PII data located there. Another would be creating a tag for a warehouse dedicated to the sales department, enabling you to track usage and deduce why a specific warehouse is being used.
Where can data governance be applied?
Data governance applies to many industries that maintain a vast amount of data from their systems, including healthcare, supply chain and logistics, and insurance. An effective data governance strategy may use data masking and object tagging in conjunction with each other.
As previously mentioned, one common use case for data masking is for insurance customers’ PII. Normally, analysts wouldn’t need to analyze the personal information of a customer to uncover useful information leading to key business decisions. Therefore, the administrator would be able to mask columns for the customer’s name, phone number, address, social security number, and account number without interfering with analysis.
Object tagging is also valuable within the insurance industry, as there is such a vast amount of data collected and consumed. A large share of that data is sensitive information. Because there is so much data, it can be difficult to track individual pieces of information; Snowflake’s object tagging feature can help identify and track the usage of those sensitive values for the business user.
Using dynamic data masking and object tagging together, you will be able to gain insights into the locations of your sensitive data and how heavily specific warehouses, tables, or columns are being used.
Think back to the situation we mentioned earlier where the property coverage sales department is on legacy system X. During that same time period, the workers’ compensation sales department is on another legacy system Y. How are you supposed to create a report to understand the profitability of these two departments?
One option is to use Snowflake to store all of the data from both legacy systems. Once the information is in the Snowflake environment, object tagging would allow you to tag the databases or tables that involve data about their respective departments. One tag can be specified for property coverage and another tag can be set for workers’ compensation data. When you’re tasked with creating a report of profitability involving these two departments, you can easily identify which information can be used. Because the tag was applied to the database, it will also be applied to all of the tables and their respective columns. You would be able to understand what columns are being used. After the data from both departments is accessible within Snowflake, data masking can then be used to ensure that the new data is only truly accessible to those who need it.
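The inheritance described above, where a tag set on a database carries down to its tables and columns, can be modeled as a walk up an object hierarchy. This is a simplified Python sketch under assumed names (`PROPERTY_DB`, `WORKERS_COMP_DB`, a `department` tag); it does not reproduce how Snowflake stores or resolves tags.

```python
# Hypothetical object hierarchy: child -> parent (databases have no parent).
parents = {
    "PROPERTY_DB.POLICIES": "PROPERTY_DB",
    "PROPERTY_DB.POLICIES.PREMIUM": "PROPERTY_DB.POLICIES",
    "WORKERS_COMP_DB.CLAIMS": "WORKERS_COMP_DB",
    "WORKERS_COMP_DB.CLAIMS.PAID_AMOUNT": "WORKERS_COMP_DB.CLAIMS",
}

# Tags set directly on objects; here only the databases are tagged.
direct_tags = {
    "PROPERTY_DB": {"department": "property_coverage"},
    "WORKERS_COMP_DB": {"department": "workers_comp"},
}

def effective_tag(obj, tag_name):
    """Return the tag set on the object or, failing that, on its nearest ancestor."""
    while obj is not None:
        tags = direct_tags.get(obj, {})
        if tag_name in tags:
            return tags[tag_name]
        obj = parents.get(obj)
    return None

def objects_for(department):
    """List every known object whose effective department tag matches."""
    known = set(parents) | set(direct_tags)
    return sorted(o for o in known if effective_tag(o, "department") == department)

print(objects_for("property_coverage"))
```

With the two databases tagged once, a report author can ask for everything belonging to a department and get back the relevant tables and columns without tagging each one by hand.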
This was just a small introduction to data governance and the new features that Snowflake has available to enable this effort. Don’t forget that this data governance effort can be a part of a larger, more intricate MDM initiative. In other blog posts, we touch more on MDM and other data governance capabilities to maintain and standardize your data, helping you make the most accurate and beneficial business decisions. If you have any questions in the meantime, feel free to get in touch.