Manufacturing Analytics: The Power of Data in the Manufacturing Industry

The effects of the pandemic have hit the manufacturing industry in ways no one could have predicted. During the last 18 months, a new term has come up frequently in the news and in conversation: the supply chain crisis. Manufacturers have been disrupted in almost every facet of their business, and they have been put to the test as to whether they can weather these challenges or not. 

Manufacturing businesses that began a digital transformation prior to the current global crisis have been more agile in handling the disruptions. That is because manufacturers using data analytics and cloud technology can flexibly adopt the capabilities they need to meet important business goals, identify inefficiencies more quickly, and support a hybrid workforce to make sure production doesn’t stall. 

The pandemic has exposed and accelerated the need for manufacturers to digitize and harness the power of modern technology. Real-time data and analytics are fundamental to the manufacturing industry because they create the contextual awareness that is crucial for optimizing products and processes. This is especially important during the supply chain crisis, but it goes beyond the scope of the pandemic. Regardless of external circumstances, manufacturers will want to automate for quicker and smarter decisions in order to remain competitive and have a positive impact on the bottom line. 

In this article, we’ll identify the use cases and benefits of manufacturing analytics, which can be applied in any situation at any time. 

What is Manufacturing Analytics?

Manufacturing analytics is used to capture, process, and analyze machine, operational, and system data in order to manage and optimize production. It is used in critical functions – such as planning, quality, and maintenance – because it has the ability to predict future use, avoid failures, forecast maintenance requirements, and identify other areas for improvement. 

To improve efficiency and remain competitive in today’s market, manufacturing companies need to undergo a digital transformation to change the way their data is collected. Traditionally, manufacturers capture data in a fragmented manner: their staff manually check and record readings, fill out forms, and note operation and maintenance histories for machines on the floor. These practices are susceptible to human error and, as a result, risk being highly inaccurate. Moreover, these manual processes are extremely time-consuming and open to biases. 

Manufacturing analytics solves these common issues. It collects data from connected devices, which reduces the need for manual data collection and, thereby, cuts down the labor associated with traditional documentation tasks. Additionally, its computational power removes the potential errors and biases that traditional methods are prone to. 

Because manufacturing equipment collects massive volumes of data via sensors and edge devices, the most efficient and effective way to process this data is to feed the data to a cloud-based manufacturing analytics platform. Without the power of cloud computing, manufacturers are generating huge amounts of data, but losing out on potential intelligence they have gathered. 

Cloud-based services provide a significant opportunity for manufacturers to maximize their data collection. The cloud provides manufacturers access to more affordable computational power and more advanced analytics. This enables manufacturing organizations to gather information from multiple sources, utilize machine learning models, and ultimately discover new methods to optimize their processes from beginning to end. 

Additionally, manufacturing analytics uses advanced models and algorithms to generate insights that are near-real-time and much more actionable. Manufacturing analytics powered by automated machine data collection unlocks powerful use cases for manufacturers that range from monitoring and diagnosis to predictive maintenance and process automation. 

Use Cases for Cloud-Based Manufacturing Analytics

The ultimate goal of cloud-based analytics is to transition from descriptive to predictive practices. Rather than simply collecting data, manufacturers want to be able to leverage their data in near-real-time to get ahead of issues with equipment and processes and to reduce costs. Below are some business use cases for automated manufacturing analytics and how they help enterprises achieve predictive power:

Demand Forecasting and Inventory Management

Manufacturers need to have complete control of their supply chain in order to better manage inventory. However, demand planning is complex. Manufacturing analytics makes this process simpler by providing near-real-time floor data to support supply chain control, which leads to improved purchase management, inventory control, and transportation. The data provides insight into the time and costs needed to build parts and run a given job, which gives manufacturers the power to more accurately estimate their material needs and improve planning. 

Managing Supply Chains

For end-to-end visibility in the supply chain, data can be captured from materials in transit and sent straight from external vendor equipment to the manufacturing analytics platform. Manufacturers can then manage their supply chains from a central hub of data collection that organizes and distributes the data to all stakeholders. This enables manufacturing companies to direct and redirect resources to speed production up or slow it down. 

Price Optimization

In order to optimize pricing strategies and create accurate cost models, manufacturers need exact timelines and costs. Having an advanced manufacturing analytics platform can help manufacturers determine accurate cycle times to ensure prices are appropriately set. 

Product Development

To remain competitive, manufacturing organizations must invest in research and development (R&D) to build new product lines, improve existing models, and introduce new services. Manufacturing analytics makes it possible for this process to be simulated, rather than using traditional iterative modeling. This reduces R&D costs greatly because real-life conditions can be replicated virtually to predict performance. 

Robotization

Manufacturers are relying more on robotics. As these robots become more intelligent and independent, the data they collect while they execute their duties will increase. This valuable data can be used within a cloud-based manufacturing analytics platform to control quality at the micro-level. 

Computer Vision Applications

Modern automated quality control harnesses advanced optical devices. These devices can collect information via temperature sensors, optics, and other advanced vision applications (like thermal detection) to precisely control production stops.

Fault Prediction and Preventative Maintenance

Using near-real-time data, manufacturers can predict the likelihood of a breakdown – and when it may happen – with confidence. This is much more effective than traditional preventive maintenance programs that are use-based or time-based. The accuracy with which manufacturing analytics predicts when and how a machine will break down allows technicians to perform optimal repairs that reduce overall downtime and increase productivity. 

Warranty Analysis

It’s important to analyze information from failed products to understand how products are withstanding the test of time. With manufacturing analytics, products can be improved or changed to reduce failure and therefore costs. Collecting warranty data can also shed light on the use (and misuse) of products, increase product safety, improve repair procedures, reduce repair times, and improve warranty service. 

Benefits of Manufacturing Analytics

In short, cloud-based manufacturing analytics provides awareness and learnings on a near-real-time basis. For manufacturers to be competitive, contextual awareness is crucial for optimizing product development, quality, and costs. Production equipment generates huge volumes of data, and manufacturing analytics allows manufacturers to leverage this data stream to improve productivity and profitability. Here are the tangible benefits and results of implementing manufacturing analytics:

Full Transparency and Understanding of the Supply Chain

In today’s environment, controlling the supply chain has never been more critical. Data analytics can help mitigate the challenges that have cropped up with the current supply chain crisis. For manufacturing businesses, this means having the right amount of resources. Data analytics allows manufacturers to remain as lean as possible, ensure they have the right amount of material, and optimize their supply chains during a time when resources are scarce and conditions are uncertain. 

Reduced Costs

Manufacturing analytics reveals insights that can be used to optimize processes, which leads to cost savings. Predictive maintenance programs decrease downtime and manage parts inventories more intelligently, limiting costs and increasing productivity. Robotics and machine learning reduce labor and the associated costs. 

Increased Revenue

Manufacturers must be dynamic in responding to demand fluctuations. Near-real-time manufacturing analytics allows companies to be responsive to ever-changing demands. At any given time, manufacturing companies have up-to-date insights into inventory, product, and supply chains, allowing them to adjust to demand accordingly in order to maintain delivery times. 

Improved Efficiency Across the Board

The amount of information that production equipment collects enables manufacturers to increase efficiency in a variety of ways. This includes reducing energy consumption, mitigating compliance errors, and controlling the supply chain. 

Greater Customer Satisfaction

At the end of the day, it is important to know what customers want. Data analytics is a crucial tool for collecting customer feedback, which can be applied to streamlining processes per the customer’s requirements. Manufacturers can analyze the data collected to determine how to personalize services for their consumers, thereby increasing customer satisfaction. 

Conclusion

The effects of COVID-19 have shaken up the manufacturing industry. Because of the pandemic’s disruptions, manufacturers are realizing the importance of robust tools – like cloud computing and data analytics – to remain agile, lean, and flexible regardless of external challenges. The benefits that organizations can reap from these technologies go far beyond the horizon of the current supply chain crisis. Leading manufacturers are using data from systems across the organization to increase efficiency, drive innovation, and improve overall performance in any environment.

2nd Watch’s experience managing and optimizing data means we understand industry-specific data and systems. Our manufacturing data analytics solutions and consultants can assist you in building and implementing a strategy that will help your organization modernize, innovate, and outperform the competition. Learn more about our manufacturing solutions and how we can help you gain deep insight into your manufacturing data!

How to Build a Data Warehouse for the Insurance Industry

Insurance is a data-heavy industry with a huge upside to leveraging business intelligence. Today, we will discuss the approach we use at 2nd Watch to build out a data warehouse for insurance clients.

Understand the Value Chain and Create a Design

At its most basic, the insurance industry can be described by its cash inflows and outflows (e.g., the business will collect premiums based on effective policies and payout claims resulting from accidents). From here, we can describe the measures that are relevant to these activities:

  • Policy Transactions: Quote, Written Premium, Fees, Commission
  • Billing Transactions: Invoice, Taxes
  • Claim Transactions: Payment, Reserve
  • Payment Transactions: Received Amount

From these four core facts, we can collaborate with subject matter experts to identify the primary “describers” of these measures. For example, a policy transaction will need to include information on the policyholder, coverage, covered items, dates, and connected parties. By working with the business users and analyzing the company’s front-end software like Guidewire or Dovetail, we can design a structure to optimize reporting performance and scalability.

Develop a Data Flow

Here is a quick overview:

  1. Isolate your source data in a “common landing area”: We have been working with an insurance client with 20+ data sources (many acquisitions). The first step of our process is to identify the source tables we need to build out the warehouse and load the information into a staging database. (We create a schema per source and automate most of the development work.)
  2. Denormalize and combine data into a data hub: After staging the data in the CLA, our team creates “Get” Stored Procedures to combine the data into common tables. For example, at one client, we have 13 sources with policy information (policy number, holder, effective date, etc.) that we combined into a single [Business].[Policy] table in our database. We also created tables for tracking other dimensions and facts such as claims, billing, and payment.
  3. Create a star schema warehouse: Finally, the team loads the business layer into the data warehouse by assigning surrogate keys to the dimensions, creating references in the facts, and structuring the tables in a star schema. If designed correctly, any modern reporting tool, from Tableau to SSRS, will be able to connect to the data warehouse and generate high-performance reporting.
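
To make that third step concrete, here is a minimal sketch of a star schema load. The table and column names (business.Policy, warehouse.DimPolicy, warehouse.FactPolicyTransaction, and so on) are hypothetical placeholders rather than the exact structures we deploy for clients:

```sql
-- Minimal sketch of step 3 (all table and column names are hypothetical).

-- 1) Assign surrogate keys to policies not yet in the dimension.
INSERT INTO warehouse.DimPolicy (PolicyKey, PolicyNumber, HolderName, EffectiveDate, SourceSystem)
SELECT
    COALESCE((SELECT MAX(PolicyKey) FROM warehouse.DimPolicy), 0)
        + ROW_NUMBER() OVER (ORDER BY p.PolicyNumber)        AS PolicyKey,
    p.PolicyNumber,
    p.HolderName,
    p.EffectiveDate,
    p.SourceSystem
FROM business.Policy AS p
LEFT JOIN warehouse.DimPolicy AS d
    ON d.PolicyNumber = p.PolicyNumber
WHERE d.PolicyKey IS NULL;                                   -- only policies not already loaded

-- 2) Load the fact table, referencing the dimension's surrogate key.
INSERT INTO warehouse.FactPolicyTransaction (PolicyKey, TransactionDateKey, WrittenPremium, Fees, Commission)
SELECT
    d.PolicyKey,
    CAST(TO_CHAR(t.TransactionDate, 'YYYYMMDD') AS INT)      AS TransactionDateKey,
    t.WrittenPremium,
    t.Fees,
    t.Commission
FROM business.PolicyTransaction AS t
JOIN warehouse.DimPolicy AS d
    ON d.PolicyNumber = t.PolicyNumber;
```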

Produce Reports, Visualizations, and Analysis

By combining your sources into a centralized data warehouse for insurance, the business has created a single source of the truth. From here, users have a well of data to extract operational metrics, build predictive models, and generate executive dashboards. The potential for insurance analytics is endless: premium forecasting, geographic views, fraud detection, marketing, operational efficiency, call-center tracking, resource optimization, cost comparisons, profit maximization, and so much more!

3 Reasons to Implement a Data Vault Model (and 2 Reasons Not to)

One of the first steps we take for any client seeking an enterprise data warehouse or an updated reporting solution is to determine the best possible way to organize their information by developing a data model. We often consider the benefits of two forms of data models when deciding which would be best for our clients. The first is a relational model; the second, which many clients are less familiar with, is a data vault model.

Both options feed information into the data warehouse used for reporting, but there are some key differences to consider. Making the right decision around how source data is modeled and organized can make or break the performance and quality of your organization’s ETL process and reporting.

Looking for the right path to data modernization? Get started with our 60-minute data architecture assessment to see how it will get you there.

In this post, we’ll focus on the data vault model. We’ll review what it is and explore scenarios around when to use it and when not to use it.

What Is a Data Vault Model?

A data vault model, more commonly referred to as simply data vault, is a practice of organizing data that separates structural information, such as a table’s unique identifier or foreign key relationships, from its attributes. It was created to enable storage and auditing of historical information, allow for parallel loading, and allow organizations with many source systems to scale without needing to redesign the entire solution. It adds flexibility and scales easily, making it great for growing organizations that would normally encounter many redesigns of their data solution. To achieve these benefits, the model is composed of three basic table types (a minimal DDL sketch follows the list):

  1. Hub tables hold all unique business keys of a subject. For example, HUB_EMPLOYEE may use an employee number to identify a unique employee.
  2. Link tables track all relationships between hubs. For example, LINK_EMPLOYEE_STORE would track the relationship between an employee and the stores they work at.
  3. Satellite tables hold any attributes related to a link or hub and update them as they change. For example, SAT_EMPLOYEE may feature attributes such as the employee’s name, role, or hire date.
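
To illustrate, here is a minimal DDL sketch of the three table types using the hypothetical employee and store entities above; the column names and data types are illustrative only, not a prescribed standard:

```sql
-- Minimal sketch of the three data vault table types (hypothetical entities).
-- Load timestamps and record sources on every table support auditing.

CREATE TABLE hub_employee (
    employee_hkey    VARCHAR(32)  NOT NULL,  -- hash of the business key
    employee_number  VARCHAR(20)  NOT NULL,  -- business key
    load_date        TIMESTAMP    NOT NULL,
    record_source    VARCHAR(50)  NOT NULL,
    PRIMARY KEY (employee_hkey)
);

CREATE TABLE link_employee_store (
    employee_store_hkey VARCHAR(32) NOT NULL, -- hash of both business keys
    employee_hkey       VARCHAR(32) NOT NULL,
    store_hkey          VARCHAR(32) NOT NULL,
    load_date           TIMESTAMP   NOT NULL,
    record_source       VARCHAR(50) NOT NULL,
    PRIMARY KEY (employee_store_hkey)
);

CREATE TABLE sat_employee (
    employee_hkey  VARCHAR(32)  NOT NULL,
    load_date      TIMESTAMP    NOT NULL,    -- part of the key: one row per change
    record_source  VARCHAR(50)  NOT NULL,
    employee_name  VARCHAR(100),
    role           VARCHAR(50),
    hire_date      DATE,
    hash_diff      VARCHAR(32),              -- hash of the attributes, used to detect changes
    PRIMARY KEY (employee_hkey, load_date)
);
```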

3 Reasons to Use Data Vault

Reason 1: You have multiple source systems and relationships that change frequently.
Data vault provides the most benefits when your data comes from many source systems or has constantly changing relationships. Data vault works well for systems with these characteristics because it makes adding attributes simple. If there is a change to only one source system, that change doesn’t have to show up for all source systems. Similarly, you can limit the number of places changes are made because attributes are stored separately from structural data in satellites. Additionally, it is easier to account for new and changing relationships by closing off one link and creating another. You don’t have to change the historical data to account for a new relationship or update an existing schema; you only need to account for the changes going forward.

Reason 2: You need to be able to easily track and audit your data.
Data vault inherently enables auditing, as load times and record sources are required for every row. It also tracks a history of all changes as satellites include the load time as part of the primary key. When an attribute is updated, a new record is created. All of this auditing enables you to easily provide auditability for both regulatory and data governance purposes. Because you store all of your history, you can access data from any point in time.

Reason 3: You need data from multiple systems to load quickly.
Data vault also enables quicker data loading because many of the tables can be loaded at the same time in parallel. The model decreases dependencies between tables during the load process and simplifies the ingestion process by leveraging inserts only, which load quicker than upserts or merges.
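
As a rough illustration of that insert-only pattern, the sketch below loads the satellite from a hypothetical staging table, inserting a row only when a new business key appears or the attribute hash has changed:

```sql
-- Minimal insert-only satellite load (staging table and columns are hypothetical).
INSERT INTO sat_employee (employee_hkey, load_date, record_source, employee_name, role, hire_date, hash_diff)
SELECT
    stg.employee_hkey,
    CURRENT_TIMESTAMP,
    stg.record_source,
    stg.employee_name,
    stg.role,
    stg.hire_date,
    stg.hash_diff
FROM staging.employee AS stg
LEFT JOIN (
    -- latest satellite row per employee
    SELECT employee_hkey,
           hash_diff,
           ROW_NUMBER() OVER (PARTITION BY employee_hkey ORDER BY load_date DESC) AS rn
    FROM sat_employee
) AS cur
    ON cur.employee_hkey = stg.employee_hkey
   AND cur.rn = 1
WHERE cur.employee_hkey IS NULL           -- brand-new employee
   OR cur.hash_diff <> stg.hash_diff;     -- attributes changed since the last load
```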

When Would I Not Use Data Vault?

Reason 1: You need to load data directly into your reporting tool.
First and foremost, a data vault model should never directly feed into your reporting tool. Due to the necessity of the three types of tables, it would require your reporting tool to marry together all related tables to report on one subject area. These joins would slow down report performance and introduce the opportunity for error because reporting tools are not meant to do that form of data manipulation. The data vault model would need to feed data into a dimensional model or have an added reporting layer to enhance report performance. If you plan to implement a data model that can be directly reported on, you should create a dimensional model.

Reason 2: You only have one source system and/or relatively static data.
Another situation where data vaults are not a great fit is when data is relatively static or comes from a single source system. In these cases, you won’t be able to glean many of the benefits of data vault, and a dimensional model may be simpler and require less data manipulation. Implementing a data vault would require an increased amount of business logic where it is not needed.

Furthermore, data vault requires a lot more storage. Splitting up a subject area into three different tables essentially increases the number of tables by at least a multiple of three – not to mention the inserts-only nature of those tables. For these reasons, a data vault model is not worth implementing if your data is straightforward and the benefits mentioned above can be easily attained through a simpler dimensional model.

While some factors to consider are outlined above, this decision is often nuanced and falls within the gray areas. The impacts outlined above show up in unexpected ways and grow more significant over time. 2nd Watch has helped many organizations determine if reconsidering their architecture and implementing a data vault model is the answer for their challenges.

If you’re interested in learning more about data modeling, data architecture, or data management in general, fill out our quick contact form to set up a free, no-obligation 60-minute data architecture assessment.

What Is Looker and Why Might You Need It?

Maybe you’re venturing into data visualization for the first time, or maybe you’re interested in how a different tool could better serve your business. Either way, you’re likely wondering, “What is Looker?” and, “Could it be right for us?” In this blog post, we’ll go over the benefits of Looker, how it compares to Power BI and Tableau, when you may want to use Looker, and how to get started if you decide it’s the right tool for your organization.

What is Looker?

Looker is a powerful business intelligence (BI) tool that can help a business develop insightful visualizations. It offers a user-friendly workflow, is completely browser-based (eliminating the need for desktop software), and facilitates dashboard collaboration. Among other benefits, users can create interactive and dynamic dashboards, schedule and automate the distribution of reports, set custom parameters to receive alerts, and utilize embedded analytics.

How is Looker different?

We can’t fully answer “What is Looker?” without seeing how it stacks up against competitors like Power BI and Tableau, which we cover below.

When to Use Looker

If you’re looking for customized visuals, collaborative dashboards, and a single source of truth, plus top-of-the-line customer support, Looker might be the best BI platform for you. Being fully browser-based cuts down on potential confusion as your team gets up and running, and pricing customized to your company means you get exactly what you need to meet your company’s analytics goals.

When Not to Use Looker

If you’ve already bought into the Microsoft ecosystem, Power BI is your best bet. Introducing another tool will likely only create confusion and increase costs.

When someone says “Tableau,” the first thing that comes to mind is how impressive the visuals are. If you want the most elegant visuals and a platform that’s intuitive for analysts and business users alike, you may want to go with Tableau.

How do I get started using Looker?

You can get started using Looker in four basic steps:

1. Determine if your data is analytics-ready.

Conduct an audit of where your data is stored, what formats are used, etc. You may want to consider a data strategy project before moving forward with a BI platform implementation.

2. Understand your company’s BI needs and use cases.

Partner with key stakeholders across the business to learn how they currently use analytics and how they hope to use more advanced analytics in the future. What features do they or their staff need in a BI tool?

3. Review compliance and data governance concerns.

When in conversation with those key stakeholders, discuss their compliance and data governance concerns as well. Bring your technology leaders into the discussion to get their valuable perspectives. You should have an enterprise-wide stance on these topics that informs any additions to your tech stack.

4. Partner with a trusted resource to ensure a smooth implementation.

Our consultants’ hands-on experience with Looker can contribute to a faster, simpler transition. Plus, 2nd Watch can transfer the necessary knowledge to make sure your team is equipped to make the most of your new BI tool. We can even help with the three previous steps, guiding the process from start to finish.

If you still have questions about if Looker is worth considering for your organization, or if you’re ready to get started with Looker, contact us here.

What is data build tool (dbt) and how is it different?

At 2nd Watch, we’re always keeping an eye on up-and-coming technologies. We investigate, test, and test some more to make sure we fully understand the benefits and potential drawbacks of any technology we may recommend to our clients. One unique tool we’ve recently spent quality time with is data build tool (dbt).

What is dbt?

Before loading data into a centralized data warehouse, it must be cleaned up, made consistent, and combined as necessary. In other words, data must be transformed – the “T” in ETL (extract, transform, load) and ELT. This allows an organization to develop valuable, trustworthy insights through analytics and reporting.

Dbt enables data analysts and data engineers to automate the testing and deployment of the data transformation process. This is especially useful because many companies have increasingly complex business logic behind their reporting data. The dbt tool keeps a record of all changes made to the underlying logic and makes it easy to trace data and update or fix the pipeline through version control.

Where does dbt fit in the market?

Dbt has few well-adopted direct competitors in the enterprise space, as no tool on the market offers quite the same functionality. Dbt does not extract or load data to/from a warehouse; it focuses only on transforming data after it has been ingested.

Some complementary tools are Great Expectations, Flyway, and Apache Airflow. Let’s take a closer look:

Apache Airflow

Airflow assists with ETL by creating automated processes, including pipelines and other operations commonly found in the orchestration workflow. It can integrate into a data warehouse, run commands, and operate off of a DAG similar to dbt’s, but it isn’t designed for full querying work. The dbt tool has a fleshed-out front-end interface for query development and coding, whereas Airflow focuses more on the actual flow of data in its interface.

Flyway

Flyway is a version control system that tracks updates made to tables in a data warehouse. It doesn’t allow for editing, merely easing the migration process for teams with different sets of code. Flyway advances documentation in a separate environment, while dbt manages this via integrations with services like GitHub and DevOps.

Great Expectations

Great Expectations allows you to create comprehensive tests that run against your database, but it isn’t integrated with other ETL features. Unlike dbt, it doesn’t allow for any editing of the actual database.

What should you know about the dbt tool?

Dbt has a free open source version and a paid cloud version in which they manage all of the infrastructure in a SaaS offering. In 2020, they introduced the Integrated Development Environment (IDE), which coincided with dbt pricing updates. Read more about the dbt cloud environment and dbt pricing here.

Dbt’s key functions include the following:

Testing

  • Dbt tests data quality, integration, and code performance. Quality is built into the tool, and the others can be coded and run in dbt (automatically in some cases).
  • Create test programs that check for missing/incomplete entries, unique constraints, and accepted values within specific columns.
  • Manually run scripts that will then execute automated tests and deploy changes once those tests pass. Notifications can be programmed to be sent out if a certain test fails.
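
For example, a dbt “singular” test is simply a SQL file saved in the tests/ directory that returns the rows violating an expectation; dbt marks the test as failed if any rows come back. Here is a minimal sketch, with a hypothetical orders model and column names:

```sql
-- tests/assert_no_negative_amounts.sql  (model and column names are hypothetical)
-- dbt runs this query during `dbt test`; any rows returned cause the test to fail.
select
    order_id,
    amount
from {{ ref('orders') }}
where amount < 0
```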

Deployment

  • Dbt has a built-in package manager that allows analysts and engineers to publish both public and private repositories. These can then be referenced by other users.
  • Deploy a dbt project after merging updated code in git.
  • Updates to the server can be run on a set schedule once changes are merged in git.

Documentation

  • Dbt automatically creates a visual representation of how data flows throughout an organization.
  • Easily create documentation through schema files.
  • Documents are automatically generated and accessible through dbt, with the ability to send files in deployment. Maps are created to show the flow of data through each table in the ETL process.

One other thing to know about dbt is that you can use Jinja, a templating language, in conjunction with SQL to establish macros and integrate other functions outside of SQL’s capabilities. Jinja is particularly helpful when you have to repeat calculations or need to condense code. Using Jinja will enhance SQL within any dbt project, and our dbt consultants are available to help you harness Jinja’s possibilities within dbt.
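
As a small illustration, the hypothetical dbt model below uses a Jinja loop to generate one pivoted column per payment method instead of repeating the same case statement by hand (the model and column names are made up for this example):

```sql
-- models/payments_by_method.sql  (hypothetical model; upstream model name assumed)
-- The Jinja for-loop expands into one sum() column per payment method,
-- so the calculation is written once instead of being copied and pasted.
{% set payment_methods = ['bank_transfer', 'credit_card', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end) as {{ method }}_amount
    {% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('stg_payments') }}
group by order_id
```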

Where could dbt fit in your tech ecosystem?

As previously mentioned, dbt has a free open source version and a paid cloud version, giving your company flexibility in budget and functionality to build the right tech stack for your organization. Dbt fits nicely with an existing modern technology stack with native connections to tools such as Stitch, Fivetran, Redshift, Snowflake, BigQuery, Looker, and Mode.

With dbt, data analysts and data engineers are able to more effectively transform data in your data warehouses by easily testing and deploying changes to the transformation process, and they gain a visual representation of the dependencies at each stage of the process. Dbt allows you to see how data flows throughout your organization, potentially enhancing the results you see from other data and analytics technologies.

Are you ready to discuss implementing an enterprise data initiative? Contact one of our data consulting experts.

Accelerating Application Development with DevOps

If you moved to the cloud to take advantage of rapid infrastructure deployment and development support, you understand the power of quickly bringing applications to market. Gaining a competitive edge is all about driving customer value fast. Immersing a company in a DevOps transformation is one of the best ways to achieve speed and performance.

In this blog post, we’re building on the insights of Harish Jayakumar, Senior Manager of Application Modernization and Solutions Engineering at Google, and Joey Yore, Manager and Principal Consultant at 2nd Watch. See how the highest performing teams in the DevOps space are achieving strong availability, agility, and profitability with application development according to four key metrics. Understand the challenges, solutions, and potential outcomes before starting your own DevOps approach to accelerating app development.

Hear Harish and Joey on the 2nd Watch Cloud Crunch podcast, 5 Strategies to Maximize Your Cloud’s Value: Strategy 2 – Accelerating Application Development with DevOps  

What is DevOps?

Beyond the fact that DevOps combines software development (Dev) and IT operations (Ops), DevOps is pretty hard to define. Harish thinks the lack of a clinical, agreed-upon definition is by design. “I think everyone is still learning how to get better at building and operating software.” With that said, he describes his definition of DevOps as, “your software delivery velocity, and the reliability of it. It’s basically a cultural and organizational moment that aims to increase software reliability and velocity.”

The most important thing to remember about a DevOps transformation and the practices and principles that make it possible is culture. At its core, DevOps is a cultural shift. Without embracing, adopting, and fostering a DevOps culture, none of the intended outcomes are possible.

Within DevOps there are five key principles to keep top of mind:

  1. Reduce organizational silos
  2. Accept failure as the norm
  3. Implement gradual changes
  4. Leverage tooling and automation
  5. Measure

Measuring DevOps: DORA and CALMS

Google acquired DevOps Research and Assessment (DORA) in 2018 and relies on the methodology developed from DORA’s annual research to measure DevOps performance. “DORA follows a very strong data-driven approach that helps teams leverage their automation process, cultural changes, and everything around it,” explains Harish. Fundamental to DORA are four key metrics that offer a valid and reliable way to measure the research and analysis of any kind of software delivery performance. These metrics gauge the success of DevOps transformations from ‘low performers’ to ‘elite performers’.

  1. Deployment frequency: How often the organization successfully releases to production
  2. Lead time for changes: The amount of time it takes a commit to get into production
  3. Change failure rate: The percentage of deployments causing a failure in production
  4. Time to restore service: How long it takes to recover from a failure in production
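
As a rough illustration of how these metrics can be measured, the query below computes all four over a 30-day window from a hypothetical deployments table, using Snowflake-style date functions; your own pipeline and table structure will differ:

```sql
-- Illustrative only: the deployments table and its columns
-- (deploy_id, commit_time, deploy_time, failed, restored_time) are hypothetical.
SELECT
    COUNT(*) / 30.0                                                  AS deploys_per_day,         -- deployment frequency
    AVG(DATEDIFF('hour', commit_time, deploy_time))                  AS lead_time_hours,         -- lead time for changes
    AVG(CASE WHEN failed THEN 1.0 ELSE 0.0 END)                      AS change_failure_rate,     -- share of deploys that failed
    AVG(CASE WHEN failed
             THEN DATEDIFF('minute', deploy_time, restored_time) END) AS time_to_restore_minutes -- time to restore service
FROM deployments
WHERE deploy_time >= DATEADD('day', -30, CURRENT_DATE);
```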

DORA is similar to the CALMS model which addresses the five fundamental elements of DevOps starting with where the enterprise is today and continuing throughout the transformation. CALMS also uses the four key metrics identified by DORA to evaluate DevOps performance and delivery. The acronym stands for:

Culture: Is there a collaborative and customer-centered culture across all functions?

Automation: Is automation being used to remove toil or wasted work?

Lean: Is the team agile and scrappy with a focus on continuous improvement?

Measurement: What, how, and against what benchmarks is data being measured?

Sharing: To what degree are teams teaching, sharing, and contributing to cross-team collaboration?

DevOps Goals: Elite Performance for Meaningful Business Impacts

Based on the metrics above, organizations fall into one of four levels: low, medium, high, or elite performers. The aspiration to achieve elite performance is driven by the significant business impact these teams have on their overall organization. According to Harish, and based on research by the DORA team at Google, “It’s proven that elite performers in the four key metrics are 3.56 times more likely to have a stronger availability practice. There’s a strong correlation between these elite performers and the business impact of the organization that they’re a part of.”

He goes on to say, “High performers are more agile. We’ve seen 46 times more frequent deployments from them. And it’s more reliable. They are five times more likely to exceed any profitability, market share, or productivity goals on it.” Being able to move quickly enables these organizations to deliver features faster, and thus increase their edge or advantage over competitors.

Focusing on the five key principles of DevOps is critical for going from ideation to implementation at a speed that yields results. High and elite performers are particularly agile with their use of technology. When a new technology is available, DevOps teams need to be able to test, apply, and utilize it quickly. With the right tools, teams are alerted immediately to code breaks and where that code resides. Using continuous testing, the team can patch code before it affects other systems. The results are improved code quality and accelerated, efficient recovery. You can see how each pillar of DevOps – from culture and agility to technology and measurement, feeds into one another to deliver high levels of performance, solid availability, and uninterrupted continuity.

Overcoming Common DevOps Challenges

Because culture is so central to a DevOps transformation, most challenges can be solved through cultural interventions. Like any cultural change, there must first be buy-in and adoption from the top down. Leadership plays a huge role in setting the tone for the cultural shift and continuously supporting an environment that embraces and reinforces the culture at every level. Here are some ways to influence an organization’s cultural transformation for DevOps success.

  • Build lean teams: Small teams are better enabled to deliver the speed, innovation, and agility necessary to achieve across DevOps metrics.
  • Enable and encourage transparency: Joey says, “Having those big siloed teams, where there’s a database team, the development team, the ops team – it’s really, anti-DevOps. What you want to start doing is making cross-functional teams to better aid in knocking down those silos to improve deployment metrics.”
  • Create continuous feedback loops: Among lean, transparent teams there should be a constant feedback loop of information sharing to influence smarter decision making, decrease redundancy, and build on potential business outcomes.
  • Reexamine accepted protocols: Always be questioning the organizational and structural processes, procedures, and systems that the organization grows used to. For example, how long does it take to deploy one line of change? Do you do it repeatedly? How long does it take to patch and deploy after discovering a security vulnerability? If it’s five days, why is it five days? How can you shorten that time? What technology, automation, or tooling can increase efficiency?
  • Measure, measure, measure: Utilize DORA’s research to establish elite performance benchmarks and realistic upward goals. Organizations should always be identifying barriers to achievement and continuously improving against those benchmarks.
  • Aim for total performance improvements: Organizations often think they need to choose between performance metrics. For example, in order to influence speed, stability may be negatively affected. Harish says, “Elite performers don’t see trade-offs,” and points to best practices like CICD, agile development, and tests, built-in automation, standardized platform and processes, and automated environment provisioning for comprehensive DevOps wins.
  • Work small: Joey says, “In order to move faster, be more agile, and accelerate deployment, you’re naturally going to be working with smaller pieces with more automated testing. Whenever you’re making changes on these smaller pieces, you’re actually lowering your risk for anyone’s deployment to cause some sort of catastrophic failure. And if there is a failure, it’s easy to recover. Minimizing risk per change is a very important component of DevOps.”

Learn more about avoiding common DevOps issues by downloading our eBook, 7 Major Roadblocks in DevOps Adoption and How to Address Them

Ready to Start Your DevOps Transformation?

Both Harish and Joey agree that the best approach to starting your own DevOps transformation is one based on DevOps – start small. The first step is to compile a small team to work on a small project as an experiment. Not only will it help you understand the organization’s current state, but it helps minimize risk to the organization as a whole. Step two is to identify what your organization and your DevOps team are missing. Whether it’s technology and tooling or internal expertise, you need to know what you don’t know to avoid regularly running into the same issues.

Finally, you need to build those missing pieces to set the organization up for success. Utilize training and available technology to fill in the blanks, and partner with a trusted DevOps expert who can guide you toward continuous optimization.

2nd Watch provides Application Modernization and DevOps Services to customize digital transformations. Start with our free online assessment to see how your application modernization maturity compares to other enterprises. Then let 2nd Watch complete a DevOps Transformation Assessment to help develop a strategy for the application and implementation of DevOps practices. The assessment includes analysis using the CALMS model, identification of software development and level of DevOps maturity, and delivering tools and processes for developing and embracing DevOps strategies.

26 Quick Tips to Save You Money on Your Snowflake Deployment

One of the benefits to 2nd Watch’s partnership with Snowflake is access to advanced trainings and certifications. Combined with our Snowflake project work, these trainings help us become Snowflake experts and find opportunities to assist our clients in making Snowflake work better for them. The most recent training helped us identify some important tactics for solving one of our clients’ biggest concerns: How do I optimize for cost? Here is a list of actions that you should take to make sure you are not overspending on your Snowflake computation or storage.

First, for context, here is an extremely simplified diagram of how Snowflake is functioning in the background:

[Diagram: simplified Snowflake deployment and query processing flow]

Since the most expensive part of any Snowflake deployment is compute, we have identified some useful tactics to store data strategically for efficient reads, write supercharged SQL scripts, and balance your performance vs cost.

Loading

Although very different than storing data on traditional disk, there are many benefits to loading Snowflake data strategically.

1. Sort on ingestion: Data is automatically partitioned in SF on natural ingestion order.

– Sorting an S3 bucket (using something like syncsort) before bulk load via copy could be way faster than inserting with an order by

2. CSV (Gzipped) is the best format for loading to SF (2-3x faster than Parquet or ORC).

3. Use COPY INTO instead of INSERT because it utilizes the more efficient bulk loading processes.
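
Putting tips 2 and 3 together, here is a minimal sketch of a bulk load from gzipped CSV files in S3; the stage, bucket, and table names are hypothetical, and credentials or a storage integration would still need to be configured:

```sql
-- Hypothetical stage and table names; credentials / storage integration omitted.
CREATE OR REPLACE STAGE sales_stage
    URL = 's3://my-bucket/sales/'
    FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1 COMPRESSION = GZIP);

COPY INTO raw.sales                -- COPY INTO uses Snowflake's bulk loader, unlike row-by-row INSERTs
FROM @sales_stage
PATTERN = '.*[.]csv[.]gz'
ON_ERROR = 'ABORT_STATEMENT';      -- fail fast so load issues surface immediately
```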

Sizing

Take advantage of the native cloud ability to scale, create, and optimize your compute resources.

4. Scale up or out appropriately.

– As seen above, when you run a query, Snowflake will:

+ Find required FDN files.

+ Pull files down into SSD VMs. (Note: If >160 GB for AWS or >400 GB for Azure, will spill over to remote IO.)

+ Perform compute.

+ Keep files on VM until DW is suspended.

1 big query = increase size of data warehouse

Lots of small queries = queries are queuing = increase # of DWs or # of clusters (if enterprise, you can enable multi-cluster)

5. Turn your virtual warehouse (VW) on and off for certain workloads.

Turn on for batch, then immediately turn off (no reason to wait for auto-suspend).

Use auto-resume when it makes sense.

6. Control query processing and concurrency with parameters (see the sketch after this list).

MAX_CONCURRENCY_LEVEL

STATEMENT_QUEUED_TIMEOUT_IN_SECONDS

STATEMENT_TIMEOUT_IN_SECONDS

7. Use warehouse monitoring to size and limit cost per workload (not per database → this is a shift from the on-prem mentality).

If your workload is queuing, then add more clusters.

If your workload is slow with no queuing, then size up.
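
The sketch below shows how several of these sizing and concurrency settings might be applied to hypothetical warehouses; the names and values are placeholders to adjust for your own workloads:

```sql
-- Hypothetical warehouses illustrating tips 4-7: right-size, suspend quickly,
-- auto-resume, cap concurrency, and time out runaway statements.
ALTER WAREHOUSE batch_wh SET
    WAREHOUSE_SIZE = 'LARGE',                     -- scale up for one big query
    AUTO_SUSPEND = 60,                            -- suspend after 60 seconds of idle time
    AUTO_RESUME = TRUE,
    MAX_CONCURRENCY_LEVEL = 8,
    STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 300,
    STATEMENT_TIMEOUT_IN_SECONDS = 3600;

-- For lots of small, queuing queries on Enterprise edition, add clusters instead of sizing up.
ALTER WAREHOUSE bi_wh SET MIN_CLUSTER_COUNT = 1, MAX_CLUSTER_COUNT = 3;
```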

Data Modeling

Often overlooked, organizing your information into a mature data model will allow for high-performance SQL scripting and better caching potential. Shameless plug, this is 2nd Watch’s bread and butter. Please reach out for a free working session to discuss data modeling for your company.

8. Do a data model for analytics.

Star Schema, 3NF, and data vault are optimal for SF.

Snowflake is NOT ideal for OLTP workloads.

9. Bake your constraints into design because SF DOES NOT enforce them.

Build queries to check for violations

10. Build a process to alert you of loading issues (use an ETL framework).

Information_Schema.load_history

2nd Watch ETL Toolkit
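
As an example of tips 9 and 10, the queries below check a primary key that Snowflake records but does not enforce, and surface recent load errors from INFORMATION_SCHEMA.LOAD_HISTORY; the table names are hypothetical:

```sql
-- Primary-key check: Snowflake stores the constraint but does not enforce it,
-- so duplicates must be caught with a query (table name is hypothetical).
SELECT policy_id, COUNT(*) AS row_count
FROM business.policy
GROUP BY policy_id
HAVING COUNT(*) > 1;

-- Recent loads with errors, suitable for alerting from an ETL framework.
SELECT table_name, file_name, status, error_count, last_load_time
FROM information_schema.load_history
WHERE error_count > 0
  AND last_load_time >= DATEADD('day', -1, CURRENT_TIMESTAMP);
```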

Tracking Usage

Snowflake preserves a massive amount of usage data for analysis. At the very least, it allows you to see which workflows are the most expensive.

11. Use Account Usage views (e.g., WAREHOUSE_METERING_HISTORY) for tracking history, performance, and cost.

12. Don’t use the AccountAdmin or Public roles for creating objects or accessing data (use them only for looking at costs). Create securable objects with the “correct” role and integrate new roles into the existing hierarchy.

– Create roles by business functions to track spending by line of business.

13. Use Resource Monitors to cut off DWs when you hit predefined credit amount limits.

Create one resource monitor per DW.

Enable notifications.
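
Here is a rough sketch of tips 11 and 13: a credit-usage query against ACCOUNT_USAGE and a resource monitor attached to a warehouse. The names, quotas, and thresholds are placeholders:

```sql
-- Credit spend per warehouse over the last 30 days.
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_30_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
GROUP BY warehouse_name
ORDER BY credits_last_30_days DESC;

-- Cap spend with a resource monitor (creating one typically requires ACCOUNTADMIN).
CREATE RESOURCE MONITOR etl_monitor
    WITH CREDIT_QUOTA = 100                     -- hypothetical credit limit
    TRIGGERS ON 80 PERCENT DO NOTIFY
             ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = etl_monitor;
```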

Performance Tuning

The history profiler is the primary tool to observe poorly written queries and make the appropriate changes.

14. Use history profiler to optimize queries.

The goal is to put the most expensive node in the bottom right-hand corner of the profiler diagram.

SYSTEM$CLUSTERING_DEPTH shows how effective the partitions are – the smaller the average depth, the better clustered the table is with regards to the specified columns.

+ Hot tip: You can add a new automatic reclustering service, but I don’t think it is worth the money right now.

15. Analyze Bytes Scanned: remote vs cache.

Make your Bytes Scanned column use “Cache” or “Local” memory most of the time, otherwise consider creating a cluster key to scan more efficiently.

16. Make the ratio of partitions scanned to partitions used as small as possible by pruning.
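
To illustrate tips 14 through 16, the queries below check clustering depth on a hypothetical table and pull cache and pruning statistics from ACCOUNT_USAGE.QUERY_HISTORY:

```sql
-- Clustering depth for a hypothetical fact table and column.
SELECT SYSTEM$CLUSTERING_DEPTH('warehouse.fact_policy_transaction', '(transaction_date)');

-- Recent queries ranked by bytes scanned, with cache and pruning ratios.
SELECT query_id,
       bytes_scanned,
       percentage_scanned_from_cache,
       partitions_scanned,
       partitions_total
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP)
ORDER BY bytes_scanned DESC
LIMIT 20;
```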

SQL Coding

The number one issue driving costs in a Snowflake deployment is poorly written code! Resist the tendency to just increase the power (and therefore the cost) and focus some time on improving your SQL scripts.

17. Drop temporary and transient tables when done using.

18. Don’t use “CREATE TABLE AS”; SF hates trunc and reloads for time travel issues. Instead, use “CREATE OR REPLACE.”

Again, use COPY INTO, not INSERT INTO.

Use staging tables to manage transformation of imported data.

Validate the data BEFORE loading into SF target tables.

19. Use ANSI Joins because they are better for the optimizer.

Use “JOIN ON a.id = b.id” format.

NOT the “WHERE a.id=b.id”.

20. Use “WITH” clauses for windowing instead of temp tables or sub-selects.

21. Don’t use ORDER BY. Sorting is very expensive!

Use integers over strings if you must order.

22. Don’t handle duplicate data using DISTINCT or GROUP BY.
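
A short sketch that combines several of these tips: CREATE OR REPLACE instead of a truncate-and-reload, an ANSI join, a WITH clause, and a ROW_NUMBER() window instead of DISTINCT for de-duplication. The tables are hypothetical:

```sql
-- Hypothetical staging tables; illustrative only.
CREATE OR REPLACE TABLE analytics.current_policy AS
WITH ranked AS (
    SELECT p.policy_number,
           p.holder_name,
           c.claim_count,
           ROW_NUMBER() OVER (PARTITION BY p.policy_number
                              ORDER BY p.load_time DESC) AS rn
    FROM staging.policy AS p
    JOIN staging.claim_summary AS c
        ON c.policy_number = p.policy_number     -- ANSI join, not a WHERE-clause join
)
SELECT policy_number, holder_name, claim_count
FROM ranked
WHERE rn = 1;    -- keep only the latest row per policy instead of relying on DISTINCT
```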

Storing

Finally, set up the Snowflake deployment to work well in your entire data ecosystem.

23. Locate your S3 buckets in the same geographic region.

24. Set up the buckets to match how the files are coming across (e.g., by date or application).

25. Keep files between 60-100 MB to take advantage of parallelism.

26. Don’t use materialized views except in specific use cases (e.g., pre-aggregating).

Snowflake is shifting the paradigm when it comes to data warehousing in the cloud. However, by fundamentally processing data differently than other solutions, Snowflake has a whole new set of challenges for implementation.

Whether you’re looking for support implementing Snowflake or need to drive better performance, 2nd Watch’s Snowflake Value Accelerator will help save you money on your Snowflake investment. Click here to learn more.

3 Questions to Help You Build Your Analytics Roadmap

In our experience, many analytics projects have the right intentions such as:

  • A more holistic view of the organization
  • More informed decision making
  • Better operational and financial insights

With incredible BI and analytics tools such as Looker, Power BI, and Tableau on the market, it’s tempting to start by selecting a tool believing it to be a silver bullet. While these tools are all excellent choices when it comes to visualization and analysis, the road to successful analytics starts well before tool selection.

So where do you begin? By asking and answering a variety of questions for your organization, and building a data analytics roadmap from the responses. From years of experience, we’ve seen that this process (part gap analysis, part soul-searching) is non-negotiable for any rewarding analytics project.

Building an Advanced Data Analytics Roadmap

Give the following questions careful consideration as you run your current state assessment:

How Can Analytics Support Your Business Goals?

There’s a tendency for some stakeholders not immersed in the data to see analytics as a background process disconnected from the day to day. That mindset is definitely to their disadvantage. When businesses fixate on analytical tools without a practical application, they put the cart before the horse and end up nowhere fast. Yet when analytics solutions are purposeful and align with key goals, insights appear faster and with greater results.

One of our higher education clients is a perfect example. Their goal? To determine which of their marketing tactics were successful in converting qualified prospects into enrolled students. Under the umbrella of that goal, their stakeholders would need to answer a variety of questions:

  • How long was the enrollment process?
  • How many touchpoints had enrolled students encountered during enrollment?
  • Which marketing solutions were the most cost effective at attracting students?

As we evaluated their systems, we recognized data from over 90 source systems would be essential to provide the actionable insight our client wanted. By creating a single source of truth that fed into Tableau dashboards, their marketing team was able to analyze their recruiting pipeline to determine the strategies and campaigns that worked best to draw new registrants into the student body.

This approach transcends industries. Every data analytics roadmap should reflect on and evaluate the most essential business goals. More than just choosing an urgent need or reacting to a surface level problem, this reevaluation should include serious soul-searching.

The first goals you decide to support should always be as essential to you as your own organizational DNA. When you use analytics solutions to reinforce the very foundation of your business, you’ll always get a higher level of results. With a strong use case in hand, you can turn your analytics project into a stepping stone for bigger and better things.

What Is Your Analytical Maturity?

You’re not going to scale Mt. Everest without the gear and training to handle the unforgiving high altitudes, and your organization won’t reach certain levels of analytical sophistication without hitting the right milestones first. Expecting more than you’re capable of out of an analytics project is a surefire path to self-sabotage. That’s why building a data analytics roadmap always requires an assessment of your data maturity first.

However, there isn’t a single KPI showing your analytical maturity. Rather, there’s a combination of factors such as the sophistication of your data structure, the thoroughness of your data governance, and the dedication of your people to a data-driven culture.

Here’s what your organization can achieve at different levels of data maturity:

  • Descriptive Analytics – This level of analytics tells you what’s happened in the past. Typically, organizations in this state rely on a single source system without the ability to cross-compare different sources for deeper insight. If data quality efforts exist, they’re often sporadic and not aligned with the big picture.
  • Diagnostic Analytics – Organizations at this level are able to identify why things happened. At a minimum, several data sets are connected, allowing organizations to measure the correlation between different factors. Users understand some of the immediate goals of the organization and trust the quality of data enough to run them through reporting tools or dashboards.
  • Predictive Analytics – At this level, organizations can anticipate what’s going to happen. For starters, they need large amounts of data – from internal and external sources – consolidated into a data lake or data warehouse. High data governance standards are essential to establish consistency and accuracy in analytical insight. Plus, organizations need to have complex predictive models and even machine learning programs in place to make reliable forecasts.
  • Prescriptive Analytics – Organizations at the level of prescriptive analytics are able to use their data to not only anticipate market trends and changing behaviors but act in ways that maximize outcomes. From end to end, data drives decisions and actions. Moreover, organizations have several layers of analytics solutions to address a variety of different issues.

What’s important to acknowledge is that each level of analytics is a sequential progression. You cannot move up in sophistication without giving proper attention to the prerequisite data structures, data quality, and data-driven mindsets.

For example, if an auto manufacturer wants to reduce their maintenance costs by using predictive analytics, there are several steps they need to take in advance:

  • Creating a steady feed of real-time data through a full array of monitoring sensors
  • Funneling data into centralized storage systems for swift and simple analysis
  • Implementing predictive algorithms that can be taught or learn optimal maintenance plans or schedules

Then, they can start to anticipate equipment failure, forecast demand, and improve KPIs for workforce management. Yet no matter your industry, the gap analysis between the current state of your data maturity and your goals is essential to designing a roadmap that can get you to your destinations fastest.

What Is the State of Your Data?

Unfortunately for any data analytics roadmap, most organizations didn’t grow their data architecture in a methodical or intentional way. Honestly, it’s very difficult to do so. Acquisitions, departmental growth spurts, decentralized operations, and rogue implementations often result in an over-complicated web of data.

When it comes to data analysis, simple structures are always better. By mapping out the complete picture and current state of your data architecture, your organization can determine the best way to simplify and streamline your systems. This is essential for you to obtain a complete perspective from your data.

Building a single source of truth out of a messy blend of data sets was essential for one of our CPG clients to grow and lock down customers in their target markets. The modern data platform we created for their team consolidated their insight into one central structure, enabling them to track sales and marketing performance across various channels in order to help adjust their strategy and expectations. Centralized data sources offer a springboard into data science capabilities that can help them predict future sales trends and consumer behaviors – and even advise them on what to do next.

Are you building a data analytics roadmap and unsure of what your current analytics are lacking? 2nd Watch can streamline your search for the right analytics fit. 

A Short Guide to Understanding Looker Pricing and Capabilities

Navigating the current BI and analytics landscape is often an overwhelming exercise. With buzzwords galore and price points all over the map, finding the right tool for your organization is a common challenge for CIOs and decision-makers. Given the pressure to become a data-driven company, the way business users analyze and interact with their data has lasting effects throughout the organization.

Looker pricing models

Looker, a recent addition to the Gartner Magic Quadrant, has a pricing model that differs from the per-user or per-server approach. Looker does not advertise their pricing model; instead, they provide a “custom-tailored” model based on a number of factors, including total users, types of users (viewer vs. editor), database connections, and scale of deployment.

Those who have been through the first enterprise BI wave (with tools such as Business Objects and Cognos) will be familiar with this approach, but others who have become accustomed to the SaaS software pricing model of “per user per month” may see an estimate higher than expected – especially when comparing to Power BI at $10/user per month. In this article, we’ll walk you through the reasons why Looker’s pricing is competitive in the market and what it offers that other tools do not.

Semantic and Governance Model

Unlike some of its competitors, Looker is not solely a reporting and dashboarding tool – it also acts as a data catalog across the enterprise. Looker requires users to think about their data and how they want their data defined across the enterprise.

Before you can start developing dashboards and visualizations, your organization must first define a semantic model (an abstraction of the database layer into business-friendly terms) using Looker’s native LookML scripting, which will then translate the business definitions into SQL. Centralizing the definitions of business metrics and models guarantees a single source of truth across departments. This will avoid a scenario where the finance department defines a metric differently than the sales or marketing teams, all while using the same underlying data. A common business model also eliminates the need for users to understand the relationships of tables and columns in the database, allowing for true self-service capabilities.

While this requires more upfront work, you will save yourself the future headaches of debating why two reports show different values or of redefining the same business logic in every dashboard you create.

By putting data governance front and center, your data team can make it easy for business users to create insightful dashboards in a few simple clicks.
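
To make the semantic-layer idea more concrete, here is a minimal sketch in Python rather than Looker's actual LookML: a single governed set of metric definitions is translated into SQL so every department computes a metric the same way. The metric names, table, and columns are illustrative assumptions, not part of any real deployment.

```python
# Minimal sketch of a semantic layer, assuming a hypothetical "orders" table.
# In Looker this role is played by LookML; the names below are illustrative only.

METRICS = {
    # business-friendly name -> SQL expression agreed on by every department
    "total_revenue": "SUM(order_total)",
    "order_count": "COUNT(DISTINCT order_id)",
    "average_order_value": "SUM(order_total) / NULLIF(COUNT(DISTINCT order_id), 0)",
}

def build_query(metric_name: str, group_by: str, table: str = "orders") -> str:
    """Translate a governed metric definition into SQL so finance, sales,
    and marketing all compute the number the same way."""
    expression = METRICS[metric_name]  # undefined metrics raise a KeyError
    return (
        f"SELECT {group_by}, {expression} AS {metric_name} "
        f"FROM {table} GROUP BY {group_by}"
    )

print(build_query("average_order_value", group_by="region"))
```

In Looker itself, this logic lives in LookML, where it is version-controlled and reused by every dashboard rather than redefined report by report.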

Customization and Extensibility

At some point in the lifecycle of your analytics environment, there’s a high likelihood you will need to make some tweaks. Looker, for example, allows you to view and modify the SQL that is generated behind each visualization. While this may sound like a simple feature, a common pain point across analytics teams is trying to validate and tie out aggregations between a dashboard and the underlying database. Access to the underlying SQL not only lets analysts quickly debug a problem but also allows developers to tweak the auto-generated SQL to improve performance and deliver a better experience.

Another common complaint from users is how long it takes IT to integrate new data into the data warehouse. In the "old world" of Cognos and Business Objects, if your calculations were not defined in the framework model or universe, you could not proceed without IT intervention. In the "new world" of Tableau, the dashboard and visualization are prioritized over the model. Looker brings the two approaches together with derived tables.

If your data warehouse doesn’t directly support a question you need to immediately answer, you can use Looker’s derived tables feature to create your own derived calculations. Derived tables allow you to create new tables that don’t already exist in your database. While it is not recommended to rely on derived tables for long-term analysis, it allows Looker users to immediately get speed-to-insight in parallel with the data development team incorporating it into the enterprise data integration plan.

Collaboration

Looker takes collaboration to a new level as every analyst gets their own sandbox. While this might sound like a recipe for disaster with “too many cooks in the kitchen,” Looker’s centrally defined, version-controlled business logic lives in the software for everyone to use, ensuring consistency across departments. Dashboards can easily be shared with colleagues by simply sending a URL or exporting directly to Google Drive, Dropbox, and S3. You can also send reports as PDFs and even schedule email delivery of dashboards, visualizations, or their underlying raw data in a flat file.

Embedded Analytics

Looker also enables collaboration outside of your internal team. Suppliers, partners, and customers can get value out of your data thanks to Looker's modern approach to embedded analytics. Because Looker works directly against your own data warehouse, it is easy to embed dashboards, visuals, and interactive analytics into any webpage or portal. You don't have to create a new pipeline or pay to store duplicate data in order to take advantage of embedded analytics.

So, is Looker worth the price?

Looker puts data governance front and center, which in itself is a decision your organization needs to make (govern first vs. build first). A centralized way to govern and manage your models is something that often comes at an additional cost in other tools, increasing the total investment when looking at competitors. If data governance and a centralized source of truth are critical features of your analytics deployment, then the ability to manage them, and to avoid the headaches of multiple versions of the truth, makes Looker worth the cost.


If you’re interested in learning more or would like to see Looker in action, 2nd Watch has a full team of data consultants with experience and certifications in a number of BI platforms as well as a thorough understanding of how these tools can fit your unique needs. Get started with our data visualization starter pack.

 


4 Key Differences between Data Lakes and Data Warehouses

Businesses today increasingly rely on data analytics to provide insights, identify opportunities, make important decisions, and innovate. Every day, businesses generate large amounts of data (a.k.a. big data) from multiple internal and external sources, and that data can and should be used to make informed decisions, better understand customers, make predictions, and stay ahead of the competition.


For effective data-driven decisions, securely storing all this data in a central repository is essential. Two of the most popular storage repositories for big data today are data lakes and data warehouses. While both store your data, they each have different uses that are important to distinguish before choosing which of the two works best for you.

What is a Data Lake?

With large amounts of data being created by companies every day, it may be difficult to determine which repository will be most effective based on your business needs and who will be using the data. To visualize the difference, each storage repository functions much like its name suggests. A data lake, for example, is a vast pool of raw, unstructured data; one piece of information in a data lake is like a single raindrop in Lake Michigan.

All the data in a data lake is loaded from source systems, and none of it is turned away, filtered, or transformed until there is a need for it. Typically, data lakes are used by data scientists, who transform the data as needed. Data warehouses, on the other hand, have more organization and structure, like a physical warehouse building: they house structured, filtered data that is used for a specific purpose. Still, both repositories have many more layers to them than these analogies suggest.
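
To illustrate the "load everything, transform later" approach, here is a minimal Python sketch, assuming an S3-based lake and boto3, that lands raw source records as-is with no filtering or modeling. The bucket name, region, key layout, and event shape are hypothetical.

```python
import json
from datetime import datetime, timezone

import boto3

# Hypothetical sketch: land raw events in an S3-based data lake exactly as they
# arrive. The bucket, region, key layout, and event shape are all assumptions.
s3 = boto3.client("s3", region_name="us-east-1")
BUCKET = "example-manufacturing-data-lake"

def land_raw_event(source: str, event: dict) -> None:
    """Write the event untouched; nothing is filtered or transformed until
    someone actually needs the data (schema-on-read)."""
    now = datetime.now(timezone.utc)
    key = f"raw/{source}/{now:%Y/%m/%d}/{now.timestamp()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))

land_raw_event("sensor-gateway", {"machine_id": "press-07", "temp_c": 81.4})
```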

To learn more about data lakes and their benefits, specifically with AWS Lake Formation, visit this post.

What is a Data Warehouse?

A data warehouse is the traditional, proven repository for storing data. Data warehouses use an ETL (Extract, Transform, Load) process, whereas data lakes use an ELT (Extract, Load, Transform) process. Data is filtered, processed, and loaded from multiple sources into the warehouse once its use is defined. This structure, in turn, allows users to run SQL queries and get quick results. Data warehouse users tend to be business professionals because, once the data is fully processed, they are working with a highly structured and simplified data model designed for analysis and reporting.
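
To make the ETL vs. ELT distinction concrete, here is a small, hypothetical Python sketch: the ETL path cleans and conforms records before loading them into the warehouse, while the ELT path loads everything raw and defers transformation until the data is needed. The record fields and target names are illustrative assumptions.

```python
# Hypothetical sketch contrasting ETL (warehouse-style) with ELT (lake-style).
# Record fields and target names are illustrative assumptions.

raw_records = [
    {"order_id": 1, "order_total": "50.00", "status": "complete"},
    {"order_id": 2, "order_total": "not-a-number", "status": "complete"},  # bad row
]

def load(target: str, rows: list) -> None:
    print(f"loaded {len(rows)} row(s) into {target}")

def etl(records: list) -> None:
    """Extract -> Transform -> Load: clean and conform before loading, so only
    data that fits the warehouse model ever gets stored."""
    cleaned = []
    for r in records:
        try:
            cleaned.append({"order_id": r["order_id"], "order_total": float(r["order_total"])})
        except ValueError:
            continue  # rows that don't match the model are filtered out
    load("warehouse.orders", cleaned)

def elt(records: list) -> None:
    """Extract -> Load -> Transform: land everything raw now and transform
    later (typically with SQL) once a use for the data is identified."""
    load("lake.raw_orders", records)

etl(raw_records)  # loaded 1 row(s) into warehouse.orders
elt(raw_records)  # loaded 2 row(s) into lake.raw_orders
```

The ETL path quietly drops the row that doesn't fit the model, which is the trade-off at the heart of this comparison: stronger data quality at the cost of flexibility.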

While the structure and organization provided by a data warehouse are appealing, one major downside you might hear about is how time-consuming data warehouses are to change. Because the use of the data is defined up front, modifying the data loading process can take developers a significant amount of time. When a business wants fast insights for decision-making, this can be a frustrating challenge if changes to the data warehouse are needed.

In terms of cost, data warehouses tend to be more expensive than data lakes, especially when the volume of data is large, largely because the accessible, query-ready structure is costly to build and to change. However, since a data warehouse weeds out data that falls outside the identified profile, a significant amount of space is conserved, reducing overall storage costs.

What are the Benefits of a Data Warehouse?

While data lakes come with their own benefits, data warehouses have been in use for decades longer, proving their reliability and performance over time. A strong data warehouse provides several benefits, including the following:

  • Saves Time: With all your data loaded and stored in one place, you save the time otherwise spent manually retrieving data from multiple sources. Additionally, since the data is already transformed, business professionals can query it themselves rather than relying on IT to do it for them.
  • Strong Data Quality: Because a data warehouse contains only transformed data, duplicate or inadequately recorded data is removed along the way.
  • Improves Business Intelligence: Since data within a data warehouse is consolidated from multiple sources, everyone on your team works from the same holistic view of the data when making decisions.
  • Provides Historical Data: A data warehouse stores historical data that your team can use to make future predictions and, thus, more informed decisions.
  • Security: Data warehouses improve security by allowing controls, such as restricted access to sensitive data, to be built into the warehouse setup.

Should I use a Data Lake or a Data Warehouse?

Due to their differences, neither repository is objectively better. However, a company might prefer one based on its resources and specific business needs. In some cases, businesses transition from a data lake to a data warehouse, or vice versa, for different reasons. For example, a company may want a data lake to temporarily store all of its data while its data warehouse is being built. In another case, such as our experience with McDonald's France, a company may want a data lake for ongoing data collection from a wide range of data sources to be used and analyzed later. The following are some of the key differences between data lakes and data warehouses that may be important in determining the best storage repository for you:

  • User type: When comparing the two, one of the biggest differences comes down to who is using the data. Data lakes are typically used by data scientists who transform the data when needed, whereas data warehouses are used by business professionals who need quick reports.
  • ELT vs ETL: Another major difference between the two is the ELT process of data lakes vs ETL process of data warehouses. Data lakes retain all data, while data warehouses create a highly structured data model that filters out the data that doesn’t match this model. Therefore, if your company wants to save all data for later use – even data that may never be used – then a data lake would be the choice for you.
  • Data type: As the more traditional repository, a data warehouse holds data extracted from transaction systems, consisting of quantitative metrics and the attributes that describe them. Data lakes, on the other hand, embrace non-traditional data types (e.g., web server logs, social network activity, etc.) and transform them only when the data is ready to be used.
  • Adaptability: While a well-constructed data warehouse is highly effective, it takes a long time to change when changes are needed, so a lot of time can be spent getting the structure right. Since data lakes store raw data, the data is more accessible when needed, and a variety of schemas can be applied and discarded until one proves reusable.

Furthermore, different industries may lean towards one or the other based on industry needs. Here’s a quick breakdown of what a few industries are using most commonly to store their data:

  • Education: Data lakes are a popular choice for storing data among educational institutions. The industry benefits from the flexibility they provide, as student grades, attendance, and other data points can be stored and transformed when needed. That flexibility also allows universities to streamline billing, improve fundraising, tailor the student experience, and facilitate research.
  • Financial Services: Finance companies tend to choose a data warehouse because it can provide quick, relevant insights for reporting. Additionally, the whole company can easily access the data warehouse thanks to its existing structure, rather than access being limited to data scientists.
  • Healthcare: The healthcare industry has plenty of unstructured data, including physicians' notes, clinical data, client records, and any interaction a consumer has with the brand online. Because of these large amounts of unstructured data, healthcare organizations can benefit from a flexible storage option where this data can be safely stored and later transformed.

There is no black-and-white answer to which repository is better. A company may decide that using both is best: a data lake can hold a vast amount of structured and unstructured data alongside a well-established data warehouse for instant reporting. The warehouse remains accessible to the business professionals in your organization, while data scientists use the data lake to provide more in-depth analyses.

Contact Us

Choosing between a data lake and a data warehouse, or both, determines how your data is stored, how it is used, and who uses it. If you are interested in setting up a data lake or data warehouse and want advice on your next steps, 2nd Watch has a highly experienced and qualified team of experts to get your data where it needs to be. Contact us to talk to one of our experts and take the next steps in your cloud journey.
