Should You Build or Buy Your ETL Tool? 8 Questions to Help You Decide

Analyzing raw data without a single, standardized format is as fruitful as trying to understand all 193 UN delegates shouting in their native tongues. Something important is being said, but good luck figuring out what it is. Reformat that raw data and move it from its disparate sources into a single data warehouse, however, and the message rings through as clear as a bell.

That is the benefit that extract, transform, load (ETL) processes provide to organizations. Yet before you can access the hidden patterns and meanings in your data, you need to decide how you want to acquire your ETL tool: build one from scratch or buy an automated solution. Here’s what to consider as you make your decision.

Just getting started? Check out our 60-minute data architecture assessment.

What’s the project size?

Often, a small project scope with simple data flows benefits from a custom build, allowing your organization to calibrate your ETL tool to your precise needs and spend less in the process. Small shops may have fewer technical resources, but they will often spend as much time integrating a pre-built ETL tool as they would building simple data flows from the ground up.

When the scope is a massive enterprise-level ETL framework, it makes more sense to engage with a preexisting ETL tool and accelerate your analytics timeline. Even then, we recommend a data management partner experienced in ETL processes, one that’s done the technical work of hooking the sources together and transforming the data numerous times. They know the accelerators to get your program up and running enterprise-wide.

What technology are you using?

Your current tech stack is always a consideration. For example, if you prefer open source technology or depend on a web of legacy systems for daily operations, building your own system removes the worry that a packaged tool won’t integrate cleanly. Building your own ETL program is also a preferred option for organizations with a custom or niche development environment that aim to use fewer computing resources or squeeze out more performance.

On the other hand, organizations that value ease of use and prefer GUI environments are better suited to buying their ETL program. For example, we had an e-commerce client with only a few internal technical resources. They understood their current state and their source systems but did not want to deal with setting up the actual workflows. In that scenario, we determined that integrating a preexisting ETL solution into their ecosystem would help their team load data and run reports more effectively.

What’s the shelf-life of your proposed solution?

How long you’ll use a specific ETL solution has a significant influence on the decision to build or buy. If you need a quick-and-dirty, one-off load, it doesn’t make sense to invest $15,000+ per year in an ETL solution. If you have resources capable of scripting an ad hoc solution, utilizing their talents will achieve faster results.

On the other hand, companies that need a scalable or long-term strategic solution tend to lean toward a prepackaged ETL tool. Due to the evolving data sources and streams available in these organizations, a preexisting ETL tool in which ongoing development and integration are handled by the vendor is ideal. The only major challenge is ensuring that your team is maximizing your investment by using the full capacity of your vendor’s ETL solution. Fortunately, it’s a feat that’s more manageable if you work with a technical consulting partner like 2nd Watch.

What’s your budget?

This one is a little deceptive. Though there is an initial investment in building your own solution, you avoid the ongoing subscription fees and the initial integration costs that are frequently overlooked in preliminary estimates. Additionally, buying an ETL solution often means you’ll be charged per source system transformed and loaded into your data warehouse. So depending on the number of disparate sources and the volume of data, the build option is a good way to avoid overspending on data ingestion.

Though large enterprises will still end up paying these costs, they can justify them as a trade-off for greater traceability and cataloging for the sake of compliance. The ability to track business data and smoothly conduct audits is more than enough for some organizations to defend the elevated price tag, especially if those organizations are in the healthcare or financial sectors.

Who will manage the ETL process?

Control is a significant consideration for plenty of organizations. For those who want to own the ETL system, building is the right choice. Often, this makes the most sense when you already have a custom infrastructure, legacy storage system, or niche analytics needs.

Yet not every organization wants to divert attention from their primary business. Let’s say you’re a healthcare organization that wants to build a comprehensive data warehouse from a myriad of data sources while still maintaining compliance. Trusting an experienced vendor removes a considerable amount of risk.

Do you need flexibility in your analytics?

What types of reports will you be running? Standard ones for your industry or business environment? Or reports that are particular to your own unique needs? Your answer heavily influences the choices you make about your ETL tool.

If you feel your demands on a data warehouse will be uncommon, then building is the ideal choice. That way, your reporting isn’t curtailed to fit a preconceived notion of your needs. Hand-coding your own ETL program enables you to write scripts for whatever schemas or parameters you have in mind. The only limitation is your own technical capability or that of your data management consulting partner.
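
To make that trade-off concrete, here is a minimal sketch of what a hand-coded load might look like in Python. The file name, column names, and SQLite target are hypothetical stand-ins for your own sources and warehouse; a production pipeline would add logging, error handling, and incremental loads.

```python
# Minimal hand-coded ETL sketch (illustrative only).
# Assumes a hypothetical orders.csv export and a local SQLite database
# standing in for your warehouse; swap in your own sources and targets.
import csv
import sqlite3
from datetime import datetime

def extract(path):
    """Read raw rows from a CSV export of a source system."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Standardize types and derive fields before loading."""
    for row in rows:
        yield {
            "order_id": int(row["order_id"]),
            "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat(),
            "amount_usd": round(float(row["amount"]), 2),
            "customer": row["customer_name"].strip().title(),
        }

def load(records, conn):
    """Write transformed records into a warehouse staging table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS stg_orders (
               order_id INTEGER PRIMARY KEY,
               order_date TEXT,
               amount_usd REAL,
               customer TEXT)"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO stg_orders VALUES (:order_id, :order_date, :amount_usd, :customer)",
        records,
    )
    conn.commit()

if __name__ == "__main__":
    with sqlite3.connect("warehouse.db") as conn:
        load(transform(extract("orders.csv")), conn)
```

A script like this is quick to stand up for one or two sources; the hidden cost is everything around it, such as scheduling, monitoring, and schema changes, which is exactly what a purchased tool bundles in.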

If performance outranks customization, buying an ETL tool like Attunity, Talend, or another established vendor is the superior option. As we’ve said before, you’ll lose some flexibility and back-end control, but these enterprise-level ETL solutions allow you to gather, cleanse, and refine data with minimal effort. Who said data transformation needed to be difficult?

Do you have access to technical experts?

Effective ETL processes require a skilled workforce to deliver maximum results. What’s more, that workforce needs to know how to build a data warehouse. You either need internal resources, a data management partner, or a proficient solutions provider involved in the development, auditing, and testing processes.

Internal resources allow you to build and launch your own ETL program with their ability to hand-code scripts and manage data workflows. Additionally, you don’t need to hire outside resources to monitor ongoing performance or troubleshoot issues. The trade-off is that their work on your ETL solution and data integration can divert their attention from long-term strategic projects or operations. An effective compromise is having an internal resource take ownership of the project and outsource the scripting, loading, and data migration to a technical partner.

For organizations without spare technical talent, buying a prepackaged ETL tool simplifies a portion of the initial technical investment. However, most organizations still need assistance with current state audits to verify all the source systems, hands-on integration support to get reporting up and running, and training on the new reporting processes. Choosing the right technical consulting partner enables you to deliver results in reasonable timetables without hiring new IT talent to handle the ETL process.

The advantage of a data management partner like 2nd Watch is that we’re proficient in both build and buy situations. If you decide to build, we can help with the scripting and create a support team. If you decide to buy, we can help integrate the tool and teach your internal team how to maximize all of the ETL tool’s features. That way, you can prioritize other more strategic or inflexible considerations while still integrating your disparate data sources into a single data warehouse.

Can you provide internal training?

What happens after the implementation? Will your team be able to grab the baton and confidently sprint forward without any impediments? Or are you at risk from the “bus factor,” where losing a single person wipes out your organization’s knowledge of the ETL solution? The success of both building an ETL platform and buying a cloud-based subscription depends on the effectiveness of the associated training process.

Going with a custom build means you’re dependent on your own knowledge sharing. You may encounter a bottleneck where only a few key employees understand how to run reports once the ETL processes are in place. And if a tool is time-consuming or frustrating to use, you’ll struggle to encourage buy-in.

However, with a purchased ETL tool, resources outside of your team should easily understand the logistics of the workflows and be able to support your system. Your organization can then recruit or contract staff that is already familiar with the technical function of your tool without painfully reverse-engineering your scripting. Beware, though! You will encounter the same problems as a built system if you integrate the tool poorly. (Don’t just write custom scripting within your workflows if you want to get the benefits from a purchased option.)

The right data management partner can avoid this situation entirely. For example, the 2nd Watch team is skilled at sharing organizational process changes and communicating best practices to users and stakeholders. That way, there’s no barrier to usage of any ETL tool across your organization.

Whether you build or buy your ETL tool, 2nd Watch can help you implement the right solution. Schedule a whiteboard session to review your options and start on the path to better data analytics.


A CTO’s Guide to a Modern Data Platform: How to Build a Data Warehouse in 6-8 Weeks

It’s true that in a very short period of time, you can get an operational data warehouse loaded with all your source data. Wondering how to build a data warehouse that yields these results? Snowflake and its technology partner ecosystem, including partners like Fivetran, allow you to replicate your databases and SaaS applications into Snowflake and see results in a significantly shorter timeframe than was previously possible. Write some SQL views in Snowflake against this data, point your favorite BI tool at it, and get lightning-fast results.

With the right plan in place, you can (aggressively) deliver that first business “wow” in six to eight weeks. 2nd Watch typically recommends at least two analytical virtual warehouses in enterprise deployments: one for data discovery and one for more structured and governed reporting.

The practice of having both a data discovery layer and a governed layer for your Snowflake deployment not only saves you time in the initial build, but it creates a continuous integration and deployment pattern. It finally makes a dent in the notion that a data warehouse cannot be “agile.”
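
As a rough illustration of that two-warehouse pattern, the snippet below provisions separate discovery and reporting warehouses with the Snowflake Python connector. The account, credentials, warehouse names, and sizes are placeholders, and the right sizing depends on your actual workloads.

```python
# Illustrative sketch: provision separate discovery and reporting warehouses.
# Account, credentials, warehouse names, and sizes below are placeholders.
import snowflake.connector

DDL = [
    """CREATE WAREHOUSE IF NOT EXISTS discovery_wh
           WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE
           COMMENT = 'Ad hoc data discovery'""",
    """CREATE WAREHOUSE IF NOT EXISTS reporting_wh
           WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE
           COMMENT = 'Governed, structured reporting'""",
]

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    role="SYSADMIN",
)
try:
    for statement in DDL:
        conn.cursor().execute(statement)  # each warehouse suspends itself when idle
finally:
    conn.close()
```

Separating the two also keeps noisy, exploratory queries from competing with scheduled reporting workloads for compute.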

Future State Snowflake Data Architecture

With this approach, you not only achieve governance and speed-to-insight, but you’ve also cut down your Snowflake consumption costs. Running complex queries at run-time, every time, can get expensive. Snowflake’s caching can help here, but if you’re constantly running a complex join across more than 20 tables, that physical layer might help you move from an XL-sized Snowflake warehouse to an L or M. In the long run, those cost savings will add up. When “best practice” or risk avoidance isn’t enough to justify this approach, the dollar savings might speak for themselves.

The initial six- to eight-week path assumes a focus on the data discovery layer, as depicted below. Loading all your data into a data discovery layer should be the primary development activity in your Snowflake proof of concept (POC) or pilot. Here are some tips:

  • Find several source systems that have the data your stakeholders need.
  • Begin the process of rapidly loading into your Snowflake data discovery layer.
  • Write iterative SQL in Snowflake views to build your business logic (see the sketch after this list).
  • Connect your BI tool to Snowflake and build a very simple dashboard.
  • Get feedback from the business, using the new dashboard as a facilitation tool.
  • Repeat this process – more feedback is better.
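
For instance, a first pass at the view-building step might look like the sketch below. It assumes a replication tool such as Fivetran has already landed raw orders and customers tables in a RAW_DISCOVERY database; every object name here is a placeholder for your own sources.

```python
# Sketch of the discovery-layer loop: wrap raw, replicated tables in a view,
# then query it from your BI tool or ad hoc SQL. All object names are placeholders.
import snowflake.connector

VIEW_DDL = """
CREATE OR REPLACE VIEW analytics.discovery.daily_sales AS
SELECT
    o.order_date::date          AS order_date,
    c.region                    AS region,
    SUM(o.amount)               AS total_sales,
    COUNT(DISTINCT o.order_id)  AS order_count
FROM raw_discovery.shop.orders o
JOIN raw_discovery.shop.customers c
  ON c.customer_id = o.customer_id
GROUP BY 1, 2
"""

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    warehouse="DISCOVERY_WH", database="ANALYTICS",
)
try:
    cur = conn.cursor()
    cur.execute(VIEW_DDL)  # iterate on this view as business feedback comes in
    cur.execute(
        "SELECT * FROM analytics.discovery.daily_sales ORDER BY order_date DESC LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```

Because the business logic lives in a view, each round of feedback is just another CREATE OR REPLACE, which is what keeps the six- to eight-week loop moving.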

You’ve not only increased the speed to market, but you’ve also enabled developers and business users to execute an analytics project in a completely new way.

Generic Snowflake Data Architecture (highlighted: speed-to-insight approach, with transformations in Snowflake)

This all comes with a caveat: Yes, you can write all your logic in Snowflake and do everything you need from an analytics standpoint, but it will limit you in the long run. Every data project has sacrifices to be made, whether it’s time, features, cost, or flexibility. You need to balance these with a long-term vision for how your data will be used across the organization. Snowflake will get you part of the way there, but a good plan and strategy will take you even further.

In the meantime, if you have any questions or want to learn more about how to build a data warehouse using Snowflake, contact us today.



How Business and IT Should Partner to Build a Data Strategy

Any functioning company has a vision, goals, and direction. A successful company follows a strategy – set forth by its executives – to realize its vision, achieve its goals, and determine its direction. Executing executive strategy requires cross-functional collaboration amongst teams across the organization.

Having a data strategy is crucial to achieving your greater business goals. To set attainable and tangible business objectives, you need to utilize data. Your IT department will implement and oversee the technologies powering your data strategy; therefore, business processes and goals must factor in IT resources and capabilities. Your organization’s business unit should work closely with your IT team to build an effective data strategy.  

Utilizing Data Analysis to Pursue Strategies

When it comes to business goals, each goal should have a quantitative KPI or metric that is extracted from data and analytics. Of course, this is easier said than done! Even the simplest of business questions can spawn dozens of data threads needed for analysis. For example, if you are looking to track business profitability, you’ll have to look at key components such as revenue, COGS, and overhead. Each of these data sources must be broken down even further to fully understand the individual metrics that make up the greater question of profitability. The ultimate goal of a data strategy is to define all the inputs required to calculate key metrics and incorporate them into your production processes.
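
As a deliberately simple, hypothetical illustration of how those components roll up into a single KPI, consider the arithmetic below; the figures are invented, and in practice each input would itself be aggregated from several upstream data threads.

```python
# Hypothetical roll-up of profitability from its component metrics.
# Each input would, in practice, be aggregated from its own data sources.
revenue = 1_250_000.00   # e.g., summed from order and invoice systems
cogs = 480_000.00        # e.g., materials and direct labor from the ERP
overhead = 310_000.00    # e.g., rent, salaries, software from the general ledger

gross_profit = revenue - cogs
net_profit = gross_profit - overhead
net_margin = net_profit / revenue

print(f"Gross profit: ${gross_profit:,.2f}")
print(f"Net profit:   ${net_profit:,.2f}")
print(f"Net margin:   {net_margin:.1%}")
```

The data strategy work is in defining, sourcing, and governing each of those inputs so the calculation can run reliably in production, not in the arithmetic itself.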

Your Data Strategy Checklist

Before engaging in any data management project, it’s important to outline your to-do list. Below is a checklist with the foundational steps to building out a data strategy:

  • Scope the project (what business problem are you trying to solve?)
  • Generate a work breakdown structure for strategy.
  • Create a timeline with milestones.
  • Define teams and individual role definitions.
  • Determine how you collaborate with others.
  • Schedule meetings (kick-off, status, sprint planning, and steering meetings). 
  • Begin research of your technology stack.
  • Budget for your data strategy project.
  • Kick off with the team!

Let’s Get Started!

As you work toward building a data strategy, there are four deliverables you should focus on: a current state understanding, a future state architecture, a gap analysis, and a road map.

Current State Understanding

Your data strategy should reinforce and advance your overall business strategy, which refers to the processes you use to operate and improve your business. To grow, you must first establish a baseline against which to measure that growth. To gain an understanding of your current state, review your current data architecture, data flows, and source systems by:

  • Collecting existing diagrams and analyzing existing discovery
  • Verifying source systems (you have more than you think!)
  • Diagramming and visualizing
  • Having a logical model (what are the existing business domains?)

Future State Architecture

Once you’ve established a baseline, envision your desired future state architecture. Your future state architecture should support the data platform at your company, including a logical model of the data warehouse and technology decisions. To determine your future state, define your data threads, which requires you to identify your source systems and domains.

Gap Analysis

What changes need to happen to move your company from the current state to the desired future state? Establish your technical framework and identify the gaps. This framework should be built for moving, curating, and delivering data to your business use cases. 

Road Map

Road mapping is the most time-consuming and the most important deliverable. Your data strategy roadmap will outline how and when to implement your data vision and mission. This will help you manage change and ensure that IT goals are aligned with business goals. A well-thought-out roadmap reveals true dependencies so you can adjust as priorities shift. There are several important steps to building out a road map, which we cover below.

Step 1: Build and estimate a backlog.

Your backlog should be based on your intended future state and what business deliverables you want to generate.

Questions to answer:

  • What data threads need to be engineered?
  • How are you going to deliver the inputs to the artifact?
  • How are you going to productionalize the result to create the action?

Step 2: Define your vision and interim goals.

Before you can embark on any project, you’ll need to establish your end goals and the goals along the way.

Questions to answer:

  • What does success look like?
  • What priorities should you target?

Step 3: Identify your KPIs.

Determine the data and metrics that will be the most useful for operational and strategic decisions.

Questions to answer:

  • Who needs what information?
  • How valuable is that information to guide your decisions?

Step 4: Understand everything about your data.

This means understanding your sources of data, data relationships, and data quality issues.

Questions to answer:

  • What data are you sourcing from?
  • What data issues can be resolved by code versus process changes?
  • How can you map this data to a new, clean, holistic data model for reporting and analytics?

Step 5: Recommend a selection of technologies and tools.

Technologies and tools will be the means to your end, so be sure to conduct a thorough evaluation process. 

Questions to answer:

  • What architecture considerations are required to best use your technology stack and ecosystem?
  • What orchestration tools will you need?
  • What additional tools will you need to consider in the future as the platform matures?

Step 6: Design a high-level solution architecture.

Use this architecture to design your data marts, leading to quicker access to data and faster insights.

Questions to answer:

  • How should raw data, data collection, data storage, and access be structured for ease of use?
  • How frequently – and with what method – should this data be refreshed?
  • Who will own and support this in the future?

Step 7: Generate a specific backlog of tasks for implementation and development.

Questions to answer:

  • How will you build out the end-to-end foundation?
  • How will you coordinate with groups to gain the most immediate benefit?

Step 8: Develop a go-forward plan with a budget, milestones, and a timeline.

These components of your roadmap will make your future state a reality.

Questions to answer:

  • When can you complete this project?
  • How will you allocate the work?
  • How much will it all cost?
  • What internal and external resources are needed to put everything into action?

A Timeline of Your Data Strategy Road Map

At 2nd Watch, our approach is based on a 5- to 7-month data strategy timeline. Our data experts and consultants think about implementing a data strategy in phases. 

Phase 1: Foundation and process – 3 to 4 months

  • Set up your ETL and audit framework.
  • Ingest your primary data set into a persistent storage layer.
  • Engineer a business model for Phase 1 scope.
  • Create a data dictionary, source-to-target mapping, and support documentation for scaling the platform.
  • Develop business use cases based on Phase 1 scope.
  • Establish a strategic foundation for ongoing maturity and add future data sources.
  • Assign a data engineer and architect to support model development.

Phase 2: Development analytics – 2 to 3 months

  • Ingest your primary data sources for development.
  • Add integration dataflows to increase and expand the business layer for Phase 2 scope.
  • Identify SMEs and data champions to drive QA and identify new report requirements.

Phases 3 and 4: Secondary sources

  • Ingest all other data sources to supplement data warehouse development.
  • Add integration dataflows to increase and expand the business layer for Phase 3 and 4 scope.
  • Integrate new dataflows into existing reporting and create opportunities for new audiences and enterprise level roll-ups.

Phase 5 and beyond: Support

  • Continue to model data for Phases 5 and beyond.
  • Expand advanced analytics capabilities for predictive and clustering models.
  • Eliminate downstream dependencies on legacy systems to allow sunsetting applications.
  • Integrate additional data consumers and data applications for expanded functionality.

2nd Watch’s consultants work with your team to understand your business goals and design a tech strategy to ensure success throughout the organization. From planning to technology selection to user adoption, our team’s data advisory services enhance business optimization via data-driven insights for the long haul. Contact us today to learn how 2nd Watch can build a modern data foundation for your business intelligence and predictive analytics to ensure your organization can make better decisions now and in the future.


3 Data Visualization Best Practices That Uncover Hidden Patterns

At first glance, raw data doesn’t offer anyone but data experts many actionable conclusions. The high volume and complexity of consolidated data sources in most organizations create a visual overload, hiding essential truths about your business and your customers in plain sight. Yet with the right data visualizations, hidden patterns and trends can spring forth from the screen and inform decisions. The question is: how do you ensure the data visualizations within your dashboard will tell a story that guides your next steps? If your organization adheres to the following data visualization best practices, you’ll be equipped to uncover hidden patterns that enhance your awareness and refine your decision-making.

Data Visualization Best Practices


Want better dashboards? Our data and analytics experts are here to help. Learn more about our Data Visualization Starter Pack.


1. Follow design best practices.


The value of your data lies in the clarity of your visuals. Good design fosters the right conclusions, allowing the cream to rise to the top of its own accord. And bad design? It baffles your users and confuses your findings – even if the patterns and trends from your reports would otherwise be straightforward.

The importance of data visualizations is only really apparent when you take the time to review the worst of the worst. This Tumblr provides plenty of mind-boggling examples of data visualizations gone wrong. Moreover, it helps to communicate the importance of following design best practices:


Appreciate White Space

(Image source: Tumblr)

The readability of your data visualization is a key consideration. Not every inch of a visualization needs to be crammed with information (we aren’t designing for Where’s Waldo). The image on the left illustrates just how important it is to give your graphical visualization some breathing room.

One way is to break up the information. Your dashboard should have no more than 15 objects or tiles at a time. Any more and the content can make it difficult to focus on any single takeaway. If the visuals are intertwined, following the rule of thirds (dividing your dashboard vertically or horizontally into thirds) can prevent clutter and draw the eye to key observations.


Adopt a Color Palette

(Image source: Tumblr)

The colors within your dashboard visuals matter. By choosing colors that reflect your brand or fall within a clear color palette, you allow the visuals to stand out on their own.

When you pick colors that are not complementary, you fail to draw the eye to any insight, allowing the key findings to be overlooked at first glance. The image on the right is a perfect example of how your takeaways can all too easily fade into the background.


Respect the Message

(Image source: Tumblr)

Visualizations should never prioritize style over substance. You want pleasant graphics that do not distract from the actual message. Clarity is always key.

In the image on the left, the vertical axis representing height makes it seem as if women from Latvia tower over those from India. Most users can still extract value, but it calls the dashboard visualization’s credibility into question.


Provide Clear Answers

(Image source: Tumblr)


Complexity is another area to avoid. Your business users shouldn’t need to perform mental gymnastics to figure out what you’re trying to communicate. Any extra steps they need to take can muddle the actual findings of a report.

In the image on the right, the factors used to calculate the number of avocado toasts it would take to afford a deposit on a house (a questionable premise to begin with) are not clear at first glance.

Each piece of avocado toast on the graphic represents 100 toasts, making it next to impossible to gauge the amount represented by the incomplete pieces for Mexico City or Johannesburg. Plus, people would need to calculate the cost based on the average price in each city, adding an additional step to verify the data. Regardless, nothing is clear and the “insight” doesn’t merit the additional work.

At the end of the day, you don’t have to have an exceptional eye for visual design, but the person implementing visualizations in your dashboard does. This can be an internal resource (if you have one) or a partner experienced in following data visualization best practices.


2. Cater to your target users.


Great data visualizations are tailored to their target end users. The data sources, the KPIs, and the visualizations themselves all need to align with their goals and challenges. Before implementing your data visualization, ask these questions to determine the perfect parameters:

  • What is the background and experience of your end users? Your data visualizations need to rise to the level of your users. Executives need visualizations that offer strategic observations about revenue streams, operational efficiencies, and market trends. Employees on the front lines need answers to allow them to evaluate performance, KPIs, and other tactical needs.
  • What experience do they have with reporting dashboards or data visualizations? Frame of reference matters. If your users have primarily used Excel, you need to strike a balance between enhancing their reporting and creating a sense of familiarity. That might mean visualizing KPIs that they’ve previously created in Excel or building an interface that mirrors the experience of any previous tools.
  • What are their most common usages? There’s finite space for data visualizations within your dashboards. Every graphic or chart you provide should apply to frequent use cases that provide the greatest value for your team. Otherwise, there’s a risk that your investment will fail to earn the fullest ROI.
  • Which pain points do they struggle with most? Visualizations are meant to solve problems. As you are determining what to illuminate with your dashboard visualizations, you need to reflect on the greatest needs. Are there challenges pinpointing customer motivation and behaviors? Have revenues stagnated? Are old processes inefficient? The squeakiest wheels should be oiled first.

Answering all of these questions creates a foundation that your partner can implement as they create meaningful dashboards. Data visualizations that cater to these audiences are better at satisfying their needs.

For example, if you are creating a dashboard for a restaurant, your visualizations should cater to the pressing needs and concerns of the owner or manager. Expenses by supply category, total sales vs monthly expenses, special orders by count on receipt line items, and other KPIs can supply owners with quick and essential insight that can enhance their business strategy.

Regardless of industry, the data visualizations within your dashboards should balance immediate needs with actionable insight.


3. Show overlooked relationships between KPIs.


Reviewing one benchmark alone gives very narrow insights. Good data visualization can help organizations to connect the dots between KPIs. In fact, there are plenty of instances where the connection between one KPI and another is not apparent until that data is visualized. That’s when it helps to have an experienced partner guiding your work.

Let’s use our home health staffing firm partner as an example. We helped them implement a Snowflake data warehouse, and one of the key questions they wanted to answer was how they should respond during COVID-19. They were eager to review cases across the United States, but that alone would only provide limited insights.

We suggested they visualize the data against a few other parameters. A timeline feature could help create an interactive experience showing the growth of outbreaks over time. In other scenarios, organizations could do a side-by-side comparison of KPIs like CO2 emissions, automotive traffic, or other conditions to measure the impact of, or trends related to, the virus.
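
As a rough sketch of that kind of side-by-side view, the example below plots two invented series on a shared timeline with matplotlib; in a real dashboard the data would come from your warehouse and the charts would likely live in your BI tool rather than hand-written code.

```python
# Illustrative side-by-side KPI comparison on a shared timeline.
# The data below is invented purely to show the layout.
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range("2020-03-01", periods=12, freq="W")
cases = [5, 12, 30, 75, 160, 290, 410, 520, 600, 650, 670, 680]    # hypothetical case counts
traffic = [100, 95, 80, 55, 35, 25, 22, 20, 24, 28, 33, 40]        # hypothetical traffic index

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
ax1.plot(dates, cases, color="tab:red")
ax1.set_title("Reported cases")
ax2.plot(dates, traffic, color="tab:blue")
ax2.set_title("Automotive traffic index")
for ax in (ax1, ax2):
    ax.tick_params(axis="x", rotation=45)
fig.suptitle("Side-by-side KPIs on a shared timeline")
fig.tight_layout()
plt.show()
```

Placing the two series next to each other, on the same date axis, is what surfaces the relationship that neither KPI reveals on its own.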

What’s equally important is that you not overlook outliers. There are organizations that will get in the habit of burying statistical anomalies, not realizing when those outliers become the norm. Working with the right partner can give a fresh perspective, preventing essential findings from falling outside of your awareness.


We can help you unlock the power of your visual analytics. Learn more here.


How to Federate Amazon Redshift Access with Azure Active Directory

Single sign-on (SSO) is a tool that solves fundamental problems, especially in mid-size and large organizations with lots of users.

End users do not want to have to remember too many username and password combinations. IT administrators do not want to have to create and manage too many different login credentials across enterprise systems. It is a far more manageable and secure approach to federate access and authentication through a single identity provider (IdP).

As today’s enterprises rely on a wide range of cloud services and legacy systems, they have increasingly adopted SSO via an IdP as a best practice for IT management. All access and authentication essentially flow through the IdP wherever it is supported. Employees do not have to remember multiple usernames and passwords to access the tools they need to do their jobs. Just as important, IT teams prevent an administrative headache. They manage a single identity per user, which makes tasks like removing access when a person leaves the organization much simpler and less prone to error.

The same practice extends to AWS. As we see more customers migrate to the cloud platform, we hear a growing need for the ability to federate access to Amazon Redshift when they use it for their data warehouse needs.

Database administration used to be a more complex effort. Administrators had to figure out which groups a user belonged to, which objects a user or group was authorized to use, and other needs—all in manual fashion. These user and group lists—and their permissions—were traditionally managed within the database itself, and there was often a lot of drift between the database and the company directory.

Amazon Redshift administrators face similar challenges if they opt to manage everything within Redshift itself. There is a better way, though. They can use an enterprise IdP to federate Redshift access, managing users and groups within the IdP and passing the credentials to Amazon Redshift at login.

We increasingly hear from our clients, “We use Azure Active Directory (AAD) for identity management—can we essentially bring it with us as our IdP to Amazon Redshift?”

They want to use AAD with Redshift the way they use it elsewhere, to manage their users and groups in a single place to reduce administrative complexity. With Redshift, specifically, they also want to be able to continue managing permissions for those groups in the data warehouse itself. The good news is you can do this and it can be very beneficial.
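
At a high level, the federated login comes down to exchanging an AAD-issued SAML assertion for temporary AWS credentials and then asking Redshift for short-lived database credentials. The sketch below illustrates those two calls with boto3; the role ARNs, cluster identifier, and the SAML assertion itself are placeholders, and in practice the Redshift JDBC/ODBC driver or your SSO tooling performs this handshake for you.

```python
# Hedged sketch of the federation flow: SAML assertion -> temporary AWS
# credentials -> short-lived Redshift database credentials.
# ARNs, identifiers, and the SAML assertion itself are placeholders.
import boto3

def redshift_credentials_from_saml(saml_assertion_b64, db_user, db_groups):
    # 1) Exchange the IdP's SAML assertion for temporary AWS credentials.
    sts = boto3.client("sts")
    resp = sts.assume_role_with_saml(
        RoleArn="arn:aws:iam::123456789012:role/RedshiftFederatedRole",
        PrincipalArn="arn:aws:iam::123456789012:saml-provider/AzureAD",
        SAMLAssertion=saml_assertion_b64,
        DurationSeconds=3600,
    )
    creds = resp["Credentials"]

    # 2) Use those temporary credentials to request short-lived database credentials.
    redshift = boto3.client(
        "redshift",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    return redshift.get_cluster_credentials(
        ClusterIdentifier="analytics-cluster",
        DbUser=db_user,
        DbGroups=db_groups,   # group membership passed through from the IdP
        AutoCreate=True,
        DurationSeconds=900,
    )
```

From there, the returned DbUser and DbPassword are used to open a normal database connection, just as if the user had a native Redshift login.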

Without a solution like this, you would approach database administration in one of two alternative ways:

  1. You would provision and manage users using AWS Identity and Access Management (IAM). This means, however, you will have another identity provider to maintain—credentials, tokens, and the like—separate from an existing IdP like AAD.
  2. You would do all of this within Redshift itself, creating users (and their credentials) and groups and doing database-level management. But this creates similar challenges to legacy database management, and when you have thousands of users, it simply does not scale.

Learn more about our AWS expertise here.


How to Federate Amazon Redshift Access with Okta

Single sign-on (SSO) is a tool that solves fundamental problems, especially in midsize and large organizations with lots of users.

End users do not want to have to remember too many username and password combinations. IT administrators do not want to have to create and manage too many different login credentials across enterprise systems. It is a far more manageable and secure approach to federate access and authentication through a single identity provider (IdP).

As today’s enterprises rely on a wide range of cloud services and legacy systems, they have increasingly adopted SSO via an IdP as a best practice for IT management. All access and authentication essentially flows through the IdP wherever it is supported. Employees do not have to remember multiple usernames and passwords to access the tools they need to do their jobs. Just as importantly, IT teams prevent an administrative headache: They manage a single identity per user, which makes tasks like removing access when a person leaves the organization much simpler and less prone to error.

The same practice extends to AWS. As we see more customers migrate to the cloud platform, we hear a growing need for the ability to federate access to Amazon Redshift when they use it for their data warehouse needs.

Database administration used to be a more complex effort. Administrators had to figure out which groups a user belonged to, which objects a user or group was authorized to use, and other needs—all in manual fashion. These user and group lists—and their permissions—were traditionally managed within the database itself, and there was often a lot of drift between the database and the company directory.

Amazon Redshift administrators face similar challenges if they opt to manage everything within Redshift itself. There is a better way, though. They can use an enterprise IdP to federate Redshift access, managing users and groups within the IdP and passing the credentials to Amazon Redshift at login.

We increasingly hear from our clients, “We use Okta for identity management—can we essentially bring it with us as our IdP to Amazon Redshift?” They want to use Okta with Redshift the way they use it elsewhere, to manage their users and groups in a single place to reduce administrative complexity. With Redshift, specifically, they also want to be able to continue managing permissions for those groups in the data warehouse itself. The good news is you can do this and it can be very beneficial.
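
To illustrate the other half of that handshake, the hedged sketch below takes the temporary database credentials returned by Redshift’s GetClusterCredentials API and opens a standard PostgreSQL connection with them. The cluster endpoint, database, user, and group names are placeholders, and in most deployments the Redshift JDBC/ODBC driver performs these steps behind the scenes after receiving the SAML response from Okta.

```python
# Hedged sketch: request short-lived Redshift credentials for an IdP-managed
# user and group, then connect over the standard PostgreSQL protocol.
# Endpoint, cluster, database, user, and group names are placeholders.
import boto3
import psycopg2

redshift = boto3.client("redshift", region_name="us-east-1")
creds = redshift.get_cluster_credentials(
    ClusterIdentifier="analytics-cluster",
    DbUser="jane.doe",          # identity managed in Okta
    DbGroups=["bi_readers"],    # group membership passed from the IdP
    DbName="analytics",
    AutoCreate=True,
    DurationSeconds=900,
)

conn = psycopg2.connect(
    host="analytics-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user=creds["DbUser"],       # returned with an "IAM:" prefix
    password=creds["DbPassword"],
    sslmode="require",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT current_user")
    print(cur.fetchone())
conn.close()
```

Because the group list comes from the IdP at login, a membership change in Okta takes effect on the user’s next connection without touching Redshift’s own user catalog.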

Without a solution like this, you would approach database administration in one of two alternative ways:

  1. You would provision and manage users using AWS Identity and Access Management (IAM). This means, however, you will have another identity provider to maintain—credentials, tokens, and the like—separate from an existing IdP like Okta.
  2. You would do all of this within Redshift itself, creating users (and their credentials) and groups and doing database-level management. But this creates similar challenges to legacy database management, and when you have thousands of users, it simply does not scale.

Our technical white paper covers how to federate access to Amazon Redshift using Okta as your IdP, passing user and group information through to the database at login. We outline the step-by-step process we follow when we implement this solution for 2nd Watch clients, including the modifications we found necessary to ensure everything worked properly. We explain how to set up a trial account at Okta.com, build users and groups within the organization’s directory, and enable single sign-on into Amazon Redshift.

Download the technical white paper

-Rob Whelan, Data & Analytics Practice Director
