A Developer’s Guide to Power BI

There are many options when it comes to data analytics tools. Choosing the right one for your organization will depend on a number of factors. Since many of the reviews and articles on these tools are focused on business users, the 2nd Watch team wanted to explore these tools from the developer’s perspective. In this developer’s guide to Power BI, we’ll go over the performance, interface, customization, and more to help you get a full understanding of this tool.

A Developer’s Guide to Power BI

Why Power BI?

Power BI is a financially attractive alternative to the likes of Tableau and Looker, which either offer custom-tailored pricing models or a large initial per-user cost followed by an annual fee after the first year. However, don’t conflate cost with quality; getting the most out of Power BI is more dependent on your data environment and who is doing the data discovery. Companies already relying heavily on Microsoft tools should look to add Power BI to their roster, as it integrates seamlessly with SQL Server Analysis Services to facilitate faster and deeper analysis.

Performance for Developers

When working with large datasets, developers will experience some slowdown as they customize and publish their reports. Developing on Power BI works best with small-to-medium-sized data sets. At the same time, Microsoft has come out with more optimization options such as drill-through functionality, which allows for deeper analytical work for less processing power.

Performance for Users

User performance through Power BI Services is controlled through row-level security implementation. For any sized dataset, the number of rows can be limited depending on the user’s role. Overviews and executive dashboards may run somewhat slowly, but as the user’s role becomes more granular, dashboards will operate more quickly.

User Interface: Data Layer

Data is laid out in a tabular form; clicking any measure column header reveals a drop-down menu with sorting options, filtering selections, and the Data Analysis Expressions (DAX) behind the calculation.

User Interface: Relationship Layer

The source tables are draggable objects with labeled arrows between tables denoting the type of relationship.

Usability and Ease of Learning

Microsoft Power BI documentation is replete with tutorials, samples, quickstarts, and concepts for the fundamentals of development. For a more directed learning experience, Microsoft also put out the Microsoft Power BI Guided Learning set, which is a freely available collection of mini courses on modeling, visualization, and exploration of data through Power BI. It also includes an introduction to DAX development as a tool to transform data in the program. Additionally, the Power BI community forums almost always have an answer to any technical question a developer might have.

Modeling

Power BI can easily connect to multiple data sources including both local folders and most major database platforms. Data can be cleaned and transformed using the Query Editor; the Editor can change data type, add columns, and combine data from multiple sources. Throughout this transformation process, the Query Editor records each step so that every time the query connects to the data source, the data is transformed accordingly. Relationships can be created by specifying a from: table and to table, the keys to relate, a cardinality, and a cross-filter direction.

Customization

In terms of data transformation, Power Query is a powerful language for ensuring that your report contains the exact data and relationships you and your business user are looking to understand. Power Query simplifies the process of data transformation with an intuitive step-by-step process for joining, altering, or cleaning your tables within Power BI. For actual report building, Power BI contains a comprehensive list of visualizations for almost all business needs; if one is not found within the default set, Microsoft sponsors a visual gallery of custom user-created visualizations that anyone is free to explore and download.

Permissions and User Roles

Adding permissions to workspaces, datasets, and reports within your org is as simple as adding an email address and setting an access level. Row-level security is enabled in Power BI Desktop; role management allows you flexibly customize access to specific data tables using DAX functions to specify conditional filters. Default security filtering is single-directional; however, bi-directional cross-filtering allows for the implementation of dynamic row-level security based on usernames and/or login IDs.

Ease of Dev Opp and Source Control

When users have access to a data connection or report, source and version control are extremely limited without external GitHub resources. Most of the available activities are at the macro level: viewing/editing reports, adding sources to gateways, or installing the application. There is no internal edit history for any reports or dashboards.

Setup and Environment

Setup is largely dependent on whether your data is structured in the cloud, on-premises, or a hybrid. Once the architecture is established, you need to create “data gateways” and assign them to different departments and data sources. This gateway acts as a secure connection between your data source and development environments. From there, security and permissions can be applied to ensure the right people within your organization have access to your gateways. When the gateways are established, data can be pulled into Power BI via Power Query and development can begin.

Implementation

The most common implementation of Power BI utilizes on-premises source data and Power BI Desktop for data preparation and reporting, with Power BI Service used in the cloud to consume reports and dashboards, collaborate, and establish security. This hybrid implementation strategy takes advantage of the full range of Power BI functionality by leveraging both the Desktop and Service versions. On-premises data sources connect to Power BI Desktop for development, leading to quicker report creation (though Power BI also supports cloud-based data storage).

Summary and Key Points

Power BI is an extremely affordable and comprehensive analytics tool. It integrates seamlessly with Excel, Azure, and SQL Server, allowing for established Microsoft users to start analyzing almost instantly. The tool is easy to learn for developers and business users alike, and there are many available resources, like Microsoft mini-courses and community forums.

A couple things to be aware of with Power BI: It may lack some of the bells and whistles as compared to other analytics tools, and it’s best if you’re already in the Microsoft ecosystem and are coming in with a solid data strategy.

If you want to learn more about Power BI or any other analytics tools, contact us today to schedule a no-obligation whiteboard session.

Data Visualization Starter Pack

rss
Facebooktwitterlinkedinmail

A Developer’s Guide to Tableau

Why Tableau?

Tableau gets a good reputation for being sleek and easy to use and by bolstering an impeccable UI/UX. It’s by and large an industry leader due to its wide range of visualizations and ability to cohesively and narratively present data to end users. As a reliable, well-established leader, Tableau can easily integrate with many sources, has extensive online support, and does not require a high level of technical expertise for users to gain value.

Want better Tableau dashboards? Our modern data and analytics experts are here to help. Learn more about our modern cloud analytics solutions with Snowflake and Tableau.

Performance for Developers

One of the easiest ways to ensure good performance with Tableau is to be mindful of how you import your data. Utilizing extracts rather than live data and performing joins or unions in your database reduces a lot of the processing that Tableau would otherwise have to do. While you can easily manipulate data without any coding, these capabilities reduce performance significantly, especially when dealing with large volumes of information. All data manipulation should be done in your database or data warehouse prior to adding it as a source. If that isn’t an option, Tableau offers a product called Tableau Prep that enables data manipulation and enhanced data governance capabilities.

Performance for Users

Dashboard performance for users depends almost entirely on practices employed by developers when building out reports. Limiting the dataset to information required for the goals of the dashboard reduces the amount of data Tableau processes as well as the number of filters included for front-end users. Cleaning up workbooks to reduce unnecessary visualizations will enhance front-end performance as well.

User Interface: Data Source

After connecting to your source, Tableau presents your data using the “Data Source” tab. This is a great place to check that your data was properly loaded and doesn’t have any anomalies. Within this view of the data, you have the chance to add more sources and the capability to union and join tables together as well as filter the data to a specific selection and exclude rows that were brought in.

User Interface: Worksheet

The “Worksheet” tabs are where most of the magic happens. Each visualization that ends up on the dashboard will be developed in separate worksheets. This is where you will do most of the testing and tweaking as well as where you can create any filters, parameters, or calculated fields.

User Interface: Dashboards

In the “Dashboard” tab, you bring together all of the individual visualizations you have created. The drag-and-drop UI allows you to use tiles predetermined by Tableau or float the objects to arrange them how you please. Filters can be applied to all of the visualizations to create a cohesive story or to just a few visualizations to break down information specific to a chart or table. It additionally allows you to toggle between different device layouts to ensure end-user satisfaction.

User Interface: Stories

One of the most unique Tableau features is its “Stories” capability. Stories work great when you need to develop a series of reports that present a narrative to a business user. By adding captions and placing visualizations in succession, you can convey a message that speaks for itself.

Usability and Ease of Learning

The Tableau basics are relatively easy to learn due to the intuitive point-and-click UI and vast amount of educational resources such as their free training videos. Tableau also has a strong online community where answers to specific questions can be found either on the Help page or third-party sites.

Creating an impressive variety of simple visualizations can be done without a hitch. This being said, there are a few things to watch out for:

  • Some tricks and more niche capabilities can easily remain undiscovered.
  • Complex features such as table calculations may confuse new users.
  • The digestible UI can be deceiving – visualizations often appear correct when the underlying data is not. One great way to check for accuracy is to right-click on the visualization and select “View Data.”

Modeling

Unlike Power BI, Tableau does not allow users to create a complicated semantic layer within the tool. However, users can establish relationships between different data sources and across varied granularities through a method called data blending. One way to implement this method is by selecting the “Edit Relationships” option in the data drop-down menu.

Data blending also eliminates duplicates that may occur by using a function that returns a single value for the duplicate rows in the secondary source. Creating relationships among multiple sources in Tableau requires attention to detail as it can take some manipulation and may have unintended consequences or lead to mistakes that are difficult to spot.

Customization

The wide array of features offered by Tableau allows for highly customizable visualizations and reports. Implementing filter actions (which can apply to both worksheets and dashboards), parameters, and calculated fields empowers developers to modify the source data so that it better fits the purpose of the report. Using workarounds for calculations not explicitly available in Tableau frequently leads to inaccuracy; however, this can be combated by viewing the underlying data. Aesthetic customizations such as importing external images and the large variety of formatting capabilities additionally allow developers boundless creative expression.

Permissions and User Roles

The type of license assigned to a user determines their permissions and user roles. Site administrators can easily modify the site roles of users on the Tableau Server or Tableau Online based on the licenses they hold. The site role determines the most impactful action (e.g., read, share, edit) a specific user can make on the visualizations. In addition to this, permissions range from viewing or editing to downloading various components of a workbook. The wide variety of permissions applies to various components within Tableau. A more detailed guide to permissions capabilities can be found here.

Ease of Dev Opp and Source Control

Dev opp and source control improved greatly when Tableau implemented versioning of workbooks in 2016. This enables users to select the option to save a history of revisions, which saves a version of the workbook each time it is overwritten. This enables users to go back to previous versions of the workbook and access work that may have been lost. When accessing prior versions, keep in mind that if an extract is no longer compatible with the source, its data refresh will not work.

Setup and Environment

With all of the necessary information on your sources, setup in Tableau is a breeze. It has built-in connectors with a wide range of sources and presents your data to you upon connection. You also have a variety of options regarding data manipulation and utilizing live or static data (as mentioned above). Developers utilize the three Tableau environments based primarily on the level of interactions and security they desire.

  • Tableau Desktop: Full developer software in a silo; ability to connect to databases or personal files and publish work for others to access
  • Tableau Server: Secure environment accessed through a web browser to share visualizations across the organization; requires a license for each user
  • Tableau Online: Essentially the same as Tableau Server but based in the cloud with a wider range of connectivity options

Implementation

Once your workbook is developed, select the server and make your work accessible for others either on Tableau Online or on Tableau Server by selecting “publish.” During this process, you can determine the specific project you are publishing and where to make it available. There are many other modifications that can be adjusted such as implementing editing permissions and scheduling refreshes of the data sources.

Summary and Key Points

Tableau empowers developers of all skill levels to create visually appealing and informative dashboards, reports, and storytelling experiences. As developers work, there is a wealth of customization options to tailor reports to their specific use case and draw boundless insights for end users. To ensure that Tableau gleans the best results for end users, keep these three notes in mind:

  1. Your underlying data must be trustworthy as Tableau does little to ensure data integrity. Triple-check the numbers in your reports.
  2. Ensure your development methods don’t significantly damage performance for both developers and end users.
  3. Take advantage of the massive online community to uncover vital features and leverage others’ knowledge when facing challenges.

If you have any questions on Tableau or need help getting better insights from your Tableau dashboards, contact us for an analytics assessment.

rss
Facebooktwitterlinkedinmail

Blockchain: The Basics

Blockchain is one of those once-in-a-generation technologies that has the potential to really change the world around us. Despite this, blockchain is something that a lot of people still know nothing about. Part of that, of course, is because it’s such a new piece of technology that really only became mainstream within the past few years. The main reason, though, (and to address the elephant in the room) is because blockchain is associated with what some describe as “fake internet money” (i.e., Bitcoin). The idea of a decentralized currency with no guarantor is intimidating, but let’s not let that get in the way of what could be a truly revolutionary technology. So, before we get started, let’s remove the Bitcoin aspect and simply focus on blockchain. (Don’t worry, we’ll pick it back up later on.)

Blockchain, at its very core, is a database. But blockchains are different from traditional databases in that they are immutable, unable to be changed. Imagine this: Once you enter information into your shiny new blockchain, you don’t have to worry about anybody going in and messing up all your data. “But how is this possible?” you might ask.

Blockchains operate by taking data and structuring it into blocks (think of a block like a record in a database). This can be any kind information, from names and numbers all the way to executable code scripts. There are a few essential pieces of information that should be placed in all blocks, those being an index (the block number), a timestamp, and the hash (more on this later) of the previous block. All of this data is compiled into a block, and a hashing algorithm is applied to the information.

blockchain After the hash is computed, the information is locked and you can’t change information without re-computing the hash. This hash is then passed on to the next block where it gets included in its data, creating a chain. The second block then compiles all of its own data and, including the hash of the previous block, creates a new hash and sends it to the next block in the chain. In this way, a blockchain is created by “chaining” together blocks by means of a block’s unique hash. In other words, the hash of one block is reliant on the hash of the previous block, which is reliant on that of the one before it, ad infinitum.

And there you go, you have a blockchain! Before we move on to the next step (which will really blow your mind), let’s recap:

Blockchain: The Basics ChartYou have Block-0. Information is packed into Block-0 and hashed, giving you Hash-0. Hash-0 is passed to Block-1, which is combined with the data in Block-1. So, Block-1’s data now includes its own information and Hash-0. This is now hashed to release Hash-1, and it’s passed to the next block.

The second major aspect of blockchain is that it is distributed. This means that the entire protocol is operated across a network of nodes at the same time. All of the nodes in the network store the entire chain, along with all new blocks, at the same time and in real time.

Secure Data Is Good Data

Remember earlier when we said a blockchain is immutable? Let’s go back to that.

Suppose you have a chain 100 blocks long and running on 100 nodes at once. Now let’s say you want to stage an attack on this blockchain to change Block-75. Because the chain is run and stored across 100 nodes simultaneously, you have to instantaneously change Block-75 in all 100 nodes at the same time. Let’s imagine somehow you are able to hack into those other nodes to do this; now you have to rehash everything from Block-75 to Block-100 (which, remember, rehashing is extremely computationally difficult). So while you (the singular malicious node) are trying to rehash all of those blocks, the other 99 nodes in the network are working to hash new blocks, thereby extending the chain. This makes it impossible for a compromised chain to become valid because it will never reach the same length of the original chain.

About That Bitcoin Thing…

Now, there are two types of blockchains. Most popular blockchains are public, in which anybody in the world is able to join and contribute to the network. This requires some incentive, as without it nobody would join the network, and this comes in the form of “tokens” or “coins” (i.e., Bitcoin). In other words, Bitcoin is an incentive for people to participate and ensure the integrity of the chain. Then there are permissioned chains, which are run by individuals, organizations, or conglomerates for their own reasons and internal uses. In permissioned chains, only nodes with certain permissions are able to join and be involved in the network.

And there you go, you have the basics of blockchain. At a fundamental level, it’s an extremely simple yet ingenious idea with applications for supply chains, smart contracts, auditing, and many more to come. However, like any promising new technology, there are still questions, pitfalls, and risks to be explored. If you have any questions about this topic or want to discuss the potential for blockchain in your organization, contact us here.

rss
Facebooktwitterlinkedinmail

3 Takeaways from a New Looker Developer

In this blog post, read about this consultant’s experience with Looker, in their own words.

As a data management and analytics consultant, I have developed dashboards in a majority of the popular BI tools such as Tableau and Power BI, as well as their backend data structures. The opportunity to develop dashboards in Looker arose when a new client, Byrider, needed 2nd Watch to help them model their data and develop Looker dashboards for their sales team (more details here).

Based on my limited experience with Looker, I knew that it makes creating quality visuals simple and that coding in LookML is unavoidable. I worried that LookML would be extremely nuanced and I would lose time troubleshooting simple tasks. I could not have been more wrong on that front. Along with this realization, below are my top takeaways from my first Looker project.

Takeaway 1: LookML is easy to learn and ensures consistent metrics across reports.

Given the vast amount of documentation provided by Looker and the straightforward format of LookML code, I quickly caught on. This learning curve may be slightly different for report developers who have minimal experience with SQL. LookML adds transparency into what happens with data presented in visuals by directly showing how the code translates into the SQL queries that run against the source data. This makes it much easier to trust the results of dashboards and QA as you develop.

More importantly, LookML allows users to ensure their metric definitions are consistent across dashboards and reports. Establishing this single source of truth is key for the success of any reporting efforts. Within the semantic layer of the reporting tool, users can create SQL queries or harness LookML functions to develop custom measures and include descriptions to define them. Transforming the source data into predefined measures in the back end of the reporting tool ensures that report developers access the same metrics for every dashboard business users will see. This is a clear contrast from tools like Power BI and Tableau where the custom measures are created in each workbook and can vary. Furthermore, by using roles, administrators can limit who has access to change this code.

Takeaway 2: Creating dashboards and visuals is super intuitive for about 95% of use cases.

After setting up your data connections and LookML, developing a visual (“Look”) in Looker only requires a simple point and click process. Once you select the filters, measures, and dimensions to include in a visual, you can click through the visualization options to determine the best possible way to present the data. From there, you can easily adjust colors and stylistic options in settings using drop-down menus. Compared to other BI tools, these visuals are fairly standard across the board. That being said, Looker greatly stands out when it comes to table visualizations. It allows for conditional formatting similar to that in Excel and a wide range of visual options in comparison to other BI tools. This makes Looker a great selection for companies that often require tables to meet reporting requirements.

Although detailed documentation and the simple interface meet most reporting needs, there are limitations when it comes to easy customization in visuals. This includes the inability to set drill-ins by a visual rather than a field. In Looker, any demographic used across reports has to drill into the same fields (unlike those set per visual in a Tableau Tool Tip, for example). Additionally, you cannot format visuals based on customized metrics (e.g., color bands, conditional formatting for Field A based on the value of Field B, etc.). The caveat here is that you can unlock many customized visuals by writing custom code, a skill not always handy for report developers.

Looker Development Environment

Takeaway 3: Looker is extremely collaborative, something not often seen in BI tools.

With most BI tools, developers are forced to work independently because two people cannot easily contribute to a single workbook at the same time. Looker’s web-based format seems to have been built with collaborative development in mind, making this tool stand out when it comes to teamwork. Business users can also easily contribute because the web-based tool makes sharing dashboards and embedding them within websites easy. While this may seem minor to some, it significantly enhances productivity and yields a better result.

The following features ensure that your team can iterate on each other’s work, edit the same dashboards, and develop LookML without accidentally overwriting work or creating multiple versions of the same report:

  • Version control and deployment processes built into the “Development” window where users can modify and add LookML code
  • Ability to duplicate Looks developed by others and iterate on them, and Looks can then be added to dashboards
  • Shared folders where Looks and Dashboards used by multiple people can be stored and reused (if needed)
  • Ability to “Explore” a Look created by someone else to investigate underlying data
  • Ability to edit a dashboard at the same time others can make changes
  • Sharing dashboards using a link and the ease of embedding dashboards, which allows for seamless collaboration with business users as well

With a properly modeled data source, Looker impressed in terms of its performance and ability to provide highly drillable dashboards. This enabled us to dramatically reduce the number of reports needed to address the wide range of detail that business users within a department required. While the visuals were not as flashy as other BI tools, Looker’s highly customizable table visualizations, row-level security, and drill-in options were a perfect fit for Byrider’s use cases.

2nd Watch specializes in advising companies on how to gain the most business value possible from their analytics tools. We assist organizations with everything from selecting which tool best suits your needs to developing dashboards for various departments or structuring data to enable quick reporting results. Contact us if you need help determining if Looker is the tool you need or if want guidance on how to get started.

rss
Facebooktwitterlinkedinmail

Should You Build or Buy Your ETL Tool? 8 Questions to Help You Decide

Analyzing raw data without a singular, standardized format is as fruitful as trying to understand all 193 UN delegates shouting in their native tongues. Something important is being said, but good luck figuring out what that is. But reformat that raw data and shift them from their disparate sources into a single data warehouse, and the message rings through as clear as a bell.

That is the benefit that extract, transform, load (ETL) processes provide to organizations. Yet before you can access the hidden patterns and meanings in your data, you need to decide how you want to acquire your ETL tool: build one from scratch or buy an automated solution. Here’s what to consider as you make your decision.

Just getting started? Check out our 60-minute data architecture assessment.

What’s the project size?

Often, a small project scope with simple data flow benefits from a custom build, allowing your organization to calibrate your ETL tool to your precise needs and spend less in the process. Small shops may have fewer technical resources, but they will spend as much time integrating a pre-built ETL tool as building up simple data flows from the ground up.

When the scope is a massive enterprise-level ETL framework, it makes more sense to engage with a preexisting ETL tool and accelerate your analytics timeline. Even then, we recommend a data management partner experienced in ETL processes, one that’s done the technical work of hooking the sources together and transforming the data numerous times. They know the accelerators to get your program up and running enterprise-wide.

What technology are you using?

Your current tech stack is always a consideration. For example, if you prefer open source technology or depend on a web of legacy systems for daily operations, building your own system eliminates the worry that your integration won’t work. Building your ETL program is also a preferred option for organizations with a custom or niche development environment that aims to use fewer computing resources or accelerate your performance.

On the other hand, GUI environments that value ease of use are better-suited for buying their ETL program. For example, we had an online e-commerce client with a few internal technical resources. They understood their current state and their source systems but did not want to deal with setting up the actual workflows. In that scenario, we determined that integrating a preexisting ETL solution into their ecosystem would help their team to load data and run reports more effectively.

What’s the shelf-life of your proposed solution?

How long you’ll use a specific ETL solution has significant influence on the decision to build or buy. If you need a quick-and-dirty, one-off load, it doesn’t make sense to invest $15,000+ per year for an ETL solution. If you have resources capable of scripting an ad hoc solution, utilizing their talents will achieve faster results.

On the other hand, companies that need a scalable or long-term strategic solution tend to lean toward a prepackaged ETL tool. Due to the evolving data sources and streams available in these organizations, a preexisting ETL tool in which ongoing development and integration are handled by the vendor is ideal. The only major challenge is ensuring that your team is maximizing your investment by using the full capacity of your vendor’s ETL solution. Fortunately, it’s a feat that’s more manageable if you work with a technical consulting partner like 2nd Watch.

What’s your budget?

This one is a little deceptive. Though there is an initial investment in building your own solution, you lack the ongoing subscription and the initial integration costs that are frequently overlooked in preliminary estimates. Additionally, buying ETL solutions often means you’ll be charged per source system being transformed and loaded into your data warehouse. So depending on the number of disparate sources and volume of data, the build option is a good way to avoid overspending on data ingestion.

Though large enterprises will still end up paying these costs, they can justify them as a trade-off for greater traceability and cataloging for the sake of compliance. The ability to track business data and smoothly conduct audits is more than enough for some organizations to defend the elevated price tag, especially if those organizations are in the healthcare or financial sectors.

Who will manage the ETL process?

Control is a significant consideration for plenty of organizations. For those who want to own the ETL system, building is the right choice. Often, this makes the most sense when you already have a custom infrastructure, legacy storage system, or niche analytics needs.

Yet not every organization wants to divert attention from their primary business. Let’s say you’re a healthcare organization that wants to build a comprehensive data warehouse from a myriad of data sources while still maintaining compliance. Trusting an experienced vendor removes a considerable amount of risk.

Do you need flexibility in your analytics?

What types of reports will you be running? Standard ones for your industry or business environment? Or reports that are particular to your own unique needs? Your answer heavily influences the choices you make about your ETL tool.

If you feel your demands upon a data warehouse will be uncommon, then building is the ideal choice. That way, your reporting isn’t curtailed to fit a preconceived notion of your needs. Hand-coding your own ETL program enables you to write scripts for whatever schemas or parameters you had in mind. The only limitation is your own technical capability or that of your data management consulting partner.

If performance outranks customization, buying an ETL tool like Attunity, Talend, or others is the superior option. As we’ve said before, you’ll lose some level of flexibility and back-end control, but these enterprise-level ETL solutions allow you to gather, cleanse, and refine data with very minimal effort. Who said data transformation needed to be difficult?

Do you have access to technical experts?

Effective ETL processes require a skilled workforce to deliver maximum results. Even more, that workforce needs to know how to build a data warehouse. You either need internal resources, a data management partner, or a proficient solutions provider involved in the development, auditing, and testing processes.

Internal resources allow you to build and launch your own ETL program with their ability to hand-code scripts and manage data workflows. Additionally, you don’t need to hire outside resources to monitor ongoing performance or troubleshoot issues. The trade-off is that their work on your ETL solution and data integration can divert their attention from long-term strategic projects or operations. An effective compromise is having an internal resource take ownership of the project and outsource the scripting, loading, and data migration to a technical partner.

For organizations without spare technical talent, buying a prepackaged ETL tool simplifies a portion of the initial technical investment. However, most organizations still need assistance with current state audits to verify all the source systems, hands-on integration support to get reporting up and running, and training on the new reporting processes. Choosing the right technical consulting partner enables you to deliver results in reasonable timetables without hiring new IT talent to handle the ETL process.

The advantage of a data management partner like 2nd Watch is that we’re proficient in both build and buy situations. If you decide to build, we can help with the scripting and create a support team. If you decide to buy, we can help integrate the tool and teach your internal team how to maximize all of the ETL tool’s features. That way, you can prioritize other more strategic and/or inflexible considerations while still implementing your disparate data sources into a single data warehouse.

Can you provide internal training?

What happens after the implementation? Will your team be able to grab the baton and confidently sprint forward without any impediments? Or are you at risk from the “bus factor,” where one person getting hit by a bus shuts down your total knowledge of the ETL solution? The success of both building an ETL platform and buying a cloud-based subscription depends on the effectiveness of the associated training process.

Going with a custom build means you’re dependent on your own knowledge sharing. You may encounter a bottleneck scenario where only resourceful employees will understand how to run reports after ETL processes are conducted. And if a tool is time-consuming or frustrating, you’ll struggle to encourage buy-in.

However, with a purchased ETL tool, resources outside of your team should easily understand the logistics of the workflows and be able to support your system. Your organization can then recruit or contract staff that is already familiar with the technical function of your tool without painfully reverse-engineering your scripting. Beware, though! You will encounter the same problems as a built system if you integrate the tool poorly. (Don’t just write custom scripting within your workflows if you want to get the benefits from a purchased option.)

The right data management partner can avoid this situation entirely. For example, the 2nd Watch team is skilled at sharing organizational process changes and communicating best practices to users and stakeholders. That way, there’s no barrier to usage of any ETL tool across your organization.

Whether you build or buy your ETL tool, 2nd Watch can help you implement the right solution. Schedule a whiteboard session to review your options and start on the path to better data analytics.

rss
Facebooktwitterlinkedinmail

How to Build a Data Warehouse in 6-8 Weeks

It’s true that in a very short period of time, you can get an operational data warehouse loaded with all your source data. Wondering how to build a data warehouse that yields these results? Snowflake and its technology partner ecosystem, such as partners like Fivetran, allow you to take your databases and SaaS applications, replicate them to Snowflake, and see results in a significantly shorter timeframe than we’ve seen before. Write some SQL views in Snowflake against this data, point your favorite BI tool at it, and get lightning-fast results.

With the right plan in place, you can (aggressively) deliver that first business “wow” in six to eight weeks. 2nd Watch typically recommends at least two analytical virtual warehouses in enterprise deployments: one for data discovery and one for more structured and governed reporting.

The practice of having both a data discovery layer and a governed layer for your Snowflake deployment not only saves you time in the initial build, but it creates a continuous integration and deployment pattern. It finally makes a dent in the notion that a data warehouse cannot be “agile.”

Future State Snowflake Data ArchitectureFuture State Snowflake Data Architecture

Future State Snowflake Data Architecture

With this approach, you not only achieve governance and speed-to-insight, but you’ve also cut down your Snowflake consumption costs. Running complex queries at run-time, every time, can get expensive. Snowflake’s caching can help here, but if you’re constantly running a complex join across over 20 tables, that physical layer might help you move from an XL-sized Snowflake warehouse to a L or M. In the long run, those cost savings will add up. When “best practice” or risk avoidance isn’t enough to justify this approach, it’s possible the dollar savings might speak for themselves.

The initial six- to eight-week path assumes a focus on the data discovery layer, as depicted below. Loading all your data into a data discovery layer should be the primary development activity in your Snowflake proof of concept (POC) or pilot. Here are some tips:

  • Find several source systems that have the data your stakeholders need.
  • Begin the process of rapidly loading into your Snowflake data discovery layer.
  • Write iterative SQL in Snowflake views to build your business logic.
  • Connect your BI tool to Snowflake and build a very simple dashboard.
  • Get feedback from business, with the new dashboard as a facilitation tool.
  • Repeat this process – more feedback is better.

You’ve not only increased the speed to market, but you’ve also enabled developers and business users to execute an analytics project in a completely new way.

Generic Snowflake Data ArchitectureHighlighted: Speed-to-insight approach, with transformations in Snowflake

Here are some additional considerations to keep in mind:

Security Considerations: It’s important to remember that when creating a data warehouse, data security and compliance can’t be overlooked. Using Snowflake and Fivetran, we ensure robust security protocols are in place, safeguarding your sensitive data while complying with necessary regulations.

Data Quality: Another critical aspect of this process is guaranteeing the quality of the data. By employing rigorous data quality checks and validation methods during the loading phase, we ensure that the data in your Snowflake data discovery layer is accurate, consistent, and reliable.

Scalability: Our approach to data warehousing is not only efficient but also scalable. As your business grows and the amount of data increases, Snowflake’s cloud-native architecture can easily adapt to your expanding needs while maintaining cost savings.

Training and Support: Adapting to a new data warehouse can be a challenge, but our team provides comprehensive training and continuous support to ensure a smooth transition. We’re here to guide you through every step of the process, from initial setup to advanced data analytics.

Alternative Tools: While we prefer Snowflake for its efficiency and scalability, there are other technology options available, like Redshift and BigQuery. However, Snowflake stands out due to its unique capabilities, such as separate storage and compute resources, robust security, and excellent performance.

ROI and Performance Metrics: After setting up your data warehouse, we’ll provide you with tools to measure its performance and the return on your investment. We’ll help you monitor query performance, storage utilization, and other key metrics that align with your business goals.

Integration with Existing Systems: We also take care of integrating Snowflake with your existing systems and workflows. Our team will devise a plan to ensure minimum disruption to your business during this integration phase.

This all comes with a caveat: Yes, you can write all your logic in Snowflake and do everything you need from an analytics standpoint, but it will limit you in the long run. Every data project has sacrifices to be made, whether it’s time, features, cost, or flexibility. You need to balance these with a long-term vision for how your data will be used across the organization. Snowflake will get you part of the way there, but a good plan and strategy will take you even further.

In the meantime, if you have any questions or want to learn more about how to build a data warehouse using Snowflake, contact us today.

Snowflake Deployment Best Practices Button (1)

rss
Facebooktwitterlinkedinmail