There are many options when it comes to data analytics tools. Choosing the right one for your organization will depend on a number of factors. Since many of the reviews and articles on these tools are focused on business users, the 2nd Watch team wanted to explore these tools from the developer’s perspective. In this developer’s guide to Power BI, we’ll go over the performance, interface, customization, and more to help you get a full understanding of this tool.
Why Power BI?
Power BI is a financially attractive alternative to the likes of Tableau and Looker, which either offer custom-tailored pricing models or a large initial per-user cost followed by an annual fee after the first year. However, don’t conflate cost with quality; getting the most out of Power BI is more dependent on your data environment and who is doing the data discovery. Companies already relying heavily on Microsoft tools should look to add Power BI to their roster, as it integrates seamlessly with SQL Server Analysis Services to facilitate faster and deeper analysis.
Performance for Developers
When working with large datasets, developers will experience some slowdown as they customize and publish their reports. Developing in Power BI works best with small-to-medium-sized datasets. At the same time, Microsoft continues to add optimization options, such as drill-through functionality, which allows for deeper analytical work with less processing power.
Performance for Users
User performance through Power BI Services is controlled through row-level security implementation. For a dataset of any size, the number of rows returned can be limited depending on the user’s role. Overviews and executive dashboards may run somewhat slowly, but as the user’s role becomes more granular, dashboards will operate more quickly.
User Interface: Data Layer
Data is laid out in a tabular form; clicking any measure column header reveals a drop-down menu with sorting options, filtering selections, and the Data Analysis Expressions (DAX) behind the calculation.
User Interface: Relationship Layer
The source tables are draggable objects with labeled arrows between tables denoting the type of relationship.
Usability and Ease of Learning
Microsoft Power BI documentation is replete with tutorials, samples, quickstarts, and concepts covering the fundamentals of development. For a more directed learning experience, Microsoft also offers Power BI Guided Learning, a freely available collection of mini-courses on modeling, visualizing, and exploring data in Power BI, including an introduction to DAX as a tool for transforming data. Additionally, the Power BI community forums almost always have an answer to any technical question a developer might have.
Power BI can easily connect to multiple data sources, including both local folders and most major database platforms. Data can be cleaned and transformed using the Query Editor; the Editor can change data types, add columns, and combine data from multiple sources. Throughout this transformation process, the Query Editor records each step so that every time the query connects to the data source, the data is transformed accordingly. Relationships can be created by specifying a “from” table and a “to” table, the keys to relate, a cardinality, and a cross-filter direction.
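The idea behind a relationship definition can be sketched in plain Python as a one-to-many join between a “from” table and a “to” table on a shared key. The table and column names below are hypothetical, not Power BI APIs:

```python
# One-to-many relationship sketch: many orders per customer,
# joined on a shared key (all names are hypothetical).
orders = [  # "from" table: many rows per customer_id
    {"order_id": 1, "customer_id": "C1", "amount": 100},
    {"order_id": 2, "customer_id": "C1", "amount": 250},
    {"order_id": 3, "customer_id": "C2", "amount": 75},
]
customers = [  # "to" table: one row per customer_id
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "West"},
]

# Build a lookup on the "one" side, then enrich each row on the "many" side.
lookup = {c["customer_id"]: c for c in customers}
joined = [{**o, "region": lookup[o["customer_id"]]["region"]} for o in orders]

print(joined[0])
# {'order_id': 1, 'customer_id': 'C1', 'amount': 100, 'region': 'East'}
```

The cardinality here is many-to-one; cross-filter direction determines whether a filter on either table also narrows the other.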
In terms of data transformation, Power Query is a powerful language for ensuring that your report contains the exact data and relationships you and your business user are looking to understand. Power Query simplifies the process of data transformation with an intuitive step-by-step process for joining, altering, or cleaning your tables within Power BI. For actual report building, Power BI contains a comprehensive list of visualizations for almost all business needs; if one is not found within the default set, Microsoft sponsors a visual gallery of custom user-created visualizations that anyone is free to explore and download.
Permissions and User Roles
Adding permissions to workspaces, datasets, and reports within your org is as simple as adding an email address and setting an access level. Row-level security is enabled in Power BI Desktop; role management lets you flexibly customize access to specific data tables using DAX functions to specify conditional filters. Default security filtering is single-directional; however, bi-directional cross-filtering allows for the implementation of dynamic row-level security based on usernames and/or login IDs.
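The mechanics of row-level security can be sketched in Python: each role maps to a row-level predicate (the stand-in here for a DAX filter expression), and a user in that role only ever sees the rows the predicate permits. Role names and columns below are hypothetical:

```python
# Row-level security sketch: each role sees only the rows
# its filter allows (roles and columns are hypothetical).
sales = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 200},
    {"region": "East", "amount": 50},
]

# Role definitions map a role name to a row-level predicate,
# much like a DAX filter expression attached to a table.
role_filters = {
    "east_manager": lambda row: row["region"] == "East",
    "executive": lambda row: True,  # unfiltered view
}

def rows_for(role):
    """Return only the rows visible to the given role."""
    return [r for r in sales if role_filters[role](r)]

print(len(rows_for("east_manager")))  # 2
print(len(rows_for("executive")))     # 3
```

This also illustrates the performance note above: the more granular the role, the fewer rows the dashboard has to process.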
Ease of DevOps and Source Control
When users have access to a data connection or report, source and version control are extremely limited without external GitHub resources. Most of the available activities are at the macro level: viewing/editing reports, adding sources to gateways, or installing the application. There is no internal edit history for any reports or dashboards.
Setup and Environment
Setup is largely dependent on whether your data is structured in the cloud, on-premises, or a hybrid. Once the architecture is established, you need to create “data gateways” and assign them to different departments and data sources. This gateway acts as a secure connection between your data source and development environments. From there, security and permissions can be applied to ensure the right people within your organization have access to your gateways. When the gateways are established, data can be pulled into Power BI via Power Query and development can begin.
The most common implementation of Power BI utilizes on-premises source data and Power BI Desktop for data preparation and reporting, with Power BI Service used in the cloud to consume reports and dashboards, collaborate, and establish security. This hybrid implementation strategy takes advantage of the full range of Power BI functionality by leveraging both the Desktop and Service versions. On-premises data sources connect to Power BI Desktop for development, leading to quicker report creation (though Power BI also supports cloud-based data storage).
Summary and Key Points
Power BI is an extremely affordable and comprehensive analytics tool. It integrates seamlessly with Excel, Azure, and SQL Server, allowing for established Microsoft users to start analyzing almost instantly. The tool is easy to learn for developers and business users alike, and there are many available resources, like Microsoft mini-courses and community forums.
A couple of things to be aware of with Power BI: it may lack some of the bells and whistles of other analytics tools, and it works best if you’re already in the Microsoft ecosystem and come in with a solid data strategy.
If you want to learn more about Power BI or any other analytics tools, contact us today to schedule a no-obligation whiteboard session.
Why Tableau?
Tableau has a reputation for being sleek and easy to use, boasting an impeccable UI/UX. It’s by and large an industry leader due to its wide range of visualizations and ability to cohesively and narratively present data to end users. As a reliable, well-established leader, Tableau can easily integrate with many sources, has extensive online support, and does not require a high level of technical expertise for users to gain value.
Performance for Developers
One of the easiest ways to ensure good performance with Tableau is to be mindful of how you import your data. Using extracts rather than live data and performing joins or unions in your database reduces much of the processing Tableau would otherwise have to do. While you can easily manipulate data in Tableau without any coding, doing so reduces performance significantly, especially when dealing with large volumes of information. Ideally, all data manipulation should be done in your database or data warehouse before adding it as a source. If that isn’t an option, Tableau offers a product called Tableau Prep that enables data manipulation and enhanced data governance capabilities.
Performance for Users
Dashboard performance for users depends almost entirely on practices employed by developers when building out reports. Limiting the dataset to information required for the goals of the dashboard reduces the amount of data Tableau processes as well as the number of filters included for front-end users. Cleaning up workbooks to reduce unnecessary visualizations will enhance front-end performance as well.
User Interface: Data Source
After connecting to your source, Tableau presents your data using the “Data Source” tab. This is a great place to check that your data was properly loaded and doesn’t have any anomalies. Within this view of the data, you have the chance to add more sources and the capability to union and join tables together as well as filter the data to a specific selection and exclude rows that were brought in.
User Interface: Worksheet
The “Worksheet” tabs are where most of the magic happens. Each visualization that ends up on the dashboard will be developed in separate worksheets. This is where you will do most of the testing and tweaking as well as where you can create any filters, parameters, or calculated fields.
User Interface: Dashboards
In the “Dashboard” tab, you bring together all of the individual visualizations you have created. The drag-and-drop UI allows you to use tiles predetermined by Tableau or float the objects to arrange them how you please. Filters can be applied to all of the visualizations to create a cohesive story or to just a few visualizations to break down information specific to a chart or table. It additionally allows you to toggle between different device layouts to ensure end-user satisfaction.
User Interface: Stories
One of Tableau’s most distinctive features is its “Stories” capability. Stories work great when you need to develop a series of reports that present a narrative to a business user. By adding captions and placing visualizations in succession, you can convey a message that speaks for itself.
Usability and Ease of Learning
The Tableau basics are relatively easy to learn due to the intuitive point-and-click UI and vast amount of educational resources such as their free training videos. Tableau also has a strong online community where answers to specific questions can be found either on the Help page or third-party sites.
Creating an impressive variety of simple visualizations can be done without a hitch. That said, there are a few things to watch out for:
Some tricks and more niche capabilities can easily remain undiscovered.
Complex features such as table calculations may confuse new users.
The digestible UI can be deceiving – visualizations often appear correct when the underlying data is not. One great way to check for accuracy is to right-click on the visualization and select “View Data.”
Unlike Power BI, Tableau does not allow users to create a complicated semantic layer within the tool. However, users can establish relationships between different data sources and across varied granularities through a method called data blending. One way to implement this method is by selecting the “Edit Relationships” option in the data drop-down menu.
Data blending also eliminates duplicates that may occur by using a function that returns a single value for the duplicate rows in the secondary source. Creating relationships among multiple sources in Tableau requires attention to detail as it can take some manipulation and may have unintended consequences or lead to mistakes that are difficult to spot.
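The deduplication behavior of a blend can be sketched in Python: the secondary source is collapsed to a single aggregated value per key before it is joined to the primary source, so duplicate rows cannot inflate results. All table, column, and product names below are hypothetical:

```python
from collections import defaultdict

# Primary source: one row per sale. Secondary source: possibly
# duplicated rows per product (all names are hypothetical).
sales = [{"product": "A", "units": 3}, {"product": "B", "units": 5}]
ratings = [
    {"product": "A", "rating": 4.0},
    {"product": "A", "rating": 4.0},  # duplicate row in the secondary source
    {"product": "B", "rating": 3.5},
]

# Collapse the secondary source to a single value per key (here an
# average), the way a blend aggregates before relating sources.
grouped = defaultdict(list)
for r in ratings:
    grouped[r["product"]].append(r["rating"])
avg_rating = {k: sum(v) / len(v) for k, v in grouped.items()}

# Join the aggregated value onto the primary source.
blended = [{**s, "rating": avg_rating[s["product"]]} for s in sales]
print(blended)
```

Note how the choice of aggregation (average, min, sum, etc.) changes the blended value, which is one place those hard-to-spot mistakes can creep in.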
The wide array of features offered by Tableau allows for highly customizable visualizations and reports. Implementing filter actions (which can apply to both worksheets and dashboards), parameters, and calculated fields empowers developers to modify the source data so that it better fits the purpose of the report. Using workarounds for calculations not explicitly available in Tableau frequently leads to inaccuracy; however, this can be combated by viewing the underlying data. Aesthetic customizations such as importing external images and the large variety of formatting capabilities additionally allow developers boundless creative expression.
Permissions and User Roles
The type of license assigned to a user determines their permissions and user roles. Site administrators can easily modify the site roles of users on Tableau Server or Tableau Online based on the licenses they hold. The site role determines the most impactful action (e.g., read, share, edit) a specific user can take on visualizations. In addition, permissions range from viewing or editing to downloading various components of a workbook, and they apply to the various components within Tableau. Tableau’s online documentation offers a more detailed guide to permissions capabilities.
Ease of DevOps and Source Control
DevOps and source control improved greatly when Tableau introduced workbook versioning in 2016. Users can opt to save a history of revisions, which stores a version of the workbook each time it is overwritten, making it possible to return to previous versions and recover work that may have been lost. When accessing prior versions, keep in mind that if an extract is no longer compatible with the source, its data refresh will not work.
Setup and Environment
With all of the necessary information on your sources, setup in Tableau is a breeze. It has built-in connectors with a wide range of sources and presents your data to you upon connection. You also have a variety of options regarding data manipulation and utilizing live or static data (as mentioned above). Developers utilize the three Tableau environments based primarily on the level of interactions and security they desire.
Tableau Desktop: Full developer software in a silo; ability to connect to databases or personal files and publish work for others to access
Tableau Server: Secure environment accessed through a web browser to share visualizations across the organization; requires a license for each user
Tableau Online: Essentially the same as Tableau Server but based in the cloud with a wider range of connectivity options
Once your workbook is developed, select the server and make your work accessible for others either on Tableau Online or on Tableau Server by selecting “publish.” During this process, you can determine the specific project you are publishing and where to make it available. There are many other modifications that can be adjusted such as implementing editing permissions and scheduling refreshes of the data sources.
Summary and Key Points
Tableau empowers developers of all skill levels to create visually appealing and informative dashboards, reports, and storytelling experiences. As developers work, there is a wealth of customization options to tailor reports to their specific use case and draw boundless insights for end users. To ensure that Tableau gleans the best results for end users, keep these three notes in mind:
Your underlying data must be trustworthy as Tableau does little to ensure data integrity. Triple-check the numbers in your reports.
Ensure your development methods don’t significantly damage performance for both developers and end users.
Take advantage of the massive online community to uncover vital features and leverage others’ knowledge when facing challenges.
If you have any questions on Tableau or need help getting better insights from your Tableau dashboards, contact us for an analytics assessment.
87% of data science projects never make it beyond the initial vision into any stage of production. Even some that pass through discovery, deployment, implementation, and general adoption fail to yield the intended outcomes. After investing all that time and money into a data science project, it’s not uncommon to feel a little crushed when you realize the windfall results you expected are not coming.
Yet even though there are hurdles to implementing data science projects, the ROI is unparalleled – when it’s done right.
You can deepen your consumer insights.
Coca-Cola has used data from social media to identify images of its own and competitors’ products, deepening its view of consumer demographics and hyper-targeting consumers with well-timed ads.
You can accelerate your production timelines.
GE has used artificial intelligence to cut product design times in half. Data scientists have trained algorithms to evaluate millions of design variations, narrowing down potential options within 15 minutes.
With all of that potential, don’t let your first failed attempt turn you off to the entire practice of data science. We’ve put together a list of primary reasons why data science projects fail – and a few strategies for forging success in the future – to help you avoid similar mistakes.
You lack analytical maturity.
Many organizations are antsy to predict events or decipher buyer motivations without having first developed the proper structure, data quality, and data-driven culture. And that overzealousness is a recipe for disaster. While a successful data science project will take some time, a well-thought-out data science strategy can ensure you will see value along the way to your end goal.
Effective analytics only happens through analytical maturity. That’s why we recommend organizations conduct a thorough current state analysis before they embark on any data science project. In addition to evaluating the state of their data ecosystem, they can determine where their analytics falls along the following spectrum:
Descriptive Analytics: This type of analytics is concerned with what happened in the past. It mainly depends on reporting and is often limited to a single or narrow source of data. It’s the ground floor of potential analysis.
Diagnostic Analytics: Organizations at this stage are able to determine why something happened. This level of analytics delves into the early phases of data science but lacks the insight to make predictions or offer actionable insight.
Predictive Analytics: At this level, organizations are finally able to determine what could happen in the future. By using statistical models and forecasting techniques, they can begin to look beyond the present into the future. Data science projects can get you into this territory.
Prescriptive Analytics: This is the ultimate goal of data science. When organizations reach this stage, they can determine what they should do based on historical data, forecasts, and the projections of simulation algorithms.
Your project doesn’t align with your goals.
Data science, removed from your business objectives, always falls short of expectations. Yet in spite of that reality, many organizations attempt to harness machine learning, predictive analytics, or any other data science capability without a clear goal in mind. In our experience, this happens for one of two reasons:
1. Stakeholders want the promised results of data science but don’t understand how to tailor the technologies to their goals. This leads them to pursue a data-driven framework that worked for other organizations while ignoring their own unique context.
2. Internal data scientists geek out over theoretical potential and explore capabilities that are stunning but fail to offer practical value to the organization.
Outside of research institutes or skunkworks programs, exploratory or extravagant data science projects have a limited immediate ROI for your organization. In fact, the odds are very low that they’ll pay off. It’s only through a clear vision and practical use cases that these projects are able to garner actionable insights into products, services, consumers, or larger market conditions.
Every data science project needs to start with an evaluation of your primary goals. What opportunities are there to improve your core competency? Are there any specific questions you have about your products, services, customers, or operations? And is there a small and easy proof of concept you can launch to gain traction and master the technology?
The above use case from GE is a prime example of having a clear goal in mind. The multinational company was in the middle of restructuring, reemphasizing its focus on aero engines and power equipment. With the goal of reducing their six- to 12-month design process, they decided to pursue a machine learning project capable of increasing the efficiency of product design within their core verticals. As a result, this project promises to decrease design time and budget allocated for R&D.
Organizations that embody GE’s strategy will face fewer false starts with their data science projects. For those that are still unsure about how to adapt data-driven thinking to their business, an outsourced partner can simplify the selection process and optimize your outcomes.
Your solution isn’t user-friendly.
The user experience is often an overlooked aspect of viable data science projects. Organizations do all the right things to create an analytics powerhouse customized to solve a key business problem, but if the end users can’t figure out how to use the tool, the ROI will always be weak. Frustrated users will either continue to rely upon other platforms that provided them with limited but comprehensible reporting capabilities, or they will stumble through the tool without unlocking its full potential.
Your organization can avoid this outcome by involving a range of end users in the early stages of project development. This means interviewing both average users and extreme users. What are their day-to-day needs? What data are they already using? What insight do they want but currently can’t obtain?
An equally important task is to determine your target user’s data literacy. The average user doesn’t have the ability to derive complete insights from the represented data. They need visualizations that present a clear-cut course of action. If the data scientists are only thinking about how to analyze complex webs of disparate data sources and not whether end users will be able to decipher the final results, the project is bound to struggle.
You don’t have data scientists who know your industry.
Even if your organization has taken all of the above considerations into account, there’s still a chance you’ll be dissatisfied with the end results. Most often, it’s because you aren’t working with a data science consulting firm that comprehends the challenges, trends, and primary objectives of your industry.
Take healthcare, for example. Data scientists who only grasp the fundamentals of machine learning, predictive analytics, or automated decision-making can only provide your business with general results. The right partner will have a full grasp of healthcare regulations, prevalent data sources, common industry use cases, and what target end users will need. They can address your pain points and already know how to extract full value for your organization.
And here’s another example from one of our own clients. A Chicago-based retailer wanted to use their data to improve customer lifetime value, but they were struggling with a decentralized and unreliable data ecosystem. With the extensive experience of our retail and marketing team, we were able to outline their current state and efficiently implement a machine-learning solution that empowered our client. As a result, our client was better able to identify sales predictors and customize their marketing tactics within their newly optimized consumer demographics. Our knowledge of their business and industry helped them to get the full results now and in the future.
Is your organization equipped to achieve meaningful results through data science? Secure your success by working with 2nd Watch. Schedule a whiteboard session with our team to get you started on the right path.
Insurance providers are rich with data far beyond what they once had at their disposal for traditional historical analysis. The quantity, variety, and complexity of that data enhance the ability of insurers to gain greater insights into consumers, market trends, and strategies to improve their bottom line. But which projects offer you the best return on your investment? Here’s a glimpse at some of the most common insurance analytics project use cases that can transform the capabilities of your business.
Acquiring New Customers
Use your historical data to predict when a customer is most likely to buy a new policy.
Both traditional insurance providers and digital newcomers are competing for the same customer base. As a result, acquiring new customers requires targeted outreach with the right message at the moment a buyer is ready to purchase a specific type of insurance.
Predictive analytics allows insurance companies to evaluate the demographics of the target audience, their buying signals, preferences, buying patterns, pricing sensitivity, and a variety of other data points that forecast buyer readiness. This real-time data empowers insurers to reach policyholders with customized messaging that makes them more likely to convert.
Quoting Accurate Premiums
Provide instant access to correct quotes and speed up the time to purchase.
Consumers want the best value when shopping for insurance coverage, but if their quote fails to match their premium, they’ll take their business elsewhere. Insurers hoping to acquire and retain policyholders need to ensure their quotes are precise – no matter how complex the policy.
For example, one of our clients wanted to provide ride-share drivers with four-hour customized micro policies on-demand. Using real-time analytical functionality, we enabled them to quickly and accurately underwrite policies on the spot.
Improving Customer Experience
Better understand your customer’s preferences and optimize future interactions.
A positive customer experience means strong customer retention, a better brand reputation, and a reduced likelihood that a customer will leave you for the competition. In an interview with CMSWire, the CEO of John Hancock Insurance said many customers see the whole process as “cumbersome, invasive, and long.” A key solution is reaching out to customers in a way that balances automation and human interaction.
For example, the right analytics platform can help your agents engage policyholders at a deeper level. It can combine the customer story and their preferences from across customer channels to provide more personalized interactions that make customers feel valued.
Detecting and Preventing Fraud
Stop fraud before it happens.
You want to provide all of your customers with the most economical coverage, but unnecessary costs inflate your overall expenses. Enterprise analytics platforms enable claims analysis to evaluate petabytes of data to detect trends that indicate fraud, waste, and abuse.
See for yourself how a tool like Tableau can help you quickly spot suspicious behavior with visual insurance fraud analysis.
Now, high-powered analytics has the potential to provide insurers with a real-time understanding of loss ratios, using a wide range of data points to evaluate which of your customers are underpaying or overpaying.
What percent of your enterprise data goes completely untapped? It’s far more than most organizations realize. Research suggests that as much as 68% of global enterprise data goes unused. The reasons are varied (we can get to the root cause with a current state assessment), but one growing problem stems from misconceptions about CRMs, ERPs, EHRs, and similar operational software systems.
The right operational software systems are valuable tools with their own effective reporting functions. However, the foundation of any successful reporting or analytics initiative depends on two factors: a centralized source of truth and a unified source format. All operational software systems struggle to satisfy both criteria.
Believe it or not, one of the most strategic systems for data-driven decision-making is still a dedicated data warehouse. Here is the value a data warehouse brings to your organization and the necessary steps to implement that enhance your analytics’ accuracy and insight.
CRMs and ERPs Are Data Silos with Disparate Formats
Operational software systems are often advertised as offering a unified view, but that’s only true for their designed purpose. CRMs offer a comprehensive view of customers, ERPs of operations, and EHRs of patient or member medical history. Outside of their defined parameters, these systems are data silos.
In an HBR blog post, Edd Wilder-James captures the conundrum perfectly: “You can’t cleanly separate the data from its intended use. Depending on your desired application, you need to format, filter, and manipulate the data accordingly.”
Some platforms are enabled to integrate outside data sources, but even that provides you with a filtered view of your data, not the raw and centralized view necessary to generate granular and impactful reports. It’s the difference between abridged and unabridged books – you might glean chunks of the big picture but miss entire sections or chapters that are crucial to the overall story.
Building a dedicated data warehouse removes the question of whether your data sets are complete. You can extract, transform, and load data from source systems into star schemas with a unified format optimized for business users to leverage. The data is formatted around the business process rather than the limitations of the tool. That way, you can run multifaceted reports or conduct advanced analytics when you need them – without anchoring yourself to any specific technology.
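The star-schema idea above can be sketched with Python’s built-in sqlite3 module: source rows are loaded once into a dimension table and many times into a fact table, and business questions become simple joins. Table and column names below are hypothetical:

```python
import sqlite3

# Minimal star-schema sketch: one dimension table plus one fact
# table in an in-memory database (all names are hypothetical).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount REAL)""")

# Extracted source rows, already transformed into a unified format.
source = [("Alice", 120.0), ("Bob", 80.0), ("Alice", 40.0)]

keys = {}
for name, amount in source:
    if name not in keys:  # load each customer into the dimension once
        cur.execute("INSERT INTO dim_customer (name) VALUES (?)", (name,))
        keys[name] = cur.lastrowid
    cur.execute("INSERT INTO fact_sales VALUES (?, ?)", (keys[name], amount))

# A business-user query now joins the fact table to the dimension.
total = cur.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d USING (customer_key)
    GROUP BY d.name ORDER BY d.name""").fetchall()
print(total)  # [('Alice', 160.0), ('Bob', 80.0)]
```

A real warehouse would add more dimensions (date, product, region) around the fact table, but the shape of the schema and the query pattern stay the same.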
Tracking Down Your Data Sources
In all honesty, organizations not familiar with the process often overlook vital information sources. There might be a platform used to track shipping that only one member of your team uses. Maybe there’s a customer service representative who logs feedback in an ad hoc document. Or it’s possible there’s HIPAA-compliant software in use that isn’t automatically loading into your EHR. Regardless of your industry, there are likely gaps in your knowledge well outside of the CRMs, ERPs, EHRs, and other ostensibly complete data sources.
How do you build a single source of truth? It’s not as simple as shifting around a few sources. Implementing a dedicated data warehouse requires extensive planning and preparation. The journey starts with finding the invisible web of sources outside of your primary operational software systems. Those organizations that choose to forgo a full-fledged current state assessment to identify those hidden sources only achieve fragmentary analytics at best.
Data warehouse implementations need guidance and buy-in at the corporate level. That starts with a well-defined enterprise data strategy. Before you can create your strategy, you need to ask yourself questions such as these:
What are your primary business objectives?
What are your key performance indicators?
Which source systems contribute to those goals?
Which source systems are we currently using across the enterprise?
By obtaining the answers to these and other questions from decision-makers and end users, you can clarify the totality of your current state. Otherwise, hunting down those sources is an uphill battle.
Creating Data Warehouse Value that Lasts
Consolidating your dispersed data sources is just a starting point. Next, you need to extract the data from each source system and load it into the data warehouse framework itself. A key component of this step is testing the data within your warehouse to verify quality and completeness.
If data loss occurs during the ETL process, the impact of your work and veracity of your insights will be at risk. Running a variety of different tests (e.g., data accuracy, data completeness, data transformation, etc.) will reduce the possibility of any unanticipated biases in your single source of truth.
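The kinds of checks described above can be sketched as a small validation routine: compare row counts for completeness, reconcile a total for accuracy, and scan required columns for nulls. The data and column names below are hypothetical in-memory stand-ins for real source and warehouse tables:

```python
# Post-load ETL validation sketch: completeness (row counts),
# accuracy (a reconciled total), and null checks on required
# columns (rows and column names are hypothetical).
source_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
warehouse_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]

def validate(source, target, required=("id", "amount")):
    """Return a list of validation errors; empty means the load passed."""
    errors = []
    if len(source) != len(target):
        errors.append("row count mismatch")
    if sum(r["amount"] for r in source) != sum(r["amount"] for r in target):
        errors.append("amount total mismatch")
    for row in target:
        if any(row.get(col) is None for col in required):
            errors.append(f"null in required column: {row}")
    return errors

print(validate(source_rows, warehouse_rows))  # [] - load is complete
```

Running checks like these after every load is what keeps unanticipated bias out of your single source of truth.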
What about maintaining a healthy and dynamic data warehouse? How often should you load new data? The answer depends on the frequency of your reporting needs. As a rule of thumb, think in terms of freshness. If your data has gone stale by the time you’re loading it into your data warehouse, increase the frequency of your data refresh. Opt for real-time analytics if it will provide you with a strategic advantage, not because you want to keep current with the latest buzzword.
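The freshness rule of thumb above can be sketched as a simple staleness check: if the data’s age at load time exceeds your reporting cadence, the refresh runs too infrequently. The one-day threshold below is an assumed example value, not a recommendation:

```python
from datetime import datetime, timedelta

# Freshness check sketch: data is "stale" if it is older than the
# reporting cadence (the daily threshold here is an assumed example).
def is_stale(last_load: datetime, max_age: timedelta, now: datetime) -> bool:
    return now - last_load > max_age

now = datetime(2023, 1, 10, 9, 0)
daily = timedelta(days=1)

print(is_stale(datetime(2023, 1, 8, 9, 0), daily, now))   # True: refresh more often
print(is_stale(datetime(2023, 1, 10, 6, 0), daily, now))  # False: cadence is fine
```

The same check generalizes to any cadence: swap in an hourly or weekly `max_age` to match how often your reports are actually consumed.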
Improving Your Results with an Outsourced Partner
Each step in the process comes with its own complications. It’s easy to fall into common data warehousing pitfalls unless you have internal resources with experience pinpointing hidden data sources, selecting the right data model, and maintaining your data warehouse post-implementation.
One of our clients in the healthcare software space was struggling to transition to a dynamic data warehousing model that could enhance their sales. Previously, they had a reporting application that they were using on a semi-annual basis. Though they wanted to increase the frequency of their reporting and enable multiple users to run reports simultaneously, they didn’t have the internal expertise to confidently navigate these challenges.
Working with 2nd Watch made a clear difference. Our client was able to leverage a data warehouse architecture that provided daily data availability (in addition to the six-month snapshot) and self-service dashboards that didn’t require changes or updates on their part. We also set them on the right path to leverage a single source of truth through future developments.
Our strategies in that project prioritized our client’s people instead of a specific technology. We considered the reporting and analytics needs of their business users rather than pigeonholing their business into a specific tool. Through our tech-agnostic approach, we guided them toward a future state that provided strategic advantage and a clear ROI that might have otherwise gone unachieved.
Want your data warehouse to provide you with a single source of truth? Schedule a whiteboard session to review your options and consolidate your data into actionable insight.
As dashboards and reports become more and more complex, slow run times can present major roadblocks. Here’s a collection of some of the top tips on how to improve dashboard performance and cut slow run times when using Tableau, Power BI, and Looker.
Before getting into how to improve dashboard performance within the three specific tools, here are a few universal principles that will lead to improved performance in almost any case.
Limit logic used in the tool itself: If you’re generating multiple calculated tables/views, performing complex joins, or adding numerous calculations in the BI tool itself, it’s a good idea for performance and governance to execute all those steps in the database or a separate business layer. The more data manipulation done by your BI tool, the more queries and functions your tool has to execute itself before generating visualizations.
Note: This is not an issue for Looker, as Looker offloads all of its computing onto the database via SQL.
Have the physical data available in the needed format: When the physical data in the source matches the granularity and level of aggregation in the dashboard, the BI tool doesn’t need to execute a function to aggregate it. Developing this in the data mart/warehouse can be a lot of work but can save a lot of time and pain during dashboard development.
Keep your interface clean and dashboards focused: Consolidate or delete unused report pages, data sources, and fields. Limiting the number of visualizations on each dashboard also helps cut dashboard refresh time.
Simplify complex strings: In general, processing systems execute functions with strings much more slowly than ints or booleans. Where possible, convert fields like IDs to ints and avoid complex string calculations.
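For instance, a string ID with a constant prefix can be converted to an integer during data preparation so the BI tool never has to compare strings. The `C-` prefix and field names below are purely illustrative:

```python
# Raw rows as they might arrive from a source system, with string IDs.
raw_rows = [
    {"customer_id": "C-00042", "region": "WEST"},
    {"customer_id": "C-00107", "region": "EAST"},
]

def simplify(row):
    # Strip the constant "C-" prefix and store the ID as an int;
    # integer comparisons and joins are far cheaper than string ones.
    return {
        "customer_id": int(row["customer_id"].removeprefix("C-")),
        "region": row["region"],
    }

clean_rows = [simplify(r) for r in raw_rows]
print(clean_rows)  # [{'customer_id': 42, 'region': 'WEST'}, {'customer_id': 107, 'region': 'EAST'}]
```

Ideally this conversion happens once in the database or ETL layer rather than on every dashboard refresh.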
Take advantage of built-in performance tracking: Tableau has a native function that analyzes performance problem areas. The performance recorder tells you which worksheets, queries, and dashboards are slow and even shows you the query text.
Execute using extracts rather than live connections: Tableau performs much faster when executing queries on extracts versus live connections. Use extracts whenever possible, and keep them trimmed down to limit query execution time. If you want to stream data or have a constantly refreshing dataset, then extracts won’t be an option.
Again, limit logic: Tableau isn’t built to handle too much relational modeling or data manipulation – too many complex joins or calculations really slow down its processing. Try to offload as many of these steps as possible onto the database or a semantic layer.
Limit marks and filters: Each mark included on a visualization means more parsing that Tableau needs to perform, and too many filters bog down the system. Try instead to split complex worksheets/visualizations into multiple smaller views and connect them with filter actions to explore those relationships more quickly.
Further Sources: Tableau’s website has a succinct and very informative blog post that details most of these suggestions and other specific recommendations. You can find it here.
Understand the implications of DirectQuery: Similar in concept to Tableau’s extract vs. live connection options, import and DirectQuery options for connecting to data sources have different impacts on performance. It’s important to remember that if you’re using DirectQuery, the time required to refresh visuals is dependent on how long the source system takes to execute Power BI’s query. So if your database server is flooded with users or operating slowly for some other reason, you will have slow execution times in Power BI and the query may time out. (See other important considerations when using DirectQuery here.)
Utilize drillthrough: Drillthrough pages are very useful for data exploration and decluttering reports, but they also have the added benefit of making sure your visuals and dashboards aren’t overly complex. They cut down query execution time and improve runtime while still allowing for in-depth exploration.
Be careful with row-level security: Implementing row-level security has powerful and common security use cases, but unfortunately, its implementation has the tendency to bog down system performance. When RLS is in place, Power BI has to query the backend and generate caching separately for each user role. Try to create only as many roles as absolutely necessary, and be sure to test each role to know the performance implications.
Further Sources: Microsoft’s Power BI documentation has a page dedicated to improving performance that further details these options and more. Check it out here.
Utilize dashboard links: Looker has a wonderful functionality that allows for easy URL linking in their drill menus. If you’re experiencing long refresh times, a nifty remedy is to split up your dashboard into different dashboards and provide links between them in drill menus.
Improve validation speed: LookML validation checks the entire project – all model, view, and LookML dashboard files. Increased complexity and crossover between logic in your files lead to longer validation time. If large files and complex relationships make lag in validation time problematic, it can be a good idea to break up your projects into smaller pieces where possible. The key here is handling complex SQL optimally by utilizing whatever methods will maximize SQL performance on the database side.
Pay attention to caching: Caching is another important consideration with Looker performance. Developers should be very intentional with how they set up caching and the conditions for dumping and refreshing a cache, as this will greatly affect dashboard runtime. See Looker’s documentation for more information on caching.
Optimize performance with Persistent Derived Tables (PDTs) and Derived Tables (DTs): Caching considerations come into play when deciding between using PDTs and DTs. A general rule of thumb is that if you’re using constantly refreshing data, it’s better to use DTs. If you’re querying the database once and then developing heavily off of that query, PDTs can greatly increase your performance. However, if your PDTs themselves are giving you performance issues, check out this Looker forum post for a few remedies.
Further Sources: Looker’s forums are rich with development tips, and several threads there are particularly helpful for learning how to improve dashboard performance using Looker.
Enhanced predictions. Dynamic forecasting. Increased profitability. Improved efficiency. Data science is the master key to unlock an entire world of benefits. But is your business even ready for data science solutions? Or more importantly, is your business ready to get the full ROI from data science?
Let’s look at the overall market for some answers. Most organizations have increased their ability to use their data to their advantage in recent years. BCG surveys have shown that the average organization has moved beyond the “developing” phase of data maturity into a “mainstream” phase. This means more organizations are improving their analytics capabilities, data governance, data ecosystems, and data science use cases. However, there’s still a long way to go until they are maximizing the value of their data.
So, yes, there is a level of functional data science that many organizations are exploring and capable of reaching. Yet if you want to leverage data science to deliver faster and more complete insights (and ROI), your business needs to ensure that the proper data infrastructure and the appropriate internal culture exist.
The following eight tips will help your machine learning projects, predictive analytics, and other data science initiatives operate with greater efficiency and speed. Each of these tips will require an upfront investment of time and money, but they are fundamental in making sure your data science produces the ROI you want.
Laying the Right Foundation with Accurate, Consistent, and Complete Data
Tip 1: Before diving into data science, get your data in order.
Raw data, left alone, is mostly an unruly mess. It’s collected by numerous systems and end users with incongruous attention to detail. After it’s gathered, the data is often subject to migrations, source system changes, or unpredictable system errors that alter the quality even further. While you can conduct data science projects without first focusing on proper data governance, what ends up on your plate will vary greatly – and comes with a fair amount of risk.
Consider this hypothetical example of predictive analytics in manufacturing. A medium-sized manufacturer wants to use predictive maintenance to help lower the risk and cost of an avoidable machine breakdown (which can easily amount to $22,000 per minute). But first, they need to train a machine learning algorithm to predict impending breakdowns using their existing data. If the data’s bad, the resulting detection capabilities might trigger premature replacements or expensive disruptions.
Tip 2: Aim to create a single source of truth with your data.
Unifying data from assorted sources into a modern data warehouse or data mart simplifies the entire analytical process. Organizations should always start by implementing data ingestion best practices to extract and import high-quality data into the destination source. From there, it’s critical to build a robust data pipeline that maintains the flow of quality data into your warehouse.
Tip 3: Properly cleanse and standardize your data.
Each department in your organization has its own data sources, formats, and definitions. Before it ever reaches your analytics platform or data science tool, your data must be cleansed, standardized, and stripped of duplicates so it can generate accurate predictions. Only through an effective data cleansing and data governance strategy can you reach that level.
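A minimal sketch of what cleansing and standardization can look like, assuming hypothetical `email` and `state` fields. Real pipelines handle many more formats and typically run in the ETL layer rather than in ad hoc scripts:

```python
def standardize(record):
    # Normalize formats that typically differ across departments.
    return {
        "email": record["email"].strip().lower(),
        "state": record["state"].strip().upper()[:2],
    }

def cleanse(records):
    """Standardize every record, then deduplicate on a business key."""
    seen = set()
    cleaned = []
    for rec in map(standardize, records):
        key = rec["email"]  # deduplicate on the normalized email
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)
    return cleaned

raw = [
    {"email": " Jane@Example.com ", "state": "illinois"},
    {"email": "jane@example.com",   "state": "IL"},
]
print(cleanse(raw))  # one record survives deduplication
```

Note that deduplication only works reliably after standardization; the two raw records above only collide once casing and whitespace are normalized.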
Tip 4: Don’t lean on your data scientist to clean up the data.
Sure, data scientists are capable of cleaning up and preparing your data for data science, but pulling them into avoidable data manipulation tasks slows down your analytical progress and impacts your data science initiatives. Leaning on your data scientist to complete these tasks can also lead to frustrated data scientists and increase turnover.
It’s not that data scientists shouldn’t do some data cleansing and manipulation from time to time; it’s that they should only be doing it when it’s necessary.
Tip 5: Create a data-driven culture.
Your data scientist or data science consulting partner can’t be the only ones with data on the mind. Your entire team needs to embrace data-driven habits and practices, or your organization will struggle to obtain meaningful insights from your data.
Frankly, most businesses have plenty of room to grow in this regard. For those looking to implement a data-driven culture before they forge deep into the territory of data science, you need to preach from the top down – grassroots data implementations will never take hold. Your primary stakeholders need to believe not only in the possibility of data science but in the cultivation of practices that fortify robust insights.
A member of your leadership team, whether a chief data officer or another senior executive, needs to ensure that your employees adopt data science tools, observe habits that foster data quality, and connect business objectives to this in-depth analysis.
Tip 6: Train your whole team on data science.
Data science is no longer just for data scientists. A variety of self-service tools and platforms have allowed ordinary end users to leverage machine learning algorithms, predictive analytics, and similar disciplines in unprecedented ways.
With the right platform, your team should be able to conduct sophisticated predictions, forecasts, and reporting to unlock rich insight from their data. What that takes is the proper training to acclimate your people to their newfound capabilities and show the practical ways data science can shape their short- and long-term goals.
Tip 7: Keep your data science goals aligned with your business goals.
Speaking of goals, it’s just as important for data-driven organizations to inspect the ways in which their advanced analytical platforms connect with their business objectives. Far too often, there’s a disconnect, and data science projects either prioritize lesser goals or pursue abstract and impractical intelligence. If you determine which KPIs you want to improve with your analytical capabilities, you have a much better shot at eliciting the maximum results for your organization.
Tip 8: Consider external support to lay the foundation.
Though these step-by-step processes are not mandatory, focusing on creating a heartier and cleaner data architecture as well as a culture that embraces data best practices will set you in the right direction. Yet it’s not always easy to navigate on your own.
With the help of data science consulting partners, you can make the transition in ways that are more efficient and gratifying in the long run.
Need some support getting your business ready for data science? 2nd Watch’s team of data management, analytics, and data science consultants can help you ensure success with your data science initiatives from building the business case and creating a strategy to data preparation and building models.
With your experience in the insurance industry, you understand more than most about how the actions of a smattering of people can cause disproportionate damage. The $80 billion in fraudulent claims paid out across all lines of insurance each year, whether soft or hard fraud, is perpetrated by lone individuals, sketchy auto mechanic shops, or the occasional organized crime group. The challenge for most insurers is that detecting, investigating, and mitigating these deceitful claims is a time-consuming and expensive process.
Rather than accepting loss to fraud as part of the cost of doing business, some organizations are enhancing their detection capabilities with insurance analytics solutions. Here is how your organization can use insurance fraud analytics to enhance fraud detection, uncover emerging criminal strategies, and still remain compliant with data privacy regulations.
Recognizing Patterns Faster
When you look at exceptional claims adjusters or special investigation units, one of the major traits they all share is an uncanny ability to recognize fraudulent patterns. Their experience allows them to notice the telltale signs of fraud, whether it’s frequent suspicious estimates from a body shop or complex billing codes intended to hide frivolous medical tests. Though you trust adjusters, many rely on heuristic judgments (e.g., trial and error, intuition, etc.) rather than hard rational analysis. When they do have statistical findings to back them up, they struggle to keep up with the sheer volume of claims.
This is where machine learning techniques can help to accelerate pattern recognition and optimize the productivity of adjusters and special investigation units. An organization starts by feeding a machine learning model a large data set that includes verified legitimate and fraudulent claims. Under supervision, the machine learning algorithm reviews and evaluates the patterns across all claims in the data set until it has mastered the ability to spot fraud indicators.
Let’s say this model was given a training set of legitimate and fraudulent auto insurance claims. While reviewing the data for fraud, the algorithm might spot links in deceptive claims between extensive damage in a claim and a lack of towing charges from the scene of the accident. Or it might notice instances where claims involve rental cars rented the day of the accident that are all brought to the same body repair shop. Once the algorithm begins to piece together these common threads, your organization can test the model’s unsupervised ability to create criteria for detecting deception and spot all instances of fraud.
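To make the idea concrete, here is a deliberately tiny, frequency-based scorer over indicators like the ones above. This is not a production machine learning model (a real project would use a proper ML library and far richer features), and the field names and training rows are invented for illustration:

```python
def train_fraud_scores(claims):
    """Learn how often each binary indicator co-occurs with confirmed fraud."""
    scores = {}
    for feature in ["no_towing_charge", "same_day_rental", "same_body_shop"]:
        flagged = [c for c in claims if c[feature]]
        # Fraud rate among claims that trip this indicator (0.0 if never tripped).
        scores[feature] = (
            sum(c["fraud"] for c in flagged) / len(flagged) if flagged else 0.0
        )
    return scores

def score_claim(claim, scores):
    """Average the learned fraud rates of the indicators a new claim trips."""
    hits = [scores[f] for f in scores if claim[f]]
    return sum(hits) / len(hits) if hits else 0.0

training = [
    {"no_towing_charge": True,  "same_day_rental": True,  "same_body_shop": False, "fraud": True},
    {"no_towing_charge": True,  "same_day_rental": False, "same_body_shop": False, "fraud": True},
    {"no_towing_charge": False, "same_day_rental": False, "same_body_shop": True,  "fraud": False},
    {"no_towing_charge": False, "same_day_rental": False, "same_body_shop": False, "fraud": False},
]
model = train_fraud_scores(training)
new_claim = {"no_towing_charge": True, "same_day_rental": True, "same_body_shop": False}
print(score_claim(new_claim, model))  # 1.0 on this toy training set
```

Even this toy version shows the core workflow: learn indicator weights from labeled claims, then score new claims so adjusters can prioritize the riskiest ones.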
What’s important in this process is finding a balance between fraud identification and instances of false positives. If your program is overzealous, it might create more work for your agents, forcing them to prove that legitimate claims received an incorrect label. Yet when the machine learning model is optimized, it can review a multitude of dimensions to identify the likelihood of fraudulent claims. That way, if an insurance claim is called into question, adjusters can comb through the data to determine if the claim should truly be rejected or if the red flags have a valid explanation.
Detecting New Strategies
The ability of analytics tools to detect known instances of fraud is only the beginning of their full potential. As with any type of crime, insurance fraud evolves with technology, regulations, and innovation. With that transformation comes new strategies to outwit or deceive insurance companies.
One recent example has emerged through automation. When insurance organizations began to implement straight through processing (STP) in their claim approvals, the goal was to issue remittances more quickly, easily, and cheaply than manual processes. For a time, this approach provided a net positive, but once organized fraudsters caught wind of this practice, they pounced on a new opportunity to deceive insurers.
Criminals learned to game the system, identifying amounts that were below the threshold for investigation and flying their fraudulent claims under the radar. In many cases, instances of fraud could potentially double without the proper tools to detect these new deception strategies. Though most organizations plan to enhance their anti-fraud technology, there’s still the potential for them to lose millions in errant claims – if their insurance fraud analytics are not programmed to detect new patterns.
In addition to spotting red flags for common fraud occurrences, analytics programs need to be attuned to any abnormal similarities or unlikely statistical trends. Using cluster analysis, an organization can detect statistical outliers and meaningful patterns that reveal potential instances of fraud (such as suspiciously identical fraud claims).
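A simple statistical version of this idea flags claims that deviate sharply from their group’s norm. This sketch uses a per-group z-score rather than full cluster analysis, and the physician and billing data are hypothetical. The threshold is deliberately low because with only six samples per group, the maximum attainable z-score is small:

```python
from statistics import mean, stdev

def flag_outliers(claims, group_key, value_key, threshold=2.0):
    """Flag claims whose value deviates sharply from their group's norm."""
    groups = {}
    for c in claims:
        groups.setdefault(c[group_key], []).append(c)
    outliers = []
    for members in groups.values():
        values = [c[value_key] for c in members]
        if len(values) < 3:
            continue  # too few samples to establish a norm
        mu, sigma = mean(values), stdev(values)
        for c in members:
            if sigma and abs(c[value_key] - mu) / sigma > threshold:
                outliers.append(c)
    return outliers

claims = [
    {"physician": "A", "billed": 120}, {"physician": "A", "billed": 130},
    {"physician": "A", "billed": 125}, {"physician": "A", "billed": 118},
    {"physician": "A", "billed": 122}, {"physician": "A", "billed": 950},
]
# Low threshold for a tiny sample; larger datasets would use 2.0-3.0.
print(flag_outliers(claims, "physician", "billed", threshold=1.5))
```

In production, the same grouping logic could run over region, billing code, or repair shop, with flagged claims routed to a special investigation unit for review.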
Even beyond the above automation example, your organization can use data discovery to find hidden indicators of fraud and predict future incidents. Splitting claims data into various groups through a few parameters (such as region, physician, billing code, etc., in healthcare) can help in detecting unexpected correlations or warning signs for your automation process or even human adjusters to flag as fraud.
Safeguarding Personally Identifiable Information
As you work to improve your fraud detection, there’s one challenge all insurers face: protecting the personally identifiable information (PII) of policyholders while you analyze your data. The fines related to HIPAA violations can amount to $50,000 per violation, and other data privacy regulations can result in similarly steep fines. The good news is that insurance organizations can balance their fraud prediction and data discovery with security protocols if their data ecosystem is appropriately designed.
Maintaining data privacy compliance and effective insurance fraud analytics requires some maneuvering. Organizations that derive meaningful and accurate insight from their data must first bring all of their disparate data into a single source of truth. Yet, unless they also implement access control through a compliance-focused data governance strategy, there’s a risk of regulatory violations while conducting fraud analysis.
One way to limit your exposure is to create a data access layer that tokenizes the data, replacing any sensitive PII with unique identification symbols to keep data separate. Paired with clear data visualization capabilities, your adjusters and special investigation units can see clear-cut trends and evolving strategies without revealing individual claimants. From there, they can take their newfound insights into any red flag situation, saving your organization millions while reducing the threat of noncompliance.
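A tokenizing data access layer might look roughly like this sketch, where an HMAC replaces PII fields with deterministic tokens so records can still be grouped and joined without exposing identities. The key, field names, and claim structure are illustrative; a real system would manage keys in a secrets manager and design the scheme with its compliance team:

```python
import hashlib
import hmac

SECRET_KEY = b"example-rotation-key"  # hypothetical; keep real keys in a secrets manager

def tokenize(value: str) -> str:
    """Replace a PII value with a deterministic, non-reversible token."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def strip_pii(claim: dict, pii_fields=("name", "ssn")) -> dict:
    """Return a copy of the claim that is safe for the analytics layer."""
    return {
        k: tokenize(v) if k in pii_fields else v
        for k, v in claim.items()
    }

claim = {"name": "Jane Doe", "ssn": "123-45-6789", "claim_amount": 4200, "region": "midwest"}
safe = strip_pii(claim)
print(safe["claim_amount"], safe["region"])  # analytical fields pass through untouched
print(safe["name"] != claim["name"])         # True: PII replaced by a token
```

Because the same input always yields the same token, analysts can still count claims per (tokenized) claimant or spot repeat patterns without ever seeing a name or Social Security number.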
Want to learn more about how the right analytics solutions can help you reduce your liability, issue more policies, and provide better customer service? Check out our insurance analytics solutions page for use cases that are transforming your industry.
In part 1 and part 2 of our modern data warehouse series, we laid out the benefits of a data warehouse and compared the different types of modern data warehouses available. In part 3, we take a step back and see how the modern data warehouse fits in your overall data architecture.
A modern data warehouse is just one piece of the puzzle of a modern data architecture that will ultimately provide insights to the business via reporting, dashboarding, and advanced analytics.
There are many factors to consider when it comes to modern data warehousing, and it’s important to understand upfront that it’s a huge endeavor. With that in mind, a well-designed modern data warehouse will help your organization grow and stay competitive in our ever-changing world.
The ultimate goal of modern architecture is to facilitate the movement of data not only to the data warehouse but also to other applications in the enterprise. The truth of the matter is that a modern data architecture is designed very similarly to how we at 2nd Watch would design an on-premises or traditional data architecture, though with some major differences. Some of the benefits of a modern data architecture are as follows:
Tools and technology available today allow the development process to speed up tremendously.
Near real-time scenarios are much more cost-effective and easier to implement using cloud technologies.
With some SaaS providers, you can worry much less about the underlying hardware, indexing, backups, and database maintenance and more about the overall business solution.
While technology advances have removed some of the technical barriers experienced in on-premises systems, data must still be modeled in a way that supports goals, business needs, and specific use cases.
Below you will find a high-level diagram of a modern data architecture we use at 2nd Watch, along with a description of the core components of the architecture:
Technical details aside, 2nd Watch’s architecture provides key benefits that will add value to any business seeking a modern data warehouse. The raw data layer enables the ingestion of all forms of data, including unstructured data. In addition, the raw layer keeps your data safe by eliminating direct user access and creating historical backups of your source data. This historical record of data can be accessed for data science use cases as well as modeled for reports and dashboards to show historical trends over time.
The transformation-focused data hub enables easy access to data from various source systems. For example, imagine you have one customer that can be tracked across several subsidiary companies. The business layer would enable you to track their activity across all of your business lines by conforming the various data points into one source of truth. Furthermore, the business layer allows your organization to add additional data sources without disrupting your current reporting and solutions.
The enterprise data warehouse provides a data layer structured with reporting in mind. It ensures that any reports and dashboards update quickly and reliably, and it provides data scientists with reliable data structured for use in models. Overall, the modern data warehouse architecture enables you to provide your end users with near real-time reporting, allowing them to act on insights as they occur. Each component of the architecture provides unique business value that translates into a competitive advantage.
If you depend on your data to better serve your customers, streamline your operations, and lead (or disrupt) your industry, a modern data platform built on the cloud is a must-have for your organization.
Blockchain is one of those once-in-a-generation technologies that has the potential to really change the world around us. Despite this, blockchain is something that a lot of people still know nothing about. Part of that, of course, is because it’s such a new piece of technology that really only became mainstream within the past few years. The main reason, though, (and to address the elephant in the room) is because blockchain is associated with what some describe as “fake internet money” (i.e., Bitcoin). The idea of a decentralized currency with no guarantor is intimidating, but let’s not let that get in the way of what could be a truly revolutionary technology. So, before we get started, let’s remove the Bitcoin aspect and simply focus on blockchain. (Don’t worry, we’ll pick it back up later on.)
Blockchain, at its very core, is a database. But blockchains are different from traditional databases in that they are immutable, unable to be changed. Imagine this: Once you enter information into your shiny new blockchain, you don’t have to worry about anybody going in and messing up all your data. “But how is this possible?” you might ask.
Blockchains operate by taking data and structuring it into blocks (think of a block like a record in a database). This can be any kind of information, from names and numbers all the way to executable code scripts. There are a few essential pieces of information that should be placed in all blocks, those being an index (the block number), a timestamp, and the hash (more on this later) of the previous block. All of this data is compiled into a block, and a hashing algorithm is applied to the information.
After the hash is computed, the information is locked and you can’t change information without re-computing the hash. This hash is then passed on to the next block where it gets included in its data, creating a chain. The second block then compiles all of its own data and, including the hash of the previous block, creates a new hash and sends it to the next block in the chain. In this way, a blockchain is created by “chaining” together blocks by means of a block’s unique hash. In other words, the hash of one block is reliant on the hash of the previous block, which is reliant on that of the one before it, ad infinitum.
And there you go, you have a blockchain! Before we move on to the next step (which will really blow your mind), let’s recap:
You have Block-0. Information is packed into Block-0 and hashed, giving you Hash-0. Hash-0 is passed to Block-1, where it is combined with Block-1’s data. So, Block-1’s data now includes its own information and Hash-0. This is then hashed to produce Hash-1, which is passed to the next block.
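The recap above maps almost directly to code. This is a toy sketch, not a real blockchain protocol (there is no network, consensus, or proof-of-work), but it shows hashing, chaining, and why tampering breaks the chain:

```python
import hashlib
import json
import time

def hash_block(block: dict) -> str:
    """SHA-256 over the block's canonical JSON, excluding its own hash field."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_block(index: int, data, prev_hash: str) -> dict:
    """Build a block holding the previous block's hash, then seal it with its own."""
    block = {"index": index, "timestamp": time.time(), "data": data, "prev_hash": prev_hash}
    block["hash"] = hash_block(block)
    return block

def is_valid(chain: list) -> bool:
    """A chain is valid iff every hash is intact and every link matches."""
    for i, block in enumerate(chain):
        if block["hash"] != hash_block(block):
            return False  # block contents changed after sealing
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # the chain link is broken
    return True

chain = [make_block(0, "genesis", "0" * 64)]
for i in range(1, 4):
    chain.append(make_block(i, f"record {i}", chain[-1]["hash"]))

print(is_valid(chain))          # True
chain[1]["data"] = "tampered"   # mutate Block-1 without rehashing
print(is_valid(chain))          # False: Block-1's stored hash no longer matches
```

Changing one block invalidates it and every block after it, which is exactly the property the next sections build on.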
The second major aspect of blockchain is that it is distributed. This means the entire protocol is operated across a network of nodes simultaneously. Every node in the network stores the entire chain, along with all new blocks, in real time.
Secure Data Is Good Data
Remember earlier when we said a blockchain is immutable? Let’s go back to that.
Suppose you have a chain 100 blocks long running on 100 nodes at once. Now let’s say you want to stage an attack on this blockchain to change Block-75. Because the chain is run and stored across 100 nodes simultaneously, you would have to change Block-75 in all 100 nodes at the same time. Let’s imagine you somehow hack into those other nodes to do this; now you have to rehash everything from Block-75 to Block-100 (which, remember, is extremely computationally expensive). So while you (the singular malicious node) are trying to rehash all of those blocks, the other 99 nodes in the network are busy hashing new blocks and extending the chain. This makes it practically impossible for a compromised chain to become valid, because it will never catch up to the length of the original chain.
About That Bitcoin Thing…
Now, there are two types of blockchains. Most popular blockchains are public, in which anybody in the world is able to join and contribute to the network. This requires some incentive, as without it nobody would join the network, and this comes in the form of “tokens” or “coins” (i.e., Bitcoin). In other words, Bitcoin is an incentive for people to participate and ensure the integrity of the chain. Then there are permissioned chains, which are run by individuals, organizations, or conglomerates for their own reasons and internal uses. In permissioned chains, only nodes with certain permissions are able to join and be involved in the network.
And there you go, you have the basics of blockchain. At a fundamental level, it’s an extremely simple yet ingenious idea with applications for supply chains, smart contracts, auditing, and many more to come. However, like any promising new technology, there are still questions, pitfalls, and risks to be explored. If you have any questions about this topic or want to discuss the potential for blockchain in your organization, contact us here.