Real-time analytics. Streaming analytics. Predictive analytics. These buzzwords are thrown around in the business world without a clear-cut explanation of their full significance. Each approach to analytics presents its own distinct value (and challenges), but it’s tough for stakeholders to make the right call when the buzz borders on white noise.
Which data analytics solution fits your current needs? In this post, we aim to help businesses cut through the static and clarify modern analytics solutions by defining real-time analytics, sharing use cases, and providing an overview of the players in the space.
TL;DR
Real-time or streaming analytics allows businesses to analyze complex data as it’s ingested and gain insights while it’s still fresh and relevant.
Real-time analytics has a wide variety of uses, from preventative maintenance and real-time insurance underwriting to improving preventive medicine and detecting sepsis faster.
To get the full benefits of real-time analytics, you need the right tools and a solid data strategy foundation.
What is Real-Time Analytics?
In a nutshell, real-time or streaming analysis allows businesses to access data within seconds or minutes of ingestion to encourage faster and better decision-making. Unlike batch analysis, data points are fresh and findings remain topical. Your users can respond to the latest insight without delay.
Yet speed isn’t the sole advantage of real-time analytics. The right solution is equipped to handle high volumes of complex data and still yield insight at blistering speeds. In short, you can conduct big data analysis at faster rates, mobilizing terabytes of information to allow you to strike while the iron is hot and extract the best insight from your reports. Best of all, you can combine real-time needs with scheduled batch loads to deliver a top-tier hybrid solution.
How does the hype translate into real-world results? Depending on your industry, there is a wide variety of examples you can pursue. Here are just a few that we’ve seen in action:
Next-Level Preventative Maintenance
Factories hinge on a complex web of equipment and machinery working for hours on end to meet the demand for their products. Through defects or standard wear and tear, a breakdown can occur and bring production to a screeching halt. Connected devices and IoT sensors now provide technicians and plant managers with warnings – but only if they have the real-time analytics tools to sound the alarm.
Azure Stream Analytics is one such example. You can use Microsoft’s analytics engine to monitor multiple IoT devices and gather near-real-time analytical intelligence. When a part needs a replacement or it’s time for routine preventative maintenance, your organization can schedule upkeep with minimal disruption. Historical results can be saved and integrated with other line-of-business data to cast a wider net on the value of this telemetry data.
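As a rough illustration, a Stream Analytics job of this kind boils down to a SQL-like query over the incoming telemetry. The input, output, field names, and threshold below are hypothetical placeholders; a real job would reference your own IoT Hub input and alerting output.

```sql
-- Hypothetical sketch: flag machines whose average vibration over a five-minute
-- tumbling window exceeds a threshold, and route them to an alerting output.
SELECT
    machineId,
    AVG(vibration) AS avgVibration,
    System.Timestamp() AS windowEnd
INTO
    MaintenanceAlerts                        -- alias for an alerting output (e.g., an Event Hub)
FROM
    FactoryTelemetry TIMESTAMP BY eventTime  -- alias for the IoT Hub input stream
GROUP BY
    machineId,
    TumblingWindow(minute, 5)
HAVING
    AVG(vibration) > 0.8                     -- illustrative threshold
```

The alerting output can feed a workflow that notifies technicians or automatically opens a maintenance ticket, while the same results are archived for the historical analysis described above.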
Real-Time Insurance Underwriting
Insurance underwriting is undergoing major changes thanks to the gig economy. Rideshare drivers need flexibility from their auto insurance provider in the form of modified commercial coverage for short-term driving periods. Insurance agencies prepared to offer flexible micro policies that reflect real-time customer usage have the opportunity to increase revenue and customer satisfaction.
In fact, one of our clients saw the value of harnessing real-time big data analysis but lacked the ability to consolidate and evaluate their high-volume data. By partnering with our team, they were able to create real-time reports that pulled from a variety of sources ranging from driving conditions to driver ride-sharing scores. With that knowledge, they’ve been able to tailor their micro policies and enhance their predictive analytics.
Healthcare Analytics
How about this? Real-time analytics saves lives. Death from sepsis, an excessive immune response to infection that threatens the lives of 1.7 million Americans each year, is often preventable when the condition is diagnosed in time. The majority of sepsis cases are not detected until manual chart reviews are conducted during shift changes – at which point the infection has often already compromised the bloodstream or vital tissues. If healthcare providers could identify warning signs and alert clinicians in real time, they could intervene before infections progress beyond effective treatment.
HCA Healthcare, a Nashville-based healthcare provider, undertook a real-time healthcare analytics project with that exact goal in mind. They created a platform that collects and analyzes clinical data from a unified data infrastructure to enable up-to-the-minute sepsis diagnoses. By gathering and analyzing petabytes of unstructured data in near real time, they can now be warned that a patient is at risk of sepsis up to 20 hours in advance. Faster diagnosis results in faster and more effective treatment.
That’s only the tip of the iceberg. For organizations in the healthcare payer space, real-time analytics has the potential to improve member preventive healthcare. Once again, real-time data from smart wearables, combined with patient medical history, can provide healthcare payers with information about their members’ health metrics. Some industry leaders even propose that payers incentivize members to make measurable healthy lifestyle choices, lowering costs for both parties at the same time.
Getting Started with Real-Time Analysis
There’s clear value produced by real-time analytics, but only with the proper tools and strategy in place. Otherwise, powerful insight is left to rot on the vine, and your overall performance is hampered in the process. If you’re interested in exploring real-time analytics for your organization, contact us for an analytics strategy session. In this session, lasting 2-4 hours, we’ll review your current state and goals before outlining the tools and strategy needed to help you achieve those goals.
While most servers spend the majority of their time well below peak usage, companies often pay for max usage 24/7.
Cloud providers enable the ability to scale usage up and down, but determining the right schedule is highly prone to human error.
Machine learning models can be used to predict server usage throughout the day and scale the servers to that predicted usage.
Depending on the number of servers, savings can be in the millions of dollars.
How big of a server do you need? Do you know? Enough to handle peak load, plus a little more headroom? How often does your server actually run at peak utilization? Two hours per day? Ten? If your server only runs at peak load for two hours per day, then you are paying for 22 hours of peak performance that you aren’t using. Multiply that inefficiency across many servers, and that’s a lot of money spent on compute power sitting idle.
Cloud Providers Make Scaling Up and Down Possible (with a Caveat)
If you’ve moved off-premises and are using a cloud provider such as AWS or Azure, it’s easy to reconfigure server sizes if you find that you need a bigger server or that you’re not fully utilizing the compute, as in the example above. You can also schedule these servers to resize when the workload is predictably heavier – for example, scaling a server up during nightly batch processes or during business hours to handle customer transactions.
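To make that concrete, resizing an Azure SQL database is a single statement, so a scheduled job (for example, an Azure Automation runbook) could scale up before the nightly batch and back down afterward. The database name and service tiers below are placeholders.

```sql
-- Scale up ahead of the nightly batch window; the change happens online.
ALTER DATABASE [SalesDb] MODIFY (SERVICE_OBJECTIVE = 'S3');

-- ...and scale back down once the heavy processing is finished.
ALTER DATABASE [SalesDb] MODIFY (SERVICE_OBJECTIVE = 'S0');
```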
The ability to schedule is powerful, but it can be difficult to manage the specific needs of each server, especially when your enterprise uses many servers for a wide variety of purposes. The demands on a server can also change, perhaps without IT’s knowledge, requiring close monitoring of the system. Managing server schedules becomes yet another task to pile on top of all of IT’s other responsibilities. If only there were a solution that could recognize the needs of a server and create dynamic schedules accordingly, without any intervention from IT. This type of problem is a great candidate for machine learning.
How Machine Learning Can Dynamically Scale Your Server Capacity (without the Guesswork)
Machine learning excels at taking data and creating rules. In this case, you could use a model to predict server utilization, and then use that information to dynamically create schedules for each database.
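The training data for such a model is simply a history of utilization metrics. Below is a minimal sketch of the kind of hourly rollup that could feed a forecasting model; the table and column names are hypothetical.

```sql
-- Hourly utilization profile per server: the history a forecasting model would learn from.
SELECT
    server_name,
    DATEPART(weekday, sampled_at) AS day_of_week,
    DATEPART(hour, sampled_at)    AS hour_of_day,
    AVG(cpu_percent)              AS avg_cpu,
    MAX(cpu_percent)              AS peak_cpu
FROM dbo.server_utilization_log        -- hypothetical telemetry table
GROUP BY
    server_name,
    DATEPART(weekday, sampled_at),
    DATEPART(hour, sampled_at);
```

The predicted load for each hour then maps to a service tier, and an automated job applies the corresponding resize statements, as in the scaling example above.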
Server Optimization In Action
We previously built such a solution for a client in the banking industry, leading to a 68% increase in efficiency and cost savings of $10,000 per year for a single server. Applied to the client’s other 2,000 servers, this method could lead to savings of $20 million per year!
While the actual savings will depend on how many servers you run and how efficiently they run today, the cost benefits can be significant once a machine learning server optimization model is applied.
If you’re interested in learning more about using machine learning to save money on your server usage, click here to contact us about our risk-free server optimization whiteboard session.
The scales have finally tipped! According to a Flexera survey, 93% of organizations have a multi-cloud strategy and 53% are now operating with advanced cloud maturity. For those who are now behind the curve, it’s a reminder that keeping your data architecture in an on-premises solution is detrimental to remaining competitive. On-prem architecture restricts your performance and limits the growth and sophistication of your analytics. Here are some of the setbacks of remaining on-prem and the benefits of data migration from legacy systems.
For most organizations, data architecture did not grow out of an intentional process. Many on-prem storage systems developed from a variety of events ranging from M&A activity and business expansion to vertical-specific database initiatives and rogue implementations. As a result, they’re often riddled with data silos that prevent comprehensive analysis from a single source of truth.
When organizations conduct reporting or analysis with these limitations, they are at best only able to find out what happened – not predict what will happen or narrow down what they should do. The predictive analytics and prescriptive analytics that organizations with high analytical maturity are able to conduct are only possible if there’s a consolidated and comprehensive data architecture.
Though you can create a single source of data with an on-prem setup, a cloud-based data storage platform is more likely to prevent future silos. When authorized users can access all of the data from a centralized cloud hub, either through a specific access layer or the whole repository, they are less likely to create offshoot data implementations.
Slower Query Performance
The insights from analytics are only useful if they are timely. Some reports are evergreen, so a delay of a few hours, days, or even a week doesn’t change the actionability of the insight all that much. Real-time or streaming analytics, on the other hand, requires the ability to process high-volume data at low latency, a difficult feat for on-prem data architecture to achieve without enterprise-level funding. Most mid-sized businesses struggle to justify that expense, even though they need the insight available through streaming analysis to keep from falling behind larger industry competitors.
Using cloud-based data architecture enables organizations to access much faster querying. The scalability of these resources allows organizations of all sizes to ask questions and receive answers at a faster rate, regardless of whether it’s real-time or a little less urgent.
Plus, those organizations that end up working with a data migration services partner can even take advantage of solution accelerators developed through proven methods and experience. Experienced partners are better at avoiding unnecessary pipeline or dashboard inefficiencies since they’ve developed effective frameworks for implementing these types of solutions.
More Expensive Server Costs
On-prem data architecture is far more expensive than cloud-based data solutions of equal capacity. When you opt for on-prem, you always need to prepare and pay for the maximum capacity. Even if the majority of your users are conducting nothing more complicated than sales or expense reporting, your organization still needs the storage and computational power to handle data science opportunities as they arise.
All of that unused server capacity is expensive to implement and maintain when the full payoff isn’t continually realized. Also, on-prem data architecture requires ongoing updates, maintenance, and integration to ensure that analytics programs will function to the fullest when they are initiated.
Cloud-based data architecture is far more scalable, and providers only charge you for the capacity you use during a given cycle. Plus, it’s their responsibility to optimize the performance of your data pipeline and data storage architecture – letting you reap the full benefits without all of the domain expertise and effort.
Hindered Business Continuity
There’s a renewed focus on business continuity. The recent pandemic has illuminated the actual level of continuity preparedness worldwide. Of the organizations that were ready to respond to equipment failure or damage to their physical buildings, few were ready to have their entire workforce telecommuting. Those with their data architecture already situated in the cloud fared much better and more seamlessly transitioned to conducting analytics remotely.
The aforementioned accessibility of cloud-based solutions gives organizations a clear advantage over traditional on-prem data architecture. They can adapt with minimal delay to property damage, natural disasters, pandemic outbreaks, or other disruptive events. Plus, the centralized nature of this type of data analytics architecture prevents the unplanned losses that can occur when data is stored in disparate systems on-site. Resiliency is at the heart of cloud-based analytics.
It’s time to embrace data migration from legacy systems in your business. 2nd Watch can help! We’re experienced with migrating legacy implementations to Azure Data Factory and other cloud-based solutions.
To get your source data ingested and loaded, or for a deep dive into how you can build a fully automated data integration process in Snowflake on Azure, schedule a Snowflake whiteboarding session with our team of data architects.
If you have a Microsoft ecosystem but have been wanting to take advantage of more tools on the market, Snowflake on Azure gives you additional opportunities to upgrade your analytics platform while preserving both the investment in and the uniformity of your current environment.
For those who are not familiar, Snowflake is a cloud-based, massively parallel processing (MPP), columnar storage database. It’s a newer option for data warehousing that is set up for more efficient querying of large data volumes. It also consumes structured and semi-structured data in a way that traditional relational databases are not designed to do as effectively – think “big data” without the “big overhead.”
With this release, companies in the Microsoft ecosystem can evaluate using Snowflake to increase the performance and flexibility of their analytics environment by adding it on top of their existing data integration process. To determine where Snowflake might be a good fit, our 2nd Watch consultants took a deep dive into where it could sit in our current and prospective Microsoft clients’ ecosystems.
Where does Snowflake fit in your current Azure environment?
Snowflake is best used as the home of an analytical layer (or dimensional model, for the more technical) that enables reporting. Think of this as the substitute for products like SQL Server Analysis Services (SSAS).
While we still recommend that you maintain a data integration hub and related process for all of your consuming application needs (i.e., sending consolidated and cleansed data back to each source system to keep them in sync), Snowflake can sit right at the end of that process. Because it’s optimized for read activities, it complements the needs of business intelligence tools like Power BI, Looker, Tableau, etc., making it faster for business users to grab the information they need via those tools. Integration and ETL are possible with many tools and services, including Azure Data Factory.
Example architecture for adding Snowflake to the end of your data integration process
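In practice, the analytical layer at the end of that process is a set of Snowflake objects sized for read workloads. Below is a minimal sketch; the warehouse, schema, table, and role names are hypothetical and assume a dimensional model loaded by your integration process.

```sql
-- A small, auto-suspending warehouse dedicated to BI queries.
CREATE WAREHOUSE IF NOT EXISTS bi_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND   = 60       -- pause after 60 seconds of inactivity to control cost
    AUTO_RESUME    = TRUE;

-- A reporting view over the dimensional model loaded at the end of the integration process.
CREATE OR REPLACE VIEW analytics.sales_by_customer AS
SELECT c.customer_name,
       d.calendar_month,
       SUM(f.sales_amount) AS total_sales
FROM   analytics.fact_sales   f
JOIN   analytics.dim_customer c ON c.customer_key = f.customer_key
JOIN   analytics.dim_date     d ON d.date_key     = f.date_key
GROUP BY c.customer_name, d.calendar_month;

-- Give the role used by Power BI, Looker, or Tableau read access through the warehouse.
GRANT USAGE  ON WAREHOUSE bi_wh TO ROLE bi_reader;
GRANT SELECT ON VIEW analytics.sales_by_customer TO ROLE bi_reader;
```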
When would you want to use Snowflake?
There are two primary ideas behind Snowflake’s competitive advantage when it comes to data warehousing platforms: its automatic optimization of query execution and the hands-off nature of its maintenance.
We recommend Snowflake for two main use cases:
Developing a more efficient analytics platform
Creating a platform for flexible data discovery
Our clients that fit within the above use cases usually had:
Multiple business users – The more people you have querying your database at one time, the more the database has to be configured to handle that load so as not to lock up other processes. Traditional databases can be scaled up to handle larger reads (think of a query that produces data), but this takes a decent amount of time and effort to achieve, and they are more often optimized for writes (think of loading data into a table). In Snowflake, a user can spin up resources for a single query, then spin them back down right after. This allows for a more modular use of higher-powered resources.
Lots of data – If you’ve ever tried to perform a huge query on your current system, you likely noticed a slowdown from your usual processing. Traditional databases are not as optimized for read activities as columnar databases are. This makes options like Snowflake more attractive to those performing heavier analytical queries on a regular basis.
“Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform row-based systems.” – MIT Computer Science and Artificial Intelligence Laboratory
A mix of structured and semi-structured data – Though many traditional databases offer options for consuming semi-structured data (e.g., JSON, XML, etc.), they aren’t optimized to do so. If you have a mix of structured and semi-structured data, options like Snowflake may handle it more efficiently, as in the sketch below.
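To make the last two points concrete, here is a hedged sketch of what this looks like in Snowflake: a warehouse temporarily scaled up for one ad hoc analysis, querying JSON that was loaded as-is into a VARIANT column. The warehouse name, table, and JSON fields are all hypothetical.

```sql
-- Temporarily scale a warehouse up for one heavy ad hoc query...
ALTER WAREHOUSE adhoc_wh SET WAREHOUSE_SIZE = 'LARGE';

-- ...query semi-structured JSON stored in a VARIANT column, flattening a nested array...
SELECT
    e.payload:deviceId::string AS device_id,
    r.value:code::string       AS reading_code,
    r.value:measure::float     AS measure
FROM raw.events e,
     LATERAL FLATTEN(input => e.payload:readings) r
WHERE e.payload:eventType::string = 'sensor_reading';

-- ...then drop it back down (or let AUTO_SUSPEND pause it) when finished.
ALTER WAREHOUSE adhoc_wh SET WAREHOUSE_SIZE = 'XSMALL';
```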
What’s the tradeoff?
Snowflake can be a great option for clients who need a true analytical platform where they can perform data discovery or need to read out a lot of data at once. That said, the following are situations where an alternative to Snowflake might make more sense:
You need a data hub to keep your various applications in sync (application integration and API endpoints).
You aren’t trying to perform complex aggregations or data discovery.
You’re invested in an existing solution that doesn’t have many performance or management overhead concerns.
So what’s the gist?
Snowflake makes it easy to kick off a data warehouse project with its ease of use and start-small, scale-big approach. Despite this innovative architecture, we still recommend applying the same data warehousing fundamentals that you would have in the past.
Yes, Snowflake will allow you to create views on top of messy raw source data, but you will ultimately lose the battle on performance as your data grows in complexity. We suggest approaching a new Snowflake deployment with a vision of your data platform as a whole – not just the analytics layer. Performing all your complex business logic in Snowflake is possible, but due to the columnar architecture, integration use cases are better served in complementary cloud database platforms, such as Azure SQL.
With a roadmap and strategy in place for both your integration and analytics needs, you’ll be better able to select the right mix of tools and technologies for your solution. Learn more about Snowflake deployment best practices in our eBook.
When using a modern data warehouse, your organization is likely to see improved access to your data and more impactful analytics. One such data warehouse is Azure Synapse, a Microsoft service. When paired with a powerful BI tool, like Looker, or a data science platform, like Dataiku, your organization can more quickly gain access to impactful insights that will help you drive business decisions across the enterprise.
In this post, we’ll provide a high-level overview of Azure Synapse, including a description of the tool, why you should use it, pros and cons, and complementary tools and technologies.
Overview of Azure Synapse
Azure Synapse is Microsoft’s umbrella service around various existing and new offerings, including Azure DW, Azure Databricks, and on-demand SQL querying, though it lacks tight integration across these services. Similar to Redshift, Azure DW is charged by instance size and running time, while other Synapse services offer more of a consumption-based model.
Value Prop:
Once data is stored within Azure Data Lake, no need to stage data again within the warehouse
Scalability:
Easy to scale up or down on the fly with use of Azure
Increases in pricing tiers only increase concurrent queries by 4 at each level
Performance:
Built for MPP (massively parallel processing)
Performance optimal for data volumes larger than 1TB
Not suitable for running high volumes of concurrent queries (four concurrent requests per service level)
Requires active performance tuning (indexes, etc.)
Features:
Native connection with Power BI
Can select either a serverless SQL pool or a dedicated SQL pool based on the needs of the organization (see the serverless query sketch after this overview)
Supports the ability to run Spark on Databricks
Core product still relies on Azure DW, an older technology
Security:
Supports row-level and column-level security, multi-factor authentication, and Azure AD integration
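To illustrate the serverless option and the “no need to stage data again” value proposition above, the sketch below queries Parquet files sitting in Azure Data Lake directly from a serverless SQL pool. The storage account, container, path, and column names are hypothetical.

```sql
-- Query raw Parquet files in the data lake without loading them into a dedicated pool first.
SELECT
    customer_id,
    SUM(order_total) AS total_spend
FROM OPENROWSET(
        BULK 'https://examplelake.dfs.core.windows.net/raw/sales/*.parquet',
        FORMAT = 'PARQUET'
     ) AS sales
GROUP BY customer_id;
```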
Why Use Azure Synapse
The Microsoft SQL Server ecosystem is familiar, with tighter integrations into Azure’s data ecosystem, including Azure Databricks and the MPP version of SQL Server, Azure DW – just don’t expect a turnkey solution quite yet.
Pros of Azure Synapse
Can be easily provisioned with existing Azure subscription and provides pay-as-you-go pricing
Integration with Azure Active Directory and Azure Purview can provide an easy way to manage user roles and insights into data
Transferable knowledge from on-premise Microsoft SQL Server background
Cons of Azure Synapse
“Synapse” is largely a marketing umbrella of technologies, with Azure DW at its core, requiring management of disparate services
Difficulty managing high volumes of concurrent queries due to tuning and cost of higher service tiers
Requires complex database administration tasks, including performance tuning, which other cloud data solutions have made more turnkey
Serverless capabilities are limited to the newer Azure services; Azure DW itself lacks on-demand, frictionless compute sizing
Select Complementary Tools and Technologies for Azure Synapse
Azure Analysis Services
Azure Data Factory
Azure Databricks
Azure ML
Azure Purview
Power BI
We hope you found this high-level overview of Azure Synapse helpful. If you’re interested in learning more about Azure Synapse or other modern data warehouse tools like Amazon Redshift, Google BigQuery, and Snowflake, contact us to learn more.
The content of this blog is an excerpt of our Modern Data Warehouse Comparison Guide. Click here to download a copy of that guide.
In the first part of this series, A Step by Step Guide to Getting the Most from Your JD Edwards Data, we walked through the process of collecting JDE data and integrating it with other data sources. In this post, we will show you how to add business logic unique to a company and host analyzable JDE data.
Adding Business Logic Unique to a Company
When working with JD Edwards, you’ll likely spend the majority of your development time defining business logic and source-to-target mapping required to create an analyzable business layer. In other words, you’ll transform the confusing and cryptic JDE metadata into something usable. So, rather than working with columns like F03012.[AIAN8] or F0101.[ABALPH], the SQL code will transform the columns into business-friendly descriptions of the data. For example, here is a small subset of the customer pull from the unified JDE schema:
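(The sketch below is an illustrative simplification rather than the full production mapping; the exact column list and join logic vary by implementation.)

```sql
-- Illustrative simplification: map cryptic JDE column aliases to business-friendly names.
SELECT
    cm.AIAN8  AS customer_number,   -- F03012 (Customer Master by Line of Business): address number
    ab.ABALPH AS customer_name,     -- F0101 (Address Book Master): alpha (display) name
    cm.AICO   AS company_code       -- illustrative additional attribute
FROM F03012 AS cm
JOIN F0101  AS ab
  ON ab.ABAN8 = cm.AIAN8;           -- join the customer master to its address book record
```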
Furthermore, you can add information from other sources. For example, if a business wanted to include new customer information only stored in Salesforce, you can build the information into the new [Customer] table that exists as a subject area rather than a store of data from a specific source. Moreover, the new business layer can act as a “single source of the truth” or “operational data store” for each subject area of the organization’s structured data.
Looking for Pre-built Modules?
2nd Watch has built out data marts for several subject areas. All tables are easily joined on natural keys, provide easy-to-interpret column names, and are “load-ready” for any visualization tool (e.g., Tableau, Power BI, Looker) or data application (e.g., machine learning, data warehouse, reporting services). Modules already developed include the following:
Account Master
Accounts Receivable
Backlog
Balance Sheet
Booking History
Budget
Business Unit
Cost Center
Currency Rates
Customer Data
Employee
General Ledger
Inventory
Organization
Product
Purchase Orders
Sales History
Tax
Territory
Vendor
Hosting Analyzable JDE Data
After creating the data hub, many companies prefer to warehouse their data in order to improve performance by time boxing tables, pre-aggregating important measures, and indexing based on frequently used queries. The data warehouse also provides dedicated resources to the reporting tool and splits the burden of the ETL and visualization workloads (both memory-intensive operations).
By design, because the business layer is load-ready, it’s relatively trivial to extract the dimensions and facts from the data hub and build a star-schema data warehouse. Using the case from above, the framework would simply capture the changed data from the previous run, generate any required keys, and update the corresponding dimension or fact table:
Simple Star Schema
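As a rough sketch in T-SQL-style syntax (table, column, and variable names are hypothetical), the incremental dimension update described above amounts to a MERGE from the data hub into the warehouse:

```sql
DECLARE @last_run_timestamp datetime2 = '2024-01-01';   -- in practice, read from an ETL control table

-- Merge customers changed since the previous run into the warehouse dimension.
-- The dimension's surrogate key is assumed to be an identity column, so new rows
-- receive their key automatically on insert.
MERGE INTO dw.dim_customer AS tgt
USING (
    SELECT customer_number, customer_name, customer_type
    FROM   hub.customer
    WHERE  updated_at > @last_run_timestamp        -- changed rows captured since the previous run
) AS src
    ON tgt.customer_number = src.customer_number
WHEN MATCHED THEN
    UPDATE SET tgt.customer_name = src.customer_name,
               tgt.customer_type = src.customer_type
WHEN NOT MATCHED THEN
    INSERT (customer_number, customer_name, customer_type)
    VALUES (src.customer_number, src.customer_name, src.customer_type);
```

Fact tables follow the same pattern, with the addition of surrogate key lookups against the dimensions before insert.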
Evolving Approaches to JDE Analytics
This approach to analyzing JD Edwards data allows businesses to vary the BI tools they use to answer their questions (not just tools specialized for JDE) and change their approach as technology advances. 2nd Watch has implemented the JDE Analytics Framework both on premise and in a public cloud (Azure and AWS), as well as connected with a variety of analysis tools, including Cognos, Power BI, Tableau, and ML Studio. We have even created API access to the different subject areas in the data hub for custom applications. In other words, this analytics platform enables your internal developers to build new business applications, reports, and visualizations with your company’s data without having to know RPG, the JDE backend, or even SQL!