AWS re:Invent 2020 was a little different, to say the least. The line for the restroom was way shorter. If you entered the Tatonka Challenge, you were guaranteed to win, or at least had a really good shot at the trophy, depending on how many of your family members joined in and how many wings your air fryer could handle. And the lines for shuttle buses were non-existent, as the commute time between sessions dropped from hours to seconds, depending on how fast you could click. However, in typical AWS fashion, they made lemonade out of lemons and put on one of the best public cloud virtual events of the year.
Instead of the typical action-packed, sleepless week in Vegas, AWS broke it up into three weeks, with all of their major announcements sprinkled throughout. Vendors set up breakout sessions and virtual booths to discuss solutions and products, and held one-on-one sessions with potential leads via chat and live demos. Hunters of the precious SWAG had to engage with vendors as well as participate in specific activities to obtain their various rewards. With all of the turmoil going on in the world, AWS was still able to announce over 140 new products and features at re:Invent 2020. Here are just a few of the highlights.
For the first time ever, re:Invent was opened to the world free of charge and attracted over 500,000 participants. Andy Jassy’s overall keynote theme was centered around the customer driving innovation within AWS based on solving their needs. In part due to the pandemic, cloud adoption has accelerated this year and has fueled AWS’ continued growth.
AWS announced new compute innovations, including macOS instances (literally integrating a Mac mini into a server chassis), and is making tremendous investments in the processor space with its Graviton2 processors and Trainium chips. If you didn’t catch week 1, here’s what you missed:
- EC2: macOS instances, Intel/AMD/ARM/Graviton2 options
- New C6g Graviton instance announced, almost 50% savings
- Lower cost for AWS Inferentia, used by Alexa
- Habana Gaudi-based EC2 instances: accelerator-based machine learning instances
- AWS Trainium: AWS ML chip used in EC2 and SageMaker
- gp3 for EBS: allows 4x peak throughput
- io2 Block Express: first SAN built for the cloud
The mindset of “100% in the cloud all the time” is slowly shifting to include new options for hybrid environments, with the announcements of ECS Anywhere and EKS Anywhere allowing customers to run their workloads in their own data centers. Taking it a step further is the announcement of Amazon Monitron, which uses machine learning to help predict failures in on-premises industrial equipment. Placing compute closer to the customer (edge computing) has become more important, especially as connectivity providers roll out 5G. To allow for this evolution, AWS has released AWS Wavelength. Also, additional options for Outposts (1U and 2U server sizes) have been released for customers not requiring a full cabinet of hardware.
Data science, AI, and machine learning have become front and center as customers continue to take advantage of cloud native technologies. Making the best use of your data and making it work for your business have been a huge focus this year. Some of the highlights include:
- Amazon SageMaker Data Wrangler: Clean and aggregate data to prepare it for machine learning.
- AWS Glue Elastic Views: Easily combine and replicate data from different data stores.
- Amazon CodeGuru: Automate code reviews and identify your most expensive lines of code.
- Amazon DevOps Guru: Automatically detect operational issues and recommend actions to fix them.
- Amazon QuickSight Q: Ask any question in natural language and get answers in seconds.
- Amazon Connect Wisdom: Reduces the time agents spend finding answers for customers.
AWS partner relationships continue to be a central focus as well, and this was highlighted by Doug Yeum in his keynote:
- Cohesity DMaaS (Data Management as a Service) service announcement.
- AWS SaaS Boost: Open source SaaS reference environment to accelerate traditional applications to SaaS on AWS.
- AWS ISV Partner Path: More access to millions of active AWS customers with AWS field sellers globally.
- Managed entitlements for AWS Marketplace: Automate third-party software license distribution and simplify entitlement tracking.
- AWS Service Catalog AppRegistry: Define and associate resources to better manage applications.
- AWS Energy Competency: Helping customers accelerate their transition to a more balanced and sustainable energy future.
Kicking off week two was an infrastructure-specific deep dive with Peter DeSantis. Given my background in the data center space, I found his keynote extremely interesting, as I have noticed over the past few years that questions and conversations around how cloud services are actually provided are very common. Back before the “cloud” and even virtual machines existed, servers were deployed into data centers and enterprises ran their mission-critical workloads on them. Some companies deployed and managed their own physical infrastructure, and some outsourced the management of those environments to MSPs, but the overall principles have not changed over the years. Yes, your workloads run “in the cloud,” but behind that are still data centers housing servers, networking gear, storage, cooling, water chillers, power distribution, connectivity, etc.
AWS has taken those principles and scaled them to another level, focusing on redundancy and sustainability to ensure that, if built properly, their customers’ workloads have no single point of failure and can keep running should an outage occur. AWS has not only made strides in the disk storage and processor space, but has also designed and integrated its own switchgear control systems and custom-designed, rack-installed UPS infrastructure.
These are items that users of the cloud don’t have to deal with and one of the major selling points of moving to cloud. You don’t have to worry about rack space, power, cooling, hardware purchases, maintenance contracts, and the list goes on and on. BUT rest assured that the man behind the curtain is very aware of these items and is taking best in class steps to ensure that the infrastructure behind the scenes is always on.
Next on the list was the machine learning keynote with Swami Sivasubramanian. This was more of a deep dive into some of the announcements made by Andy Jassy in week one, and he did not disappoint. As customers continue the shift to cloud native, ML and AI services have become front and center in their application modernization journey. Out of the 250+ new products and product enhancements announced by AWS in 2020, most were centered around SageMaker and 11 other AI and ML products.
ML Frameworks and Infrastructure
AWS announced AWS Inferentia, a high-performance machine learning chip that powers EC2 Inf1 instances. Inferentia boasts 45% lower costs and 30% higher throughput than comparable GPU-based instances and helps Alexa achieve 25% lower end-to-end latency. AWS Trainium is another high-performance machine learning chip, with the most teraflops of compute power for ML, that enables a broader set of ML applications.
AWS had several announcements around Amazon SageMaker.
“Thus, we need a platform where the data scientist will be able to leverage his existing skills to engineer and study data, train and tune ML models and finally deploy the model as a web-service by dynamically provisioning the required hardware, orchestrating the entire flow and transition for execution with simple abstraction and provide a robust solution that can scale and meet demands elastically.” – Jojo John Moolayil, AWS AI Research Scientist
- SageMaker Data Wrangler is a faster way to prepare data for ML without a single line of code.
- SageMaker Clarify provides machine learning developers with greater visibility into their training data and models so they can identify and limit bias and explain predictions.
- SageMaker Debugger helps identify bottlenecks, visualize system resources like GPU, CPU, I/O, memory and provides adjustment recommendations.
The most important take-away from this keynote is AWS’ goal of the democratization of machine learning, or the transparent embedding of ML functionality into other AWS services.
“The company’s overall aim is to enable machine learning to be embedded into most applications before the decade is out by making it accessible to more than just experts.” – Andy Jassy, AWS CEO
With that goal in mind, AWS announced Redshift ML, which imports trained models into the data warehouse and makes them accessible using standard SQL queries. Use SQL statements to create and train Amazon SageMaker machine learning models using your Redshift data and embed them directly in reports.
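To make the "train a model with SQL" idea concrete, here is a minimal sketch that assembles a Redshift ML `CREATE MODEL` statement as a string. The clause order follows my understanding of the Redshift ML syntax, and the helper `build_create_model_sql`, the model, table, column, role, and bucket names are all hypothetical placeholders, so check the statement against the current Redshift documentation before running it.

```python
def build_create_model_sql(model_name, training_query, target_column,
                           function_name, iam_role_arn, s3_bucket):
    """Assemble a Redshift ML CREATE MODEL statement as plain text.

    Illustrative only: values are interpolated directly, so pass trusted
    identifiers, never raw user input.
    """
    return (
        f"CREATE MODEL {model_name}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target_column}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


# Hypothetical example: predict customer churn from a few columns.
create_model_sql = build_create_model_sql(
    model_name="customer_churn",
    training_query="SELECT age, plan, monthly_spend, churned FROM customer_activity",
    target_column="churned",
    function_name="predict_customer_churn",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    s3_bucket="example-redshift-ml-bucket",
)
```

Once a model like this finishes training, the function named in the `FUNCTION` clause can be called from an ordinary `SELECT`, which is what lets predictions show up directly in reports.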
Aurora ML enables you to add ML-based predictions to applications via the familiar SQL programming language, so you don’t need to learn separate tools or have prior machine learning experience. It provides simple, optimized, and secure integration between Aurora and AWS ML services without having to build custom integrations or move data around.
Neptune ML brings predictions to their fully managed graph database service in the form of graph neural networks and the Deep Graph Library.
For companies involved with handling medical data, Amazon HealthLake is worth looking at. With built-in data query, search, and ML capabilities, you can seamlessly transform data to understand meaningful medical information at petabyte scale.
Wrapping up the final week of re:Invent 2020 was Werner Vogels, rocking his typical iconic t-shirt but, unfortunately, not announcing who would be playing at re:Play this year. Presenting from the Netherlands in the historic SugarCity factory, he masterfully wove in the story of transforming and adapting to external events. To say that COVID has impacted all aspects of our lives in 2020 would be an understatement, but when presented with challenges, innovators continue to find ways to overcome those obstacles.
Collaboration and remote working were beyond challenging for everyone in 2020. AWS CloudShell was announced to give users access to critical AWS tools such as the AWS CLI directly from the AWS console, along with 1GB of persistent storage, at no cost. In addition, enhancements to AWS Cloud9 were announced that enable users to develop, run, and debug code from a browser.
To help mitigate potential issues in the future, AWS announced Fault Injection Simulator, which sounds more like a load test on steroids utilizing chaos engineering. Chaos engineering allows an application or environment to be pushed to its limits to highlight any potential issues, bottlenecks, or failures before they are pushed into production for end users.
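For readers new to the idea, the core of chaos engineering can be sketched in a few lines: deliberately make a dependency fail some of the time and confirm the calling code copes. The sketch below is a generic illustration in plain Python, not the Fault Injection Simulator API; `fetch_price`, `with_fault_injection`, and `call_with_retries` are hypothetical names invented for the example.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real failure during an experiment."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls raise InjectedFault."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts):
    """Call func up to `attempts` times, re-raising the last injected fault."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as err:
            last_error = err
    raise last_error


def fetch_price():
    """Stand-in for a call to a real downstream service."""
    return 42


# Experiment: fail about half of the calls and see whether retries absorb it.
flaky_fetch = with_fault_injection(fetch_price, failure_rate=0.5, rng=random.Random(7))
```

Running `call_with_retries(flaky_fetch, attempts=5)` either returns the price or surfaces the injected fault, which is exactly the kind of behavior you want to observe before production traffic finds it for you.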
Additionally, Werner focused on helping the community and sustainability. The pandemic has financially hurt millions of people and AWS has developed the re:Start program designed to help the unemployed develop new skills that will allow them to pursue new career paths.
In summary, AWS continues to dominate the public cloud market and rapidly innovates based on their customer requirements. We may not have been standing elbow to elbow with 60,000 of our closest friends, navigating the miles and miles of casino floors, or enjoying all of the surprises of re:Invent in-person this year, but AWS did a stellar job of bringing us together virtually. Hopefully in a year’s time, we will all be back together and enjoying the wonderful craziness that is AWS re:Invent, Vegas style.
-Jeff Collins, Optimization Product Manager, 2nd Watch
Many organizations have created data lakes to store both relational and non-relational data to enable faster decision making. All too often, these data lakes move from proof-of-concept to production and quickly become just another data repository, not achieving the required strategic business dependency. Watch Reality Check: Moving Your Data Lake from Storage to Strategic to get a reality check on how the data lake management approaches you’ve employed have led to failure. Discover the steps needed to build strategic importance and restore data dependency, and learn how cloud native creates efficiency along with a long-term competitive advantage.
Winning is easy:
- Watch our breakout session, ‘ANT283-S Reality Check: Moving the Data Lake from Storage to Strategic’
- Share what you learned from the session on social by 12/18
- Tag @2nd Watch, and you’ll be entered into the drawing held on 12/22!
AWS re:Invent 2020 is off to a great virtual start, and we want to meet you here! Visit the 2nd Watch re:Invent Sponsor Page now through December 18 to speak with one of our cloud experts, watch our session “ANT283-S Reality Check: Moving the Data Lake from Storage to Strategic” (and don’t forget to comment on the session on social for your chance to win a Sony PlayStation 5), access a ton of downloadable content, and claim your free 2nd Watch re:Invent sweatpants.
Your Trusted Cloud Advisor
As a cloud native AWS Premier Partner, we orchestrate your cloud transformation from strategy to execution, fueling business growth. Our focus is on enabling accelerated cloud migration, application modernization, IT optimization and data engineering to facilitate true business transformation.
When you deployed Redshift a few years ago, your new data lake was going to allow your organization to make better, faster, more informed business decisions. It would break down data silos allowing your Data Scientists to have greater access to all data sources, quickly, enabling them to be more efficient in delivering consumable data insights.
Now that some time has passed, though, there is a good chance your data lake may no longer be returning to you the value it initially did. It has turned into a catch all for your data and maybe even a giant data mess with your clusters filling up too quickly, resulting in the need to constantly delete data or scale up. Teams are blaming one another for consuming too many resources, even though they are split and shouldn’t be impacting one another. Slow queries have resulted from a less than optimal table structure decided upon when initially deployed that no longer fits the business and data you are generating today. All of this results in your expensive Data Scientists and Analysts being less productive than when you initially deployed Redshift.
Keep in mind, though, that the Redshift you deployed a few years ago is not the same Redshift today. We all know that AWS is continuously innovating, but over the last 2 years they have added more than 200 new features to Redshift that can address many of these problems, such as:
- Utilizing AQUA nodes, which can deliver a 10x performance improvement
- Refreshing instance families that can lower your overall spend
- Federated query, which allows you to query across Redshift, S3, and relational database services to come up with aggregated data sets, which can then be put back into the data lakes to be consumed by other analytic services
- Concurrency scaling, which automatically adds and removes capacity to handle unpredictable demand from thousands of concurrent users, so you do not take a performance hit
- The ability to take advantage of machine learning with automatic workload management (WLM) to dynamically manage memory and concurrency, helping maximize query throughput
As a matter of fact, clients repeatedly tell us there have been so many innovations with Redshift that it’s hard for them to determine which ones will benefit them, let alone be aware of them all.
Having successfully deployed and maintained AWS Redshift for years here at 2nd Watch, we have packaged our best practice learnings to deliver the AWS Redshift Health Assessment. The AWS Redshift Health Assessment is designed to ensure your Redshift Cluster is not inhibiting the productivity of your valuable and costly specialized resources.
At the end of our 2-3 week engagement, we deliver a lightweight prioritized roadmap of the best enhancements to be made to your Redshift cluster that will deliver immediate impact to your business. We will look for ways to not only improve performance but also save you money where possible, as well as analyze your most important workloads to ensure you have an optimal table design deployed utilizing the appropriate and optimal Redshift features to get you the results you need.
AWS introduced the Lake House analogy to better describe what Redshift has become. A lake house is prime real estate that everyone wants because it gives you a view of something beautiful, with limitless opportunities for enjoyment. With the ability to use a common query or dashboard across your data warehouses and multiple data lakes, Redshift, like a lake house, provides you the beautiful sight of all your data and limitless possibilities. However, every lake house needs ongoing maintenance to ensure it brings you the enjoyment you desired when you first purchased it, and a lake house built with Redshift is no different.
Contact 2nd Watch today to maximize the value of your data, like you intended when you deployed Redshift.
-Rob Whelan, Data Engineering & Analytics Practice Manager
AWS says Amazon Redshift is the world’s fastest cloud data warehouse, allowing customers to analyze petabytes of structured and semi-structured data at high speeds that allow for exploratory analysis. According to a 2018 Forrester report, Redshift is the most popular cloud data warehouse for enterprises.
To better understand how enterprises are using Redshift, 2nd Watch surveyed Redshift users at large companies. A majority of respondents (57%) said their Redshift implementation had delivered on corporate expectations, while another 26% said it had “somewhat” delivered.
With all the benefits Redshift enables, it’s no wonder tens of thousands of customers use it. Benefits like three times the performance of any cloud data warehouse or being 50% less expensive than all other cloud data warehouses make it an attractive service to Fortune 500 companies and startups alike, including McDonald’s, Lyft, Comcast, and Yelp, among others.
Despite its apparent success in the market, not all Redshift deployments have gone according to plan. 45% of respondents said queries stacking up in queues was a recurring problem in their Redshift deployment; 30% said some of their data analysts’ time was unproductive as a result of tuning Redshift queries; and 34% said queries were taking more than one minute to return results. Meanwhile, 33% said they were struggling to manage requests for permissions, and 25% said their Redshift costs were higher than anticipated.
Query and Queuing Learnings:
Queuing of queries is not a new problem. Redshift has a long-underutilized feature called Workload Management queues, or WLM. These queues are like different entrances to a baseball stadium. They all go to the same baseball game, but with different ways to get in. WLM queues divvy up compute and processing power among groups of users so no single “heavy” user ends up dominating the database and preventing others from accessing. It’s common to have queries stack up in the Default WLM queue. A better pattern is to have at least three or four different workload management queues:
- ETL processes
- Ad hoc exploration
- Data loading and unloading
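As a rough illustration of that pattern, the sketch below builds a manual WLM configuration with one queue per workload plus the default queue. The key names (`query_group`, `query_concurrency`, `memory_percent_to_use`) follow my understanding of the JSON accepted by the `wlm_json_configuration` parameter, while the helper `build_wlm_config` and the concurrency and memory numbers are hypothetical placeholders to tune for your own cluster.

```python
import json


def build_wlm_config(queues):
    """Serialize a list of WLM queue definitions after a basic sanity check."""
    total_memory = sum(q["memory_percent_to_use"] for q in queues)
    if total_memory > 100:
        raise ValueError(f"queues request {total_memory}% of memory")
    return json.dumps(queues)


# One queue per workload; the last entry, with no query group, is the default.
example_queues = [
    {"query_group": ["etl"], "query_concurrency": 3, "memory_percent_to_use": 40},
    {"query_group": ["adhoc"], "query_concurrency": 5, "memory_percent_to_use": 30},
    {"query_group": ["load_unload"], "query_concurrency": 2, "memory_percent_to_use": 20},
    {"query_concurrency": 2, "memory_percent_to_use": 10},
]

wlm_json = build_wlm_config(example_queues)
```

Queries then pick an entrance by setting a matching query group in their session, so an ETL job waits behind other ETL jobs instead of behind someone's ad hoc exploration.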
As for time lost due to performance tuning, this is a tradeoff with Redshift: it is inexpensive on the compute side but takes some care and attention on the human side. Redshift is extremely high-performing when designed and implemented correctly for your use case. It’s common for Redshift users to design tables at the beginning of a data load, then not return to the design until there is a problem, after other data sets enter the warehouse. It’s a best practice to routinely run ANALYZE and have auto-vacuum turned on, and to know how your most common queries are structured, so you can sort tables accordingly.
If queries are taking a long time to run, you need to ask whether the latency is due to the heavy processing needs of the query, or if the tables are designed inefficiently with respect to the query. For example, if a query aggregates sales by date, but the timestamp for sales is not a sort key, Redshift might have to scan many more data blocks just to make sure it has all the right data, therefore taking a long time. On the other hand, if your data is already nicely sorted but you have to aggregate terabytes of data into a single value, then waiting a minute or more for data is not unusual.
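A toy model shows why the sort key matters so much. Redshift keeps minimum and maximum values for each block of a column and skips blocks that cannot contain matching rows. The sketch below is a simplified simulation of that idea, not Redshift's actual implementation; the function `blocks_scanned`, the block size, and the day-number data are made up for illustration.

```python
import random


def blocks_scanned(values, block_size, low, high):
    """Count blocks whose [min, max] range overlaps the filter [low, high]."""
    scanned = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        if min(block) <= high and max(block) >= low:
            scanned += 1
    return scanned


# 1,000 sale dates stored as day numbers, 100 values per block.
sorted_days = list(range(1000))
shuffled_days = sorted_days[:]
random.Random(0).shuffle(shuffled_days)

# Filter: sales between day 200 and day 299.
sorted_scan = blocks_scanned(sorted_days, 100, 200, 299)      # 1 block
shuffled_scan = blocks_scanned(shuffled_days, 100, 200, 299)  # typically far more
```

With the data sorted on the filter column, only one of the ten blocks has to be read. With the same data in arbitrary order, nearly every block can contain a matching day, so nearly every block is read.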
Some survey respondents mentioned that permissions were difficult to manage. There are several options for configuring access to Redshift. Some users create database users and groups internal to Redshift and manage authentication at the database level (for example, logging in via SQL Workbench). Others delegate permissions with an identity provider like Active Directory.
Implementation and Cost Savings
Enterprise IT directors are working to overcome their Redshift implementation challenges. 30% said they are rewriting queries, and 28% said they have compressed their data in S3 as part of a LakeHouse architecture. Query tuning was having the greatest impact on the performance of Redshift clusters.
When Redshift costs exceed the plan, it is a good practice to assess where the costs are coming from. Is it from storage, compute, or something else? Generally, if you are looking to save on Redshift spend, you should explore a LakeHouse architecture, which is a storage pattern that shifts data between S3 and your Redshift cluster. When you need lots of data for analysis, data is loaded into Redshift. When you don’t need that data anymore, it is moved back to S3 where storage is much cheaper. However, the tradeoff is that analysis is slower when data is in S3.
Another place to look for cost savings is in the instance size. It is possible to have over-provisioned your Redshift nodes. Look for metrics like CPU utilization; if it is consistently at or below 25% to 30%, then you have too much headroom and might be over-provisioned.
Challenges aside, enterprise IT directors seem to love Redshift. Among the top four Redshift features, according to our survey, are query monitoring rules (cited by 44% of respondents), federated queries (35%), and custom-built ETL workflows (33%).
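That headroom check is easy to automate once you have utilization samples exported from your monitoring tool. Here is a minimal sketch, assuming the samples arrive as a list of CPU percentages; the function name `looks_over_provisioned` and the 30% default are illustrative choices, and "consistently" is interpreted strictly as every sample being at or below the threshold.

```python
def looks_over_provisioned(cpu_samples, threshold_percent=30.0):
    """Return True when every CPU utilization sample is at or below the threshold."""
    if not cpu_samples:
        raise ValueError("need at least one utilization sample")
    return max(cpu_samples) <= threshold_percent


# Hypothetical hourly averages for two clusters.
quiet_cluster = [12.0, 18.5, 25.0, 21.3]
busy_cluster = [12.0, 18.5, 65.0, 21.3]

quiet_flag = looks_over_provisioned(quiet_cluster)  # True
busy_flag = looks_over_provisioned(busy_cluster)    # False
```

A flag like this is a prompt to investigate rather than a verdict. A cluster that idles most of the day but spikes during nightly loads may still need its current size.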
Query Monitoring Rules are custom rules that track bad or slow queries. Customers love Query Monitoring Rules because they are simple to write and give you great visibility into queries that will disrupt operations. You can choose obvious metrics like query_execution_time, or more subtle things like query_blocks_read, which would be a proxy for how much searching the query planner has to do to get data. Customers like these features because the reporting is central, and it frees them from having to manually check queries themselves.
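To give a feel for how simple these rules are, here is a sketch of one rule in the JSON shape used inside a WLM queue definition, along with a small local evaluator to show how its predicates combine. Redshift evaluates the rules itself, so `rule_matches` is purely illustrative, and the rule name and threshold values are hypothetical.

```python
import operator

# Comparison operators a rule predicate can use.
OPERATORS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

# Log any query that runs longer than 120 seconds AND reads many blocks.
slow_query_rule = {
    "rule_name": "log_slow_heavy_queries",
    "predicate": [
        {"metric_name": "query_execution_time", "operator": ">", "value": 120},
        {"metric_name": "query_blocks_read", "operator": ">", "value": 100000},
    ],
    "action": "log",
}


def rule_matches(rule, query_metrics):
    """Return True when every predicate in the rule holds for the given metrics."""
    return all(
        OPERATORS[p["operator"]](query_metrics[p["metric_name"]], p["value"])
        for p in rule["predicate"]
    )


heavy_query = {"query_execution_time": 300, "query_blocks_read": 500000}
quick_query = {"query_execution_time": 4, "query_blocks_read": 500000}
```

Here `rule_matches(slow_query_rule, heavy_query)` is true and `rule_matches(slow_query_rule, quick_query)` is false, because all predicates in a rule must hold before the action fires.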
Federated queries allow you to bring in live, external data to join with your internal Redshift data. You can query, for example, an RDS instance in the same SQL statement as a query against your Redshift cluster. This allows for dynamic and powerful analysis that normally would take many time-consuming steps to get the data in the same place.
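As a sketch of what that looks like in practice, the two statements below first map a PostgreSQL database into Redshift as an external schema, then join it with a local table. They are held as Python strings so they can be passed to whatever SQL client you use. Every schema, table, endpoint, role, and secret name is a hypothetical placeholder, and the clause order reflects my understanding of the `CREATE EXTERNAL SCHEMA` syntax, so verify it against the current documentation.

```python
# Step 1: expose a live PostgreSQL database to Redshift as an external schema.
FEDERATED_SCHEMA_SQL = """
CREATE EXTERNAL SCHEMA orders_live
FROM POSTGRES
DATABASE 'orders' SCHEMA 'public'
URI 'orders-db.example.internal' PORT 5432
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:example';
"""

# Step 2: join live operational rows with warehouse data in one statement.
JOIN_SQL = """
SELECT c.customer_id, c.segment, o.status
FROM analytics.customers AS c
JOIN orders_live.orders AS o ON o.customer_id = c.customer_id
WHERE o.status = 'open';
"""

federated_statements = [FEDERATED_SCHEMA_SQL.strip(), JOIN_SQL.strip()]
```

The schema only needs to be created once. After that, analysts can reference `orders_live` tables like any other schema, without waiting for a nightly load.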
Finally, custom-built ETL workflows have become popular for several reasons. One, the sheer compute power sitting in Redshift makes it a very popular source for compute resources. Unused compute can be used for ongoing ETL. You would have to pay for this compute whether or not you use it. Two, and this is an interesting twist, Redshift has become a popular ETL tool because of its capabilities in processing SQL statements. Yes, ETL written in SQL has become popular, especially for complicated transformations and joins that would be cumbersome to write in Python, Scala, or Java.
Redshift’s place in the enterprise IT stack seems secure, though how IT departments use the solution will likely change over time, perhaps significantly. The reason for persisting in all the maintenance tasks listed above is that Redshift is increasingly becoming the centerpiece of a data-driven analytics program. Data volume is not shrinking; it is always growing. If you take advantage of these performance features, you will make the most of your Redshift cluster and therefore your analytics program.
Download the infographic on our survey findings.
-Rob Whelan, Data Engineering & Analytics Practice Director