When you deployed Redshift a few years ago, your new data lake was going to allow your organization to make better, faster, more informed business decisions. It would break down data silos, giving your Data Scientists quicker access to all of your data sources and enabling them to deliver consumable data insights more efficiently.
Now that some time has passed, though, there is a good chance your data lake is no longer returning the value it initially did. It has turned into a catch-all for your data, maybe even a giant data mess, with clusters filling up so quickly that you constantly have to delete data or scale up. Teams blame one another for consuming too many resources, even though they are split and shouldn't be impacting one another. Queries run slowly because the table structure chosen at initial deployment no longer fits the business and the data you generate today. All of this leaves your expensive Data Scientists and Analysts less productive than when you first deployed Redshift.
Keep in mind, though, that the Redshift you deployed a few years ago is not the Redshift of today. We all know that AWS is continuously innovating, and over the last two years they have added more than 200 new features to Redshift that can address many of these problems, such as:
- AQUA (Advanced Query Accelerator) nodes, which AWS reports can deliver up to a 10x performance improvement
- Refreshed instance families that can lower your overall spend
- Federated query, which lets you query across Redshift, S3, and relational database services to build aggregated data sets, which can then be written back to the data lake for consumption by other analytics services
- Concurrency scaling, which automatically adds and removes capacity to handle unpredictable demand from thousands of concurrent users, so you do not take a performance hit
- Automatic workload management (WLM), which uses machine learning to dynamically manage memory and concurrency, helping maximize query throughput
In fact, clients repeatedly tell us there have been so many innovations in Redshift that it is hard for them to keep track of them all, let alone determine which ones will benefit them.
Having successfully deployed and maintained Amazon Redshift for years here at 2nd Watch, we have packaged our best-practice learnings into the AWS Redshift Health Assessment. The assessment is designed to ensure your Redshift cluster is not inhibiting the productivity of your valuable, costly specialized resources.
At the end of our 2-3 week engagement, we deliver a lightweight, prioritized roadmap of the enhancements to your Redshift cluster that will have the most immediate impact on your business. We look for ways to improve performance and to save you money where possible, and we analyze your most important workloads to ensure you have an optimal table design that uses the appropriate Redshift features to get you the results you need.
AWS introduced the Lake House concept to better describe what Redshift has become. A lake house is prime real estate that everyone wants because it gives you a view of something beautiful, with limitless opportunities for enjoyment. With the ability to run a common query or dashboard across your data warehouse and multiple data lakes, Redshift likewise gives you a beautiful view of all your data and limitless possibilities. But every lake house needs ongoing maintenance to keep delivering the enjoyment you wanted when you first bought it, and a lake house built with Redshift is no different.
Contact 2nd Watch today to maximize the value of your data, just as you intended when you first deployed Redshift.
-Rob Whelan, Data Engineering & Analytics Practice Manager