All of these features work towards the end goal of making serverless databases a production-ready solution. Even with the latest offerings, should you explore migrating to a serverless architecture? This blog highlights some considerations when looking to use Backend-as-a-Services (BaaS) at your data layer.
Let’s assume that you’ve already either made the necessary schema changes and have migrated already or have a general familiarity of implementing a new database with Aurora Classic. Aurora currently comes in two models -Provisioned and Serverless Aurora. A traditional AWS database that is provisioned either has a self-managed EC2 instance or operates as a PAAS model using an AWS managed RDS instance. In both use cases, you have to allocate memory and CPU in addition to creating security groups to allow applications to listen on a TCP connection string.
In this pattern, issues can arrive right at the connection. There are limits as to how many connections can access a database before you start to see performance degradation or an inability to connect altogether when the limit is maxed out. In addition to that, your application may also receive varying degrees of traffic (e.g., a retail application used during a peak season or promotion). Even if you implement a caching layer in front, such as Memcache or Redis, you still have scenarios where the instance will eventually either have to scale vertically to a more robust instance or horizontally with replicas to distribute reads and writes.
This area is where serverless provides some value. It’s worth recalling that a serverless database does not equal no servers. There are servers there, but that is abstracted away from the user (or in this case the application). Following recent compute trends, Serverless focuses more on writing business logic and less on infrastructure management and provisioning to deploy from the requirements stage, to prod-ready quicker. In the traditional database model, you are still responsible for securing the box, authentication, encryption, and other operations unrelated to the actual business functions.
How Aurora Serverless works
What serverless Aurora provides to help alleviate issues with scaling and connectivity is a Backend as a Service solution. The application and Aurora instance must be deployed in the same VPC and connect through endpoints that go through a network load balancer (NLB). Doing so allows for connections to terminate at the load balancer and not at the application.
By abstracting the connections, you no longer have to create logic manage load balancing algorithms or worry about making DNS changes to facilitate for database endpoint changes. The NLB has routing logic through request routers that make the connection to whichever instance is available at the time, which then maps to the underlying serverless database storage. If the serverless database needs to scale up, a pool of resources is always available and kept warm. In the event the instances scale down to zero, a connection cannot persist.
By having an available pool of warm instances, you now have a pay-as-you-go model where you pay for what you utilize. You can still run into the issue of max connections, which can’t be modified, but the number allowed for smaller 2 and 4 ACU implementations has increased since the initial release.
Note: Cooldowns are not instantaneous and can take up to 5 mins after the instance is entirely idle, and you are still billed for that time. Also, even though the instances are kept warm, the connection to those instances still has to initiate. If you make a query to the database during that time, you can see wait times of 25 seconds or more before the query fully executes.
Can you really scale down completely? Technically yes, if certain conditions are made:
CPU below 30 percent utilization
Less than 40 percent of connections being used
To achieve this and get the cost savings, the database must be completely idle. There can’t be long-running queries or locked tables. Also, varying activities outside of the application can generate queries such as open sessions, monitoring tools, health-checks, so on and so forth. The database only pauses when the conditions are met, AND there is zero activity.
Serverless Aurora at .06/VCU starts at a higher price than its provisioned predecessor at .041. Aurora classic also charges hourly, where Serverless Aurora charges by the second with a 5-minute minimum AND a 5-minute cool-down period. We already discussed that cool-downs in many cases are not instantaneous, and now you pile on that billing doesn’t stop until an additional 5 minutes after that period. If you go with the traditional minimal setup of 2 VCU and never scale down the instances, the cost is more expensive by a magnitude of at least 3x. Therefore, to get the same cost payoff, your database would have to run only 1/3 of the time and can be achievable for dev/test boxes that are parked or apps only used during business hours in a single time-zone. Serverless Aurora is supposed to be highly available by default, so if you are getting two instances at this price point, then you are getting a better bargain performance-wise than running a single, provisioned instance for an only slightly higher price point.
Allowing for a minimum of 1ACU allows you the option of fully scaling down to a serverless database and makes the price point more comparable to RDS without enabling pausing.
Migration and Data API
Migrating to Serverless Aurora is relatively simple as you can just load in a snapshot from an existing database. With Data API, you no longer need a persistent connection to query the database. In previous scenarios, a fetch could take 25 seconds or more if the query is executed after a cool-down period. In this scenario, you can query the serverless database even if it’s been idle for some time. You can leverage a Lambda function via API gateway which works around the VPC implementation. AWS has mentioned it is providing performance metrics around the time it takes on average to execute a query with data API in the next coming months.
With the creation of EC2, Docker, and Lambda functions, we’ve seen more innovation in the area of compute and not as much on the data layer. Traditional provisioned relational databases have difficulties scaling and have a finite limit on the number of connections. By eliminating the need for an instance, this level of abstraction presents a strong use case for unpredictable workloads. Kudos to AWS for engineering a solution at this layer.
The latest updates these last few months embellish AWS’ willingness to solve complex problems. Running 1ACU does bring the cost down to a rate comparable to RDS while providing a mechanism for better performance if you disable pauses. However, while it is now possible to run Aurora serverless 24/7 more cost-effectively, this scenario contrasts their signature use case of having an on/off database.
Serverless still seems a better fit for databases that are rarely used and only see spikes on occasion or applications primarily used during business hours. Administration time is still a cost, and serverless databases, despite the progress, still has many unknowns. It can take an administrator some time and patience to truly get a configuration that is performant, highly available, and not overly expensive. Even though you don’t have to rely on automation and can manually scale your Aurora serverless cluster, it takes some effort to do so in a way that doesn’t immediately terminate the connections.
Today, you can leverage ECS or Fargate with spot instances and implement a solution that yields similar or better results at a cheaper cost if a true serverless database is the desired goal. I would still recommend this for dev/test workloads and see if you can work your way up to prod for smaller workloads as the tool still provides much value. Hopefully, AWS releases GA offerings for MySQL 5.7 and Postgres soon.
While AWS re:Invent 2017 is still fresh in our minds, here are some of the highlights of the most significant announcements.
Aurora Multi-Master/Multi-Region: This is a big deal! The concept of geographically distributed databases with multiple masters has been a long-desired solution. Why is this important?
Having additional masters allows for database writes, not just reads like the traditional read replicas that have been available. This feature enables a true multi-region, highly available solution that eliminates a single point of failure and achieves optimum performance. Previously, third party tools like Golden Gate and various log shipping approaches were required to accomplish proper disaster recovery and high availability. This will greatly simplify architectures for some that want to go active-active across regions and not just availability zones. Additionally, it will enable pilot light (and more advanced) DR scenarios for customers that are not going to be using active-active configurations.
Aurora Serverless: Aurora Serverless is an on-demand, auto-scaling configuration for the Aurora MySQL and PostgresSQL compatible database service, where the database will automatically start-up and scale up or down based on your application’s capacity needs. It will shut down when required, basically scaling down to zero when not being used. Traditionally, Aurora RDS required changing the underlying instance type to scale for database demand. This is a large benefit and cost saver for development, testing, and QA environments. Even more importantly, if your workload has large spikes in demand, then auto-scaling is a game changer in the same way that EC2 auto scaling enabled automated compute flexibility.
T2 Unlimited: T2 is one of the most popular instance types used by 2nd Watch and AWS customers, accounting for around 50% of all instances under 2nd Watch Managed Cloud Services. In the case of frequent, small and inconsistent workloads, T2 is the best price and performance option. However, one of the most common reasons that customers do not heavily leverage T2 is due to concerns related to a sustained spike in load that will deplete burstable credits and result in unrecoverable performance degradation. T2 unlimited solves this problem by essentially allowing unlimited surges over the former limits. We expect to see more customers will adopt T2 for those inconsistent workloads as a cost-effective solution. We will watch to see if this this shift is reflected in the instance type data for accounts being managed by 2nd Watch.
Spot Capacity: Spot instances are normally used as pools of compute that run standard AMIs and work on datasets located outside of EC2. This is because the instances are terminated when the spot price increases beyond your bid, and all data is lost. Now, when AWS reclaims the capacity, the instance can essentially hibernate, preserving the operating system and data, and startup again when the spot pricing is favorable. This removes another impediment in the use of spot capacity, and will be a large cost saver for environments that only need to be temporarily available.
M5 Instance Type: Given the large increase in performance of the newer processor generations, one can see large cost savings and performance improvements by migrating to a smaller sized offering of the latest instance type that meets your application’s needs. Newer instance types can also offer higher network bandwidth as well, so don’t put off the adoption of the latest products if possible.
Inter-region Peering: It’s always been possible to establish peering relationships between VPCs in the same region. Inter-region Peering uses AWS private links between VPCs in different availability zones and does not transit the open internet, eliminating VPNs, etc. This same feature is available inter-region. This makes multi-region designs cleaner and easier to implement, without having to build and configure VPN networking infrastructure to support it, which of course also needs monitoring, patching, and other maintenance. It was also announced that users of Direct Connect can now route traffic to almost every AWS region from a single Direct Connect circuit.
There were also some announcements that we found interesting but need to digest a little longer. Look for a follow up from us on these.
EKS: Elastic Container Services for Kubernetes – Amazon Elastic Container Service for Kubernetes (Amazon EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes clusters. Even at last years’ AWS re:Invent we heard people wondering where the support for Kubernetes was, particularly since it has become the de facto industry standard over the past several years.
GuardDuty: AWS has now added a cloud-native tool to the security toolbox. This tool utilizes “machine learning” for anomaly detection. AWS GuardDuty monitors traffic flow and API logs for your accounts, letting you establish a baseline for “normal” behavior on your infrastructure, and then watches for security anomalies. These are reported with a severity rating, and remediation for certain types of events can be automated using existing AWS tools. We will be considering the best methods of implementation of this new tool.
Fargate: Run Amazon EKS and ECS without having to manage servers or clusters.
Finally, a shameless plug: If compliance is on your mind, watch this AWS re:Invent breakout session from our product and engineering experts.
Peter Meister, Director of Product Management, 2nd Watch
Lars Cromley, Director of Engineering, 2nd Watch
In cloud migrations, the cloud’s elastic nature is often touted as a critical capability in delivering on key business initiatives. However, you must account for it in your security and compliance plans or face some real challenges. Always counting on a virtual host to be running, for example, causes issues when that host is rebooted or retired. Managing security and compliance in the cloud is continuous, requiring forethought and automation. Learn how a leading, next generation managed cloud provider uses automation and cloud expertise to manage security and compliance at scale in an ever-changing environment. Through code examples and live demos, we show tools and automation to provide continuous compliance of your cloud infrastructure.
Obviously, there was a lot more going on and it will take some time to go through it. We will keep you up to date with our thoughts.
The Product Development team at 2nd Watch is responsible for many technology environments that support our software and solutions—and ultimately, our customers. These environments need to be easily built, maintained, and kept in sync. In 2016, 2nd Watch performed an analysis on the amount of AWS billing data that we had collected and the number of payer accounts we had processed over the course of the previous year. Our analysis showed that these measurements had more than tripled from 2015 and projections showed that we will continue to grow at the same, rapid pace with AWS usage and client onboarding increasing daily. Knowing that the storage of data is critical for many systems, our Product Development team underwent an evaluation of the database architecture used to house our company’s billing data—a single SQL Server instance running a Web edition of SQL Server with the maximum number of EBS volumes attached.
During the evaluation, areas such as performance, scaling, availability, maintenance and cost were considered and deemed most important for future success. The evaluation revealed that our current billing database architecture could not meet the criteria laid out to keep pace with growth. Considerations were made to increase the storage capacity by one VM to the maximum family size or potentially upgrade to MS SQL Enterprise. In either scenario, the cost of the MS SQL instance doubled. The only option for scaling without substantially increasing our cost was to scale vertically, however, to do so would result in diminishing performance gains. Maintenance of the database had become a full-time job that was increasingly difficult to manage.
Ultimately, we chose the cloud-native solution, Amazon Aurora, for its scalability, low-risk, easy-to-use technology. Amazon Aurora is a MySQL relational database that provides speed and reliability while being delivered at a lower cost. It offers greater than 99.99% availability and can store up to 64TB of data. Aurora is self-healing and fully managed, which, along with the other key features, made Amazon Aurora an easy choice as we continue to meet the AWS billing usage demands of our customers and prepare for future growth.
The conversion from MS SQL to Amazon Aurora was successfully completed in early 2017 and, with the benefits and features that Amazon Aurora offers, many gains were made in multiple areas. Product Development can now reduce the complexity of database schemas because of the way Aurora stores data. For example, a database with one hundred tables and hundreds of stored procedures was reduced to one table with 10 stored procedures. Gains were made in performance as well. The billing system produces thousands of queries per minute and Amazon Aurora handles the load with the ability to scale to accommodate the increasing number of queries. Maintenance of the Amazon Aurora system is now virtually managed. Tasks such as database backups are automated without the complicated task of managing disks. Additionally, data is copied across six replicas in three availability zones which ensures availability and durability.
With Amazon Aurora, every environment is now easily built and setup using Terraform. All infrastructure is automatically setup—from the web tier to the database tier—with Amazon CloudWatch logs to alert the company when issues occur. Data can easily be imported using automated processes and even anonymized if there is sensitive data or the environment is used to demo to our customers. With the conversion of our database architecture from a single MS SQL Service instance to Amazon Aurora, our Product Development team can now focus on accelerating development instead of maintaining its data storage system.
Amazon Web Services, Microsoft Azure and Google Cloud Platform are the clear leaders in the Cloud Service Provider (CSP) space. The competition between these players drives innovation and results in customers receiving more benefits and better business outcomes over older technologies. The database as a service space is a massive opportunity for customers to gain more in terms of performance, scalability, and high-availability without the overhead or long-term contracts from traditional solutions.
At 2nd Watch we see a lot of customers who want to utilize database as a service to get better performance without the overhead or having all the hassles of traditional database products. In that, we watch all three Cloud Service Providers very closely, their services and run benchmark comparisons on their products. In our opinion, Amazon Aurora is probably the best-performing database as a service platform that we’ve seen to date, while still being very cost-effective for the performance benefits.
What is Amazon Aurora?
Amazon Aurora is AWS’s fully-managed MySQL-compatible relational database service. Aurora was built to provide commercial-grade performance and availability, while being as easy to use and as cost-effective as an open source engine. Naturally, a lot of people are interested in Aurora’s performance, and a number of people have tried to benchmark it.
Benchmarking accurately is not always easy to do, and it’s why you see different results from various benchmarks. We’ve helped many of our customers benchmark Aurora to understand how it would perform to meet their needs, and in many cases we have migrated, or are actively working on migrating, customers to Aurora.
Cloud SQL vs Amazon Aurora
Recently, Google benchmarked their own Cloud SQL data against Amazon Aurora. Since a number of our customers use Aurora, we were interested in digging in to take a deeper look at their findings.
The chart below shows the benchmark results published by Google.
Original Source: Google. See: https://cloudplatform.googleblog.com/2016/08/Cloud-SQL-Second-Generation-performance-and-feature-deep-dive.html
In their blog post about this comparison, the blog claims that 1) Cloud SQL performs better at low thread counts and 2) customers do not use a lot of threads, so the results at higher thread counts do not matter.
Our experience is that the latter claim is not true, especially as we look at our enterprise customers. So, we questioned our results from previous benchmarking. Therefore, we decided to benchmark things again to see if our results would fall in line with Google’s results. Below you can see an overlay of our data on top of their original graph. Since we do not have the original results from Google’s , an overlay is the best we can do for comparison.
To perform our own benchmarks, we used Terraform to stand up a VPC, our instances of Aurora, and the hosts from which we’d run the sysbench tests, all within the same Availability Zone. We used the same settings and commands as outlined in the Google blog post to against our results. We shared our results with the Aurora team at AWS, and they confirmed that they saw very similar results when they tested Aurora with sysbench in a configuration similar to ours.
Full data from our tests, R scripts, and automation to stand up the infrastructure can be found on github.
Here is a summary of our observations on this study:
1. Data for Aurora’s Performance is Incorrect
This data does not match the Aurora performance we see when we run the test ourselves. Without any special tuning, we saw higher performance results for Aurora. Without the original data from the tests Google conducted it is difficult to say why their results differ. All we can say is that, from our testing, Aurora does appear to be more performant and is clearly the database of choice, especially when dealing with higher thread counts.
2nd Watch overlay in purple:
Results from our tests on Aurora:
Both studies agree that Aurora has a significant performance advantage above a certain thread count. Google’s shows the previous release of Aurora having an advantage at 16 threads or more (Cloud SQL starts higher but then drops below Aurora). Ours shows the latest version of Aurora having an advantage at 8 threads, with results between the two databases being fairly comparable at 4 or fewer threads.
2.The Study Claims Customers Don’t Need Many Threads
Google is suggesting that you should only focus on the results with lower thread counts. Their blog post says that Cloud SQL “performed better than Aurora when active thread count is low, as is typical for many web applications.” The claim that low thread counts are typical for web applications (and thus what people should focus on) is inaccurate.
There are a large number of AWS customers who run applications using Aurora, and it is very common for them to run with hundreds of threads (and many customers choose to run with thousands). As we mentioned previously, both our study and Google’s study show that Aurora has an advantage at higher thread counts.
There are a number of areas where Amazon improved on the performance of MySQL when developing Aurora. One of them was taking advantage of high thread counts to deliver better performance. With Aurora, customers with higher thread counts will typically see a large performance advantage over MySQL and Cloud SQL.
3. Aurora’s Largest Instance Outperforms Cloud SQL’s Largest Instance by 3X
Google’s benchmark was done on an r3.4XL, which is only half the size of Aurora’s largest instance (the r3.8XL). We understand that this study used Cloud SQL’s largest machine and that they were trying to compare to a comparable Aurora instance size. But a customer is more likely to run a workload like the one in Google’s benchmark on Aurora’s largest machine, the r3.8XL, because the 8XL’s larger cache would improve performance. We ran the with Aurora’s r3.8XL instance, and the results were more than 3X higher than Cloud SQL’s performance. It was a struggle to fit Aurora’s r3.8XL performance on the same scale.
Aurora on r3.8xlarge (scale adjusted):
More Aurora Advantages
We’ve already pointed out Aurora’s performance advantage, which is evident in both studies. Aurora has a number of other advantages over Google Cloud SQL. One example is scalability. We showed earlier that Aurora’s largest instance exceeds the performance of Cloud SQL’s largest instance by 3X in this benchmark. Aurora can scale up to handle more storage (64 TB for Aurora vs 10 TB for Cloud SQL). Aurora supports up to 15 low-latency read replicas that have very low replica lag.
Aurora replica lag is typically only a few milliseconds, whereas traditional binlog-based replication, which is used by MySQL and Cloud SQL, can result in replica lag on the order of seconds or even minutes. Aurora allows up to 15 replicas that can act as failover targets without impacting the performance of the primary instance, whereas Cloud SQL allows only one failover target. Another advantage is that Aurora has extremely fast crash recovery, as its log-structured storage system does not require the replay of redo logs after a database crash.
If you’re thinking about benchmarking Aurora yourself, there are two common errors to watch out for. First, make sure you put the database and the client in the same Availability Zone or you will pay a latency and throughput penalty. This is what customers actually do when they use Aurora in production. Second, make sure that the client instance driving traffic to Aurora has enhanced networking enabled. The benchmarking guide mentioned above has instructions for how to do that.
Making good use of benchmarking data is tricky. You’re trying to build or migrate your application and you want to understand how your new database is likely to behave, now and in the future. There’s no magic benchmark that will exactly match your production workload, but you need to make a decision anyway. What should you do?
Here’s some general advice we try to follow for our own projects as well.
Plan for success by testing at scale: The last thing you want to do is make design choices that hurt you when your application starts to really take off. That’s when you want to celebrate with your team and build the next great feature, not attempt an emergency database migration. Think about how your workload might change as your application grows (higher thread counts, more data, higher TPS, more tables…) and build that into your tests.
Choose benchmarking tests that align to your application: Do you expect your data set to fit in memory or will your database be disk-bound? Will your traffic pattern be spiky? Write-heavy? You can find a large number of benchmarks for any major technology. Try to identify a set of tests that are as close as possible to your real-world application. Don’t just accept the first benchmarking study you find. If you can re-run your actual production workload as a test, even better.
Know that your benchmarks are not the real world and plan for that: The best benchmarking study you run will be, at best, slightly wrong (not the same as the real world). Make a note of the assumptions you made in your and keep an eye on your database in production. Watch for signs that your production workload is moving into unused areas and adjust your tests accordingly.
When you think of AWS, the first thing that comes to mind is scalability, high availability, and performance. AWS is now bringing the same features to a relational database by offering Aurora through Amazon RDS. As announced at AWS re:Invent 2014, Amazon Aurora is a fully managed relational database engine that is feature compatible with MySQL 5.6. Though it is currently in preview, it promises to give you the performance and availability of high end commercially available databases with built-in scalability without the administrative overhead.
To create Amazon Aurora, AWS has taken a service-oriented approach, making individual layers such as storage, logging and caching scale out while keeping the SQL and transaction layers using the core MySQL engine.
So, let’s talk about storage. Aurora can automatically add storage volume in 10 GB chunks up to 64 TB. This eliminates the need to plan for the data growth upfront and manually intervene when storage needs to be added. This feature is described by AWS to happen seamlessly without any downtime or performance hit.
Amazon Aurora’s storage volume is automatically replicated across three AWS Availability Zones (AZ) with two copies of data in each AZ. Aurora uses quorum writes and reads across these copies. It can handle the loss of two copies of data without affecting database write availability and up to 3 copies without affecting read availability. This SSD powered multi-tenant 10 GB storage chunks allows for concurrent access and reduces hot spots making it fault tolerant. It is also self-healing as it continuously scans data blocks for errors and repairs them automatically.
Unlike in a traditional database where it has to replay the redo log since last check point, which is single threaded in MySQL, Aurora’s underlying storage replays redo logs on-demand, in parallel, distributed and asynchronous mode. This allows you to recover from a crash almost instantaneously.
Aurora database can have up to 15 Aurora replicas. Since master and replica share the same underlying storage, you can failover with no data loss. Also, there’s very little load on the master since there’s no log replay, resulting in minimal replica lag of approximately 10 to 20 milliseconds. The cache lives outside of the database process which allows it to survive during a database restart. As a result, operations can be resumed much faster as there is no need to warm the cache. The backup is automatic, incremental and continuous. This enables point-in-time recovery up to the last five minutes.
A new added feature of Amazon Aurora is its ability to simulate failure of node, disk or networking components using SQL commands. This allows you to the high availability and scaling features of your application.
According to the Aurora FAQ, it is five times faster than a stock version of MySQL. Using Sysbench “Aurora delivers over 500,000 selects/sec and 100,000 updates/sec running the same benchmark on the same hardware.”
While Aurora may cost slightly more than the RDS for MySQL per instance based on the US-EAST-1, the features may justify it, and you only pay for the storage consumed at a rate of $0.10/GB per month and IOs at a rate of $0.20 per million requests.
It’s exciting to see the challenges of scaling the relational database being addressed by AWS. To learn more about Amazon Aurora, you can sign up for a preview at http://aws.amazon.com/rds/aurora/preview/.
2nd Watch specializes in helping our customers with legacy investments in Oracle achieve successful migrations to Aurora. Our Oracle-certified experts understand the specialized features, high availability and performance capabilities that proprietary components such as RAC, ASM and Data Guard provide and are skilled in delivering low-risk migration roadmaps. We also offer schema and PL/SQL migration guidance, client tool validation, bulk data upload and ETL integration services.