CloudWatch is a tool for monitoring Amazon Web Services (AWS) cloud resources. With CloudWatch you can gather and monitor metrics for many of your AWS assets. For AWS EC2, CloudWatch provides 10 pre-selected metrics that are polled at five-minute intervals: CPU Utilization, Disk Reads, Disk Read Operations, Disk Writes, Disk Write Operations, Network-In, Network-Out, Status Check Failed (Any), Status Check Failed (Instance), and Status Check Failed (System). These metrics are designed to give you the most relevant information to help keep your environment running smoothly. CloudWatch goes one step further and offers seven pre-selected metrics that are polled at one-minute intervals for an additional charge.

With CloudWatch you can set alarms based on thresholds on any of your metrics. An alarm can send you a status notification or have the environment take automated action. For example, you can set an alarm to notify you if one of your instances is experiencing high CPU load. The graph below shows an instance’s average CPU Utilization, as reported by CloudWatch, over a period of 1 hour at 5-minute intervals.
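The data behind a graph like this can also be requested through the CloudWatch API. Below is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured; the function names, the instance ID, and the date in the example are placeholders invented for illustration, and the us-west-1 region is simply the one that appears later in this post.

```python
from datetime import datetime, timedelta, timezone


def cpu_metric_query(instance_id, end_time, hours=1, period_seconds=300):
    """Build the arguments for a CloudWatch GetMetricStatistics request."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period_seconds,  # 300 seconds = one 5-minute interval
        "Statistics": ["Average"],
    }


def fetch_average_cpu(instance_id, region="us-west-1"):
    """Return (timestamp, average CPU %) pairs, oldest first.

    Needs boto3 and valid AWS credentials, so it is not called here.
    """
    import boto3  # imported lazily so the query builder works without it

    client = boto3.client("cloudwatch", region_name=region)
    query = cpu_metric_query(instance_id, datetime.now(timezone.utc))
    datapoints = client.get_metric_statistics(**query)["Datapoints"]
    return sorted((p["Timestamp"], p["Average"]) for p in datapoints)


# Hypothetical instance ID and end time, used only to show the query shape.
example = cpu_metric_query(
    "i-0123456789abcdef0",
    datetime(2020, 1, 1, 20, 0, tzinfo=timezone.utc),
)
print(example["StartTime"], example["EndTime"], example["Period"])
```

Keeping the query builder separate from the network call makes it easy to check the one-hour window and five-minute period without touching AWS.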
In this graph, CPU Utilization is at zero at 19:10, then climbs over the next 35 minutes until it reaches 100%, where it stays for longer than 10 minutes. Without any monitoring this could be a real problem: the CPU of the system is completely taxed, and performance would undoubtedly become sluggish. If this were a web server, users would experience dropped connections, timeouts, or very slow response times. In this example it doesn’t matter what is causing the CPU spike; what matters is how you would deal with it. If this happened in the middle of the night you would experience downtime and a disruption to your business. With a lot riding on uninterrupted 24×7 operations, processes must be in place to withstand unexpected events like this. With CloudWatch, AWS makes monitoring a little easier and makes setting alarms based on resource thresholds simple. Here is one way to do it for our previous CPU Utilization example:
2. In the Dashboard, go to Metrics and select the instance and metric name in question. On the right side of the screen you should also see a button labeled Create Alarm. (See figure below.)
3. Once you hit Create Alarm, the page will allow you to set an Alarm Threshold based on parameters that you choose. We’ll call our threshold “High CPU” and give it a description “Alarm when CPU is 85% for 10 minutes or more”.
4. Additionally you have to set the parameters to trigger the alarm. We choose “Whenever CPU Utilization is 85% for 2 consecutive periods” (remember our periods are 5 minutes each). This means that once CPU Utilization has stayed at the 85% threshold for 10 minutes, the alarm enters the ALARM state and our action takes place.
5. For Actions, we select “Whenever this alarm: State is ALARM” and choose to send a notification to our SNS Topic MyHighCPU. When the alarm triggers, the topic sends an email to an email address or distribution list. (See the figure below.)
6. Next we hit Create Alarm, and we get the following:
7. Finally you have to go to the email account of the address you entered and confirm the SNS Notification subscription. You should see a message that says: “You have chosen to subscribe to the topic: arn:aws:sns:us-west-1:xxxxxxxxxxxxx:MyHighCPU. To confirm this subscription, click or visit the link below (If this was in error no action is necessary). Confirm subscription.”
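The console steps above can also be expressed through the API. The following is a rough sketch rather than the exact procedure used in this post: it assumes boto3 is installed, credentials are configured, and the MyHighCPU topic already exists. The function names and the instance ID are hypothetical, and the "greater than or equal to" comparison is an assumption about how the 85% threshold was configured in the console.

```python
def high_cpu_alarm(instance_id, sns_topic_arn):
    """Build PutMetricAlarm arguments matching the "High CPU" alarm above."""
    return {
        "AlarmName": "High CPU",
        "AlarmDescription": "Alarm when CPU is 85% for 10 minutes or more",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # each period is 5 minutes
        "EvaluationPeriods": 2,   # 2 consecutive periods = 10 minutes
        "Threshold": 85.0,
        # Assumption: the console rule was set as ">= 85".
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # notify the topic on ALARM
    }


def create_high_cpu_alarm(instance_id, sns_topic_arn, email, region="us-west-1"):
    """Subscribe an address to the topic, then create the alarm.

    Needs boto3 and valid AWS credentials, so it is not called here.
    The email subscription still has to be confirmed, as in step 7.
    """
    import boto3  # imported lazily so the builder above works without it

    sns = boto3.client("sns", region_name=region)
    sns.subscribe(TopicArn=sns_topic_arn, Protocol="email", Endpoint=email)

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**high_cpu_alarm(instance_id, sns_topic_arn))


# Hypothetical instance ID; the topic ARN is the masked one from step 7.
params = high_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-west-1:xxxxxxxxxxxxx:MyHighCPU",
)
print(params["Period"] * params["EvaluationPeriods"], "seconds to alarm")
```

Scripting the alarm this way is mostly useful when the same definition has to be repeated across many instances.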
Overall, the process of creating alarms for a couple of metrics is straightforward. It can get more involved when you incorporate more complex logic. For example, you could have a couple of EC2 instances in an Auto Scaling group behind an Elastic Load Balancer, and if CPU spiked over 85% for 10 minutes you could have the Auto Scaling group take immediate automated action to spin up additional EC2 instances to take on the increased load. When the presumed web traffic that was causing the CPU spike subsides, you can have a trigger that scales the group back in so you are no longer paying for the extra instances. With the power of CloudWatch, managing your AWS systems can become largely automated, and you can react immediately to problems or changing conditions.
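To make the scale-out and scale-in logic concrete, here is a small pure-Python simulation of the decision. It is only an illustration of the reasoning, not how Auto Scaling is implemented; in practice you would attach scaling policies to CloudWatch alarms. The function names, the 40% scale-in threshold, and the group size limits are assumptions chosen for the example; only the 85% threshold and the two 5-minute periods come from the alarm above.

```python
def breaches(datapoints, threshold, periods, above=True):
    """True if the last `periods` datapoints are all on the breaching side."""
    recent = datapoints[-periods:]
    if len(recent) < periods:
        return False  # not enough data to decide yet
    if above:
        return all(value >= threshold for value in recent)
    return all(value <= threshold for value in recent)


def desired_capacity(current, datapoints, minimum=2, maximum=6,
                     high=85.0, low=40.0, periods=2):
    """Return the instance count the group should move to.

    `datapoints` holds average CPU % per 5-minute period, oldest first.
    """
    if breaches(datapoints, high, periods, above=True):
        return min(current + 1, maximum)   # scale out, capped at maximum
    if breaches(datapoints, low, periods, above=False):
        return max(current - 1, minimum)   # scale in, floored at minimum
    return current                         # no sustained breach: hold


# CPU climbs and stays high for two periods, then traffic subsides.
print(desired_capacity(2, [10, 90, 100]))
print(desired_capacity(3, [100, 30, 20]))
```

Requiring several consecutive breaching periods in both directions keeps a single noisy datapoint from adding or removing instances.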
In many environments the act of monitoring and managing systems can become complicated and burdensome, leaving you little time for developing your website or application. At 2nd Watch we provide a suite of managed services (https://www.2ndwatch.com/cloud-services/managed-cloud-platform/) to help you free up time for more important aspects of your business. We can put much of the complex logic in place for you to help minimize your administrative cloud burden. We take a lot of the headache out of managing your own systems and work to ensure that your operations are secure, reliable, and compliant at the lowest possible cost.
-Derek Baltazar, Senior Cloud Engineer