CloudWatch

CloudWatch Basics

CloudWatch monitors resources and applications, captures logs, and sends events.
CloudWatch monitoring is the standard mechanism for keeping tabs on AWS resources. A wide range of metrics and dimensions is available via CloudWatch, allowing you to create time-based graphs, alarms, and dashboards.
- Alarms are the most practical use of CloudWatch, allowing you to trigger notifications from any given metric.
- Alarms can trigger SNS notifications, Auto Scaling actions, or EC2 actions.
- Alarms also support alerting when any M out of N datapoints cross the alarm threshold.
- Publish and share graphs of metrics by creating customizable dashboard views.
  - Monitor and report on EC2 instance system check failure alarms.
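The "M out of N" behaviour mentioned above can be pictured with a small simulation. This is only a sketch of the idea, not CloudWatch's actual evaluation logic: the function name `alarm_state` and the sample values are made up for illustration, and the real service also handles missing data and an `INSUFFICIENT_DATA` state. N and M correspond to the `EvaluationPeriods` and `DatapointsToAlarm` settings of an alarm.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Return "ALARM" or "OK" for a simplified M-out-of-N evaluation.

    Looks at the most recent `evaluation_periods` (N) datapoints and counts
    how many are strictly greater than `threshold`. The alarm fires when at
    least `datapoints_to_alarm` (M) of them breach.
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= M <= N")
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


cpu = [42.0, 91.5, 38.0, 95.2, 97.1]  # made-up CPU utilization samples

# 3 of the last 5 datapoints exceed 90, so a 3-out-of-5 alarm fires.
print(alarm_state(cpu, threshold=90.0, evaluation_periods=5, datapoints_to_alarm=3))  # ALARM

# Only 2 of the last 3 exceed 90, so a 3-out-of-3 alarm stays quiet.
print(alarm_state(cpu, threshold=90.0, evaluation_periods=3, datapoints_to_alarm=3))  # OK
```

Choosing M smaller than N makes an alarm tolerant of a single noisy datapoint while still reacting to a sustained problem.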
Using CloudWatch Events:
- Events provide a mechanism to automate actions in various services on AWS. You can create event rules from instance state changes, AWS API calls, Auto Scaling, Run Command, deployments, or time-based schedules (think cron).
- Triggered events can invoke Lambda functions, send SNS/SQS/Kinesis messages, or perform instance actions (terminate, restart, stop, or snapshot volumes).
- Custom payloads can be sent to targets in JSON format; this is especially useful when triggering Lambdas.
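To make the rule types above more concrete, here is a rough sketch of what an event pattern, a schedule expression, and a custom JSON payload can look like. The pattern and the cron expression follow the documented CloudWatch Events formats; the `matches` helper is a deliberately simplified stand-in written for this example (real pattern matching supports more operators), and the payload contents are hypothetical.

```python
import json

# Event pattern: EC2 instances entering the "stopped" state.
ec2_stopped_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}

# Time-based alternative: every day at 12:00 UTC (six-field cron syntax).
daily_schedule = "cron(0 12 * * ? *)"

# Hypothetical constant JSON payload to hand to a target such as a Lambda.
custom_input = json.dumps({"action": "notify", "team": "ops"})


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must appear in the event, and
    each leaf list matches when the event value is one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(matches(ec2_stopped_pattern, sample_event))  # True
```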
Using CloudWatch Logs:
- CloudWatch Logs is a streaming log storage system. By storing logs within AWS you have access to effectively unlimited (paid) storage, but you also have the option of streaming logs directly to Elasticsearch or custom Lambdas.
- A log agent installed on your servers will process logs over time and send them to CloudWatch Logs.
- You can export logged data to S3 or stream results to other AWS services.
- CloudWatch Logs can be encrypted using keys managed through KMS.
Detailed monitoring: Detailed monitoring for EC2 instances must be enabled to get more granular metrics (1-minute rather than 5-minute periods), and is billed under CloudWatch.

CloudWatch Alternatives and Lock-In

CloudWatch offers fairly basic functionality that doesn't create significant (additional) AWS lock-in. Most of the metrics provided by the service can be retrieved through APIs and imported into other aggregation or visualization tools or services (many specifically provide CloudWatch data import).
🚪 Alternatives to CloudWatch monitoring services include New Relic, Datadog, Sumo Logic, Zabbix, Nagios, Ruxit, Elastic Stack, open source options such as StatsD or collectd with Graphite, and many others.
🚪 CloudWatch Log alternatives include Splunk, Sumo Logic, Loggly, LogDNA, Logstash, Papertrail, Elastic Stack, and other centralized logging solutions.

CloudWatch Tips

Some very common use cases for CloudWatch are billing alarms, instance or load balancer up/down alarms, and disk usage alerts.
You can use EC2Config to monitor memory and disk metrics on Windows platform instances. For Linux, there are example scripts that do the same thing.
You can publish your own metrics using the AWS API, but custom metrics incur additional cost.
You can stream directly from CloudWatch Logs to a Lambda function or an Elasticsearch cluster by creating subscriptions on Log Groups.
Don't forget to take advantage of the CloudWatch non-expiring free tier.
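As a rough illustration of the tips on memory metrics and custom metrics for Linux, the sketch below computes memory utilization from `/proc/meminfo`-style text and wraps it in a datum shaped for the `PutMetricData` API. The metric name `MemoryUtilization`, the namespace in the comment, and both helper functions are assumptions made for this example. It also assumes a kernel that reports `MemAvailable`.

```python
def memory_utilization_percent(meminfo_text: str) -> float:
    """Compute used-memory percentage from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def memory_datum(percent: float, instance_id: str) -> dict:
    """Build one MetricData item (metric name is a hypothetical choice)."""
    return {
        "MetricName": "MemoryUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Unit": "Percent",
        "Value": percent,
    }


sample = "MemTotal:        8000000 kB\nMemFree:         1000000 kB\nMemAvailable:    2000000 kB\n"
datum = memory_datum(memory_utilization_percent(sample), "i-0123456789abcdef0")
print(datum["Value"])  # 75.0

# On an instance with boto3 installed and credentials configured, the datum
# could then be sent with something like:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="Custom/System", MetricData=[datum])
```

Each custom metric published this way is billed, so it is worth sending only the handful of values you actually alarm or graph on.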

CloudWatch Gotchas and Limitations

🔸Metrics in CloudWatch originate on the hypervisor. The hypervisor doesn't have access to OS information, so certain metrics (most notably memory utilization) are not available unless pushed to CloudWatch from inside the instance.
🔸You cannot use more than one metric for an alarm.
🔸Notifications you receive from alarms will not have any contextual detail; they have only the specifics of the threshold, alarm state, and timing.
🔸By default, CloudWatch metric resolution is 1 minute. If you send multiple values of a metric within the same minute, they will be aggregated into minimum, maximum, average and total (sum) per minute.
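The per-minute aggregation described above can be pictured roughly as follows. This is a local simulation for intuition only, not how CloudWatch is implemented. The function name and sample data are made up, while the field names mirror the statistics CloudWatch exposes.

```python
from collections import defaultdict


def aggregate_per_minute(samples):
    """Group (epoch_seconds, value) samples into one statistic set per minute."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        minute_start = int(timestamp) // 60 * 60  # truncate to the minute
        buckets[minute_start].append(value)
    return {
        minute: {
            "Minimum": min(values),
            "Maximum": max(values),
            "Sum": sum(values),
            "SampleCount": len(values),
            "Average": sum(values) / len(values),
        }
        for minute, values in sorted(buckets.items())
    }


# Two values land in the first minute, one in the second.
stats = aggregate_per_minute([(0, 1.0), (30, 3.0), (61, 10.0)])
print(stats[0]["Average"])  # 2.0
print(stats[60]["Sum"])     # 10.0
```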
🐥In July 2017, a new high-resolution option was added for CloudWatch metrics and alarms. This feature allows you to record metrics with 1-second resolution, and to evaluate CloudWatch alarms every 10 seconds.
- The blog post introducing this feature describes how to publish a high-resolution metric to CloudWatch. Note that when calling the PutMetricData API, StorageResolution is an attribute of each item you send in the MetricData array, not a direct parameter of the PutMetricData API call.
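The placement of `StorageResolution` is easiest to see in the request structure itself. The sketch below builds the keyword arguments for a `PutMetricData` call without sending anything. The helper name, namespace, and metric name are hypothetical, and the valid values assumed here are 1 (high resolution) and 60 (standard).

```python
from datetime import datetime, timezone


def build_put_metric_data_request(namespace: str, name: str, value: float,
                                  high_resolution: bool = False) -> dict:
    """Build kwargs for PutMetricData with StorageResolution set per datum."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": "Milliseconds",
        # Lives on the MetricData item, not at the top level of the call.
        "StorageResolution": 1 if high_resolution else 60,
    }
    return {"Namespace": namespace, "MetricData": [datum]}


request = build_put_metric_data_request(
    "Custom/App", "RequestLatency", 12.5, high_resolution=True)
print(request["MetricData"][0]["StorageResolution"])  # 1

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_data(**request)
```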
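🔸Metric data is kept in CloudWatch for 15 months as of November 2016 (it used to be 14 days). Granularity gets coarser with age: 1-minute data points are only available for 15 days, after which data is retained at 5-minute and eventually 1-hour periods.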