For those who have not heard about CloudWatch, it might sound like a watch somewhere in the cloud! (Funny, isn't it?) But it's not like that.
You may have heard that CloudWatch is a monitoring service provided by AWS, and that it helps you manage application logging. In this article, I will explain some features and use cases that you might not know CloudWatch can handle. After reading this blog, you won't think of CloudWatch as just an ordinary logging service. Trust me, it's more than we think.
1. Alarms:
Alarms are a basic and important feature that you all know about, but I am sure the other features are worth a look too.
You can create an alarm that triggers whenever a metric reaches a value you specify. For example: your instance's CPU usage is above 80%, your application is returning too many 5xx errors, or your RDS instance is running low on storage space. And there are many more.
I won't say much more here, as it's a basic thing everyone knows. So, let's move forward.
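For readers who prefer code to console clicks, here is a minimal sketch of the CPU example using boto3's `put_metric_alarm`. The instance ID, the SNS topic ARN, the alarm name, and the 5-minute period with 2 evaluation periods are all assumptions for illustration, not recommendations.

```python
def build_cpu_alarm(instance_id, topic_arn, threshold=80.0):
    """Build put_metric_alarm parameters for a high-CPU alarm on one EC2 instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # assumed: evaluate 5-minute averages
        "EvaluationPeriods": 2,     # assumed: two periods in a row must breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic notified when the alarm fires
    }


def create_alarm(params):
    """Send the alarm definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder instance ID and topic ARN, for illustration only.
params = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:111122223333:ops-alerts",
)
```

Calling `create_alarm(params)` would create the alarm in whichever account and region your credentials point to.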
2. Filter anything:
“Too many logs, I will not go through all of them!!”
CloudWatch has metric filters, a feature for filtering your logs. For example, you can filter for 'login failed' events on your Linux instance. You do this by configuring the CloudWatch agent on your instance (even on-premises) to forward the /var/log/secure logs to CloudWatch. After that, create a metric filter to recognize those events, and alert the security team when an instance has more than 3 failed logins, so that they can investigate further. Simple enough, right?!
Similarly, you can design your own filters to suit your application. After all, AWS does all of this for its customers.
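The failed-login example could be sketched with boto3 roughly as below. I am assuming the agent ships the file to a log group named `/linux/var/log/secure`, that sshd writes lines containing "Failed password", and that "more than 3" means more than 3 within 5 minutes. The namespace, metric name, and topic ARN are made up for illustration.

```python
LOG_GROUP = "/linux/var/log/secure"   # assumed log group name
NAMESPACE = "Custom/Security"         # hypothetical metric namespace
METRIC = "FailedSshLogins"            # hypothetical metric name


def build_metric_filter():
    """Parameters for logs.put_metric_filter: count lines with 'Failed password'."""
    return {
        "logGroupName": LOG_GROUP,
        "filterName": "failed-ssh-logins",
        "filterPattern": '"Failed password"',  # quoted term, matched as a phrase
        "metricTransformations": [
            {
                "metricName": METRIC,
                "metricNamespace": NAMESPACE,
                "metricValue": "1",  # each matching line adds 1 to the metric
            }
        ],
    }


def build_failed_login_alarm(topic_arn):
    """Parameters for cloudwatch.put_metric_alarm: more than 3 failures in 5 minutes."""
    return {
        "AlarmName": "too-many-failed-logins",
        "Namespace": NAMESPACE,
        "MetricName": METRIC,
        "Statistic": "Sum",
        "Period": 300,                # assumed 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": 3,
        "ComparisonOperator": "GreaterThanThreshold",  # strictly more than 3
        "TreatMissingData": "notBreaching",  # no matching lines means no alarm
        "AlarmActions": [topic_arn],
    }


def apply(filter_params, alarm_params):
    """Create both resources (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builders work without boto3

    boto3.client("logs").put_metric_filter(**filter_params)
    boto3.client("cloudwatch").put_metric_alarm(**alarm_params)


filter_params = build_metric_filter()
alarm_params = build_failed_login_alarm(
    "arn:aws:sns:us-east-1:111122223333:security-team"  # placeholder ARN
)
```

Note that a metric defined this way counts matches across the whole log group, so if several instances send to the same group, the alarm fires on their combined total.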
3. Automated Alerting:
The best thing we can do with the cloud is automation! I am talking about automating everything: resource provisioning, scaling, deployment. Almost every process can be automated. Alerting can be called the heart of an application's security. So how can we forget about alert automation?
I will explain this topic with a scenario. Suppose you want to notify your team whenever a user logs in to the console. User creation and deletion can be handled the same way.
You can create a CloudTrail trail and then use a CloudWatch Events rule to notify you about specific types of events. Whenever one of the specified events happens, CloudWatch matches it and sends you a notification.
Some of these alerts are not real time, as they depend on CloudTrail, but they are still useful because they reach you as soon as possible so you can act. After all, knowing a little late is better than not knowing at all.
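As a rough sketch of the console-login scenario, the rule below matches sign-in events recorded by CloudTrail and sends them to an SNS topic. The rule name, target ID, and topic ARN are placeholders I made up. Sign-in events may arrive only in a specific region (often us-east-1), so treat the region choice as something to verify for your account.

```python
import json


def console_login_pattern():
    """Event pattern matching console sign-in events delivered via CloudTrail."""
    return {
        "source": ["aws.signin"],
        "detail-type": ["AWS Console Sign In via CloudTrail"],
        "detail": {"eventName": ["ConsoleLogin"]},
    }


def build_rule_params(rule_name):
    """Parameters for events.put_rule; the pattern must be passed as a JSON string."""
    return {
        "Name": rule_name,
        "EventPattern": json.dumps(console_login_pattern()),
        "State": "ENABLED",
    }


def build_target_params(rule_name, topic_arn):
    """Parameters for events.put_targets: deliver matched events to an SNS topic."""
    return {
        "Rule": rule_name,
        "Targets": [{"Id": "notify-team", "Arn": topic_arn}],  # hypothetical target ID
    }


def deploy(rule_params, target_params):
    """Create the rule and attach the target (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builders work without boto3

    events = boto3.client("events")
    events.put_rule(**rule_params)
    events.put_targets(**target_params)


rule_params = build_rule_params("notify-console-login")  # hypothetical rule name
target_params = build_target_params(
    "notify-console-login",
    "arn:aws:sns:us-east-1:111122223333:security-team",  # placeholder ARN
)
```

For user creation or deletion, the same shape should work with `"source": ["aws.iam"]`, the detail-type `"AWS API Call via CloudTrail"`, and event names such as `CreateUser` or `DeleteUser`. The SNS topic's access policy also has to allow the Events service to publish to it.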
Let me know if you want a detailed blog on this use case. I will explain it.
4. Automated Response:
“Why do it manually if the response is fixed and can be automated?”
I told you about automation everywhere, right? So why should response be left out?
Yes, you can also automate responses with the help of CloudWatch. Suppose I want VPC Flow Logs to be enabled in every region, or I want an instance to be restarted whenever some alert occurs, or I want a bucket to contain private objects only. In these cases, you can configure a Lambda function that identifies the event based on your criteria and acts as you defined. So you don't need to log in to the console every time just to restart an instance.
For the cases above, you can create Lambda functions like the following and trigger them using a CloudWatch rule:
- Enable VPC Flow Logs whenever they are found disabled.
- Stop and start an EC2 instance whenever the function is triggered.
- Change the object ACL to make the object private whenever a public object is detected.
- Turn CloudTrail logging back on whenever a StopLogging event is detected.
These are just a few of them. Feel free to share other interesting ones you have, and let me know which ones I should explain in future articles.
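To make the last bullet concrete, here is a minimal sketch of a Lambda handler that turns CloudTrail logging back on. It assumes the rule forwards the CloudTrail record of the `StopLogging` API call, with the trail name or ARN under `detail.requestParameters.name`. The optional `cloudtrail` argument is my own addition so the function can be tried locally with a stand-in client.

```python
def handler(event, context, cloudtrail=None):
    """Re-enable CloudTrail logging when a StopLogging call is detected."""
    detail = event.get("detail", {})
    if detail.get("eventName") != "StopLogging":
        return {"action": "ignored"}

    # Assumed event shape: the stopped trail is named in requestParameters.
    trail = (detail.get("requestParameters") or {}).get("name")
    if not trail:
        return {"action": "ignored"}

    if cloudtrail is None:
        import boto3  # available by default in the Lambda Python runtime

        cloudtrail = boto3.client("cloudtrail")

    cloudtrail.start_logging(Name=trail)
    return {"action": "logging_restarted", "trail": trail}


class FakeCloudTrail:
    """Stand-in client for local runs; records calls instead of hitting AWS."""

    def __init__(self):
        self.started = []

    def start_logging(self, Name):
        self.started.append(Name)


# Local dry run with a hand-written sample event (not real CloudTrail output).
sample_event = {
    "source": "aws.cloudtrail",
    "detail": {
        "eventName": "StopLogging",
        "requestParameters": {"name": "org-trail"},  # hypothetical trail name
    },
}
fake = FakeCloudTrail()
result = handler(sample_event, None, cloudtrail=fake)
```

The function's execution role would need the `cloudtrail:StartLogging` permission. The other bullets follow the same pattern with a different client and API call.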
5. Schedule Event:
When the requirement is scheduling, you are probably thinking of cron expressions. CloudWatch also provides event scheduling. It will trigger the target you specify at a particular time.
You have probably booked tickets online before, and you might have run into problems right when booking opens. After all, the whole country is booking tickets at that time! In that case, you can schedule autoscaling of your servers for that time.
Similarly, for a sale you can schedule SQS or autoscaling. You can open your SSH port only during office hours and close it after the shift is over. You can schedule anything as long as you have a Lambda. Try it and let me know if you come up with ideas or run into problems! I would love to help solve them.
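Here is a small sketch of the office-hours SSH idea, assuming a Lambda (not shown) that opens or closes port 22 depending on an `action` field in its input. The rule names, that input shape, and the Lambda ARN are hypothetical. CloudWatch cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year) and are evaluated in UTC.

```python
import json


def weekday_cron(hour_utc):
    """CloudWatch schedule expression for Monday to Friday at hour_utc:00 UTC."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    # '?' in day-of-month is required because day-of-week is specified.
    return f"cron(0 {hour_utc} ? * MON-FRI *)"


def build_schedule(rule_name, hour_utc, action, lambda_arn):
    """Return (put_rule params, put_targets params) for one scheduled action."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": weekday_cron(hour_utc),
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "ssh-toggle",  # hypothetical target ID
                "Arn": lambda_arn,
                # Constant JSON passed to the Lambda instead of the event itself.
                "Input": json.dumps({"action": action}),
            }
        ],
    }
    return rule, targets


def deploy(rule, targets):
    """Create the rule and its target (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builders work without boto3

    events = boto3.client("events")
    events.put_rule(**rule)
    events.put_targets(**targets)


LAMBDA_ARN = "arn:aws:lambda:us-east-1:111122223333:function:toggle-ssh"  # placeholder
open_rule, open_targets = build_schedule("open-ssh", 9, "open", LAMBDA_ARN)
close_rule, close_targets = build_schedule("close-ssh", 18, "close", LAMBDA_ARN)
```

The Lambda also needs a resource-based permission that lets the Events service invoke it. The 09:00 and 18:00 times are UTC, so shift them to match your office's time zone.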
6. Centralize Logging:
It is good practice to use multiple accounts for different activities. This isolates production from your other workloads, so that if anything happens to your testing server, you don't end up losing production too! In the same way, you can have a separate security account. That keeps your logs safe if malicious activity affects or deletes logs in any of your other accounts.
I will use the term Master account for the account where all your logs will be stored. You can use the CloudWatch Event Bus to forward events from your other accounts to the Master account, and you can store logs in a bucket in the Master account. That way, the CloudWatch events from all your other accounts arrive in the Master account's CloudWatch Events, and you can configure CloudWatch rules there so that activity from every account is handled in one place. And when an incident happens in any account, you can use the logs stored in the Master account's bucket to audit it.
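The event-forwarding part could look roughly like the sketch below. The Master account allows another account to put events on its default event bus, and that account adds a rule whose target is the Master bus. The account IDs, region, rule name, and the choice to forward only CloudTrail API calls are assumptions for illustration. Depending on your setup, the target may also need a `RoleArn` that permits `events:PutEvents` on the Master bus.

```python
import json

MASTER_ACCOUNT = "111122223333"   # placeholder account IDs
MEMBER_ACCOUNT = "444455556666"
REGION = "us-east-1"              # assumed region


def master_bus_arn(region, master_account):
    """ARN of the default event bus in the Master account."""
    return f"arn:aws:events:{region}:{master_account}:event-bus/default"


def build_master_permission(member_account):
    """Run in the Master account: parameters for events.put_permission."""
    return {
        "EventBusName": "default",
        "Action": "events:PutEvents",
        "Principal": member_account,
        "StatementId": f"allow-{member_account}",  # hypothetical statement ID
    }


def build_member_forwarding(region, master_account):
    """Run in the other account: (put_rule params, put_targets params)."""
    rule = {
        "Name": "forward-to-master",  # hypothetical rule name
        # Assumed scope: forward API calls that CloudTrail records.
        "EventPattern": json.dumps({"detail-type": ["AWS API Call via CloudTrail"]}),
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule["Name"],
        "Targets": [{"Id": "master-bus", "Arn": master_bus_arn(region, master_account)}],
    }
    return rule, targets


def apply_in_master(permission):
    """Needs boto3 and credentials for the Master account."""
    import boto3

    boto3.client("events").put_permission(**permission)


def apply_in_member(rule, targets):
    """Needs boto3 and credentials for the other account."""
    import boto3

    events = boto3.client("events")
    events.put_rule(**rule)
    events.put_targets(**targets)


permission = build_master_permission(MEMBER_ACCOUNT)
rule, targets = build_member_forwarding(REGION, MASTER_ACCOUNT)
```

Once events arrive on the Master bus, rules defined in the Master account can match them just like local events.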
Okay, so that's all for now.
You can reach out to us with any feedback or queries.