ECS Cluster Hibernation-Scheduled Stop/Start

But Why?

Heavily used ECS clusters can cost a lot of Benjamins. In Production environments the clusters must be running at all times, so the only ways to lower costs are the usual ones: provisioning appropriately sized instance types for the tasks, coding efficiently, architecting a well-planned infrastructure and so on. In Development environments, though, an Infrastructure Engineer has more room to cut costs.

I shut down all clusters in the Development environment between 23:00 and 07:00 with a Python Lambda script that is deployed by Terraform. I stop a cluster by setting the minimum, maximum and desired values of its Auto Scaling Group (ASG) to 0, which shuts down all of its Container Instances. But what about the initial ASG state? Where do the minimum, maximum and desired values go? I write them to a DynamoDB table before setting them to 0.

I start the clusters again by reading the initial values from that DynamoDB table and setting them back on each cluster's ASG.

For the schedule, I’m using CloudWatch Events rules to trigger the Lambda script.
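
The rules themselves are deployed with Terraform in my setup. As a rough equivalent in Python, a minimal sketch with boto3 could look like the following. The rule names and the Lambda ARNs are hypothetical placeholders, and the cron expressions assume the 23:00-07:00 window is meant in UTC, which is the time zone CloudWatch Events cron expressions use.

```python
# Hypothetical rule names; the cron expressions fire daily at 23:00 and 07:00 UTC.
SCHEDULES = {
    "ecs-hibernate-stop": "cron(0 23 * * ? *)",
    "ecs-hibernate-start": "cron(0 7 * * ? *)",
}


def schedule_lambda(events, rule_name, cron_expression, lambda_arn):
    """Create (or update) a scheduled rule and point it at one Lambda function."""
    rule = events.put_rule(
        Name=rule_name,
        ScheduleExpression=cron_expression,
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]


if __name__ == "__main__":
    import boto3  # imported here so schedule_lambda can be tried with a stub client

    events_client = boto3.client("events")
    # Placeholder ARNs: replace with the ARNs of the real stop and start functions.
    lambda_arns = {
        "ecs-hibernate-stop": "arn:aws:lambda:REGION:ACCOUNT_ID:function:STOP_FUNCTION",
        "ecs-hibernate-start": "arn:aws:lambda:REGION:ACCOUNT_ID:function:START_FUNCTION",
    }
    for name, cron in SCHEDULES.items():
        print(schedule_lambda(events_client, name, cron, lambda_arns[name]))
```

Each Lambda function also needs a resource-based permission that lets events.amazonaws.com invoke it, otherwise the rule fires but the function never runs.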

Let’s see the scripts!

The Python Code

To Stop All ECS Clusters
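
The original listing is not reproduced here, so what follows is a minimal sketch of the stop step as described above, not the exact script. It assumes the ASG names arrive in the Lambda event under a hypothetical `asg_names` key and that the DynamoDB table `ASGValues` has a partition key called `ASGName`. The attribute names `MinSize`, `MaxSize` and `DesiredCapacity` are likewise my choice for illustration.

```python
TABLE_NAME = "ASGValues"


def stop_asgs(autoscaling, table, asg_names):
    """Save each ASG's sizes to DynamoDB, then scale the group down to zero."""
    paginator = autoscaling.get_paginator("describe_auto_scaling_groups")
    stopped = []
    for page in paginator.paginate(AutoScalingGroupNames=asg_names):
        for group in page["AutoScalingGroups"]:
            name = group["AutoScalingGroupName"]
            # Skip groups that are already at zero, so that a second run
            # does not overwrite the saved values with zeros.
            if group["MaxSize"] == 0:
                continue
            table.update_item(
                Key={"ASGName": name},  # assumed key schema
                UpdateExpression=(
                    "SET MinSize = :min, MaxSize = :max, DesiredCapacity = :desired"
                ),
                ExpressionAttributeValues={
                    ":min": group["MinSize"],
                    ":max": group["MaxSize"],
                    ":desired": group["DesiredCapacity"],
                },
            )
            autoscaling.update_auto_scaling_group(
                AutoScalingGroupName=name,
                MinSize=0,
                MaxSize=0,
                DesiredCapacity=0,
            )
            stopped.append(name)
    return stopped


def lambda_handler(event, context):
    import boto3  # imported here so stop_asgs can be tried with stub clients

    autoscaling = boto3.client("autoscaling")
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    # "asg_names" is a hypothetical event field, not part of any AWS event format.
    return stop_asgs(autoscaling, table, event["asg_names"])
```

The values are saved before the group is scaled down, so a failure between the two calls leaves the ASG running rather than stopped with no record of its original size.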

To Start All ECS Clusters
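
Again, the original listing is not reproduced here, so this is a minimal sketch of the start step rather than the exact script. It assumes each item in the `ASGValues` table is keyed by a hypothetical `ASGName` attribute and carries `MinSize`, `MaxSize` and `DesiredCapacity`, and that the ASG names arrive in the Lambda event under a hypothetical `asg_names` key. Reading with `get_item` also assumes the function's role is allowed `dynamodb:GetItem`.

```python
TABLE_NAME = "ASGValues"


def start_asgs(autoscaling, table, asg_names):
    """Restore each ASG to the sizes that were saved in DynamoDB."""
    started = []
    for name in asg_names:
        item = table.get_item(Key={"ASGName": name}).get("Item")
        if item is None:
            # Nothing was saved for this group, so leave it untouched.
            continue
        autoscaling.update_auto_scaling_group(
            AutoScalingGroupName=name,
            # The boto3 DynamoDB resource returns numbers as Decimal,
            # while the Auto Scaling API expects plain integers.
            MinSize=int(item["MinSize"]),
            MaxSize=int(item["MaxSize"]),
            DesiredCapacity=int(item["DesiredCapacity"]),
        )
        started.append(name)
    return started


def lambda_handler(event, context):
    import boto3  # imported here so start_asgs can be tried with stub clients

    autoscaling = boto3.client("autoscaling")
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    # "asg_names" is a hypothetical event field, not part of any AWS event format.
    return start_asgs(autoscaling, table, event["asg_names"])
```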

This Post Has 3 Comments

  1. Hello Mert. Thanks for this tutorial. It is exactly what I was looking for. I am new to Python and AWS. Could you tell me what permissions are needed to run the script? Another question: where is the name of the table specified? Thanks.

    1. Hello Sergio,

      I’m glad this is helpful for you. The table name is “ASGValues”, as declared in the scripts. As for the permissions, off the top of my head, I think you’ll need autoscaling:DescribeAutoScalingGroups, autoscaling:UpdateAutoScalingGroup and dynamodb:UpdateItem. Or you can use autoscaling:* and dynamodb:* if you just want to get it working ASAP.

      1. Hello Mert. Thank you. I was able to implement the functions and understand the code better. I am not an AWS expert; in fact, I had to use ecs_client = boto3.client('ecs') to list all services in a given cluster. That let me change each service's desiredCount attribute (the only one available), which is enough to stop and start the services.
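
The service-level approach described in the last comment can be sketched roughly as follows. This is an illustration under assumptions, not the commenter's code: the function name and the `dev-cluster` cluster name are hypothetical, and the sketch does not remember the previous counts, so starting the services again requires knowing the count to restore.

```python
def set_desired_count(ecs, cluster, desired_count):
    """Set desiredCount on every service in one ECS cluster."""
    paginator = ecs.get_paginator("list_services")
    updated = []
    for page in paginator.paginate(cluster=cluster):
        for service_arn in page["serviceArns"]:
            ecs.update_service(
                cluster=cluster,
                service=service_arn,
                desiredCount=desired_count,
            )
            updated.append(service_arn)
    return updated


if __name__ == "__main__":
    import boto3  # imported here so set_desired_count can be tried with a stub client

    ecs_client = boto3.client("ecs")
    # "dev-cluster" is a placeholder cluster name; 0 stops every service's tasks.
    print(set_desired_count(ecs_client, "dev-cluster", 0))
```

Unlike the ASG approach, this stops the tasks but leaves the Container Instances running, so the EC2 cost of the cluster remains.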
