ECS auto scaling: Here is what you need to know

Whenever we build ECS clusters, ECS scaling seems to be the most confusing topic for our customers. We wrote this blog post to explain the two basic concepts you need to know about: scaling EC2 instances and scaling containers.


Scaling EC2 instances

As illustrated in the image above, you should configure auto scaling based on the cluster's reserved capacity. Doing so lets ECS spin up new instances before the cluster runs out of memory for new containers.

The same logic can be applied to Spot Fleet with Auto Scaling.

If you look at this CloudFormation link, you’ll notice that we’re scaling up EC2 instances based on MemoryReservation. Every container we launch reserves memory on the cluster, and once that reservation crosses a threshold, auto scaling launches additional instances so there is always room to place more containers.
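A minimal CloudFormation sketch of this pattern is below. The resource names (ECSCluster, ECSAutoScalingGroup, ScaleUpPolicy) and the 75% threshold are placeholders standing in for resources and values defined in your own template, not the exact contents of the linked template.

```yaml
# Alarm on the cluster's MemoryReservation metric and trigger a
# scale-out policy on the EC2 Auto Scaling group behind the cluster.
MemoryReservationAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Scale out when too much cluster memory is reserved
    Namespace: AWS/ECS
    MetricName: MemoryReservation
    Dimensions:
      - Name: ClusterName
        Value: !Ref ECSCluster
    Statistic: Average
    Period: 60
    EvaluationPeriods: 2
    Threshold: 75          # percent of cluster memory reserved
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref ScaleUpPolicy

ScaleUpPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref ECSAutoScalingGroup
    AdjustmentType: ChangeInCapacity
    ScalingAdjustment: 1   # add one instance per alarm breach
    Cooldown: 300
```

Scaling on reservation rather than utilization is the key design choice here: it adds instances when there is no longer room to *place* new containers, regardless of how busy the existing ones are.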

You can find the reservation metrics in CloudWatch under the AWS/ECS namespace, grouped by ClusterName.


In the example below, 65% of the cluster’s capacity is already reserved. If you keep launching containers without adding instances, placements will eventually fail due to a lack of capacity.


Scaling ECS services

You should always keep an eye on memory and CPU utilization whenever you use the ECS console. Memory utilization measures how much of its reserved memory a service’s containers are actually using.

Scale your containers based on memory utilization, CPU utilization, and any other metric that reflects load. When utilization gets too high, launching additional containers spreads the work before the service runs out of memory.

Here is an example of memory utilization for a service running on ECS.
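Outside the console, service scaling is expressed through Application Auto Scaling. Here is a hedged CloudFormation sketch of target tracking on average memory utilization; the ECSCluster, ServiceName, and AutoScalingRole references, along with the capacity bounds and 70% target, are assumptions standing in for your own resources and tuning.

```yaml
# Register the service's DesiredCount as a scalable target, then let
# target tracking keep average memory utilization near 70%.
ServiceScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ecs
    ScalableDimension: ecs:service:DesiredCount
    ResourceId: !Sub service/${ECSCluster}/${ServiceName}
    RoleARN: !GetAtt AutoScalingRole.Arn
    MinCapacity: 2
    MaxCapacity: 10

ServiceScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: memory-target-tracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ServiceScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageMemoryUtilization
      TargetValue: 70.0
      ScaleInCooldown: 300   # scale in slowly
      ScaleOutCooldown: 60   # scale out quickly
```

Together with the instance-level reservation scaling above, this gives you both layers: the service adds containers as load grows, and the cluster adds instances as those containers reserve its capacity.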



We mostly configure auto scaling using Terraform or CloudFormation, but you can also configure service scaling from the AWS console: select a service running in the cluster and click Update.




We would love to learn how you are using ECS autoscaling. Please leave a comment with any tips.

  • Amol Pol

Helped me in setting up ECS autoscaling based on memory utilization.
But there is one problem with this: we always have to keep one instance running in our cluster.
We would like to start a machine only when there is a requirement.

  • Mark Sawers

    Thanks for the great article!

    If I understand it correctly, this approach favors consistent performance and availability over cost efficiency. The cluster must already have idle capacity in order for the service to expand. The service expands on utilization thresholds, and only after that, the cluster expands (autoscaling group) on that subsequent reservation. The cluster expansion is a side-effect of the service expansion. What is implied here is that there is always some cluster buffer (unreserved compute resources). Do I have that right?

This is consistent with the official Amazon docs I’ve seen: the autoscaling with alarms tutorial, and the ECS autoscaling blog.

I’m having quite a bit of trouble optimizing for cost efficiency, and accepting some temporary performance degradation. I make the autoscaling group front-run the service autoscaling with a more sensitive trigger. If they use the same trigger value (e.g. request count per target, or cpu utilization) then ecs often triggers **before** the asg, complains there is no resource available, and gives up. It may retry five minutes later; once it was ten minutes later. By then the asg has gone wild and overshot the cluster size it needed. There seems to be no correlation between warm up and cool down times (shame on AWS for using two different terms for the same concept) and ecs’ frequency of scheduling/placement tasks.

(On a side note, I’ve turned on ecs agent logging and see really questionable choices. ECS seems really lazy, and not even close to real-time in its decisions. For example, if I have a really simple cluster of three services with desired count of one, and bring up the cluster size from zero to one, it may take 10-15 minutes for all services to get running. It’ll bring up the daemon services within a couple of minutes, then a couple of minutes later I’ll get one replica service, and then, after a five minute gap of doing nothing, I’ll get the last replica service. I see no areas in the aws console that let me tune this behavior.)

The other problem I see is ASG target tracking with average cpu utilization. If some of the cluster is idle on cpu, it seems not to be included in the average. It’s not using the average across the cluster and instead using either some kind of max, or an average of only those hosts that have non-zeroish cpu. This is not exactly the same problem mentioned and observed with missing data (insufficient data), so it’s not clear how to resolve this with another alarm as the official doc suggests on insufficient data.

