nSights Talks

Spot Instances

Tutorial Highlights & Transcript

00:00 - Introduction
Hey everyone, hope you’re having an amazing Friday, wherever you are. Good morning, good evening, good afternoon. I’m giving a demo on something that I, as a Solutions Architect, find pretty underrated. It’s something that everyone knows about. It’s taught when you’re learning about the AWS ecosystem, but I’ve rarely seen it being used. It’s the humble Spot instance: how you can use it, what the interface looks like, and how easy AWS has made using Spot in recent times. You are trained to just go to the on-demand instance panel and click an on-demand instance whenever you’re doing something. I did that for my POC, and I got slapped with a $150 bill. When I made some slight changes and did it with a Spot instance, my weekly bill dropped to $43, which is a lot of savings, and that prompted me to investigate Spot Instances more and how you can incorporate them.
01:03 - Compute Options on AWS
As I mentioned, there is a variety of compute options available on AWS. On-demand is the most expensive compute option, yet it is the most commonly used. I think a big part of it is how we as engineers have been trained. I’ve done four or five certifications on AWS and hardly any tutor dwells on Spot Instances. Apart from a couple of random questions in the solutions architecture or DevOps certifications, you hardly ever see Spot mentioned. What happens is that even for routine non-production compute like a dev environment or a staging environment, or even if you’re running a POC or just doing some testing on an instance, you hardly ever think, “oh, I should provision a Spot instance.” You just go to the default option, which is an on-demand instance.
02:00 - What is a Spot Instance?
And that brings us to what a Spot instance actually is. Unlike an on-demand instance, a Spot instance isn’t actually a specific instance class. It’s unused compute capacity that AWS offers at a discounted rate. The catch is that it is unused capacity that AWS offers at the current Spot price and can reclaim at any time. We’ve been conditioned to think of Spot Instances as something that might disappear at any moment, which is not really the case if you plan them strategically.
02:36 - Why Spot?
And that raises the question, why Spot? As I mentioned, I was running a compute workload, an application on a Windows Server, and the price for a c5.xlarge is pretty high. You can check prices on the pricing page AWS has and also on the Spot console, and I’ll show you that in a little bit as well. The Spot price for a c5.xlarge, one of the most commonly used instances for compute workloads in both production and non-prod environments, is $0.038 per hour, whereas on-demand pricing for the same instance is $0.17 per hour. That is an astounding saving of roughly 78%. In essence, if you run Spot Instances instead of on-demand instances, you can potentially provision two or more instances for the same price as you would pay for one on-demand instance, which is just mind-blowing.
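The arithmetic behind that comparison can be sketched as follows. This is a minimal illustration using the two hourly prices quoted in the talk; the function names are made up for this example.

```python
def spot_savings_percent(on_demand_hourly: float, spot_hourly: float) -> float:
    """Percentage saved by paying the Spot price instead of the on-demand price."""
    if on_demand_hourly <= 0:
        raise ValueError("on-demand price must be positive")
    return (on_demand_hourly - spot_hourly) / on_demand_hourly * 100


def spot_instances_per_on_demand(on_demand_hourly: float, spot_hourly: float) -> int:
    """Whole Spot instances that fit in the hourly cost of one on-demand instance."""
    if spot_hourly <= 0:
        raise ValueError("Spot price must be positive")
    return int(on_demand_hourly // spot_hourly)


if __name__ == "__main__":
    on_demand = 0.17   # c5.xlarge on-demand price quoted in the talk, USD per hour
    spot = 0.038       # c5.xlarge Spot price quoted in the talk, USD per hour
    print(f"saving: {spot_savings_percent(on_demand, spot):.1f}%")
    print(f"Spot instances per on-demand price: "
          f"{spot_instances_per_on_demand(on_demand, spot)}")
```

Spot prices move over time and differ by region, so the inputs here are only a snapshot.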
03:40 - What’s the catch?
So what’s the catch? As I mentioned, and as we know, Spot Instances are provisioned by bid price rather than at a fixed price per hour. AWS provides a two-minute warning when an instance is about to be reclaimed. Previously, you had to put some automation scripts in place so that the EC2 instance could poll the instance metadata, which was where this two-minute warning was sent out. But AWS, as I mentioned, has made a lot of improvements in how Spot Instances are handled. Now it also sends out a CloudWatch event. You can put alerts on that, and you can have a rule in EventBridge match the event so that you can plan how Spot instance terminations are handled. There are also multiple other improvements that AWS has made, which I will go through in the demo.
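As a rough sketch of the event-based approach, the snippet below shows an EventBridge event pattern for the interruption warning and a Lambda-style handler that pulls out the affected instance. The `source`, `detail-type`, and `detail` field names follow the event AWS publishes for Spot interruptions; the handler names and what it does with the notice (here it just prints) are assumptions for illustration.

```python
from typing import Optional

# Event pattern an EventBridge rule could use to match the two-minute warning.
SPOT_INTERRUPTION_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def parse_spot_interruption(event: dict) -> Optional[dict]:
    """Return the instance id and action from an interruption event, else None."""
    if event.get("source") != "aws.ec2":
        return None
    if event.get("detail-type") != "EC2 Spot Instance Interruption Warning":
        return None
    detail = event.get("detail", {})
    return {
        "instance_id": detail.get("instance-id"),
        "action": detail.get("instance-action"),
    }


def lambda_handler(event: dict, context=None) -> dict:
    """Hypothetical Lambda entry point: react to the warning within two minutes."""
    notice = parse_spot_interruption(event)
    if notice is None:
        return {"handled": False}
    # Placeholder for real work: drain jobs, deregister from a load balancer, etc.
    print(f"Instance {notice['instance_id']} will be interrupted "
          f"(action: {notice['action']})")
    return {"handled": True, **notice}


if __name__ == "__main__":
    sample_event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Spot Instance Interruption Warning",
        "detail": {"instance-id": "i-1234567890abcdef0", "instance-action": "terminate"},
    }
    print(lambda_handler(sample_event))
```

The handler only parses the event, so it can be tested locally with a sample payload before wiring it to a real rule.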
04:32 - What can we use Spot Instances for?
What can we use Spot Instances for? Spot Instances can be used in a variety of ways, and each provides some amazing savings. Let me show you some of the good stuff that you can do with AWS Spot Instances. You can use Spot Instances for big data processing. With Spot, you can get very large instance types for very cheap, run your big data processing jobs, and have the instances disappear; it gives you a lot more processing power for a fraction of the cost. CI/CD runners are instances that are provisioned when a job is running; they don’t need to be around for the entirety of the process. Usually, they are provisioned, they process the job, and then they are torn down. This is perfect for a Spot instance. For any instance used for batch processing, be it an in-house compute option or EC2 instances backed by AWS Batch, Spot Instances are great. Also, you can create Auto Scaling groups that combine on-demand and Spot Instances, so that you have extra processing power from Spot Instances. If they terminate because AWS reclaims them, you don’t need to wait for the termination or handle it yourself; the Auto Scaling group can handle that as well. Most importantly, and this is something that I discussed with a client previously, Spot Instances are perfect for development and testing workloads, where keeping an instance up at something like 99.9% uptime is not a requirement. You don’t really care if one development instance drops out or is reclaimed, since it’s a non-interactive workload. You can easily provision a new instance and carry on with your development or testing workload with minimal disruption. So these are perfect, and they provide a lot of savings, roughly 50-70% on those workloads.
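The Auto Scaling group that mixes on-demand and Spot capacity could be described with a mixed instances policy along these lines. This is a sketch only: the helper name, the launch template name, and the second instance type are assumptions, and the resulting dictionary is the shape one would pass as `MixedInstancesPolicy` when creating the group (for example through boto3), not something shown in the talk.

```python
def mixed_instances_policy(
    launch_template_name: str,
    instance_types: list,
    on_demand_base: int = 1,
    on_demand_percent_above_base: int = 0,
) -> dict:
    """Build a MixedInstancesPolicy: a small on-demand base, the rest on Spot."""
    if not instance_types:
        raise ValueError("at least one instance type is required")
    if not 0 <= on_demand_percent_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Several instance types give the group more Spot pools to draw from.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Instances that always run on-demand, regardless of group size.
            "OnDemandBaseCapacity": on_demand_base,
            # Share of capacity above the base that stays on-demand (0 = all Spot).
            "OnDemandPercentageAboveBaseCapacity": on_demand_percent_above_base,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


if __name__ == "__main__":
    # "dev-env-template" and the instance type list are hypothetical examples.
    policy = mixed_instances_policy("dev-env-template", ["c5.xlarge", "c5a.xlarge"])
    print(policy["InstancesDistribution"])
```

With one on-demand instance as the base and everything above it on Spot, a reclaimed Spot instance is replaced by the group without the baseline going away.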
06:32 - Demo - Spot Instance Console
Let me show you the new and improved Spot instance console. So where do you actually find the Spot instance console? That, I think, is the big problem. When you go here, on the front page, there’s nothing that attracts you to Spot Instances. You see your resources and instances, and these are by default on-demand instances. Spot Instances are tucked all the way here in a small corner on the left side. You click Spot Instances, and you see this page. This page gives you a couple of different options. The first thing that I want to draw your attention to is pricing history. You can see at a quick glance what sort of discounts you can get. What I do personally when I’m doing this sort of testing is take the date range, go back about three months, and look at the historical pricing data. I’ll give you an example of the c5.xlarge that I tested out for my own work. For c5.xlarge, the on-demand price is $0.17. For the us-east-1 region, the price is currently $0.07, which is like a 52% discount. And as you can see from the graph here, it hasn’t budged above $0.10 for three months. If I plan smartly, I can give a bid price of $0.10 and know that my Spot instance will in all probability not be reclaimed. You can also go and check how likely it is that your request will be granted. Just go to this Spot placement score tab and enter your requirements. Let’s say I need two instances for my dev environment and I want them to be c5.xlarges. It doesn’t hurt to get some additional capacity, so I’ll just add a c5.xlarge as well.
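The bid-price reasoning from the pricing history step can be sketched like this: take the highest Spot price seen over the look-back window, optionally add some headroom, and never bid above on-demand. The function name and the sample prices are made up for illustration; real history could come from the console graph or from the EC2 `describe_spot_price_history` API.

```python
from typing import Iterable


def suggest_max_price(
    history: Iterable[float],
    on_demand_hourly: float,
    headroom: float = 0.0,
) -> float:
    """Suggest a maximum Spot price from historical hourly prices.

    headroom is a fraction added on top of the observed peak (0.1 = 10%).
    The result is capped at the on-demand price, since bidding above it
    would remove the point of using Spot.
    """
    prices = list(history)
    if not prices:
        raise ValueError("price history is empty")
    peak = max(prices)
    return round(min(peak * (1 + headroom), on_demand_hourly), 4)


if __name__ == "__main__":
    # Illustrative numbers only, loosely following the talk: prices that
    # never went above $0.10 over three months, with on-demand at $0.17.
    sample_history = [0.062, 0.07, 0.081, 0.1, 0.074]
    print(suggest_max_price(sample_history, on_demand_hourly=0.17))
```

A higher headroom lowers the chance of interruption due to price at the cost of some savings, which is the same trade-off discussed for the maximum price field in the request form.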

Alright, so I’ve got two instances that I want to check. Let’s load the placement scores. For these, you get a score from one to 10. If I make some changes, now I can see that I have a better score and that my request is likely to be successful. It’s not a lot, and it’s still on the lower end, but there’s a chance that my request will be fulfilled and I can get these two instances. There’s also a Spot Blueprints section that they’ve added recently. These are quick and easy templates based on Terraform or CloudFormation that you can use to set up a Spot instance requirement pretty easily. You can pick the EC2 Auto Scaling group, and it will give you a template to download, Terraform or CloudFormation, that uses a combination of on-demand and Spot EC2 instances to provide this. This is pretty neat, and you can also create your own blueprints to use in the future. Here, you go back, and if you want to create a Spot instance, you just click Request Spot Instances. It will take you to this panel where you can give some parameters for the launch. Here you have a couple of different options. You can select one or two instances and set the maximum cost for a Spot instance. You can set whatever pricing you see fit. You can go with the average price for the past week, or you can play it smart and use a much higher price that cuts into your savings but ensures that your instances most likely won’t get reclaimed. Here’s one thing that you can set. You can click this and it gives you a lot of different options, which is something new. You have interruption behavior, which gives you three options. You can have AWS terminate your instances, which is the legacy behavior and the default. You can have AWS stop the instances instead, with some automation script set up in the back where a Lambda function can poll the Spot price.
Once capacity is available again, you can start the instances back up. You can also hibernate them, which is an EC2 option where, on hibernation, EC2 saves everything that is in memory and shuts down. When the instance is brought back up, it has all of that state. It acts like the instance never terminated and can pick up the work it was doing. These are a couple of different options. Capacity rebalance is also a new feature. This lets you decide how AWS will rebalance the EC2 instances. There are two options. The first is launch only. When AWS gives that warning that an instance might be reclaimed, it will launch a replacement instance but will not terminate the instance that received the recommendation from AWS. Now, the instance that got the warning either gets reclaimed by AWS or it does not. Essentially, you create a high-availability backup for your Spot instance: you bring up a replacement, and the original instance may or may not go down. The second is launch before termination, where you essentially say that if an instance gets a reclaim signal from AWS, it will launch a replacement, and once the new instance comes up, it will terminate the original one. These are a couple of different ways that AWS has added a lot more value to Spot Instances and given you control over how they behave. That gives Spot Instances a lot more credibility as actually useful in a lot of working environments. For the fleet that I requested, which has a target capacity of two instances, I have a strong chance of getting these instances, and the estimated hourly price for the two is $0.067, which is by default like a 72% savings, which is incredible. We all know that this is possible with Spot Instances, but the new features that AWS has provided with Spot, as well as some improvements in how Spot Instances are reclaimed, make a real difference.
Spot is now a much better fit for development and product testing workloads, and also, in combination with on-demand, for a lot of different containerized options in production environments. I think Spot Instances are pretty underrated. With some clever engineering, you can potentially bring down your AWS cost by half. At least in the couple of POCs that I’ve done on my own projects, and with some clients in the early phases of testing, we can definitely see savings of 40-50% just by making some smart changes to their dev and staging environments. So yep, pretty excited about Spot Instances. I’m using them now in my own development POCs, and I hope you guys also do some testing with them, see how you can utilize them, and use them going forward.
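For anyone trying this outside the console, the maximum price and interruption behavior choices from the demo can be sketched as launch options. The demo itself uses the Request Spot Instances form; this example instead assumes a per-instance launch, building the dictionary that the EC2 `run_instances` call accepts as `InstanceMarketOptions`. The helper name is hypothetical.

```python
VALID_BEHAVIORS = {"terminate", "stop", "hibernate"}


def spot_market_options(max_price: float, interruption_behavior: str = "terminate") -> dict:
    """Build InstanceMarketOptions for a Spot launch with a price cap.

    Stop and hibernate only apply to persistent Spot requests, so the
    request type is switched accordingly; terminate uses a one-time request.
    """
    if interruption_behavior not in VALID_BEHAVIORS:
        raise ValueError(f"unknown interruption behavior: {interruption_behavior}")
    if max_price <= 0:
        raise ValueError("max price must be positive")
    request_type = "one-time" if interruption_behavior == "terminate" else "persistent"
    return {
        "MarketType": "spot",
        "SpotOptions": {
            # The API expects the price as a string, in USD per hour.
            "MaxPrice": f"{max_price:.3f}",
            "SpotInstanceType": request_type,
            "InstanceInterruptionBehavior": interruption_behavior,
        },
    }


if __name__ == "__main__":
    # $0.10 is the bid price discussed in the demo for a c5.xlarge.
    print(spot_market_options(0.10, "stop"))
```

Hibernation also has its own prerequisites on the instance side, so treat the `hibernate` value as something to verify against your instance type and AMI before relying on it.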

Jasmeet Singh

Saad Lodhi

Senior Solutions Architect

nClouds

Saad joined nClouds in 2018 as a Senior Solutions Architect. He holds several AWS Certifications including Big Data - Specialty, Solutions Architect - Associate, Developer - Associate, and Cloud Practitioner.