nSights Talks

Terraform Compliance with Open Policy Agent (OPA)

Tutorial Highlights & Transcript

00:00 - Beginning of Video
Hello, my name is Carlos, and today I want to talk to you about compliance, specifically how to achieve compliance with Terraform using OPA.
00:21 - What is Compliance?
So, let’s get started by defining things: what is compliance? What are we trying to do? The important part here is, when we try to be compliant with something, it means that we’re following certain guidelines or specifications that are already set by somebody else. I put some examples here; I’m sure we’ve all had projects that require either SOC-2 or HIPAA compliance. Those come with their own guidelines and specifications, right? For HIPAA, you need to do certain things; for SOC-2, you need to do other things. We also have the Well-Architected Framework here, and we will probably have Well-Architected projects. That comes with its own set of guidelines that we need to follow in order to be compliant with the framework.
01:15 - How Does It Work?
So how does it work? Basically, we have a set of rules. And then once we have our infrastructure or our environment, we inspect it or not necessarily we, maybe the auditing firm that’s going to do the SOC compliance for a client, for example. They will go in and investigate and check that everything matches whatever rules they specify. And then if that’s true, then you get like a blue check there like your SOC-2 compliance or HIPAA compliance, something like that. Okay.
01:54 - nOps Tool to Check Compliance
So we already have a tool to check for these types of things. So if you have used nOps before, you notice on the reports, here, we have, like, SOC-2 readiness report, Well-Architected report, we have a HIPAA readiness report. So we can check for these things as soon as we have our environment. Like if we connect nOps, then we will get a report back and it will tell us like, where are the violations? What are we missing to achieve this compliance?
02:28 - What if?
But what if we could know these things before we actually create anything, right? What if we were compliant from the code itself that is creating everything? We’re already trying to do that: if you have used our codebase, then on our nCode repos we have some security checks in place already, trying to make everything that we build secure and compliant with certain things. But we are adding something extra to check for more specific things, not necessarily just security features, for example. And that is called OPA. That’s the tool that I wanted to talk to you about today.
03:18 - OPA
So OPA is the Open Policy Agent. It’s a framework that allows control and administration over environments. If we define our policies or our rules as OPA policies, then we can run those against our stuff before it even runs. For example, we can write a policy that will run against our Terraform changes, so even before we attempt to deploy something, we can know: okay, this change that I’m trying to introduce is not fully compliant with our company’s rules, or with the rules of the organization that we’re trying to follow in general. So that is OPA. It’s an open source project. We write our policies in a language called Rego, in .rego files. Because it’s code, we can basically check for anything, and it has a Terraform integration. So that’s how we are using it right now.
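To give a feel for what a Rego policy looks like, here is a minimal sketch of a rule that runs against a Terraform plan. The package name and the message are made up for illustration; only the resource_changes field is part of the real Terraform plan JSON format.

```rego
# Minimal illustrative policy (hypothetical package and message).
# Given a Terraform plan JSON as input, deny any change that would
# destroy a resource.
package terraform.example

import input as tfplan

deny[msg] {
    # resource_changes is part of the Terraform plan JSON representation
    r := tfplan.resource_changes[_]
    r.change.actions[_] == "delete"
    msg := sprintf("resource %v would be destroyed", [r.address])
}
```

Any entry added to the deny set counts as a policy violation, and the message tells the author what went wrong.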
04:28 - nCode Library
So what do we have right now in our codebase? We run some checks for linting and formatting of the code using TFLint and terraform fmt. We have security checks running with TFSec. That runs as a pre-commit hook, and it also runs on pull requests and releases, when we push and merge, in GitHub Actions. We also run terraform validate and plan, and then an apply/destroy on releases, so we know everything is working. And now we are adding OPA compliance checks. We’re focusing on the Well-Architected Framework right now, but we could extend this to check for HIPAA-specific things or SOC-2-specific things, because it gives us that freedom; we just need to code it. So that’s what we have.
05:40 - What are we building in the demo?
Now, let’s go to a quick demo. I just want to show you how it works, how we can write a policy, and how we are using it. So what are we going to do? I have here a screenshot from nOps of a SOC-2 report, and we’re just going to fix one specific thing. If you look at the bottom of the screenshot, it says unencrypted AWS S3 buckets. That means that nOps inspected the infrastructure and detected a bucket that doesn’t have encryption, and that is not compliant with SOC-2. So with OPA, what we can do is run a check on the Terraform code that creates this bucket and make sure that it is encrypted before it actually goes through the CI/CD pipeline to run Terraform plan and apply. So let’s go to the demo.
06:30 - Demo
Here I have a Terraform configuration to deploy a Kubernetes cluster. Under infrastructure is where we have our templates. So we have our networking stack; we have a bucket for our VPC flow logs; we have some security groups and some roles; we have our Kubernetes cluster and our worker nodes, using managed node groups and Fargate profiles; and a couple of other things. We’re not going to focus too much on everything we have here right now; we are just trying to fix the bucket problem.

So here we are creating a bucket to store the VPC flow logs, and you’ll notice that we’re using our codebase. So this bucket is compliant, because by default that module has encryption enabled. If I run the policy checks against this code, it should pass all the checks, because there shouldn’t be anything broken with it.

Now, how does OPA work? First, you need to have it installed; I have OPA installed on my system. It’s a CLI tool, and the way it works is we run a command, opa test, and then we give it a directory. In this case it is policies/infrastructure; that’s my directory. Let’s see what we have there. If we go to policies/infrastructure, we have separated this into the different pillars of the Well-Architected Framework, so we have some policies to check for security, for reliability, for operations, and for cost. Let’s take a look at security, for example. Here we have, first, enforce-inline-policy. This is our policy written in Rego; that’s the OPA format. This one is checking for inline policies in roles. As per the Well-Architected Framework, we shouldn’t have any inline policies; everything should be a managed policy. So here we are inspecting the resources of type aws_iam_role and checking if they have an inline policy attached to them. If our inline policy list is empty, then we’re good. But if any role in our configuration has an inline policy defined, then we flag it as an error and we print this message: not allowed to create inline policies, blah, blah, blah.
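The inline-policy check described above could be sketched in Rego roughly like this. The package and rule names are assumptions for illustration, not the exact nCode implementation; aws_iam_role and its inline_policy attribute are real Terraform names.

```rego
# Sketch of the inline-policy check (assumed package and rule names).
package terraform.security.iam

import input as tfplan

# Collect every aws_iam_role in the plan that defines an inline_policy block
roles_with_inline_policy[r] {
    r := tfplan.resource_changes[_]
    r.type == "aws_iam_role"
    count(r.change.after.inline_policy) > 0
}

deny[msg] {
    r := roles_with_inline_policy[_]
    msg := sprintf("Not allowed to create inline policies: %v", [r.address])
}
```

If no role has an inline policy, the roles_with_inline_policy set is empty, the deny rule never fires, and the check passes.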

So this is an example policy. If you notice, it comes in two files: the policy, and then the _test file with the same name. That one is just invoking the test; that’s the convention, right? So we define it in a package, terraform.security.iam, and then we test our policies and run the check. One important part here is this with input. What OPA is checking is the Terraform plan object. If you have ever used Terraform and run terraform plan, that will output the whole plan to the terminal, but you can choose to save it, and if you save it, then you can see that plan as JSON. What we’re giving OPA as its input is the whole Terraform plan JSON object. Based on that, it’s able to search through it and perform the checks that we defined in our code; for example, here we check for inline policies.
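A _test file along those lines might look like the sketch below: it builds a tiny fake plan object, feeds it in with the "with input as" clause, and asserts that the deny rule fires. The mock data and rule names here are assumptions for illustration.

```rego
# Sketch of the *_test.rego convention (mock data, assumed rule names).
# Test rules live in the same package as the policy they exercise.
package terraform.security.iam

# A minimal fake Terraform plan containing one role with an inline policy
mock_plan := {"resource_changes": [{
    "address": "aws_iam_role.bad",
    "type": "aws_iam_role",
    "change": {"after": {"inline_policy": [{"name": "inline"}]}},
}]}

# opa test runs every rule whose name starts with test_
test_inline_policy_denied {
    count(deny) > 0 with input as mock_plan
}
```

The "with input as" clause is what substitutes the mock plan for the real Terraform plan JSON during the test run.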

So what we are going to add right now is a policy to check if our buckets have encryption. Here I already have a file called enforce-encryption-at-rest, and I’m going to add more code to it. I already have checks for encrypted EBS volumes and for encrypted RDS instances, so I’m going to add a check for S3 buckets as well. I have that right here. So what did I just add? I declared an object called s3_buckets. This syntax might be a little confusing, but what this is, basically, is a filter: I’m filtering all of my resources and looking for the ones that are of type aws_s3_bucket. Based on this filter, I have a list right here of all the S3 bucket objects that I’m trying to create or modify in the current Terraform operation.

Now that I have my object with the buckets, what we can do is add a check. We’re going to deny any changes that involve a bucket that is not encrypted. So we open up the deny block, then we take the s3_buckets object that we defined before, which we know contains all the buckets that we’re creating as part of the Terraform plan operation, and we look at whether they have encryption enabled. This part right here might look a little confusing, because we’re grabbing r, which is one instance of that bucket, and then we’re looking at its changes. After this, it is Terraform plan syntax; if you have looked at Terraform state or at a Terraform plan before, you have probably seen these objects in the JSON. This is just so we know where to look. Then we’re looking for server_side_encryption_configuration, which is an attribute of the aws_s3_bucket object in Terraform. So what we’re doing here is checking every bucket and counting how many server_side_encryption_configuration blocks it has inside. If that count is equal to zero, the bucket doesn’t have any encryption configuration, and then we deny it with this block and print the message: encryption at rest is not enabled for, and the name of the bucket.
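Put together, the filter and the deny rule just described could be sketched like this. The package name is an assumption; aws_s3_bucket and server_side_encryption_configuration are the real Terraform names mentioned above.

```rego
# Sketch of the encryption-at-rest check for S3 buckets (assumed package name).
package terraform.security.encryption

import input as tfplan

# Filter: the set of all S3 buckets being created or modified in this plan
s3_buckets[r] {
    r := tfplan.resource_changes[_]
    r.type == "aws_s3_bucket"
}

deny[msg] {
    r := s3_buckets[_]
    # Zero server_side_encryption_configuration blocks means no encryption
    count(r.change.after.server_side_encryption_configuration) == 0
    msg := sprintf("Encryption at rest is not enabled for %v", [r.address])
}
```

Note this checks the aws_s3_bucket resource's own encryption block; a stricter version could also look for separate aws_s3_bucket_server_side_encryption_configuration resources.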

Okay, so now that we have the policy implemented, I can run the OPA checks against it. To do that, I need to get the Terraform plan as JSON, and for that you actually run two commands. First, we run terraform plan with the -out flag to save it; this saves it as a binary file, so that generates a plan.bin. Then we run terraform show with plan.bin as the input, and we can save its output as a JSON file. To save us some time, I already did that. Here to the left, I have two Terraform plan objects. The first one is a plan that passes all the checks; that’s how the templates are right now. Then what I did is I added an S3 bucket, just another one, so it would fail the checks. Notice that I’m creating this one just as a resource, not using a module, and I want this one to fail, so I didn’t define any encryption configuration for it. So now I have two plan objects: one that has an object that doesn’t comply with my checks, and one that complies with everything. When I was writing the command, I told you it was opa test and then a directory. This will look through everything it finds in that directory, and the Terraform plan has to be in that same directory. So I’m going to grab the plan-pass JSON file and put it in the policies folder; here it is. Now I can run the opa test, and you see here we have a pass, seven out of seven, because across all my folders, in total, I’m defining seven policies. Now, what happens if we change to the plan that fails? I’m going to put this one in here and take the other one out. Remember, this one has a bucket without encryption, so it doesn’t comply. If we run this now, we get an error. Let’s see what we got. Oh, it’s actually failing two things, because the bucket is also not tagged: it’s failing mandatory tags, and it’s failing encryption at rest.
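The two-command sequence plus the test run could look roughly like this; the file and directory names are the ones used in this demo, so adjust them to your own layout.

```shell
# Save the plan as a binary file
terraform plan -out=plan.bin

# Convert the binary plan to the JSON representation OPA reads
terraform show -json plan.bin > plan.json

# Drop the plan JSON into the policies directory, then run all the tests
cp plan.json policies/infrastructure/
opa test policies/infrastructure -v
```

The -v flag makes opa test list each policy test individually, which is how you get the per-check pass/fail output shown in the demo.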
And it prints the message that I showed you in the code before. It says encryption at rest is not enabled for aws_s3_bucket.failed_test, and that is the name of my object right here; the bucket named failed_test failed the test. We can use this result to give some information back to the person making the changes: hey, your changes do not comply with certain policies that we have. And it’s really powerful, because right now we are writing policies for the Well-Architected Framework, but we could just as well write a policy specific to a client or specific to a project. Well, that’s it for the policies; that’s what I wanted to show you.

Now, some of you may be wondering: okay, that’s a tedious setup step to perform, getting the Terraform plan, then feeding it to the opa test command and checking the results. So we have already automated this, and it’s part of our GitHub Actions workflow. We are still writing some policies; here you see we have seven, and I think we have some more, I just don’t have all the changes locally. I want to show you how it looks on the pipeline. Here we have a pull request; this was already merged. As part of the GitHub Actions workflow, we are running the OPA compliance report, and we’re making it comment this small report back on the pull request. Here we have the category of security, and we pass three of the three tests, so no failures; operations, two out of two pass; reliability, two out of two. And here at the bottom, if something was failing, for example security at two out of three passing, that means one out of three is failing, and it gives you the comment here, like which policy is failing. This one is failing operations, two out of two, and it gives you a little description, like not allowed to create inline policies in role resources, not allowed to do something else, right. So this is how the report looks on a pull request. Here we have another one; in this one we have one extra policy, so it gives us one more row, and here everything is passing, so we know it’s good and we let it go through. Okay, so that’s what I got for today.
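A workflow that automates the plan-then-test sequence on every pull request might be sketched like this. This is not the actual nCode workflow; the job name, triggers, and step layout are assumptions, though setup-terraform and setup-opa are real published actions.

```yaml
# Hypothetical GitHub Actions workflow sketch: generate the plan JSON
# and run the OPA policies on every pull request.
name: opa-compliance
on: [pull_request]
jobs:
  opa:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: open-policy-agent/setup-opa@v2
      - run: |
          terraform init
          terraform plan -out=plan.bin
          terraform show -json plan.bin > policies/infrastructure/plan.json
          opa test policies/infrastructure -v
```

Posting the results back as a pull-request comment, as shown in the demo, would take an additional step that reads the opa test output and calls the GitHub API.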


Carlos Rodríguez

DevOps Team Lead


Carlos has been a Senior DevOps Engineer at nClouds since 2017 and works with customers to build modern, well-architected infrastructure on AWS. He has a long list of technical certifications, including AWS Certified DevOps Engineer - Professional, AWS Certified Solutions Architect - Professional, and AWS Certified SysOps Administrator - Associate.