nSights Talks

Deep Dive into YAML

Tutorial Highlights & Transcript

00:00 - Beginning of Video
So it’s not necessarily a deep dive. I was actually looking at a bunch of stuff that we could do a demo about, and then I thought I’d go back to the basics. And just because we use it a lot nowadays: what’s up with YAML, right? Interestingly enough, when I Googled a bunch of YAML stuff, I found a lot of things people don’t like, not necessarily what they like. One of the things somebody stated was that a YAML file is almost always still valid, even if it’s truncated. Meaning, when you do a cut and paste of a YAML file and you miss something, whatever you end up with is still valid, but it doesn’t make any sense anymore, right? So there’s nothing in a YAML file that protects you from corrupted YAML files or any of that. So that’s the first one.
01:08 - Overview of YAML
Of course, we use YAML all the time: it’s Ansible, Kubernetes, Jenkins, Compose, and you can even write CloudFormation in it. We use it so often, but maybe we never really looked into what YAML actually is, or was, and how it was created. I will not talk much about that; I will give you a bunch of links where you can look things up. But here are a few things I found. I knew they probably existed, but I never really thought about them.
01:40 - What is YAML?
So what is YAML? It’s a data serialization language that is often used for writing configuration files. That’s what we do most of the time, right? You can even use it for data itself, but most of the time we use it for configuration files. And again, that’s something I got off the internet: some say it’s slightly better than Windows .ini, others basically say the biggest advantage over JSON is that you can write comments, but I mean, who does that really? So what does YAML actually have? That’s sequences, scalars, and mappings. And what we basically use is a colon, a dash, the pound sign, three dashes, and three dots. The three dots I only found out about while I was researching, because I’ve never seen them before in any YAML file. But that’s supposedly what you use to mark the end of a document within a file. And then there is more to that: there are two different ways of writing YAML.
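As a rough illustration of the building blocks just listed, here is a minimal sketch in Python. It assumes the PyYAML package is installed; the keys and values (name, ports) are made up for the example and are not from the talk.

```python
import yaml  # PyYAML, assumed to be installed (pip install pyyaml)

# Hypothetical content: two documents in one stream.
TEXT = """\
# The pound sign starts a comment.
---
name: demo        # a mapping entry: key, colon, scalar value
ports:            # the value of this key is a sequence
  - 80            # each dash introduces one item
  - 443
...
---
name: second
"""

# Three dashes start a document and three dots end one, so this
# stream contains two documents. safe_load_all yields one Python
# object per document.
docs = list(yaml.safe_load_all(TEXT))
print(docs)
# [{'name': 'demo', 'ports': [80, 443]}, {'name': 'second'}]
```

Mappings come back as Python dicts, sequences as lists, and scalars as strings or numbers.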
02:53 - Flow Format vs. Block Format
So the flow format, going in, is mostly just like JSON. And then you can write it like this, right?
02:58 - Demo of using YAML validator
So you go in here, YAML. And then you do Validate YAML; that doesn’t do much here. So let’s go with this: Validate YAML is fine. And then you load it and you get your YAML file, the way we see it all the time. But will it basically accept that, right? You can also do something else: you can go to Python, import yaml, and validate your YAML with yaml.load. In Python, basically, whatever you give yaml.load, the output will be, you know, your JSON-like format, right? Of course, this one will work; I just want to show it in Python. So if you push that into Python, you get the same thing, which is of course not a surprise. So yeah, those are the two forms. The flow form you see once in a while, because people will put a sequence into YAML and just use the flow form, with brackets. But what I found out, and I did not know, is that this is YAML too, right.
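The Python check described above can be sketched roughly as follows. This assumes PyYAML is installed and uses yaml.safe_load rather than yaml.load, since recent PyYAML versions require an explicit Loader argument for load. The sample data and the helper name is_valid_yaml are made up for illustration.

```python
import json

import yaml  # PyYAML, assumed to be installed


def is_valid_yaml(text):
    """Return True if the text parses as a single YAML document."""
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False


# The same (hypothetical) data, once in block form and once in flow form.
BLOCK = """\
name: demo
tags:
  - ci
  - deploy
limits:
  cpu: 2
"""
FLOW = "{name: demo, tags: [ci, deploy], limits: {cpu: 2}}"

block_data = yaml.safe_load(BLOCK)
flow_data = yaml.safe_load(FLOW)

print(block_data == flow_data)        # True: both forms give the same data
print(json.dumps(block_data))         # the JSON-like view of the result
print(is_valid_yaml("key: [unclosed"))  # False: the bracket is never closed
```

Both forms load to the same Python dict, which is why a validator can reformat one into the other.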
04:13 - YAML types
So that’s actually explicitly saying which type is what, right? It says here: this is a map, this is a string, this is an integer, et cetera. So you can write it that way as well. So if you really want to hide what you did in YAML, write your GitHub Actions like this. I mean, I haven’t tried it, but I think that’s supposed to work. So, Validate YAML. Oops, there’s a bug in here, and this is it. Okay, so that’s something else I wanted to talk to you about a little bit. So it’s valid YAML; you format it, you get the same thing. You can even do it here, and there you go. So given that you’re in Python, YAML will do that, right? So if you want to see what types are available in YAML, you go to yaml.org/type. And you will realize that most of the types won’t work. That is because a lot of YAML interpreters, or YAML libraries, don’t really implement all of YAML.
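A small sketch of explicit type tags, assuming PyYAML is installed. The document below is a made-up example, not the one shown on screen in the talk; it only uses the !!map, !!str, !!int, and !!float tags.

```python
import yaml  # PyYAML, assumed to be installed

# Hypothetical document with explicit type tags on every node.
TAGGED = """\
!!map
name: !!str demo
replicas: !!int "3"
version: !!str 1.10
ratio: !!float 1
"""

# The same values without tags, for comparison.
UNTAGGED = """\
replicas: "3"
version: 1.10
ratio: 1
"""

tagged = yaml.safe_load(TAGGED)
untagged = yaml.safe_load(UNTAGGED)

print(tagged)
# {'name': 'demo', 'replicas': 3, 'version': '1.10', 'ratio': 1.0}
print(untagged)
# {'replicas': '3', 'version': 1.1, 'ratio': 1}
```

The tag overrides what the loader would otherwise guess: the quoted "3" becomes an integer, and 1.10 stays a string instead of turning into the float 1.1.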
05:33 - YAML is not language-agnostic
And so the next step is, when you go in and check some specific YAML files, like this one, you will see that some YAML interpreters will interpret them correctly and others won’t. This one, for example, says you can’t have more than one document. So it might be written with a library that doesn’t actually support that. This one says it’s valid, and then, okay, here it goes. And here are a couple of things I want to point you to. One is, it uses a list as a key. That’s something that, for example, Python will not tolerate. The other thing you see here is that this string turns into a date format. It automatically assumes that this one is a date and converts it. But that is one of the issues a lot of people on the internet tell you about: it’s not perfect. Sometimes it works, sometimes it doesn’t. And there are a lot of people complaining about that and noting that it cost them a lot of time to debug. Again, Ruby, for example, works differently: Ruby would process that, but it will ignore the rest of the documents. So especially if you debug some Kubernetes stuff and you go to one of the debuggers, you are usually better off debugging document by document and not the whole file. So that’s just one of the things I learned.
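The three behaviors mentioned above (date conversion, a list used as a key, and more than one document) can be reproduced with a short sketch. It assumes PyYAML is installed; the inputs are made up, and try_load is a hypothetical helper written for this example. Other libraries and languages may behave differently, which is the point of this section.

```python
import datetime

import yaml  # PyYAML, assumed to be installed


def try_load(text):
    """Return (data, None) on success, or (None, error class name) on failure."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        return None, type(exc).__name__


# 1. An unquoted value that looks like a date is converted automatically.
dates, _ = try_load("released: 2001-12-14\nquoted: '2001-12-14'\n")
print(type(dates["released"]).__name__)  # date
print(type(dates["quoted"]).__name__)    # str

# 2. A list as a mapping key is legal YAML, but Python lists are not
#    hashable, so PyYAML cannot build a dict from it.
_, list_key_error = try_load("? [a, b]\n: value\n")
print(list_key_error)

# 3. safe_load expects exactly one document; a second one is an error.
MULTI = "a: 1\n---\nb: 2\n"
_, multi_error = try_load(MULTI)
print(multi_error)

# Loading document by document works.
print(list(yaml.safe_load_all(MULTI)))  # [{'a': 1}, {'b': 2}]
```

Quoting the value is the usual way to keep a date-like string as a string.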
07:25 - YAML caveats and links for more info
So the next thing is not really related to what we mostly do, which is writing files for specific systems like GitHub Actions, et cetera. But you should always be aware that YAML in itself is not safe, right? It’s insecure. So if you do this, there you go: it actually runs the script. And of course, ls is, you know, not a bad script, so it’s fine, but you don’t know what’s in there, right? So a lot of people say, if you create a little Python script that actually reads YAML, you should basically use safe_load, not load. The same is true, by the way, for Ruby; I just don’t do much Ruby, so I didn’t demo that. It means that a lot of companies, or a lot of Ruby implementations, were at one point all insecure because of that, because they load that YAML file and basically run it without asking, right?
There’s also some more weird stuff here, for example, YAML has some funny booleans. So if you go here and do another yaml.load, these are all false. YAML will do that to you, depending on the interpreter or the libraries you have. People found out about that on the internet: somebody said, okay, NO, or no, is what you always put for Norway, so I use that. And it turns out that it got translated to false, so their logic was messed up and they had to debug quite a lot. The same thing, of course, is true for the true values, and one of them, for example, is on. And if you look at GitHub Actions, they have an on key, but of course they interpret it differently. And there’s more stuff going on. I will show you the links, and there’s a bunch of pitfalls here. Here’s a good summary of pitfalls. Then there’s YAML multiline, for the thousands of ways, well, not necessarily thousands, but a lot of ways to do multi-line text. And then there’s also a funny website here that just, you know, makes you think twice about YAML.
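Both pitfalls above can be checked with a short sketch, assuming PyYAML is installed. The untrusted input is a typical example of a Python-specific tag and is an assumption, not the exact file from the demo. The sketch only uses safe_load, so nothing is executed.

```python
import yaml  # PyYAML, assumed to be installed

# A document that asks the loader to call os.system("ls").
# An unsafe loader may run it; safe_load refuses the tag instead.
UNTRUSTED = '!!python/object/apply:os.system ["ls"]'

try:
    yaml.safe_load(UNTRUSTED)
    rejected = False
except yaml.YAMLError:
    rejected = True
print(rejected)  # True: safe_load does not construct Python objects

# The funny booleans: unquoted no/on are read as booleans by PyYAML.
# The keys below are made up for illustration.
FLAGS = """\
countries: [de, no, se]
quoted: [de, "no", se]
on: push
"""
flags = yaml.safe_load(FLAGS)
print(flags["countries"])  # ['de', False, 'se']
print(flags["quoted"])     # ['de', 'no', 'se']
print(list(flags.keys()))  # ['countries', 'quoted', True]
```

Quoting the value keeps it a string. Note that the key on also turns into True here, which is why tools that use an on key have to treat it specially.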
I mean, we don’t have a choice, really. But yeah, just be aware of those things when you debug YAML. But there’s one thing that is really cool, and that is especially useful for us when we write long, long, long pipelines: anchors and aliases. So you can define an anchor. Say, okay, here I define my defaults, and then I give them an anchor; it’s called defaults here, but it can be a different name, that’s fine. Then it says the database adapter and host are this, and then I can reference them over and over again. So one way is just using the same values as they are, right? So this is true, and also this is true. That’s basically the same; I just added it in here because one of the YAML interpreters does it a little bit differently. You can also append, for example, if you want to add the port, or you can overwrite stuff. So if we take that part of the code and go in here, let’s see if that one works. Validate YAML says it’s valid YAML. And so that’s what you end up getting, right? You can also write it like this, but this YAML interpreter does it a little bit differently: it tells you that it’s going to use that ref. So you just have to be aware of that. But it works, so there is no issue with that. You can also do that in Python: you can do a yaml.load and then get this. So you see that it replaces the references and uses the anchors, right. So yeah, that’s all about YAML for today.
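The anchors and aliases described above can be tried in Python with a sketch like this one. It assumes PyYAML is installed; the keys and values (adapter, host, port, and the section names) are placeholders, not the exact file from the demo.

```python
import yaml  # PyYAML, assumed to be installed

# "&defaults" defines the anchor, "*defaults" is an alias that refers
# back to it, and "<<" merges the aliased mapping into the current one.
CONFIG = """\
defaults: &defaults
  adapter: postgres
  host: localhost

same_as_defaults: *defaults

development:
  <<: *defaults
  port: 5432          # append a key that the defaults do not have

test:
  <<: *defaults
  host: test-host     # overwrite a key from the defaults
"""

config = yaml.safe_load(CONFIG)

print(config["same_as_defaults"])
# {'adapter': 'postgres', 'host': 'localhost'}
print(config["development"])
# {'adapter': 'postgres', 'host': 'localhost', 'port': 5432}
print(config["test"])
# {'adapter': 'postgres', 'host': 'test-host'}
```

After loading, the references are resolved, so the result contains plain dicts. Keys written next to the merge take precedence over the merged defaults.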
Jasmeet Singh

Peter Mooshammer

Senior DevOps Engineer

nClouds

Peter has been a Senior DevOps Engineer at nClouds since 2016. He is a familiar member of the Bay Area tech community as the co-organizer of various meetups including the Bay Area Infracoders, Silicon Valley Code Camp, and DevOpsDays.