
What is YAML, and why is it used for data serialization?

Oct 21, 2021 | Announcements, Migration, MSP

YAML is a straightforward, machine-parsable data serialization format designed for human readability and interaction, and libraries for it exist for most programming languages. It is popular because it suits data serialization, formatted dumping, configuration files, log files, and internet messaging and filtering. Because YAML is not tied to any one programming language, it is often used to write configuration files. YAML files are easy to work with in a text editor, portable between programming languages, expressive, and extensible. They support the Unicode character set and perform many of the same functions as XML or JSON.

However, YAML is not without its quirks.

One of YAML’s quirks is that a damaged file can fail silently, which is a problem for any configuration file. For example, if a YAML config file is truncated because of an error during write or transmission, the resulting broken file will often still parse as valid YAML. This is not a problem if the error has inadvertently created the long-sought-after equation for the theory of everything, but more likely it has created a bug that will take time to find and exterminate. In other words, YAML offers no built-in protection against corrupted files.
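As a rough illustration of the truncation problem, the sketch below shows a made-up config (every key name is hypothetical) next to a copy cut off partway through. Both documents parse. One possible mitigation, offered only as a convention and not something YAML enforces, is to end each file with the document end marker (...) and have the consuming program reject files that lack it.

```yaml
# Document 1: the file as intended (all keys here are hypothetical).
---
server:
  host: localhost
  port: 8080
  tls:
    enabled: true
    cert: /etc/ssl/server.pem
...
# Document 2: the same file cut off after "port". It still parses,
# silently dropping the tls section and the end marker.
---
server:
  host: localhost
  port: 8080
```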

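Another quirk with YAML is that it is not secure by default. Loading a user-provided (untrusted) YAML string needs careful consideration, because some parsers can construct arbitrary objects from tagged input unless a restricted (safe) loading mode is used. Sensitive values in YAML files can be protected, however, in tools that support encrypted properties. In MuleSoft’s Anypoint Studio, for example, go to File -> New -> File and create a new YAML file, then define secure properties in the file by enclosing each encrypted value in the sequence ![value].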

That said, YAML has distinct advantages.

One advantage is that YAML is concise in syntax and punctuation, which makes it practical to write by hand rather than generate by machine. For example, the basic structure of a YAML file is a mapping (a hash map) of one or more key-value pairs. You can set another mapping as a value by indenting the nested keys under their parent key. The basic structure builds on simple syntax for sequences, scalars, and mappings:

:      Separates a key from its value
-      Denotes a sequence entry
#      Starts a comment
---    Starts a document
...    Ends a document
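To see the indicators in context, here is a minimal sketch of a single document that uses each of them. The key names and values are invented for illustration.

```yaml
# A comment starts with a hash and runs to the end of the line
---                      # starts a document
name: example            # a colon separates a key from its value
items:                   # this key's value is a sequence
  - first                # a dash and a space denote a sequence entry
  - second
...                      # ends the document
```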

These dots, dashes, hash signs, colons, and whitespace act as structural indicators and form the basis of two styles of YAML, Flow and Block:

This is the YAML Flow format:

---
{a: 1, b: {c: 3, d: 4}}

And this is the YAML Block format:

---
a: 1
b:
  c: 3
  d: 4

And this is also YAML, the same data written with explicit type tags:

%YAML 1.1
---
!!map {
  ? !!str "a"
  : !!int 1,
  ? !!str "b"
  : !!map {
    ? !!str "c"
    : !!int 3,
    ? !!str "d"
    : !!int 4
  }
}

See:
https://yaml.org/type/

The YAML Flow format seems to suit coders with a more mathematical mindset, while the YAML Block format is a better fit for coders who lean toward a more literary, outline style of writing. Another advantage is that a single YAML stream can hold more than one document, each introduced by ---. YAML also allows free-standing comments anywhere in a pipeline file. YAML is language-agnostic only to a certain extent, though: support for some of these features varies between implementations.

Example:

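Python’s PyYAML library rejects two documents passed to a single load call (load_all is needed instead), and it cannot use complex mapping keys such as sequences, because Python lists are not hashable. Try: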

---
? - Detroit Tigers
  - Chicago cubs
: - 2001-07-23
---
? Oakland A
: 2001-07-23

Ruby will process the above example but will ignore everything after the first document.

YAML’s data serialization format includes an understated advantage: anchors. YAML anchors and aliases let you reference and reuse the same data multiple times within a single YAML document, which saves time for a developer writing lengthy pipelines.

defaults: &defaults
  database: myapp_test
  adapter:  postgres
  host:     localhost

An anchor marks an item so that it can be referenced elsewhere in the same file. Anchors are created with the & sign followed by an anchor name (here, defaults). An alias, written as * followed by the same name, can later be used to reference the anchored value.

development:
  <<: *defaults

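The anchor is referenced with an asterisk, as in *defaults. The merge key (<<) copies the keys of the anchored mapping into the current one, and any number of mappings can reuse the same anchor: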

qa:
  <<: *defaults

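Keys can be appended or overwritten after the merge:

prod:
  <<: *defaults
  database: myapp_prod

Here database is overwritten with myapp_prod, while adapter and host are still inherited from defaults.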
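For readers new to the merge key, the prod entry should be equivalent to the mapping below, written out by hand. This assumes a parser that supports the merge key (<<), which comes from the YAML 1.1 type repository and is not honored by every YAML 1.2 implementation.

```yaml
# What prod resolves to once "<<: *defaults" is merged (written out by hand)
prod:
  database: myapp_prod   # overrides myapp_test from defaults
  adapter: postgres      # inherited from defaults
  host: localhost        # inherited from defaults
```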

YAML has its positives and negatives, its quirks and its advantages. And, depending on whom you ask, YAML can stand for “Yet Another Markup Language” or “YAML Ain’t Markup Language.” However a person feels about it, YAML is a popular data serialization format because it is human-readable and easy to understand.

Here are some interesting links with more information about YAML:

Fun stuff:
https://noyaml.com

And this link for multiline text:
https://yaml-multiline.info

And a good summarization:
https://www.arp242.net/yaml-config.html

