
Published 2023-02-21 17:22:50
Learn YAML With Python
thanks to freepik for wonderful images
#pythonlearning #python #pythonprogramming #programming #yaml
TL;DR
In this blog post, we explore the power of YAML, a data serialization language widely used in software development and DevOps. From understanding why you should learn YAML and its fundamental concepts to using the PyYAML library in Python, we'll cover everything you need to become proficient in YAML. Dive in and learn the essentials of YAML and PyYAML.
Why You Should Learn YAML
Are you ready to level up your software development and DevOps game? Say hello to YAML – your new best friend! Here's why mastering YAML is important for you:
- Embrace the in-demand tools: YAML is the secret sauce behind the configuration files and data structures of top-notch tools like Kubernetes, Docker, and Ansible. Mastering YAML means unlocking the full potential of these popular platforms.
- Effortless compatibility: Want to exchange data between diverse systems and programming languages? YAML has your back! Its portability makes it an essential tool for seamless collaboration and integration.
- Conquer complex data: Does the thought of managing intricate data structures give you chills? Worry no more! YAML's proficiency in handling arrays, dictionaries, and nested data structures makes managing large and complex data sets a breeze.
In short, YAML empowers you to manage and organize data like a pro, streamline your workflow, and unlock the full potential of a wide array of tools and platforms. And the best part? YAML's syntax is pretty simple, so there's no reason not to learn it.
Introduction to YAML
YAML (short for "YAML Ain't Markup Language") is a popular data serialization language used in software development, data processing, and configuration management. It is often used to define structured data in a human-readable format that can be easily parsed and understood by both humans and machines.
YAML syntax is designed to be human-friendly and easy to read. It is a whitespace-sensitive language, which means that the indentation of code blocks is significant. YAML supports scalar types like strings, integers, and booleans, as well as collections like lists and dictionaries.
In Python, you can use the PyYAML library to read and write YAML files. PyYAML provides a simple and intuitive API for working with YAML data, making it easy to parse YAML files into Python objects and vice versa.
YAML has become a popular choice for defining configuration files in many software projects because of its readability and flexibility. It can be used for a wide range of tasks, from defining database schemas and application settings to creating complex data structures.
One very common use case for YAML is creating and managing Kubernetes manifest files. Kubernetes is a popular container orchestration platform that uses YAML files to define and manage resources like pods, deployments, services, and more. With YAML, you can create, modify, and deploy Kubernetes manifests in a standardized and reproducible way.
Overall, YAML is a powerful and versatile language that is well-suited for many different use cases in software development and beyond.
YAML is a superset of JSON
YAML and JSON share a similar structure, as both formats use key-value pairs and collections to represent data.
YAML offers several features that JSON does not, such as support for comments, multi-line strings, and flexible formatting. Because of these added features, YAML is often considered to be more human-friendly than JSON and is commonly used for configuration files, data serialization, and other tasks that require readability and ease-of-use. JSON, on the other hand, is preferred for machine-to-machine communication and data exchange due to its more rigid structure and ease of programmatic parsing.
Despite their differences, YAML and JSON are often used interchangeably, and many programming languages provide libraries that can convert between the two formats. In Python, for example, the PyYAML library can usually parse JSON documents as well, because almost every JSON document is also valid YAML.
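As a small illustration of converting between the two formats, here is a minimal sketch that uses the standard json module together with PyYAML. The JSON text is made up for this example and is not part of any real configuration.

```python
import json
import yaml

# A hypothetical JSON document, invented for this sketch.
json_text = '{"name": "Rocinante", "class": "Corvette", "crew": ["Naomi", "Alex"]}'

# Parse with the standard json module, then emit the same data as YAML.
data = json.loads(json_text)
yaml_text = yaml.safe_dump(data, sort_keys=False)
print(yaml_text)

# The JSON text can also be handed straight to the YAML parser.
same_data = yaml.safe_load(json_text)
print(same_data == data)

# And YAML-loaded data can be written back out as JSON.
print(json.dumps(yaml.safe_load(yaml_text)))
```

Passing sort_keys=False keeps the keys in their original order, which makes the YAML easier to compare with the JSON input.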
Several popular tools and platforms use YAML and JSON
For example, Ansible playbooks and tasks are written in YAML, while its dynamic inventory scripts return JSON. Docker Compose files are normally written in YAML, and because JSON is valid YAML, a JSON file generally works too. GitHub Actions workflows are defined in YAML. Kubernetes uses YAML to define resources such as pods, deployments, and services, while also supporting JSON for this purpose. Terraform, a tool for managing infrastructure as code, uses a domain-specific language called HashiCorp Configuration Language (HCL) and also accepts a JSON syntax; YAML is not a native Terraform configuration format, although YAML data can be decoded inside a configuration with the yamldecode function.
YAML Essentials
In this section, we'll get into the fundamentals of the YAML language, providing you with the knowledge you need to begin working with it.
YAML syntax
Here is an example of a simple YAML file:
# This YAML file defines information about a character from The Expanse TV show
name: James Holden # string, scalar type
age: 34 # integer, scalar type
address: # dictionary
  street: Tycho Station # string
  city: Belt # string
  state: Space # string
phone: # The character's phone numbers, defined as a list of dictionaries
  - type: work # string
    number: 555-1234 # string
  - type: ship # string
    number: 555-5678 # string
In this YAML document, we have information about a character from my favorite sci-fi show called The Expanse.
The character's name and age are represented as scalar nodes, with the name being a string and the age an integer.
The address is a mapping node, which is like a dictionary, containing details such as street, city, and state.
Each of these address elements is a scalar node with a string value.
Lastly, the phone numbers are organized in a sequence node, which is a list containing dictionaries.
Each dictionary in the list has two keys, type and number, both of which are scalar nodes with string values.
The YAML document demonstrates an efficient way to organize and represent data in a clear and human-readable format. We will now learn what a node is and explore the various types of nodes available.
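To connect the description above with Python, here is a minimal sketch that loads the same character document with PyYAML and prints the Python type behind each value. The document is inlined as a string only to keep the example self-contained.

```python
import yaml

# The character document from this section, inlined as a string for the sketch.
document = """
name: James Holden
age: 34
address:
  street: Tycho Station
  city: Belt
  state: Space
phone:
  - type: work
    number: 555-1234
  - type: ship
    number: 555-5678
"""

data = yaml.safe_load(document)

print(type(data["name"]).__name__, data["name"])   # scalar string
print(type(data["age"]).__name__, data["age"])     # scalar integer
print(type(data["address"]).__name__)              # mapping becomes a dict
print(type(data["phone"]).__name__)                # sequence becomes a list
print(data["phone"][1]["number"])                  # nested access
```

Note that the phone numbers stay strings, because a value like 555-1234 does not match YAML's integer format.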
What is a Node in YAML?
Every piece of data in a YAML document is represented by a node. The term may sound technical, but nodes are simply the elementary building blocks of a YAML file, so they are worth going through.
There are three kinds of nodes in YAML, and they can be nested inside one another:
- Scalar Nodes: These represent single values like strings, integers, or booleans (true/false). In essence, they're the simplest form of data.
- Sequence Nodes: As the name suggests, these nodes represent ordered lists or arrays of data elements. Each item is denoted by a hyphen followed by a space.
- Mapping Nodes: These nodes represent key-value pairs, similar to dictionaries or hash tables in various programming languages. In YAML, each key is separated from its value by a colon followed by a space.
- Nesting: The beauty of YAML lies in its ability to create nested structures. This means you can combine and nest different kinds of nodes to create more complex data structures. For example, you can nest a sequence node within a mapping node to create an organized hierarchy of data.
To sum it up, a node in YAML is a fundamental unit of data, which can be a scalar value, a sequence, or a mapping. By understanding and mastering nodes, you'll be well on your way to creating clear, organized, and human-readable YAML documents!
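If you want to see these node kinds for yourself, PyYAML's compose function returns the node tree of a document instead of plain Python objects. The following is a minimal sketch; the small document is an assumption made up for the demonstration.

```python
import yaml

# A tiny document, invented for this sketch.
document = """
title: My First Travel in the System
tags:
  - space travel
  - science fiction
"""

# compose() returns representation nodes, not Python objects.
root = yaml.compose(document)
print(type(root).__name__)

# A mapping node stores its entries as (key node, value node) pairs.
kinds = []
for key_node, value_node in root.value:
    kinds.append((key_node.value, type(value_node).__name__))
    print(key_node.value, "->", type(value_node).__name__)
```

The root is a mapping node, the value of title is a scalar node, and the value of tags is a sequence node.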
Next, we will learn everything about each node type.
What is a Scalar Node?
Scalar types in YAML are simple, atomic values that cannot be divided into smaller values. In YAML, there are several scalar types, including:
- Strings: A string is a sequence of characters, such as a person's name, address, or phone number. In YAML, strings can be written plain (without quotes) or with single or double quotes; double-quoted strings can contain escape sequences such as \n for a newline and \t for a tab.
- Integers: An integer is a whole number, such as a person's age or zip code. In YAML, integers are represented as plain numbers, without any quotes or other special notation.
- Floats: A float is a decimal number, such as a person's height or weight. In YAML, floats are represented as plain numbers, with a decimal point separating the integer and fractional parts.
- Booleans: A boolean is a value that is either true or false. In YAML, booleans are represented using the words "true" and "false", without any quotes or other special notation.
- Nulls: A null is a value that represents the absence of a value. In YAML, nulls are represented using the word "null" or a tilde (~), without any quotes or other special notation.
Example of the scalar node types:
title: "My First Travel in the System" # Scalar node (string)
author: "Amos Burton" # Scalar node (string)
date: "2022-02-28" # Scalar node (string)
tags:
- space travel # Scalar node (string) within a sequence node
- science fiction # Scalar node (string) within a sequence node
- personal
Scalar types are the building blocks of more complex data structures in YAML, such as dictionaries and lists. Understanding scalar types is an important part of working with YAML data in Python or other programming languages.
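The scalar types listed above map directly to Python types when loaded with PyYAML. Here is a minimal sketch; the keys and values are invented for the demonstration.

```python
import yaml

# One entry per scalar type; the values are made up for this sketch.
document = """
string_plain: Amos Burton
string_quoted: "2022-02-28"
integer: 34
float: 1.85
boolean: true
nothing: null
"""

data = yaml.safe_load(document)

for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Quoting matters: the quoted date stays a string, whereas an unquoted value in the same form would be interpreted as a date.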
What is a Sequence Node?
In YAML, sequence nodes provide a way to represent ordered lists or arrays of data elements. They play a crucial role in organizing data and making YAML documents easy to read and understand. Let's look at the following document:
title: "My First Travel in the System"
author: "Amos Burton"
date: "2022-02-28"
tags:
- space travel
- science fiction
- personal
content: >
  Welcome to my blog post about my first space travel in the system!
In this YAML document, the tags key contains a sequence node, which represents an ordered list of elements. The sequence node is a list of three scalar nodes, with each node representing a different tag associated with the blog post:
tags:
- space travel
- science fiction
- personal
Notice that each item in the list begins with a hyphen (-) followed by a space. This syntax is used to create a sequence node in YAML. In this example, the sequence node makes it easy to understand and manage the tags related to the blog post.
By using sequence nodes, you can create organized and easily readable YAML documents that are perfect for handling various types of data, from simple lists to more complex structures.
What is a Mapping Node?
In YAML, a mapping node is a way to store data in a key/value format, similar to a dictionary or an associative array. It's a collection of key/value pairs, where each key represents a unique identifier for a value. Mapping nodes are denoted in YAML by using a colon (:) followed by a space to separate the key and value; nested collections are indented below their key.
Here's an example of a YAML file that uses mapping nodes to store information for a blog post:
title: "My First Travel in the System"
author: "Amos Burton"
date: "2022-02-28"
tags:
- space travel
- science fiction
- personal
content: >
  Welcome to my blog post about my first space travel in the system! In this post, I'll be using mapping nodes in YAML to organize my experience and observations from the trip. Mapping nodes are a great way to store and structure data, and they'll help
In this example, the mapping node contains the following key/value pairs:
- "title" is the key for the title of the blog post, and the value is "My First Travel in the System".
- "author" is the key for the author of the blog post, and the value is "Amos Burton".
- "date" is the key for the blog post date, and "2022-02-28" is the value.
- "tags" is the key for the tags associated with the blog post. The value is a sequence of three scalar values: "space travel", "science fiction", and "personal".
- "content" is the key for the content of the blog post, and the value is a multi-line scalar that contains the text of the blog post. In this case, the value is a string that includes an introduction to the blog post topic, and a description of how the author will be using mapping nodes in YAML to structure their thoughts and experiences from their space travel.
Mapping nodes can be used to store any kind of key/value data, such as configuration settings, database records, or API responses. They're a flexible and easy-to-read format for storing structured data, and they can be easily converted to and from other formats, such as JSON or Python dictionaries.
More details about scalar and mapping nodes
Example 1
A scalar node represents a single value, such as a string, a number, a boolean, or null. In YAML, the string "James" by itself is a scalar node, not a mapping node.
The key-value pair "name: James", on the other hand, forms a mapping node with one entry, whose key ("name") and value ("James") are each scalar nodes.
# Mapping node with two entries
person:
  name: James
  age: 34
# One-entry mapping; the key "name" and the value "James" are each scalar nodes
name: James
In a key-value pair, the key and the value are separate nodes: here "name" is the key and "James" is the value, and the pair belongs to the enclosing mapping node.
Here's an example to clarify the difference:
name: James Holden # scalar node representing the character's name
age: 34 # scalar node representing the character's age
occupation: captain # scalar node representing the character's occupation
ship: # mapping node representing the character's ship
  name: Rocinante # scalar node representing the name of the ship
  class: Corvette # scalar node representing the class of the ship
  crew: # sequence node representing the crew of the ship
    - name: Naomi Nagata # scalar node representing the name of a crew member
      position: chief engineer # scalar node representing the position of the crew member
    - name: Alex Kamal # scalar node representing the name of another crew member
      position: pilot # scalar node representing the position of the other crew member
In this example, the YAML data uses scalar nodes to represent individual pieces of information about James Holden, such as his name, age, and occupation.
It also uses a mapping node to represent the ship that Holden captains, the Rocinante.
The mapping node contains key-value pairs for the name and class of the ship, as well as a nested sequence node for the crew of the ship.
The crew sequence node contains two mapping nodes, each representing a crew member.
Each crew member mapping node contains key-value pairs for the name and position of the crew member.
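Once loaded into Python, this nested structure becomes dictionaries and lists that you can walk with ordinary indexing. A minimal sketch, with the ship document inlined as a string:

```python
import yaml

# The ship document from this section, trimmed and inlined for the sketch.
document = """
name: James Holden
ship:
  name: Rocinante
  class: Corvette
  crew:
    - name: Naomi Nagata
      position: chief engineer
    - name: Alex Kamal
      position: pilot
"""

data = yaml.safe_load(document)

ship = data["ship"]            # mapping node -> dict
print(ship["name"])

for member in ship["crew"]:    # sequence node -> list of dicts
    print(f'{member["name"]}: {member["position"]}')
```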
PyYAML library
The PyYAML library is a popular Python library developed to help you work with YAML documents within Python projects.
It provides methods necessary to read, create, and modify YAML files and work with data provided in YAML documents.
Installation
PyYAML is not part of the Python standard library, so you will need to install it before you can start using it.
The package is named PyYAML, although you import it as yaml. Here are the brief steps to install it:
- Open a terminal or command prompt.
- Type the following command and press Enter to ensure that your pip installation is up to date:
  pip install --upgrade pip
- Type the following command and press Enter to install PyYAML:
  pip install pyyaml
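A quick way to confirm that the installation worked is to import the module and parse a one-line document. This is only a sanity check, not an official verification procedure:

```python
import yaml

# If the import succeeds, PyYAML is installed; print its version string.
print("PyYAML version:", yaml.__version__)

# Parse a trivial document to confirm the parser works.
result = yaml.safe_load("installed: true")
print(result)
```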
Reading and Writing YAML in Python
The PyYAML library is a Python package that provides support for working with YAML data. It is a full-featured YAML parser and emitter, and can be used to read and write YAML files, as well as convert between YAML and Python data structures.
Here are some of the main features of the PyYAML library:
- Full support for YAML 1.1: PyYAML fully implements the YAML 1.1 specification, including support for scalar types, collections, anchors, and aliases.
- Simple and intuitive API: PyYAML provides a simple and intuitive API for working with YAML data, making it easy to read and write YAML files in Python.
- Support for custom data types: PyYAML allows you to define and use custom data types in YAML documents, which can be useful for representing complex data structures.
- Compatibility with Python data types: PyYAML can convert YAML data to Python data types and vice versa, making it easy to work with YAML data in Python code.
- Secure and reliable: PyYAML provides safe_load and safe_dump, which restrict loading and dumping to standard YAML tags and so avoid constructing arbitrary Python objects from untrusted input. The library is also regularly updated to fix bugs and security issues.
PyYAML is widely used in the Python community for a variety of tasks, including reading and writing configuration files, data serialization, and more. It is a powerful and flexible tool for working with YAML data, and is an essential library for any Python developer who needs to work with YAML.
PyYAML submethods
The following are some of the main functions (submethods) of PyYAML:
- safe_load(stream): safely loads a YAML document from a string or a file-like object. Only standard YAML tags are accepted, so the input cannot construct arbitrary Python objects.
- safe_dump(data, stream=None): safely serializes a Python object to a YAML document, optionally writing the document to a file-like object. Only standard YAML tags are produced.
- load(stream, Loader): loads a YAML document from a string or a file-like object using the given loader class. Since PyYAML 6.0 the Loader argument is required. Depending on the loader, this function can construct arbitrary Python objects, so it should not be used on untrusted input.
- dump(data, stream=None): serializes a Python object to a YAML document, optionally writing the document to a file-like object. This function also allows for arbitrary Python objects.
- add_constructor(tag, constructor): adds a constructor for a given YAML tag. When PyYAML encounters the tag in a YAML document, it will call the constructor with the loader and the node representing the value associated with the tag.
- add_representer(type, representer): adds a representer for a given Python type. When PyYAML is serializing a Python object to a YAML document, it will call the representer with the dumper and the object to get a YAML representation.
- compose_all(stream): yields the representation node tree (not plain Python objects) for each YAML document in the input stream.
- compose(stream): returns the representation node tree for a single YAML document in the input stream.
- emit(events): takes a list of events generated by the parser and returns a string representing a YAML document.
- parse(stream): returns a generator that yields parsing events from the input stream.
❗Although we won't cover all of them in this blog post, we'll focus on the ones that are most commonly used.
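To make the difference between safe_load and load more concrete, here is a minimal sketch. It uses a Python-specific tag, which is not part of standard YAML; the one-line document is an assumption chosen only to trigger that difference.

```python
import yaml

# A document using a Python-specific tag (not standard YAML).
document = "!!python/tuple [1, 2]"

# safe_load only knows standard tags, so it rejects this document.
try:
    yaml.safe_load(document)
    rejected = False
except yaml.YAMLError as error:
    rejected = True
    print("safe_load refused the tag:", type(error).__name__)

# FullLoader understands this tag and builds a Python tuple.
value = yaml.load(document, Loader=yaml.FullLoader)
print(value)
```

For configuration files and any input you do not fully control, safe_load is the sensible default.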
Python Data Type - Dictionary
The Python data type that is most commonly used when serializing and deserializing YAML data is the dictionary. This is because YAML syntax is designed to be a superset of JSON syntax, which is based on key-value pairs, and dictionaries are the most natural way to represent key-value pairs in Python.
When PyYAML reads a YAML file, it converts the YAML data into a Python data structure. If the YAML file contains key-value pairs, PyYAML will create a Python dictionary to represent that data.
To see this in practice, create the following file and store it as data/intro.yaml:
- name: 'Mandalorian'
  description: 'The Mandalorian homeworld'
  names:
    - 'Cara Dune'
    - 'Greef Karga'
    - 'Din Djarin'
  planets:
    - 'Nevarro'
    - 'Mandalore'
    - 'Sorgan'
    - 'Trask'
Create a Python file like the following:
intro.py
"""
devoriales.com, 2023
Path: yaml/intro.py
description: yaml
Learn basic yaml structure and syntax
"""
import yaml

with open("data/intro.yaml") as f:
    # Deserialize the YAML data into Python objects (here, a list containing one dictionary)
    data = yaml.safe_load(f)

print(data)
❗This is the most important introduction to the structure of a YAML file.
The YAML file represents a small dataset of information about the Mandalorian show.
When we deserialize the YAML data, we see the following output:
[{'name': 'Mandalorian', 'description': 'The Mandalorian homeworld', 'names': ['Cara Dune', 'Greef Karga', 'Din Djarin'], 'planets': ['Nevarro', 'Mandalore', 'Sorgan', 'Trask']}]
Using the dash symbol '-' in YAML indicates the creation of a list. It's important to note that the dash symbol is not linked to the 'name' key but is rather used to denote a list item, which is why the output is a list containing a single dictionary.
That dictionary has four key-value pairs.
The first key is "name" and its value is the string "Mandalorian".
The second key is "description" and its value is the string "The Mandalorian homeworld".
The third key is "names", which has a list of strings as its value.
The fourth key is "planets", which has a list of strings as its value.
Multiline String Values
In YAML, it is pretty common to add a multiline string value to a key by starting the value with a vertical bar (|) character. Here's an example of how you can add a multiline description to the 'Mandalorian' entry:
- name: 'Mandalorian'
  description: |
    The Mandalorian homeworld is a harsh and unforgiving place, where the strong survive and the weak perish. It is a world of warriors, where honor and loyalty are valued above all else. The Mandalorians are a proud people, with a rich culture and a long history of conflict and conquest.
  names:
    - 'Cara Dune'
    - 'Greef Karga'
    - 'Din Djarin'
  planets:
    - 'Nevarro'
    - 'Mandalore'
    - 'Sorgan'
    - 'Trask'
Now when we run the same python code, we get the following output:
[{'name': 'Mandalorian', 'description': 'The Mandalorian homeworld is a harsh and unforgiving place, where the strong survive and the weak perish. It is a world of warriors, where honor and loyalty are valued above all else. The Mandalorians are a proud people, with a rich culture and a long history of conflict and conquest.\n', 'names': ['Cara Dune', 'Greef Karga', 'Din Djarin'], 'planets': ['Nevarro', 'Mandalore', 'Sorgan', 'Trask']}]
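YAML has two block styles for multiline strings: the vertical bar (|) keeps line breaks, while the greater-than sign (>), used in the blog post examples earlier, folds them into spaces. The following minimal sketch compares the two; the text is shortened from the description above.

```python
import yaml

# The same two lines written with both block scalar styles.
document = """
literal: |
  The strong survive.
  The weak perish.
folded: >
  The strong survive.
  The weak perish.
"""

data = yaml.safe_load(document)

# repr() makes the newline characters visible.
print(repr(data["literal"]))
print(repr(data["folded"]))
```

Both values end with a single trailing newline, which is why the description in the output above ends with \n.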
Create YAML with Python
In this section we will construct the same YAML document from Python.
Serialization is the process of converting data structures or objects into a format that can be stored or transmitted, such as JSON, YAML, or XML. When you use the yaml.dump() function in Python, you are serializing a Python object into YAML format, which can then be stored in a file or sent over a network.
In the following example, we will create a Python dictionary with the same data as we used before in our Mandalorian YAML file:
serialize.py
'''
devoriales.com, 2023
Path: yaml/serialize.py
create a yaml file with dict data
'''
import yaml

data = {
    'name': 'Mandalorian',
    'description': 'The Mandalorian homeworld is a harsh and unforgiving place, where the strong survive and the weak perish. It is a world of warriors, where honor and loyalty are valued above all else. The Mandalorians are a proud people, with a rich culture and a long history of conflict and conquest.',
    'names': ['Cara Dune', 'Greef Karga', 'Din Djarin'],
    'planets': ['Nevarro', 'Mandalore', 'Sorgan', 'Trask']
}
With the following code we can serialize the data into YAML:
# serialize the data to yaml
print(yaml.dump(data, indent=4, sort_keys=False))
Output:
name: Mandalorian
description: The Mandalorian homeworld is a harsh and unforgiving place, where the
    strong survive and the weak perish. It is a world of warriors, where honor and
    loyalty are valued above all else. The Mandalorians are a proud people, with a
    rich culture and a long history of conflict and conquest.
names:
- Cara Dune
- Greef Karga
- Din Djarin
planets:
- Nevarro
- Mandalore
- Sorgan
- Trask
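The example above prints the YAML to the console. To store it in a file instead, you can pass an open file object as the second argument. Here is a minimal sketch that writes the data and reads it back; the file name mandalorian.yaml and the temporary directory are assumptions chosen so the example does not overwrite anything.

```python
import os
import tempfile
import yaml

data = {
    "name": "Mandalorian",
    "names": ["Cara Dune", "Greef Karga", "Din Djarin"],
    "planets": ["Nevarro", "Mandalore", "Sorgan", "Trask"],
}

# Hypothetical output location: a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "mandalorian.yaml")

    # Serialize the dictionary straight into the file.
    with open(path, "w") as f:
        yaml.safe_dump(data, f, sort_keys=False)

    # Read the file back to confirm the round trip.
    with open(path) as f:
        loaded = yaml.safe_load(f)

print(loaded == data)
```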
Anchors
In YAML, anchors are used to define a reference point for a certain piece of data that can be reused elsewhere in the document. An anchor is defined using the ampersand symbol (&), followed by a name for the anchor. The data to be anchored is then defined using the normal YAML syntax.
❓Try to guess which characters each name represents 🙂
Let's have a look at an example:
# example yaml data
- &id001
  name: &name001 Cara Dune
  role: soldier
- &id002
  name: &name002 Greef Karga
  role: leader
- &id003
  name: &name003 Din Djarin
  role: bounty-hunter
- names:
    - *name001
    - *name002
    - *name003
- The dash (-) at the beginning of a line indicates a list item.
- The ampersand (&) is used to define an anchor, which allows us to reference a value later in the document.
- The colon (:) separates keys and values.
- The asterisk (*) is used to indicate a reference to an anchor defined earlier in the document.
The first three list items define some characters from the show, each with a unique identifier (id) and a name and role. The &id001, &id002, and &id003 anchors allow us to reference these list items later in the document.
The last list item defines a list of names, which uses the *name001, *name002, and *name003 references to refer to the names of the characters we defined earlier.
Overall, this YAML document is a simple example of how to define and reference values using anchors and references. It allows us to define data in a structured way that is easy for both humans and machines to read and understand.
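When PyYAML loads a document with anchors, every reference is replaced by the anchored value. Here is a minimal sketch based on a shortened version of the document above; the extra leader key is an assumption added to show a reference to a whole mapping.

```python
import yaml

# Shortened anchor example; the "leader" key is added for this sketch.
document = """
- &id001
  name: &name001 Cara Dune
  role: soldier
- &id002
  name: &name002 Greef Karga
  role: leader
- names:
    - *name001
    - *name002
  leader: *id002
"""

data = yaml.safe_load(document)

# References to scalar anchors are resolved to the anchored strings.
print(data[2]["names"])

# A reference to an anchored mapping yields the very same dict object.
print(data[2]["leader"] is data[1])
```

Because the referenced mapping is the same object and not a copy, changing it through one name also changes it through the other.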
Advanced Tags
Basic YAML includes some data types like strings, numbers, and arrays. However, sometimes you need to represent more complex data like dates or regular expressions. That's where advanced tags come in.
Advanced tags are a way to extend the basic data types of YAML by defining custom tags for representing more complex data structures. A tag is like a label that tells the YAML parser how to interpret the data in a particular part of the YAML document. A scalar node is a single value within a YAML document, and it can have a tag to indicate what kind of value it represents.
For example, you might want to represent a date and time value in your YAML document. With advanced tags, you can define a custom tag to represent that data type. A custom tag is written as an exclamation mark followed by a name you choose, which can include a prefix to group related tags. For example, you might define a tag like !example/date-time to represent a date and time value.
Once you've defined the tag, you can use it in your YAML document by adding the tag to the scalar node that represents the value. For example, you might have a YAML document like this:
created: !example/date-time 2023-02-26T10:00:00Z
In this example, the created field represents a date and time value, and it has the !example/date-time tag to indicate the data type.
Overall, advanced tags are a powerful feature of YAML that allow you to represent more complex data in a clear and concise way. While they can seem a bit intimidating at first, they're actually quite easy to use once you get the hang of them!
Example - Advanced Tags in Kubernetes
The following is our precious configmap with the same data as we have seen before.
We are also specifying the creation timestamp for this object using the creationTimestamp field. The value for this field is specified using a YAML tag, !<tag:yaml.org,2002:timestamp>, which tells the parser to treat the value as a timestamp.
The timestamp is in ISO 8601 format and represents a date and time in the UTC time zone.
By setting the creation timestamp of the ConfigMap, we can track when it was created, which can be useful for troubleshooting and auditing purposes.
❗The creationTimestamp field in a Kubernetes resource metadata is usually automatically populated by the Kubernetes API server when a resource is created. This is done by adding the timestamp to the resource's metadata before it is persisted to the Kubernetes datastore. In YAML, you can specify the creationTimestamp field yourself, but it will be overwritten by the Kubernetes API server when the resource is created or updated. It is generally not necessary or recommended to set this field manually, unless you have a specific use case for doing so.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-precious-config-map
  creationTimestamp: !<tag:yaml.org,2002:timestamp> '2023-02-25T23:05:00Z'
data:
  # This YAML file defines a person's information
  name: Mary Joe
  age: 35
  address:
    street: 123 Main St.
    city: Anytown
    state: CA
    zip: 12345
  phone:
    - type: home
      number: 555-555
    - type: work
      number: 444-444
You may wonder why you should write the date using the !<timestamp> tag instead of just writing it as a string.
Here are some reasons:
- Timezone: With !<timestamp> we can specify the timezone as well. In this case, when the system receives the configuration specified in the ConfigMap, it will interpret the timezone correctly.
- Precision: The timestamp format allows fractional seconds, and PyYAML keeps them (up to microseconds) in the resulting datetime object instead of leaving you to parse them out of a string yourself.
- Parsing: It will be natively interpreted by our YAML parser when we read the data, instead of relying on our own arbitrary string-based format.
- Human-readability: While the string representation of a date may be familiar to humans, the !<timestamp> tag provides a standardized format that is more machine-readable and easier to work with programmatically.
In short, using the !<timestamp> tag gives you more options and flexibility when working with date and time values in YAML. It's especially useful for more advanced use cases.
Now let's see this in action.
Use case:
Suppose you have a YAML configuration file for a web application and it contains a field named "creationTimestamp" that specifies when the application was first created. You want to load this file using Python and extract the creation timestamp value, which is specified using advanced tagging. Then you want to parse this timestamp into a Python datetime object and display it in 12-hour format.
This script reads a YAML file named my-configmap.yaml that represents a Kubernetes ConfigMap. The script uses advanced tagging in the YAML file to specify the creation timestamp of the ConfigMap. The timestamp is read and converted to a Python datetime object using the fromisoformat method. Finally, the timestamp is printed in 12-hour format with AM or PM.
'''
devoriales.com, 2023
Path: yaml/adv_tagging.py
description: yaml
this script will read a yaml file and work with data specified with advanced tagging
'''
import yaml
import datetime

with open('data/my-configmap.yaml') as f:
    # safe_load uses SafeLoader, which only constructs standard YAML tags
    data = yaml.safe_load(f)

# the tagged value is loaded as a datetime; str() gives its ISO-style text form
timestamp = str(data['metadata']['creationTimestamp'])
# parse the text back into a datetime object
dt = datetime.datetime.fromisoformat(timestamp)
print(dt)
# change the timestamp to 12 hour format with am pm
dt = dt.strftime('%m/%d/%Y %I:%M:%S %p')
print(dt)
We then extract the "creationTimestamp" value from the data object and convert it to a string using str. PyYAML has already turned the timestamp-tagged value into a Python datetime object at this point, so the conversion is not strictly required; it only makes the parsing step that follows explicit.
Next, we parse the timestamp string into a Python datetime object using datetime.datetime.fromisoformat. This function parses ISO-formatted datetime strings, for example the "2023-02-25 23:05:00+00:00" style text produced by str. Note that Python versions before 3.11 do not accept a trailing "Z" in fromisoformat, so the raw "2023-02-25T23:05:00Z" text would not parse there.
Finally, we display the parsed datetime object in 12-hour format using the strftime method with the format string '%m/%d/%Y %I:%M:%S %p'. The %I specifier outputs the hour in 12-hour format with leading zeros (e.g., "01" for 1:00 AM), and the %p specifier outputs the AM or PM suffix.
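To see the effect of the timestamp tag in isolation, here is a minimal sketch that loads the same text twice, once with the tag and once as a plain quoted string. It uses the shorthand !!timestamp form of the tag, and the key names are invented for the demonstration.

```python
import datetime
import yaml

# The same text, once tagged as a timestamp and once as a quoted string.
document = """
tagged: !!timestamp '2023-02-25T23:05:00Z'
quoted: '2023-02-25T23:05:00Z'
"""

data = yaml.safe_load(document)

print(type(data["tagged"]).__name__)   # a datetime object
print(type(data["quoted"]).__name__)   # still a string

# The datetime can be formatted directly, without a string round trip.
print(data["tagged"].strftime("%m/%d/%Y %I:%M:%S %p"))
```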
Cool, isn't it?
Custom Tags
In this section, we will continue on the same concept using advanced tagging to define some custom tags.
We will continue using Kubernetes examples.
Use case:
Suppose you have a YAML file named configmap_ip.yaml that defines a Kubernetes ConfigMap with an IP address as a custom tag. The YAML file looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ip-config-map
data:
  ip_address: !iptags/ip 192.168.1.100
As before, we can use a Python script to read the IP address from the YAML file and print it to the console. The loader cannot construct a value for the custom tag unless we register a constructor for !iptags/ip. The tag represents an IP address, and is used in a ConfigMap YAML file to set the value of an IP address:
'''
devoriales.com, 2023
Path: yaml/mod_ip.py
description: custom tag for yaml
this script will read and set ip address in a yaml file
'''
import yaml

class IpLoader(yaml.SafeLoader):
    pass

def ip_constructor(loader, node):
    value = loader.construct_scalar(node)
    return str(value)

IpLoader.add_constructor('!iptags/ip', ip_constructor)

with open('data/configmap_ip.yaml') as f:
    data = yaml.load(f, Loader=IpLoader)

ip_address = data['data']['ip_address']
# print ip address
print(ip_address)
The IpLoader class is a custom YAML loader derived from the yaml.SafeLoader class, which is used to safely load YAML data into Python.
The ip_constructor function is a callback function that is called by the loader when the tag !iptags/ip is encountered in the YAML file. The function receives the loader and the node carrying the tag, reads the node's scalar value (the IP address in this case), and returns it as a string.
Finally, the add_constructor method is used to associate the tag !iptags/ip with the ip_constructor function. This ensures that when the loader encounters the tag !iptags/ip, it calls the ip_constructor function to process the corresponding value.
Output:
192.168.1.100
Since we have associated our custom tag !iptags/ip with the ip_constructor function, we can also pretty easily work with the YAML data and change the IP address:
'''
devoriales.com, 2023
Path: yaml/mod_ip.py
description: custom tag for yaml
this script will read and set ip address in a yaml file
'''
import yaml

class IpLoader(yaml.SafeLoader):
    pass

def ip_constructor(loader, node):
    value = loader.construct_scalar(node)
    return str(value)

IpLoader.add_constructor('!iptags/ip', ip_constructor)

with open('data/configmap_ip.yaml') as f:
    data = yaml.load(f, Loader=IpLoader)

ip_address = data['data']['ip_address']
# change ip address
data['data']['ip_address'] = '192.168.1.23'
print(yaml.dump(data, indent=4, sort_keys=True))
Output:
apiVersion: v1
data:
    ip_address: 192.168.1.23
kind: ConfigMap
metadata:
    name: ip-config-map
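Notice that the !iptags/ip tag is gone from the output, because the value was loaded as a plain string. If you want the tag to survive a round trip, one option is to pair the constructor with a representer. The following is a minimal sketch under a few assumptions: it stores the address as an ipaddress.IPv4Address object instead of a string, and the IpDumper class and ip_representer function are hypothetical names introduced for this example.

```python
import ipaddress
import yaml

TAG = "!iptags/ip"

class IpLoader(yaml.SafeLoader):
    pass

class IpDumper(yaml.SafeDumper):  # hypothetical dumper for this sketch
    pass

def ip_constructor(loader, node):
    # Build an IPv4Address object instead of a plain string.
    return ipaddress.IPv4Address(loader.construct_scalar(node))

def ip_representer(dumper, value):
    # Write the address back out as a scalar carrying the custom tag.
    return dumper.represent_scalar(TAG, str(value))

IpLoader.add_constructor(TAG, ip_constructor)
IpDumper.add_representer(ipaddress.IPv4Address, ip_representer)

document = "data:\n  ip_address: !iptags/ip 192.168.1.100\n"
data = yaml.load(document, Loader=IpLoader)

# Change the address, then dump with the custom dumper.
data["data"]["ip_address"] = ipaddress.IPv4Address("192.168.1.23")
output = yaml.dump(data, Dumper=IpDumper)
print(output)
```

A side benefit of using ipaddress.IPv4Address is that an invalid address raises an error at load time instead of passing through silently.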
Summary
In this blog post, we've explored YAML and several related topics. Here's a recap of what we covered:
- Why You Should Learn YAML: We discussed the reasons that make YAML an invaluable skill for modern software development and DevOps.
- Introduction to YAML: We provided a brief overview of YAML and its origins.
- YAML as a Superset of JSON: We highlighted the relationship between YAML and JSON, showcasing YAML's extended capabilities.
- YAML Essentials: We delved into YAML's fundamental concepts, including syntax, nodes, scalar nodes, sequence nodes, and mapping nodes.
- PyYAML Library: We introduced the PyYAML library and explored its features, such as installation, reading and writing YAML in Python, submethods, handling Python dictionaries, multiline string values, creating YAML with Python, using anchors, advanced tags, and custom tags.
With the knowledge you've gained from this blog post, you're now well-equipped to harness the full potential of YAML in your projects, be it in software development, DevOps, or any other field that requires efficient and human-readable data serialization.
Happy coding!
About the Author
Aleksandro Matejic, a Cloud Architect, began working in the IT industry over 21 years ago as a technical specialist, right after his studies. Since then, he has worked in various companies and industries in various system engineer and IT architect roles. He currently works on designing Cloud solutions, Kubernetes, and other DevOps technologies.
In his spare time, Aleksandro works on different development projects such as developing devoriales.com, a blog and learning platform launching in 2022/2023. In addition, he likes to read and write technical articles about software development and DevOps methods and tools.
You can contact Aleksandro by visiting his LinkedIn Profile