Nested mappings and lists do not exist in bash. That is not a problem for bash pros. Whatever silly web API or knickknack structured configuration file comes along, we pick the piece of information we need using sed or awk. After a series of corrections it will work fine for the cases seen.

Sometimes, however, we feel it would be convenient to have a ready-made tool for reading yaml-strings: some code for which we just need to know what it is supposed to do, without having to dig into how it does it and whether it can be fixed for the current case.

Let's see whether we can find some bash code for loading simple mappings, say in yaml format, into a bash script.

User pkucziski at github has a solution. It even accesses nested mappings by means of variable names consisting of the keys joined with underscores. But we want to look at the flat case here.

#!/bin/sh
parse_yaml() {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p"  $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}

The bash-audience is appreciative.

Works Great! Thanks

Awesome!

There are quite some cases where it works. Though not for this one, where the single quotes end up as part of the value.

database: 'my_database'
username: 'root'

Nor for this one, where database is set to a lone | and the actual value is dropped.

database: |
  my_database
username: root

But you don't bother to fix the above pipe, do you? Take your time, like many people before you.
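For the flat case alone, one does not even need sed and awk. The following is only a sketch and not the code from the gist: load_flat_yaml is a made-up name, it assumes one "key: value" item per line, and it knows nothing about block scalars like the second example or about escape sequences inside quotes. It is, in other words, one more piece of code that works for the cases seen.

```bash
# load_flat_yaml FILE [PREFIX]  (hypothetical helper, flat "key: value" lines only)
load_flat_yaml() {
    local file=$1 prefix=${2:-} line key value
    # Regexes live in variables so the quotes need no escaping inside [[ ]].
    local skip_re='^[[:space:]]*(#.*)?$'
    local item_re='^([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*:[[:space:]]*(.*)$'
    local dq_re='^"(.*)"$'
    local sq_re="^'(.*)'\$"

    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line =~ $skip_re ]] && continue      # blank line or comment line
        [[ $line =~ $item_re ]] || continue      # anything else is silently ignored
        key=${BASH_REMATCH[1]}
        value=${BASH_REMATCH[2]}
        value="${value%"${value##*[![:space:]]}"}"   # cut trailing whitespace
        # Strip one pair of surrounding quotes, double or single.
        if [[ $value =~ $dq_re || $value =~ $sq_re ]]; then
            value=${BASH_REMATCH[1]}
        fi
        # Assign to the variable named PREFIX + key.
        printf -v "${prefix}${key}" '%s' "$value"
    done < "$file"
}

config=$(mktemp)
printf '%s\n' "database: 'my_database'" 'username: "root"' 'port: 5432' > "$config"
load_flat_yaml "$config" conf_
echo "$conf_database $conf_username $conf_port"   # my_database root 5432
rm -f "$config"
```

A prefix is advisable: without one, a key named like one of the function's local variables (line, key, value, ...) would be swallowed by that local.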

How do we deal with json- and yaml-strings in bash? Before we discuss ways to dump and load structured data, we first want to get a grip on it within bash.

Let's start with something simple: a nested mapping with two levels, i.e. a mapping containing mappings.

birds:
  tombit: 4
  canary: 0
fish:
  moray: 5

As we know, bash supports associative arrays, i.e. mappings. All we need to do is assign the inner mappings as values of the outer one.

declare -A BIRDS=([tombit]=4 [canary]=0)
declare -A FISH=([moray]=5)

echo "BIRDS-keys: ${!BIRDS[@]}"
echo "canary-count: ${BIRDS[canary]}"

And now:

declare -A COUNTS=([birds]=${BIRDS} [fish]=${FISH})
echo "ROOT-keys: ${COUNTS[@]}"

Ok, this does not work. It turns out that there is no other syntax either for assigning or accessing an (associative) array as a value of another array.

As a workaround one may flatten the mapping out to a one-level mapping by joining the keys that describe the path to a leaf into one key, using some separator.
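Why the attempt yields nothing, and what comes closest to nesting, can be sketched as follows. The second half assumes bash 4.3 or later for declare -n; the name inner is only chosen for illustration. Note that this is indirection by variable name, not an array stored inside another array.

```bash
declare -A BIRDS=([tombit]=4 [canary]=0)
declare -A FISH=([moray]=5)

# ${BIRDS} is shorthand for ${BIRDS[0]}. There is no key "0", so it expands to nothing.
declare -A COUNTS=([birds]="${BIRDS}" [fish]="${FISH}")
echo "birds value: '${COUNTS[birds]}'"      # birds value: ''

# Indirection instead of nesting: store the *name* of the inner array as value ...
declare -A COUNTS=([birds]=BIRDS [fish]=FISH)
# ... and resolve it through a name reference when needed.
declare -n inner="${COUNTS[birds]}"
echo "canary-count: ${inner[canary]}"       # canary-count: 0
```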

declare -A ROOT=(
  [birds,tombit]=4 
  [birds,canary]=0
  [fish,moray]=5
)
echo ${ROOT[birds,tombit]}

That is not exactly what we mean by structured data, and it's not particularly practical. For instance, you cannot replace a mapping at once unless you provide some functions for doing so. Also, you need to ensure the key separator is not used within a key, or somehow define and handle its escaping.

Having no real structured data in bash is quite a relief. There is unreliable code for parsing yaml in circulation for some other languages, as the matter is more complex than it appears at first sight. Bash settles that in a neat way: if you cannot hold and use structured data, there is no need for loading and dumping it.

Json and yaml have a similar purpose. We make another detour and look at what they have in common and where they differ.
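To give an idea of what "some functions" could look like, here is a minimal sketch. The names subkeys and drop_subtree are invented for this example, the comma is assumed never to occur inside a key, and the name references again require bash 4.3 or later.

```bash
declare -A ROOT=([birds,tombit]=4 [birds,canary]=0 [fish,moray]=5)

# subkeys NAME PREFIX: print the keys directly below PREFIX, one per line.
subkeys() {
    local -n map=$1
    local prefix=$2 key rest
    for key in "${!map[@]}"; do
        [[ $key == "$prefix",* ]] || continue
        rest=${key#"$prefix",}        # path below the prefix
        printf '%s\n' "${rest%%,*}"   # keep only its first component
    done | sort -u
}

# drop_subtree NAME PREFIX: remove PREFIX and everything below it.
drop_subtree() {
    local -n map=$1
    local prefix=$2 key
    for key in "${!map[@]}"; do
        if [[ $key == "$prefix" || $key == "$prefix",* ]]; then
            unset "map[$key]"
        fi
    done
}

# "Replacing" the birds mapping means dropping it and filling it again.
drop_subtree ROOT birds
ROOT[birds,wren]=2
subkeys ROOT birds                    # wren
```

Every further operation (copying a subtree, listing leaves, escaping the separator) needs another function of this kind, which is the point made above.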

What sort of data can they be used for?

Basically json and yaml can represent the same sort of data, something we call here a data-tree. In terms of graphs, a data-tree is a tree where the inner nodes are labeled either as mappings or as lists and the leaf nodes are labeled as bools, numbers, or strings. The edges from the mapping nodes to their children are labeled with the key of the respective item, whereas the child-edges of a list node have no labels but are ordered.

Sometimes the child-nodes of a mapping are regarded as ordered too. In the json/yaml-string they are ordered by their appearance, but this order is usually not transferred to the deserialized object.

Any data-tree of this sort can be written as a json- and as a yaml-string. And any json-string can be seen as the representation of such a data-tree.

What are syntactical differences?

Yaml syntax lets you write strings without quote signs around them unless they are needed for some reason. In json the quote signs are always mandatory. You can use them in yaml too, meaning that any string node in json is also a string in yaml. This also applies to the other node types; even the mapping and list nodes can be written in in-line format (yaml-speak), where indentation and outer line-breaks become irrelevant. That is, any valid json-string is valid yaml too, and is also interpreted the same way.

The following string is valid json and inline-formatted yaml. In contrast to block-formatted yaml, the newlines and indentation are irrelevant for the deserialization. Still, they are important for human reading.

{
  "lid": 3029, 
  "pids": {
    "21994": "/usr/lib/applets/netspeed"
  }, 
  "command": "/usr/lib/xorg/Xorg :2 -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt"
}

What are other differences?

Besides the whitespace-sensitive block-format syntax, yaml has other features unknown to json, though those features are not often used.

For instance, in yaml you may use references to previously defined nodes. You can write arbitrary graph structures in yaml if you like. But more importantly, references are used to reflect identity relations between (sub-)objects that often exist before serialization.

Also, you can add comments, which is convenient because otherwise you would have to document the meaning of the keys in a separate document, even though often only one instance of the described structure exists. These comments can be added anywhere in the yaml-string, just like comments in code, and therefore elude a simple modeling in terms of the above tree-model.

There are more features of yaml to tell about, but we break off here.

What does this all have to do with bash?

We will see. But we want to mention that today json is the prevalent syntax for serializing data for over-network communication between computational components. Whenever you are to use some remote service built in the last, say, ten years, it is very likely that you are expected or at least allowed to communicate with the service using json. People don't talk much about json’s syntax because it is such that anyone can understand and use it.

The sort of data it covers, the described data-trees, makes a good trade-off between simplicity and expressiveness. More importantly, data-trees seem to be a somehow natural model, as they fit very well to the object-model of most programming languages used today.

We often want to serialize structured data. Think of configuration files or of communication between computational units. For smaller amounts of data yaml has become a popular format. A simple nested mapping in yaml-syntax looks like this:

owner: adam
lid: 3002
command: malm --replace
pids:
  21994: tmux new
  30438: /usr/lib/xorg/Xorg :2 vt

This format is both machine and human readable. The trick is that yaml does not let the more difficult cases obstruct the syntax for simple cases like the above one, where a plain syntax can do the job. For instance, quotation marks around strings are only mandatory when the content of a string makes them necessary, say because it looks like the beginning of the next item.

Another example of this lazy syntax is the handling of new-lines. New-lines are of course an important element for the representation of the structure of an object. On the other hand, you may want to have new-lines in the value of a string leaf. Or you may want to wrap long lines in the yaml-string without adding a new-line to a payload string.

Here again, yaml can, to some extent, deal with both but does not demand knowledge of the particular syntax for that if you don't need it, as in the example above.

The following example has a wrapping new-line in the value of command to avoid a long line. In the deserialized object that new-line is, along with the other whitespace around it, replaced by a single blank.

lid: 3029
command: /usr/lib/xorg/Xorg :2 -audit 0 -auth
  /var/lib/gdm/:0.Xauth -nolisten tcp vt
pids: 
  21994: /usr/lib/applets/netspeed

This is a hint that things quickly become more complicated for more particular demands. In the end yaml serves a lot of demands and is altogether a rather complicated matter. But if you come with plain demands, you will find yaml a benign tool for representing structured objects as readable text.

What about yaml with bash? Can we deserialize yaml-strings with bash and work with the outcome in our scripts? Is there a built-in or some other available function that deals with the somewhat complicated task of yaml-parsing?

By default bash does not complain if you reference a variable to which no value has been assigned. Other languages instantly and persistently grumble about any undefined variable. Bash lets you correct typos in var names when you want to do it.

For instance, you can correct the following typo once you are really sure that it was unintended:

SUBIRTMP=autot_$TNO/tmp/
...
rm -rf $HOME/projects/$SUBDIRTMP 

provided that the script was not stored inside $HOME/projects/.

This tolerance of bash might be one reason why the rm command now has --preserve-root switched on by default, preventing you from accidentally calling rm -rf /.

If you do not make typos you are ready for bash. If you do, consider putting set -u at the beginning of your script. It lets you fall back to a coward mode where any undefined var is complained about.

Word splitting somehow includes trimming, i.e. removing leading and trailing whitespace.
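A minimal sketch of the coward mode at work. The function names and the literal directory autot_1 are made up for this example, and echo stands in for the actual rm. The script runs in a child bash so that its abort does not take the calling shell down with it.

```bash
run_with_set_u() {
    bash -c '
        set -u
        SUBIRTMP=autot_1/tmp/                   # the typo from above
        echo "rm -rf projects/$SUBDIRTMP"       # aborts here: unbound variable
    '
}

if run_with_set_u 2>/dev/null; then
    echo "script ran to the end"
else
    echo "script aborted at the rm line"
fi

# Independent of set -u: ${VAR:?message} refuses to expand an unset or empty
# variable. The subshell keeps the failure local to this function.
guarded_path() {
    ( echo "projects/${SUBDIRTMP:?is not set}" )
}
```

With SUBDIRTMP unset, guarded_path prints an error and returns a non-zero status instead of handing projects/ to whatever command follows.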

$ X="        
bim bam   bom    "
$ echo A${X}A
A bim bam bom A

This is not exactly trimming, though.

  • If there is whitespace it is not replaced by nothing but by a single space.
  • Whitespace in the middle of the string is replaced too.

How do we trim? The answer can be found on stackoverflow.com

For other languages people quickly pass over it with some answer like trim(), chasing after whatever distracting real-world problems instead. For bash we explore possible solutions at leisure. Or we would, if they let us do so.

Thank you for your interest in this question. Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).

Spam was supposedly not the problem.
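As for the trimming itself, one widely used idiom gets by with parameter expansion alone. This is a sketch of that idiom and not necessarily the answer the site would rank first; the function name trim is our choice.

```bash
# trim STRING: print STRING without leading and trailing whitespace.
trim() {
    local s=$1
    # ${s%%[![:space:]]*} is the leading whitespace; strip it from the front.
    s="${s#"${s%%[![:space:]]*}"}"
    # ${s##*[![:space:]]} is the trailing whitespace; strip it from the end.
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s' "$s"
}

X="        
bim bam   bom    "
echo "A$(trim "$X")A"    # Abim bam   bomA
```

Unlike the unquoted expansion above, the whitespace in the middle of the string survives.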

http://mywiki.wooledge.org/BashPitfalls highlights the “broken legacy misfeature” word splitting.

Before a bash command is executed, its arguments are processed. After most of the other processing is done, any argument that results from an unquoted expansion and contains whitespace is split into separate arguments. Only protected whitespace, typically inside quotes, is spared. This is the word splitting feature, and it is one big source of trouble.

Users tend to believe that in the line

cmd $A

there is a command with one argument. But in truth the number of arguments cannot be seen unless you know how the value of A was defined.
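A small sketch makes the point countable. The helper count_args is invented for this example, and the default value of IFS is assumed.

```bash
# Print the number of arguments received.
count_args() { echo "$#"; }

A="bim bam   bom"
count_args $A      # 3: unquoted, split into three words
count_args "$A"    # 1: quoted, inner whitespace kept

A=""
count_args $A      # 0: unquoted and empty, the argument vanishes altogether
count_args "$A"    # 1: quoted, one empty argument
```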

This is somehow unnecessary. Tcl is similar to bash in that it does not really have types, uses Polish notation, and references variables by putting a $ in front of the variable name. But when you reference a variable in tcl you always get one value. If the value is a list and you want to pass its entries as separate arguments, you have to split it up explicitly.

Altogether tcl handles the distinction between whitespace that separates items and whitespace that belongs to items pretty well. The tcl way of solving it cannot be added to bash, as it uses curly brackets that are, like any other brackets, already loaded with semantics in bash. Nonetheless, it's strange that the prevailing scripting language can afford to make people stumble on this issue.
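The nearest bash gets to this is quoting every expansion and keeping lists in arrays. A short sketch follows; show_args is a helper invented here, and the default IFS is assumed.

```bash
# Print each argument in angle brackets, so that the boundaries are visible.
show_args() { printf '<%s>' "$@"; echo; }

items=("bim bam" "bom")          # two items, one of them contains a blank
show_args "${items[@]}"          # <bim bam><bom>   one argument per item
show_args "${items[*]}"          # <bim bam bom>    all items joined into one

# Splitting happens only where we ask for it.
line="bim bam bom"
read -r -a words <<< "$line"
show_args "${words[@]}"          # <bim><bam><bom>
```

It works, but only as long as nobody forgets the quotes, which is exactly the stumbling block.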