In an earlier post we discussed processing the output of the find command, e.g. a list of paths. Since a path may contain spaces, using just whitespace as the separator between the list entries won't always work.

A common solution is the one given here. The bash variable IFS (“Internal Field Separator”) is set to newline and then set back to the original value in order to avoid trouble that may arise from changing this important parameter.

#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for f in *
do
  echo "$f"
done
IFS=$SAVEIFS

Why is the output of echo used for setting IFS to newline instead of a plain "\n"? I do not know but I am afraid of the reasons there might be for this. Anyway, setting and resetting that variable is somewhat crooked. What if a command within the for-body relied on the preset value of IFS?
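
A likely explanation for the echo detour: inside double quotes bash keeps "\n" as a backslash followed by the letter n, and $(..) strips trailing newlines, so the \b is appended to keep the newline alive. The ANSI-C form $'\n' yields a newline directly. The sketch below avoids the global change altogether by setting IFS for a single read only. The names collect_paths and PATHS are hypothetical, and it assumes a find that supports -mindepth and -print0 (GNU and BSD find do).

```shell
#!/bin/bash
# collect_paths DIR: store every path below DIR in the global array PATHS.
# (hypothetical helper, names are illustrative)
collect_paths() {
    local dir=$1 path
    PATHS=()
    # "IFS=" applies to this one read only; the shell-wide IFS stays untouched.
    # -d '' makes read stop at the NUL bytes written by find -print0.
    while IFS= read -r -d '' path; do
        PATHS+=("$path")
    done < <(find "$dir" -mindepth 1 -print0)
}
```

After collect_paths some_dir, a loop such as for f in "${PATHS[@]}" visits each path as one word, whatever whitespace it contains.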

Dealing with containers such as lists or arrays you may want to code a condition like ‘does the array A contain an element (equal to) e’.
How to do this in bash? There are lively discussions about this issue on relevant sites. As stated by tokland at superuser.com:

It’s an old question.

A frequently proposed solution is this.

A=(4 12 35 43)
e=12
if [[ ${A[@]} =~ $e ]] ; then
    echo "e=$e is in A"
fi

It works fine under certain conditions. With e=5, though, the regular expression matches the 5 in 35:

e=5 is in A

If you actually want to rely on the outcome you should iterate over the array, see check-if-an-array-contains-a-value. This comes in quite handy and you can even save some lines by using ;.
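
A minimal sketch of such an iteration, not necessarily identical to the linked answer. The function name contains is an assumption made for illustration. It compares whole elements, so 5 no longer matches 35.

```shell
#!/bin/bash
# contains NEEDLE ELEMENT...: succeed if NEEDLE equals one of the elements.
contains() {
    local needle=$1 element
    shift
    for element in "$@"; do
        # the quoted right-hand side forces a literal comparison, no pattern matching
        [[ $element == "$needle" ]] && return 0
    done
    return 1
}

A=(4 12 35 43)
if contains 12 "${A[@]}"; then echo "12 is in A"; fi
if ! contains 5 "${A[@]}"; then echo "5 is not in A"; fi
```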

rush is a ruby shell designed for command-line tasks such as file and process handling.

Instead of a sequence of bash commands joined by pipes, you successively apply methods or refer to members of the object yielded before. The following example starts with a filtered list of the files in the directory some_dir.

some_dir['**/*.rb'].search(/^\s*class/).lines.size 

This has several advantages:

  • Provided you know ruby, you can make full use of your ruby tool set.
  • You can write one-liners that successively process arbitrary objects.
  • For more complex cases you have a decent programming language at hand.

There are downsides:

  • In order to call other system commands you have to use the bash wrapper command.
  • The project does not seem to be maintained at the moment and there is very little documentation. The above example, though taken from the project's home page, does not seem to work.

It is a good approach to start with a reasonable programming language and then make some alterations in order to get a useful shell. Maybe ruby is not the very best choice as base language because its somewhat peculiar syntax might be unnecessarily confusing to non-expert users.

The return statement is for passing a return code, i.e. a number between 0 and 255. If you want to output a string you have to do an echo within the function body. There are two issues with that.

First, you cannot distinguish between the output the function returns and some information you want to print for instance for debugging while the body is executed.

Assume a function list_certain_pdfs that computes a list of all pdf-files that contain a certain string and can be found below a base-directory passed as input.
For each directory below the base-dir that contains at least one fitting pdf the function should also print the number of found pdfs to stdout while running.

A=$(list_certain_pdfs /var/run/bkm/)
/var/run/bkm/tmp/00/: 3
/var/run/bkm/tmp/08/: 120
/var/run/bkm/tmp/ext/: 1

This should not be difficult to do, but with bash it is.
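
One common workaround, which deviates from the wish above in that the progress lines go to stderr instead of stdout: $(..) captures stdout only, so the counts still show up on the terminal while the result stays clean. The function list_matching_files is a hypothetical, simplified stand-in for list_certain_pdfs. It greps plain files and returns the paths newline-separated, so paths containing newlines are not handled.

```shell
#!/bin/bash
# list_matching_files BASE PATTERN
# stdout: paths of the files below BASE that contain PATTERN, one per line
# stderr: "<dir>/: <count>" for each directory with at least one hit
list_matching_files() {
    local base=$1 pattern=$2 dir file count
    while IFS= read -r -d '' dir; do
        count=0
        for file in "$dir"/*; do
            if [[ -f $file ]] && grep -q -- "$pattern" "$file"; then
                printf '%s\n' "$file"              # result, captured by $(..)
                count=$((count + 1))
            fi
        done
        if (( count > 0 )); then
            printf '%s/: %d\n' "$dir" "$count" >&2 # progress, not captured
        fi
    done < <(find "$base" -type d -print0)
}
```

Called as A=$(list_matching_files some_dir some_string), the variable A holds only the paths.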

Second, it is not clear how to pass an array as output of a function. As we already know, structured data types are not among bash's assets. One reason arrays are rarely used might be that you cannot simply return them from a function's body.

A way to deal with the situation is pointed out here. It is a valid solution but it is also a kludge.
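
Another possible workaround, not necessarily the linked one, is a name reference: the caller passes the name of an array and the function fills it. This is a sketch that assumes bash 4.3 or newer (local -n). The function squares_into and the variable _result are hypothetical, and the caller's array must not itself be called _result.

```shell
#!/bin/bash
# squares_into NAME N: fill the caller's array NAME with 1^2 .. N^2.
squares_into() {
    local -n _result=$1     # _result is now an alias for the caller's array
    local n=$2 i
    _result=()
    for (( i = 1; i <= n; i++ )); do
        _result+=( $(( i * i )) )
    done
}

squares_into S 4
echo "${#S[@]} entries: ${S[*]}"
```

It remains a kludge as well: the array is handed over by name, not returned.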

As a shell-pro you do not need training wheels. That is why you cannot give names to input arguments in bash. If you are soft you may give names to arguments inside the function's body.

logarithm () {
	X=$1
	BASE=${2:-"10"}
	RESULT=$(echo "l($X)/l($BASE)" | bc -l)
	echo "$RESULT"
}

The defined function logarithm outputs the logarithm of the first input with respect to the base given as second input. The latter is optional: its value is assigned to the variable BASE if the second input ($2) is provided, otherwise the default value 10 is used.

Many people tend to use the same fitting variable names inside and outside the function. A common way to solve the resulting name conflicts is that, inside the function's body, you have to state explicitly if you want to refer to a variable that lives outside. Not so with bash.

BASE=2
A=$(logarithm 1000)
B=$(logarithm 8 $BASE)

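We expect both A and B to be approximately 3. Called through $(..) as above, they are, but only because command substitution runs the function in a subshell whose assignments are discarded. Call logarithm directly in the current shell, say logarithm 1000 > /dev/null, and the global BASE is silently set to 10; a subsequent logarithm 8 $BASE then yields about 0.9 instead of 3. To protect the outside BASE from being set within a function call we have to declare the variable BASE in the function's body as local. And you had better do so if your function happens to use variable names like USER or PATH.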
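
The effect of local can be sketched without bc. The two functions below are hypothetical stand-ins for logarithm that do nothing but assign a default base. Note that a function called through $(..) runs in a subshell, so the leak only shows when the function is called directly.

```shell
#!/bin/bash
# Hypothetical stand-ins for logarithm: they only assign a default base.
leaky_base() { BASE=${1:-10}; }         # assigns the caller's (global) BASE
safe_base()  { local BASE=${1:-10}; }   # BASE exists only inside the function

BASE=2
safe_base
echo "after safe_base: BASE=$BASE"        # still 2
unused=$(leaky_base)                      # runs in a subshell
echo "after \$(leaky_base): BASE=$BASE"   # still 2
leaky_base                                # runs in the current shell
echo "after leaky_base: BASE=$BASE"       # now 10
```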

A crucial trick in writing scripts is to identify well-rounded subtasks and to enclose the needed commands in a function with a descriptive name.

If you aim to use functions you need to know

  • syntax of function definition
  • ways to pass inputs
  • how to return outputs

We have a look at bash's function syntax. Which is a valid first line for a function definition?

foo ( ) {
foo( ) {
foo ( ){
foo(){

Surprise: here, bash is as forgiving as most scripting languages; all four are valid. Apparently, the parser looks for the round brackets first and can thereby detect function definitions. So why can't it do the same for assignments by looking for “=”?
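
A quick way to check the four spellings is to define and call one function per variant. The names f1 to f4 are made up for this sketch. The commented line at the end hints at why assignments are stricter: with spaces around “=”, bash reads the line as a command x with two arguments.

```shell
#!/bin/bash
# One function per spelling from the list above.
f1 ( ) { echo one; }
f2( ) { echo two; }
f3 ( ){ echo three; }
f4(){ echo four; }

f1; f2; f3; f4

x=1        # an assignment: no spaces around "="
# x = 1    # would try to run a command named x with the arguments "=" and "1"
```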

Another direction is ipython, http://ipython.org, a pimped python shell. python neither uses polish notation nor knows system commands like find or touch. Within ipython, however, you can change that by calling the magic functions %autocall and %rehashx.

In [1]: %autocall
In [2]: type 2
Out[2]: int
In [3]: %rehashx
In [4]: find /var -maxdepth 1 -name log
/var/log
In [5]: l_dirs = !find /var -maxdepth 1 -path "*/l*"
In [6]: l_dirs
Out[6]: ['/var/log', '/var/lib', '/var/local', '/var/lock']

The autocall mode is possible because the python syntax is ‘non-polish’ in the sense that any valid statement of the form a b can also be written as a(b) (besides compound statements that control the execution).

Unfortunately, the autocall mode does not integrate bash completion. When it finds out that the command is going to be a system command it could offer the completion list as provided by bash. Instead it comes up with its own completion list, which is rarely useful.

What would a better shell look like?

There are aspects of shells that are important but somewhat peripheral, for instance good completion support, fancy prompts, or aliases. We don't consider those things here but aim to sketch the core syntax and features of a useful shell-and-script language that suits non-expert users better than bash does.

A simple approach is to combine bash/sh features with a more benign language such as tcl. Such a hypothetical language, “mesh”, has a tcl-like syntax but knows concepts such as sh-environments and anonymous and named pipes. Here are example mesh snippets:

 
ls A/ | grep _old 
procedure cleanUp { targetDir } {
	set now      [ clock seconds ]
	set maxAge   3600  
	set all     [ glob -nocomplain "$targetDir/*" ]
	set trashed [ list ]
	set kept    [ list ]
	foreach {path access} [ stat -c "%n %X" $all ] { 	   	
		if { $now - $access  >  $maxAge } { 
			lappend trashed $path
		} else {
			lappend kept $path
		}
	}

	set trash    "$targetDir/old"
	mkdir $trash
	mv {*}$trashed $trash	

	return $kept
 }

Tcl is a natural meshing-partner for bash because it also uses polish notation. However, tcl has its downsides too. Arguments are passed by value, so for object manipulation you have to resort to tricks such as passing the variable name. Technically, there is only one (returnable) data type, the string, and you cannot distinguish between a word and a list containing a word, or between a list with an even number of entries and a dict/mapping.

Still, the syntax of tcl solves the conflict between the convenience of polish notation for one-liners and the fitness for structured scripts much better than bash does. Bash, on the other hand, integrates better into the system through direct access to executables and pipes/redirection.

An attractive though vague solution would be to combine bash/sh features and tcl syntax in the way depicted above along with basic container types as used by lua and object piping as in PowerShell. The most difficult part is probably to make up a syntax that supports structured types but is still as simple as the tcl syntax is now.

Assume we want to move specific files from folder S to folder D, namely those that do not contain the string jamero and are older than one hour.

find S -type f -mmin +60 | xargs -i grep -v -l jamero '{}' | xargs -i mv '{}' D

There are different opinions on the aesthetic value of this solution. However, it is undeniably inaccessible to many people who need such a helper but have little bash experience. As beginners, some knowledge of the find command presumed, we would rather try something like this:

OLD_FILES=$(find S -type f -mmin +60)

for FILE in $OLD_FILES; do
    if [[ $(grep jamero $FILE) == "" ]]; then
        mv $FILE D
    fi
done

Unfortunately, we now cannot cope with whitespace in paths. We want OLD_FILES to be a list whose entries are paths. But it is a string, and the for-structure splits it at whitespace. One way to deal with it is to explicitly state that splitting should be done at newlines.

http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html

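With the loop above this does help, provided IFS is changed before $OLD_FILES is expanded: $(..) keeps the inner newlines (it strips only trailing ones); they merely show up as spaces when the variable is echoed unquoted. But paths containing newlines or glob characters still break the loop. Although the task is clear and simple, we get lost in tinkering. Note that the first solution might be shiny but it misses empty files, and since grep -v -l lists every file with at least one line lacking the string, it also moves files that do contain jamero.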
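
For comparison, a sketch of a loop that survives arbitrary file names by passing NUL-separated paths from find to read. The function move_old_without and its parameters are hypothetical. As with the snippets above, it assumes a find that supports -mmin and -print0. Files that grep cannot read are treated like files without the string.

```shell
#!/bin/bash
# move_old_without SRC DST PATTERN [MINUTES]
# Move the files below SRC that were modified more than MINUTES (default 60)
# minutes ago and do not contain PATTERN into the directory DST.
move_old_without() {
    local src=$1 dst=$2 pattern=$3 minutes=${4:-60} file
    while IFS= read -r -d '' file; do
        # grep -q succeeds if the pattern occurs; move only if it does not.
        # Empty files contain nothing, so they are moved as well.
        if ! grep -q -- "$pattern" "$file"; then
            mv -- "$file" "$dst"
        fi
    done < <(find "$src" -type f -mmin +"$minutes" -print0)
}
```

The task from the beginning then reads move_old_without S D jamero.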