2010-06-21

Concatenating process outputs

It's very common and simple to pipe the output of one process into another as its input. But how do you concatenate the output of several processes and feed the combined result to another process? It's actually very simple: just put parentheses around the sequence of commands to run. The commands then run one after another in a subshell, and everything they write to standard output goes down the same pipe, that is:

(command 1; command 2; ...; command n) | receiving process
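
As a minimal sketch of this pattern (the fruit names are made-up sample data, and the variable name combined is only for illustration), two printf commands are grouped and their concatenated output is sorted by a single sort process:

```shell
# The two printf commands run in order inside a subshell; their outputs are
# concatenated and fed to one sort process through the pipe.
combined=$( (printf 'pear\napple\n'; printf 'fig\n') | sort )
echo "$combined"
```

The space between `$(` and the inner `(` keeps the shell from reading the construct as arithmetic expansion.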


As an example, to combine messages from two Google Gadget messagebundle files, we can define a bash function that takes three parameters as follows:
# Usage: mergemsg FILE1 FILE2 OUTPUT
function mergemsg {
    [[ -f $1 && -f $2 && -n $3 ]] && \
    (head -2 $1 ; \
    (sed -e '1,2d;$d' $1 ; sed -e '1,2d;$d' $2) | sort; \
    tail -1 $1) > $3
}

  • [[ -f $1 && -f $2 && -n $3 ]] checks that the first two parameters are files, and the third is a non-empty string
  • && (short-circuit AND) runs the command that follows only if the preceding check succeeds
  • head -2 $1 prints the first two lines of file $1, which are the XML declaration and the opening root tag
  • sed -e '1,2d;$d' $1 strips the first two lines and the last line of the message bundle file $1; likewise for file $2
  • (sed ... $1 ; sed ... $2 ) | sort combines the messages from files $1 and $2 and sorts them in ascending order
  • tail -1 $1 prints the last line of file $1, which is the closing root tag
  • ( ... ) > $3 finally writes the combined output to file $3
We can then simply run:
mergemsg zh-TW_ALL.xml ALL_hk.xml zh-TW_NEWS.xml
mergemsg ar_ALL.xml ar_mx.xml ar_NEWS.xml
...
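
To see the effect on a small input, here is a self-contained sketch. The two sample bundles, their file names, and the scratch directory are made up for illustration. The function is a variant of the one above, not the original: it quotes its parameters so that file names containing spaces work, and it uses the portable [ ] test in place of [[ ]].

```shell
# Variant of mergemsg with quoted parameters (an adaptation for this demo).
mergemsg() {
    [ -f "$1" ] && [ -f "$2" ] && [ -n "$3" ] &&
    ( head -n 2 "$1"
      ( sed -e '1,2d;$d' "$1"; sed -e '1,2d;$d' "$2" ) | sort
      tail -n 1 "$1" ) > "$3"
}

# Hypothetical sample bundles, written to a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/a.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<messagebundle>
  <msg name="title">News</msg>
</messagebundle>
EOF
cat > "$tmpdir/b.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<messagebundle>
  <msg name="author">Alice</msg>
</messagebundle>
EOF

# Merge the two bundles and show the result.
mergemsg "$tmpdir/a.xml" "$tmpdir/b.xml" "$tmpdir/merged.xml"
cat "$tmpdir/merged.xml"
```

The merged file keeps the header and closing tag of the first bundle, with the msg lines from both files sorted between them.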