Wednesday, September 9, 2009

Adding a license text to a set of files using sed

The problem: I needed to add (prepend) a license text to a set of source files.

I thought sed might be a good tool for this task. Having written nothing but trivial sed scripts in the past, I searched the web for advice.

I came across the following script (I have since lost the original URL, sorry):


1{h; r [file]
D;}
2{x; G; }



It needs some explaining. Sed works by maintaining two data buffers: the pattern space and the hold space. A sed execution cycle is performed for each input line. First the input line is placed in the pattern space. Then the commands are (conditionally) executed. Finally, the contents of the pattern space are printed to stdout (if not explicitly prevented). The hold space maintains its contents between two cycles. The pattern space on the other hand is cleared between two cycles.
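
To see the two buffers in action, here is a minimal sketch (the two-line input is made up for illustration) that uses the hold space to print two lines in reverse order. The -n option suppresses the automatic printing at the end of each cycle, so only the explicit p produces output.

```shell
# Line 1: h copies "one" into the hold space; -n keeps it from being printed.
# Line 2: G appends a newline plus the hold space to "two"; p prints the result.
swapped=$(printf 'one\ntwo\n' | sed -n -e '1h' -e '2{G;p;}')
printf '%s\n' "$swapped"
```

The hold space still contains "one" when the second cycle runs, which is exactly the property the license script relies on.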

The GNU sed manual describes the commands used as follows.

h - "Replace the contents of the hold space with the contents of the pattern space."

r [file] - "Queue the contents of filename to be read and inserted into the output stream at the end of the current cycle, or when the next input line is read. Note that if filename cannot be read, it is treated as if it were an empty file, without any error indication."

D - "Delete text in the pattern space up to the first newline. If any text is left, restart cycle with the resultant pattern space (without reading a new line of input), otherwise start a normal new cycle."

x - "Exchange the contents of the hold and pattern spaces."

G - "Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space."
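
The queuing behaviour of r is the reason the script cannot simply be "1r [file]": the file is written out after the line it is attached to, not before it. A small sketch, assuming a scratch file created with mktemp as a stand-in for the license text:

```shell
# Hypothetical scratch file standing in for the license text.
note=$(mktemp)
printf 'NOTE\n' > "$note"

# r is attached to line 1, but the file is emitted only after line 1 itself
# has been printed at the end of the cycle.
queued=$(printf 'first\nsecond\n' | sed "1r $note")
printf '%s\n' "$queued"

rm -f "$note"
```

The file contents land between the first and the second input line, so prepending needs some way to hold the first line back.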


So the program does its job as follows. The first set of commands is executed only for the first line of the input. The first line is copied to the hold space. After that, the file [file] is queued to be written to stdout at the end of the current cycle. Finally, the contents of the pattern space are deleted (so that the first line won't be printed just yet).

At this point the file [file] (containing the license text) has been printed, and nothing else. The first line has been saved to the hold space. Looks good. The second set of commands does the rest. These commands are executed only for the second line of the input. First the contents of the hold space and the pattern space are exchanged, so the hold space now contains the second line of the input and the pattern space contains the first. Then a newline and the contents of the hold space are appended to the pattern space. The pattern space now contains the first two lines of the input in the correct order. Finally, the cycle ends and the contents of the pattern space are printed.

No commands are executed for the remaining lines of the input. They are just printed by default. We have printed the license text, the first line of the input, the second line of the input and the rest of the input in the correct order, effectively prepending the license text to the input. Note that the script assumes the input has at least two lines: in a one-line file the first line is moved to the hold space but never printed.
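
The walkthrough can be checked on a throwaway sample before touching real sources. The sketch below assumes a scratch directory from mktemp and made-up file contents; it writes the result to stdout instead of editing anything in place.

```shell
# Hypothetical scratch files: a one-line "license" and a three-line source file.
work=$(mktemp -d)
printf '/* LICENSE */\n' > "$work/license"
printf 'line 1\nline 2\nline 3\n' > "$work/sample.c"

# Same script as above; each -e fragment ends where the original has a newline,
# because the file name given to r runs to the end of its line.
prepended=$(sed -e "1{h; r $work/license" -e 'D;}' -e '2{x; G; }' "$work/sample.c")
printf '%s\n' "$prepended"

rm -rf "$work"
```

The license line comes out first, followed by the three input lines in their original order.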

Assuming the license text is located at /tmp/license, the following command does the job for all *.c and *.h files in the current directory. The -i option makes sed edit the files in place.


sed -i '1{h; r /tmp/license
D;}
2{x; G; }' *.[ch]
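
Since -i overwrites the files, it may be worth trying the command on copies first. The following sketch assumes a scratch directory with two made-up source files, and uses -i.bak so that sed keeps each original next to the edited file with a .bak suffix.

```shell
# Hypothetical scratch directory with files to experiment on.
scratch=$(mktemp -d)
printf '/* LICENSE */\n' > "$scratch/license"
printf 'int a;\nint b;\n' > "$scratch/a.c"
printf '#define X 1\n#define Y 2\n' > "$scratch/a.h"

# -i.bak edits in place and keeps the originals as a.c.bak and a.h.bak.
sed -i.bak -e "1{h; r $scratch/license" -e 'D;}' -e '2{x; G; }' "$scratch"/*.[ch]

first_c=$(head -n 1 "$scratch/a.c")
first_h=$(head -n 1 "$scratch/a.h")
backup_c=$(head -n 1 "$scratch/a.c.bak")
printf '%s\n%s\n%s\n' "$first_c" "$first_h" "$backup_c"

rm -rf "$scratch"
```

With -i, line numbers restart for every file, so the addresses 1 and 2 match in each file and both get the license. Without -i, sed treats all files as one continuous stream and only the first file would be affected.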