Fri, 18 Oct 2013

Using AWK to print column N and beyond

Sometimes you have a file that has a certain number of defined columns, and then some variable number of columns after that. Sadly, AWK doesn't have a nice way of specifying $N..., but this works well enough for me:

awk '{print substr( $0, index($0, <N>))}' filename

where you replace <N> with the column specifier (e.g. $11 or whatever).

Posted at: 15:08 | category: /computers/unix | Link

Thu, 14 Aug 2008

Logging with Time Stamps -or- Why doesn't this already exist?

From the "I can't believe this doesn't already exist department":

I have at work on occasion the need to pound the shit out of things. What can I say, I like breaking things. A side effect of this is the need to go back, after pounding the shit out of things, and trying to decypher what happened. To help with this, I really want my log to look something like this:

20080814-134403 stdout> standard output here 20080814-134404 stderr> standard error here

And, along with the proverbial pony, I really want to do this with a command like timelog , e.g. timelog 1,2 log.output my-command -flags -go -here.

After asking my co-worker Steve, who I tend to ask weird Unix questions of, if such a beast existed, we sat and thought for a moment of a few ways of doing this. Steve started doing some searching, and, since watching someone else search Google is painful even in the best of times, I ambled back to my office and started looking as well.

It starts with using something in bash called a Process Substitution. Whether or not this is a POSIXly sort of thing or not, I'm not sure and is something I should check up on. Anywho, process substitution looks like this: >( list of commands) . On systems with either named pipe support or support for using /dev/fd what this does is runs the list of commands in another shell and tying those commands' input to a named pipe, the name of which is returned to the command line. For example, let's say I ran the following command:

cat longfile > >( sort -o my-output)
two things would get ran, first, sort -o my-output, which would have it's stdin tied to a named pipe, called something like /tmp/sh-np-874474869, i.e. anything added to /tmp/sh-np-874474869 would be provided as input to the sort command. Second, the rest of the stuff would get ran, but looking like this:
cat longfile > /tmp/sh-np-874474869
. Basically, a very simple way in the shell of making a process run and attach it to a named pipe.

Tying this all together, I threw together a simple program to put a timestamp and prefix for every line of input, and throw it in a file:

#!/bin/sh
CAPTION=$1
OUTPUT=$2
DATE="date +%Y%m%d-%H%M%S-%N"

touch $OUTPUT

while read line
do
        echo "`$DATE` $CAPTION $line" >> $OUTPUT
done
Which I could then use like: ./big-program > >(./timelog "stdout>" stdout.out) 2> >(./timelog "stderr>" stderr.out)

This provides me with two files, stdout.out and stderr.out, which I could then cat together and sort, to get one long list, sorted by time, of the output of the program, with stdout and stderr interlaced as they appear. Sharp-eyed readers may notice I used the %N option to date, which gives you "nanoseconds" --- it's unlikely you'll actually get nanoseconds, but it's enough to get lines of stdout and stderr interleved like you'd expect them to. When I started, I had only one second granularity, and you can get a lot of output from both streams in one second, which will get muddled together with the sort. This solution isn't perfect, but I think it is good enough for the job at hand.

Again, I don't know why this program doesn't exist. It should --- I simply can't imagine this not being a common enough thing that someone's thrown together a nice program to just do this. I'm tempted to write one that will do this for you, but in any case, I at least learned a neat trick in bash

Posted at: 14:18 | category: /computers/unix | Link