Wednesday, October 17, 2012

Consuming Output, eating tables for lunch

There are times when a table is presented and you, the script writer, are interested in only part of the data. The traditional method of parsing tables is to figure out the rows and columns, assign them to variables, and check the variables for the right values. But this is a lot of work, and not in keeping with expect-lite's credo: keep it easy.

Take the following example, which can be run on any Linux machine (on Windows, use the route print command instead):
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0

Without turning this into a networking lesson, you can see there are rows and columns. Note that the second column of the last row is '10.1.1.1', which is the default gateway for this machine.

Suppose you want to check that the default gateway is correct. You could script the following:
>route -n
<10.1.1.1
And you would be done. You don't have to worry about which row or column 10.1.1.1 is in, since expect-lite searches the entire 'route -n' output.
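To picture what "searches the entire output" means, here is a toy Python model (not expect-lite's actual implementation): the whole route -n output is treated as one string, and a plain regex search finds 10.1.1.1 wherever it sits. `ROUTE_OUTPUT` and `expect_anywhere` are hypothetical names used only for this sketch.

```python
import re

# Sample 'route -n' output from this post, including the echoed command line.
ROUTE_OUTPUT = (
    "route -n\n"
    "Kernel IP routing table\n"
    "Destination     Gateway         Genmask         Flags Metric Ref    Use Iface\n"
    "10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0\n"
    "169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0\n"
    "0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0\n"
)


def expect_anywhere(buffer: str, pattern: str) -> bool:
    """Toy stand-in for '<': True if the regex matches anywhere in the buffer."""
    return re.search(pattern, buffer) is not None


# No row or column bookkeeping: the whole blob is searched.
print(expect_anywhere(ROUTE_OUTPUT, r"10\.1\.1\.1"))     # True
print(expect_anywhere(ROUTE_OUTPUT, r"192\.168\.1\.1"))  # False
```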

But suppose you wanted to capture the default gateway into a dynamic variable, and there was no guarantee that it would always be 10.1.1.1? Now things are a little trickier, but not much.
>route -n
<\n0.0.0.0
+$default_gateway=(\d+\.\d+\.\d+\.\d+)

It only took 3 lines, and you still didn't have to know about rows and columns. The second line, '<\n0.0.0.0', uses a very powerful feature of expect-lite, one I call consuming output.

When a command is sent in expect-lite and text is returned (like the output of route -n), that text is held as a big blob (a buffer). Each '<' or '<<' command eats, or consumes, the blob of text up to and including the point where it finds a match. So in the 3-line example above, line #2 consumes all the text up to and including the 0.0.0.0 at the start of the last line of the route -n output. The '\n' makes sure the match is a 0.0.0.0 at the beginning of a line, rather than one of the 0.0.0.0 entries in the Gateway column of the earlier rows. This means the first 4 lines of the table are gone, consumed. The very next IP address in the remaining text blob is '10.1.1.1', which is what is assigned to the dynamic variable $default_gateway.
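The consume-then-capture step can be modeled in a few lines of Python. This is only a sketch, assuming that a match throws away everything up to and including the matched text, as described above; `TextBlob`, `expect` and `capture` are hypothetical names, not part of expect-lite. It also shows why consuming matters: capturing from the untouched blob returns the first IP address in the table, 10.1.1.0, rather than the gateway.

```python
import re
from typing import Optional

# Sample 'route -n' output from this post, including the echoed command line.
ROUTE_OUTPUT = (
    "route -n\n"
    "Kernel IP routing table\n"
    "Destination     Gateway         Genmask         Flags Metric Ref    Use Iface\n"
    "10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0\n"
    "169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0\n"
    "0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0\n"
)


class TextBlob:
    """Hypothetical model of the receive buffer, for illustration only."""

    def __init__(self, text: str) -> None:
        self.text = text

    def expect(self, pattern: str) -> bool:
        """Like '<': on a match, consume everything up to and including it."""
        match = re.search(pattern, self.text)
        if match is None:
            return False
        self.text = self.text[match.end():]
        return True

    def capture(self, pattern: str) -> Optional[str]:
        """Like '+$var=(...)': return group 1 of the first match in what remains."""
        match = re.search(pattern, self.text)
        return match.group(1) if match else None


IP = r"(\d+\.\d+\.\d+\.\d+)"

# Capturing without consuming first grabs the first IP in the table.
print(TextBlob(ROUTE_OUTPUT).capture(IP))  # 10.1.1.0

blob = TextBlob(ROUTE_OUTPUT)
blob.expect(r"\n0.0.0.0")  # consume down to the start of the last row
default_gateway = blob.capture(IP)
print(default_gateway)  # 10.1.1.1
```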

If you want to see text blob consumption at work, run the following:
*DEBUG
>route -n
<0.0.0.0
<0.0.0.0
<0.0.0.0

The output will look a bit chatty, but you will see the text blob (of route -n output) being gobbled up by each '<0.0.0.0'. I won't copy and paste the whole output here, but to give you an idea, it will look similar to this (abbreviated):
>route -n
...
find<<0.0.0.0>>
  in<<route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0>>

The first three lines (including the 'route -n' line) and half of the fourth are consumed.

find<<0.0.0.0>>
  in<<         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0>>

The rest of that line and half of the next are consumed.

find<<0.0.0.0>>
  in<<         255.255.0.0     U     1000   0        0 eth0
0.0.0.0>>

The last chunk is consumed, leaving the default gateway as the next thing in the text blob. Each time text is found in the text blob (0.0.0.0 in this example), everything from the top of the blob up to and including the match is consumed, and is no longer available to be found (or expected).
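The same walk-through can be reproduced with a toy Python model that prints each consumed chunk, assuming '<0.0.0.0' behaves like a regex search that drops everything up to the end of the match. `TextBlob` and its `expect` method are hypothetical names, and the real expect-lite DEBUG output is formatted differently.

```python
import re
from typing import Optional

# Sample 'route -n' output from this post, including the echoed command line.
ROUTE_OUTPUT = (
    "route -n\n"
    "Kernel IP routing table\n"
    "Destination     Gateway         Genmask         Flags Metric Ref    Use Iface\n"
    "10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0\n"
    "169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0\n"
    "0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0\n"
)


class TextBlob:
    """Hypothetical model of the receive buffer, not expect-lite's real code."""

    def __init__(self, text: str) -> None:
        self.text = text

    def expect(self, pattern: str) -> Optional[str]:
        """Search for pattern; on a match, remove and return the consumed text."""
        match = re.search(pattern, self.text)
        if match is None:
            return None
        consumed = self.text[: match.end()]
        self.text = self.text[match.end():]
        return consumed


blob = TextBlob(ROUTE_OUTPUT)
chunks = []
for step in range(1, 4):
    chunk = blob.expect(r"0.0.0.0")  # same (regex) pattern as the script above
    chunks.append(chunk)
    print(f"--- consumed by expect #{step} ---")
    print(chunk)

# What is left now starts with the default gateway.
print("--- remaining ---")
print(blob.text.lstrip())
```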

Consumption is also useful with the not-expect feature. As the documentation asks, how long does one wait for something to not appear? In expect-lite it is about 100ms, or about 1/10th of a second. But what if the unexpected item arrives after 1/10th of a second? You can use consumption: expect something first, so that expect-lite waits until that part of the output has arrived, and then not-expect against the text that remains.

Suppose you never want to see a default gateway of 192.168.1.1. Using the above example and the not-expect feature, you could script the following:
>route -n
<\n0.0.0.0
-<192.168.1.1
The not-expect of 192.168.1.1 is only checked after the text blob has been consumed down to the last line of output.
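Here is a toy Python model of that ordering, assuming the not-expect simply fails if the pattern is found in whatever text remains after consumption. Only which part of the output gets searched is modeled, not the roughly 100ms wait; `gateway_is_not` and the faked `BAD_OUTPUT` are hypothetical.

```python
import re

# Sample 'route -n' output from this post, including the echoed command line.
ROUTE_OUTPUT = (
    "route -n\n"
    "Kernel IP routing table\n"
    "Destination     Gateway         Genmask         Flags Metric Ref    Use Iface\n"
    "10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0\n"
    "169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0\n"
    "0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0\n"
)


def gateway_is_not(output: str, bad_gateway: str) -> bool:
    """Toy model of '<\\n0.0.0.0' followed by '-<bad_gateway'."""
    # '<\n0.0.0.0': find the start of the default-route row and consume up to it.
    match = re.search(r"\n0.0.0.0", output)
    if match is None:
        return False  # the expected line never showed up, so the script would fail
    remaining = output[match.end():]
    # '-<': pass only if the unwanted text is absent from what remains.
    return re.search(bad_gateway, remaining) is None


# A faked output whose default gateway is the unwanted address.
BAD_OUTPUT = ROUTE_OUTPUT.replace("10.1.1.1", "192.168.1.1")

print(gateway_is_not(ROUTE_OUTPUT, r"192\.168\.1\.1"))  # True
print(gateway_is_not(BAD_OUTPUT, r"192\.168\.1\.1"))    # False
```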

Another problem with tables is finding the right value when it appears more than once, such as 0.0.0.0 in the route -n output. I use consumption to bookend the value I want. For example, if I want to check that the gateway on the second line (the one starting with 169.254.0.0) is correct, I can use consumption to find something before and after the desired value:
>route -n
<169.254.0.0
<0.0.0.0
<255.255.0.0

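Without having to know which row or column these items are in, it is possible to expect that 0.0.0.0 is the gateway for 169.254.0.0. Bookending and consuming output ensures that the correct 0.0.0.0 is matched: in this table, if that row's gateway were anything else, '<0.0.0.0' would consume down to a later 0.0.0.0, and '<255.255.0.0' would then fail.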
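A toy Python model makes the role of each bookend visible. It assumes consecutive '<' lines behave like successive regex searches, each consuming up to the end of its match; `route_output`, `expect_in_order` and the wrong gateway 10.1.1.254 are hypothetical, used only to fake a bad table.

```python
import re
from typing import List


def route_output(row2_gateway: str = "0.0.0.0") -> str:
    """Sample 'route -n' output; row2_gateway fakes a different gateway for 169.254.0.0."""
    return (
        "route -n\n"
        "Kernel IP routing table\n"
        "Destination     Gateway         Genmask         Flags Metric Ref    Use Iface\n"
        "10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0\n"
        f"169.254.0.0     {row2_gateway:<16}255.255.0.0     U     1000   0        0 eth0\n"
        "0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0\n"
    )


def expect_in_order(output: str, patterns: List[str]) -> bool:
    """Toy model of consecutive '<' lines: each match consumes the text up to its end."""
    remaining = output
    for pattern in patterns:
        match = re.search(pattern, remaining)
        if match is None:
            return False
        remaining = remaining[match.end():]
    return True


BOOKENDS = ["169.254.0.0", "0.0.0.0", "255.255.0.0"]
wrong = route_output("10.1.1.254")  # hypothetical wrong gateway for 169.254.0.0

print(expect_in_order(route_output(), BOOKENDS))           # True
print(expect_in_order(wrong, BOOKENDS))                    # False
# With only the leading bookend, a later 0.0.0.0 still satisfies the check.
print(expect_in_order(wrong, ["169.254.0.0", "0.0.0.0"]))  # True
```

The last line is why the trailing bookend matters: the leading one only says where to start looking, while the trailing one pins the 0.0.0.0 to that row.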

So go, and fear not ye tables with rows and columns, for expect-lite is here to slay and consume them, making automation quick and easy.

PS. You will remember from "Demystifying Regex" that \n is a newline.
PPS. One could create a "simpler" regex to capture the default gateway, but then you would have to know about character classes, such as:
+$default_gateway=([0-9.]+)
PPPS. Think of a shelf of books: old school, but still found in the library. The things holding up the books at the ends of the shelf are called bookends.
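To compare the PPS regex with the one used in the main example, here is a small Python check on the text left over after '<\n0.0.0.0'. Both find 10.1.1.1 there; the character-class version is looser, though, and also accepts a bare number such as the 100 in the Metric column. `REMAINING`, `DOTTED_QUAD` and `CHAR_CLASS` are names made up for this sketch.

```python
import re

# Text left in the blob after '<\n0.0.0.0' has consumed down to the last row.
REMAINING = "         10.1.1.1        0.0.0.0         UG    100    0        0 eth0\n"

DOTTED_QUAD = r"(\d+\.\d+\.\d+\.\d+)"  # regex used in the main example
CHAR_CLASS = r"([0-9.]+)"              # "simpler" regex from the PPS

print(re.search(DOTTED_QUAD, REMAINING).group(1))  # 10.1.1.1
print(re.search(CHAR_CLASS, REMAINING).group(1))   # 10.1.1.1

# The character class only cares about digits and dots, so a bare number matches too.
metric = "UG    100"
print(re.search(CHAR_CLASS, metric).group(1))      # 100
print(re.search(DOTTED_QUAD, metric))              # None
```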