Thursday, January 7, 2016

Demystifying Regex Redux with 9 simple terms

Can I have your number?
When I wrote demystifying regex with 7 simple terms a while ago, I left out a couple really useful regex terms. So I guess this would have to be re-written as 9 Simple Terms.

Regex is one of those things that when you need it, you need it. But it is from the 80's and cryptic. Most regex expressions you see are too complex, and hard to follow. In this post, I'll show you a couple more terms to help you keep it simple and to a minimum, while allowing you to tap the power of regex.

expect-lite has very good support for regex meta characters (the ones that start with a backslash "\"). As a quick review of the 7 terms, there are:
  • Repeats: * and +
  • Meta characters: \d, \w, \n, \t
  • Or: |
But there are a couple of regex meta characters which I have found useful in addition to the 7 above, when skipping over some columnar info to get that column you want to validate (or capture into a variable).
\s is whitespace (space, tab, or newline)
\S is not whitespace

Working with the example from demystifying regex with 7 simple terms:
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0


You could match the first IP address:
>route -n
<\d+.\d+.\d+.\d+ 


But what if you wanted to match the default route metric on the last line, rather than the first IP address? You could use the following (the value we are searching for is the metric, 100):
#validate metric
>route -n
<0.0.0.0\s+\d+.\d+.\d+.\d+\s+0.0.0.0\s+UG\s+100\s+\d\s+\d\s+eth0

This introduces a new meta character, \s which is a space (or any white space). But it starts to look complex. Keeping it simple, use the non-space \S, and back tracking from a known point (in this example 'eth0') will simplify the regex some:
#validate metric
>route -n
<100\s+\S+\s+\S+\s+eth0

It is better, but could be simpler by keying off of the flags column.
#validate metric of default route
>route -n
<UG\s+100.+eth0

This uses the space meta-character, \s, and uses another meta-character mentioned in the fine print of the original post, the dot, or '.' which matches any character. It is good to use the dot sparingly, as it can often match more than you would expect. But in this example, because there is only one default route (it is after all, only IPv4*), and it is always on the last line, it is pretty safe to use the dot.
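For readers who want to try these patterns outside expect-lite, here is a minimal sketch using Python's re module against the sample output above. It is only an approximation of what expect-lite does: by default Python's dot does not match a newline, so the dot is more confined here than it would be in a blob of text. The capture parens around the metric are an addition for illustration.

```python
import re

# Sample 'route -n' output copied from the example above
ROUTE_OUTPUT = """Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0
"""

# Back-tracking from the known point 'eth0', skipping columns with \S+
backtrack = re.search(r"100\s+\S+\s+\S+\s+eth0", ROUTE_OUTPUT)

# Keying off the flags column, with parens added to capture the metric
by_flags = re.search(r"UG\s+(\d+).+eth0", ROUTE_OUTPUT)

print(backtrack is not None)   # True
print(by_flags.group(1))       # 100
```

Note that the first pattern skips the 169.254.0.0 line even though it contains "1000", because \s+ insists on white space straight after "100".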

To see just what expect-lite did match, use the *EXP_INFO directive on the CLI or earlier in your script.

Regex Guidelines

It is a good idea to keep regex as simple as possible. As we saw above, it is easy to create complex regex, but that leads to challenges in maintaining code later. Every time someone has to debug the script, they have to figure out what the regex is doing. Shorter, simpler regex will always win out.

Regex has the concept of anchors (^,$), but I haven't included them in the 9 simple terms for a couple of reasons:

  • Anchors don't work as you would expect in expect-lite. One would expect that you could use an anchor at the beginning of a line, but expect-lite doesn't evaluate output line by line; it evaluates a blob of text which includes newlines. Therefore, if you need to "anchor" your regex, do something like '\n169.254.0.0'. Regexes with anchors also tend not to be simple or short.
  • I have seen regexes where the entire line is described, from beginning to end, with anchors at each end. This almost always makes a very complex and brittle regex: a change in the column width can break it. It is much easier, and less code intensive, to do a sparse validation of output using simple regexes (as shown in the example above).
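The anchor behaviour described in the first bullet can be seen in Python too, which makes for a handy analogy (it is not expect-lite's own regex engine). Without the re.MULTILINE flag, ^ only matches at the very start of the text, so a newline is used as a stand-in anchor instead. The sample blob below is made up from the route table addresses.

```python
import re

# A blob of output containing newlines, as expect-lite would see it
BLOB = "10.1.1.0   0.0.0.0\n169.254.0.0   0.0.0.0\n0.0.0.0   10.1.1.1\n"

# Without re.MULTILINE, ^ only matches at the very start of the blob
anchored = re.search(r"^169\.254\.0\.0", BLOB)

# Using \n as a stand-in anchor finds the address at the start of its line
newline_anchor = re.search(r"\n169\.254\.0\.0", BLOB)

print(anchored)                    # None
print(newline_anchor is not None)  # True
```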


Not everyone is a regex expert. Plan on helping the next person who looks at your script by writing a comment about what the regex is doing. And if you are lucky enough to be the next person to look at your code, then you will be thankful that you wrote your future-self a note.

Recap the 9 terms

To recap, and give you a single place to look for a reference, the 9 terms are the single character meta-characters:

  • \d  is a digit (0 to 9)
  • \w  is a word character (a letter, a digit, or an underscore)
  • \n  is a new line (think of it as a carriage return)
  • \t  is a tab
  • \s is white space (including \t and \n)
  • \S is a non-space (any letter, number, symbol)
  • . is any character (use this sparingly**)


And the repeat characters which are modifiers to the terms above:

  • *  repeats 0 or more times
  • +  repeats 1 or more times


And the regex OR term, |

The power of 9

You can still use only the original 7 regex terms and accomplish 90% of what you need. The additional 2 meta-characters just give you a bit more control over matching. And for those of us with a finite memory, it is still fewer than the fingers on two hands.


* IPv6 can often have multiple default routes, and the metric becomes very important in determining which one is used.
** the regex dot is extra credit



Tuesday, September 15, 2015

Sleeping while you work

Sleeping: restful, relaxing, restorative. But sleeping in a computer script is pausing the script for a specified amount of time. As of version 4.9.0 a native sleep command, using a colon, has been added.
Counting the seconds

"Why wait until expect-lite has been around for 10 years before adding sleep?" you may ask. Because I have seen sleep abused in other scripting languages. Usually a scripter will add a 60 to 600 second sleep rather than check for the event with a polling loop.

Polling with a sleep

But polling loops are an excellent example of where to use a sleep. It is usually unnecessary to check for a state change (ethernet interface up, for example) on a millisecond time basis. More likely, if the interface comes up in a couple of seconds, that is good enough. The sleep will slow down the polling loop so the loop does not put an undue load on the machine.

# using a polling loop to check when eth0 is up
$intf=eth0
$int_state=none
[ $int_state != UP
    ip link show dev $intf
    +$int_state=(UP|DOWN)
    # sleep 2 seconds to slow down loop
    :2
]

Wait a sec :01

The colon ":" indicates a sleep. In the above example :2 is used to sleep (or pause) the script for 2 seconds. Sleep is always in seconds, but fractions of a second are also supported, such as 5 milliseconds:
    :0.005

The native sleep (using the colon) also gives an indication that the script is sleeping. There is nothing more frustrating than debugging a script and wondering whether it is hung or sleeping. expect-lite will print dots to show the progress of the sleep. A 12 second sleep would output:
Sleeping: 12 
....+....10.. 

Each dot represents a second, with the plus every 5 seconds, and a number every 10 seconds. This output will go to stdout, and also be logged to a file with the *LOG command so that it can be observed later. The output can be disabled with the *NOINFO command if you want less clutter.
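The rule for the progress line is simple enough to sketch in a few lines of Python. This is not expect-lite's own code, just an illustration of the rule as described; the function name sleep_progress is made up for the example, and it only handles whole seconds.

```python
def sleep_progress(seconds):
    """Build the progress string for a sleep of whole seconds."""
    marks = []
    for second in range(1, seconds + 1):
        if second % 10 == 0:
            marks.append(str(second))   # a number every 10 seconds
        elif second % 5 == 0:
            marks.append("+")           # a plus every 5 seconds
        else:
            marks.append(".")           # a dot for every other second
    return "".join(marks)

print(sleep_progress(12))   # ....+....10..
```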

Transparency in sleeping, making scripting easier

A goal of expect-lite is transparency, showing you what it is doing, to help you debug script errors, or determine actual problems with the device you are testing. And now, you can sleep on it.


Sunday, January 11, 2015

Looking through the window, the expect-lite debugger

Looking through the debugger
In January 2015, expect-lite will have been making automation easier for 10 years. In celebration of that event, version 4.8.0 includes improvements to the debugger to make automation even easier.

expect-lite has had an integrated debugger, called with *INTERACT, since version 4.0 (Oct 2010). The expect-lite debugger allows you to:
  • do all the standard debugger stuff: step, skip, view variables
  • type commands directly to the device/host the script is connected to
  • execute arbitrary lines of expect-lite script (copy/paste uses this as well)
The debugger has the standard things you would look for in a debugger. Pressing escape h will print the debugger help:

IDE: Help
  Key          Action
  ----        ------
  <esc>s      Step
  <esc>k      sKip next step
  <esc>c      Continue
  <esc>v      show Vars
  <esc>e      show Env
  <esc>0to9   Show N next lines of script
  <esc>-1to-9 Show N previous lines of script
  ctrl+D      Quit & Exit expect-lite
  <esc>h      this Help

The debugger is like a window to the device being automated. Once in a breakpoint, the debugger silently steps aside, and allows you to type directly to the device. Perhaps you aren't getting the response you expected, or the device wasn't configured as you had expected. You can fix it while in the middle of your script.

The debugger silently watches what you type, and decides if the text is for the device, or is it an expect-lite command it must execute. How does it do this? Mostly pretty well.

Typing expect-lite commands in the debugger

You may have noticed that expect-lite commands start with a punctuation character like '>', '<', '?', '@', or ';'. The debugger watches for these characters at the beginning of a line. But there are some tweaks to prevent the debugger from accidentally grabbing a line that was intended for the device. These are limitations only when typing in the debugger, and are not required when executing the expect-lite script.
  • >send this    There must be no space between > and the text to be sent
  • ; comment   There must be a space between the semi-colon and the comment
  • ?if condition...   When using an IF statement in the debugger, the optional 'if' must be used, because there are just too many question marks in normal text
  • $var=value  Variable assignments must have an equals sign (no spaces)
  • no white space before the expect-lite command (punctuation)
I wrote in an earlier blog entry (writing scripts with copy and paste), that using the debugger you can also copy/paste into a running script. When copy/pasting into the debugger, only the first line need follow the above limitations, the remaining lines can have leading white space, etc, since the debugger has decided that the entire paste is expect-lite script.
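To make the rules above concrete, here is a rough sketch in Python of how a typed line could be sorted into "expect-lite command" or "text for the device". It is an assumption-laden illustration and not the debugger's actual logic: the function name is invented, and only the rules listed above are covered (other command characters such as '<' and '@' are left out).

```python
import re

def looks_like_expect_lite(line):
    """Return True if a typed line follows the debugger rules listed above."""
    if line[:1].isspace():
        return False                      # no white space before the command
    if re.match(r">\S", line):
        return True                       # >send this (no space after >)
    if line.startswith("; "):
        return True                       # ; comment (space after semi-colon)
    if line.startswith("?if"):
        return True                       # ?if condition (explicit 'if')
    if re.match(r"\$\w+=\S", line):
        return True                       # $var=value (equals sign, no spaces)
    return False                          # otherwise, text for the device

print(looks_like_expect_lite(">ls -l"))       # True
print(looks_like_expect_lite("echo $HOME"))   # False
```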

*SHOW ENV

The biggest improvement to the debugger in version 4.8.0 is *SHOW ENV. Similar to *SHOW VARS, it prints the expect-lite environment, which consists of a list of directives which are enabled/disabled, such as *INFO and *TIMESTAMP, as well as ones with values, like the user defined prompt and the infinite loop count (*INFINITELOOP). Lastly, *SHOW ENV will display any shell environment variables which begin with EL_.

$ DEBUG Info: Printing expect-lite directives/environment
Environment:          Value:
CURRENT_FORK_SESSION  default
DEBUG                 off
DVPROMPT              on
EOLS                  LF
EXP_INFO              on
FAIL_SCRIPT           fail_test.inc
INFINTE_LOOP_COUNT    5000
INFO                  on
LOG                   off
LOG_EXT               .log
NOFAIL                2
NOINCLUDE             off
NOINTERACT            off
REMOTE_SHELL          none
TIMESTAMP             off
USER_DEFINED_PROMPT   .*$ $
WARN                  on
fuzzy_range           10
timeout               2
EL_CONNECT_METHOD     ssh_key
EL_REMOTE_HOST        localhost

Of course, if you are in the debugger, you don't have to type *SHOW ENV, you can just type <esc>e.
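The last part of that listing, the shell environment variables which begin with EL_, is easy to reproduce from any language. A minimal Python sketch (the function name el_variables is made up, and the column width is only a guess at the layout shown above):

```python
import os

def el_variables(environ):
    """Return the variables whose names begin with EL_, sorted by name."""
    return sorted((name, value) for name, value in environ.items()
                  if name.startswith("EL_"))

# Print whatever EL_ variables are set in the current shell environment
for name, value in el_variables(os.environ):
    print("%-21s %s" % (name, value))
```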

Just as you can start a new shell by typing 'bash' or 'csh' at the prompt, you can start an expect-lite shell. There is an included example called 'el_shell.elt' which you can use to create one. It is a simple script which just drops you into the debugger. From there, you can type regular linux commands (or cygwin linux-like commands), expect-lite commands, or explore the debugger help with <esc>h.

Doughnut Scripts

I find I often create a doughnut script, a script with a hole in the middle. I use this to have a script set up my environment, then drop to the debugger, allowing me to plunk around, then when I am done, I exit the debugger (with '+++'), and the script cleans up. I call this automation assist. It doesn't do everything, but it allows me to explore/test faster than doing it all by hand.

The debugger command detection isn't perfect, but it works pretty well. Sometimes you may find a smudge or a dirty spot on the debugger window, but hopefully, it will be clear enough to see that expect-lite is automation for the rest of us.

Monday, December 22, 2014

A tale of two scripts redux: joining Python and expect-lite

Python wrapper
As I mentioned earlier, the expect-lite interpreter is quite basic. The advantage of such simplicity is that it ignores anything it doesn't understand. In a tale of two scripts I explained how to join bash and expect-lite in the same script. But the technique can also be applied to other languages.

In this post, I'll show how expect-lite can be embedded in a Python script allowing you to get all the goodness of Python and expect-lite.

First we start with a simple Python script which just makes a call to subprocess (used to execute an external application), prints the output of the expect-lite script in real time and then checks the return code of the subprocess.

#!/usr/bin/env python
"""
Example script: embedding expect-lite in python
    22 December 2014 -- Craig Miller
"""

from __future__ import print_function
import subprocess, inspect, os

def embed():
    # find the name of this file so expect-lite can be pointed back at it
    script_name = inspect.getfile(inspect.currentframe())
    process = subprocess.Popen('expect-lite ' + script_name,
                                shell=True, stdout=subprocess.PIPE,
                                universal_newlines=True)
    # print stdout in real time; readline() returns '' at end of output
    for line in iter(process.stdout.readline, ''):
        print("EL:", line.rstrip())
    out, err = process.communicate()
    if process.returncode > 0:
        print("========= ERROR:%s" % process.returncode)
    else:
        print("========= GOOD:%s" % process.returncode)

#############################
# beginning of python script

if __name__ == '__main__':
    try:
        embed()
    except KeyboardInterrupt:
        print("Detected ^C")
        os._exit(1)

If you are familiar with Python, then you know that indentation is more than just a good idea; it is required by the language. The script above has the basic components of a Python script, but the important part is the function embed().

In embed(), it figures out the name of the script using inspect.getfile() and saves it in the variable script_name. Then subprocess.Popen() is called with expect-lite and the script_name, to recursively run the script again, this time using expect-lite as the interpreter. 

However, before we do that, we need to add the expect-lite part of the script. At the bottom of the Python script add:
if False:'''
#############################
# beginning of expect-lite code

*EXP_INFO
$count=3
; === test of EL
>echo "inside expect-lite! whoo-hoo"
<inside
@5
;purple === ping loopback
>ping6 -c $count ::1
<packets transmitted
>echo "Continue"
>
; === pau
'''

The trick is to protect the expect-lite lines from the Python interpreter. The Python interpreter is a bit smarter than the bash interpreter, which does not read the entire file before executing. So to make Python happy, we must shield the expect-lite lines inside an "if False:" statement. The triple quotes are a Python mechanism that allows anything to be entered, even expect-lite. Save the completed script as two_scripts.py
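If you want to convince yourself that Python really does accept the shielded lines, the following small sketch compiles and runs a tiny two-language source held in a string. The variable names and the file name "two_scripts_demo" are made up for the demonstration; only the Python half runs, and the expect-lite lines are parsed as a string that is never evaluated.

```python
# Source of a tiny two-language file, held in a string for the demo
SOURCE = """python_part = "ran"
if False:'''
*EXP_INFO
>echo "inside expect-lite! whoo-hoo"
<inside
'''
"""

namespace = {}
# compile() reads the whole source first, so the shielded lines must parse
exec(compile(SOURCE, "two_scripts_demo", "exec"), namespace)
print(namespace["python_part"])   # ran
```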

The expect-lite part of the script is a simple one, showing that we are actually inside the expect-lite script, and a simple ping6 of the IPv6 loopback address.

Because of the dual nature of the script, it is possible to run (this example) without any Python, by running the script directly from expect-lite.
expect-lite two_scripts.py

Or run it using Python:
python two_scripts.py

The expect-lite part of the script can be located anywhere in the Python script, as long as "if False:" is used to protect it. Of course, you can also have Python pass parameters to expect-lite using CLI constants, making the script even more flexible and useful.
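As a hedged sketch of passing parameters, the helper below builds an argument list with constants appended after the script name. It assumes that constants are given on the expect-lite command line as name=value pairs; check the expect-lite documentation for your version before relying on that. The helper name build_command is invented for the example.

```python
def build_command(script_name, constants):
    """Build an argument list: expect-lite, the script, then name=value pairs."""
    command = ["expect-lite", script_name]
    for name in sorted(constants):
        # assumed CLI constant syntax: name=value
        command.append("%s=%s" % (name, constants[name]))
    return command

print(build_command("two_scripts.py", {"count": 5}))
```

The resulting list could be handed to subprocess.Popen() in place of the command string used in embed(), in which case shell=True is no longer needed.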

There you have it, two languages, one script, all the power of Python, all the simplicity of expect-lite, automation for the rest of us.

Happy Holidays!

Wednesday, November 19, 2014

Sawing Logs

Sawing Logs the way you like them
Recording what happened for later review is called logging. For years expect-lite lacked native logging, and relied on other programs like 'script' or 'tee' to record the output. Logging was added to make it easier to record just the parts you might want to keep for later.

expect-lite uses directives to control behavior of operation. Directives always start with an asterisk, and all CAPS, like *INFO. The *LOG, *NOLOG and *LOGAPPEND directives added native logging support in 2013. Like all directives, the log commands can be used on the command line when starting a script, or within the script to just log the desired portion.

*LOG

The *LOG directive will automatically create a log file named <script_name>.log in the script directory. But *LOG can also take a path/filename parameter to log to a different directory.
$path=/tmp
; === get today's date
>date +%F
+$today=\n(\d+-\d+-\d+)
*LOG $path/$arg0-$today.log
>do stuff
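The same idea, capturing the date from command output and folding it into a log file name, looks like this in Python. It is a sketch for comparison only: the function name and the sample script name are made up, and the sample output string imitates what a terminal would show for a date command.

```python
import re

def dated_log_name(path, script_name, date_output):
    """Capture YYYY-MM-DD from command output and build the log file name."""
    match = re.search(r"\n(\d+-\d+-\d+)", date_output)
    if match is None:
        return None                      # nothing captured, no name to build
    return "%s/%s-%s.log" % (path, script_name, match.group(1))

# The leading \n stands for the newline that ends the echoed command line
print(dated_log_name("/tmp", "myscript.elt", "date +%F\n2014-11-19\n"))
```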


*NOLOG

And *NOLOG stops the logging. For example, perhaps there is a while loop that is polling for some event. You don't really need to see in the log that it polled 500 times; you just want to see what happens after the event.
...
*LOG $path/$arg0-$today.log
>do stuff
*NOLOG
[ $state != $event
   ; poll for event
   >show event
   +$state=(enabled|disabled)
]
*LOGAPPEND $path/$arg0-$today.log
...

And *LOGAPPEND will append to an existing log file, or create a new file if there is no existing file.
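For comparison, here is the log, pause, and append pattern sketched in Python. It is only an analogy, not how expect-lite implements logging: the class name SectionLog is invented, and it relies on Python's file mode "a", which appends to an existing file or creates a new one.

```python
import os
import tempfile

class SectionLog(object):
    """Write lines to a file only while logging is switched on."""

    def __init__(self):
        self.handle = None

    def logappend(self, filename):
        # mode "a" appends to an existing file, or creates a new one
        self.nolog()
        self.handle = open(filename, "a")

    def nolog(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None

    def write(self, line):
        if self.handle is not None:
            self.handle.write(line + "\n")

log_file = os.path.join(tempfile.mkdtemp(), "demo.log")
log = SectionLog()
log.logappend(log_file)              # file does not exist yet, so it is created
log.write("do stuff")
log.nolog()
for attempt in range(500):
    log.write("poll %d" % attempt)   # skipped: logging is off
log.logappend(log_file)              # pick the same file up again
log.write("event happened")
log.nolog()

with open(log_file) as handle:
    print(handle.read())
```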

What is logged

The *LOG log file will capture everything that is seen on the terminal screen: expect-lite messages such as *INFO and *WARN, commands typed in during an *INTERACT (a breakpoint), and all standard out, including colour. (Hint: use *NOCOLOUR to disable colour output.)

Plays well with Instant Interact

Why not just use the 'script' or 'tee' commands to record the script output? No reason, if that is working for you. However, *LOG provides better control over what goes into the log file. Additionally, I had real problems using 'tee' and Instant Interact (creating a breakpoint on the fly) with Ctrl+\, as it would terminate the 'tee' command (and in turn the running script). Using native logging with *LOG allows you to use Ctrl+\ (control backslash) at any point during the script, giving you immediate access to the running script without having to preset a breakpoint.

So go ahead and saw those logs up any way you like, because expect-lite is automation for the rest of us.

Note 1: the pre-defined variable $arg0 holds the script name
Note 2: for those who don't require 'U's in colour, *NOCOLOR will also work

Sunday, October 19, 2014

Capturing the State of Mind

State of Mind
There are things out there which have state. Think of your gas tank, could be full or empty. Maybe it is an interface that is up or down, or initializing. You may need to know the state before your expect-lite script can continue.

In this blog entry I'll cover a key tip of grabbing just the state of a device by using one of the simple 7 terms from regex, the OR.

For example, perhaps you want to test an ethernet interface, but it would be really useful if the interface such as wlan0 was UP before starting your test.

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 00:16:cb:b4:7c:52 brd ff:ff:ff:ff:ff:ff



Or use the ifconfig command; however, it doesn't explicitly show the interface as DOWN.

$ /sbin/ifconfig -a

wlan0     Link encap:Ethernet  HWaddr 00:16:cb:b4:7c:52 
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 

          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)


So using a dynamic variable we can capture the state of the interface, letting expect-lite do all the hard work of parsing through the output. Since the 'ip' command is less typing, and newer, I'll use it in my examples.

$interface=wlan0
>ip link
<$interface
+$interface_state=state (UP|DOWN)


There are a few things occurring in this short script which are covered in older posts:
   1. $interface: by holding the interface name in a variable, it is possible to use the power of constants to override it on the command line, making the script more flexible.
   2. <$interface is an expect line, which consumes the output of the 'ip link' command up to the interface name. That makes the dynamic variable capture on the next line much easier: the first state encountered in the remaining output of the 'ip link' command is the one we want.
   3. Lastly, and this is the key to this post, (UP|DOWN) is using the regex OR '|' which means: capture either the word 'UP' or the word 'DOWN' and nothing else. The variable $interface_state is guaranteed to only be one of those states.
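The consume-then-capture steps can be sketched in Python as well, using the 'ip link' output shown earlier. This is a rough equivalent under stated assumptions, not expect-lite internals: the function name interface_state is made up, and None stands in for the case where nothing is captured.

```python
import re

IP_LINK_OUTPUT = """1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 00:16:cb:b4:7c:52 brd ff:ff:ff:ff:ff:ff
"""

def interface_state(output, interface):
    """Return 'UP' or 'DOWN' for the interface, or None if not captured."""
    found = re.search(re.escape(interface), output)
    if found is None:
        return None
    remaining = output[found.end():]      # consume everything up to the name
    state = re.search(r"state (UP|DOWN)", remaining)
    return state.group(1) if state else None

print(interface_state(IP_LINK_OUTPUT, "wlan0"))   # DOWN
```

Because the capture takes the first UP or DOWN in the remaining output, an interface whose own state is something else (like UNKNOWN) would pick up the state of a later interface, which is one more reason to list every possible state inside the parens.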

Now that it is easy to get the state of an interface how do we wait for the interface to change state? With a while loop, of course. Let's expand on the script we have above.

$interface=wlan0
>ip link
<$interface
+$interface_state=state (UP|DOWN)
# while $interface_state is not equal to UP
[ $interface_state != UP
  !sleep 2
  >ip link
  <$interface

  +$interface_state=state (UP|DOWN)
]
; === The $interface is now $interface_state


Of course this could be improved, such as if the interface never comes 'UP', the script will spend a very long time in the while loop (eventually it will hit the infinite loop protection, and stop). Typically I add a counter variable, and check if the counter variable has exceeded a maximum number. But I'll leave that to you or another post.
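The counter idea translates directly to other languages. Here is a minimal Python sketch of a polling loop that gives up after a maximum number of tries. Everything in it is hypothetical scaffolding: read_state stands for whatever fetches the current state, and the sleep function is passed in so the loop can be tried without actually waiting.

```python
import time

def wait_for_state(read_state, wanted="UP", max_tries=30, delay=2, sleep=time.sleep):
    """Poll read_state() until it returns wanted, or give up after max_tries."""
    tries = 0
    state = read_state()
    while state != wanted and tries < max_tries:
        sleep(delay)              # slow the loop down between polls
        state = read_state()
        tries += 1
    return state == wanted, tries

# Simulated interface that comes up on the third reading; no real sleeping
states = iter(["DOWN", "DOWN", "UP"])
print(wait_for_state(lambda: next(states), sleep=lambda seconds: None))   # (True, 2)
```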

When capturing states in expect-lite, it is important to put all the states in capture parens, e.g. (UP|DOWN|INIT|RESET). By doing this, you are guaranteed to always capture the state.

Automating state is easy with expect-lite. State of mind... is a little harder.

Friday, May 30, 2014

Saving several slices of output for later with Pseudo Arrays

Pseudo Array of bikes
I was recently talking to a group of expect-lite users who wanted to know how to save several slices of the output of a command, and then be able to use those slices later on. A simple while loop and a pseudo array would be a good solution to this problem.

What is a pseudo array you ask? Well it acts like an array, but it isn't an array in the strictest sense. A real array would have the form $var(index), e.g. $myvar(5). 

But expect-lite doesn't support real arrays. The format for a pseudo array is $var$index, e.g. $myvar$i. In fact, when using pseudo arrays, new variable names are automatically being created, e.g. $myvar1, $myvar2 .. $myvar100, etc.

An example problem is to collect the block devices (like hard drives) in a pseudo array to be further examined later. In this example, I'll use the ls command to display the devices in /dev. The ones which are block devices start with a 'b'. An example output will look like:
$ ls -l /dev
crw------- 1 root wheel 1, 0 May 4 07:14 auditpipe
crw------- 1 root wheel 13, 0 May 4 07:15 autofs
crw------- 1 root wheel 18, 0 May 4 07:15 autofs_control
crw-rw-rw- 1 root wheel 17, 3 May 4 07:15 autofs_nowait
crw------- 1 root wheel 23, 0 May 27 08:11 bpf0
crw------- 1 root wheel 23, 1 May 27 08:11 bpf1
brw-r----- 1 root operator 14, 0 May 4 07:14 disk0
brw-r----- 1 root operator 14, 2 May 4 07:14 disk0s1
brw-r----- 1 root operator 14, 1 May 4 07:14 disk0s2
brw-r----- 1 root operator 14, 3 May 4 07:14 disk0s3
brw-r----- 1 root operator 14, 4 May 4 07:14 disk0s4
brw-r----- 1 root operator 14, 5 May 4 07:14 disk0s5
brw-r----- 1 root operator 14, 6 May 4 07:14 disk0s6
brw-r----- 1 root operator 14, 7 May 4 07:14 disk0s7
crw-rw-rw- 1 root wheel 4, 0 May 4 07:14 ttyp0
crw-rw-rw- 1 root wheel 4, 1 May 4 07:14 ttyp1
crw-rw-rw- 1 root wheel 4, 2 May 4 07:14 ttyp2
crw-rw-rw- 1 root wheel 4, 3 May 4 07:14 ttyp3
brw------- 1 root operator 1, 0 May 4 07:14 vn0
brw------- 1 root operator 1, 1 May 4 07:14 vn1
brw------- 1 root operator 1, 2 May 4 07:14 vn2
brw------- 1 root operator 1, 3 May 4 07:14 vn3
...

For this example, I'll use a while loop to iterate through the output, and increment the pseudo array index variable, capturing the block devices into a pseudo array of dynamic variables. If the dynamic variable can not find a block device, then the variable will be set to a special expect-lite value of __NO_STRING_CAPTURED__. The while loop will continue looping until there are no more block devices.

After capturing the device in pseudo array variable $dev$i, the script will consume the output (see consuming tables for lunch) to remove the top part of the output. That will leave the next block device to be captured near the top of the blob of text output for the next iteration of the while loop. I'll use some of the basic regex such as \n, \d, and \w (see demystify regex with 7 simple terms) to ensure the line begins with 'b' and ends with the desired device name. Only the part in parens, (\w+), is captured into the dynamic variable (see expect-lite variables).

>ls -l /dev
# initialize index variable
$i=0
# initialize the first element in the pseudo array
$dev$i=none
# while loop testing the pseudo array value is captured
[ $dev$i != __NO_STRING_CAPTURED__
    +$i
    # capture the block device
    +$dev$i=\nb.*\d\d:\d\d (\w+)
    # expect the device to consume the output
    ? $dev$i != __NO_STRING_CAPTURED__ ? <$dev$i
]

# set max devices captured
$max=$i
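To show the shape of the loop in another language, here is a Python sketch that fills a dictionary keyed dev0, dev1, and so on, consuming the output after each capture. It is an approximation under stated assumptions: the sample is a shortened version of the ls output above, the function name is made up, and Python's dot stops at a newline by default, so each search stays on one line.

```python
import re

NO_CAPTURE = "__NO_STRING_CAPTURED__"

# A few lines taken from the ls -l /dev output above
LS_OUTPUT = "\n".join([
    "crw------- 1 root wheel 23, 1 May 27 08:11 bpf1",
    "brw-r----- 1 root operator 14, 0 May 4 07:14 disk0",
    "brw-r----- 1 root operator 14, 2 May 4 07:14 disk0s1",
    "crw-rw-rw- 1 root wheel 4, 0 May 4 07:14 ttyp0",
    "brw------- 1 root operator 1, 0 May 4 07:14 vn0",
])

def capture_block_devices(output):
    """Fill a dict keyed dev0, dev1, ... the way the while loop above does."""
    variables = {"dev0": "none"}
    i = 0
    while variables["dev%d" % i] != NO_CAPTURE:
        i += 1
        match = re.search(r"\nb.*\d\d:\d\d (\w+)", output)
        if match:
            variables["dev%d" % i] = match.group(1)
            output = output[match.end():]     # consume up to the device name
        else:
            variables["dev%d" % i] = NO_CAPTURE
    return variables, i

devices, max_index = capture_block_devices(LS_OUTPUT)
for index in range(1, max_index):
    print("dev%d = %s" % (index, devices["dev%d" % index]))
```

With this sample the loop captures disk0, disk0s1 and vn0, and the fourth slot holds the no-capture marker that ends the loop.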

Now that the block devices have been captured into a pseudo array, I'll explore them a bit more using the file command, with another while loop iterating through the pseudo array and re-using the index.

# initialize index variable
$i=1
# while loop to check the devices in the pseudo array
[ $i < $max
    # show the pseudo array value - the device
    >file $dev$i
    # check that it is a block special device
    <block special
    +$i
]

Of course you can always look at the pseudo array (and all the expect-lite variables) in the IDE (debugger) by typing <esc>v or you can print it out right from the script using the directive *SHOW VARS. Depending on your system block devices, the output would look something like:
$ DEBUG Info: Printing all expect-lite variables
Var:arg0 Value:test_pseudo_array.txt
Var:dev0 Value:none
Var:dev1 Value:vn0
Var:dev2 Value:vn1
Var:dev3 Value:vn2
Var:dev4 Value:vn3
Var:dev5 Value:__NO_STRING_CAPTURED__
Var:i Value:5
Var:max Value:5

So, not only can you eat your output for lunch, but you can save the slices in a pseudo array for a midnight snack. expect-lite, serving up automation for the rest of us.