
favorite regular expression

It was clear in one of my philosophy classes at UNH that I was doomed to be a computer nerd, even though I was already in the major. It was a class on deterministic finite automata. Actually, it was more about Turing machines, but the teacher had us write little machines expressed as automata without getting into the theory. I loved it. I ate it up.

So it’s little surprise that I love regular expressions. My favorite regular expression has to be this one (expressed using Perl syntax):

$line =~ s#^\s*(.*\S)?\s*$#$1#;

[ Thanks to a reader for correcting this: I forgot the $1 in the replacement portion of the regex. ]

What does this do? Well, let’s step through it.

$line =~ s{
    ^\s*        # slurp up all the whitespace at the beginning of the line
    (.*\S)?     # starting at a non-whitespace character, match everything up to the last non-whitespace character
    \s*         # slurp up all the whitespace at the end of the line
    $           # match end of line
}{$1}x;         # replace the entire line with what was matched between the leading and trailing whitespace

The key to all of this is the (.*\S)? line. The trailing \S ensures that the match ends with a non-whitespace character. The previous line, ^\s*, has already consumed all the leading whitespace, so the capture also starts with a non-whitespace character: you are matching from the first non-whitespace character to the last non-whitespace character. Neat! Note that if the line contains only whitespace, the (.*\S)? group is optional because of the ‘?’, so the regex still matches and simply strips all the whitespace out of the line.
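To make the cases above concrete, here is a minimal sketch that runs the substitution over a few sample strings. The sample inputs and the @results array are made up for illustration. One assumption worth flagging: when the line is whitespace-only the optional group does not participate, so $1 is undefined and Perl may warn about it under "use warnings"; the sketch turns that warning off around the substitution.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen to cover the cases discussed above.
my @samples = ("  hello world  ", "\tfoo\n", "   ", "", "no-padding");

my @results;
for my $sample (@samples) {
    my $line = $sample;
    {
        # On a whitespace-only line the optional group never matches and
        # $1 is undef, so silence that warning for this one statement.
        no warnings 'uninitialized';
        $line =~ s#^\s*(.*\S)?\s*$#$1#;
    }
    push @results, $line;
    print "[$line]\n";    # brackets make any leftover whitespace visible
}
```

The whitespace-only and empty inputs both come out as the empty string, and a line with no padding is left unchanged.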

So this regex removes leading and trailing whitespace from a line of text. Very useful for normalizing input.
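As a rough sketch of the input-normalizing use, the substitution can be wrapped in a small helper and applied to each raw line. The name trim() and the sample data are hypothetical, not something from a library; the only thing taken from above is the regex itself.

```perl
use strict;
use warnings;

# trim() is a hypothetical helper name used here for illustration.
sub trim {
    my ($text) = @_;
    no warnings 'uninitialized';    # $1 is undef for whitespace-only input
    $text =~ s#^\s*(.*\S)?\s*$#$1#;
    return $text;
}

# Hypothetical raw input, e.g. lines read from a hand-edited config file.
my @raw = ("  name = value \n", "\n", "\tkey=1\r\n");

# Trim every line, then drop the ones that end up empty.
my @clean = grep { length } map { trim($_) } @raw;

print "$_\n" for @clean;
```

Because \s also matches "\r" and "\n", the same substitution takes care of line endings, so no separate chomp is needed in this sketch.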