favorite regular expression
It was clear in one of my philosophy classes at UNH that I was doomed to be a computer nerd, even though I was already in the major. It was a class in deterministic and finite automata. Actually it was more about Turing machines, but the teacher made us write little machines expressed as automata without getting into the theory. I loved it. I ate it up.
So it’s little surprise that I love regular expressions. My favorite regular expression has to be this one (expressed using Perl syntax):
$line =~ s#^\s*(.*\S)?\s*$#$1#;
[ Thanks to a reader for correcting this. I forgot the $1 in the replace portion of the regex. ]
What does this do? Well, let’s step through it.
$line =~ s{
    ^\s*     # slurp up all the whitespace at the beginning of the line
    (.*\S)?  # starting at a non-whitespace character, match everything up to the last non-whitespace character
    \s*      # slurp up all the whitespace at the end of the line
    $        # match end of line
}{$1}x;      # replace the entire line with what matched between the leading and trailing whitespace
The key to all of this is the (.*\S)? line. The trailing \S ensures that the match ends with a non-whitespace character. The previous line, ^\s*, consumes all the leading whitespace, so the group also starts at a non-whitespace character, and you are matching from the first non-whitespace character to the last non-whitespace character. Neat! Note that if the line contains only whitespace, the regex still matches: the ‘?’ makes the (.*\S) group optional, so the group matches nothing and all the whitespace is stripped out of the line.
So this regex removes leading and trailing whitespace from a line of text. Very useful for normalizing input.
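Here is a minimal sketch of the regex at work on a few sample lines. The helper name trim and the sample strings are made up for illustration, and the sketch assumes Perl 5.10 or later for the // operator. One deliberate difference from the one-liner above: the replacement uses the /e modifier with $1 // '' so that a blank or whitespace-only line, where the optional group never matches and $1 is undefined, does not trigger an "uninitialized value" warning under use warnings.

```perl
use strict;
use warnings;

# trim() is a hypothetical helper name, not part of the original post.
sub trim {
    my ($line) = @_;
    # Same pattern as above. The /e modifier evaluates the replacement
    # as Perl code, and "// ''" substitutes an empty string when the
    # optional group did not participate in the match ($1 undefined).
    $line =~ s{^\s*(.*\S)?\s*$}{ $1 // '' }e;
    return $line;
}

# Sample inputs (assumed for illustration): leading/trailing blanks,
# a tab plus trailing newline, whitespace only, and interior spaces.
for my $sample ("  hello world  ", "\tfoo\n", "   ", "a  b") {
    printf "[%s]\n", trim($sample);
}
# prints:
# [hello world]
# [foo]
# []
# [a  b]
```

Note that whitespace inside the line is left alone. Only the runs at the two ends are removed.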