I absolutely loved Hsing-Hui Hsu’s talk about parsers at RubyConf 2015 last week. It’s a terrific talk and it’s well worth your time to watch it.
But I did have one little nit to pick. What we call regexes in Ruby (and many other programming languages) are much more powerful than regular expressions as originally defined. In addition to regular languages, Ruby regexes can express context-free languages, and even some context-sensitive languages.
For example, Ruby has no trouble with the “ab” language in the presentation (some number of ‘a’s followed by the same number of ‘b’s):
#!/usr/bin/env ruby
%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
  printf "%10s - %-8s\n", s,
         /^(?<ab>a(\g<ab>)?b)$/.match(s) ? 'valid' : 'invalid'
end
Running the above gives:
$ ./ab.rb
        ab - valid
      aabb - valid
aaaaabbbbb - valid
    aaaaaa - invalid
       abb - invalid
       aab - invalid
    ababab - invalid
Easy peasy! We can even use the x modifier to make the regex more readable:
#!/usr/bin/env ruby
pat = /^
  (?<ab>        # start a capture named ab
    a           # look for a literal 'a'
    (\g<ab>)?   # optionally re-execute ab recursively
    b           # followed by a literal 'b'
  )             # end of capture; this is ab
$/x
%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
  printf "%10s - %-8s\n", s, pat.match(s) ? 'valid' : 'invalid'
end
end
This is clear as can be: we look for ‘a’ followed by ‘b’, or ‘ab’ preceded by ‘a’ and followed by ‘b’ (‘aabb’), or that preceded by ‘a’ and followed by ‘b’ (‘aaabbb’), and so on.
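The same recursion trick can be pushed a step further. As a rough sketch of the "some context-sensitive languages" claim, the pattern below tries to recognise ‘a’s, then the same number of ‘b’s, then the same number of ‘c’s, which is a standard example of a language that is not context-free. The constant `ABC`, the group names `ab` and `bc`, and the helper `abc?` are names made up for this illustration. It uses `\A` and `\z` rather than `^` and `$`, since in Ruby the latter anchor to lines rather than to the whole string.

```ruby
#!/usr/bin/env ruby
# Sketch only: recognise a^n b^n c^n for n >= 1.
# ABC, ab, bc and abc? are illustrative names, not from the talk.
ABC = /\A
  (?=                          # lookahead, consumes nothing
    (?<ab> a (?:\g<ab>)? b )   # the a's and b's must balance
    c                          # and a 'c' must come right after them
  )
  a+                           # now really consume the a's
  (?<bc> b (?:\g<bc>)? c )     # the b's and c's must balance too
\z/x

def abc?(s)
  ABC.match?(s)
end

%w(abc aabbcc aaabbbccc aabbc aabcc abcabc aabbbccc).each do |s|
  printf "%10s - %-8s\n", s, abc?(s) ? 'valid' : 'invalid'
end
```

The lookahead checks the a/b balance without moving forward, so the rest of the pattern can check the b/c balance over the same ‘b’s. Only the first three strings should come out as valid.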
Regular expressions regularly (har, har) get slagged for being line noise, but that’s sometimes unfair. If you try to rewrite a gnarly regex in Ruby (or your favorite language) without using regexes, you often find it’s even worse! Regexes are very powerful. And they’re pretty cool. Don’t dismiss them!
Regular expressions are one of my favorite programming languages, but they’re not the best tool for everything. Like @SoManyHs says, parsers are awesome. And for many tasks, they’re more appropriate than regexes. We should use them.
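For comparison, here is a minimal sketch of what a hand-written parser for the same “ab” language might look like. It is a tiny recursive-descent recogniser that mirrors the regex (an ‘a’, optionally the whole rule again, then a ‘b’). The method names `parse_ab` and `ab?` are invented for this example, and this is just one way to write it, not the approach from the talk.

```ruby
#!/usr/bin/env ruby
# Sketch only: recursive-descent recogniser for the rule
#   ab := 'a' ab? 'b'
# parse_ab and ab? are illustrative names.

# Returns the index just past the text matched starting at i,
# or nil if the rule does not match there.
def parse_ab(s, i = 0)
  return nil unless s[i] == 'a'
  # Try the nested rule; if it fails, carry on right after the 'a'.
  j = parse_ab(s, i + 1) || i + 1
  s[j] == 'b' ? j + 1 : nil
end

# The whole string is valid only if the rule consumes all of it.
def ab?(s)
  parse_ab(s) == s.length
end

%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
  printf "%10s - %-8s\n", s, ab?(s) ? 'valid' : 'invalid'
end
```

It should classify the seven test strings the same way the regex does. For a grammar this small the regex is arguably tidier, but the parser version has room to grow: error messages, building a tree, and so on.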
Just keep in mind, regexes are more than (historical) regular expressions.