Ruby’s Lonely Operator

I enjoyed Matz’s closing keynote address at RubyConf 2015! In particular, I adored his description of the lonely operator (&., the safe navigation operator coming in Ruby 2.3):

Lonely_operator.png

“Look at this figure…and then…you can see…someone sitting on the floor looking at a dot. On the floor. By themself. Now you don’t forget. Yeah.” – @yukihiro_matz

He’s right: I don’t think I will ever forget this. It’s ossum! I intend to download Ruby 2.3 the minute it is available and write something that uses this operator. Just because.
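In the meantime, here is a minimal sketch of the kind of thing I have in mind. The User and Address structs and the city_of method are made up for illustration; the only real feature on display is &., which returns nil instead of raising NoMethodError when its receiver is nil.

```ruby
# Hypothetical data types, invented purely for this example.
User    = Struct.new(:name, :address)
Address = Struct.new(:city)

# With &., each step in the chain is skipped (yielding nil) if the
# receiver is nil, so no explicit nil checks are needed.
def city_of(user)
  user&.address&.city
end

with_city  = User.new("Ann", Address.new("Kyoto"))
no_address = User.new("Bob", nil)

p city_of(with_city)
p city_of(no_address)
p city_of(nil)
```

Before 2.3 this would need something like user && user.address && user.address.city.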


Regexes > regular expressions

I absolutely loved Hsing-Hui Hsu’s talk about parsers at RubyConf 2015 last week. It’s a terrific talk and it’s well worth your time to watch it.

But I did have one little nit to pick. What we call regexes in Ruby (and many other programming languages) are much more powerful than the original, formal definition of regular expressions. Thanks to features like recursive subexpression calls (\g<name>) and lookahead, Ruby regexes can recognize not only regular languages but also some context-free languages, and even some context-sensitive ones.

ab_language.png

For example, Ruby has no trouble with the “ab” language in the presentation:

#!/usr/bin/env ruby

# \A and \z anchor the whole string; ^ and $ only anchor a single line
%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
  printf "%10s - %-8s\n", s,
         /\A(?<ab>a(\g<ab>)?b)\z/.match(s) ? 'valid' : 'invalid'
end

Running the above gives:

$ ./ab.rb 
        ab - valid   
      aabb - valid   
aaaaabbbbb - valid   
    aaaaaa - invalid 
       abb - invalid 
       aab - invalid 
    ababab - invalid

Easy peasy! We can even use the x modifier to make a more readable regex.

#!/usr/bin/env ruby

pat = /\A             # start of the string
       (?<ab>         # start a capture named ab
         a            # look for a literal 'a'
           (\g<ab>)?  # optionally re-execute ab recursively
         b            # followed by a literal 'b'
       )              # end of capture; this is ab
       \z/x

%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
    printf "%10s - %-8s\n", s, pat.match(s) ? 'valid' : 'invalid'
end

This is clear as can be: we look for ‘a’ followed by ‘b’, or ‘ab’ preceded by ‘a’ and followed by ‘b’ (‘aabb’), or that preceded by ‘a’ and followed by ‘b’ (‘aaabbb’), and so on.
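The same recursion trick, combined with a lookahead, backs up the claim about context-sensitive languages. Here is a rough sketch that recognizes strings of the form "n a's, then n b's, then n c's" (n at least 1), a language that is not context-free. The names ABC and abc? are my own inventions, and this is one way to do it, not the only one.

```ruby
#!/usr/bin/env ruby

# Sketch only: the lookahead checks that the a's and b's balance and
# that a 'c' follows; the main match then checks that the b's and c's
# balance. Together that forces all three counts to be equal.
ABC = /\A
       (?= (?<ab> a \g<ab>? b ) c )  # peek: balanced a's and b's, then a c
       a+                            # now actually consume the a's
       (?<bc> b \g<bc>? c )          # balanced b's and c's
       \z/x

def abc?(s)
  !ABC.match(s).nil?
end

%w(abc aabbcc aaabbbccc aabbc aabcc abcabc).each do |s|
  printf "%10s - %-8s\n", s, abc?(s) ? 'valid' : 'invalid'
end
```

The first three strings should come out valid and the last three invalid.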

Regular expressions regularly (har, har) get slagged for being line noise, but that’s sometimes unfair. If you try to rewrite a gnarly regex in Ruby (or your favorite language) without using regexes, you often find the result is even worse! Regexes are very powerful. And they’re pretty cool. Don’t dismiss them!

ab_language_terrible.png

Regular expressions are one of my favorite programming languages, but they’re not the best tool for everything. As @SoManyHs says, parsers are awesome. And for many tasks, they’re more appropriate than regexes. We should use them.
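For comparison, here is roughly what a tiny hand-written recursive-descent recognizer for the same “ab” language could look like. It follows the grammar S -> 'a' S 'b' | 'a' 'b'. The method names parse_s and ab_language? are mine, not anything from the talk, and a real parser would build a tree rather than just say yes or no.

```ruby
#!/usr/bin/env ruby

# Try to match one S starting at index pos.
# Returns the index just past the match, or nil if there is no match.
def parse_s(chars, pos)
  return nil unless chars[pos] == 'a'
  pos += 1
  inner = parse_s(chars, pos)  # optional nested S
  pos = inner if inner
  chars[pos] == 'b' ? pos + 1 : nil
end

# A string is in the language only if a single S consumes all of it.
def ab_language?(s)
  parse_s(s.chars, 0) == s.length
end

%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
  printf "%10s - %-8s\n", s, ab_language?(s) ? 'valid' : 'invalid'
end
```

It is longer than the regex, but each grammar rule maps onto a line or two of code, which is the property that makes parsers scale to languages a regex would struggle with.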

Just keep in mind, regexes are more than (historical) regular expressions.
