Regexes > regular expressions

I absolutely loved Hsing-Hui Hsu’s talk about parsers at RubyConf 2015 last week. It’s a terrific talk and it’s well worth your time to watch it.

But I did have one little nit to pick. What we call regexes in Ruby (and many other programming languages) are much more powerful than regular expressions as originally defined, which can only describe regular languages. In addition to regular languages, Ruby regexes can express context-free languages, and even some context-sensitive languages.

[Image: ab_language.png]

For example, Ruby has no trouble with the “ab” language in the presentation (some number of a’s followed by the same number of b’s):

#!/usr/bin/env ruby

%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
    # \A and \z anchor to the whole string; in Ruby, ^ and $ only anchor to a line
    printf "%10s - %-8s\n", s,
         /\A(?<ab>a(\g<ab>)?b)\z/.match(s) ? 'valid' : 'invalid'
end

Running the above gives:

$ ./ab.rb 
        ab - valid   
      aabb - valid   
aaaaabbbbb - valid   
    aaaaaa - invalid 
       abb - invalid 
       aab - invalid 
    ababab - invalid

Easy peasy! We can even use the x modifier to make a more readable regex:

#!/usr/bin/env ruby

pat = /\A             # start of the string (not just of a line)
       (?<ab>         # start a capture named ab
         a            # look for a literal 'a'
           (\g<ab>)?  # optionally re-execute ab recursively
         b            # followed by a literal 'b'
       )              # end of capture; this is ab
       \z/x           # end of the string

%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
    printf "%10s - %-8s\n", s, pat.match(s) ? 'valid' : 'invalid'
end

This is clear as can be: we look for ‘a’ followed by ‘b’, or ‘ab’ preceded by ‘a’ and followed by ‘b’ (‘aabb’), or that preceded by ‘a’ and followed by ‘b’ (‘aaabbb’), and so on.
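The same recursion, plus a lookahead, is enough to back up the context-sensitive claim from earlier. Here is a minimal sketch for the language of n a’s, then n b’s, then n c’s, which no context-free grammar can describe. The pattern and the names ABC and abc_language? are my own illustration, not something from the talk.

```ruby
#!/usr/bin/env ruby

# Hypothetical pattern: n a's, then n b's, then n c's (n >= 1).
ABC = /\A
       (?=                       # lookahead, consumes nothing:
         (?<ab> a (\g<ab>)? b )  #   as many b's as a's
         c                       #   and the c's start right after them
       )
       a+                        # now really consume the a's
       (?<bc> b (\g<bc>)? c )    # as many c's as b's
       \z/x

# Hypothetical helper: true if s is in the language, false otherwise.
def abc_language?(s)
  ABC.match(s) ? true : false
end

%w(abc aabbcc aaabbbccc aabbc abbcc aabbbccc abcabc).each do |s|
  printf "%10s - %-8s\n", s, abc_language?(s) ? 'valid' : 'invalid'
end
```

The lookahead checks the a’s against the b’s without consuming anything, and the rest of the pattern then checks the b’s against the c’s. Only abc, aabbcc and aaabbbccc come out valid.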

Regular expressions regularly (har, har) get slagged for being line noise, but that’s sometimes unfair. If you try to rewrite a gnarly regex in Ruby (or your favorite language) without using regexes, you often find the result is even worse! Regexes are very powerful. And they’re pretty cool. Don’t dismiss them!

[Image: ab_language_terrible.png]

Regular expressions are one of my favorite programming languages, but they’re not the best tool for everything. Like @SoManyHs says, parsers are awesome. And for many tasks, they’re more appropriate than regexes. We should use them.

Just keep in mind, regexes are more than (historical) regular expressions.


Hello, Elixir!

A while ago, I talked about trying Elixir and finished with this example of Hello World:

#!/usr/bin/env elixir

greet = fn s -> IO.puts "Hello, #{s}!" end

if length(System.argv) == 0 do
  greet.("World")
else 
  Enum.each(System.argv, greet)
end

This looks pretty much the same as it would in an imperative language. First, we define a greet function that prints “Hello, string!” for whatever string we give it. If we’re given no arguments, we call this with “World”. Otherwise, we call it for each of the arguments (okay, that part is already looking a tiny bit functional, since a for loop is missing).
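For comparison, here is a rough sketch of the same script in Ruby, standing in as the imperative language. The hello method and its out parameter are my own scaffolding (they just make the output easy to capture), not a translation of anything in the Elixir version.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper: greet "World" when there are no arguments,
# otherwise greet each argument in turn.
def hello(args, out = $stdout)
  greet = ->(s) { out.puts "Hello, #{s}!" }

  if args.empty?
    greet.call("World")
  else
    args.each(&greet)
  end
end

hello(ARGV)
```

The shape is the same as the Elixir script: a small greet function, an if/else on the argument list, and an each in place of a for loop.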

But we don’t often use if-then-else in Elixir. I ended that post saying I would look for a more elixiry way to do it. I did so, but I didn’t post about it.

My second pass at hello world in Elixir looked like this:

#!/usr/bin/env elixir

greet = fn s -> IO.puts "Hello, #{s}!" end

hello = fn
  [] -> greet.("World")
  list -> Enum.each(list, greet)
end

hello.(System.argv)

This is the same greet function, but rather than an if-then-else, we define a new function, hello, that has one behavior when handed an empty list and a different behavior when handed a non-empty list. Then we simply call this function with the argument list of the program.

For a third pass, I threw in Elixir’s amazing pipe operator.

#!/usr/bin/env elixir

greet = fn s -> IO.puts "Hello, #{s}!" end

hello = fn
  [] -> greet.("World")
  list -> list |> Enum.each(greet)
end

hello.(System.argv)

This doesn’t really show it off much, but the pipe operator seems to be an important part of Elixir’s readability in real code.

Finally, I learned that in Elixir we usually name standalone scripts like this with a .exs extension rather than .ex. The new hello.exs works the same as before:

$ ./hello.exs
Hello, World!
$ ./hello.exs Hank Dean Brock
Hello, Hank!
Hello, Dean!
Hello, Brock!

Keen!
