Regexes > regular expressions

I absolutely loved Hsing-Hui Hsu’s talk about parsers at RubyConf 2015 last week. It’s a terrific talk and it’s well worth your time to watch it.

But I did have one little nit to pick. What we call regexes in Ruby (and many other programming languages) are much more powerful than the original definition. In addition to regular languages, Ruby regexes can express context-free languages, and even some context-sensitive languages.


For example, Ruby has no trouble with the “ab” language in the presentation:

#!/usr/bin/env ruby

%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
    printf "%10s - %-8s\n", s,
         /^(?<ab>a(\g<ab>)?b)$/.match(s) ? 'valid' : 'invalid'
end

Running the above gives:

$ ./ab.rb 
        ab - valid   
      aabb - valid   
aaaaabbbbb - valid   
    aaaaaa - invalid 
       abb - invalid 
       aab - invalid 
    ababab - invalid

Easy peasy! We can even use the x modifier to make a more readable regex.

#!/usr/bin/env ruby

pat = /^
       (?<ab>         # start a capture named ab
         a            # look for a literal 'a'
           (\g<ab>)?  # optionally re-execute ab recursively
         b            # followed by a literal 'b'
       )              # end of capture; this is ab
      $/x

%w(ab aabb aaaaabbbbb aaaaaa abb aab ababab).each do |s|
    printf "%10s - %-8s\n", s, pat.match(s) ? 'valid' : 'invalid'
end

This is clear as can be: we look for ‘a’ followed by ‘b’, or ‘ab’ preceded by ‘a’ and followed by ‘b’ (‘aabb’), or that preceded by ‘a’ and followed by ‘b’ (‘aaabbb’), and so on.
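
For comparison, here is a rough sketch of the same recursive definition written by hand, without a regex. It is in Python rather than Ruby, and the function name is_ab is made up for illustration; it is one possible way to spell out the rule, not a translation of what the regex engine does internally.

```python
def is_ab(s):
    """Return True if s is n 'a's followed by n 'b's, for some n >= 1."""
    # Base case: the innermost pair is exactly "ab".
    if s == "ab":
        return True
    # Recursive case: an 'a', then a shorter valid string, then a 'b'.
    return len(s) > 2 and s[0] == "a" and s[-1] == "b" and is_ab(s[1:-1])


for s in "ab aabb aaaaabbbbb aaaaaa abb aab ababab".split():
    print("%10s - %-8s" % (s, "valid" if is_ab(s) else "invalid"))
```

It accepts and rejects the same seven test strings as the Ruby version, but the one-line pattern has turned into a function with a base case and a recursive case.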

Regular expressions regularly (har, har) get slagged for being line noise, but it’s sometimes unfair. If you try to re-write a gnarly regex in Ruby (or your favorite language) without using regexes, you often find it’s even worse! Regexes are very powerful. And they’re pretty cool. Don’t dismiss them!


Regular expressions are one of my favorite programming languages, but they’re not the best tool for everything. Like @SoManyHs says, parsers are awesome. And for many tasks, they’re more appropriate than regexes. We should use them.

But keep in mind, regexes are more than (historical) regular expressions.


Hello, Elixir!

A while ago, I talked about trying Elixir and finished with this example of Hello World

#!/usr/bin/env elixir

greet = fn s -> IO.puts "Hello, #{s}!" end

if length(System.argv) == 0 do
  greet.("World")
else
  Enum.each(System.argv, greet)
end

This looks pretty much the same as it would in an imperative language. First, we define a greet function that prints “Hello, string!” for whatever string we give it. If we’re given no arguments, we call this with “World”. Otherwise, we call it for each of the arguments (okay, that part is already looking a tiny bit functional, since a for loop is missing).

But we don’t often use if-then-else in Elixir. I ended that post saying I would look for a more elixiry way to do it. I did so, but I didn’t post about it.

My second pass at hello world in Elixir looked like this

#!/usr/bin/env elixir

greet = fn s -> IO.puts "Hello, #{s}!" end

hello = fn
  [] -> greet.("World")
  list -> Enum.each(list, greet)
end

hello.(System.argv)


This is the same greet function, but rather than an if-then-else, we define a new function, hello, that has one behavior when handed an empty list and a different behavior when handed a non-empty list. Then we simply call this function with the argument list of the program.

For a third pass, I threw in Elixir’s amazing pipe operator.

#!/usr/bin/env elixir

greet = fn s -> IO.puts "Hello, #{s}!" end

hello = fn
  [] -> greet.("World")
  list -> list |> Enum.each(greet)
end

hello.(System.argv)


This doesn’t really show it off much, but the pipe operator seems to be an important part of Elixir’s readability in real code.

Finally, I learned that we usually name standalone scripts like this with a .exs extension rather than .ex in Elixir, so the new hello.exs works the same as before

$ ./hello.exs
Hello, World!
$ ./hello.exs Hank Dean Brock
Hello, Hank!
Hello, Dean!
Hello, Brock!



Equals: a sign or assign

I enjoyed reading “How I learned to stop worrying and love the code” and as a teacher of The Code I took it to heart.

Number 3 is huge, I think. Most people think they know what this means

hello = 4

so if we don’t tell them straight away that it means something else, they might be confused for some time. I guess we can blame Fortran for originally abusing the “equals sign” that way, but that doesn’t really excuse all the languages that have repeated the blunder since then.

I think part of the issue is verbal. When I learned Pascal, which has a distinct assignment operator,

hello := 4

I developed the habit of pronouncing it “hello gets four” and that continues to this day… even in languages that use an equals sign. I’ve noticed that lots of programmers pronounce

hello = 4

“hello equals four,” even though they also pronounce

hello == 4

“hello equals four.” Having a verbal distinction between “hello gets four” and “hello equals four” is useful, in my mind. In practice, though, it’s not that helpful, since few of the people I end up reviewing code or pair programming with do the same.

It’s also worth noting that

hello = 4

means something a little different in Python than in most other languages. Namely, it’s binding a name rather than assigning a value. This means that in addition to tripping up beginning programmers, it also trips up experienced programmers who are new to Python.
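
A small sketch of what binding looks like in practice. The second name, goodbye, is made up for illustration, and I use a list instead of the number 4 so that the sharing is visible.

```python
hello = [4]            # bind the name hello to a new list object
goodbye = hello        # bind a second name to the same object; nothing is copied
goodbye.append(2)      # mutate the one shared list

print(hello)             # [4, 2]
print(hello is goodbye)  # True

hello = [4]            # rebind hello to a brand-new list
print(goodbye)           # [4, 2]
print(hello is goodbye)  # False
```

Someone expecting assignment to copy a value is surprised by the first pair of prints; someone expecting the two names to stay linked forever is surprised by the second.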

It means another thing in Erlang and yet another in Elixir. Both of these are matching, rather than simply assigning, but in slightly different ways. Again, this trips up experienced programmers who are new to Erlang or Elixir. It doesn’t mean “equals,” so the beginning programmers are confused, but it doesn’t mean “simple assignment” either, so the experienced programmers are confused too. It doesn’t even mean “bind,” so the Python programmers are confused as well.

I have often wondered if teaching Elixir to beginning programmers might be easier than teaching it to experienced programmers. Teaching Elixir (or Erlang or Clojure or any functional language) to someone who already knows imperative programming seems to involve as much un-learning as learning. If someone didn’t have any baggage from imperative programming, we could just teach them functional programming straight away. Someday, I’d like to try this.


Happy Programmers Day

Today is Programmers Day! To celebrate, I just tried my hand at a Rust program. Rust is a fairly new programming language that has been in development for five or so years and just had its 1.0 release earlier this year.

If we wanted to check if today was Programmers Day, we’d probably discover the familiar struct Tm in the time crate. Just as in C, it has tm_yday, which is just what we need. To use it, we could start a new project with Cargo.

$ cargo new programmers_day --bin

Now add

time = "0.1"

to our Cargo.toml and the following in our src/

extern crate time;

fn main() {
    // tm_yday counts from zero, so 255 is the 256th day of the year
    if time::now().tm_yday == 255 {
        println!("Happy Programmers Day!");
    } else {
        println!("Ho hum, just another day.");
    }
}

et voilà!

$ cargo run
   Compiling programmers_day v0.1.0 (file:///home/tim/rust/programmers_day)
     Running `target/debug/programmers_day`
Happy Programmers Day!
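
The 255 in the program works because tm_yday counts from zero and Programmers Day is the 256th day of the year. As a quick cross-check, here is a small sketch in Python (not Rust) that computes which calendar date that is; the function name programmers_day is made up for illustration.

```python
from datetime import date, timedelta


def programmers_day(year):
    """Return the date of the 256th day of the given year."""
    # January 1 is day 1, so adding 255 days lands on day 256.
    return date(year, 1, 1) + timedelta(days=255)


print(programmers_day(2015))  # 2015-09-13
print(programmers_day(2016))  # 2016-09-12, a day earlier in a leap year

# Python's tm_yday counts from 1, unlike the zero-based field in C's struct tm.
print(programmers_day(2015).timetuple().tm_yday)  # 256
```

Because the check is on the day of the year rather than on a fixed month and day, the same test keeps working in leap years.
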

Say time

This morning, I read on Hacker News that, “You’ll be more productive if your computer announces the time.” The link was to a click-baity article on a number of “productivity hacks,” the first of which was, “Have your computer announce the time.” As is often the case, this article comes from a world that contains only two kinds of computers. “Here’s how to do it on a Mac, and here are instructions for Windows users.”

My machine runs Linux, so I wondered how I might have it announce the time.

It’s easy to get the current time with GNU date

$ date
Sun Sep  6 14:31:30 EDT 2015

and to synthesize some simple text with espeak

$ echo "Hello, world!" | espeak

so a first pass at a solution might be

$ date | espeak

But that’s not very satisfying. A human knows what that string means, but espeak just reads it off as is, “sun sep six…”


Turns out CPAN already contains a Perl module which does exactly what we need: Time::Human.

$ perl -MTime::Human -E 'say "The time is now ", humanize(localtime)'
The time is now a little after half past two in the afternoon

That string is much more suitable for piping to espeak!

$ perl -MTime::Human -E 'say "The time is now ", humanize(localtime)' | espeak

Not bad!


Espeak sounds okay, but perhaps we can do better. There’s another synthesizer called Festival, which sounds a little smoother. And there’s even a lite version, written in C.

$ sudo apt-get install flite
$ perl -MTime::Human -E 'say "The time is now ", humanize(localtime)' | flite

Flite has a number of voices from which to choose. The one that seems to sound the best for this is awb_time

$ perl -MTime::Human -E 'say "The time is now ", humanize(localtime)' | flite -voice awb_time



Now, how to get it to say the time automatically every hour? Cron. I edit my personal crontab file with

crontab -e

and add an entry like so

0 * * * * /usr/bin/perl -MTime::Human -E 'say "The time is now ", humanize(localtime)' | flite -voice awb_time

That will invoke that one-liner at the top of every hour.

Normally, I use plenv to install Perl in my own space, so I installed Time::Human with

cpanm Time::Human

To get the cron job to work with /usr/bin/perl, I had to install Time::Human against the system Perl with

sudo apt-get install libtime-human-perl

Now my system tells me the time every hour on the hour! We’ll see if that helps my productivity.

Random voice

Here’s a little Perl script I wrote to choose from all of flite’s voices randomly

#!/usr/bin/env perl

use v5.20;
use warnings;
use Time::Human;

my $time_string = "The time is now " . humanize(localtime);

die "$time_string\n" if @ARGV;

my $voices = `flite -lv`;

my @voices = split ' ', $voices =~ s/^Voices available: //r;

my $voice = $voices[rand @voices];

system "flite -voice $voice -t '$time_string'";

I didn’t end up using it because I thought the awb_time voice sounded better and I wanted to use it all the time.


Erlang MOOC

I just completed An Introduction to Functional Programming with Erlang, a mini-MOOC at the University of Kent. It was terrific!

Screen shot of master class

Erlang is a functional programming language created at Ericsson in the 1980s. In 1998, it was open-sourced. Erlang’s claim to fame is massive scalability with high reliability. Ericsson designed it for their telephony system. More recently, WhatsApp used it for their messaging system (you may have heard of this last year, when Facebook paid a huge sum of money for WhatsApp). I’m interested in Erlang because I’m excited about Elixir, a relatively new programming language (version 1.0 was released just last year) which runs on the Erlang virtual machine (BEAM). I figure everything I learn about Erlang will help my understanding of Elixir.

I installed the Erlang compiler and documentation on my Debian laptop with

sudo apt-get install erlang erlang-doc

Easy peasy! Now I can create hello.erl


-module(hello).
-export([hello/0]).

hello() -> io:fwrite("Hello, World!\n").

and compile and run it in the Erlang REPL

$ erl
Erlang/OTP 17 [erts-6.2]  [64-bit] [smp:2:2] [async-threads:10] [kernel-poll:false]

Eshell V6.2  (abort with ^G)
1> c(hello).
2> hello:hello().
Hello, World!

Lemon squeezy! Now I’m ready to follow along with the course.

The mini-MOOC is a work in progress, but it’s very well done. It’s essentially the first three weeks of what will become a six week MOOC. It uses a University message board called Moodle now, but I think they intend to move to a proper MOOC system for the full course. As such, this three week pilot was limited to some 500 students.

The course is taught by Professor Simon Thompson. There are videos of him lecturing

Screen shot of Prof Thompson

presenting with slides

Screen shot of Prof Thompson with slide

and live coding

Screen shot of Prof Thompson live coding

There was also a “Master Class” segment filmed in a fancy studio (that’s the first photo at the top).

Additionally, there were quizzes and exercises. And every page had a discussion section where you chatted with other students. There was also a teaching assistant, Stephen Adams, who would show up there and post his solutions to the exercises, along with his comments about why he did things a certain way.

Erlang is kind of an odd language and takes some getting used to. I thought Prof Thompson did a good job of explaining the weirder parts, like what Erlang means by variable, assignment, and pattern-matching.

I enjoyed that Erlang was dynamically typed, which let us concentrate on things like recursion, testing, and higher-order functions. Contrast this with the Haskell course, which would have us believe that functional programming is all about algebraic data types. Towards the end of the course, Prof Thompson hints that a stronger type system could be helpful for larger projects, but it was nice to not have to mess with it right from the start. That point came up again in a discussion video with Joe Armstrong (creator of Erlang) and Francesco Cesarini (Erlang Solutions Ltd).

Screen shot of Cesarini, Thompson, and Armstrong

Overall, I thought it was time well spent. Based on this pilot, whenever the full six-week course is ready, I would definitely recommend it.


Go diamond

Lately, I’ve been using Go for things that I used to use Perl, Python or Ruby for. This includes quick and dirty scripts for filtering text.

Perl’s “diamond” operator (<>) encapsulates all of this in one fell swoop.

#!/usr/bin/env perl

use v5.20;
use warnings;

while (<>) {
    # do something with $_ here
    print;
}

Without any arguments, this will read from stdin line by line. If there are arguments, it will treat them as filenames and read from each of them line by line. If any of those arguments is “-”, it will take that to mean stdin. It’s the perfect thing for the Unix command line. Indeed, we can write one-liners that do all of the above with just a -p flag (or a -n flag, without the print).

In Python, we have a similar capability with the fileinput module.

#!/usr/bin/env python3

import fileinput

for line in fileinput.input():
    # do something with line
    print(line, end="")

In Ruby, we iterate through ARGF

#!/usr/bin/env ruby

ARGF.each do |line|
    # do stuff with line
    print line
end

In short, Perl, Python, and Ruby each make it super easy to write command line utilities that just do the right thing. How do we do something similar in Go?

Well, none of it is hard, but there really is quite a lot going on in those tiny little snippets above. That becomes apparent when you write it all out “by hand” in a language like Go. Here’s what I came up with.

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {

    // with no arguments, read from stdin
    filenames := []string{"-"}

    if len(os.Args) > 1 {
        filenames = os.Args[1:]
    }

    for _, filename := range filenames {

        var file *os.File
        var err error

        if filename == "-" {
            file = os.Stdin
        } else {
            if file, err = os.Open(filename); err != nil {
                // report the problem and move on to the next file
                fmt.Fprintln(os.Stderr, err)
                continue
            }
            defer file.Close()
        }

        scanner := bufio.NewScanner(file)
        for scanner.Scan() {

            line := scanner.Text()

            // do something with line here
            fmt.Println(line)
        }

        if err := scanner.Err(); err != nil {
            fmt.Fprintln(os.Stderr, err)
        }
    }
}

But wait, there’s more! Perl and Ruby are keeping track of the line numbers already too. So is Python’s fileinput. If we wanted to print those out, we’d just print out “$.” in Perl and Ruby and “fileinput.lineno()” in Python. In Go, we’d have to create a variable to keep track of those as well.

But in doing so, we’d know exactly whether we had the line number for each file or for the total. In Perl, Python, and Ruby, we have to take some care to figure out whether it’s per file or not. I think it’s little things like this that cause me to not miss the brevity of Perl, Python, and Ruby when I’m writing Go.
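
To make the per-file versus total question concrete, here is a small Python sketch. The fileinput module offers both counters: lineno() is cumulative across all the files and filelineno() restarts with each file. The helper name number_lines and the two throwaway files are made up for illustration.

```python
import fileinput
import os
import tempfile


def number_lines(paths):
    """Return (cumulative, per_file, text) for every line in paths."""
    rows = []
    with fileinput.input(files=paths) as f:
        for line in f:
            rows.append((f.lineno(), f.filelineno(), line.rstrip("\n")))
    return rows


# Build two small files to read back (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("one.txt", "a\nb\n"), ("two.txt", "c\nd\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(text)
        paths.append(path)
    rows = number_lines(paths)

for total, per_file, text in rows:
    print(total, per_file, text)
# 1 1 a
# 2 2 b
# 3 1 c
# 4 2 d
```

In the Go version there is nothing to look up: whichever counter we declare and increment is the one we get.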
