Strings in Go and Rust

This week at Go Meetup, we talked briefly about how strings in Go are UTF-8, but not really. What I mean is, on the one hand, we can write

s := "Hello, 世界!"
fmt.Println(s)

and it prints out

Hello, 世界!

as expected. But on the other hand, we can put an invalid UTF-8 sequence into a string as well

s := "\x67\x72\xfc\xdf\x65"

It compiles just fine, but printing it writes the raw bytes, which a UTF-8 terminal shows as junk.

gr��e
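
To see what Go makes of those bytes, here is a small sketch that inspects the same string. The helper name decodeRunes is made up for this illustration; the rest is the standard unicode/utf8 package. It shows that the string holds five raw bytes, fails validation, and yields U+FFFD (the replacement character) for each bad byte when ranged over.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRunes is a hypothetical helper for this example. It collects the
// runes that a range loop produces for s. Invalid bytes come back as
// utf8.RuneError (U+FFFD), one per bad byte.
func decodeRunes(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	s := "\x67\x72\xfc\xdf\x65"

	fmt.Println(len(s))              // 5: len counts bytes, not characters
	fmt.Println(utf8.ValidString(s)) // false

	// Print each decoded rune as a code point.
	for _, r := range decodeRunes(s) {
		fmt.Printf("%U\n", r)
	}
}
```

So the string type is really a byte container; the UTF-8 interpretation only happens when something decodes it.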

If we accept strings from an external source, we probably don’t want to do stringy things with them without first checking that they’re valid. For example, this code

package main

import (
    "fmt"
    "os"
)

func main() {
    for _, s := range os.Args {
        fmt.Println(s)
    }
}

just prints whatever we give it

$ ./garbage foo bär $(echo -en "\x67\x72\xfc\xdf\x65") baz
./garbage
foo
bär
gr��e
baz

while this one

package main

import (
    "fmt"
    "os"
    "unicode/utf8"
)

func main() {
    for _, s := range os.Args {
        if utf8.ValidString(s) {
            fmt.Println(s)
        } else {
            fmt.Println("not valid")
        }
    }
}

only prints valid strings

$ go build valid_string.go 
$ ./valid_string foo bär $(echo -en "\x67\x72\xfc\xdf\x65") baz
./valid_string
foo
bär
not valid
baz
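
If rejecting the argument outright is too strict, another option is to repair it. This is only a sketch of that alternative, not something we covered at the meetup: it uses strings.ToValidUTF8 from the standard library, and the function name sanitize is my own invention for the example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// sanitize is a hypothetical helper for this example. It replaces each run
// of invalid UTF-8 bytes in s with a single U+FFFD replacement character
// and leaves valid input untouched.
func sanitize(s string) string {
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	// Skip os.Args[0], the program name, and print the cleaned arguments.
	for _, s := range os.Args[1:] {
		fmt.Println(sanitize(s))
	}
}
```

Note that strings.ToValidUTF8 was added in Go 1.13, so this will not build with older toolchains.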

In Rust, strings are UTF-8 as well. We can write

let s = "Hello, 世界!";
println!("{}", s);

and it prints out

Hello, 世界!

as expected. But unlike Go, we can’t put an invalid UTF-8 sequence in a string. This

let s = "\x67\x72\xfc\xdf\x65";

doesn’t even compile

error: this form of character escape may only be used with characters in the range [\x00-\x7f]

However, we still need to be careful. This

let v = vec![0x67, 0x72, 0xfc, 0xdf, 0x65];
let t = String::from_utf8(v);
println!("{:?}", t);

compiles fine, but at run time from_utf8 returns an Err instead of a String

Err(FromUtf8Error { bytes: [103, 114, 252, 223, 101], error: Utf8Error { valid_up_to: 2 } })

So once again, if we accept strings from an external source, we probably don’t want to do stringy things with them without first checking that they’re valid. But, unlike in Go, we can’t even put them in a string until we check. This code

use std::env;

fn main() {
    for arg in env::args() {
        println!("{}", arg);
    }
}

panics if any arguments are not valid UTF-8

$ ./valid_string_panic foo bär $(echo -en "\x67\x72\xfc\xdf\x65") baz
./valid_string_panic
foo
bär
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "gr��e"', ../src/libcore/result.rs:837
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Instead of std::env::args, we can use std::env::args_os to collect the arguments

use std::env;

fn main() {
    for arg in env::args_os() {
        println!("{:?}", arg);

        //println!("{}", arg);
        // does not compile
    }
}

This gives us an OsString instead of a String. Right away, we can see it’s different because it won’t even compile if we try to print it with “{}”. When we change to “{:?}”, we get junk for invalid UTF-8

$ ./valid_string_garbage foo bär $(echo -en "\x67\x72\xfc\xdf\x65") baz
"./valid_string_garbage"
"foo"
"bär"
"gr��e"
"baz"

To check that it’s valid, we can try to view the OsString as a &str. The to_str method returns an Option<&str>, which is None when the argument is not valid Unicode, so we can match on it

use std::env;

fn main() {
    for arg in env::args_os() {
        match arg.to_str() {
            Some(s) => println!("{}", s),
            None => println!("not valid"),
        }
    }
}

Thus we get

$ rustc valid_string.rs
$ ./valid_string foo bär $(echo -en "\x67\x72\xfc\xdf\x65") baz
./valid_string
foo
bär
not valid
baz

just as in Go.

So even though both Go and Rust use UTF-8 for strings, they are not the same model. There’s more to it. When it comes to encodings, there’s always more to it!
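
To make the difference concrete, here is a rough sketch of what Rust's model might look like if we imitated it in Go. Everything named here (Text, InvalidUTF8Error, NewText) is hypothetical and invented for this illustration; only unicode/utf8 is real. The idea is that the only constructor validates first, and the error carries the offset of the first bad byte, much like valid_up_to in Rust's Utf8Error.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Text is a hypothetical wrapper that is only meant to be built by NewText.
// Go cannot fully enforce this: code in the same package can still write
// Text{s: ...} directly, so this is a convention rather than a guarantee.
type Text struct{ s string }

// InvalidUTF8Error is a hypothetical error type recording how many leading
// bytes were valid before the first invalid one.
type InvalidUTF8Error struct{ ValidUpTo int }

func (e *InvalidUTF8Error) Error() string {
	return fmt.Sprintf("invalid UTF-8 after %d valid bytes", e.ValidUpTo)
}

// NewText validates b and only then wraps it.
func NewText(b []byte) (Text, error) {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// RuneError with size 1 means the byte at i is not valid UTF-8.
		if r == utf8.RuneError && size == 1 {
			return Text{}, &InvalidUTF8Error{ValidUpTo: i}
		}
		i += size
	}
	return Text{s: string(b)}, nil
}

// String returns the validated contents.
func (t Text) String() string { return t.s }

func main() {
	_, err := NewText([]byte{0x67, 0x72, 0xfc, 0xdf, 0x65})
	fmt.Println(err)

	t, _ := NewText([]byte("Hello, 世界!"))
	fmt.Println(t)
}
```

In Rust the compiler and standard library give you this for free; in Go it takes discipline, which is really the whole point of the comparison.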

VS Code

One of the things I learned at Go Maryland tonight was that VS Code is not just for Windows; they have versions for Linux and OS X too! With a name like Visual Studio Code, I just assumed it was a Windows thing. Not so!

Naturally, when I got home I had to try it! And here it is running on my Linux machine!

Screen shot of VS Code

As you can see, it understands Go code (on the left), but not Elixir (on the right). At least, not yet. I’m sure it will eventually. I also tried out Perl (yes), Python (yes), and Ruby (yes) — no surprises there — as well as Erlang (no), Pony (no), Rust (yes), and Clojure (yes) — a couple of nice surprises there!

The cursor blinks by default, so the first thing I had to do was figure out how to shut that off [1]. It only took me a minute or two to find and change the configuration to a non-blinking cursor. Well done, VS Code!

I doubt I’ll be giving up Emacs any time soon (indeed, I’m typing this blog entry with org2blog), but it’s nice to see another open source editor available. Great job, Microsoft!

Update: In case you’re curious, here’s a shot of the same two files opened in Emacs, which has an Elixir mode.

Screen shot of same two files in Emacs

Footnotes:

[1] I can’t stand blinking. I think it’s genetic. My Mom never let us have Christmas lights that blinked either. And to be fair, my beloved Emacs has a blinking cursor by default also.

Get small in Go present

Among the many cool tools in the Go ecosystem is present, a package for making slide presentations and blog posts. It’s an easy way to make a nice-looking HTML5 presentation that can also run live code samples. Keen!

Because it’s so cool, lots of folks use it. Often they publish their slides afterwards. For example, here is a terrific talk by Brad Fitzpatrick from GoCon Tokyo.

The problem is, Go present must have only been used by folks with high-resolution displays so far. When I look at any Go present presentation in my browser, the top gets chopped off. On most slides, this is the title!

screen cap of Go present in smaller browser

That top line is supposed to read, “60% of the time, it works every time….”, but even though I am scrolled all the way to the top, I can’t see it.

This appears to be a typical case of making fixed-size assumptions in HTML. I fiddled with the CSS a bit until I arrived at manipulating the margin-top value. Ten percent was a bit too much

screen cap of Go present with 10% top margin.

but 5% worked pretty well.

screen cap of Go present with 5% top margin.

So, I added the following to my userContent.css file.

/*
 * This is to force "Go present" presentations to fit in my browser.
 */
.slides { margin-top: 5% !important; }

Now all such presentations I encounter (which is a lot lately) are readable in my browser.
