Go and Parsing

Go is one of the best languages for writing parsers and tools that need some form of parsing.

Screw XML

XML is unwieldy to write and obtuse to read. Unfortunately, things like Wayland use it for protocol descriptions. Fortunately, Go can be used to author tools that generate XML from a more human-readable format.

Introducing Participle

Participle is a Go library that makes parsing text into ASTs extremely easy. I'll demonstrate a simple use of it: authoring a friendlier Wayland protocol syntax that can be transpiled to XML.

One: Designing a syntax

This is mostly a matter of taste. I like the look of Go, so I went with a very Go-like syntax:

protocol appmenu

interface zxdg_appmenu_v1 {
    version 1

    request set_address(service_name string, object_path string)
}

Simple, yet descriptive.

Two: Building trees

Participle by default uses the tokens that form the Go language itself, which is important to know. A grammar has to play by Go rules if you stick with the default tokens.
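To get a feel for what "Go tokens" means in practice, here is a small sketch that runs one line of our syntax through the standard library's text/scanner package, which, as far as I know, is what Participle's default lexer is built on. The Token type and tokenize helper are made up for this illustration and are not part of Participle.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// Token is a hypothetical helper type for this sketch: the kind of a
// token as named by text/scanner, plus its literal text.
type Token struct {
	Kind string
	Text string
}

// tokenize splits src into Go-style tokens using text/scanner's
// default settings (identifiers, numbers, strings, punctuation).
func tokenize(src string) []Token {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))

	var toks []Token
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		toks = append(toks, Token{
			Kind: scanner.TokenString(tok),
			Text: s.TokenText(),
		})
	}
	return toks
}

func main() {
	for _, t := range tokenize("request set_address(service_name string)") {
		fmt.Printf("%-8s %s\n", t.Kind, t.Text)
	}
}
```

Names like set_address and zxdg_appmenu_v1 come out as single Ident tokens, which is why the Go-like syntax above fits the default tokens so comfortably.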

Let's start by defining a simple protocol struct:

type Protocol struct {
}

It's empty, which isn't very useful. Let's give it a name element since we want to be able to name our protocol.

type Protocol struct {
    Name string
}

This looks like a nice start to our tree, but how does the parsing work? We add some metadata.

type Protocol struct {
    Name string `"protocol" @Ident`
}

This will tell Participle two things:

  1. It should look for the string protocol in our protocol grammar
  2. It should grab the next Identifier token and put it into the field
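The metadata is an ordinary Go struct tag: a string attached to the field that any library can read back through reflection. The sketch below only shows that mechanism using the standard reflect package; fieldTags is a hypothetical helper, and what Participle does with the string afterwards is not shown here.

```go
package main

import (
	"fmt"
	"reflect"
)

// Protocol is the struct from above, tag included.
type Protocol struct {
	Name string `"protocol" @Ident`
}

// fieldTags (a hypothetical helper) returns the raw tag string of every
// field in the struct value v, keyed by field name.
func fieldTags(v interface{}) map[string]string {
	t := reflect.TypeOf(v)
	tags := make(map[string]string, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tags[f.Name] = string(f.Tag)
	}
	return tags
}

func main() {
	// Prints the grammar fragment attached to the Name field.
	fmt.Println(fieldTags(Protocol{})["Name"])
}
```

Note that go vet may complain about tags that are not in the conventional key:"value" form; the code still compiles and runs.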

Now, we probably want to add a hook for an interface, as a protocol without interfaces is useless. Let's write that in:

type Protocol struct {
    Name       string      `"protocol" @Ident`
    Interfaces []Interface `{ @@ }`
}

The { @@ } instructs the parser to capture as many interfaces as it can and append them to the slice.

Now let's write a description for what we want an interface to look like, starting with a name.

type Interface struct {
    Name   string     `"interface" @Ident "{"`
    // Put the goodies here!
    Ending struct{}   `"}"`
}

The purpose of the Ending field is to make sure that our interfaces end with a closing brace.

An interface is composed of requests. Let's take a closer look at what our design looked like:

request set_address(service_name string, object_path string)
^
|
| always "request"
request set_address(service_name string, object_path string)
        ^
        |
        | Must be a valid identifier
request set_address(service_name string, object_path string)
                    ^^^^^^^^^^^^^^^^^^^
                    |
                    | One unit with two parts: identifier and type
request set_address(service_name string, object_path string)
                   ^                                       ^
                   |                                       |
                   | these surround our arguments          |
request set_address(service_name string, object_path string)
                                       ^
                                       |
          this separates our arguments |

Describing this will roughly look like this:

"request" @Ident "(" argument, argument ")"

Let's put that into a struct:

type Request struct {
    Name      string     `"request" @Ident "("`
    Arguments []Argument `{ @@ [","] } ")"`
}

{ @@ [","] } is a fancy way of making the Arguments field say “capture as many of me as possible, and we might have a comma separating us.”

Now let's write an Argument struct.

type Argument struct {
    Name string `@Ident`
    Type string `@Ident`
}

Since an argument is basically just a pair of identifiers, that's exactly how we define the struct.
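For comparison, here is roughly what the `{ @@ [","] } ")"` part of the grammar would look like if written by hand on top of text/scanner. This is only a sketch of the idea, not how Participle implements it; parseArguments is a hypothetical function, and the Argument struct is repeated without tags so the example stands alone.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// Argument mirrors the struct above, minus the grammar tags.
type Argument struct {
	Name string
	Type string
}

// parseArguments (hypothetical) reads "name type" pairs up to a closing
// ")". Like the grammar, it treats the comma after each pair as optional.
func parseArguments(src string) ([]Argument, error) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))

	var args []Argument
	tok := s.Scan()
	for tok == scanner.Ident {
		arg := Argument{Name: s.TokenText()}
		if tok = s.Scan(); tok != scanner.Ident {
			return nil, fmt.Errorf("expected a type after %q, got %q", arg.Name, s.TokenText())
		}
		arg.Type = s.TokenText()
		args = append(args, arg)

		// The [","] part: skip a comma if there is one.
		if tok = s.Scan(); tok == ',' {
			tok = s.Scan()
		}
	}
	if tok != ')' {
		return nil, fmt.Errorf("expected \")\", got %q", s.TokenText())
	}
	return args, nil
}

func main() {
	args, err := parseArguments("service_name string, object_path string)")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, a := range args {
		fmt.Println(a.Name, a.Type)
	}
}
```

Two struct tags replace all of this bookkeeping, which is most of the appeal of a library like Participle.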

Because an interface can have multiple requests, we add the following field to our Interface struct: Requests []Request `{ @@ }`. Like above, { @@ } will try to capture as many Requests as possible.

Put together, all our structs look like this:

type Interface struct {
    Name     string       `"interface" @Ident "{"`
    Requests []Request    `{ @@ }`
    Ending   struct{}     `"}"`
}
type Protocol struct {
    Name       string      `"protocol" @Ident`
    Interfaces []Interface `{ @@ }`
}
type Request struct {
    Name      string     `"request" @Ident "("`
    Arguments []Argument `{ @@ [","] } ")"`
}
type Argument struct {
    Name string `@Ident`
    Type string `@Ident`
}

Three: Parsing trees

Now that we have our AST designed, let's hook it up to Participle.

parser := participle.MustBuild(&Protocol{})
protocol := Protocol{}
if err := parser.Parse(os.Stdin, &protocol); err != nil {
    log.Fatal(err)
}

That's easy, eh? Since building XML output is fairly straightforward (just build structs corresponding to the XML output, copy the AST into them, and marshal those structs into XML), I won't be covering that here.
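Still, a rough sketch of that last step for a single request may help. It uses only encoding/xml from the standard library. The xmlRequest and xmlArg types and the toXML function are hypothetical names chosen for this example, and the AST structs are repeated without their grammar tags so the snippet stands alone; a real generator would also cover protocols and interfaces.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// AST types as defined earlier, grammar tags omitted for brevity.
type Argument struct {
	Name string
	Type string
}

type Request struct {
	Name      string
	Arguments []Argument
}

// xmlArg and xmlRequest (hypothetical) describe the XML we want to emit:
// a <request name="..."> element containing <arg name="..." type="..."> elements.
type xmlArg struct {
	Name string `xml:"name,attr"`
	Type string `xml:"type,attr"`
}

type xmlRequest struct {
	XMLName xml.Name `xml:"request"`
	Name    string   `xml:"name,attr"`
	Args    []xmlArg `xml:"arg"`
}

// toXML copies a parsed Request into the XML structs and marshals them.
func toXML(r Request) ([]byte, error) {
	out := xmlRequest{Name: r.Name}
	for _, a := range r.Arguments {
		out.Args = append(out.Args, xmlArg{Name: a.Name, Type: a.Type})
	}
	return xml.MarshalIndent(out, "", "  ")
}

func main() {
	req := Request{
		Name: "set_address",
		Arguments: []Argument{
			{Name: "service_name", Type: "string"},
			{Name: "object_path", Type: "string"},
		},
	}
	data, err := toXML(req)
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(data))
}
```

Keeping the XML structs separate from the AST means the grammar tags and the xml tags never have to share a field, at the cost of a small copying step.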

From Here

Some links you may find useful:

blankInterface: A more complete Wayland protocol parser and XML generator.

Participle: The parser library used.

encoding/xml: XML library in Go's stdlib.
