Friday, August 7, 2015

JSON Date management in Golang

TL;DR

Unmarshal dates in arbitrary formats and easily set the marshal date format, for both JSON and BSON.
The code and examples can be found here: https://github.com/simplereach/timeutils.

Small example

package main

import (
        "encoding/json"
        "fmt"
        "os"

        "github.com/simplereach/timeutils"
)

type data struct {
        Time timeutils.Time `json:"time"`
}

func main() {
        var d data
        jStr := `{"time":"09:51:20.939152pm 2014-31-12"}`
        _ = json.Unmarshal([]byte(jStr), &d)
        fmt.Println(d.Time)

        d = data{}
        jStr = `{"time":1438947306}`
        _ = json.Unmarshal([]byte(jStr), &d)
        fmt.Println(d.Time)

        d.Time = d.Time.FormatMode(timeutils.RFC1123)
        _ = json.NewEncoder(os.Stdout).Encode(d)
}

The Standard Library

Go provides extensive support for dates and times in the standard library through the time package.

It makes it easy to work with dates: comparing them, doing arithmetic on them, or converting from one time zone to another.

Example

package main

import (
    "fmt"
    "time"
)

func main() {
    // The time package has no Day constant, so we subtract 24 hours.
    fmt.Printf("%s\n", time.Now().UTC().Add(-24*time.Hour))
}
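
To make the "compare" and "timezone" parts concrete, here is a small sketch. The helper names (compare, inZone) and the CEST zone are assumptions made for this illustration. It uses time.FixedZone so that it does not depend on a timezone database being installed.

```go
package main

import (
	"fmt"
	"time"
)

// compare reports how b relates to a: "before", "after" or "same instant".
func compare(a, b time.Time) string {
	switch {
	case b.Before(a):
		return "before"
	case b.After(a):
		return "after"
	default:
		return "same instant"
	}
}

// inZone renders the Unix timestamp sec in a fixed-offset zone.
// The zone name and offset are illustrative values.
func inZone(sec int64, name string, offsetHours int) string {
	zone := time.FixedZone(name, offsetHours*3600)
	return time.Unix(sec, 0).In(zone).Format(time.RFC3339)
}

func main() {
	t := time.Unix(1438947306, 0).UTC()
	yesterday := t.Add(-24 * time.Hour)

	fmt.Println(compare(t, yesterday)) // how yesterday relates to t
	fmt.Println(t.Sub(yesterday))      // elapsed time between the two
	fmt.Println(inZone(1438947306, "CEST", 2))
}
```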

Formatting

Within the time.Time object, there are easy ways to format the date:

package main

import (
    "fmt"
    "time"
)

func main() {
    now := time.Now().UTC()
    // Display the time as RFC3339
    fmt.Printf("%s\n", now.Format(time.RFC3339))
    // Display the timestamp
    fmt.Printf("%d\n", now.Unix())
    // Display only the hour/minute
    fmt.Printf("%s\n", now.Format("3:04PM"))
}
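
A layout in Go is written by spelling out how the reference time (Mon Jan 2 15:04:05 MST 2006) should look. The sketch below formats one fixed timestamp with a few layouts; formatUnix is a helper name introduced for this example only.

```go
package main

import (
	"fmt"
	"time"
)

// formatUnix renders the Unix timestamp sec (in UTC) with the given layout.
func formatUnix(sec int64, layout string) string {
	return time.Unix(sec, 0).UTC().Format(layout)
}

func main() {
	const ts = 1438947306
	fmt.Println(formatUnix(ts, "2006-01-02"))           // date only
	fmt.Println(formatUnix(ts, "3:04PM"))               // 12-hour clock
	fmt.Println(formatUnix(ts, "Jan 2, 2006 at 15:04")) // custom layout
	fmt.Println(formatUnix(ts, time.RFC1123))           // predefined layout
}
```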

Parsing

When it comes to parsing, once again, the standard library offers tools.

Parsing date string

package main

import (
    "log"
    "fmt"
    "time"
)

func main() {
    t, err := time.Parse(time.RFC3339, "2006-01-02T15:04:05-07:00")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s\n", t)
}

“Parsing” timestamp

package main

import (
    "fmt"
    "time"
)

func main() {
    now := time.Unix(1438947306, 0).UTC()
    fmt.Printf("%s\n", now)
}

This is great, but what if we don’t know which time format to expect, e.g. with user input or a third-party API?

A solution would be to iterate through the available time formats until we succeed, but this is often cumbersome and unreliable.
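
For reference, here is roughly what that brute-force approach looks like. This is only a sketch: the list of layouts is an assumption, and any format missing from it (such as the one in the TL;DR example) is simply rejected, which is exactly the limitation described above.

```go
package main

import (
	"fmt"
	"time"
)

// layouts lists the formats we are willing to accept, tried in order.
// The selection is an assumption made for this sketch.
var layouts = []string{
	time.RFC3339,
	time.RFC1123,
	time.RFC1123Z,
	"2006-01-02 15:04:05",
	"2006-01-02",
}

// parseAny returns the result of the first layout that matches s.
func parseAny(s string) (time.Time, error) {
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("no known layout matches %q", s)
}

func main() {
	for _, in := range []string{
		"2015-08-07T11:35:06Z",
		"Fri, 07 Aug 2015 11:35:06 UTC",
		"2015-08-07",
		"09:51:20.939152pm 2014-31-12",
	} {
		t, err := parseAny(in)
		fmt.Println(t, err)
	}
}
```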

Approxidate

Git includes an approxidate component that parses arbitrary date formats, and there is a Go binding for it, so we can use it!

http://godoc.org/github.com/simplereach/timeutils#ParseDateString

This expects a string as input and will do everything it can to properly yield a time object.

package main

import (
        "fmt"
        "log"

        "github.com/simplereach/timeutils"
)

func main() {
        t, err := timeutils.ParseDateString("09:51:20.939152pm 2014-31-12")
        if err != nil {
                log.Fatal(err)
        }
        fmt.Println(t)
}

Case of JSON Marshal/Unmarshal

Unmarshal

Let’s start with the unmarshal. What if we don’t want to parse the time manually and would rather let json.Unmarshal handle it? Let’s try:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "time"
)

func main() {
    var t time.Time

    str := fmt.Sprintf("%q", time.Unix(1438947306, 123).Format(time.RFC3339))
    fmt.Printf("json string: %s\n", str)
    if err := json.Unmarshal([]byte(str), &t); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("result: %s\n", t.Format(time.RFC3339))
}

Magically, it works fine! This is great, isn’t it?
But wait, the specs require us to send the date as RFC1123, is this going to work?
Let’s try as well!

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "time"
)

func main() {
    var t time.Time

    str := fmt.Sprintf("%q", time.Unix(1438947306, 123).Format(time.RFC1123))
    fmt.Printf("json string: %s\n", str)
    if err := json.Unmarshal([]byte(str), &t); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("result: %s\n", t.Format(time.RFC1123))
}

2009/11/10 23:00:00 parsing time ""Fri, 07 Aug 2015 11:35:06 UTC"" as ""2006-01-02T15:04:05Z07:00"": cannot parse "Fri, 07 Aug 2015 11:35:06 UTC"" as "2006"

Oops.

So it does not work. How can we work around this?

A solution would be to implement the json.Unmarshaler interface and handle our own parsing format, but we’ll get to this.

Marshal

Ok, we have our time object, and we want to send it as json. Nothing easier:

package main

import (
    "encoding/json"
    "os"
    "time"
)

func main() {
    _ = json.NewEncoder(os.Stdout).Encode(time.Unix(1438947306, 0).UTC())
}

It works fine :) However, the client expects times as RFC1123. How can we tell json.Marshal which format to use?

One way is to implement the json.Marshaler interface and handle our own formatting.

Custom Marshal/Unmarshal

In order to tell Go to use a custom method for json marshal/unmarshal, one needs to implement the json.Marshaler and json.Unmarshaler interfaces.
As we can’t define methods on the imported type time.Time, we need to create a custom type.

Custom type

In order to create a custom type in Go, we simply do:

type myTime time.Time

However, doing so “hides” all members and methods so we can’t do things like this:

var t myTime
t.UTC()

Which is pretty annoying as our goal is simply to override the JSON behavior. We still want our full blown object.
To do so, we’ll use a struct with an anonymous (embedded) member:

type myTime struct {
    time.Time
}

This way, we can access all the methods of the nested time object.
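
The difference between the two declarations can be seen side by side in the sketch below. The type and function names (definedTime, embeddedTime, hourViaDefined, hourViaEmbedded) are made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// definedTime has time.Time as its underlying type but none of its methods.
type definedTime time.Time

// embeddedTime embeds time.Time, so its methods are promoted.
type embeddedTime struct {
	time.Time
}

// hourViaDefined needs an explicit conversion back to time.Time.
func hourViaDefined(sec int64) int {
	d := definedTime(time.Unix(sec, 0))
	return time.Time(d).UTC().Hour()
}

// hourViaEmbedded calls UTC directly on the wrapper.
func hourViaEmbedded(sec int64) int {
	e := embeddedTime{time.Unix(sec, 0)}
	return e.UTC().Hour()
}

func main() {
	fmt.Println(hourViaDefined(1438947306), hourViaEmbedded(1438947306))
}
```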

Unmarshal RFC1123

As we expect RFC1123, we need custom parsing, so let’s implement json.Unmarshaler.
Let’s take our first RFC1123 example and improve it:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "strings"
    "time"
)

type myTime struct {
    time.Time
}

func (t *myTime) UnmarshalJSON(buf []byte) error {
    tt, err := time.Parse(time.RFC1123, strings.Trim(string(buf), `"`))
    if err != nil {
        return err
    }
    t.Time = tt
    return nil
}

func main() {
    var t myTime

    str := fmt.Sprintf("%q", time.Unix(1438947306, 123).Format(time.RFC1123))
    fmt.Printf("json string: %s\n", str)
    if err := json.Unmarshal([]byte(str), &t); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("result: %s\n", t.Format(time.RFC1123))
}

And now it works! We have a json unmarshal that supports RFC1123 instead of RFC3339!

To implement the json.Unmarshaler interface, we need to write the func (t *myTime) UnmarshalJSON(buf []byte) error method.

It receives the JSON buffer and returns an error. It is expected to store the parsed value in the receiver, so it is important that the receiver is a pointer.

The first step, as we expect valid JSON, is to trim the surrounding " from the string. Then we call time.Parse and finally set the result on our object.

Marshal RFC1123

Instead of the default RFC3339, let’s have json encode our time as RFC1123:

package main

import (
    "encoding/json"
    "os"
    "time"
)

type myTime struct {
    time.Time
}

func (t myTime) MarshalJSON() ([]byte, error) {
    return []byte(`"` + t.Time.Format(time.RFC1123) + `"`), nil
}

func main() {
    now := myTime{time.Unix(1438947306, 123)}
    _ = json.NewEncoder(os.Stdout).Encode(now)
}

Same idea as unmarshal. Here we only dump data, so the receiver does not need to be a pointer, and we make sure that we return valid JSON by wrapping the result in ".
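
Putting both methods on the same type gives a full round trip inside a struct. The sketch below is a variation rather than the exact code above: it lets encoding/json handle the quoting (json.Marshal and json.Unmarshal on a plain string) instead of adding and trimming " by hand. The event type and the encode/decode helpers are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type myTime struct {
	time.Time
}

// MarshalJSON encodes the time as a JSON string in RFC1123.
func (t myTime) MarshalJSON() ([]byte, error) {
	return json.Marshal(t.Time.Format(time.RFC1123))
}

// UnmarshalJSON decodes a JSON string holding an RFC1123 date.
func (t *myTime) UnmarshalJSON(buf []byte) error {
	var s string
	if err := json.Unmarshal(buf, &s); err != nil {
		return err
	}
	tt, err := time.Parse(time.RFC1123, s)
	if err != nil {
		return err
	}
	t.Time = tt
	return nil
}

// event is a hypothetical payload using our custom time type.
type event struct {
	Name string `json:"name"`
	At   myTime `json:"at"`
}

// encode marshals an event happening at the Unix timestamp sec (UTC).
func encode(name string, sec int64) (string, error) {
	buf, err := json.Marshal(event{Name: name, At: myTime{time.Unix(sec, 0).UTC()}})
	return string(buf), err
}

// decode unmarshals an event and returns its time as a Unix timestamp.
func decode(js string) (int64, error) {
	var e event
	if err := json.Unmarshal([]byte(js), &e); err != nil {
		return 0, err
	}
	return e.At.Unix(), nil
}

func main() {
	js, err := encode("deploy", 1438947306)
	fmt.Println(js, err)
	sec, err := decode(js)
	fmt.Println(sec, err)
}
```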

Going further

Changing the time format is great, but what if we need to move dates around as an integer timestamp? Or as a nanosecond timestamp? Or if we expect an arbitrary format?

What if we have a REST API that needs to move dates between JSON and BSON?

The timeutils library (http://github.com/simplereach/timeutils) offers a Time type that supports arbitrary time formats via approxidate, as well as timestamps and nanosecond precision, for both marshal and unmarshal in JSON and BSON.
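
To give an idea of what accepting several representations involves, here is a minimal standard-library sketch. It is not how timeutils is implemented: flexTime, data and decodeUnix are hypothetical names, and only integer Unix seconds plus two string layouts are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// flexTime accepts either a JSON number (Unix seconds) or a JSON string in
// one of a few known layouts. It is a hypothetical type for this sketch.
type flexTime struct {
	time.Time
}

func (t *flexTime) UnmarshalJSON(buf []byte) error {
	// By convention, a JSON null leaves the value untouched.
	if string(buf) == "null" {
		return nil
	}
	// Try a number first: {"time":1438947306}
	var sec int64
	if err := json.Unmarshal(buf, &sec); err == nil {
		t.Time = time.Unix(sec, 0).UTC()
		return nil
	}
	// Otherwise expect a string and try each layout in turn.
	var s string
	if err := json.Unmarshal(buf, &s); err != nil {
		return err
	}
	for _, layout := range []string{time.RFC3339, time.RFC1123} {
		if tt, err := time.Parse(layout, s); err == nil {
			t.Time = tt
			return nil
		}
	}
	return fmt.Errorf("unsupported time value %q", s)
}

type data struct {
	Time flexTime `json:"time"`
}

// decodeUnix decodes {"time": ...} and returns the Unix timestamp found.
func decodeUnix(js string) (int64, error) {
	var d data
	if err := json.Unmarshal([]byte(js), &d); err != nil {
		return 0, err
	}
	return d.Time.Unix(), nil
}

func main() {
	for _, js := range []string{
		`{"time":1438947306}`,
		`{"time":"2015-08-07T11:35:06Z"}`,
		`{"time":"Fri, 07 Aug 2015 11:35:06 UTC"}`,
	} {
		sec, err := decodeUnix(js)
		fmt.Println(sec, err)
	}
}
```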

Monday, July 27, 2015

Parsing HTTP Query String in Go

TL;DR

Code and examples can be found here: https://github.com/creack/httpreq

HTTP Server

Go provides a very easy way to create an HTTP server:

package main

import (
        "fmt"
        "log"
        "net/http"
)

func handler(w http.ResponseWriter, req *http.Request) {
        fmt.Fprintf(w, "hello world\n")
}

func main() {
        log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}
// curl http://localhost:8080

But how do we deal with data?

JSON body

A common way to pass data is via a json encoded body:

package main

import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
)

type query struct {
        Name string
}

func handler(w http.ResponseWriter, req *http.Request) {
        q := &query{}
        if err := json.NewDecoder(req.Body).Decode(q); err != nil {
                log.Printf("Error decoding body: %s", err)
                return
        }
        fmt.Fprintf(w, "hello %s\n", q.Name)
}

func main() {
        log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}
// curl -d '{"Name": "Guillaume"}' http://localhost:8080/

Query String

But what if we want to pass data via query string? Typically, pagination and extra data.

Go, once again, exposes everything necessary:

package main

import (
        "fmt"
        "log"
        "net/http"
        "strconv"
)

func handler(w http.ResponseWriter, req *http.Request) {
        if err := req.ParseForm(); err != nil {
                log.Printf("Error parsing form: %s", err)
                return
        }
        l := req.Form.Get("limit")
        limit, err := strconv.Atoi(l)
        if err != nil {
                log.Printf("Error parsing limit: %s", err)
                return
        }

        dr := req.Form.Get("dryrun")
        dryRun, _ := strconv.ParseBool(dr)
        fmt.Fprintf(w, "hello world. Limit: %d, Dryrun: %t\n", limit, dryRun)
}

func main() {
        log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}
// curl 'http://localhost:8080?limit=42&dryrun=true'

As we can see, it works as expected. However, as we add more fields to the query string, the type conversions quickly become cumbersome.
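
One detail worth noting: when a conversion fails, the handler above only logs and returns, so the client receives an empty 200 response. A possible variation, sketched below, reports a 400 Bad Request instead. It is reduced to the limit field, and statusFor is a hypothetical helper that uses net/http/httptest so the handler can be exercised without starting a server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

func handler(w http.ResponseWriter, req *http.Request) {
	// req.URL.Query() parses only the query string, unlike req.ParseForm,
	// which also reads form-encoded bodies.
	limit, err := strconv.Atoi(req.URL.Query().Get("limit"))
	if err != nil {
		http.Error(w, "invalid limit: "+err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "hello world. Limit: %d\n", limit)
}

// statusFor runs the handler against an in-memory request and returns the
// HTTP status code it produced.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	handler(rec, httptest.NewRequest("GET", target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/?limit=42"))  // valid integer
	fmt.Println(statusFor("/?limit=abc")) // not an integer
	fmt.Println(statusFor("/"))           // limit missing
}
```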

A Better Query String management

We know how to convert any string to any type.
We know what data we are expecting.
We should be able to do something similar to json.Unmarshal.

Conversion functions

Let’s start with our previous example: we need an int and a bool. However, the strconv functions have different prototypes and return a value.

It would be interesting to write a small helper that will set a value instead of returning it. That way, we could instantiate our query object and pass the fields to be set. In order to do so, we need to use pointers.

package main

import (
        "fmt"
        "log"
        "net/http"
        "strconv"
)

type query struct {
        Limit  int
        DryRun bool
}

func parseBool(s string, dest *bool) error {
        // assume error = false
        *dest, _ = strconv.ParseBool(s)
        return nil
}

func parseInt(s string, dest *int) error {
        n, err := strconv.Atoi(s)
        if err != nil {
                return err
        }
        *dest = n
        return nil
}

func handler(w http.ResponseWriter, req *http.Request) {
        if err := req.ParseForm(); err != nil {
                log.Printf("Error parsing form: %s", err)
                return
        }
        q := &query{}
        if err := parseBool(req.Form.Get("dryrun"), &q.DryRun); err != nil {
                log.Printf("Error parsing dryrun: %s", err)
                return
        }
        if err := parseInt(req.Form.Get("limit"), &q.Limit); err != nil {
                log.Printf("Error parsing limit: %s", err)
                return
        }
        fmt.Fprintf(w, "hello world. Limit: %d, Dryrun: %t\n", q.Limit, q.DryRun)
}

func main() {
        log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}

Make it generic

It is a bit better, but it could still be improved. What if we’d like to have this in a generic way?

As we can see, the conversion helpers have very similar prototypes. Let’s make them identical using interface{}:

package main

import (
        "fmt"
        "log"
        "net/http"
        "strconv"
)

type query struct {
        Limit  int
        DryRun bool
}

func parseBool(s string, dest interface{}) error {
        d, ok := dest.(*bool)
        if !ok {
                return fmt.Errorf("wrong type for parseBool: %T", dest)
        }
        // assume error = false
        *d, _ = strconv.ParseBool(s)
        return nil
}

func parseInt(s string, dest interface{}) error {
        d, ok := dest.(*int)
        if !ok {
                return fmt.Errorf("wrong type for parseInt: %T", dest)
        }
        n, err := strconv.Atoi(s)
        if err != nil {
                return err
        }
        *d = n
        return nil
}

func handler(w http.ResponseWriter, req *http.Request) {
        if err := req.ParseForm(); err != nil {
                log.Printf("Error parsing form: %s", err)
                return
        }
        q := &query{}
        if err := parseBool(req.Form.Get("dryrun"), &q.DryRun); err != nil {
                log.Printf("Error parsing dryrun: %s", err)
                return
        }
        if err := parseInt(req.Form.Get("limit"), &q.Limit); err != nil {
                log.Printf("Error parsing limit: %s", err)
                return
        }
        fmt.Fprintf(w, "hello world. Limit: %d, Dryrun: %t\n", q.Limit, q.DryRun)
}

func main() {
        log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}
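
As a side note, on Go 1.18 or later the same idea can be expressed with type parameters, which moves the "wrong type" check from run time to compile time. This is an alternative sketch, not the approach used in the rest of this post: parseInto and parseQuery are hypothetical names, and unlike parseBool above, an invalid or empty dryrun value is reported as an error here.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseInto converts s with conv and stores the result in dest.
// The compiler checks that dest and conv agree on the type T.
func parseInto[T any](s string, dest *T, conv func(string) (T, error)) error {
	v, err := conv(s)
	if err != nil {
		return err
	}
	*dest = v
	return nil
}

type query struct {
	Limit  int
	DryRun bool
}

// parseQuery fills a query from raw string values.
func parseQuery(limit, dryRun string) (query, error) {
	var q query
	if err := parseInto(limit, &q.Limit, strconv.Atoi); err != nil {
		return q, fmt.Errorf("limit: %w", err)
	}
	if err := parseInto(dryRun, &q.DryRun, strconv.ParseBool); err != nil {
		return q, fmt.Errorf("dryrun: %w", err)
	}
	return q, nil
}

func main() {
	q, err := parseQuery("42", "true")
	fmt.Println(q, err)
	_, err = parseQuery("forty-two", "true")
	fmt.Println(err)
}
```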

Parsing object

Now that we have generic helpers, we can easily write a small object that will simplify the way we use it:

We need to store N parsing functions, so we’ll need a slice (or a map). In order to parse a field, we need the helper function, but we also need the original string and the destination.

We have our object!

type parsingMap []parsingMapElem

type parsingMapElem struct {
        Field string
        Fct   func(string, interface{}) error
        Dest  interface{}
}

Once our parsingMap is constructed, we need to execute it. Let’s write the loop logic:

func (p parsingMap) parse(form url.Values) error {
        for _, elem := range p {
                // Look up the raw value for this field in the parsed form.
                if err := elem.Fct(form.Get(elem.Field), elem.Dest); err != nil {
                        return err
                }
        }
        return nil
}

We can now put everything together:

package main

import (
        "fmt"
        "log"
        "net/http"
        "net/url"
        "strconv"
)

// conversion helpers
func parseBool(s string, dest interface{}) error {
        d, ok := dest.(*bool)
        if !ok {
                return fmt.Errorf("wrong type for parseBool: %T", dest)
        }
        // assume error = false
        *d, _ = strconv.ParseBool(s)
        return nil
}

func parseInt(s string, dest interface{}) error {
        d, ok := dest.(*int)
        if !ok {
                return fmt.Errorf("wrong type for parseInt: %T", dest)
        }
        n, err := strconv.Atoi(s)
        if err != nil {
                return err
        }
        *d = n
        return nil
}

// parsingMap
type parsingMap []parsingMapElem

type parsingMapElem struct {
        Field string
        Fct   func(string, interface{}) error
        Dest  interface{}
}

func (p parsingMap) parse(form url.Values) error {
        for _, elem := range p {
                // Look up the raw value for this field in the parsed form.
                if err := elem.Fct(form.Get(elem.Field), elem.Dest); err != nil {
                        return err
                }
        }
        return nil
}

// http server
type query struct {
        Limit  int
        DryRun bool
}

func handler(w http.ResponseWriter, req *http.Request) {
        if err := req.ParseForm(); err != nil {
                log.Printf("Error parsing form: %s", err)
                return
        }
        q := &query{}
        if err := (parsingMap{
                {"limit", parseInt, &q.Limit},
                {"dryrun", parseBool, &q.DryRun},
        }).parse(req.Form); err != nil {
                log.Printf("Error parsing query string: %s", err)
                return
        }

        fmt.Fprintf(w, "hello world. Limit: %d, Dryrun: %t\n", q.Limit, q.DryRun)
}

func main() {
        log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}

Going Further

I wrote this small library: https://github.com/creack/httpreq which provides more helpers and a cleaner API. It fits my current use case, but feel free to add any helper that may be missing :)

Thursday, July 23, 2015

HTTP and Error management in Go

Go comes with a great standard library including net/http, which allows a developer to create a reliable HTTP server very easily.

TL;DR

Code and example can be found here: https://github.com/creack/ehttp

Basic http server

To provide context, let’s take a look at a basic http server in Go:

package main

import (
        "fmt"
        "log"
        "net/http"
)

func handler(w http.ResponseWriter, req *http.Request) {
        fmt.Fprintln(w, "hello world")
}

func main() {
        http.HandleFunc("/", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
}

The first step is to define a handler. In order to be understood by Go’s http library, the handler needs to follow this specific prototype: func(http.ResponseWriter, *http.Request), which is defined as the http.HandlerFunc type.
Once we have our handler, we can register it on a specific route and then start the server.

This is great! In very few lines of code, we have a working web server ready to go!

Error management

You might have noticed: our handler does not return an error… But luckily, the net/http package exposes the http.Error function in order to report an error to the client. This will set the Content-Type, send a custom http status header and write the error as the body.

Small example:

func handler(w http.ResponseWriter, req *http.Request) {
        if err := doSomething(); err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
        }
        fmt.Fprintln(w, "hello world")
}
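
To see what http.Error actually sends, we can call it against an in-memory recorder from net/http/httptest. inspectError is a helper written for this illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// inspectError calls http.Error against an in-memory recorder and returns
// what a client would receive.
func inspectError(msg string, code int) (status int, contentType, body string) {
	rec := httptest.NewRecorder()
	http.Error(rec, msg, code)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	status, contentType, body := inspectError("something broke", http.StatusInternalServerError)
	fmt.Printf("status=%d content-type=%q body=%q\n", status, contentType, body)
}
```

Note that the body is the message followed by a newline, sent as plain text.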

As you can imagine, this can become cumbersome pretty fast. We could write a wrapper around http.Error that logs the error, sends instrumentation and then calls http.Error, but when a handler does a lot, we still need that call + return every time.
A nice way to go would be to treat the handler as a simple entry point and avoid putting any logic directly in it. This is a good approach, especially when you don’t want to be tied to HTTP and want to be able to switch to other protocols.

Custom Handler

A solution is to create a custom handler, let’s try:

type HandlerFunc func(http.ResponseWriter, *http.Request) error

Pretty straightforward for now, but wait: http.HandleFunc expects a different prototype, so we won’t be able to use it anymore, right?
Kind of. Right in the sense that we can’t use it directly, but we can always work around anything ;)

http.HandlerFunc

In order to “convert” our custom handler to the native http one, we need to write a middleware and we are going to use that to handle our error management.

So, what is a middleware? It is a simple “layer” that sits between the client’s request and our final handler.

// MWError is the main middleware. When the handler returns an error,
// it reports that error to the client as a 500 Internal Server Error.
func MWError(hdlr HandlerFunc) http.HandlerFunc {
        return func(w http.ResponseWriter, req *http.Request) {
                if err := hdlr(w, req); err != nil {
                        http.Error(w, err.Error(), http.StatusInternalServerError)
                        return
                }
        }
}

So, we have a function that takes our custom handler function as a parameter and returns a native http.HandlerFunc.

router.HandleFunc("/", MWError(hdlr))

When the client calls our server, the request ends up in that newly generated native handler, which in turn calls our custom one and then handles the error.

http.Handler

Alternatively, we can implement the http.Handler interface on our custom type so it can be used by all the http functions expecting that interface (I am thinking mainly about http.Handle and http.ListenAndServe):

type HandlerFunc func(http.ResponseWriter, *http.Request) error

func (hdlr HandlerFunc) ServeHTTP(w http.ResponseWriter, req *http.Request) {
        if err := hdlr(w, req); err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
        }
}

Going further

Now that we have a custom HTTP handler that can return errors, we can think about improvements:
- smarter error management
- custom error type holding the HTTP Status
- panic recovery
- adaptors for non-standard library routers (gorilla, httprouter, etc)
- headers detection (we can’t send error headers if they have already been sent)
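
As an illustration of two of these ideas (an error type holding the HTTP status, and panic recovery), here is a minimal sketch. It is not the ehttp API: statusError, serve and the three sample handlers are hypothetical names, and it does not deal with headers that have already been sent.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// HandlerFunc is a handler that can return an error.
type HandlerFunc func(http.ResponseWriter, *http.Request) error

// statusError is a hypothetical error type carrying an HTTP status code.
type statusError struct {
	code int
	err  error
}

func (e *statusError) Error() string { return e.err.Error() }
func (e *statusError) Unwrap() error { return e.err }

// ServeHTTP implements http.Handler. It recovers from panics and maps
// returned errors to a status code, defaulting to 500.
func (hdlr HandlerFunc) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	defer func() {
		if r := recover(); r != nil {
			http.Error(w, fmt.Sprintf("panic: %v", r), http.StatusInternalServerError)
		}
	}()
	err := hdlr(w, req)
	if err == nil {
		return
	}
	code := http.StatusInternalServerError
	var se *statusError
	if errors.As(err, &se) {
		code = se.code
	}
	http.Error(w, err.Error(), code)
}

// serve runs h against an in-memory request and returns the status code.
func serve(h HandlerFunc) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	return rec.Code
}

func notFound(w http.ResponseWriter, req *http.Request) error {
	return &statusError{http.StatusNotFound, errors.New("no such user")}
}

func failing(w http.ResponseWriter, req *http.Request) error {
	return errors.New("database unreachable")
}

func panicking(w http.ResponseWriter, req *http.Request) error {
	panic("unexpected")
}

func main() {
	fmt.Println(serve(notFound), serve(failing), serve(panicking))
}
```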

I started a small library that implements all this: https://github.com/creack/ehttp. It is very simple and stays close to the standard library.

Friday, July 10, 2015

Reverse Proxy in Go

TL;DR

The final code can be found here: https://github.com/creack/goproxy

Goal

In this article, we are going to dive into the standard library's Reverse Proxy and see how to use it as a load balancer with persistent connections that doesn't lose any requests!

Here is our example setup:

  • Service One - version 1 running on http://localhost:9091/ and http://localhost:9092/
  • Reverse Proxy on http://localhost:9090/<service name>/<service version>/

When calling http://localhost:9090/serviceone/v1/, we want the proxy to balance between
http://localhost:9091/ and http://localhost:9092/ without losing any request if one of the hosts goes down.

Standard Library Example

Let’s start with the doc: http://godoc.org/net/http/httputil#ReverseProxy.
We can see that the ReverseProxy structure has the ServeHTTP method, which means that we can use it as an HTTP handler directly with http.ListenAndServe.
There is also NewSingleHostReverseProxy, which sounds great: we have an example of how to instantiate a ReverseProxy that works with a single host! So let’s see what it looks like:

// NewSingleHostReverseProxy returns a new ReverseProxy that rewrites
// URLs to the scheme, host, and base path provided in target. If the
// target's path is "/base" and the incoming request was for "/dir",
// the target request will be for /base/dir.
func NewSingleHostReverseProxy(target *url.URL) *ReverseProxy {
        targetQuery := target.RawQuery
        director := func(req *http.Request) {
                req.URL.Scheme = target.Scheme
                req.URL.Host = target.Host
                req.URL.Path = singleJoiningSlash(target.Path, req.URL.Path)
                if targetQuery == "" || req.URL.RawQuery == "" {
                        req.URL.RawQuery = targetQuery + req.URL.RawQuery
                } else {
                        req.URL.RawQuery = targetQuery + "&" + req.URL.RawQuery
                }
        }
        return &ReverseProxy{Director: director}
}

The function takes a target as a parameter. This is going to be our target host URL.
Let’s skip the RawQuery part; it is simply used to properly forward the query string arguments.
Then we have director, which we give to the ReverseProxy object. This is what defines the behavior of our reverse proxy.
That director function takes the outgoing request as a parameter and needs to update it so that it points at the target. First, we need to set the request’s URL; the important parts are the Scheme and Host. The Path and RawQuery are used to manipulate the HTTP route.

So let’s try!

First, let’s write a small http server which is going to be our target server:

package main

import (
        "log"
        "net/http"
        "os"
        "strconv"
)

func main() {
        if len(os.Args) != 2 {
                log.Fatalf("Usage: %s <port>", os.Args[0])
        }
        if _, err := strconv.Atoi(os.Args[1]); err != nil {
                log.Fatalf("Invalid port: %s (%s)\n", os.Args[1], err)
        }

        http.HandleFunc("/", func(w http.ResponseWriter, req *http.Request) {
                println("--->", os.Args[1], req.URL.String())
        })
        log.Fatal(http.ListenAndServe(":"+os.Args[1], nil))
}

This small http server listens on the port given as the first command-line argument and, when called, prints the port and the request URL.

Now, let’s write a small reverse proxy:

package main

import (
        "net/http"
        "net/http/httputil"
        "net/url"
)

func main() {
        proxy := httputil.NewSingleHostReverseProxy(&url.URL{
                Scheme: "http",
                Host:   "localhost:9091",
        })
        http.ListenAndServe(":9090", proxy)
}

The code is straightforward: we create a new single host reverse proxy that targets http://localhost:9091/ and listens on 9090.

Try it! It works fine. curl http://localhost:9090 forwards properly to our http server running on 9091.
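
If we want to check what the director does without starting any server, one option is to swap the proxy’s Transport for a stub. In the sketch below, echoTransport and rewritten are hypothetical helpers: the stub never dials and simply answers with the URL the proxy would have requested.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// echoTransport is a stand-in for a real backend: instead of dialing, it
// answers with the URL the proxy would have requested.
type echoTransport struct{}

func (echoTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(req.URL.String())),
		Request:    req,
	}, nil
}

// rewritten returns the outgoing URL produced by the single-host director
// for an incoming request to the given path.
func rewritten(targetURL, path string) (string, error) {
	target, err := url.Parse(targetURL)
	if err != nil {
		return "", err
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = echoTransport{}

	rec := httptest.NewRecorder()
	proxy.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Body.String(), nil
}

func main() {
	fmt.Println(rewritten("http://localhost:9091", "/hello"))
	fmt.Println(rewritten("http://localhost:9091/base", "/dir?x=1"))
}
```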

Multiple hosts case

The example we saw is working great and is very simple, but not really useful in production. What if we want to have more than one host?

Director

As we saw earlier, the main logic of the reverse proxy resides in the Director member. So let’s try to create our own ReverseProxy object.
We are going to copy/paste the httputil.NewSingleHostReverseProxy code, change the prototype to take a slice of URLs so we can balance between the given hosts, and alter the code to pick a random host from them.

package main

import (
        "log"
        "math/rand"
        "net/http"
        "net/http/httputil"
        "net/url"
)

// NewMultipleHostReverseProxy creates a reverse proxy that will randomly
// select a host from the passed `targets`
func NewMultipleHostReverseProxy(targets []*url.URL) *httputil.ReverseProxy {
        director := func(req *http.Request) {
                target := targets[rand.Int()%len(targets)]
                req.URL.Scheme = target.Scheme
                req.URL.Host = target.Host
                req.URL.Path = target.Path
        }
        return &httputil.ReverseProxy{Director: director}
}

func main() {
        proxy := NewMultipleHostReverseProxy([]*url.URL{
                {
                        Scheme: "http",
                        Host:   "localhost:9091",
                },
                {
                        Scheme: "http",
                        Host:   "localhost:9092",
                },
        })
        log.Fatal(http.ListenAndServe(":9090", proxy))
}

Demo time

Caveat

At the end of the previous demo, I kill one of the HTTP servers and we can see that the reverse proxy yields errors when hitting that host. This results in request loss, which is not ideal. Hosts do go down; it should be the role of our proxy to make sure the client’s request reaches a working target.

In order to understand what is going on, let’s dive into the ServeHTTP method. We can see at the beginning:

        transport := p.Transport
        if transport == nil {
                transport = http.DefaultTransport
        }

This means that because we didn’t provide a Transport object, the reverse proxy will use the default one.
Now let’s take a look at the default Transport:

var DefaultTransport RoundTripper = &Transport{
        Proxy: ProxyFromEnvironment,
        Dial: (&net.Dialer{
                Timeout:   30 * time.Second,
                KeepAlive: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout: 10 * time.Second,
}

Proxy is a function that applies the proxy settings; by default, it looks up the HTTP_PROXY environment variable and co.
The next one is more interesting: Dial. It defines how to establish the connection to the target host. The default Transport uses the Dialer from net with some timeouts/keepalive settings.

The error yielded by the reverse proxy when one host went down is: http: proxy error: dial tcp 127.0.0.1:9091: getsockopt: connection refused. It is pretty clear: the issue comes from Dial.

To understand the behavior, let’s extend our code a bit to add some output so we can see exactly what gets called and when.

package main

import (
        "log"
        "math/rand"
        "net"
        "net/http"
        "net/http/httputil"
        "net/url"
        "time"
)

// NewMultipleHostReverseProxy creates a reverse proxy that will randomly
// select a host from the passed `targets`
func NewMultipleHostReverseProxy(targets []*url.URL) *httputil.ReverseProxy {
        director := func(req *http.Request) {
                println("CALLING DIRECTOR")
                target := targets[rand.Int()%len(targets)]
                req.URL.Scheme = target.Scheme
                req.URL.Host = target.Host
                req.URL.Path = target.Path
        }
        return &httputil.ReverseProxy{
                Director: director,
                Transport: &http.Transport{
                        Proxy: func(req *http.Request) (*url.URL, error) {
                                println("CALLING PROXY")
                                return http.ProxyFromEnvironment(req)
                        },
                        Dial: func(network, addr string) (net.Conn, error) {
                                println("CALLING DIAL")
                                conn, err := (&net.Dialer{
                                        Timeout:   30 * time.Second,
                                        KeepAlive: 30 * time.Second,
                                }).Dial(network, addr)
                                if err != nil {
                                        println("Error during DIAL:", err.Error())
                                }
                                return conn, err
                        },
                        TLSHandshakeTimeout: 10 * time.Second,
                },
        }
}

func main() {
        proxy := NewMultipleHostReverseProxy([]*url.URL{
                {
                        Scheme: "http",
                        Host:   "localhost:9091",
                },
                {
                        Scheme: "http",
                        Host:   "localhost:9092",
                },
        })
        log.Fatal(http.ListenAndServe(":9090", proxy))
}

What did we do? We simply reused the code of http.DefaultTransport and added some logging.

More Verbose Demo

As we can see, Dial is called only the first time Director yields a given host; after that, the existing connection is reused from the connection pool kept by the proxy’s Transport.
When one of the servers goes away, the ReverseProxy receives EOF and removes the connection from the pool, resulting in a new call to Dial upon the next request.

Routing

Let’s put the request loss aside for the moment and address routing based on the request’s path.

Service Registry

In order to easily look up an endpoint for a given service, let’s create a small Registry type instead of using a slice of *url.URL:

type Registry map[string][]string

var ServiceRegistry = Registry{
    "serviceone/v1": {
        "localhost:9091",
        "localhost:9092",
    },
}

Extract Service and Version from Request

In order to know what service we are targeting, we use the /serviceName/serviceVersion/ prefix in the path.

func extractNameVersion(target *url.URL) (name, version string, err error) {
        path := target.Path
        // Trim the leading `/`
        if len(path) > 1 && path[0] == '/' {
                path = path[1:]
        }
        // Explode on `/` and make sure we have at least
        // 2 elements (service name and version)
        tmp := strings.Split(path, "/")
        if len(tmp) < 2 {
                return "", "", fmt.Errorf("Invalid path")
        }
        name, version = tmp[0], tmp[1]
        // Rewrite the request's path without the prefix.
        target.Path = "/" + strings.Join(tmp[2:], "/")
        return name, version, nil
}

It is pretty straightforward, but wait, where does that target *url.URL come from?
You might have guessed: it is the req.URL from our Director.
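
To make the path rewrite concrete, here is a minimal sketch that runs extractNameVersion on a sample URL. The URL and the describe helper are made up for illustration; only extractNameVersion comes from the text above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractNameVersion is the function from above, with the same behavior.
func extractNameVersion(target *url.URL) (name, version string, err error) {
	path := target.Path
	if len(path) > 1 && path[0] == '/' {
		path = path[1:]
	}
	tmp := strings.Split(path, "/")
	if len(tmp) < 2 {
		return "", "", fmt.Errorf("Invalid path")
	}
	name, version = tmp[0], tmp[1]
	target.Path = "/" + strings.Join(tmp[2:], "/")
	return name, version, nil
}

// describe (hypothetical helper) parses rawURL, extracts the service prefix
// and returns the registry key together with the rewritten path.
func describe(rawURL string) (key, rest string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", err
	}
	name, version, err := extractNameVersion(u)
	if err != nil {
		return "", "", err
	}
	return name + "/" + version, u.Path, nil
}

func main() {
	key, rest, err := describe("http://localhost:9090/serviceone/v1/users/42")
	fmt.Println(key, rest, err) // serviceone/v1 /users/42 <nil>

	_, _, err = describe("http://localhost:9090/serviceone")
	fmt.Println(err) // Invalid path
}
```

The key returned by describe has the same name/version shape as the keys of the Registry map, so it can be used directly for the lookup.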

Registry Example

Let’s put all this together based on our first multi host example:

package main

import (
        "fmt"
        "log"
        "math/rand"
        "net/http"
        "net/http/httputil"
        "net/url"
        "strings"
)

type Registry map[string][]string

func extractNameVersion(target *url.URL) (name, version string, err error) {
        path := target.Path
        // Trim the leading `/`
        if len(path) > 1 && path[0] == '/' {
                path = path[1:]
        }
        // Explode on `/` and make sure we have at least
        // 2 elements (service name and version)
        tmp := strings.Split(path, "/")
        if len(tmp) < 2 {
                return "", "", fmt.Errorf("Invalid path")
        }
        name, version = tmp[0], tmp[1]
        // Rewrite the request's path without the prefix.
        target.Path = "/" + strings.Join(tmp[2:], "/")
        return name, version, nil
}

// NewMultipleHostReverseProxy creates a reverse proxy that will randomly
// select a host among the endpoints registered in `reg` for the requested service.
func NewMultipleHostReverseProxy(reg Registry) *httputil.ReverseProxy {
        director := func(req *http.Request) {
                name, version, err := extractNameVersion(req.URL)
                if err != nil {
                        log.Print(err)
                        return
                }
                endpoints := reg[name+"/"+version]
                if len(endpoints) == 0 {
                        log.Printf("Service/Version not found")
                        return
                }
                req.URL.Scheme = "http"
                req.URL.Host = endpoints[rand.Int()%len(endpoints)]
        }
        return &httputil.ReverseProxy{
                Director: director,
        }
}

func main() {
        proxy := NewMultipleHostReverseProxy(Registry{
                "serviceone/v1": {"localhost:9091"},
                "serviceone/v2": {"localhost:9092"},
        })
        log.Fatal(http.ListenAndServe(":9090", proxy))
}

We now have a working load balancer!
But we still have an issue when a host goes down.

Avoid losing requests

So, what can we do? When a host is down, the error comes from Dial but our logic is in Director.
So let’s move the logic to Dial! That would be great, but there is one big issue:
Dial does not know anything about the request, so we can’t look up the target service endpoint list.
In order to work around this, we are going to do something a bit hackish: use the Request’s Host as a placeholder!
We are going to put serviceName/serviceVersion as a string inside the Request, which later on will be passed on to Dial, where we can look up the endpoints for our service.

func NewMultipleHostReverseProxy(reg Registry) *httputil.ReverseProxy {
    director := func(req *http.Request) {
        name, version, err := extractNameVersion(req.URL)
        if err != nil {
            log.Print(err)
            return
        }
        req.URL.Scheme = "http"
        req.URL.Host = name + "/" + version
    }
    return &httputil.ReverseProxy{
        Director: director,
        Transport: &http.Transport{
            Proxy: http.ProxyFromEnvironment,
            Dial: func(network, addr string) (net.Conn, error) {
                // Trim the `:80` added by Scheme http.
                addr = strings.Split(addr, ":")[0]
                endpoints := reg[addr]
                if len(endpoints) == 0 {
                    return nil, fmt.Errorf("Service/Version not found")
                }
                return net.Dial(network, endpoints[rand.Int()%len(endpoints)])
            },
            TLSHandshakeTimeout: 10 * time.Second,
        },
    }
}
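
The lookup done inside Dial can be isolated in a small function, which makes the placeholder trick easier to see. This is only a sketch: endpointsFor is a hypothetical helper, and it assumes the address reaches Dial in the name/version:port form described in the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

type Registry map[string][]string

// endpointsFor (hypothetical helper) maps the placeholder address received by
// Dial, e.g. "serviceone/v1:80", back to the registered endpoints.
func endpointsFor(reg Registry, addr string) ([]string, error) {
	// Drop the port that was appended to the placeholder host.
	key := strings.Split(addr, ":")[0]
	endpoints := reg[key]
	if len(endpoints) == 0 {
		return nil, fmt.Errorf("Service/Version not found: %q", key)
	}
	return endpoints, nil
}

func main() {
	reg := Registry{"serviceone/v1": {"localhost:9091", "localhost:9092"}}
	fmt.Println(endpointsFor(reg, "serviceone/v1:80"))
	fmt.Println(endpointsFor(reg, "unknown/v9:80"))
}
```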

Going further

Registry

The github.com/creack/goproxy/registry package exposes a Registry interface:

// Registry is an interface used to lookup the target host
// for a given service name / version pair.
type Registry interface {
        Add(name, version, endpoint string)                // Add an endpoint to our registry
        Delete(name, version, endpoint string)             // Remove an endpoint from our registry
        Failure(name, version, endpoint string, err error) // Mark an endpoint as failed.
        Lookup(name, version string) ([]string, error)     // Return the endpoint list for the given service name/version
}

Add and Delete are used to control the content of our registry. We might want to call Add when a new host is available and Delete when one goes away.
Failure is called when Dial fails, which probably means the target is not available anymore. We can use that method to count how many times an endpoint failed and eventually call Delete to remove the faulty host.
It is a good place to put some logging and instrumentation.
Lookup is pretty straightforward: it returns the host list for the given service name/version.

This interface can be implemented using ZooKeeper, etcd, consul or any service you might be using. The default implementation is a naive map.
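
As an illustration of what a naive map implementation could look like, here is a sketch that satisfies the four methods of the interface above. It is not the code from the goproxy package: the mapRegistry type, the ErrServiceNotFound error and the choice to evict an endpoint on its first failure are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrServiceNotFound is a hypothetical error for unknown name/version pairs.
var ErrServiceNotFound = errors.New("service name/version not found")

// mapRegistry is a naive in-memory registry guarded by a mutex.
type mapRegistry struct {
	mu        sync.RWMutex
	endpoints map[string][]string // key: "name/version"
}

func newMapRegistry() *mapRegistry {
	return &mapRegistry{endpoints: map[string][]string{}}
}

func (r *mapRegistry) Add(name, version, endpoint string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	key := name + "/" + version
	r.endpoints[key] = append(r.endpoints[key], endpoint)
}

func (r *mapRegistry) Delete(name, version, endpoint string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	key := name + "/" + version
	kept := make([]string, 0, len(r.endpoints[key]))
	for _, e := range r.endpoints[key] {
		if e != endpoint {
			kept = append(kept, e)
		}
	}
	r.endpoints[key] = kept
}

// Failure simply evicts the endpoint; a real implementation might count
// failures and log before removing anything.
func (r *mapRegistry) Failure(name, version, endpoint string, err error) {
	r.Delete(name, version, endpoint)
}

func (r *mapRegistry) Lookup(name, version string) ([]string, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	found := r.endpoints[name+"/"+version]
	if len(found) == 0 {
		return nil, ErrServiceNotFound
	}
	// Return a copy so callers cannot mutate the registry's slice.
	return append([]string(nil), found...), nil
}

func main() {
	reg := newMapRegistry()
	reg.Add("serviceone", "v1", "localhost:9091")
	reg.Add("serviceone", "v1", "localhost:9092")
	reg.Failure("serviceone", "v1", "localhost:9091", errors.New("connection refused"))
	fmt.Println(reg.Lookup("serviceone", "v1")) // [localhost:9092] <nil>
}
```

The mutex matters because the proxy serves requests concurrently, so Lookup and Failure can be called from several goroutines at once.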

Load Balancer

The github.com/creack/goproxy package is basically our latest example hooked up to the Registry interface.

On top of NewMultipleHostReverseProxy, it also exposes two functions: ExtractNameVersion and LoadBalance. They are not exposed in order to be called, but in order to be overridden.

ExtractNameVersion can be replaced by a custom one in order to have a different path model.
LoadBalance is the load balancer logic. It takes the target service name and version as well as the registry and yields a net.Conn. The default one picks an endpoint at random but can be replaced by a smarter one.
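
To give an idea of what a replacement strategy might do, here is a sketch that tries the endpoints in random order and moves on to the next one when a dial fails. The tryEndpoints signature is hypothetical and is not goproxy's: in a real LoadBalance the dial function would be net.Dial and onFailure would call Registry.Failure. Both are passed in as plain functions here so the example runs without a network.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// errNoEndpoint is a hypothetical error returned when every endpoint failed.
var errNoEndpoint = errors.New("no reachable endpoint")

// tryEndpoints visits the endpoints in random order and returns the first one
// for which dial succeeds. onFailure is called for every endpoint that fails.
func tryEndpoints(
	endpoints []string,
	dial func(endpoint string) error,
	onFailure func(endpoint string, err error),
) (string, error) {
	for _, i := range rand.Perm(len(endpoints)) {
		endpoint := endpoints[i]
		if err := dial(endpoint); err != nil {
			onFailure(endpoint, err)
			continue
		}
		return endpoint, nil
	}
	return "", errNoEndpoint
}

func main() {
	// Pretend that the first host is down.
	down := map[string]bool{"localhost:9091": true}
	dial := func(endpoint string) error {
		if down[endpoint] {
			return errors.New("connection refused")
		}
		return nil
	}
	onFailure := func(endpoint string, err error) {
		fmt.Printf("marking %s as failed: %v\n", endpoint, err)
	}
	fmt.Println(tryEndpoints([]string{"localhost:9091", "localhost:9092"}, dial, onFailure))
}
```

With this shape, a request only fails when every registered endpoint is unreachable.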

Wednesday, July 8, 2015

Scope and Shadowing in Go



Shadowing variables in Go

Variable shadowing can be confusing in Go, let’s try to clear it up.

Case of errors

You may not have realized it, but you have probably been shadowing variables already when handling errors.
Consider the following code:

package main

import (
    "io/ioutil"
    "log"
)

func main() {
    f, err := ioutil.TempFile("", "")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    if _, err := f.Write([]byte("hello world\n")); err != nil {
        log.Fatal(err)
    }
}

Notice that we first create two variables, f and err, from the TempFile function.
We then call Write, discarding the number of bytes written. We make the function call within the if statement.
Let’s compile: it works fine.

Now, the same code with the Write call outside the if:

package main

import (
    "io/ioutil"
    "log"
)

func main() {
    f, err := ioutil.TempFile("", "")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    _, err := f.Write([]byte("hello world\n"))
    if err != nil {
        log.Fatal(err)
    }
}

Now, compilation fails: main.go:15: no new variables on left side of :=

So what happened?

Note that we call Write with :=, which means that we create a new variable err. In the second example, it is pretty obvious: err already exists in that scope and _ is not a new variable, so := has nothing new to declare. But then why did it work the first time?
Because in Go, variables are local to their scope. In the first example, the if statement opens its own scope, and we actually shadowed err within it.

Simple Demo

package main

func main() {
    var err error
    _ = err
    var err error
    _ = err
}

This obviously fails. However, if we put the second err in its own scope, it works!

package main

func main() {
    var err error
    _ = err
    {
        var err error
        _ = err
    }
}
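
To see that the two err variables really are distinct, here is a small sketch that gives them different values. The outerAndInner function is made up for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
)

// outerAndInner returns the messages held by the outer err and by the err
// declared in the nested block.
func outerAndInner() (outer, inner string) {
	err := errors.New("outer")
	{
		// New variable: it only lives until the closing brace.
		err := errors.New("inner")
		inner = err.Error()
	}
	// The outer err was never touched.
	return err.Error(), inner
}

func main() {
	fmt.Println(outerAndInner()) // outer inner
}
```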

Package Shadowing

Consider the following code:

package main

import "fmt"

func Debugf(fmt string, args ...interface{}) {
    fmt.Printf(fmt, args...)
}

At first, it looks decent. We call Printf from the fmt package and pass the fmt variable to it.

WRONG

The fmt string from the function declaration actually shadows the package and is now “just” a variable. The compiler will complain that a string has no Printf field or method.
We need to use a different variable name to keep access to the fmt package.
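
A possible fix is simply to rename the parameter, for instance to format. In the sketch below, debugLine and the "[debug] " prefix are additions made for the example, so that the result can be inspected without capturing the output.

```go
package main

import (
	"fmt"
	"os"
)

// debugLine builds the message; "format" no longer hides the fmt package.
func debugLine(format string, args ...interface{}) string {
	return fmt.Sprintf("[debug] "+format, args...)
}

// Debugf prints the message on stderr.
func Debugf(format string, args ...interface{}) {
	fmt.Fprintln(os.Stderr, debugLine(format, args...))
}

func main() {
	Debugf("listening on %s", ":9090")
}
```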

Global scope

Something to take into consideration is that a function is already a “sub scope”, it is a scope within the global scope. This means that any variable you declare within a function can shadow something from the global scope.

Just as we saw before that a variable can shadow a package, the concept is the same for global variables and functions.
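
Here is a minimal sketch of that idea, with made-up names: inside shadowed, the local level and name hide the package-level variable and function, which stay untouched outside of it.

```go
package main

import "fmt"

var level = "global"

func name() string { return "function" }

// shadowed declares locals named like the package-level identifiers.
func shadowed() (string, string) {
	level := "local"   // hides the package-level variable
	name := "variable" // hides the package-level function
	return level, name
}

func main() {
	fmt.Println(shadowed())    // local variable
	fmt.Println(level, name()) // global function
}
```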

Type enforcement

Just like we can shadow a package with a variable or a function, we can also shadow a variable with a new variable of any type. Shadowed variables do not need to be of the same type. This example compiles just fine:

package main

func main() {
    var a string
    _ = a
    {
        var a int
        _ = a
    }
}

Closures

The scope is very important when using embedded functions. Any variable used in such a function without being declared there is a reference to the variable of the enclosing scope.
A well-known example using goroutines:

package main

import (
    "fmt"
    "time"
)

func main() {
    for _, elem := range []byte{'a', 'b', 'c'} {
        go func() {
            fmt.Printf("%c\n", elem)
        }()
    }
    time.Sleep(1e9) // Sleeping to give time to the goroutines to be executed.
}

The result is:

c
c
c

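Which is not really what we wanted.
This is because range reuses the same elem variable for every iteration (in Go versions before 1.22), and the goroutines reference that variable instead of copying it. On short lists, the loop is over before the goroutines run, so they all display the last element.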

To avoid this, there are two solutions:

  • Passing variable to the function
package main

import (
    "fmt"
    "time"
)

func main() {
    for _, elem := range []byte{'a', 'b', 'c'} {
        go func(char byte) {
            fmt.Printf("%c\n", char)
        }(elem)
    }
    time.Sleep(1e9)
}
  • Create a copy of the variable in the local scope
package main

import (
    "fmt"
    "time"
)

func main() {
    for _, elem := range []byte{'a', 'b', 'c'} {
        char := elem
        go func() {
            fmt.Printf("%c\n", char)
        }()
    }
    time.Sleep(1e9)
}

In both cases we get our expected result (the order may vary, since goroutine scheduling is not deterministic):

a
b
c

When we pass the variable to the function, we actually send a copy of the variable to the function, which receives it as char. Because every goroutine gets its own copy, there is no problem.
When we make a copy of the variable, we create a new variable and assign the value of elem to it.
We do this at each iteration, which means that for each step, we create a new variable which the goroutine gets a reference to. Each goroutine has a reference to a different variable and it works fine as well.

Now that we know we can shadow variables, why bother changing the name? We can simply use the same name, knowing that it will shadow the one from the upper scope:

package main

import (
    "fmt"
    "time"
)

func main() {
    for _, elem := range []byte{'a', 'b', 'c'} {
        go func(elem byte) {
            fmt.Printf("%c\n", elem)
        }(elem)
    }
    time.Sleep(1e9)
}

package main

import (
    "fmt"
    "time"
)

func main() {
    for _, elem := range []byte{'a', 'b', 'c'} {
        elem := elem
        go func() {
            fmt.Printf("%c\n", elem)
        }()
    }
    time.Sleep(1e9)
}

When we pass the variable to the function, the same thing happens: we pass a copy of the variable to the function, which receives it under the name elem with the correct value. From this scope, because the variable is shadowed, we can’t impact the elem from the upper scope, and any change made will be applied only within this scope.
When we make a copy of the variable, same as before: we create a new variable and assign the value of elem to it. In this case, that new variable happens to have the same name as the other one, but the idea stays the same: new variable + assign value. As we create a new variable with the same name within the scope, we effectively shadow that variable while keeping its value.
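
The examples above rely on time.Sleep to give the goroutines time to run. As a side note, a sync.WaitGroup is a more dependable way to wait for them. The sketch below combines it with the elem := elem shadowing; the collect function and the sorting step are additions for the example, so that the result does not depend on scheduling order.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect starts one goroutine per element and gathers what each one saw.
func collect(elems []byte) string {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		out []byte
	)
	for _, elem := range elems {
		elem := elem // per-iteration copy, shadowing the loop variable
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			out = append(out, elem)
			mu.Unlock()
		}()
	}
	wg.Wait() // block until every goroutine has finished
	// Goroutines finish in no particular order, so sort for a stable result.
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return string(out)
}

func main() {
	fmt.Println(collect([]byte{'a', 'b', 'c'})) // abc
}
```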

Case of :=

When using := with functions returning multiple values (or with a type assertion, channel receive or map access), we can end up with 3 variables out of 2 statements:

package main

func main() {
    var iface interface{}

    str, ok := iface.(string)
    if ok {
        println(str)
    }
    buf, ok := iface.([]byte)
    if ok {
        println(string(buf))
    }
}

In this situation, ok does not get shadowed; it simply gets reassigned, which is why ok can’t change type.
Doing so in a scope, however, would shadow the variable and allow for a different type:

package main

func main() {
    var m = map[string]interface{}{}

    elem, ok := m["test"]
    if ok {
        str, ok := elem.(string)
        if ok {
            println(str)
        }
    }
}
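
To check that the inner ok is really a separate variable, here is a sketch that returns both of them. The lookupString function and the sample map are made up for the example.

```go
package main

import "fmt"

// lookupString reports the outer ok (map access) and the inner ok
// (type assertion) separately.
func lookupString(m map[string]interface{}, key string) (found, isString bool) {
	elem, ok := m[key]
	if ok {
		// New scope: this := declares a fresh ok and leaves the outer one alone.
		_, ok := elem.(string)
		isString = ok
	}
	return ok, isString
}

func main() {
	m := map[string]interface{}{"test": 42}
	fmt.Println(lookupString(m, "test")) // true false
}
```

The key exists, so the outer ok is true, but the value is an int, so the inner ok is false. The outer one keeps its value after the if block.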

Conclusion

Shadowing can be very useful, but you need to keep it in mind to avoid unexpected behavior.
It is of course a case-by-case decision: it often helps readability and safety, but it can also reduce them.

In the goroutine example, because it is trivial, it is more readable to shadow; in a more complex situation, it might be best to use different names to be sure of what you are modifying.
On the other hand, especially for errors, it is a very powerful tool.
Going back to my first example:

package main

import (
    "io/ioutil"
    "log"
)

func main() {
    f, err := ioutil.TempFile("", "")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    if _, err := f.Write([]byte("hello world\n")); err != nil {
        err = nil
    }
    // err is still the one from TempFile
}

In this situation, shadowing err within the if guarantees that the previous error is not affected. If the same code used = instead of := in the if, it would not shadow the variable but overwrite the value of the outer err.
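
The difference between the two forms can be checked with a small sketch. The fail function stands in for a failing Write and, like the two wrapper functions, is made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

func fail() error { return errors.New("write failed") }

// withShadow uses := in the if, so the outer err is untouched.
func withShadow() error {
	var err error
	if err := fail(); err != nil {
		_ = err // only the inner err holds the failure
	}
	return err
}

// withAssign uses = in the if, so the outer err is overwritten.
func withAssign() error {
	var err error
	if err = fail(); err != nil {
		_ = err
	}
	return err
}

func main() {
	fmt.Println(withShadow()) // <nil>
	fmt.Println(withAssign()) // write failed
}
```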

Saturday, June 20, 2015

Privileged Listen in Go

Introduction

Go does not play well with forks and user permissions. The reason is that its runtime is multi-threaded and there is no mechanism for a clean fork or a clean setuid.

To work around the fork issue, Go exposes syscall.ForkExec(), which performs the fork (under a lock) but always performs an Exec() in the forked process, so the forked copy is immediately replaced (overridden) by the new program.

The Issue.

https://github.com/golang/go/issues/1435

$> GOMAXPROCS=4 ./test 65534 65534

and note output:

goroutine 1: uid=0 euid=0 gid=0 egid=0
goroutine 2: uid=0 euid=0 gid=0 egid=0
goroutine 3: uid=0 euid=0 gid=0 egid=0
goroutine 4: uid=0 euid=0 gid=0 egid=0
goroutine 5: uid=0 euid=0 gid=0 egid=0
goroutine 6: uid=0 euid=0 gid=0 egid=0
goroutine 7: uid=0 euid=0 gid=0 egid=0
goroutine 8: uid=0 euid=0 gid=0 egid=0
goroutine 9: uid=0 euid=0 gid=0 egid=0
goroutine 0: uid=65534 euid=65534 gid=65534 egid=65534
goroutine 1: uid=0 euid=0 gid=0 egid=0
goroutine 2: uid=0 euid=0 gid=0 egid=0
goroutine 3: uid=0 euid=0 gid=0 egid=0
goroutine 4: uid=0 euid=0 gid=0 egid=0
goroutine 5: uid=0 euid=0 gid=0 egid=0
goroutine 6: uid=0 euid=0 gid=0 egid=0
goroutine 7: uid=0 euid=0 gid=0 egid=0
goroutine 8: uid=0 euid=0 gid=0 egid=0
goroutine 9: uid=0 euid=0 gid=0 egid=0
goroutine 0: uid=65534 euid=65534 gid=65534 egid=65534

It would be annoying if our http Handler was run as root!

The solution

To solve this issue, I wrote a small util: https://github.com/creack/golisten.

The idea is simple: perform the listen as root, and then override the whole process by re-executing yourself as non-root.

package main

import (
    "fmt"
    "log"
    "net/http"
    "os/user"

    "github.com/creack/golisten"
)

func handler(w http.ResponseWriter, req *http.Request) {
    u, err := user.Current()
    if err != nil {
        log.Printf("Error getting user: %s", err)
        return
    }
    fmt.Fprintf(w, "%s\n", u.Uid)
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(golisten.ListenAndServe("guillaume", ":80", nil))
}

Note

golisten is intended to be used with the http lib, but the concept can be used for any privilege de-escalation.
It is safe because we override the whole process, so all our active threads are from the child process.
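
For the curious, the "listen as root, then re-exec as non-root" idea can be approximated with the standard library alone. The sketch below is not golisten's code, and it differs in one respect: the root parent stays alive waiting for its child instead of disappearing. The LISTEN_CHILD variable, the childCommand helper and the serve argument are made up for the example, and the Credential field is only available on Unix-like systems.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"os/exec"
	"syscall"
)

// childEnvKey is a hypothetical marker telling the re-executed process that
// it inherited the listener.
const childEnvKey = "LISTEN_CHILD"

// childCommand prepares a re-exec of the binary `self` running as uid/gid,
// handing over lnFile as file descriptor 3.
func childCommand(self string, args []string, lnFile *os.File, uid, gid uint32) *exec.Cmd {
	cmd := exec.Command(self, args...)
	cmd.Env = append(os.Environ(), childEnvKey+"=1")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	cmd.ExtraFiles = []*os.File{lnFile} // becomes fd 3 in the child
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{Uid: uid, Gid: gid},
	}
	return cmd
}

func main() {
	// Explicit opt-in so that running the sketch by accident does nothing.
	if len(os.Args) < 2 || os.Args[1] != "serve" {
		fmt.Println("usage: sudo ./demo serve")
		return
	}
	if os.Getenv(childEnvKey) == "1" {
		// Unprivileged child: rebuild the listener from the inherited fd.
		ln, err := net.FileListener(os.NewFile(3, "listener"))
		if err != nil {
			log.Fatal(err)
		}
		log.Fatal(http.Serve(ln, nil))
	}
	// Privileged parent: bind the port, then hand it over to the child.
	ln, err := net.Listen("tcp", ":80")
	if err != nil {
		log.Fatal(err)
	}
	f, err := ln.(*net.TCPListener).File()
	if err != nil {
		log.Fatal(err)
	}
	self, err := os.Executable()
	if err != nil {
		log.Fatal(err)
	}
	// 65534 is the uid/gid used in the test run of the issue above.
	if err := childCommand(self, os.Args[1:], f, 65534, 65534).Run(); err != nil {
		log.Fatal(err)
	}
}
```

Because the child is a brand new process started with the target credentials, none of its threads ever ran as root, which is the property the re-exec is after.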

Caveat

As we re-exec ourselves, we need to be careful with what is done prior to the call to golisten.
It may be a bit more cumbersome, but it is best to use golisten.Listen at the beginning and pass the listener around to http.Serve() later on.

package main

import (
        "fmt"
        "log"
        "net/http"
        "os/user"

        "github.com/creack/golisten"
)

func handler(w http.ResponseWriter, req *http.Request) {
        u, err := user.Current()
        if err != nil {
                log.Printf("Error getting user: %s", err)
                return
        }
        fmt.Fprintf(w, "%s\n", u.Uid)
}

func main() {
        ln, err := golisten.Listen("guillaume", "tcp", ":80")
        if err != nil {
                log.Fatal(err)
        }
        http.HandleFunc("/", handler)
        println("ready")
        log.Fatal(http.Serve(ln, nil))
}

Wednesday, June 10, 2015

Selenium/PhantomJS troubleshooting in Python



Python Selenium PhantomJS issue

The problem

After a couple of minutes of running, the machine crashes. Can’t ssh, can’t do anything.
On another machine, it works fine though…

So what is going on?

Troubleshoot

After digging a bit, I figured out that the working machine was running the pip package selenium==2.45.0 and the faulty one selenium==2.46.0.

The python, node, phantomjs and the rest of the pip freeze are the same.

Isolating the issue

First step: trim down what is running. Doing this gave me more time on the machine before it crashed, and I noticed that there were a lot of phantomjs processes running, which doesn’t seem right.

Knowing that, I tried a simple script using a single instance of selenium to see what happens.

The script (from https://realpython.com/blog/python/headless-selenium-testing-with-python-and-phantomjs/):

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get("https://duckduckgo.com/")
driver.find_element_by_id('search_form_input_homepage').send_keys("realpython")
driver.find_element_by_id("search_button_homepage").click()
print(driver.current_url)
driver.quit()

Nothing too complex, we instantiate the webdriver PhantomJS (which will cause the phantomjs process to start), get the duckduckgo page and display the URL.

Doing this without driver.quit() indeed starts the phantomjs process, but the interesting part is that it does so as a child of node.
When calling driver.quit(), the node process does exit properly, but the phantomjs one does not.
Doing the same test on selenium==2.45.0 results in driver.quit() correctly killing node and phantomjs.

Solution

Revert to 2.45 :):
pip install selenium==2.45.0.

More

At first, I thought selenium tried to isolate the phantomjs process in its own process session and/or group, but after looking more closely, it appears that this is not the case.

On 2.46.0, running the test script before the quit:

$> ps  xao comm,pid,ppid,pgid,sid | grep phantom
phantomjs        84808  84806  84803  57599
$> ps  xao comm,pid,ppid,pgid,sid | grep node
node             84806  84803  84803  57599

So node is indeed the direct parent of phantomjs, and both are in the same session and process group.

After running driver.quit():

$> ps  xao comm,pid,ppid,pgid,sid | grep phantom
phantomjs        84808  1  84803  57599
$> ps  xao comm,pid,ppid,pgid,sid | grep node

Sending a SIGTERM to phantomjs does exit the process, so it means that selenium does not kill node with SIGTERM or any “hard” signal.

It would be interesting to dig into how selenium tells node to quit.