Bunker Labs

In Defense of Systems that Fail

December 6, 2023

by Chloé VULQUIN

In recent times, as an industry, we've been building systems that are progressively less likely to fail. From the erlang's internal retries, to formally verified languages, to rust's borrow checker. These all have a place. However, it's a different thing to say that those projects and languages that are not in the same category do not have a place.

Making systems that are resistant (making a fail-free system is physically impossible, everything can and will go wrong) to failure is not free. The costs are many and come from various places. This type of software is harder and takes longer to write. The languages intended to make this easier require particular styles and restrictions to make it possible. They require additional resources to run, and are much more complex to tune, as making a perfect system is impossible, and therefore specific tuning is left to configuration time. Such systems demand more of the writer, the user, and the environment.

Furthermore, attempts to make systems fail less can actually make them fail more. The loss of a quorum is a common example in early HA (High Availability) SQL setups. There's nothing wrong anywhere, all that happened was a minor ping latency hiccup. A packet arrived late, or retried a few times, and by the time it made it over, the nodes decided that quorum is lost. What you end up with is a soft failure, where everything is running but needs manual intervention to run as intended. Were it a smaller, simpler, system, this issue would not be noticed in any way. The attempt to make something more failure-resistant has made it more sensitive to other types of environmental issues. This is another cost to these systems.

As with any tradeoff, then, it's important to make a cost-benefit analysis to determine if this particular tradeoff is worthwhile in the circumstances at hand. Let's start with the benefits.

Why would you want a system that doesn't fail, or at least fails less often? Well, for one, it's annoying when your system is down! Perhaps that's enough to justify the above costs if your system fails all the time. Then again, if your system fails all the time, you might have other problems. No, it would take something bigger, a real cost to failure.

These fail-resistant systems are obviously critical in applications like medicine, aerospace, and more, where failure is not an acceptable option. You definitely don't want a program ensuring your safety and ability to operate a rocket to fail due to unexpected user input (being chromium and javascript), or perhaps fail to deploy an automated parachute. The cost of failure in these cases is a human cost. As such, it's perfectly acceptable to sink a lot of effort into avoiding it.

Moving to somewhat less dramatic pastures, sometimes the cost is not human, but monetary. Software businesses and businesses relying on software for mission-critical operations can attribute a real and direct cost to every second of downtime. The Amazons and the Facebooks and the Fords of the world can perform an elementary analysis of dollar per second vs dollar per increased safety feature. It is then no wonder that it is often these large companies that are searching for ever more developers for these languages, those experienced in building failure-resistant systems. A rust developer likely won't even ask for more money than a C++ one, even if they probably should 🦀.

Finally, sometimes, the cost is legal. Sure, no one's going to die if the service dies randomly once every year. Sure, you're not losing tangible money out of it, or even intangible. But gosh darn, you signed that paper that promised a given SLA, so now you have to deliver it. Your hands are tied, and it hardly matters how you increase availability, but you've got to.

You'll notice that conspicuously missing from all of these scenarios is the small scale with low stakes. If you're hosting an RSS reader for your friends, whether it goes down once a year or three times a year doesn't matter – you're probably rebooting the single machine it's on more often than that. Oh, sure, it could go down even less if you set up HA, but why bother? No one will die, you won't lose any money, hell chances are no one will even notice, including you!

The cost-benefit analysis simply doesn't pan out. Most people already don't have HA set up for their personal or homelab services, the cost to the environment and to their maintenance of the software is higher than they're willing to accept for the little benefit gained. At those scales, performance, too, hardly matters. Or, more accurately, throughput is irrelevant, while latency is crucial. Have I mentioned making systems failure-resistant incurs a latency penalty?

So why then are the people running their own services often running “production-ready” “professional” “HA” systems? There are a few answers. Firstly, the cost is not visible to the user. They can just not enable the HA features (well, sometimes, anyway). And since they're not contributing to the software, nor are they the author, they do not see the increased maintenance cost. Additionally, oftentimes they won't be aware of any alternatives, if they even exist at all. The personal cost of writing something new is much higher than the cost of dealing with whatever is already out there, while smaller already-written systems will often be personalized or too small to be easily discovered. This does not, however, mean that it's not worth doing.

In the meanwhile, what pushes people to write software that is failure-resistant? Here too, the answers are many, but are not difficult either. The most obvious one is that failure conditions are bad. Nobody likes them! Humans are notoriously bad at estimating costs, time required and similar. It's a common joke that engineers will simply answer “it will be done once it is done”, and have no input to provide beyond this. An author is disproportionately likely to underestimate the cost that they are making themselves pay in advance, and therefore highly likely to not perform a cost-benefit analysis for their use case.

Sometimes, though, there is an analysis going on, though it is of a different nature. Sometimes, people write software not because they want to use it, but because they hope to get a job, or build a portfolio, or for someone else's needs. The biggest demand for software comes from corporations, and for the reasons mentioned above, corporations like HA failure-resistant systems, and are the entities most likely to hire you. As such, if you're writing the software for any of the above reasons, you're also much more likely to make them failure-resistant.

Because of all of this, most pieces of infrastructural software, where the stakes are low for most, medium for some, but never of a human cost, tend to be failure-resistant, and absolutely a pain to run. Authoritative DNS servers, email servers, collaboration software, and more. All of these suffer from this effect. So consequently, many people will never host their own authoritative DNS, or their own email, and so on. This in turn creates space for corporations to provide these as services for others. Since the name of the game is convenience, those aren't always performant, or even configured in a failure-resistant way, putting us back in square one. The price of this societal level issue ends up being paid by those that do not have the time to maintain software that is as a category needlessly obtuse, either monetarily, or by not having access to such functionality at all.

So in conclusion, and as advice – don't blindly make things failure-resistant. Perform cost-benefit analysis, avoid underestimating the costs to yourself, and if it is so justified, optimize, but not prematurely. It's ok for your software to fail sometimes; it's ok to focus on the happy code path; sometimes it might just make the world better to fail.

One of the reasons bunker labs posts are the way there are is precisely for this reason. In practice, failure-resistance is bolted on after the fact. When it isn't, you had better be getting paid well for it (again, rust devs, ask for higher salaries, I am not kidding). The final advantage to building systems that fail is that you still built a system; learned more about the subject and improved in your craft. That's what I hope to help achieve with what I write, at least on here. You can potentially use what I write in the real world, but maybe write your own instead. :)

Simply Parse in C

July 18, 2023

by Chloe Kudryavtsev

People are terrified of parsers and parsing. To the point of using magical libraries with custom syntaxes to learn just to get started. In the hopes of completely shattering this preconception, I will write a parser for the “ini” file format in about 150 lines of pure and readable ISO C99. Furthermore, this parser will be something that's nice to use and has error correcting features, such that it is actually useful and usable outside this example setting.

Design

There's no standard for the ini format, and all the existing implementations disagree, so let's take some liberties, XKCD 927 style. No newlines in keys, values, or section names. Empty values are not allowed. Comments only on their own lines (minus whitespace). Whitespace-insensitive (whitespace at the start of line, end of line, around the “=”, is all ignored). No need for a terminating newline either. Oh that's more than most C ini parsers do? Isn't that convenient.

To get the jargon out of the way, the parser will be recursive-descent LL(1).

The API will have a single entry point. You call parse_ini against a FILE*, with optional userdata (just any context the API user wants to have available in their callback), and a required callback. The callback will be called for every key-value pair, and it will receive the section, key, and value, alongside the userdata. All of the strings in question will be temporary private data, and must be copied over (for simplicity). If a given terminal does not fit inside the private data, the given value is truncated, but the parsing continues without error.

Implementing

Let's start by writing the top-level function. We macro-define the max length. This is done firstly so that it can be changed (by surrounding it with ifdefs later), and secondly because we're allocating temporary memory. Typical parsers already operate in quadratic time, adding countless allocations and reallocations for a use-case where all values are typically small isn't reasonable.

#define INI_SEC_MAXLEN 64
#define INI_KEY_MAXLEN INI_SEC_MAXLEN
#define INI_VAL_MAXLEN INI_KEY_MAXLEN * 16

Now we can write the top-level parse_ini function. The effective PEG for it is ini <- expr*. We start from the top-level because we have a specific idea for the UX, so we need to write the rest of the functions in a way as to ensure we can deliver on it.

// if the callback returns non-zero, parsing will stop
typedef int (*callback)(const char*, const char*, const char*, void*);
// we return the number of bytes handled, sort of, you'll see
int parse_ini(FILE *src, void *userdata, callback cb) {
    char section[INI_SEC_MAXLEN] = {0};
    char key[INI_KEY_MAXLEN] = {0};
    char value[INI_VAL_MAXLEN] = {0};
    int status, out = 0;
    // we stop going whenever we fail to consume any data, explicitly error,
    // the stream errors, or the stream ends
    while ((status = parse_expr(src, userdata, section, key, value, cb) >= 0) {
        out += status;
        if (feof(src) || ferror(src)) break;
    }
    return ferror(src) ? -out : out;
}

Nothing too complicated here, you just iterate the stream until something makes us stop. We return negative values in the case of an irrecoverable stream error that isn't EOF. We get into the real meat in parse_expr.

Note that all the future functions and definitions are static, but I'm omitting that keyword (as well as the occasional inline for brevity – you don't need to see it in this case).

In parse_expr, we want to parse an expression. In an ini file, you have comments, section declarations, and key-value pairs. We also want to skip whitespace. So let's do exactly that. The effective PEG is expr <- ws* (section / comment / kv).

int parse_expr(FILE *src, void *userdata, char *section, char *key, char *value, callback cb) {
    int len = parse_skipws(src);
    if (len) return len; // to avoid confusing byte counts, we're in a loop anyway

    int c;
    // figure out the next expression
    switch ((c = fgetc(src))) {
        case EOF:
            return 0; // let the outer loop figure out if this was an error or not
        case '[': // a section
            return parse_section(src, section);
        case '#':
        case ';':
            return parse_skipuntil(src, "\n");
        default: // key-value pair
            ungetc(c, src); // we need to conserve this one
            return parse_kv(src, userdata, section, key, value, cb);
    }
}

Note: we don't need to ungetc in sections or comments, since the only real use of the character is to identify the type of expression. Before we get into the implementations of parse_section and parse_kv, let's write the utility functions we'll need, such as parse_skipuntil.

Let's start with parse_skipwhile and parse_skipuntil, which will skip characters while a condition holds, or until a condition occurs. We'll define a condition as the next character being in a secondary argument (a string), a check we can perform using strchr. Neither of these have direct PEG alternatives, since this approach is closer to a lexer technique, which we can utilize precisely because we're writing a hand parser.

int parse_skipuntil(FILE *src, const char *s) {
    int out = 0;
    for (int c; (c = fgetc(src)) != EOF; out++) {
        if (strchr(s, c)) return out;
    }
    return ferror(src) ? -out : out;
}
int parse_skipwhile(FILE *src, const char *s) {
    int out = 0;
    for (int c; (c = fgetc(src)) != EOF; out++) {
        if (!strchr(s, c)) {
            ungetc(c, s);
            return out;
        }
    }
    return ferror(src) ? -out : out;
}

Note that they're almost the same. The primary notable difference is the negation of the strchr call, and that skipwhile does an ungetc (skipuntil consumes the terminator).

We can now define parse_skipws. It's actually quite simple.

// the characters we consider to be whitespace
const char wss[] = " \t\r\n";
#define parse_skipws(src) parse_skipwhile(src, wss)

Before we move on to the juicy stuff, we'll implement one more thing: parse_until. Semantically, it's the same as parse_skipuntil, but it will actually write what it reads into a string pointer, up to its maximum length. Then it just becomes parse_skipuntil. You could do the same thing with parse_skipwhile too.

int parse_until(FILE *src, char *ptr, ssize_t maxlen, const char *s) {
    int out = 0, c;
    while (out < maxlen) {
        c = fgetc(src);
        if (*ptr == EOF) { // hit error while scanning
            *ptr = 0;
            return ferror(src) ? -out : out;
        } else if (strchr(s, c)) {
            *ptr = 0;
            return out;
        }
        (*ptr++) = c; out++;
    }
    // we only make it here if we hit maxlen
    (*--ptr) = 0;
    int skipped = parse_skipwhile(src, s);
    if (skipped > 0) {
        return out + skipped; // errors are negative, eof is ok
    }
    return ferror(src) ? (skipped - out) : (out - skipped);
}

We can now write parse_section and such!

#define parse_section(src, section) parse_until(src, section, INI_SEC_MAXLEN, "]\n")

Yeah, that's all there is to it. Note that we also allow terminating with a newline. This way, if someone forgot a “]”, we can still parse it. Though it does break comments on the same line.

Let's also write parse_key and parse_value, since we'll need them in parse_kv in a second.

int parse_key(FILE *src, char *key) {
    int out = parse_until(src, key, INI_KEY_MAXLEN, "=\n");
    return stripright(key, wss);
}
int parse_value(FILE *src, char *value) {
    int out = parse_until(src, value, INI_VAL_MAXLEN, "\n");
    return stripright(value, wss);
}

First of all, we do the same thing with the key that we did with the section: if a key is not terminated by an “=”, presume the newline is the “=”. This way, in the absolute worst-case scenario, two key-value pairs get corrupted, and the rest parses fine, though it may be a good idea to check the value of “out” too.

Secondly, you'll note the “stripright”. We already discard leading whitespace in parse_expr, but to strip trailing whitespace after the key (so before the “=”) and the value (so before the “\n”), we'll need to perform string brain surgery.

It's not too bad though, look:

// returns the number of output bytes
int stripright(char *c, const char *s) {
    ssize_t len = strlen(s);
    if (!len) return len; // already empty
    while ((--len) >= 0 && strchr(s, c[len])) {}
    // either strchr failed or len is now -1
    if (len < 0) {
        *c = 0;
        return 0;
    }
    c[++len] = 0;
    return len;
}

We can now finish the parser out with the most complex function in it: parse_kv with an incredible 12 lines of code, including a callback. The closest PEG analogue is kv <- key ws* value (where's the “=”? parse_kv takes care of it, and the whitespace before it for us).

int parse_kv(FILE *src, void *userdata, const char *section, char *key, char *value, callback cb) {
    int len = 0, tmp;

    tmp = parse_key(src, key); // consumes the =
    // if the key doesn't have a value or errored, we can't continue
    if (tmp <= 0 || feof(src)) return 0;
    len += tmp;

    tmp = parse_skipws(src); // the whitespace after the "="
    // if the value would have been empty, it ends up using the next line as the value
    // it may not error or eof, but it may be empty
    if (tmp < 0 || feof(src)) return 0;
    len += tmp;

    tmp = parse_value(src, value);
    // any errors are fine, since we've finished parsing now
    len += tmp > 0 ? tmp : -tmp;

    // let callback request terminating the parse by returning non-zero
    if (cb(section, key, value, userdata)) len *= -1;
    return len;
}

And that's it, you have a functioning parser.

Discussion

To explain what's going on here on a less “here's the code, enjoy” level, we need to explain some of the jargon.

A grammar is the definition of the language being parsed. One may intuitively presume that the grammar for a given language or format is merely itself, i.e. that the BNF form is canonical. However, this is not the case. A grammar is actually dependent upon the parser in use. To discuss the platonic ideal of the language being recognized, we use the term “language”.

Any given language can have an arbitrary number of grammars describing it. These grammars are considered functionally equivalent. Since the language rarely if ever specifies expected resolutions to edge-conditions (if it did, it would be a “deterministic” language), grammars can be functionally equivalent while exhibiting different behaviors.

For example, let's consider the Lua language. The Lua language includes the following rules:

prefixexp <- var | functioncall | '(' exp ')'
functioncall <- prefixexp args | prefixexp ':' Name args
var <- Name | prefixexp '[' exp ']' | prefixexp '.' Name

This grammar is left-recursive. This means that a category of parsing algorithms cannot parse it! There are a number of reformulations that you can apply to achieve a different grammar that does not have the left-recursion problem, which will then recognize the Lua language successfully, but may have different edge-case behaviors (that are nowhere in the spec).

In short, there are a few reasons that parsing is a mess, and none of those reasons are actually resolvable by parser generators. For example, you will need to modify your grammar to get it to run on, say, tree-sitter, or antlr4, or Janet's PEGs.

A lot of the problem and solution space actually exists in the grammar definition, language design, and implementation flexibility. In this project, I set out to demonstrate all three.

For instance, you may notice that there is essentially zero error-reporting in the entire library. This is because I built the grammar in such a way that most inputs that are likely to actually occur are valid! Those that aren't will typically occur at the end of the file, and thus not affect the overall utility. Similarly, LL(1) is typically lexed, but I built the grammar in such a way as to be able to do LL(1) scannerless.

To summarize, to be good at parsing, you do not need to be an academic wizard, or to know all parsing theory. You need to be aware of the typical failure modes and make your own job easier by building in the flexibility you will then immediately use. The only exception is when you're implementing a strict specification that does specify what the edge-conditions do. Thankfully, those virtually do not exist, since specification writers often are aware of this. This is why the BNF forms of formal languages are so seemingly useless – it's there for your benefit, like undefined behavior in C, except it's actually beneficial.

Conclusion

Write your parsers, it can be fun and easy if you don't make it hard for yourself. The “true” (not edited for the article) sources of this tiny ini parsing library are available here in single-header form. It also contains an implementation of parse_while, just for you, in case you happen to be building a parser that can take advantage of the same things I did (i.e. that is an LL(1)-compatible grammar that doesn't need a lexing phase; though you'll likely want to pass around a context struct instead of three bare pointers). It can also do heap allocation, so you know.

Hopefully, this has taught you something, and you'll reach for parser generators less. I do find it amusing how I ended up publishing this one before irex (a from scratch “irregular” expression educational library). It may or may not be the next article, presuming I actually finish it.

P.S. This blog will eventually move to the bunkerlabs.net domain... somewhere. There's currently an OIDC identity provider in the works that should be trivial to self-host that will be used to authenticate various services. Long story short, hopefully sometime in the next year, this blog will be ported over to there. The front page will likely mention the new location once that's a thing (there is no front page currently).

P.P.S. The point of having written this in C is to show how small and useful an LL(1) recursive descent parser can be, even in C. It should not be taken to mean that you should use C for all your parsing needs (unless you need the single codebase to be available in every language, in which case you have no choice). These techniques are applicable to virtually every language, and can result in even smaller parsers.

It is also of note that LL(1) is generally sufficient for most things you may want to do (ini was picked since it is not parsable with all the features present here with regular expressions or string splitting (as it is not a regular language), and to ensure that the implementation could be sufficiently small; if I code golfed it, it could be under 100 LOC, but I didn't want to code golf it.). Most programming languages, for instance, are designed in a way as to be parsable with an LL(k) grammar, which are provably transformable into LL(1) grammars.

A Message Queue in Shell

April 17, 2022

I saw people being confused about message queues, so I figured I'd implement one in shell, just for fun. This would also let us explore the common pitfalls in implementing them.