Ideas

· ngp's blog

Unstructured, one-off ideas

Ideas #

Things that are stewing...

Format #

New ideas are either appended to the Idea Log as a simple bulleted list or written up in long form under an arbitrary heading.

Log #

Quick ideas #

NONE FOR NOW

Decentralized DNS with Validation #

We have a number of fully automated ways of managing DNS records and of validating and encrypting content. The internet at large is built on top of HTTP(S), which relies on a complex combination of chains of trust to issue X.509 certificates that are validated against DNS names.

ACME is a standard protocol (RFC 8555) for issuing TLS certificates once the requester passes a series of challenges. It relies on DNS, which itself relies on a number of trusted authorities and a hierarchy of registrars to manage the global hierarchical database of records. It's quite impressive that it works as well as it does.
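To make the DNS dependency concrete, here is a minimal sketch of how the TXT record value for ACME's dns-01 challenge is derived, following RFC 8555 section 8.4. The token and account key thumbprint below are made-up placeholders; in a real exchange the token comes from the CA and the thumbprint is computed from the account key.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, as ACME uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Value to publish in the TXT record at _acme-challenge.<domain>."""
    # Key authorization = token || "." || base64url(JWK thumbprint of account key)
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


# Placeholder inputs, for illustration only.
value = dns01_txt_value("example-token", "example-thumbprint")
print(f"_acme-challenge.example.com. TXT {value}")
```

Whoever can write that TXT record can obtain a certificate for the name, which is why control of DNS ends up being the root of the whole arrangement.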

However, the DNS root zone is managed by ICANN, an American nonprofit organization whose role is to "ensure the Internet's stable and secure operation."

Meanwhile, TLS chains of trust are managed in a more decentralized manner in practice, as root certificates are typically managed by operating system vendors and open source distribution maintainers.

There is no reason these must be separate protocols, other than that they evolved independently and at different times.

Regardless, the DNS is ultimately a critical source of trust in the internet.

TLSGlue #

A CLI application, similar to socat or nc, that functions as "glue" between a client and a server, using self-signed TLS certificates on both sides to provide authentication. Automatic rotation and certificate management make it "set and forget" infrastructure for limited-scope sockets.
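One way the authentication could work with self-signed certificates is fingerprint pinning: each side keeps the SHA-256 fingerprint of the other side's certificate and rejects anything else. This is a sketch of that idea, not a settled design; the function names are hypothetical, and in a real connection the DER bytes would typically come from `ssl.SSLSocket.getpeercert(binary_form=True)`.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def peer_is_trusted(der_cert: bytes, pinned: Iterable[str]) -> bool:
    """True if the peer's certificate matches one of the pinned fingerprints."""
    fp = fingerprint(der_cert)
    # compare_digest avoids leaking where the strings differ via timing.
    return any(hmac.compare_digest(fp, pin) for pin in pinned)


# Stand-in bytes; a real certificate would be DER-encoded X.509.
old_cert = b"stand-in for the current certificate"
new_cert = b"stand-in for the rotated certificate"

# During rotation both fingerprints are pinned, so either cert is accepted.
pins = {fingerprint(old_cert), fingerprint(new_cert)}
print(peer_is_trusted(new_cert, pins))
```

Pinning a set rather than a single fingerprint is what would let rotation happen without downtime: the new certificate is pinned before it goes live, and the old one is dropped afterwards.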

Types #

I recently had a (mostly one-sided) conversation on Twitter about some differences and trade-offs between statically typed and dynamically typed programming languages. The pendulum of the favored approach swings back and forth, probably tied in some abstract way to widespread economic factors, but I'd say we're pretty firmly on the "statically typed" side of the swing for the time being. With languages such as Rust, OCaml, Scala, Gleam, TypeScript, typed Python, etc., it's clear that there is quite the demand for static type systems.

Hierarchical Time Series Log Compression #

Use a hierarchy of identities with text templates to create "wide traces" or spans and provide log aggregation. Data gets stored as semi-structured, efficient binary data that can be queried, analyzed, and optionally used to generate queryable text logs.
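As a rough sketch of the idea, a log event can be stored as a numeric template ID plus its arguments, with the text rendered only on demand. The wire layout, the template table, and all names here are assumptions made for illustration, not a proposed format.

```python
import struct

# Hypothetical template table: numeric ID -> text template.
TEMPLATES = {1: "user {0} logged in from {1}"}

HEADER = "<QIH"  # timestamp in ms (u64), template ID (u32), argument count (u16)


def encode(ts_ms, template_id, args):
    """Pack an event as a fixed header followed by length-prefixed UTF-8 args."""
    out = struct.pack(HEADER, ts_ms, template_id, len(args))
    for arg in args:
        raw = str(arg).encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw
    return out


def decode(buf):
    """Inverse of encode: returns (timestamp, template ID, list of args)."""
    ts_ms, template_id, count = struct.unpack_from(HEADER, buf, 0)
    offset = struct.calcsize(HEADER)
    args = []
    for _ in range(count):
        (length,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        args.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return ts_ms, template_id, args


def render(buf, templates):
    """Regenerate the text log line from the binary record."""
    ts_ms, template_id, args = decode(buf)
    return f"{ts_ms} " + templates[template_id].format(*args)


record = encode(1700000000000, 1, ["alice", "10.0.0.1"])
print(len(record), render(record, TEMPLATES))
```

The template text is stored once rather than per event, so the per-event cost is the header plus the variable parts only.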

QuestDB is probably a good choice for the storage backend, as the data will be mostly time series with a small smattering of full-text search (FTS). However, my goal for such a project is to let users of the system choose their own backing store. The core concepts are intended to be extremely easy to reason about, extend, and reimplement.

Labeled namespaces are not a new concept. The Domain Name System (DNS) is a quintessential example of a distributed hierarchical system in use today. While DNS is quite complicated, the core ideas are mostly simple.

So why not use it?

Utilize SRV records that point to one or more hosts responsible for providing the directory.
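A client that has resolved those SRV records still has to decide which host to try first. Below is a simplified sketch of the RFC 2782 ordering (lowest priority first, weighted random choice within a priority). The record values and the `_directory._tcp.example.com` service name are hypothetical, and the DNS lookup itself is left out since it would need a resolver library.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records: List[SrvRecord], rng: Optional[random.Random] = None) -> List[SrvRecord]:
    """Return records in the order a client should try them."""
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                pick = rng.choice(group)  # all weights zero: uniform choice
            else:
                pick = rng.choices(group, weights=weights, k=1)[0]
            group.remove(pick)
            ordered.append(pick)
    return ordered


# Hypothetical answer for _directory._tcp.example.com
answer = [
    SrvRecord(20, 0, 8443, "backup.example.com"),
    SrvRecord(10, 60, 8443, "dir1.example.com"),
    SrvRecord(10, 40, 8443, "dir2.example.com"),
]
for record in order_srv(answer):
    print(record.priority, record.target, record.port)
```

The backup host is always tried last, while the two priority-10 hosts share the load roughly 60/40 across many clients.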

The Directory #

The directory is a simple mapping from numeric key namespaces to metadata about each key. This might include a type, migrations, and other declarations about sub-keys. There may be multiple directories, and each directory will (in general) provide metadata for a single root namespace.

It's just a key-value store that holds metadata for the entire system. It must be modeled in an append-only, time-series fashion to support multiple simultaneous versions of the schema. It stores things such as the text representation of a given key within a namespace, the (optional) type associated with a key, and templates for turning groups of related keys into textual log statements. Supporting multiple simultaneous versions is necessary because different versions of the same software may be deployed at the same time.
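A minimal in-memory sketch of such a directory, assuming an integer schema version stands in for the time dimension. The class and field names are hypothetical, and a real implementation would sit on a persistent store.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class KeyMeta:
    name: str                       # text representation of the key
    type: Optional[str] = None      # optional type associated with the key
    template: Optional[str] = None  # optional template for text rendering


class Directory:
    """Append-only metadata store: entries are never updated or deleted."""

    def __init__(self) -> None:
        self._log: List[Tuple[int, int, KeyMeta]] = []

    def append(self, schema_version: int, key: int, meta: KeyMeta) -> None:
        self._log.append((schema_version, key, meta))

    def lookup(self, key: int, schema_version: int) -> Optional[KeyMeta]:
        """Metadata for `key` as seen by software pinned to `schema_version`."""
        best = None
        for version, entry_key, meta in self._log:
            if entry_key == key and version <= schema_version:
                if best is None or version >= best[0]:
                    best = (version, meta)
        return best[1] if best else None


directory = Directory()
directory.append(1, 7, KeyMeta("user_id", "int"))
directory.append(2, 7, KeyMeta("account_id", "int"))

# An old deployment and a new one see different names for the same numeric key.
print(directory.lookup(7, 1).name, directory.lookup(7, 2).name)
```

Because nothing is overwritten, a deployment still running against schema version 1 keeps resolving key 7 the way it always did, even after version 2 renames it.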

How do we associate a given version of software with a version of the schema? TBD. I haven't figured that part out yet, but it should be possible in most languages, possibly by utilizing associated credential trees. I intend to make mTLS the only supported form of authentication/authorization, as it is simple to manage in both large and small deployments, cryptographically secure, and built on battle-tested technology.

In general, applications should only create records, not query them. Querying is allowed but discouraged. Special applications whose role is to consume records must take care to stay in sync with the schema of a given keyset.