Postmortem of outage on 20th December

On 20 December, Cachix experienced a six-hour downtime, the second significant outage since the service started operating on 1 June 2018. Here are the details of what exactly happened and what has been done to prevent similar events from happening.

Timeline (UTC)

02:55:07 - Backend starts to emit errors for all HTTP requests
02:56:00 - Pagerduty tries to notify me of outage via email, phone and mobile app
09:01:00 - I wake up and see the notifications
09:02:02 - Backend is restarted and recovers

Root cause analysis

All ~24k HTTP requests that reached the backend during the outage failed with the following exception: [Read More]

Write access control for binary caches

As Cachix is growing, I have noticed a few issues along the way:

- Signing keys are still the best way to upload content and not delegate trust to Cachix, but users have also found that they can be difficult to manage, particularly if the secret key needs to be rotated. At this point, the best option is to clear out the cache completely, and re-sign everything with a newly generated key. [Read More]

Changes to Garbage Collection

Based on your feedback, I have made the following two changes:

- When downloading <store-hash>.narinfo, the timestamp of last access is now updated; previously this happened only for nar archives. This change allows tools like nix-build-uncached to prevent unneeded downloads while playing nicely with the Cachix garbage collection algorithm!

- Previously, the algorithm ordered paths first by last accessed timestamp and then by creation timestamp. That worked well until every existing entry had a last accessed timestamp, at which point all newly created store paths would get deleted first. [Read More]
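To make the ordering problem concrete, here is a minimal Python sketch of the previous deletion order as I read it from the description above. It is not the actual Cachix implementation: the `StorePath` record, the `previous_deletion_order` function, and the assumption that a path with no last accessed timestamp sorts before every accessed path are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorePath:
    """Hypothetical record for a cached store path (timestamps in seconds)."""
    name: str
    created: int
    last_accessed: Optional[int] = None  # None means "never downloaded"


def previous_deletion_order(paths: List[StorePath]) -> List[StorePath]:
    """Order paths so that the first element is deleted first.

    Sorts by last accessed timestamp, then by creation timestamp.
    Assumption: a missing last accessed timestamp sorts first.
    """
    def key(p: StorePath):
        accessed = p.last_accessed if p.last_accessed is not None else -1
        return (accessed, p.created)

    return sorted(paths, key=key)


paths = [
    StorePath("old-but-used", created=100, last_accessed=900),
    StorePath("older-and-used", created=50, last_accessed=800),
    StorePath("just-pushed", created=1000),  # brand new, never accessed
]
order = [p.name for p in previous_deletion_order(paths)]
```

Under these assumptions the freshly pushed path ends up at the front of the deletion order even though it is the newest entry, which is the behaviour described above.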

Upstream caches: avoiding pushing paths in cache.nixos.org

One of the most requested features, the so-called upstream caches, was released today. It is enabled by default for all caches, and the owner of the binary cache can disable it via Settings. When you push store paths to Cachix, querying cache.nixos.org adds overhead in multiples of 100 ms, but you save storage and possibly minutes by not pushing paths that are already available. Queries to cache.nixos.org are also cached, so that subsequent push operations do not have the overhead. [Read More]
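The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Cachix implements the feature: the `UpstreamFilter` class, the `store_hash` and `upstream_has` helpers, and the in-memory cache are hypothetical. The one thing taken from the real world is that a binary cache serves metadata at `<store-hash>.narinfo`, so a successful lookup means the path is already available upstream.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List

UPSTREAM = "https://cache.nixos.org"


def store_hash(store_path: str) -> str:
    """Extract <hash> from /nix/store/<hash>-<name>."""
    return store_path.rsplit("/", 1)[-1].split("-", 1)[0]


def upstream_has(hash_: str) -> bool:
    """Return True if the upstream cache serves <hash>.narinfo."""
    req = urllib.request.Request(f"{UPSTREAM}/{hash_}.narinfo", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # 404 (or any other HTTP error) is treated as "not available upstream".
        return False


class UpstreamFilter:
    """Skips store paths that upstream already has, remembering each answer."""

    def __init__(self, lookup: Callable[[str], bool] = upstream_has) -> None:
        self._lookup = lookup
        self._cache: Dict[str, bool] = {}

    def paths_to_push(self, store_paths: Iterable[str]) -> List[str]:
        result = []
        for path in store_paths:
            h = store_hash(path)
            if h not in self._cache:  # only query upstream once per hash
                self._cache[h] = self._lookup(h)
            if not self._cache[h]:
                result.append(path)
        return result
```

The lookup is injectable so the filter can be exercised without network access, for example `UpstreamFilter(lookup=lambda h: False).paths_to_push(paths)` returns every path unchanged. Caching the answers is what removes the per-path overhead on subsequent pushes.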

Documentation and More Documentation

Documentation is an important ingredient of a successful software project. Over the last few weeks I’ve worked on improving the status quo on two fronts:

https://nix.dev is an opinionated guide for developers getting things done using the Nix ecosystem. A few highlights:

- Getting started repository template with a tutorial for using declarative and reproducible developer environments
- Setting up GitHub Actions with Nix
- Nix language anti-patterns to avoid and recommended alternatives

[Read More]

Proposal for improving Nix error messages

I’m lucky to be in touch with a lot of people that use Nix day to day. One of the most frequently occurring annoyances, which pops up even more often with those starting with Nix, is confusing error messages. Since the Nix community has previously successfully stepped up and funded the removal of Perl to reduce barriers for source code contributions, I think we ought to do the same for removing barriers when using Nix. [Read More]

CDN and double storage size

Cachix, the Nix binary cache hosting service, has grown quite a bit in recent months in terms of day-to-day usage, and that was mostly noticeable in bandwidth. Over 3000 GB were served in December 2019.

CDN by CloudFlare

Increased usage prompted a few backend machine instance upgrades to handle concurrent uploads and downloads, but it became clear it’s time to abandon the single-machine infrastructure. As of today, all binary caches are served by CloudFlare CDN. [Read More]