
Landlock: new sandbox on the block 🔗 1744041836


The core reason SELinux and other mandatory access control schemes have failed is that they do not integrate well into developer workflows. As such, the only parties which implement them are large organizations with infinite resources to hurl at the problem. Everyone else turns them off because it's far, far too much work, even for distro package maintainers.

seccomp-eBPF changed all of this. Once you could filter syscalls in the kernel, sandboxing by nonroot processes was straightforwardly possible. Individual application authors & package maintainers can ship their own rules without stepping on anyone else's toes, and without worrying about interference from other programs' rules. Its release resulted in a number of similar solutions like firejail and bubblewrap.

It seems there's a new effort in this sphere, called Landlock. The core questions here are: how is this any better, and why should I use it? From a capabilities point of view, it won't be more capable than the eBPF-based solutions. What differentiates it, as far as I can tell, is:

  1. Deny-by-Default
  2. Focus on adding restrictions to Kernel Objects rather than filtering syscalls
The latter will obviously perform better than filtering syscalls with eBPF. They go on to state the obvious in their paper: that this will mitigate DDoS attempts. There is similar puffery claiming this mitigates side-channel timing attacks (it just introduces different ones).

It remains a systematic frustration with security programs that articles written about them by their own authors bury the lede or attempt to baffle with BS. That is unavoidable, unfortunately, when it's a corporate (in this case Microsoft) project.

Unfortunately, there remain a number of plenty scary syscalls that don't touch kernel objects, so seccomp/eBPF can't be abandoned in favor of this. It's increased complexity for marginal gains in all but the most demanding environments, which is gonna be a hard sell for most developers.

Other hurdles to sandboxing

The core remaining hurdle to actually using sandboxing on any system is dynamically linked dependencies. A properly sandboxed and chrooted environment which has access to all such deps is usually so open as to be little better than doing nothing. The only way to cut that Gordian knot is to ape Apple and mount / (or wherever your libdirs are) ro. Distros like SuSE's MicroOS have embraced this with enthusiasm, so I suspect sandboxing may finally become ubiquitous. Whether distros go beyond eBPF and embrace Landlock remains to be seen. seccomp'd distro-packaged apps remain rare outside of flatpak/snap, which are themselves about as beloved by end-users as skin diseases, and tremendously wasteful due to being cross-platform.

Many also rightly feel trepidation that ro system partitions are a foot in the door for "secure boot" (read: you no longer control your hardware). systemd recently implemented support for just that, and the increasing number of ARM-based servers means linux could become like the cellphones faster than we might think. For good, and ill.


Problems with CPAN 🔗 1724338205

🏷️ perl 🏷️ cpan

Those of you who don't lurk the various perl5 groups on social media or P5P may be unaware that there are a number of problems with CPAN, largely with regard to how namespaces are doled out. Essentially, the first distribution to claim a namespace gets to squat there forever, whether you like it or not. If the maintainer does not wish for new patches to ever be added, as is the case with DBIx::Class, longstanding custom prohibits anyone else from adding them.

Can the state of affairs be changed? Is this compatible with the various open source licenses and terms of use of PAUSE? The core of it comes down to this passage in the PAUSE rules:

You may only upload files for which you have a right to distribute. This generally means either: (a) You created them, so own the copyright; or (b) Someone else created them, and shared them under a license that gives you the right to distribute them.
Nearly everything on CPAN has a license with which forking is entirely compatible. Similarly, nearly all of them permit patching. As such a variety of solutions have been proposed.
  • An opt-in 'patched' version of modules available on CPAN, to account for gone/unresponsive maintainers. Implement using cpan distroprefs (a sketch follows below).
  • Make it clear that ownership of a namespace remains in the control of the PAUSE admins, rather than the code authors. This would cut the Gordian knot of things like DBIx::Class.
  • More radical changes, such as "aping our betters" over at NPM and adding a number of their nifty features (security information, private packages, npm fund, etc)
I personally favor the latter over the long term, and doing what we can now. The perl build toolchain is overdue for a revamp, and abusing things like github releases would massively cut down on the storage requirements for PAUSE. While the Github 'Modules' registry would be ideal, it does not support our model, so releases it would have to be.
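
For the first option, here's a rough sketch of what an opt-in distroprefs entry might look like. The author directory and patch path are purely illustrative, and the keys shown (comment/match/patches) are the standard CPAN.pm distroprefs ones; check the CPAN.pm documentation before copying this:

community-dbix-class.yml
---
comment: "Apply the community maintenance patch when installing DBIx-Class"
match:
  distribution: "^SOMEAUTHOR/DBIx-Class-"
patches:
  - "/path/to/DBIx-Class-community.patch"

Dropped into the cpan client's prefs_dir (see `o conf prefs_dir` in the cpan shell), this applies the patch transparently at install time without touching the upstream distribution.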

How to get from here to there?

I suppose the idea would be to implement NPM's featureset and call it PNPM (Perl-flavored NPM). You could have it scrape the CPAN, see which modules have primary repos on github, and if they have (non-testing) releases with a higher version number, prefer that version of the package. That way it would be backwards compatible and give you a path to eventually move entirely off of PAUSE, and into a new model.

That said, it sounds like a lot of work. NPM itself is a business, which is why they have the model of taxing private packages for the benefit of the community at large.

One possible way forward (which would be less work for us) would be to ask if the npm crew wants to expand their business to packaging more than just node code; I imagine most of their infrastructure could be made generic and get us the featureset we want. I'd be shocked if such a thing isn't already on their roadmap, and github's.

They likely wouldn't be onboard if they didn't see a viable route to profit from private perl package distribution. Most established perl businesses have long had their own distribution channels, and wouldn't see a compelling reason to horizontally dis-integrate this part of their business unless it were significantly cheaper.

Leveraging github would likely be key to that, as they have the needed economy of scale, even beyond things like the S3/R2 people are already using in their distribution channels. NPM likely has enough juice to get new package formats added to github packages; I suspect we don't.

On the other hand there might be room in the market to exploit that gap between what github supports as packages and what you can do with good ol' fashioned releases. E.G. make an actually universal package management tool that knows how to talk to each package manager and therefore inject means of distributing (taxed) private packages for mutual benefit with a % kicked back to the relevant language foundation. Might be worth researching and pitching to VCs.

Back to reality

In the meantime, it's fairly obvious the PAUSE admins could fix the main problems with a little time and will. That's probably the best we'll get.


Perl: Dead and loving it 🔗 1724260931

🏷️ perl

Internet people love to spray their feelings about everything under the sun at every passerby. Perl, being a programming language, is no exception. At the end of the day, all the sound and fury signifies nothing. While I've largely laid out my perspective on this subject here, I suspect it's not quite the engagement bait people crave. Here's some red meat.

The reality is that multi-billion dollar businesses have been built on infinitely worse stacks than what modern perl brings to the table. What's your excuse, loser? Quit whining and build.

Sturgeon's Law applies to everything, programs, languages and their authors included. 90% of the time you will be driving your stack like you stole it until the wheels fall off, swatting flies with elephant guns, yak shaving and putting vault doors on crack houses. What matters is that you focus the 10% of time that you are "on" where it counts for your business.

You only have a limited amount of time on this earth, and much less where you are in the zone. It will almost never be a good use of that time learning the umpteenth new programming language beyond the bare minimum to get what you want done. So don't do it if you can avoid it.

There are many other areas in life where we engage in rational ignorance; your trade will be no exception. Learning things before you use them is a waste of time, because you will forget most of it. I've forgotten more math than most people ever learn.

Having written in more than 20 programming languages now, the feeling I have about all of them is the same.

  • They all have footguns, and they're usually the useful part of the language.
  • Some aspect of the build toolchain is so bad you nearly have an aneurysm.
  • I retreat into writing SQL to escape the pain as much as possible. It's the actually superior/universal programming language, sorry to burst your bubble
  • FFI and tools for building microservices are good enough you can use basically any other library in any other language if you need to.
  • HTML/JS are the only user interface you should ever use aside from a TTY, everything else is worse, and you'll need it eventually anyways.
I reject the entire premise that lack of interest in a programming language ought to matter to anyone who isn't fishing for engagement on social media. Most people have no interest whatsoever in the so-last-millennium Newton-Raphson method, and yet it's a core part of this LLM craze. Useful technology has a way of not dying. What is useful about perl will survive, and so will your programming career if you stick along for the ride.

Remember the craftsman's motto: Maintain > Repair > Replace. Your time would be better spent not whining on forums, and instead writing more documentation and unit tests. If you spend your free time on that stuff, I would advise you to do what the kids say, and "Touch Grass". Otherwise how are you gonna tell the kids to get off your damned lawn?

Why all companies eventually decide to switch to the new hotness

You can show management the repeated case studies that:

  • switching programming languages is always such a time-consuming disaster that they lose significant market share (features get forgotten, they fall behind competitors)
  • project failure rates at firms show no statistically significant dependence on the programming languages used
it bounces off them for two reasons:
  1. Corn-Pone opinions. Naturally managers are predisposed to bigger projects & responsibilities, as that's the path onward and upwards.
  2. "Senior" developers who know little about the existing tech stack, and rationally prefer to work on what they are familiar with.
The discerning among you are probably thinking "Aha! What if you have enough TOP MEN that know what's up?" This is in fact the core of the actual problem at the firm: turnover. They wouldn't even consider switching if those folks were still there.

They could vertically integrate a pipeline to train new employees to extend their lease on life, but that's quite unfashionable these days. In general that consists of:

  • University Partnerships
  • Robust paid modding scenes
  • The Tech -> QA -> Dev pipeline
The first is a pure cost center, the second also births competitors, and the third relies on hiring overqualified people as techs/QAs. Not exactly a fun sell to shareholders who want profit now, a C-level that has levered the firm to the point there is no wiggle room, and managers who would rather "milk the plant" and get a promotion than be a hero and get run over.

This should come as no shock. The immediate costs are why most firms eschew vertical integration. However, an ounce of prevention is worth a pound of cure. Some things are too important to leave to chance, and unfortunately this is one of them.

Ultimately, the organization, like all others before it, at some point succumbs to either age or the usual corporate pathologies which result in bouts of extreme turnover. This is the curse of all mature programming languages and organizations. Man and his works are mortal; we all pay the wages of our sins.

Conclusion

This "Ain't your grandpappy's perl", and it can't be. It's only as good as we who use perl are. Strap in, you are playing calvinball. Regardless of which language you choose, whether you like it or not, you are stuck in this game. It's entirely your choice whether it is fun and productive, or it is a grave.


Net::OpenSSH::More 🔗 1723163198

🏷️ perl 🏷️ ssh 🏷️ cpan-module

We have released to the CPAN a package that implements some of the parts of Net::OpenSSH that were left as "an exercise to the reader." This is based on Andy's and my experiences over at cPanel's QA department, among other things. It differs in important ways from what was used in the QA department there (they have also moved on to a less bespoke testing framework nowadays):

  • It is a "true" subclass, and any methods that are in particular specific to Linux can be implemented similarly as a subclass of Net::OpenSSH::More.
  • It *automatically* reconnects upon dropped connections, etc. and in general is just a lot nicer to use if you wanted an "easy mode" SSH accessor lib for executing commands.
  • Execution can be made *significantly* faster via a stable implementation of using expect to manage a persistent SSH connection.
  • ...and more! See the POD for full details.
Many thanks of course go out to current and former colleagues (and cPanel/WebPros specifically). Despite their perl QA SSH libraries never being part of a product cPanel shipped publicly, methods like these made remotely manipulating hosts for black-box testing so much easier that even authors with minimal training in perl could use them to write tests that executed against a remote host. This is of value for organizations that need to run tests in an isolated environment, or for anyone wishing to build their own CI suite or SSH-based orchestrators in perl without having to discover a lot of the hard "lessons learned" that almost anyone who has had to use Net::OpenSSH extensively will eventually learn.

Of course, the most profuse of thanks go out to Salvador, who provided the excellent module this extends.

Eventually we plan to extend this package to do even more (hehe), but for now figured this was good enough to release, as it has what's probably the most useful bits already.


Using libvirt with terraform on ubuntu 🔗 1723038220

🏷️ terraform 🏷️ ubuntu 🏷️ libvirt

In short, do what is suggested here.

For the long version, this is a problem because terraform absolutely insists on total hamfisted control of its resources, including libvirt pools. This means that it must create a new pool, which is necessarily outside the realm covered by libvirt's apparmor rules. As such you have to turn that stuff off in the libvirt config file.
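
Concretely, the usual fix (and, I assume, what the linked advice boils down to) is to disable the security driver for qemu guests; a sketch, with the caveat that it loosens guest confinement on the whole host:

/etc/libvirt/qemu.conf
# tell libvirt not to confine guests with generated apparmor profiles,
# which do not cover terraform-created pools
security_driver = "none"

Then restart the daemon (systemctl restart libvirtd) and re-run the terraform apply.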

Below is the important stuff now that I'm using it to deploy resources.

Other useful things to remember

  • Hold Escape after reboot to get a boot menu (handy for going single-user) when using the virtual console via virt-manager, etc.
  • virsh net-dhcp-leases default - get the 'local' IPs of the VMs so spawned
  • Cloud-init logs live in /var/log/cloud-init*.log
  • The overall result lives in /var/lib/cloud/data/result.json; you can read this automatically with your tooling.
  • The scripts you run (what you generally care about) live in /var/lib/cloud/instance/scripts/runcmd (instance is a symlink to the instance-specific directory)
Longer term I should build a configuration script for the HV to properly set up SELinux contexts, but hey.

How I learned to love postfix for in perl 🔗 1722982036

🏷️ perl
Suppose you do a common thing like mapping a list into a hash, but decide to filter the input first:
shit.pl
my %hash = map {
    "here" => $_
} grep {
    -d $_
} qw{a b c d .};
This claims there is a syntax error on the line where the grep starts. This is a clear violation of the principle of least astonishment, as both the map and the grep work by themselves when not chained. The underlying cause is that perl guesses the { opening the map is an anonymous hash constructor rather than a block. We can fix this by assigning $_ like so:
fixed.pl
my %hash = map {
    my $subj = $_;
    "here" => $subj
} grep {
    my $subj = $_;
    -d $subj
} qw{a b c d .};

Now we get what we expect, which is no syntax error. This offends the inveterate golfer in me, but assigning $_ to a named lexical is in many perlcritic rulesets for a reason, in particular when nested map/grep bodies make $_ ambiguous, which is not the case here.
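
For completeness, the usual golfer's disambiguations also avoid the misparse without introducing a lexical; a quick sketch, using the same toy list as above:

alternatives.pl
# any one of these forces perl to treat the braces as a block (or skips them entirely)
my %h1 = map {; "here" => $_ } grep { -d $_ } qw{a b c d .};   # leading ; forces a block
my %h2 = map { ("here" => $_) } grep { -d $_ } qw{a b c d .};  # parens make the pair a plain list
my %h3 = map +("here" => $_), grep { -d $_ } qw{a b c d .};    # unary + forces the EXPR, LIST form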

But never fear, there is a superior construct to map in all cases...postfix for!

oxyclean.pl
my %hash = "here" => $_ for grep { -d $_ } qw{a b c .};
No syntax errors and it's a one-liner. It's also faster, since it isn't creating a new lexical scope on every iteration.

On General Aviation 🔗 1722897713

🏷️ aviation

I wanted to be a pilot as a young man. While I did learn to fly, I ended up a mathematician, programmer and tester. Even then I am incredibly frustrated by corporate pathologies which prevent progress and meaningful improvement at the firms and industries I interact with. But it's nothing compared to the dead hand which smothers "General Aviation", which is how you learn to fly if you aren't one of the pampered princes of the USAF. This is not to say the USAF isn't dysfunctional (it is), but that GA is how I learned to fly, and frankly how most people would in a properly functioning situation.

I usually don't talk about it because it rarely comes up. Look up in the sky and 99 times out of 100 you'll see nothing unless you live next to an international airport. Sometimes people complain about "crowded" airspace and I want some of what they're smoking. You could easily fit 1000x more active aircraft in the sky safely.

Imagine my surprise when I see Y Combinator is taking their turn in the barrel. I wonder what has them so hopeful? If I were to hazard a guess, it comes from the qualification at the end of their post where they mention "a plethora of other problems that make flying cumbersome". Here are my thoughts on the ones they mentioned.

  • weight and balance worksheets - A sensor package which could detect excessive loading or too aft a CG is possible.
  • complicated route planning - When I last flew 20 years ago, filing a flight plan was a phone call and you needed to understand the jargon to make it happen. I suspect it's no different today. A computerized improvement is entirely possible.
  • talking to ATC - Comms tech in aviation remains "get them on the radio" even in cases where route adjustments could be pushed as data via a sideband. Ideally getting people on the main freq is the exception rather than the rule.
  • lengthy preflight checks - Could largely be automated by redundant sensors. Ultimately like cars it could be an "idiot light", which I'm sure instantaneously raises blood pressure in most pilots.
  • a fractured system of FBOs - Who do I call to file my flight plan? It's (not) surprising this hasn't yet been solved; there's a cottage industry of middlemen here. My kingdom for a TXT record.
  • difficult access to instruction - In town, good luck learning to fly. Half hour of procedural compliance for a 10 minute flight, all on the clock. Not to mention dealing with gridlock traffic there and back. There are nowhere near the number of airports needed for flight to be remotely accessible to the common man.

They go on to state "the list goes on. We are working on all of these too". Good luck, they'll need it. The FAA is legendarily hidebound and triply so when it comes to GA. Everyone before them who tried was gleefully beheaded by the federal crab bucket.

All this stuff is pretty obvious to anyone who flies and understands tech, but this regulatory environment ruthlessly selects against people who understand tech. Why would you want to waste your life working on airframes and powerplants with no meaningful updates in more than a half-century? Or beat your head against the brick wall of the approvals process to introduce engine tech that was old hat in cars 50 years ago?

It's not shocking the FAA and GA in general is this way. Anyone who can improve the situation quickly figures out this is a club they ain't in, and never will be. Everyone dumb/stubborn enough to remain simply confirms the biases the regulators have about folks in "indian country". Once a brain drain starts, it takes concerted effort to stop. The feds do not care at all about that problem and likely never will.

This is for two reasons. First, the CAB (predecessor of FAA) strangled the aviation industry on purpose in service of TWA in particular, and that legacy continues to poison the well. The aviation industry has exactly the kind of "revolving door" criticized in many other regulatory and federal contractor situations. This is why they don't devote a single thought to things like "which FBO should I call". Like with any other regulator the only answer to all questions is "read their minds" (have one of 'em on the payroll).

Second, there is no "General aviation" lobby thanks to this century-long suppression, so nobody in politics cares about fixing this. Like with the sorry state of rocketry pre-Spacex, this will require a truly extreme amount of work, no small amount of luck, and downright chicanery to cut the gordian knot. I love that the founders of this firm are Spacex alums, perhaps they have what it takes.


How to fix Wedged BIND due to master-master replication 🔗 1722479687

🏷️ dns 🏷️ bind

rm /var/named/_default.nzd
In short, you have to nuke the zone database containing the remote zone that says "HEY IM DA MASTA" when you have the local zone going "NUH UH, ME MASTER". This means you'll have to manually rebuild all the remote zones (see the sketch below); tough shit. There is no other solution, as there's no safe way to actually putz with _default.nzd, the binary version of _default.nzf. Seasoned BIND hands would tell you "well, don't do that in the first place". I would say, yes, I agree. Don't use BIND in the first place.
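
Rebuilding means re-adding each zone with rndc addzone, since zones added that way live only in the new-zone database you just deleted (and allow-new-zones must already be on, or the nzd wouldn't have existed in the first place). A sketch, with an illustrative zone name and primary address:

rndc addzone example.test '{ type slave; masters { 192.0.2.1; }; file "example.test.db"; };'

Repeat per zone, scripted from whatever source of truth you have, after restarting named so it creates a fresh zone database.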

Fear and loathing at YAPC 🔗 1720542356


Despite being the worst attended YAPC in recent memory, 2024's show in Vegas had some of the best talks in a long while. In no particular order, the ones I remember after a week are:

  • Damian's talk - Implements most of what you want out of a type system in perl, one of the points in my testing talk
  • Demetrios's talk - Savings from this alone will save me more than the conference cost me
  • Gavin Hayes' WASM talk - has big implications in general, and I will try this in playwright-perl soon
  • Gavin's APPerl talk - I can see a use for this with clients immediately
  • Exodist's Yearly roundup of what's new in test2 - The PSGI app he's built into it implements a lot of my testing talk's wish list
  • Cromedome's Build a better readme - Good practical marketing advice
I would have loved to have seen the velociperl fellow show up, but I can't say I'm shocked, given how attempts to circumvent P5P paralysis in such a manner have ended up for the initiators thus far.

This year we had another Science track in addition to the perl and raku tracks which I submitted my testing talk to. In no particular order, the ones I enjoyed were:

  • Adam Russell's paper - Using LLMs to make building semantic maps no longer pulling teeth? sign me up!
  • Andrew O'Neil's paper - Like with 3D printing, these handheld spectrographs are going to change the world.
The track generated a fair bit of controversy due to a combination of Will and Brett being habitual irritants of Perl's In-Group, miscommunication, and the associated promotional efforts. While I regard their efforts as being in good faith, I doubt the TPRF board sees it that way, given they issued something of a condemnation during the final day's lightning talks. Every year somebody ends up being the hate object; victims need to be sacrificed to Huitzilopochtli to keep the sun rising on the next conference.

That being said, the next conference is very much in doubt. Due mostly to corporate sponsorship of employee attendance largely drying up, the foundation took a bath on this one. I'm sure the waves of mutual excommunications and factionalism in the perl "community" at large haven't helped, but most of those who put on such airs wouldn't deign to have attended in the first place. My only productive thought would be to see what it is the Japanese perl conference is doing, and ape our betters. Lots of attendance, and they're even doing a second one this year. They must be doing something right.

My Talks

I got positive feedback on both of my talks. I suspect the one with the most impact will be the playwright one, as it has immediate practical impact for most in attendance. That said, I had the most productive discussions coming out of the testing talk. In particular the bit at the start where I went over the case for testing in general exposed a lot of new concepts to people. One of the retirees in the audience who raised the point that the future was "Dilbert instead of Deming" was right on the money. Most managers have never even heard of Deming or Juran, much less implemented their ideas.

Nevertheless, I suspect calling out fraud where I see it was too "political" for some. I would point out that the particular example I used (Boeing) is being prosecuted for fraud as of this writing. Even so, everyone expects they'll get a slap on the wrist. While "the ideal amount of fraud in a system is nonzero", as patio11 puts it, the systematic distribution of it and the near complete lack of punishment is (as mentioned in the talk) quite corrosive to public order. It has similar effects in the firm.

My lack of tolerance for short-sighted defrauding of customers and shareholders has gotten me fired on 3 occasions in my life, and I've fired clients over it. I no longer fear any retaliation for this, and as such was able to go into depth on why to choose quality instead. Besides, a reputation for uncompromising honesty has its own benefits. Sometimes people want to be seen as cleaning up their act, after all.

I enjoyed very much working with LaTeX again to write the paper. I think I'll end up writing a book on testing at some point.

I should be able to get a couple of good talks ready for next year, supposing it happens. I might make it to the LPW, and definitely plan on attending the Japanese conference next year.


Why configuration models matter: WebServers 🔗 1715714895


Back when I worked at cPanel, I implemented a feature to have customer virtualhosts automatically redirect to SSL if they had a valid cert and were configured to re-up it via LetsEncrypt (or other providers). However this came with a significant caveat -- it could not work on servers where the operator overrode our default vhost template. There is no sane way to inject rules into an environment where you don't even know if the template is valid. At least not in the amount of time we had to implement the project.

Why did we have this system of "templates" which were then rendered and injected into Apache's configuration file? Because its configuration model is ass-backwards and has no mechanism for overriding configs for specific vhosts. Its fundamental primitive is a "location" or "directory", each of which takes a value that is either a filesystem or URI path component.

Ideally this would instead be a particular vhost name, such as "", "127.0.0.1", "foobar.test", or even multiple of them. But because it isn't, we saw no benefit to using the common means of separating configs for separate things (like vhosts), the "config.d" directory. Instead we parse and generate the main config file anytime a relevant change happens. In short we had to build a configuration manager, which means that manual edits to fix anything will always get stomped. The only way around that is to have user-editable templates that are used by the manager (which we implemented via a $template_file.local override).

Nginx recognized this, and its fundamental primitive, the server block, is organized around vhosts. However they did not go all the way and make it so that you could have multiple server blocks referring to the same vhost, with the last one encountered, say in the config.d/ directory, taking precedence. It is not stated in the documentation, but later server blocks referring to the same host behave the same way they do in apache. As such configuration managers are still needed when dealing with nginx in a shared hosting context.

This is most unfortunate as it does not allow the classic solution to many such problems in programming to be utilized: progressive rendering pipelines. Ideally you would have a configuration model like so:


vhost * {
    # Global config goes here
    ...
}

include "/etc/httpd/conf.d/*.conf"

# Therein we have two files, "00-clients-common.conf"
vhost "foobar.test" "baz.test" {
    # Configuration common to various domains go here, overrides previously seen keys for the vhost(s)
    ...
}

# And also "foobar.test.conf"
vhost "foobar.test" {
    # configuration specific to this vhost goes here, overrides previously seen keys for the vhost
    ....
}

The failure by the web server authors to adopt such a configuration model has made configuration managers necessary. Had they adopted the correct configuration model they would not be, and cPanel's "redirect this vhost to ssl" checkbox would work even with client overrides. This is yet another reason much of the web has relegated the web server to the role of "shut up and be a reverse proxy for my app".

At one point another developer at cPanel decided he hated that we "could not have nice things" in this regard and figured out a way we could have our cake and eat it too via mod_macro. However it never was prioritized and died on the vine. Anyone who works in corporate long enough has a thousand stories like this. Like tears in rain.
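
For the curious, mod_macro is the piece that would have let us keep one copy of the vhost boilerplate and stamp it out per domain. A sketch of the idea, with illustrative macro names and paths:

<Macro ClientVhost $domain $docroot>
  <VirtualHost *:443>
    ServerName $domain
    DocumentRoot $docroot
    # shared SSL/redirect boilerplate lives here, in exactly one place
  </VirtualHost>
</Macro>

Use ClientVhost foobar.test /home/foo/public_html
Use ClientVhost baz.test /home/baz/public_html
UndefMacro ClientVhost

Per-client overrides then become a matter of expanding the macro with different arguments (or skipping it for the oddballs), rather than regenerating the whole config.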

nginx also doesn't have an equivalent to mod_macro. One of the few places apache is in fact better. But not good enough to justify switching from "shut up and reverse proxy".


Why you should use the Rename-In-Place pattern in your code rather than fcntl() locking 🔗 1714508024

🏷️ perl

Today I submitted a minor patch for File::Slurper::Temp. Unfortunately the POD there doesn't tell you why you would want to use this module. Here's why.

It implements the 'rename-in-place' pattern for editing files. This is useful when you have multiple processes reading from a file which may be written to at any time. That roughly aligns with "any non-trivial perl application". I'm sure this module is not the only one on CPAN that implements this, but it does work out of the box with File::Slurper, which is my current favorite file reader/writer.

Why not just lock a file?

If you do not lock a file under these conditions, eventually a reader will consume a partially written file. For serialized data, this is the same as corruption.

Traditional POSIX file locking, i.e. fcntl() with a read/write lock, comes with a number of drawbacks:

  • It does not work on NFS - at all
  • Readers will have to handle EINTR correctly (e.g. Retry)
  • In the event the lock/write code is killed midstream you need something to bash the file open again
Writing to a temporary file, and then renaming it to the target file solves these problems.

This is because rename() atomically repoints the directory entry at the new file's inode. Existing readers continue happily reading the stale old inode, never encountering corrupt data. This of course means there is a window of time where stale data is used (e.g. the TOCTOU implicit in any action dependent on fread()). Update your cache invalidation logic accordingly, or be OK with "eventual consistency".
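
If you want to see the pattern without the module, here is a minimal sketch using plain File::Temp (File::Slurper::Temp wires the same idea into File::Slurper's interface; the path and function name here are illustrative):

atomic_write.pl
use strict;
use warnings;
use File::Basename qw(dirname basename);
use File::Temp ();

sub write_atomically {
    my ($target, $content) = @_;
    # Create the temp file in the target's directory so rename() stays on one filesystem (no EXDEV)
    my ($fh, $tmp) = File::Temp::tempfile(
        basename($target) . '.XXXXXX',
        DIR    => dirname($target),
        UNLINK => 0,
    );
    print {$fh} $content or die "write $tmp: $!";
    close $fh            or die "close $tmp: $!";
    # The atomic swap: existing readers keep the old inode, new opens see the new one
    rename $tmp, $target or die "rename $tmp -> $target: $!";
    return 1;
}

write_atomically('/tmp/demo.json', '{"hello":"world"}');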

Be aware of one drawback here: the temporary file (by default) is created in the same directory as the target as a means of avoiding EXDEV. That's the error you get from attempting to rename() across devices, where fcopy() is more appropriate. If you are, say, globbing across that directory with no filter, hilarity may ensue. You can change this to some other periodically cleaned directory on the same disk; otherwise, given enough time and killed scripts, it will fill up with orphaned temp files.


KYC: A bad idea for the hosting industry 🔗 1714071973

🏷️ regulation

I try not to ever get political if I can help it here, as that's always the wrong kind of attention for a business to attract. However I'm going to have to today, as the eye of sauron is directly affixed on my industry today. If that's not for you, I encourage you to skip this article.

As of this writing, there is a proposed rule change working its way through the bowels of the Department of Commerce. Hot on the heels of the so-called "TikTok ban" (which would more rightly be called forced divestiture e.g. "nationalization through the back door"), this rule change would require all web hosting, colo and virtual service providers to subject their customers to a KYC process of some sort.

The trouble always lies in that "of some sort". In practice the only way to comply with regulations is to have a Contact Man [1] with juice at the agency who thinks like they think. Why is this? Because regulations are always what the regulator and administrative law judge think they are. Neither ignorance nor full knowledge of the law is an effective defense; only telepathy is.

This means you have to have a fully loaded expense tacked onto your business. Such bureaucrats rarely come cheap, oftentimes commanding six figure salaries and requiring support staff to boot. Compliance can't ever be fully automated, as you will always be a step behind whatever hobgoblin has taken a hold of the bureau today.

Obviously this precludes the viability of the "mom and pop hosting shop", and even most of our Mittelstand. This is atop the reduction in overall demand from people who don't value a website as much as their privacy, or as much as the hassle of the KYC process itself. This will cause widespread economic damage to an industry already reeling from the amortization changes to R&D expenses. And this is not the end of the costs.

KYC means you have to keep yet more sensitive customer information atop things like PII and CC numbers. This means even more stuff you have to engage in complicated schemes to secure, and yet another thing you have to insure and indemnify against breach.

However the risks don't stop with cyber-criminals looking to steal identities. The whole point of KYC is to have a list that the state can subpoena whenever they are feeling their oats. Such information is just more rope they can put around you and your customers' necks when that time comes. Anytime you interact with the state, you lose -- it's just a matter of how much. This increases that "how much" greatly.

Do you think they won't go on a fishing expedition based on this information? Do you really trust a prosecutor not to threaten leaking your book to a competitor as a way of coercing a plea, or the local PD holding it over you for protection money? Don't be a fool. You'll need to keep these records in another jurisdiction to minimize these risks.

On top of this, no actual problem (e.g. cybercrime) will be addressed via these means (indeed these problems will be made manifestly worse). Just like in the banking world, the people who need to engage in shenanigans will remain fully capable of doing so. No perfect rule or correct interpretation thereof exists or can exist. The savvy operators will find the "hole in the sheet" and launder money, run foreign intel ops and much worse on US servers just as much as they do now.

A few small-time operators will get nicked when the agency needs to look good and get more budget. The benefit to society of removing those criminals will be overwhelmed by the negatives imposed on business and the taxpayer at large.

Many other arguments could easily be made against this, such as the dubious legality of administrative "law" in the first place. Similarly, this dragooning of firms into being ersatz cops seems a rather obvious 13th amendment violation to me. However just like with regulators, the law is whatever judges think it is. Your or my opinion and the law as written is of no consequence whatsoever. As such you should expect further consolidation and the grip of the dead hand to squeeze our industry ever tighter from now on.

Notes

[1] GΓΌnter Reimann - "The Vampire Economy", Ch. 4

ARC and the SRS: Stop the email insanity 🔗 1713224239

🏷️ email

There's a problem with most of the mail providers recently requiring SPF+DKIM+DMARC. Lots of MTAs and forwarders (exchange, mailman, etc.) are notorious for rewriting emails for a variety of reasons. This naturally breaks DKIM, as they don't have the needed private key to sign messages which they are forwarding. And since the nature of the email oligopoly means you absolutely have to be under the protection of one of the big mass mailers with juice at MICROS~1 or Google, this necessitated a means to "re-sign" emails.

This is where SRS came in as the first solution. Easy, just strip the DKIM sig and rewrite the sender right? Wrong. Now you are liable for all the spam forwarded by your users. Back to the drawing board!

So, now we have ARC. We're gonna build a wall chain of trust, and we're gonna make google pay for it! But wait, all DKIM signatures are self-signed. Which means that peer-to-peer trust has to be established. Good luck with that as one of the Mittelstand out there. Google can't think that small.

I can't help but think we've solved this problem before. Maybe in like, web browsers. You might think that adopting the CA infrastructure in Email just won't work. You'd be wrong. At the end of the day, I trust LetsEncrypt 100000% more than Google or MICROS~1.

So how do we fix email?

The core problem solved by SPF/DKIM/DMARC/SRS/ARC is simple. Spoofing. The sender and recipient want an absolute guarantee the message is not adulterated, and that both sides are who they say they are. The web solved this long ago with the combination of SSL and DNS. We can do the same, and address the pernicious reality of metadata leaks in the protocol.

Email servers will generally accept anything with a To, From, Subject and Body. So, let's give it to them.


To: $RECIPIENT_SHA_1_SUM@recipient-domain.test
From: $USERNAME_SHA_1_SUM@sender-domain.test
Subject: Decrypt and then queue this mail plz

(encrypted blob containing actual email here)

Yo dawg, I heard you liked email and security so I put an encrypted email inside your email so you can queue while you queue

Unfortunately, for this to work we have to fix email clients to send these mails, which, ha ha, will never happen; S/MIME and PGP being cases in point. From there we would have to have servers understand them, which is not actually difficult. Servers that don't understand these mails will bounce them, like they do to misconfigured mails (such as those with bad SPF/DKIM/DMARC/SRS/ARC anyways). There are also well established means by which email servers discover whether X feature is supported (EHLO, etc.) and gracefully degrade to doing it the old dumbass way. When things are supported it works like this:

  1. We fetch the relevant cert for the sender domain, which is provided to us by the sending server.
  2. We barf if it's self-signed
  3. We decrypt the body, and directly queue it IFF the From: and Envelope From: are both from the relevant domain, and the username's sha1 sum matches that of the original from.
  4. Continue in a similar vein if the recipient matches and exists.
From there you can drop all the rest of it; SPF, DKIM, DMARC, what have you. Not needed. SpamAssassin and Milters continue working as normal. If it weren't for forwarding, and the fact you have to trust your mailserver with your life because all residential IPs are perma-banned, you could even encrypt the sender/receiver domains for marginally more deniability about which vhosts are communicating. That said, some scheme for passing on this info securely to forwards could be devised.

What can't be fixed is the receiving server having to decrypt the payload. The last hop can always adulterate the message, due to email not actually being a peer-to-peer protocol because of spam. This is what PGP & S/MIME are supposed to address, but failed to do by not encrypting headers. Of course this could be resolved by the mailserver reaching out peer-to-peer to the user's actual domain and asking for a shared secret. Your mailserver could then be entirely flushed down the commode in favor of LDAP.

So why hasn't it happened, smart guy?

In short, the situation being what it is is why everyone long ago threw up their hands and said "I may as well just implement a whole new protocol". At some point someone has to do the hard work of pushing a solution like this over the finish line, as people will not stop using email for the same reason we still use obsolete telephones. What is needed is for mailops to reject all servers without MTA-STS and all unencrypted, adulterated emails. Full stop.

Unfortunately the oligopoly will not, because their business model is to enable spammers; just like the USPS, that's the majority of their revenue. If I could legally weld shut my mailbox, I would. But I can't because .gov hasn't figured out a way to communicate which isn't letter or fax. It's the same situation with email for anyone running a business. The only comms worth having there are email or zoom; our days are darkness.

The security conscious & younger generations have all engaged in the digital equivalent of "white flight" and embraced alternative messaging platforms. They will make the same mistakes and have to flee once again when their new shiny also becomes a glorified advertising delivery platform. They all eventually will flee to a new walled garden. Cue "it's a circle of liiiiife" SIIIMMMBAAAA

Is there a better solution?

Yes. It even worked for a number of years; it was called XMPP with pidgin-otr. Email clients even supported it! Unfortunately there wasn't a good bridge for talking with bad ol' email. Now everyone has forgotten XMPP even exists and is gulag'd in proprietary messengers that have troubling links to the spook aristocracy.

The bridge that has to be built would be an LDA that delivers email to an actual secure, federated messaging platform rather than a mailbox, in the event it looks like it oughtta. In short, it's a sieve plugin or even a glorified procmailrc. From there you have to have a client that distinguishes the people you can actually talk to securely from the email yahoos. It should also helpfully append a message to the top of the body of replies instructing people how to stop using email. As to the actual secure messaging platform, I'd use matrix.

There's probably a product somewhere in here to have a slick mail client which knows how to talk email, matrix, rss and activitypub. After all, we still need mailing lists, and activitypub is a better protocol for doing that. Hell, may as well make it your CMS too. More homework for me.


What roles can LLMs actually replace in a software firm? 🔗 1710443386

🏷️ machine learning

I've told a number of people that large language models are essentially Clever Hans as a service. These transformers take a prompt from the user, and then produce a chain of the most likely tokens to satisfy said prompt. Which is to say, they will (unless altered by "safety" filters, like the commercial offerings are) simply tell you what you want to hear. Early versions of these models like tay.ai made this abundantly clear, as it was (and remains) trivial to find prompts to which the most natural response would be Sieg Heil! This has made them the perfect parasocial companion, and a number of firms are making a killing running hog butchering scams with these language models, as predicted by my prior post on the subject.

So should you be worried about your job? Sure, supposing you are a glad-handing empty suit, or a vapid hand-holder. Pretty much all of the talentless power junkies and people who want to play house could be replaced tomorrow with these mindless algorithms and I doubt anyone would notice. I suspect some of the more clever of them are already automating their jobs via these tools so they can moonlight and double-dip.

For those of us who live in O-Ring world, where being right matters and mistakes are rarely tolerated, the LLMs fall flat on their face. 3/4 of Github copilot suggestions are immediately discarded by senior programmers, and more prompting is required.

This is like working with a newbie programmer who's thick as a brick; your time would be better spent fixing it yourself and moving on. It does however shine when used by said newbies; that's the silver lining to this cloud. LLMs are great at telling the young bucks what the more experienced programmers want to see. The primary productivity gain here will be such on-ramps sparing the able from LMGTFY-tier requests by the noobs. This is a genuine improvement on the current state of affairs, as most of the unskilled labor coming into the field has no idea what magic words to even start querying an indexer for. They'll probably spend more time torturing information out of LLMs than they would an experienced dev, but Clever Hans is paid with carrots.


Parallels between the Hosting and Real Estate business 🔗 1709745254

🏷️ hosting

While it seems trivially obvious that the hosting industry is engaged in building "houses for data", be they commercial (full service webmasters), residential (dedi) or multi-family (shared), the parallels run deeper than that. The same structural defects holding back real estate largely constrain hosting as well.

For instance, most hosting shops at the small scale (dedi/shared) provide an environment which is astonishingly shoddy. Few if any improvements to the design and implementation of most hosting arrangements, aside from a few exceptions, have happened in over a decade. This is a similar situation to housing, and the reason is quite simple. There is a massive brain drain of pretty much anyone with intelligence and talent away from small scale construction and into megaprojects. As Dan Luu mentioned about why so few of the good practices at FAANG filter out into the general programming world, "At google, we couldn't think that small".

In the construction industry this resulted in people building houses as though we were still living in the pre air conditioning days, with vented attics and the like until quite recently. If it ain't broke, don't fix it...except it was quite broken in a number of important qualitative ways, especially for homes in the southern US. Putting ducts outside of the conditioned space results in a great deal of wasted energy, and the vented nature necessarily provides ingress for vermin of various kinds. Now that efficiency is increasingly a concern, practices like "monopoly framing" are finally addressing this in new construction. It turns out that we could have simply devoted a bit of thinking to the problem and provided much higher quality buildings for not much more cost, but thinking was in short supply.

Similarly in the hosting industry, most hosting arrangements are designed with a shocking disregard for efficiency or security against infiltration by vermin. Firewalls are rarely on by default, even in cases where all the running services by default have rulesets defined and known by the firewall application. Most programs never reach out to various resources that could be unshared, and many outward facing services don't run chrooted. SELinux is never enabled, and it's common practice to not enable automatic updates. No comprehensive monitoring and alerting is done; alterations to package-manager-controlled files, new user additions and configuration changes all happen sight unseen. Cgroups are not applied to limit the damage of any given rogue (or authorized!) process, and millions of zombies are happily mining crypto thanks to this.

Basically all of these subjects are well-trod ground for anyone with experience in DevOps or security. Too bad all of them have jobs in corporate, and the plebs out here get to enjoy living in the hosting equivalent of an outhouse. Cyclical downturns eventually solve this in most industries as the golden handcuffs dissolve and the able still have to feed themselves. The trouble is that at the same time talent comes available, funding for new projects dries up due to extreme reliance on leverage. It's tough to make your cap rate with 5% interest. As such, ignorance persists far longer than it has to.

I can't say I mind; more money for me. Plenty of the smart construction guys are making hay spreading the good word about superior practices on the video sites, and I suspect a lot of hay can be made by an intrepid hosting entrepreneur as well.


On using Net::Server instances listening on AF_UNIX sockets in shared environments 🔗 1709145489

🏷️ www 🏷️ perl
Net::Server is the backend for most popular PSGI servers on CPAN, such as starman. In shared hosting environments, it's a common pattern to have the www files owned by the relevant user, with the group being www-data, or whatever the HTTPd uses to access things. In the context of a reverse-proxy to a PSGI server, you can be a bit more strict by having only the AF_UNIX socket given the www group. However, this requires the execute bit to be set for the group (so you can't just set a umask), and Net::Server makes no attempt to chmod the socket it creates (but will helpfully fail to chown it when running as a nonroot user if you specify a different GID, as you can't chown or setgid as nonroot users).

This obviously has security implications in a shared environment:
  1. You have to start your PSGI server as root or a sudoer, and then instruct it to drop privs to the relevant user
  2. You then have to fix the socket after the fact by wrapping the invocation to daemonize.
  3. As such, you can't run things as user-mode systemd units; automating this for clients necessarily can't be self-service without some kind of script to "poke a hole in the sheet".
Back at cPanel we called such helpers "adminbins". Yet more "complexity demon" that could (and arguably should) be fixed by patching the upstream. These schleps rarely get fixed in practice, as people don't write articles about it like this. They just fix it and move on; that's the internet way -- route around damage and become a rat's nest of complexity rather than fix it. A patch will have to be submitted to add an option to set the group execute bit on the socket so created, likely here. Consumers of Net::Server would then need to plumb up to this; getting this all coordinated is quite the schlep in itself, which is why we usually don't get such nice things.
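
In the meantime the fix-it-after-the-fact wrapper is not much code. A sketch of that approach, assuming starman and made-up socket path, user and group; adjust to taste and treat the option names as things to verify against your starman version:

fix-socket.pl
use strict;
use warnings;

my $sock = '/home/appuser/run/app.sock';

# Start as root, let the server drop privileges itself
system(
    'starman', '--daemonize',
    '--listen' => $sock,
    '--user'   => 'appuser',
    '--group'  => 'appuser',
    '--pid'    => '/home/appuser/run/app.pid',
) == 0 or die "could not start starman: $?";

# Wait for the listener to create the socket, then open it up to the web server's group
sleep 1 until -S $sock;
my $gid = getgrnam('www-data') // die 'no www-data group';
chown -1, $gid, $sock or die "chown: $!";
chmod 0770, $sock     or die "chmod: $!";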

There is a clever way to have our cake and eat it too regarding not needing an adminbin. Make the user which owns the directory have the www-data group as their primary group, and make sure to set the group perms on their www files to be 0. Then you won't have to setgid or chown anything at all, and can happily run a usermode service all day.

Web components: taken to the bikeshed 🔗 1708990756

🏷️ javascript 🏷️ www
Web Components are a matter of particular amusement, given many are coming back to the realization that the best way to build modular websites is via server-side templating, for a variety of good reasons. This bad idea then becomes worse via shadow DOM reliance and randomizing the CSS classnames & IDs to prevent collisions when there are multiple instances of the same component on a page. This pretty much screws testers who need reliable selectors; Shadow DOM means that XPath is right out, and randomized classnames & IDs mean CSS selectors are likely shot too. Playwright solved this via the nuclear option of ripping the selector out of the browser's internals. Even then they can only be tested in isolation rather than in situ.

Verdict: avoid. Server side template includes are the better tool to reach for.

So you want to use client certificates instead of HTTP simple auth 🔗 1706313410

🏷️ ssl 🏷️ dns

In an earlier essay, I went over the sticky reality that is the CA infrastructure. I'd like to discuss a related subject, which is why nobody uses client certificates to restrict access to and authenticate users of websites, despite them being "supported" by browsers for many years. For those of you unfamiliar with the concept, it goes like this:

  • I issue a certificate for $USER, just like you would if you were a CA and $USER were a vhost.
  • $USER installs this certificate in their browser, (not-so-optionally) inputting a password to unlock it.
  • $USER opens a web page configured with the Issuer's CABundle, which asks them if they'd like to identify themselves with the cert they installed
  • $USER clicks yes and goes on their merry way.

There are major hiccups at basically every single step of this process. Naturally, I've got some ideas as to how one might resolve them, and will speculate as to why nobody's even remotely considered them.

Generating Client Certificates

First, if you want to generate certs like a CA does, you have two choices -- self-signed, or become an "Intermediate" CA with delegated authority from a bigger CA. The big trouble with this is that getting delegation will never happen for anyone without serious juice in the industry, as it can potentially incur liability. This is why it is observed that the only parties which generally use client certificates at all are those which are in fact Intermediate CAs, such as google, facebook and the like. On the other hand, if you go with self-signing, the user that imports the certificate has to import the full chain, which now means the issuer can issue certs for any site anywhere. Yes, the security model for CAs is that laughably, astonishingly bad; this is why CAA records exist to limit the damage.
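
For reference, the self-signed flavor of step one is just the usual openssl dance; a sketch with illustrative file names and subjects:

# one-time: make yourself a toy CA
openssl req -x509 -newkey rsa:4096 -nodes -days 3650 -keyout ca.key -out ca.crt -subj "/CN=Example Client CA"
# per user: key + CSR, then sign it with your CA
openssl req -newkey rsa:2048 -nodes -keyout user.key -out user.csr -subj "/CN=user@example.test"
openssl x509 -req -in user.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -out user.crt
# bundle for import into a browser/OS keystore (prompts for an export password)
openssl pkcs12 -export -in user.crt -inkey user.key -certfile ca.crt -out user.p12

The server side is then configured with ca.crt as its client CA bundle (SSLCACertificateFile plus SSLVerifyClient require, in apache terms), and the import-the-chain headache described above is the user-side half of the same arrangement.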

What is needed here is to dump Mozilla::CA, /etc/ssl/certs and all that towering pile of excrescence in favor of a reverse CAA record. If we placed the fullchain.pem for each CA in a DNS record for a domain, we could say that this PEM is valid to sign things under this domain. For the big boys, they'd get the root zones to publish records with their PEM, and could go on signing anything and everything. However, for the individual owners of domains this finally frees them to become intermediate CAs for their own domains only, and thereby not expose the delegator to potential liability. LetsEncrypt could be entirely dismantled in favor of every server becoming self-service. Thanks to these being DNS lookups, we could also do away with every computer on earth caching a thousand or so CABundles and having to keep them up to date into perpetuity.

With that implemented, each server would look at, say, /etc/server.key, or perhaps a hardware key, and its software could then happily go about issuing certs to its heart's desire. Only the firms with juice at the IETF could move this forward, and they don't care, because this is not a problem they have to solve. That leaves pitching this as a new source of rents for the TLD authorities; I'm sure they'd love to get the big CAs to pay yasak. This could be the in to get domain owners to start paying CAs again -- nominal fee, you can sign for your domain. It's a price worth paying, unlike EV certs.

Installing client certificates

Every single browser implements this differently. Some use the built-in OS key store, but the point is it's inevitably going to involve putzing around in menus. A far better UX would be for the browsers to ask "hey, do you have a certificate? the page is asking for one", much like they prompt for usernames and passwords under http simple auth. This would probably be the simplest of these problems to solve, as google themselves use client certs extensively. It is a matter of academic curiosity why they have failed as of yet to scratch their own itch, but a degree of schlep blindness ought to be expected at a tech firm.

Furthermore, while blank passwords are supported by openssl, some keystores will not accept this. Either the keystores need to accept this, or openssl needs to stop this. I consider the latter to be a non-starter, as there is too much reliance on this behavior everywhere.

But which cert should I use?

Browser support for having multiple certs corresponding to multiple possible logins is lacking. Separate profiles ought to do the trick, but keystores tend to be global. This problem would quickly sort itself out given the prior issues get solved as part of an adoption campaign.


The IPv6 debate is unnecessary 🔗 1705532991

🏷️ ipv6

Since Amazon is about to start charging an arm and a leg for IPv4 addresses, many have begun talking about the imminent migration to ipv6, which won't happen because ISPs still haven't budged an inch as regards actually implementing this. What's more likely is that everyone will raise prices and enjoy monopoly profits rather than upgrading decrepit routing equipment.

What's worst about this situation is that the entire problem is unnecessary in the era of ubiquitous gigabit internet. Suppose you need directions to 612 Wharf Avenue. You simply consult a map, and note down "right on dulles, left on crenshaw..." until you get to the destination. This only has to be done once, and reversed on the return trip. This is essentially how BGP works under the hood.

So the question arises: Why do we have glorified phone numbers passed in every IP packet? Performance and cost. Encoding down into a bag of 4 or 16 bytes is less work than reading 253 bytes (max for a domain name). But let's be real, it's not much less work, especially if we adopt jumbo frames by default. This is fully in the realm of "hurl more hardware & bandwidth at the problem".

The benefits to doing so are huge. You whack an entire layer of abstraction; DNS translation alone adds more latency than the overhead of passing these names in every packet. Much like how the Telcos whacked the POTS network and now emulate it over SIP, you could emulate the v4 system where needed and move on. Self-Hosted DNS (delegation) would still be possible; just like now you ultimately have to have A records for your nameserver(s) with your registrar or ISP. They would adapt the means they already use for IPs to map their own internal network topology. This scheme would have the added benefit of being able to do away with PTR records entirely.

The prospects for this happening anytime soon are quite grim, as I've never even heard anyone discuss how obviously unnecessary the IP -> DNS abstraction layer is. More's the pity; get yourself a /24 while you can.


© 2020-2023 Troglodyne LLC