troglodyne.net : blog

/index:

Large language models: my take after actually using them 🔗 1761328441

Tl;Dr: Claude is the best, but would be better if they called it Clod, as that would be truth in advertising.

Any of you who have listened to my talk at the science perl track back in '24 may be aware that I mentioned AGI is not possible with the transformer approach. I believe I also mentioned it in the associated paper, which you can find here. My use of these technologies has simply confirmed what was laid out in David Chapman's 1983 "Planning for conjuctive goals" proof to that effect. If you are interested in his opinion on where these technologies lead, see Better without AI.

Nevertheless, a glorified iterative Newton's Method thingy sounds like it's probably pretty good at interpolating from one set of initial conditions to a well known end state. As such it's proving to be quite useful in fields such as animation, which is essentially interpolation between key frames. This however greatly limits its utility when writing code.

Not only is it quite bad at coming up with novel solutions, It is exceptionally bad at doing the kind of multi-step logical derivation of Nth-level effects necessary to DWIM properly. As such you have to build complex instruction sets for it, such as MCPs, "Skills" and other context hinting which is ultimately all the onboarding stuff you should already have for employees. Except this is the painfully pedantic version bordering on patronizing.

To make things worse, the thing context-poisons itself every time it gets the wrong answer and you have to use the neuralizer on it before continuing. The system is not yet sophisticated enough to recognize when you have given it a negative experimental result and adjust its weights in order to acquire a meaningfully different result. That's an area of research which hasn't been much explored, because it touches on AI's third rail, "Alignment".

In short, to make this thing respond to correction naturally would involve de-lobotomizing it, guaranteeing it quickly re-converges on being MechaHitler. It do be that way Mr. Robot!

The biggest practical stumbling block is ultimately that you still have to review the code coming out of this thing just like as if it were a Junior programmer. It never produces the kind of one-line-fix, minimally invasive surgery you are usually looking for. Just as discovered in repeated studies, the "Mythical man-month" problem has in fact not been fixed by "letting George fly the plane". It's actually made things much worse, as they can't ever get up-to-speed and reduce their error rate in any meaningful way.

In short, it can be useful. However to get there you are going to have to do some significant up-front investment in time rigging up the framework for success. I see it as yet another angle of automation, a sort of metaprogramming. However I suspect we will still get our greatest gains from good old fashioned automation and optimization.

Landlock: new sandbox on the block 🔗 1744041836

The core reason why selinux and other mandatory access control schemes have failed is because they do not integrate well into developer workflows. As such the only parties which implement it are large organizations with infinite resources to hurl at such a problem. Everyone else turns them off because it's far, far too much work; even for distro package maintainers.

seccomp-eBPF changed all of this. Now that you could filter syscalls in the kernel, sandboxing by nonroot processes was straightforwardly possible. Individual application authors & package maintainers can ship their own rules without stepping on anyone else's toes, and easily rule out interfering rules from other programs. When released, this resulted in a number of similar solutions like firejail, bubblewrap and others.

It seems there's a new effort in this sphere, called Landlock. The core question here is how is this any better? Why should I use this? From a capabilities point of view, it won't be more capable than the eBPF based solutions. What differentiates it as far as I can tell is:

Deny-by-Default
Focus on adding restrictions to Kernel Objects rather than filtering syscalls

The latter obviously will perform better than eBPF. They go on to state the obvious in their paper that this will mitigate DDOS attempts. There is similar puffery that this mitigates side-channel timing attacks (it just introduces different ones).

It remains a systematic frustration with security programs that articles written about them by their own authors bury the lede or attempt to baffle with BS. That is unavoidable, unfortunately, due to being a corporate (in this case Microsoft) project.

Unfortunately, there remain a number of syscalls that are still plenty scary that don't touch kernel objects. As such seccomp/eBPF can't be abandoned in favor of this. Increased complexity for marginal gains in all but the most demanding environments. Gonna be a hard sell for most developers.

Other hurdles to sandboxing

The core remaining hurdle to actually using sandboxing on any system is dynamically linked dependencies. A properly sandboxed and chrooted environment which has access to all such deps is usually so open as to be little better than doing nothing. The only way to cut that gordian knot is to ape Apple and mount / (or wherever your libdirs are) ro. Distros like SuSE's MicroOS have embraced this with enthusiasm, so I would suspect sandboxing may finally become ubiquitous. Whether distros go beyond eBPF and embrace Landlock remains to be seen. seccomp'd distro packaged apps remain rare outside of flatpak/snap, which are themselves about as beloved as skin diseases with end-users, and tremendously wasteful due to being cross-platform.

Many also rightly feel trepidation that ro system partitions are a foot in the door for "secure boot" (read: you no longer control your hardware). SystemD recently implementing support for just that, and increasing numbers of ARM based servers means linux could become cellphones faster than we might think. For good, and ill.

Problems with CPAN 🔗 1724338205

🏷️ perl 🏷️ cpan

Those of you who don't lurk the various perl5 groups on social media or P5P may be unaware that there are a number of problems with CPAN, largely with regard to how namespaces are doled out. Essentially the first distribution to claim it gets to squat there forever, whether you like it or not. If the maintainer does not wish for new patches to ever be added, such as is the case with DBIX::Class, longstanding custom prohibits this.

Can the state of affairs be changed? Is this compatible with the various open source licenses and terms of use of PAUSE? The core of it comes down to this passage in the PAUSE rules:

You may only upload files for which you have a right to distribute. This generally means either: (a) You created them, so own the copyright; or (b) Someone else created them, and shared them under a license that gives you the right to distribute them.

Nearly everything on CPAN has a license for which forking is entirely compatible. Similarly, nearly all of them permit patching. As such a variety of solutions have been proposed.

An opt-in 'patched' version of modules available on CPAN, to account for gone/unresponsive maintainers. Implement using cpan distroprefs.
Make it clear that ownership of a namespace remains in control of the PAUSE admins, rather than the code authors. This would cut the gordian knot of things like DBIX::Class.
More radical changes, such as "aping our betters" over at NPM and adding a number of their nifty features (security information, private packages, npm fund, etc)

I personally favor the latter over the long term, and doing what we can now. The perl build toolchain is overdue for a revamp, and abusing things like github releases would massively cut down on the storage requirements for PAUSE. While the Github 'Modules' registry would be ideal, it does not support our model, so releases it would have to be.

How to get from here to there?

I suppose the idea would be to implement NPM's featureset and call it PNPM (Perl-flavored NPM). You could have it scrape the CPAN, see which modules have primary repos on github, and if they have (non-testing) releases with a higher version number, to prefer that version of the package. That way it would be backwards compatible and give you a path to eventually move entirely off of PAUSE, and into a new model.

That said, it sounds like a lot of work. NPM itself is a business, which is why they have the model of taxing private packages for the benefit of the community at large.

One possible way forward (which would be less work for us) would be to ask if the npm crew wants to expand their business to packaging more than just node code; I imagine most of their infrastructure could be made generic and get us the featureset we want. I'd be shocked if such a thing isn't already on their roadmap, and github's.

They likely wouldn't be onboard if they didn't see a viable route to profit from private perl package distribution. Most established perl business already have their distribution channels long-established, and wouldn't see a compelling reason to horizontally dis-integrate this part of their business unless it would be significantly cheaper.

Leveraging github would likely be key to that, as they have the needed economy of scale, even beyond things like S3/R2 people are already using in their distribution channels. NPM likely has enough juice to get new package formats added to github packages, I suspect we don't.

On the other hand there might be room in the market to exploit that gap between what github supports as packages and what you can do with good ol' fashioned releases. E.G. make an actually universal package management tool that knows how to talk to each package manager and therefore inject means of distributing (taxed) private packages for mutual benefit with a % kicked back to the relevant language foundation. Might be worth researching and pitching to VCs.

Back to reality

In the meantime, it's fairly obvious the PAUSE admins could fix the main problems with a little time and will. That's probably the best we'll get.

Perl: Dead and loving it 🔗 1724260931

🏷️ perl

Internet people love to spray their feelings about everything under the sun at every passerby. Perl, being a programming language, is no exception. At the end of the day, all the sound and fury signifies nothing. While I've largely laid out my perspective on this subject here, I suspect it's not quite the engagement bait people crave. Here's some red meat.

The reality is that multi-bilion dollar businesses have been built on infinitely worse stacks than what modern perl brings to the table. What's your excuse, loser? Quit whining and build.

Sturgeon's Law applies to everything, programs, languages and their authors included. 90% of the time you will be driving your stack like you stole it until the wheels fall off, swatting flies with elephant guns, yak shaving and putting vault doors on crack houses. What matters is that you focus the 10% of time that you are "on" where it counts for your business.

You only have a limited amount of time on this earth, and much less where you are in the zone. It will almost never be a good use of that time learning the umpteenth new programming language beyond the bare minimum to get what you want done. So don't do it if you can avoid it.

There are many other areas in life where we engage in rational ignorance; your trade will be no exception. Learning things before you use them is a waste of time, because you will forget most of it. I've forgotten more math than most people ever learn.

Having written in more than 20 programming languages now, the feeling I have about all of them is the same.

They all have footguns, and they're usually the useful part of the language.
Some aspect of the build toolchain is so bad you nearly have an aneurysm.
I retreat into writing SQL to escape the pain as much as possible. It's the actually superior/universal programming language, sorry to burst your bubble
FFI and tools for building microservices are good enough you can use basically any other library in any other language if you need to.
HTML/JS are the only user interface you should ever use aside from a TTY, everything else is worse, and you'll need it eventually anyways.

I reject the entire premise that lack of interest in a programming language ought to matter to anyone who isn't fishing for engagement on social media. Most people have no interest whatsoever in the so-last-millenium Newton-Rhapson method, and yet it's a core part of this LLM craze. Useful technology has a way of not dying. What is useful about perl will survive, and so will your programming career if you stick along for the ride.

Remember the craftsman's motto: Maintain > Repair > Replace. Your time would be better spent not whining on forums, and instead writing more documentation and unit tests. If you spend your free time on that stuff, I would advise you to do what the kids say, and "Touch Grass". Otherwise how are you gonna tell the kids to get off your damned lawn?

Why all companies eventually decide to switch to the new hotness

You can show management the repeated case studies that:

switching programming languages is always such a time-consuming disaster they lose significant market share (forgetting features, falling behind competitors)
Statistically significant differences in project failure rate at firms is independent of the programming languages used

it bounces off them for two reasons:

Corn-Pone opinions. Naturally managers are predisposed to bigger projects & responsibilities, as that's the path onward and upwards.
"Senior" developers who know little about the existing tech stack, and rationally prefer to work on what they are familiar with.

The discerning among you are probably thinking "Aha! What if you have enough TOP MEN that know what's up?" This is in fact the core of the actual problem at the firm: turnover. They wouldn't even consider switching if those folks were still there.

They could vertically integrate a pipeline to train new employees to extend their lease on life, but that's quite unfashionable these days. In general that consists of:

University Partnerships
Robust paid modding scenes
The Tech -> QA -> Dev pipeline

The first is a pure cost center, the second also births competitors, and the third relies on hiring overqualified people as techs/QAs. Not exactly a fun sell to shareholders who want profit now, a C-level that has levered the firm to the point there is no wiggle room, and managers who would rather "milk the plant" and get a promotion than be a hero and get run over.

This should come as no shock. The immediate costs are why most firms eschew vertical integration. However, an ounce of prevention is worth a pound of cure. Some things are too important to leave to chance, and unfortunately this is one of them.

Ultimately, the organization, like all others before it, at some point succumbs to either age or the usual corporate pathologies which result in bouts of extreme turnover. This is the curse of all mature programming languages and organizations. Man and his works are mortal; we all pay the wages of our sins.

Conclusion

This "Ain't your grandpappy's perl", and it can't be. It's only as good as we who use perl are. Strap in, you are playing calvinball. Regardless of which language you choose, whether you like it or not, you are stuck in this game. It's entirely your choice whether it is fun and productive, or it is a grave.

Net::OpenSSH::More 🔗 1723163198

🏷️ perl 🏷️ ssh 🏷️ cpan-module

We have released to the CPAN a package that implements some of the parts of Net::OpenSSH that were left as "an exercise to the reader." This is based on Andy and my experiences over at cPanel's QA department among other things. It differs in important ways from what was used in the QA department there (they also have moved on to a less bespoke testing framework nowadays):

It is a "true" subclass, and any methods that are in particular specific to Linux can be implemented similarly as a subclass of Net::OpenSSH::More.
It *automatically* reconnects upon dropped connections, etc. and in general is just a lot nicer to use if you wanted an "easy mode" SSH accessor lib for executing commands.
Execution can be made *significantly* faster via a stable implementation of using expect to manage a persistent SSH connection.

Many thanks of course go out to current and former colleagues (and cPanel/WebPros specifically), as despite their perl QA SSH libraries not being an actual product cPanel ever used anywhere publicly, methods like these eased remotely manipulating hosts we wished to run black box testing upon so much that even authors with minimal training in perl could successfully use to write tests that executed against a remote host. This is of value for organizations that have a need to run tests in an isolated environment, or for anyone wishing to build their own CI suite or SSH based orchestrators in perl without having to discover a lot of the hard "lessons learned" that almost anyone who has had to use Net::OpenSSH extensively will eventually learn.

Of course, the most profuse of thanks go out to Salvador, who provided the excellent module this extends.

Eventually we plan to extend this package to do even more (hehe), but for now figured this was good enough to release, as it has what's probably the most useful bits already.

Using libvirt with terraform on ubuntu 🔗 1723038220

🏷️ terraform 🏷️ ubuntu 🏷️ libvirt

In short, do what is suggested here.

For the long version, this is a problem because terraform absolutely insists on total hamfisted control of its resources, including libvirt pools. This means that it must create a new one which is necessarily outside of the realm of it's apparmor rules. As such you have to turn that stuff off in the libvirt config file.

Important stuff now that I'm using it to deploy resources.

Other useful things to remember

hold escape after reboot to get a boot menu to go single-user when using virtual console via virt-manager, etc.
virsh net dhcp leases default - get the 'local' IPs of the VMs so spawned
Cloud-init logs live in /var/log/cloud-init*.log
Overall result lives in /var/lib/cloud/data/result.json, you can read this automatically with your tooling.
The scripts you run (what you generally care about) live in /var/lib/cloud/instances/$PROVIDER/scripts/runcmd

Longer term I should build a configuration script for the HV to properly setup SELinux contexts, but hey.

How I learned to love postfix for in perl 🔗 1722982036

🏷️ perl

Suppose you do a common thing like a mapping into a hash but decide to filter the input first:

shit.pl
my %hash = map {
    "here" => $_
} grep {
    -d $_
} qw{a b c d .};

This claims there is a syntax eror on line 6 where the grep starts. This is a clear violation of the principle of least-astonishment as both the map and grep work by themselves when not chained. We can fix this by assigning $_ like so:

fixed.pl
my %hash = map {
    my $subj = $_;
    "here" => $subj
} grep {
    my $subj = $_;
    -d $subj
} qw{a b c d .};

Now we get what we expect, which is no syntax error. This offends the inveterate golfer in me, but it is in many critic rules for a reason. In particular when nested lexical scope inside of the map/grep body is a problem, which is not the case here.

But never fear, there is a superior construct to map in all cases...postfix for!

oxyclean.pl
my %hash = "here" => $_ for grep { -d $_ } qw{a b c .};

No syntax errors and it's a oneliner. It's also faster due to not assigning a lexical scope.

On General Aviation 🔗 1722897713

🏷️ aviation

I wanted to be a pilot as a young man. While I did learn to fly, I ended up a mathematician, programmer and tester. Even then I am incredibly frustrated by corporate pathologies which prevent progress and meaningful improvement at the firms and industries I interact with. But it's nothing compared to the dead hand which smothers "General Aviation", which is how you learn to fly if you aren't one of the pampered princes of the USAF. This is not to say the USAF isn't dysfunctional (it is), but that GA is how I learned to fly, and frankly how most people would in a properly functioning situation.

I usually don't talk about it because it rarely comes up. Look up in the sky and 99 times out of 100 you'll see nothing unless you live next to an international airport. Sometimes people complain about "crowded" airspace and I want some of what they're smoking. You could easily fit 1000x more active aircraft in the sky safely.

Imagine my surprise when I see Y Combinator is taking their turn in the barrel. I wonder what has them so hopeful? If I were to hazard a guess, it comes from the qualification at the end of their post where they mention "a plethora of other problems that make flying cumbersome". Here are my thoughts on the ones they mentioned.

weight and balance worksheets - A sensor package which could detect excessive loading or too aft a CG is possible.
complicated route planning - When I last flew 20 years ago, filing a flight plan was a phone call and you needed to understand the jargon to make it happen. I suspect it's no different today. A computerized improvement is entirely possible.
talking to ATC - Comms tech in aviation remains "get them on the radio" even in cases where route adjustments could be pushed as data via a sideband. Ideally getting people on the main freq is the exception rather than the rule.
lengthy preflight checks - Could largely be automated by redundant sensors. Ultimately like cars it could be an "idiot light", which I'm sure instantaneously raises blood pressure in most pilots.
a fractured system of FBOs - Who do I call to file my flight plan? It's (not) surprising this hasn't yet been solved; there's a cottage industry of middlemen here. My kingdom for a TXT record.
difficult access to instruction - In town, good luck learning to fly. Half hour of procedural compliance for a 10 minute flight, all on the clock. Not to mention dealing with gridlock traffic there and back. There are nowhere near the number of airports needed for flight to be remotely accessible to the common man.

They go on to state "the list goes on. We are working on all of these too". Good luck, they'll need it. The FAA is legendarily hidebound and triply so when it comes to GA. Everyone before them who tried was gleefully beheaded by the federal crab bucket.

All this stuff is pretty obvious to anyone who flies and understands tech, but this regulatory environment ruthlessly selects against people who understand tech. Why would you want to waste your life working on airframes and powerplants with no meaningful updates in more than a half-century? Or beat your head against the brick wall of the approvals process to introduce engine tech that was old hat in cars 50 years ago?

It's not shocking the FAA and GA in general is this way. Anyone who can improve the situation quickly figures out this is a club they ain't in, and never will be. Everyone dumb/stubborn enough to remain simply confirms the biases the regulators have about folks in "indian country". Once a brain drain starts, it takes concerted effort to stop. The feds do not care at all about that problem and likely never will.

This is for two reasons. First, the CAB (predecessor of FAA) strangled the aviation industry on purpose in service of TWA in particular, and that legacy continues to poison the well. The aviation industry has exactly the kind of "revolving door" criticized in many other regulatory and federal contractor situations. This is why they don't devote a single thought to things like "which FBO should I call". Like with any other regulator the only answer to all questions is "read their minds" (have one of 'em on the payroll).

Second, there is no "General aviation" lobby thanks to this century-long suppression, so nobody in politics cares about fixing this. Like with the sorry state of rocketry pre-Spacex, this will require a truly extreme amount of work, no small amount of luck, and downright chicanery to cut the gordian knot. I love that the founders of this firm are Spacex alums, perhaps they have what it takes.

How to fix Wedged BIND due to master-master replication 🔗 1722479687

🏷️ dns 🏷️ bind


rm /var/named/_default.nzd

In short, you have to nuke the zone database with the remote zone that says "HEY IM DA MASTA" when you have the local zone going "NUH UH, ME MASTER". This means you'll have to manually rebuild all the remote zones, tough shit. There is no other solution, as there's no safe way to actually putz with the binary version of _default.nzf. Seasoned BIND hands would tell you "well, don't do that in the first place". I would say, yes I agree. Don't use BIND in the first place.

Fear and loathing at YAPC 🔗 1720542356

Despite being the worst attended YAPC in recent memory, 2024's show in Vegas had some of the best talks in a long while. In no particular order, the ones I remember after a week are:

Damian's talk - Implements most of what you want out of a type system in perl, one of the points in my testing talk
Demetrios's talk - Savings from this alone will save me more than the conference cost me
Gavin Hayes' WASM talk - has big implications in general, and I will try this in playwright-perl soon
Gavin's APPerl talk - I can see a use for this with clients immediately
Exodist's Yearly roundup of what's new in test2 - The PSGI app he's built into it implements a lot of my testing talk's wish list
Cromedome's Build a better readme - Good practical marketing advice

I would have loved to have seen the velociperl fellow show up, but I can't say I'm shocked given how attempts circumvent P5P paralysis in such a manner have ended up for the initiators thus far.

This year we had another Science track in addition to the perl and raku tracks which I submitted my testing talk to. In no particular order, the ones I enjoyed were:

Adam Russell's paper - Using LLMs to make building semantic maps no longer pulling teeth? sign me up!
Andrew O'Neil's paper - Like with 3D printing, these handheld spectrographs are going to change the world.

The track generated a fair bit of controversy due to a combination of Will and Brett being habitual irritants of Perl's In-Group, miscommunication and associated promotional efforts. While I regard their efforts as being in good faith, I doubt the TPRF board sees it that way, given they issued something of a condemnation during the final day's lightning talks. Every year somebody ends up being the hate object; victims need to be sacrificed to hutzilopotchli to keep the sun rising on the next conference.

That being said, the next conference is very much in doubt. Due mostly to corporate sponsorship of employee attendance largely drying up, the foundation took a bath on this one. I'm sure that the waves of mutual excommunications and factionalism in the perl "community" at large hasn't helped, but most of those who put on such airs wouldn't deign to have attended in the first place. My only productive thought would be to see what it is the Japanese perl conference is doing, and ape our betters. Lots of attendance, and they're even doing a second one this year. Must be doing something right.

My Talks

I got positive feedback on both of my talks. I suspect the one with the most impact will be the playwright one, as it has immediate practical impact for most in attendance. That said, I had the most productive discussions coming out of the testing talk. In particular the bit at the start where I went over the case for testing in general exposed a lot of new concepts to people. One of the retirees in the audience who raised the point that the future was "Dilbert instead of Deming" was right on the money. Most managers have never even heard of Deming or Juran, much less implemented their ideas.

Nevertheless, I suspect it was too "political" for some to call out fraud where I see it. I would point out that my particular example used (Boeing) is being prosecuted for fraud as of this writing. Nevertheless, everyone expects they'll get a slap on the wrist. While "the ideal amount of fraud in a system is nonzero" as patio11 puts it, the systematic distribution of it and near complete lack of punishment is (as mentioned in the talk) quite corrosive to public order. It has similar effects in the firm.

My lack of tolerance for short-sighted defrauding of customers and shareholders has got me fired on 3 occasions in my life, and I've fired clients over it. I no longer fear any retaliation for this, and as such was able to go into depth on why to choose quality instead. Besides, a reputation for uncompromising honesty has it's own benefits. Sometimes people want to be seen as cleaning up their act, after all.

I enjoyed very much working with LaTeX again to write the paper. I think I'll end up writing a book on testing at some point.

I should be able to get a couple of good talks ready for next year, supposing it happens. I might make it to the LPW, and definitely plan on attending the Japanese conference next year.

Why configuration models matter: WebServers 🔗 1715714895

Back when I worked at cPanel, I implemented a feature to have customer virtualhosts automatically redirect to SSL if they had a valid cert and were configured to re-up it via LetsEncrypt (or other providers). However this came with a significant caveat -- it could not work on servers where the operator overrode our default vhost template. There is no way you can sanely inject rules into an environment where I don't even know if the template is valid. At least not in the amount of time we had to implement the project.

Why did we have this system of "templates" which were then rendered and injected into Apache's configuration file? This is because it's configuration model is ass-backwards and has no mechanism for overriding configs for specific vhosts. Its fundamental primitive is a "location" or "directory" which have a value which is either a filesystem or URI path component.

Ideally this would instead be a particular vhost name, such as "", "127.0.0.1, "foobar.test" or even multiple of them. But because it isn't we saw no benefit to using the common means of separating configs for separate things (like vhosts), the "config.d" directory. Instead we parse and generate the main config file anytime a relevant change happens. In short we had to build a configuration manager, which means that now manual edits to fix anything will always get stomped. The only way around that is to have user-editable templates that are used by the manager (which we implemented by a $template_file.local override).

Nginx recognized this, and their server primitive directive is organized around vhosts. However they did not go all the way and make it to where you could have multiple server blocks referring to the same vhost with the last one encountered, say in the config.d/ directory, taking precedence. It is not stated in the documentation, but later directives referring to the same host do the same thing as apache. As such configuration managers are still needed when dealing with nginx in a shared hosting context.

This is most unfortunate as it does not allow the classic solution to many such problems in programming to be utilized: progressive rendering pipelines. Ideally you would have a configuration model like so:


vhost * {
    # Global config goes here
    ...
}

include "/etc/httpd/conf.d/*.conf"

# Therein we have two files, "00-clients-common.conf"
vhost "foobar.test" "baz.test" {
    # Configuration common to various domains go here, overrides previously seen keys for the vhost(s)
    ...
}

# And also "foobar.test.conf"
vhost "foobar.test" {
    # configuration specific to this vhost goes here, overrides previously seen keys for the vhost
    ....
}

The failure by the web server authors to adopt such a configuration model has made configuration managers necessary. Had they adopted the correct configuration model they would not be, and cPanel's "redirect this vhost to ssl" checkbox would work even with client overrides. This is yet another reason much of the web has relegated the web server to the role of "shut up and be a reverse proxy for my app".

At one point another developer at cPanel decided he hated that we "could not have nice things" in this regard and figured out a way we could have our cake and eat it too via mod_macro. However it never was prioritized and died on the vine. Anyone who works in corporate long enough has a thousand stories like this. Like tears in rain.

nginx also doesn't have an equivalent to mod_macro. One of the few places apache is in fact better. But not good enough to justify switching from "shut up and reverse proxy".

Why you should use the Rename-In-Place pattern in your code rather than fcntl() locking 🔗 1714508024

🏷️ perl

Today I submitted a minor patch for File::Slurper::Temp. Unfortunately the POD there doesn't tell you why you would want to use this module. Here's why.

It implements the 'rename-in-place' pattern for editing files. This is useful when you have multiple processes reading from a file which may be written to at any time. That roughly aligns with "any non-trivial perl application". I'm sure this module is not the only one on CPAN that implements this, but it does work out of the box with File::Slurper, which is my current favorite file reader/writer.

Why not just lock a file?

If you do not lock a file under these conditions, eventually a reader will consume a partially written file. For serialized data, this is the same as corruption.

Using traditional POSIX file locking with fcntl() using an RW lock comes with a number of drawbacks:

It does not work on NFS - at all
Readers will have to handle EINTR correctly (e.g. Retry)
In the event the lock/write code is killed midstream you need something to bash the file open again

Writing to a temporary file, and then renaming it to the target file solves these problems.

This is because rename() just changes the inode for the file. Existing readers continue reading the stale old inode happily, never encountering corrupt data. This of course means there is a window of time where stale data is used (e.g. the implicit TOCTOU implied in any action dependent on fread()). Update your cache invalidation logic accordingly, or be OK with "eventual consistency".

Be aware of one drawback here: The temporary file (by default) is in the same directory as the target as a means of avoiding EXDEV. This is the error you get from attempting to rename() across devices, as fcopy() is more appropriate there. If you are say, globbing across a directory with no filter, hilarity may ensue. You should change this to some other directory which is periodically cleaned on the same disk, or given enough time & script kills it will fill.

KYC: A bad idea for the hosting industry 🔗 1714071973

🏷️ regulation

I try not to ever get political if I can help it here, as that's always the wrong kind of attention for a business to attract. However I'm going to have to today, as the eye of sauron is directly affixed on my industry today. If that's not for you, I encourage you to skip this article.

As of this writing, there is a proposed rule change working its way through the bowels of the Department of Commerce. Hot on the heels of the so-called "TikTok ban" (which would more rightly be called forced divestiture e.g. "nationalization through the back door"), this rule change would require all web hosting, colo and virtual service providers to subject their customers to a KYC process of some sort.

The trouble always lies in that "of some sort". In practice the only way to comply with regulations is to have a Contact Man [1]" with juice at the agency that thinks like they think. Why is this? Because regulations are always what the regulator and administrative law judge think they are. Neither ignorance or full knowledge of the law is an effective defense; only telepathy is.

This means you have to have a fully loaded expense tacked onto your business. Such bureaucrats rarely come cheap, oftentimes commanding six figure salaries and requiring support staff to boot. Compliance can't ever be fully automated, as you will always be a step behind whatever hobgoblin has taken a hold of the bureau today.

Obviously this precludes the viability of the "mom and pop hosting shop", and even most of our mittlestand. This is atop the reduction in overall demand due to people who don't value a website as much as their privacy, or the hassle of the KYC process itself. This will cause widespread economic damage to an industry already reeling from the amortization changes to R&D expenses. This is not the end of the costs however.

KYC means you have to keep yet more sensitive customer information atop things like PII and CC numbers. This means even more stuff you have to engage in complicated schemes to secure, and yet another thing you have to insure and indemnify against breach.

However the risks don't stop with cyber-criminals looking to steal identities. The whole point of KYC is to have a list that the state can subpoena whenever they are feeling their oats. Such information is just more rope they can put around you and your customers' necks when that time comes. Anytime you interact with the state, you lose -- it's just a matter of how much. This increases that "how much" greatly.

Do you think they won't go on a fishing expedition based on this information? Do you really trust a prosecutor not to threaten leaking your book to a competitor as a way of coercing a plea, or the local PD holding it over you for protection money? Don't be a fool. You'll need to keep these records in another jurisdiction to minimize these risks.

On top of this, no actual problem (e.g. cybercrime) will be addressed via these means (indeed these problems will be made manifestly worse). Just like in the banking world, the people who need to engage in shenanigans will remain fully capable of doing so. No perfect rule or correct interpretation thereof exists or can exist. The savvy operators will find the "hole in the sheet" and launder money, run foreign intel ops and much worse on US servers just as much as they do now.

A few small-time operators will get nicked when the agency needs to look good and get more budget. The benefit to society of removing those criminals will be overwhelmed by the negatives imposed to business and the taxpayer at large.

Many other arguments could easily be made against this, such as the dubious legality of administrative "law" in the first place. Similarly, this dragooning of firms into being ersatz cops seems a rather obvious 13th amendment violation to me. However just like with regulators, the law is whatever judges think it is. Your or my opinion and the law as written is of no consequence whatsoever. As such you should expect further consolidation and the grip of the dead hand to squeeze our industry ever tighter from now on.

Notes

[1] Günter Reimann - "The Vampire Economy", Ch. 4

ARC and the SRS: Stop the email insanity 🔗 1713224239

🏷️ email

There's a problem with most of the mail providers recently requiring SPF+DKIM+DMARC. Lots of MTAs (exchange, mailman etc) are notorious for rewriting emails for a variety of reasons. This naturally breaks DKIM, as they don't have the needed private key to sign messages which they are forwarding. And given the nature of the email oligopoly means you absolutely have to be under the protection of one of the big mass mailers with juice at MICROS~1 or Google, this necessitated a means to "re-sign" emails.

This is where SRS came in as the first solution. Easy, just strip the DKIM sig and rewrite the sender right? Wrong. Now you are liable for all the spam forwarded by your users. Back to the drawing board!

So, now we have ARC. We're gonna build a ~~wall~~ chain of trust, and we're gonna make google pay for it! But wait, all DKIM signatures are self-signed. Which means that peer-to-peer trust has to be established. Good luck with that as one of the mittlestand out there. Google can't think that small.

I can't help but think we've solved this problem before. Maybe in like, web browsers. You might think that adopting the CA infrastructure in Email just won't work. You'd be wrong. At the end of the day, I trust LetsEncrypt 100000% more than Google or MICROS~1.

So how do we fix email?

The core problem solved by SPF/DKIM/DMARC/SRS/ARC is simple. Spoofing. The sender and recipient want an absolute guarantee the message is not adulterated, and that both sides are who they say they are. The web solved this long ago with the combination of SSL and DNS. We can do the same, and address the pernicious reality of metadata leaks in the protocol.

Email servers will generally accept anything with a To, From, Subject and Body. So, let's give it to them.


To: $RECIPIENT_SHA_1_SUM@recipient-domain.test
From: $USERNAME_SHA_1_SUM@sender-domain.test
Subject: Decrypt and then queue this mail plz

(encrypted blob containing actual email here)

Yo dawg, I heard you liked email and security so I put an encrypted email inside your email so you can queue while you queue

Unfortunately, for this to work we have to fix email clients to send these mails which ha ha, will never happen; S/MIME and PGP being case in point. From there we would have to have servers understand them, which is not actually difficult. Servers that don't understand these mails will bounce them, like they do to misconfigured mails (such as those with bad SPF/DKIM/DMARC/SRS/ARC anyways). There are also well established means by which email servers discover whether X feature is supported (EHLO, etc), and gracefully degrades to doing it the old dumbass way. When things are supported it works like this:

We fetch the relevant cert for the sender domain, which is provided to us by the sending server.
We barf if it's self-signed
We decrypt the body, and directly queue it IFF the From: and Envelope From: are both from the relevant domain, and the username's sha1 sum matches that of the original from.
Continue in a similar vein if the recipient matches and exists.

From there you can drop all the rest of it; SPF, DKIM, DMARC what have you. Not needed. SpamAssasin and Milters continue working as normal. If it weren't for ~~forwarding~~ the fact you have to trust your mailserver with your life because all residential IPs are perma-banned, you could even encrypt the sender/reciever domains for marginally more deniability about which vhosts are communicating. That said, some scheme for passing on this info securely to forwards could be devised.

What can't be fixed is the reciever server having to decrypt the payload. The last hop can always adulterate the message due to email not actually being a peer-to-peer protocol because spam. This is what PGP & S/MIME are supposed to address, but failed to do due to not encrypting headers. Of course this could be resolved by the mailserver reaching out to the actual domain for a user that reaches out peer-to-peer and asking for a shared secret. Your mailserver could then be entirely flushed down the commode in favor of LDAP.

So why hasn't it happened, smart guy?

In short, the situation being what it is would be why everyone long ago threw up their hands and said "I may as well just implement a whole new protocol". At some point someone has to do the hard work of pushing a solution like this over the finish line, as people will not stop using email for the same reason we still use obsolete telephones. What is needed is for mailops to reject all servers without MTA-STS and sending unencrypted, adulterated emails. Full Stop.

Unfortunately the oligopoly will not, because their business model is to enable spammers; just like the USPS, that's the majority of their revenue. If I could legally weld shut my mailbox, I would. But I can't because .gov hasn't figured out a way to communicate which isn't letter or fax. It's the same situation with email for anyone running a business. The only comms worth having there are email or zoom; our days are darkness.

The security conscious & younger generations have all engaged in the digital equivalent of "white flight" and embraced alternative messaging platforms. They will make the same mistakes and have to flee once again when their new shiny also becomes a glorified adeverising delivery platform. They all eventually will flee to a new walled garden. Cue "it's a circle of liiiiife" SIIIMMMBAAAA

Is there a better solution?

Yes. It even worked for a number of years; it was called XMPP with pidgin-otr. Email clients even supported it! Unfortunately there wasn't a good bridge for talking with bad ol' email. Now everyone has forgotten XMPP even exists and are gulag'd in proprietary messengers that have troubling links to the spook aristocracy.

The bridge that has to be built would be an LDA that delivers email to an actual secure, federated messaging platform rather than a mailbox in the event it looks like it oughtta. In short, it's a sieve plugin or even a glorified procmailrc. From there you have to have a client that distinguishes between the people you can actually talk to securely rather than email yahoos. Which should also helpfully append a message at the top of the email body instructing people how to stop using email onto replies. As to the actual secure messaging platform, I'd use matrix.

There's probably a product somewhere in here to have a slick mail client which knows how to talk email, matrix, rss and activitypub. After all, we still need mailing lists, and activitypub is a better protocol for doing that. Hell, may as well make it your CMS too. More homework for me.

What roles can LLMs actually replace in a software firm? 🔗 1710443386

🏷️ machine learning

I've told a number of people that large language models are essentially Clever Hans as a service. These transformers are taking a prompting by the user, and then producing a chain of the most likely tokens to satisfy said prompt. Which is to say, they will (unless altered by "safety" filters like the commercial offerings) simply tell you what you want to hear. Early versions of these models like tay.ai made this abundantly clear, as it was (and remains) trivial to find prompts that the most natural response to would be Sieg Heil! This has made them the perfect parasocial companion, and a number of firms are making a killing running hog butchering scams with these language models, as predicted by my prior post on the subject.

So should you be worried about your job? Sure, supposing you are a glad-handing empty suit, or a vapid hand-holder. Pretty much all of the talentless power junkies and people who want to play house could be replaced tomorrow with these mindless algorithms and I doubt anyone would notice. I suspect some of the more clever of them are already automating their jobs via these tools so they can moonlight and double-dip.

Those of us who live in O-Ring world where being right and making mistakes very rarely matters, the LLMs fall flat on their face. 3/4 of Github copilot suggestions are immediately discarded by senior programmers, and more prompting is required.

This is like working with a newbie programmer who's thick as a brick; your time would be better spent fixing it yourself and moving on. It does however shine when used by said newbies; that's the silver lining to this cloud. LLMs are great at telling the young bucks what the more experienced programmers want to see. The primary productivity gain here will be using such on-ramps to distract the able less with LMGTFY-tier requests by the noobs. This is a genuine improvement on the current state of affairs, as most of the unskilled labor coming into the field has no idea what magic words to even start querying an indexer for. They'll probably spend more time torturing information out of LLMs than they would an experienced dev, but clever hans is paid with carrots.

Parallels between the Hosting and Real Estate business 🔗 1709745254

🏷️ hosting

While it seems trivially obvious that the hosting industry is engaged in building "houses for data", be they commercial (full service webmasters), residential (dedi) or multi-family (shared), the parallels run deeper than that. The same structural defects holding back real estate largely constrain hosting as well.

For instance, most hosting shops at the small scale (dedi/shared) provide an environment which is astonishingly shoddy. Very little if any improvements to the design and implementation of most hosting arrangements aside from a few exceptions have happened in over a decade. This is a similar situation to housing, and the reason is quite simple. There is a massive brain drain of pretty much anyone with intelligence and talent away from small scale construction and into megaprojects. As Dan Luu mentioned about why so few of the good practices at FAANG filter out into the general programming world, "At google, we couldn't think that small".

In the construction industry this resulted in people building houses as though we were still living in the pre air conditioning days, with vented attics and the like until quite recently. If it ain't broke, don't fix it...except it was quite broken in a number of important qualitative ways, especially for homes in the southern US. Putting ducts outside of the conditioned space results in a great deal of wasted energy, and the vented nature necessarily provides ingress for vermin of various kinds. Now that efficiency is increasingly a concern, practices like "monopoly framing" are finally addressing this in new construction. It turns out that we could have simply devoted a bit of thinking to the problem and provided much higher quality buildings for not much more cost, but thinking was in short supply.

Similarly in the hosting industry, most hosting arrangements are designed with a shocking disregard for efficiency or security against infiltration by vermin. Firewalls are rarely on by default, even in cases where all the running services by default have rulesets defined and known by the firewall application. Most programs never reach out to various resources that could be unshared, and many outward facing services don't run chrooted. SeLinux is never enabled, and it's common practice to not enable automatic updates. No comprehensive monitoring and alerting is done; alterations to Package manager controlled files, new user additions and configuration changes all happen sight unseen. Cgroups are not applied to limit the damage of any given rogue (or authorized!) process, and millions of zombies are happily mining crypto thanks to this.

Basically all of these subjects are well-trod ground for anyone with experience in DevOps or security. Too bad all of them have jobs in corporate, and the plebs out here get to enjoy living in the hosting equivalent of an outhouse. Cyclical downturns eventually solve this in most industries as the golden handcuffs dissolve and the able still have to feed themselves. The trouble is that at the same time talent comes available, funding for new projects dries up due to extreme reliance on leverage. It's tough to make your cap rate with 5% interest. As such, ignorance persists far longer than it has to.

I can't say I mind, more money for me. Plenty of the smart construction guys are making hay spreading the good word about superior practices on the video sites, I suspect a lot of hay can be made for an intrepid hosting entrepeneur as well.

On using Net::Server instances listening on AF_UNIX sockets in shared environments 🔗 1709145489

🏷️ www 🏷️ perl

Net::Server is the backend for most popular PSGI servers on CPAN, such as starman. In shared hosting environments, it's a common pattern to have the www files owned by the relevant user, with the group being www-data, or whatever the HTTPd uses to access things. In the context of a reverse-proxy to a PSGI server, you can be a bit more strict by having only the AF_UNIX socket given the www group. However, this requires the execute bit to be set for the group (so you can't just set a umask), and Net::Server makes no attempt to chmod the socket it creates (but will helpfully fail to chown it when running as a nonroot user if you specify a different GID, as you can't chown or setgid as nonroot users).

This obviously has security implications in a shared environment:

You have to start your PSGI server as root or a sudoer, and then instruct it to drop privs to the relevant user
You then have to fix the socket after the fact by wrapping the invocation to daemonize.
As such, you can't run things as user-mode systemd units; automating this for clients necessarily can't be self-service without some kind of script to "poke a hole in the sheet".

Back at cPanel we called such helpers "adminbins". Yet more "complexity demon" that could (and arguably should) be fixed by patching the upstream. These schleps rarely get fixed in practice, as people don't write articles about it like this. They just fix it and move on; that's the internet way -- route around damage and become a rat's nest of complexity rather than fix it. A patch will have to be submitted to add an option to set the group execute bit on the socket so created, likely here. Consumers of Net::Server would then need to plumb up to this; getting this all coordinated is quite the schlep in itself, which is why we usually don't get such nice things.

There is a clever way to have our cake and eat it too regarding not needing an adminbin. Make the user which owns the directory have the www-data group as their primary group, and make sure to set the group perms on their www files to be 0. Then you won't have to setgid or chown anything at all, and can happily run a usermode service all day.

Web components: taken to the bikeshed 🔗 1708990756

🏷️ javascript 🏷️ www

Web Components are a matter of particular amusement given many are coming back to the realization that the best way to build modular websites is via server-side templating for a variety of good reasons. This bad idea then becomes worse via shadow DOM reliance and randomizing the CSS classnames & IDs to prevent collisions when having multiple instances of the same component on a page. This pretty much screws testers who need reliable selectors; Shadow DOM means that XPath is right out, and randomized classnames & IDs means CSS selectors are likely shot too. Playwright solved this via the nuclear option of ripping the selector out of the browser's internals. Even then they can only be tested in isolation rather than in situ.

Verdict: avoid. Server side template includes are the better tool to reach for.

So you want to use client certificates instead of HTTP simple auth 🔗 1706313410

🏷️ ssl 🏷️ dns

In an earlier essay, I went over the sticky reality that is the CA infrastructure. I'd like to discuss a related subject, which is why nobody uses client certificates to restrict access to and authenticate users of websites, despite them being "supported" by browsers for many years. For those of you unfamiliar with the concept, it goes like this:

I issue a certificate for $USER, just like you would if you were a CA and $USER were a vhost.
$USER installs this certificate in their browser, (not-so-optionally) inputting a password to unlock it.
$USER opens a web page configured with the Issuer's CABundle, which asks them if they'd like to identify themselves with the cert they installed
$USER clicks yes and goes on their merry way.

There are major hiccups at basically every single step of this process. Naturally, I've got some ideas as to how one might resolve them, and will speculate as to why nobody's even remotely considered them.

Generating Client Certificates

First, if you want to generate certs like a CA does, you have two choices -- self signed, or become an "Intermediate" CA with delegated authority from a bigger CA. The big trouble with this is that getting delegation will never happen for anyone without serious juice in the industry, as it can potentially incur liability. This is why it is observed the only parties that generally use client certificates at all are those which in fact are Intermediate CAs, such as google, facebook and the like. On the other hand, if you go with self-signing, the user that imports the certificate has to import the full chain, which now means the issuer can issue certs for any site anywhere. Yes, the security model for CAs is that laughingly, astonishingly bad; this is why CAA records exist to limit the damage caused by this.

What is needed here is to dump Mozilla::CA, /etc/ssl/certs and all that towering pile of excresence in favor of a reverse CAA record. If we placed the fullchain.pem for each CA in a DNS record for a domain, we could say that this PEM is valid to sign things under this domain. For the big boys, they'd get the root zones to publish records with their PEM, and could go on signing anything and everything. However for the individual owners of domains this finally frees them to become intermediate CAs for their own domains only, and thereby not expose the delegator to potential liability. LetsEncrypt could be entirely dismantled in favor of every server becoming self-service. Thanks to these being DNS lookups, we can also do away with every computer on earth caching a thousand or so CABundles and having to keep them up to date into perpetuity.

With that implemented, each server would look at say /etc/server.key, or perhaps a hardware key, and it's software could then happily go about issuing certs to their hearts desire. One of the firms with juice at the IETF are the only ones who will move this forward, and they don't care because this is not a problem they have to solve. That leaves Pitching this as a new source of rents for the TLD authorities; I'm sure they'd love to get the big CAs to pay yasak. This could be the in to get domain owners to start paying CAs again -- nominal fee, you can sign for your domain. It's a price worth paying, unlike EV certs.

Installing client certificates

Every single browser implements this differently. Some use the built in OS key store, but the point is it's inevitably going to be putzing around in menus. A far better UX would be for the browsers to ask "hey, do you have a certificate, the page is asking for one", much like they prompt for usernames and passwords under http simple auth. This would probably be the simplest problem of these to solve, as google themselves use client certs extensively. It is a matter of academic curiosity why they have failed as of yet to scratch their own back, but a degree of schlep blindness ought to be expected at a tech firm.

Furthermore, while blank passwords are supported by openssl, some keystores will not accept this. Either the keystores need to accept this, or openssl needs to stop this. I consider the latter to be a non-starter, as there is too much reliance on this behavior everywhere.

But which cert should I use?

Browser support for having multiple certs corresponding to multiple possible logins is lacking. Separate profiles ought to do the trick, but keystores tend to be global. This problem would quickly sort itself out given the prior issues get solved as part of an adoption campaign.

The IPv6 debate is unnecessary 🔗 1705532991

🏷️ ipv6

Since Amazon is about to start charging an arm and a leg for IPv4 addresses, many have begun talking about the imminent migration to ipv6, which won't happen because ISPs still haven't budged an inch as regards actually implementing this. What's more likely is that everyone will raise prices and enjoy monopoly profits rather than upgrading decrepit routing equipment.

What's worst about this situation is that the entire problem is unnecessary in the era of ubiquitous gigabit internet. Suppose you need directions to 612 Wharf Avenue. You simply consult a map, and note down "right on dulles, left on crenshaw..." until you get to the destination. This only has to be done once, and reversed on the return trip. This is essentially how BGP works under the hood.

So the question arises: Why do we have glorified phone numbers passed in every IP packet? Performance and cost. Encoding down into a bag of 4 or 16 bytes is less work than reading 253 bytes (max for a domain name). But let's be real, it's not much less work, especially if we adopt jumbo frames by default. This is fully in the realm of "hurl more hardware & bandwidth at the problem".

The benefits to doing so are huge. You whack an entire layer of abstraction; DNS translation alone adds more latency than the overhead of passing these names in every packet. Much like how the Telcos whacked the POTS network and now emulate it over SIP, you could emulate the v4 system where needed and move on. Self-Hosted DNS (delegation) would still be possible; just like now you ultimately have to have A records for your nameserver(s) with your registrar or ISP. They would adapt the means they already use for IPs to map their own internal network topology. This scheme would have the added benefit of being able to do away with PTR records entirely.

The prospects for this happening anytime soon are quite grim, as I've never even heard anyone discuss how obviously unnecessary the IP -> DNS abstraction layer is. More's the pity; get yourself a /24 while you can.

Reliability is the vast majority of the cost: why lambdas are a thing 🔗 1699907816

🏷️ failover 🏷️ scale

Getting websites up and running is relatively straightforward. Automatically configuring websites to run them is a bit more work. Configuring DNS to automatically fail-over is another thing entirely. This involves three separate disciplines, and is actually irreducibly complex.

First, you need at least 2 load balancers with Round Robin DNS that proxy all requests to your application. Realistically this is going to be HAProxy or something equivalent. The idea is that one of the proxies failing, or the backends failing, is not going to bring down your application. Unfortunately, this means that your backend data source has to also become robust, or you are simply shuffling around your point of failure. Now you also need to learn about software like pgPool to abstract away connecting to multiple replicated databases.

Even if you manage to operationalize the setup of all these services via scripts or makefiles, there's still the matter of provisioning all these servers which must necessarily be on different IPs, and physically located apart from each other. Which leads you to operationalize even the provisioning of servers with mechanisms such as terraform, kubernetes and other orchestration frameworks. You now likely also have to integrate multiple hosting vendors.

All of this adds up quickly. 99% of the cost of your web stack is going to be in getting those last few sub 1% bits of reliability. Even the most slapdash approach is quite likely going to respond correctly over 99.99% of the time. Nevertheless, we all end up chasing it, supposing that the costs of building all this will pay for itself versus the occasional hours of sysadmin time. This is rarely the case until the scale of your business is tremendous (aka "A good problem to have").

So much so that it was quite rare for even the virtualization providers to adopt a "best practices" stack which was readily available to abstract all this nonsense away. That is until the idea of "lambdas" came to be. The idea here is that you just upload your program, and it goes whir without you ever having to worry about all the nonsense. Even then these come with significant limitations regarding state; if you don't use some load balancing "data lake" or DB as a service you will be up a creek. This means even more configuration, so the servers at intermediate layers know to poke holes in firewalls.

The vast majority of people see all this complexity and just say "I don't need this". They don't or can't comprehend layers of abstraction this deep. As such it's also lost on them why this all takes so much time and effort to do correctly. If you've ever wondered why there's so much technical dysfunction in software businesses it's usually some variant of this. Without developers feeling the pain of the entire stack they make decisions that kneecap some layer of this towering edifice.

It's why ops and sysadmins generally have low opinions of developers; everything the devs give them reliably breaks core assumptions which are obvious to them. These devs are of course siloed off from ops, and as such the problems just rot forever. What should have been cost saving automation is now a machine for spending infinite money on dev/ops. Error creeps in and productivity plummets, as it does with any O-Ring process.

As you can see the costs are not just in computational resources, but organizational complexity. Great care must be taken in the design and abstraction of your systems to avoid these complexities.

NGINX Unit: new kid on the PSGI block 🔗 1697150850

🏷️ www 🏷️ perl 🏷️ nginx

For those of you not aware, there has been a new entry in the PSGI server software field, this time by NGINX. Let's dig in.

Performance Comparisons

Low Spec
# It may shock you to find I have worked with shared hosts.
Env: 4GB ram, 2cpu.

# This is basically saturating this host.
# we can do more, but we start falling down in ways ab stops working.
ab -n10000 -k -c1000 $APP_URI

Starman:
Requests per second:    198.94 [#/sec] (mean)
Time per request:       5026.727 [ms] (mean)
Time per request:       5.027 [ms] (mean, across all concurrent requests)
Transfer rate:          3835.30 [Kbytes/sec] received

uWSGI (I could only get to ~5k reqs w/ 800 requestors before it fell over):
Requests per second:    74.44 [#/sec] (mean)
Time per request:       10746.244 [ms] (mean)
Time per request:       13.433 [ms] (mean, across all concurrent requests)
Transfer rate:          1481.30 [Kbytes/sec] received

nginx-unit:
Requests per second:    275.60 [#/sec] (mean)
Time per request:       3628.429 [ms] (mean)
Time per request:       3.628 [ms] (mean, across all concurrent requests)
Transfer rate:          5333.22 [Kbytes/sec] received

This generally maps to my experiences thus far with starman and uWSGI -- while the latter has more features, and performs better under nominal conditions, it handles extreme load quite poorly. Unit was clearly superior by a roughly 60% margin or better regardless of the level of load, and could be pushed a great deal farther before falling down than starman or uWSGI. Much of this was due to much more efficient memory usage. So, let's try things out on some (relatively) big iron.

High Spec
# You will be pleased to know I'm writing this off on my taxes
Env: 64GB ram, 48vcpu

# This time we went straight with 100 workers each, and 1k concurrent connections each making 10 requests each.
# We switched to using wrk, because ab fails when you push it very hard.

Unit:
 wrk -t10 -c1000 -d 2m http://localhost:5001/
Running 2m test @ http://localhost:5001/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   239.60ms  188.61ms   2.00s    90.16%
    Req/Sec   335.32    180.29     1.26k    62.37%
  203464 requests in 2.00m, 799.57MB read
  Socket errors: connect 0, read 9680, write 14750, timeout 608
Requests/sec:   1694.14
Transfer/sec:      6.66MB

uWSGI:
wrk -t10 -c1000 -d 2m http://localhost:5000/
Running 2m test @ http://localhost:5000/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    60.56ms  112.75ms   1.99s    93.42%
    Req/Sec   268.76    188.69     2.66k    61.73%
  309011 requests in 2.00m, 1.17GB read
  Socket errors: connect 0, read 309491, write 0, timeout 597
Requests/sec:   2573.82
Transfer/sec:      9.97MB

Starman:
 wrk -t10 -c1000 -d 2m http://localhost:5000/
Running 2m test @ http://localhost:5000/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.90ms   47.06ms   1.99s    90.56%
    Req/Sec     4.04k   415.85     4.67k    92.86%
  480564 requests in 2.00m, 1.84GB read
  Socket errors: connect 0, read 0, write 0, timeout 58
Requests/sec:   4002.30
Transfer/sec:     15.73MB

These were surprising results. While unit outperformed uwsgi handily, both were obviously falling down with quite a few failed requests. Meanwhile starman handled them without breaking a sweat, and absolutely trounced both competitors. Japanese perl still winning, clearly. Let's have a look at the automatic-scaling features of uWSGI and unit.

Auto-Scaling!
# Same as above, but with cheaper=1
uwsgi:
wrk -t10 -c1000 -d 2m http://localhost:5000/
Running 2m test @ http://localhost:5000/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    72.26ms   98.85ms   1.99s    95.18%
    Req/Sec   212.68    157.93   810.00     60.82%
  196466 requests in 2.00m, 760.87MB read
  Socket errors: connect 0, read 196805, write 0, timeout 305
Requests/sec:   1635.89
Transfer/sec:      6.34MB

# Same as above, but processes are now set to 5min and 100 max.
unit:
wrk -t10 -c1000 -d 2m http://localhost:5001/
Running 2m test @ http://localhost:5001/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   329.91ms   67.84ms   1.14s    81.80%
    Req/Sec   277.56    142.12   720.00     69.52%
  10000 requests in 2.00m, 39.28MB read
  Socket errors: connect 0, read 6795, write 0, timeout 0
Requests/sec:     83.26
Transfer/sec:    334.92KB

This is just so hilariously bad that I can't help but think I'm holding it wrong for unit, but I can't see anything to mitigate this in the documentation. If you need auto-scaling workloads, obviously uWSGI is still the place to be. Even upping the ratio of 'stored' jobs to max to 80% isn't enough to beat uwsgi.

Feature Comparisons

Here are the major features I use in uWSGI, and their counterpart in unit:

Max Requests per worker : Application Request Limits
lazy-apps : This is not configurable, but it seems it's COWing strategy isn't causing trouble for me, and is very cheap. YMMV.
cheaper: Application processes
fs-reload : This is the main thing missing from unit.

Both are configurable via APIs, which makes deploying new sites via orchestration frameworks like kubernetes and so forth straightforward.

Conclusion

Given uWSGI is in "Maintenance only" mode (and has been for some time), I would assume it's well on its way to being put out to pasture. NGINX is quite well funded and well liked, for good cause. Unit gets me the vast majority of what I wanted out of uWSGI, and performs a heck of a lot better, save for when scaling is a concern. Not sure how sold I am on the name, given that's also what systemd calls each service, but I'll take what I can get. I also suspect that given the support this has, the performance problems versus something like starman will be resolved in time. For performance constrained environments where scaling is unlikely, unit gets my enthusiastic endorsement.

Postscript: The details

All testing was done on Ubuntu jammy using the official unit, starman and uwsgi packages. Both hosts were KVM virtual machines. The testing was simply loading a minimally configured tCMS homepage, which is a pure perl PSGI app using Text::XSlate. It's the software hosting this website.

uWSGI configs used (with modifications detailed above) are here.
Starman configuration beyond defaults and specifying job count was not done.
Unit configs used (with modifications detailed above) are here.

Have Perl, Will Travel: 4 years slinging perl independently 🔗 1691710279

🏷️ speech 🏷️ advice

My batman origin story

After having done the independent software contractor gig for the last few years, I suspect I've learned enough about the process to give a good primer for those interested but that have not taken the plunge yet. The best reason to become a contractor is simple. Because it's the best way to make a direct impact for people and businesses. If you want to make serious money, helping wealthy people become wealthier via the outsized impact of your skill is how that happens.

The option to escape: How hired guns are made

Before I started out on the course of being a hired gun with a specific and time-limited goal in mind, I had achieved about everything you could as a developer short of entering management at a decently large software firm. Like most "Staff Engineers" I knew where most issues with the codebase were lurking, or could find out within an hour due to intimate familiarity. I'd also accumulated a number of critical systems under my belt that were nearly 100% written by myself. Similarly, I had multiple apprentices and was frequently one of the few who could answer questions cropping up in developer chat, or debugging thorny customer issues with senior support personnel. Practically anyone who will actually succeed as a hired gun needs the sort of drive to have achieved such things already. I've heard them called "glue stick" people, as they are what holds organizations together by and large.

Anyone who gets here will inevitably make management nervous both because their invisible influence is often more powerful than management's. Also, people like this are pretty systematically underpaid for the sort of effort they actually put in. It was doubly so in my case, as I've long been debt free, unencumbered by family and had been developing successful non-programming business on the side. In short, they recognized quickly that I was both essential to the organization and trivially capable of leaving. Corporate is uncomfortable around people who aren't over a barrel and can afford to rock the boat. Ever known a manager without a family and that isn't in debt up to their eyballs? me neither. It takes a lot of desperation to win the single-elimination ass-kissing tournament.

To be fair, I had ruthlessly leveraged this to secure far higher pay than they admitted to paying people with their "transparency" report on salaries they released to us hapless minions. It was at this point I began to notice signs that a case was being built against me. When I was inevitably pushed out, I was ready.

At the initial signs I started planning how to continue doing the sort of essential, well-paid work I enjoy doing but without this expectation of being handcuffed to one firm or another. This is because my financial plan required a bit more capital to do what I actually want to; start a software firm myself. I have managed to do this quite a bit ahead of schedule thanks to this actually getting paid for the maniacal amount of hours I actually work. I'm currently wrapping up a number of these. All so I can endure being paid nothing to work harder at starting up my own business for some time. Perhaps being a deranged masochist is the actual mechanism at work here.

Welcome to the Kayfabe

When you finally take the plunge a number of illusions will quickly fall away as you start speedrunning through organizations and individuals needing help. Invariably, a brand's reputation generally has an inverse relationship to its actual quality. You find that the companies with the most fanatically loyal customers power this all with the most atrocious pile of shit you can imagine. If you didn't yet believe that "worse is better" you will quickly be disabused of this notion. Every successful organization is somewhere on the journey of "prototype in production" to actually good software.

Keeping up appearances and managing customer expectations such that they remain sated necessarily steals time from the sort of ruthless quality control and brutal honesty necessary for good software. If you've ever wondered why LKML and P5P have been rivers of flame and reliable drama-generators over the years, this would be why. Appearing competent necessarily removes the mechanisms that force participants to actually become competent, and these tensions will always be present. I've seen this slowly corrupting software organizations subject to regulation such as Sarbanes-Oxley. If you ever wonder why a developer chat is dead as a doornail, there's probably a great deal of concern with "face" involved.

In this new Army, no one could afford to tell the truth, make an error, or admit ignorance. David Hackworth "About Face"

To succeed as a contractor, you will actually have to embrace this for good and ill. The best paying customers are always the large orgs with huge problems, and they almost never want to hear the unvarnished truth save as a last resort. The niche for you to fill in order to be well paid is the guy who steps in precisely at that last resort. Being an outsider, you don't care about your ability to advance in the firm. You will naturally be able to see the problem clearly due to not being awash in the control fraud they've been feeding themselves. Similarly, you will be able to take risks that people concerned with remaining employed are not capable of taking. This will allow you to make (and implement!) the actual solutions to their problems in a prompt manner. You'll look like a master despite being at a severe knowledge disadvantage versus their regulars.

That said, you can only lead a horse to water. Sometimes they will still fail to drink even when to do so will save their life. As such you can't get too attached to the outcome of your projects. Many of your projects will in fact fail due to these organizational reasons. I've been on projects that dropped us in favor of incompetents that were happy to lie all day.

You should neglect to mention this if you value your ability to secure new contracts in the future. Focus instead on the improvements you can and do make when describing the impact you have made for customers. You just sound like a whiner if you focus on this stuff, because every large organization has a case of this disease, and is blissfully ignorant. They also don't want to hear about how they might go about curing themselves of this, despite it being a fairly well understood subject. Happy customers is largely a matter of expectations management; e.g. Don't "Break the spell". Every job, to some degree, is acting.

Aside from these sorts of jobs which have big impacts, firms will want people to implement things they percieve as not worth building permanent expertise in. These are usually trouble free, fast work. Great when you can get it.

Working with individuals and marketing yourself

If you don't like dealing with corporate buffoonery all day, you can still make it by helping individuals and small organizations out so long as you juggle many of them at a time. These inevitably come from referrals, job boards and cold reach-outs from people reading your marketing and sales materials.

Speaking of marketing, CPAN and github are my best marketing, believe it or not. Your portfolio of Open source software is usually a strong indicator of where your actual strengths as a programmer lie. I've picked up a few clients already that reached out to me cold because of this. There are a number of simple things you can do to make sure this is more effective.

You can create a repository with the same name as your github account, and the Readme.md therein will be presented instead of your normal github user page. Example: https://github.com/teodesian Try and emphasize the specific kind of projects you have taken on and how big a win they were for your prior clients and employers. You need to remember that programming is just a superpower that makes you far more efficient at a job than your manual alternative. This, or something like it, is ultimately going to be the "bottom of the funnel", and you know you got a conversion when an email appears in your inbox.

Speaking of funnels, you need to understand how online marketing works in general. For those unfamiliar, generally you have a series of sites and steps a random yahoo goes thru before they convert into a client. The top of the funnel is always going to be a search engine or content aggregator (but I repeat myself). Example: Google, Twitter, LinkedIn, Facebook To be seen at this layer, you have to be regularly producing content so that the engines consider you "relevant".

Don't post your longer form content directly on the aggregators, but link instead to your site or substack, as that further boosts you in search engines. As long as it properly presents the social meta information with an appealing picture you will be set. (if you roll your own in perl, use HTML::SocialMeta). Be sure to end your content with some sort of call to action of the form "like this? Want to hire me? etc...". Remember that your potential clients aren't mind readers.

In general you should produce an article monthly, a short video or podcast weekly, and microblog about something at least daily. The idea is to produce both generally helpful and informative content which coincidentally makes obvious your expertise and links to your marketing pages. Don't use some of the more sleazy engagement hacks that insult the readers' intelligence. You want smart customers because unlike dumb customers, they actually have money. Repeat until you have more clients than you can handle.

If you are doing this right, (I by no means am perfect at this) you should get enough clients to fill your needs within 6 months or so. If not, you can consider using ads (which is a talk in and of itself) or use a gig board, which I'll let Brett fill you in on.

How much is enough?

The most common question I get from peers thinking about hoisting the black flag and saying "arr its a contractors life for me" is "what should I charge". The short answer is pick a number for monthly income that you reasonably expect will cover your expenses even during a dry spell. For me this was $10k, because it means even if all I get is 2 months worth of solid work my yearly expenses are covered; I'm a pretty frugal guy. As you might imagine this is extremely conservative; I've beat this goal for two years running by a fair margin. Do what's right for you.

So, how does your monthly income goal translate into an hourly rate? Depends on how steady you expect the work to be. Somewhere between $100 and $200 an hour works for me to reliably achieve my goal. That said, don't be afraid to charge more than usual for work you know upside and down, or which you can tell will be especially tricky. It's far from unheard of to do "lawyer rates" of $300 to $500 an hour for things which are specifically your specialty, and it's worth every penny for the client. They ultimately pay less by hiring an expert who can get it done in a fraction of the time, and you get away with your monthly goal in a week.

Similarly don't be afraid to offer introductory rates for people who are on the fence about the subject. If it looks like they'll have plenty of work for you it's worth doing until you have proven merit. If they don't want to pay full rate past the introductory period, let them know that you can't guarantee when it gets done because you have better paying work (or looking for that) jumping in front of them. They'll either straighten up, find someone else, or...it gets done when it gets done.

Long term your goal ought to be to either a) maximize your free time to invest in building a business of your own or b) maximize your income and minimize expenses so as to accelerate savings to then plow into capital. You'll likely do b) in pursuit of a), which is really just so you can further increase your free time via exponentially increasing your income per hour of time invested. Like with any other business you start, contracting pays even more poorly than salary when you are still fishing. All that up-front investment pays off though. It helps a lot if you get a head start while still employed, but practically nobody does this and even when you think you are ready, you aren't. That said, you just have to keep at it. You will eventually build enough clients and connections to be living your best life. A good site/resource about this is called "stacking the bricks". Keep making those small wins every single day and they truly do add up to something greater than the sum of its parts. As to the books you should actually read about sales and keeping customers, I would recommend Harry Browne's "The secret of selling anything" and Carl Sewell's "Customers for Life".

What are LLMs aiming at? 🔗 1681422028

🏷️ machine learning

Making money obviously. The question of course is how. I believe the straightest path from here to there is spam's more reputable cousin, SEO affiliate marketing. These online plagarists have conquered the top of the sales funnel for scribblers quite thoroughly. It is for this reason that things with sophisticated recommender algorithms like twitter have overtaken search engines for many hungry for the written word. The situation for video and youtube is little different.

This market for content creators is quite large, and I'm sure these aggregators would love to capture as much of the MRR from this as is possible. One straightforward way to do this is to do all the content creation yourself, but as we all know that does not scale well. LLMs have been built to solve that specific problem -- scaling content creation.

So long as the output is a variation on a theme, LLMs will eventually conquer it. Fiction & music will go first, as there are only so many archetypical stories, everything past that is embellishment and entirely de gustibus. Talking head newscasting (being little better than fiction) and pop journalism will be soon to follow. Similarly, punditry and commentary will be easily dominated, as it's already mindlessly chasing engagement mediated by algorithms. Online learning will also succumb much like technical documentation did to SEO spammers. Even more performative entertainment such as sports, video games and camming will likely be dominated by generative approaches within the decade.

All to chase that sweet, sweet subscription MRR business model that content creators have built up over the last 20 years. It's what has lead a great number of young people to claim they want to grow up to be "influencers". LLMs will gradually push prices down and result in consolidation of these forms of media production, ending this boom. The only remaining place for independent content creators will be to genuinely break new ground. Even then this will be quickly fed into the large models.

As such, I expect those of us who previously chose to engage in cookie-cutter content production (this includes much programming, being glorified glue) will be forced to either learn to drive these tools or find a new line of work. This is not necessarily a bad thing. There remain an incredible amount of things that still need doing, and more free hands will lighten that lifting. It will be inconvenient and painful for many, but it's hard to describe anyone's life without those two adjectives.

There will certainly be some blind alleys we walk down with this technology thanks to it enabling even easier mindless pursuit of irrational engagement. But this pain will not be permanent. People will adapt as always to the troubles of their time. We are, after all, still dealing with a social crisis largely brought on by pervasive legibility of our lives (read: surveillance) enabled by technology. In an era where everyone has a public "permanent record" online, people would do well to remember that forgiveness is a virtue. Perhaps automating the "internet hate machine" will make us remember.

So you want a decentralized social network 🔗 1674153305

🏷️ social 🏷️ dns

Services such as mastadon and nostr are doing way, way too much. Why in the hell would you want to get into the content management and distribution game when we already have really great systems for doing that? If you want something with a chance of working, you need to do it using entirely COTS components. You are in luck, because we have all of these and the problems are well understood.

The core problem is actually content indexing, so that users can filter by author, date and tag. Software that does this (such as ElasticSearch) is very common and well understood. So what is the missing link? Content sources need to make it easier on the indexers, so that you don't have to be an industrial gorilla like Google to get it done.

How do we make this easier? Via DNS and RSS. All that's missing are TXT records to:

Provide a URI with the available tags/topics/authors at the domain (authors are actually tags after all)
Provide a template string which we could interpolate the above two (and date / pagination info) into in order to grab the relevant RSS feed
Provide a template string describing how users can reply to given posts

Nearly any CMS can do this with no code changes whatsoever. Anyways, this allows indexers to radically reduce the work they have to do to put content into the right buckets. It similarly provides a well understood means by which clients can interact with posts from nearly any CMS.

From there retweets are actually just embeds tagged with the RT'd username and RT author. Similarly, replies are just new posts but with an author from another server, hosted locally. Facilitating this would likely require some code change on the CMS end of things, but it would be quite minimal.

The fun part is that this is so flexible, you could even make it a "meta" social network (it really is unfortunate Facebook camped this name) which pulls in posts from all the big boys. That is supposing they actually published DNS records of this kind. No such cooperation would ever be forthcoming, so such a social network would necessarily be limited to people with hosting accounts.

This is of course the core reason we do not and will not have decentralized social networking despite all the tools we need being right here, right now. This is not to say that such a system is not worth implementing, or that it would not eventually replace our existing systems.

The simple reality is that the users themselves are the core problem. The hordes of freeloaders who want free attention will always far outnumber those willing to pay for a hosting account to interact with people online. As such, having to monetize these people will necessarily result in the outcome we have today, repeated ad infinitum.

Any user of such a decentralized system would have to adjust their expectations. Are people willing to sacrifice nothing to interact with you really worthy of your time? Maybe being a bit more exclusive isn't such a bad thing. This is why the phenomenon of "group chats" has become ubiquitous, after all.

Nevertheless, I find all the group chat solutions such as Matrix to be overcomplicated. Would that they have taken such an approach to solve their coordination problems as well.

25 most recent posts older than 1674153305

Prev Next Size: Jump to: