🌐
Videos Blog About Series πŸ—ΊοΈ
❓
πŸ”‘
/users/George S. Baugh:

George S. Baugh πŸ”— 1607536598  



So you want to use client certificates instead of HTTP simple auth πŸ”— 1706313410  

🏷️ blog 🏷️ ssl 🏷️ dns

In an earlier essay, I went over the sticky reality that is the CA infrastructure. I'd like to discuss a related subject, which is why nobody uses client certificates to restrict access to and authenticate users of websites, despite them being "supported" by browsers for many years. For those of you unfamiliar with the concept, it goes like this:

  • I issue a certificate for $USER, just like you would if you were a CA and $USER were a vhost.
  • $USER installs this certificate in their browser, (not-so-optionally) inputting a password to unlock it.
  • $USER opens a web page configured with the Issuer's CABundle, which asks them if they'd like to identify themselves with the cert they installed
  • $USER clicks yes and goes on their merry way.

There are major hiccups at basically every single step of this process. Naturally, I've got some ideas as to how one might resolve them, and will speculate as to why nobody's even remotely considered them.

Generating Client Certificates

First, if you want to generate certs like a CA does, you have two choices -- self signed, or become an "Intermediate" CA with delegated authority from a bigger CA. The big trouble with this is that getting delegation will never happen for anyone without serious juice in the industry, as it can potentially incur liability. This is why it is observed the only parties that generally use client certificates at all are those which in fact are Intermediate CAs, such as google, facebook and the like. On the other hand, if you go with self-signing, the user that imports the certificate has to import the full chain, which now means the issuer can issue certs for any site anywhere. Yes, the security model for CAs is that laughingly, astonishingly bad; this is why CAA records exist to limit the damage caused by this.

What is needed here is to dump Mozilla::CA, /etc/ssl/certs and all that towering pile of excresence in favor of a reverse CAA record. If we placed the fullchain.pem for each CA in a DNS record for a domain, we could say that this PEM is valid to sign things under this domain. For the big boys, they'd get the root zones to publish records with their PEM, and could go on signing anything and everything. However for the individual owners of domains this finally frees them to become intermediate CAs for their own domains only, and thereby not expose the delegator to potential liability. LetsEncrypt could be entirely dismantled in favor of every server becoming self-service. Thanks to these being DNS lookups, we can also do away with every computer on earth caching a thousand or so CABundles and having to keep them up to date into perpetuity.

With that implemented, each server would look at say /etc/server.key, or perhaps a hardware key, and it's software could then happily go about issuing certs to their hearts desire. One of the firms with juice at the IETF are the only ones who will move this forward, and they don't care because this is not a problem they have to solve. That leaves Pitching this as a new source of rents for the TLD authorities; I'm sure they'd love to get the big CAs to pay yasak. This could be the in to get domain owners to start paying CAs again -- nominal fee, you can sign for your domain. It's a price worth paying, unlike EV certs.

Installing client certificates

Every single browser implements this differently. Some use the built in OS key store, but the point is it's inevitably going to be putzing around in menus. A far better UX would be for the browsers to ask "hey, do you have a certificate, the page is asking for one", much like they prompt for usernames and passwords under http simple auth. This would probably be the simplest problem of these to solve, as google themselves use client certs extensively. It is a matter of academic curiosity why they have failed as of yet to scratch their own back, but a degree of schlep blindness ought to be expected at a tech firm.

Furthermore, while blank passwords are supported by openssl, some keystores will not accept this. Either the keystores need to accept this, or openssl needs to stop this. I consider the latter to be a non-starter, as there is too much reliance on this behavior everywhere.

But which cert should I use?

Browser support for having multiple certs corresponding to multiple possible logins is lacking. Separate profiles ought to do the trick, but keystores tend to be global. This problem would quickly sort itself out given the prior issues get solved as part of an adoption campaign.


The IPv6 debate is unnecessary πŸ”— 1705532991  

🏷️ blog

Since Amazon is about to start charging an arm and a leg for IPv4 addresses, many have begun talking about the imminent migration to ipv6, which won't happen because ISPs still haven't budged an inch as regards actually implementing this. What's more likely is that everyone will raise prices and enjoy monopoly profits rather than upgrading decrepit routing equipment.

What's worst about this situation is that the entire problem is unnecessary in the era of ubiquitous gigabit internet. Suppose you need directions to 612 Wharf Avenue. You simply consult a map, and note down "right on dulles, left on crenshaw..." until you get to the destination. This only has to be done once, and reversed on the return trip. This is essentially how BGP works under the hood.

So the question arises: Why do we have glorified phone numbers passed in every IP packet? Performance and cost. Encoding down into a bag of 4 or 16 bytes is less work than reading 253 bytes (max for a domain name). But let's be real, it's not much less work, especially if we adopt jumbo frames by default. This is fully in the realm of "hurl more hardware & bandwidth at the problem".

The benefits to doing so are huge. You whack an entire layer of abstraction; DNS translation alone adds more latency than the overhead of passing these names in every packet. Much like how the Telcos whacked the POTS network and now emulate it over SIP, you could emulate the v4 system where needed and move on. Self-Hosted DNS (delegation) would still be possible; just like now you ultimately have to have A records for your nameserver(s) with your registrar or ISP. They would adapt the means they already use for IPs to map their own internal network topology. This scheme would have the added benefit of being able to do away with PTR records entirely.

The prospects for this happening anytime soon are quite grim, as I've never even heard anyone discuss how obviously unnecessary the IP -> DNS abstraction layer is. More's the pity; get yourself a /24 while you can.


Reliability is the vast majority of the cost: why lambdas are a thing πŸ”— 1699907816  

🏷️ blog

Getting websites up and running is relatively straightforward. Automatically configuring websites to run them is a bit more work. Configuring DNS to automatically fail-over is another thing entirely. This involves three separate disciplines, and is actually irreducibly complex.

First, you need at least 2 load balancers with Round Robin DNS that proxy all requests to your application. Realistically this is going to be HAProxy or something equivalent. The idea is that one of the proxies failing, or the backends failing, is not going to bring down your application. Unfortunately, this means that your backend data source has to also become robust, or you are simply shuffling around your point of failure. Now you also need to learn about software like pgPool to abstract away connecting to multiple replicated databases.

Even if you manage to operationalize the setup of all these services via scripts or makefiles, there's still the matter of provisioning all these servers which must necessarily be on different IPs, and physically located apart from each other. Which leads you to operationalize even the provisioning of servers with mechanisms such as terraform, kubernetes and other orchestration frameworks. You now likely also have to integrate multiple hosting vendors.

All of this adds up quickly. 99% of the cost of your web stack is going to be in getting those last few sub 1% bits of reliability. Even the most slapdash approach is quite likely going to respond correctly over 99.99% of the time. Nevertheless, we all end up chasing it, supposing that the costs of building all this will pay for itself versus the occasional hours of sysadmin time. This is rarely the case until the scale of your business is tremendous (aka "A good problem to have").

So much so that it was quite rare for even the virtualization providers to adopt a "best practices" stack which was readily available to abstract all this nonsense away. That is until the idea of "lambdas" came to be. The idea here is that you just upload your program, and it goes whir without you ever having to worry about all the nonsense. Even then these come with significant limitations regarding state; if you don't use some load balancing "data lake" or DB as a service you will be up a creek. This means even more configuration, so the servers at intermediate layers know to poke holes in firewalls.

The vast majority of people see all this complexity and just say "I don't need this". They don't or can't comprehend layers of abstraction this deep. As such it's also lost on them why this all takes so much time and effort to do correctly. If you've ever wondered why there's so much technical dysfunction in software businesses it's usually some variant of this. Without developers feeling the pain of the entire stack they make decisions that kneecap some layer of this towering edifice.

It's why ops and sysadmins generally have low opinions of developers; everything the devs give them reliably breaks core assumptions which are obvious to them. These devs are of course siloed off from ops, and as such the problems just rot forever. What should have been cost saving automation is now a machine for spending infinite money on dev/ops. Error creeps in and productivity plummets, as it does with any O-Ring process.

As you can see the costs are not just in computational resources, but organizational complexity. Great care must be taken in the design and abstraction of your systems to avoid these complexities.


Have Perl, Will Travel: 4 years slinging perl independently πŸ”— 1691710279  

🏷️ blog

My batman origin story

After having done the independent software contractor gig for the last few years, I suspect I've learned enough about the process to give a good primer for those interested but that have not taken the plunge yet. The best reason to become a contractor is simple. Because it's the best way to make a direct impact for people and businesses. If you want to make serious money, helping wealthy people become wealthier via the outsized impact of your skill is how that happens.

The option to escape: How hired guns are made

Before I started out on the course of being a hired gun with a specific and time-limited goal in mind, I had achieved about everything you could as a developer short of entering management at a decently large software firm. Like most "Staff Engineers" I knew where most issues with the codebase were lurking, or could find out within an hour due to intimate familiarity. I'd also accumulated a number of critical systems under my belt that were nearly 100% written by myself. Similarly, I had multiple apprentices and was frequently one of the few who could answer questions cropping up in developer chat, or debugging thorny customer issues with senior support personnel. Practically anyone who will actually succeed as a hired gun needs the sort of drive to have achieved such things already. I've heard them called "glue stick" people, as they are what holds organizations together by and large.

Anyone who gets here will inevitably make management nervous both because their invisible influence is often more powerful than management's. Also, people like this are pretty systematically underpaid for the sort of effort they actually put in. It was doubly so in my case, as I've long been debt free, unencumbered by family and had been developing successful non-programming business on the side. In short, they recognized quickly that I was both essential to the organization and trivially capable of leaving. Corporate is uncomfortable around people who aren't over a barrel and can afford to rock the boat. Ever known a manager without a family and that isn't in debt up to their eyballs? me neither. It takes a lot of desperation to win the single-elimination ass-kissing tournament.

To be fair, I had ruthlessly leveraged this to secure far higher pay than they admitted to paying people with their "transparency" report on salaries they released to us hapless minions. It was at this point I began to notice signs that a case was being built against me. When I was inevitably pushed out, I was ready.

At the initial signs I started planning how to continue doing the sort of essential, well-paid work I enjoy doing but without this expectation of being handcuffed to one firm or another. This is because my financial plan required a bit more capital to do what I actually want to; start a software firm myself. I have managed to do this quite a bit ahead of schedule thanks to this actually getting paid for the maniacal amount of hours I actually work. I'm currently wrapping up a number of these. All so I can endure being paid nothing to work harder at starting up my own business for some time. Perhaps being a deranged masochist is the actual mechanism at work here.

Welcome to the Kayfabe

When you finally take the plunge a number of illusions will quickly fall away as you start speedrunning through organizations and individuals needing help. Invariably, a brand's reputation generally has an inverse relationship to its actual quality. You find that the companies with the most fanatically loyal customers power this all with the most atrocious pile of shit you can imagine. If you didn't yet believe that "worse is better" you will quickly be disabused of this notion. Every successful organization is somewhere on the journey of "prototype in production" to actually good software.

Keeping up appearances and managing customer expectations such that they remain sated necessarily steals time from the sort of ruthless quality control and brutal honesty necessary for good software. If you've ever wondered why LKML and P5P have been rivers of flame and reliable drama-generators over the years, this would be why. Appearing competent necessarily removes the mechanisms that force participants to actually become competent, and these tensions will always be present. I've seen this slowly corrupting software organizations subject to regulation such as Sarbanes-Oxley. If you ever wonder why a developer chat is dead as a doornail, there's probably a great deal of concern with "face" involved.

In this new Army, no one could afford to tell the truth, make an error, or admit ignorance. David Hackworth "About Face"

To succeed as a contractor, you will actually have to embrace this for good and ill. The best paying customers are always the large orgs with huge problems, and they almost never want to hear the unvarnished truth save as a last resort. The niche for you to fill in order to be well paid is the guy who steps in precisely at that last resort. Being an outsider, you don't care about your ability to advance in the firm. You will naturally be able to see the problem clearly due to not being awash in the control fraud they've been feeding themselves. Similarly, you will be able to take risks that people concerned with remaining employed are not capable of taking. This will allow you to make (and implement!) the actual solutions to their problems in a prompt manner. You'll look like a master despite being at a severe knowledge disadvantage versus their regulars.

That said, you can only lead a horse to water. Sometimes they will still fail to drink even when to do so will save their life. As such you can't get too attached to the outcome of your projects. Many of your projects will in fact fail due to these organizational reasons. I've been on projects that dropped us in favor of incompetents that were happy to lie all day.

You should neglect to mention this if you value your ability to secure new contracts in the future. Focus instead on the improvements you can and do make when describing the impact you have made for customers. You just sound like a whiner if you focus on this stuff, because every large organization has a case of this disease, and is blissfully ignorant. They also don't want to hear about how they might go about curing themselves of this, despite it being a fairly well understood subject. Happy customers is largely a matter of expectations management; e.g. Don't "Break the spell". Every job, to some degree, is acting.

Aside from these sorts of jobs which have big impacts, firms will want people to implement things they percieve as not worth building permanent expertise in. These are usually trouble free, fast work. Great when you can get it.

Working with individuals and marketing yourself

If you don't like dealing with corporate buffoonery all day, you can still make it by helping individuals and small organizations out so long as you juggle many of them at a time. These inevitably come from referrals, job boards and cold reach-outs from people reading your marketing and sales materials.

Speaking of marketing, CPAN and github are my best marketing, believe it or not. Your portfolio of Open source software is usually a strong indicator of where your actual strengths as a programmer lie. I've picked up a few clients already that reached out to me cold because of this. There are a number of simple things you can do to make sure this is more effective.

You can create a repository with the same name as your github account, and the Readme.md therein will be presented instead of your normal github user page. Example: https://github.com/teodesian Try and emphasize the specific kind of projects you have taken on and how big a win they were for your prior clients and employers. You need to remember that programming is just a superpower that makes you far more efficient at a job than your manual alternative. This, or something like it, is ultimately going to be the "bottom of the funnel", and you know you got a conversion when an email appears in your inbox.

Speaking of funnels, you need to understand how online marketing works in general. For those unfamiliar, generally you have a series of sites and steps a random yahoo goes thru before they convert into a client. The top of the funnel is always going to be a search engine or content aggregator (but I repeat myself). Example: Google, Twitter, LinkedIn, Facebook To be seen at this layer, you have to be regularly producing content so that the engines consider you "relevant".

Don't post your longer form content directly on the aggregators, but link instead to your site or substack, as that further boosts you in search engines. As long as it properly presents the social meta information with an appealing picture you will be set. (if you roll your own in perl, use HTML::SocialMeta). Be sure to end your content with some sort of call to action of the form "like this? Want to hire me? etc...". Remember that your potential clients aren't mind readers.

In general you should produce an article monthly, a short video or podcast weekly, and microblog about something at least daily. The idea is to produce both generally helpful and informative content which coincidentally makes obvious your expertise and links to your marketing pages. Don't use some of the more sleazy engagement hacks that insult the readers' intelligence. You want smart customers because unlike dumb customers, they actually have money. Repeat until you have more clients than you can handle.

If you are doing this right, (I by no means am perfect at this) you should get enough clients to fill your needs within 6 months or so. If not, you can consider using ads (which is a talk in and of itself) or use a gig board, which I'll let Brett fill you in on.

How much is enough?

The most common question I get from peers thinking about hoisting the black flag and saying "arr its a contractors life for me" is "what should I charge". The short answer is pick a number for monthly income that you reasonably expect will cover your expenses even during a dry spell. For me this was $10k, because it means even if all I get is 2 months worth of solid work my yearly expenses are covered; I'm a pretty frugal guy. As you might imagine this is extremely conservative; I've beat this goal for two years running by a fair margin. Do what's right for you.

So, how does your monthly income goal translate into an hourly rate? Depends on how steady you expect the work to be. Somewhere between $100 and $200 an hour works for me to reliably achieve my goal. That said, don't be afraid to charge more than usual for work you know upside and down, or which you can tell will be especially tricky. It's far from unheard of to do "lawyer rates" of $300 to $500 an hour for things which are specifically your specialty, and it's worth every penny for the client. They ultimately pay less by hiring an expert who can get it done in a fraction of the time, and you get away with your monthly goal in a week.

Similarly don't be afraid to offer introductory rates for people who are on the fence about the subject. If it looks like they'll have plenty of work for you it's worth doing until you have proven merit. If they don't want to pay full rate past the introductory period, let them know that you can't guarantee when it gets done because you have better paying work (or looking for that) jumping in front of them. They'll either straighten up, find someone else, or...it gets done when it gets done.

Long term your goal ought to be to either a) maximize your free time to invest in building a business of your own or b) maximize your income and minimize expenses so as to accelerate savings to then plow into capital. You'll likely do b) in pursuit of a), which is really just so you can further increase your free time via exponentially increasing your income per hour of time invested. Like with any other business you start, contracting pays even more poorly than salary when you are still fishing. All that up-front investment pays off though. It helps a lot if you get a head start while still employed, but practically nobody does this and even when you think you are ready, you aren't. That said, you just have to keep at it. You will eventually build enough clients and connections to be living your best life. A good site/resource about this is called "stacking the bricks". Keep making those small wins every single day and they truly do add up to something greater than the sum of its parts. As to the books you should actually read about sales and keeping customers, I would recommend Harry Browne's "The secret of selling anything" and Carl Sewell's "Customers for Life".


What are LLMs aiming at? πŸ”— 1681422028  

🏷️ blog

Making money obviously. The question of course is how. I believe the straightest path from here to there is spam's more reputable cousin, SEO affiliate marketing. These online plagarists have conquered the top of the sales funnel for scribblers quite thoroughly. It is for this reason that things with sophisticated recommender algorithms like twitter have overtaken search engines for many hungry for the written word. The situation for video and youtube is little different.

This market for content creators is quite large, and I'm sure these aggregators would love to capture as much of the MRR from this as is possible. One straightforward way to do this is to do all the content creation yourself, but as we all know that does not scale well. LLMs have been built to solve that specific problem -- scaling content creation.

So long as the output is a variation on a theme, LLMs will eventually conquer it. Fiction & music will go first, as there are only so many archetypical stories, everything past that is embellishment and entirely de gustibus. Talking head newscasting (being little better than fiction) and pop journalism will be soon to follow. Similarly, punditry and commentary will be easily dominated, as it's already mindlessly chasing engagement mediated by algorithms. Online learning will also succumb much like technical documentation did to SEO spammers. Even more performative entertainment such as sports, video games and camming will likely be dominated by generative approaches within the decade.

All to chase that sweet, sweet subscription MRR business model that content creators have built up over the last 20 years. It's what has lead a great number of young people to claim they want to grow up to be "influencers". LLMs will gradually push prices down and result in consolidation of these forms of media production, ending this boom. The only remaining place for independent content creators will be to genuinely break new ground. Even then this will be quickly fed into the large models.

As such, I expect those of us who previously chose to engage in cookie-cutter content production (this includes much programming, being glorified glue) will be forced to either learn to drive these tools or find a new line of work. This is not necessarily a bad thing. There remain an incredible amount of things that still need doing, and more free hands will lighten that lifting. It will be inconvenient and painful for many, but it's hard to describe anyone's life without those two adjectives.

There will certainly be some blind alleys we walk down with this technology thanks to it enabling even easier mindless pursuit of irrational engagement. But this pain will not be permanent. People will adapt as always to the troubles of their time. We are, after all, still dealing with a social crisis largely brought on by pervasive legibility of our lives (read: surveillance) enabled by technology. In an era where everyone has a public "permanent record" online, people would do well to remember that forgiveness is a virtue. Perhaps automating the "internet hate machine" will make us remember.


So you want a decentralized social network πŸ”— 1674153305  

🏷️ blog 🏷️ social 🏷️ dns

Services such as mastadon and nostr are doing way, way too much. Why in the hell would you want to get into the content management and distribution game when we already have really great systems for doing that? If you want something with a chance of working, you need to do it using entirely COTS components. You are in luck, because we have all of these and the problems are well understood.

The core problem is actually content indexing, so that users can filter by author, date and tag. Software that does this (such as ElasticSearch) is very common and well understood. So what is the missing link? Content sources need to make it easier on the indexers, so that you don't have to be an industrial gorilla like Google to get it done.

How do we make this easier? Via DNS and RSS. All that's missing are TXT records to:

  1. Provide a URI with the available tags/topics/authors at the domain (authors are actually tags after all)
  2. Provide a template string which we could interpolate the above two (and date / pagination info) into in order to grab the relevant RSS feed
  3. Provide a template string describing how users can reply to given posts
Nearly any CMS can do this with no code changes whatsoever. Anyways, this allows indexers to radically reduce the work they have to do to put content into the right buckets. It similarly provides a well understood means by which clients can interact with posts from nearly any CMS.

From there retweets are actually just embeds tagged with the RT'd username and RT author. Similarly, replies are just new posts but with an author from another server, hosted locally. Facilitating this would likely require some code change on the CMS end of things, but it would be quite minimal.

The fun part is that this is so flexible, you could even make it a "meta" social network (it really is unfortunate Facebook camped this name) which pulls in posts from all the big boys. That is supposing they actually published DNS records of this kind. No such cooperation would ever be forthcoming, so such a social network would necessarily be limited to people with hosting accounts.

This is of course the core reason we do not and will not have decentralized social networking despite all the tools we need being right here, right now. This is not to say that such a system is not worth implementing, or that it would not eventually replace our existing systems.

The simple reality is that the users themselves are the core problem. The hordes of freeloaders who want free attention will always far outnumber those willing to pay for a hosting account to interact with people online. As such, having to monetize these people will necessarily result in the outcome we have today, repeated ad infinitum.

Any user of such a decentralized system would have to adjust their expectations. Are people willing to sacrifice nothing to interact with you really worthy of your time? Maybe being a bit more exclusive isn't such a bad thing. This is why the phenomenon of "group chats" has become ubiquitous, after all.

Nevertheless, I find all the group chat solutions such as Matrix to be overcomplicated. Would that they have taken such an approach to solve their coordination problems as well.


Web5 and Decentralized Identity: The next big thing? πŸ”— 1662592980  

🏷️ blog

The idea of the decentralized ID folks is to have some means to identify an online entity is who they say they are, mostly to comply with the orwellianly named Bank Secrecy Act and it's many international equivalents imposing some level of "know your customer" (read: snitching to the tax man). That's of course not the only use, but I sure as hell ain't sharing personal info if I don't have anything to gain by doing so. As such, the success of such projects are inherently hitched to whether they can dethrone the payment processors -- after all credit cards are the most real form of ID there is.

Why do these blockchain guys think they're going to succeed when email and DNS have all the tools to do precisely this right now off the shelf, but nobody does? Encrypted email is solved by putting in and adopting an RFC to slap public keys in DNS records, and then have cPanel, plesk and the email majors hop on board. You could then layer anything you really want within that encrypted protocol, and life's good right? Of course not. Good luck with reliable delivery, as encryption breaks milters totally. This is one of the key advantages to web5, as it imposes transaction costs to control spam.

Even then, it could probably work from a technical point of view. Suppose you had a "pay via email" product, where you enter in the KYC foo like for a bank account, and now your email and PGP keys are the key and door to that account. Thanks to clients not reliably sending read reciepts, some TXs will be in limbo thanks to random hellbans by servers. How long do you wait to follow up with a heartbeat? You'd probably want tell users that if you don't respond to the confirmation emails within some time, they are dropped. Which inevitably means the transaction is on hold until you are double sure, making this little better than putting in a CC on a web form.

This problem exists even in meatspace with real passports. Nations have invented all manner of excuses to prevent the free movements of peoples and goods for reasons both good and ill. Do not think that people will fail to dream up reasons to deny this decentralized identity of yours from delivering messages to their intended recipients. Much like bitcoin, all it takes is people refusing to recognize your "decentralized self sovereign identity card" at a whim. The cost to them of denying you is nothing, and the cost of accepting high. This reduces the usefulness of the whole tech stack to 0.

At the end of the day, if you don't have a credible plan to straight-up beat Visa and Mastercard in every KPI, your DeFi project will be stillborn:

  • You have to be able to handle transactions in all known jurisdictions and currencies. Prepare to drown in red tape.
  • You have to be accepted at every financial institution otherwise merchants will say no. Prepare to drown in red tape.
  • You have to have lower merchant fees, despite suffering the above red tape and Visa/MC putting their thumb on the scale to kill competitors like you
  • You have to have lower interest rates / client fees as well. Gotta get them airline miles and cash back too!
  • You have to be better at detecting/absorbing fraud, yet be as convenient as an unencrypted magnetic stripe
Seeing that laundry list, you may conclude that anyone actually trying "have to be insane" as the last bullet point. I have yet to see any such plan by any person in the web3 or "web5" space.

The most credible plan I can possibly think of (other than wait for a systemic crisis to unseat these majors) would be to make such a product and intend to sell it to Visa/MC as a solution to their own internal problems. It's either that or try and grow in jurisdictions the majors don't care about, which comes with its own unique set of problems. In that situation, email might actually be the answer.


Why is email deliverability so hard? πŸ”— 1662383538  

🏷️ blog

In short, it's because spammers use Judo throws against the mail providers who, in their exhaustion overreact. Generally during a flood of spam what providers will do is whack an entire /24 of IPs, taking the "Kill 'em all and let god sort them out" strategy. It is for similar reasons many servers flat-out block IPs originating from other countries.

Anyhow, this has lead to plenty of headache for services which market themselves on deliverability. Mailchimp, sendgrid and Amazon SES all have to keep a hold of far more IPs than they need at any given time to keep themselves squeaky clean. They also have to rate-limit and aggressively ban anyone sending what looks like spam either by AI analysis or bayesian filtering. Spammers on the other hand aren't dummies and they have vast resources at their command. It's straightforward to brute-force reverse engineer which markov chains actually get thru, as mail servers normally tell the sender why they failed to deliver a message.

At a certain scale this becomes a real problem. After a spammer has crafted a message they know will sail past the filters, they then can hook up to the "reputable" mailers as relays in parallel and shoot out a huge wave of un-interceptable spam before anyone can do anything about it. Everyone in the value chain gets mad, and massive overreactions happen thanks to this harming the bottom line.

The most important of these is to return 250 OK to messages which are known to be bad, and then silently deleting them. This leaves the spammer none the wiser. It's essentially a "hellban" where they will simply waste resources drilling a dry hole. Crafting un-interceptable messages is essentially impossible under this regime, and the spammers have to go back to their old tricks of smearing wide swaths of previously clean IPs with their garbage.

On the other hand, legitimate servers in these IP blocks get run over. Even servers which have been on the same IP for years get overwhelmed reputationally as they would have to send out far more volume of email to build enough reputation to overcome the insane volume of spam coming out of sibling IPs. Worse yet, there is no sign of anything wrong and no recourse whatsoever. You simply find out your email has NOT been delivered a few weeks later after its cost you and your business untold quantities of money.

As such the only real way to protect yourself is buy a huge IP block, and basically don't use any but one of them for mail. It's a "you must be this tall to ride" mechanism, like much of the rest of the internet has become. Either that or you have to sidle up to a firm which has a big IP block (and ruthlessly protects it) via forwarding with things like postsrsd & postforward.

In short the only cure for spam that has worked is steadily increasing the marginal cost of sending a message. Many feel anxious about this, as they realize anything they have to say is essentially going to be held captive to large firms. The weak are crushed into their usual condition of "nothing to say, and no way to say it".

As always, prosperity is a necessary condition of your freedom of action. This being a somewhat uncomfortable truth is probably why the reality of the Email situation is seldom discussed.


Web Programming seems to finally be standardizing. How did we get here? πŸ”— 1661204858  

🏷️ blog
entirely by accident

An old (ish) trick to speed up webpages is using sendfile() to DMA files to a socket. Nowadays you use SSL_sendfile and Kernel TLS, (optionally offloaded to a specialized TLS processor) but you get the idea. Bypass the CPU and just vomit data out the NIC.

Couple that with the natural speed benefits to the "god algorithm" (already knowing the answer, e.g. caching) and the strength of static rendering frameworks became clear to everyone. That said, static renderers didn't really catch on until recently and even now dynamic renderers are the overwhelming majority of pages out there. This is because building a progressive render pipeline that is actually fast and correctly invalidates caches at each step is not an immediately obvious design.

Templating engines tend to encourage this approach, as they all have some kind of #include directive. The step from there to static renders requires integration with the data model, so that re-renders can detect changes in the underlying data. Just like how strict typing helps optimize compiled programs, well-structured data aids template renderers in reasoning about when to re-render. In every imperative program this is how the actual source is linked and built. It has been fun watching JS and typescript frameworks re-learn these old lessons the hard way as they get frustrated with long build times.

The trouble then comes down to how do you serve this data up to the browser? You can't simply hand it two HTML documents, one static and the other dynamic without using things like frames (inline or via frameset). The best you can do is use JavaScript to insert data into a page. Even then, if you insert new DOM, this will be slow. It is much faster to *only* flesh out missing data in a fully formed interface, and juggle visibility based on whether the data is loaded or not.

This is obviously far from the original promise of HTML's declarative nature. Nevertheless, it means the only performant strategy is to divorce the interface from the data and fill on the client side. If there were some standard means (say via header in a HEAD request, or link tags) to instruct browsers to fetch JSON DATA sections to fill the innerText of various selectors with we could perhaps do away with nearly all XHRs and spinners on cold-loads entirely. If you could do it on cold-loads, you could also do it within documents and on-the-fly, leaving the only role for JS to be managing state transitions. Alas, that ship has probably sailed for good.

HTML has become a widget toolkit rather than means to create documents as it was originally envisioned. This happened because it was not openly trying to be a cross-platform widget toolkit and thus this aspect was not actively suppressed by the OS vendors. I don't think it's a coincidence that Javascript is now the fastest growing programming language, despite frequently being hated more than PHP over the past 20 years. Worse is better works to some degree because those engaged in anti-competitive practices don't see things that are worse than their crap as a real threat. HTML/CSS/JS was a far worse widget toolkit than any of its competitors until relatively recently.

This is not to say that the browser wars and repeated embrace, extend, extinguish attempts by Microsoft and other vendors didn't come very close to killing HTML/CSS/JS. They very much wanted their own document standards to succeed. As a result you still see a bunch of Word and PDF documents passed around online. Things stagnated for a good long time as a result of this. But when it turned out the OS vendors were too dysfunctional in the wake of the dotcom crash to actually build something better, forward motion slowly resumed.

Despite the OS vendors rightly seeing the threat to their business open web standards represented, they proved too useful to the new titans of tech. Being the things powering social (read:advertising) networks ability to reach into everyone's pockets ultimately tied the hands of OS vendors who had for decades prevented anything truly cross-platform from working well. The stars have finally aligned and the OS wars are mostly over. Hooray.

This is largely what is behind some of the questionable activities of the WHATWG. The ham-fisted imposition of DRM and slavish pursuit of what the ad networks want has sent the web down some blind alleys of late. Nevertheless it's clearly not in their interest to deliberately kneecap the web and pages being capable of performing well.

Anyways, since all this API data is going to require a hit to the CPU to stream it, it must by necessity be returned in very small chunks if it can't be delivered and stored persistently on the client side for future reference. Hopefully the entire API to serve this stuff can fit inside cache. This requires a uniform design to your backing data that can be queried simply. Dare we say with a standard query language.

What I am observing is that the only role left for programming languages other than JavaScript in the userspace is as batch processors and API servers that are glorified proxies to SQL servers. Even then Node is a strong contender for use in those jobs too. Thanks to recent developments such as tauri, we might actually get truly cross platform interfaces and even window managers out of the deal.


Technical solutions to People Problems πŸ”— 1658410198  

🏷️ blog

Oftentimes you will encounter a number of standards enforcement mechanisms to prevent the junior programmers who don't know any better (and the senior ones who should know better) from doing dumb stuff. When these are enforced at build time, it is usually quite fine, as it is not very costly. However, some of them are quite costly, as they are essentially runtime or interpreter modifications.

I grant that in a few exceptions, there is no other solution than to do so. Most of the time there is always a behavior modification which is sufficient, especially with proper incentivization. For example, do you go out and buy those fancy mitre saws that know how to stop from cutting off your finger, or do you just take care around circular saws? Of course you simply take care.

That said, at a certain scale stupidity will always creep in, and the overriding impulse is to insulate yourself and the firm from their impact. Overcomplicated safety devices and insurance schemes result, when the proper remedy is to fire careless people. Just like people will write weeks of code to avoid hours of meetings, they will also install huge and complicated gimcracks rather than confront careless people.

This impulse to avoid conflict is the root of many evils in the firm. Like in relationships, who cares if you make the other person mad? Sometimes making people feel something is the only way to get your message across. At the end of the day, they'll either stick around or leave; demonstrated preference reveals the truth obscured by clouds of emotion. And there are always more people.


Idiot-Proofing Software: Me worry? πŸ”— 1651531040  

🏷️ blog

I read a Warren Buffet quote the other day that sort of underlines the philosophy I try to take with my programs given the option:

"We try to find businesses that an idiot can run, because eventually an idiot will run it."
This applies inevitably to your programs too. I'm not saying that you should treat your customers like idiots. Idiots don't have much money and treating customers like they are upsets the smart ones that actually do have money. You must understand that they can cost you a lot of money without much effort on their part. This is the thrust of a seminal article: The fundamental laws of human stupidity.

This is why many good programs focus on having sane defaults, because that catches 80% of the stupid mistakes people make. That said, the 20% of people who are part of the "I know just enough to be dangerous" cohort (see illustration) cause 80% of the damage. Aside from the discipline that comes with age (George, why do you charge so much?), there are a few things you can do to whittle down 80% of that dangerous 20%. This usually involves erecting a Chesterton's Fence of some kind, like a --force or --dryrun option. Beyond that lies the realm of disaster recovery, as some people will just drop the table because a query failed.

This also applies to the architecture of software stacks and the business in general (as mentioned by Buffet). I see a lot of approaches advocated to the independent software vendor because "google uses it" and similar nonsense. They've got a huge blind spot they admit freely as "I can't count that low". What has resulted from this desire to "ape our betters" is an epidemic of swatting flies with elephant guns, and vault doors on crack houses. This time could have been spent building win-wins with smart customers or limiting the attack surface exploited by the dumb or malicious.

So long as you take a fairly arms-length approach with regard to the components critical to your stack, swapping one out for another more capable one is the kind of problem you like to have. This means you are scaling to the point you can afford to solve it.


uWSGI and the principle of least astonishment πŸ”— 1650324165  

🏷️ blog

I've been wanting to migrate tCMS to uWSGI for some time now because it has several nice features beyond what any one thing on CPAN offers:

  • Built in virtual hosting equivalent
  • Ability to re-load the code whenever it detects changes, simplifying deployments
  • Automatic worker scaling
  • HUP to reload when needed
  • Auto-reloads of workers after X requests (some scripts are like Patriot missile batteries, and need a reboot every 24hrs)
There are of course modules on CPAN to do any one of these. That said, getting familiar with this seems useful, given it supports many programming languages with WSGI-ish interfaces. It also has a "stats server" which gives you a lot of aggregate introspection for free, an *api* you can use to add vhosts on the fly, and more.

To get this working you need to make sure its perl plugin is installed (search your package manager of choice) or follow the compilation instructions. Once I got a good configuration file (the distinction between the socket and http-socket field is ths most obvious gotcha), I got a page loaded immediately.

Then I ran into a big problem. The way I store static renders is essentially as a raw dump of what I'd print out were it a CGI script. I open a filehandle, read until the double newline, parse the headers and pass them and the filehandle on to starman. On starman and other psgi servers on CPAN, this follows the "principle of least astonishment" and reads the filehandle as I handed it to them. uWSGI on the other hand grabs the filename from the handle and then just serves it up if the 'path' property is set (e.g. it's an IO::File instance). This obviously resulted in a double-header print.

As such, you should instead use the 'streaming' response interface for psgi (return a subroutine instead of the 3-arg arrayref). See the patch I made to do precisely that here.

Update (5/15/2022):
It turns out there's yet another point where uWSGI performs differently, and that's with how psgi.input is handled. It returns a uwsgi::input object, which behaves sort of like a filehandle, with one important exception. You can't do 3-arg read() on it. Instead, you must use the 2-arg read() method on the filehandle. This also applies to seek() and close() on input/output filehandles you play with in uwsgi.


The crisis of meaningness in the firm πŸ”— 1650129412  

🏷️ blog

A great article came across my desk this morning: Can you know too much about your organization? The TL:DR version is that a bunch of managers were tasked with organizational transformation via reconsidering their processes from first principles. What they found almost invariably shattered their mental model of the firm and their role within it. This caused a crisis within them, resulting in many of them abandoning their position of authority altogether.

This is because they derived a great deal of meaning from their profession. Like rational science had in the early 20th century, they've peeled the onion and discovered the world they thought they lived in was an illusion. Those given the greatest authority in the firm turn out to be the most powerless to effect positive change in the production process. The actual means by which decisions get made in the firm are a rats' nest of bypasses often held up by the force of will in singular individuals.

Many of these individuals (such as staff engineers) also have a crisis of meaningness when and if they realize their vast skills are essentially wasted being a glorified "glue stick" holding together a system which is perverse, and for no real purpose.

This happened to me. Coming out of QA means I was very concerned with catching things as early as possible, and thereby reducing the cost involved. This evolved into a particular interest in shaving off the "sharp corners" of software production processes, as it was time wasted on these largely preventing better early scrutiny. Paul Graham has a great article on the subject called Schlep Blindness, but the concept is well-encapsulated within Kanban.

The poster child for this in modern development organizations is using CODEOWNERS files as a means to prevent howlers from slipping by in large monorepos. Many like monorepos because it theoretically means that less time is wasted by developers hunting down code and making many PRs. Having to impose a CODEOWNERS regime in a monorepo implies that the automated testing corpus is far from adequate for catching bad changes. It instantly negates 100% of the possible advantage one can achieve through usage of a monorepo. In both situations, every second spent chasing people down to approve changesets and splitting changesets into multiple pull requests is time far better spent writing tests. This purported solution only gives the feeling that things are under control while slowly and corrosively making the problem worse.

I took a look at the PR history for one of these monorepos, and sorted it into buckets. It turns out the vast majority of changes required approval by at least 3 groups, and had at least one merge conflict result in seeking approval multiple times from the same people. Even the best case estimate of how much time was wasted here (notwithstanding how many people simply drag feet and become discouraged) was quite troubling. At least 1 man-lifetimes a year were spent on this, and this was a firm with less than a thousand developers. This amounts to human sacrifice to no productive end, and there are many more examples of this and worse lurking in the modern software development organization. Lord knows I've spent unhealthy amounts of my life dealing with bikeshedding masquerading as "standards" over the years.

It is easy to then lose heart when you consider the consequences of actually fixing these problems. Chesterton's Fence comes to mind. The problem that made this feel necessary likely hasn't (and won't) go away anytime soon, and the Lindy Effect is likely in play. This is why the managers in TFA reported huge levels of alienation and many even changed careers once they understood they were dealing with a gordian knot they could not just cut.

Similarly, most individual contributors simply "check out" mentally when they realize there's not only nobody else willing to strike the root, but all attempts to do so will be savagely punished. Like with the Rationalist's crisis of Meaningness, thinking on another level of abstraction is required to actually cut the knot.

Most seemingly intractable problems in production lines are because the approach used does not scale. Like in computer science, you must re-frame the problem. Rather than solve an NP-Hard problem, solve a subset of the problem which can be handled in linear time.

The solution to the particular problem I've used as the example here (unwieldy and untested big repos) involves understanding how they came to be so in the first place. The reality of business is that the incentive to cut corners to meet deadlines will always be present. The larger the organization becomes, the more its decision-making will resemble total acephaly and incoherence. Steps must be taken to reduce the impact of this.

To date the most effective mechanism for this has been Autocephaly. Regardless of how many corners are cut, or doctrinal corruption is tolerated in one bishopric, it cannot fully infect the body. In the modern firm this was first implemented as divisions; Peter Drucker's Concept of the Corporation covered this in 1946! The modern software firm's analog to this is called Service Oriented Archetechure.

Meta-Rational approaches are always like this. They are strong because they recognize the common and intractable failure modes present and contain them rather than attempt to stamp them out. Much of this is why both free markets and political decentralization have proven so durable. For all their faults, they effectively limit the impact of any given group going catastrophically sideways.

Nevertheless, there are always growing pains. The reality of power dynamics means that things subdivided will usually not subsequently subdivide once more until far past the point it is once again necessary. Sometimes subdivision "in name only" such as Scrum Teams occur. This introduces its' own set of pathological behavior for which entire firms base their livelihood upon servicing.

Rather than become alienated and hopeless upon discovering the reality of corporate existence, a re-orientation to not fight this flow re-establishes meaning. The participants in the firm can once again proceed forward taking pride in their corner of the great work. Even in firms which failed to scale and reverted to de-facto acephaly you can do good work when you realize what does and does not work there. Given I've had a lot of experience with the latter, I'll write a follow-up soon on how to be effective in acephalous organizations.


How computer science captured the hearts and minds of generations of scientists πŸ”— 1645974650  

🏷️ blog 🏷️ programming

The scientific method is well understood by schoolchildren in theory, but thanks to the realities of schooling systems they are rarely if ever exposed to its actual practice. This is because the business of science can be quite expensive. Every experiment takes time and nontrivial amounts of capital, much of which may be irreversibly lost in each experiment. As such, academia is far behind modern development organizations. In most cases they are not even aware to the extent that we have made great strides towards actually doing experimentation.

Some of this is due to everyone capable of making a difference toward that problem being able to achieve more gainful employment in the private sector. Most of it is due to the other hard sciences not catching up to our way of experimentation either. This is why SpaceX has been able to succeed where NASA has failed -- by applying our way to a hard science. There's also a lack of understanding at a policy level as to why it is the scientifically inclined are overwhelmingly preferring computers to concrete sciences. The Chinese government has made waves of late claiming they wish to address this, but I see no signs as of yet that they are aware how this trend occurred in the first place.

Even if it were not the case that programming is a far quicker path to life-changing income for most than the other sciences, I suspect most would still prefer it. Why this income potential exists in the first place is actually the reason for such preference. It is far, far quicker and cheaper to iterate (and thus learn from) your experiments. Our tools for peer review are also far superior to the legacy systems that still dominate in the other sciences.

Our process also systematically embraces the building of experiments (control-groups, etc) to the point we've got entire automated orchestration systems. The Dev, Staging/Testing and Production environments model works quite well when applied to the other sciences. Your development environment is little more than a crude simulator that allows you to do controlled, ceteris-paribus experiments quickly. As changes percolate upward and mix they hit the much more mutis mutandis environment of staging/testing. When you get to production your likelihood of failure is much reduced versus the alternative. When failures do happen, we "eat the dog food" and do our best to fix the problems in our simulated environments.

Where applied in the other sciences, our approach has resurrected forward momentum. Firms which do not adopt them in the coming years will be outcompeted by those that do. Similarly, countries which do not re-orient their educational systems away from rote memorization and towards guided experimental rediscovery from first principles using tools very much like ours will also fall behind.


Why am I still using certificate authorities in $CURRENT_YEAR? πŸ”— 1645057792  

🏷️ blog

Much hay has been made of late about how everyone's favorite CAs, including LetsEncrypt are worse than useless for their stated purpose of identity verification. The entire idea that this "chain of trust" prevents man-in-the middle attacks is completely nonsense, as the issuers are all capable of easily being fooled or coerced by state power on a routine basis.

I remember the good old days of self-signed certs. All the anti-self-signed hysteria was about the fact nobody read the certs, just like today. We could in fact have it much better nowadays via DNSSEC, DANE, CAA Records and CT Headers. The closest thing anyone has to identity verification is WHOIS (and anyone who opts for WHOIS privacy is a fool opening themself up to arbitrary seizure). The credit card companies are infinitely better at KYC than all the Certificate Authorities thrown together, so don't fight the system.

There's still one thing missing to completely remove the possibility of MITMs from any source other than smacking your registrar and host with a rubber hose. Post your self-signed CABundle as a TXT record. If you did so, you could implement the ultimate countermeasure to MITM attacks. Issuing a unique cert per session. Talk about perfect forward secrecy! I sure as heck would prefer to pay for a crypto accelerator card than send a dime to Certificate Authorities, being as they're little better than scams. This would also make a lot of things go whir at your friendly neighborhood gestapo agency. I wish I were shilling for $NVDA here, but alas I hold no position as of this writing.

Why nobody's thought of this incredibly simple solution is for the same reason as all my other "Why am I..." articles. It's easy to be dense when your livelihood depends on using your cranium to store old rags. Thankfully LetsEncrypt has almost totally put the CAs out of business at this point. It shouldn't be much of a step to put them out of business too.

The bigger question is how to get the browsers to reverse their scaremongering about self-signing. It will likely take dedicated lobbying to get them to support mechanisms for feeling good about self-signed CAs. LetsEncrypt is unfortunately "good enough" and has taken away the enthusiasm for further reform. I consider it unlikely that server operators and domain owners will fight for control being in their hands (where it ought to have been all along) until a major and prolonged LetsEncrypt outage.


Performance Engineering for the Layman πŸ”— 1643415182  

🏷️ blog

As my nephews are coming of age, I'm considering taking an apprentice. This has resulted in me thinking more of how I might explain programming best practices to the layman. Today I'd like to focus on performance.

Suppose you had to till, plant and water an arbitrary number of acres. Would you propose ploughing a foot, planting a seed and watering ad nauseum? I suspect not. This is because context switching costs a great deal. Indeed, the context switches involved between planting, seeding and watering will end up being the costliest action when scaling this (highly inefficient) process to many acres.

This is why batching of work is the solution everyone reaches for instinctively. It is from this fact that economic specialization developed. I can only hold so much in my own two hands and can't be in two places at once. It follows that I can produce far more washed dishes or orders being a cook or dish-washer all day than I can switching between the tasks repeatedly.

That said, doing so only makes sense at a particular scale of activity. If your operational scale can't afford specialized people or equipment you will be forced to "wear all the hats" yourself. Naturally this means that operating at a larger scale will be more efficient, as it can avoid those context switching costs.

Unfortunately, the practices adopted at small scale prove difficult to overcome. When these are embodied in programs, they are like concreting in a plumbing mistake (and thus quite costly to remedy). I have found this to be incredibly common in the systems I have worked with. The only way to avoid such problems is to insist your developers not test against trivial data-sets, but worst-case data sets.

Optimizing your search pattern

When ploughing you can choose a pattern of furroughing that ends up right where you started to minimize the cost of the eventual context switch to seeding or watering. Almost every young man has mowed a lawn and has come to this understanding naturally. Why is it then that I repeatedly see simple performance mistakes which a manual laborer would consider obvious?

For example, consider a file you are parsing to be a field, and lines to be the furroughs. If we need to make multiple passes, it will behoove us to avoid a seek to the beginning, much like we try to arrive close to the point of origin in real life. We would instead iterate in reverse over the lines. Many performance issues are essentially a failure to understand this problem. Which is to say, a cache miss. Where we need to be is not within immediate sequential reach of our working set. Now a costly context switch must be made.

All important software currently in use is precisely because it understood this, and it's competitors did not. The reason preforking webservers and then PSGI/WSGI + reverse proxies took over the world is because of this -- program startup is an important context switch. Indeed, the rise of Event-Driven programming is entirely due to this reality. It encourages the programmer to keep as much as possible in the working set, where we can get acceptable performance. Unfortunately, this is also behind the extreme bloat in working sets of programs, as proper cache loading and eviction is a hard problem.

If we wish to avoid bloat and context switches, both our data and the implements we wish to apply to it must be sequentially available to each other. Computers are in fact built to exploit this; "Deep pipelining" is essentially this concept. Unfortunately, a common abstraction which has made programming understandable to many hinders this.

Journey to flatland

Object-Orientation encourages programmers to hang a bag on the side of their data as a means of managing the complexity involved with "what should transform this" and "what state do we need to keep track of doing so". The trouble with this is that it encourages one-dimensional thinking. My plow object is calling the aerateSoil() method of the land object, which is instantiated per square foot, which calls back to the seedFurroughedSoil() method... You might laugh at this example (given the problem is so obvious with it), but nearly every "DataTable" component has this problem to some degree. Much of the slowness of the modern web is indeed tied up in this simple failure to realize they are context switching far too often.

This is not to say that object orientation is bad, but that one-dimensional thinking (as is common with those of lesser mental faculties) is bad for performance. Sometimes one-dimensional thinking is great -- every project is filled with one-dimensional problems which do not require creative thinkers to solve. We will need dishes washed until the end of time. That said, letting the dish washers design the business is probably not the smartest of moves. I wouldn't have trusted myself to design and run a restaurant back when I washed dishes for a living.

You have to consider multiple dimensions. In 2D, your data will need to be consumed in large batches. In practice, this means memoization and tight loops rather than function composition or method chaining. Problems scale beyond this -- into the third and fourth dimension, and the techniques used there are even more interesting. Almost every problem in 3 dimensions can be seen as a matrix translation, and in 4 dimensions as a series of relative shape rotations (rather than as quaternion matrix translation).

The outside view

Thankfully, this discussion of viewing things from multiple dimensions hits upon the practical approach to fixing performance problems. Running many iterations of a program with a large dataset under a profiling framework (hopefully producing flame-graphs) is the change of perspective most developers need. Considering the call stack forces you into the 2-dimensional mindset you need to be in (data over time).

This should make sense intuitively, as the example of the ploughman. He calls furrough(), seed() and water() upon the dataset consisting of many hectares of soil. Which is taking the majority of time should be made immediately obvious simply by observing how long it takes per foot of soil acted upon per call, and context switch costs.


tCMS current state and plan going forward πŸ”— 1642782163  

🏷️ blog 🏷️ tcms 🏷️ perl

The consistent theme I've been driving at with tCMS development is to transform as much of the program out of code into data. The last thing I've done in this vein was to create parent-child relationships between posts (series), and to allow posts to embed other posts within themselves. The next thing I'm interested in doing is to move the entire page structure into data as well. Recently working with javascript component-based frameworks has given me the core inspiration behind what I ought to do.

Any given page can be seen as little more than a concatenation of components in a particular order. Components themselves can be seen in the same way, simplifying rendering them to be a matter of recursive descent to build an iterator you feed to the renderer. How do I implement this with the current system?

Every post needs to support an array of components. This will necessitate a re-thinking of how the post interface itself works. I should probably have some "preview" mechanism to show an idea how the post should work after you frankenstein it together.

This will enable me to do the most significant performance improvement I can do (static renders) incredibly easily. As a page render will be little more than a SELECT CONCAT statement over a table of pre-rendered component instances for the data. To make updates cheap, we need but check the relevant post timestamps to see if anything in the recursive descent needs a re-render.

As of this writing, a render of the most complicated page of any tCMS install is taking 21ms. This should bring that time down to 2-3ms. It will also enable me to implement the feature which will turn tCMS into a best-of-breed content publishing framework. Which is to automatically syndicate each page we render to multiple CDNs and transparently redirect to them in a load-balancing fashion.

From there I see little more that needs to be done other than improving the posting interface and adding userland features. I still want all of that, but believe technical excellence comes first.


On Building Software Teams πŸ”— 1642525672  

🏷️ blog

Good production processes are always characterized by a lack of friction in intermediate stages. In software that mostly means that those involved "know each other's minds", as the friction is almost always coming as pushback during review or test. For most this doesn't come without "the feels" hitching a ride too. This can make getting there a bumpy ride, as most are incapable of articulating their boundaries without them first being crossed.

As you might imagine, any time feelings get involved, costs go through the roof. Very little productive will happen until all those chemicals flush from the system. Avoiding this involves setting expectations up-front. Which is hard, as most people are incapable of doing so for a variety of reasons.

First, most are incapable of coherently articulating their boundaries and preferences due to simple lack of spare time. This is almost always the case with those who are in "survival" (read: instinct) reaction mode, such as is the case during business emergencies. Many a new hire has failed to thrive due to being onboarded during a "permanent emergency". This is how firms dig holes they can't get out of, as they can't scale under this mindset. Such emergencies are usually caused by excessive micromanagement in the first place. If you can't "Trust the process" the firm isn't really set up to succeed.

Many others default to sub-communication of emotional state when communicating rather than directly stating their desires. They tend to only resort to direct comms when they've become so frustrated with their interlocutor that they put their thoughts together in a coherent form. Deciphering sub-communications is essentially mind-reading (especially in text communication), so I don't feel particularly bad about failing to do so, or the emotional outbursts at my failure to "just get it". Some people just need drama in their lives. It's a pity that the time wasted in this pursuit wastes so much time and money.

The most pernicious difficulty you will encounter in this endeavor is the "nice guy". These are people who simply never disclose their boundaries for fear they will be perceived in a negative light. Software is packed to the gills with these types, quietly grinding their axes for years until it explodes like a land-mine under your production process. Thankfully, they can't help but tell on themselves. Passive-aggressive commentary is almost always a sure sign some kind of covert contract is lurking in their psyche. This results in expensive re-work when their expectations are not met, or what they want clashes with what's needed.

Countermeasures

Like any other production line, you can sand off a lot of the sharp edges causing friction. This is true even when dealing with problems between the chair and keyboard. People instinctually get that no amount of whining can sway automated linters, tidiers and CI pipelines. As such you should automate as much of this process as is feasible. Much of helping people succeed is reliably pointing them in the right direction.

RPA tools and chat bots have proven indispensable here as well. People knowing that routine parts of the workflow will be handled in exactly the same manner across a division can stop resentment over arbitrariness cold. Like with automation on the IC side, some will chafe under this. It is important to remind them that like children, we are best managed via rules applied consistently. Breaking discipline even once means production stoppages.

People must also face real consequences for failing to responsibly shepherd the production process. There will always be issues found in code review, for example. Failing to resolve these (either by the submitter failing to take action, or the review committee simply sitting on changes) should be unacceptable. Similarly, failures to communicate requirements (which could obviously have been), or to ask for clarification when requirements are vague should be rooted out.

Which comes down to the fact that "no, this time is not different". Your production process, like every single other one, can benefit from a check-list. If it can't be automated, make sure you at least can't forget to think about it. Making as much about the job as possible fully explicit reduces sources of error (and hence friction).


Audit::Log released to CPAN πŸ”— 1642470899  

🏷️ video 🏷️ blog 🏷️ troglovlog 🏷️ programming 🏷️ perl
For those of you interested in parsing audit logs with perl.

Looks like I need to make some more business expenses if I want to be able to stream 4k video!

When management feels out of control: the truest test of leadership skill πŸ”— 1640314002  

🏷️ blog

A common occurrence in firms is that the production line will innovate in a way which breaks the underlying assumptions baked into the heads of those in authority. Oftentimes in software projects serving said production lines, this is manifested by a User Interface that evolves in emergent ways beyond that which was envisioned by the data model. When this inevitably leads to undefined behavior, something breaks. Sometimes, it's at an inconvenient time and the impossibly hungry judges effect kicks in. (As an aside regarding that article, "hangry people" is the most valid cause for any statistical phenomenon I've ever heard).

As such, they're on the hunt for scalps. Which means if your name is on the commit, doing the right thing and explaining the actual root cause is almost always the wrong thing. Especially when the cause is, such as in this case, due to a breakdown in communication between management and the managed. The most likely result of this is simply that coups will be counted upon you for not doing what is really wanted: a signal of submission.

Even offering a patch which will solve the immediate problem won't help. If it has come to this point they will have an emotional need to seize direct control, consequences be damned. Woe unto you if you offer the only correct solution with your patch, as that means they will choose the wrong thing simply out of spite.

Having seen this happen repeatedly in my years in corporate, it's never gone any other way. Indeed, this is yet another scenario explicitly discussed in Moral Mazes, which was written when I was knee high. Which comes to the important question: why after all these years do I persist in my impertinence? Why continue to offer sound root cause analysis, even when it is embarrassing for all involved?

Because it's worth the risk to get people mad at you. Most of the time this ends in summary termination. Sometimes, it results in sober second thought, which would not have happened without the emotional spike caused by "rubbing it in". It's best that this happens sooner rather than later when working with someone, as people who don't course correct here are ultimately incapable of greatness. I don't have long-term interest in working with people lacking the necessary maturity to do whatever it takes to smash the problems in their way. The biggest organizational impediment that exists is our own pride.


A very Troglodyne Christmas πŸ”— 1640147922  

🏷️ blog

Aside from being busy with work for clients, I haven't managed to do much writing this December due to finally digesting a few marketing insights I've been chewing on but not swallowing for the better part of a decade. Here at Troglodyne we may be thick headed, but at least we're not smrrt.

Anyways, all types of content need an emotional appeal to get anywhere. Not everyone's like me, and just wants to skip to the end and hear the answer. The people want to hear about what motivated you, as a bearded road apple, to finally shoot the damned computer out of a cannon!

Though I'm only exaggerating a little, it gets me back into a mood of prose I haven't dipped into much since I was much younger and feeling my oats. I don't think I wrote a serious essay even when I had to do so in order to make the grade. Instead, I'd viciously and cruelly make the jokes too esoteric to be detected (or at least proven guilty of cheek) by the faculty. Having grown up consuming a steady diet of MAD magazine, it's a miracle that I've managed to become such an accomplished bore.

I suppose it's a testament to how thoroughly corporate is capable of domesticating a programming community known for eccentricity. This should shock nobody, as it's the smartest dogs that are the easiest to train. That said, there are still plenty of us off the leash having a grand old time.

All my writing and on-video time has made me quite a good deal better at requiring less editing required to render the steaming heaps of drivel that you see before you. Unfortunately, I sound almost as bad as the corporate borg twerps I've been pillorying over the last year or so it's taken to de-brainwash myself away from that set of cults. It's finally time to come into my own voice, which is to say steal someone else's.

In that vein, I've generally seen a few patterns among successful tech content creators. For those interested in Quality and Testing, you generally need to embrace your mean streak. It's got that synthesis of wrath at the excrecable piles of junk we work on all day and the thrill of the hunt when you finally figure out a way to wreck that POS that feels...magnificent! This also bleeds over into programming, as the experience is always one of smashing your head into a brick wall until pure, beautiful victory is achieved just in time for the acceptance criteria to change. Really some of the best fun you can have with your pants on, I definitely recommend you try it. None of the content creators of this stripe are ever sarcastic.

Then we have my favorite kind of technical creator -- we're talking Scotty Kilmer types. Just talk about whatever the hell you feel like today, because it's always gonna be the same old bullshit...but make sure your marketing is as spectacular as humanly possible. Whether it has anything to do whatsoever with the actual content is irrelevant. Don't care, got laid. It's the hacker ethos to a T... and by T, I mean Tren! Hell, it's probably a fun game to see how misleading you can have your sizzle be and still get hits. Excuse me while I eat this whole fried catfish.

For those of you who skipped to the conclusion (like me), let me waste a bit more of your time as a special bonus gift. We've got some exciting things coming for you all in 2022! Whether or not they're actually exciting I sure am excited about them. So excited I've gotta be vague about it. Definitely not because I haven't thought of anything yet.

That reminds me, I still need to go get presents. Merry Christmas everyone!


Why am I still using a search engine in $CURRENT_YEAR? πŸ”— 1638625845  

🏷️ blog

There has been much controversy in recent times over censorship of search engines and social media. According to those engaging in this, it's done with good intentions. Whether this is true or not is missing the point. Why are we relying on a centralized search engine at all that can censor, when we've had decentralized search for a half-century?

DNS can be seen as little more than an indexing service. There is no fundamental technical reason why the exact same approach can't be taken for resources at particular domains. Every site could publish their sitemaps and tags quite easily, and many do right now. They simply upload them to search engines rather than having them be well-known records considered by peers.

A DNS model would in fact simplify search indexing a good deal, as you can drop the crawling code entirely and simply wait until at least one person accesses a resource to index it. This would put the burden of crawling/advertising their available pages on site operators themselves, pushing costs down the stack, as is appropriate in a decentralized system.

Much of the reason DNS is tolerated as a decentralized system rather than centralized is that it takes so little resources relative to the rest of the web stack. People love the idea of federation, but hate paying for it. The primary question is whether incentives align for the current parties running DNS to also index and cache content hierarchies.

The answer is obviously no, or they would be doing this right now. This is once again due to the primary browser vendor (google) having no interest in supporting such a thing, as it would actively undercut their business model. If a browser did support such a system, many content creators and web hosters would absolutely love to adopt a system with clear rules under their control rather than the morass of inconsistency that is the centralized engine's rulesets. Similarly, the ISPs and Web Hosts would happily hop on board to the idea of offering yet another service they can charge for.

Therefore the question is can the existing business model of advertising that subtly corrupts search results translate to a decentralized system? Of course it can. The trouble is that it'd be the ISPs and web hosts in the position to extract this profit. This is in fact the ray of hope in this situation, as both Google and it's competitors in the (virtualized) hosting biz could get a large piece of this pie.

So, if you wanted to know what a future with this would look like it'd be that Microsoft or Amazon forks Chrome. This has already happened in Microsoft Edge. From here it's but a matter of modifying and open-sourcing their existing indexer, and making their fork support it's use. Introducing a system of decentralized search would both hurt their competitor, and be another reason to use Azure versus GCP and Amazon. They'd likely adapt Bing to leverage this as well, to extend the benefit to all browsers.

That said, Amazon or any large host could execute on this. Much of the tech that Cloudflare uses to cache content could likely be re-purposed towards these ends as well. There's a lot of money to be made in disrupting the status quo. Whether this translates into concrete action is anyone's guess.


caldav is a solution looking for a problem πŸ”— 1637210035  

🏷️ blog

Many hours have been wasted on calendaring servers, and they still don't solve the problems people who use calendars want solved. This is because the problem is approached from the wrong direction. People think from the client to the server, as it's clients originating ics files which are then schlepped around via email. Servers allowed people to do things like free-busy for attendees and conference rooms, but required email clients to support things like itip. I'll give you one guess how that went.

This model instantaneously breaks down when you go cross-organizational. The widespread incompatibility between mailservers and no standardized way to federate directory discoverability makes this impossible. As such, the meta collapses back to schlepping around ics files. It should shock nobody that embracing this fact and giving up on free/busy and presence has been the solution that dominates. Microsoft has implemented on this approach better than anyone decades ago.

Actually solving the presence problem requires that you get federation right. Guess who's doing that? The chat apps. Both Slack and Teams have this figured out. Doing this as a plugin to matrix or snikket would actually be quite straightforward. As such my recommendation is that shared hosting software stop distributing calendaring software. They should instead distribute chat servers and good chatbots that can do things like meeting reminders.

You could even federate across incompatible chat apps and protocols via bots that know how to talk to each other. I know it would work, because it worked before computers. That's how secretaries coordinated all of it -- picking up a phone. Implementing a UI for people to use would be as simple as wrapping your secretary bot that knows how to book people and rooms.


Doing Hiring Right πŸ”— 1636831995  

🏷️ blog

Most people make Interviewing programming candidates way too much work. I've setup (and taken) more HackerRank style puzzles than a person ought to have in a lifetime. One of the problems with this tool is that it's basically handing dynamite to children. Programmers are in general an over-specific lot that delight in secrets. This results in unnecessary grief and makes you miss out on otherwise good candidates.

The purpose of Job Descriptions, Programming Tests and Phone Screens are all the same. The trouble is that most people neither understand or acknowledge this on either side of the table. They're all just spam filters. The simplest path to success for candidates is to simply spam cold approaches and leverage social proof for everything it's worth.

Rather than allow negative feelings to build up and covert contracts to form, you should simply be up-front about the reality. An "Early Frame Announcement" of why you are doing the process the way you do, and what you expect out of it helps a lot. Managing Expectations will never stop being important in your relationships with those you work with, so you need to do this from the very beginning.

Sometimes you can be too explicit with your expectations. As with anything else measured, people bend over backwards to conform to them. This can be good, when what you measure actually matters. Unfortunately, very few things which get measured in this way actually do.

Employers allow themselves to get bullshitted into hiring people very good at telling them exactly what they want to hear. They then get blindsided when they inevitably find out the candidate, like everyone else, is in fact far from perfect. That said, people do manage to "fake it till they make it" more often than not so this isn't something to get too worried about. As long as they keep doing things you want, who cares if they were less than fully truthful during the interview process? You as the interviewer can't and won't be fully disclosing the facts of the situation on the ground either. What you actually want is a system that exposes the faults and strengths of candidates as quickly as possible.

First, you need to understand that nobody believes the posted requirements in public openings (nor should they). Accept that you will just get spammed. Then you need to tightly focus on your "Must Haves" rather than a laundry list of wants. If there are more than 4 technical skills you need for a given position, the solution you want someone to implement is probably ill-designed. You can't prove an optimal solution exists for anything with more than 3 variables and can't guarantee a closed form solution exists for systems with 5 variables, after all.

If you still have too many candidates to pick from (you will), you should then treat your "want" keyword list much like you would a digital ad campaign. Try and find people with at least a pair of them and that usually winnows you down to a list worth talking to. Don't get too excited here -- you won't be shocked to find maybe 10% of even these can get past a phone screen.

The phone screen's only purpose is to determine whether or not you are being catfished, and telling people this up-front tends to result in better outcomes. Most of the technical questions here should be softballs which can be googled as they talk to you. All you want to see is that they know and care enough at this point to actually get you the right answer. This is all to lull them into a false sense of security until you slip them the hilariously obvious false question. If they don't call you out on making an obviously false statement and try to bullshit you, just hang up.

Lots of people do tests and programming puzzles in lieu of the phone screen now. This is actually a bad idea. Online tests should only be employed in lieu of phone screens when you have too many candidates. Even then, they should be done similar to what the phone screen would have been.

I personally prefer to save tests as prelim for the in person interview. I like making the in-person basically a code review of the test, as this gets you into mind-meld with the candidate quite quickly. This also more closely mimics how they will actually be working, which is what you really want to know. Making this clear to candidates up-front tends to get the best results (which is what you actually want from candidates).

Nevertheless the online code test should consist of one straightforward question, another less so. The ultimate goal is that they should take more time than allotted for most people. This can be established by administering the test to a sample of your existing staff. Be up-front that this is the point, lest they get upset and corrupt the results with cheating and other such chicanery. You should end up seeing an 80% solution from the candidates at the very least.

From here the question remains what to do with the candidates you are on the fence about. Sometimes people just get a case of nerves, or aren't quite experienced enough yet but still can do the work. It's here that you need to understand that all deals can happen at the right price. Making it clear that you're willing to take a risk on a candidate during a probationary period with introductory rates can work out quite well. It's even better when you have a candidate offer this themselves, go-getters like that almost always turn out well.

At this point you should have a candidate and a price. Now you need to take them to lunch and fish for their victim puke. The last thing you need is a whipped dog ready to snap. This is unfortunately common thanks to widespread pathological behavior running rampant in corporate and public schools.

From there the problem is making sure they fit on a team. This means you have to figure out how to make the others invest in the candidate's success and vice versa. Too often things quickly turn adversarial, resulting in bad outcomes that were totally avoidable. That is a pretty in-depth topic which is worthy of it's own post.


Why am I still passing around phone numbers in CURRENT_YEAR? πŸ”— 1636669564  

🏷️ blog

SIP and trunking to the POTS have been around for more than a decade now. The million dollar question nobody seems to be able to answer is why all our mobiles still use a number instead of addressing via DNS (e.g. user@domain). The carriers wouldn't be cut out of the loop by this, as they can remain a dumb, albeit wireless, pipe. Indeed, some carriers actually implement their systems as glorified SIP trunks on the backend.

Address book software is also plenty capable of tracking email-style addresses. Android has natively supported SIP calling such addresses for more than a decade. This means that for everyone I know using android it's as simple as setting up openPBX on my server. I could have full control and encrypted video calling tomorrow. The trouble is that everyone's favorite status symbol, the iPhone doesn't support this. Which means I couldn't communicate with half my family and many of my clients, as they're not about to install squat to please me.

It appears the reason for this lack of support is the usual modus operandi from Apple. That is to say, they have their own proprietary standard they'd prefer everyone use instead (but that is shoddy by comparison to standard software). Further complicating matters is that new and popular video conferencing firms like Zoom have also introduced yet more shoddy and incompatible software.

While Zoom can bridge to SIP clients, it costs extra, and they already trunk to the POTS at no cost, further entrenching the phone number. Skype has had a similar model for many years. FaceTime users can provide links to allow non-apple clients to call them, but not the other way around. That said, the fact that there is now an HTTP means of doing FaceTime means reverse-engineering the protocol and building a SIP bridge is but a matter of time. When PBXes are capable of appearing to be apple devices with FaceTime things will finally be "good enough" to ditch the number.

Much of the reason for the success of these non-open packages is because the cost structure is largely hidden from users. The FaceTime ecosystem is "free" past the initial phone purchase, and only the host of zoom calls generally pays for the service. By comparison, users of open software and standards bear recurring costs (and they're already paying a phone bill). Like with the telcos themselves, very few people are willing to pay for a SIP account if it's not bundled with hosting, mail, DAV and everything else.

Competing the telcos down from being vertically integrated multi-service providers to mere ISPs is the real mountain to climb here. The first major shared host to execute on this will be able to tap billions in additional MRR. When and if that day comes, I'd likely ditch the cellphone entirely in favor of superior clients on real computers.


25 most recent posts older than 1636669564
Size:
Jump to:
POTZREBIE
© 2020-2023 Troglodyne LLC