🌐
Videos Blog About Series πŸ—ΊοΈ
❓
πŸ”‘

Web5 and Decentralized Identity: The next big thing? πŸ”—
1662592980  

🏷️ crypto

The idea of the decentralized ID folks is to have some means to identify an online entity is who they say they are, mostly to comply with the orwellianly named Bank Secrecy Act and it's many international equivalents imposing some level of "know your customer" (read: snitching to the tax man). That's of course not the only use, but I sure as hell ain't sharing personal info if I don't have anything to gain by doing so. As such, the success of such projects are inherently hitched to whether they can dethrone the payment processors -- after all credit cards are the most real form of ID there is.

Why do these blockchain guys think they're going to succeed when email and DNS have all the tools to do precisely this right now off the shelf, but nobody does? Encrypted email is solved by putting in and adopting an RFC to slap public keys in DNS records, and then have cPanel, plesk and the email majors hop on board. You could then layer anything you really want within that encrypted protocol, and life's good right? Of course not. Good luck with reliable delivery, as encryption breaks milters totally. This is one of the key advantages to web5, as it imposes transaction costs to control spam.

Even then, it could probably work from a technical point of view. Suppose you had a "pay via email" product, where you enter in the KYC foo like for a bank account, and now your email and PGP keys are the key and door to that account. Thanks to clients not reliably sending read reciepts, some TXs will be in limbo thanks to random hellbans by servers. How long do you wait to follow up with a heartbeat? You'd probably want tell users that if you don't respond to the confirmation emails within some time, they are dropped. Which inevitably means the transaction is on hold until you are double sure, making this little better than putting in a CC on a web form.

This problem exists even in meatspace with real passports. Nations have invented all manner of excuses to prevent the free movements of peoples and goods for reasons both good and ill. Do not think that people will fail to dream up reasons to deny this decentralized identity of yours from delivering messages to their intended recipients. Much like bitcoin, all it takes is people refusing to recognize your "decentralized self sovereign identity card" at a whim. The cost to them of denying you is nothing, and the cost of accepting high. This reduces the usefulness of the whole tech stack to 0.

At the end of the day, if you don't have a credible plan to straight-up beat Visa and Mastercard in every KPI, your DeFi project will be stillborn:

  • You have to be able to handle transactions in all known jurisdictions and currencies. Prepare to drown in red tape.
  • You have to be accepted at every financial institution otherwise merchants will say no. Prepare to drown in red tape.
  • You have to have lower merchant fees, despite suffering the above red tape and Visa/MC putting their thumb on the scale to kill competitors like you
  • You have to have lower interest rates / client fees as well. Gotta get them airline miles and cash back too!
  • You have to be better at detecting/absorbing fraud, yet be as convenient as an unencrypted magnetic stripe
Seeing that laundry list, you may conclude that anyone actually trying "have to be insane" as the last bullet point. I have yet to see any such plan by any person in the web3 or "web5" space.

The most credible plan I can possibly think of (other than wait for a systemic crisis to unseat these majors) would be to make such a product and intend to sell it to Visa/MC as a solution to their own internal problems. It's either that or try and grow in jurisdictions the majors don't care about, which comes with its own unique set of problems. In that situation, email might actually be the answer.


Why is email deliverability so hard? πŸ”—
1662383538  

🏷️ email

In short, it's because spammers use Judo throws against the mail providers who, in their exhaustion overreact. Generally during a flood of spam what providers will do is whack an entire /24 of IPs, taking the "Kill 'em all and let god sort them out" strategy. It is for similar reasons many servers flat-out block IPs originating from other countries.

Anyhow, this has lead to plenty of headache for services which market themselves on deliverability. Mailchimp, sendgrid and Amazon SES all have to keep a hold of far more IPs than they need at any given time to keep themselves squeaky clean. They also have to rate-limit and aggressively ban anyone sending what looks like spam either by AI analysis or bayesian filtering. Spammers on the other hand aren't dummies and they have vast resources at their command. It's straightforward to brute-force reverse engineer which markov chains actually get thru, as mail servers normally tell the sender why they failed to deliver a message.

At a certain scale this becomes a real problem. After a spammer has crafted a message they know will sail past the filters, they then can hook up to the "reputable" mailers as relays in parallel and shoot out a huge wave of un-interceptable spam before anyone can do anything about it. Everyone in the value chain gets mad, and massive overreactions happen thanks to this harming the bottom line.

The most important of these is to return 250 OK to messages which are known to be bad, and then silently deleting them. This leaves the spammer none the wiser. It's essentially a "hellban" where they will simply waste resources drilling a dry hole. Crafting un-interceptable messages is essentially impossible under this regime, and the spammers have to go back to their old tricks of smearing wide swaths of previously clean IPs with their garbage.

On the other hand, legitimate servers in these IP blocks get run over. Even servers which have been on the same IP for years get overwhelmed reputationally as they would have to send out far more volume of email to build enough reputation to overcome the insane volume of spam coming out of sibling IPs. Worse yet, there is no sign of anything wrong and no recourse whatsoever. You simply find out your email has NOT been delivered a few weeks later after its cost you and your business untold quantities of money.

As such the only real way to protect yourself is buy a huge IP block, and basically don't use any but one of them for mail. It's a "you must be this tall to ride" mechanism, like much of the rest of the internet has become. Either that or you have to sidle up to a firm which has a big IP block (and ruthlessly protects it) via forwarding with things like postsrsd & postforward.

In short the only cure for spam that has worked is steadily increasing the marginal cost of sending a message. Many feel anxious about this, as they realize anything they have to say is essentially going to be held captive to large firms. The weak are crushed into their usual condition of "nothing to say, and no way to say it".

As always, prosperity is a necessary condition of your freedom of action. This being a somewhat uncomfortable truth is probably why the reality of the Email situation is seldom discussed.


Web Programming seems to finally be standardizing. How did we get here? πŸ”—
1661204858  

🏷️ www
entirely by accident

An old (ish) trick to speed up webpages is using sendfile() to DMA files to a socket. Nowadays you use SSL_sendfile and Kernel TLS, (optionally offloaded to a specialized TLS processor) but you get the idea. Bypass the CPU and just vomit data out the NIC.

Couple that with the natural speed benefits to the "god algorithm" (already knowing the answer, e.g. caching) and the strength of static rendering frameworks became clear to everyone. That said, static renderers didn't really catch on until recently and even now dynamic renderers are the overwhelming majority of pages out there. This is because building a progressive render pipeline that is actually fast and correctly invalidates caches at each step is not an immediately obvious design.

Templating engines tend to encourage this approach, as they all have some kind of #include directive. The step from there to static renders requires integration with the data model, so that re-renders can detect changes in the underlying data. Just like how strict typing helps optimize compiled programs, well-structured data aids template renderers in reasoning about when to re-render. In every imperative program this is how the actual source is linked and built. It has been fun watching JS and typescript frameworks re-learn these old lessons the hard way as they get frustrated with long build times.

The trouble then comes down to how do you serve this data up to the browser? You can't simply hand it two HTML documents, one static and the other dynamic without using things like frames (inline or via frameset). The best you can do is use JavaScript to insert data into a page. Even then, if you insert new DOM, this will be slow. It is much faster to *only* flesh out missing data in a fully formed interface, and juggle visibility based on whether the data is loaded or not.

This is obviously far from the original promise of HTML's declarative nature. Nevertheless, it means the only performant strategy is to divorce the interface from the data and fill on the client side. If there were some standard means (say via header in a HEAD request, or link tags) to instruct browsers to fetch JSON DATA sections to fill the innerText of various selectors with we could perhaps do away with nearly all XHRs and spinners on cold-loads entirely. If you could do it on cold-loads, you could also do it within documents and on-the-fly, leaving the only role for JS to be managing state transitions. Alas, that ship has probably sailed for good.

HTML has become a widget toolkit rather than means to create documents as it was originally envisioned. This happened because it was not openly trying to be a cross-platform widget toolkit and thus this aspect was not actively suppressed by the OS vendors. I don't think it's a coincidence that Javascript is now the fastest growing programming language, despite frequently being hated more than PHP over the past 20 years. Worse is better works to some degree because those engaged in anti-competitive practices don't see things that are worse than their crap as a real threat. HTML/CSS/JS was a far worse widget toolkit than any of its competitors until relatively recently.

This is not to say that the browser wars and repeated embrace, extend, extinguish attempts by Microsoft and other vendors didn't come very close to killing HTML/CSS/JS. They very much wanted their own document standards to succeed. As a result you still see a bunch of Word and PDF documents passed around online. Things stagnated for a good long time as a result of this. But when it turned out the OS vendors were too dysfunctional in the wake of the dotcom crash to actually build something better, forward motion slowly resumed.

Despite the OS vendors rightly seeing the threat to their business open web standards represented, they proved too useful to the new titans of tech. Being the things powering social (read:advertising) networks ability to reach into everyone's pockets ultimately tied the hands of OS vendors who had for decades prevented anything truly cross-platform from working well. The stars have finally aligned and the OS wars are mostly over. Hooray.

This is largely what is behind some of the questionable activities of the WHATWG. The ham-fisted imposition of DRM and slavish pursuit of what the ad networks want has sent the web down some blind alleys of late. Nevertheless it's clearly not in their interest to deliberately kneecap the web and pages being capable of performing well.

Anyways, since all this API data is going to require a hit to the CPU to stream it, it must by necessity be returned in very small chunks if it can't be delivered and stored persistently on the client side for future reference. Hopefully the entire API to serve this stuff can fit inside cache. This requires a uniform design to your backing data that can be queried simply. Dare we say with a standard query language.

What I am observing is that the only role left for programming languages other than JavaScript in the userspace is as batch processors and API servers that are glorified proxies to SQL servers. Even then Node is a strong contender for use in those jobs too. Thanks to recent developments such as tauri, we might actually get truly cross platform interfaces and even window managers out of the deal.


Technical solutions to People Problems πŸ”—
1658410198  

🏷️ corporate

Oftentimes you will encounter a number of standards enforcement mechanisms to prevent the junior programmers who don't know any better (and the senior ones who should know better) from doing dumb stuff. When these are enforced at build time, it is usually quite fine, as it is not very costly. However, some of them are quite costly, as they are essentially runtime or interpreter modifications.

I grant that in a few exceptions, there is no other solution than to do so. Most of the time there is always a behavior modification which is sufficient, especially with proper incentivization. For example, do you go out and buy those fancy mitre saws that know how to stop from cutting off your finger, or do you just take care around circular saws? Of course you simply take care.

That said, at a certain scale stupidity will always creep in, and the overriding impulse is to insulate yourself and the firm from their impact. Overcomplicated safety devices and insurance schemes result, when the proper remedy is to fire careless people. Just like people will write weeks of code to avoid hours of meetings, they will also install huge and complicated gimcracks rather than confront careless people.

This impulse to avoid conflict is the root of many evils in the firm. Like in relationships, who cares if you make the other person mad? Sometimes making people feel something is the only way to get your message across. At the end of the day, they'll either stick around or leave; demonstrated preference reveals the truth obscured by clouds of emotion. And there are always more people.


Idiot-Proofing Software: Me worry? πŸ”—
1651531040  

🏷️ programming

I read a Warren Buffet quote the other day that sort of underlines the philosophy I try to take with my programs given the option:

"We try to find businesses that an idiot can run, because eventually an idiot will run it."
This applies inevitably to your programs too. I'm not saying that you should treat your customers like idiots. Idiots don't have much money and treating customers like they are upsets the smart ones that actually do have money. You must understand that they can cost you a lot of money without much effort on their part. This is the thrust of a seminal article: The fundamental laws of human stupidity.

This is why many good programs focus on having sane defaults, because that catches 80% of the stupid mistakes people make. That said, the 20% of people who are part of the "I know just enough to be dangerous" cohort (see illustration) cause 80% of the damage. Aside from the discipline that comes with age (George, why do you charge so much?), there are a few things you can do to whittle down 80% of that dangerous 20%. This usually involves erecting a Chesterton's Fence of some kind, like a --force or --dryrun option. Beyond that lies the realm of disaster recovery, as some people will just drop the table because a query failed.

This also applies to the architecture of software stacks and the business in general (as mentioned by Buffet). I see a lot of approaches advocated to the independent software vendor because "google uses it" and similar nonsense. They've got a huge blind spot they admit freely as "I can't count that low". What has resulted from this desire to "ape our betters" is an epidemic of swatting flies with elephant guns, and vault doors on crack houses. This time could have been spent building win-wins with smart customers or limiting the attack surface exploited by the dumb or malicious.

So long as you take a fairly arms-length approach with regard to the components critical to your stack, swapping one out for another more capable one is the kind of problem you like to have. This means you are scaling to the point you can afford to solve it.


uWSGI and the principle of least astonishment πŸ”—
1650324165  

🏷️ www 🏷️ perl 🏷️ uwsgi

I've been wanting to migrate tCMS to uWSGI for some time now because it has several nice features beyond what any one thing on CPAN offers:

  • Built in virtual hosting equivalent
  • Ability to re-load the code whenever it detects changes, simplifying deployments
  • Automatic worker scaling
  • HUP to reload when needed
  • Auto-reloads of workers after X requests (some scripts are like Patriot missile batteries, and need a reboot every 24hrs)
There are of course modules on CPAN to do any one of these. That said, getting familiar with this seems useful, given it supports many programming languages with WSGI-ish interfaces. It also has a "stats server" which gives you a lot of aggregate introspection for free, an *api* you can use to add vhosts on the fly, and more.

To get this working you need to make sure its perl plugin is installed (search your package manager of choice) or follow the compilation instructions. Once I got a good configuration file (the distinction between the socket and http-socket field is ths most obvious gotcha), I got a page loaded immediately.

Then I ran into a big problem. The way I store static renders is essentially as a raw dump of what I'd print out were it a CGI script. I open a filehandle, read until the double newline, parse the headers and pass them and the filehandle on to starman. On starman and other psgi servers on CPAN, this follows the "principle of least astonishment" and reads the filehandle as I handed it to them. uWSGI on the other hand grabs the filename from the handle and then just serves it up if the 'path' property is set (e.g. it's an IO::File instance). This obviously resulted in a double-header print.

As such, you should instead use the 'streaming' response interface for psgi (return a subroutine instead of the 3-arg arrayref). See the patch I made to do precisely that here.

Update (5/15/2022):
It turns out there's yet another point where uWSGI performs differently, and that's with how psgi.input is handled. It returns a uwsgi::input object, which behaves sort of like a filehandle, with one important exception. You can't do 3-arg read() on it. Instead, you must use the 2-arg read() method on the filehandle. This also applies to seek() and close() on input/output filehandles you play with in uwsgi.


The crisis of meaningness in the firm πŸ”—
1650129412  

🏷️ corporate

A great article came across my desk this morning: Can you know too much about your organization? The TL:DR version is that a bunch of managers were tasked with organizational transformation via reconsidering their processes from first principles. What they found almost invariably shattered their mental model of the firm and their role within it. This caused a crisis within them, resulting in many of them abandoning their position of authority altogether.

This is because they derived a great deal of meaning from their profession. Like rational science had in the early 20th century, they've peeled the onion and discovered the world they thought they lived in was an illusion. Those given the greatest authority in the firm turn out to be the most powerless to effect positive change in the production process. The actual means by which decisions get made in the firm are a rats' nest of bypasses often held up by the force of will in singular individuals.

Many of these individuals (such as staff engineers) also have a crisis of meaningness when and if they realize their vast skills are essentially wasted being a glorified "glue stick" holding together a system which is perverse, and for no real purpose.

This happened to me. Coming out of QA means I was very concerned with catching things as early as possible, and thereby reducing the cost involved. This evolved into a particular interest in shaving off the "sharp corners" of software production processes, as it was time wasted on these largely preventing better early scrutiny. Paul Graham has a great article on the subject called Schlep Blindness, but the concept is well-encapsulated within Kanban.

The poster child for this in modern development organizations is using CODEOWNERS files as a means to prevent howlers from slipping by in large monorepos. Many like monorepos because it theoretically means that less time is wasted by developers hunting down code and making many PRs. Having to impose a CODEOWNERS regime in a monorepo implies that the automated testing corpus is far from adequate for catching bad changes. It instantly negates 100% of the possible advantage one can achieve through usage of a monorepo. In both situations, every second spent chasing people down to approve changesets and splitting changesets into multiple pull requests is time far better spent writing tests. This purported solution only gives the feeling that things are under control while slowly and corrosively making the problem worse.

I took a look at the PR history for one of these monorepos, and sorted it into buckets. It turns out the vast majority of changes required approval by at least 3 groups, and had at least one merge conflict result in seeking approval multiple times from the same people. Even the best case estimate of how much time was wasted here (notwithstanding how many people simply drag feet and become discouraged) was quite troubling. At least 1 man-lifetimes a year were spent on this, and this was a firm with less than a thousand developers. This amounts to human sacrifice to no productive end, and there are many more examples of this and worse lurking in the modern software development organization. Lord knows I've spent unhealthy amounts of my life dealing with bikeshedding masquerading as "standards" over the years.

It is easy to then lose heart when you consider the consequences of actually fixing these problems. Chesterton's Fence comes to mind. The problem that made this feel necessary likely hasn't (and won't) go away anytime soon, and the Lindy Effect is likely in play. This is why the managers in TFA reported huge levels of alienation and many even changed careers once they understood they were dealing with a gordian knot they could not just cut.

Similarly, most individual contributors simply "check out" mentally when they realize there's not only nobody else willing to strike the root, but all attempts to do so will be savagely punished. Like with the Rationalist's crisis of Meaningness, thinking on another level of abstraction is required to actually cut the knot.

Most seemingly intractable problems in production lines are because the approach used does not scale. Like in computer science, you must re-frame the problem. Rather than solve an NP-Hard problem, solve a subset of the problem which can be handled in linear time.

The solution to the particular problem I've used as the example here (unwieldy and untested big repos) involves understanding how they came to be so in the first place. The reality of business is that the incentive to cut corners to meet deadlines will always be present. The larger the organization becomes, the more its decision-making will resemble total acephaly and incoherence. Steps must be taken to reduce the impact of this.

To date the most effective mechanism for this has been Autocephaly. Regardless of how many corners are cut, or doctrinal corruption is tolerated in one bishopric, it cannot fully infect the body. In the modern firm this was first implemented as divisions; Peter Drucker's Concept of the Corporation covered this in 1946! The modern software firm's analog to this is called Service Oriented Archetechure.

Meta-Rational approaches are always like this. They are strong because they recognize the common and intractable failure modes present and contain them rather than attempt to stamp them out. Much of this is why both free markets and political decentralization have proven so durable. For all their faults, they effectively limit the impact of any given group going catastrophically sideways.

Nevertheless, there are always growing pains. The reality of power dynamics means that things subdivided will usually not subsequently subdivide once more until far past the point it is once again necessary. Sometimes subdivision "in name only" such as Scrum Teams occur. This introduces its' own set of pathological behavior for which entire firms base their livelihood upon servicing.

Rather than become alienated and hopeless upon discovering the reality of corporate existence, a re-orientation to not fight this flow re-establishes meaning. The participants in the firm can once again proceed forward taking pride in their corner of the great work. Even in firms which failed to scale and reverted to de-facto acephaly you can do good work when you realize what does and does not work there. Given I've had a lot of experience with the latter, I'll write a follow-up soon on how to be effective in acephalous organizations.


How computer science captured the hearts and minds of generations of scientists πŸ”—
1645974650  

🏷️ programming

The scientific method is well understood by schoolchildren in theory, but thanks to the realities of schooling systems they are rarely if ever exposed to its actual practice. This is because the business of science can be quite expensive. Every experiment takes time and nontrivial amounts of capital, much of which may be irreversibly lost in each experiment. As such, academia is far behind modern development organizations. In most cases they are not even aware to the extent that we have made great strides towards actually doing experimentation.

Some of this is due to everyone capable of making a difference toward that problem being able to achieve more gainful employment in the private sector. Most of it is due to the other hard sciences not catching up to our way of experimentation either. This is why SpaceX has been able to succeed where NASA has failed -- by applying our way to a hard science. There's also a lack of understanding at a policy level as to why it is the scientifically inclined are overwhelmingly preferring computers to concrete sciences. The Chinese government has made waves of late claiming they wish to address this, but I see no signs as of yet that they are aware how this trend occurred in the first place.

Even if it were not the case that programming is a far quicker path to life-changing income for most than the other sciences, I suspect most would still prefer it. Why this income potential exists in the first place is actually the reason for such preference. It is far, far quicker and cheaper to iterate (and thus learn from) your experiments. Our tools for peer review are also far superior to the legacy systems that still dominate in the other sciences.

Our process also systematically embraces the building of experiments (control-groups, etc) to the point we've got entire automated orchestration systems. The Dev, Staging/Testing and Production environments model works quite well when applied to the other sciences. Your development environment is little more than a crude simulator that allows you to do controlled, ceteris-paribus experiments quickly. As changes percolate upward and mix they hit the much more mutis mutandis environment of staging/testing. When you get to production your likelihood of failure is much reduced versus the alternative. When failures do happen, we "eat the dog food" and do our best to fix the problems in our simulated environments.

Where applied in the other sciences, our approach has resurrected forward momentum. Firms which do not adopt them in the coming years will be outcompeted by those that do. Similarly, countries which do not re-orient their educational systems away from rote memorization and towards guided experimental rediscovery from first principles using tools very much like ours will also fall behind.


Why am I still using certificate authorities in $CURRENT_YEAR? πŸ”—
1645057792  

🏷️ ssl 🏷️ www 🏷️ dns

Much hay has been made of late about how everyone's favorite CAs, including LetsEncrypt are worse than useless for their stated purpose of identity verification. The entire idea that this "chain of trust" prevents man-in-the middle attacks is completely nonsense, as the issuers are all capable of easily being fooled or coerced by state power on a routine basis.

I remember the good old days of self-signed certs. All the anti-self-signed hysteria was about the fact nobody read the certs, just like today. We could in fact have it much better nowadays via DNSSEC, DANE, CAA Records and CT Headers. The closest thing anyone has to identity verification is WHOIS (and anyone who opts for WHOIS privacy is a fool opening themself up to arbitrary seizure). The credit card companies are infinitely better at KYC than all the Certificate Authorities thrown together, so don't fight the system.

There's still one thing missing to completely remove the possibility of MITMs from any source other than smacking your registrar and host with a rubber hose. Post your self-signed CABundle as a TXT record. If you did so, you could implement the ultimate countermeasure to MITM attacks. Issuing a unique cert per session. Talk about perfect forward secrecy! I sure as heck would prefer to pay for a crypto accelerator card than send a dime to Certificate Authorities, being as they're little better than scams. This would also make a lot of things go whir at your friendly neighborhood gestapo agency. I wish I were shilling for $NVDA here, but alas I hold no position as of this writing.

Why nobody's thought of this incredibly simple solution is for the same reason as all my other "Why am I..." articles. It's easy to be dense when your livelihood depends on using your cranium to store old rags. Thankfully LetsEncrypt has almost totally put the CAs out of business at this point. It shouldn't be much of a step to put them out of business too.

The bigger question is how to get the browsers to reverse their scaremongering about self-signing. It will likely take dedicated lobbying to get them to support mechanisms for feeling good about self-signed CAs. LetsEncrypt is unfortunately "good enough" and has taken away the enthusiasm for further reform. I consider it unlikely that server operators and domain owners will fight for control being in their hands (where it ought to have been all along) until a major and prolonged LetsEncrypt outage.


Performance Engineering for the Layman πŸ”—
1643415182  

🏷️ programming

As my nephews are coming of age, I'm considering taking an apprentice. This has resulted in me thinking more of how I might explain programming best practices to the layman. Today I'd like to focus on performance.

Suppose you had to till, plant and water an arbitrary number of acres. Would you propose ploughing a foot, planting a seed and watering ad nauseum? I suspect not. This is because context switching costs a great deal. Indeed, the context switches involved between planting, seeding and watering will end up being the costliest action when scaling this (highly inefficient) process to many acres.

This is why batching of work is the solution everyone reaches for instinctively. It is from this fact that economic specialization developed. I can only hold so much in my own two hands and can't be in two places at once. It follows that I can produce far more washed dishes or orders being a cook or dish-washer all day than I can switching between the tasks repeatedly.

That said, doing so only makes sense at a particular scale of activity. If your operational scale can't afford specialized people or equipment you will be forced to "wear all the hats" yourself. Naturally this means that operating at a larger scale will be more efficient, as it can avoid those context switching costs.

Unfortunately, the practices adopted at small scale prove difficult to overcome. When these are embodied in programs, they are like concreting in a plumbing mistake (and thus quite costly to remedy). I have found this to be incredibly common in the systems I have worked with. The only way to avoid such problems is to insist your developers not test against trivial data-sets, but worst-case data sets.

Optimizing your search pattern

When ploughing you can choose a pattern of furroughing that ends up right where you started to minimize the cost of the eventual context switch to seeding or watering. Almost every young man has mowed a lawn and has come to this understanding naturally. Why is it then that I repeatedly see simple performance mistakes which a manual laborer would consider obvious?

For example, consider a file you are parsing to be a field, and lines to be the furroughs. If we need to make multiple passes, it will behoove us to avoid a seek to the beginning, much like we try to arrive close to the point of origin in real life. We would instead iterate in reverse over the lines. Many performance issues are essentially a failure to understand this problem. Which is to say, a cache miss. Where we need to be is not within immediate sequential reach of our working set. Now a costly context switch must be made.

All important software currently in use is precisely because it understood this, and it's competitors did not. The reason preforking webservers and then PSGI/WSGI + reverse proxies took over the world is because of this -- program startup is an important context switch. Indeed, the rise of Event-Driven programming is entirely due to this reality. It encourages the programmer to keep as much as possible in the working set, where we can get acceptable performance. Unfortunately, this is also behind the extreme bloat in working sets of programs, as proper cache loading and eviction is a hard problem.

If we wish to avoid bloat and context switches, both our data and the implements we wish to apply to it must be sequentially available to each other. Computers are in fact built to exploit this; "Deep pipelining" is essentially this concept. Unfortunately, a common abstraction which has made programming understandable to many hinders this.

Journey to flatland

Object-Orientation encourages programmers to hang a bag on the side of their data as a means of managing the complexity involved with "what should transform this" and "what state do we need to keep track of doing so". The trouble with this is that it encourages one-dimensional thinking. My plow object is calling the aerateSoil() method of the land object, which is instantiated per square foot, which calls back to the seedFurroughedSoil() method... You might laugh at this example (given the problem is so obvious with it), but nearly every "DataTable" component has this problem to some degree. Much of the slowness of the modern web is indeed tied up in this simple failure to realize they are context switching far too often.

This is not to say that object orientation is bad, but that one-dimensional thinking (as is common with those of lesser mental faculties) is bad for performance. Sometimes one-dimensional thinking is great -- every project is filled with one-dimensional problems which do not require creative thinkers to solve. We will need dishes washed until the end of time. That said, letting the dish washers design the business is probably not the smartest of moves. I wouldn't have trusted myself to design and run a restaurant back when I washed dishes for a living.

You have to consider multiple dimensions. In 2D, your data will need to be consumed in large batches. In practice, this means memoization and tight loops rather than function composition or method chaining. Problems scale beyond this -- into the third and fourth dimension, and the techniques used there are even more interesting. Almost every problem in 3 dimensions can be seen as a matrix translation, and in 4 dimensions as a series of relative shape rotations (rather than as quaternion matrix translation).

The outside view

Thankfully, this discussion of viewing things from multiple dimensions hits upon the practical approach to fixing performance problems. Running many iterations of a program with a large dataset under a profiling framework (hopefully producing flame-graphs) is the change of perspective most developers need. Considering the call stack forces you into the 2-dimensional mindset you need to be in (data over time).

This should make sense intuitively, as the example of the ploughman. He calls furrough(), seed() and water() upon the dataset consisting of many hectares of soil. Which is taking the majority of time should be made immediately obvious simply by observing how long it takes per foot of soil acted upon per call, and context switch costs.


tCMS current state and plan going forward πŸ”— 1642782163  

🏷️ tcms 🏷️ perl

The consistent theme I've been driving at with tCMS development is to transform as much of the program out of code into data. The last thing I've done in this vein was to create parent-child relationships between posts (series), and to allow posts to embed other posts within themselves. The next thing I'm interested in doing is to move the entire page structure into data as well. Recently working with javascript component-based frameworks has given me the core inspiration behind what I ought to do.

Any given page can be seen as little more than a concatenation of components in a particular order. Components themselves can be seen in the same way, simplifying rendering them to be a matter of recursive descent to build an iterator you feed to the renderer. How do I implement this with the current system?

Every post needs to support an array of components. This will necessitate a re-thinking of how the post interface itself works. I should probably have some "preview" mechanism to show an idea how the post should work after you frankenstein it together.

This will enable me to do the most significant performance improvement I can do (static renders) incredibly easily. As a page render will be little more than a SELECT CONCAT statement over a table of pre-rendered component instances for the data. To make updates cheap, we need but check the relevant post timestamps to see if anything in the recursive descent needs a re-render.

As of this writing, a render of the most complicated page of any tCMS install is taking 21ms. This should bring that time down to 2-3ms. It will also enable me to implement the feature which will turn tCMS into a best-of-breed content publishing framework. Which is to automatically syndicate each page we render to multiple CDNs and transparently redirect to them in a load-balancing fashion.

From there I see little more that needs to be done other than improving the posting interface and adding userland features. I still want all of that, but believe technical excellence comes first.


On Building Software Teams πŸ”—
1642525672  

🏷️ corporate

Good production processes are always characterized by a lack of friction in intermediate stages. In software that mostly means that those involved "know each other's minds", as the friction is almost always coming as pushback during review or test. For most this doesn't come without "the feels" hitching a ride too. This can make getting there a bumpy ride, as most are incapable of articulating their boundaries without them first being crossed.

As you might imagine, any time feelings get involved, costs go through the roof. Very little productive will happen until all those chemicals flush from the system. Avoiding this involves setting expectations up-front. Which is hard, as most people are incapable of doing so for a variety of reasons.

First, most are incapable of coherently articulating their boundaries and preferences due to simple lack of spare time. This is almost always the case with those who are in "survival" (read: instinct) reaction mode, such as is the case during business emergencies. Many a new hire has failed to thrive due to being onboarded during a "permanent emergency". This is how firms dig holes they can't get out of, as they can't scale under this mindset. Such emergencies are usually caused by excessive micromanagement in the first place. If you can't "Trust the process" the firm isn't really set up to succeed.

Many others default to sub-communication of emotional state when communicating rather than directly stating their desires. They tend to only resort to direct comms when they've become so frustrated with their interlocutor that they put their thoughts together in a coherent form. Deciphering sub-communications is essentially mind-reading (especially in text communication), so I don't feel particularly bad about failing to do so, or the emotional outbursts at my failure to "just get it". Some people just need drama in their lives. It's a pity that the time wasted in this pursuit wastes so much time and money.

The most pernicious difficulty you will encounter in this endeavor is the "nice guy". These are people who simply never disclose their boundaries for fear they will be perceived in a negative light. Software is packed to the gills with these types, quietly grinding their axes for years until it explodes like a land-mine under your production process. Thankfully, they can't help but tell on themselves. Passive-aggressive commentary is almost always a sure sign some kind of covert contract is lurking in their psyche. This results in expensive re-work when their expectations are not met, or what they want clashes with what's needed.

Countermeasures

Like any other production line, you can sand off a lot of the sharp edges causing friction. This is true even when dealing with problems between the chair and keyboard. People instinctually get that no amount of whining can sway automated linters, tidiers and CI pipelines. As such you should automate as much of this process as is feasible. Much of helping people succeed is reliably pointing them in the right direction.

RPA tools and chat bots have proven indispensable here as well. People knowing that routine parts of the workflow will be handled in exactly the same manner across a division can stop resentment over arbitrariness cold. Like with automation on the IC side, some will chafe under this. It is important to remind them that like children, we are best managed via rules applied consistently. Breaking discipline even once means production stoppages.

People must also face real consequences for failing to responsibly shepherd the production process. There will always be issues found in code review, for example. Failing to resolve these (either by the submitter failing to take action, or the review committee simply sitting on changes) should be unacceptable. Similarly, failures to communicate requirements (which could obviously have been), or to ask for clarification when requirements are vague should be rooted out.

Which comes down to the fact that "no, this time is not different". Your production process, like every single other one, can benefit from a check-list. If it can't be automated, make sure you at least can't forget to think about it. Making as much about the job as possible fully explicit reduces sources of error (and hence friction).


Audit::Log released to CPAN πŸ”— 1642470899  

🏷️ video 🏷️ troglovlog 🏷️ programming 🏷️ perl
For those of you interested in parsing audit logs with perl.

Looks like I need to make some more business expenses if I want to be able to stream 4k video!

When management feels out of control: the truest test of leadership skill πŸ”—
1640314002  

🏷️ corporate

A common occurrence in firms is that the production line will innovate in a way which breaks the underlying assumptions baked into the heads of those in authority. Oftentimes in software projects serving said production lines, this is manifested by a User Interface that evolves in emergent ways beyond that which was envisioned by the data model. When this inevitably leads to undefined behavior, something breaks. Sometimes, it's at an inconvenient time and the impossibly hungry judges effect kicks in. (As an aside regarding that article, "hangry people" is the most valid cause for any statistical phenomenon I've ever heard).

As such, they're on the hunt for scalps. Which means if your name is on the commit, doing the right thing and explaining the actual root cause is almost always the wrong thing. Especially when the cause is, such as in this case, due to a breakdown in communication between management and the managed. The most likely result of this is simply that coups will be counted upon you for not doing what is really wanted: a signal of submission.

Even offering a patch which will solve the immediate problem won't help. If it has come to this point they will have an emotional need to seize direct control, consequences be damned. Woe unto you if you offer the only correct solution with your patch, as that means they will choose the wrong thing simply out of spite.

Having seen this happen repeatedly in my years in corporate, it's never gone any other way. Indeed, this is yet another scenario explicitly discussed in Moral Mazes, which was written when I was knee high. Which comes to the important question: why after all these years do I persist in my impertinence? Why continue to offer sound root cause analysis, even when it is embarrassing for all involved?

Because it's worth the risk to get people mad at you. Most of the time this ends in summary termination. Sometimes, it results in sober second thought, which would not have happened without the emotional spike caused by "rubbing it in". It's best that this happens sooner rather than later when working with someone, as people who don't course correct here are ultimately incapable of greatness. I don't have long-term interest in working with people lacking the necessary maturity to do whatever it takes to smash the problems in their way. The biggest organizational impediment that exists is our own pride.


A very Troglodyne Christmas πŸ”—
1640147922  

🏷️ humor

Aside from being busy with work for clients, I haven't managed to do much writing this December due to finally digesting a few marketing insights I've been chewing on but not swallowing for the better part of a decade. Here at Troglodyne we may be thick headed, but at least we're not smrrt.

Anyways, all types of content need an emotional appeal to get anywhere. Not everyone's like me, and just wants to skip to the end and hear the answer. The people want to hear about what motivated you, as a bearded road apple, to finally shoot the damned computer out of a cannon!

Though I'm only exaggerating a little, it gets me back into a mood of prose I haven't dipped into much since I was much younger and feeling my oats. I don't think I wrote a serious essay even when I had to do so in order to make the grade. Instead, I'd viciously and cruelly make the jokes too esoteric to be detected (or at least proven guilty of cheek) by the faculty. Having grown up consuming a steady diet of MAD magazine, it's a miracle that I've managed to become such an accomplished bore.

I suppose it's a testament to how thoroughly corporate is capable of domesticating a programming community known for eccentricity. This should shock nobody, as it's the smartest dogs that are the easiest to train. That said, there are still plenty of us off the leash having a grand old time.

All my writing and on-video time has made me quite a good deal better at requiring less editing required to render the steaming heaps of drivel that you see before you. Unfortunately, I sound almost as bad as the corporate borg twerps I've been pillorying over the last year or so it's taken to de-brainwash myself away from that set of cults. It's finally time to come into my own voice, which is to say steal someone else's.

In that vein, I've generally seen a few patterns among successful tech content creators. For those interested in Quality and Testing, you generally need to embrace your mean streak. It's got that synthesis of wrath at the excrecable piles of junk we work on all day and the thrill of the hunt when you finally figure out a way to wreck that POS that feels...magnificent! This also bleeds over into programming, as the experience is always one of smashing your head into a brick wall until pure, beautiful victory is achieved just in time for the acceptance criteria to change. Really some of the best fun you can have with your pants on, I definitely recommend you try it. None of the content creators of this stripe are ever sarcastic.

Then we have my favorite kind of technical creator -- we're talking Scotty Kilmer types. Just talk about whatever the hell you feel like today, because it's always gonna be the same old bullshit...but make sure your marketing is as spectacular as humanly possible. Whether it has anything to do whatsoever with the actual content is irrelevant. Don't care, got laid. It's the hacker ethos to a T... and by T, I mean Tren! Hell, it's probably a fun game to see how misleading you can have your sizzle be and still get hits. Excuse me while I eat this whole fried catfish.

For those of you who skipped to the conclusion (like me), let me waste a bit more of your time as a special bonus gift. We've got some exciting things coming for you all in 2022! Whether or not they're actually exciting I sure am excited about them. So excited I've gotta be vague about it. Definitely not because I haven't thought of anything yet.

That reminds me, I still need to go get presents. Merry Christmas everyone!


Why am I still using a search engine in $CURRENT_YEAR? πŸ”—
1638625845  

🏷️ www 🏷️ dns

There has been much controversy in recent times over censorship of search engines and social media. According to those engaging in this, it's done with good intentions. Whether this is true or not is missing the point. Why are we relying on a centralized search engine at all that can censor, when we've had decentralized search for a half-century?

DNS can be seen as little more than an indexing service. There is no fundamental technical reason why the exact same approach can't be taken for resources at particular domains. Every site could publish their sitemaps and tags quite easily, and many do right now. They simply upload them to search engines rather than having them be well-known records considered by peers.

A DNS model would in fact simplify search indexing a good deal, as you can drop the crawling code entirely and simply wait until at least one person accesses a resource to index it. This would put the burden of crawling/advertising their available pages on site operators themselves, pushing costs down the stack, as is appropriate in a decentralized system.

Much of the reason DNS is tolerated as a decentralized system rather than centralized is that it takes so little resources relative to the rest of the web stack. People love the idea of federation, but hate paying for it. The primary question is whether incentives align for the current parties running DNS to also index and cache content hierarchies.

The answer is obviously no, or they would be doing this right now. This is once again due to the primary browser vendor (google) having no interest in supporting such a thing, as it would actively undercut their business model. If a browser did support such a system, many content creators and web hosters would absolutely love to adopt a system with clear rules under their control rather than the morass of inconsistency that is the centralized engine's rulesets. Similarly, the ISPs and Web Hosts would happily hop on board to the idea of offering yet another service they can charge for.

Therefore the question is can the existing business model of advertising that subtly corrupts search results translate to a decentralized system? Of course it can. The trouble is that it'd be the ISPs and web hosts in the position to extract this profit. This is in fact the ray of hope in this situation, as both Google and it's competitors in the (virtualized) hosting biz could get a large piece of this pie.

So, if you wanted to know what a future with this would look like it'd be that Microsoft or Amazon forks Chrome. This has already happened in Microsoft Edge. From here it's but a matter of modifying and open-sourcing their existing indexer, and making their fork support it's use. Introducing a system of decentralized search would both hurt their competitor, and be another reason to use Azure versus GCP and Amazon. They'd likely adapt Bing to leverage this as well, to extend the benefit to all browsers.

That said, Amazon or any large host could execute on this. Much of the tech that Cloudflare uses to cache content could likely be re-purposed towards these ends as well. There's a lot of money to be made in disrupting the status quo. Whether this translates into concrete action is anyone's guess.


caldav is a solution looking for a problem πŸ”—
1637210035  

🏷️ calendaring 🏷️ DAV

Many hours have been wasted on calendaring servers, and they still don't solve the problems people who use calendars want solved. This is because the problem is approached from the wrong direction. People think from the client to the server, as it's clients originating ics files which are then schlepped around via email. Servers allowed people to do things like free-busy for attendees and conference rooms, but required email clients to support things like itip. I'll give you one guess how that went.

This model instantaneously breaks down when you go cross-organizational. The widespread incompatibility between mailservers and no standardized way to federate directory discoverability makes this impossible. As such, the meta collapses back to schlepping around ics files. It should shock nobody that embracing this fact and giving up on free/busy and presence has been the solution that dominates. Microsoft has implemented on this approach better than anyone decades ago.

Actually solving the presence problem requires that you get federation right. Guess who's doing that? The chat apps. Both Slack and Teams have this figured out. Doing this as a plugin to matrix or snikket would actually be quite straightforward. As such my recommendation is that shared hosting software stop distributing calendaring software. They should instead distribute chat servers and good chatbots that can do things like meeting reminders.

You could even federate across incompatible chat apps and protocols via bots that know how to talk to each other. I know it would work, because it worked before computers. That's how secretaries coordinated all of it -- picking up a phone. Implementing a UI for people to use would be as simple as wrapping your secretary bot that knows how to book people and rooms.


Doing Hiring Right πŸ”—
1636831995  

🏷️ corporate

Most people make Interviewing programming candidates way too much work. I've setup (and taken) more HackerRank style puzzles than a person ought to have in a lifetime. One of the problems with this tool is that it's basically handing dynamite to children. Programmers are in general an over-specific lot that delight in secrets. This results in unnecessary grief and makes you miss out on otherwise good candidates.

The purpose of Job Descriptions, Programming Tests and Phone Screens are all the same. The trouble is that most people neither understand or acknowledge this on either side of the table. They're all just spam filters. The simplest path to success for candidates is to simply spam cold approaches and leverage social proof for everything it's worth.

Rather than allow negative feelings to build up and covert contracts to form, you should simply be up-front about the reality. An "Early Frame Announcement" of why you are doing the process the way you do, and what you expect out of it helps a lot. Managing Expectations will never stop being important in your relationships with those you work with, so you need to do this from the very beginning.

Sometimes you can be too explicit with your expectations. As with anything else measured, people bend over backwards to conform to them. This can be good, when what you measure actually matters. Unfortunately, very few things which get measured in this way actually do.

Employers allow themselves to get bullshitted into hiring people very good at telling them exactly what they want to hear. They then get blindsided when they inevitably find out the candidate, like everyone else, is in fact far from perfect. That said, people do manage to "fake it till they make it" more often than not so this isn't something to get too worried about. As long as they keep doing things you want, who cares if they were less than fully truthful during the interview process? You as the interviewer can't and won't be fully disclosing the facts of the situation on the ground either. What you actually want is a system that exposes the faults and strengths of candidates as quickly as possible.

First, you need to understand that nobody believes the posted requirements in public openings (nor should they). Accept that you will just get spammed. Then you need to tightly focus on your "Must Haves" rather than a laundry list of wants. If there are more than 4 technical skills you need for a given position, the solution you want someone to implement is probably ill-designed. You can't prove an optimal solution exists for anything with more than 3 variables and can't guarantee a closed form solution exists for systems with 5 variables, after all.

If you still have too many candidates to pick from (you will), you should then treat your "want" keyword list much like you would a digital ad campaign. Try and find people with at least a pair of them and that usually winnows you down to a list worth talking to. Don't get too excited here -- you won't be shocked to find maybe 10% of even these can get past a phone screen.

The phone screen's only purpose is to determine whether or not you are being catfished, and telling people this up-front tends to result in better outcomes. Most of the technical questions here should be softballs which can be googled as they talk to you. All you want to see is that they know and care enough at this point to actually get you the right answer. This is all to lull them into a false sense of security until you slip them the hilariously obvious false question. If they don't call you out on making an obviously false statement and try to bullshit you, just hang up.

Lots of people do tests and programming puzzles in lieu of the phone screen now. This is actually a bad idea. Online tests should only be employed in lieu of phone screens when you have too many candidates. Even then, they should be done similar to what the phone screen would have been.

I personally prefer to save tests as prelim for the in person interview. I like making the in-person basically a code review of the test, as this gets you into mind-meld with the candidate quite quickly. This also more closely mimics how they will actually be working, which is what you really want to know. Making this clear to candidates up-front tends to get the best results (which is what you actually want from candidates).

Nevertheless the online code test should consist of one straightforward question, another less so. The ultimate goal is that they should take more time than allotted for most people. This can be established by administering the test to a sample of your existing staff. Be up-front that this is the point, lest they get upset and corrupt the results with cheating and other such chicanery. You should end up seeing an 80% solution from the candidates at the very least.

From here the question remains what to do with the candidates you are on the fence about. Sometimes people just get a case of nerves, or aren't quite experienced enough yet but still can do the work. It's here that you need to understand that all deals can happen at the right price. Making it clear that you're willing to take a risk on a candidate during a probationary period with introductory rates can work out quite well. It's even better when you have a candidate offer this themselves, go-getters like that almost always turn out well.

At this point you should have a candidate and a price. Now you need to take them to lunch and fish for their victim puke. The last thing you need is a whipped dog ready to snap. This is unfortunately common thanks to widespread pathological behavior running rampant in corporate and public schools.

From there the problem is making sure they fit on a team. This means you have to figure out how to make the others invest in the candidate's success and vice versa. Too often things quickly turn adversarial, resulting in bad outcomes that were totally avoidable. That is a pretty in-depth topic which is worthy of it's own post.


Why am I still passing around phone numbers in CURRENT_YEAR? πŸ”—
1636669564  

🏷️ SIP 🏷️ POTS

SIP and trunking to the POTS have been around for more than a decade now. The million dollar question nobody seems to be able to answer is why all our mobiles still use a number instead of addressing via DNS (e.g. user@domain). The carriers wouldn't be cut out of the loop by this, as they can remain a dumb, albeit wireless, pipe. Indeed, some carriers actually implement their systems as glorified SIP trunks on the backend.

Address book software is also plenty capable of tracking email-style addresses. Android has natively supported SIP calling such addresses for more than a decade. This means that for everyone I know using android it's as simple as setting up openPBX on my server. I could have full control and encrypted video calling tomorrow. The trouble is that everyone's favorite status symbol, the iPhone doesn't support this. Which means I couldn't communicate with half my family and many of my clients, as they're not about to install squat to please me.

It appears the reason for this lack of support is the usual modus operandi from Apple. That is to say, they have their own proprietary standard they'd prefer everyone use instead (but that is shoddy by comparison to standard software). Further complicating matters is that new and popular video conferencing firms like Zoom have also introduced yet more shoddy and incompatible software.

While Zoom can bridge to SIP clients, it costs extra, and they already trunk to the POTS at no cost, further entrenching the phone number. Skype has had a similar model for many years. FaceTime users can provide links to allow non-apple clients to call them, but not the other way around. That said, the fact that there is now an HTTP means of doing FaceTime means reverse-engineering the protocol and building a SIP bridge is but a matter of time. When PBXes are capable of appearing to be apple devices with FaceTime things will finally be "good enough" to ditch the number.

Much of the reason for the success of these non-open packages is because the cost structure is largely hidden from users. The FaceTime ecosystem is "free" past the initial phone purchase, and only the host of zoom calls generally pays for the service. By comparison, users of open software and standards bear recurring costs (and they're already paying a phone bill). Like with the telcos themselves, very few people are willing to pay for a SIP account if it's not bundled with hosting, mail, DAV and everything else.

Competing the telcos down from being vertically integrated multi-service providers to mere ISPs is the real mountain to climb here. The first major shared host to execute on this will be able to tap billions in additional MRR. When and if that day comes, I'd likely ditch the cellphone entirely in favor of superior clients on real computers.


Why am I still sending unencrypted email in CURRENT_YEAR? πŸ”—
1636484555  

🏷️ email

Regardless of whether you use OpenPGP or S/MIME certificates, the core problem of distribution of public keys was never really solved to anyone's satisfaction. S/MIME essentially never even addressed the problem beyond assuming you'd link them somewhere on a website and that people would go out of their way to communicate securely. I'll give you one guess how that turned out.

OpenPGP by contrast built a key server called SKS. The trouble is that it was flaming garbage and abandonware to boot. Thankfully hagrid fixed that problem. The trouble is that this model relies on users to upload their keys to the server, rather than things "just happening" automatically as in the case of things like LetsEncrypt on shared hosting. Again, I'll give you one guess how that turned out.

So the latest solution is a thing called WKD. It's a practical solution to essentially adopt the model used by LetsEncrypt to do DCV. Shared Hosts now have no real reason not to auto-generate OpenPGP keys for every email, as the impact of compromises are quite limited. A short renewal timeframe should be applied for the same reasons they are with LetsEncrypt's certificates.

The primary drawback is the same one as with CAs, which is to say they have the private key used to generate things. In short, it's a problem of trust. That said, we seem to put up with this issue in the web at large, and encryption by default would be better than the status quo of sending everything over the wire unencrypted. It would be straightforward for hosting management software to support users uploading their own keys to satisfy those with cause for concern, unlike with SSL certs.

The only remaining hurdle is that clients by and large do not consult WKD whatsoever. Some things like enigmail do support it, but anything short of this being the default setting on the most popular option won't matter. Like with authentication code the primary issue remains that the biggest vendors (exchange and gmail) would have to lead, follow or get out of the way. Frustratingly the default stance there remains to simply obstruct. This is baffling, as there are only upsides to them embracing this. Holding private keys, the management of firms can still be snoops if they feel like it (despite this actually being a bad idea). Their competitors would not and as such no longer have the option of MITMing their email to conduct corporate espionage.

At this point it simply appears to be a matter of inertia. Which makes sense, as email is not exactly the big moneymaker these days. Hosted chat, DevOps and ERP software is where all the energy is now.

Nevertheless, this is actually a place that shared hosting can take a leadership role to improve the world like they did with LetsEncrypt. Integrating automatic key generation and sharing into webmail via a plugin is possible today. That coupled with a marketing blitz might just be enough to finally fix this problem.

Postscript

There's a bigger problem here than just key distribution. Namely, how to filter spam in an encrypted world. This would require a much more browser-like world where it's the server doing the encrypting and de-crypting, so that it can read and filter before delivery. While not ideal, it's still better than the status quo where MITMing your mail is trivial. You have to trust your server operator not to compromise your mail, but let's be real here. They can straight up modify items in your mbox right now without your knowledge, so this is not a serious concern beyond "just host your own mailserver".

You can actually do this right now by rejecting anything that is relayed without SSL. But this is less than ideal because:

  1. Servers can lie and not set the header for this
  2. Many legit emails will start disappearing because "oops I went thru an unencrypted relay", as there are tons of such legitimate arrangements still extant.

To be fair, you can still PGP encrypt behind that if you wanna be super paranoid. As such I think we need both WKD and strong server-side encryption of mailservers.


Why do I still have to write auth code in CURRENT_YEAR? πŸ”—
1636477820  

🏷️ authn 🏷️ authz

Authentication and Authorization of users is one of those things I don't like storing data for. In short it's bad for the same reason storing credit card numbers (or any PII) is. Not only is it not my app's core job, it's a disaster waiting to happen in a breach. User provisioning is also one of those things I hate having to do when everyone on the web already has an email identity.

So the question remains: why is it that your only real options for OIDC just happen to be the big boys? One of the major goals for my CMS is to disintermediate from the big aggregators, so this is a problem I'd love to have solved. Why can't my shared hosting account be my OIDC provider? Furthermore, why do I need 9 different logins for services on the same server? It turns out there are some technical and organizational hurdles that have to be overcome.

Fixing this for the web

The primary difficulty is that there is no standard for how you ought to advertise the available JSON Web Tokens to pages. It's standard practice to use Double Submit cookies and synchronizers in DOM components to prevent XSS/CSRF, rather than stuffing JWTs into LocalStorage. This is why every OIDC process does the two-step over to another domain, as the cookies are SameSite and can't be used anywhere else. The practical consequence of the status quo is that every single oauth implementation is a hardcode rather than iterating over your active and available logins.

I suspect the big boys are content with this status quo and would resist any solution that didn't allow them to stay at the top of the list of available providers. Existing objections have to do with the user privacy (should I advertise my active CrazyFetish.yikes login?) provided by the status quo. To keep such control in the hands of the user, any such feature should be opt-in per service. To make that happen, browser integrations would have to happen.

Implementing such a feature should be straightforward. JWT issuers would use a standard name for their token so that the browser could know one exists for said domain. The browser would then have an array as part of the window object in javascript advertising which domains have active JWTs the user consented to share.

Therein lies the rub. Incentives don't align here, as the big boys are also the browser vendors. Such a regime that preserves privacy would necessarily not preserve their ability to get free advertising on every login screen worldwide.

This could be made less objectionable by throwing some bones to both the tech firms and developers. Both would like to display a branding image & copy for these auth providers alongside the login prompt. Thankfully, this could easily be stuffed into LocalStorage with a standard name without risk.

It would also not be the first time that browsers have had a pre-baked whitelist of services for self-serving reasons. To be entirely frank, I'd prefer that they do such a thing in this case rather than force every app developer on earth to write extra code for no good reason. I'd rather be happy than right and especially in this case. I have a feeling you would too.

Fixing this for non-http services

The first problem is that they went about discoverability wrong. They use the .well_known mechanism in vhosts. The more standard way would be something like a TXT or SRV record on an appropriate subdomain. LetsEncrypt found out the hard way that vhosts are a lot easier to compromise than A records. It also doesn't hurt that DNS is basically the LCD of web features. Almost everything has to know how to talk DNS, not everything talks HTTP.

From there it's essentially a matter of integrating SASL into applications, as it already supports Oauth tokens. Given most of the backbone services out there aren't hostages of large tech firms like browsers, the server end shouldn't be hard to gain momentum. The trouble will be with OS vendors, as they'll have the same problem as browser vendors. This pattern repetition suggests that the real problem is that we actually treat identity and authentication as a tack-on component rather than as a first class service.

This suggests that gaining the requisite momentum to solve this problem can't really be achieved externally. The OS and Browser vendors are going to have to want to solve this problem. Until then, developers will have to write auth code they don't have to, and put up with services written by people who don't understand security concerns. Which is the biggest selling point to me -- The OS and browser vendors could do humanity a favor and extinguish an entire category of bugs forever with this.


The spam and scraping problem is about to get even worse πŸ”—
1636133435  


A recent post has made it clear where the future of industrial scale spamming and scraping is increasingly headed. In short, exploitation of the nature of cellular phone networks is about to unleash the kraken in ways that existing countermeasures will all prove useless against.

Mobile phone networks are all essentially large proxies for fleets of NATed devices. As such, getting a new NAT IP is as simple as turning on and off the radio, and if you are moving around, likely a new proxy IP as well. That said you don't even need to move around, as blocking any IP from mobile phone providers is basically a nonstarter. Nobody wants to block the large number of legit clients coming from their IPs, after all.

One might think that eventually the telcos will wise up and start banning. The trouble is that the mobile nature of these systems and wide assortment of carriers to spread requests over make it entirely viable to be both mobile and decentralized enough to have no effective means of enforcement. To a large extent, I suspect we are already in the early stages of this problem given there are firms making hardware facilitating this.

Is the only way to resolve this problem would be to forgo NAT altogether and embrace IPv6? Not really, as IPs are so plentiful there as to make bans useless once more. Even AI and Bayesian filtering has proven to be insufficient at improving fingerprinting enough to matter. The only practical option left is QoS measures to ensure you at least don't get DoSed by this stuff. That, and actually improving the performance of your mailserver and web properties such that they can bear the load.


What's behind institutional resistance to remote work? πŸ”—
1636128408  

🏷️ corporate

Ever since lockdown policies began 2 years ago, most of the white-collar workforce started working from home full-time. About six months ago management began to get anxious to get the workers back into the office. Looking into the data, this frankly appears irrational. The reduction in productivity is so small as to be easily outweighed by office costs.

Similarly, firms and employees remain irrationally attached to W4 employment. In this environment the home office environment very much lends itself to a favorable tax & regulatory situation via 1099 resulting in higher take-home pay for the worker and less administrative expense for the employer. Why is this organized irrationality the case?

I think the most persuasive case against the return to the office is laid out here by a blue-collar worker. This is also the case observed by white-collar WFH offices by and large. All the mandatory policy meetings my friends used to have to put up with mysteriously stopped over the last two years. Similarly, tedious manager interactions have slowed to a trickle, begging the question as to why any of these people are being paid.

Anybody paying attention to the ratio of administrative staff to workers has noticed troubling trends over the last 50 years. These numbers should have gone down in this age of automation rather than increased as it has. Nobody can make a serious case that we need more administrators per worker than we did in say, the 1970s. Many of the compliance & regulatory fears that have motivated this rise evaporate without an office. This cannot help but produce some degree of existential fear in middle management. As such their advocacy for a return to a useless and costly office makes sense.

Similarly, the reality of employees maintaining their own office space will have to be recognized at some point. For many years, courts have converted 1099 workers into W4 employees due to the application of the duck rule to working conditions. Such legal issues have ruined many an in-sourcing firm. The application of various mandates, regulations and tax policies on employees bearing the costs of maintaining their own office will eventually bring this to a head. Most remote workers both look and act like independent contractors, and would benefit from this being official.


The impact of open source licensing on your business πŸ”—
1636067872  

🏷️ oss 🏷️ licensing

Recently, OVID had some remarks about using GPL3 code in your projects. The most relevant bit is this:

Do you have any code that cannot be open sourced but uses code with a "permissive" license that in turn uses code with a GPL license?

Congratulations! You now have a court case on your hands if anyone finds out.
The backstory here is that one of his friends has some issues he can't fix without hiring a lawyer.

Normally, this is not too problematic (even when the upstream is hostile) when you use packaged software and libraries, as distributing patches is fairly straightforward. However sometimes there are extenuating circumstances (such as clouded title) as happens thanks to contentious forks. The other circumstance is the viral nature of some licenses such as GPLv3 and Affero. There are even more extremely ideological licenses out there, but few which are of any practical consequence.

Both the normal and Affero GPL have the practical consequence of you needing to either license proprietary data or sell services rather than sell software. That is, unless you adopt a gratuity model which has proven less than viable in the overwhelming majority of circumstances. Even the idea of selling services is difficult to secure against competition, as the recent war between elasticsearch and Amazon proves. It's quite a bitter pill to swallow to be undercut by a competitor using the fruits of your own ongoing labors.

It's not an easy choice to make. Choosing to forgo software with viral licenses means more time-to-market, which is not always available. Similarly, your business model may be to help people with their data, not sell access to yours. Ultimately the only thing you can really rely upon in the long term are your own individual wits and physical capital, licenses and laws notwithstanding. This is why most tech firms (if they survive long enough) end up becoming glorified consulting firms like IBM.


The war on scraping is lost for the same reason as the war on piracy πŸ”—
1635962678  

🏷️ piracy 🏷️ scraping

A great deal of effort is expended upon anti-scraping measures in webpages. There are a number of reasons for this:

  • prohibitive bandwidth costs involved in allowing bulk downloads
  • wanting tight control over how users view the data in order to influence their conclusions
  • competitors stealing content and reproducing their service in a jurisdiction in which they have no legal recourse.
The first concern is best addressed by rate-limiting mechanisms or metering fees. The second concern has become quite a heated topic for social media of late. This is not a problem for most services, as it's not good business to second guess paying customers. On the other hand, if the business is advertising (as it is with social media) influence is precisely the point.

For most businesses the concern will be the last one. I've learned over my career programming that data oriented design not only results in faster code, but less code. It's entirely possible to build a successful business with entirely open source code but proprietary data using this model. That said, it makes one uniquely vulnerable to such data theft.

Can we fix it?

Enter anti-scraping technology. For a good overview of the current landscape, see here. You may have noticed the core problem is "fingerprinting", which is essentially the same one to solve with software licensing. This is because it's the same exact problem as software piracy, programs are just data that transform other data.

Those of you familiar with implementing software licensing schemes like I have are well aware that basically everything but phone-homes coupled with fingerprinting are not worth pursuing. Even then, there is no real way to prevent people from nopping your checks out. Generally you see mechanisms to ensure that a crack for one version does not work on the next. This has resulted in a status quo of customers submitting to this stick with the carrot of forward-going code updates.

Which is to say a stalemate in the immediate term, but total surrender in the long term. The only reliable way to prevent this is to never allow clients to interpret your code. Even then side-channel attacks are possible to reverse-engineer it.

This model breaks down for targeted and simple programs, as after some point there's nothing to update. I suspect this is much of the reason that observation of Zawinski's law is so prevalent in the software industry. There is however no such concern with data, as you can always add more. The video game industry in particular has embraced this with zeal. Expansion content not only drives much of sales, it also works quite well to keep their content artisans fully employed when they might otherwise have downtime.

You may have noticed that the ultimate remedy available to software is not exactly feasible for data. Data cannot be fully obscured from the client in nearly all use cases. Anti-scraping measures (as you can see from the overview) have also failed almost comprehensively. This has had far-reaching effects on a number of industries.

Tech Blogging has been totally smothered by plagiarists who know how to do SEO. The only real reason to do one nowadays is as a big "hire me" billboard. My father was an inventor with a number of patents, and he discovered (the hard way) that they were also useless besides as an inducement for employment. Almost every social media platform which started out with good APIs have now comprehensively crippled or dropped them altogether and an industry of scraping based tools have popped up to satisfy this need. Plaid became a multibillion dollar company by doing scraping of bank websites using bank clients own logins.

Like with software licensing it begs the question of why any of this effort is expended at all, given it's ultimately Canute screaming at the tides. This comes down to legal reasons. The courts generally say that you "had it coming" if you left a gold bar in the middle of the street and it gets stolen. So it is with software and data. If you don't at least make a token effort at anti-circumvention you have no recourse. Of course, this is not consistently applied to all firms and jurisdictions but such is the law. If we wanted consistent outcomes, we'd replace black robes and powdered wigs with programs. Even then this has no bearing internationally, as most firms ability to have recourse there is nil.

Time to get creative

The good news is that it turns out that any effort beyond token prevention in fact hinders your ability to stop piracy. Pirates are inherently lazy, and you can exploit this to get a handle on the problem. For example, I once worked with an IP based licensing scheme that also gathered OS fingerprints passively, but did not do enforcement based on the latter. This allowed some people to feel they were quite clever running a number of instances behind NAT. Periodically they get rounded up (random reinforcement works best for operant conditioning) and told they're gonna get a lifetime ban unless they buy the right number of licenses and sign an NDA about the incident. This was but one example of many where over the years where laying traps for pirates paid off quite handsomely.

Just like with my previous article about the victory of spam, the proper mindset is not to fight but "make the trend your friend". The motivations for piracy and spamming are both deeply ingrained in human nature. The most powerful people and organizations in the world have fought that war against our baser natures for millennia and are still no closer to victory then when they set out. This time will not be different.


25 most recent posts older than 1635962678
Size:
Jump to:
POTZREBIE
© 2020-2023 Troglodyne LLC