Why do I still have to write auth code in CURRENT_YEAR? 🔗
1636477820  

🏷️ authn 🏷️ authz

Authentication and Authorization of users is one of those things I don't like storing data for. In short it's bad for the same reason storing credit card numbers (or any PII) is. Not only is it not my app's core job, it's a disaster waiting to happen in a breach. User provisioning is also one of those things I hate having to do when everyone on the web already has an email identity.

So the question remains: why is it that your only real options for OIDC just happen to be the big boys? One of the major goals for my CMS is to disintermediate from the big aggregators, so this is a problem I'd love to have solved. Why can't my shared hosting account be my OIDC provider? Furthermore, why do I need 9 different logins for services on the same server? It turns out there are some technical and organizational hurdles that have to be overcome.

Fixing this for the web

The primary difficulty is that there is no standard for how you ought to advertise the available JSON Web Tokens to pages. It's standard practice to use double-submit cookies and synchronizer tokens in DOM components to prevent CSRF, and to keep JWTs out of LocalStorage so XSS can't steal them. This is why every OIDC process does the two-step over to another domain, as the cookies are SameSite and can't be used anywhere else. The practical consequence of the status quo is that every single OAuth implementation is hardcoded to particular providers rather than iterating over your active and available logins.

I suspect the big boys are content with this status quo and would resist any solution that didn't allow them to stay at the top of the list of available providers. Existing objections have to do with the user privacy (should I advertise my active CrazyFetish.yikes login?) provided by the status quo. To keep such control in the hands of the user, any such feature should be opt-in per service. To make that happen, browsers would have to integrate it.

Implementing such a feature should be straightforward. JWT issuers would use a standard name for their token so that the browser could know one exists for said domain. The browser would then have an array as part of the window object in javascript advertising which domains have active JWTs the user consented to share.

Therein lies the rub. Incentives don't align here, as the big boys are also the browser vendors. Such a regime that preserves privacy would necessarily not preserve their ability to get free advertising on every login screen worldwide.

This could be made less objectionable by throwing some bones to both the tech firms and developers. Both would like to display a branding image & copy for these auth providers alongside the login prompt. Thankfully, this could easily be stuffed into LocalStorage with a standard name without risk.

It would also not be the first time that browsers have had a pre-baked whitelist of services for self-serving reasons. To be entirely frank, I'd prefer that they do such a thing in this case rather than force every app developer on earth to write extra code for no good reason. I'd rather be happy than right, especially in this case. I have a feeling you would too.

Fixing this for non-http services

The first problem is that they went about discoverability wrong. They use the .well-known mechanism in vhosts. The more standard way would be something like a TXT or SRV record on an appropriate subdomain. LetsEncrypt found out the hard way that vhosts are a lot easier to compromise than A records. It also doesn't hurt that DNS is basically the LCD of web features; almost everything has to know how to talk DNS, while not everything talks HTTP.
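
As a sketch of the difference: only the .well-known path below is a real standard; the _oidc SRV/TXT names and their payloads are made-up conventions for illustration.

 discovery_sketch.sh
# DNS-based discovery: anything that can resolve names can find the issuer
dig +short SRV _oidc._tcp.example.com
dig +short TXT _oidc.example.com      # e.g. "issuer=https://id.example.com"

# The status quo: HTTP-only, and tied to whatever the vhost happens to serve
curl -s https://example.com/.well-known/openid-configuration | head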

From there it's essentially a matter of integrating SASL into applications, as it already supports OAuth tokens. Given most of the backbone services out there aren't hostages of large tech firms the way browsers are, the server end shouldn't have a hard time gaining momentum. The trouble will be with OS vendors, as they'll have the same problem as browser vendors. This pattern repetition suggests that the real problem is that we actually treat identity and authentication as a tack-on component rather than as a first class service.

This suggests that gaining the requisite momentum to solve this problem can't really be achieved externally. The OS and browser vendors are going to have to want to solve this problem. Until then, developers will have to keep writing auth code they shouldn't need to, and put up with services written by people who don't understand security concerns. Which is the biggest selling point to me -- the OS and browser vendors could do humanity a favor and extinguish an entire category of bugs forever with this.


The spam and scraping problem is about to get even worse 🔗
1636133435  


A recent post has made it clear where the future of industrial scale spamming and scraping is increasingly headed. In short, exploitation of the nature of cellular phone networks is about to unleash the kraken in ways that existing countermeasures will all prove useless against.

Mobile phone networks are all essentially large proxies for fleets of NATed devices. As such, getting a new NAT IP is as simple as turning on and off the radio, and if you are moving around, likely a new proxy IP as well. That said you don't even need to move around, as blocking any IP from mobile phone providers is basically a nonstarter. Nobody wants to block the large number of legit clients coming from their IPs, after all.

One might think that eventually the telcos will wise up and start banning. The trouble is that the mobile nature of these systems and the wide assortment of carriers to spread requests over make it entirely viable to be both mobile and decentralized enough that there is no effective means of enforcement. To a large extent, I suspect we are already in the early stages of this problem, given there are firms making hardware to facilitate it.

Would the only way to resolve this problem be to forgo NAT altogether and embrace IPv6? Not really, as IPs are so plentiful there as to make bans useless once more. Even AI and Bayesian filtering have proven insufficient at improving fingerprinting enough to matter. The only practical option left is QoS measures to ensure you at least don't get DoSed by this stuff. That, and actually improving the performance of your mailserver and web properties such that they can bear the load.
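
If you go the QoS route, something as blunt as an iptables hashlimit rule sketches the idea; the port and numbers below are placeholders to tune against your own traffic.

 qos_sketch.sh
# Cap per-source request bursts on the web port so scrapers degrade gracefully
# instead of taking the whole site down with them
iptables -A INPUT -p tcp --dport 443 -m hashlimit \
    --hashlimit-name scrapers --hashlimit-mode srcip \
    --hashlimit-above 30/second --hashlimit-burst 60 \
    -j DROP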


What's behind institutional resistance to remote work? 🔗
1636128408  

🏷️ corporate

Ever since lockdown policies began 2 years ago, most of the white-collar workforce has been working from home full-time. About six months ago management began to get anxious to get the workers back into the office. Looking into the data, this frankly appears irrational. The reduction in productivity is so small as to be easily outweighed by office costs.

Similarly, firms and employees remain irrationally attached to W-2 employment. In this environment the home office very much lends itself to a favorable tax & regulatory situation via 1099, resulting in higher take-home pay for the worker and less administrative expense for the employer. Why is this organized irrationality the case?

I think the most persuasive case against the return to the office is laid out here by a blue-collar worker. This is also what white-collar WFH workers have observed, by and large. All the mandatory policy meetings my friends used to have to put up with mysteriously stopped over the last two years. Similarly, tedious manager interactions have slowed to a trickle, raising the question of why any of these people are being paid.

Anybody paying attention to the ratio of administrative staff to workers has noticed troubling trends over the last 50 years. These numbers should have gone down in this age of automation rather than increased as they have. Nobody can make a serious case that we need more administrators per worker than we did in, say, the 1970s. Many of the compliance & regulatory fears that have motivated this rise evaporate without an office. This cannot help but produce some degree of existential fear in middle management. As such their advocacy for a return to a useless and costly office makes sense.

Similarly, the reality of employees maintaining their own office space will have to be recognized at some point. For many years, courts have converted 1099 workers into W-2 employees due to the application of the duck rule to working conditions. Such legal issues have ruined many an in-sourcing firm. The application of various mandates, regulations and tax policies on employees bearing the costs of maintaining their own office will eventually bring this to a head. Most remote workers both look and act like independent contractors, and would benefit from this being official.


The impact of open source licensing on your business 🔗
1636067872  

🏷️ oss 🏷️ licensing

Recently, OVID had some remarks about using GPLv3 code in your projects. The most relevant bit is this:

Do you have any code that cannot be open sourced but uses code with a "permissive" license that in turn uses code with a GPL license?

Congratulations! You now have a court case on your hands if anyone finds out.
The backstory here is that one of his friends has some issues he can't fix without hiring a lawyer.

Normally, this is not too problematic (even when the upstream is hostile) when you use packaged software and libraries, as distributing patches is fairly straightforward. However sometimes there are extenuating circumstances (such as clouded title) as happens thanks to contentious forks. The other circumstance is the viral nature of some licenses such as GPLv3 and Affero. There are even more extremely ideological licenses out there, but few which are of any practical consequence.

Both the normal and Affero GPL have the practical consequence of you needing to either license proprietary data or sell services rather than sell software. That is, unless you adopt a gratuity model which has proven less than viable in the overwhelming majority of circumstances. Even the idea of selling services is difficult to secure against competition, as the recent war between elasticsearch and Amazon proves. It's quite a bitter pill to swallow to be undercut by a competitor using the fruits of your own ongoing labors.

It's not an easy choice to make. Choosing to forgo software with viral licenses means a longer time-to-market, which is not always an option. Similarly, your business model may be to help people with their data, not sell access to yours. Ultimately the only things you can really rely upon in the long term are your own wits and physical capital, licenses and laws notwithstanding. This is why most tech firms (if they survive long enough) end up becoming glorified consulting firms like IBM.


The war on scraping is lost for the same reason as the war on piracy 🔗
1635962678  

🏷️ piracy 🏷️ scraping

A great deal of effort is expended upon anti-scraping measures in webpages. There are a number of reasons for this:

  • prohibitive bandwidth costs involved in allowing bulk downloads
  • wanting tight control over how users view the data in order to influence their conclusions
  • competitors stealing content and reproducing their service in a jurisdiction in which they have no legal recourse.
The first concern is best addressed by rate-limiting mechanisms or metering fees. The second concern has become quite a heated topic for social media of late. This is not a problem for most services, as it's not good business to second guess paying customers. On the other hand, if the business is advertising (as it is with social media) influence is precisely the point.

For most businesses the concern will be the last one. I've learned over my career programming that data oriented design not only results in faster code, but less code. It's entirely possible to build a successful business with entirely open source code but proprietary data using this model. That said, it makes one uniquely vulnerable to such data theft.

Can we fix it?

Enter anti-scraping technology. For a good overview of the current landscape, see here. You may have noticed the core problem is "fingerprinting", which is essentially the same problem you have to solve with software licensing. That's because it is the exact same problem as software piracy: programs are just data that transform other data.

Those of you who have implemented software licensing schemes like I have are well aware that basically nothing but phone-homes coupled with fingerprinting is worth pursuing. Even then, there is no real way to prevent people from NOPping your checks out. Generally you see mechanisms to ensure that a crack for one version does not work on the next. This has resulted in a status quo of customers submitting to this stick, with the carrot being forward-going code updates.

Which is to say a stalemate in the immediate term, but total surrender in the long term. The only reliable way to prevent this is to never allow clients to interpret your code. Even then side-channel attacks are possible to reverse-engineer it.

This model breaks down for targeted and simple programs, as after some point there's nothing to update. I suspect this is much of the reason that observation of Zawinski's law is so prevalent in the software industry. There is however no such concern with data, as you can always add more. The video game industry in particular has embraced this with zeal. Expansion content not only drives much of sales, it also works quite well to keep their content artisans fully employed when they might otherwise have downtime.

You may have noticed that the ultimate remedy available to software is not exactly feasible for data. Data cannot be fully obscured from the client in nearly all use cases. Anti-scraping measures (as you can see from the overview) have also failed almost comprehensively. This has had far-reaching effects on a number of industries.

Tech blogging has been totally smothered by plagiarists who know how to do SEO. The only real reason to do one nowadays is as a big "hire me" billboard. My father was an inventor with a number of patents, and he discovered (the hard way) that they were also useless besides as an inducement for employment. Almost every social media platform which started out with good APIs has now comprehensively crippled or dropped them altogether, and an industry of scraping-based tools has popped up to satisfy the need. Plaid became a multibillion dollar company by scraping bank websites using the banks' customers' own logins.

Like with software licensing, it raises the question of why any of this effort is expended at all, given it's ultimately Canute screaming at the tides. This comes down to legal reasons. The courts generally say that you "had it coming" if you leave a gold bar in the middle of the street and it gets stolen. So it is with software and data. If you don't at least make a token effort at anti-circumvention you have no recourse. Of course, this is not consistently applied across all firms and jurisdictions, but such is the law. If we wanted consistent outcomes, we'd replace black robes and powdered wigs with programs. Even then this has no bearing internationally, as most firms' ability to have recourse there is nil.

Time to get creative

The good news is that it turns out that any effort beyond token prevention in fact hinders your ability to stop piracy. Pirates are inherently lazy, and you can exploit this to get a handle on the problem. For example, I once worked with an IP based licensing scheme that also gathered OS fingerprints passively, but did not do enforcement based on the latter. This allowed some people to feel they were quite clever running a number of instances behind NAT. Periodically they got rounded up (random reinforcement works best for operant conditioning) and told they were gonna get a lifetime ban unless they bought the right number of licenses and signed an NDA about the incident. This was but one of many examples over the years where laying traps for pirates paid off quite handsomely.

Just like with my previous article about the victory of spam, the proper mindset is not to fight but "make the trend your friend". The motivations for piracy and spamming are both deeply ingrained in human nature. The most powerful people and organizations in the world have fought that war against our baser natures for millennia and are still no closer to victory than when they set out. This time will not be different.


Welcome to spamworld, where nobody reads 🔗
1635869893  

🏷️ SEO 🏷️ spam

Recently, a resume went viral for getting good responses despite being filled with obvious BS such as rickrolls, thanks to being SEO'd out the wazoo. A less obvious variant of this trick for more serious people has been to include a paragraph of text with white font color (so that it is not visible unless selected, or even when printed) filled with these SEO keywords. While these tricks can open some doors, they still aren't enough, because people still don't read what gets past the filters. Some of the reason for this is plain laziness, but the truth is that what gets past the filters is still too much to read.

This can get especially frustrating for programmers looking for contracts who have a large corpus of public work (such as a blog, or OSS contribs). Prospective employers invariably ask you to take yet another test whether or not you have clear and demonstrated ability to solve their business problems. At the end of the day, exploiting social proof is still what's needed to get hired. Whether you leverage a personal connection, build fame or use "jedi mind tricks" to quickly build emotional investment in an interview, it always has to be done. The skills you actually need to do the job basically don't matter at all; they're just one more filter.

Why can't we have nice things?

The core issue which goes unremarked upon here is that the war on spam is over, and the spammers have decisively won. The only set of spam filters which can actually catch 100% of spam also catches 100% of non-whitelisted ham. The most recent weapon in this war is greytrapping, where you blacklist anyone sending to addresses that don't exist at the server, as that's evidence of scanning.

I realized that this approach could also be applied in various other places to improve web hosting in general, as scanning happens all the time. My /var/log/messages is usually filled with queries for domains that are not, and never have been, on the box. You could similarly ban HTTP requests against IPs specifying incorrect Host headers.
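
A rough sketch of the DNS variant, assuming BIND-style query logging lands in /var/log/messages and that your firewall already drops an ipset named "scanners"; the hosted-domain pattern and log parsing will need adjusting to your setup.

 ban_scanners.sh
#!/bin/bash
# Domains actually hosted here; anyone querying this box for anything else is scanning
HOSTED='(example\.com|example\.net)'
grep ' query: ' /var/log/messages \
    | grep -Ev "query: ([^ ]*\.)?${HOSTED}\.? " \
    | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}#' | tr -d '#' \
    | sort -u \
    | xargs -r -n1 ipset -exist add scanners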

There are a lot of areas where the other techniques applied to email would actually help. Greylisting phone calls in particular would essentially extinguish the epidemic of scam calls immediately. Especially if you combined it with a mandatory up-front leave-a-message step for first-time callers, running a bot-or-not analysis. That said, it appears there is zero motivation to change in the telephony space. After all, most major smartphones have supported sending and receiving encrypted SIP calls identified by email addresses for years, yet we still trade the equivalent of IP addresses and pay for this!

This still doesn't fix the problem though. Given the only foolproof solution is whitelisting, it surprises me that no major mail package or hosting control panel automatically adds anyone you directly mail to the whitelist. Most don't even auto-whitelist your addressbook!
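
A back-of-the-napkin version of that auto-whitelisting is just scraping your own outbound mail log; this sketch assumes postfix logging to /var/log/maillog and a flat whitelist file your filter actually consumes.

 auto_whitelist.sh
#!/bin/bash
WL=/etc/mail/auto_whitelist.txt
# Every address we successfully relayed mail to gets whitelisted
grep ' postfix/smtp\[' /var/log/maillog | grep 'status=sent' \
    | grep -oE 'to=<[^>]+>' | sed -e 's/^to=<//' -e 's/>$//' \
    >> "$WL"
sort -u -o "$WL" "$WL"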

But wait! There's MORE

There is an even more insidious problem introduced by the net. While there is an endless tide of spam there is also more ham than anyone could ever possibly eat. This is the current state of scientific publishing despite the replication crisis. What happens when the possible routes of investigation are more than you could ever possibly investigate?

While it is possible that multiple routes lead to your destination, it's likely only one of them is optimal. As programmers know well, this is close to an intractable problem short of exhaustive search. This flood of "not wrong, but not useful" content which hinders my search for solutions (again, thank you SEO blogs) has grown increasingly concerning.

I've begun to wonder if this will be the mechanism by which the spread of knowledge regresses to the pre-internet mean. I certainly don't relish the days of having to drive to and then search library stacks to get answers. I don't think it'll be as bad as it used to be, but this has major ramifications for AI researchers. If we can barely get through this tide of junk, I suppose it comes as no surprise that "expert systems" turn out to be closer to "mediocrities copying and pasting from stackoverflow".

This is good news for content creators at least. It means that posts like this one where I lead off with some "in the news" thing can easily be evergreened in the future. This is because everyone's social media feed is an eternal September of the guy who just started paying attention. As P.T. Barnum said, there's a sucker born every minute!


Power Distortions in the Firm 🔗
1635781264  

🏷️ management 🏷️ corporate

The most corrosive element in any relationship is power, especially when the wielder does not understand the way it subtly warps their interactions with others. Middle managers in firms are quite unaware of this, as in the rest of their lives they are powerless peasants like the rest of us. Doing the sort of context switch to make this work does not come naturally, and the means by which we select managers does not select for the self-reflective. Occasionally they develop the necessary faculties, but this necessarily means their advance in rank will cease and much of the good they do will be plowed under by their peers.

This is why much of modern automation in firms is giving dynamite to children. Once managers saw how much things like issue trackers helped teams internally, they could not resist using them as yet another lever to micro-control the process. The strength of Auftragstaktik is in practice paid no more than lip service.

Having fallen victim to the siren song of automated measurement, they forget that they now have the same problem as search engines. Unscrupulous employees are now able to SEO their way into the top ranks of performance with very little effort. Much of this is why the urge in firms to pick low-hanging fruit to pump up the numbers is so widespread. It is also yet another shackle on themselves, as management begins to use the same hammer amongst themselves. This further distracts them from their true purpose of resolving systemic barriers to progress.

Management by Reid Technique

I can't think of a better way to induce anxiety and destroy productivity in the workforce than regularly scheduled police interrogations. Which is essentially the primary way in which employees and management interact now, commonly known as the "one on one". Well-meaning managers put out pieces like this on how they can be positive interactions.

The summary is that management generally wants to hear "all is well" so they may return to inaction, as this is easy. Basically anything else is seen as emotional whining they need to pacify at best. At worst the manager goes full on cop mode and fires people over throwing a tantrum. This in particular is quite perceptive:

A Disaster is the end result of poor management. Your employee believes totally losing their shit is a productive strategy and they believe it's the only option left to making anything change.
It is true that many do not resort to communication of facts until incredibly frustrated that their subcommunications have been comprehensively ignored. This is a rational response to the actual goal of the meeting: what managers want to hear is ketman, so that's what people give them.

A manager who understands the distorting nature of the power they wield would not engage in such tactics. Like torture, one-on-ones can't possibly achieve anything you actually want. All you will hear is what you want to hear, or emotional outbursts which can and should be disregarded.

The only real way to learn the truth is to observe from a dis-empowered position, like Henry V going into camp incognito. It's either that or have spies. This is much of why QA is defined as "providing information to decision makers". The reports from your QA department are what should be finding the problems in the production process you need to resolve.

As to the people problems, an "open door" policy should suffice. If people won't tell you these things until they explode anyways, this at least saves time. This is not the policy by and large, as management is in love with the idea of prevention. While this is indeed the right strategy in the production process, it is dangerously wrong for personal development. Never allowing people to make interpersonal mistakes is to deprive them of essential learning opportunities. Can one truly be said to have repented under the lash? Or be said to be good without having experienced evil and rejected it?

The only way to avoid these distortions is systemic reform of the organization. Scaling organizations without diluting ownership (as in a partnership) inevitably results in the single-elimination ass-kissing tournament. As such we cannot expect anything but self-service (much less reform) from management at large. The attendant mendacity is a cost of doing business in large firms.

Even in a firm without these problems power can still prove corrosive. That said the incentives are at least not aligned against doing the correct thing.


Detect OOMs via cron 🔗
1635525018  

🏷️ scripts

I recently had a problem with a hypervisor where the dom0 was underprovisioned for RAM, and it OOMKiller'd the hypervisor processes for VMs during scheduled rsync backups. After nice-ing the processes appropriately to prevent this, I decided to implement a simple cron to email me whenever oomkiller fires. This obviously isn't foolproof, as something could be killed preventing outgoing mails. That is however unlikely, so this should be a good enough solution going forward.

oomdetect.sh
#!/bin/bash

touch /root/ooms.log
FSZ=$(stat --printf "%s" /root/ooms.log)
grep -i oom-killer /var/log/messages >> /root/ooms.log
# De-duplicate in place; plain uniq only removes *adjacent* duplicates, which would
# make the file grow (and false-alarm) on every run after the first OOM
sort -u -o /root/ooms.log /root/ooms.log
NEWSZ=$(stat --printf "%s" /root/ooms.log)

if [ "$FSZ" != "$NEWSZ" ]
then
    echo "New OOM detected, investigate /var/log/messages:"
    tail -n1 /root/ooms.log
fi

Pretty simple altogether; just make sure to run it once before you install it to root's crontab. Don't forget that you can send crons to multiple recipients by making the relevant updates to /etc/aliases for that cron's user and running newaliases.
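
For example (the schedule and addresses here are placeholders):

 install_cron.sh
# Prime the log once so you don't get mailed about historical OOMs, then install the cron
/root/oomdetect.sh > /dev/null
( crontab -l 2>/dev/null; echo '*/15 * * * * /root/oomdetect.sh' ) | crontab -

# In /etc/aliases, point root at everyone who should get the mail, then rebuild the db:
#   root: admin@example.com, oncall@example.com
newaliases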


Why are US firms having such a hard time hiring? 🔗
1635261979  

🏷️ corporate 🏷️ recruiting

The mainstream narrative throughout the last couple years has been that additional aid to the unemployed was encouraging them to be layabouts. Now that this aid has ceased, people are looking around and realizing the problem lies somewhere else. It turns out that the reality is a combination of both being on the cusp of a demographic cliff and workers fed up with common behaviors of employers in a glutted market.

One such behavior has direct relevance to hiring. Thanks to said oversupply of labor for years, a similar situation to that experienced by women on dating apps has developed. Which is to say a spam crisis. This has resulted in incredibly aggressive filtering measures. Most commonly these are automated filtering, inflated JDs (job descriptions) and deceptive offers. For a while this was working, but lately the successful match rate (see the Beveridge curve) has nosedived. I suspect I know why.

Automated Filtering and JD inflation

JD inflation started out the same way most product requirements go. Too many cooks adding too many ingredients. What started out as a light scout vehicle becomes the Bradley fighting vehicle.

Eventually programmers got a hold of these documents and realized they could apply search techniques to try and improve match rates. This provided great results for both employers and workers for the first decade of the 21st century. However, like the web it became filled with spam and SEO'd content and the signal-to-noise ratio plummeted.

The practical consequences of this are twofold:

  • Honest signals no longer make it through the filters on either side.
  • Cutting through the noise increasingly requires dedicated recruiters.
The latter is particularly concerning as recruiters are quite expensive per hire. This means that the only cost-effective way small firms can acquire talent is via personal connections. The former is also corrosive, as many on both sides have come to accept that you have to cynically game this system to get good results.

Deceptive Offers, Liar Hires

Recently a story went viral about the abysmal rate of responses a qualified applicant got for entry level jobs, and how the offer was always lower than advertised on those that did respond. It should shock nobody that marginal firms engage in catfishing to game this system and get better hires than they could normally. So long as advantage may be acquired through dishonest practices people will try it absent any meaningful sanction. All's fair in love and war and hiring.

Similarly many prospective hires get good results using jedi mind tricks pioneered by salesmen and pickup artists to rapidly build emotional investment in their counterparts. This causes a great deal of resentment in those who prefer devoting their limited time to professional excellence when they see this results in the (relatively) unskilled getting ahead and them being left behind.

Much of what we are seeing with our worker shortage is these resentments finally boiling over. Professionals either decide to "play the game" or take their ball and go home. This results in turnover at worst and checked-out disloyal workers at best.

Firms have tried to prevent this by fostering a cult atmosphere via paternalist measures and propaganda. This is both expensive and ineffective outside of the short-term. Competitors have all adopted similar benefits and anyone paying attention is wise to the BS now. Joshua Fluke has made a career on youtube pillorying this nonsense. The only option left to employers is actually raising wages.

That said, the astonishing levels of average household debt continue to weaken workers' position. They ultimately don't have enough savings to give an outright no. The American worker's only choice is who to say yes to. Woe unto those working in highly consolidated industries, where competition is less meaningful of a bargaining chip.

Demographics

What is making this even worse are demographic trends. The boomers are taking the frustrations they've been dealing with for years, plus being at or close to retirement age, as sufficient reason to throw in the towel. Similarly, many spouses are discovering they prefer being at home in a supporting role after having experienced it thanks to pandemic related layoffs. Anxiety over family formation (or the lack thereof) likely factors into this decision to an extent.

Just as the mainstream narrative as to why there is a shortage was incorrect, the advocated solution to the demographic problem is also incorrect. While increased immigration will provide a fresh supply of those ignorant of the reasons Americans are reluctant to deal with US firms, this is not a sustainable solution. The internet has massively increased the speed with which immigrants are stripped of their illusions regarding the "American Dream". Many immigrants deeply resent the sword of Damocles that green cards represent, and they inevitably learn the reality of the American workplace as well.

Nevertheless, supposing an unlimited supply of skilled labor willing to immigrate (which upon reflection is actually quite a dubious assumption), the can could be kicked down the road indefinitely. However, there has been no meaningful increase in immigration at present, which means employers must take concrete action now or accept understaffing. This means any change in immigration policy is unlikely to happen, as it will be "too little, too late" for the vast majority of firms.

Expansion of Entrepreneurship

One other way in which people are "taking their ball and going home" is striking out on their own, much as I have. The 49% increase in EIN applications is strong evidence many are doing so. I have been surprised for years that more people didn't recognize the strengths of contracting earlier. The tax advantages are quite strong and automation removes much of the need for support staff. Nevertheless, now that people are taking the plunge, this is removing a significant number of people from the employment rolls permanently.


Missionary or Mercenary? 🔗
1634842858  

🏷️ corporate 🏷️ entrepreneurship

As part of this transition to entrepreneurship, I've talked to a lot of recruiters and companies in attempts to get contracts. Of the successful ones, there is almost always some element of a bait-and-switch involved, either with the actual duties required or the payment offered versus the expectations discussed up-front.

I don't take any of it personally, sometimes it's just a negotiation tactic you have to withstand. It is nevertheless a black mark on the relationship going forward, as it's assured they'll inevitably find some way to short you subsequently. If you are good at sticking to your guns, the place they inevitably slip is being late with payment.

What baffles me is how many employees I've known throughout the years who actually accepted an offer after such an obvious bait-and-switch without holding them to initial expectations. When I talked to them about it, their rationalization of the situation was inevitably a covert contract along the lines of "they'll do it when I prove myself". This of course was an axe ground forever in secret when that never came to pass. Such poison is everywhere in most firms when you know how to look for it.

Modern employment seems built around fostering these sorts of covert contracts. This is a side-effect of managers wanting missionaries, not mercenaries. The trouble is that this vision of the firm is inevitably undermined by the incompetence of their self-serving management. It can't be any other way, because the phenomenon of the "single-elimination ass-kissing tournament" and its effects described in "Moral Mazes" saturate the market in the USA totally. As such employees regularly care deeply about firms that are at best indifferent to their well-being. In that case being a "missionary" feels less inspiring and more like wearing matching Nikes and drinking Kool-Aid. Missionaries tend to get disillusioned when they realize those whom they follow are not gods, just men.

Which brings us to today, where we have many firms lamenting a labor shortage. This should shock nobody paying attention to demographics. Over the last 60 years firms have enjoyed an unprecedentedly glutted labor pool thanks to both the baby boom and feminine empowerment. The reality going forward is instead a diminishing labor pool. Yet firms still regularly adjust their final offers down and withhold hiring "bonuses" until long after the initial work. They then have the gall to wonder why they're seeing a lack of enthusiasm.

Firms will not be able to get away with the sort of chicanery which has been commonplace in the last 60 years. Both wages and the meaningfulness of jobs will have to actually adjust higher. This is bad news for the management of most firms, as they've grown fat glad-handing and hiring armies of suck-ups which make them look good without being productive.

This is ultimately a hopeful sign for the future versus our reality of the last 20 years, where we have had the best engineers ever led by the worst management. We're already seeing huge levels of attrition from firms which have clueless management. It is but a matter of time before people look around and see that we have had the answers all along, but are blind to them on a systemic level.

It would be a breath of fresh air to be able to deal with management on a professional level rather than have to engage in a guerrilla insurgency of "Führen unter der Hand" to achieve the actual goals of companies. I have grown weary of having to cynically exploit middle management's impulse to look good at all costs as the way to advance in a firm rather than actually accomplishing something. That said, the army has known about the superiority of Auftragstaktik, the OODA loop and ideas similar to the germ theory of management for even longer than the business community has, yet remains the poster child for the kind of management described in "Moral Mazes". The capacity for self-deception in managers is clearly at least as great as in those who work for them.

This is perhaps the greatest reason I've left it all behind me to do this hired-gun thing. At least this way I'm not holding my breath that this entrenched set of problems gets fixed.


PTR Records: waving a dead chicken 🔗
1634239190  

🏷️ ipv6 🏷️ dns 🏷️ mail 🏷️ spam

Helping out people with their shared hosting problems inevitably runs into mail issues. One of the more common ones has to do with being put on an RBL, usually a code 550 bounce. This usually happens for one of three reasons:

  • The mailserver is configured to do what seems right and HELO per the appropriate domain rather than the domain of the server
  • The mailserver responds on an IP other than that of the domain returned by the HELO (either lack of a PTR record, or responding on the mail domain's IP)
  • The client is actually spamming
It's rarely the last one, as this is fairly straightforward to prevent with outgoing mail limits, outgoing scanning and so forth.

You'd think that this cross-referencing of a domain's A record to its PTR would prevent a lot of spam, as this would effectively prevent spoofing. Unfortunately the reality of shared hosting, IPv4 scarcity and limited ISP tooling has crippled this. This has resulted in a reality where there are only 2 correct configurations:

  • No sharing of IPs between domains which emit mail, period. Not even CNAMEs. You can then HELO per domain on its IP.
  • Never HELO with anything other than the server hostname, and always respond from the server's IP
As you can see, allowing the latter (which all RBLs must, as it's the lowest common denominator here) essentially cripples the cross-referencing of the sender domain's A record with its IP and the corresponding PTR record. Spammers can spoof any domain they wish as long as their MTA HELOs correctly, until they get manually reported.
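
The cross-referencing in question is simple enough to sketch with dig (203.0.113.7 is a placeholder address):

 fcrdns_check.sh
IP=203.0.113.7
HELO_HOST=$(dig +short -x "$IP")      # PTR: ideally the name the MTA HELOs with
dig +short A "${HELO_HOST%.}"         # should come back as $IP for the check to pass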

So why is it that we can't share IPs and HELO per-domain/IP? It's precisely because we are doing this PTR to IP to A record cross-referencing as an automated process over at the RBL makers like spamcop et al. That, and almost nobody is ever given a full /24 block of IPs (all 256 addresses sharing the first three octets). Instead the ISPs usually assign smaller blocks to multiple customers. This means that they can't delegate the NS record for *.X.Y.Z.in-addr.arpa (where X.Y.Z is the first three octets of your IP, reversed) to their client.

So instead they provide some kind of interface for their clients to add PTR records. Unfortunately many don't know that a one-to-many PTR relationship is entirely supported by DNS, much like for A records themselves. As such these interfaces inevitably only allow one domain to be specified per IP. Which leaves you with only one option: send out mail from your server's hostname and primary IP.

Meanwhile, 0 spam is prevented because of this massive hole blown in the entire scheme. The only practical outcome is that unaware sysadmins get caught up by this and are put on RBLs erroneously. Which is a pity, as if the system actually worked correctly it would be an ironclad guarantee against spoofed emails.

IPv6 could have fixed this, as we could give everyone a /64 IP range until the end of time and never run out. Delegating NS for the PTR domains could then become standard practice. IPv6 never really got adopted by the major ISPs though. Given they haven't updated their tooling for multi-PTR (which has been supported almost since the beginning of DNS), we shouldn't hold our breath that NAT is going away anytime soon either.


2 Major paths to take advantage of CDNs 🔗
1634143208  

🏷️ tcms 🏷️ www

When considering a switch from traditional web hosting to something like S3 (or ipfs) plus web workers or equivalent stateless compute resources (GCP, Lambda, etc.) you need to re-think how you deploy your applications. Many of the complex backends that you are used to are simply not feasible to have on the live web with this model. That said, it's not hard to write CGIs in a stateless fashion, and many only keep their state client side or in a database, which is a model that works just fine here.

That said, most of your backend probably doesn't even need to be public. It could easily be behind a firewall and just push updates to your staging and production buckets and workers. In fact, I'm considering this model for tCMS right now! There's no good reason why I need the login functionality public, at least for post editors -- all of them could run a local instance and push out updates, supposing I had an rqlite backend data store sitting in a bucket as a read-only node, with the editors behind firewalls doing the Raft thing.

From there the question is whether to do static generation or load sqlite with sql.js and do everything client-side (or a combination of both). Static versus dynamic web pages is well trod ground at this point so do what feels best for your application. Personally I will probably go with static generation as I prefer to run as little as possible on the client machine to get the best user experience.

The big drawback to using IaaS is (historically) higher TCO and a lack of good abstractions making it easy to self-host under such models. Things like OpenStack make self-hosting possible, but prohibitively expensive, and it feels like swatting flies with an elephant gun for most webmasters.

Even containerization and its orchestration tools require a lot of extra thought and code versus the traditional LAMP approach. What's really needed to swim in both pools easily are some abstractions to make it easy for you to:

  1. Treat local directories like they are S3 buckets (and/or ipfs tokens) so you have flexibility for storage deployment
  2. Treat local httpds like they're a web worker API you deploy to, rather than vhosts
Implementing such tools would effectively allow any shared host to transition their existing infrastructure into such products, albeit without automatic scaling on their end. That said, if multiple servers are supported in a sort of federated model (again, likely mediated thru rqlite), that would largely ameliorate such concerns.

For S3, s3proxy is exactly what we are looking for. As to an imitation of web workers (or other "stateless" compute resources), that would be far more complex given there is no standardization across vendors.
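
As a sketch of the first abstraction, s3proxy can already fake a bucket over a plain directory; the property names below follow its documented filesystem backend, so double-check them against the version you deploy.

 local_bucket_sketch.sh
cat > s3proxy.conf <<'EOF'
s3proxy.endpoint=http://127.0.0.1:8080
s3proxy.authorization=none
jclouds.provider=filesystem
jclouds.filesystem.basedir=/var/www/buckets
EOF
# The release jar from github.com/gaul/s3proxy
java -jar s3proxy --properties s3proxy.conf &

# Any S3 client can now treat /var/www/buckets/<bucket> as a bucket
# (dummy credentials keep the aws cli happy since auth is off):
export AWS_ACCESS_KEY_ID=dummy AWS_SECRET_ACCESS_KEY=dummy
aws --endpoint-url http://127.0.0.1:8080 s3 mb s3://staging
aws --endpoint-url http://127.0.0.1:8080 s3 sync ./public s3://staging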


Dealing with out of date slave zones after an IP migration 🔗
1633971893  

🏷️ rndc 🏷️ dns

I had a problem the other day where I did an IP migration on a server which was shipping slave zones to another server. Unfortunately, the IP migration script provided with plesk either failed to update the zone serials, or BIND9/rndc just didn't care for whatever reason. As such, the other nameserver kept returning responses with the old IPs, despite:

  1. Restarting bind, reloading rndc on both
  2. incrementing every zone serial on the zones we were authoritative for using sed
  3. Manually wasting the zonefile on the slave, then running rndc retransfer and rndc resync on the master
  4. rndc delzone of said slave, and then try retransfer/resync again
Nothing actually brought over the new zonefile. Obviously, more rigorous means of forcing the issue were required.

First, I needed to see if the zones on the slave were not updating, or if I was nuts. Unfortunately rndc showzone tells you next to nothing other than that a zone exists and where it comes from. This meant I needed to convert them to text, as the slave zones by default are shipped in BIND's binary format. To make a more generic alias, I've made a means to detect whether or not it's binary and "do the right thing".

showzone.source

showzone()
{
    echo ${1?Error domain name is not defined e.g. showzone test.test }
    # Update the following as appropriate to your environment, this works on plesk
    ZONE_LOC=/var/named/chroot/var

    file $ZONE_LOC/$1 | grep text &> /dev/null
    RESULT=$?
    if [ $RESULT == 0 ]
    then
        cat $ZONE_LOC/$1
    else
        /usr/sbin/named-compilezone -f raw -F text -o - $1 $ZONE_LOC/$1
    fi
}

The missing link here was doing rndc addzone in the same way it would happen automatically, to fool the system into thinking it was a brand new slave configured:

fix_borked_slaves.sh

#!/bin/bash

ZONE_LOC=/var/named/chroot/var

# Run this on the slave:
for zone in $(ls $ZONE_LOC)
do
    # Only touch zones rndc actually knows about; quote the output or the test breaks
    if [ -n "$(rndc showzone $zone 2>/dev/null)" ]
    then
        # Grab rndc's current definition (everything after 'zone "name" in')
        TO_ADD=$(rndc showzone $zone 2>/dev/null | grep slave | awk '{ $1=$2=$3=""; print $0}')
        # Dump the current list for use on master
        echo "$zone" >> for_master_update.txt
        # Delete and re-add the zone under the same definition
        rndc delzone $zone
        rndc addzone $zone $TO_ADD
    fi
done

# Preamble to script you need to run on master, based on for_master_update.txt

# ADAPTER=eth0
# LOCAL_IP=$(ip addr show $ADAPTER | grep 'inet ' | awk '{print $2}' | sed -e 's/\/.*$//g')
# REMOTE_IP="REPLACEME"

# On REMOTE_IP, do a mass search/replace to make for_master_update.txt look like this and run it

#rndc -b $LOCAL_IP -s $REMOTE_IP -p 953 -y rndc-key retransfer $zone

This thorough application of the LART did the trick. I'm still not 100% sure what was BIND and rndc's problem here; bumping zone serials should always result in a transfer.

What would be better?

Nonsense like this is probably why cPanel's "DNS Clustering" never used AXFRs or master/slave DNS in the first place. Eventually people came to like its enabling of a multi-master setup where you could edit the zones from anywhere in your cluster. It was of course a pretty funky system in that it had no consensus protocol whatsoever. Edit loops and Sybil attacks were entirely possible and straightforward for customers to mistakenly produce for years because of confusing documentation. The documentation is better now, but the design of the system hasn't fundamentally changed.

I was part of the team tasked with considering a redesign of that system when we had to add DNSSEC support to it. Even the supermaster mode in pdns doesn't have the same strength as the Raft protocol. As such I think the ideal setup for that is probably pdns in sqlite mode, utilizing rqlite. This has the added benefit of being simple for people to just set up themselves.


MySQL utf8 migrations: big hairy queries 🔗
1633018799  

🏷️ mysql 🏷️ utf8

I had to migrate some mysql databases to UTF8mb4 last week. You would think this all would have happened a decade ago in a sane world, but this is still surprisingly common thanks to bad defaults in packaged software. Every day there's a new latin1 db and table being made because /etc/my.cnf isn't set up right out of the box. So people google around, and find some variant of this article.

I wasn't willing to waste hours with a non-automated solution for the many tables and databases piled up over years. So, I wrote a script of course. The plan was this:

  1. Convert the DB's default encoding & collation
  2. Get the list of tables so we can iterate
  3. Try and convert the table. If that fails, it's usually due to long varchars being indexed.
  4. On failure, drop and re-add the indices on long varchars with an acceptable shorter prefix length
  5. After that, try and re-convert the table
This ended up being a 99% solution. A couple of tables had some neat problems I'll talk about at the end. So without further ado, I'll talk about the queries themselves.

Converting the DB is straightforward.

 fix_db.sh
mysql -e "ALTER DATABASE $DB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci"


We'll assume for these examples that $DB is $1 (the first argument to the script).
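
If you want to run it across everything on the box, a thin driver over these snippets might look like the following; it assumes fix_db.sh and fix_tables.sh are saved as standalone scripts taking the database as $1, with credentials coming from ~/.my.cnf.

 fix_all.sh
#!/bin/bash
# Convert every non-system database, one at a time
for DB in $(mysql -ss -e 'show databases' | grep -vE '^(information_schema|performance_schema|mysql|sys)$')
do
    ./fix_db.sh "$DB"
    ./fix_tables.sh "$DB"
done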

Similarly, listing and converting tables was easy:

 fix_tables.sh ~ snippet 1
for table in $(mysql $DB -ss -e 'show tables')
    do
        mysql $DB -ss -e "ALTER TABLE $table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci" &> /dev/null
        STATUS=$?
        if [ $STATUS -ne 0 ]
        then
            ...


Now we have to do the hard part of dropping and re-adding the indices, because mysql doesn't support ALTER INDEX like postgres.

This means we need to capture the index state as it currently exists and only make the needed adjustments to turn varchar indexes into prefix indices. If you didn't read the article linked at the top, this is because utf8mb4 strings can be up to 4x as large as latin1, and mysql has a (mostly) hardcoded limit on index key size. You can get it bigger on InnoDB for most use cases, but in this case it was MyISAM and no such option was available.

Here's how I did it:

 fix_tables.sh ~ snippet 2

    for query in $(mysql -ss -e "select CONCAT('DROP INDEX ', index_name, ' ON ', table_name) AS query FROM information_schema.statistics WHERE table_name = '$table' AND index_schema = '$DB' GROUP BY index_name HAVING group_concat(column_name) REGEXP (select GROUP_CONCAT(cols.column_name SEPARATOR '|') as pattern FROM information_schema.columns AS cols JOIN information_schema.statistics AS indices ON cols.table_schema=indices.index_schema AND cols.table_name=indices.table_name AND cols.column_name=indices.column_name where cols.table_name = '$table' and cols.table_schema = '$DB' AND data_type IN ('varchar','char') AND character_maximum_length >= 250 AND sub_part IS NULL);")
    do
        echo "mysql $DB -e '$query'"
    done

    for query in $(mysql -ss -e "select CONCAT('CREATE ', CASE non_unique WHEN 1 THEN '' ELSE 'UNIQUE ' END, 'INDEX ', index_name, ' ON ', table_name, ' (', group_concat(concat(column_name, COALESCE(CONCAT('(',sub_part,')'), CASE WHEN column_name REGEXP (select GROUP_CONCAT(cols.column_name SEPARATOR '|') as pattern FROM information_schema.columns AS cols JOIN information_schema.statistics AS indices ON cols.table_schema=indices.index_schema AND cols.table_name=indices.table_name AND cols.column_name=indices.column_name where cols.table_name = '$table' and cols.table_schema = '$DB' AND data_type IN ('varchar','char') AND character_maximum_length >= 250 AND sub_part IS NULL) THEN '($PREFIX_LENGTH)' ELSE '' END)) ORDER BY seq_in_index), ') USING ', index_type) AS query FROM information_schema.statistics WHERE table_name = '$table' AND index_schema = '$DB' GROUP BY index_name HAVING group_concat(column_name) REGEXP (select GROUP_CONCAT(cols.column_name SEPARATOR '|') as pattern FROM information_schema.columns AS cols JOIN information_schema.statistics AS indices ON cols.table_schema=indices.index_schema AND cols.table_name=indices.table_name AND cols.column_name=indices.column_name where cols.table_name = '$table' and cols.table_schema = '$DB' AND data_type IN ('varchar','char') AND character_maximum_length >= 250 AND sub_part IS NULL);")
    do
        echo "mysql $DB -e '$query'"
    done


We then execute all the generated mysql statements. I output them to a shell script which can then be run afterward. If we simply ran them as we generated them, the CREATE statements would be looking for indexes that had already been dropped.

Let's go over the details, as it uses a solid chunk of the important concepts in mysql. Read the comments (lines beginning with --) for explanations.

 drop_indices.sql

-- The plan is to output a DROP INDEX query
SELECT CONCAT('DROP INDEX ', index_name, ' ON ', table_name) AS query
-- information_schema.statistics is where the indices actually live
FROM information_schema.statistics
WHERE table_name = '$table'
AND index_schema = '$DB'
-- The indices themselves are stored with an entry per column indexed, so we have to group to make sure the right ones stick together.
GROUP BY index_name
-- We want to exclude any indices which don't have VARCHAR or CHAR cols, as their length is irrelevant to utf8mb4
-- The simplest way is to build a subquery grabbing these, and then scan the columns in the index via regexp
HAVING group_concat(column_name) REGEXP (
    select GROUP_CONCAT(cols.column_name SEPARATOR '|') AS pattern
    -- We need to cross-reference the cols in the index with those in the table
    -- So that we can figure out which ones are varchars/chars
    FROM information_schema.columns AS cols
    JOIN information_schema.statistics AS indices ON cols.table_schema=indices.index_schema
    AND cols.table_name=indices.table_name
    AND cols.column_name=indices.column_name where cols.table_name = '$table'
    AND cols.table_schema = '$DB'
    AND data_type IN ('varchar','char') AND character_maximum_length >= 250
    -- sub_part is the prefix length.  In practice, any ones with a sub part already set were just fine, but YMMV.
    AND sub_part IS NULL
);


Things get a little more complicated with the index add. To aid the reader I've replaced the subquery used above (it is the same below) with "$subquery".

 make_indices.sql

-- As before, we wish to build a query, but this time to add the index back properly
SELECT CONCAT(
    'CREATE ',
    -- Make sure to handle the UNIQUE indexes
    CASE non_unique WHEN 1 THEN '' ELSE 'UNIQUE ' END,
    'INDEX ',
    index_name,
    ' ON ',
    table_name,
    ' (',
    -- Build the actual column definitions, complete with the (now) prefixed indices
    group_concat(
        concat(
            column_name,
            -- COALESCE is incredibly useful in aggregate functions, as the default behavior is to drop the result in the event of a null.
            -- We can use this to make sure that never happens when sub_part IS NULL.
            -- In that case, we either print a blank string or a prefix based on what I chose to be $PREFIX_LENGTH (in my case 50 chars) when varchar/char cols are detected by the subquery.
            COALESCE(
                CONCAT('(',sub_part,')'),
                CASE WHEN column_name REGEXP ($subquery) THEN '($PREFIX_LENGTH)' ELSE '' END
            )
        )
        -- Very important to preserve the sequence in the index, these are scanned before the latter ones and can have big perf impact
        ORDER BY seq_in_index
    ),
    ') USING ',
    -- Almost always BTREE, but might be something else if we are for example using geospatials
    index_type
) AS query
-- The rest is essentially the same as the DROP statement
FROM information_schema.statistics
WHERE table_name = '$table'
AND index_schema = '$DB'
GROUP BY index_name
HAVING group_concat(column_name) REGEXP ($subquery);


As you can see, this uses nearly every trick in the book. The only improvement I could think of would be to turn the subquery into a VIEW, as it was used multiple times. Turning the CREATE statement generator (minus our varchar constraints) into a VIEW is generally useful, and left as an exercise for the reader.

About the only major features I didn't use were stored procedures and extensions. This all would have been a great deal simpler had ALTER INDEX been available as in postgres. In sqlite it would be a great deal more complicated, save for the fact that this isn't a problem in the first place, given utf8 is the default.

The only places this didn't work out were when tables crashed on things like SELECTs of particular rows for whatever reason. The good news was that mysqldump generally worked for these tables, and you could modify the dump's CREATE TABLE statement to set up indices properly. There were also a few places where there were very long varchars that were *not* indexed, but made the table too large. Either shortening them or turning them into LONGTEXT was the proper remedy.
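
The crashed-table workaround boils down to something like this, with the hand-editing implied in the middle:

 dump_and_fix.sh
# Dump the problem table, adjust its CREATE TABLE / KEY definitions by hand, then reload
mysqldump --skip-lock-tables "$DB" "$table" > "$table.sql"
# ...edit the index definitions in $table.sql, then:
mysql "$DB" < "$table.sql"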


What's new in playwright-perl v0.015 🔗
1631664613  

🏷️ video 🏷️ testing 🏷️ playwright
Quick update on what's changed since I last talked about playwright.

tCMS September 2021 update 🔗
1631660780  

🏷️ video 🏷️ tcms
I finally got time to do some work on tCMS! Here's the new goodies all the users of tCMS can expect going forward.


Games people play: Ultimate Smackdown 🔗
1629390195  

🏷️ kayfabe 🏷️ branding

Mark Gardner asked me on Twitter the other day why I wrote this article without specifically naming any names. I realized my reasoning here is probably worth sharing and elaborating on, given that the question needed to be asked.

A lot of people would naturally think most of the post was about the perl community given that's sort of what this series of posts has been about up to this point. While it does indeed apply in large part, it's far from the only place I've seen these phenomena, and a general treatment will naturally be more useful. I've also personally "been the bad guy" here myself, so I can't throw the first stone.

People don't usually read blogs because they're useful. It's all about getting engagement these days, which gets to the more important reason not to "name names" unless the situation described is already well understood by the public. When you do, your brand immediately becomes pro wrestling, and you're either gonna be a heel or a face.

This is also why calumny is considered sinful, as it provokes strong emotions which tend to make people do things they regret. When you decide to participate in an online kayfabe, don't be shocked when you get the wrong kind of followers. Sometimes the drooling fans are more annoying than the online lynch mob.

Beefin' is generally not what you want to see from a professional offering actual goods or services. Do you want to sell software and services, or entertainment? You generally don't get to do both within one brand. There's a reason my political/entertainment persona online is quite separate from my professional one. While I don't try to hide that it exists, the degree of separation is an important signal that you can, in fact, control yourself when it comes to what's actually important -- making an upright living through service to others.

A lot of this first year of entrepreneurship has been de-programming myself and unlearning bad corporate habits I didn't even know I had. Much of my content up to now has been little more than sharing my journey. It's been useful to some people, but nowhere near as useful as my actual skill set has been to my clients. Sure feels good to write though.


Resizing, Expanding and Using raw images πŸ”—
1629321079  

🏷️ tutorial 🏷️ kvm

So you need to resize some VM images, and possibly inspect them later. Here's the dope:

 mount_loopback.sh
# Replace the size with whatever is appropriate
qemu-img resize $IMAGE_FILE +80GB
# Set up a loop device for the image file so that we can resize the relevant partition
losetup -f -P $IMAGE_FILE

When using LVM, you'll have to do some extra steps to get the relevant device and resize things:

 get_mapped_volname.sh
pvscan --cache
# Grab the name of the logical volume
lvs
# Activate the volume group; a similarly named device should then appear in /dev/mapper for use below
vgchange -ay

Now you need to resize the partition:

 resize_normal.sh
# When you have a normal DOS/BSD partition table:

# First grow the partition itself to fill the new space
growpart /dev/loop0 1

# Then grow the filesystem within it (ext FS shown; XFS is handled below)
resize2fs /dev/loop0p1

Things are of course different on LVM:

 lvm_extend.sh
# Grow the partition holding the PV first (growpart, as above); it's usually partition 1, but on XFS images the data section is usually partition 2
pvresize /dev/loop0p2
lvextend -l +100%FREE /dev/mapper/$NAME

XFS also requires special handling. You will have to mount the relevant device and then:

 xfs_grow.sh
# Or, the loop device if it's not LVM
xfs_growfs /dev/mapper/$NAME

Mounting and fiddling with stuff is as simple as:

 mount_loopbacks.sh
# Replace partition number with whatever is appropriate
mount -t $FSTYPE /dev/loop0p1 $MOUNTPOINT
# Same story with LVM stuff, but use the /dev/mapper entry
mount -t $FSTYPE /dev/mapper/$NAME $MOUNTPOINT

When done, you should remove the loopback device:

 remove_loopbacks.sh
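# If you activated LVM volumes from the image, unmount and deactivate them first (umount ...; vgchange -an) so the loop device is actually freed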
losetup -d /dev/loop0

One last point worth noting: if you do this while the VM is active, writes will not show up until you unmount & unload the loopback devices, as the two mounts don't share journals. The best use of mounted VM disks on the hypervisor is for backups though, and for that purpose loading and mounting them just for the backup period works quite well. As such I generally also add the -r (read-only) option to losetup when I'm mounting for backup purposes.
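
Putting that together, a minimal read-only backup pass looks something like the sketch below; $IMAGE_FILE, $FSTYPE, $MOUNTPOINT and the tarball name are placeholders.

 backup_readonly.sh
# Attach the image read-only; --show prints the loop device that was chosen
LOOPDEV=$(losetup -r -f --show -P $IMAGE_FILE)
# Mount the partition read-only as well, belt and braces
mount -o ro -t $FSTYPE ${LOOPDEV}p1 $MOUNTPOINT
# Take the backup, then tear everything down
tar czf backup.tar.gz -C $MOUNTPOINT .
umount $MOUNTPOINT
losetup -d $LOOPDEV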


Troglodyne year 1 in review πŸ”—
1629315907  

🏷️ tcms

About halfway into my year of doing this solo entrepreneur thing, I realized a lot of my work on tCMS was not done out of a desire to outdo WordPress, Ghost or any of the other CMSes that are part of the Cambrian explosion of software most of us have lived through.

Instead, it was actually done for the same reason carpenters build their own houses. By god, I'm gonna do it the way I want it for once! What you get is indeed quite satisfying. Though when you zoom out and take the long-term perspective, does it actually mean as much as I feel it does? After all, generations of my ancestors built their own homes and barns. Those living in them now (if they weren't razed) have no idea what went into them or why they were built that way.

It will be the same with software and the brands and businesses built around them. Like with houses, the only ones that will remain standing will largely be a function of which particular families, towns, firms and industries managed to stick around. As such the "0 code" grifters, for all their embarrassing obviousness, are essentially right when they focus almost exclusively on building their customer pipelines.

Now that I'm out in the world of general contracting, I actually see this everywhere. The biggest and most successful businesses tend to run lean and hard on their creaking and ancient facilities, be they real or virtual. Even obsolete software, hardware and real estate work just fine, and usually with great margins now that they've long outlived their depreciation curve.

What I'm trying to say here is yet another reason to not get too wrapped up in your tech. So what if it's a mountain of garbage? Plenty of money to be made mining that heap! Using a dead language? Necromancy tends to have pretty high margins! Lots of people make their livings with run-down trucks and dilapidated real estate.

Which is ultimately why I'm sticking with my little CMS. Sure it's using Perl, and probably an evolutionary dead end as far as CMSes go. But it's mine, and at the end of the day you have to live like nobody else to live like nobody else. As long as it delivers where I need it to, I'm not gonna sweat the future. Having seen several people build successful businesses with worse tech up close and personal, I'm confident that I can actually build a business atop this little house for my data.

I'm quite blessed to have had good advice, prudent planning, discipline and a patient business partner, all of which has allowed me to putter around until I figured out how all this works. I'm grateful for the clients I've had up to this point and their ongoing custom. I think this next year I'll finally be able to add in a software offering of my own.

I'm also quite happy with how well I've done keeping up with my open source projects. Now that it looks like I've actually got a credible hourly rate I'm beginning to wonder if setting up a charitable OSS foundation (or getting sponsorship from something existing) so I can use this pro-bono time as a writeoff will make sense in the future. I'll have to look into this, and hopefully can get a good article and video on the subject in the future.


The reality of banality: yet more lessons from a career in QA πŸ”—
1629266857  

🏷️ oss 🏷️ moderation

A great deal of the conflict in online messaging software and social networks revolves around the idea that people should conduct themselves "better" as though that were in fact possible. A thorough reading of history will make you realize that Sturgeon's Law applies equally to interactions you have with others. When have we ever not had depraved maniacs for elites, mobs of raving heretics spreading all manner of nonsense, attention seekers and every other kind of nuisance which we are currently beset by? I don't think it's ever happened for more than brief stretches of time.

Layered atop this reality are the massively perverse incentives of the "social graph". Measuring engagement is essentially installing a flow meter on a sewer pipe. The fact that nobody has figured out anything better than this grotesque hack to produce relevant searches is a testament to the reality of our natures and desires.

As such, the idea that codes of conduct or "Zero tolerance" policies could even begin to address deficiencies in public discourse is beyond ludicrous. This has important implications for things such as open source projects and social networks. The more they commit to openness, the more "toxic" the discussion is guaranteed to become, as this necessarily means not filtering out 90% of the possible inputs for the simple reason that they add nothing substantive.

This is why projects with a BDFL (Benevolent Dictator For Life) actually tend to work. Maintainership almost always goes hand in glove with deleting clowns from your tracker, message boards and mailing lists. On the other hand, when you have a nebulous "open" means by which authority in a project is acquired, any fool can make a run at the crown, and as such will make an effort to be heard.

At that point, whatever group in charge has two choices:

  • Ignore the rabble
  • Engage with the rabble until overwhelmed by sheer numbers

As such it should shock no one that projects run by nebulous groups of quasi-nobility tend to turn out the way they do. It is inevitable that the group's legitimacy will be questioned, if for no better reason than that it has a larger attack surface (again, Sturgeon's Law means most of your ruling council will be of dubious quality).

On the other hand, the BDFL is indivisible, and bad ones don't get projects off the ground at all. The normal selection mechanism of the Bazaar helps us here. The attack surface is minimal, and discourse is likely to be healthy.

The question then arises: why does the BDFL always step down? Heavy is the head that wears the crown. Scaling projects is hard. Many people use the tricks of modularization and shoving as much code into data as possible, but won't go all the way and release control, making the breaking up of the monolith academic at best.

Done properly this resembles a feudatory arrangement in the classical sense. Many overlapping claims to particular subsystems, but in general one overarching BDFL for each, reporting up the stack to the ur-BDFL of the project. This is actually a pretty good description of how the linux kernel's development works right now. I hope that like the Good Emperors they choose successors wisely rather than leaving the title(s) up for grabs.

Which brings me to the point of all this. The reason this arrangement works is that it is naturally efficient to have people become experts in the systems they work on rather than generalists who know nothing about everything. Even studies back this up. So why would anyone want to mess up a good thing over little things like them being wicked sinners and conforming to Sturgeon's Law in the other 90% of their life?

Because we think things can be different, despite millennia of evidence to the contrary. Human situations change, but their nature does not. Even the idea that "we want better communities" is usually just a cope masking a lust for vengeance over personal slights.

Even in those rare moments of greatness, we have to know mean reversion is right around the corner. But take heart! In the worst of situations you can also be sure a return to mediocrity can't be far off. So rather than lament our lot, why not embrace it? This is what I mean when saying "It's better to be happy than right".

Sure you're gonna work and talk with some absolute toads. So what? Get over yourself, and throw your inner child down a well (it's OK, Lassie will eventually rescue it). Getting emotional about it never accomplished anything.

Here's the practical advice. In a technical project, you will have to deal with a mountain of BS from validation seekers. This is the ultimate motivation of both the holy crusader and the troll and neither should be tolerated unless you enjoy watching your fora transform into a river of sewage. These messages can easily be spotted because they don't actually contribute anything concrete. Reward good behavior by responding promptly to actual contributions, and leave everything else "on read".

Eventually long-time contributors (and users) get emotionally invested. This is where most of the crusaders actually come out of, not realizing they're playing a validation seeking game. Gee, what if this project I invested all this time in is actually bad? Does that mean I'm bad? Impossible! It's all these heretics...

Remain vigilant against the urge to get emotionally invested in your tools and organizations. Sturgeon's Law holds and mean reversion will eventually happen. If you can't move on, you eventually become a fool screaming at inanimate objects and attacking phantoms of your own imagining.


Games people play on P5P: Part Deux πŸ”—
1629136448  

🏷️ perl 🏷️ p5p 🏷️ lolcows

Since I posted about the resignation of SAWYERX, even more TPF members have thrown in the towel. This doesn't shock me at all, as the responses to my prior post made it clear the vast majority of programmers out there are incapable of seeing past their emotional blinders in a way that works for them.

The latest round of resignations comes in the wake of the decision to ban MST. The Iron Law of Prohibition still holds true on the internet in the form of the Streisand Effect, so it's not shocking that this resulted in him getting more press than ever. See the image above.

I'll say it again, knowing none will listen. There's a reason hellbans exist. The only sanction that exists in an attention economy such as mailing lists, message boards and chatrooms is to cease responding to agitators. Instead P5P continues to reward bad behavior and punish good behavior as a result of their emotional need to crusade against "toxicity". Feed trolls, get trolls.

This is why I've only ever lurked P5P, as they've been lolcows for as long as I've used perl. This is the norm with programmers, as being right and being happy are the same thing when dealing with computers. You usually have to choose one or the other when dealing with other people, and lots of programmers have trouble with that context switch.

So why am I responding to this now? For the same reason these resignations are happening. It's not actually about the issue, but a way to get attention ("raise awareness") about important issues. Otherwise, typing out drivel like this feels like a waste of time next to all those open issues sitting on my tracker. This whole working on computers thing sure is emotionally exhausting!


Getting Features shipped in the face of resistance πŸ”—
1628695736  

🏷️ wwic 🏷️ corinna 🏷️ perl

The big tempest in a teapot for perl these days is whether OVID's new Class and Object system, Corinna, should be mainlined. A prototype implementation, Object::Pad, has come out trying to implement the already fleshed-out specification. Predictably, resistance to the idea has already come out of the woodwork, as one should expect with any large change. Everyone out there who is invested in a different paradigm will find every possible weakness in the new idea.

In this situation, the degree to which the idea is fleshed out and articulated works against merging the change as it becomes little more than a huge attack surface. When the gatekeepers see enough things which they don't like, they simply reject the plan in toto, even if all their concerns can be addressed.

Playing your cards close to your chest and letting people "think it was their idea" when they give the sort of feedback you expect is the way to go. With every step, they are consulted and they emotionally invest in the concept bit by bit. This is in contrast to "Design by Committee" where there is no firm idea where to go in the first place, and the discussion goes off the rails due to nobody steering the ship. Nevertheless, people still invest; many absolute stinkers we deal with today are a result of this.

The point I'm trying to make here is that understanding how people emotionally invest in concepts is what actually matters for shipping software. The technical merits don't. This is what the famous Worse Is Better essay arrived at empirically without understanding this was simply a specific case of a general human phenomenon.


Hand me the MOP! πŸ”—
1628189700  

🏷️ perl 🏷️ oo

Reading Mark Gardner's latest post on what's "coming soon" regarding OO and perl has made me actually think for once about objects, which I generally try to avoid. I've posted a few times before that about the only thing I want regarding a new object model is for P5P to make up its mind already. I didn't exactly have a concrete pain point to give me cause to say "gimme" now. Ask, and ye shall receive.

I recently had an issue come down the pipe at playwright-perl. For those of you not familiar, I designed the module for ease of maintenance. The way this is accomplished is to parse a spec document and then build the classes dynamically using Sub::Install. The significant wrinkle here is that I chose to have the playwright server provide this specification. This meant it was more practical to simply do this class/method creation at runtime rather than in a BEGIN block. Running subprocesses in BEGIN blocks is not usually something I would consider (recovered memory of ritual abuse at the hands of perlcc).

Anyways, I have a few options to fix the reporter's inability to subclass/wrap the Playwright child classes:

  • Run the subprocess in BEGIN to grab the spec from the playwright_server
  • install all the Moo stuff with Sub::Install as well
  • Throw in the towel on runtime meta and rely on compile-time code generation

I could of course abandon object orientation entirely as well. The object model is familiar to users of Selenium::Remote::Driver (which is pretty much where all my user base is coming from), so that's probably not a great idea.

On the other hand, if we had a good "default MOP" like Mark discusses, this would be a non-issue, given we'd already get everything we want out of bless (or the successor equivalent). It made me realize that we could have our cake and eat it too in this regard by just adding a third argument to bless (which MOP to use).

perl being what it is though, I am sure there are people in the "bless is bad and should go away" gang. In which case, all I can ask is that whatever comes along accommodates the sort of crazed runtime shenanigans that make me enjoy using perl. In the meantime I'm going back to compile-time metaprogramming.


Dependencies: It depends πŸ”—
1627440954  

🏷️ linking 🏷️ dependencies

Over the years I've had discussions over the merit of various forms of dependency resolution and people tend to fall firmly into one camp or the other. I have found what is best is very much dependent on the situation; dogmatic adherence to one model or another will eventually handicap you as a programmer. Today I ran into yet another example of this in the normal course of maintaining playwright-perl.

For those of you unfamiliar, there are two major types of dependency resolution:

Dynamic Linking: Everything uses global deps unless explicitly overridden with something like LD_LIBRARY_PATH at runtime. This has security and robustness implications as it means the program author has no control over the libraries used to build their program. That said, it results in the smallest size of shipped programs and is the most efficient way to share memory among programs. Given both these resources were scarce in the past, this became the dominant paradigm over the years. As such it is still the norm when using systems programming languages (save for some OSes and utilities which prefer static linking).

Static Linking: essentially a fat-packing of all dependencies into your binary (or library). The author (actually, whoever compiles it) has full control over the versions of the bundled dependencies. This is highly wasteful of both memory and disk space, but has the benefit of working on anything which can run its binaries. The security concern here is baking in obsolete and vulnerable dependencies. Examples of this approach would be most vendor install mechanisms, such as npm, Java JARs and many others. This has become popular as of late due to it being the simplest to deploy. Bootstrapping programs almost always use this technique.

There is yet another way, and that is the "have our cake and eat it too" approach most well known from Windows DLLs. You can have shared libraries which expose every version ever shipped, such that applications get the same kind of simple deploys as with static linking. The memory tradeoffs here are not terrible, as only the versions actually used are loaded into memory, but you pay the worst disk cost possible. Modern packaging systems get around this by simply keeping the needed versions online, to be downloaded on demand.

Anyways, perl is a product of its time and as such takes the Dynamic approach by default, which is to install copies of its libraries system-wide. That said, it's kept up with the times, so vendor installs are possible now with most modules which do not write their makefiles manually. Anyhow, this results in complications when you have multi-language projects, such as playwright-perl.

Playwright is a web automation toolkit written in node.js, and so we will have to do some level of validation to ensure our node kit is correct before the perl interface can work. This is further complicated by the fact that playwright also has other dependencies on both browsers and OS packages which must be installed.

Normally in a node project you could use webpack to fat-pack (statically link) all your dependencies. That said, packing entire browser binaries is a bridge too far, so we either have to instruct the user properly or install everything for them. As is usual with my projects, I bit off a bit more than I could chew trying to be good to users, and made attempts to install the dependencies for them. Needless to say, I have ceased doing so. Looking back, this willingness to please has caused more of my bugs than anything else. Yet again the unix philosophy wins: do one thing and do it well. This is also a big reason why dynamic linking won -- it makes a lot of things "not your problem". Resolving dependencies and installing them are two entirely separate categories of problem, and a program only has to solve one of them to run.

The long-term solution here is to wait until playwright can be shipped as an OS package, as many perl libraries are nowadays. It's interesting that playwright itself added an install-deps subcommand. I hope this means that's in the cards soon, as that's most of the heavy lifting for building OS packages.


© 2020-2023 Troglodyne LLC