🌐
Videos Blog About Series πŸ—ΊοΈ
❓
πŸ”‘

A very Troglodyne Christmas πŸ”—
1640147922  


Aside from being busy with work for clients, I haven't managed to do much writing this December due to finally digesting a few marketing insights I've been chewing on but not swallowing for the better part of a decade. Here at Troglodyne we may be thick headed, but at least we're not smrrt.

Anyways, all types of content need an emotional appeal to get anywhere. Not everyone's like me, and just wants to skip to the end and hear the answer. The people want to hear about what motivated you, as a bearded road apple, to finally shoot the damned computer out of a cannon!

Though I'm only exaggerating a little, it gets me back into a mood of prose I haven't dipped into much since I was much younger and feeling my oats. I don't think I wrote a serious essay even when I had to do so in order to make the grade. Instead, I'd viciously and cruelly make the jokes too esoteric to be detected (or at least proven guilty of cheek) by the faculty. Having grown up consuming a steady diet of MAD magazine, it's a miracle that I've managed to become such an accomplished bore.

I suppose it's a testament to how thoroughly corporate is capable of domesticating a programming community known for eccentricity. This should shock nobody, as it's the smartest dogs that are the easiest to train. That said, there are still plenty of us off the leash having a grand old time.

All my writing and on-video time has made me quite a good deal better at requiring less editing required to render the steaming heaps of drivel that you see before you. Unfortunately, I sound almost as bad as the corporate borg twerps I've been pillorying over the last year or so it's taken to de-brainwash myself away from that set of cults. It's finally time to come into my own voice, which is to say steal someone else's.

In that vein, I've generally seen a few patterns among successful tech content creators. For those interested in Quality and Testing, you generally need to embrace your mean streak. It's got that synthesis of wrath at the excrecable piles of junk we work on all day and the thrill of the hunt when you finally figure out a way to wreck that POS that feels...magnificent! This also bleeds over into programming, as the experience is always one of smashing your head into a brick wall until pure, beautiful victory is achieved just in time for the acceptance criteria to change. Really some of the best fun you can have with your pants on, I definitely recommend you try it. None of the content creators of this stripe are ever sarcastic.

Then we have my favorite kind of technical creator -- we're talking Scotty Kilmer types. Just talk about whatever the hell you feel like today, because it's always gonna be the same old bullshit...but make sure your marketing is as spectacular as humanly possible. Whether it has anything to do whatsoever with the actual content is irrelevant. Don't care, got laid. It's the hacker ethos to a T... and by T, I mean Tren! Hell, it's probably a fun game to see how misleading you can have your sizzle be and still get hits. Excuse me while I eat this whole fried catfish.

For those of you who skipped to the conclusion (like me), let me waste a bit more of your time as a special bonus gift. We've got some exciting things coming for you all in 2022! Whether or not they're actually exciting I sure am excited about them. So excited I've gotta be vague about it. Definitely not because I haven't thought of anything yet.

That reminds me, I still need to go get presents. Merry Christmas everyone!


Why am I still using a search engine in $CURRENT_YEAR? πŸ”—
1638625845  


There has been much controversy in recent times over censorship of search engines and social media. According to those engaging in this, it's done with good intentions. Whether this is true or not is missing the point. Why are we relying on a centralized search engine at all that can censor, when we've had decentralized search for a half-century?

DNS can be seen as little more than an indexing service. There is no fundamental technical reason why the exact same approach can't be taken for resources at particular domains. Every site could publish their sitemaps and tags quite easily, and many do right now. They simply upload them to search engines rather than having them be well-known records considered by peers.

A DNS model would in fact simplify search indexing a good deal, as you can drop the crawling code entirely and simply wait until at least one person accesses a resource to index it. This would put the burden of crawling/advertising their available pages on site operators themselves, pushing costs down the stack, as is appropriate in a decentralized system.

Much of the reason DNS is tolerated as a decentralized system rather than centralized is that it takes so little resources relative to the rest of the web stack. People love the idea of federation, but hate paying for it. The primary question is whether incentives align for the current parties running DNS to also index and cache content hierarchies.

The answer is obviously no, or they would be doing this right now. This is once again due to the primary browser vendor (google) having no interest in supporting such a thing, as it would actively undercut their business model. If a browser did support such a system, many content creators and web hosters would absolutely love to adopt a system with clear rules under their control rather than the morass of inconsistency that is the centralized engine's rulesets. Similarly, the ISPs and Web Hosts would happily hop on board to the idea of offering yet another service they can charge for.

Therefore the question is can the existing business model of advertising that subtly corrupts search results translate to a decentralized system? Of course it can. The trouble is that it'd be the ISPs and web hosts in the position to extract this profit. This is in fact the ray of hope in this situation, as both Google and it's competitors in the (virtualized) hosting biz could get a large piece of this pie.

So, if you wanted to know what a future with this would look like it'd be that Microsoft or Amazon forks Chrome. This has already happened in Microsoft Edge. From here it's but a matter of modifying and open-sourcing their existing indexer, and making their fork support it's use. Introducing a system of decentralized search would both hurt their competitor, and be another reason to use Azure versus GCP and Amazon. They'd likely adapt Bing to leverage this as well, to extend the benefit to all browsers.

That said, Amazon or any large host could execute on this. Much of the tech that Cloudflare uses to cache content could likely be re-purposed towards these ends as well. There's a lot of money to be made in disrupting the status quo. Whether this translates into concrete action is anyone's guess.


caldav is a solution looking for a problem πŸ”—
1637210035  


Many hours have been wasted on calendaring servers, and they still don't solve the problems people who use calendars want solved. This is because the problem is approached from the wrong direction. People think from the client to the server, as it's clients originating ics files which are then schlepped around via email. Servers allowed people to do things like free-busy for attendees and conference rooms, but required email clients to support things like itip. I'll give you one guess how that went.

This model instantaneously breaks down when you go cross-organizational. The widespread incompatibility between mailservers and no standardized way to federate directory discoverability makes this impossible. As such, the meta collapses back to schlepping around ics files. It should shock nobody that embracing this fact and giving up on free/busy and presence has been the solution that dominates. Microsoft has implemented on this approach better than anyone decades ago.

Actually solving the presence problem requires that you get federation right. Guess who's doing that? The chat apps. Both Slack and Teams have this figured out. Doing this as a plugin to matrix or snikket would actually be quite straightforward. As such my recommendation is that shared hosting software stop distributing calendaring software. They should instead distribute chat servers and good chatbots that can do things like meeting reminders.

You could even federate across incompatible chat apps and protocols via bots that know how to talk to each other. I know it would work, because it worked before computers. That's how secretaries coordinated all of it -- picking up a phone. Implementing a UI for people to use would be as simple as wrapping your secretary bot that knows how to book people and rooms.


Doing Hiring Right πŸ”—
1636831995  


Most people make Interviewing programming candidates way too much work. I've setup (and taken) more HackerRank style puzzles than a person ought to have in a lifetime. One of the problems with this tool is that it's basically handing dynamite to children. Programmers are in general an over-specific lot that delight in secrets. This results in unnecessary grief and makes you miss out on otherwise good candidates.

The purpose of Job Descriptions, Programming Tests and Phone Screens are all the same. The trouble is that most people neither understand or acknowledge this on either side of the table. They're all just spam filters. The simplest path to success for candidates is to simply spam cold approaches and leverage social proof for everything it's worth.

Rather than allow negative feelings to build up and covert contracts to form, you should simply be up-front about the reality. An "Early Frame Announcement" of why you are doing the process the way you do, and what you expect out of it helps a lot. Managing Expectations will never stop being important in your relationships with those you work with, so you need to do this from the very beginning.

Sometimes you can be too explicit with your expectations. As with anything else measured, people bend over backwards to conform to them. This can be good, when what you measure actually matters. Unfortunately, very few things which get measured in this way actually do.

Employers allow themselves to get bullshitted into hiring people very good at telling them exactly what they want to hear. They then get blindsided when they inevitably find out the candidate, like everyone else, is in fact far from perfect. That said, people do manage to "fake it till they make it" more often than not so this isn't something to get too worried about. As long as they keep doing things you want, who cares if they were less than fully truthful during the interview process? You as the interviewer can't and won't be fully disclosing the facts of the situation on the ground either. What you actually want is a system that exposes the faults and strengths of candidates as quickly as possible.

First, you need to understand that nobody believes the posted requirements in public openings (nor should they). Accept that you will just get spammed. Then you need to tightly focus on your "Must Haves" rather than a laundry list of wants. If there are more than 4 technical skills you need for a given position, the solution you want someone to implement is probably ill-designed. You can't prove an optimal solution exists for anything with more than 3 variables and can't guarantee a closed form solution exists for systems with 5 variables, after all.

If you still have too many candidates to pick from (you will), you should then treat your "want" keyword list much like you would a digital ad campaign. Try and find people with at least a pair of them and that usually winnows you down to a list worth talking to. Don't get too excited here -- you won't be shocked to find maybe 10% of even these can get past a phone screen.

The phone screen's only purpose is to determine whether or not you are being catfished, and telling people this up-front tends to result in better outcomes. Most of the technical questions here should be softballs which can be googled as they talk to you. All you want to see is that they know and care enough at this point to actually get you the right answer. This is all to lull them into a false sense of security until you slip them the hilariously obvious false question. If they don't call you out on making an obviously false statement and try to bullshit you, just hang up.

Lots of people do tests and programming puzzles in lieu of the phone screen now. This is actually a bad idea. Online tests should only be employed in lieu of phone screens when you have too many candidates. Even then, they should be done similar to what the phone screen would have been.

I personally prefer to save tests as prelim for the in person interview. I like making the in-person basically a code review of the test, as this gets you into mind-meld with the candidate quite quickly. This also more closely mimics how they will actually be working, which is what you really want to know. Making this clear to candidates up-front tends to get the best results (which is what you actually want from candidates).

Nevertheless the online code test should consist of one straightforward question, another less so. The ultimate goal is that they should take more time than allotted for most people. This can be established by administering the test to a sample of your existing staff. Be up-front that this is the point, lest they get upset and corrupt the results with cheating and other such chicanery. You should end up seeing an 80% solution from the candidates at the very least.

From here the question remains what to do with the candidates you are on the fence about. Sometimes people just get a case of nerves, or aren't quite experienced enough yet but still can do the work. It's here that you need to understand that all deals can happen at the right price. Making it clear that you're willing to take a risk on a candidate during a probationary period with introductory rates can work out quite well. It's even better when you have a candidate offer this themselves, go-getters like that almost always turn out well.

At this point you should have a candidate and a price. Now you need to take them to lunch and fish for their victim puke. The last thing you need is a whipped dog ready to snap. This is unfortunately common thanks to widespread pathological behavior running rampant in corporate and public schools.

From there the problem is making sure they fit on a team. This means you have to figure out how to make the others invest in the candidate's success and vice versa. Too often things quickly turn adversarial, resulting in bad outcomes that were totally avoidable. That is a pretty in-depth topic which is worthy of it's own post.


Why am I still passing around phone numbers in CURRENT_YEAR? πŸ”—
1636669564  


SIP and trunking to the POTS have been around for more than a decade now. The million dollar question nobody seems to be able to answer is why all our mobiles still use a number instead of addressing via DNS (e.g. user@domain). The carriers wouldn't be cut out of the loop by this, as they can remain a dumb, albeit wireless, pipe. Indeed, some carriers actually implement their systems as glorified SIP trunks on the backend.

Address book software is also plenty capable of tracking email-style addresses. Android has natively supported SIP calling such addresses for more than a decade. This means that for everyone I know using android it's as simple as setting up openPBX on my server. I could have full control and encrypted video calling tomorrow. The trouble is that everyone's favorite status symbol, the iPhone doesn't support this. Which means I couldn't communicate with half my family and many of my clients, as they're not about to install squat to please me.

It appears the reason for this lack of support is the usual modus operandi from Apple. That is to say, they have their own proprietary standard they'd prefer everyone use instead (but that is shoddy by comparison to standard software). Further complicating matters is that new and popular video conferencing firms like Zoom have also introduced yet more shoddy and incompatible software.

While Zoom can bridge to SIP clients, it costs extra, and they already trunk to the POTS at no cost, further entrenching the phone number. Skype has had a similar model for many years. FaceTime users can provide links to allow non-apple clients to call them, but not the other way around. That said, the fact that there is now an HTTP means of doing FaceTime means reverse-engineering the protocol and building a SIP bridge is but a matter of time. When PBXes are capable of appearing to be apple devices with FaceTime things will finally be "good enough" to ditch the number.

Much of the reason for the success of these non-open packages is because the cost structure is largely hidden from users. The FaceTime ecosystem is "free" past the initial phone purchase, and only the host of zoom calls generally pays for the service. By comparison, users of open software and standards bear recurring costs (and they're already paying a phone bill). Like with the telcos themselves, very few people are willing to pay for a SIP account if it's not bundled with hosting, mail, DAV and everything else.

Competing the telcos down from being vertically integrated multi-service providers to mere ISPs is the real mountain to climb here. The first major shared host to execute on this will be able to tap billions in additional MRR. When and if that day comes, I'd likely ditch the cellphone entirely in favor of superior clients on real computers.


Why am I still sending unencrypted email in CURRENT_YEAR? πŸ”—
1636484555  


Regardless of whether you use OpenPGP or S/MIME certificates, the core problem of distribution of public keys was never really solved to anyone's satisfaction. S/MIME essentially never even addressed the problem beyond assuming you'd link them somewhere on a website and that people would go out of their way to communicate securely. I'll give you one guess how that turned out.

OpenPGP by contrast built a key server called SKS. The trouble is that it was flaming garbage and abandonware to boot. Thankfully hagrid fixed that problem. The trouble is that this model relies on users to upload their keys to the server, rather than things "just happening" automatically as in the case of things like LetsEncrypt on shared hosting. Again, I'll give you one guess how that turned out.

So the latest solution is a thing called WKD. It's a practical solution to essentially adopt the model used by LetsEncrypt to do DCV. Shared Hosts now have no real reason not to auto-generate OpenPGP keys for every email, as the impact of compromises are quite limited. A short renewal timeframe should be applied for the same reasons they are with LetsEncrypt's certificates.

The primary drawback is the same one as with CAs, which is to say they have the private key used to generate things. In short, it's a problem of trust. That said, we seem to put up with this issue in the web at large, and encryption by default would be better than the status quo of sending everything over the wire unencrypted. It would be straightforward for hosting management software to support users uploading their own keys to satisfy those with cause for concern, unlike with SSL certs.

The only remaining hurdle is that clients by and large do not consult WKD whatsoever. Some things like enigmail do support it, but anything short of this being the default setting on the most popular option won't matter. Like with authentication code the primary issue remains that the biggest vendors (exchange and gmail) would have to lead, follow or get out of the way. Frustratingly the default stance there remains to simply obstruct. This is baffling, as there are only upsides to them embracing this. Holding private keys, the management of firms can still be snoops if they feel like it (despite this actually being a bad idea). Their competitors would not and as such no longer have the option of MITMing their email to conduct corporate espionage.

At this point it simply appears to be a matter of inertia. Which makes sense, as email is not exactly the big moneymaker these days. Hosted chat, DevOps and ERP software is where all the energy is now.

Nevertheless, this is actually a place that shared hosting can take a leadership role to improve the world like they did with LetsEncrypt. Integrating automatic key generation and sharing into webmail via a plugin is possible today. That coupled with a marketing blitz might just be enough to finally fix this problem.

Postscript

There's a bigger problem here than just key distribution. Namely, how to filter spam in an encrypted world. This would require a much more browser-like world where it's the server doing the encrypting and de-crypting, so that it can read and filter before delivery. While not ideal, it's still better than the status quo where MITMing your mail is trivial. You have to trust your server operator not to compromise your mail, but let's be real here. They can straight up modify items in your mbox right now without your knowledge, so this is not a serious concern beyond "just host your own mailserver".

You can actually do this right now by rejecting anything that is relayed without SSL. But this is less than ideal because:

  1. Servers can lie and not set the header for this
  2. Many legit emails will start disappearing because "oops I went thru an unencrypted relay", as there are tons of such legitimate arrangements still extant.

To be fair, you can still PGP encrypt behind that if you wanna be super paranoid. As such I think we need both WKD and strong server-side encryption of mailservers.


Why do I still have to write auth code in CURRENT_YEAR? πŸ”—
1636477820  


Authentication and Authorization of users is one of those things I don't like storing data for. In short it's bad for the same reason storing credit card numbers (or any PII) is. Not only is it not my app's core job, it's a disaster waiting to happen in a breach. User provisioning is also one of those things I hate having to do when everyone on the web already has an email identity.

So the question remains: why is it that your only real options for OIDC just happen to be the big boys? One of the major goals for my CMS is to disintermediate from the big aggregators, so this is a problem I'd love to have solved. Why can't my shared hosting account be my OIDC provider? Furthermore, why do I need 9 different logins for services on the same server? It turns out there are some technical and organizational hurdles that have to be overcome.

Fixing this for the web

The primary difficulty is that there is no standard for how you ought to advertise the available JSON Web Tokens to pages. It's standard practice to use Double Submit cookies and synchronizers in DOM components to prevent XSS/CSRF, rather than stuffing JWTs into LocalStorage. This is why every OIDC process does the two-step over to another domain, as the cookies are SameSite and can't be used anywhere else. The practical consequence of the status quo is that every single oauth implementation is a hardcode rather than iterating over your active and available logins.

I suspect the big boys are content with this status quo and would resist any solution that didn't allow them to stay at the top of the list of available providers. Existing objections have to do with the user privacy (should I advertise my active CrazyFetish.yikes login?) provided by the status quo. To keep such control in the hands of the user, any such feature should be opt-in per service. To make that happen, browser integrations would have to happen.

Implementing such a feature should be straightforward. JWT issuers would use a standard name for their token so that the browser could know one exists for said domain. The browser would then have an array as part of the window object in javascript advertising which domains have active JWTs the user consented to share.

Therein lies the rub. Incentives don't align here, as the big boys are also the browser vendors. Such a regime that preserves privacy would necessarily not preserve their ability to get free advertising on every login screen worldwide.

This could be made less objectionable by throwing some bones to both the tech firms and developers. Both would like to display a branding image & copy for these auth providers alongside the login prompt. Thankfully, this could easily be stuffed into LocalStorage with a standard name without risk.

It would also not be the first time that browsers have had a pre-baked whitelist of services for self-serving reasons. To be entirely frank, I'd prefer that they do such a thing in this case rather than force every app developer on earth to write extra code for no good reason. I'd rather be happy than right and especially in this case. I have a feeling you would too.

Fixing this for non-http services

The first problem is that they went about discoverability wrong. They use the .well_known mechanism in vhosts. The more standard way would be something like a TXT or SRV record on an appropriate subdomain. LetsEncrypt found out the hard way that vhosts are a lot easier to compromise than A records. It also doesn't hurt that DNS is basically the LCD of web features. Almost everything has to know how to talk DNS, not everything talks HTTP.

From there it's essentially a matter of integrating SASL into applications, as it already supports Oauth tokens. Given most of the backbone services out there aren't hostages of large tech firms like browsers, the server end shouldn't be hard to gain momentum. The trouble will be with OS vendors, as they'll have the same problem as browser vendors. This pattern repetition suggests that the real problem is that we actually treat identity and authentication as a tack-on component rather than as a first class service.

This suggests that gaining the requisite momentum to solve this problem can't really be achieved externally. The OS and Browser vendors are going to have to want to solve this problem. Until then, developers will have to write auth code they don't have to, and put up with services written by people who don't understand security concerns. Which is the biggest selling point to me -- The OS and browser vendors could do humanity a favor and extinguish an entire category of bugs forever with this.


The spam and scraping problem is about to get even worse πŸ”—
1636133435  


A recent post has made it clear where the future of industrial scale spamming and scraping is increasingly headed. In short, exploitation of the nature of cellular phone networks is about to unleash the kraken in ways that existing countermeasures will all prove useless against.

Mobile phone networks are all essentially large proxies for fleets of NATed devices. As such, getting a new NAT IP is as simple as turning on and off the radio, and if you are moving around, likely a new proxy IP as well. That said you don't even need to move around, as blocking any IP from mobile phone providers is basically a nonstarter. Nobody wants to block the large number of legit clients coming from their IPs, after all.

One might think that eventually the telcos will wise up and start banning. The trouble is that the mobile nature of these systems and wide assortment of carriers to spread requests over make it entirely viable to be both mobile and decentralized enough to have no effective means of enforcement. To a large extent, I suspect we are already in the early stages of this problem given there are firms making hardware facilitating this.

Is the only way to resolve this problem would be to forgo NAT altogether and embrace IPv6? Not really, as IPs are so plentiful there as to make bans useless once more. Even AI and Bayesian filtering has proven to be insufficient at improving fingerprinting enough to matter. The only practical option left is QoS measures to ensure you at least don't get DoSed by this stuff. That, and actually improving the performance of your mailserver and web properties such that they can bear the load.


What's behind institutional resistance to remote work? πŸ”—
1636128408  


Ever since lockdown policies began 2 years ago, most of the white-collar workforce started working from home full-time. About six months ago management began to get anxious to get the workers back into the office. Looking into the data, this frankly appears irrational. The reduction in productivity is so small as to be easily outweighed by office costs.

Similarly, firms and employees remain irrationally attached to W4 employment. In this environment the home office environment very much lends itself to a favorable tax & regulatory situation via 1099 resulting in higher take-home pay for the worker and less administrative expense for the employer. Why is this organized irrationality the case?

I think the most persuasive case against the return to the office is laid out here by a blue-collar worker. This is also the case observed by white-collar WFH offices by and large. All the mandatory policy meetings my friends used to have to put up with mysteriously stopped over the last two years. Similarly, tedious manager interactions have slowed to a trickle, begging the question as to why any of these people are being paid.

Anybody paying attention to the ratio of administrative staff to workers has noticed troubling trends over the last 50 years. These numbers should have gone down in this age of automation rather than increased as it has. Nobody can make a serious case that we need more administrators per worker than we did in say, the 1970s. Many of the compliance & regulatory fears that have motivated this rise evaporate without an office. This cannot help but produce some degree of existential fear in middle management. As such their advocacy for a return to a useless and costly office makes sense.

Similarly, the reality of employees maintaining their own office space will have to be recognized at some point. For many years, courts have converted 1099 workers into W4 employees due to the application of the duck rule to working conditions. Such legal issues have ruined many an in-sourcing firm. The application of various mandates, regulations and tax policies on employees bearing the costs of maintaining their own office will eventually bring this to a head. Most remote workers both look and act like independent contractors, and would benefit from this being official.


The impact of open source licensing on your business πŸ”—
1636067872  


Recently, OVID had some remarks about using GPL3 code in your projects. The most relevant bit is this:

Do you have any code that cannot be open sourced but uses code with a "permissive" license that in turn uses code with a GPL license?

Congratulations! You now have a court case on your hands if anyone finds out.
The backstory here is that one of his friends has some issues he can't fix without hiring a lawyer.

Normally, this is not too problematic (even when the upstream is hostile) when you use packaged software and libraries, as distributing patches is fairly straightforward. However sometimes there are extenuating circumstances (such as clouded title) as happens thanks to contentious forks. The other circumstance is the viral nature of some licenses such as GPLv3 and Affero. There are even more extremely ideological licenses out there, but few which are of any practical consequence.

Both the normal and Affero GPL have the practical consequence of you needing to either license proprietary data or sell services rather than sell software. That is, unless you adopt a gratuity model which has proven less than viable in the overwhelming majority of circumstances. Even the idea of selling services is difficult to secure against competition, as the recent war between elasticsearch and Amazon proves. It's quite a bitter pill to swallow to be undercut by a competitor using the fruits of your own ongoing labors.

It's not an easy choice to make. Choosing to forgo software with viral licenses means more time-to-market, which is not always available. Similarly, your business model may be to help people with their data, not sell access to yours. Ultimately the only thing you can really rely upon in the long term are your own individual wits and physical capital, licenses and laws notwithstanding. This is why most tech firms (if they survive long enough) end up becoming glorified consulting firms like IBM.


The war on scraping is lost for the same reason as the war on piracy πŸ”—
1635962678  


A great deal of effort is expended upon anti-scraping measures in webpages. There are a number of reasons for this:

  • prohibitive bandwidth costs involved in allowing bulk downloads
  • wanting tight control over how users view the data in order to influence their conclusions
  • competitors stealing content and reproducing their service in a jurisdiction in which they have no legal recourse.
The first concern is best addressed by rate-limiting mechanisms or metering fees. The second concern has become quite a heated topic for social media of late. This is not a problem for most services, as it's not good business to second guess paying customers. On the other hand, if the business is advertising (as it is with social media) influence is precisely the point.

For most businesses the concern will be the last one. I've learned over my career programming that data oriented design not only results in faster code, but less code. It's entirely possible to build a successful business with entirely open source code but proprietary data using this model. That said, it makes one uniquely vulnerable to such data theft.

Can we fix it?

Enter anti-scraping technology. For a good overview of the current landscape, see here. You may have noticed the core problem is "fingerprinting", which is essentially the same one to solve with software licensing. This is because it's the same exact problem as software piracy, programs are just data that transform other data.

Those of you familiar with implementing software licensing schemes like I have are well aware that basically everything but phone-homes coupled with fingerprinting are not worth pursuing. Even then, there is no real way to prevent people from nopping your checks out. Generally you see mechanisms to ensure that a crack for one version does not work on the next. This has resulted in a status quo of customers submitting to this stick with the carrot of forward-going code updates.

Which is to say a stalemate in the immediate term, but total surrender in the long term. The only reliable way to prevent this is to never allow clients to interpret your code. Even then side-channel attacks are possible to reverse-engineer it.

This model breaks down for targeted and simple programs, as after some point there's nothing to update. I suspect this is much of the reason that observation of Zawinski's law is so prevalent in the software industry. There is however no such concern with data, as you can always add more. The video game industry in particular has embraced this with zeal. Expansion content not only drives much of sales, it also works quite well to keep their content artisans fully employed when they might otherwise have downtime.

You may have noticed that the ultimate remedy available to software is not exactly feasible for data. Data cannot be fully obscured from the client in nearly all use cases. Anti-scraping measures (as you can see from the overview) have also failed almost comprehensively. This has had far-reaching effects on a number of industries.

Tech Blogging has been totally smothered by plagiarists who know how to do SEO. The only real reason to do one nowadays is as a big "hire me" billboard. My father was an inventor with a number of patents, and he discovered (the hard way) that they were also useless besides as an inducement for employment. Almost every social media platform which started out with good APIs have now comprehensively crippled or dropped them altogether and an industry of scraping based tools have popped up to satisfy this need. Plaid became a multibillion dollar company by doing scraping of bank websites using bank clients own logins.

Like with software licensing it begs the question of why any of this effort is expended at all, given it's ultimately Canute screaming at the tides. This comes down to legal reasons. The courts generally say that you "had it coming" if you left a gold bar in the middle of the street and it gets stolen. So it is with software and data. If you don't at least make a token effort at anti-circumvention you have no recourse. Of course, this is not consistently applied to all firms and jurisdictions but such is the law. If we wanted consistent outcomes, we'd replace black robes and powdered wigs with programs. Even then this has no bearing internationally, as most firms ability to have recourse there is nil.

Time to get creative

The good news is that it turns out that any effort beyond token prevention in fact hinders your ability to stop piracy. Pirates are inherently lazy, and you can exploit this to get a handle on the problem. For example, I once worked with an IP based licensing scheme that also gathered OS fingerprints passively, but did not do enforcement based on the latter. This allowed some people to feel they were quite clever running a number of instances behind NAT. Periodically they get rounded up (random reinforcement works best for operant conditioning) and told they're gonna get a lifetime ban unless they buy the right number of licenses and sign an NDA about the incident. This was but one example of many where over the years where laying traps for pirates paid off quite handsomely.

Just like with my previous article about the victory of spam, the proper mindset is not to fight but "make the trend your friend". The motivations for piracy and spamming are both deeply ingrained in human nature. The most powerful people and organizations in the world have fought that war against our baser natures for millennia and are still no closer to victory then when they set out. This time will not be different.


Welcome to spamworld, where nobody reads πŸ”—
1635869893  


Recently, a resume went viral for getting good responses despite being filled with obvious BS such as rickrolls thanks to being SEO'd out the wazoo. A less obvious variant of this trick for more serious people has been to include a paragraph of text with white font color (so that it is not visible unless selected, or ever when printed) filled with these SEO keywords. While these tricks can open some doors, they still aren't enough because people still don't read what gets past the filters. Some of the reason for this is plain laziness, but the truth is that what gets past the filters are still too much to read.

This can get especially frustrating for programmers looking for contracts that have a large corpus of public work (such as a blog, or OSS contribs). Prospective employers invariably ask you to take yet another test whether or not you have clear and demonstrated ability to solve their business problems. At the end of the day, exploiting social proof is still what's needed to get hired. Whether you leverage a personal connection, build fame or use "jedi mind tricks" to quickly build emotional investment in an interview it always has to be done. The skills you actually need to do the job basically don't matter at all; they're just one more filter.

Why can't we have nice things?

The core issue which is unremarked upon here is that the war on spam is over, and the spammers have decisively won. The only set of spam filters which actually can catch 100% of spam also catches 100% of non-whitelisted ham. The most recent weapon in this war is greytrapping where you blacklist anyone sending to addresses not at the server, as it's evidence of scanning.

I realized that this approach could also be applied various other places to improve web hosting in general, as scanning happens all the time. My /var/log/messages is usually filled with queries for domains that are not, and never have been on the box. You could similarly ban HTTP requests against IPs specifying incorrect HOST headers.

There are a lot of areas where the other techniques applied to email would actually help. Greylisting phone calls in particular would essentially extinguish the epidemic of scam calls immediately. Especially if you combined it with a mandatory up-front first time leave-a-message running a bot-or-not analysis. That said, it appears there is 0 motivation to change in the telephony space. After all, most major smartphones have supported sending and recieving encrypted SIP calls identified by email addresses for years, yet we still trade the equivalent of IP addresses and pay for this!

This still doesn't fix the problem though. Given the only foolproof solution is whitelisting, it surprises me that no major mail package or hosting control panel automatically adds anyone you directly mail to the whitelist. Most don't even auto-whitelist your addressbook!

But wait! There's MORE

There is an even more insidious problem introduced by the net. While there is an endless tide of spam there is also more ham than anyone could ever possibly eat. This is the current state of scientific publishing despite the replication crisis. What happens when the possible routes of investigation are more than you could ever possibly investigate?

While it is possible that multiple routes lead to your destination, it's likely only one of them is optimal. As programmers know well, this is close to an undecidable problem short of exhaustive search. This flood of "not wrong, but not useful" content which increasingly hinders my search for solutions (again, thank you SEO blogs) has grown increasingly concerning.

I've begun to wonder if this will be the mechanism by which the spread of knowledge regresses to the pre-internet mean. I certainly don't relish the days of having to drive to and then search library stacks to get answers. I don't think it'll be as bad as it used to be, but this has major ramifications for AI researchers. If we can barely get through this tide of junk, I suppose it comes as no surprise that "expert systems" turn out to be closer to "mediocrities copying and pasting from stackoverflow".

This is good news for content creators at least. It means that posts like this one where I lead off with some "in the news" thing can easily be evergreened in the future. This is because everyone's social media feed is eternal september of the guy who just started paying attention. As PT Barnum said, there's a sucker born every minute!


Power Distortions in the Firm πŸ”—
1635781264  

🏷️ management

The most corrosive element in any relationship is power, especially when the wielder does not understand the way it subtly warps their interactions with others. Middle management in firms are quite unaware of this, as in the rest of their lives they are powerless peasants like the rest of us. Doing the sort of context switch to make this work does not come naturally, and the means by which we select managers does not select for the self-reflective. Occasionally they develop the necessary faculties, but this necessarily means their advance in rank will cease and much of the good they do will be plowed under by their peers.

This is why much of modern automation in firms is giving dynamite to children. Once managers saw how much things like issue trackers helped teams internally they could not resist using it as yet another lever to micro-control the process. The strength of Auftragstategik is in practice paid no more than lip service.

Having fallen victim to the siren song of automated measurement, they forget that now they have the same problem as search engines. Unscrupulous employees are now be able to SEO their way into the top ranks of performance with very little effort. Much of this is why the urge in firms to pick low-hanging fruit to get up numbers is so widespread. It is also yet another shackle on themselves, management begins to use the same hammer amongst themselves. This further distracts them from their true purpose of resolving systemic barriers to progress.

Management by Reid Technique

I can't think of a better way to induce anxiety and destroy productivity in the workforce than regularly scheduled police interrogations. Which is essentially the primary way in which employees and management interact now, commonly known as the "one on one". Well-meaning managers put out pieces like this on how they can be positive interactions.

The summary is that management generally wants to hear "all is well" so they may return to inaction, as this is easy. Basically anything else is seen as emotional whining they need to pacify at best. At worst the manager goes full on cop mode and fires people over throwing a tantrum. This in particular is quite perceptive:

A Disaster is the end result of poor management. Your employee believes totally losing their shit is a productive strategy – they believe it’s the only option left to making anything change.
It is true that many do not resort to communication of facts until incredibly frustrated their subcommunications have been comprehensively ignored. This is a rational response to the actual goal of the meeting, what managers want to hear is ketman so that's what people give them.

A manager which understands the distorting nature of the power they wield would not engage in such tactics. Like torture, one-on-ones can't possibly achieve anything you actually want. All you will hear is what you want to hear, or emotional outbursts which can and should be disregarded.

The only real way to learn the truth is to observe from a dis-empowered position, like Henry V going into camp incognito. It's either that or have spies. This is much of why QA is defined as "providing information to decision makers". The reports from your QA department is what should be finding the problems in the production process you need to resolve.

As to the people problems, an "open door" policy should suffice. If people won't tell you these things until they explode anyways, this at least saves time. This is not the policy by and large, as management is in love with the idea of prevention. While this is indeed the right strategy in the production process, it is dangerously wrong for personal development. Never allowing people to make interpersonal mistakes is to deprive them of essential learning opportunities. Can one truly be said to have repented under the lash? Or be said to be good without having experienced evil and rejected it?

The only way to avoid these distortions is systemic reform of the organization. Scaling organizations without diluting ownership (as in a partnership) inevitably results in the single-elimination ass-kissing tournament. As such we cannot expect anything but self-service (much less reform) from management at large. The attendant mendacity is a cost of doing business in large firms.

Even in a firm without these problems power can still prove corrosive. That said the incentives are at least not aligned against doing the correct thing.


Detect OOMs via cron πŸ”—
1635525018  

🏷️ scripts

I recently had a problem with a hypervisor where the dom0 was underprovisioned for RAM, and it OOMKiller'd the hypervisor processes for VMs during scheduled rsync backups. After nice-ing the processes appropriately to prevent this I decided to implement a simple cron to email me whenever oomkiller fires. This obviously isn't foolproof, as something could be killed preventing outgoing mails. It is however unlikely, so this will likely be a good enough solution going forward.

oomdetect.sh
#!/bin/bash

touch /root/ooms.log
FSZ=$(stat --printf "%s" /root/ooms.log)
grep -i oom-killer /var/log/messages >> /root/ooms.log
echo $(uniq < /root/ooms.log) > /root/ooms.log
NEWSZ=$(stat --printf "%s" /root/ooms.log)

if [ $FSZ != $NEWSZ ]
then
	echo "New OOM detected, investigate /var/log/messages:"
	tail -n1 /root/ooms.log
fi

Pretty simple altogether, just make sure to run it once before you install it to root's crontab. Don't forget that you can send crons to multiple recipients by making the relevant updates to /etc/aliases for that cron's user and running newaliases.


Why are US firms having such a hard time hiring? πŸ”—
1635261979  


The mainstream narrative throughout the last couple years has been that additional aid to the unemployed was encouraging them to be layabouts. Now that this aid has ceased, people are looking around and realizing the problem lies somewhere else. It turns out that the reality is a combination of both being on the cusp of a demographic cliff and workers fed up with common behaviors of employers in a glutted market.

One such behavior has direct relevance to hiring. Thanks to said oversupply of labor for years, a similar situation to that experienced by women on dating apps has developed. Which is to say a spam crisis. This has resulted in incredibly aggressive filtering measures. Most commonly these are automated filtering, inflated JDs (job descriptions) and deceptive offers. For a while this was working, but lately the successful match rate (Belveridge curve) has nosedived. I suspect I know why.

Automated Filtering and JD inflation

JD inflation started out the same way most product requirements go. Too many cooks adding too many ingredients. What started out as a light scout vehicle becomes the Bradley fighting vehicle.

Eventually programmers got a hold of these documents and realized they could apply search techniques to try and improve match rates. This provided great results for both employers and workers for the first decade of the 21st century. However, like the web it became filled with spam and SEO'd content and the signal-to-noise ratio plummeted.

The practical consequences of this are twofold:

The latter is particularly concerning as recruiters are quite expensive per hire. This means that the only cost-effective way small firms can acquire talent is via personal connections. The former is also corrosive, as many on both sides have come to accept that you have to cynically game this system to get good results.

Deceptive Offers, Liar Hires

Recently a story went viral about the abysmal rate of responses a qualified applicant got for entry level jobs, and how the offer was always lower than advertised on those that did respond. It should shock nobody that marginal firms engage in catfishing to game this system and get better hires than they could normally. So long as advantage may be acquired through dishonest practices people will try it absent any meaningful sanction. All's fair in love and war and hiring.

Similarly many prospective hires get good results using jedi mind tricks pioneered by salesmen and pickup artists to rapidly build emotional investment in their counterparts. This causes a great deal of resentment in those who prefer devoting their limited time to professional excellence when they see this results in the (relatively) unskilled getting ahead and them being left behind.

Much of what we are seeing with our worker shortage is these resentments finally boiling over. Professionals either decide to "play the game" or take their ball and go home. This results in turnover at worst and checked-out disloyal workers at best.

Firms have tried to prevent this by fostering a cult atmosphere via paternalist measures and propaganda. This is both expensive and ineffective outside of the short-term. Competitors have all adopted similar benefits and anyone paying attention is wise to the BS now. Joshua Fluke has made a career on youtube pillorying this nonsense. The only option left to employers is actually raising wages.

That said the astonishing levels of average household debt continues to weaken workers' position. They ultimately don't have enough savings to give an outright no. The American worker's only choice is who to say yes to. Woe unto those working in highly consolidated industries, where competition is less meaningful of a bargaining chip.

Demographics

What is making this even worse are demographic trends. The boomers are taking these frustrations they've been dealing with for years and being at or close to retirement age as sufficient reason to throw in the towel. Similarly many spouses are discovering they prefer being at home in a supporting role after having experienced it thanks to pandemic related layoffs. Anxiety over family formation (the lack thereof) likely factors into this decision to an extent.

Like the mainstream narrative as to why the shortage was incorrect, the advocated solution to the demographic problem is also incorrect. While increased immigration will provide a fresh supply of those ignorant of the reasons Americans are reticent in dealing with US firms, this is not a sustainable solution. The internet has massively increased the speed with which immigrants are stripped of their illusions regarding the "American Dream". Many immigrants deeply resent the sword of Damocles that green cards represent and they inevitably learn the reality of the American workplace as well.

Nevertheless, supposing an unlimited supply of skilled labor willing to immigrate (which upon reflection is actually quite a dubious assumption) the can could be kicked down the road indefinitely. However there has been no meaningful increase in immigration at present, which means employers must take concrete action now or accept understaffing. This means any change in immigration policy is unlikely to happen, as it will be "too little, too late" for the vast majority of firms.

Expansion of Entrepreneurship

One other way in which people are "taking their ball and going home" is striking out on their own, much as I have. The 49% increase in EIN applications is strong evidence many are doing so. I have been surprised for years that more didn't recognize the strength of contracting earlier. The tax advantages are quite strong and automation removes much of the need for support staff. Nevertheless now that people are taking the plunge, this is removing a significant number of people from the employment rolls permanently.


Missionary or Mercenary? πŸ”—
1634842858  


As part of this transition to entrepreneurship, I've talked to a lot of recruiters and companies in attempts to get contracts. Of those successful, there is almost always some element of a bait-and-switch involved either with the actual duties required or payment offered versus the expectations discussed up-front.

I don't take any of it personally, sometimes it's just a negotiation tactic you have to withstand. It is nevertheless a black mark on the relationship going forward, as it's assured they'll inevitably find some way to short you subsequently. If you are good at sticking to your guns, the place they inevitably slip is being late with payment.

What baffles me is how many employees I've known throughout the years who actually accepted an offer after such an obvious bait-and-switch without holding them to initial expectations. When I talked to them about it, their rationalization of the situation was inevitably a covert contract along the lines of "they'll do it when I prove myself". This of course was an axe ground forever in secret when that never came to pass. Such poison is everywhere when you know how to look for it in most firms.

Modern employment seems built around fostering these sorts of covert contracts. This is a side-effect of managers wanting missionaries not mercenaries. The trouble is that this vision of the firm is inevitably undermined by the incompetence of their self-serving management. It can't be any other way because the phenomenon of the "single-elimination ass-kissing tournament" and it's effects described in "Moral Mazes" saturate the market in the USA totally. As such employees regularly care deeply about firms that are at best indifferent to their well being. In that case being a "missionary" feels less inspiring and more like wearing matching nikes and drinking kool-aid. Missionaries tend to get disillusioned when they realize those which they follow are not gods, just men.

Which brings us to today, where we have many firms lamenting a labor shortage. This should shock nobody paying attention to demographics. Over the last 60 years firms have enjoyed an unprecedentedly glutted labor pool thanks to both the baby boom and feminine empowerment. The reality going forward is instead a diminishing labor pool. Yet firms still regularly adjust their final offers down and withhold Hiring "bonuses" until long after the initial work. They then have the gall to wonder why they're seeing a lack of enthusiasm.

Firms will not be able to get away with the sort of chicanery which has been commonplace in the last 60 years. Both wages and the meaningfulness of jobs will have to actually adjust higher. This is bad news for the management of most firms, as they've grown fat glad-handing and hiring armies of suck-ups which make them look good without being productive.

This is ultimately a hopeful sign for the future versus our reality of the last 20 years where we have had the best engineers ever lead by the worst management. We're already seeing huge levels of attrition from firms which have clueless management. It is but a matter of time before people look around and see that we have had the answers all along, but are blind to them on a systemic level.

It would be a breath of fresh air to be able to deal with management on a professional level rather than have to engage in a guerrilla insurgency of "Fuhren unter Der Hand" to achieve the actual goals of companies. I have grown weary of having to cynically exploit middle management's impulse to look good at all costs as the way to advance in a firm rather than actually accomplishing something. That said, the army has known about the superiority of auftragstrategik, the OODA loop and ideas similar to the germ theory of management for even longer than the business community has, yet remains the poster child for the kind of management described in "moral mazes". The capacity for self-deception in managers clearly is at least as great as those who work for them.

This is perhaps the greatest reason I've left it all behind me to do this hired-gun thing. At least this way I'm not holding my breath that this entrenched set of problems gets fixed.


PTR Records: waving a dead chicken πŸ”—
1634239190  


Helping out people with their shared hosting problems inevitably runs into mail issues. One of the more common ones has to do with being put on an RBL, usually a code 550 bounce. This usually happens for one of three reasons:

  • The mailserver is configured to do what seems right and HELO per the appropriate domain rather than the domain of the server
  • The mailserver responds on an IP other than that of the domain returned by the HELO (either lack of a PTR record, or responding on the mail domain's IP)
  • The client is actually spamming
It's rarely the latter, as this is fairly straightforward to prevent with outgoing mail limits, outgoing scanning and so forth.

You'd think that this cross-referencing of a domain's A record to it's PTR would prevent a lot of spam, as this would effectively prevent spoofing. Unfortunately the reality of shared hosting, IPv4 scarcity and limited ISP tooling has crippled this. This has resulted in a reality where there are only 2 correct configurations:

  • No sharing of IPs between domains which emit mail, period. Not even CNAMEs. You can then HELO per domain on it's IP.
  • Never HELO with anything other than the server hostname, and always respond from the server's IP
As you can see allowing the latter (which all RBLs must, as it's the Lowest common denominator here) essentially cripples the cross-referencing of the sender domain's A record with it's IP and the corresponding PTR record. Spammers can spoof any domain they wish as long as their MTA HELOs correctly until they get manually reported.

So why is it that we can't share IPs and HELO per-domain/ip? It's precisely because we are doing this PTR to IP to A record cross-referencing as an automated process over at the RBL makers like spamcop et al. That, and almost nobody is ever given a /24 block of IPs (0-254 on the least significant byte of the address). Instead the ISPs usually assign smaller blocks to multiple customers. This means that they can't delegate the NS record for *.X.Y.Z.in-addr-arpa (where XYZ is the remainder of your IP, but backwards) to their client.

So instead they provide some kind of interface for their clients to add PTR records. Unfortunately many don't know that a one-to-many PTR relationship is entirely supported by DNS, much like for A records themselves. As such these interfaces inevitably only allow one domain to be specified per IP. Which leaves you with only one option: send out mail from your server's hostname and primary IP.

Meanwhile, 0 spam is prevented because of this massive hole blown in the entire scheme. The only practical outcome is that unaware sysadmins get caught up by this and are put on RBLs erroneously. Which is a pity, as if the system actually worked correctly it would be an ironclad guarantee against spoofed emails.

IPv6 could have fixed this, as we could give everyone a /64 IP range until the end of time and never run out. Delegating NS for the PTR domains could then become standard practice. IPv6 never really got adopted by the major ISPs though. Given they haven't updated their tooling for multi-PTR (which has been supported almost since the beginning of DNS), we shouldn't hold our breath that NAT is going away anytime soon either.


2 Major paths to take advantage of CDNs πŸ”—
1634143208  

🏷️ tcms

When considering a switch from traditional web hosting to something like S3 (or ipfs) plus web workers or equivalent stateless compute resources (GKP, Lambda, etc) you need to re-think how you deploy your applications. Many of the complex backends that you are used to are simply not feasible to have on the live web with this model. That said, it's not hard to write CGIs in a stateless fashion, and many only keep their state client side or in a database, which is a model that works just fine here.

That said, most of your backend probably doesn't even need to be public. It could easily be behind a firewall and just push updates to your staging and production buckets and workers. In fact, I'm considering this model for tCMS right now! There's no good reason why I need the login functionality public, at least for post editors -- all of them could run a local instance and push out updates supposing I had an rqlite backend data storage sitting in a bucket as a read only node, with the editors behind firewalls doing the RAFT thing.

From there the question is whether to do static generation or load sqlite with sql.js and do everything client-side (or a combination of both). Static versus dynamic web pages is well trod ground at this point so do what feels best for your application. Personally I will probably go with static generation as I prefer to run as little as possible on the client machine to get the best user experience.

The big drawback to using IAAS is (historically) higher TCO and a lack of good abstractions making it easy to self-host under such models. Things like OpenStack make self-hosting possible, but prohibitively expensive and it feels like swatting flies with an elephant gun for most webmasters.

Even containerization and it's orchestration tools require a lot of extra thought and code versus the traditional LAMP approach. What's really needed to swim in both pools easily are some abstractions to make it easy for you to:

  1. Treat local directories like they are S3 buckets (and/or ipfs tokens) so you have flexibility for storage deployment
  2. Treat local httpds like they're a web worker API you deploy to, rather than vhosts
Implementing such tools would effectively allow any shared host to transition their existing infrastructure into such products, albeit without automatic scaling on their end. That said, if multiple servers are supported in a sort of federated model (again, likely mediated thru rqlite), that would largely ameliorate such concerns.

For S3, s3proxy is exactly what we are looking for. As to an imitation of web workers (or other "stateless" compute resources), that would be far more complex given there is no standardization across vendors.


Dealing with out of date slave zones after an IP migration πŸ”—
1633971893  


I had a problem the other day where I did an IP migration on a server which was shipping slave zones to another server. Unfortunately, the IP migration script provided with plesk either failed to update the zone serials, or BIND9/rndc just didn't care for whatever reason. As such, the other nameserver kept returning responses with the old IPs, despite:

  1. Restarting bind, reloading rndc on both
  2. incrementing every zone serial on the zones we were authoritative for using sed
  3. Manually wasting the zonefile on the slave, then running rndc retransfer and rndc resync on the master
  4. rndc delzone of said slave, and then try retransfer/resync again
Nothing actually brought over the new zonefile. Obviously, more rigorous means of forcing the issue were required.

First, I needed to see if the zones on the slave were not updating, or if I was nuts. Unfortunately rndc showzone tells you next to nothing other than that a zone exists and where it comes from. This means I need to convert them to text, as the slave zones by default are shipped in bind's Binary format. To make a more generic alias, I've made a means to detect whether or not it's binary and "do the right thing".

showzone.source

showzone()
{
    echo ${1?Error domain name is not defined e.g. showzone test.test }
    # Update the following as appropriate to your environment, this works on plesk
    ZONE_LOC=/var/named/chroot/var

    file $ZONE_LOC/$1 | grep text &> /dev/null
    RESULT=$?
    if [ $RESULT == 0 ]
    then
        cat $ZONE_LOC/$1
    else
        /usr/sbin/named-compilezone -f raw -F text -o - $1 $ZONE_LOC/$1
    fi
}

The missing link here was doing rndc addzone in the same way it would happen automatically, to fool the system into thinking it was a brand new slave configured:

fix_borked_slaves.sh

#!/bin/bash

ZONE_LOC=/var/named/chroot/var

# Run this On slave:
for zone in $(ls $ZONE_LOC)
do
    if [ -z `rndc showzone $zone 2>/dev/null` ]
    then
        # Grab rndc's current definition
        TO_ADD=$(rndc showzone $zone 2>/dev/null | grep slave | awk '{ $1=$2=$3=""; print $0}')
        # Dump the current list for use on master
        echo "$zone" >> for_master_update.txt
        # Delete and re-add the zones
        rndc delzone $(rndc showzone $zone 2>/dev/null | grep slave | awk '{ print $2 }')
        rndc addzone $zone $TO_ADD
    fi
done

# Preamble to script you need to run on master, based on for_master_update.txt

# ADAPTER=eth0
# LOCAL_IP=$(ip addr show $ADAPTER | grep 'inet ' | awk '{print $2}' | sed -e 's/\/.*$//g')
# REMOTE_IP="REPLACEME"

# On REMOTE_IP, do a mass search/replace to make for_master_update.txt look like this and run it

#rndc -b $LOCAL_IP -s $REMOTE_IP -p 953 -y rndc-key retransfer $zone

This thorough application of the LART did the trick. I'm still not 100% sure what was BIND and rndc's problem here; bumping zone serials should always result in a transfer.

What would be better?

Nonsense like this is probably why cPanel's "DNS Clustering" never used afxrs or master/slave DNS in the first place. Eventually people liked it's enabling of a multi-master setup where you could edit the zones from anywhere in your cluster. It was of course a pretty funky system in that it had no consensus protocol whatsoever. Edit loops and Sibyl attacks were entirely possible and straightforward for customers to mistakenly produce for years because of confusing documentation. The documentation is better now, but the design of the system haven't fundamentally changed.

I was part of the team tasked with considering redesign of that system when we had to add DNSSEC support to it. Even the supermaster mode in pdns doesn't have the same strength as the RAFT protocol. As such I think the ideal setup for that is probably pdns in sqlite mode, utilizing RQlite. This has the added benefit of being simple for people to just setup themselves.


MySQL utf8 migrations: big hairy queries πŸ”—
1633018799  


I had to migrate some mysql databases to UTF8mb4 last week. You would think this all would have happened a decade ago in a sane world, but this is still surprisingly common thanks to bad defaults in packaged software. Every day there's a new latin1 db and table being made because /etc/my.cnf isn't setup right out of the box. So People google around, and find some variant of this article.

I wasn't willing to waste hours with a non-automated solution for the many tables and databases piled up over years. So, I wrote a script of course. The plan was this:

  1. Convert the DB's default encoding & collation
  2. Get the list of tables so we can iterate
  3. Try and convert the table. If that fails, it's usually due to long varchars being indexed.
  4. On failure, drop and re-add the indices on long varchars with an acceptable shorter prefix length
  5. After that, try and re-convert the table
This ended up being a 99% solution. A couple of tables had some neat problems I'll talk about at the end. So without further ado, I'll talk about the queries themselves.

Converting the DB is straightforward.

 fix_db.sh
mysql -e "ALTER DATABASE $DB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci"


We'll assume for these examples that $DB is $1 (the first argument to the script).

Similarly, listing and converting tables was easy:

 fix_tables.sh ~ snippet 1
for table in $(mysql $DB -ss -e 'show tables')
    do
        mysql $DB -ss -e "ALTER TABLE $table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci" &> /dev/null
        STATUS=$?
        if [ $STATUS -ne 0 ]
        then
            ...


Now we have to do the hard part of dropping and re-adding the indices, because mysql doesn't support ALTER INDEX like postgres.

This means we need to capture the index state as it currently exists and only make the needed adjustments to turn varchar indexes into prefix indices. If you didn't read the article linked at the top, this is because utf8mb4 strings are 4x as large as latin1, and mysql has a (mostly) hardcoded index member size. You can get it bigger on InnoDB for most use cases, but in this case it was MyISAM and no such option was available.

Here's how I did it:

 fix_tables.sh ~ snippet 2

    for query in $(mysql -ss -e "select CONCAT('DROP INDEX ', index_name, ' ON ', table_name) AS query FROM information_schema.statistics WHERE table_name = '$table' AND index_schema = '$DB' GROUP BY index_name HAVING group_concat(column_name) REGEXP (select GROUP_CONCAT(cols.column_name SEPARATOR '|') as pattern FROM information_schema.columns AS cols JOIN information_schema.statistics AS indices ON cols.table_schema=indices.index_schema AND cols.table_name=indices.table_name AND cols.column_name=indices.column_name where cols.table_name = '$table' and cols.table_schema = '$DB' AND data_type IN ('varchar','char') AND character_maximum_length >= 250 AND sub_part IS NULL);")
    do
        echo "mysql $DB -e '$query'"
    done

    for query in $(mysql -ss -e "select CONCAT('CREATE ', CASE non_unique WHEN 1 THEN '' ELSE 'UNIQUE ' END, 'INDEX ', index_name, ' ON ', table_name, ' (', group_concat(concat(column_name, COALESCE(CONCAT('(',sub_part,')'), CASE WHEN column_name REGEXP (select GROUP_CONCAT(cols.column_name SEPARATOR '|') as pattern FROM information_schema.columns AS cols JOIN information_schema.statistics AS indices ON cols.table_schema=indices.index_schema AND cols.table_name=indices.table_name AND cols.column_name=indices.column_name where cols.table_name = '$table' and cols.table_schema = '$DB' AND data_type IN ('varchar','char') AND character_maximum_length >= 250 AND sub_part IS NULL) THEN '($PREFIX_LENGTH)' ELSE '' END)) ORDER BY seq_in_index), ') USING ', index_type) AS query FROM information_schema.statistics WHERE table_name = '$table' AND index_schema = '$DB' GROUP BY index_name HAVING group_concat(column_name) REGEXP (select GROUP_CONCAT(cols.column_name SEPARATOR '|') as pattern FROM information_schema.columns AS cols JOIN information_schema.statistics AS indices ON cols.table_schema=indices.index_schema AND cols.table_name=indices.table_name AND cols.column_name=indices.column_name where cols.table_name = '$table' and cols.table_schema = '$DB' AND data_type IN ('varchar','char') AND character_maximum_length >= 250 AND sub_part IS NULL);")
    do
        echo "mysql $DB -e '$query'"
    done


We then execute all the mysql statements. I output this to a shell script which can then be run. If we simply ran them, we would have an issue where the indexes we are looking for have already been dropped.

Let's go over the details, as it uses a solid chunk of the important concepts in mysql. Read the comments (lines beginning with --) for explanations.

 drop_indices.sql

-- The plan is to output a DROP INDEX query
SELECT CONCAT('DROP INDEX ', index_name, ' ON ', table_name) AS query
-- information_schema.statistics is where the indices actually live
FROM information_schema.statistics
WHERE table_name = '$table'
AND index_schema = '$DB'
-- The indices themselves are stored with an entry per column indexed, so we have to group to make sure the right ones stick together.
GROUP BY index_name
-- We want to exclude any indices which dont have VARCHAR or CHAR cols, as their length is irrelevant to utf8mb4
-- The simplest way is to build a subquery grabbing these, and then scan the columns in the index via regexp
HAVING group_concat(column_name) REGEXP (
    select GROUP_CONCAT(cols.column_name SEPARATOR '|') AS pattern
    -- We need to cross-reference the cols in the index with those in the table
    -- So that we can figure out which ones are varchars/chars
    FROM information_schema.columns AS cols
    JOIN information_schema.statistics AS indices ON cols.table_schema=indices.index_schema
    AND cols.table_name=indices.table_name
    AND cols.column_name=indices.column_name where cols.table_name = '$table'
    AND cols.table_schema = '$DB'
    AND data_type IN ('varchar','char') AND character_maximum_length >= 250
    -- sub_part is the prefix length.  In practice, any ones with a sub part already set were just fine, but YMMV.
    AND sub_part IS NULL
);


Things get a little more complicated with the index add. To aid the reader I've replaced the subquery used above (it is the same below) with "$subquery".

 make_indices.sql

-- As before, we wish to build a query, but this time to add the index back properly
SELECT CONCAT(
    'CREATE ',
    -- Make sure to handle the UNIQUE indexes
    CASE non_unique WHEN 1 THEN '' ELSE 'UNIQUE ' END,
    'INDEX ',
    index_name,
    ' ON ',
    table_name,
    ' (',
    -- Build the actual column definitions, complete with the (now) prefixed indices
    group_concat(
        concat(
            column_name,
            -- COALESCE is incredibly useful in aggregate functions, as the default behavior is to drop the result in the event of a null.
            -- We can use this to make sure that never happens when sub_part IS NULL.
            -- In that case, we either print a blank string or a prefix based on what I chose to be $PREFIX_LENGTH (in my case 50 chars) when varchar/char cols are detected by the subquery.
            COALESCE(
                CONCAT('(',sub_part,')'),
                CASE WHEN column_name REGEXP ($subquery) THEN '($PREFIX_LENGTH)' ELSE '' END
            )
        )
        -- Very important to preserve the sequence in the index, these are scanned before the latter ones and can have big perf impact
        ORDER BY seq_in_index
    ),
    ') USING ',
    -- Almost always BTREE, but might be something else if we are for example using geospatials
    index_type
) AS query
-- The rest is essentially the same as the DROP statement
FROM information_schema.statistics
WHERE table_name = '$table'
AND index_schema = '$DB'
GROUP BY index_name
HAVING group_concat(column_name) REGEXP ($subquery);


As you can see, this uses nearly every trick in the book. The only improvements I could think of would be to turn the subquery into a VIEW, as it was used multiple times. Turning the create statement generator without our constraints into a VIEW is generally useful and left as an exercise for the reader.

About the only major feature I didn't use were stored procedures or extensions. This all would have been a great deal simpler had ALTER INDEX been available as in postgres. In sqlite it would be a great deal more complicated save for the fact that this isn't a problem in the first place, given utf8 is the default.

The only places this didn't work out was when tables crashed on things like SELECTs of particular rows for whatever reason. The good news was that mysqldump generally worked for these tables, and you could modify the dump's CREATE TABLE statement to setup indices properly. There were also a few places where there were very long varchars that were *not* indexed, but made the table too large. Either shortening them or turning them into LONGTEXT was the proper remedy.


What's new in playwright-perl v0.015 πŸ”—
1631664613  

🏷️ video 🏷️ testing
Quick update on what's changed since I last talked about playwright.

tCMS September 2021 update πŸ”—
1631660780  

🏷️ video 🏷️ tcms
I finally got time to do some work on tCMS! Here's the new goodies all the users of tCMS can expect going forward.

tCMS September 2021 update πŸ”—
1631660718  

🏷️ video 🏷️ tcms
I finally got time to do some work on tCMS! Here's the new goodies all the users of tCMS can expect going forward.

Games people play: Ultimate Smackdown πŸ”—
1629390195  


Mark Gardner asked me on twitter the other day why I wrote this article without specifically naming any names. I realized my reasoning here is probably worth sharing and elaborating if the question needs to be asked.

A lot of people would naturally think most of the post was about the perl community given that's sort of what this series of posts has been about up to this point. While it does indeed apply in large part, it's far from the only place I've seen these phenomena, and a general treatment will naturally be more useful. I've also personally "been the bad guy" here myself, so I can't throw the first stone.

People don't usually read blogs because they're useful. It's all about getting engagement these days which gets down to the more important reason to not "name names" unless the situation described is already well understood in the public. When you do, your brand immediately becomes pro wrestling and you're either gonna be a heel or a face.

This is also why calumny is considered sinful, as it provokes strong emotions which tend to make people do things they regret. When you decide to participate in an online kayfabe, don't be shocked when you get the wrong kind of followers. Sometimes the drooling fans are more annoying than the online lynch mob.

Beefin' is generally not what you want to see from a professional offering actual goods or services. Do you want to sell software and services, or entertainment? You generally don't get to do the same within one brand. There's a reason my political/entertainment persona online is quite separated from my professional. While I don't try to hide that it exists, the degree of separation is an important signal that you can, in fact, control yourself when it comes to what's actually important -- making an upright living through service to others.

A lot of this first year of entrepreneurship has been de-programming myself and unlearning bad corporate habits I didn't even know I had. Much of my content up to now has been little more than sharing my journey. It's been useful to some people, but nowhere near as useful as my actual skill set has been to my clients. Sure feels good to write though.


Resizing, Expanding and Using raw images πŸ”—
1629321079  

🏷️ tutorial

So you need to resize some VM images, and possibly inspect them later. Here's the dope:

 mount_loopback.sh
# Replace size with what you see to be appropriate
qemu-img resize $IMAGE_FILE +80GB
# Setup a device for the image file so that we can resize the relevant partition
losetup -f -P $IMAGE_FILE

When using LVM, you'll have to do some extra steps to get the relevant device and resize things:

 get_mapped_volname.sh
pvscan --cache
# Grab the name of the volume
lvs
# Use the name from that to plug it in, there should be something similarly named in /dev/mapper
vgchange -ay

Now you need to resize the partition:

 resize_normal.sh
# When you have a normal DOS/BSD partition table:

# Using ext fS:
resize2fs /dev/loop0p1

# Basically anything else
growpart /dev/loop0 1

Things are of course different on LVM:

 lvm_extend.sh
# Resize partition you want, usually this is device 1, but on XFS the data section is usually device 2
pvresize /dev/loop0p2
lvextend -l +100%FREE /dev/mapper/$NAME

XFS also requires special handling. You will have to mount the relevant device and then:

 xfs_grow.sh
# Or, the loop device if it's not LVM
xfs_growfs /dev/mapper/$NAME

Mounting and fiddling with stuff is as simple as:

 mount_loopbacks.sh
# Replace partition number with whatever is appropriate
mount -t $FSTYPE /dev/loop0p1 $MOUNTPOINT
# Same story with LVM stuff, but use the /dev/mapper entry
mount -t $FSTYPE /dev/mapper/$NAME $MOUNTPOINT

When done, you should remove the loopback device:

 remove_loopbacks.sh
losetup -d /dev/loop0

One last point worth noting is that if you do this while the VM is active writes will not show up until you unmount & unload the loopback devices, as they don't share journals. The best use of mounted VM disks on the hypervisor is for backups though, and for this purpose loading and mounting them just for the backup period works quite well. As such I generally also add the -r (read only) option to losetup when I'm mounting for backup purposes.


25 most recent posts older than 1629321079
Size:
Jump to:
POTZREBIE
© 2020-2023 Troglodyne LLC