🌐
Downloads Videos Blog About Series πŸ—ΊοΈ
❓
πŸ”‘



Selling Internally, Externally and in Interviews πŸ”— 1619710468  

One of the common interview questions you get is the time preference question.  I've asked it myself multiple times. It goes something like this:

  • Tell me about a time you had to make sacrifices in the short term to achieve a long term goal.
Engineering companies very much want to think of themselves as builders of great works made to stand the test of time. They frequently fall short of this as the customer generally wants "Mr. Right Now" instead of "Mr. Right". Wise organizations achieve coherence in their strategic vision by having "fulfill customer desires" itself as the long term goal. I've mentioned before that a vision which does not align with the core business model is doomed to failure, and many companies fall into this trap.

Many view the agglomeration of technical debt associated with an iterative design process to be short-term thinking which undermines the long term...but that assumes the goal is to build quality software. In reality, the goal is to build software of acceptable quality that satisfies customer needs; worse is better.  In this framework, much of what goes on at an engineering corporation can be framed as a victory rather than a death march.  The problem to solve then becomes minimizing the iteration duration of your OODA loop.

The OODA loop of a software enterprise is basically this:
  1. Observe: Sample reaction to the latest software version
  2. Orient: Refine program and development schedule constraints based on reaction
  3. Decide: Choose optimal algorithms to satisfy new and changed constraints
  4. Act: Test, Anneal and Release

You break out of your loop when you stop getting meaningful observations.  Many organizations have successfully adopted this (see OPCDA).  The whole point here is that you accumulate less bad designs lurking in your code, as you can refine constraints quickly enough to not over-invest in any particular solution.

Many times this is paired with other questions to find out how much of a self-starter, leader or entrepreneurial aspect you have:

  • How do you drive adoption for your ideas?
  • How do you measure adoption of your ideas?

They always want a concrete example from your past employment, and it needs to be your thing from start to finish. This is usually also a good opportunity to reinforce how well you embrace iterative design principles.  In fact it drives at the real reason they ask the first question.

Knowing how to drive adoption and measure it is key to the observation phase.  If your observations are flawed, it will poison and invalidate the results of all resultant phases, so you need to get it right.

There are two primary adoption strategies. All marketing is a tree search algorithm of one sort or another thanks to the way influence networks work.

Breadth first vs Depth First Marketing

You can either drive adoption of something within an organization virally (infect the sheep) or evangelically (convert the Shepard).  You can do both, but conditions usually mean you need to lean primarily towards one or the other.
The cost of reaching consumers is directly is a great deal higher, and they have a lot less to spend than businesses and bosses with budgets.  That said, the total revenue you can get from targeting retail is vastly larger, and defections from the product are less troublesome.

In general you see a hybrid model nowadays where an open source (or reduced price) component is marketed towards retail, and a paid premium version is marketed towards business.
Infection of the sheep can drive conversion of the Shepard, much the way that conversion of the Shepard can drive the flock.

When it comes to driving change within organizations, the formula is turned upon its' head.  It is actually cheaper to convert fellow drones instead of the queen, and effect a coup de main. The drones are used to collaborating with each other and value each others' input far more than they do tools provided from above. Similarly, management is incapable of understanding many of the problems which occur in the production process as they happen, supposing they even look for them at all.  Furthermore, getting the kind of feedback needed to iterate and improve is fast and straightforward between drones.

This is why much of the approach around things like Kaizen and Scrum focus on empowering the drone to streamline production themselves.  The concept is generally referred to as Metis, and it is valuable for management to periodically inspect and experiment with cross-pollination of this across divisions to increase productivity.

War story time

For those of you not familiar with me, I have a decade of experience automating QA processes and testing in general.
This means that the vast majority of my selling has been of two kinds:

  • Selling tactical/strategic/logistic intelligence reports
  • Selling colleagues on tools to improve their productivity

That said, I also wore "all the hats" in my startup days at hailstrike, and had to talk a customer down from bringing their shotgun to our office.
I handled that one reasonably well, as the week beforehand I'd read Carl Sewell's Customers for Life and Harry Browne's Secret of selling anything.
The problem was that one of the cronies of our conman CEO was a sales cretin there and promised the customer a feature that didn't exist and didn't give us a heads up.
It took me a bit to calm him down and assure him he was talking to a person that could actually help him, but after that I found out what motivated him and devised a much simpler way to get him what he wanted.
A quick code change, a deploy and call back later to walk him through a few things to do on his end to wrangle data in Excel and we had a happy camper.

He had wanted a way to bulk import a number of addresses into our systems and get a list of hailstorms which likely impacted the address in question, and a link into our app which would pull the storm map view immediately (that they could then do a 1-click report generate for homeowners).

We had a straightforward way of doing this for one address at a time, but I had recently completed optimizations that made it feasible to do many as part of our project to generate reports up to two years back for any address.
Our application was API driven and already had a means to process batched requests, so it was a simple matter of building an excel macro talking to our servers which he could plug his auth credentials into.
I built this that afternoon and sent it his way.  This started a good email chain where we made it an official feature of the application.

It took a bit longer to build this natively into our application, but before the week was up I'd plumbed the same API calls up to our UI and this feature was widely available to our customers.
I was also able to give a stern talking to our sales staff (and gave them copies of C4L and SSS) which kept this from happening going forward, but the company ultimately failed thanks to aforementioned conman CEO looting the place.

The war within

After that experience I went back to being a salaryman over at cPanel.  There I focused mostly on selling productivity tools internally until I transitioned into a development role.

I'd previously worked on a system we called "QAPortal" which was essentially a testing focused virtual machine orchestration service based on KVM.  Most of the orchestration services we take for granted today were in their infancy at that time and just not stable or reliable enough to do the job.  Commercial options like CloudFormation or VSphere were also quite young and expensive, so we got things done using perl, libvirt and a webapp for a reasonable cost.  It also had some rudimentary test management features bolted on.

That said, it had serious shortcomings, and the system essentially was unchanged for the 2 year hiatus I had over at hailstrike as all the developers moved on to something else after the sponsoring manager got axed due to his propensity to have shouting matches with his peers.
I was quickly tasked with coming up with a replacement.  The department evaluated test management systems and eventually settled on TestRail, which I promptly wrote the perl API client for and put it on CPAN.
The hardware and virtual machine orchestration was replaced with an openstack cluster, which I wrote an (internal) API library for.
I then extended the test runner `prove` to talk to and multiplex it's argument list over the various machines we needed to orchestrate and report results to our test management system.
All said, I replaced the old system within about 6 months.  If it were done today, it would have taken even less time thanks to the advances in container orchestration which have happened in the intervening time.  The wide embrace of SOAs has made life a lot better.

Now the team had the means to execute tests massively in parallel across our needed configurations, but not every team member was technical enough to manage this all straightforwardly from the command line.  They had become used to the old interface, so in a couple of weekends I built some PHP scripts to wrap our apps as an API service and threw up a jQuery frontend to monitor test execution, manage VMs and handle a few other things the old system also accomplished.
Feedback was a lot easier than with external customers, as my fellow QAs were not shy about logging bugs and feature requests.

I suspect this is a lot of the reason why companies carefully cultivate alpha and beta testers from their early adopter group of rabid fans.  Getting people in the "testing mode" is a careful art which I had to learn administering exploratory test sessions back at TI, and not to be discarded carelessly.  That is essentially the core of the issue when it comes to getting valid reports back from customers.  You have to do Carl Sewell's trick of asking "what could have worked better, what was annoying...", as those are the sort of user feedback that you want rather than flat-out bugs.  Anything which breaks the customers' immersion in the product must be stamped out -- you always have to remember you are here to help the user, not irritate them.

Rewarding these users with status, swag and early access was the most reliable way to weed out time-wasters; you only want people willing to emotionally invest, and that means rewards have to encourage deeper integration with the product and the business.  It also doesn't hurt that it's a lot cheaper and easier to justify as expenses than bribes.

Are ya winning son?

Measuring adoption of software and productivity ideas in general can be tricky unless you have a way to either knock on the door or phone home. Regardless of the approach taken, you also have to track it going forwards, but thankfully software makes that part easy nowadays.
Sometimes you use A/B tests and other standard conversion metrics, as I used extensively back at HailStrike.  I may have tested as much copy as I did software!  Truly the job is just writing and selling when you get down to it.

In the case of inter-organization projects most of the time it's literally knocking on the door and talking to someone.  At some level people are going to "buy" what you are doing, even if it's just giving advice.  This is nature's way of telling you "do more of this, and less of the rest".

I can say with confidence that the best tool for the job when it comes to storing this data is a search engine, as you eventually want to look for patterns in "what worked and didn't".  Search engines and Key-Value stores give you more flexibility in what IR algorithm best matches the needs of the moment.  I use this trick with test data as well; all test management systems use databases which tend to make building reports cumbersome.

Time Preference versus Subjective Value

Rather than flippantly dismiss the original question, I would like to revisit the problem.  While it is obvious that I will probably gain more over the long term by sacrificing my desire to do something fun instead of writing this article, one must also take into consideration the law of diminishing marginal utility and the Paradox of Value.  Thinking long term means nothing when one is insolvent or dead without heirs tomorrow.  There will always be an infinite number of possible ends for which I sacrifice my finite means.  As an optimization problem, it is NP hard.  The best we can do is to use the Kelly Criterion to distribute our time and other assets wisely among the opportunities we best understand the risks about.

Building an online reputation is quite expensive and time consuming, but is beginning to pay off.  It doesn't hurt that I'm pursuing multiple aims simultaneously (building a MicroISV product, chasing contracts) with everything I write these days.  That said it cannot be denied that hanging out your shingle is tantamount to a financial suicide mission without multiple years of runway.  Had I not spent my entire adult life toiling, living below my means and not taking debts, none of this would be possible.  In many ways it's a lot like going back to college, but the hard knocks I'm getting these days have made me learn a whole lot more than a barrel full of professors.

For those who insist on the technical answer to this question, I would direct you to observe the design of Selenium::Client versus that of Selenium::Remote::Driver.  This is pretty much my signature case for why picking a good design from the beginning and putting in the initial effort to think is worth it.  My go-to approach with most big balls of mud is to stop the bleeding with modular design.  Building standalone plugins that can ship by themselves was a very effective approach at cPanel, and works very well when dealing with Bad and Right systems.  What is a lot harder to deal with is "Good and Wrong" systems, usually the result of creationist production.  When dealing with a program that puts users and developers into Procrustes' bed rather than conforming to their needs you usually have to start back from 0.  Ironically most such projects are the result of the misguided decision to "rewrite it, but correctly this time".

Given cPanel at the time was a huge monorepo sort of personifying "bad design, good execution", many "lets rewrite it, but right this time" projects happened and failed, mostly due to having forgotten the reasons it was written the way it had been in the first place.  New versions of user interfaces failed to delight users thanks to removing features people didn't know were used extensively or making things more difficult for users in the name of "cleaner" and "industry standard" design.  A lot of pain can be brought to a firm when applying development standards begins to override pleasing the customer.  The necessity of doing just that eventually resulted in breaking the monolith to some extent, as building parallel distribution mechanisms was the only means to escape "standardization" efforts which hindered satisfying customer needs in a timely manner.

This is because attempting to standardize across a monorepo inevitably means you can't find the "always right" one-size fits-all solution and instead are fitting people into the iron bed.  The solution of course is better organizational design rather than program design, namely to shatter the monolith.  This is also valuable at a certain firm scale (dunbar's number again), as nobody can fit it all into their head without resorting to public interfaces, SOA and so forth.  Reorientation to this approach is the textbook example of short-term pain that brings long-term benefit, and I've leveraged it multiple times to great effect in my career.


How to (not) do job interviews πŸ”— 1619627345  

The name of the game is building emotional investment, and most of the mechanisms that work elsewhere work here too.

Power in the Firm, and getting fired πŸ”— 1619537517  

I have been fired multiple times in my life. Each time it has been because I violated a fundamental rule of power. Where I had stayed employed when others were cut it was also due to "observation of the laws" of power. This is not to say I understood this at the time, but to observe that "this time is not different".

The first "real job" I got out of college was testing calculators for Texas Instruments. I subcontracted there for about 4 years, and was one of the few who survived a ruthless layoff associated with the 07/08 panic. This was a very close run thing. There was one day in which I was fired and re-hired in the same day.

It is clear in retrospect that the reason I stuck around was due to being better at finding issues than all my peers. I had by that time found a number of critical issues with the multi-line scientifics by mapping out the memory pages and watching for stomped flags. Nobody else testing the products at the time came close to understanding the hardware at this level, making me indispensable.

Which is to say I focused like a good protestant work ethic boy on laws #9 and #11. Demonstrate, don't explicate. Keep others dependent on you to achieve freedom. I keep going back to this over my career, as it worked.

I also learned law #13 "Only appeal to self-interest" when it came to seeking promotion and favor from management.  I found quickly that "job descriptions" were universally meaningless and the only important thing was delivering on stuff your manager was emotionally invested in.

This was about two years before I got fired. I went on to do more things for the firm which nobody else understood, such as solving a data encoding issue with archival documents and porting the TI-8X emulator to linux. I had made a good number of friends and was well liked at the firm.

Nevertheless, this made me a bit too comfortable. I was also still a pretty naive young man at the time, and actually believed upper management would appreciate serious criticism. This is of course not the case, and they see it as an affront and out of place. To do this is to violate rules of power #1 and #19, "Don't outshine the master" and "Don't offend the wrong people". Like my victories this has also bitten me more than once.

Interestingly enough a couple of months after my ouster, I got an offer to work on the programming of the color TI-84 from one of the programmers there I had a good relationship with. Apparently the criticisms which I had of management were quite timely and the issues I had brought up promptly blown up in their face like backdraft. As such, there was no resistance to my return as all oxen gored were now out of the picture.

I had taken a job with cPanel by then though, a firm which I would spend 8 years at. I also rapidly rose to a position of indispensability in the QA organization there, but took a brief hiatus to work with my cousin at his startup HailStrike. In retrospect this should have been an obvious violation of Power law #10 "avoid the unhappy and unlucky". The company was a reject bin in many ways.

Nevertheless due to my upbringing which had turned me into the stereotypical "nice guy" who immolates himself to keep others warm, I did a lot of good work there. I built a new product from the ground up and re-wrote the existing one to not have horrible projection bugs and awful performance. That said, nothing could save that firm, as my cousin and his partner hired a con-man to run the firm thanks to their lack of self-confidence. After about a year and a successful funding round, the co-founders went on a month long vacation and returned to find the place looted.

In that time and in the aftermath I basically kept tech end of the shop going single-handedly for minimum wage. After about 6 months of this I cut bait and returned to cPanel, being close to "zeroed out" financially. All I got for the trouble was some worthless stock in a firm which languishes to this day.

Meanwhile cPanel's QA department hadn't changed much from where I had left it. They were eager to make some forward progress and remembered my impact. So my departure at least had the positive effect of resulting in a big raise. Law #16 "Use absence to increase respect and honor" in action.

For the next 5 or so years I became the most senior man in the department. I made a number of tools without which the department couldn't do their jobs. I also was #1 across the board in test execution and bug filing metrics.

So far, so good. I also cultivated better options to effect a promotion and significant raise in my last two years, but this would prove my undoing. Much ink has been spilled about how it's always better to take the other offer (much of which I had read!) but I was emotionally invested in the firm after 6 years and accepted the counteroffer.

I was now writing product code rather than automation for our QA workflow. Similar to in my prior role, I quickly rose to the top 5 bug fixers and committers at the firm. This, however is not the same as indispensability.

It turns out that you need to work on products that matter if you want to stick around. At TI, I worked on cash cows, and these things will forever need their indispensable people. However at cPanel they had a monoproduct which was itself an aglommeration of various sub-products some of which were important and not. The teams I was put on were rarely working on anything the customer was particularly interested in.

To be fair, this is the case across much of the organization. For years only the CEO's team was the one working on anything relevant to customers. The rest of the firm was run autonomously by the middle management and fell victim to the principal-agent problems that entails.

As a middle manager, to aggrandize yourself you generally want to weed out the indispensable and maximize your headcounts. This is generally accomplished by two means:

  1. allowing the indispensable to silo and thus violate rule #18 "do not isolate yourself".
  2. making sure you don't take any risks you can be blamed for, and blaming failures on lack of manpower
The consequence of 2) is that nothing of consequence is worked on, meaning that no new person will ever achieve true indispensability. We had a core cabal of old developers most of whom were siloed (and gradually being pushed away), or had mentally checked out themselves due to working on unimportant tasks. I was obviously not among either as I "still cared" rather than being checked out (and thus not perceived as threatening).

It also didn't hurt that up to that point I used programming as a superpower in a field that traditionally has little power in development organizations (QA), or worked on cash cows (which are rarely influential).  The powerful never feel necessitous, and when push comes to shove they'll throw their best overboard.  I had unwittingly aligned with weak factions until this point, which is a necessary (but not sufficient) condition for becoming indispensable.  Now that I was a part of the engineering department which dominated the company, there could be no allies to protect my independence.

I was eventually assigned a new team and manager which I should in retrospect have realized was a trap. I had built quite the reputation for independence while I was there (which is normal with the indispensable) and clashed multiple times with this manager. Enough things piled up over time which were not explicitly breaking the rules but did not signal submission that he formed a negative opinion of me. I'd been making a number of other changes in my life at the time which were bringing refreshing youthful joy, so I suppose it is not surprising I returned to the indiscretions from my youth which tripped me up at TI.

All it took from there was a minor dispute which could easily have been resolved peacefully being escalated in bad faith.  Some of this was simply because I fought the situation at all.  Bosses like to feel like the "cop" in the relationship in these situations, and we all know how cops feel about anyone who doesn't instantly surrender, grovel and degrade themselves for daring to attract their ire. This is why rule of power #22 is a thing.  Surrender is the best option in such situations where you are already "caught up", as bosses think any benevolence they show from that point is a thumb-screw they can use on demand. Obviously you would prefer these thumbscrews not be used, so the tactic is to buy time and enough freedom of action to get out of there.

Rule #42 also comes into play, "Strike the Shepard and the sheep scatter".  Even if you are in the right, management cannot tolerate defiance spreading.  It's simply inviting further attack.  While this is effective at keeping management powerful, it also has the effect of entrenching whatever errors they are engaged in.

At the end of the day the question that must be asked is "would you rather be happy, or right?"  Being emotionally invested in the firm you work for and your role in it means it must "do right" in order for you to be happy.  This is a recipe for disaster, as everyone's emotional needs from the firm differ and become guaranteed to clash past Dunbar's number.  This is why Power law #20 is a thing: "commit to no one".

This desire to have a useful culture at a company and a good "mission" is power law #27, "Use people's need to believe to create a cult-like following".  While you can't hate the player for "playing the game", it is straightforward to realize that there are a great deal better things out there to direct your belief and worship towards than a corporation.  All the senior developers I've known who were checked out totally about the firm had the right idea all along.  The company can want a certain culture all it wants and even go to great lengths to inculcate it, but it simply can't work past a certain scale.  You have to insulate yourself from this and resist getting sheep-dipped into their hyperreality if you want to remain happy.  Focus instead on doing the things that give you power over your situation, which is real freedom.

The shock of being removed from a place I'd been 8 years with a number of good friends took a while to absorb, but it's pretty clear where I steered wrongly. I should have learned the lesson of the Count of Carmagnola. You can't be a star when what they need is a cog.


Telling Stories in corporate πŸ”— 1619450039  

Ever since I struck out on my own, I have done far more job interviews and storytelling than my entire carrer thus far. Most of the interviews for development and testing contracts are talking through problems. You must tell a story, as this is how people internalize your attributes and form emotional investments. I can talk about the lessons I have learned or the design of a thing until I am blue in the face, but it won't matter unless people know where they come from. This is because people don't engage the predictive engine in their hindbrain if you don't tell a story.  People don't get into camshaft thinking (imagination) without emotionally investing in knowing how it works, and as such are in "internal monologue" mode.  A story is the only thing this mode of thought can comprehend or output, and is a necessary prerequisite to get into the mode of thinking where imagination "fills in the blanks".  By the time they get there, diagrams and so forth are unnecessary unless you are interested in mass production which comes for free with software.

When you tell a story, people "think past the sale", and start to see themselves doing business with you unconsciously. If you don't tell a story, people default to their lower consciousness which is stimulus-response. In this case, if you aren't attractive from a "mode 1" (judging a book by it's cover) point of view, good luck. For knowledge work, this is only the case for established intellectuals with some degree of fame. This is why everyone has to do this "online brand" thing; eventually somebody fishing will see you in the net and haul you in.

That said, your online brand can only get you in the door. From there people in the knowledge trades have an innate skepticism beaten into them via the scientific method. This has to be overcome, and the way this is usually done is by telling stories which the interlocutor identifies with. The whole goal is for both the interviewer and evaluator to be congruent with what each expects from the other. This is why it always ends up being the senior development staff that does the heavy lifting here. They've heard this story enough times to sniff out the little details that break them out of their suspension of disbelief (also known as "benefit of the doubt"). This has a high rate of success, as it is difficult to fake having reasoning skills, and being able to practically apply them. It's also difficult to fake the little details which we encounter in the course of our daily toil. Difficult, but not impossible.

I remember setting up those little programming puzzles on hackerrank for the candidates to chew through. My colleague who was working on this with me on it at the time had some anxiety as to whether they were being specific enough in the description of the problems. I thought of how the application process ought to feel both to the applicant and evaluator in order to maximize the potential they can show and give ample opportunity to display their deficiencies. The job-seeker's story is supposed to be a gauntlet of increasing difficulty, hopefully revealing the core qualities needed in our work.

In that vein, I suggested we nail down problem 1 as well as possible, while leaving the second vague. This gives people the ability to show both how efficiently they operate when things are concrete and how quickly they pick up on our "trick" question which is ill-defined and start giving us options. The two hardest problems in software are choosing optimal algorithms and reducing vague requirements into concrete, testable execution constraints. Everything else is straightforward testing, investigation and annealing.

This is not to say that software organizations don't have other (mostly logistical and marketing) problems to solve, but that these are the core ones of interest to engineering. As an interviewer you have to lead the horse to water and see if they'll drink. The interviewer should focus on getting them interested enough in their stories that the evaluator shares some back. Reciprocity is the best sign of developing emotional investment.

You may have noticed I'm telling a story right now. It's uncanny how well this works on you even when you know how the sausage is made! I've been on both sides of the table when it's clear that "they know you know, and you know they know" based on the responses. In these cases breaking the fourth wall is even more convincing of a story as it too is a story.

This is unfortunately a rarity on both sides. All the world's a stage, and we are merely players. A performance cannot truly be great unless both sides can believe it and find more significance therein than their reality! This is despite foreknowledge that it's a performance and not a demonstration. To succeed, one has to get fully sheep-dipped into the hyperreality you want to hop into.

On that note, I will be putting out a series of war stories soon both as practice for upcoming contracts and for your enjoyment.

April 2021 Houstonpm: pairwise technicals πŸ”— 1618337136  

A re-record of the technical and maths-heavy aspects of my April 2021 Houston.pm presentation.

Hard Problems πŸ”— 1618336942  

When preparing any tool which you see all the pieces readily available, but that nobody has executed upon, you begin to ask yourself why that is. This is essentially what I've been going through building the pairwise tool.

Every time  I look around and don't see a solution for an old problem on CPAN, my spider-senses start to fire.  I saw no N-dimensional combination methods (only n Choose k) or bin covering algorithms, and when you see a lack of N-dimensional solutions that usually means there is a lack of closed form general solutions to that problem.  While this is not true for my problem space, it rubs right up against the edge of NP hard problems.  So it's not exactly shocking I didn't see anything fit to purpose.

The idea behind pairwise test execution is actually quite simple, but the constraints of the software systems surrounding it risk making it more complex than is manageable. This is because unless we confine ourselves to a very specific set of constraints, we run into not one, but two NP hard problems. We could then be forced into the unfortunate situation where we have to use Polynomial time approximations.

I've run into this a few times in my career. Each time the team grows disheartened as what the customer wants seems on the surface to be impossible. I always remember that there is always a way to win by cheating (more tight constraints). Even the tyranny of the rocket equation was overcome through these means (let's put a little rocket on a big one!)

Breaking it down

The first problem is that N-Wise test choosing is simply a combination.
This results in far, far more platforms to test than is practical once you get beyond 3 independent variables relevant to your system under test. For example:

A combination with 3 sets containing 3, 5 and 8 will result in 3 * 5 * 8 = 120 systems under test! Adding in a fourth or fifth will quickly bring you into the territory of thousands of systems to test.  While this is straightforward to accomplish these days, it is quite expensive.

What we actually want is an expression of the pigeonhole principle.  We wish to build sets where every element of each component set is seen at least once, as this will cover everything with the minimum number of needed systems under test.  This preserves the practical purpose of pairwise testing quite nicely.

In summary, we have a clique problem and a bin covering problem. This means that we have to build a number of bins from X number of sets each containing some amount of members. We then have to fill said bins with a bunch of tests in a way which will result in them being executed as fast as is possible.

Each bin we build will represent some system under test, and each set from which we build these bins a particular important attribute. For example, consider these sets:

  • Operating Systems: Windows, Linux, OSX
  • Processor Architecture: 32-bit, 64-bit
  • Browser: Firefox, Chrome, Safari, Brave, Opera, SeaMonkey

A random selection will result in an optimal multi-dimensional "pairwise" set of systems under test:

  1. Firefox - Windows - 64 Bit
  2. Chrome - Linux - 32 Bit
  3. Safari - Windows - 32 Bit
  4. Brave - OSX - 32 Bit
  5. Opera - OSX - 64 Bit
  6. SeaMonkey - Linux - 64-Bit

The idea is to pick one of each of the set with the most members and then pick from the remaining ones at the index of the current pick from the big set modulo the smaller set's size. This is the "weak" form of the Pigeonhole Principle in action, which is why it is solved easily with the Chinese remainder theorem.

Sometimes you can oversimplify

You may have noticed that perhaps we are going too far with our constraints here. This brings in danger, as the "strong" general form of the pigeonhole principle means we are treading into the waters of Ramsey's (clique) problem. For example, if we drop either of these two assumptions we can derive from our sets:

  1. No element of any given set is repeated
  2. No element of any given set is shared with another

We immediately descend into the realm of the NP hard problem. This is because we are no longer a principal ideal domain and can no longer cheat using the Chinese remainder theorem. In this reality, we are solving the Anti-Clique problem specifically, which is particularly nasty. Thankfully, we can consider those two constraints to be quite realistic.

We will have to account for the fact that the variables are actually not independent. You may have noticed that some of these "optimal" configurations are not actually realistic. Many Operating systems do not support various processor architectures and software packages. Three of the configurations above are currently invalid for at least one reason.  Consider a configuration object like so:

my $conf = {
    PlatformGroups => {
        'Operating Systems' => [qw{CentOS Ubuntu Windows OSX}],
        'CPU Archetechure'  => [qw{32-bit 64-bit}],
        'Browser'           => [qw{Firefox Opera Safari Chrome Iexplore Brave Dillo lynx}],
        'Mail Server'       => [qw{exim courier postfix qmail exchange}],
        'HTTP Server'       => [qw{ngnix apache lighttpd thttpd}],
        'Database Server'   => [qw{postgres mysql mariadb mssql oracle}],
        'Message Queue'     => [qw{rabbitmq zmq}],
        'Search Engine'     => [qw{solr lunr elasticsearch}],
    },
    incompatibilities => {
        'Windows' => [qw{32-bit Safari Dillo qmail exim courier postfix thttpd solr}],
        'OSX'     => [qw{32-bit Iexplore}],
        'CentOS'  => [qw{Iexplore}],
        'Ubuntu'  => [qw{Iexplore}],
    },
};
Thanks to the requirement that all configurations be unique, we can use a simplified data structure here rather than over-complicating the PlatformGroup data structure (and our processor code).

Can we throw away these configurations without simply "re-rolling" the dice?  Unfortunately, no.  Not without using the god algorithm of computing every possible combination ahead of time, and therefore already knowing the answer.  As such our final implementation looks like so:

sub cliques($conf,$tests) {
    my %pgroups = ref $conf->{PlatformGroups} eq 'HASH' ? %{$conf->{PlatformGroups}} : ();
    my @plans;

    # Randomize the ordering of the platform groups for eventual consistency.
    foreach my $pg (keys(%pgroups)) {
        @{$pgroups{$pg}} = shuffle(@{$pgroups{$pg}});
    }

    # The idea here is to have at least one pigeon in each hole.
    # This is accomplished by finding the longest list of groups, and then iterating over everything we have modulo their size.
    my $longest = (sort { scalar(@{$pgroups{$b}}) <=> scalar(@{$pgroups{$a}}) } keys(%pgroups))[0];
    my $llen = scalar(@{$pgroups{$longest}});
    my $tot = scalar(@$tests);

    # Bin covering
    my $remainder = ( $tot % $llen );
    my $to_take = int$tot / $llen);
    my $offset = 0;

    for (my $i=0; $i < $llen$i++) {
        my @newplats;
        foreach my $pgroup ( sort { scalar(@{$pgroups{$b}}) <=> scalar(@{$pgroups{$a}}) } keys(%pgroups)) {
            my $idx = $i % scalar(@{$pgroups{$pgroup}});
            my $orig_idx = $idx;

            # If a partial is invalid, we must re-roll the dice.
            while (!combination_valid($conf@newplats, ,$pgroups{$pgroup}[$idx])) {
                $idx = ($idx + 1) % scalar(@{$pgroups{$pgroup}});
                # Allow for 'incomplete' sets omitting a configuration group entirely due to total incompatibility
                last if $idx == $orig_idx;
            }
            push(@newplats,$pgroups{$pgroup}[$idx]);
        }
push(@plans, \@newplats);
    }
    return \@plans;
}

sub combination_valid ($conf,@combo) {
    my %compat = %{$conf->{incompatibilities}};
    foreach my $key (keys(%compat)) {
        next unless ref $compat{$keyeq 'ARRAY';
        my @compat = grep { my $element = $_defined $element && ( $element eq $key || grep { $element eq $_ } @{$compat{$key}} ) } @combo;
        return 0 if @compat > 1;
    }
    return 1;
}

This brings us to another unmentioned constraint: what happens if a member of a set is incompatible with all members of another set?  It turns out accepting this is actually a significant optimization, as we will end up never having to re-roll an entire sequence.  See the while loop above.

Another complication is the fact that we will have to randomize the set order to achieve the goal of eventual coverage of every possible combination. Given the intention of the tool is to run decentralized and without a central oracle other than git, we'll have to also have use a seed based upon it's current state.  The algorithm above does not implement this, but it should be straightforward to add.

Filling the bins

We at least have a solution to the problem of building the bins. So, we can move on to filling them. Here we will encounter trade-offs which are quite severe. If we wish to accurately reflect reality with our assumptions, we immediately stray into "no closed form solution" territory. This is the Fair Item Allocation problem, but with a significant twist.  To take advantage of our available resources better, we should always execute at least one test. This will result in fewer iterations to run through every possible combination of systems to test, but also means we've cheated by adding a "double spend" on the low-end.  Hooray cheating!

The fastest approximation is essentially to dole out a number of tests equal to the floor of dividing the tests equally among the bins plus floor(  (tests % bins)  / tests ) in the case you have less tests than bins. This has an error which is not significant until you reach millions of tests. We then get eaten alive by rounding error due to flooring.

We could simply add the remainder and give up on fair allocation.  But given the remainder will always be lower than the number of bins, we can just shave one off of it each go-through until we run out (while still retaining the minimum bound of 1).  This is is the optimal solution:

my $choose = int( $total_tests / $bins );
my $remainder = $total_tests % bins;
...
# later in our loop
my
 $take = $choose + ( $remainder && 1 ) || 1;
$remainder-- if $remainder;

From there we simply splice out the relevant elements from the array of tests.  The completed algorithm has some minor differences from cliques() above:

sub cliques($conf,$tests) {
    my %pgroups = ref $conf->{PlatformGroups} eq 'HASH' ? %{$conf->{PlatformGroups}} : ();
    my @plans;

    # Randomize the ordering of the platform groups for eventual consistency.
    foreach my $pg (keys(%pgroups)) {
        @{$pgroups{$pg}} = shuffle(@{$pgroups{$pg}});
    }

    # The idea here is to have at least one pigeon in each hole.
    # This is accomplished by finding the longest list of groups, and then iterating over everything we have modulo their size.
    my $longest = (sort { scalar(@{$pgroups{$b}}) <=> scalar(@{$pgroups{$a}}) } keys(%pgroups))[0];
    my $llen = scalar(@{$pgroups{$longest}});
    my $tot = scalar(@$tests);

    # Bin covering
    my $remainder = ( $tot % $llen );
    my $to_take = int$tot / $llen);
    my $offset = 0;

    for (my $i=0; $i < $llen$i++) {
        my @newplats;
        foreach my $pgroup ( sort { scalar(@{$pgroups{$b}}) <=> scalar(@{$pgroups{$a}}) } keys(%pgroups)) {
            my $idx = $i % scalar(@{$pgroups{$pgroup}});
            my $orig_idx = $idx;

            # If a partial is invalid, we must re-roll the dice.
            while (!combination_valid($conf@newplats, ,$pgroups{$pgroup}[$idx])) {
                $idx = ($idx + 1) % scalar(@{$pgroups{$pgroup}});
                # Allow for 'incomplete' sets omitting a configuration group entirely due to total incompatibility
                last if $idx == $orig_idx;
            }
            push(@newplats,$pgroups{$pgroup}[$idx]);
        }

        my $tt = $to_take + ( $remainder && 1 ) || 1;
        push(@plans,{ tests => [splice(@$tests, $offset$tt)], platforms => \@newplats });
        $remainder-- if $remainder;
        $offset += $tt;

        # Just repeat tests in the event we have more SUTs available than tests
        $offset = $offset % $tot;
    }
    return \@plans;
}

It is worth noting there is yet another minor optimization in our production process here at the end, namely that if we have more systems available for tests than tests to execute, we can achieve total coverage in less iterations by repeating tests from earlier groups.

Trade-offs in my trade-offs

Even this makes some significant assumptions:
  1. Each item we are packing into a bin is of equal size. This means every test is assumed to run in the same amount of time on the same computer.
  2. Each item is indivisible
  3. Each bin values each item equally (in our context this means "every computer executes it in the same amount of time")
  4. Each test will never change in how long it takes to execute when it changes, or the system under test does.
  5. Each bin represents one computer only.

Obviously the only realistic assumption here is #2. If tests can be executed faster by breaking them into smaller tests, the test authors should do so, not an argument builder.

Assumptions #1 and #3, if we take them seriously would not only doom us to solving an NP hard problem, but have a host of other practical issues. Knowing how long each test takes on each computer is quite a large sampling problem, though solvable eventually even using only git tags to store this data. Even then, #4 makes this an exercise in futility. We really have no choice but to accept this source of inefficiency in our production process.

Invalidating #5 does not bring us too much trouble. Since we expect to have a number of test hosts which will satisfy any given configuration from the optimal group and will know how many there are ahead of time, we can simply split the bin over the available hosts and re-run our bin packer over those hosts.

This will inevitably result in a situation where you have an overabundance of available systems under test for some configurations and a shortage of others. Given enough tests, this can result in workflow disruptions. This is a hard problem to solve without "throwing money at the problem", or being more judicious with what configurations you support in the first place. That is the sort of problem an organization wants to have though. It is preferable to the problem of wasting money testing everything on every configuration.

Whither N-wise

Since the name of the tool is pairwise, I may as well also implement and discuss multi-set combinations.  Building these bins is actually quite straightforward, which is somewhat shocking given every algorithm featured for doing pairwise testing at pairwise.org was not in fact the optimal one from my 30 year old combinatorics textbook.  Pretty much all of them used tail-call recursion in languages which do not optimize this, or they took (good) shortcuts which prevented them from functioning in N dimensions.

Essentially you build an iterator which, starting with the first set, pushes a partial combination with every element of its set matched with one of the second onto your stack.
You then repeat the process, considering the first set to be the partial, and crank right through all the remaining sets.

Dealing with incompatibilities is essentially the same procedure as above.  The completed algorithm looks like so:

sub combine($conf,$tests) {
    my %pgroups = ref $conf->{PlatformGroups} eq 'HASH' ? %{$conf->{PlatformGroups}} : ();
    my @plans;

    #construct iterator
    my @pigeonholes = values(%pgroups);
    my $bins = product map { scalar(@$_) } @pigeonholes;
    my $tot_tests = scalar(@$tests);

    # Bin covering
    my $remainder = $tot_tests % $bins;
    my $to_take = int$tot_tests / $bins);

    my $offset = 0;

    my @iterator = @{$pigeonholes[0]};
    while (scalar(@iterator) ) {
        my $subj = shift @iterator;

        #Handle initial elements
        $subj = [$subjif ref $subj ne 'ARRAY';

        #Break out of the loop if we have no more possibilities to exploit
        if (scalar(@$subj) == scalar(@pigeonholes)) {
            my $tt = $to_take + ( $remainder && 1 ) || 1;
            push(@plans, { tests => [ $offset$tt ], platforms => $subj } );
            $remainder-- if $remainder;
            $offset += $tt;
            # Just repeat tests in the event we have more SUTs than tests
            $offset = $offset % $tot_tests;
            next;
        }

        #Keep pushing partials on to the end of the iterator, until we run out of categories to add
        foreach my $element (@{$pigeonholes[scalar(@$subj)]}) {
            my @partial = @$subj;
            # If the combination isn't valid, return an undef member to simplify loop breakout
            # This results in some configurations which are essentially the same.
            # That said, we cannot simply discard them if we wish to cover the case a configuration having incompatibilities with entire configuration groups.
            # We could compress them later to avoid some slop, but it's probably not worth the effort.
            push(@partial, combination_valid($conf,@partial) ? $element : undef );
            push(@iterator,\@partial);
        }
    }
    return \@plans;
}

Uniting all under Heaven

You may have noticed this is a greedy algorithm.  If we decided to use this as a way to generate a cache for a "god algorithm" version of the anti-clique generator above, we could very easily run into memory exhaustion with large enough configuration sets, defeating the purpose. You could flush the partials that are actually complete, but even then you'd only be down to 1/n theoretical memory usage where n is the size of your 2nd largest configuration set (supposing you sort such that it's encountered last).  This may prove "good enough" in practice, especially since users tend to tolerate delays in the "node added to network" phase better than the "trying to run tests" phase.  It would also speed up the matching of available systems under test to the desired configuration supersets, as we could also "already know the answer".

Profiling this showed that I either had to fix my algorithm or resort to this.  My "worst case" example of 100 million tests using the cliques() method took 3s, while generating everything took 4.  Profiling shows the inefficient parts are almost 100% my bin-covering.

Almost all of this time is spent splice()ing huge arrays of tests.  In fact, the vast majority of the time in my test (20s total!) is simply building the sequence (1..100_000_000), which we are using as a substitute for a similar length argument array of tests.

We are in luck, as once again we have an optimization suggested by the constraints of our execution environment.  Given any host only needs to know what it needs to execute we can save only the relevant indices, and do lazy evaluation.  This means our sequence expansion (which takes the most time) has an upper bound of how long it takes to generate up to our offset.  The change is straightforward:

push(@plans,{ tests => [ $offset$tt ], platforms => \@newplats });

The question is, can we cheat even more by starting at our offset too?  Given we are expecting a glob or regex describing a number of files which we don't know ahead of time what will be produced, this seems unlikely.  We could probably speed it up globbing with GLOB_NOSORT. Practically every other sieve trick we can try (see DeMorgan's Laws) is already part of the C library implementing glob itself.  I suspect that we will have to understand the parity problem a great deal better for optimal seeking via search criteria.

Nevertheless, this gets our execution time for the cliques() algorithm down to 10ms, and 3s as the upper bound to generate our sequence isn't bad compared to how long it will take to execute our subset of 100 million tests.  We'd probably slow the program down using a cached solution at this point, not to mention having to deal with the problems inherent with such.  Generating all combinations as we'd have to do to build the cache itself takes another 3s, and there's no reason to punish most users just to handle truly extreme data sets.

It is possible we could optimize our check that a combination is valid, and get a more reasonable execution time for combine() as well.  Here's our routine as a refresher:

sub combination_valid ($conf,@combo) {
    my %compat = %{$conf->{incompatibilities}};
    foreach my $key (keys(%compat)) {
        next unless ref $compat{$keyeq 'ARRAY';
        my @compat = grep { my $element = $_defined $element && ( $element eq $key || grep { $element eq $_ } @{$compat{$key}} ) } @combo;
        return 0 if @compat > 1;
    }
    return 1;
}

Making the inner grep a List::Util::first instead seems obvious, but the added overhead made it not worth it for the small data set. Removing our guard on the other hand halved execution time, so I have removed it in production.  Who knew ref( ) was so slow?  Next, I "disengaged safety protocols" by turning off warnings and killing the defined check.  This made no appreciable difference, so I still haven't yet run into a situation where I've needed to turn off warnings in a tight loop.  Removing the unnecessary allocation of @compat and returning directly shaved another 200ms.  All told, I got down to 800ms, which is in "detectable but barely" delay territory, which is good enough in my book.

Conclusion

The thing I take away from all this is that the most useful thing a mathematics education teaches is the ability to identify specific problems as instances of generalized problems (to which a great deal of thinking has already been devoted).  While this is not a new lesson, I continuously astonish myself how unreasonably effective it is.  That, and exposure to the wide variety of pursuits in mathematics gives a leg up as to where to start looking.

I also think the model I took developing this has real strength.  Developing a program while simultaneously doing what amounts to a term paper on how it's to operate very clearly draws out the constraints and acceptance criteria from a program in an apriori way.  It also makes documentation a fait accompli.  Making sure to test and profile while doing this as well completed the (as best as is possible without users) methodologically dual design, giving me the utmost confidence that this program will be fit for purpose.  Given most "technical debt" is caused by not fully understanding the problem when going into writing your program (which is so common it might shock the uninitiated) and making sub-optimal trade-offs when designing it, I think this approach mitigates most risks in that regard.

That said, it's a lot harder to think things through and then test your hypotheses than just charging in like a bull in a china shop or groping in the dark.  This is the most common pattern I see in practice doing software development professionally.  To be fair, it's not like people are actually willing to pay for what it takes to achieve real quality, and "good enough" often is.  Bounded rationality is the rule of the day, and our lot in life is mostly that of a satisficer.  Optimal can be the enemy of good, and the tradeoffs we've made here certainly prove this out.

When I was doing QA for a living people are surprised when I tell them the most important book for testers to read is Administrative Behavior. This is because you have to understand the constraints of your environment do do your job well, which is to provide actionable information to decision-makers.  I'm beginning to realize this actually suffuses the entire development process from top to bottom.


April Houstonpm: pairwise πŸ”— 1618336523  

Here's a re-record of the non-technical aspects of my presentation made to Houston.pm in April 2021.

It should go without saying πŸ”—
1618254638  

Basically nothing about the response on social media to my prior post has shocked me.

The very first response was "this is a strawman". Duh. It should go without saying that everyone's perception of others can't be 100% accurate. I definitely get why some people put "Don't eat paint" warnings on their content, because apparently that's the default level of discourse online.

Much of the rest of the criticism is to confuse "don't be so nice" with "be a jerk". There are plenty of ways to politely insist on getting your needs met in life. Much of the frustrations Sawyer is experiencing with his interactions are to some degree self-inflicted. This is because he responds to far too much, unwittingly training irritating people to irritate him more.

This is the most common failure mode of "look how hard I tried". The harder you "try" to respond to everything, the worse it gets. Trust me, I learned this the hard way. If you instead ignore the irritating, they eventually "get the message" and slink off. It's a simple question: Would you rather be happy, or right? I need to be happy. I don't need other people to know I'm right.

I'm also not shocked that wading into drama / "red-meat" territory got me more engagement on a post than anything else I've got up here to date. This is just how things work online -- controversy of some kind is necessary. Yet another reason to stop being nice; goring someone's ox is just the kind of sacrifice needed to satiate the search engine gods, apparently.

This is not to say I don't find it distasteful, indeed there is a reason I do not just chase this stuff with reckless abandon. What I want is to have a positive impact on the community at large, and I think I may just have done it (see the image with this post).

Even though I gored a few oxen-feels posting this, it's clearly made a positive impact on at least one person's life. That alone makes it worth it. I still take the scout's vow to do a good turn daily seriously. Keep stacking those bricks, friends.


Games people play on P5P πŸ”—
1618241807  

SawyerX has resigned from the Perl 5 steering council. This is unfortunate for a variety of reasons, the worst of which is that it is essentially an unnecessary self-sabotage which won't achieve Sawyer anything productive.

I met Sawyer in a cafe in Riga during the last in-person EU Perl 5/6 con. Thankfully much of the discussion was of a technical nature, but of course the drama of the moment was brought up. Andrew Shitov, a Russian was culturally insensitive to westerners, go figure. He apologized and it blew over, but some people insisted on grinding an axe because they valued being outraged more than getting on with business.

It was pretty clear that Sawyer was siding with the outraged, but still wanted the show to go on. I had a feeling this (perceived) fence-sitting would win him no points, and observed this play out.

This discussion naturally segued into his experience with P5P, where much the same complaints as lead to his resignation were aired. At the time he was a pumpking, and I stated my opinion that he should just lead unrepentantly. I recall saying something to the effect of "What are you afraid of? That people would stop using perl? This is already happening." At the time it appears he was just frustrated enough to actually lead.

This lead to some of the most forward progress perl5 has had in a long time. For better or worse, the proto-PSC decided to move forward. At the time I felt cautiously optimistic because while his frustration was a powerful motivator, I felt that the underlying mental model causing his frustration would eventually torpedo his effort.

This has come to pass. The game he's playing out here unconsciously is called "look how hard I'm trying". It's part of the Nice Guy social toolkit. Essentially the worldview is a colossal covert contract: "If I try hard and don't offend anyone, everyone will love me!"

It's unsurprising that he's like this, as I've seen this almost everywhere in the software industry. I was like this once myself. Corporate is practically packed from bottom to top with "nice guys". This comes into conflict with the big wide world of perl, as many of the skilled perlers interested in the core language are entrepreneurs.

In our world, being nice gets you nowhere. It doesn't help you in corporate either, but corporate goes to great effort to forestall the cognitive dissonance which breaks people out of this mental model. The reason for this is straightforward. Studies have repeatedly shown those with agreeable personalities are paid less.

Anyways, this exposes "nice" people to rationally disagreeable and self-interested people. Fireworks ensue when their covert contract is not only broken, but laughed at. Which brings us to today, where Sawyer's frustration has pushed him into making a big mistake which he thinks (at some level, or he would not have done it) will get him what he wants.

It won't. Nobody cares how hard you worked to make it right. Those around you will "just say things" forever, and play what have you done for me lately on repeat until the end of time. Such is our lot as humans, and the first step in healing is to accept it.

Future people considering hiring Sawyer will not have a positive view of these actions. Rather than seeing the upright and sincere person exhausted by shenanigans that Sawyer sees in himself, they will see a person who cracked under pressure and that therefore can't be trusted for the big jobs.

I hate seeing fellow developers make some of the same mistakes I did earlier in life. Especially if the reason he cracked now has to do with other things going on in his personal life which none of us are or should be privy to. Many men come to the point where it's "Kill the nice guy, before he kills you". Let us hope the situation is not developing into anything that severe, so that he can right his ship and return to doing good work.


Don't end the week with nothing πŸ”—
1617382977  

I'm borrowing the title of a famous post by patio11, because I clearly hate having google juice because it's good and touches on similar points to my former colleague Mark Gardner recently made. (See what I did there, cross site linking! Maybe I don't hate having google juice after all...)

Anyways, he mentioned that despite having a sprint fail, he still learned a lot of good stuff. This happens a lot as a software developer and you need to be aware of this to ensure you maximize your opportunities to take something positive away from everything you work on.

On that note, I had a similar thing happen to me this week with playwright-perl. It turns out I didn't have to write a custom server with express to expose the Playwright API to Perl. The Playwright team have a command line program which talks on stdin/stdout to do these RPC calls for their python and go clients.

The reason I didn't know about it was that it is not documented! The only reason I found out was due to hopping into the Playwright slack and getting some good feedback from one of the Playwright devs.

This might seem like I did a bunch of work for no reason, and now have to do expensive re-tooling. I actually don't have to do anything if I don't want to. My approach seems to work quite well as-is. That said, even when I do replace it (as this will be good from a maintenance POV), the existing code can be re-used to make one of the things I really want. Namely, a selenium server built with playwright.

This would give me all the powerful new features, reliability and simpler setup that traditional Selenium servers don't have. Furthermore, (if it catches on) it means the browser vendors can stop worrying about releasing buggy selenium driver binaries and focus on making sure their devToolsProtocols are top-shelf. (Spoiler alert: This is one of the secret reasons I wrote Selenium::Client.)

This also shouldn't be too much of a hurdle, given I have machine-readable specs for both APIs, which means it's just a matter of building the needed surjections. Famous last words eh? Should make for an interesting Q3 project in any case.


Playwright, Selenium and Perl πŸ”— 1617057517  

Last week Sebastian Riedel did some mojo testing using Playwright, I encourage you to see his work here. It would have been neat if he'd used my playwright module on CPAN (as it was built to solve this specific problem). He did so in a way which is inside-out from my approach.

That's just fine! TIMTOWTDI is the rule in Perl, after all. For me, this underlines one of the big difficulties for even a small OSS developer; If you build it, nobody will come for years if you don't aggressively evangelize it.

On that front, I've made some progress; playwright-perl got a ++ from at least one other PAUSE author and I got my first ever gratuity for writing open source software thanks to said module. This is a pretty stark contrast from the 100% thankless task of Selenium::Remote::Driver, which is a lot more work to maintain.

This is a good point to segue into talking about Sebastian's article. Therein he mentions that some of the tricks Playwright are using might end up being a maintenance landmine down the road. Having both worked at a place which has maintained patches to upstream software for years at a time and maintained a selenium API client for years I can say with confidence this is less of a problem than selenium has.

The primary trouble with selenium over the years has to do with the fact that it is simply not a priority for any of the browser vendors. The vast majority of issues filed on Selenium::Remote::Driver over the years have been like this one: In essence, the browser vendor issues a broken driver for a release and we either can ignore it as transient or have to add a polyfill if it persists across releases. Selenium::Remote::Driver is more polyfill than client at this point (partially due to the new WC3 selenium standard not implementing much of the older JSONWire spec).

Historically, Chrome has been the biggest repeat offender in releasing broken drivers. However post-layoffs, it appears Mozilla is getting in on this game as well. Add people frequently using drivers of versions which are incompatible with their browser and encountering undefined behavior, and you begin to understand why microsoft decided to micromanage the browsers the way they did in Playwright. In practice, you need this level of control to have your testing framework be less buggy than the system you want to test with it.

In the end, the reason selenium sticks to open protocols is because they don't have the resources to devote to proper maintenance. I regard a firm which maintains patchsets as a positive; this signals they are actually willing to devote resources to maintenance. They would not have written and shipped them had they not been willing to; most especially not at a firm like Microsoft which is well aware of the consequences.

Selenium's dark secret

While Sebastian didn't mention these, there are also a number of other drawbacks to selenium other than selenium sticking to open protocols. The most glaring of which is that most of the browser vendors do not support getting non-standard attribute values (such as the aria* family) which are highly relevant. You must resort to simply executing javascript code, which more or less defeats the purpose of 90% of the Selenium API. This is the approach pretty much all the polyfills in Selenium::Remote::Driver take.

Another huge controversy over the last half-decade was the "Element Overlap" check, which was buggy for years (especially when negative margin was involved) and still can't be turned off reliably. By contrast, Playwright's check is easy to turn off and has always worked correctly. It sounds like Microsoft learned the right lesson instead of being insensitive to the will of the vast majority of users.

The "Upgrade" to the WC3 protocol also removed a great deal of functionality, while giving us less new features than were removed from the JSONWire spec. Back then the drivers were even more unreliable than they are now; The primary point of the standards was to try and find a minimum set of functionality that they could reliably maintain, an effort which is a clear failure at this point.

Microsoft's approach of just letting the browser vendors do their thing and adapt to them rather than demanding they adapt to testers is far better. In my career this always works out the same way. Your life as a developer and tester gets a lot better when you take the software you work with largely as a given.

Why did playwright have to be made at all?

All the points above lead one to conclude the only thing you can rely on in selenium is the javascript interpreter. So why not just skip selenium and write tests with something like protractor? This is in fact what a number of organizations have done.

It's not like the WC3 API gives you anything above and beyond what the JS interpreter can give you, so it makes a lot of sense from a practical perspective. Playwright on the other hand gives you easy access to everything enabled by the DevToolsProtocol on every browser with a unified API. Selenium 4.0 offers the ability to talk to the DevToolsProtocol, but without a unified API. This is why I consider Selenium an obsolete protocol which has been leapfrogged entirely by Playwright.

Selenium's Enduring Strengths

This is not to say that Selenium does not have some features which are still not met by the Playwright team. In particular the built-in Selenium Grid which has been massively strengthened in Selenium 4.0. This is enabled by it being a server based approach, rather than just a library for talking to the browser.

Obviously, this is quickly solved with but another layer of abstraction. I did precisely that to accomplish the first Playwright client not made by Microsoft. The server-based approach I took would allow me to replicate Selenium's grid functionality in the future with Playwright... but that's probably not needed in our modern era of coverage reporters and containers. That's why my current project Pairwise is aimed at simplifying this workflow specifically.

The holy grail of acceptance testing

Back in the JSONWire days, Microsoft UI had the genius idea to unify desktop testing under the Selenium API with WinAppDriver. This unfortunately has been abandoned in favor of making VSCode a world-beater. This was clearly the right move for microsoft, as even I have been largely converted from my vim + tmux workflow. I still think this is an amazing idea, and (if nobody beats me to it) I want to make an equivalent for linux (using XTest) and OSX...and windows, but all using the Playwright API instead.

Working with Playwright as a client maintainer

Playwright also made another design decision which guarantees it will be easy to spread and write clients for. It ships with a machine-readable specification, while Selenium has never (and likely will never do so). Since SeleniumHQ's 4.0 JAR made breaking changes, I decided to make a new client Selenium::Client. I liked the approach of dynamically making classes based upon a spec, and did so for the next generation selenium client. However, this required that I parse the specification document, which was a nontrivial task (see Selenium::Specification).

The intention long-term is to replace the guts of Selenium::Remote::Driver with Selenium::Client to reduce maintenance burden; this will take some time given how difficult it will be to untangle due to the module being a big ball of mud.

Closing Thoughts

The rest of Sebastian's article goes over the practical points of embedding your perl application inside Node to test it. Much of these are the same concerns (ensuring the server is up before testing, bringing it down correctly, ensuring deps) which I had with the server. Similarly, build toolchain issues are about the same either way; you'll have to wrangle both cpan and npm one way or another. In the end it comes down to personal preference; do you want to write Playwright in perl or JS?

For guys like Sebastian and I who are as fluent in Javascript as Perl, his approach actually makes a lot of sense and is a lot less work than making a module like Playwright-perl. The path to scaling is also less work than building in a grid-like functionality to Playwright-perl; Kubernetes deployment of a bunch of containers each running some subset of tests and using a coverage reporter isn't exactly rocket science. That said, doing the same with scripts built atop playwright-perl won't exactly be difficult either.

For those of you more comfortable in Perl than JS, I think you'll be well served by playwright-perl. Feel free to give it a shot if this sounds like you. If you like it a lot, feel free to send me a gratuity, become a patron, or log some bugs if you don't like it so much.


Announcing a new OSS tool: pairwise πŸ”— 1616627599  

While there are a number of other tools to do pairwise execution, none of them quite have the qualities needed by modern development organizations. I aim to fix that.

Q2 2021 Retrospective πŸ”— 1616521494  

6 Months in. Thoughts on where I need to keep developing and "Stacking the Bricks" that I should have done more of earlier in my Career.

Software Testing Videos πŸ”— 1615923751  

Videos about Software Testing topics

Async/Await? Real men prefer Promise.all() πŸ”— 1615853053  

I've been writing a bunch of TypeScript lately, and figured out why most of the "Async" modules out there are actually fakin' the funk with coroutines.

Turns out even pedants like programmers aren't immune to meaning drift! I guess I'm an old man now lol.

Article mentioned: Troglodyne Q3 Open Source goals


Q3 Open Source Goals πŸ”—
1615831259  

  1. Release PageNSA page activity watcher.
  2. Build a new tool "pairwise". I'll do a video on this soon.
  3. Release a few of my "test obscure scenario" scripts.
  4. Configure automatic docker image creation and Github actions for tCMS
  5. Finishing the transition of tCMS to "everything is a series" data model (see Issue 130)
  6. Porting Overload::FileCheck to windows - This still has a couple of failing tests (I’ve screwed up something porting over the XS): teodesian/Overload-FileCheck at win32 (github.com)
  7. Adding JSONWire support (and then WinAppDriver support) to Selenium::Client
  8. Re-factor Selenium::Remote::Driver to use Selenium::Client as backend rather than Selenium::Remote::RemoteConnection, CanStartBinary, etc
  9. Writing unit tests for Selenium::Client
I'll publish a retrospective video on Q2 performance and Q3 goals soon.

Selenium::Client released to CPAN πŸ”— 1612566669  

I got a client which works with Selenium v4 and WC3 Selenium! I decided to make a new module rather than deal with some of the design decisions that made maintaining Selenium::Remote::Driver such a pain, and was freed up to bake in some nice features in the bargain.

I also go over the various "gotchas" with the new selenium and where we go from here with the module and Selenium::Remote::Driver.

Big changes coming to Selenium::Remote::Driver πŸ”— 1610589448  

Selenium v4 looks like some good stuff, so it's about time to bring it all to the Perl community since it's going mainstream this February.

tCMS Hacking VII: Mixed Content Warnings πŸ”— 1609455753  

A common problem in websites is the "Mixed Content Warning" on SSL virtualHosts. In the end it becomes yet another "I should (and do) know better" stream, lol

tCMS Hacking VI: How programming usually goes πŸ”—
1609454786  

I tried to fix a bug, but had to fix other things first. This is how most days go when you are programming.

tCMS Deploys using Buildah and Podman πŸ”— 1609442334  

Branching out thanks to our friends over at the Houston Linux User's Group.

tCMS on CentOS 7 with Apache πŸ”— 1609367187  

I learned a few things deploying tCMS via proxy to a CentOS 7 box with Apache. Here's a breakdown.

tCMS Hacking V: Speeding up Docker deployment with overlays πŸ”—
1609292913  

The fundamental motivation for all programmers -- "this is taking to long!"

Speaking of, this stream took way too long because the docu I was looking at was solving a different problem (smaller disk size than less time).

Feed my greedy algos!!!1

tCMS Hacking IV: Practical concerns when doing docker deploys πŸ”— 1609273138  

Try not to stick your hands in the guts of your containers unless you want jungle diseases. Here's a practical example of doing the targeted surgery required to keep sane.

tCMS Hacking III: Filter your REQUEST_URI or you'll die πŸ”— 1609264670  

Yet another from the "I should know better" (and do) files. A little dab of regex will do.

25 most recent posts older than 1609264670
Prev Size:
Jump to:
POTZREBIE
© 2020-2021 Troglodyne LLC