When preparing any tool for which you see all the pieces readily available, but which nobody has executed upon, you begin to ask yourself why that is. This is essentially what I've been going through building the pairwise tool.
Every time I look around and don't see a solution for an old problem on CPAN, my spider-senses start to fire. I saw no N-dimensional combination methods (only n choose k) or bin covering algorithms, and when you see a lack of N-dimensional solutions, that usually means there is a lack of closed-form general solutions to the problem. While this is not true for my problem space, it rubs right up against the edge of NP-hard problems. So it's not exactly shocking I didn't see anything fit to purpose.
    
The idea behind pairwise test execution is actually quite simple, but the constraints of the software systems surrounding it risk making it more complex than is manageable. This is because unless we confine ourselves to a very specific set of constraints, we run into not one, but two NP-hard problems. We could then be forced into the unfortunate situation of having to use polynomial-time approximations.
    
I've run into this a few times in my career. Each time the team grows disheartened, as what the customer wants seems on the surface to be impossible. I always remember that there is a way to win by cheating (tighter constraints). Even the tyranny of the rocket equation was overcome through these means (let's put a little rocket on a big one!)
    
The first problem is that N-Wise test choosing is simply a combination.
      This results in far, far more platforms to test than is practical
      once you get beyond 3 independent variables relevant to your
      system under test. For example:
    
A combination of 3 sets containing 3, 5 and 8 members will result in 3 * 5 * 8 = 120 systems under test! Adding in a fourth or fifth set will quickly bring you into the territory of thousands of systems to test. While this is straightforward to accomplish these days, it is quite expensive.
    
What we actually want is an expression of the pigeonhole
        principle.  We wish to build sets where every
        element of each component set is seen at least once, as
      this will cover everything with the minimum number of needed
      systems under test.  This preserves the practical purpose of
      pairwise testing quite nicely.
      
In summary, we have a clique problem and a bin covering problem. This means that we have to build a number of bins from X number of sets each containing some amount of members. We then have to fill said bins with a bunch of tests in a way which will result in them being executed as fast as is possible.
Each bin we build will represent some system under test, and each set from which we build these bins a particular important attribute. For example, consider these sets:
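Suppose, purely as a hypothetical illustration, the important attributes are operating system, database, and architecture (the member names here are placeholders):

    my %sets = (
        os   => [qw{osA osB osC osD}],
        db   => [qw{db1 db2 db3}],
        arch => [qw{arch32 arch64}],
    );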
A random selection will result in an optimal multi-dimensional "pairwise" set of systems under test:
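With the hypothetical sets above, that comes out to four configurations rather than the full 4 * 3 * 2 = 24, and every member of every set appears at least once:

    ( osA, db1, arch32 )
    ( osB, db2, arch64 )
    ( osC, db3, arch32 )
    ( osD, db1, arch64 )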
The idea is to pick each member of the set with the most members in turn, and then pick from the remaining sets at the index of the current pick from the big set modulo the smaller set's size. This is the "weak" form of the pigeonhole principle in action, which is why it is solved easily with the Chinese remainder theorem.
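A minimal sketch of that picking strategy, using the hypothetical sets above (and not the production code), might look like this:

    use strict;
    use warnings;

    my %sets = (
        os   => [qw{osA osB osC osD}],
        db   => [qw{db1 db2 db3}],
        arch => [qw{arch32 arch64}],
    );

    # Sort attribute names so the set with the most members comes first.
    my @names   = sort { scalar @{ $sets{$b} } <=> scalar @{ $sets{$a} } } keys %sets;
    my $biggest = shift @names;

    # One configuration per member of the largest set; everything else is
    # picked at the current index modulo the smaller set's size.
    my @configs;
    for my $i ( 0 .. $#{ $sets{$biggest} } ) {
        my %cfg = ( $biggest => $sets{$biggest}[$i] );
        $cfg{$_} = $sets{$_}[ $i % scalar @{ $sets{$_} } ] for @names;
        push @configs, \%cfg;
    }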
    
You may have noticed that perhaps we are going too far with our constraints here. This brings in danger, as the "strong" general form of the pigeonhole principle means we are treading into the waters of Ramsey's (clique) problem. For example, if we drop either of the two assumptions we can derive from our sets, we immediately descend into the realm of NP-hard problems. This is because we are no longer working in a principal ideal domain and can no longer cheat using the Chinese remainder theorem. In this reality, we are solving the Anti-Clique problem specifically, which is particularly nasty. Thankfully, we can consider those two constraints to be quite realistic.
    
We will have to account for the fact that the variables are actually not independent. You may have noticed that some of these "optimal" configurations are not actually realistic. Many operating systems do not support various processor architectures and software packages. Three of the configurations above are currently invalid for at least one reason. Consider a configuration object like so:
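The object in the actual tool may well be shaped differently; as a hypothetical sketch consistent with the example sets above, it is the sets plus a list of mutually incompatible members:

    my %config = (
        sets => {
            os   => [qw{osA osB osC osD}],
            db   => [qw{db1 db2 db3}],
            arch => [qw{arch32 arch64}],
        },
        # Purely hypothetical incompatibilities: each rule lists members
        # which cannot appear together in one configuration.
        exclusions => [
            [qw{osB db2}],
            [qw{osC arch32}],
            [qw{osD db1}],
        ],
    );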
    
Can we throw away these configurations without simply
      "re-rolling" the dice?  Unfortunately, no.  Not without
      using the god
        algorithm of computing every possible combination ahead of
      time, and therefore already knowing the answer.  As such our
      final implementation looks like so:
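What follows is only a sketch of that approach, with a hypothetical _is_compatible() helper of my own devising rather than the tool's exact code: pick modulo as before, but when a pick trips an exclusion, re-roll just that member, and if every member of a set conflicts, accept the incompatibility and skip that configuration.

    use strict;
    use warnings;

    # Hypothetical helper: true when the candidate picks violate none of the
    # exclusion rules in the %config structure shown above.
    sub _is_compatible {
        my ( $config, @picks ) = @_;
        my $exclusions = $config->{exclusions};
        return 1 unless ref $exclusions;
        my @compat = grep {
            my $rule = $_;
            # A rule is violated only when every one of its members appears
            # in the candidate picks; the inner grep counts the matches.
            scalar( grep { my $m = $_; defined $m && grep { $_ eq $m } @picks } @{$rule} )
                != scalar @{$rule};
        } @{$exclusions};
        return scalar @compat == scalar @{$exclusions};
    }

    sub cliques {
        my ($config) = @_;
        my $sets  = $config->{sets};
        my @names = sort { scalar @{ $sets->{$b} } <=> scalar @{ $sets->{$a} } } keys %$sets;
        my ( $biggest, @rest ) = @names;

        my @configs;
        PICK: for my $i ( 0 .. $#{ $sets->{$biggest} } ) {
            my %cfg = ( $biggest => $sets->{$biggest}[$i] );
            foreach my $name (@rest) {
                my $members = $sets->{$name};
                my $idx     = $i % scalar @$members;
                my $tries   = 0;
                # Re-roll only this member until it is compatible with the
                # picks made so far.
                while ( !_is_compatible( $config, values %cfg, $members->[$idx] ) ) {
                    $idx = ( $idx + 1 ) % scalar @$members;
                    # Every member of this set conflicts with the picks so
                    # far: accept it and drop this configuration rather than
                    # re-rolling the entire sequence.
                    next PICK if ++$tries >= scalar @$members;
                }
                $cfg{$name} = $members->[$idx];
            }
            push @configs, {%cfg};
        }
        return @configs;
    }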
    
This brings us to another unmentioned constraint: what happens if
      a member of a set is incompatible with all members of another
      set?  It turns out accepting this is actually a significant
        optimization, as we will end up never having to re-roll
        an entire sequence.  See the while loop above.
    
Another complication is the fact that we will have to randomize the set order to achieve the goal of eventual coverage of every possible combination. Given the intention is for the tool to run decentralized and without a central oracle other than git, we'll also have to use a seed based upon its current state. The algorithm above does not implement this, but it should be straightforward to add.
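One hypothetical way to do that, assuming the current HEAD commit is an acceptable stand-in for the repository's state, is to seed a shuffle of the set order from git itself:

    # Hypothetical: derive a deterministic seed from the current git HEAD so
    # that every host shuffles the set order identically for a given commit.
    chomp( my $head = qx{git rev-parse HEAD} );
    srand( hex( substr( $head, 0, 8 ) ) );

    # Simple Fisher-Yates shuffle of the attribute names using that seed.
    my @set_order = keys %{ $config{sets} };
    for my $i ( reverse 1 .. $#set_order ) {
        my $j = int rand( $i + 1 );
        @set_order[ $i, $j ] = @set_order[ $j, $i ];
    }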
    
We at least have a solution to the problem of building the bins. So, we can move on to filling them. Here we will encounter trade-offs which are quite severe. If we wish to accurately reflect reality with our assumptions, we immediately stray into "no closed form solution" territory. This is the Fair Item Allocation problem, but with a significant twist. To take advantage of our available resources better, we should always execute at least one test. This will result in fewer iterations to run through every possible combination of systems to test, but also means we've cheated by adding a "double spend" on the low-end. Hooray cheating!
The fastest approximation is essentially to dole out a number of tests equal to the floor of dividing the tests equally among the bins, plus floor( (tests % bins) / tests ) in the case where you have fewer tests than bins. This has an error which is not significant until you reach millions of tests. We then get eaten alive by rounding error due to flooring.
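In code, that approximation is just integer division plus the correction for the degenerate case (the helper name is hypothetical):

    use POSIX qw{floor};

    # Number of tests each bin (system under test) should receive.
    sub tests_per_bin {
        my ( $tests, $bins ) = @_;
        # floor(tests / bins), plus 1 when there are fewer tests than bins,
        # which is the "always execute at least one test" double spend.
        return floor( $tests / $bins ) + floor( ( $tests % $bins ) / $tests );
    }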
    
It is worth noting there is yet another minor optimization in our production process here at the end, namely that if we have more systems available for tests than tests to execute, we can achieve total coverage in fewer iterations by repeating tests from earlier groups.
    
Obviously the only realistic assumption here is #2. If tests can be executed faster by breaking them into smaller tests, the test authors should do so, not an argument builder.
Assumptions #1 and #3, if we take them seriously, would not only doom us to solving an NP-hard problem, but have a host of other practical issues. Knowing how long each test takes on each computer is quite a large sampling problem, though solvable eventually, even using only git tags to store this data. Even then, #4 makes this an exercise in futility. We really have no choice but to accept this source of inefficiency in our production process.
Invalidating #5 does not bring us too much trouble. Since we expect to have a number of test hosts which will satisfy any given configuration from the optimal group and will know how many there are ahead of time, we can simply split the bin over the available hosts and re-run our bin packer over those hosts.
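Using the hypothetical tests_per_bin() helper from above, that just means treating the hosts for a given configuration as bins of their own, for example:

    # Hypothetical example: 10,000 tests, 4 optimal configurations, and 3
    # hosts that satisfy one of those configurations.
    my $per_config = tests_per_bin( 10_000, 4 );         # this configuration's share
    my $per_host   = tests_per_bin( $per_config, 3 );    # re-run the packer over its hosts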
 This will inevitably result in a situation where you have an
      overabundance of available systems under test for some
      configurations and a shortage of others. Given enough tests, this
      can result in workflow disruptions. This is a hard problem to
      solve without "throwing money at the problem", or being more
      judicious with what configurations you support in the first place.
      That is the sort of problem an organization wants to have though.
      It is preferable to the problem of wasting money testing
      everything on every configuration.
    
Since the name of the tool is pairwise, I may as well also implement and discuss multi-set combinations. Building these bins is actually quite straightforward, which is somewhat shocking given that every algorithm featured for doing pairwise testing at pairwise.org was not in fact the optimal one from my 30-year-old combinatorics textbook. Pretty much all of them used tail-call recursion in languages which do not optimize this, or they took (good) shortcuts which prevented them from functioning in N dimensions.
    
Essentially you build an iterator which, starting with the first set, pushes onto your stack a partial combination for every element of that set matched with one element of the second. You then repeat the process, treating the partials as the first set, and crank right through all the remaining sets.
    
Dealing with incompatibilities is essentially the same procedure
      as above.  The completed algorithm looks like so:
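Again, this is a sketch rather than the verbatim implementation, reusing the hypothetical _is_compatible() helper from earlier:

    # Iteratively fold each set into the accumulated partial combinations,
    # discarding any partial that trips an exclusion rule along the way.
    sub combine {
        my ($config) = @_;
        my @sets = map { $config->{sets}{$_} } sort keys %{ $config->{sets} };

        # Start with each member of the first set as a partial combination.
        my $first    = shift @sets;
        my @partials = map { [$_] } @$first;

        foreach my $set (@sets) {
            my @next;
            foreach my $partial (@partials) {
                foreach my $member (@$set) {
                    my @candidate = ( @$partial, $member );
                    next unless _is_compatible( $config, @candidate );
                    push @next, \@candidate;
                }
            }
            @partials = @next;
        }
        return @partials;
    }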
    
You may have noticed this is a greedy algorithm.  If we
      decided to use this as a way to generate a cache for a "god
      algorithm" version of the anti-clique generator above, we could
      very easily run into memory exhaustion with large enough
      configuration sets, defeating the purpose. You could flush the
      partials that are actually complete, but even then you'd only be
      down to 1/n theoretical memory usage where n is the size of your
      2nd largest configuration set (supposing you sort such that it's
      encountered last).  This may prove "good enough" in practice,
      especially since users tend to tolerate delays in the "node added
      to network" phase better than the "trying to run tests"
      phase.  It would also speed up the matching of available
      systems under test to the desired configuration supersets, as we
      could also "already know the answer".
    
Profiling this showed that I either had to fix my algorithm or
      resort to this.  My "worst case" example of 100 million tests
      using the cliques() method took 3s, while generating everything
      took 4.  Profiling shows the inefficient parts are almost
      100% my bin-covering.
    
Almost all of this time is spent splice()ing huge arrays of
      tests.  In fact, the vast majority of the time in my test
      (20s total!) is simply building the sequence (1..100_000_000),
      which we are using as a substitute for a similar length argument
      array of tests.
    
We are in luck, as once again we have an optimization suggested by the constraints of our execution environment. Given any host only needs to know what it needs to execute, we can save only the relevant indices and do lazy evaluation. This means our sequence expansion (which takes the most time) has an upper bound of how long it takes to generate up to our offset. The change is straightforward:
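The sketch below captures the idea under my own assumptions (a stand-in generator closure and a made-up slice): walk the generator, keep only the indices in this host's slice, and stop at the slice's end rather than materializing all 100 million entries.

    # Stand-in generator for the (1..100_000_000) sequence used in the test;
    # in practice this would hand back the next test file from the glob.
    my $n         = 0;
    my $next_test = sub { return ++$n <= 100_000_000 ? $n : undef };

    # Hypothetical slice for this host: an offset into the test list plus a count.
    my ( $offset, $count ) = ( 250_000, 1_000 );

    my @mine;
    my $idx = 0;
    while ( defined( my $test = $next_test->() ) ) {
        last if $idx >= $offset + $count;        # upper bound: generate only up to our slice
        push @mine, $test if $idx >= $offset;    # keep only the indices we own
        $idx++;
    }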
    
The question is, can we cheat even more by starting at our offset too? Given we are expecting a glob or regex describing a number of files, and we don't know ahead of time what will be produced, this seems unlikely. We could probably speed up the globbing with GLOB_NOSORT. Practically every other sieve trick we can try (see De Morgan's laws) is already part of the C library implementing glob itself. I suspect that we will have to understand the parity problem a great deal better for optimal seeking via search criteria.
    
Nevertheless, this gets our execution time for the cliques()
      algorithm down to 10ms, and 3s as the upper bound to generate our
      sequence isn't bad compared to how long it will take to execute
      our subset of 100 million tests.  We'd probably slow the
      program down using a cached solution at this point, not to mention
      having to deal with the problems inherent with such. 
      Generating all combinations as we'd have to do to build the cache
      itself takes another 3s, and there's no reason to punish most
      users just to handle truly extreme data sets.
    
It is possible we could optimize our check that a combination is
      valid, and get a more reasonable execution time for combine() as
      well.  Here's our routine as a refresher:
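This is the hypothetical helper from the sketches above rather than the verbatim production routine, but the shape under discussion (an outer grep over the exclusion rules with an inner grep over the candidate picks, a ref() guard, a defined check, and a @compat temporary) is the same:

    sub _is_compatible {
        my ( $config, @picks ) = @_;
        my $exclusions = $config->{exclusions};
        return 1 unless ref $exclusions;    # the guard discussed below
        my @compat = grep {
            my $rule = $_;
            # Inner grep: a rule is violated only when every one of its
            # members appears in the candidate picks.
            scalar( grep { my $m = $_; defined $m && grep { $_ eq $m } @picks } @{$rule} )
                != scalar @{$rule};
        } @{$exclusions};
        return scalar @compat == scalar @{$exclusions};
    }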
    
Making the inner grep a List::Util::first instead seems obvious, but the added overhead made it not worth it for the small data set. Removing our guard, on the other hand, halved execution time, so I have removed it in production. Who knew ref() was so slow? Next, I "disengaged safety protocols" by turning off warnings and killing the defined check. This made no appreciable difference, so I still haven't run into a situation where I've needed to turn off warnings in a tight loop. Removing the unnecessary allocation of @compat and returning directly shaved another 200ms. All told, I got down to 800ms, which is "detectable but barely" delay territory, and good enough in my book.
    
The thing I take away from all this is that the most useful thing a mathematics education teaches is the ability to identify specific problems as instances of generalized problems (to which a great deal of thinking has already been devoted). While this is not a new lesson, I am continually astonished at how unreasonably effective it is. That, and exposure to the wide variety of pursuits in mathematics gives a leg up as to where to start looking.
    
I also think the model I took developing this has real strength. Developing a program while simultaneously doing what amounts to a term paper on how it is to operate very clearly draws out the constraints and acceptance criteria from a program in an a priori way. It also makes documentation a fait accompli. Making sure to test and profile while doing this completed the (as best as is possible without users) methodologically dual design, giving me the utmost confidence that this program will be fit for purpose. Given most "technical debt" is caused by not fully understanding the problem before writing your program (which is so common it might shock the uninitiated) and by making sub-optimal trade-offs when designing it, I think this approach mitigates most risks in that regard.
    
That said, it's a lot harder to think things through and then
      test your hypotheses than just charging in like a bull in a china
      shop or groping in the dark.  This is the most common pattern
      I see in practice doing software development professionally. 
      To be fair, it's not like people are actually willing to pay
      for what it takes to achieve real quality, and "good enough" often
      is.  Bounded
        rationality is the rule of the day, and our lot in life is
      mostly that of a satisficer. 
      Optimal can be the enemy of good, and the tradeoffs we've made
      here certainly prove this out.
    
When I was doing QA for a living, people were surprised when I told them that the most important book for testers to read is Administrative Behavior. This is because you have to understand the constraints of your environment to do your job well, which is to provide actionable information to decision-makers. I'm beginning to realize this actually suffuses the entire development process from top to bottom.