As my nephews are coming of age, I'm considering taking an apprentice. This has resulted in me thinking more of how I might explain programming best practices to the layman. Today I'd like to focus on performance.
Suppose you had to till, plant and water an arbitrary number of acres. Would you propose ploughing a foot, planting a seed and watering ad nauseum? I suspect not. This is because context switching costs a great deal. Indeed, the context switches involved between planting, seeding and watering will end up being the costliest action when scaling this (highly inefficient) process to many acres.
This is why batching of work is the solution everyone reaches for instinctively. It is from this fact that economic specialization developed. I can only hold so much in my own two hands and can't be in two places at once. It follows that I can produce far more washed dishes or orders being a cook or dish-washer all day than I can switching between the tasks repeatedly.
That said, doing so only makes sense at a particular scale of activity. If your operational scale can't afford specialized people or equipment you will be forced to "wear all the hats" yourself. Naturally this means that operating at a larger scale will be more efficient, as it can avoid those context switching costs.
Unfortunately, the practices adopted at small scale prove difficult to overcome. When these are embodied in programs, they are like concreting in a plumbing mistake (and thus quite costly to remedy). I have found this to be incredibly common in the systems I have worked with. The only way to avoid such problems is to insist your developers not test against trivial data-sets, but worst-case data sets.
When ploughing you can choose a pattern of furroughing that ends up right where you started to minimize the cost of the eventual context switch to seeding or watering. Almost every young man has mowed a lawn and has come to this understanding naturally. Why is it then that I repeatedly see simple performance mistakes which a manual laborer would consider obvious?
For example, consider a file you are parsing to be a field, and lines to be the furroughs. If we need to make multiple passes, it will behoove us to avoid a seek to the beginning, much like we try to arrive close to the point of origin in real life. We would instead iterate in reverse over the lines. Many performance issues are essentially a failure to understand this problem. Which is to say, a cache miss. Where we need to be is not within immediate sequential reach of our working set. Now a costly context switch must be made.
All important software currently in use is precisely because it understood this, and it's competitors did not. The reason preforking webservers and then PSGI/WSGI + reverse proxies took over the world is because of this -- program startup is an important context switch. Indeed, the rise of Event-Driven programming is entirely due to this reality. It encourages the programmer to keep as much as possible in the working set, where we can get acceptable performance. Unfortunately, this is also behind the extreme bloat in working sets of programs, as proper cache loading and eviction is a hard problem.
If we wish to avoid bloat and context switches, both our data and the implements we wish to apply to it must be sequentially available to each other. Computers are in fact built to exploit this; "Deep pipelining" is essentially this concept. Unfortunately, a common abstraction which has made programming understandable to many hinders this.
Object-Orientation encourages programmers to hang a bag on the side of their data as a means of managing the complexity involved with "what should transform this" and "what state do we need to keep track of doing so". The trouble with this is that it encourages one-dimensional thinking. My plow object is calling the aerateSoil() method of the land object, which is instantiated per square foot, which calls back to the seedFurroughedSoil() method... You might laugh at this example (given the problem is so obvious with it), but nearly every "DataTable" component has this problem to some degree. Much of the slowness of the modern web is indeed tied up in this simple failure to realize they are context switching far too often.
This is not to say that object orientation is bad, but that one-dimensional thinking (as is common with those of lesser mental faculties) is bad for performance. Sometimes one-dimensional thinking is great -- every project is filled with one-dimensional problems which do not require creative thinkers to solve. We will need dishes washed until the end of time. That said, letting the dish washers design the business is probably not the smartest of moves. I wouldn't have trusted myself to design and run a restaurant back when I washed dishes for a living.
You have to consider multiple dimensions. In 2D, your data will need to be consumed in large batches. In practice, this means memoization and tight loops rather than function composition or method chaining. Problems scale beyond this -- into the third and fourth dimension, and the techniques used there are even more interesting. Almost every problem in 3 dimensions can be seen as a matrix translation, and in 4 dimensions as a series of relative shape rotations (rather than as quaternion matrix translation).
Thankfully, this discussion of viewing things from multiple dimensions hits upon the practical approach to fixing performance problems. Running many iterations of a program with a large dataset under a profiling framework (hopefully producing flame-graphs) is the change of perspective most developers need. Considering the call stack forces you into the 2-dimensional mindset you need to be in (data over time).
This should make sense intuitively, as the example of the ploughman. He calls furrough(), seed() and water() upon the dataset consisting of many hectares of soil. Which is taking the majority of time should be made immediately obvious simply by observing how long it takes per foot of soil acted upon per call, and context switch costs.