Chaining vs. Nesting

Any UNIX sysadmin who’s been around the block a few times has probably written a line something like this:

netstat -n | grep tcp | awk '{ print $5}' | sort | uniq -c | sort -n

Each command in this line produces output, and the end of the line, since it’s not shunted off to a file anywhere, prints to stdout. Every command (other than the first) accepts input on its stdin and has its output directed into the next command to the right.

If we were to write this in C-like pseudocode, it would look something like this:

print(sort_n(uniq_c(sort(awk(grep(netstat_n(), "tcp"), 5)))));

Or we could use the Lisp convention of including the function name as the first element of an S-expression:

(print (sort_n (uniq_c (sort (awk (grep (netstat_n) "tcp") 5)))))

This is essentially the same functionality as the command line, but this time each function passes its output to the enclosing function. Each function takes input from one of its arguments and optionally modifiers from the rest of its arguments. To the untrained eye it looks almost inside-out. You could reverse the order like so:

(((((((netstat_n) grep "tcp") awk 5) sort) uniq_c) sort_n) print)

Which makes it read from left to right but doesn’t make it any more readable: the problem is that you’ve got the primary parameter, followed by the function name, followed by any secondary parameters. We can change it once more and loop very nearly back to the original command by translating it into a method-chaining style, as is common in Ruby or jQuery:

puts netstat_n.grep("tcp").awk(5).sort.uniq_c.sort_n

This is certainly shorter (although I’ve cheated by omitting optional parentheses as allowed in Ruby) and arguably more readable. There is some evidence that non-programmers basically don’t really grok hierarchies, and I, for one, definitely have an easier time thinking about sequences than about hierarchies.

But making this happen is a bit more subtle than it appears at first glance. netstat_n would produce an array object, which would have a grep method that takes one argument (the expression to search for), and would output an array of the objects that match. awk would be another method on an array that would take an array of strings and return an array of substrings. sort, uniq_c, and sort_n would similarly be methods of the array object.

So at the very least you’re muddying up your array class with a bunch of things that are only peripherally related to arrays. Ruby solves this by making your write the more application-specific functionality as a block that gets passed to, say, the select and collect methods on array. So things are kept relatively in order.

Another to keep in mind is that the Lisp syntax may be easier for a program to manipulate: because the whole program consists of a giant nested list, any program with a nested-list-processing facility is able to manipulate the program with ease (it is likely, however, that one could define a relatively straightforward transformation from one form to the other).

Method chaining’s sweet spot seems to be in areas where functions have one dominant input (plus zero or more modifiers) and an equally obvious single output. Where the method chaining approach breaks down is in areas where two inputs have roughly the same importance. For instance, a conditional:

(if (> price 50) "expensive" "cheap")

You almost have to introduce some kind of special syntax, e.g.:

(price > 50) ? "expensive" : "cheap"

But this is the sort of muddling of statements and expressions that makes the Baby McCarthy cry. You could introduce a new syntax, for instance using a comma to separate two inputs that should be directed to a single consumer:

"expensive","cheap".if(price > 50)

But I’m not sure if that’s unfamiliar or just plain ugly. The way this would work is that if would be a method of a two-element tuple object that would return the first value if its parameter were true or the second if it were false.

It would make more sense to have if be a method on a boolean expression, but I can’t think of a clean way to encode that without having a clean way to feed a function multiple inputs.

So it’s certainly doable, but that last code snippet is a good deal less readable (for both man and machine) than the Lisp version. A language based almost completely on method chaining might make for an interesting academic exercise, but it looks like it won’t buy too much in practice.

Comments (12) left to “Chaining vs. Nesting”

  1. Micah wrote:

    What about concatenative languages, like Forth and Factor?

  2. mvm wrote:

    It may be interesting for you to know that languages exist, such as Haskell/ML, which are based on chains of methods; they use closures to pass multiple inputs into a function.

  3. rflrob wrote:

    In python, you could actually do something like the last format with very little change: (‘cheap’, ‘expensive’)[price > 50] is perfectly valid—if fairly weird—code. The trick, of course, is that you can index the literal tuple (‘cheap’, ‘expensive’), and that the True/False gets converted to 1/0.

  4. Richard wrote:

    Clojure has the ‘thread’ macro.

    Your example:

    (print (sort_n (uniq_c (sort (awk (grep (netstat_n) "tcp") 5)))))
    

    can be written something like

    (-> (netstat_n) (grep "tcp") (awk "5") sort uniq_c sort_n print)
    

    … you’re “threading” the first form through the remaining forms.

  5. masklinn wrote:

    You could introduce a new syntax, for instance using a comma to separate two inputs that should be directed to a single consumer

    this thing is utterly terrible, and Smalltalk solved the problem of expressing conditionals via means of method dispatch some 30 years ago:

    (price > 50) ifTrue:['expensive'] ifFalse:['cheap']

    It would make more sense to have if be a method on a boolean expression, but I can’t think of a clean way to encode that without having a clean way to feed a function multiple inputs.

    you know it’s possible to give a function more than one parameter don’t you?

  6. Matt Haley wrote:

    I thought the idea of implementing Array#if in Ruby would be interesting. Turned out to be ridiculously simple.

    class Array def if(predicate) predicate ? self.first : self.last end end

    http://gist.github.com/281687

  7. Edward Kmett wrote:

    Then there is the Haskell version:

    mapM_ (putStrLn . show) . nub . sort . awk 5 $ filter “tcp” netstat_n

    No pollution of the intervening classes, no syntactic noise from parentheses.

  8. jimmy wrote:

    Smalltalk defined if as methods on the True and False objects.

  9. Mark Lee Smith wrote:

    You should look at Smalltalk or some related language where If is commonly written as like so:

    price > 50 ifTrue: [ "expensive" ] ifFalse: [ "cheap" ].

    This could of course be simplified to the following with suitable semantics:

    (price > 50) then: “expensive” else: “cheap”.

    Which I think reads perfectly well.

  10. Mark Lee Smith wrote:

    @mvm

    You may be interested to know that this isn’t remotely unique to Haskell, or functional programming.

  11. must_submit_this_pic wrote:

    netstat -n | grep tcp | awk ‘{ print $5}’ | sort | uniq -c | sort -n

    in LINQ, assuming some a function netstat exists, this becomes something like

    nestat(resolve=false) .Where(line => line.Contains(“tcp”)) .Select(line => line.Split(” “)[5]) .Distinct() .Select(ip => new { IP=ip, Octets=ip .Split(“.”) .Select(int.Parse) .ToArray() }) .OrderBy(t => t.Octets[3]) .OrderBy(t => t.Octets[2]) .OrderBy(t => t.Octets[1]) .OrderBy(t => t.Octets[0]) .Select(t => t.IP)

    I’m guessing this will look totally unreadable, so I’ll leave a readable version over at reddit where I found this.

    See http://www.reddit.com/r/programming/comments/arsu8/chaining_vs_nesting/.

  12. Okke wrote:

    I think matlab works around explicit branches quite nicely.

    Matlab is used for working with matrixes (MATrix LABoratory), and instead of looping through the whole matrix, one can specify a condition on the elements of the matrix and then perform a calculation only on these elements:

    x = [1 2 3]; x(x =< 2) = 0;

    would produce a new matrix (‘list’) with elements [0 0 3].

    As you see, this syntax does not explicitly use an if at all. A kind of selector is used instead.

Post a Comment

*Required
*Required (Never published)