Chaining vs. Nesting
Any UNIX sysadmin who’s been around the block a few times has probably written a line something like this:
netstat -n | grep tcp | awk '{ print $5}' | sort | uniq -c | sort -n
Each command in this line produces output, and the end of the line, since it’s not shunted off to a file anywhere, prints to stdout. Every command (other than the first) accepts input on its stdin and has its output directed into the next command to the right.
If we were to write this in C-like pseudocode, it would look something like this:
print(sort_n(uniq_c(sort(awk(grep(netstat_n(), "tcp"), 5)))));
Or we could use the Lisp convention of including the function name as the first element of an S-expression:
(print (sort_n (uniq_c (sort (awk (grep (netstat_n) "tcp") 5)))))
This is essentially the same functionality as the command line, but this time each function passes its output to the enclosing function. Each function takes input from one of its arguments and optionally modifiers from the rest of its arguments. To the untrained eye it looks almost inside-out. You could reverse the order like so:
(((((((netstat_n) grep "tcp") awk 5) sort) uniq_c) sort_n) print)
Which makes it read from left to right but doesn’t make it any more readable: the problem is that you’ve got the primary parameter, followed by the function name, followed by any secondary parameters. We can change it once more and loop very nearly back to the original command by translating it into a method-chaining style, as is common in Ruby or jQuery:
puts netstat_n.grep("tcp").awk(5).sort.uniq_c.sort_n
This is certainly shorter (although I’ve cheated by omitting optional parentheses as allowed in Ruby) and arguably more readable. There is some evidence that non-programmers basically don’t really grok hierarchies, and I, for one, definitely have an easier time thinking about sequences than about hierarchies.
But making this happen is a bit more subtle than it appears at first glance. netstat_n would produce an array object, which would have a grep method that takes one argument (the expression to search for), and would output an array of the objects that match. awk would be another method on an array that would take an array of strings and return an array of substrings. sort, uniq_c, and sort_n would similarly be methods of the array object.
So at the very least you’re muddying up your array class with a bunch of things that are only peripherally related to arrays. Ruby solves this by making your write the more application-specific functionality as a block that gets passed to, say, the select and collect methods on array. So things are kept relatively in order.
Another to keep in mind is that the Lisp syntax may be easier for a program to manipulate: because the whole program consists of a giant nested list, any program with a nested-list-processing facility is able to manipulate the program with ease (it is likely, however, that one could define a relatively straightforward transformation from one form to the other).
Method chaining’s sweet spot seems to be in areas where functions have one dominant input (plus zero or more modifiers) and an equally obvious single output. Where the method chaining approach breaks down is in areas where two inputs have roughly the same importance. For instance, a conditional:
(if (> price 50) "expensive" "cheap")
You almost have to introduce some kind of special syntax, e.g.:
(price > 50) ? "expensive" : "cheap"
But this is the sort of muddling of statements and expressions that makes the Baby McCarthy cry. You could introduce a new syntax, for instance using a comma to separate two inputs that should be directed to a single consumer:
"expensive","cheap".if(price > 50)
But I’m not sure if that’s unfamiliar or just plain ugly. The way this would work is that if would be a method of a two-element tuple object that would return the first value if its parameter were true or the second if it were false.
It would make more sense to have if be a method on a boolean expression, but I can’t think of a clean way to encode that without having a clean way to feed a function multiple inputs.
So it’s certainly doable, but that last code snippet is a good deal less readable (for both man and machine) than the Lisp version. A language based almost completely on method chaining might make for an interesting academic exercise, but it looks like it won’t buy too much in practice.
Micah wrote:
What about concatenative languages, like Forth and Factor?
Posted on 2010-01-16 at 12:18 pm | Permalink
mvm wrote:
It may be interesting for you to know that languages exist, such as Haskell/ML, which are based on chains of methods; they use closures to pass multiple inputs into a function.
Posted on 2010-01-16 at 12:38 pm | Permalink
rflrob wrote:
In python, you could actually do something like the last format with very little change: (‘cheap’, ‘expensive’)[price > 50] is perfectly valid—if fairly weird—code. The trick, of course, is that you can index the literal tuple (‘cheap’, ‘expensive’), and that the True/False gets converted to 1/0.
Posted on 2010-01-16 at 4:15 pm | Permalink
Richard wrote:
Clojure has the ‘thread’ macro.
Your example:
can be written something like
… you’re “threading” the first form through the remaining forms.
Posted on 2010-01-17 at 1:33 am | Permalink
masklinn wrote:
this thing is utterly terrible, and Smalltalk solved the problem of expressing conditionals via means of method dispatch some 30 years ago:
(price > 50) ifTrue:['expensive'] ifFalse:['cheap']
you know it’s possible to give a function more than one parameter don’t you?
Posted on 2010-01-20 at 12:31 am | Permalink
Matt Haley wrote:
I thought the idea of implementing Array#if in Ruby would be interesting. Turned out to be ridiculously simple.
class Array def if(predicate) predicate ? self.first : self.last end end
http://gist.github.com/281687
Posted on 2010-01-20 at 12:46 am | Permalink
Edward Kmett wrote:
Then there is the Haskell version:
mapM_ (putStrLn . show) . nub . sort . awk 5 $ filter “tcp” netstat_n
No pollution of the intervening classes, no syntactic noise from parentheses.
Posted on 2010-01-20 at 12:47 am | Permalink
jimmy wrote:
Smalltalk defined if as methods on the True and False objects.
Posted on 2010-01-20 at 12:52 am | Permalink
Mark Lee Smith wrote:
You should look at Smalltalk or some related language where If is commonly written as like so:
price > 50 ifTrue: [ "expensive" ] ifFalse: [ "cheap" ].
This could of course be simplified to the following with suitable semantics:
(price > 50) then: “expensive” else: “cheap”.
Which I think reads perfectly well.
Posted on 2010-01-20 at 4:04 am | Permalink
Mark Lee Smith wrote:
@mvm
You may be interested to know that this isn’t remotely unique to Haskell, or functional programming.
Posted on 2010-01-20 at 4:25 am | Permalink
must_submit_this_pic wrote:
netstat -n | grep tcp | awk ‘{ print $5}’ | sort | uniq -c | sort -n
in LINQ, assuming some a function netstat exists, this becomes something like
nestat(resolve=false) .Where(line => line.Contains(“tcp”)) .Select(line => line.Split(” “)[5]) .Distinct() .Select(ip => new { IP=ip, Octets=ip .Split(“.”) .Select(int.Parse) .ToArray() }) .OrderBy(t => t.Octets[3]) .OrderBy(t => t.Octets[2]) .OrderBy(t => t.Octets[1]) .OrderBy(t => t.Octets[0]) .Select(t => t.IP)
I’m guessing this will look totally unreadable, so I’ll leave a readable version over at reddit where I found this.
See http://www.reddit.com/r/programming/comments/arsu8/chaining_vs_nesting/.
Posted on 2010-01-20 at 11:43 am | Permalink
Okke wrote:
I think matlab works around explicit branches quite nicely.
Matlab is used for working with matrixes (MATrix LABoratory), and instead of looping through the whole matrix, one can specify a condition on the elements of the matrix and then perform a calculation only on these elements:
x = [1 2 3]; x(x =< 2) = 0;
would produce a new matrix (‘list’) with elements [0 0 3].
As you see, this syntax does not explicitly use an if at all. A kind of selector is used instead.
Posted on 2010-01-20 at 2:24 pm | Permalink