trip logs / gnuvola


Trip Log 2022-01-01 h04 -- Indices Style Upgrade Part 9

Down to the final two patches in the Indices Style Upgrade series — woo hoo!  In this episode, we start to cover patch 7: “Order tags most-impactful first”.  (Feel free to grab the accompanying tarball and follow along.)  This is a user-visible change that is one of the prime motivations for this patch series, so it deserves thorough investigation. 

Luckily, there are only two hunks to look at.  We present them separately this time, to make reading easier.  The first one, plus the leading change description, follows: 

 4	Subject: [PATCH 7/8] tl-mkindex: Order tags most-impactful first.
[...]
 9	* sub/tl-mkindex (most-impactful-first): New proc.
10	(lexicographically-ordered): Delete proc.
11	(consult-db): Use ‘most-impactful-first’.
[...]
20	@@ -217,8 +217,23 @@ (define (sort/car bef ls)
21	 (define (youngest-first ls)
22	   (sort/car string>? ls))
23
24	-(define (lexicographically-ordered ls)
25	-  (sort/car string<? ls))
26	+(define (most-impactful-first tag+count+latest)
27	+
28	+  (define (more-impactful a b)
29	+    (let-values (((a-tag a-count a-latest) (tag+count+latest a))
30	+                 ((b-tag b-count b-latest) (tag+count+latest b)))
31	+      (if (not (= a-count b-count))
32	+          ;; more refs
33	+          (> a-count b-count)
34	+          (if (not (string=? a-latest b-latest))
35	+              ;; more recent
36	+              (string>? a-latest b-latest)
37	+              ;; lexicographically less (for hysterical raisins)
38	+              (string<? a-tag b-tag)))))
39	+
40	+  ;; rv
41	+  (lambda (ls)
42	+    (sort ls more-impactful)))
43
44	 (define (consult-db upath-ignored where-clause)

On the surface, this checks out rather simply: Line 9 describes lines 26-42, while line 10 describes lines 24-25.  Another fall guy situation, perhaps?  Let's dig deeper.  We see that the change maintains the “sort” core (line 25, line 42), which is to be expected, as we are still trying to achieve a “proper” ordering in the presentation; only the specifics of what is “proper” have changed.  Also, what is being sorted: a ‘ls’ (list of items, line 24, line 41) hasn't changed. 

“But, ttn, what about the mysterious comment (line 40) and what is this ‘lambda’ thing?  These are very different from what we've seen previously!” 

All true!  These two particular aspects, the comment and the ‘lambda’ are actually related.  It turns out ‘most-impactful-first’ does NOT take a ‘ls’ and do the sorting itself; instead, it constructs an anonymous function (the ‘lambda’ form), and “returns” that (i.e., the ‘lambda’ form is the “return value” (rv) of the procedure ‘most-impactful-first’).  That ‘lambda’ is what actually does the sorting, and thus, it is up to the caller (of ‘most-impactful-first’) to invoke the ‘lambda’ (or call it) in some way.  That's a lot to digest if this is the first time you've seen this kind of code, so check out the Wikipedia page on higher-order functions (mentioned before) for more info. 

This is usually the place where Scheme fans start raving about what great fun it is to hack in the language, but we'll spare you the noise (this time! :-D). 

While all that high-falutin' stuff percolates, let's turn to the guts of the sorting procedure (lines 31-38).  There are basically two kinds of sorting procedures, those that take a predicate (“Should item A come before item B?”) and those that take an ordering function (“What is the relationship between item A and item B — before, same, or after?”).  The Guile Scheme builtin procedure ‘sort’ (on which we build all these other procedures) is the first kind — its second argument is often called “lessp” (for “less predicate”, with “less” being a numeric way of saying “before”, as in 0 is before 1). 

31	+      (if (not (= a-count b-count))
32	+          ;; more refs
33	+          (> a-count b-count)
34	+          (if (not (string=? a-latest b-latest))
35	+              ;; more recent
36	+              (string>? a-latest b-latest)
37	+              ;; lexicographically less (for hysterical raisins)
38	+              (string<? a-tag b-tag)))

This predicate works as follows:  First, it compares the “count” aspects of item A and B (line 31).  If they are different (line 31, still, via ‘not’), then the one with “more refs” (lines 32-33) comes before and we are done with the comparison.  Otherwise (“count” is same, line 34 onward), move onto other aspects.  Second, it compares the “latest” aspect of items A and B (line 34).  If they are different (line 34, still, via ‘not’), then the one “more recent” comes before and we are done with the comparison.  Otherwise (“latest” is same, line 37 onward), move onto the last aspect, “tag”.  Whichever item with a “lexicographically less” tag comes before. 

As you can see, sometimes concise comments really can help explain the underlying algorithm. 

This core predicate is comforting in that it follows our basic intuition on how to compare things with (perhaps) varying aspects.  It's basically, try the first things first, and if that doesn't reveal a difference, keep going until you have something you can differentiate.  In the end, if the items are indeed “the same”, then it really doesn't matter which comes before, right? 

OK, that's enough for this episode.  We'll finish looking at patch 7 in the next trip log. 


Copyright (C) 2022 Thien-Thi Nguyen