Shallowly Clever

I gave a talk entitled “Change is the Only Constant” (video) at the excellent Voxxed Days conference in Zurich last month. It got a little attention on Twitter, particularly this slide, tweeted by Mario Fusco:

The point of the slide is that immutability is, strictly speaking, an all-or-nothing proposition: mutability anywhere in a data structure makes it mutable as a whole. “Schopenhauer’s” “Law” isn’t a law, isn’t about entropy, and seems to have absolutely no connection to Schopenhauer, but none of this harms its propagation as a meme across the Internet. I’m not myself wildly enthusiastic about the slide, but I put it in the talk because it’s a favourite of Stuart Marks, who is the primary maintainer of the Java Collections Framework and an infallible source of ideas and good advice to me. An unexpected payoff, though, was this response from Brian Goetz, who tweeted:

So I’m happy to take credit for the slide, because of course to be called any kind of clever by Brian is a compliment, however backhanded, to be treasured. Now at least the problem of knowing what to have written on my tombstone is definitively solved!

More seriously, Brian’s point is obviously fundamentally correct. It’s always been extremely difficult to be certain that you’ve got anything 100% right in computing. This reminds me of the argument in the formal verification community—or rather, the argument used against the verification community—that if the correctness proof of your program depends on the correctness of a verification system that itself hasn’t actually been verified, your argument is built on (unreinforced) sand. And if you do prove the correctness of your verifier, that proof will depend on the correctness of the layer beneath it—and so on, all the way down to the hardware which, as we know from the disasters of Spectre and Meltdown, is nowadays just too complex to analyse completely. The same argument, that we’re ultimately dependent on the hardware properties, applies to immutability.

And to be fair to the talk, the slide about “Schopenhauer’s Law” was taken out of context: the rest of the talk did lay emphasis on the importance of reducing mutability. It’s worth looking at the value of this in a little more detail. Mario also tweeted this slide, on the problems of mutability:


Before looking at this in detail, we should dispose of a question of terminology that often gets in the way of discussions of immutability: are we discussing individual objects, or object graphs? This is a real problem in talking about collections, because preventing change to the state of a collection object in itself won’t necessarily prevent the ill effects of mutability listed in the slide. That’s because their impact depends on the properties of the entire object graph, and the immutability of the collection object is an entirely separate question from the mutability of the objects that it contains. We might reduce confusion by reserving the term “immutable” for object graphs, abandoning the notion of “shallow” immutability even if that would involve losing the source of jokes like Brian’s. This was the thinking that led to the label “unmodifiable” being given to the collection implementations introduced in Java 9. Here we’ll say that the opposite of “mutable” for an object is “unmodifiable”; for an object graph it is “immutable”.
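
To make the distinction concrete, here is a minimal sketch (the class and variable names are mine, not from the talk): the collection returned by List.of is unmodifiable, but the object graph reachable from it is not immutable, because a contained element can still change state.

    import java.util.List;

    public class ShallowVsDeep {
        public static void main(String[] args) {
            StringBuilder element = new StringBuilder("wine");
            List<StringBuilder> barrel = List.of(element);       // unmodifiable collection (Java 9+)

            // barrel.add(new StringBuilder("sewage"));          // would throw UnsupportedOperationException

            element.replace(0, element.length(), "sewage");      // but the contained element is still mutable
            System.out.println(barrel);                          // prints [sewage]: the graph was never immutable
        }
    }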

Returning to Brian’s objection, are the problems of the second slide really reduced by reducing mutability? Consider an object graph, all but one of whose elements are unmodifiable:

  • This structure as a whole will not be thread-safe, but protecting only the single mutable element is all that is required to make it so.
  • Defensive copying will still be required, but that can again be restricted to the single mutable element (see the sketch after this list).
  • The question of stable lookup in keyed and ordered collections depends on the definitions of equality or comparison: if these relations can be made independent of the state of the mutable element, then decreasing mutability will have paid off.
  • Consistency of program state refers to invariants that hold between object graphs, rather than within them, as object-oriented principles demand. For instance, systems supporting user interfaces must ensure that the state of the underlying system is consistently mirrored in the interface, for example by disabling currently inapplicable UI elements. If the mutable element forms part of such an invariant, then extra care has to be taken whenever it changes, to ensure that the invariant is maintained by correspondingly adjusting its other component(s).
  • The gains in simplicity and clarity are proportional to the number of unmodifiable elements in the graph: each of these has only a single possible state, and is accordingly that much simpler to reason about.
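
As a sketch of the first two points, consider a hypothetical Order class (entirely my own example, not from the talk) in which every field except one is unmodifiable: only that single element needs synchronisation for thread safety and a defensive copy on exposure.

    import java.util.ArrayList;
    import java.util.List;

    public final class Order {
        private final String id;                               // unmodifiable
        private final List<String> lineItems;                  // unmodifiable snapshot
        private final List<String> notes = new ArrayList<>();  // the single mutable element

        public Order(String id, List<String> lineItems) {
            this.id = id;
            this.lineItems = List.copyOf(lineItems);           // one defensive copy, taken at construction
        }

        public String id() { return id; }

        public List<String> lineItems() { return lineItems; }  // safe to share without further copying

        // Only the mutable element needs synchronisation for thread safety...
        public synchronized void addNote(String note) { notes.add(note); }

        // ...and only it needs a defensive copy on the way out.
        public synchronized List<String> notes() { return List.copyOf(notes); }
    }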

This discussion may help to answer a popular question about the Java 9 unmodifiable collections: Why have them? Clearly, reducing mutability does bring gains, even if immutability in its perfect form is an ideal we can never reach.
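
A more mechanical answer can also be sketched in a few lines (the example and its names are mine): the pre-Java 9 “unmodifiable” wrappers are only views, so they continue to reflect changes made through the backing collection, whereas the Java 9 factories and their Java 10 companion List.copyOf produce collections whose membership can never change.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class WhyUnmodifiable {
        public static void main(String[] args) {
            List<String> backing = new ArrayList<>(List.of("a", "b"));

            List<String> view = Collections.unmodifiableList(backing);  // a view over mutable state
            List<String> copy = List.copyOf(backing);                   // Java 10 companion to the Java 9 factories

            backing.add("c");
            System.out.println(view);  // [a, b, c]: the view still reflects the mutation
            System.out.println(copy);  // [a, b]: the copy's membership cannot change
        }
    }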

I would call the result a draw: Goetz 1, Schopenhauer 1. And, as a verdict, how shallow is that?

Java Generics and Collections – 2e!

Yesterday I heard that O’Reilly Media, publishers of Java Generics and Collections, have approved a proposal for a second edition. This news came from Zan McQuade, Programming Languages acquisition editor at O’Reilly, whose strategies for acquiring content include indefatigable patience—she calmly waited out two years of silence from me after I first pitched the idea to her. In fact, I’ve been thinking about it for much longer than that, starting from when I was writing Mastering Lambdas in 2014. Despite the title, that book is mostly about Java streams, which since Java 8 have been complementary to collections for processing bulk data. A half-book on collections seemed to need revision from that point on.

But that’s only half the book, and—it’s always seemed to me—the less important half. After all, the original USP of JG&C was its co-authorship by Phil Wadler, who was one of the originators of the “Generic Java” prototype that eventually led to the introduction of generics in Java 5 in 2004. Nowadays few Java programmers will remember how controversial and sometimes difficult generics seemed at their introduction, and how important it was to have an authoritative explanation of their peculiarities. But those have changed little in nearly two decades, even if Project Valhalla seems likely to alter that considerably at some point in the relatively near future, for some value of “relatively near”. Perhaps in another year we’ll know enough about Valhalla to be able to change the generics half of the book in line with it—or perhaps Zan will have to go back in two years’ time to argue for a third edition!

But meanwhile the Java Collections Framework has continued to evolve—perhaps without huge changes, but with enough to justify a revision. Much of this evolution has been adaptation to Java’s journey in the direction of a more functional style. A prime example is unmodifiable collections; although it’s more than four years since they were introduced in Java 9, many people are only now migrating to Java 11—or, equally likely, to Java 17. If the latter, they will encounter records too, so this seems like a good time for an explanation of how these different functionally-oriented features, as well as streams, can work together.
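
By way of illustration, here is a small sketch of the kind of interplay I have in mind (the Reading record and its data are invented for the example): a record models the values, a stream processes them, and the results are collected into unmodifiable collections throughout.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class FunctionalFeatures {
        record Reading(String sensor, double value) {}          // records: Java 16+

        public static void main(String[] args) {
            List<Reading> readings = List.of(                   // unmodifiable list: Java 9+
                    new Reading("a", 1.5),
                    new Reading("a", 2.5),
                    new Reading("b", 3.0));

            Map<String, List<Double>> bySensor = readings.stream()
                    .collect(Collectors.groupingBy(
                            Reading::sensor,
                            Collectors.mapping(Reading::value,
                                    Collectors.toUnmodifiableList())));  // Java 10+

            System.out.println(bySensor);  // e.g. {a=[1.5, 2.5], b=[3.0]} (map iteration order not guaranteed)
        }
    }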

Another reason for a new edition is the ageing of the Java Collections Framework. (Actually, in personal sympathy with this elderly API, I should probably say “maturing”.) It’s worn very well for an API designed in the last century. (On another personal note, I have to tip my hat here to Joshua Bloch, the designer of the JCF. At a time when I’d given up hope of ever getting the collections material into shape, he very generously provided an extraordinarily detailed, precise—and painful!—technical review, highlighting virtually every one of my many errors, and saving the collections material from disaster.) But JG&C was written at a time when the JCF was still only five years old. Nearly two decades on, we have the opportunity for a much more considered design retrospective and for a comparison with other collections frameworks, like Guava and Eclipse Collections, that have appeared since then.

I’m also looking forward to supplying two other elements absent from the first edition. One of them addresses a growing source of dissatisfaction for me with the collections half of JG&C: its discussion of the relative performance of the different collection implementations. I compared them there solely on the basis of their asymptotic (Big-O) performance, without providing any experimental results. That’s quite embarrassing now, after I’ve given so many conference talks on the difficulty and importance of accurate measurement when discussing performance. And since a (half-)book on collections is one place where such discussions are inescapable, I’m looking forward to providing data to back up the theory—or, more likely, to require its modification to fit with modern machine architectures.
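
The sort of experiment I have in mind would be a JMH benchmark along these lines (a sketch only; the class and the choice of workload are mine): two traversals with identical Big-O cost whose measured behaviour nonetheless diverges on modern hardware because of memory locality.

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class TraversalBenchmark {

        @Param({"1000", "1000000"})
        int size;

        List<Integer> arrayList;
        List<Integer> linkedList;

        @Setup
        public void setUp() {
            arrayList = new ArrayList<>();
            linkedList = new LinkedList<>();
            for (int i = 0; i < size; i++) {
                arrayList.add(i);
                linkedList.add(i);
            }
        }

        // Both traversals are O(n); the interesting difference is cache behaviour.
        @Benchmark
        public long sumArrayList() {
            long sum = 0;
            for (int i : arrayList) sum += i;
            return sum;                        // returning the result stops the measured code being dead-code-eliminated
        }

        @Benchmark
        public long sumLinkedList() {
            long sum = 0;
            for (int i : linkedList) sum += i;
            return sum;
        }
    }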

I’m feeling quite—perhaps foolishly—confident about this revision, very much in contrast to my feelings approaching the first edition. Much of that is due to already having a technical editor on board, the tireless Doctor Deprecator, Stuart Marks. Stuart is ideally placed for this, being the Oracle lead on the Java collections library. He was TE on Mastering Lambdas, so I’ve had the pleasure of working with him before, and he’s already provided a lot of the ideas in support of the book proposal, including some in this blog piece. If you’ve read this far (congratulations!) you’ll see there’s quite a lot of work to do, but with Stuart on the team I’m confident that we’re really going to produce something valuable for the working Java programmer.

Dijkstra and DevOps: Is Programming Doomed?

[Photo: Edsger W. Dijkstra. Credit: The University of Texas at Austin]

I don’t suppose that outside academic circles the name of Edsger Dijkstra is nowadays familiar to many people, but it should be: he was a giant of computing science who could claim, over the four decades of his research and teaching, extraordinary achievements that included the first Algol-60 compiler; the popularisation of structured programming (he coined the term); many important algorithms; and the concepts of mutual exclusion, semaphores, and deadlock, in work which practically invented the field of concurrent programming. In the 1970s and 80s, it seemed he had been everywhere, laying the foundations of almost every area of computing.

Dijkstra was always guided by a fierce determination that computing problems should be both formulated and solved in simple and elegant ways. His famous paper “On the Cruelty of Really Teaching Computing Science” argued that computer programming should be understood as a branch of mathematics. On software engineering, he famously wrote that “it has accepted as its charter ‘How to program if you cannot.’” People generally take this to be a condemnation of software engineering as a discipline, and he may have meant it that way when he wrote it in 1988. But at an earlier time he had used that term for himself, so it seems more likely that he was disparaging the people calling themselves software engineers. (For Dijkstra, disparaging people was actually his normal way of relating to them.)

What does this have to do with anything now? I’ve been thinking about Dijkstra recently in the context of my late adoption (late adoption, my life’s story) of DevOps and cloud technologies. Cloud technology has survived the hype cycle’s Trough of Disillusionment to become fully mainstream, and AWS, currently the leading provider, can be taken as representative. Does Dijkstra’s view of the principles that should underlie software engineering have any relevance to people working with it?

Two aspects of the AWS offering stand out immediately: the sheer number of its services, and the diversity of the abstraction levels they provide to their users. For 2019, Wikipedia lists 165 AWS services, covering areas including computing, storage, networking, database, analytics, application services, deployment, management, mobile, and IoT tools. Even developer tools are catered for: AWS offers Cloud9, a web-based IDE for the most popular languages. The intention is clearly to provide every single tool or environment needed to develop, deploy, and maintain any application, from toy to enterprise scale. Amazon may have the first-mover advantage in this, but Azure and Google Cloud are in pursuit, with other heavyweights like IBM Watson and Oracle Cloud not far behind.

How do these services relate to software engineering as Dijkstra would have liked to think of it? We can’t know what he would have made of the problem of mastering the complexity of modern enterprise systems, which are enormously larger and more complex than anything in his day, but we can guess that he would certainly have scornfully excluded from his definition of software engineering the many services devoted to cost management, governance, and compliance that bedevil an old-fashioned software engineer aiming for AWS certification. But most of the services that constitute a cloud offering are actually aimed at the classical problem that software engineering addresses: managing the complexity of large systems. They do this by providing a giddying variety of ways to compose different operational and computational libraries.

And, in fact, these libraries only continue the tradition of ever-increasing abstraction and power provided to the client programmer — or DevOps engineer, as they will now usually be. I started my working life close to the bare metal, at the end of the assembler epoch, then moved on to coding well-known algorithms and data structures in a high-level language, then saw best-of-breed implementations of those algorithms and data structures incorporated into libraries that I could use, then saw those libraries combined with compilers and runtimes to form a platform on which a client program sits, now many, many layers up from the hardware. So I can’t say that increasing abstraction is anything new.

What is new is the way that cloud services integrate operational concerns like provisioning, scaling, and failover into the same framework as traditional data manipulation. DevOps engineers can script the horizontal scaling policy for their application using the same development tools and in the same language that they use to implement their sorting algorithms. You could imagine that as the developer’s job has ascended in abstraction, so it has also broadened to devour the jobs of the application architect, the data centre designer, the DBA, and in fact almost everyone who ever had any role in the management or use of computers. (Another way to look at this would be that Amazon and its rivals are using automation to deskill these jobs to the point that even developers can do them, or simply consume them as services. Amazon calls this “democratizing advanced technologies”.) However we feel about the casualties of this kind of progress, I think you can reasonably argue that it makes sense to integrate all these functions into a single discipline of, say, systems engineering.

Where next, then? Is the role of programmer doomed? Are Dijkstra’s innovations destined to become essential but forgotten, embedded deep within black-box systems that developers use without understanding – just like modern hardware, in fact? I’m forced towards that conclusion, though I continue to argue that we should value an understanding of the inner workings of our black-box systems: even when we can just about get by without it, having that understanding will improve our daily practice. And, of course, not everyone works on applications headed for the cloud; smaller-scale systems, including front ends of various kinds, will continue to be important. But more than anything, Dijkstra’s real principles, his relentless focus on abstraction and simplicity, remain as relevant as ever, to enterprise DevOps engineers as much as to embedded system programmers.