Mythbusting - Part 1

(Disclaimer: as always, the issue of reasoning about software quality is largely a matter of personal judgment)

Yes, I really like Adam and Jamie. This post is about myths that you won't see on their TV show.

Actually, this post is motivated by Ian Sutton's post "Software erosion in pictures - Findbugs". That piece shows the package dependency graphs of several releases of Findbugs, produced with Structure101, a static analyzer that reads class files. Its conclusion is that Findbugs is eroding, since the number of cycles (tangles) in the graph, as well as their size, grows over time.

Ian's premise is that tangled packages are indicative of poor code quality.

I think this premise is based on myths that need to be carefully examined. So this is what I'll do here.

Myth #1: Tangled packages cannot be tested separately

The counterargument is this: when the packages were not tangled, were they tested separately? It is likely that the answer to this question is "No". A lack of compile-time dependency does not imply a lack of run-time dependency. Maybe the two packages were already implicitly related (say, through data passed as a String), and we just made this relation explicit (through a dedicated class).
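To make the String scenario concrete, here is a hypothetical sketch. The nested classes `Orders` and `Billing` stand in for two packages (one source file cannot declare two packages), and all names are invented for illustration. In the first version the two share no types, yet `Billing` only works because it knows the exact format `Orders` produces; in the second version that agreement is captured in a dedicated `OrderRef` class, which a static analyzer would now report. In both versions, testing `Billing` meaningfully requires knowing what `Orders` emits.

```java
// Hypothetical sketch: nested classes stand in for two separate packages,
// since one source file cannot declare two packages.
class ImplicitDependencyDemo {

    // "Package A": encodes an order as "id:amountInCents".
    static class Orders {
        static String encode(int id, long amountInCents) {
            return id + ":" + amountInCents;
        }
    }

    // "Package B", version 1: never mentions Orders, yet it only works
    // because it knows the "id:amountInCents" format that Orders produces.
    static class Billing {
        static long amountOf(String encodedOrder) {
            String[] parts = encodedOrder.split(":");
            return Long.parseLong(parts[1]);
        }
    }

    // Version 2: the same agreement captured in a dedicated class.
    // A static analyzer now reports a dependency that was always there.
    static final class OrderRef {
        final int id;
        final long amountInCents;

        OrderRef(int id, long amountInCents) {
            this.id = id;
            this.amountInCents = amountInCents;
        }
    }

    static class ExplicitBilling {
        static long amountOf(OrderRef order) {
            return order.amountInCents;
        }
    }

    public static void main(String[] args) {
        System.out.println(Billing.amountOf(Orders.encode(42, 1999)));         // 1999
        System.out.println(ExplicitBilling.amountOf(new OrderRef(42, 1999))); // 1999
    }
}
```

If `Orders` ever changed its format (say, to "amount;id"), version 1 would still compile and pass any test of `Billing` written in isolation, and only fail when the two meet at run time.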

Myth #2: Tangled packages cannot be deployed separately

This is not true in Java (and in many other modern languages) since we have dynamic loading. And again, due to hidden run-time dependencies, it may very well be the case that you will not be able to separately deploy packages even if they were *not* statically tangled.
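A minimal sketch of the dynamic-loading point, assuming the implementation class name is supplied at run time (it defaults to `java.util.ArrayList` here only so the example runs on its own). The calling code compiles against the `Collection` interface alone; the concrete class is located with `Class.forName` and instantiated reflectively, so it could ship in a separately deployed jar on the class path. `DynamicLoadingDemo` and `load` are hypothetical names; `java.util.ServiceLoader` is the standard library's more structured alternative.

```java
import java.util.Collection;

// Hypothetical sketch: the concrete class is named by a string at run time,
// so this code compiles against the Collection interface only.
class DynamicLoadingDemo {

    // Loads the named class, checks that it implements the expected type,
    // and creates an instance through its (accessible) no-arg constructor.
    static <T> T load(String className, Class<T> expectedType) {
        try {
            Class<? extends T> impl = Class.forName(className).asSubclass(expectedType);
            return impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real deployment the name would come from configuration;
        // the default keeps the example self-contained.
        String implName = args.length > 0 ? args[0] : "java.util.ArrayList";

        @SuppressWarnings("unchecked")
        Collection<String> items = load(implName, Collection.class);
        items.add("hello");
        System.out.println(items.getClass().getName() + " " + items.size()); // java.util.ArrayList 1
    }
}
```

Passing `java.util.LinkedList` as the argument swaps the implementation without recompiling; a name that does not refer to a `Collection` fails with a `ClassCastException` from `asSubclass`.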

Myth #3: Non-tangled packages can be modified without affecting each other

Assume you pass an Iterator, implemented by a class from package A, to some code from package B. The packages are not statically related (they depend on Iterator but not on each other). Still, they must agree on certain things: Can this iterator's next() method return null? What are the semantics of remove()? If one side breaks the contract, the other side will bleed.
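Here is a hypothetical sketch of that situation, with nested classes `Source` and `Sink` standing in for packages A and B (the names are invented). Neither refers to the other; both depend only on `java.util.Iterator`. `Source`'s iterator may yield null and does not override `remove()`, so the interface's default `remove()` throws `UnsupportedOperationException`; `Sink` assumes neither can happen.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: nested classes stand in for packages A and B.
// Neither refers to the other; both depend only on java.util.Iterator.
class IteratorContractDemo {

    // "Package A": yields the given values, which may include null.
    static class Source {
        static Iterator<String> values(String... items) {
            return new Iterator<String>() {
                private int index = 0;

                @Override
                public boolean hasNext() {
                    return index < items.length;
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return items[index++];
                }
                // remove() is not overridden, so Iterator's default
                // implementation throws UnsupportedOperationException.
            };
        }
    }

    // "Package B": assumes next() never returns null.
    static class Sink {
        static int totalLength(Iterator<String> it) {
            int total = 0;
            while (it.hasNext()) {
                total += it.next().length(); // NullPointerException if A yields null
            }
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println(Sink.totalLength(Source.values("ab", "cde"))); // 5
        try {
            Sink.totalLength(Source.values("ab", null));
        } catch (NullPointerException e) {
            System.out.println("B failed: A's iterator returned null");
        }
        try {
            Source.values("x").remove();
        } catch (UnsupportedOperationException e) {
            System.out.println("B failed: A's iterator does not support remove()");
        }
    }
}
```

A static dependency tool sees no edge between `Source` and `Sink`, yet a change to either side's assumptions breaks the other.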

Myth #4: If I eliminate a package-level tangle, I have solved some dependency issues

Not necessarily. You might have just consolidated the participating packages. So the dependency is still there, this time at the class level.
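A hypothetical sketch of the consolidation case, using a made-up class-level dependency map (the class names are invented). It lifts class-level edges to package-level edges, dropping edges that stay inside one package, and checks each graph for cycles with a depth-first search. Moving `b.Y` into package `a` makes the package-level tangle disappear, while the cycle between the two classes is untouched.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: class names and edges are made up for illustration.
class ConsolidationDemo {

    // Returns true if the directed graph contains a cycle (self-loops ignored).
    static boolean hasCycle(Map<String, Set<String>> graph) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : graph.keySet()) {
            if (dfs(node, graph, done, onPath)) return true;
        }
        return false;
    }

    private static boolean dfs(String node, Map<String, Set<String>> graph,
                               Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;  // back edge: cycle found
        if (done.contains(node)) return false;   // already fully explored
        onPath.add(node);
        for (String next : graph.getOrDefault(node, Set.of())) {
            if (!next.equals(node) && dfs(next, graph, done, onPath)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    // Lifts class-level edges to package-level edges; edges inside one package are dropped.
    static Map<String, Set<String>> packageGraph(Map<String, Set<String>> classGraph) {
        Map<String, Set<String>> result = new HashMap<>();
        classGraph.forEach((from, targets) -> {
            for (String to : targets) {
                String p = pkg(from);
                String q = pkg(to);
                if (!p.equals(q)) result.computeIfAbsent(p, k -> new HashSet<>()).add(q);
            }
        });
        return result;
    }

    static String pkg(String className) {
        return className.substring(0, className.lastIndexOf('.'));
    }

    public static void main(String[] args) {
        // Before: a.X -> b.Y -> a.X, a package-level tangle between a and b.
        Map<String, Set<String>> before = Map.of("a.X", Set.of("b.Y"), "b.Y", Set.of("a.X"));
        // After: Y moved into package a; the two classes still depend on each other.
        Map<String, Set<String>> after = Map.of("a.X", Set.of("a.Y"), "a.Y", Set.of("a.X"));

        System.out.println(hasCycle(packageGraph(before))); // true
        System.out.println(hasCycle(packageGraph(after)));  // false
        System.out.println(hasCycle(after));                // true
    }
}
```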

Myth #5: A similar analysis is also valuable at the class level

This is usually not true. The *full* dependency graph of many well-written classes is so cluttered that it is practically useless. Most visualization tools work around this by (a) showing, by default, only dependencies within the same package, and (b) ignoring library classes. When you choose the non-default view, you see how intricate the network of classes is. This happens with almost every program you examine, regardless of what you think of its quality. Class-level analysis is also blind to run-time dependencies.
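To see how the graph grows once library classes are counted, here is a hypothetical sketch that uses reflection to collect the types appearing in a class's signatures (supertype, interfaces, fields, constructor and method parameters, return types) and splits them into same-package and other-package dependencies, roughly mirroring the default filtering described above. The `Order` and `LineItem` classes are invented for the example. Reflection cannot see references made inside method bodies, so this undercounts the real class-level graph.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: lists the types that appear in a class's signatures.
// References made inside method bodies are invisible to reflection.
class SignatureDependencies {

    // Invented example classes: LineItem is "same package", the rest is library code.
    static class LineItem {
        String name;
        int quantity;
    }

    static class Order {
        private final String id;
        private final List<LineItem> items = new ArrayList<>();

        Order(String id) { this.id = id; }

        void add(LineItem item) { items.add(item); }

        Optional<LineItem> largest() {
            LineItem best = null;
            for (LineItem item : items) {
                if (best == null || item.quantity > best.quantity) best = item;
            }
            return Optional.ofNullable(best);
        }

        Map<String, Integer> quantitiesByName() {
            Map<String, Integer> totals = new HashMap<>();
            for (LineItem item : items) {
                totals.put(item.name, totals.getOrDefault(item.name, 0) + item.quantity);
            }
            return totals;
        }
    }

    private static void addType(Set<Class<?>> deps, Class<?> c) {
        if (c == null) return;
        while (c.isArray()) c = c.getComponentType(); // String[] counts as String
        if (!c.isPrimitive()) deps.add(c);            // skip int, void, etc.
    }

    static Set<Class<?>> dependenciesOf(Class<?> type) {
        Set<Class<?>> deps = new HashSet<>();
        addType(deps, type.getSuperclass());
        for (Class<?> i : type.getInterfaces()) addType(deps, i);
        for (Field f : type.getDeclaredFields()) addType(deps, f.getType());
        for (Constructor<?> k : type.getDeclaredConstructors()) {
            for (Class<?> p : k.getParameterTypes()) addType(deps, p);
        }
        for (Method m : type.getDeclaredMethods()) {
            addType(deps, m.getReturnType());
            for (Class<?> p : m.getParameterTypes()) addType(deps, p);
        }
        deps.remove(type);
        return deps;
    }

    // Splits the dependencies by whether they share the class's package.
    static Map<String, List<String>> partition(Class<?> type) {
        List<String> samePackage = new ArrayList<>();
        List<String> elsewhere = new ArrayList<>();
        for (Class<?> dep : dependenciesOf(type)) {
            if (dep.getPackageName().equals(type.getPackageName())) {
                samePackage.add(dep.getName());
            } else {
                elsewhere.add(dep.getName());
            }
        }
        Collections.sort(samePackage);
        Collections.sort(elsewhere);
        return Map.of("same package", samePackage, "elsewhere", elsewhere);
    }

    public static void main(String[] args) {
        Map<String, List<String>> parts = partition(Order.class);
        System.out.println("same package: " + parts.get("same package"));
        System.out.println("elsewhere:    " + parts.get("elsewhere"));
    }
}
```

Even for this tiny class, the "elsewhere" list is longer than the same-package one, which is why tools hide it by default.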

There are a few more myths, but I think we can stop now. However, there is one more thing I wanted to talk about. After reading Ian's post I did some field research: I took all the jar files that come with Spring (including spring.jar itself) and fed them into Structure101, just like Ian did. All in all, 23 jar files. Do you know how many of these *didn't* have tangled packages?



The answer is: 3.

Put the other way around: 20 of the 23 libraries had tangled packages.


So, if you want your library to be used by Spring, you should make its packages tangled. That's what the numbers are telling us. As long as you keep writing test cases, you have no reason to be afraid of tangles: the developers of JUnit, Dom4J, Log4J, Ant, Antlr (partial list) are happily living with tangles.

4 comments :: Mythbusting - Part 1

  1. Counter-counter argument to myth 1: weak argument, first of all. No maybe's or likely's allowed in an argument. Regardless, if the two packages were implicitly tangled, they'd still not be *testable* separately, which is what you were trying to bust. That ephemeral string that you mention would still be shared by them, right? What static languages (and therefore tools that are based on static analysis) buy you is the ability to make this relationship explicit, and therefore more manageable.

    Counter-counter argument to myth 2: This is a loophole argument. Of course you can get away with murder using dynamic loading. But that doesn't mean *every single layer* of your application code will be doing it. In a well-designed system, such loading will be relegated to the "platform" layers, or even more preferably, to the actual platform such as the JDK, or Spring, etc.

    Myth 3: You're right, it is a myth that non-tangled packages can be modified without affecting each other, but only because they can be modified such that they get tangled! And seriously, if you're using collection classes as your exemplar for implicit dependency, you should also include such dependencies as system calls and file formats. These are always shared!

    Myth 4: Again, you're right. But that doesn't make the statement a myth. The whole static dependency analysis thing works on the basis that some dependencies are good and required, and some are undesirable; and furthermore, that the higher up a dependency sits, the costlier it is. Ergo, move the dependency down the hierarchy, where its sphere of influence is reduced, though not necessarily eliminated. In fact, at lower levels dependency (or coupling, to be precise) is desirable, as it keeps execution tight.

    Myth 5: Same arguments as above hold.

    And finally: Let's say all of Spring's packages are tangles. Is that a model you want to follow for your code? Really?

    Can I suggest something? Look at a codebase today, and come back in 6 months and try to make sense of it - tangles and all.

  2. @vinod

    "No maybe's or likely's allowed in an argument" - I acknowledge the fact that, other than Turing's proof, there's very little you can prove in this field. Hence the maybe's and the likely's. Unfortunately, many people choose to ignore this reality.

    "if the two packages were implicitly tangled, they'd still not be *testable* separately"

    That's exactly my point. Many things cannot be tested separately. Static analysis gives the false impression that they can.

    "But that doesn't mean *every single layer* of your application code will be doing it [dynamic loading]". I was not advocating dynamic loading. I just said that it is not true that "tangled packages cannot be deployed separately" (that's a quote from Ian's post).


    "Let's say all of Spring's packages are tangles. Is that a model you want to follow for your code?"

    The model I want to follow is that of testing. A strong suite will give me much more than a nice dependency graph.

  3. Can you start by showing where these myths originate? Personally, I would say that highly cohesive and loosely coupled packages are simpler to test. But that does not translate to "Tangled packages cannot be tested separately". Who believes this statement? I don't think I know anyone in the software industry who would put their name to that statement. If you can't either, then perhaps you should classify the post as a rant, so we all know this is not about thoughtful discussion.

    Jeff Santini

  4. Quoting from Software erosion in pictures - Findbugs:
    "we see that these [packages] too have been sucked into the tangle. From now on, it seems, all testing, deployment etc. will need to include both."
