Formalizing the Paradox

During the debate about static analysis I realized that my precise claim was not well understood. Let me make things clearer by formally proving that static type checking is at odds with quality estimation based on static detection of tangled packages.

Definitions

d(p) - Static dependency of a program p, measured as the number of tangled packages.

q(p) - Quality metric of a program p (there are almost no restriction about it. Can be number of defects, cost per feature, etc. Can be calculated analytically or measured empirically).

"Equivalent programs" - Two programs that produce the same output for the same input.

Deduction

[A1] Given two equivalent programs, x and y, then if x's code provides more type information that y's the following holds: q(x) > q(y)

[A2] q(p) and d(p) are inversely correlated

Deduction

[1] Given a statically typed program, p1, compute q(p1), d(p1).

[2] Obtain a program p2 that upholds the following conditions: p2 is equivalent to p1; p2 has no tangled packages; p2 has less type information that p1. The transformation rules described here ensure the existence of a program p2. However, any p2 that satisfies these conditions is suitable.

[3] p2 has less type information than p1 (see [2]). Based on [A1] we have that q(p1) > q(p2)

[4] p2 has no tangled packages (see [2]) so necessarily d(p1) >= d(p2). Based on [A2] we have that q(p1) <= q(p2)

The conclusions of [3] and [4] form a contradiction: q(p1) > q(p2), q(p1) <= q(p2). Therefore we have that one of our assumptions, either [A1] or [A2] is wrong.


I am tempted to put a Q.E.D here, but that is not the point. I do not think that either static type checking (that is: [A1]) or that static analysis ([A2]) is completely useless. However, this little proof shows that applying them indiscriminately is logically flawed. Universal statements such as "a tangle was formed hence quality eroded" are ill formed without additional reasoning and explanations.

Update 1: Changed the phrasing of [2] to explicitly express the requirements from p2.

Update 2: Removed an unnecessary step.

6 comments :: Formalizing the Paradox

  1. the flaw in your logic is that step [2] always introduces reflection to remove tangles because that's all that the transformations allow. hence getting rid of tangles this way adds reflection and removes static typing. that's nonsenical, surely!?

    formalising this as above is just sophistry. the formal notation adds nothing to the argument.

    to prove your hypothesis of inverse correlation you'd need to prove that the set of possible transformations that remove tangles (including all the set of typed transformations) always take away typing information. this clearly isn't necessarily true, as one of the most effective ways to remove tangles is to add interfaces, which adds static typing information.

    Andrew

  2. Andrew,

    Thanks for turning my attention to [2]. I just rephrased it to clarify that you don't have to apply reflection. The transformation rules that I reefer to just ensure the existence of p2.

    "to prove your hypothesis ... that the set of possible transformations ... always take away typing information"

    You're certainly wrong here. In order to show that [A1] and [A2] cannot be both true, you just need to arrive at a contradiction, which is what I did.

    This proof does not imply that static typing/static analysis are always useless. It means that in the general case they form a conundrum. You can avoid the conundrum by strengthening the assumptions, which - of course - will make them less generally applicable.

  3. all you've shown by the contradiction is that if you remove tangles by removing type information (i.e. using reflection), it is possible to lower the quality metric.

    this gets back to my point -- yes, you can code in a way which simultaneously lowers the quality and fools static type checkers into thinking there are no tangles. no argument there. however, you haven't shown any lack of correlation between quality and removing tangles with readily-available static techniques such as introducing an interface.

    as such, i'm having trouble aligning your logic above with your initial hypothesis showing "why static typing is useless?".

    Andrew
    p.s. my last post -- i've lost interest.

  4. Andrew, you're not following the logic.

    I showed a contradiction (which you agree I have). Therefore, a claim such as "removal of tangles raises quality" is a false claim. That's how it works: a counter example falsifies a "for all" claim.

    Second, I didn't rely on reflection. The removal of tangles can be done any way you want (e.g: structural subtyping). The reflection was just an example to show the existence of a p2 program.

  5. Hi Itay,

    I see the contradiction. You cannot mathematically prove [A2]. Proofs of correctness and imperative programming don't go. All we have is our intuition and the ability to suck it and see :) Not the safety net some would like, but unless you restrict your program by removing state, its the best we've got.

    BTW. I have been trying to get an angle on Category Theory. Do you have some useful pointers you could suggest?

    Cheers,

    Paul.

  6. Paul,

    Even with a purely immutable language we will not be able to determine termination and other properties of the dynamic behavior.

    Category Theory: I'm afraid I don't have any pointers. Sorry.

Post a Comment