General musings on programming languages, and Java.

Tuesday, December 08, 2009

Deleting code, what first?

I have about 200kloc of Java code to work with, and I often stumble across bits that are more complex than they need to be, costing me time when I'm trying to solve something else.

Within reason, having a smaller body of code is an improvement. Here are my guidelines, mainly for myself, to shrinking it.

1. Use IDEA's analysis tools to find unused code, and delete it. If it's public-facing API, and you know of no code that uses it, it's probably broken anyway. Deprecate it now and delete it at some later date. If you deleted it and somebody was using it, they'll complain at you, you'll reinstate it and include at least a comment that team X at company Y is using it, and at best an automated test that the code does what team X requires.

Watch out for main methods when you delete code - there are a few ad-hoc utilities in our codebase.

Always ensure that deprecated code is only called by deprecated code, i.e., that deprecation forms a clique.

2. Look at 'throws' clauses. If one low-level method has, say, 'throws IOException' in its signature, and all callers either pass it on to their caller or catch and do something dumb with it like logging, then you may as well do that within the low-level method, or rethrow it as a RuntimeException. This can remove a surprising amount of bullshit code.

3. Look for non-final static variables, a reasonable indicator of mutable singletons. Try to stamp them out. While this might actually increase the amount of code you have, it will do wonders for the maintainability. I did this today and found a potential bug (which I've yet to prove). The worst case is that you find out why you can't stamp a particular one out - you'll have learned something useful about the application's architecture and flaws along the way.

4. Use IDEA's "field could be local variable" inspection to simplify classes. As a general rule, a class with fewer fields is simpler.

5. Private getters and setters that do nothing beyond this.foo = foo; and return foo; are useless; inline them.

6. Search for repeated code; the IDEA Ultimate Edition can do this, but the Community Edition cannot. Unfortunately, it can only find them, not suggest a solution, and in my experience it doesn't get the ordering right; we had 15 almost-identical classes that it didn't even notice. It's still very useful though.

Even if you can't think of a decent abstraction, get rid of the repetition anyway. If needed, a better name or better way of doing it will be easy to use later. It's far easier to refactor 1 copy than 10 copies of the same code.

After a discussion at work, we decided that duplicated code has a cost proportional to its age multiplied by its size multiplied by the number of repetitions. That seems a reasonable rule; the cost is paid when using it or removing it.

7. Try to replace setters with constructor parameters, until it gets unwieldy or simply impossible as the objects get mutated after initialisation. When it gets unwieldy because the parameter list gets too long, consider splitting the class up.

8. Immutables remove the need for defensive copying. Lots of awful code is dedicated to copying values so that they can be passed to third parties. If you can make those values immutable this gets a lot easier.

9. Immutables remove the need for synchronization. Lock-free programming is far simpler if you can get away with it. Most programmers, myself included, get locks wrong, and the problems grow exponentially. That said, I don't have a general answer for what to do instead; I'm still at the stage where I need to work it out per-case.

10. Delete commented-out code. It's revision-controlled; if anyone actually ever remembers about it they can easily restore it. It's usually traces of previous experiments, and isn't of any real use. Similarly, delete the IDE-generated crap, like "Created on Mar 13 2005". You might want to keep the @author info, unless it's obviously useless, like @author Administrator, but even that can be of limited value years later when the author has left the company.

11. This isn't really about code, but: commit little and often. It's easy to do 3 refactors in one go and screw the third up, then have to revert and repeat the first 2. It's easier if you committed the first two. Similarly, if you find that a commit caused a regression, it's easier to work out what change caused it if each commit is small. That's not to say you should commit known broken code, at least not to anything other than an experimental branch.

In short: commits are game save points, not presentations.

4 comments:

Daniel Alexiuc said...

I find that having @author tags also tends to discourage collective code ownership. Two things I've noticed:

Senior Developers: "I am Jack the magnificent! Nobody can improve upon my perfect code so I shall stamp my name in the @author tag.

Junior Developers: "Hmm this is Jack's code, I had better not touch it."

Kristian Domagala said...

Author tags are also at best redundant (due to revision control) and at worst, incorrect (if the majority of the file was written/changed by someone other than the person who created it).

Det said...

Reg. 11.:
This small but many commits may lead to a discussion in your team if you use subversion.

Simple but really useful solution:
Install git, commit often locally and after many commits and perhaps some undos declare a stable state that you then commit into the central svn repo.

Jason Wilson said...

>> Try to replace setters with constructor parameters

Like you're other advice, this is a good idea especially when the programmer stores the constructor parameters in final instance variable (even if they end up protected instead of private).

My recent experience using it is that Guice can really help make this more practical since programmers completely avoid having to pass arguments into objects that are automatically created by Guice and Guice's Providers can break many circular dependencies in a really clean way.

Overuse of getter's and setter's may be an indication that the programmer is trying to reuse an object when he could have instead created a new instance for the new use. So this is really related to your suggestion to use immutable objects, however programmers can also utilize "single use" objects (preferably only in the same thread) when immutability isn't quite the right abstraction. The simplest example of this approach may be the builder pattern which often culminates in the creation of a good old immutable object.

Blog Archive

About Me

A salsa dancing, DJing programmer from Manchester, England.