Wednesday, October 14, 2015

Null is optional

A null reference is commonly used to express the absence of a value. In many languages - the ones that allow nullable references without explicit syntax - it comes with a specific drawback. Let's take a look at the following Java method:

User findUser(String name);

What happens when a user with the given name can't be found? Returning null expresses the concept of an absent value in probably the simplest way, but it has a specific disadvantage: the possibility of a null reference is implicit, the interface doesn't reveal it. Why is that a big deal? Null references need handling, otherwise a NullPointerException may occur at runtime - a very common error:

findUser("jonhdoe").address(); // NullPointerException at runtime
Generally not all values in a system are nullable. Indicating exactly where handling is needed helps reduce both the risks and the unnecessary handling effort. Documentation or metadata such as annotations can communicate nullability, but they're not ideal: they are easy to miss and lack compile-time safety.
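
For illustration, a minimal sketch of the annotation-based approach - @Nullable here stands for any of the commonly used nullability annotations (JSR-305, JetBrains, etc.), applied to the findUser method from above:
@Nullable
User findUser(String name);

// a static analysis tool may warn here, but the compiler still accepts it:
findUser("johndoe").address(); // may throw NullPointerException at runtime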

Some languages don't allow null references at all - or only with specific syntax. For the others there is an alternative that addresses the issue by making use of the type system: the option type. It's a useful pattern to know about as it can easily be implemented in most languages - in case they don't ship with it already. Let's take a look at Java 8's Optional for instance: "A container object which may or may not contain a non-null value."

Optional<User> findUser(String name);
This interface communicates clearly that a user may not be present in the result and the client is forced to take that into account:
// findUser("jonhdoe").address(); compile error!
Handling the absent value:
Optional<User> user = findUser("johndoe");
if(user.isPresent()) {
 user.get().address();
}
This example is intentionally kept somewhat similar to how null references are often handled. A more idiomatic way of doing the same:
findUser("johndoe").ifPresent(user -> user.address());

It's interesting to consider the effects of the pattern in the wider context of an application. With consistent use of Optional, it is possible to establish a powerful convention of avoiding the use of null references altogether. It transforms an interface from:

interface User {
  String name();
  Address address();
  BankAccount account();
}
to:
interface User {
  String name();
  Address address();
  Optional<BankAccount> account();
}
The client can see that a user may not have a bank account, and can assume it always has a name and an address - that's the convention at play here. Such a practice also facilitates change: e.g. if address becomes optional at some point in the future, all client code will be forced to conform. The code examples so far presented the effects on method signatures; the same benefits apply to class fields or local variables:
class Address {
  String city;
  Optional<String> street;
}
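
With the convention in place, optional values also compose well. A quick sketch - assuming the User interface shown above and a hypothetical iban() accessor on BankAccount:
String iban = findUser("johndoe")
        .flatMap(User::account)     // absent if the user or the account is missing
        .map(BankAccount::iban)     // hypothetical accessor, for illustration
        .orElse("no bank account");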

It's particularly handy for functional programming, and it also provides some syntactic sugar that simplifies code in many scenarios:

Optional<String> foo = Optional.ofNullable(thirdPartyApiThatMayReturnNull());
String foo2 = foo.orElse("No value.. take this default one.");
String foo3 = foo.orElseThrow(() -> new RuntimeException("I really can't work without a value though"));
thirdPartyApiExpectingNullable(foo.orElse(null));
if(foo.equals(Optional.of(foo2))) { // no NullPointerException, even if foo is absent
  // ...
}

Conclusion

Avoiding implicit nulls as much as possible can do wonders for a codebase; it makes it a much safer place to be. Disadvantages? Patterns like the option type work well in most cases, but as with everything there are exceptions: each Optional consumes a heap object, which needs to be considered in extreme cases. There may be specific scenarios where the practice doesn't deliver real value or is impractical, and 3rd party code may also force the use of null references.

An example of the use of Optional in the wild: Java 8's Stream API.

Monday, February 23, 2015

Sneaky exceptions

Streams and lambdas are powerful constructs that facilitate a functional style in Java 8. There is a specific inconvenience related to checked exception handling though. I'll illustrate the issue through a simple example: converting a collection of Strings - containing URLs - to a collection of URL objects using the Stream API.
List<String> urlStrings = Arrays.asList(
    "http://google.com",
    "http://microsoft.com",
    "http://amazon.com"
);

List<URL> urls = urlStrings.stream().map(URL::new).collect(Collectors.toList());
This would look nice but it won't compile: the constructor of URL throws a checked exception, MalformedURLException. The map method expects a Function, and its apply method - the one implemented by the lambda expression - doesn't declare any checked exceptions:
@FunctionalInterface
public interface Function<T, R> {
    R apply(T t);
}
Placing the stream operation into a try block, or declaring the exception in the throws clause of the containing method, wouldn't help - the Function / lambda expression itself won't compile. The only viable option is to handle the exception within the Function. This sacrifices quite a bit of brevity:
List<URL> urls = urlStrings
    .stream()
    .map(url -> {
        try {
            return new URL(url);
        } catch (MalformedURLException e) {
            throw new SomeRuntimeException(e);
        }
    }).collect(Collectors.toList());
Compact code is one thing, but the bigger issue is that the original exception can't be propagated directly - to be handled outside the stream, for instance. There is a pattern that may come in handy in such situations: the sneaky throw. The intent is to throw the original checked exception as is, but hide it from the method's signature - effectively imitating the characteristics of runtime exceptions. In our case this is useful for defining Functions that may throw checked exceptions which propagate out of the stream operation.

The first step is to define an alternative to Function that declares checked exceptions:
public class Sneaky {
    @FunctionalInterface
    public static interface SneakyFunction<T, R> {
        R apply(T t) throws Exception;
    }
    ...
}
This allows us to define something similar to a Function. It won't be useful in itself though, since Stream.map() expects a Function implementation. We need a way to transform our SneakyFunction into a Function:
public class Sneaky {
    @FunctionalInterface
    public static interface SneakyFunction<T, R> {
        R apply(T t) throws Exception;
    }

    public static <T, R> Function<T, R> function(SneakyFunction<T, R> function) {
    ...
 }
}
Now comes a bit of a type-erasure trick to hide the checked exception from the method signature and make it behave like a runtime exception:
public class Sneaky {
    @FunctionalInterface
    public static interface SneakyFunction<T, R> {
        R apply(T t) throws Exception;
    }

    public static <T, R> Function<T, R> function(SneakyFunction<T, R> function) {
        return o -> {
            try {
                return function.apply(o);
            } catch (Exception e) {
                Sneaky.<RuntimeException>sneakyException(e);
                return null;
            }
        };
    }

    @SuppressWarnings("unchecked")
    private static <T extends Throwable> T sneakyException(Throwable t) throws T {
        throw (T) t;
    }
}
The usage pattern in stream operations is to define a SneakyFunction instead of a Function - allowing checked exceptions to be thrown - and then transform it into a Function:
List<URL> urls = urlStrings.stream().map(Sneaky.function(URL::new)).collect(Collectors.toList());
This code compiles and runs fine, will throw MalformedURLException when needed, and makes it possible to propagate the exception easily. There are simpler and "safer" solutions for this specific problem of course, but that's not the point. The need to propagate checked exceptions from streams is not uncommon, and this pattern can come in handy despite its controversies. Similar variants can be created for other commonly used functional or pseudo-functional interfaces, like Consumer. Now about the risks: the compiler will have less chance to force explicit handling of the checked exception in some scenarios. As with pretty much everything, its use needs some judgment. Sneaky throws are used in quite a few libraries, and there are some that support them directly, like Project Lombok or Google Guava (in a more cautious way).
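
As a sketch of such a variant, a Consumer counterpart could follow the same pattern - the SneakyConsumer interface and consumer() helper below are hypothetical additions to the Sneaky class above, and the writer in the usage comment is just a made-up example of something that throws a checked exception:
@FunctionalInterface
public interface SneakyConsumer<T> {
    void accept(T t) throws Exception;
}

public static <T> Consumer<T> consumer(SneakyConsumer<T> consumer) {
    return o -> {
        try {
            consumer.accept(o);
        } catch (Exception e) {
            Sneaky.<RuntimeException>sneakyException(e);
        }
    };
}

// usage, with a hypothetical java.io.Writer:
// urls.forEach(Sneaky.consumer(url -> writer.write(url.toString())));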

Wednesday, February 11, 2015

Find the first spaghetti

Refactoring code - to a deeper extent than what seems pragmatic at first - is a great exercise for learning and shaping coding style. In this post I'm going to take a look at some code taken from Google Guice and see how some functional refactoring can improve on it. This is not an "if you see this, do that" kind of guide, rather an exploration of some possibilities and alternatives. The same thought pattern may contribute to solving other problems too.

The following code is part of an interceptor that manages annotation-driven transactions. It looks for a @Transactional annotation on the target method; if it can't find one, it looks for one on the class; if it can't find one there either, it uses a default:
public class JpaLocalTxnInterceptor {
    // ...
    private Transactional readTransactionMetadata(MethodInvocation methodInvocation) {
        Transactional transactional;
        Method method = methodInvocation.getMethod();
        Class targetClass = methodInvocation.getThis().getClass();

        transactional = method.getAnnotation(Transactional.class);
        if (null == transactional) {
            // If none on method, try the class.
            transactional = targetClass.getAnnotation(Transactional.class);
        }
        if (null == transactional) {
            // If there is no transactional annotation present, use the default
            transactional = Internal.class.getAnnotation(Transactional.class);
        }

        return transactional;
    }

    @Transactional
    private static class Internal {
        private Internal() {
        }
    }
    // ...
}
For illustration, a possible service this interceptor can be applied to:
@Transactional
public class Service {
    public void foo() {
        
    }

    @Transactional(rollbackOn = {FileNotFoundException.class})
    public void bar() {

    }
}

Functional alternative

Yoda conditions aside, readTransactionMetadata() is not that great to read: reference re-assignments, high cyclomatic complexity, etc. "Conventional" refactoring steps like method extraction wouldn't help much; a more functional style has better chances. It's easy to miss functional opportunities when the starting point is not a collection of elements but complex imperative code.

Notice that the code can be interpreted as a simple "find first" logic. There is an ordered collection of elements (the @Transactional of the method, the @Transactional of the class, the default @Transactional) and we're looking for the first one matching some criteria (not null). The original imperative code can be transformed into a single statement:
private Transactional readTransactionMetadata(MethodInvocation methodInvocation) {
        return Arrays.asList(
                methodInvocation.getMethod().getAnnotation(Transactional.class),
                methodInvocation.getThis().getClass().getAnnotation(Transactional.class),
                Internal.class.getAnnotation(Transactional.class)
        ).stream()
                .filter(transactional -> transactional != null)
                .findFirst()
                .get();
}
The code solves the problem but there is a side effect: the values of all three elements are evaluated eagerly. The original imperative code does this lazily, avoiding unnecessary reflection operations. The Supplier interface (provided by the JDK) is useful for expressing lazy evaluation:
public interface Supplier<T> {
        T get();
}
The solution that makes use of it:
private Transactional readTransactionMetadata(MethodInvocation methodInvocation) {
        return suppliers(
                () -> methodInvocation.getMethod().getAnnotation(Transactional.class),
                () -> methodInvocation.getThis().getClass().getAnnotation(Transactional.class),
                () -> Internal.class.getAnnotation(Transactional.class)
        ).stream()
                .map(Supplier::get)
                .filter(transactional -> transactional != null)
                .findFirst()
                .get();
}

@SafeVarargs
private static <T> List<Supplier<T>> suppliers(Supplier<T>... suppliers) {
        return Arrays.asList(suppliers);
}
The new code is almost equivalent to the original in terms of functionality. One could argue that it's still verbose and not simpler to read than the original - that's up to taste. There are a few advantages though: it makes the main pattern of the logic more explicit, and it has low cyclomatic complexity, which generally leads to a lower likelihood of bugs.

Performance impact

Surprisingly, the functional variant performed about 40% faster than the original imperative code. I needed to do some digging to find out why. It turned out the reason was that the original code determined "targetClass" too early - even in cases where the class is not needed later. After optimizing that, the imperative code ended up about 10% faster than the functional variant, which is what I expected due to the extra overhead the functional code involves. This is in the same ballpark and - except for the most extreme scenarios - will not be a bottleneck in practice. In a way the functional code is "natively optimized" due to its declarative, lazy nature; the original imperative code needed intentional optimization to reach and somewhat surpass its performance.

Friday, January 16, 2015

Functional, fluent JDBC - FluentJdbc

Some projects relying on a relational database do just fine using a high-level ORM; some need more low-level control over their queries. Using JDBC directly for application development can be really cumbersome. Not because it's a bad API - the problem is that it's so low-level it's really more suitable for tool development than for application development.
 
There are many abstraction layer libraries over JDBC that add the much needed syntactic sugar. Even so, sometimes I find it difficult to find a suitable one for the applications I work on - either because they are constraining in some way or have other unwanted properties (like transitive dependencies).

In some cases I implemented abstractions of my own, and a pattern started to emerge. I decided to collect those ideas and turn them into a library for easier reuse: a library that makes the most common SQL querying operations as simple and declarative as possible, is light-weight, and has no dependencies. Most similar libraries don't make use of a fluent API, which opens up some interesting possibilities in terms of both readability and flexibility. I also wanted to put some emphasis on easy integration with popular persistence / transaction management tools, since the goal is to complement them, not necessarily replace them. As a result it blends quite well with things like JPA, Spring Data, Guice Persist, JEE, etc.

Some of FluentJdbc's features:
  • execution of select/insert/update/delete/alter/... statements as one-liners
  • parameter mapping (named, positional, supporting java.time, Optional, extension for custom types)
  • access to generated keys
  • automatic result -> pojo mapping
  • transaction handling
  • big data (streaming select / insert / update)
Check out FluentJdbc's GitHub page for usage details. The artifact is up in Maven Central, if you want to give it a shot:
<dependency>
    <groupId>org.codejargon</groupId>
    <artifactId>fluentjdbc</artifactId>
    <version>1.3</version>
</dependency>
Note: Java 8 only

Tuesday, November 4, 2014

A common pitfall of Domain-Driven Design

The philosophy of domain-driven design - placing the primary focus on the core domain - is a very natural and powerful concept for application design. Despite the core idea being simple, the learning curve is apparently steep. DDD is very often implemented wrongly, and a suboptimal architectural decision can squander the potential of building and taking advantage of a rich domain model.

There is a specific pattern I've come across surprisingly often in systems built on the notion of DDD. I think it's useful to know about this anti-pattern, and why/how systems end up implementing it, as it tends to steer a design in the wrong direction.

Given some domain, development starts with a single layer containing a prototype of the domain model. Then the need for some infrastructure comes into the picture - most commonly some sort of persistence solution. That's placed into another layer with the good intention of keeping responsibilities separate.
The result is this:
[Figure: Domain model prototype]
This won't work well - there is a problem with the dependency. The domain model depends on the infrastructure layer to do its job, while the infrastructure layer can't access the domain objects - to persist them for instance - due to the direction of the dependency. There are several ways to resolve this:
  • Pushing part of the model down to the infrastructure layer. A responsibility leak, something to be avoided.
  • Defining transfer objects in the infrastructure layer. The domain will still be concerned with infrastructure specifics.
  • Let's assume a somewhat better alternative is chosen: moving the domain model to its own layer:
[Figure: Dedicated domain model]

The infrastructure trouble is solved, and the layer design of many applications stops at this point. In my experience it is a really common pattern. It looks like the real deal, with the domain model having a dedicated layer. So let's take a look at the issues.

The domain model can't access the infrastructure layer. As a result, the parts of the domain model that rely on infrastructure - actually most of it - have to be moved up to the application service layer. Gradually - to keep things consistent and symmetrical - most logic gets moved out of the domain layer, leaving mere property bags behind with little or no logic. The end result of all the effort is the classic case of an anemic domain model:
[Figure: Anemic domain model]
 
Where did things go wrong? As it turns out, at the very beginning, when the infrastructure layer was introduced. In DDD the domain model is supposed to have the main focus, yet it was made dependent on a layer with infrastructural responsibilities. Subsequent efforts try to address this, and while they succeed on the surface, sacrifices are made in the domain model.

The main problem was caused by a violation of the dependency inversion principle: a higher level module (domain) should not depend on a lower level one (infrastructure). The problem can be properly solved by applying dependency inversion: the domain defines the abstractions it needs, and the infrastructure layer implements them:
[Figure: Rich domain model]

This simple change makes a world of difference: the obstructions are out of the way of building a rich domain model that implements all the domain functionality. Other layers become thin - no domain logic in them - and can concentrate on their own responsibilities (e.g. application services or infrastructure integration details).
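
To make the inverted dependency concrete, a minimal sketch - the UserRepository / JpaUserRepository names are made up for illustration, and imports are omitted in the style of the snippets above. The domain layer owns the interface; the infrastructure layer implements it:
// domain layer - no infrastructure dependencies
public interface UserRepository {
    Optional<User> findByName(String name);
    void save(User user);
}

// infrastructure layer - depends on the domain, not the other way around
public class JpaUserRepository implements UserRepository {
    private final EntityManager entityManager;

    public JpaUserRepository(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    @Override
    public Optional<User> findByName(String name) {
        // JPA-specific querying details live here, out of the domain's sight
        return entityManager
                .createQuery("SELECT u FROM User u WHERE u.name = :name", User.class)
                .setParameter("name", name)
                .getResultList()
                .stream()
                .findFirst();
    }

    @Override
    public void save(User user) {
        entityManager.persist(user);
    }
}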

In the context of tackling high-complexity problems with DDD, the pattern described above is definitely something to look out for. It's very easy to compromise a domain model with constraints imposed by fundamental layering issues.

Monday, February 10, 2014

Coding style

Style has an important role when it comes to writing clean, readable code. Conforming to conventions and best practices is a good starting point, but beyond that there is still plenty of room for choice. Those choices depend greatly on the experience of the developers and on personal preference.

There is a very specific danger with conventions and best practices. Most of them are distilled experience of the industry and are there for good reasons. Some of them, however, are questionable but rarely - if ever - questioned. They get incorporated into the coding style, and without a conscious effort to pinpoint them they become invisible. I think all developers should revisit their code from time to time and inspect even the most trivial elements for such issues.

A couple of things that I think are worth looking at:

Getter/setter methods
Since IDEs can generate them so easily, these are very often abused. Some developers have the habit of generating them right away, throwing away the most important element of object-oriented programming - encapsulation - without further thought. Mutators are created on objects that could be immutable. The get...() prefix is often meaningless and actually harms readability:
  person.getName();
  person.getAddress();
  person.getMobileNumber(); 
  
  person.name();
  person.address();
  person.mobileNumber();
Constant naming conventions
The convention is to use uppercase names, which can be quite unreadable - especially if there are many long-named constants in a class. Unless it's part of the API, the fact that a field is constant is an implementation detail.
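
For comparison, a small made-up example of the two styles (the names are hypothetical):
  private static final int DEFAULT_CONNECTION_TIMEOUT_IN_MILLISECONDS = 5000;
  private static final String FALLBACK_CONFIGURATION_FILE_NAME = "app-defaults.properties";

  private static final int defaultConnectionTimeoutInMilliseconds = 5000;
  private static final String fallbackConfigurationFileName = "app-defaults.properties";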

Final keyword on local variables
Some static code analysis tools show warnings when local variables that are assigned only once are not marked as final. While this adds value in some specific cases, applying it to all local variables without reason introduces clutter that can make the code harder to read. One can argue their rigorous use is good in long and complex methods, but then the problem is the complex method in the first place.
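
A made-up illustration of the clutter (hypothetical types and names) - the same three lines with and without the rigorous final:
  final Invoice invoice = invoices.findBy(id);
  final Money total = invoice.total();
  final Money discounted = total.minus(discount);

  Invoice invoice = invoices.findBy(id);
  Money total = invoice.total();
  Money discounted = total.minus(discount);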

Exception suffix
A really controversial one, but I think it's worth at least toying with the idea. Almost all exceptions have the ...Exception suffix in their names. This is not really necessary and it does add some noise. Note that in whatever context we encounter an exception, the syntax already makes it obvious:
  • Definition of an exception - Something extends Exception
  • Throwing an exception - throw new Something
  • Declaration of an exception thrown - throws Something
  • Catching an exception - catch(Something)
Dropping the suffix works well for exceptions that are named after events rather than plain nouns - which is, by the way, a useful practice in its own right. Compare FileNotFound vs FileNotFoundException. Event-based names and the lack of the suffix applied consistently can have a nice effect on a codebase. Some libraries do this, e.g. Javassist and Bouncy Castle.
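
A quick made-up sketch of an event-named exception without the suffix, with the surrounding syntax doing the talking (UserNotFound and the name variable are hypothetical):
public class UserNotFound extends RuntimeException {
    public UserNotFound(String name) {
        super("No user found with name: " + name);
    }
}

// at the call site, the throw syntax already signals the exception
throw new UserNotFound(name);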

These are just some specific examples; the important part is to notice the things we do automatically, be conscious about why we're doing them, and make sure they actually make sense.

Wednesday, June 6, 2012

Refactoring legacy systems

Working on a greenfield project once in a while is nice and liberating. More often though, maintenance of legacy systems is the reality. People define the term "legacy system" differently. A naive - and common - definition is legacy = old; and since the industry evolves so quickly, old must be bad as a consequence. I think there is more to it than that. There are new systems - most of them actually - that can be considered legacy before they're even finished.

Systems are built to satisfy requirements, and requirements have a tendency to change often. Ideally a system can be feasibly adapted to these changes. My definition of legacy is simple: a system is legacy when such adaptation is not feasible. This can happen for various reasons: bad internal quality, badly chosen or obsolete technology, compatibility constraints, and a lot more.

The pace at which systems turn into legacy can be controlled, and some improvements can be made even when the damage is already done. Compatibility constraints (like APIs) can be tough to challenge, and changing certain technologies may be difficult, but improving internal quality through continuous refactoring is usually available as a viable option. In refactoring, the technical to-do is usually the easy part. The hard part is how to do it in the context of the development process, especially if new functionality needs to be continuously delivered. In such a scenario, several ideas come to mind.

One is to make small changes at a time. This approach can be problematic if the development team is not disciplined enough: bad code is viral and tends to reinfect the already healed parts. Also, there are certain types of changes that can't reasonably be split into several smaller ones. A more elaborate strategy is called for. An enticing option is to create bubbles: carefully separate a part of the system from the rest and refactor the bubble to its "ideal" state. Eventually bubbles grow, merge and take over the system. This is a very useful approach and can result in success.

These bubbles have interesting pitfalls though. The anti-corruption layer that protects a bubble has its cost. If the bubble is left alone and not nurtured actively, the additional complexity it introduces becomes a burden to the system - in some cases it's worse to live with it than with the problems of the original system. Fragmented pieces of hanging refactorings can be expensive.