Saturday, March 22, 2014

Scala: Object equality and the canEqual method

This week I'm in chapter 30 of Programming in Scala, 2nd ed., the chapter titled Object Equality.

Getting Object Equality Right

This chapter is about how hard it is to properly implement a class' equals method. There is good advice here on the pitfalls to avoid and recipes to write good equals and hashCode methods. Most of this chapter is present in the article How to Write an Equality Method in Java except that the examples in the article use Java instead of Scala.

What is the canEqual method?

One of the strategies in this chapter is to introduce a canEqual method for non-final classes. This allows subclasses to override canEqual if they want to not allow being equal to the parent class or sibling classes.

The example in the chapter is this. You start with a Point class, which has x and y coordinate members. Then you have a ColoredPoint class that subclasses Point and overrides equals to make it so that ColoredPoints aren't equal to Points.

Here's the definition of Point:

class Point(val x: Int, val y: Int) {
    override def equals(other: Any) = other match {
        case that: Point => this.x == that.x && this.y == that.y
        case _ => false
    }
}

And the naive implementation of ColoredPoint:

class ColoredPoint(x: Int, y: Int, val color: Color.Value) 
    extends Point(x, y) {

    override def equals(other: Any) = other match {
        case that: ColoredPoint => 
            this.color == that.color && super.equals(that)
        case _ => false
    }
}

where Color is:

object Color extends Enumeration {
    val Red, Orange, Yellow, Green, Blue, Indigo, Violet = Value
}

The problem with this is that equals is not symmetric. It is possible for a Point to be equal to a ColoredPoint, but the ColoredPoint wouldn't be equal to the Point.

This is a little better, but still not transitive:

class ColoredPoint(x: Int, y: Int, val color: Color.Value) 
    extends Point(x, y) {

    override def equals(other: Any) = other match {
        case that: ColoredPoint =>
            (this.color == that.color) && super.equals(that)
        case that: Point => that equals this
        case _ => false
    }
}

To see how this doesn't work, consider a red ColoredPoint with x and y coordinates of 1,2 and a blue ColoredPoint with the same coordinates. The red ColoredPoint is equal to a Point(1, 2) and the Point(1, 2) is equal to the blue ColoredPoint, but the red and blue ColoredPoint are not equal to each other.

The solution proposed in the chapter is to introduce a new method, canEqual:

def canEqual(other: Any): Boolean

Point would then be defined as such (including hashCode):

class Point(val x: Int, val y: Int) {

    override def hashCode = 41 * (41 + x) + y
    override def equals(other: Any) = other match {
        case that: Point =>
            (that canEqual this) && 
            (this.x == that.x) && (this.y == that.y)
        case _ => false
    }

    def canEqual(other: Any): Boolean = other.isInstanceOf[Point]
}

And ColoredPoint is defined like so

class ColoredPoint(x: Int, y: Int, val color: Color.Value) extends Point(x, y) {

    override def hashCode = 41 * super.hashCode + color.hashCode
    override def equals(other: Any) = other match {
        case that: ColoredPoint =>
            (that canEqual this) && super.equals(that) && this.color == that.color
        case _ => false
    }

    override def canEqual(other: Any): Boolean = other.isInstanceOf[ColoredPoint]
}

Now Point instances cannot be equal to ColoredPoint instances since the first check that Point.equals will make is to call ColoredPoint.canEqual(Point) which will return false. It's vitally important that the canEqual method be call on that with this as the argument. canEqual is a way for classes to define what they can be equal to. In Point and ColoredPoint the match expression has been used to make sure that that is the right type, so now we can call canEqual on that to make sure that equality is possible in the reverse direction: that that canEqual this.

Does canEqual violate the Liskov Substitution Principle?

One criticism of the canEqual approach is that it violates the Liskov Substitution Principle which, according to Wikipedia, states:

if S is a subtype of T, then objects of type T may be replaced with objects of type S (i.e., objects of type S may substitute objects of type T) without altering any of the desirable properties of that program (correctness, task performed, etc.)

The authors of the book make the argument that Liskov Substitution Principle is not violated because, although the behavior of subclasses is different than the parent class, the contract itself, that equals return a boolean value, is not changed.

My intuition though is that this is a violation. From the perspective of the Point class, any two Point instances at the same x and y coordinates are equal. A subclass changing that definition has violated the contract in a way that makes a subclass not substitutable with a parent class. Let's look at an example of where substitution is violated.

Consider a distance method in Point that calculates the distance between two Points.

def distance(point:Point):Double =
    if (this == point)
        0
    else
        Math.sqrt(Math.pow(this.x - point.x, 2) + Math.pow(this.y - point.y, 2))

This method is taking a shortcut: if the two points are equal, then just return 0. This shortcut won't work if applied to a ColoredPoint even though it could be if ColoredPoint hadn't overridden the equals method. To me this is an example of a case where substitutability is violated because this method is expecting that it can take this shortcut for any Point instance.

canEqual is a code smell

In general I think that canEqual is a code smell. When you look at just the problem of defining equals it might seem reasonable to introduce canEqual. But think about the impact on other methods. For example, consider if we had extended Point with a new 3DPoint class. Then we would have to completely change the distance method. So not only can 3DPoint instances not equal Point instances, but we can't calculate the distance between 3DPoints and Points. This is clearly a Liskov Substitution Principle violation.

So if you are thinking of using canEqual because you want subclasses to not be equal to parent classes, it seems likely to me that if your subclasses can't equal the parent class then that probably affects other methods too.

As an alternative, consider using composition instead of inheritance. For example, we could define the ColoredPoint like so:

class ColoredPoint2(x: Int, y: Int, val color: Color.Value) {

    val point = new Point(x, y)

    override def hashCode = 41 * point.hashCode + color.hashCode
    override def equals(other: Any) = other match {
        case that: ColoredPoint2 =>
            point.equals(that.point) && this.color == that.color
        case _ => false
    }
}

Additional Resources

Sunday, March 09, 2014

Scala: Understanding the self type annotation and how it relates to the Cake Pattern

I'm currently working through the book Programming in Scala, 2nd ed.. I'm in chapter 29 which is on Modular Programming Using Objects. What is covered in the chapter has also been referred to as the Cake Pattern. The chapter takes the approach of building up to the Cake Pattern through an example that helps illustrate the kind of problems that the Cake Pattern solves. To understand the Cake Pattern you have to understand how the self type annotation works. So I'm going to take a slightly different approach and start with how the self type annotation works since it is so crucial.

What is the self type annotation?

In brief, the self type annotation allows you to split a trait in two but still refer to things on the second trait as though they are defined in the first trait, basically assuming (well, requiring) that they are mixed in together. For example, the following trait simply prints a greeting at a prompt:

trait Prompter1 {

    val prompt = "> "
    val greeting = "Hello world"

    def printGreeting() {
        println(prompt + greeting)
    }
}

val prompter1 = new Object with Prompter1
prompter1.printGreeting

Now suppose you want to split out the greeting part, maybe so that you can provide greetings for different languages. You could do this:

trait Prompter2 {
    // the self type annotation
    this: GreetingProvider =>

    val prompt = "> "

    def printGreeting() {
        println(prompt + greeting)
        // Next line would work too. 'this' can refer to things within this trait or
        // the declared self type trait
        //println(this.prompt + this.greeting)
    }
}

trait GreetingProvider {

    val greeting = "Hello world"
}

val prompter2 = new Prompter2 with GreetingProvider
prompter2.printGreeting

The code in Prompter2 is exactly the same as in Prompter1 except that there is now a self type annotation which says that this can also refer to GreetingProvider. Also the greeting has moved to a separate trait, GreetingProvider.

Do you need the self type annotation if you take care to always mix in the two traits together? No, because Prompter2 won't even compile and you get this error:

<console>:37: error: not found: value greeting
                println(prompt + greeting)

What happens if you tried to create a new Object and only mixed in Prompter2?

scala> new Object with Prompter2
new Object with Prompter2
<console>:19: error: illegal inheritance;
self-type Prompter2 does not conform to Prompter2's selftype Prompter2 with GreetingProvider
            new Object with Prompter2
                            ^

Does order matter? It doesn't appear to. The following works just fine:

val prompter2backwards = new GreetingProvider with Prompter2
prompter2backwards.printGreeting

If you have a self type annotation, does this only refer to that type? Apparently not. We could just as well have defined Prompter2.printGreeting like so:

def printGreeting() {
    // 'this' can refer to things within this trait or the declared self type trait
    println(this.prompt + this.greeting)
}

So the self type annotation appears to extend what this can refer to.

You can also have multiple traits mixed in for your self type annotation:

this: Foo with Bar with Baz =>

So what does this have to do with the Cake Pattern? We'll get there but for now hopefully you can see that the self type annotation allows you to express a dependency within a trait, a dependency that when satisfied allows one trait to make use of code defined in a separate trait. The Cake Pattern is a Dependency Injection technique that uses self type annotations to express and ultimately fulfill the dependencies between traits.

Applying the Cake Pattern to the examples in chapter 29

Chapter 29 doesn't mention the Cake Pattern and doesn't really apply it. It goes as far as introducing the self type annotation and showing how you can use to refactor a large trait and split it up into several smaller traits.

The chapter presents a Database class that abstracts getting Food and Recipe instances from some sort of backing store. The Database is then split up into three traits: one for FoodCategory methods, one for Food methods and one for Recipe methods. It's not all spelled out in the chapter but you would end up with something that looks like the following:

abstract class Food(val name: String) {
    override def toString = name
}

object Apple extends Food("Apple")
object Orange extends Food("Orange")
object Cream extends Food("Cream")
object Sugar extends Food("Sugar")

class Recipe(
    val name: String,
    val ingredients: List[Food],
    val instructions: String
) {
    override def toString = name
}

trait FoodCategories {
    case class FoodCategory(name: String, foods: List[Food])
    def allCategories: List[FoodCategory]
}

trait Foods {
    def allFoods: List[Food]
    def foodNamed(name: String) =
        allFoods.find(f => f.name == name)
}

trait Recipes {
    def allRecipes: List[Recipe]
}

abstract class Database extends FoodCategories with Foods with Recipes {
}

The book then proceeds to implement Foods and Recipes like so:

trait SimpleFoods extends Foods {
    object Pear extends Food("Pear")
    def allFoods = List(Apple, Pear)
    def allCategories = Nil
}

trait SimpleRecipes extends Recipes { // Does not compile
    object FruitSalad extends Recipe(
        "fruit salad",
        List(Apple, Pear),  // Uh oh
        "Mix it all together."
        )
    def allRecipes = List(FruitSalad)
}

The problem is that SimpleRecipes has a dependency on SimpleFoods. As such Pear is no longer in scope. Self type annotation to the rescue!

trait SimpleRecipes extends Recipes {
    this: SimpleFoods =>

    object FruitSalad extends Recipe(
        "fruit salad",
        List(Apple, Pear),
        "Mix it all together."
        )
    def allRecipes = List(FruitSalad)
}

This all works well enough for such a short example, but in the real world, when you have a database abstraction that has hundreds (or thousands?) of methods for retrieving objects from the database, would this suffice? Because the problem becomes that you will end up mixing all of those methods into the local namespace of your Database class. For example, instead of having a Foods.getAll method even this simple example compels us to having the method called Foods.allFoods.

This is where the Cake Pattern comes in. With the Cake Pattern you wrap your trait in a "component" trait. For example, the FoodCategories trait gets wrapped like so:

trait FoodCategoriesComponent {
    val foodCategories: FoodCategories
    trait FoodCategories {
        case class FoodCategory(name: String, foods: List[Food])
        def allCategories: List[FoodCategory]
    }
}

Likewise for Foods and Recipes:

trait FoodsComponent {
    val foods: Foods
    trait Foods {
        def allFoods: List[Food]
        def foodNamed(name: String) =
        allFoods.find(f => f.name == name)
    }
}

trait RecipesComponent {
    val recipes: Recipes
    trait Recipes {
        def allRecipes: List[Recipe]
    }
}

Now I'll create a SimpleFoodsComponent that implements Foods and FoodCategories (the book implements them together too):

// Implement Foods and FoodCategories at once
trait SimpleFoodsComponent extends FoodsComponent with FoodCategoriesComponent {

    val foods = new Object with SimpleFoods
    val foodCategories = foods
    trait SimpleFoods extends Foods with FoodCategories {
        object Pear extends Food("Pear")
        def allFoods = List(Apple, Pear)
        def allCategories = Nil
    }
}

And now time for the SimpleRecipesComponent. This one will depend on SimpleFoodsComponent:

trait SimpleRecipesComponent extends RecipesComponent {
    this: SimpleFoodsComponent =>

    val recipes = new Object with SimpleRecipes
    trait SimpleRecipes extends Recipes {
        object FruitSalad extends Recipe(
            "fruit salad",
            List(Apple, foods.Pear),
            "Mix it all together."
        )
        def allRecipes = List(FruitSalad)
    }
}

But this time the self type annotation brings foods into scope. So instead of bringing the Foods methods into scope we're just bringing the foods reference (which implements Foods) into scope. I like this since it keeps type members in separate namespaces.

Does it make sense to have a Database abstraction any longer? I'm not sure, but, for completeness, here's how I implemented it.

trait DatabaseComponent extends FoodCategoriesComponent
        with FoodsComponent with RecipesComponent {
    val database: Database

    trait Database extends FoodCategories with Foods with Recipes
}

trait SimpleDatabaseComponent extends DatabaseComponent
        with SimpleFoodsComponent with SimpleRecipesComponent {

    val database = new Database with SimpleFoods with SimpleRecipes
}

I haven't yet talked about the other main piece of the chapter's example, the Browser. As a component it would be defined like so:

trait BrowserComponent {
    this: DatabaseComponent =>

    val browser: Browser

    trait Browser {
        def recipesUsing(food: Food)
        def displayCategory(category: database.FoodCategory)
    }
}

And then it could be implemented with a dependency on the SimpleDatabaseComponent:

trait SimpleBrowserComponent extends BrowserComponent {
    this: SimpleDatabaseComponent =>

    val browser = new Object with SimpleBrowser
    trait SimpleBrowser extends Browser {

        def recipesUsing(food: Food) =
            database.allRecipes.filter(recipe =>
                recipe.ingredients.contains(food))

        def displayCategory(category: database.FoodCategory) {
            println(category)
        }
    }
}

If the Browser is the top level component then you could define it more simply, kind of like how the book does it.

For more information:

Sunday, March 02, 2014

Scala: Extractors

I'm currently working through the book Programming in Scala, 2nd ed.. I'm in chapter 26 which is on Extractors.

What are Extractors?

Earlier in the book case classes were introduced. One of the things you can do with case classes is use them in pattern matching expressions. For example, you might have an Email case class to describe an email address:

case class Email(username:String, domain:String)

def isEmail(e: Any):Boolean = e match {
    case Email(_,_) => true
    case _ => false
}

But what if you are dealing with values that aren't case classes? Extractors allow you to basically define arbitrary match expressions without needing to set up case classes.

To understand extractors I find it helpful to understand their inverse, injections, first. Injections take inputs and produce an output of some type, not unlike a case class constructor. Injections have an apply method. So instead of our Email case class above, we could define a somewhat similar injection like so:

object Email {
    def apply(username: String, domain: String) = user + "@" + domain
}

(This is basically the example given in the book.)

An extractor on the other hand works in the reverse direction. In this case it defines an unapply method that would take a candidate string and, if it is an email address, will return the parts which would correspond to the arguments to the apply method:

object Email {
    def apply(username: String, domain: String) = user + "@" + domain
    def unapply(s: String): Option[(String, String)] = {
        val parts = s aplit "@"
        if (parts.length == 2) Some(parts(0), parts(1)) else None
    }
}

An extractor requires the unapply method, the apply method is optional. Also note that the unapply method returns an Option and will return None if the string is not an email address.

For more details see Extractor Objects. For example, you can have an unapply method that returns a Boolean, and there is the unapplySeq method for handling variable number of arguments to an extractor.

When to use extractors vs case classes?

So extractors let you define match expression patterns in much the same way as can be achieved with case classes. When do you use extractors instead of case classes?

The advantage that extractors have over case classes is that case classes expose the implementation type of an API's data model. Using an extractor allows you to hide the implementation type(s) of the API's data model and change it without requiring client code to change as well.

For example, we can use our Email extractor with a non-case class email class like so:

class EmailImpl(val username:String, val domain:String) {

    override def toString: String = username + "@" + domain
}

object Email {

    def unapply(s: String): Option[(String, String)] = {
        val parts = s split "@"
        if (parts.length == 2) Some(parts(0), parts(1)) else None
    }
}

Case classes do have advantages though. They are simpler, can be optimized better by the Scala compiler and with sealed case class hierarchies the compiler can also catch missed cases.

You can always start with case classes and then switch to extractors once there is a need to change the data model's concrete representation type.

An example: Date

Extractors seem like a great way of working with unstructured data and APIs where you don't have control over the source code. As an example consider the following extractor for getting the individual time values out of a java.util.Date instance:

import java.util.{Calendar => JCal, Date => JDate}

object Date {

    def unapply(d: JDate):Option[(Int,Int,Int,Int,Int,Int,Int)] = {
        val cal = JCal.getInstance
        cal.setTime(d)

        Some(cal get JCal.YEAR,
            cal get JCal.MONTH,
            cal get JCal.DAY_OF_MONTH,
            cal get JCal.HOUR_OF_DAY,
            cal get JCal.MINUTE,
            cal get JCal.SECOND,
            cal get JCal.MILLISECOND)
    }
}

This allows you write match expression against java.util.Date instances, like so:

def isFebruary(d: JDate): Boolean = d match {
    case Date(_, 1, _, _, _, _, _) => true
    case _ => false
}

You can also just use them to unpack values:

val Date(year, month, day, hour, minutes, seconds, milliseconds) = new JDate

Now you have a much simpler way to work with Java date instances.

Final thoughts

Extractors are kind of hard to think about because you have to think in terms of turning output backwards into inputs. Also, it's not always obvious what would be useful for unapply to return. When working through the examples and making up some of my own I found it useful to think of how I would implement the injection (i.e., the apply method) first.