October 18, 2011

Scaling up (enhanced)

The results of trying to scale up in my previous post was so disappointing that I asked Martin Odersky, the creator of Scala, for some advice. He helped me get in touch with the author of Scala's parallel collections, Aleksandar Prokopec. Here is his feedback:
  1. Always use the scala.testing.Benchmark class to measure performance
  2. Always use the server VM. Benchmark results will in most cases be entirely different.
  3. There are no parallel lists - when calling .par on a list, it gets converted to the standard immutable parallel collection, which is ParVector. Transformer methods (such as filter and map) process items in parallel, but construct the final parallel vector sequentially - until vector concatenation is built into the standard library, the construction of the parallel vector will not be parallelized. We are looking into adding concats. In the meantime, I suggest using parallel arrays and comparing them against regular arrays in terms of performance.
  4. Every invocation of a parallel method requires some sync, so you might want to keep the number of invocations to a minimum. This is particularly the case with transformer methods which produce intermediate collections - thus possibly invoking gc cycles.
... and the modified version of the example:
object util {
  val studentCount = 1000000
}

case class Student(gradYear: Int, score: Double)

object StudentExample extends testing.Benchmark {
  val students = (for (i <- 1 to util.studentCount) 
    yield new Student(2010 + (i % 2), math.random)).toArray

  def run() {
    val gradYear = 2011
    students.filter(_.gradYear == gradYear).map(_.score).fold(0.0d)(math.max)
  }
}

object StudentExamplePar extends testing.Benchmark {
  val students = (for (i <- 1 to util.studentCount)
    yield new Student(2010 + (i % 2), math.random)).toArray.par

  def run() {
    val gradYear = 2011
    students.filter(_.gradYear == gradYear).map(_.score).fold(0.0d)(math.max)
  }
}

object StudentExampleImp extends testing.Benchmark {
  val students = (for (i <- 1 to util.studentCount)
    yield new Student(2010 + (i % 2), math.random)).toArray

  def run() {
    val gradYear = 2011
    var maxscore = 0.0d
    for (student <- students)
      if (student.gradYear == gradYear)
        maxscore = math.max(maxscore, student.score)
  }
}
... and the results on my dual-core MacBook Pro:
$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExample 10
StudentExample$ 273 38 35 40 36 36 36 36 35 39

$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExamplePar 10
StudentExamplePar$ 495 32 31 31 30 30 30 32 30 31
... and the results on my quad-core iMac:
$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExample 10
StudentExample$ 154 18 17 17 17 17 17 17 17 17

$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExamplePar 10
StudentExamplePar$ 222 10 10 10 9 10 10 10 10 10
A performance gain can now be seen, but both Aleksandar and I agreed that it wasn't the expected speedup. As a reference I also ran the imperative example, which still was by far the most efficient. The results on the dual-core was:
$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExampleImp 10
StudentExampleImp$ 24 7 13 12 10 10 11 11 13 13

No comments:

Post a Comment