The results of trying to scale up in my previous post was so disappointing that I asked Martin Odersky, the creator of Scala, for some advice. He helped me get in touch with the author of Scala's parallel collections, Aleksandar Prokopec. Here is his feedback:
- Always use the scala.testing.Benchmark class to measure performance
- Always use the server VM. Benchmark results will in most cases be entirely different.
- There are no parallel lists - when calling .par on a list, it gets converted to the standard immutable parallel collection, which is ParVector. Transformer methods (such as filter and map) process items in parallel, but construct the final parallel vector sequentially - until vector concatenation is built into the standard library, the construction of the parallel vector will not be parallelized. We are looking into adding concats. In the meantime, I suggest using parallel arrays and comparing them against regular arrays in terms of performance.
- Every invocation of a parallel method requires some sync, so you might want to keep the number of invocations to a minimum. This is particularly the case with transformer methods which produce intermediate collections - thus possibly invoking gc cycles.
... and the modified version of the example:
object util {
val studentCount = 1000000
}
case class Student(gradYear: Int, score: Double)
object StudentExample extends testing.Benchmark {
val students = (for (i <- 1 to util.studentCount)
yield new Student(2010 + (i % 2), math.random)).toArray
def run() {
val gradYear = 2011
students.filter(_.gradYear == gradYear).map(_.score).fold(0.0d)(math.max)
}
}
object StudentExamplePar extends testing.Benchmark {
val students = (for (i <- 1 to util.studentCount)
yield new Student(2010 + (i % 2), math.random)).toArray.par
def run() {
val gradYear = 2011
students.filter(_.gradYear == gradYear).map(_.score).fold(0.0d)(math.max)
}
}
object StudentExampleImp extends testing.Benchmark {
val students = (for (i <- 1 to util.studentCount)
yield new Student(2010 + (i % 2), math.random)).toArray
def run() {
val gradYear = 2011
var maxscore = 0.0d
for (student <- students)
if (student.gradYear == gradYear)
maxscore = math.max(maxscore, student.score)
}
}
... and the results on my dual-core MacBook Pro:
$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExample 10
StudentExample$ 273 38 35 40 36 36 36 36 35 39
$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExamplePar 10
StudentExamplePar$ 495 32 31 31 30 30 30 32 30 31
... and the results on my quad-core iMac:
$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExample 10
StudentExample$ 154 18 17 17 17 17 17 17 17 17
$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExamplePar 10
StudentExamplePar$ 222 10 10 10 9 10 10 10 10 10
A performance gain can now be seen, but both Aleksandar and I agreed that it wasn't the expected speedup. As a reference I also ran the imperative example, which still was by far the most efficient. The results on the dual-core was:
$ java -server -cp /scala-2.9.1.final/lib/scala-library.jar:. StudentExampleImp 10
StudentExampleImp$ 24 7 13 12 10 10 11 11 13 13
No comments:
Post a Comment