Performance Tuning and Optimization | Scala Programming Guide


JVM Performance Fundamentals
JIT Compilation and Warm-up
The JVM's Just-In-Time compiler is what makes Java (and Scala) fast in production, but it requires warm-up time to achieve peak performance. Initially, the JVM interprets bytecode—which is slow. As code is executed repeatedly (typically after 10,000 calls), frequently-executed methods become "hot" and the JIT compiler compiles them to native machine code, applying sophisticated optimizations. This has huge implications: benchmark results from cold starts are meaningless, performance improves over time as the application runs, and production workloads that reach steady state are far faster than microbenchmarks might suggest. For latency-sensitive applications, warm-up can be critical: requests that arrive after the application has been running for hours will be faster than those that arrive right after startup. Understanding JIT is essential for writing performant code and interpreting performance measurements correctly.
// JIT needs warm-up time to optimize
// First calls use interpreter (slow)
// After ~10,000 calls, methods become "hot" and JIT compiles them
def fibonacci(n: Int): Long =
  if (n <= 1) n else fibonacci(n - 1) + fibonacci(n - 2)
// Warm-up phase
(1 to 10000).foreach(_ => fibonacci(30)) // Slow initially
// Now JIT has optimized this
val optimized = fibonacci(30) // Much faster
// Implication: Always warm up before benchmarking!
Garbage Collection
Garbage collection is both a blessing and a challenge: it frees you from manual memory management but introduces pauses that can destroy latency-sensitive applications. Understanding JVM garbage collection helps you minimize these pauses and design applications that play well with the GC. The JVM heap is divided into generations: young generation (where most objects are created and die quickly) and old generation (where long-lived objects reside). Young GC is fast; full GC is slow. By minimizing object allocation and reducing the number of objects that survive the young generation, you can dramatically reduce GC pause times. For high-frequency trading, real-time systems, or low-latency services, understanding and tuning GC becomes critical. This section shows how to measure GC behavior, understand what's happening, and design code that cooperates with the garbage collector rather than fighting it.
// Monitor GC with JVM flags:
// -XX:+PrintGCDetails -XX:+PrintGCDateStamps
// -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
// Young generation (fast): Most objects die here
// Old generation (slow): Long-lived objects
// Creating too many objects triggers Young GC
def allocateHeavily(): Unit = {
  val lists = (1 to 1000000).map { i =>
    List.range(1, 100) // Each iteration allocates a new 99-element list
  }
  // Young GC pauses here as temporary lists fill Eden
}
// Better: hoist the allocation out of the loop
def allocateLightly(): Unit = {
  val list = List.range(1, 100)
  (1 to 1000000).foreach { _ =>
    // The source list is shared; map still allocates its result,
    // but the repeated List.range construction is gone
    list.map(_ * 2)
  }
}
// Minimize allocation in latency-critical code paths
import java.io.InputStream

class HighThroughputProcessor {
  private val buffer = Array.ofDim[Byte](8192) // Reused buffer

  def process(stream: InputStream): Unit = {
    var bytesRead = 0
    while ({
      bytesRead = stream.read(buffer)
      bytesRead != -1 // read returns -1 at end of stream
    }) {
      // Process buffer without allocating new arrays
      processBuffer(buffer, bytesRead)
    }
  }

  private def processBuffer(data: Array[Byte], length: Int): Unit = {
    // ... process without allocating
  }
}
Benchmarking with JMH
Java Microbenchmark Harness provides accurate performance measurement. Understanding JMH is essential because naive benchmarking leads to completely misleading results: HotSpot optimizations, garbage collection, and JVM warmup mean that code timed once at the start of a program runs vastly differently than code timed after millions of executions.
JMH handles the hard problems: ensuring adequate warmup so HotSpot optimizations kick in, running multiple forks to reduce interference between benchmarks, controlling garbage collection, preventing dead code elimination, and providing statistical significance testing. The annotations are straightforward—@Benchmark marks a method as a benchmark, @State controls object lifecycle, @Param lets you test multiple configurations—but understanding what they control is crucial for valid measurements.
The profilers integrated with JMH let you see why a benchmark behaves the way it does: CPU time vs. wall-clock time tells you about GC pressure, allocation flamegraphs show unexpected object creation, lock contention profiles reveal synchronization bottlenecks. Running jmh:run in SBT is convenient, but for detailed analysis you'll want to export results and analyze them with tools like JMH Visualizer. Remember that benchmarks measure specific scenarios—always verify that your benchmark matches your real-world workload.
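As a quick reference, JMH's built-in profilers and its result-export flags can be passed straight through sbt-jmh; the benchmark name pattern below is illustrative:

```shell
# Built-in JMH profilers, passed through sbt-jmh
sbt "jmh:run -prof gc .*StringProcessing.*"     # GC and allocation-rate statistics
sbt "jmh:run -prof stack .*StringProcessing.*"  # simple stack-sampling profile

# Export results as JSON for analysis in JMH Visualizer
sbt "jmh:run -rf json -rff results.json .*StringProcessing.*"
```

`-rf` selects the result format and `-rff` the output file; the JSON output loads directly into JMH Visualizer.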
// project/plugins.sbt
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.4.7")
// build.sbt
enablePlugins(JmhPlugin)
// src/jmh/scala/benchmarks/StringProcessingBenchmark.scala
package benchmarks
import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole
import java.util.concurrent.TimeUnit
// Benchmark class
@State(Scope.Benchmark) // State instance shared across benchmark threads
// Warm-up: 5 iterations of 10 seconds each
@Warmup(iterations = 5, time = 10, timeUnit = TimeUnit.SECONDS)
// Measurement: 5 iterations of 10 seconds each
@Measurement(iterations = 5, time = 10, timeUnit = TimeUnit.SECONDS)
// Run 4 separate JVM forks to reduce cross-benchmark interference
@Fork(value = 4)
class StringProcessingBenchmark {

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime))
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  def stringConcatenation(bh: Blackhole): Unit = {
    var result = ""
    (1 to 100).foreach { i =>
      result += i.toString
    }
    bh.consume(result) // Blackhole prevents dead code elimination
  }

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime))
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  def stringBuilderConcatenation(bh: Blackhole): Unit = {
    val sb = new StringBuilder()
    (1 to 100).foreach { i =>
      sb.append(i.toString)
    }
    bh.consume(sb.toString())
  }

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime))
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  def listAlloc(bh: Blackhole): Unit = {
    val list = (1 to 1000).toList
    bh.consume(list)
  }

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime))
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  def vectorAlloc(bh: Blackhole): Unit = {
    val vector = (1 to 1000).toVector
    bh.consume(vector)
  }

  // Parameter-based benchmarks
  @Param(Array("100", "1000", "10000"))
  var size: Int = _

  @Benchmark
  def mapOperation(bh: Blackhole): Unit = {
    val result = (1 to size).map(_ * 2).sum
    bh.consume(result)
  }
}
// Run with: sbt "jmh:run"
// Result: stringConcatenation is ~100x slower than StringBuilder!
// Average time for stringConcatenation: ~2.5 microseconds
// Average time for stringBuilderConcatenation: ~0.025 microseconds
Value Classes to Avoid Boxing
Value classes provide zero-cost wrappers:
// Problem: wrapping domain values in ordinary classes costs an allocation
// (and generic containers like List[Int] box primitives as java.lang.Integer)
def processNumbers(nums: List[Int]): Int = {
  nums.map(_ * 2).sum // Each Int is boxed inside the List
}
// Solution for wrapper types: a value class
class UserId(val value: String) extends AnyVal {
  // The wrapper is compiled away - no object allocation at runtime!
  def formatted: String = s"USR-${value.toUpperCase}"
}
val id = new UserId("abc123") // No wrapper allocation!
id.formatted // Direct method call, no indirection
// Value class benefits and limits:
// - No memory overhead (compiles down to the underlying String/primitive)
// - No boxing/unboxing in most call sites
// - Boxes again when used as a generic type argument or stored in collections
// - Only a single val parameter; cannot be subclassed
// Example: High-performance domain types
class OrderId(val value: String) extends AnyVal
class CustomerId(val value: String) extends AnyVal
class Money(val cents: Long) extends AnyVal {
  def +(other: Money): Money = new Money(cents + other.cents)
  def toDecimal: BigDecimal = BigDecimal(cents) / 100
}
// These are nearly free abstractions - no wrapper object at runtime
val order1 = new OrderId("ORD-001")
val order2 = new OrderId("ORD-002")
val amount = new Money(9999) // $99.99 in cents
// Cannot accidentally mix types - compile-time safety
// amount + order1 // Compile error!
Specialization for Generics
The @specialized annotation generates specialized versions for primitive types:
// Without specialization: boxing overhead
class Container[T](value: T) {
  def getValue: T = value
}
val intContainer = new Container(42)
val x = intContainer.getValue // Boxing: Int -> Integer -> Int
// With specialization: separate bytecode is generated per primitive.
// Note: @specialized annotates the type parameter, not the class itself.
class FastContainer[@specialized T](value: T) {
  def getValue: T = value
}
val intContainer2 = new FastContainer(42)
val y = intContainer2.getValue // No boxing!
// @specialized works on method type parameters too
class Processor {
  def process[@specialized T](items: Array[T]): Unit = {
    var i = 0
    while (i < items.length) {
      println(items(i))
      i += 1
    }
  }
}
// Note: @specialized generates separate bytecode for:
// - Int, Long, Double, Float, Boolean, Byte, Char, Short, Unit
// This increases compiled code size, so use judiciously
Tail Recursion for Stack Safety
@tailrec verifies that recursion is optimized:
import scala.annotation.tailrec

// NOT tail recursive - each element adds a stack frame
def sum(nums: List[Int]): Long = nums match {
  case Nil => 0
  case head :: tail => head + sum(tail) // Addition happens AFTER the recursive call
}
sum((1 to 1000000).toList) // StackOverflowError on a large enough list!
// Tail recursive - the compiler rewrites it into a loop
@tailrec
def sumTailRecursive(nums: List[Int], accumulator: Long = 0): Long = {
  nums match {
    case Nil => accumulator
    case head :: tail => sumTailRecursive(tail, accumulator + head)
  }
}
sumTailRecursive((1 to 1000000).toList) // No stack overflow!
// @tailrec catches accidental non-tail-recursion at compile time
@tailrec
def badRecursion(n: Int): Int = {
  if (n <= 0) 0
  else badRecursion(n - 1) + 1 // ERROR: could not optimize @tailrec annotated method
}
// Practical example: deep tree traversal
sealed trait JsonValue
case class JsonObject(pairs: List[(String, JsonValue)]) extends JsonValue
case class JsonArray(items: List[JsonValue]) extends JsonValue
case class JsonString(value: String) extends JsonValue
case class JsonNumber(value: Double) extends JsonValue
case object JsonNull extends JsonValue
// Find all string values in a JSON structure.
// Note: this traversal is NOT tail recursive (the recursive calls happen
// inside foldLeft), so the compiler would reject @tailrec here.
// For very deep structures, use an explicit stack instead.
def findStrings(
  node: JsonValue,
  accumulator: List[String] = List()
): List[String] = {
  node match {
    case JsonString(s) =>
      s :: accumulator
    case JsonObject(pairs) =>
      pairs.map(_._2).foldLeft(accumulator) { (acc, value) =>
        findStrings(value, acc)
      }
    case JsonArray(items) =>
      items.foldLeft(accumulator) { (acc, value) =>
        findStrings(value, acc)
      }
    case _ => accumulator
  }
}
Collection Performance
Different collections have different performance characteristics:
// Scenario 1: Frequent random access
val array = Array(1, 2, 3, 4, 5) // O(1) access - BEST
val list = List(1, 2, 3, 4, 5) // O(n) access - BAD
val vector = Vector(1, 2, 3, 4, 5) // O(log n) access - OK
// If you need random access, use Array or Vector
def sum(numbers: Vector[Int]): Int = {
  var total = 0
  var i = 0
  while (i < numbers.length) {
    total += numbers(i) // Fast random access
    i += 1
  }
  total
}
// Scenario 2: Frequent prepending
val prepended = 0 :: list // O(1) for List
val prepended2 = 0 +: vector // O(n) for Vector - SLOW
// Use List for prepending
// Scenario 3: Iteration only
val iterated = array.map(_ * 2) // O(n) - fine
val iterated2 = list.map(_ * 2) // O(n) - fine
val iterated3 = vector.map(_ * 2) // O(n) - fine
// All are equivalent for iteration
// Scenario 4: Building collection
val builder1 = scala.collection.mutable.ListBuffer[Int]() // Efficient for building a List
builder1 += 1
builder1 += 2
val result1 = builder1.toList
val builder2 = scala.collection.mutable.ArrayBuffer[Int]() // Efficient for building an Array
builder2 += 1
builder2 += 2
val result2 = builder2.toArray
// Scenario 5: Large immutable collection creation
// Use Vector for structural sharing with reasonable performance
// Use List only if you primarily access head
// Use Array only if you need mutable access or zero-copy
// Rough orders of magnitude (vary by JVM, collection size, and hardware):
// Access: Array ~10ns, Vector ~200ns, List (middle element) ~5000ns
// Append: Vector ~20ns, List O(n) copy, Array n/a (fixed size)
// Prepend: List ~10ns, Vector ~500ns
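The access-cost differences above can be seen with a rough, unscientific timing sketch; for real numbers use JMH as described earlier, since this ignores warm-up and GC effects:

```scala
// Rough timing sketch comparing indexed access (illustrative only - use JMH for real numbers)
object AccessTiming {
  def time[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    println(f"$label%-8s ${(System.nanoTime() - start) / 1e6}%.2f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val n = 10000
    val arr = (0 until n).toArray
    val vec = (0 until n).toVector
    val lst = (0 until n).toList

    // Sum by index: O(1) per access for Array, O(log n) for Vector, O(i) for List
    val a = time("Array")  { var s = 0L; var i = 0; while (i < n) { s += arr(i); i += 1 }; s }
    val v = time("Vector") { var s = 0L; var i = 0; while (i < n) { s += vec(i); i += 1 }; s }
    val l = time("List")   { var s = 0L; var i = 0; while (i < n) { s += lst(i); i += 1 }; s }
    assert(a == v && v == l) // All compute the same sum; only the cost differs
  }
}
```

Expect the List run to be dramatically slower: indexed access on a linked list walks from the head every time.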
Profiling Tools
Java Flight Recorder (JFR)
# Start JVM with JFR enabled
# (-XX:+UnlockCommercialFeatures is only needed on Oracle JDK 8;
#  JFR is free and built in from JDK 11 onward)
java -XX:StartFlightRecording=duration=60s,filename=recording.jfr \
  -jar myapp.jar
# Analyze recording
jmc # Open Java Mission Control GUI with recording
# Command-line analysis
jcmd <pid> JFR.dump filename=recording.jfr
jfr print recording.jfr
async-profiler for CPU Profiling
# Install: https://github.com/jvm-profiling-tools/async-profiler
# Compile Scala with -g flag for debug info
# Profile CPU usage (60 seconds)
./profiler.sh -d 60 -e cpu -f flamegraph.html <pid>
# Profile allocation
./profiler.sh -d 60 -e alloc -f flamegraph.html <pid>
# Profile lock contention
./profiler.sh -d 60 -e lock -f flamegraph.html <pid>
# Generate a JFR-format recording for further analysis
./profiler.sh -d 60 -e cpu -o jfr -f profile.jfr <pid>
jfr print profile.jfr
Common Performance Pitfalls
1. String Concatenation in Loops
// SLOW: O(n²) due to new String creation per iteration
def slowConcat(words: List[String]): String = {
  var result = ""
  words.foreach { word =>
    result += word + ", "
  }
  result
}
// FAST: O(n) with a single growable buffer
def fastConcat(words: List[String]): String = {
  val sb = new StringBuilder
  words.foreach { word =>
    sb.append(word).append(", ")
  }
  sb.toString()
}
// Performance: slowConcat on 10000 items takes ~500ms
// fastConcat on 10000 items takes ~1ms
2. Excessive List Operations
// SLOW: Multiple traversals
val result = list
  .filter(_ > 0)
  .map(_ * 2)
  .filter(_ < 100)
  .map(_ + 1)
// Each operation traverses the list and allocates an intermediate list

// FAST: Single traversal
val result2 = list.collect {
  case x if x > 0 && x * 2 < 100 => x * 2 + 1
}

// Or use an iterator (or .view) to fuse the steps lazily
val result3 = list.iterator
  .filter(_ > 0)
  .map(_ * 2)
  .filter(_ < 100)
  .map(_ + 1)
  .toList
3. Boxing/Unboxing Overhead
// SLOW: List[Int] boxes each integer
val boxed: List[Int] = (1 to 1000000).toList
val boxedSum = boxed.sum // Unboxes each element

// FAST: Array[Int] is a primitive int[] on the JVM
val numbers: Array[Int] = (1 to 1000000).toArray
var sum = 0
var i = 0
while (i < numbers.length) {
  sum += numbers(i)
  i += 1
}
// Rough performance: List.sum ~50ms vs Array loop ~2ms (varies by JVM)
4. Repeated Collection Conversions
// SLOW: Multiple conversions
def process(list: List[Int]): Set[Int] = {
  val array = list.toArray // Unnecessary intermediate array
  array.toSet
}
// FAST: Direct conversion
def process(list: List[Int]): Set[Int] = {
  list.toSet
}
// SLOW: Converting back and forth
val list = mySet.toList
val filtered = list.filter(_ > 10).toSet
// FAST: Stay in one type
val filtered = mySet.filter(_ > 10)
5. Unnecessary Option/Either Wrapping
// SLOW-ish: each hit allocates a Some (None is a shared singleton)
def find(items: List[Int]): Option[Int] = {
  items.find(_ > 0)
}
// FASTER: Direct return in hot loops
def find(items: List[Int]): Int = {
  for (item <- items) {
    if (item > 0) return item
  }
  -1 // sentinel value
}
// Note: Modern Scala with primitive specialization can make this moot,
// but the general principle of minimizing allocations remains.
6. Lock Contention
// SLOW: Holding lock during I/O
class SlowService {
  private val lock = new Object()
  private var cache = Map[String, String]()

  def lookup(key: String): String = lock.synchronized {
    if (cache.contains(key)) {
      cache(key)
    } else {
      val result = expensiveIoOperation() // Long I/O while holding the lock!
      cache = cache + (key -> result)
      result
    }
  }
}
// FAST: Release lock before I/O
class FastService {
  private val lock = new Object()
  private var cache = Map[String, String]()

  def lookup(key: String): String = {
    // Check cache while holding the lock only briefly
    val cached = lock.synchronized { cache.get(key) }
    cached match {
      case Some(value) => value
      case None =>
        val result = expensiveIoOperation() // No lock held during I/O
        lock.synchronized {
          cache = cache + (key -> result) // Trade-off: concurrent misses may compute twice
        }
        result
    }
  }
}
7. Creating Objects in Hot Paths
// SLOW: New parser object per call
def processData(data: String): Result = {
  val parser = new JsonParser() // Allocates on every call
  parser.parse(data)
}
// FAST: Reuse the object
class Processor {
  private val parser = new JsonParser()
  def processData(data: String): Result = {
    parser.parse(data) // Reuses the same parser
  }
}
// SLOWER: Re-evaluating loop-invariant work inside the closure
// (computeMultiplier/expensiveComputation are placeholder functions)
(1 to 1000000).map { i =>
  val multiplier = computeMultiplier() // Re-evaluated on every iteration
  expensiveComputation(i) * multiplier
}
// FAST: Hoist loop-invariant work out of the closure
val multiplier = computeMultiplier() // Evaluated once
(1 to 1000000).map { i =>
  expensiveComputation(i) * multiplier // Closure captures the precomputed value
}
APPENDIX A: Scala 2 vs Scala 3 Migration Cheatsheet
Scala 3 (Dotty) modernized the language with significant improvements. This appendix provides a side-by-side migration guide.
Syntax Changes
| Feature | Scala 2 | Scala 3 | Notes |
|---|---|---|---|
| Implicit parameters | def foo(implicit x: Int) | def foo(using x: Int) | Clearer intent, no ambiguity |
| Implicit conversions | implicit def strToInt(s: String): Int = s.toInt | given Conversion[String, Int] = _.toInt | More explicit, controlled |
| Extension methods | implicit class StringOps(s: String) | extension (s: String) def foo: String = ... | Built-in syntax, cleaner |
| Context bounds | def foo[T: Ordering] | def foo[T: Ordering] (same, but powered by given) | Same syntax, new semantics |
| Union types | Manual sealed trait hierarchy | Int \| String \| Boolean | Direct type syntax |
| Intersection types | Manual trait mixing | Printable & Serializable | Direct type syntax |
| Tuple syntax | (1, "a", true) | (1, "a", true) (same, better inference) | More consistent |
| Named arguments | foo(x = 1, y = 2) | foo(x = 1, y = 2) (same) | Works with more cases |
| Case classes | case class User(name: String, age: Int) | case class User(name: String, age: Int) (same) | Can now also be enums |
| Pattern matching | x match { case _ => ... } | Same syntax | Alternatives with \| unchanged |
| Match expressions | val x = y match { ... } | Same, more composable with GADTs | Type-level benefits |
| For comprehensions | for (x <- xs) yield f(x) | Same | Works with more types |
| Indentation-based syntax | Braces required | Optional braces | Pythonic alternative |
| Trailing commas | Only in edge cases | Generally allowed | More flexible formatting |
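Several of the rows above can be seen together in one short Scala 3 sketch; the names `describe`, `audit`, and `demo` are illustrative:

```scala
// Scala 3 sketch: union types, intersection types, optional braces

trait Printable:
  def print(): Unit

trait Loggable:
  def log(): Unit

// Union type: the argument is an Int OR a String
def describe(x: Int | String): String = x match
  case i: Int    => s"number $i"
  case s: String => s"text $s"

// Intersection type: the argument must be BOTH Printable and Loggable
def audit(x: Printable & Loggable): Unit =
  x.print()
  x.log()

@main def demo(): Unit =
  println(describe(42))      // number 42
  println(describe("hello")) // text hello
```

Note the brace-free bodies: in Scala 3 indentation delimits blocks, so `match` cases and method bodies need no `{}`.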
Implicit to Given Migration
The most significant change - implicits are now "givens":
// Scala 2: Old implicit
implicit val intOrdering: Ordering[Int] = Ordering.Int
implicit def listOrdering[T: Ordering]: Ordering[List[T]] =
  Ordering.by(_.headOption) // simplistic: orders lists by their first element
def sort[T](xs: List[T])(implicit ord: Ordering[T]): List[T] = {
xs.sorted(ord)
}
val sorted = sort(List(3, 1, 2)) // Implicit passed automatically
// Scala 3: New given syntax
given Ordering[Int] = Ordering.Int
given [T: Ordering]: Ordering[List[T]] with {
  def compare(x: List[T], y: List[T]): Int =
    summon[Ordering[Option[T]]].compare(x.headOption, y.headOption) // simplistic: heads only
}
def sort[T](xs: List[T])(using ord: Ordering[T]): List[T] = {
xs.sorted(ord)
}
val sorted = sort(List(3, 1, 2)) // Using passed automatically
Migration strategy:
- Replace `implicit` with `given` at definition sites
- Replace `implicit x: T` with `using x: T` in parameter lists
- Replace `implicitly[T]` with `summon[T]` for explicit summoning
- Replace `implicit class` with `extension`
// Scala 2
implicit class RichString(s: String) {
def shout: String = s.toUpperCase + "!"
}
"hello".shout // "HELLO!"
// Scala 3
extension (s: String)
  def shout: String = s.toUpperCase + "!"
"hello".shout // "HELLO!"
New Features in Scala 3
Enums (Algebraic Data Types)
Scala 3 introduces built-in enum syntax that makes defining algebraic data types far more concise than the sealed trait pattern. Enums capture the intent directly: you're defining a type with a fixed set of possible values. The compiler understands enum structure and can provide better error messages and exhaustiveness checking. Enums are particularly powerful for domain modeling where you want to ensure all cases are explicitly handled. This improves code readability and reduces boilerplate compared to Scala 2's sealed trait approach.
// Scala 2: Verbose sealed trait pattern
sealed trait Color
object Color {
case object Red extends Color
case object Green extends Color
case object Blue extends Color
case class Custom(hex: String) extends Color
}
// Scala 3: Enum syntax
enum Color:
  case Red
  case Green
  case Blue
  case Custom(hex: String)
// Enums with methods
enum Bool:
  case True
  case False
  def negate: Bool = this match
    case True => False
    case False => True
// Matching
val color: Color = Color.Custom("#ff8800")
color match
  case Color.Red => "red"
  case Color.Green => "green"
  case Color.Blue => "blue"
  case Color.Custom(hex) => hex
Opaque Types
Opaque types let you create distinct types that compile away to their underlying type at runtime, giving you type safety without performance overhead. This is perfect for creating newtype-like distinctions: you want UserId to be distinct from Int at compile time, but at runtime it's just an Int with no wrapper object. Unlike case class wrappers, opaque types have zero runtime cost. They're ideal for strongly-typed domain modeling, preventing errors like passing a UserId where an OrderId is expected, while maintaining the performance characteristics of the underlying type.
// Scala 3
opaque type UserId = String

object UserId:
  def apply(value: String): UserId = value

def getUserName(id: UserId): String = ???
// At compile time: UserId is a distinct type
val id1: UserId = UserId("user-123")
val id2: UserId = UserId("user-456")
// id1 + id2 // ERROR: + is not a member of UserId
// At runtime: UserId is erased to String (zero-cost abstraction!)
Structural Types
Structural types let you define interfaces based on structure (duck typing) rather than explicit inheritance, enabling ad-hoc type compatibility. If a type has the required methods, it's compatible—without needing explicit declaration. This is useful for integrating external libraries or writing generic code that doesn't assume a specific type hierarchy. However, use structural types sparingly: they sacrifice some type safety and clarity compared to explicit trait-based interfaces. They're best reserved for situations where you truly need duck typing or must work with unrelated types that have similar interfaces.
// Scala 3
import scala.reflect.Selectable.reflectiveSelectable // required for structural calls

def quack(x: { def quack(): String }): String =
  x.quack() // Works on any type with a quack() method (invoked reflectively)

class Duck:
  def quack(): String = "Quack!"

class FakeDuck:
  def quack(): String = "Fake quack!"

quack(Duck())     // Works
quack(FakeDuck()) // Works
Inlined Code
The inline modifier tells the compiler to expand a function's body at every call site, eliminating the function call overhead and enabling compile-time metaprogramming. This is useful for performance-critical code where function call overhead matters, and for generating code at compile time based on type information. Inlining also allows better optimization because the compiler can see the surrounding context. However, it increases code size (binary bloat), so use it judiciously. Scala 3 makes inlining safer and more predictable than C++'s inline hints, and it's the foundation for compile-time code generation and macro-like functionality.
// Scala 3
inline def powerOf2(inline n: Int): Int =
  inline n match
    case 1 => 2
    case 2 => 4
    case 3 => 8
// Compiler inlines and reduces the match at compile time
val x = powerOf2(1 + 1) // Becomes: val x = 4
// (a non-constant argument would be a compile-time error here)

// Conditional compilation: with an inline condition the compiler
// removes the branch entirely when the flag is false
inline val debugEnabled = true

inline def debug(msg: String): Unit =
  inline if debugEnabled then println(s"DEBUG: $msg")
Match Types
Match types enable type-level pattern matching, allowing you to compute types based on other types at compile time. This is advanced territory: you define rules that transform input types into output types. Match types are useful for building type-safe abstractions that adapt their behavior based on input types, such as libraries that provide different APIs for different container types. They're particularly valuable in functional programming libraries where the API must adapt based on the concrete monad or functor being used. This is an advanced feature that most developers don't need, but it enables incredibly expressive type-safe abstractions.
// Scala 3
type Elem[T] = T match
  case String => Char
  case Array[t] => t
  case List[t] => t
  case AnyVal => T // fallback for primitive types

val s: Elem[String] = 'a' // Char
val a: Elem[Array[Int]] = 0 // Int
val l: Elem[List[String]] = "hi" // String
val other: Elem[Int] = 42 // Int
Scala 2 Features Removed or Changed in Scala 3
DelayedInit Removed
DelayedInit is removed, so Scala 2's delayed constructor side effects no longer work:
// Scala 2
class App extends DelayedInit {
  def delayedInit(body: => Unit): Unit = {
    println("Starting")
    body
  }
  println("Main code") // Executes via delayedInit
}
// Scala 3: Use @main annotation
@main def hello(name: String): Unit =
println(s"Hello, $name!")
Procedure Syntax Removed
Scala 2 allowed omitting Unit return type:
// Scala 2
def foo { println("bar") } // Returns Unit
// Scala 3: Must be explicit
def foo: Unit = println("bar")
Automatic Tupling of Arguments Restricted
Tuple patterns such as case (a, b) work unchanged in Scala 3. What changed is automatic tupling of method arguments:
// Scala 2 and Scala 3: tuple patterns behave identically
val pair = (1, "a")
pair match {
  case (a, b) => println(a) // Unpacks the tuple in both versions
}
// Scala 2 only: auto-tupling of arguments
def show(t: (Int, String)): Unit = println(t)
// show(1, "a") // Scala 2 adapted this to show((1, "a")); Scala 3 rejects it
show((1, "a")) // Works in both
View Bounds Deprecated
// Scala 2 (deprecated)
def foo[T <% Int](x: T): Int = x // Uses implicit conversion
// Scala 3: Use implicit conversion instead
def foo[T](x: T)(using Conversion[T, Int]): Int = ???
Migration Tools
Scalafix
Automated code migrations:
# Scalafix applies rule-based rewrites (requires the sbt-scalafix plugin)
sbt "scalafix ProcedureSyntax"   # rewrites deprecated procedure syntax
sbt "scalafix RemoveUnused"      # requires -Wunused compiler flags
# Formatting is a separate concern: configure scalafmt with a Scala 3 dialect
# (.scalafmt.conf: runner.dialect = scala3)
Manual Migration Checklist
- Update `scalaVersion` in `build.sbt` to 3.x
- Replace `implicit` definitions with `given` or `extension`
- Replace `implicit` parameters with `using`
- Convert sealed trait + case objects to `enum`
- Replace `implicit class` with `extension`
- Replace `implicitly[T]` with `summon[T]`
- Update dependent libraries to Scala 3 versions
- Test with the `-source:3.0-migration` flag for warnings
- Check for Java interop issues
Common Migration Issues
// Issue 1: Implicit resolution order changed
// Scala 2 priority: local scope, imports, companion objects
// Scala 3: More systematic (lexical scope first)
// Issue 2: Anonymous function syntax
// Scala 2: (x: Int) => x + 1
// Scala 3: (x: Int) => x + 1 (same)
// But: x => x + 1 (only works in limited contexts now)
// Issue 3: Wildcard imports
// Scala 2: import foo._
// Scala 3: import foo.* (both work, * preferred)
// Issue 4: Function type syntax
// Scala 2: Function2[Int, Int, Int]
// Scala 3: (Int, Int) => Int (preferred, clearer)
APPENDIX B: Common Pitfalls and How to Avoid Them
1. Null Pointer Exceptions from Uninitialized Variables
The Problem:
// DANGEROUS: var without initialization
var config: AppConfig = _ // _ means null
val result = config.getDatabaseUrl() // NullPointerException!
Why It Happens: Scala allows var with uninitialized values (defaults to null for reference types), but doesn't track initialization, leading to silent nulls.
The Fix:
// Option 1: Always initialize
var config: AppConfig = loadConfigOrDefault()
// Option 2: Use Option (better)
var config: Option[AppConfig] = None
config = Some(loadConfig())
// Option 3: Use lazy val (best for single initialization)
lazy val config: AppConfig = loadConfig()
// Option 4: Use Try for error handling
val config: Try[AppConfig] = Try(loadConfig())
2. Incorrect Pattern Matching with Variables
The Problem:
// CONFUSING: case x binds a NEW x to the matched value; it shadows the outer x
val x = 10
val y = 5 match {
  case x => x + 1 // This x is 5, not the outer 10; y is 6
}
println(x) // Still 10, but easy to misread!
// WRONG: using a lowercase variable in a pattern thinking it compares
val threshold = 10
List(5, 15, 8, 12) match {
  case List(a, threshold, c, d) => // threshold binds to 15, it is NOT compared!
    println(s"$a $threshold $c $d") // Matches unconditionally
}
Why It Happens: Variables in patterns always bind to values, not compare. To compare values, use backticks.
The Fix:
// Use backticks in a PATTERN to compare against an existing value
val threshold = 10
List(5, 15, 8, 12).foreach {
  case `threshold` => println("exactly the threshold") // compares with outer threshold
  case x if x > threshold => println(s"$x exceeds threshold") // guards use the variable directly
  case x => println(s"$x is below threshold")
}
// For pattern matching with a guard
val x = 10
val result = 5 match {
  case n if n == x => n + 1 // Guard compares against the outer x
  case n => n // n binds the value
}
3. Modifying Collections While Iterating
The Problem:
// WRONG: Undefined behavior - may throw or silently misbehave
val list = scala.collection.mutable.ListBuffer(1, 2, 3, 4, 5)
list.foreach { item =>
  if (item % 2 == 0) {
    list -= item // Mutating while iterating: results are unspecified
  }
}
// WRONG: Silent incorrect behavior
val set = scala.collection.mutable.Set(1, 2, 3, 4, 5)
for (item <- set) {
if (item % 2 == 0) {
set -= item // Unpredictable iteration
}
}
Why It Happens: Modifying underlying collection during iteration violates iterator contracts.
The Fix:
// Option 1: Filter into new collection
val list = scala.collection.mutable.ListBuffer(1, 2, 3, 4, 5)
val odds = list.filter(_ % 2 != 0)
// Option 2: Collect mutations, apply after
val list = scala.collection.mutable.ListBuffer(1, 2, 3, 4, 5)
val toRemove = list.filter(_ % 2 == 0)
toRemove.foreach(list -= _)
// Option 3: Use immutable, assign back
var list = List(1, 2, 3, 4, 5)
list = list.filter(_ % 2 != 0)
// Option 4: Mutate in place with filterInPlace (no iterator involved)
val set = scala.collection.mutable.Set(1, 2, 3, 4, 5)
set.filterInPlace(_ % 2 != 0) // Safe bulk removal
4. Off-by-One Errors in Range Operations
The Problem:
// EASY TO CONFUSE: `to` is inclusive, `until` is exclusive
for (i <- 0 to 10) println(i) // Prints 0-10 (inclusive)
for (i <- 0 until 10) println(i) // Prints 0-9 (exclusive)
// WRONG: Creating wrong size array
val array = Array.ofDim[Int](5 to 10) // Error: requires Int
Why It Happens: Two similar operators with different semantics; easy to confuse.
The Fix:
// Use descriptive naming
val count = 5
val indices = 0 until count // 0 to count-1
val inclusive = 0 to (count - 1) // Same, more explicit
// When in doubt, print
println((0 until 5).toList) // List(0, 1, 2, 3, 4)
println((0 to 5).toList) // List(0, 1, 2, 3, 4, 5)
// Array sizing is usually better done explicitly
val n = 10
val array = new Array[Int](n) // Size n, indices 0 to n-1
5. Shared Mutable State in Closures
The Problem:
// WRONG: All closures reference the same mutable var
def makeCounters(n: Int): List[() => Int] = {
  var count = 0
  (1 to n).map { _ =>
    () => { count += 1; count } // All share the same 'count'!
  }.toList
}
val counters = makeCounters(3)
println(counters(0)()) // 1
println(counters(1)()) // 2 (expected 1!)
println(counters(2)()) // 3 (expected 1!)
Why It Happens: Closures capture variables by reference, not value. All closures share the same mutable variable.
The Fix:
// Option 1: Give each closure its own state
def makeCounters(n: Int): List[() => Int] = {
  (1 to n).map { _ =>
    var count = 0 // Fresh 'count' per closure
    () => { count += 1; count }
  }.toList
}
// Option 2: Seed each closure's private state from the index
def makeCounters2(n: Int): List[() => Int] = {
  (1 to n).map { initialValue =>
    var currentValue = initialValue
    () => { currentValue += 1; currentValue }
  }.toList
}
// Option 3: If no mutation is needed, capture an immutable value
def makeConstants(n: Int): List[() => Int] = {
  (1 to n).map { index =>
    () => index // Captures the immutable value of index
  }.toList
}
6. Type Erasure with Generics
The Problem:
// WRONG: Cannot distinguish List[Int] from List[String] at runtime
def process(list: List[Int]): Unit = ???
def process(list: List[String]): Unit = ??? // ERROR: Duplicate signature!
// WRONG: Type information lost at runtime
def isIntList(obj: Any): Boolean = {
obj match {
case list: List[Int] => true // Unchecked: erasure makes this match ANY List
case _ => false
}
}
// WRONG: Unsafe casting assumptions
val mixed: List[Any] = List(1, "two", 3)
val ints = mixed.asInstanceOf[List[Int]] // Unsafe!
println(ints(1) + 1) // ClassCastException at runtime: the element is really a String
Why It Happens: JVM erases generic type parameters at runtime for efficiency. List[Int] becomes just List in bytecode.
The Fix:
// Option 1: Use type tags/manifests (Scala 2)
import scala.reflect.ClassTag
def process[T: ClassTag](list: List[T]): Unit = {
println(implicitly[ClassTag[T]]) // Can access type at runtime
}
// Option 2: Use different method names
def processInts(list: List[Int]): Unit = ???
def processStrings(list: List[String]): Unit = ???
// Option 3: Pattern match on container type, not element type
def isListOfAny(obj: Any): Boolean = {
obj match {
case _: List[_] => true // Matches any List
case _ => false
}
}
// Option 4: Use wrapper type to preserve information
case class IntList(values: List[Int])
case class StringList(values: List[String])
def process(list: IntList): Unit = ???
def process(list: StringList): Unit = ??? // OK: Different types
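If a runtime element check is genuinely needed, inspecting the elements themselves is the reliable route, since only the erased container type survives. A small sketch (note the vacuous `true` for an empty list):

```scala
// Check the elements, not the erased type parameter
def isIntList(obj: Any): Boolean = obj match {
  case list: List[_] => list.forall(_.isInstanceOf[Int]) // empty list yields true
  case _ => false
}

println(isIntList(List(1, 2, 3)))  // true
println(isIntList(List("a", "b"))) // false -- works where the erased match cannot
```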
7. Forgetting List is Persistent (Immutable)
The Problem:
// WRONG: Expecting mutation
var list = List(1, 2, 3)
list.map(_ * 2) // Returns new List, doesn't modify original
println(list) // Still [1, 2, 3], not [2, 4, 6]!
// WRONG: Inefficient rebuilding
var list = List(1, 2, 3)
list = list :+ 4 // Creates entire new list, O(n)
list = list :+ 5 // Again, O(n)
list = list :+ 6 // Again, O(n)
// Total: O(n²) for n appends
Why It Happens: List is immutable; operations return new lists. Append is inefficient for immutable List.
The Fix:
// Option 1: Use ListBuffer for building, then convert
val builder = scala.collection.mutable.ListBuffer(1, 2, 3)
builder += 4
builder += 5
val list = builder.toList // O(n) single conversion
// Option 2: Build in one expression
val list = List(1, 2, 3).flatMap(x => List(x, x * 2))
// Option 3: Use Vector for append-friendly immutable collection
var list = Vector(1, 2, 3)
list = list :+ 4 // Effectively constant-time append
list = list :+ 5
list = list :+ 6
// Total: effectively O(n) for n appends
// Option 4: Use ArrayBuffer if mutation is acceptable
val array = scala.collection.mutable.ArrayBuffer(1, 2, 3)
array += 4
array += 5
8. Comparing Objects with == Instead of Structure
The Problem:
// Strings are actually fine: Scala's == delegates to equals (null-safely)
val a = new String("hello")
val b = new String("hello")
println(a == b) // true: String implements structural equals (use 'eq' for identity)
// WRONG: Custom class without overriding ==
class User(val name: String) {
// No hashCode/equals override
}
val user1 = new User("Alice")
val user2 = new User("Alice")
println(user1 == user2) // false! Reference equality
val set = Set(user1)
set.contains(user2) // false, different reference
Why It Happens: Default == compares references (identity) for custom classes. Case classes auto-generate structural equality, but regular classes don't.
The Fix:
// Option 1: Use case class (auto-generates equals, hashCode)
case class User(name: String, email: String)
val user1 = User("Alice", "alice@example.com")
val user2 = User("Alice", "alice@example.com")
println(user1 == user2) // true!
// Option 2: Manually override equals and hashCode
class User(val name: String, val email: String) {
override def equals(obj: Any): Boolean = obj match {
case other: User =>
this.name == other.name && this.email == other.email
case _ => false
}
override def hashCode: Int = {
java.util.Objects.hash(name, email)
}
}
// Option 3: Use tuples, which compare structurally
val a = ("Alice", "alice@example.com")
val b = ("Alice", "alice@example.com")
println(a == b) // true! Tuples have structural equality
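The case-class fix matters doubly for hash-based collections: the generated `equals` and `hashCode` are consistent, so lookups in `Set` and `Map` behave as expected:

```scala
case class User(name: String, email: String)

val users = Set(User("Alice", "alice@example.com"))
// equals AND hashCode are both generated and consistent,
// so a structurally equal key is found in hash-based collections
println(users.contains(User("Alice", "alice@example.com"))) // true

val roles = Map(User("Alice", "alice@example.com") -> "admin")
println(roles.get(User("Alice", "alice@example.com"))) // Some(admin)
```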
9. Shadowing Variables Accidentally
The Problem:
// WRONG: Inner scope shadows outer variable
val x = 10
def process(): Unit = {
val x = 20 // Shadows outer x
println(x) // 20, but intended outer value?
}
// WRONG: In pattern matching
val value = 5
val result = List(1, 2, 3) match {
case List(value, _, _) => // Shadows outer 'value'
value + 10 // Uses pattern-matched value (1)
}
println(result) // 11, but expected 15?
Why It Happens: A lowercase identifier in a pattern always binds a fresh variable rather than referencing the outer one, and nested scopes are free to shadow outer names; both are easy to miss.
The Fix:
// Enable compiler warnings for shadowing
// In build.sbt (Scala 3): scalacOptions += "-Wshadow:all"
// Rename to avoid shadowing
val outerX = 10
def process(): Unit = {
val localX = 20
println(localX)
}
// Use different names in patterns
val threshold = 5
val result = List(1, 2, 3) match {
case List(first, _, _) => // Clear different name
first + 10
}
// Use backticks to reference outer variable in pattern
val value = 5
val result = List(1, 2, 3) match {
case List(`value`, _, _) => // Compares to outer 'value'
value + 10
case List(other, _, _) => // Binds to other
other + 10
}
10. Ignoring Future/IO Exceptions
The Problem:
// WRONG: Exception silently swallowed
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val future = Future {
throw new Exception("Oops!")
}
// The Future failed, but nothing observes or logs it!
// WRONG: onComplete without proper error handling
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val f = Future(riskyOperation())
f.onComplete {
case scala.util.Success(value) => println(value)
case scala.util.Failure(_) => println("Failed") // Exception details discarded
}
// Program might exit before handling result
Why It Happens: Futures are asynchronous; exceptions don't propagate normally. onComplete callback might not execute before program exits.
The Fix:
// Option 1: Use Await for synchronous code
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import java.util.concurrent.TimeoutException
val future = Future(riskyOperation())
try {
val result = Await.result(future, 5.seconds)
println(result)
} catch {
case e: TimeoutException => println("Operation timed out")
case e: Exception => println(s"Operation failed: ${e.getMessage}")
}
// Option 2: Use map/flatMap for composition
val result = Future(riskyOperation())
.map(processSuccess)
.recover { case e => processError(e) }
// Option 3: Use IO monad (cats-effect)
import cats.effect._
val io = IO(riskyOperation())
.handleError(e => s"Error: ${e.getMessage}")
io.unsafeRunSync()
// Option 4: The scala-async library (a separate module, not built-in syntax)
import scala.concurrent.{Future, Await}
import scala.async.Async.{async, await}
val result = async {
val r1 = await(futureA)
val r2 = await(futureB)
r1 + r2 // Natural syntax
}
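One more option worth knowing: the `failed` projection and `andThen` let failures be observed (logged, counted) without altering the pipeline. A sketch, where the exception message is purely illustrative:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val f: Future[Int] = Future { throw new RuntimeException("boom") }

// The .failed projection succeeds WITH the exception, so it can be inspected
f.failed.foreach(e => println(s"Future failed: ${e.getMessage}"))

// andThen observes the outcome without changing the Future's result
val observed = f.andThen { case scala.util.Failure(e) => println(s"logged: ${e.getMessage}") }
Await.ready(observed, 2.seconds) // keep the program alive long enough to see the logs
```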
11. Accumulating Large Lists in Recursion
The Problem:
// WRONG: O(n²) list concatenation
def sumLists(lists: List[List[Int]]): List[Int] = {
var acc: List[Int] = Nil
lists.foreach { list =>
acc = acc ++ list // Copies the ever-growing 'acc' on each step: O(n²) total
}
acc
}
// WRONG: Building string via concatenation
def buildReport(items: List[String]): String = {
var result = ""
items.foreach { item =>
result = result + "\n" + item // O(n²)
}
result
}
Why It Happens: List concatenation copies the entire left list each time. Accumulating with concatenation is inefficient.
The Fix:
// Option 1: Use accumulator with prepend (O(n))
def sumLists(lists: List[List[Int]]): List[Int] = {
@scala.annotation.tailrec
def go(lists: List[List[Int]], acc: List[Int]): List[Int] = {
lists match {
case Nil => acc.reverse // Reverse at end
case head :: tail =>
go(tail, head.reverse ++ acc) // Prepend reversed
}
}
go(lists, Nil)
}
// Option 2: Use flatten (the idiomatic one-liner)
def sumLists(lists: List[List[Int]]): List[Int] = {
lists.flatten
}
// Option 3: Use ListBuffer for building
def sumLists(lists: List[List[Int]]): List[Int] = {
val buffer = scala.collection.mutable.ListBuffer[Int]()
lists.foreach { list =>
buffer ++= list
}
buffer.toList
}
// Option 4: Use StringBuilder for strings
def buildReport(items: List[String]): String = {
val sb = new StringBuilder()
items.foreach { item =>
sb.append(item).append("\n")
}
sb.toString()
}
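For the string case specifically, the standard library already does the buffered build for you: `mkString` joins with a separator in a single pass.

```scala
// mkString builds the result with an internal StringBuilder: O(n)
def buildReport(items: List[String]): String = items.mkString("\n")

println(buildReport(List("alpha", "beta", "gamma")))
```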
12. Not Handling Partial Functions Safely
The Problem:
// WRONG: MatchError if not all cases covered
val f: PartialFunction[Int, String] = {
case 1 => "one"
case 2 => "two"
}
f(3) // MatchError: no case match for 3
// WRONG: Assuming isDefinedAt without checking
val evens: PartialFunction[Int, String] = {
case n if n % 2 == 0 => "even"
}
List(1, 2, 3, 4, 5).map(evens) // MatchError for odd numbers!
Why It Happens: Partial functions are only defined for some inputs. Calling them with undefined inputs throws MatchError.
The Fix:
// Option 1: Check isDefinedAt
val f: PartialFunction[Int, String] = {
case 1 => "one"
case 2 => "two"
}
if (f.isDefinedAt(3)) {
println(f(3))
} else {
println("Not defined")
}
// Option 2: Use applyOrElse
val result = f.applyOrElse(3, (_: Int) => "unknown")
// Option 3: Use collect (filters to defined cases)
val results = List(1, 2, 3, 4, 5).collect {
case n if n % 2 == 0 => s"$n is even"
}
// Only processes 2, 4 (even numbers)
// Option 4: Use lift to convert to Option
val lifted = f.lift
lifted(1) // Some("one")
lifted(3) // None
// Option 5: Use match expressions (not PartialFunctions)
def describe(n: Int): String = n match {
case 1 => "one"
case 2 => "two"
case _ => "other"
}
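Partial functions also compose: `orElse` chains one after another, which is a tidy way to bolt on a catch-all case without rewriting the original:

```scala
val known: PartialFunction[Int, String] = {
  case 1 => "one"
  case 2 => "two"
}
val fallback: PartialFunction[Int, String] = { case _ => "other" }

// orElse tries 'known' first, then 'fallback'; the result is total in practice
val describe = known.orElse(fallback)
println(describe(1)) // one
println(describe(9)) // other
```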
13. Lazy Evaluation Gotchas
The Problem:
// SURPRISE: A failed lazy val initializer runs again on the next access
lazy val config: Config = {
println("Loading config...")
loadConfigFromFile() // Throws exception
}
try {
val c = config // First access: initializer runs and throws
} catch { case _: Exception => }
try {
val c = config // Second access: initializer runs AGAIN (side effects repeat)
} catch { case _: Exception => }
// WRONG: Circular lazy dependencies recurse until the stack blows
lazy val a: Int = b + 1
lazy val b: Int = a + 1
val x = a // StackOverflowError (or a deadlock if the cycle crosses threads)
Why It Happens: A lazy val whose initializer throws stays uninitialized, so the initializer re-runs (and re-throws, repeating any side effects) on every subsequent access. Circular lazy dependencies don't loop forever silently; each access recurses until a StackOverflowError.
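The retry behavior is easy to verify with a minimal sketch (the attempt counter is purely illustrative):

```scala
var attempts = 0
lazy val config: String = {
  attempts += 1 // Runs again on every access until it succeeds
  if (attempts < 3) throw new RuntimeException("load failed")
  "loaded"
}

try config catch { case _: RuntimeException => () } // attempt 1 fails
try config catch { case _: RuntimeException => () } // attempt 2 fails
println(config)   // loaded -- third access succeeds and is now cached
println(attempts) // 3 -- the initializer ran once per access until success
```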
The Fix:
// Option 1: Use Try for error-safe lazy loading
import scala.util.Try
lazy val config: Try[Config] = Try(loadConfigFromFile())
val result = config match {
case scala.util.Success(cfg) => cfg
case scala.util.Failure(e) => Config.default()
}
// Option 2: Use Option for optional lazy values
lazy val config: Option[Config] = {
try {
Some(loadConfigFromFile())
} catch {
case scala.util.control.NonFatal(_) => None // Don't catch fatal errors
}
}
// Option 3: Avoid circular lazy dependencies
// Break cycles with immediate values or methods
val a: Int = 10
def b: Int = a + 1 // Method, not lazy val
def c: Int = b + 1
// Option 4: Use IO monad for delayed computation
import cats.effect.IO
val config: IO[Config] = IO(loadConfigFromFile())
// Deferred - only executes when explicitly run
14. Implicit Type Class Inference Failures
The Problem:
// WRONG: Implicit not found when needed
def format[T: Formatter](value: T): String = {
implicitly[Formatter[T]].format(value)
}
implicit val intFormatter: Formatter[Int] = new Formatter[Int] {
def format(x: Int): String = x.toString
}
format(42) // Works
format("hello") // Compile error: could not find Formatter[String]
// WRONG: Ambiguous implicits
implicit val formatter1: Formatter[Int] = ???
implicit val formatter2: Formatter[Int] = ???
format(42) // Compile error: ambiguous implicit
Why It Happens: Implicit resolution is sensitive to scope and specificity. Compiler can't disambiguate when multiple candidates exist.
The Fix:
// Option 1: Provide all necessary instances
implicit val intFormatter: Formatter[Int] = ???
implicit val stringFormatter: Formatter[String] = ???
// Option 2: Use trait with default instances
trait Formatter[T] {
def format(value: T): String
}
object Formatter {
implicit val intFormatter: Formatter[Int] = ???
implicit val stringFormatter: Formatter[String] = ???
}
// Option 3: Be specific about which implicit to use
implicit val preferredFormatter: Formatter[Int] = ???
// Option 4: Use extension methods to make the lookup explicit
implicit class FormatterOps[T](value: T) {
def fmt(implicit f: Formatter[T]): String = f.format(value)
}
42.fmt // Explicit call site ('formatted' would clash with Predef's method)
// Option 5: Move imports to control scope
// Import only needed implicits in scope
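Instances placed in the type class's companion object (Option 2 above) are found without any import, because the companion is part of the implicit search scope. A self-contained check:

```scala
trait Formatter[T] { def format(value: T): String }

object Formatter {
  // Instances in the companion object are found without any import
  implicit val intFormatter: Formatter[Int] = (x: Int) => s"Int($x)"
}

def format[T](value: T)(implicit f: Formatter[T]): String = f.format(value)

println(format(42)) // Int(42) -- resolved via Formatter's companion object
```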
15. Memory Leaks from Retained References
The Problem:
// WRONG: Cache never evicts, grows indefinitely
class DataCache {
private val cache = scala.collection.mutable.Map[String, Data]()
def get(key: String): Data = {
cache.getOrElseUpdate(key, loadData(key))
}
}
// WRONG: Static reference prevents GC
object EventTracker {
private val allEvents = scala.collection.mutable.ListBuffer[Event]()
def trackEvent(event: Event): Unit = {
allEvents += event // Never cleared
}
}
// WRONG: Listener not unregistered
class Subject {
private val listeners = scala.collection.mutable.ListBuffer[Observer]()
def addListener(l: Observer): Unit = listeners += l
def notifyListeners(): Unit = {
listeners.foreach(_.update())
}
}
val subject = new Subject()
subject.addListener(largeObject)
// largeObject held indefinitely even if no longer needed
Why It Happens: References in mutable collections are never cleared. GC can only collect objects with no live references.
The Fix:
// Option 1: Use weak references for caches
import scala.ref.WeakReference
class DataCache {
private val cache = scala.collection.mutable.Map[String, WeakReference[Data]]()
def get(key: String): Data = {
cache.get(key).flatMap(_.get).getOrElse {
val data = loadData(key)
cache(key) = WeakReference(data) // Entry can be reclaimed under memory pressure
data
}
}
}
// Option 2: Use caffeine or Guava caches with eviction
// (Requires external library)
// Option 3: Explicitly unregister listeners
class Subject {
private val listeners = scala.collection.mutable.ListBuffer[Observer]()
def addListener(l: Observer): Unit = listeners += l
def removeListener(l: Observer): Unit = listeners -= l
def notifyListeners(): Unit = {
listeners.foreach(_.update())
}
}
val subject = new Subject()
val observer = new MyObserver()
subject.addListener(observer)
// ... use observer ...
subject.removeListener(observer) // Explicit cleanup
// Option 4: Use a WeakHashMap-backed set for listener storage
import scala.jdk.CollectionConverters._
class Subject {
// Keys are weakly held: listeners with no other references get collected
private val listeners = java.util.Collections
.newSetFromMap(new java.util.WeakHashMap[Observer, java.lang.Boolean]())
.asScala
def addListener(l: Observer): Unit = listeners += l
def notifyListeners(): Unit = listeners.foreach(_.update())
}
// Option 5: Use scala.util.Using (Scala 2.13+) for automatic resource cleanup
import scala.util.Using
def processLargeDataset(filename: String): Unit = {
Using.resource(scala.io.Source.fromFile(filename)) { source =>
source.getLines().foreach(process)
} // Resource automatically closed here
}