Scala Programming Guidesscala-functional-programmingfunctional-programming

Immutability and Pure Functions | Scala Programming Guide

By Dmitri Meshin
Picture of the author
Published on
Immutability and Pure Functions - Scala Programming Guide

What is a Pure Function?

A pure function is one that:

  1. Always returns the same output for the same input (deterministic)
  2. Produces no side effects — doesn't modify external state, perform I/O, or interact with the outside world
  3. Exhibits referential transparency — you can replace any call to the function with its result value without changing the program's behavior

Consider this example: we're building a sensor data analyzer. Here's an impure version first, then we'll refactor to pure functions.

// IMPURE: Has side effects and mutation
var totalReadings = 0
var maxTemperature = Double.MinValue

def processTemperatureSensor(reading: Double): Unit = {
  totalReadings += 1  // MUTATION: modifying external variable
  if (reading > maxTemperature) {
    maxTemperature = reading  // MUTATION: modifying external variable
  }
  println(s"Processing: $reading")  // SIDE EFFECT: I/O
}

processTemperatureSensor(22.5)
processTemperatureSensor(24.1)
// Now totalReadings and maxTemperature have been modified globally

The problem: calling this function twice with the same input produces different results because maxTemperature changes. We can't reason about it independently.

Here's the pure version:

// PURE: No side effects, no mutation
// Input: a reading and the current state
// Output: a new state object (no external changes)
case class SensorState(
  totalReadings: Int,
  maxTemperature: Double,
  averageTemperature: Double
)

def processSingleReading(
    currentState: SensorState,
    reading: Double
): SensorState = {
  // Pure function: only uses input parameters
  // Returns a new SensorState (immutable)
  val newMax = math.max(currentState.maxTemperature, reading)
  val newAverage = (currentState.averageTemperature * currentState.totalReadings + reading) /
                   (currentState.totalReadings + 1)

  SensorState(
    totalReadings = currentState.totalReadings + 1,
    maxTemperature = newMax,
    averageTemperature = newAverage
  )
}

val initial = SensorState(0, Double.MinValue, 0.0)
val after1st = processSingleReading(initial, 22.5)
val after2nd = processSingleReading(after1st, 24.1)

println(after2nd.maxTemperature)  // 24.1
println(after2nd.totalReadings)   // 2
// Original 'initial' is unchanged — this is referential transparency!

Why Immutability Matters

Immutability is far more than an academic ideal—it's a practical solution to some of the hardest problems in concurrent systems. When data cannot change, you eliminate entire classes of bugs: race conditions, data corruption, and subtle timing-dependent failures that plagued traditional imperative code. Immutable data structures act as a contract: "this data will never change," which means any code using it can safely assume its state remains stable, even across threads. Beyond concurrency, immutability improves code clarity: readers don't need to trace through mutation history to understand a value's state. This section explores three powerful reasons to embrace immutability and how Scala makes it practical without sacrificing performance.

1. Thread Safety

When data is immutable, multiple threads can safely read it simultaneously without locks. This eliminates the entire category of synchronization bugs that plague mutable-state systems. No locks mean no deadlocks, no priority inversions, no lost updates. A thread can read immutable data a thousand times and always get the same result. Think of immutable data like a published textbook: a hundred people can read chapter 5 without interference because they can't change it. In mutable systems, every read or write needs coordination to prevent corruption. Immutable data needs no such ceremony. Here's the practical difference:

case class BudgetEntry(
  category: String,
  amount: BigDecimal,
  date: java.time.LocalDate
)

case class Budget(
  entries: List[BudgetEntry],
  currency: String  // Immutable
)

def calculateCategoryTotal(
    budget: Budget,
    category: String
): BigDecimal = {
  // No locks needed! 'budget' is immutable
  // Multiple threads can call this simultaneously
  budget.entries
    .filter(_.category == category)
    .map(_.amount)
    .foldLeft(BigDecimal(0))(_ + _)
}

// Safe concurrent calls:
val budget = Budget(
  List(
    BudgetEntry("Food", 45.50, java.time.LocalDate.now()),
    BudgetEntry("Transport", 12.00, java.time.LocalDate.now())
  ),
  "USD"
)

// No data races possible because 'budget' never changes
val foodTotal = calculateCategoryTotal(budget, "Food")  // Returns new value
val transportTotal = calculateCategoryTotal(budget, "Transport")  // Original 'budget' untouched

2. Predictability and Testability

Pure functions with immutable data are dramatically easier to test, reason about, and debug. When a function cannot modify external state and always produces the same output for the same input, testing becomes trivial: no setup, no teardown, no mocking of complex state machines. Furthermore, immutable functions are deterministic—call them with the same arguments a thousand times and you'll get the same result every time. This predictability transforms testing from a fragile, time-consuming activity into a simple matter of asserting input-output pairs. The original data remains unchanged throughout, making tests independent and allowing them to run in any order without side effects poisoning subsequent tests. Each test is a pure mathematical truth: "given this input, this is the output." There's no test pollution, no mysterious failures caused by test order, no hidden state lurking in the corners. Let's see how this works in practice:

Immutable functions are easy to test — no hidden state to track:

def validateBudgetEntry(entry: BudgetEntry): Either[String, BudgetEntry] = {
  // Pure function: given the same entry, always returns the same result
  if (entry.amount <= 0) {
    Left(s"Invalid amount: ${entry.amount}")
  } else if (entry.category.isEmpty) {
    Left("Category cannot be empty")
  } else {
    Right(entry)
  }
}

// Testing is trivial — no setup, no cleanup, no mock objects
assert(validateBudgetEntry(BudgetEntry("Food", 0, java.time.LocalDate.now())).isLeft)
assert(validateBudgetEntry(BudgetEntry("Food", 50, java.time.LocalDate.now())).isRight)
assert(validateBudgetEntry(BudgetEntry("", 50, java.time.LocalDate.now())).isLeft)

The Substitution Model — Reasoning About Code

The substitution model is perhaps the most powerful mental tool for understanding functional code. It says: if a function is pure and immutable, I can replace any call to it with its return value without changing the program's meaning or behavior. This is far more than a theoretical nicety—it's the foundation of mathematical reasoning about code. You can analyze a program algebraically, substituting function calls for their results, simplifying expressions, and proving correctness just as you would with algebraic equations. This property, called referential transparency, is completely impossible with mutable state. With imperative code, the same function call at different points in execution produces different results because state has changed, so substitution breaks the program. But with pure functions operating on immutable data, every call is equivalent to every other call with the same input. This enables optimizations (caching), parallelization, and proofs of correctness. Let's see this power in action, starting with a concrete example you can verify by hand:

The substitution model says: if a function is pure, I can replace any call to it with its return value without changing the program. This is one of the most liberating insights in functional programming because it means you can reason about your code like mathematicians reason about equations. No hidden state, no invisible dependencies—just pure logical substitution. Let me show you a complete walkthrough where we manually substitute function calls and simplify expressions:

// Example: Travel planner that calculates journey cost
case class Leg(
  distance: Int,       // in km
  transportType: String  // "car", "train", "flight"
)

def costPerKm(transportType: String): BigDecimal = {
  // Pure function — always returns the same cost for the same type
  transportType match {
    case "car"   => BigDecimal("0.15")
    case "train" => BigDecimal("0.08")
    case "flight" => BigDecimal("0.25")
    case _       => BigDecimal("0.10")
  }
}

def legCost(leg: Leg): BigDecimal = {
  // Pure: output depends only on 'leg' parameter
  costPerKm(leg.transportType) * leg.distance
}

def journeyTotalCost(legs: List[Leg]): BigDecimal = {
  // Pure: can reason about it as a fold
  legs.map(legCost).foldLeft(BigDecimal(0))(_ + _)
}

val trip = List(
  Leg(100, "car"),
  Leg(200, "train"),
  Leg(50, "flight")
)

// Because legCost is pure, we can reason:
// journeyTotalCost(trip)
// = legCost(Leg(100, "car")) + legCost(Leg(200, "train")) + legCost(Leg(50, "flight"))
// = (0.15 * 100) + (0.08 * 200) + (0.25 * 50)
// = 15 + 16 + 12.5
// = 43.5

// We can verify this without running the code!
assert(journeyTotalCost(trip) == BigDecimal("43.5"))

Persistent Data Structures — How Immutable "Updates" Work

When we need to "update" an immutable structure, we create a new one while sharing as much structure as possible. This is called a persistent data structure, and it's the secret to making immutability efficient. You might think that creating a new copy every time you update something is wasteful—and you'd be right if we actually copied everything. But clever data structures share the unchanged portions between versions, so updating is nearly as fast as mutation while maintaining immutability. The key insight is structural sharing: when you prepend to a list, you create one new node that points to the old list, so the old list remains completely unchanged and usable. Multiple versions can coexist in memory without duplication. This is how Scala collections achieve both immutability and performance—a seemingly impossible combination. Let's see how this works and why it matters:

When we need to "update" an immutable structure, we create a new one while sharing as much structure as possible:

// Persistent list implementation concept
sealed trait PList[+A]
case object PNil extends PList[Nothing]
case class PCons[A](head: A, tail: PList[A]) extends PList[A]

def pListPrepend[A](elem: A, list: PList[A]): PList[A] = {
  // Very fast: only creates one new node, reuses the rest
  PCons(elem, list)
}

val original: PList[Int] = PCons(2, PCons(1, PNil))
val updated = pListPrepend(3, original)

// Memory layout:
// original: [2] -> [1] -> Nil
// updated:  [3] -> [2] -> [1] -> Nil
//           ^^^^^ only this is new; rest is shared!

// In Scala, we use immutable collections which implement this pattern:
val scalaList = List(1, 2, 3)
val newList = 0 :: scalaList  // Prepend: very fast, shares tail

// For "removing" the head, we just reference the tail:
val withoutHead = scalaList.tail  // Instant operation, no copying needed

Here's a practical example with a log file analyzer:

case class LogEntry(
  timestamp: String,
  level: String,
  message: String
)

case class LogAnalysis(
  entries: List[LogEntry],
  filteredErrors: List[LogEntry]  // Cached
)

def addLogEntry(
    analysis: LogAnalysis,
    newEntry: LogEntry
): LogAnalysis = {
  // Create new lists, but Scala shares the underlying structure
  val newEntries = analysis.entries :+ newEntry
  val newFiltered =
    if (newEntry.level == "ERROR")
      analysis.filteredErrors :+ newEntry
    else
      analysis.filteredErrors

  // The structure is immutable, but operations are efficient
  analysis.copy(
    entries = newEntries,
    filteredErrors = newFiltered
  )
}

val log1 = LogAnalysis(List(), List())
val log2 = addLogEntry(log1, LogEntry("10:00", "INFO", "Starting"))
val log3 = addLogEntry(log2, LogEntry("10:05", "ERROR", "Connection failed"))

// Each version is immutable and independent
// But Scala's persistent data structures share memory efficiently

When Mutation is Acceptable

Not all mutation is forbidden. Local, contained mutation that never escapes is acceptable: This is called encapsulated mutation or mutable-in-private. If a function uses mutable data internally but presents an immutable interface to the world, that's completely fine. The mutation is purely an implementation detail—an optimization. As long as the mutation is confined to the function and doesn't affect external state, it's transparent to callers. This allows you to use mutable data structures (like ArrayBuffer) for efficiency when building results, as long as you return an immutable structure. This best-of-both-worlds approach gives you performance where it matters while maintaining the guarantee that functions are pure from the outside.

// ACCEPTABLE: Local mutation within a pure function
def medianTemperature(readings: List[Double]): Double = {
  // We use a mutable buffer internally
  val sorted = scala.collection.mutable.ArrayBuffer(readings: _*)

  // Sort mutates the buffer, but it's local — never escapes
  scala.util.Sorting.quickSort(sorted)

  // Return immutable result
  val len = sorted.length
  if (len % 2 == 0) {
    (sorted(len / 2 - 1) + sorted(len / 2)) / 2.0
  } else {
    sorted(len / 2).toDouble
  }
}

// The mutation is "hidden" inside the function
// From the caller's perspective, this is pure
assert(medianTemperature(List(3.0, 1.0, 2.0)) == 2.0)

Practical Example: Refactoring Imperative Code to Functional

Let's refactor a recipe cost calculator from imperative to functional, seeing how the change in mindset leads to better code:

// ============ IMPERATIVE VERSION (Problematic) ============
case class Ingredient(name: String, quantity: Double, unitPrice: Double)

class RecipeCostCalculator {
  // Mutable state
  private var ingredients = scala.collection.mutable.ListBuffer[Ingredient]()
  private var totalCost = 0.0

  def addIngredient(name: String, quantity: Double, unitPrice: Double): Unit = {
    ingredients += Ingredient(name, quantity, unitPrice)
    totalCost += quantity * unitPrice  // MUTATION!
  }

  def getTotalCost(): Double = totalCost

  def removeLastIngredient(): Unit = {
    if (ingredients.nonEmpty) {
      val removed = ingredients.remove(ingredients.length - 1)
      totalCost -= removed.quantity * removed.unitPrice  // MUTATION!
    }
  }
}

// Problems:
// 1. Order of calls matters
// 2. Can't reason about state without history
// 3. Hard to test — need to call methods in sequence
// 4. Not thread-safe
val calc = new RecipeCostCalculator()
calc.addIngredient("Flour", 2.0, 3.50)
calc.addIngredient("Sugar", 1.0, 2.00)
println(calc.getTotalCost())  // 9.00

// ============ FUNCTIONAL VERSION (Pure) ============
case class Recipe(
  name: String,
  ingredients: List[Ingredient]
)

// Pure function: no side effects, no mutation
def calculateRecipeCost(recipe: Recipe): Double = {
  // Use fold to accumulate cost
  recipe.ingredients.foldLeft(0.0) { (acc, ingredient) =>
    acc + (ingredient.quantity * ingredient.unitPrice)
  }
}

def addIngredientToRecipe(
    recipe: Recipe,
    ingredient: Ingredient
): Recipe = {
  // Return new recipe, don't mutate the original
  recipe.copy(ingredients = recipe.ingredients :+ ingredient)
}

def removeLastIngredientFromRecipe(recipe: Recipe): Recipe = {
  // Return new recipe without the last ingredient
  if (recipe.ingredients.isEmpty) recipe
  else recipe.copy(ingredients = recipe.ingredients.dropRight(1))
}

// Usage:
var recipe = Recipe("Chocolate Cake", List())
recipe = addIngredientToRecipe(recipe, Ingredient("Flour", 2.0, 3.50))
recipe = addIngredientToRecipe(recipe, Ingredient("Sugar", 1.0, 2.00))
println(calculateRecipeCost(recipe))  // 9.00

// Now we can test each function independently:
val testRecipe = Recipe("Test", List(
  Ingredient("Flour", 2.0, 3.50),
  Ingredient("Sugar", 1.0, 2.00)
))
assert(calculateRecipeCost(testRecipe) == 9.00)
assert(calculateRecipeCost(removeLastIngredientFromRecipe(testRecipe)) == 7.00)
assert(testRecipe.ingredients.length == 2)  // Original unchanged!

// The original recipe is never modified
println(recipe.ingredients.length)  // 2

Summary

Pure functions and immutability form the foundation of functional programming:

  • Pure functions are predictable, testable, and safely concurrent
  • Immutability prevents bugs and simplifies reasoning
  • Persistent data structures make immutability efficient
  • Local mutation is fine if it never escapes the function
  • Refactoring to immutable code often reveals better design