Memoize and Cache functions in JavaScript

December 09, 2020
JavaScript, Cache

Today I’m going to write about memoizing or caching functions in JavaScript. But what exactly is Memoization? And in what differs from caching?

Memoization

To put it simple Memoization is a special case of caching, therefore it’s useful every single time we have to perform expensive operations. Increasing the space cost of a function it decreases its time cost. Any function can be memoized, if it’s referentially transparent (i.e. replacing the execution of the function with the expected result - or viceversa - does not change the behaviour of the program). This is always true when dealing with pure functions, where there are no side effects - the function return just a value and does not interact with something outside of its scope - and where given the same input you get always the same output.

Memoize a function in JS - a functional approach

There are many different JS libraries to use memoization in our projects, but listing them is not the purpose of this post: I think that it’s more important to know the basics that stand behind them. In functional programming, functions are first class citizens and you can compose them in different ways. To memoize a not memoized function we use a Higher order Function: a function that returns a function. In this specific case the input of this HoF will be the function that we want to have memoized, the output will be the memoized function. For instance:

function veryExpensiveFn() {
  /* some expensive calculation*/
}

const veryExpensiveFnMemoized = memoize(veryExpensiveFn)
veryExpensiveFnMemoized() // expensive calculation performed
veryExpensiveFnMemoized() // result retrieved from cache and calculation not performed

We can write in just a few lines of codes a very simple memoize function that works fine for synchronous functions with simple inputs:

memoize.js

export function memoize(fnToMemoize) {
  const cache = {}

  return function memoizedFn(...inputs) {
    const key = JSON.stringify(inputs)
    if (cache[key]) {
      return cache[key]
    } else {
      cache[key] = fnToMemoize(...inputs)
    }
    return cache[key]
  }
}

In this example we take advantage of the closures to have a cache object that will store the result of the different function calls. To lookup the result we use a simple method to generate the cache key: with JSON.stringify you can create a unique key for each JSON equivalent input, so it works only if inputs are simple variables that can be represented as JSON without losing informations. This cache key creation is not suitable if, for example, the function that we want to memoize takes as input, objects that will perform operations (functions can not be serialized into a JSON object):

import { memoize } from 'memoize'

const obj1 = {
  operation() {
    return 1
  },
}
const obj2 = {
  operation() {
    return 2
  },
}

function fnToMemoize(objWithXMethod) {
  return objWithXMethod.operation()
}

const fnMemoized = memoize(fnToMemoize)

fnMemoized(obj1) === fnMemoized(obj2) // true, but it shouldn't

With our memoize function we can now have other functions memoized, this is something you can test as well:

import { memoize } from 'memoize'

function longOperation(initialValue) {
  console.log('long operation started')

  let result = initialValue
  for (let i = 0; i < 20_000_000; i++) {
    result += i
  }

  console.log('long operation ended')

  return result
}

const longOperationMemo = memoize(longOperation)

longOperationMemo(2)
// long operation started
// long operation ended
// -> 199999990000002

longOperationMemo(2)
// -> 199999990000002

Just remember: with this optimization we are decreasing the function time cost and in exchange we are increasing its space cost.

Caching an impure function

We just saw that memoization is a good technique when we have pure functions but it shouldn’t be used when working with impure functions. For example it won’t work as expected, here:

import { memoize } from 'memoize'

let variableOutOfScope = 1

function impureFunction(input) {
  return input + variableOutOfScope
}

const impureFunctionMemo = memoize(impureFunction)

impureFunctionMemo(1) // -> 2
variableOutOfScope = 2
impureFunctionMemo(1) // -> 2 but should be 3

If memoization doesn’t work for impure functions it does not mean that we cannot have a HoF to cache impure functions, but things become a little more complex. Since an impure function may have side effects or use a reference that comes from outside of its lexical scope (as shown in the previous example) we should have a cache invalidation mechanism to call whenever we want to rely no more on the stored data. This is a basic example, similar to the memoization one but tweaked to manage impure functions:

cache.js

export function cache(fnToCache) {
  let cache = {}

  const cachedFn = function (...inputs) {
    const key = JSON.stringify(inputs)
    if (cache[key]) {
      return cache[key]
    } else {
      cache[key] = fnToCache(...inputs)
    }
    return cache[key]
  }

  const invalidateCacheFn = () => (cache = {})

  return [cachedFn, invalidateCacheFn]
}

Now we can retrieve a cached version of an impure function, the complexity is that anytime a dependency which is out of the scope of the function changes we need to invalidate the cache, otherwise we could obtain wrong values.

import { cache } from 'cache'

let variableOutOfScope = 1

function impureFunction(input) {
  return input + variableOutOfScope
}

const [impureFunctionCached, invalidateCache] = cache(impureFunction)

impureFunctionCached(1) // 2
impureFunctionCached(1) // 2
variableOutOfScope = 2
impureFunctionCached(1) // 2 but should be 3, we didn't invalidate the cache

invalidateCache()

impureFunctionCached(1) // 3

Memoizing and Caching are important techniques and should be used whenever necessary. But every optimization has a cost that must be taken into account, and it’s important to avoid too early or unnecessary optimizations: a simple function that sums two numbers should not be memoized or cached, the overhead of the caching would cost more than the “optmization” itself.