np.nanpercentile() - there has to be a faster way!

Recently I was trying to calculate the quantiles of the vegetation index of an area over time. For this I have a time-series of satellite raster images of a certain region that cover identical extents. This is represented as a numpy.ndarray of the shape(96, 4800, 4800) - in other words 96 satellite images each measuring 4800 by 4800 pixels. I want to calculate the 10th, 25th, 50th, 75th and 90th quantile along the time/z-axis, which can be done easily with np.percentile(a, q=[10,25,50,75,90], axis=0). The data I am working with contains no_data areas due to residual cloud cover, rainfall, etc. represented as np.NaN. Naturally I was turning to numpys np.nanpercentile(a, q=[10, 25, 50, 75, 90], axis=0). Unfortunately np.nanpercentile() was ~300x slower on my dataset than np.percentile() so I had to find a way to speed it up.

np.nanpercentile() speed

Generate some test data and benchmark numpys function to see what we are working with:

# create array of shape(5,100,100) - image of size 10x10 with 5 layers
test_arr = np.random.randint(0, 10000, 50000).reshape(5,100,100).astype(np.float32)
# place random NaN
rand_NaN = np.random.randint(0, 50000, 500).astype(np.float32)
for r in rand_NaN:
    test_arr[test_arr == r] = np.NaN
%timeit np.nanpercentile(test_arr, q=25, axis=0)
100 loops, best of 3: 5.92 ms per loop

The reason np.nanpercentile() is so slow in my case can be found in the source code. Numpys implementation includes a private function to calculate the percentile along a 1D array while ignoring NaNs. This means that it has to loop through all pixels resulting in 4800 x 4800 native Python loops. Not exactly a recipe for speed.

The alternative: a pure Python implementation of the quantile calculation

Since we are working with just 3 dimensions we can use a pure Python implementation of the quantile function and should see a significant increase in speed (sounds wrong, I know!). The code for percentile calculation was posted in a stackoverflow answer.

The general idea is to:

  • find the number of valid observations (non NaN)
  • replace NaN with maximum value of array
  • sort values along axis
  • find position of quantile regarding number of valid observations
  • linear interpolation if the desired quantile is inbetween two positions (like numpys linear interpolation)
def nan_percentile(arr, q):
    # valid (non NaN) observations along the first axis
    valid_obs = np.sum(np.isfinite(arr), axis=0)
    # replace NaN with maximum
    max_val = np.nanmax(arr)
    arr[np.isnan(arr)] = max_val
    # sort - former NaNs will move to the end
    arr = np.sort(arr, axis=0)

    # loop over requested quantiles
    if type(q) is list:
        qs = []
        qs = [q]
    if len(qs) < 2:
        quant_arr = np.zeros(shape=(arr.shape[1], arr.shape[2]))
        quant_arr = np.zeros(shape=(len(qs), arr.shape[1], arr.shape[2]))

    result = []
    for i in range(len(qs)):
        quant = qs[i]
        # desired position as well as floor and ceiling of it
        k_arr = (valid_obs - 1) * (quant / 100.0)
        f_arr = np.floor(k_arr).astype(np.int32)
        c_arr = np.ceil(k_arr).astype(np.int32)
        fc_equal_k_mask = f_arr == c_arr

        # linear interpolation (like numpy percentile) takes the fractional part of desired position
        floor_val = _zvalue_from_index(arr=arr, ind=f_arr) * (c_arr - k_arr)
        ceil_val = _zvalue_from_index(arr=arr, ind=c_arr) * (k_arr - f_arr)

        quant_arr = floor_val + ceil_val
        quant_arr[fc_equal_k_mask] = _zvalue_from_index(arr=arr, ind=k_arr.astype(np.int32))[fc_equal_k_mask]  # if floor == ceiling take floor value


    return result

One issue to work around is numpys implementation of np.choose() which only allows for 32 choices, meaning it failed on my dataset with with 96 raster images. Luckily my question on StackOverflow, Index 3D array with index of last axis stored in 2D array was answered with a workaround to indexing the 3D array without using np.choose(). This is achieved with a small helper function.

def _zvalue_from_index(arr, ind):
    """private helper function to work around the limitation of np.choose() by employing np.take()
    arr has to be a 3D array
    ind has to be a 2D array containing values for z-indicies to take from arr
    This is faster and more memory efficient than using the ogrid based solution with fancy indexing.
    # get number of columns and rows
    _,nC,nR = arr.shape

    # get linear indices and extract elements with np.take()
    idx = nC*nR*ind + nR*np.arange(nR)[:,None] + np.arange(nC)
    return np.take(arr, idx)

edit (2018-10-22): This only works on evenly spaced rasters. Jean-Francois Bourdon pointed out, that it works on rasters with uneven dimensions by changing the second to last line. Thanks for the tip Jean-Francois!

idx = nC*nR*ind + nC*np.arange(nR)[:,None] + np.arange(nC)

edit (2018-11-08): Sam Bowers contacted me with a further suggestion on improving the function. I’m fascinated that this blog post is still gathering attention. Thanks for the suggestion Sam!

idx = nC*nR*ind + np.arange(nC*nR).reshape((nC,nR))

edit (2019-11-13): Ben Sugerman porposed an extension to the code to support ‘nearest neighbour’, ‘lowest’ and ‘highest’ interpolation options present in np.percentile().

if interpolation=='nearest':
    f_arr = np.around(k_arr).astype(np.int32)
    quant_arr=_zvalue_from_index(arr=arr, ind=f_arr) 
elif interpolation=='lowest':
    f_arr = np.floor(k_arr).astype(np.int32)
    quant_arr=_zvalue_from_index(arr=arr, ind=f_arr) 
elif interpolation=='highest':
    f_arr = np.ceiling(k_arr).astype(np.int32)
    quant_arr=_zvalue_from_index(arr=arr, ind=f_arr) 

Result and speed comparison

Let’s see if our new functions produces the same results as np.nanpercentile()

input_arr = np.array(test_arr, copy=True)
old_func = np.nanpercentile(input_arr, q=range(0,100), axis=0)
new_func = nan_percentile(input_arr, q=range(0,100))

np.allclose(new_func, old_func)

So far so good. Let’s see what we achieved in terms of speed.

input_arr = np.array(test_arr, copy=True)
%timeit np.nanpercentile(input_arr, q=[10,25,50,75,90], axis=0)
1 loops, best of 3: 603 ms per loop
input_arr = np.array(test_arr, copy=True)
%timeit nan_percentile(test_arr, q=[10,25,50,75,90])
100 loops, best of 3: 3.81 ms per loop

The new function is roughly 160x faster than np.nanpercentile(), so it was time well spent looking for a faster way.