Recently I was trying to calculate the quantiles of the vegetation index of an area over time. For this I have a time-series of satellite raster images of a certain region that cover identical extents. This is represented as a `numpy.ndarray`

of the `shape(96, 4800, 4800)`

- in other words 96 satellite images each measuring 4800 by 4800 pixels. I want to calculate the 10th, 25th, 50th, 75th and 90th quantile along the time/z-axis, which can be done easily with `np.percentile(a, q=[10,25,50,75,90], axis=0)`

. The data I am working with contains *no_data* areas due to residual cloud cover, rainfall, etc. represented as `np.NaN`

. Naturally I was turning to numpys `np.nanpercentile(a, q=[10, 25, 50, 75, 90], axis=0)`

. **Unfortunately np.nanpercentile() was ~300x slower on my dataset than np.percentile() so I had to find a way to speed it up.**

## np.nanpercentile() speed

Generate some test data and benchmark numpys function to see what we are working with:

```
# create array of shape(5,100,100) - image of size 10x10 with 5 layers
test_arr = np.random.randint(0, 10000, 50000).reshape(5,100,100).astype(np.float32)
np.random.shuffle(test_arr)
# place random NaN
rand_NaN = np.random.randint(0, 50000, 500).astype(np.float32)
for r in rand_NaN:
test_arr[test_arr == r] = np.NaN
```

```
%timeit np.nanpercentile(test_arr, q=25, axis=0)
100 loops, best of 3: 5.92 ms per loop
```

The reason `np.nanpercentile()`

is so slow in my case can be found in the source code. Numpys implementation includes a private function to calculate the percentile along a 1D array while ignoring NaNs. This means that it has to loop through all pixels resulting in 4800 x 4800 native Python loops. Not exactly a recipe for speed.

## The alternative: a pure Python implementation of the quantile calculation

Since we are working with just 3 dimensions we can use a pure Python implementation of the quantile function and should see a significant increase in speed (sounds wrong, I know!). The code for percentile calculation was posted in a stackoverflow answer.

The general idea is to:

- find the number of valid observations (non NaN)
- replace NaN with maximum value of array
- sort values along axis
- find position of quantile regarding number of valid observations
- linear interpolation if the desired quantile is inbetween two positions (like numpys linear interpolation)

```
def nan_percentile(arr, q):
# valid (non NaN) observations along the first axis
valid_obs = np.sum(np.isfinite(arr), axis=0)
# replace NaN with maximum
max_val = np.nanmax(arr)
arr[np.isnan(arr)] = max_val
# sort - former NaNs will move to the end
arr = np.sort(arr, axis=0)
# loop over requested quantiles
if type(q) is list:
qs = []
qs.extend(q)
else:
qs = [q]
if len(qs) < 2:
quant_arr = np.zeros(shape=(arr.shape[1], arr.shape[2]))
else:
quant_arr = np.zeros(shape=(len(qs), arr.shape[1], arr.shape[2]))
result = []
for i in range(len(qs)):
quant = qs[i]
# desired position as well as floor and ceiling of it
k_arr = (valid_obs - 1) * (quant / 100.0)
f_arr = np.floor(k_arr).astype(np.int32)
c_arr = np.ceil(k_arr).astype(np.int32)
fc_equal_k_mask = f_arr == c_arr
# linear interpolation (like numpy percentile) takes the fractional part of desired position
floor_val = _zvalue_from_index(arr=arr, ind=f_arr) * (c_arr - k_arr)
ceil_val = _zvalue_from_index(arr=arr, ind=c_arr) * (k_arr - f_arr)
quant_arr = floor_val + ceil_val
quant_arr[fc_equal_k_mask] = _zvalue_from_index(arr=arr, ind=k_arr.astype(np.int32))[fc_equal_k_mask] # if floor == ceiling take floor value
result.append(quant_arr)
return result
```

One issue to work around is numpys implementation of `np.choose()`

which only allows for 32 choices, meaning it failed on my dataset with with 96 raster images. Luckily my question on StackOverflow, Index 3D array with index of last axis stored in 2D array was answered with a workaround to indexing the 3D array without using `np.choose()`

. This is achieved with a small helper function.

```
def _zvalue_from_index(arr, ind):
"""private helper function to work around the limitation of np.choose() by employing np.take()
arr has to be a 3D array
ind has to be a 2D array containing values for z-indicies to take from arr
See: http://stackoverflow.com/a/32091712/4169585
This is faster and more memory efficient than using the ogrid based solution with fancy indexing.
"""
# get number of columns and rows
_,nC,nR = arr.shape
# get linear indices and extract elements with np.take()
idx = nC*nR*ind + nR*np.arange(nR)[:,None] + np.arange(nC)
return np.take(arr, idx)
```

## Result and speed comparison

Let’s see if our new functions produces the same results as `np.nanpercentile()`

```
input_arr = np.array(test_arr, copy=True)
old_func = np.nanpercentile(input_arr, q=range(0,100), axis=0)
new_func = nan_percentile(input_arr, q=range(0,100))
np.allclose(new_func, old_func)
True
```

So far so good. Let’s see what we achieved in terms of speed.

```
input_arr = np.array(test_arr, copy=True)
%timeit np.nanpercentile(input_arr, q=[10,25,50,75,90], axis=0)
1 loops, best of 3: 603 ms per loop
```

```
input_arr = np.array(test_arr, copy=True)
%timeit nan_percentile(test_arr, q=[10,25,50,75,90])
100 loops, best of 3: 3.81 ms per loop
```

The new function is roughly **160x faster** than `np.nanpercentile()`

, so it was time well spent looking for a faster way.