Prototyping time-series analysis with Google Earth Engine

Sometimes you are struck by an idea how to analyze satellite images, but how fast can you get from a remote sensing idea to a prototype of the result? The sheer size and amount of openly available satellite images makes developing prototypes rather cumbersome. To get from an idea to a prototype product you need to download gigabytes of data, pre-process it, write the code that calculates your prototype product and save it somewhere. Oftentimes you’ll end up spending more time on downloading and pre-processing than the actual prototyping and calculation. More often than not I discarded ideas because I had no time to test them. Today I’ll show an example of how Google’s Earth Engine can be used to put the rapid back into rapid prototyping for remote sensing.

The Sentinel satellites are an amazing opportunity for scientists all over the world to explore unprecedented amounts of remote sensing data free of charge. I am genuinely happy that this is one of the first large remote sensing missions that has Open Data and Open Access baked in right from the start. All Sentinel data can be accessed through Copernicus Open Access Hub. Searching and downloading of multiple scenes however is not very user friendly. Fortunately the Hub also provides an API we can use to search and download multiple scenes at once. This post aims to show a simple workflow of searching and downloading multiple Sentinel-2 scenes using the Python package sentinelsat
Recently I was trying to calculate the quantiles of the vegetation index of an area over time. For this I have a time-series of satellite raster images of a certain region that cover identical extents. This is represented as a numpy.ndarray of the shape(96, 4800, 4800) - in other words 96 satellite images each measuring 4800 by 4800 pixels. I want to calculate the 10th, 25th, 50th, 75th and 90th quantile along the time/z-axis, which can be done easily with np.percentile(a, q=[10,25,50,75,90], axis=0). The data I am working with contains no_data areas due to residual cloud cover, rainfall, etc. represented as np.NaN. Naturally I was turning to numpys np.nanpercentile(a, q=[10, 25, 50, 75, 90], axis=0). Unfortunately np.nanpercentile() was ~300x slower on my dataset than np.percentile() so I had to find a way to speed it up.