Fast linear 1D interpolation with numba

I am currently doing time-series analysis on MODIS-derived vegetation index data. In order to get a reliable signal from the data, outliers need to be removed and the resulting gaps interpolated/filled before further filtering/smoothing of the signal. The time-series for one tile, covering 10° by 10°, spans roughly 14 years with 46 images per year. Each image weighs in at around 70-100 MB. If you are processing, say, Africa, you are looking at roughly 2.3 terabytes of input data. Interpolation of such massive amounts of data begs the question: what is the fastest way to do it?
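As a rough starting point, here is a minimal sketch of the kind of gap-filling involved: a numba-compiled function that linearly interpolates runs of NaNs between their nearest valid neighbours. The function name and the edge handling (leading and trailing NaNs are left untouched) are my own choices for illustration, not the benchmarked implementation from the post.

    import numpy as np
    from numba import njit

    @njit
    def interpolate_nan_1d(y):
        """Fill NaN gaps in a 1D series by linear interpolation between
        the nearest valid neighbours. Gaps at the edges are left as NaN."""
        out = y.copy()
        n = out.shape[0]
        i = 0
        while i < n:
            if np.isnan(out[i]):
                start = i - 1          # last valid index before the gap
                j = i
                while j < n and np.isnan(out[j]):
                    j += 1             # first valid index after the gap
                if start >= 0 and j < n:
                    y0 = out[start]
                    y1 = out[j]
                    span = j - start
                    for k in range(i, j):
                        out[k] = y0 + (y1 - y0) * (k - start) / span
                i = j
            else:
                i += 1
        return out

    # interpolate_nan_1d(np.array([1.0, np.nan, np.nan, 4.0]))  # -> [1., 2., 3., 4.]

Applied per pixel along the time axis, a compiled loop like this avoids the Python-level overhead that makes naive per-pixel interpolation so slow on stacks of this size.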

Landsat batch download from Google and Amazon

Landsat is the workhorse for a lot of remote sensing applications thanks to its open data policy, global data availability and long-spanning acquisition time-series. The USGS Bulk Downloader, however, is clunky, depends on special ports being open on your network and cannot be scripted to suit needs like automatic ingestion of newly acquired Landsat-8 scenes. Fortunately, Google and Amazon provide mirrors of a lot of the Landsat datasets which can be used for scripted bulk downloading.
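For illustration, a scripted download from the Amazon mirror could look roughly like the sketch below. The bucket name, path layout and scene ID are assumptions based on the public landsat-pds S3 mirror and may need adjusting for the mirror and collection you actually use.

    import os
    import requests

    # NOTE: bucket and path scheme are assumptions (landsat-pds style layout)
    BASE_URL = "https://landsat-pds.s3.amazonaws.com"

    def download_band(scene_id, band, out_dir="."):
        """Fetch a single band of a Landsat-8 scene from the S3 mirror."""
        path, row = scene_id[3:6], scene_id[6:9]
        filename = "{}_B{}.TIF".format(scene_id, band)
        url = "{}/L8/{}/{}/{}/{}".format(BASE_URL, path, row, scene_id, filename)
        response = requests.get(url, stream=True)
        response.raise_for_status()
        out_path = os.path.join(out_dir, filename)
        with open(out_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
        return out_path

    # hypothetical scene ID, for illustration only:
    # download_band("LC80030172015001LGN00", 4)

Because it is just HTTP, a loop over a scene list is enough to keep a local archive in sync with newly acquired acquisitions.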

GeoTiff compression benchmarking

While collecting data for a time-series analysis I quickly started running out of disk space. While HDDs are cheap nowadays, they are still not free, and I had run out of space multiple times before getting a new one at work, so I had to ask: what is the smallest, most efficient way to store all my GeoTiff data?
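A simple way to get an answer is to re-write one representative file with different GDAL GeoTIFF creation options and compare write time and resulting size. The sketch below shells out to gdal_translate; the option combinations shown are just examples, not the full set from the benchmark.

    import os
    import subprocess
    import time

    # standard GDAL GeoTIFF creation options to compare
    VARIANTS = {
        "none":    ["-co", "COMPRESS=NONE"],
        "lzw":     ["-co", "COMPRESS=LZW", "-co", "PREDICTOR=2"],
        "deflate": ["-co", "COMPRESS=DEFLATE", "-co", "PREDICTOR=2", "-co", "ZLEVEL=9"],
    }

    def benchmark(src):
        """Write src once per compression variant, reporting time and size."""
        for name, options in VARIANTS.items():
            dst = "{}_{}.tif".format(os.path.splitext(src)[0], name)
            start = time.time()
            subprocess.check_call(["gdal_translate", "-q", src, dst] + options)
            elapsed = time.time() - start
            size_mb = os.path.getsize(dst) / 1e6
            print("{:8s} {:6.1f} s {:8.1f} MB".format(name, elapsed, size_mb))

    # benchmark("input.tif")

Running this over a handful of typical tiles already gives a good feel for the trade-off between write time and space saved.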

Generating a global tree cover vector dataset

Trees cover a large part of the Earth and can sometimes be quite annoying when you are trying to classify any land cover other than forest. In my case I was looking for a dataset I could use to mask out forest areas so I would not have to worry about them producing false positives in my classification, thereby making the whole process simpler, faster and more accurate. Unfortunately there aren't a whole lot of options to choose from if you are looking for something global with a resolution no coarser than MODIS (250 m). One quite prominent and easy-to-use dataset is the Global Forest Change 2000-2013 by Hansen et al. The problem now is: how do you get such a large raster dataset into an easy-to-use vector dataset?
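One possible route, sketched below, is to threshold the treecover2000 percentage layer of a tile into a binary forest mask and polygonize it with GDAL. The 30 % cover threshold, the field name and the example file name are assumptions for illustration; for a full 10° by 10° tile you would want to process block-wise instead of reading the whole array at once.

    from osgeo import gdal, ogr, osr

    def polygonize_tree_cover(raster_path, out_shp, threshold=30):
        """Threshold a tree-cover percentage raster into a binary forest
        mask and write the forest areas as polygons to a shapefile."""
        src = gdal.Open(raster_path)
        band = src.GetRasterBand(1)
        arr = band.ReadAsArray()

        # in-memory binary mask raster (1 = forest, 0 = everything else)
        drv = gdal.GetDriverByName("MEM")
        mask_ds = drv.Create("", src.RasterXSize, src.RasterYSize, 1, gdal.GDT_Byte)
        mask_ds.SetGeoTransform(src.GetGeoTransform())
        mask_ds.SetProjection(src.GetProjection())
        mask_band = mask_ds.GetRasterBand(1)
        mask_band.WriteArray((arr >= threshold).astype("uint8"))

        # output shapefile with the same spatial reference as the raster
        srs = osr.SpatialReference()
        srs.ImportFromWkt(src.GetProjection())
        shp_drv = ogr.GetDriverByName("ESRI Shapefile")
        out_ds = shp_drv.CreateDataSource(out_shp)
        layer = out_ds.CreateLayer("forest", srs=srs, geom_type=ogr.wkbPolygon)
        layer.CreateField(ogr.FieldDefn("DN", ogr.OFTInteger))

        # use the mask band itself as the validity mask so only forest
        # pixels produce polygons
        gdal.Polygonize(mask_band, mask_band, layer, 0)
        out_ds = None  # flush to disk

    # hypothetical tile name, for illustration only:
    # polygonize_tree_cover("treecover2000_10N_020E.tif", "forest_10N_020E.shp")

Merging and simplifying the per-tile shapefiles afterwards then yields a single global forest mask that can be used directly in the classification.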