How can I efficiently handle and analyze large remote sensing datasets in Python, specifically for calculating vegetation indices?
I am currently working with a dataset containing multiple years of Landsat imagery, and I want to calculate several vegetation indices (NDVI, EVI, SAVI, and others). However, my code runs extremely slowly and I keep running out of memory.
How can I optimize my code so it handles this large dataset more efficiently?
[Important] I don't want to use Google Earth Engine or any online platforms such as Google Colab.
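For reference, these are the standard formulas I am implementing in the code below (L is SAVI's soil adjustment factor, which I set to 0.5, so the leading factor is 1 + L = 1.5):

NDVI = (NIR - Red) / (NIR + Red)
EVI  = 2.5 * (NIR - Red) / (NIR + 6 * Red - 7.5 * Blue + 1)
SAVI = (1 + L) * (NIR - Red) / (NIR + Red + L)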
--- Here is my sample code ---
import numpy as np
import rasterio

# Open the input once and create the output file alongside it, so every
# window is written into the same dataset instead of recreating it per loop
with rasterio.open("landsat.tif") as src:
    profile = src.profile.copy()
    profile.update(driver="GTiff", count=3, dtype="float32")
    with rasterio.open("output.tif", "w", **profile) as dst:
        # Iterate over the file's native block windows; block_windows(1)
        # yields ((row, col), window) pairs, not (i, j, window)
        for _, window in src.block_windows(1):
            # Read only the bands needed and cast to float before dividing
            # (band order here: 1 = blue, 3 = red, 4 = NIR, as in TM/ETM+)
            blue, red, nir = src.read([1, 3, 4], window=window).astype(np.float32)
            # Calculate vegetation indices, ignoring divide-by-zero pixels
            with np.errstate(divide="ignore", invalid="ignore"):
                ndvi = (nir - red) / (nir + red)
                evi = 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
                savi = ((nir - red) / (nir + red + 0.5)) * 1.5
            # Write each index into the matching window of the output
            dst.write(ndvi, 1, window=window)
            dst.write(evi, 2, window=window)
            dst.write(savi, 3, window=window)
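I have also experimented briefly with rioxarray and dask for lazy, chunked processing, but I am not sure I am using it correctly. Here is a minimal sketch of what I tried, shown for NDVI only (the chunk sizes and band numbers are my own guesses for this stack, and "ndvi.tif" is just a placeholder output name):

import rioxarray

# Open the stack lazily in dask chunks so nothing loads until compute time
# (chunk sizes are a guess; bands assume red = 3, NIR = 4 as in my code above)
da = rioxarray.open_rasterio("landsat.tif", chunks={"x": 1024, "y": 1024})
red = da.sel(band=3).astype("float32")
nir = da.sel(band=4).astype("float32")
ndvi = (nir - red) / (nir + red)
# to_raster streams the result to disk chunk by chunk
ndvi.rio.to_raster("ndvi.tif", dtype="float32")

Would something like this scale better across multiple years of scenes than the windowed rasterio loop above, or is there a more standard offline approach?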