How can I efficiently handle and analyze large remote sensing datasets in Python, specifically for vegetation indices calculation?

I am currently working with a dataset containing multiple years of Landsat imagery, and I want to calculate several vegetation indices such as NDVI, EVI, and SAVI. However, my code runs extremely slowly, and I keep running out of memory.
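
For reference, these are the formulas I am using, with Blue, Red, and NIR as surface reflectance bands (they match the code below; the SAVI soil factor is L = 0.5):

NDVI = (NIR - Red) / (NIR + Red)
EVI  = 2.5 * (NIR - Red) / (NIR + 6*Red - 7.5*Blue + 1)
SAVI = 1.5 * (NIR - Red) / (NIR + Red + 0.5)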

How can I optimize my code and handle this large dataset in a more efficient manner?

[Important] I don't want to use Google Earth Engine or any online platforms such as Google Colab.

--- Here is my sample code ---

import numpy as np
import rasterio
from rasterio.windows import Window

# Open the input raster and create the output file once, outside the loop
with rasterio.open("landsat.tif") as src:
    profile = src.profile.copy()
    profile.update(count=3, dtype="float32", nodata=np.nan)

    with rasterio.open("output.tif", "w", **profile) as dst:
        # Define tile size for windowed processing
        win_height, win_width = 256, 256

        # Loop over tiles, clipping each window at the raster edges
        for row_off in range(0, src.height, win_height):
            for col_off in range(0, src.width, win_width):
                window = Window(col_off, row_off,
                                min(win_width, src.width - col_off),
                                min(win_height, src.height - row_off))

                # Read only this tile, as float32 to avoid integer division
                data = src.read(window=window).astype(np.float32)
                # Band order in my stack: 0=blue, 1=green, 2=red, 3=NIR
                blue, red, nir = data[0], data[2], data[3]

                # Calculate vegetation indices, suppressing divide-by-zero warnings
                with np.errstate(divide="ignore", invalid="ignore"):
                    ndvi = (nir - red) / (nir + red)
                    # EVI's +1 term assumes reflectance scaled to 0-1
                    evi = 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
                    savi = 1.5 * (nir - red) / (nir + red + 0.5)

                # Write each index into the matching tile of the output
                dst.write(ndvi, 1, window=window)
                dst.write(evi, 2, window=window)
                dst.write(savi, 3, window=window)
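
I have also been reading about rioxarray with dask for lazy, chunked processing, which I understand keeps memory use bounded and can parallelize across cores. Below is a minimal sketch of what I have in mind (file names are placeholders, the blue/green/red/NIR band order is an assumption about my stack, and it requires rioxarray and dask to be installed). Is this the right direction?

import xarray as xr
import rioxarray

# Open lazily in 1024x1024 dask chunks; nothing is read into memory yet
da = rioxarray.open_rasterio("landsat.tif",
                             chunks={"x": 1024, "y": 1024}).astype("float32")

# Assumed band order: 1=blue, 2=green, 3=red, 4=NIR
blue, red, nir = da.sel(band=1), da.sel(band=3), da.sel(band=4)

# The same indices as above, built as lazy dask expressions
ndvi = (nir - red) / (nir + red)
evi = 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
savi = 1.5 * (nir - red) / (nir + red + 0.5)

# Stack into one 3-band array; writing evaluates the graph chunk by chunk
out = xr.concat([ndvi, evi, savi], dim="band").assign_coords(band=[1, 2, 3])
out.rio.to_raster("indices.tif", dtype="float32")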
