I have developed a sugarcane yield prediction model using vegetation indices and previously known yields. How can I now predict yield for the whole of Karnataka state (a large area with mixed crops) using the same model?
The validity of yield predictions depends on the strength of the input factors, their correlations, and the ability to verify the predictions against measured or reported values from several sample points. Here the inputs are stated only broadly, and there is no information on whether yield values are available from other areas. Are the vegetation index values available at various spatial and temporal scales? If so, how diverse are they, and are they representative of the whole state? How homogeneous are yields across the state in terms of growth conditions such as soil, water and weather? These are critical inputs for any yield prediction at a large scale, and we should seriously consider the risk of oversimplifying by relying on one single model. You probably have this information; sharing more of it would help us answer appropriately. For example, in Canada we have a yield model for several crops based on the average EVI or NDVI from satellite data on a particular day of observation for each crop, which correlated with average yields at a large scale. We then distributed the yields to smaller areas based on the variation in NDVI/EVI across the large region, and we were able to verify the model predictions against observed values from smaller areas within it. We used a GIS-based approach to achieve this. This was a top-down model. In your case a bottom-up approach can be used; however, more information to ensure a sound model validation step is critical before proceeding.
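To make the top-down idea concrete, here is a minimal sketch of distributing a regional average yield to sub-areas according to their NDVI. It assumes a simple proportional rule (yield scales linearly with NDVI relative to the area-weighted regional mean); the post does not state the actual rule used, and the function name and all numbers are hypothetical.

```python
import numpy as np

def disaggregate_yield(regional_yield, ndvi, area):
    """Distribute a regional mean yield to sub-areas in proportion to NDVI.

    Assumption (not from the post): each sub-area receives
    regional_yield * ndvi_i / (area-weighted mean NDVI).
    """
    ndvi = np.asarray(ndvi, dtype=float)
    area = np.asarray(area, dtype=float)
    mean_ndvi = np.average(ndvi, weights=area)
    return regional_yield * ndvi / mean_ndvi

# Hypothetical values, for illustration only.
ndvi = [0.55, 0.70, 0.62]          # mean NDVI per sub-area
area = [1200.0, 800.0, 1000.0]     # cropped area per sub-area, hectares
sub_yields = disaggregate_yield(80.0, ndvi, area)  # regional mean in t/ha

# Consistency check: the area-weighted mean of the sub-area yields
# reproduces the regional value that was distributed.
recovered = np.average(sub_yields, weights=area)
```

The verification step described above would then compare `sub_yields` with observed yields from those smaller areas.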
Thank you, Dr. Arumugam Thiagarajan. Yes, I am using the bottom-up approach to predict yield for a large area. For sugarcane we computed four indices (NDVI, GNDVI, EVI and NDWI) from Sentinel data to see which correlates best with sugarcane yield, and then prepared a prediction model using 1000 hectares of land (500+ plots). I agree that a large area cannot be homogeneous in growth conditions such as soil, water and weather, but we want to see how accurately these indices can predict yield for the whole region. Cropland mapping, or sugarcane delineation, for the whole region is the second major challenge for us, so if you can suggest any method for cropland mapping, that would be helpful in my research.
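As an illustration of the index comparison described here, the sketch below computes the four indices from band reflectances and ranks them by Pearson correlation with plot yield. The index formulas are the standard ones; NDWI is taken in its NIR/SWIR form, which is an assumption because the thread does not say which NDWI definition was used. The function names are hypothetical and the reflectances and yields are synthetic, not the actual plot data.

```python
import numpy as np

def indices(blue, green, red, nir, swir):
    """Per-plot vegetation indices from surface reflectance (0-1 scale)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
        # Assumption: NDWI as (NIR - SWIR) / (NIR + SWIR).
        "NDWI": (nir - swir) / (nir + swir),
    }

def rank_by_correlation(index_values, yields):
    """Return (name, Pearson r) pairs sorted by |r|, strongest first."""
    scores = {
        name: float(np.corrcoef(values, yields)[0, 1])
        for name, values in index_values.items()
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Synthetic reflectances and yields, for illustration only.
rng = np.random.default_rng(0)
n = 500
nir = rng.uniform(0.30, 0.50, n)
red = rng.uniform(0.03, 0.10, n)
green = rng.uniform(0.05, 0.12, n)
blue = rng.uniform(0.02, 0.06, n)
swir = rng.uniform(0.15, 0.30, n)

vi = indices(blue, green, red, nir, swir)
yields = 40.0 + 60.0 * vi["NDVI"] + rng.normal(0.0, 3.0, n)  # t/ha, made up

ranking = rank_by_correlation(vi, yields)
```

With real data, `ranking` would show which index tracks yield most closely for the sampled plots; it says nothing yet about how well that relationship transfers to the rest of the state.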
Hello Sunil Jha, good to know you have a decent sample size for building a model. Delineation of crops is another topic; I will skip that part for now and focus on sugarcane yield prediction. The first step is to run your model to predict yields in other plots across the state and see how well the predictions fit your observed data. These points should be independent of the data used to build the model. If you don't have any observed data at the state level, look for proxy information. The proxy data could be reported yields at a larger or smaller scale, in both space and time. Let's say you have production data for years x1 to xn for the whole state: you can compare these with the predictions from your model and gather evidence on its performance. The same approach can be used for other (spatial) proxy data. If there is no proxy information, you may have to split your model input data and reserve some of it for validation. During this step, if the model over-predicts or under-predicts yields across time or space, a calibration procedure can be initiated. Using multiple time-space datasets will test the model against the vagaries of weather and soil conditions. Hope this helps.
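The hold-out and calibration steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: a single-index linear model, a 70/30 random split, a linear recalibration of predictions against observations, and synthetic data throughout. The function names are hypothetical.

```python
import numpy as np

def validation_metrics(observed, predicted):
    """Bias (mean error), RMSE and R^2 of predictions against observations."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - observed
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    return {
        "bias": float(err.mean()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": 1.0 - ss_res / ss_tot,
    }

def fit_calibration(observed, predicted):
    """Least-squares line: observed ~ a * predicted + b."""
    a, b = np.polyfit(predicted, observed, 1)
    return float(a), float(b)

# Synthetic plots, for illustration only.
rng = np.random.default_rng(1)
n = 500
ndvi = rng.uniform(0.5, 0.9, n)
yield_obs = 20.0 + 90.0 * ndvi + rng.normal(0.0, 4.0, n)  # t/ha, made up

# Reserve 30% of the plots for validation; fit on the rest.
idx = rng.permutation(n)
train, test = idx[:350], idx[350:]
slope, intercept = np.polyfit(ndvi[train], yield_obs[train], 1)
pred = slope * ndvi[test] + intercept
holdout = validation_metrics(yield_obs[test], pred)

# Mimic a systematic over-prediction, then recalibrate against observations.
biased_pred = 1.1 * pred + 5.0
before = validation_metrics(yield_obs[test], biased_pred)
a, b = fit_calibration(yield_obs[test], biased_pred)
after = validation_metrics(yield_obs[test], a * biased_pred + b)
```

Here the calibration is fitted and evaluated on the same points, so `after` is optimistic; in practice the calibrated model should again be checked against a dataset it has not seen, for example another year or region.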
Thank you, Dr. Arumugam Thiagarajan, for your support. Yes, we have run the model on observed data that was independent of the data used to build it. The errors fluctuated from plot to plot, so we are now trying to calibrate the model against the variation in the recorded (actual) yields. Next we will check its results against proxy data (reported yields) for the whole state of Karnataka.