I am working on a computer vision project where I am trying to extract key frames from videos. The videos show bottles with text labels, and my criterion for a key frame is "extract those frames that, taken together, cover all the text on the bottle". So the criterion for choosing key frames is text-driven rather than appearance-driven.
I know that key frames are generally extracted by frame clustering, shot detection, or comparing histograms of frames, but I am not sure any of these is the best approach for this particular use case, given that the colour intensity may not vary much from frame to frame (black/white text written on a white label).
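To make the concern concrete, here is a minimal sketch of the histogram comparison baseline in plain Python. It assumes frames are already decoded into grayscale grids of 0..255 values. The function names, the 16-bin histogram, and the 0.2 threshold are illustrative assumptions of mine, not taken from any library.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel values 0..255.
Frame = Sequence[Sequence[int]]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return a normalized intensity histogram (bin fractions sum to 1)."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("frame contains no pixels")
    return [count / total for count in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def select_key_frames(
    frames: Sequence[Frame], threshold: float = 0.2, bins: int = 16
) -> List[int]:
    """Keep the first frame, then every frame whose histogram differs from
    the last kept key frame by more than `threshold`."""
    key_indices: List[int] = []
    last_hist = None
    for index, frame in enumerate(frames):
        hist = gray_histogram(frame, bins)
        if last_hist is None or histogram_distance(hist, last_hist) > threshold:
            key_indices.append(index)
            last_hist = hist
    return key_indices


# Two "label" frames with the dark pixel in different places, then a dark frame.
label_a = [[255, 255, 0, 255], [255, 255, 255, 255]]
label_b = [[255, 255, 255, 255], [255, 0, 255, 255]]
dark = [[0, 0, 0, 0], [0, 0, 0, 0]]
print(select_key_frames([label_a, label_b, dark]))
```

The two label frames have the same intensity distribution even though the dark pixel sits in a different place, so the second one is never selected. That is the weakness I am worried about: a histogram ignores where the text is, so new text rotating into view on a white label may not register as a change.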
Has anyone worked on such a problem before, or does anyone have pointers on a better way to approach this?