Thanks for your question. I ran a search to find a detailed solution to your problem and have shortlisted a few files. I believe the material in the links below will address it. Feel free to ask if you have any further questions.
Local descriptors have been used to describe key-frames in order to improve video shot (and scene/story) segmentation. If you search for "SIFT + video segmentation", for instance, you will find plenty of material from the video retrieval field. Best! Rudinei.