Hello everyone,
While reimplementing a video summarization model, I noticed something unexpected: my reproduction yields higher F1 scores than the baseline reported in the original paper. I did not intentionally change the architecture; I only fixed some minor bugs (e.g., in data handling).
My questions are:
Any insights from those who have reimplemented models in video summarization (or related areas) would be really helpful.
Thank you!