I have gone through quite a few computer vision papers, and I found that pyramid pooling operations are often reported to outperform regular pooling operations. Several papers also show this empirically. However, I cannot find a proper explanation of why this is the case. Most papers give only very basic justifications. Could someone please explain the reason in detail? It would be really helpful if you could explain it from scratch.
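
To make the comparison concrete, here is a minimal sketch of what I understand by the two operations. It assumes SPP-style max pooling over a small pyramid of grids (1x1, 2x2, 4x4) on a single-channel feature map, written in plain Python. The names `max_pool_grid` and `pyramid_pool` and the toy feature maps are my own and only for illustration, they are not taken from any particular paper or library.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def max_pool_grid(feature_map, bins):
    """Max-pool a 2D feature map (list of rows) into a bins x bins grid.

    bins=1 is ordinary global max pooling over the whole map.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(bins):
        # Bin edges use floor/ceil so the bins cover the whole map
        # even when h or w is not divisible by bins.
        r0, r1 = (i * h) // bins, ceil_div((i + 1) * h, bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, ceil_div((j + 1) * w, bins)
            pooled.append(
                max(feature_map[r][c]
                    for r in range(r0, r1)
                    for c in range(c0, c1))
            )
    return pooled


def pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Concatenate max-pooled outputs from several grid resolutions."""
    out = []
    for bins in levels:
        out.extend(max_pool_grid(feature_map, bins))
    return out


# Two toy maps with the same strongest activation in different places.
map_a = [[9, 0],
         [0, 0]]
map_b = [[0, 0],
         [0, 9]]

# Global max pooling cannot tell them apart.
global_a = max_pool_grid(map_a, 1)
global_b = max_pool_grid(map_b, 1)

# A two-level pyramid keeps coarse information about where the peak was.
pyramid_a = pyramid_pool(map_a, levels=(1, 2))
pyramid_b = pyramid_pool(map_b, levels=(1, 2))
```

In this toy case the global pooled vectors are identical while the pyramid vectors differ, and the pyramid output length depends only on `levels`, not on the input size. What I am looking for is an explanation of whether properties like these are actually the reason for the reported improvements.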
