Hi, the position information is lost when we use bag of words. My question is: is it possible to use it not only for classification but also for detection?
Yes, it is possible to use bag-of-words modelling for detection. This can be achieved with a supervised machine learning approach. However, the loss of position information may lead to some inaccuracies, which can be reduced using a spatial pyramid.
Yes. Spatial pyramid matching (SPM) is one approach. In summary, SPM works by partitioning the image into sub-images (say, dividing each image into four parts) and constructing a BoW for each part separately. Finally, the BoWs of all parts are concatenated to give a new descriptor that contains position information. If you need more accurate position information, increase the number of sub-images, for example by dividing the image into 16 or 64 parts.
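To make the partition-and-concatenate idea concrete, here is a minimal sketch in plain Python. It assumes each keypoint has already been assigned a visual-word id, and the names `bow_histogram` and `spatial_bow` are made up for illustration. It only builds a single grid level as described above; full SPM additionally combines several grid resolutions.

```python
from collections import Counter

def bow_histogram(words, vocab_size):
    """Count how often each visual-word id occurs."""
    counts = Counter(words)
    return [counts.get(w, 0) for w in range(vocab_size)]

def spatial_bow(keypoints, width, height, vocab_size, grid=2):
    """Concatenate one BoW histogram per cell of a grid x grid partition.

    keypoints: list of (x, y, word_id) with 0 <= x < width, 0 <= y < height.
    """
    cells = [[] for _ in range(grid * grid)]
    for x, y, word in keypoints:
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cells[row * grid + col].append(word)
    descriptor = []
    for cell_words in cells:
        descriptor.extend(bow_histogram(cell_words, vocab_size))
    return descriptor

# Toy data (assumed): 3 visual words in a 100x100 image.
keypoints = [(10, 10, 0), (20, 15, 0), (80, 20, 1), (30, 90, 2), (90, 95, 1)]

global_bow = bow_histogram([w for _, _, w in keypoints], 3)
grid_bow = spatial_bow(keypoints, 100, 100, 3, grid=2)

print(global_bow)  # [2, 2, 1]
print(grid_bow)    # [2, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
```

The global histogram cannot tell where the words occurred, while the concatenated one shows, for instance, that word 0 appears only in the top-left cell. Setting `grid=4` or `grid=8` gives the 16- or 64-part versions.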
Another approach is to use n-grams in the model. Basically, this method models the relations between visual words in the image by counting how many times n words occur in the vicinity of each other.
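As a rough illustration for n = 2, the sketch below counts pairs of visual words whose keypoints lie close together. The function name `neighbor_bigrams`, the Euclidean distance test, and the `radius` parameter are my own assumptions about what "vicinity" means here, not a standard definition.

```python
from collections import Counter
from itertools import combinations
from math import hypot

def neighbor_bigrams(keypoints, radius):
    """Count unordered pairs of visual words whose keypoints lie within `radius`.

    keypoints: list of (x, y, word_id).
    """
    counts = Counter()
    for (x1, y1, w1), (x2, y2, w2) in combinations(keypoints, 2):
        if hypot(x1 - x2, y1 - y2) <= radius:
            counts[tuple(sorted((w1, w2)))] += 1
    return counts

# Toy data (assumed): two small clusters of keypoints.
keypoints = [(10, 10, 0), (12, 11, 1), (50, 50, 2), (52, 49, 1), (11, 13, 1)]
bigrams = neighbor_bigrams(keypoints, radius=5.0)

print(dict(bigrams))  # {(0, 1): 2, (1, 1): 1, (1, 2): 1}
```

The resulting counts can be used as an extra histogram next to the plain BoW, so the descriptor also records which words tend to appear near each other.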
Thank you for the answers. Increasing the resolution of the spatial pyramid approach can give position information; however, I think it is not sufficient for putting a bounding box around the object, as required in detection challenges.
Another simple approach is to use k-means to cluster the features extracted from your images and then construct the BoW. The BoW will contain a description of the position information.
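A minimal sketch of the clustering step, in plain Python with a naive k-means (in practice you would probably use a library implementation). The helper names, the 2-D toy descriptors, and the "first k points" initialisation are assumptions for illustration only. Note that in this sketch the histogram just counts word occurrences, so any position information would have to be encoded in the features themselves.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(point, centers):
    """Index of the closest cluster center (the visual-word id)."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def kmeans(points, k, iterations=20):
    """Naive k-means; uses the first k points as initial centers."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster is empty
                centers[i] = [sum(c) / len(group) for c in zip(*group)]
    return centers

def bow(descriptors, centers):
    """Histogram of visual-word assignments for one image."""
    counts = Counter(nearest(d, centers) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(centers))]

# Toy 2-D "descriptors" (assumed); real ones would be e.g. 128-D.
training = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
vocabulary = kmeans(training, k=2)

image_descriptors = [(0.1, 0.1), (5.1, 5.0), (5.0, 5.2)]
histogram = bow(image_descriptors, vocabulary)
print(histogram)  # [1, 2]
```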