I need some research articles to cite for proving / emphasizing that feature selection or reduction improves classifier performance and prevents overfitting.
One of the most cited papers in the field is the following:
Guyon, Isabelle, and André Elisseeff. "An introduction to variable and feature selection." The Journal of Machine Learning Research 3 (2003): 1157-1182.
Strictly speaking, it serves as a review for a special issue of the journal, but it is very well written and covers a wide range of topics.
J.M. Banda and R.A. Angryk, “Selection of Image Parameters as the First Step Towards Creating a CBIR System for the Solar Dynamics Observatory,” International Conference on Digital Image Computing: Techniques and Applications (DICTA), Sydney, Australia, December 1-3, 2010, pp. 528-534.
J.M. Banda and R.A. Angryk, “An Experimental Evaluation of Popular Image Parameters for Monochromatic Solar Image Categorization,” Proceedings of the Twenty-Third International Florida Artificial Intelligence Research Society Conference (FLAIRS-23), Daytona Beach, Florida, USA, May 19–21, 2010, pp. 380-385.
I would like to suggest these publications. They are applied to image recognition and font recognition, and in them you will find a review of texture-based and statistical approaches:
Self organizing natural scene image retrieval
José Félix Serrano-Talamantes, Carlos Avilés-Cruz, Juan Villegas-Cortez, Juan H. Sossa-Azuela
Expert Systems with Applications (Impact Factor: 1.85). 06/2013; 40(7):2398–2409. DOI:10.1016/j.eswa.2012.10.064
ABSTRACT In this work we describe a new statistically-based methodology to organize and retrieve images of natural scenes by combining feature extraction, automatic clustering, automatic indexing and classification techniques. Our proposal belongs to the content-based image retrieval (CBIR) category. Our goal is to retrieve images from an image database by their content. The methodology combines randomly extracted points for feature extraction. The describing features are the mean, the standard deviation and the homogeneity (from the co-occurrence matrix) of a sub-image extracted from the three color channels (HSI). A K-means algorithm and a 1-NN classifier are used to build an indexed database. Three databases of images of natural scenes are used during the training and testing processes. One of the advantages of our proposal is that the images are not labeled manually for their retrieval. The performance of our framework is shown through several experimental results, including a comparison with several classifiers and comparison with related works, achieving up to 100% good recognition. Additionally, our proposal includes scene retrieval.
Font Recognition by invariants moments of global textures
Carlos Aviles-Cruz, Juan Villegas-Cortez
02/2014
ABSTRACT An alternative for the crucial task of Optical Font Recognition (OFR) is proposed in this work; this is based on the analysis of texture characteristics of document images formed of pure text through the invariants moments technique. Page segmentation and paragraph structure analysis are out of the scope of this study. There is not need of explicit local analysis in our method since the central feature of it is the extraction of global characteristics from the analysis of textures. A printed text block with a unique font is suitable to provide the specific texture properties necessary for the process of recognition of the most commonly used fonts in the Spanish language (Courier, Arial, Bookman Old Style, Franklin Gothic Medium, Comic Sans, Impact, Modern and Times New Roman), and their respective styles (regular, italic, bold, italic with bold). The invariant moment technique is used in this study to extract the font characteristics; from an entry text set a data base was build for the learning stage, and then standard statistical classifiers were applied for the identification stage. Three main results were obtained. First, the number of performed operations was lower with respect to other studies [8], [1]. Second, as opposed to what the theory predicts [3], we found that the invariant moments change significantly when the textures are rotated and scaled as digital images. And third, the introduction of random noise over the samples yielded good levels of classification with an average of 95%.
Article Self organizing natural scene image retrieval