It really depends on what you're interested in. Often when looking at affixes, researchers are interested in productivity, which can mean different things: token frequency, type frequency (how many different words with the affix occur in a sample of N tokens), or the proportion of neologisms (productively formed cases, often approximated by the proportion of very rare types). Baayen (2009) offers a good overview of these notions. There are, however, many metrics for estimating them; for a detailed overview, see chapter 3 of Zeldes (2012).
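To make the three notions concrete, here is a minimal Python sketch that counts them for one suffix. It uses the hapax-based ratio P = (number of hapax legomena with the affix) / (number of tokens with the affix) as one common way of approximating the share of very rare types. The function name `affix_productivity`, the toy English sample, and the plain string match on the suffix are all my own assumptions for illustration; real data would need proper morphological analysis, since a string match will also catch words that merely end in the same letters.

```python
from collections import Counter


def affix_productivity(tokens, suffix):
    """Count tokens, types and hapaxes for words ending in `suffix`.

    Assumption: a word "has" the affix if it ends in the suffix string
    and is longer than the suffix itself. This is a crude stand-in for
    a real morphological analysis.
    """
    counts = Counter(
        t.lower()
        for t in tokens
        if t.lower().endswith(suffix) and len(t) > len(suffix)
    )
    n_tokens = sum(counts.values())                       # token frequency
    n_types = len(counts)                                 # type frequency
    n_hapax = sum(1 for c in counts.values() if c == 1)   # types seen once
    # Hapax-based productivity: hapaxes with the affix / tokens with the affix
    p = n_hapax / n_tokens if n_tokens else 0.0
    return {"tokens": n_tokens, "types": n_types, "hapaxes": n_hapax, "P": p}


# Toy sample (hypothetical data, not from any corpus)
sample = "kindness darkness kindness sadness happiness darkness kindness ness".split()
result = affix_productivity(sample, "ness")
print(result)
```

Note that all of these values depend on sample size, so they are only comparable across affixes when computed on samples with the same N.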
Alternatively, you might be interested in dispersion: are certain forms idiosyncratic to some parts or documents of your corpus, or do you find them spread across all documents? On this topic I'd recommend Baroni & Evert (2007) and Gries (2008, 2010).
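As an illustration of what a dispersion measure looks like, here is a small Python sketch of DP ("deviation of proportions"), the measure proposed in Gries (2008): for each corpus part, compare the share of the form's occurrences found in that part with the share of the corpus that part makes up, sum the absolute differences, and halve the sum. Values near 0 mean the form is spread evenly; values near 1 mean it is concentrated in a few parts. The function name `deviation_of_proportions` and the toy corpus are assumptions of mine, and the parts are assumed to be pre-tokenized lists.

```python
def deviation_of_proportions(parts, target):
    """DP for `target` over a corpus given as a list of token lists.

    Assumption: each element of `parts` is one corpus part (e.g. one
    document), already tokenized.
    """
    total_size = sum(len(p) for p in parts)
    freqs = [p.count(target) for p in parts]
    total_freq = sum(freqs)
    if total_size == 0 or total_freq == 0:
        raise ValueError("target does not occur in the corpus")
    expected = [len(p) / total_size for p in parts]   # share of corpus per part
    observed = [f / total_freq for f in freqs]        # share of target per part
    return 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))


# Toy corpus with three equally sized parts (hypothetical data)
parts = [
    ["a", "b", "a", "c"],
    ["a", "d", "e", "f"],
    ["g", "h", "i", "j"],
]
dp_a = deviation_of_proportions(parts, "a")  # occurs in two of three parts
dp_g = deviation_of_proportions(parts, "g")  # occurs in one part only
print(dp_a, dp_g)
```

Here "g" gets the higher DP because all of its occurrences sit in a single part.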
Hope this helps!
Amir
References:
Baayen, R. Harald (2009), Corpus Linguistics in Morphology: Morphological Productivity. In: Anke Lüdeling & Merja Kytö (eds.), Corpus Linguistics. An International Handbook. Berlin: Mouton de Gruyter, 899–919.
Baroni, Marco & Stefan Evert (2007), Words and Echoes: Assessing and Mitigating the Non-randomness Problem in Word Frequency Distribution Modeling. In: Proc. ACL 2007. Prague, 904–911.
Gries, Stefan Th. (2008), Dispersions and Adjusted Frequencies in Corpora. International Journal of Corpus Linguistics 13(4), 403–437.
Gries, Stefan Th. (2010), Dispersions and Adjusted Frequencies in Corpora: Further Explorations. In: Stefan Th. Gries, Stefanie Wulff & Mark Davies (eds.), Corpus Linguistic Applications: Current Studies, New Directions. Amsterdam: Rodopi, 197–212.
Zeldes, Amir (2012), Productivity in Argument Selection. From Morphology to Syntax. (Trends in Linguistics: Studies and Monographs 260.) Berlin and Boston: De Gruyter.
The language is Konkani, and I want to extract the different paradigms in the language via clustering. Basically, I want to learn the morphology of Konkani.