Start by searching Google for "how to build a linguistic corpus". Once you have digested the results, run the same query in Google Scholar, then repeat both searches with "linguistic corpus software". The fastest way to build a corpus is to feed all the digital texts you can find into a program that collapses duplicate words and maintains a sorted dictionary with useful add-ons such as frequency of occurrence and your own tags. Expect to spend time on data cleaning.
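To make the "sorted dictionary with frequencies and tags" idea concrete, here is a minimal sketch using only the Python standard library. The names `tokenize` and `build_lexicon` are hypothetical, and the whitespace-based tokenizer is an assumption on my part: it avoids the regex `\w` class because that class does not match combining vowel signs in scripts such as Telugu and would split words apart.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """Split on whitespace and trim punctuation/symbols from both ends of each token."""
    tokens = []
    for raw in unicodedata.normalize("NFC", text).split():
        start, end = 0, len(raw)
        # Unicode categories starting with P (punctuation) or S (symbol) are trimmed.
        while start < end and unicodedata.category(raw[start])[0] in "PS":
            start += 1
        while end > start and unicodedata.category(raw[end - 1])[0] in "PS":
            end -= 1
        if start < end:
            tokens.append(raw[start:end])
    return tokens


def build_lexicon(texts):
    """Return a word-sorted dict: word -> {"freq": count, "tags": empty set to fill in}."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # Duplicates collapse into one entry per distinct word form.
    return {word: {"freq": counts[word], "tags": set()} for word in sorted(counts)}


if __name__ == "__main__":
    lexicon = build_lexicon(["the cat, the dog.", "The cat!"])
    for word, entry in lexicon.items():
        print(word, entry["freq"])
```

The sort here is by code point, so capitalized forms come before lowercase ones and case variants stay separate entries. Whether to lowercase or otherwise normalize is a decision for your own cleaning step.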
Hi! Great to know that you are working on corpus building for Telugu. Unfortunately, there is not enough web content for the different dialects of Telugu. For Standard Telugu, however, you can find large amounts of text, which you can crawl and then clean using the Natural Language Toolkit (http://nltk.org/).
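As a rough illustration of the cleaning step for crawled pages, the sketch below keeps only lines that are mostly written in Telugu script and drops exact duplicate lines. It uses the standard library rather than NLTK; the function names and the `min_ratio=0.6` threshold are assumptions chosen for the example, not recommended values. The range U+0C00 to U+0C7F is the Unicode Telugu block.

```python
def telugu_ratio(line):
    """Fraction of non-space characters that fall in the Unicode Telugu block."""
    chars = [ch for ch in line if not ch.isspace()]
    if not chars:
        return 0.0
    telugu = sum(1 for ch in chars if "\u0c00" <= ch <= "\u0c7f")
    return telugu / len(chars)


def clean_lines(raw_text, min_ratio=0.6):
    """Keep unique, mostly-Telugu lines; min_ratio is an illustrative threshold."""
    seen = set()
    kept = []
    for line in raw_text.splitlines():
        line = " ".join(line.split())  # collapse runs of whitespace
        if not line or line in seen:
            continue
        if telugu_ratio(line) >= min_ratio:
            seen.add(line)
            kept.append(line)
    return kept


if __name__ == "__main__":
    page = "తెలుగు భాష\nClick here to login\nతెలుగు భాష\n"
    print(clean_lines(page))
```

A script-ratio filter like this removes navigation text and other boilerplate in Latin script, but it cannot tell dialects apart, so dialect material would still need to be identified by hand or by source.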
I would first check existing corpora that are open to the public, and you might want to look at Python's NLTK to see whether it includes a corpus for this language. If not, the Internet would be a good source for building a corpus quickly.