Something that might be of use to you is http://quandl.com; I use it all the time when I just want to play around with a dataset.
Most US government data is pre-packaged, and the agencies offer free HTML and FTP download links. It often comes in common analytics formats (SPSS, Stata) to ease loading into those systems, and it may also give you the relational schema.
XML, as long as it is properly structured, is designed to be self-describing ("auto-configuring" the data). XML data binding, which required the use of specific APIs, has generally given way to authors including a schema with the XML (look up E4X) as a way to categorize the data and tag both the data and the dataset with descriptive attributes.
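To illustrate the "self-describing" point, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not E4X and does no XML Schema validation; it just shows how element names and attributes can carry enough description for a generic parser to rebuild records. The sample document, its `type` attribute, and the `parse_dataset` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names give the field names, and a "type" attribute
# tells the reader how to convert each value.
SAMPLE = """<dataset name="city_population" source="example">
  <record>
    <city type="string">Springfield</city>
    <population type="integer">30720</population>
  </record>
  <record>
    <city type="string">Shelbyville</city>
    <population type="integer">25500</population>
  </record>
</dataset>"""

# Map the (assumed) type attribute values to Python converters.
CONVERTERS = {"string": str, "integer": int}


def parse_dataset(xml_text):
    """Return the dataset's attributes and a list of dicts, one per <record>."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("record"):
        row = {}
        for field in record:
            # Fall back to str when the type attribute is missing or unknown.
            convert = CONVERTERS.get(field.get("type"), str)
            row[field.tag] = convert(field.text)
        rows.append(row)
    return dict(root.attrib), rows


meta, rows = parse_dataset(SAMPLE)
print(meta)
print(rows)
```

Nothing in the parser knows about cities or populations in advance; the field names and types come from the document itself.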
Your second question is not clear to me.
Intrusion detection over big data?
Nowadays, intrusion detection in cyberspace depends on data, such as network traffic. I think you should clarify what exactly you are asking about.
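As a toy illustration of detection that "depends on data", the sketch below learns a baseline from hypothetical per-minute packet counts and flags new minutes whose count sits more than a few standard deviations from that baseline. This is a simple z-score threshold chosen for illustration, not how any particular intrusion detection system works; the function names, the sample numbers, and the threshold of 3 are assumptions.

```python
import statistics


def build_baseline(history):
    """Mean and sample standard deviation of 'normal' traffic counts."""
    return statistics.mean(history), statistics.stdev(history)


def flag_anomalies(observations, baseline, threshold=3.0):
    """Indices of observations more than `threshold` std devs from the baseline mean."""
    mean, stdev = baseline
    if stdev == 0:
        return []  # no variation in the baseline, so z-scores are undefined
    return [i for i, x in enumerate(observations) if abs(x - mean) / stdev > threshold]


# Hypothetical packets-per-minute counts from a quiet period.
history = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = build_baseline(history)  # mean 100, stdev 2.0

# New minutes to check; the spike at index 2 is well outside the baseline.
print(flag_anomalies([101, 99, 140, 100, 95], baseline))
```

Real systems look at far richer features (ports, flows, payload signatures), but the principle is the same: what counts as an intrusion is defined relative to the traffic data you have collected.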
You can partner with data mining companies (such as www.cavaone.com), which can set up a project for you, gather the data, and deliver it in whichever format is best for you. We are actively seeking partners in academia to align ourselves with.