I am currently working on searchable encryption, where I perform keyword-based search. I want a dataset with at least 100 fields, so that I can treat each field as a keyword field. The fields may be of any type (numeric, text, date, etc.).
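To make the setup concrete, here is a minimal sketch of the field-as-keyword indexing I have in mind, assuming a simple HMAC-based token scheme (the key name, records, and token derivation are all illustrative, not any particular SSE construction):

```python
import hmac
import hashlib

KEY = b"secret-index-key"  # hypothetical key; a real scheme would manage keys properly

def keyword_token(field: str, value) -> bytes:
    """Derive a deterministic search token from a (field, value) pair via HMAC-SHA256."""
    msg = f"{field}={value}".encode("utf-8")
    return hmac.new(KEY, msg, hashlib.sha256).digest()

def build_index(records):
    """Map each field/value token to the ids of the records containing it."""
    index = {}
    for rec_id, record in enumerate(records):
        for field, value in record.items():
            index.setdefault(keyword_token(field, value), []).append(rec_id)
    return index

records = [
    {"name": "Alice", "age": 34, "joined": "2020-01-15"},
    {"name": "Bob",   "age": 34, "joined": "2021-06-02"},
]
index = build_index(records)
# The searcher recomputes the token; the index never stores plaintext keywords.
print(index[keyword_token("age", 34)])  # -> [0, 1]
```

With 100+ fields per record, every column of the dataset contributes its own family of tokens, which is why a wide dataset matters here.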
The dataset attributes include physics information parsed from the dataset namespace, as well as static and run-time statistics about the use of each dataset. On the basis of this information, different sets of features are built and given as input to classifiers trained to predict dataset popularity.
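For illustration, a minimal sketch of that kind of pipeline, assuming scikit-learn and placeholder features and labels (the real feature sets would come from the namespace attributes and usage statistics described above):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix: one row per dataset, columns standing in for
# parsed-namespace attributes plus static and run-time usage statistics.
rng = np.random.default_rng(0)
X = rng.random((500, 10))            # placeholder features
y = (X[:, 0] + X[:, 1] > 1.0)        # placeholder "popular" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```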
Visit these sites for more information; I am attaching a few:
You may use the Census tables. These tabulated files have several parameters based on the Census survey performed every decade in India. The questionnaire is lengthy and comprehensive enough to give a broad-spectrum understanding of settlements, populations, and demographic details, along with housing conditions, pivoted to a village or a ward in a city. Demographic details are described in about 26 columns, housing conditions in about 30 parameters, the tables on slums cover about 20 parameters, and there are migration tables in addition. Should you need them, I can share some tables with data (let me know), or you can fetch them from the website of the Census of India under various heads.
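If you combine a few of those tables, you easily exceed 100 columns. A small sketch of how you might turn every column into a keyword field, assuming pandas and a hypothetical file name (the actual tables come from the Census of India website):

```python
import pandas as pd

# Hypothetical file name; download the real tables from the Census of India site.
df = pd.read_csv("census_village_table.csv")

# Every column, whatever its type (numeric, text, date), becomes a keyword field.
keyword_fields = list(df.columns)
print(len(keyword_fields), "keyword fields")

# Example: enumerate the (field, value) keyword pairs for one record.
row = df.iloc[0]
pairs = [(col, row[col]) for col in keyword_fields]
```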
I like to use the CFR (US Code of Federal Regulations). I have it as a single 650MB text file. You can get it on the web. I've performed all sorts of analyses on it (finding hapaxes, misspellings, longest words, etc.) and tested many algorithms for compression and encryption. It's the greatest monument to bureaucratic incompetence ever erected in the history of civilization. Other than ballast and carbon capture (when in the form of paper) this may be the only practical use ever proposed for the CFR.
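If you want to try the same analyses, a minimal sketch of the hapax and longest-word counts, assuming a hypothetical local file name for the CFR text (streamed line by line, since at 650 MB you don't want it all in memory):

```python
import re
from collections import Counter

counts = Counter()
with open("cfr.txt", encoding="utf-8", errors="ignore") as f:
    for line in f:
        counts.update(re.findall(r"[a-z]+", line.lower()))

hapaxes = [w for w, c in counts.items() if c == 1]    # words occurring exactly once
longest = sorted(counts, key=len, reverse=True)[:20]  # twenty longest distinct words
print(len(hapaxes), "hapaxes; longest word:", longest[0])
```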