My question is a general one that applies across many research fields. Physicians and researchers often hand me data in Excel format for sophisticated analysis and modeling, and it needs extensive cleaning, processing, formatting, and numerous transformations before it is ready to be analyzed. One thing such datasets very commonly share is the lack of a data dictionary, that is, a short description of each variable together with its value labels (coding description). Has anyone developed a script that generates this information, especially the value labels, from the raw data in any statistical package such as R, SAS, Stata, or SPSS, and also produces the usual summary estimates? To give an idea: categorical variables such as gender, race, death, and disease status all need to be coded so that numerical values such as 0, 1, ... represent the categories, and those values then need labels such as female/male, died/alive, or White/Black/Hispanic.
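
To make the request concrete, here is a minimal sketch of the kind of script I have in mind. It is written in plain Python rather than one of the packages named above, purely for illustration. The function name `build_data_dictionary`, the `max_levels` cutoff, and the sample records are all hypothetical, and the rule "few distinct values means categorical" is only an assumed heuristic that would need manual review on real data.

```python
from collections import Counter


def build_data_dictionary(rows, max_levels=10):
    """Draft a data dictionary from raw records (a list of dicts).

    Assumed heuristic: a column with at most `max_levels` distinct
    non-missing values is treated as categorical and given integer
    codes 0, 1, ... in sorted label order.
    """
    columns = list(rows[0].keys()) if rows else []
    dictionary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        entry = {
            "n": len(values),
            "missing": len(values) - len(present),
            "distinct": len(counts),
        }
        if 0 < len(counts) <= max_levels:
            labels = sorted(counts, key=str)
            entry["type"] = "categorical"
            # Proposed coding: numeric code -> original label.
            entry["value_labels"] = dict(enumerate(labels))
            # Simple summary: frequency of each category.
            entry["frequencies"] = {label: counts[label] for label in labels}
        else:
            entry["type"] = "continuous or free text"
        dictionary[col] = entry
    return dictionary


# Hypothetical records standing in for rows read from an Excel sheet.
rows = [
    {"gender": "female", "race": "White", "age": 34},
    {"gender": "male", "race": "Black", "age": 51},
    {"gender": "female", "race": "Hispanic", "age": 47},
    {"gender": "", "race": "White", "age": 62},
    {"gender": "male", "race": "Black", "age": 29},
]

data_dictionary = build_data_dictionary(rows, max_levels=3)
for name, entry in data_dictionary.items():
    print(name, entry)
```

The output I am after is essentially this per-variable summary: counts, missing values, and a proposed code-to-label mapping for each categorical variable, which could then be reviewed and applied as value labels in R, SAS, Stata, or SPSS.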