A BBC article reports on the use of big data analytics for predicting successful matches on dating websites (bbc.in/1jqhDWj); for instance, they identify the three questions on which a couple's agreement on the first date is most predictive. This is nothing particularly new for the data mining community, but the article also documents the problem of people lying in their questionnaire responses: "To present themselves in what they believe to be a better light, the information customers provide about themselves is not always completely accurate: men are most commonly economical with the truth about age, height and income, while with women it's age, weight and build." For agencies like Match.com and OKCupid, it's as simple as 2+2: inaccurate data = unsuitable matches.
How big a problem do you think deceptive self-reports are in big data? Can you think of other concrete, applied examples of human deception affecting data mining results, especially on a large scale in big data analytics applications? Your thoughts are much appreciated. Thank you!