When dealing with big data privacy and security, security must come first, because privacy depends on it. If data has privacy protections but no security, anyone can help themselves to the data, and it is simply a matter of time before the privacy element is cracked and the data is available to the attacker. If, on the other hand, the data is secure in the first place, privacy adds another level of comfort to your already secure data.
So the first goal must always be a high level of security, followed by a high level of privacy. Next, we need to retain the means to certify the provenance of the data. Data is useless if it has been accessed and corrupted, modified or had important elements deleted. Thus, we must achieve three goals (security, privacy and provenance) in order to have a useful outcome.
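To make the point about corrupted, modified or deleted data concrete, here is a minimal sketch of one way tampering could be detected: each record gets a keyed hash (HMAC) that also covers the previous record's tag, so the tags form a chain. This is only an illustration, not a full provenance system; the function names `chain_records` and `verify_chain` and the sample records are my own assumptions.

```python
import hashlib
import hmac


def chain_records(key: bytes, records: list[bytes]) -> list[bytes]:
    """Return one HMAC-SHA256 tag per record; each tag also covers the previous tag."""
    tags = []
    prev = b""
    for rec in records:
        tag = hmac.new(key, prev + rec, hashlib.sha256).digest()
        tags.append(tag)
        prev = tag
    return tags


def verify_chain(key: bytes, records: list[bytes], tags: list[bytes]) -> bool:
    """Recompute the chain and compare it with the stored tags."""
    if len(records) != len(tags):
        return False
    prev = b""
    for rec, tag in zip(records, tags):
        expected = hmac.new(key, prev + rec, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False
        prev = tag
    return True


# Hypothetical sample data, for illustration only.
key = b"demo-key-not-for-production"
records = [b"sensor=1;temp=20.5", b"sensor=1;temp=20.7", b"sensor=1;temp=21.0"]
tags = chain_records(key, records)

print(verify_chain(key, records, tags))  # untouched data
tampered = [records[0], b"sensor=1;temp=99.9", records[2]]
print(verify_chain(key, tampered, tags))  # modified record
```

Note that this only works if the key and the tags are themselves kept secure, which is the "security first" argument again. It also does not, on its own, detect an attacker who drops the last record together with its tag; for that you would need to store the record count or the final tag somewhere trusted.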
I have attached a number of useful papers from my own research collection which cover each of these three areas to get you started. As you can see, there is little work on Big Data Provenance, and the work of Thomas Pasquier is something you should explore further.
Consider the source of the data. Data from corporate sources, where the organisation is ISO 27002 compliant, is likely to be a reasonable source. For cloud-sourced data, while some standards are now emerging, there is currently no complete cloud security standard, so this data is likely to be of a lesser standard. Once we move to Internet of Things data, we are in 'Wild West' territory. Anybody and their dog can easily hack into IoT systems, meaning the level of trust in this data has to be considerably discounted.
If you want a big data area that offers an extreme challenge, IoT is the place to go. Of course, it also means that until cloud big data security and privacy are solved, your big data IoT challenge will be an impossible goal to achieve.
Ultimately, the choice is yours, but we need to focus on resolving problems in a logical way, so you may want to address the cloud big data challenge first, before moving on to IoT big data. You could, of course, focus on non-cloud big data, but given the ease with which cloud enables the creation of big data, the cloud route may be the better choice.
One option is homomorphic encryption on the cloud side. It is a form of encryption that allows computation on ciphertexts (e.g. in the cloud), generating an encrypted result which, when decrypted (on the user side in this example), matches the result of performing the same operations on the plaintext.
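As a rough illustration of the idea, here is a toy sketch of the Paillier scheme, which is additively homomorphic: multiplying two ciphertexts gives a ciphertext of the sum of the plaintexts. I picked Paillier only because it fits in a few lines; the tiny primes and the function names are my own assumptions, and this is in no way secure or production-ready (real deployments use vetted libraries and keys of thousands of bits).

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013):
    """Toy key generation from two small primes (illustration only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m (0 <= m < n) with generator g = n + 1 and fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    """Recover the plaintext using the private values (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Cloud-side operation: combine two ciphertexts without seeing the plaintexts."""
    return (c1 * c2) % (n * n)


# User side: encrypt two values and send only the ciphertexts to the cloud.
n, priv = keygen()
c1, c2 = encrypt(n, 20), encrypt(n, 22)

# Cloud side: compute on ciphertexts only (needs n, but not the private key).
c_sum = add_encrypted(n, c1, c2)

# User side: decrypting the result gives the sum of the original plaintexts.
print(decrypt(n, priv, c_sum))  # 42
```

Paillier only supports addition on ciphertexts (and multiplication by a known constant). Schemes that support arbitrary computation, i.e. fully homomorphic encryption, exist but are considerably more expensive.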
He et al. recently published the paper "The practical implementation of artificial intelligence technologies in medicine" in Nature Medicine, which discusses some of the privacy issues in big data.