Gray code mapping will be better because adjacent constellation points differ in only one bit, so an incorrect slicing/demapping decision, which typically lands on a neighbouring point, corrupts only one bit. With natural mapping, some adjacent points differ in several bits, so a single symbol error can produce multiple bit errors.
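To make the point above concrete, here is a minimal Python sketch that counts how many bits differ between nearest-neighbour points of a square 16-QAM grid. It assumes each 4-bit label is built from two bits for the I level and two bits for the Q level, and that "natural" means plain binary counting of the level index; the function names are illustrative only and not taken from any library.

```python
def gray(n: int) -> int:
    """Binary-reflected Gray code of a level index."""
    return n ^ (n >> 1)


def natural(n: int) -> int:
    """Natural (plain binary) labelling: the index itself."""
    return n


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two labels differ."""
    return bin(a ^ b).count("1")


def label(i: int, q: int, mapper) -> int:
    """4-bit label: two bits from the I level, two bits from the Q level (assumed layout)."""
    return (mapper(i) << 2) | mapper(q)


def avg_neighbour_bit_diff(mapper) -> float:
    """Average Hamming distance over all horizontally/vertically adjacent point pairs."""
    total = 0
    pairs = 0
    for i in range(4):
        for q in range(4):
            for di, dq in ((1, 0), (0, 1)):
                ni, nq = i + di, q + dq
                if ni < 4 and nq < 4:
                    total += hamming(label(i, q, mapper), label(ni, nq, mapper))
                    pairs += 1
    return total / pairs


if __name__ == "__main__":
    print("Gray    :", avg_neighbour_bit_diff(gray))     # exactly one bit per neighbour pair
    print("Natural :", avg_neighbour_bit_diff(natural))  # more than one bit on average
```

Under these assumptions every neighbour pair differs in exactly one bit with Gray labelling, while natural labelling averages about 1.33 bits per neighbour pair.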
Actually, it depends. There was a similar discussion to this one elsewhere on ResearchGate. If the communication scheme uses any sort of forward error correction code together with byte interleaving, it makes no difference at all whether you use Gray coding when mapping bits onto the 16-QAM constellation.
Even if the mapping on the 16-QAM symbols is Gray code, by the time you deinterleave and remove the FEC redundancy at the receiver, whatever single-bit error may have been introduced by the symbol demod could end up being a huge error at the decoded baseband output.
Indeed, it depends. There are so many different scenarios to consider that I am unable to give you an exhaustive answer here, but you will find a lot of related information in my book on QAM. Some sample chapters are enclosed here, and all chapters are available online at IEEE Xplore.
Most importantly, in the recently discovered spatial modulation, star-QAM performs better than square QAM with Gray mapping!
Even that depends, though. Gray mapping is only useful in cases of A/D and then D/A conversion, isn't it? Even aside from any FEC coding, if I transmit a file coded, say, in BCD or in ASCII, then a "mere" 1 bit error could make a tremendous difference, Gray coding or not. The difference between $2M and $3M, if ASCII encoded, is only 1 bit. The difference between M for "most," and L for "least," is also only 1 bit.
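The ASCII examples above are easy to check. The short Python sketch below (the helper name is illustrative) XORs the character codes and counts the differing bits.

```python
def bit_diff(a: str, b: str) -> int:
    """Number of bits that differ between the ASCII codes of two characters."""
    return bin(ord(a) ^ ord(b)).count("1")


if __name__ == "__main__":
    # '2' is 0x32 and '3' is 0x33; 'M' is 0x4D and 'L' is 0x4C.
    print(bit_diff("2", "3"))  # 1
    print(bit_diff("M", "L"))  # 1
```

In both cases only the least significant bit differs, so a single bit error is enough to turn one character into the other.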
Seems to me that we need to take into consideration what the channel is being used for.
BTW, without having gone into great depth on this, I'm not sure why a star configuration wouldn't always outperform a square or cross constellation in terms of BER, simply because the difference in phase values does not become smaller and smaller as we reach the outer edges of the constellation.
I actually agree with your example about the transmission of specific source information, like the ASCII text you used. There are millions of similar examples in the context of sources such as video, audio, speech, etc., where additionally the 'sensitivity' to corruption is different for different bits...
As to your point concerning star-QAM, it normally performs worse in BER terms than square QAM owing to its reduced Euclidean distance amongst the constellation points, but it has other advantages, as detailed in my related book on QAM. I pasted in the URL last time, and you will find it at IEEE Xplore, Google Books, etc.
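The reduced Euclidean distance can be illustrated with a rough Python sketch that compares the minimum distance of square 16-QAM and a two-ring star 16-QAM after both are scaled to unit average symbol energy. The star layout used here (eight equally spaced phases on each of two rings) and the ring ratio of 2.0 are assumptions chosen purely for illustration, not values taken from the book or from any standard.

```python
import cmath
import itertools
import math

RING_RATIO = 2.0  # assumed outer/inner ring radius ratio (hypothetical value)


def normalize(points):
    """Scale a constellation so that its average symbol energy is 1."""
    avg_energy = sum(abs(p) ** 2 for p in points) / len(points)
    scale = math.sqrt(avg_energy)
    return [p / scale for p in points]


def min_distance(points):
    """Smallest Euclidean distance between any two constellation points."""
    return min(abs(a - b) for a, b in itertools.combinations(points, 2))


def square_16qam():
    """Square 16-QAM on the usual {-3, -1, 1, 3} grid."""
    levels = (-3, -1, 1, 3)
    return [complex(i, q) for i in levels for q in levels]


def star_16qam(ring_ratio=RING_RATIO):
    """Two concentric rings with eight equally spaced phases each (assumed layout)."""
    return [
        r * cmath.exp(1j * 2 * math.pi * k / 8)
        for r in (1.0, ring_ratio)
        for k in range(8)
    ]


if __name__ == "__main__":
    print("square:", min_distance(normalize(square_16qam())))
    print("star  :", min_distance(normalize(star_16qam())))
```

With these assumed parameters the star constellation has the smaller minimum distance, limited by the spacing of neighbouring points on the inner ring. This only compares minimum distance at equal average energy; it says nothing about the other advantages of star-QAM mentioned above.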