This question is still unanswered for me. CDs, for example, use Reed-Solomon codes for this purpose. It would be great if someone could point me to a technical paper on this particular topic.
Cross-interleaved Reed–Solomon codes are used because they have a high rate and can correct both burst errors and large numbers of random errors. Other codes could be used too. Have you looked at: K.A.S. Immink, "Reed–Solomon Codes and the Compact Disc", in S.B. Wicker and V.K. Bhargava, Eds., Reed–Solomon Codes and Their Applications, IEEE Press, 1994?
@Stephen: your reference is a bit old. I must say that LDPC codes are popular these days. They might be a bit long for this application, though. Oh, how I miss the happy days of algebraic decoding!
Currently, most hard drives implement Reed-Solomon codes to correct errors on a per-sector basis (a minimal encode/decode sketch follows below).
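To make the per-sector idea concrete, here is a minimal sketch using the third-party Python `reedsolo` package; the toy 200-byte sector and 16 parity bytes are illustrative assumptions, not the parameters of any actual drive firmware.

```python
# Minimal sketch, assuming the pip-installable `reedsolo` package.
# The toy "sector" size and parity count below are illustrative only.
from reedsolo import RSCodec

rsc = RSCodec(16)               # 16 parity bytes: corrects up to 8 corrupted bytes

sector = bytes(range(200))      # toy payload, small enough for one GF(2^8) RS block
codeword = rsc.encode(sector)   # payload + 16 parity bytes, as stored on disk

# Simulate a small burst of corruption inside the sector.
damaged = bytearray(codeword)
damaged[100:105] = b"\x00" * 5

result = rsc.decode(bytes(damaged))
# Recent reedsolo versions return (message, codeword, errata_positions).
recovered = result[0] if isinstance(result, tuple) else result
assert bytes(recovered) == sector
```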
However, several issues are appearing nowadays due to the increase in hard drive sizes (and, consequently, in sector sizes):
1. the likelihood of soft errors increases because of the larger number of bits (and probably also because of smaller components; we are approaching theoretical electromagnetic limits);
2. most Reed-Solomon implementations have a decoding complexity of O(n²), which means that on bigger disks it takes more time to fix errors, which is exactly what you want to avoid given the current trend toward SSD drives. This is particularly a problem for data centers with lots of disk arrays.
Because of this, hard drive manufacturers and data center operators are looking for alternatives to the original Reed-Solomon codes. Here are a few leads:
* LDPC codes, which are suboptimal but, with the right parametrization, achieve about 99.9% of Reed-Solomon's recovery capability. (Reed-Solomon is optimal with respect to the Singleton bound, so this means there is roughly a 0.1% chance that the LDPC decoder will need one extra block beyond the Singleton bound to correct the errors.) They are also a lot faster: a good example is the wirehair library by catid (https://github.com/catid/wirehair), which runs in about O(n). The biggest issue with LDPC codes is that the results differ immensely depending on the type of LDPC graph you build; this is still being actively researched (see the peeling-decoder sketch after this list).
* XOR codes (see "Reliability and Power-Efficiency in Erasure-Coded Storage Systems", Kevin M. Greenan, 2009); a single-parity sketch is given after this list.
* Faster Reed-Solomon encoding and decoding algorithms, such as Cauchy Reed-Solomon and the latest Number Theoretic Transform approaches allowing O(n log n) performance ("Novel Polynomial Basis and Its Application to Reed-Solomon Erasure Codes", Lin, Han and Chung, FOCS 2014); a toy transform sketch closes this answer.
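On the LDPC lead: here is a toy erasure "peeling" decoder on a hand-made sparse parity-check graph. It only illustrates the mechanism and why the graph structure matters; it is not a real LDPC construction and has nothing to do with wirehair's internals.

```python
# Toy peeling decoder for erasures on a small, hand-made parity-check graph.
checks = [
    [0, 1, 2],   # each check: the XOR of these bit positions must be 0
    [2, 3, 4],
    [4, 5, 0],
]

codeword = [1, 1, 0, 1, 1, 0]          # satisfies every check above
received = [1, None, 0, None, 1, 0]    # None marks an erased bit

def peel(received, checks):
    bits = list(received)
    progress = True
    while progress:
        progress = False
        for check in checks:
            unknown = [i for i in check if bits[i] is None]
            if len(unknown) == 1:      # exactly one erasure in this check: solve it by XOR
                known_xor = 0
                for i in check:
                    if bits[i] is not None:
                        known_xor ^= bits[i]
                bits[unknown[0]] = known_xor
                progress = True
    return bits

assert peel(received, checks) == codeword
```

Whether peeling terminates successfully depends entirely on how the erasures interact with the graph, which is exactly why the choice of LDPC graph matters so much.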
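On the XOR-code lead: a single-parity (RAID-4/5 style) sketch. The block contents are made up for illustration and are not taken from Greenan's thesis.

```python
# Single-parity XOR erasure code: one lost block can be rebuilt from the rest.
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data_blocks = [b"spam", b"eggs", b"hams"]
parity = xor_blocks(data_blocks)            # would live on the parity disk

# Lose exactly one data block; XOR of the survivors and the parity restores it.
lost = 1
survivors = [b for i, b in enumerate(data_blocks) if i != lost]
recovered = xor_blocks(survivors + [parity])
assert recovered == data_blocks[lost]
```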
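On the fast Reed-Solomon lead: the FOCS 2014 construction works over GF(2^m) with a novel polynomial basis, which is too long to reproduce here, but the underlying transform idea can be sketched with a classic radix-2 number theoretic transform over an NTT-friendly prime field. The prime, root and symbol values below are illustrative assumptions, not the paper's actual construction.

```python
P = 998244353      # 119 * 2**23 + 1, a common NTT-friendly prime (illustrative choice)
ROOT = 3           # primitive root modulo P

def ntt(a, invert=False):
    """Iterative in-place radix-2 number theoretic transform; len(a) must be a power of two."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes: O(n log n) modular multiplications in total.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P

# Reed-Solomon as polynomial evaluation: take k message symbols as coefficients
# and evaluate at n > k points in O(n log n); any k of the n evaluations
# determine the degree-(k-1) polynomial, hence the message.
message = [5, 7, 11, 13]                   # k = 4 toy symbols
n = 8
codeword = message + [0] * (n - len(message))
ntt(codeword)                              # n evaluations at the n-th roots of unity
```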