If you mean an audio CD, the usual method is to create a hash based on the number and durations of its tracks. That's how most CD database systems work. It's not 100% foolproof (two different CDs might have exactly the same track count and durations), but this is normally quite unlikely.
For CD-ROMs, you could rely on indicators such as the volume label and serial number and the number and sizes of files, and create a similar hash.
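As a rough illustration of the CD-ROM idea, here is a minimal sketch that folds those indicators into one 64-bit value using FNV-1a. The `DiscInfo` struct and the function names are made up for this example, and how the fields get filled in (file system calls, driver queries) is left out. It is one possible way to do it, not a standard scheme.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a data disc. Filling it in is outside this sketch.
struct DiscInfo {
    std::string volumeLabel;
    std::uint32_t volumeSerial;
    std::vector<std::uint64_t> fileSizes;  // one entry per file, in a fixed order
};

// Standard 64-bit FNV-1a constants.
constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

// Mix a single byte into the running hash.
inline std::uint64_t fnvByte(std::uint64_t h, unsigned char b) {
    return (h ^ b) * kFnvPrime;
}

// Mix the low `bytes` bytes of an integer, low byte first, so the result
// does not depend on the byte order of the host.
inline std::uint64_t fnvInt(std::uint64_t h, std::uint64_t v, int bytes) {
    assert(bytes >= 1 && bytes <= 8);
    for (int i = 0; i < bytes; ++i) {
        h = fnvByte(h, static_cast<unsigned char>((v >> (8 * i)) & 0xFF));
    }
    return h;
}

// Combine label, serial number, file count and file sizes into one value.
std::uint64_t discFingerprint(const DiscInfo& d) {
    std::uint64_t h = kFnvOffset;
    for (unsigned char c : d.volumeLabel) h = fnvByte(h, c);
    h = fnvByte(h, 0);  // separator between the label and the numeric fields
    h = fnvInt(h, d.volumeSerial, 4);
    h = fnvInt(h, static_cast<std::uint64_t>(d.fileSizes.size()), 8);
    for (std::uint64_t s : d.fileSizes) h = fnvInt(h, s, 8);
    return h;
}
```

Note that the result depends on the order of `fileSizes`, so the files would have to be enumerated in a stable order (sorted by name, for instance) for two reads of the same disc to agree.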
Thank you so much. The CD hash calculator will be used in the front end, a VC++ GUI, and also in a Windows driver, so the algorithm has to be fast enough to work efficiently in the kernel. Calculating the hash from the number and sizes of files is heavy for a kernel driver. One thing I have not tried is using the volume label and serial number. Do all CDs have a unique serial number?
What I am trying to achieve is that the CD-ROM's hash value should change each time I add a file to the CD-RW disc. Is there any way to do this without putting much load on the kernel driver, such as counting files and adding up their sizes? I could use GetDiskFreeSpaceEx, but on XP it always returns 0. It works on Windows 7.
The volume serial number is actually a FAT file system artifact that has carried over to CDFS and ISO 9660 file systems. It's only a 32-bit number, so with 4-billion-odd possible values it can't be any more unique than IP addresses. Plus, Audio CDs don't use it.
You cannot use the same exact method for Audio and Data CDs, since they work in completely different ways. Video CDs etc. are actually closer to Data CDs and have ISO 9660 filesystems, so the Data CD approach still works for them.
As for CD-RW, I think you're trying to deal with packet writing ("adding files"), which is a different beast entirely. Packet-written CD-R/RW discs are not recommended if you need broad compatibility.
Thank you again. I added more than 5000 files of 64 bytes each to the CD-RW, looped through them with FindFirstFile and FindNextFile, added up the file sizes, and used the total in the existing hash calculation. It worked. I expected it to be slow, but to my surprise it was very fast: less than a second.
>>You cannot use the same exact method for Audio and Data CDs, since they work in completely different ways
I have to try this same method on an audio CD. If audio CDs are completely different, will the above method be able to get the actual data size of every audio file?
>>As for CD-RW, I think you're trying to deal with packet writing ("adding files"),
Audio CDs don't have "audio files". They have a TOC with track information (position, length), which can be accessed only through low-level drivers or higher-level wrappers. The ".cda files" you see in Windows are just a wrapper provided by the OS for convenience, but they don't contain actual data, nor do they link to the actual audio data. Still, you can use the TOC information to calculate the hash described above (and use it to query services such as CDDB).
Packet-written CDs, however, have other variables as well, e.g. DELETED files, so you must take that into account too. E.g. are two CDs with 5000 same-size files, but with one of them containing a "deleted" entry, to be considered "the same" CD?
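To make the TOC-based idea concrete, here is a small sketch modeled on the classic CDDB/freedb disc ID as I understand it: a digit-sum checksum of the track start times, the playing time in seconds, and the track count packed into 32 bits. The function names are mine, and reading the TOC from the drive is not shown. Check the details against the service's own documentation before relying on it for real queries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kFramesPerSecond = 75;  // one CD frame (sector) is 1/75 s

// Sum of the decimal digits of n.
int digitSum(int n) {
    int sum = 0;
    while (n > 0) {
        sum += n % 10;
        n /= 10;
    }
    return sum;
}

// startFrames:  start offset of every track in frames, including the usual
//               150-frame (2 second) lead-in offset.
// leadOutFrame: start offset of the lead-out in frames.
std::uint32_t cddbStyleDiscId(const std::vector<int>& startFrames,
                              int leadOutFrame) {
    assert(!startFrames.empty() && startFrames.size() <= 99);

    // Checksum part: digit sums of each track's start time in seconds.
    int n = 0;
    for (int f : startFrames) n += digitSum(f / kFramesPerSecond);

    // Playing time in seconds, from the first track to the lead-out.
    int total = leadOutFrame / kFramesPerSecond -
                startFrames.front() / kFramesPerSecond;

    // Layout: checksum (8 bits) | seconds (16 bits) | track count (8 bits).
    return (static_cast<std::uint32_t>(n % 0xff) << 24) |
           (static_cast<std::uint32_t>(total) << 8) |
           static_cast<std::uint32_t>(startFrames.size());
}
```

For example, three tracks starting at frames 150, 15150 and 30150 with the lead-out at frame 45150 give start times of 2, 202 and 402 seconds (digit sums 2 + 4 + 6 = 12) and a playing time of 600 seconds, so the ID is 0x0C025803. Since the whole ID is only 32 bits, two different discs can share it, which is the "not 100% foolproof" caveat from the first reply.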
I suggest you read the online CD Recordable FAQ for more details about how CD-Audio, CD-ROM, packet-writing etc. work.
In any case, CD-ROM (with ISO filesystems), CD-Audio and packet-written CD-R or CD-RW are all different formats using the same medium. Thus by definition, you cannot use the same method for handling all of them. Plus there are possibilities like exotic non-ISO filesystems (e.g. game console CDs), hybrid CDs, CD-extra, photo CD etc. so you really need to narrow down the context.
Thank you so much. I checked the audio CD, and the .cda files don't reflect the actual audio content.
Yes, I should either narrow down the context or narrow the application down to a specific kind of CD. Another approach I use is to skip 64 KB of raw data and read the next 4 KB of raw data, which has always given me a unique hash. I have no idea what that data means. This hash is unique for each CD, but it does not reflect the data on the CD.
>> two CDs with 5000 same-size files but with one of them containing a "deleted" entry, are to be considered "the same" CD?
Using the above data, I add the total raw data size of the CD and then generate the hash. So in this case the 4 KB of data on the two CDs is supposed to be different. Then, regardless of whether data has been deleted on one CD and not on the other, the hash will be different. Adding a new file changes the data size again, so a different hash is generated. I suppose the Windows FindFirstFile/FindNextFile API does not list deleted files anyway.
I tried this on an audio CD. In this case, as you said, it reads the TOC file, of size 64 (B/KB). Now I again read the first 4 KB of raw data, add the TOC file size, and then generate the hash. Suppose I add another track to the audio CD and another TOC file is added. This in turn increases the total TOC file size, so this time the hash generation will give me a different hash. But I am not sure whether I am going in the right direction.
But I am quite sure that the hash generated from the 4 KB of data is unique (really?).
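Roughly, the scheme I am describing looks like the sketch below: skip 64 KB, read 4 KB, and mix the total data size into the hash. It reads from a generic `std::istream` so that it can be tried on a disc image file. Reading the real raw device is not shown, the function name is made up, and FNV-1a is only a stand-in for the existing hash calculation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

constexpr std::streamoff kSkipBytes = 64 * 1024;  // raw data skipped first
constexpr std::size_t kReadBytes = 4 * 1024;      // raw data that is hashed

// Hashes the 4 KB block found after the first 64 KB of `in`, then mixes in
// `totalDataSize` (e.g. the sum of all file sizes) so that the result changes
// when files are added. Returns false if the block cannot be read in full.
bool rawBlockHash(std::istream& in, std::uint64_t totalDataSize,
                  std::uint64_t& outHash) {
    std::vector<char> buf(kReadBytes);
    in.clear();
    in.seekg(kSkipBytes, std::ios::beg);
    if (!in.read(buf.data(), static_cast<std::streamsize>(buf.size()))) {
        return false;
    }

    // 64-bit FNV-1a over the raw block.
    std::uint64_t h = 14695981039346656037ULL;
    const std::uint64_t prime = 1099511628211ULL;
    for (char c : buf) {
        h = (h ^ static_cast<unsigned char>(c)) * prime;
    }

    // Mix in the total data size, low byte first.
    for (int i = 0; i < 8; ++i) {
        h = (h ^ ((totalDataSize >> (8 * i)) & 0xFF)) * prime;
    }

    outHash = h;
    return true;
}
```

With this layout, two discs with identical 4 KB blocks and identical totals still get the same hash, so it only separates discs as well as those two inputs do.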