I need to write some files to the part of the disk where retrieval and writing are fastest. Are there any libraries or tools that can direct the OS (operating system) or an application to write files to a particular section of the HDD?
Bit density on modern drives is limited by the magnetic properties of the head and media to somewhere between 1-2 million bits per inch with the current generation of drives. Since drives have a constant rotation rate, if you pack bits at maximum density on all the tracks you'll get more bits (and a higher transfer rate) on the outer tracks (google ZCAV - zoned constant angular velocity - for details). Since LBAs are numbered from the outside in[*], this results in low addresses giving higher transfer rates - see e.g. http://media.bestofmicro.com/C/M/337414/original/diagram-WD-VelociRaptor.png for an example from a recent Tom's Hardware review.
[*] Although in theory disks can map LBAs any way they want, in practice that only happens for runtime-detected bad blocks (factory-detected ones are remapped by slipping the LBA numbering). File systems were designed for performance assuming a naive LBA-to-track mapping, and as disks evolve they are designed to perform well with existing file systems, so they keep this mapping with few changes.
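You can see the effect yourself with nothing fancier than large sequential reads at both ends of a raw block device. Here's a minimal sketch, assuming Linux, root access, and a hypothetical idle device at /dev/sdX (the path, chunk size, and sample size are just placeholders):

```python
import os
import time

DEV = "/dev/sdX"            # hypothetical device path: pick an idle disk, run as root
CHUNK = 1024 * 1024         # 1 MiB per read()
SAMPLE = 256 * 1024 * 1024  # read 256 MiB per measurement

def read_throughput_mb_s(offset):
    """Sequential-read throughput in MB/s starting at the given byte offset."""
    fd = os.open(DEV, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        remaining = SAMPLE
        start = time.time()
        while remaining > 0:
            data = os.read(fd, min(CHUNK, remaining))
            if not data:
                break
            remaining -= len(data)
        return (SAMPLE - remaining) / (time.time() - start) / 1e6
    finally:
        os.close(fd)

# For honest numbers, drop the page cache first:  echo 3 > /proc/sys/vm/drop_caches
fd = os.open(DEV, os.O_RDONLY)
disk_size = os.lseek(fd, 0, os.SEEK_END)
os.close(fd)

print("low LBAs  (outer tracks): %6.1f MB/s" % read_throughput_mb_s(0))
print("high LBAs (inner tracks): %6.1f MB/s" % read_throughput_mb_s(disk_size - SAMPLE))
```

On a typical drive the low-LBA figure should come out roughly twice the high-LBA figure, matching the ZCAV curves in reviews like the one linked above.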
The only reasonable way to force data to the beginning of the disk is to partition the disk and use the first partition - e.g. many people swear by putting a swap partition at the beginning of the drive to get better performance. (while other people would rather give the fast section to the file system)
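If you go the partitioning route and want to confirm that the first partition really does sit at low LBAs, Linux exposes each partition's starting sector through sysfs. A quick sketch, assuming a hypothetical disk named sda (sysfs reports sizes in 512-byte sectors):

```python
import glob
import os

DISK = "sda"  # hypothetical disk name - adjust for your system
for part in sorted(glob.glob("/sys/block/%s/%s*" % (DISK, DISK))):
    name = os.path.basename(part)
    with open(os.path.join(part, "start")) as f:
        start = int(f.read())   # starting LBA, in 512-byte sectors
    with open(os.path.join(part, "size")) as f:
        size = int(f.read())    # partition length, in 512-byte sectors
    print("%s: starts at LBA %d (%.1f GiB from the start), %.1f GiB long"
          % (name, start, start * 512 / 2**30, size * 512 / 2**30))
```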
The difference between the ends of the disk is about 2x, but unless you're doing large contiguous reads and writes (10MB or more) seek times and OS overhead are going to dominate. Besides, if storage performance is really important you should be considering an SSD, or a hybrid disk/SSD system - see e.g. the dm-cache work from Florida Int'l University, which is available on Github.
I guess I got what I needed. A little digging into mmap revealed that it is built for scenarios where a relatively small portion (view) of a considerably larger file needs to be accessed repeatedly. Luckily .NET takes care of writing this to a part of the disk where it's easy to read and write; in .NET this is used more in the context of IPC. I guess it will work for me, for the time being.
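For what it's worth, the same view-of-a-large-file pattern is easy to try outside .NET as well. Here is a minimal Python sketch using the standard mmap module; the file name, offset, and view length are made up, and the file must already be at least that large:

```python
import mmap

PATH = "big.dat"           # hypothetical large file
VIEW_OFFSET = 64 * 2**20   # map a window starting 64 MiB into the file
VIEW_LEN = 4 * 2**20       # 4 MiB view

# The offset must be a multiple of the system allocation granularity.
assert VIEW_OFFSET % mmap.ALLOCATIONGRANULARITY == 0

with open(PATH, "r+b") as f:
    with mmap.mmap(f.fileno(), VIEW_LEN, offset=VIEW_OFFSET) as view:
        view[0:4] = b"DATA"   # modifies the mapped pages in memory
        view.flush()          # asks the OS to write the dirty pages back to disk
        print(view[0:16])
```

Note that where those pages land physically is still entirely up to the file system and the drive, which is the point made in the answers above.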
I think there are some basic misconceptions here that certain sectors would be faster on disk drives. That can be a function of the disk controller, its use of write-through cache, multi-disk array striping, etc. Intelligent disk subsystems may virtualize such ideas, overriding any intentions of the OS or applications. If your application is guaranteed to run only on one specific hardware configuration for its lifetime, you might be able to optimize the performance of a particular disk system simply by controlling physical file allocation, but in practice I think such efforts are futile...
Why do you think that this specific application's I/O must be the fastest of all disk I/Os?
As James points out, this is hard to do because of the HDD firmware and LBA (Logical Block Address) mapping, but you can access disk drives in Linux at the block level through SCSI interfaces - sg3_utils is particularly nice. On Ubuntu, "sudo apt-get install sg3-utils" is a great way to learn more about any SAS disks (and many SATA disks behind SAS controllers using STP) connected to Linux.

In the old days, HDDs were "short stroked", meaning that the high-latency random seek+rotate delay (typically ~5 ms) was reduced by limiting the seek range (there's not much you can do about rotation). At best that cuts the latency roughly in half - the real problem is that HDDs are mechanical and will always be limited by rotation rate (at most 15,000 RPM) and head seek time to each track/cylinder. So I agree this is not a good use of your time. I would suggest either: 1) RAID with striping and mirroring (RAID-10), 2) SSD (Solid State Disk), or 3) both.

Buffer cache will help you, as already pointed out, but not for sustained writes or for large, continuous reads that are more random in nature (check out slabtop in Linux). With aggressive I/O you'll see yourself run out of buffer cache - e.g. when I worked with 10G iSCSI and SSD/RAID, the buffer cache would just fill up in most of my testing.

HDD has great capacity value (at about 10 cents per gigabyte) and will therefore always be of use, but access - especially random access - is not the HDD's strong point (it's OK on sequential, but still limited to roughly 200 MB/sec in round numbers; only about 200 random I/Os per second is the real bummer). I suggest you rethink entirely and consider RAID and/or SSD / PCIe NAND flash (FusionIO, Virident, Intel, Micron) and, in the future, NVM Express cards. Flash is still about 50 cents per gigabyte and has some limitations, but it has made huge strides lately; it pairs well with transaction processing and, in specific uses alongside RAID, with getting parallel performance from HDDs. Good luck!
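If you want to see what short stroking actually buys you, compare the latency of small random reads confined to the first few GiB of the drive against reads spread across the whole disk. A rough sketch, assuming Linux, root, and a hypothetical idle device at /dev/sdX (the page cache will flatter the numbers unless you drop it first):

```python
import os
import random
import time

DEV = "/dev/sdX"   # hypothetical device - run against an idle disk as root
IO_SIZE = 4096     # bytes per random read
SAMPLES = 200      # random reads per measurement

def avg_latency_ms(span_bytes):
    """Average latency of small random reads confined to the first span_bytes."""
    fd = os.open(DEV, os.O_RDONLY)
    try:
        t0 = time.time()
        for _ in range(SAMPLES):
            offset = random.randrange(0, span_bytes - IO_SIZE)
            offset -= offset % IO_SIZE          # keep reads aligned
            os.pread(fd, IO_SIZE, offset)
        return (time.time() - t0) / SAMPLES * 1000
    finally:
        os.close(fd)

fd = os.open(DEV, os.O_RDONLY)
disk_size = os.lseek(fd, 0, os.SEEK_END)
os.close(fd)

print("short-stroked (first 10 GiB): %.1f ms per I/O" % avg_latency_ms(10 * 2**30))
print("full stroke   (whole disk)  : %.1f ms per I/O" % avg_latency_ms(disk_size))
```

Expect the confined case to shave a few milliseconds off each I/O - roughly the "cut it in half at best" mentioned above, and nowhere near SSD territory.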
I wanted to share some information I got from a friend of mine who works at Samsung. I asked the same question, and the reply was very interesting.
It's true that read and write speeds are different on different tracks; it's also true that the OS does not have much control over where the information is physically written.
Now go back a couple of decades to the CD players we used to listen to music on: when a CD starts from the first track (usually on the inner tracks) it spins at a faster rate, and as the songs play on, the spin slows down. This design was very effective in terms of cost. Hard drives were also based on the same principle. This allowed a simpler hardware design for the on-board hard drive controllers: one single controller working at one clock speed gave very low-cost boards. Most manufacturers favored this approach because of the low cost of the board. Controlling the spindle speed was easier than controlling the read/write operation speed.
Let's put it this way: if a hard drive spins at a constant speed, the bits on the platter (disk) pass under the read/write (RW) head at different linear speeds. The inner tracks will yield fewer bits per second than the outer tracks. The same happens while writing data to the platter. Also, the head needs to dwell for some time to magnetize the platter and store the information; this depends heavily on the material of the platter and the magnetization strength of the RW head.
Now if the disk spins at a constant speed, the read and write operations become more complex for the controller: it must adapt to a different speed on each track, and as it moves to the outer tracks the RW speed increases.
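To put rough numbers on that: at a fixed RPM the linear speed under the head grows with the radius, so at a constant bit density the outer tracks deliver proportionally more bits per second. A back-of-the-envelope sketch (the RPM, bit density, and radii below are illustrative values, not taken from any particular drive):

```python
import math

RPM = 7200               # typical desktop drive
BITS_PER_INCH = 1.5e6    # linear bit density along the track (within the 1-2 Mbit/inch range above)
INNER_RADIUS_IN = 0.75   # inner track radius, inches (illustrative)
OUTER_RADIUS_IN = 1.75   # outer track radius, inches (illustrative)

def raw_rate_mb_s(radius_in):
    """Raw bits passing under the head per second, expressed in MB/s."""
    linear_speed = 2 * math.pi * radius_in * RPM / 60   # inches per second
    return linear_speed * BITS_PER_INCH / 8 / 1e6

for label, r in [("inner track", INNER_RADIUS_IN), ("outer track", OUTER_RADIUS_IN)]:
    print("%s: ~%.0f MB/s raw" % (label, raw_rate_mb_s(r)))
```

With these made-up but plausible numbers the outer track delivers a bit over twice the raw rate of the inner track, which lines up with the roughly 2x spread mentioned in the other answer.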
A clever solution is to slow down the spindle RPM as the head moves outward; then all the other hardware becomes very simple to design - one speed to write, read, and transmit data. A good way to keep the cost down.
Recently some manufacturers started exploiting this situation. With the rise of cheaper controllers and lower production costs, the on-board controllers became more flexible at reading bits from the platter at different speeds, so a single track could be read at different RPMs. Any manufacturer that makes this gap larger wins on performance, though there are some limitations here and a very critical cost/performance balance. The on-board controllers also started carrying substantial cache memory (2-32 MB) to compensate for the different read and write speeds. There are hard drives now that can do sequential operations at 200 MB/s. Changes to the spindle speed were reduced once the RW head could operate at different spindle speeds on the same track (or nearby tracks).
The operating system usually won't have a clue about the RW speed, because there is now a cache on the controller and the OS does not need to know the controller's strategy for reading/writing data to the disk. Hence no APIs are exposed at such a granular level. (Of course there have been exceptions where the OS could refer to a physical block and a few parameters, but we will never know whether the controller really honors it; that remains an open argument and depends on the manufacturer!)
Companies like Samsung, Toshiba, and WD have dedicated R&D centers where a lot of research goes into making the platter and the controller more flexible and cheaper. A great deal of detail goes into these on-board controllers; the designs are closely guarded secrets and are not revealed. Lots of patents are filed in these areas too.
The APIs are not untouched either. How a command is handled by the controller - where to write, how to fragment, how much to cache, and so on - has also become more complicated with the newer lines of hard drives, and these decisions are separate from the control of the operating system.
I hope that in upcoming years there will be options available to write files to specific areas of the disk to leverage these advantages.
I think your friend from Samsung was referring to old CDs as analog recording devices rather than digital disks - they have somewhat different engineering considerations.
It's been many years since I considered such issues (I was never an engineer), so in some regards I may be out of date, and I worked mostly with system software and performance, equipment evaluation and capacity planning for very large mainframes, although I did some work with Unix-based systems that used the same disk subsystems. Fundamentally though, the phenomenon you're concerned with here is only a minor consideration in determining the I/O performance delivered by a disk system.
To explain, what you are discussing is a function of disk geometry and recording media density. First, I'm skeptical that (even today) disk speeds are varied, as high rotational speed is a critical performance factor for both data transfer rate and rotational latency (delay). As a result, mechanical rotational speed is generally maximized in order to maximize average data transfer rate and minimize rotational latency - disks are rotated just about as fast as is reliably possible: around 15k RPM or whatever it is these days. If you listen, you can hear that it takes several seconds for a disk drive to reach its operational speed when it's initially turned on.
The second critical performance factor is recording density. For a given EM or optical recording media, maximizing recording density (bits per inch) determines the data capacity of the device. Manufacturers strive to maximize recording density in order to minimize cost/bit. Combined with rotation speed, recording density determines the data transfer rate of the device.
Because the amount of surface area that passes under a read/write head at a fixed rotational speed varies depending on the radial location of the head, for a given recording density more data can be recorded per track on the outer disk radii than on the inner tracks. There are different choices that can be made to deal with this phenomenon - if I recall, there were once two common choices, depending on whether sector sizes were fixed or variable within a disk system architecture. However, the implications of those choices are not usually the most critical to application performance, as they can only affect data transfer rate.
For disks that contain either a single randomly accessed database or many files that are intermittently accessed, the typical I/O is for a block of data, likely a few kilobytes, at a specific position on the disk. First the read/write head must be mechanically positioned over the correct track (at a specific disk radius) - this can vary quite a bit depending on the data and the disk drive, but it can average from a few to more than 10 ms. Then the transfer cannot begin until the required sector rotates under the read/write head - on average a few to several ms more. The time spent actually transferring a block that small is tiny by comparison. Most often, disk access times depend more on rotational latency, r/w head positioning (track seek) and the amount of data requested than on the raw data transfer rate of the device.
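A quick back-of-the-envelope breakdown makes the point; the numbers below (7200 RPM, ~9 ms average seek, ~150 MB/s media rate, an 8 KB block) are representative guesses rather than measurements of any specific drive:

```python
RPM = 7200
AVG_SEEK_MS = 9.0       # representative average seek time
TRANSFER_MB_S = 150.0   # representative sustained media transfer rate
BLOCK_KB = 8            # a typical small database block

rotational_latency_ms = 0.5 * 60.0 / RPM * 1000          # on average, half a revolution
transfer_ms = BLOCK_KB / 1024.0 / TRANSFER_MB_S * 1000   # time to move the block itself

total = AVG_SEEK_MS + rotational_latency_ms + transfer_ms
print("seek      : %5.2f ms" % AVG_SEEK_MS)
print("rotation  : %5.2f ms" % rotational_latency_ms)
print("transfer  : %5.2f ms  (for an %d KB block)" % (transfer_ms, BLOCK_KB))
print("total     : %5.2f ms  -> transfer is %.1f%% of the I/O"
      % (total, 100 * transfer_ms / total))
```

With those figures the transfer itself is well under 1% of the total; the seek plus the half-revolution of rotational latency account for nearly all of it.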
The most important factor in considering application I/O performance is not the physical disk characteristics at all! Since electronic memory can now be economically configured as very large disk caches, many if not most application data requests can be fulfilled without any disk access at all. In this case, the primary performance consideration is the transfer between processor memory and disk cache memory, which generally completes in a few milliseconds or less. For the most frequently read data (and in some restricted cases for newly written data), the application I/O can be satisfied with no disk access delay.
In most cases, applications, database systems and even operating systems cannot be fully aware of, much less in control of, the factors that are actually affecting disk access performance... For this reason, applications should not attempt to micro-manage disk performance! For additional information, see http://en.wikipedia.org/wiki/Disk-drive_performance_characteristics and its references.
Still, disk device performance characteristics typically only affect an application if the data to be read is not contained within the cache (a read miss), or if the caching policy is write-through (the host is not notified of I/O completion until the data has been written to disk, because the cache memory is considered volatile).
Cache devices with battery backup may be considered to be non-volatile (including Solid State Disk devices - cache without backing disks), but failures can occur that cause loss of data (please do not use for my bank account).
Thanks for pointing out the system swap disk partition - this is the one I/O application that transfers large blocks of data per I/O (thus benefiting most from increased data transfer rates in a write-through cache) and can affect the performance of all host applications. However, assuming high swap I/O rates, this also ensures that other partitions will suffer longer than average I/O delays, since the r/w head will often be positioned far from the location of the requested data...