I am very much interested in this question. Although von Neumann made calculations regarding the issue, I do not know of any sound experimental technique that has been used. So this is not an answer; I just wanted to say that I would be interested, and that I would gladly supply any support I can.
Awesome question. I gave it some thought a few years ago but never got around to actually doing anything about it, so I'll be thrilled to see the answers people give here.
The number of axons in the nerves is known in many species, and the maximum sustained action potential rate is also known, so IF you treat an action potential as 1 bit of information, then it is a matter of simple calculation. However, there is growing evidence that information transfer in the visual system does not work this way: context (and previously sent information) is very important, so one spike may actually carry more (or less) than one bit.
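For what it's worth, a minimal sketch of that "simple calculation", with purely illustrative numbers (roughly 10^6 axons, as often quoted for the human optic nerve, and an assumed 200 Hz maximum sustained rate), would look like this:

```python
# Back-of-envelope upper bound under the "one spike = one bit" assumption.
# Both numbers are assumptions for illustration, not measurements; the axon
# count in particular is species dependent.
n_axons = 1_000_000        # assumed number of optic nerve axons
max_rate_hz = 200          # assumed maximum sustained firing rate per axon

upper_bound_bits_per_s = n_axons * max_rate_hz   # at most 1 bit per spike
print(f"Naive upper bound: {upper_bound_bits_per_s / 1e6:.0f} Mbit/s")
```

Under these assumed numbers this gives a ceiling of a couple of hundred Mbit/s, but, as noted above, the real rate under a proper neural code could be quite different.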
I can think of two major groups of techniques that can be used: electrophysiology and imaging. The problem with the first is that these nerves are really well insulated and there is currently no way to monitor the hundreds of thousands of fibers (the number depends on the species) with multi-electrodes. Field potential measurements with cuff electrodes are one way to compromise. When it comes to getting the most information from spatially distinct (but close) sources, imaging cannot be beaten; however, it also has its drawbacks. First, it's not easy to put a microscope behind the eye to image the optic nerve. There are workarounds (take out the whole thing, perfuse the eye, etc.), but each necessarily means one or another type of compromise. Also, with imaging, I don't know of any voltage-sensitive indicators that could be genetically encoded and "sent" to the optic nerve. The voltage-sensitive small-molecule dyes that you can load into the optic nerve are not quite as sensitive (or as fast) as the more commonly used Ca indicators, so there is a pretty good chance of not detecting every action potential.
You can record the spiking activity of the optic nerve through microelectrodes and then apply the information theory framework developed by Quian Quiroga and Stefano Panzeri (there is a notable review in Nature Reviews Neuroscience, http://www.nature.com/nrn/journal/v10/n3/abs/nrn2578.html). Good work!
I don't have an answer to this question, but the idea that an action potential is 'one bit' is flawed, and NOT because of context or any of that.
It has to do with timing. The timing of an action potential has been shown to be important. As the firing rate increases, the interval of time within which each action potential can occur decreases, lowering the amount of information each action potential carries.
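One standard way to see this is the timing-entropy bound for a Poisson spike train observed at a fixed temporal resolution: the available bits per spike scale roughly as log2(e/(r·Δt)), so they shrink as the rate r grows. A tiny illustration (the 1 ms precision and the rates are arbitrary assumptions):

```python
import numpy as np

# Bits per spike available from spike timing at resolution dt, using the
# classic Poisson approximation ~ log2(e / (r * dt)), valid when r*dt << 1.
# Values are illustrative; dt = 1 ms is an assumed timing precision.
dt = 0.001
rates = np.array([5.0, 20.0, 50.0, 100.0, 200.0])   # firing rates in Hz

bits_per_spike = np.log2(np.e / (rates * dt))
for r, b in zip(rates, bits_per_spike):
    print(f"{r:6.0f} Hz -> {b:4.1f} bits/spike (upper bound)")
```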
Methods introduced in the following publication by Bialek et al. have been widely employed to measure the rate of information, in bits/s, conveyed by neurons.
The information rate is calculated in terms of the signal-to-noise ratio of a spike-train-based estimate of a stimulus variable, i.e., how much our uncertainty about the stimulus is reduced by observing/decoding a spike train elicited by that stimulus (as Dr. Zanos says above).
An application of the above methods to characterize information flow in the early visual pathway can be found here: https://www.researchgate.net/publication/235726980_Retinal_oscillations_carry_visual_information_to_cortex
An informal lecture by Prof. Bialek at the Redwood Center in 2008, outlining this conceptual framework, can be viewed here: http://archive.org/details/Redwood_Center_2008_07_23_Bill_Bialek
Article: Bialek et al., "Reading a Neural Code"
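For the practically minded, here is a minimal, self-contained sketch of the reconstruction lower bound in its coherence form (R ≥ ∫ df · log2(1 + SNR(f)), with SNR(f) = γ²(f)/(1 − γ²(f))). A simulated rectified-linear Poisson neuron stands in for real optic-nerve data, and all parameters are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a band-limited Gaussian "stimulus" driving a Poisson-spiking model
# neuron. This stands in for a real recording; the encoder and all parameters
# below are assumptions for illustration only.
dt = 0.001                          # 1 ms time bins
n = 200_000                         # 200 s of data
stim = np.convolve(rng.standard_normal(n), np.ones(20) / 20, mode="same")
stim = (stim - stim.mean()) / stim.std()
rate = 20.0 * np.clip(1.0 + stim, 0.0, None)         # Hz, rectified-linear encoder
spikes = (rng.random(n) < rate * dt).astype(float)    # binary spike train

# Segment-averaged auto- and cross-spectra of stimulus and spike train.
seg, nseg = 4096, n // 4096
S_ss = np.zeros(seg); S_rr = np.zeros(seg); S_sr = np.zeros(seg, dtype=complex)
for k in range(nseg):
    s = stim[k*seg:(k+1)*seg] - stim.mean()
    r = spikes[k*seg:(k+1)*seg] - spikes.mean()
    S, R = np.fft.fft(s), np.fft.fft(r)
    S_ss += np.abs(S)**2 / nseg
    S_rr += np.abs(R)**2 / nseg
    S_sr += S * np.conj(R) / nseg

# Coherence gives the SNR of the optimal linear stimulus reconstruction:
# gamma^2 = |S_sr|^2 / (S_ss * S_rr),  SNR = gamma^2 / (1 - gamma^2),
# and the lower bound is the integral over frequency of log2(1 + SNR).
# (With few segments the coherence estimate is biased upward; real analyses
# correct for this.)
coh = np.clip(np.abs(S_sr)**2 / (S_ss * S_rr + 1e-12), 0.0, 0.999)
snr = coh / (1.0 - coh)
freqs = np.fft.fftfreq(seg, d=dt)
df = freqs[1] - freqs[0]
info_rate = np.sum(np.log2(1.0 + snr[freqs > 0])) * df
print(f"Lower-bound information rate: {info_rate:.1f} bits/s")
```

With real optic-nerve recordings the same machinery applies per fiber (or per recorded channel), and the resulting bits/s are then summed or compared across fibers.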
There has been some work in the auditory domain that has quantified the amounts of different types of information in the auditory nerve. See Romain Brette's 2010 paper in the Journal of the Acoustical Society of America for a (critical) review that points to some of the papers that have used the technique ("On the interpretation of sensitivity analyses of neural responses", 128(5): 2965-2972).
Your question has two components: how to record the data and how to measure the information in the recorded data. Most of the previous responses have focused on interpreting the data, assuming you already have it. The other component is how to get the data, which Dr. Barabas addressed assuming you were working in vivo, while Dr. Usai has an in vitro prep with ganglion cells.
Unfortunately, each of these components has several sub-components. Where you record, how you record and how you present stimuli will all have an important impact on the amount of information you measure. Interpreting what the measures of information mean is important and must be done in context.
So, while your question is an important one, it is too broad to be answered directly. Bialek's book, as others have noted, is the best place to start - easy, clear reading. But if you have the right data, there are better, more sophisticated approaches that have been developed more recently. The correct approach depends on the kind of data you are recording and the question you really want to ask. For a bit more clarity, there are some very nice papers, especially in the rat whisker field, that examine the transmission of information from the whiskers to the cortex and/or thalamus. Carl Petersen probably has one of the most extensive sets of publications on this particular approach. We have done some of this work as well.
The brain has a unique way of computing that we do not understand very well. It is a kind of mixture of analogue and digital computation, with more emphasis on the analogue part. I think most of the computation is done between spikes, which cannot easily be converted to bits/s. The question is similar to the problem of "how much information (in bits) is stored by a piece of paper?" It depends on what you call information. Its count can be a number, its weight can be a floating point number, its two dimensions could be two floating point numbers, and it can hold a phone number or 1000 characters. If it is scanned, each pixel can be a bit of information, and the resolution, i.e. the total number of pixels, determines the number of bits, which can grow almost to infinity down at the atomic scale. It is almost a matter of taste how big a number you want as a result. Let me twist the problem a bit further: how much information is required to recognize the grandmother in a photo? Grandmother can be recognized in a very bad quality black-and-white picture and in a very good, high quality, high resolution colour picture alike...
I think the answer to your question depends greatly on the definition of a bit of information, which is certainly not one spike.
Most answers seem to concentrate on the nature of, or the way, information is coded in neural networks. That is one hell of a big, important and interesting topic, but I had the impression that Steffen's question was geared more towards actually measuring action potentials running through nerves.
Because I think we can all agree that it is the way information is transmitted through the axons of nerve bundles that we're talking about, right? Or are there any other methods by which information is transmitted in these "long distance calls"? Yes, I agree (and I think almost everybody who learnt contemporary neurophysiology will also agree) that it is most probably pointless to just look at a single action potential and call it a bit. Information may be encoded in the temporal and spatial pattern of the spikes, and yes, these spikes will then be converted into a totally different kind of message, but "inside the cable" - is there anything else? Maybe there is, I just don't know of any other way of fast information flow through nerve fibres. So as I see it, the question dropped here by Steffen is basically: how do we measure the spatiotemporal pattern of spikes "in the cable"? Because as daunting as this task may be, it is still way simpler than trying to tackle all the kinds and levels and parallel ways of information processing that occur right after that (not that people are not trying :D).
And yet, very importantly, all the information we use to see and recognize, for example, grandma has to go through that optic nerve. (Well, this may or may not be true... what I implied in my earlier rambling was that maybe not all the information needs to be sent.) This makes it a valid and good question and something worth doing.
I think the amount of information transmitted via the optic tracts can be estimated, and one can come up with a number that sounds as follows: "At least X Mbit/s must be transmitted, otherwise subjects could not perceive this and that stimulus." So I would not bother with electrophysiological measurement, since the number of spikes does not seem to be a reliable measure. Instead, I would use an appropriate stimulus and try to figure out how much information that stimulus contains. We have quite conventional measures of the information content of an image. I would control and vary the amount of information sent to the visual system as an input, and I would study visual perception in a psychophysical paradigm in relation to the amount of information loaded into the visual system.
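As a concrete (and admittedly crude) example of such a conventional measure, the first-order Shannon entropy of the grey-level histogram gives a bits-per-pixel figure. It ignores spatial correlations, so for natural images it overestimates the true content, but it lets you assign a controlled bit count to a stimulus:

```python
import numpy as np

def image_bits(img: np.ndarray, n_levels: int = 256) -> float:
    # First-order estimate: entropy of the grey-level histogram (bits/pixel)
    # times the number of pixels. Spatial correlations are ignored.
    counts, _ = np.histogram(img, bins=n_levels, range=(0, n_levels))
    p = counts / counts.sum()
    p = p[p > 0]
    entropy_per_pixel = -np.sum(p * np.log2(p))
    return entropy_per_pixel * img.size

# Toy example: a uniformly random 8-bit image carries close to 8 bits/pixel.
rng = np.random.default_rng(0)
noise_img = rng.integers(0, 256, size=(256, 256))
total = image_bits(noise_img)
print(f"{total:.0f} bits (~{total / noise_img.size:.2f} bits/pixel)")
```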
The big question is: what would be the most appropriate stimulus and paradigm?
Dynamic random dot stereograms seem to be a good start. Correlation and matching mechanisms have been shown to be purely bottom-up and automatic processes in the brain. At the speed at which random dot matrices can be fused, that amount of information is most probably also transmitted via the optic tracts. The nice property of random dot images is their randomness: it is hard to assume any "data compression" or convergence mechanism. Take it as an idea - I am not sure whether it works or not, but it is at least something to argue about.
To be more specific, when you have to perceive something, performance will most probably follow a logit function of the information amount (IA). When the IA is too low there is no perception at all; as the IA increases, performance increases; and when too much information is sent, performance saturates. You have to find the most demanding task, the one which requires the largest IA. :-) Performance most probably also has a retinotopic map, which has to be integrated over the entire visual field.
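A sketch of how such a logit-shaped performance curve could be fitted, with entirely made-up data from a hypothetical 2AFC task (chance level 0.5); the fitted midpoint would be the IA needed for threshold (75% correct) performance:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical psychophysics data: proportion correct vs. information amount
# (IA, in bits) loaded into the stimulus. All numbers are made up.
ia = np.array([0.5, 1, 2, 4, 8, 16, 32])              # bits per stimulus (assumed)
p_correct = np.array([0.52, 0.55, 0.63, 0.78, 0.90, 0.95, 0.96])

def psychometric(x, midpoint, slope):
    # Logistic rise from chance (0.5) to saturation (1.0) on a log-IA axis.
    return 0.5 + 0.5 / (1.0 + np.exp(-slope * (np.log2(x) - np.log2(midpoint))))

(midpoint, slope), _ = curve_fit(psychometric, ia, p_correct, p0=[4.0, 1.0])
print(f"Threshold IA (75% correct): ~{midpoint:.1f} bits, slope {slope:.2f}")
```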
I think you can use the Computational Neuroscience course on the Coursera website.
The lectures of weeks 2 and 3 are completely devoted to your question. They cover neural encoding/decoding and information theory.
I think we can find the relationship between the input and output information by calculating the mutual information (MI); for the MI you need the entropies of the input and output, which can be obtained from the input (stimulus) entropy and the neuronal spike trains.
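A minimal sketch of that calculation, I(S;R) = H(R) − H(R|S), on binary spike-count "words" from a toy encoding model (two equiprobable stimuli with assumed spike probabilities; the bias correction that real data would require is omitted):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def entropy(samples):
    # Plug-in Shannon entropy (bits) of a list of discrete symbols.
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

n_trials = 5000
word_len = 4                                     # response word = 4 bins of 0/1 spikes
stimuli = rng.integers(0, 2, n_trials)           # two stimuli, equally likely
spike_prob = np.where(stimuli == 1, 0.6, 0.2)    # assumed toy encoding model

words = [tuple((rng.random(word_len) < p).astype(int)) for p in spike_prob]

H_R = entropy(words)                             # total response entropy
H_R_given_S = 0.0                                # noise (conditional) entropy
for s0 in (0, 1):
    ws = [w for w, s in zip(words, stimuli) if s == s0]
    H_R_given_S += (len(ws) / n_trials) * entropy(ws)

print(f"I(S;R) ~ {H_R - H_R_given_S:.2f} bits per word")
```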