I assume you're talking about satellite data and do not have access to aerial photos, which provide far more detail. In that case, accurate estimation is impossible: the spatial resolution of most satellite images ranges from kilometers down to a couple of meters, which is too coarse to count individuals directly.
Even high-resolution satellite data (such as QuickBird or other panchromatic (PAN) images) would not show you individuals, so here is an idea: use a geometric approach instead of relying on spectral data. Authorities tend to fill the audience blocks one at a time: once a block reaches capacity, they open the doors to the next one. So most of the time you have some full blocks and some empty ones. You could estimate the number of seats in a given area (e.g. a block) and count the number of full blocks. If you do not have direct access to the stadium, you can search online for photos of it to estimate the size and arrangement of the seats.
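As a rough sketch of that geometric approach, the code below assumes each block is a rectangle of straight, evenly spaced rows, so its capacity is (seats per row) × (number of rows), and that full blocks are at capacity. The function names and all dimensions (20 m × 16 m block, 0.5 m seat width, 0.8 m row pitch, 12 full blocks) are hypothetical placeholders; replace them with what you measure on the image and in stadium photos. Measure block depth and row pitch in the same projection (e.g. both horizontally, as seen from above).

```python
import math

# Small tolerance so that exact ratios such as 16 / 0.8 are not floored to 19
# because of floating-point round-off.
EPS = 1e-9


def seats_in_block(block_width_m: float, block_depth_m: float,
                   seat_width_m: float, row_pitch_m: float) -> int:
    """Approximate seat capacity of one rectangular block (hypothetical helper).

    Width is measured along the rows, depth across the rows.
    """
    seats_per_row = math.floor(block_width_m / seat_width_m + EPS)
    n_rows = math.floor(block_depth_m / row_pitch_m + EPS)
    return seats_per_row * n_rows


def estimate_attendance(n_full_blocks: int, seats_per_block: int) -> int:
    """Full blocks are assumed to be at capacity; empty blocks contribute nothing."""
    return n_full_blocks * seats_per_block


# Hypothetical block: 20 m wide, 16 m deep, 0.5 m seats, 0.8 m row pitch.
capacity = seats_in_block(20, 16, 0.5, 0.8)  # 40 seats x 20 rows
print(capacity, estimate_attendance(12, capacity))  # prints: 800 9600
```

Curved or irregular blocks can be handled the same way by splitting them into roughly rectangular pieces and summing their capacities.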
If some blocks are full and others only sparsely occupied, you may want to assume different occupancy rates for the full and sparse blocks and count them separately.
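One way to combine the groups is a simple stratified sum: for each group of blocks, multiply the number of blocks by the seats per block and by an assumed occupancy fraction, then add the groups up. The function name, the 800-seat blocks and the 30% occupancy for sparse blocks are hypothetical; the occupancy value would have to come from your own inspection of a few sparse blocks.

```python
def stratified_estimate(strata):
    """Sum the expected head count over groups of blocks (hypothetical helper).

    strata: iterable of (n_blocks, seats_per_block, occupancy) tuples, where
    occupancy is the assumed fraction of seats taken (1.0 = full, 0.0 = empty).
    """
    total = 0.0
    for n_blocks, seats_per_block, occupancy in strata:
        if not 0.0 <= occupancy <= 1.0:
            raise ValueError("occupancy must be between 0 and 1")
        total += n_blocks * seats_per_block * occupancy
    return round(total)


# Hypothetical stadium: 12 full, 5 sparse (~30% occupied) and 3 empty blocks of 800 seats.
print(stratified_estimate([(12, 800, 1.0), (5, 800, 0.3), (3, 800, 0.0)]))  # prints 10800
```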
Counting large numbers of individuals directly, however, requires a different approach. If you have imagery with enough resolution to see people, you can use the same subsampling methods people use when estimating the number of trees from aerial photography.
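A minimal sketch of subsampling, assuming you overlay a regular grid of equally sized cells on the seating area, pick a random subset of cells, count people by hand in those cells only, and then scale up. This is plain simple random sampling without replacement (the estimated total is the number of cells times the mean count per sampled cell), not necessarily the exact forestry procedure the answer has in mind. The function names, the 200-cell grid and the counts are hypothetical.

```python
import math
import random
import statistics


def choose_sample_cells(n_cells, n_sample, seed=None):
    """Randomly pick which grid cells to count by hand (without replacement)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cells), n_sample))


def expand_sample(sample_counts, n_cells):
    """Expand hand counts from sampled cells to an estimated total and its standard error."""
    m = len(sample_counts)
    if m < 2:
        raise ValueError("need at least two sampled cells to estimate the variance")
    total = n_cells * statistics.mean(sample_counts)
    # Variance of the estimated total under simple random sampling without
    # replacement, including the finite population correction (1 - m / n_cells).
    var = n_cells ** 2 * (1 - m / n_cells) * statistics.variance(sample_counts) / m
    return total, math.sqrt(var)


cells = choose_sample_cells(200, 10, seed=1)          # which cells to count
counts = [38, 41, 40, 39, 42, 37, 40, 41, 39, 43]     # hypothetical hand counts
total, se = expand_sample(counts, n_cells=200)
print(round(total), round(se, 1))  # prints: 8000 112.5
```

If full and sparse blocks look very different, sampling each group separately and adding the group totals will usually give a tighter estimate than one sample over the whole stand.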
High-resolution satellite images from sensors such as WorldView and GeoEye provide 50 cm spatial resolution, at which you can count people.
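To see why resolution matters so much, you can compare the ground area of one seat with the area of one pixel. The sketch below assumes a seat footprint of 0.5 m × 0.8 m (a hypothetical figure; check it against photos of your stadium) and computes how many pixels cover one seat at a given ground sample distance (GSD).

```python
def pixels_per_seat(gsd_m: float, seat_width_m: float = 0.5, row_pitch_m: float = 0.8) -> float:
    """Pixels covering one seat's footprint at a GSD of gsd_m meters per pixel."""
    return (seat_width_m * row_pitch_m) / gsd_m ** 2


for gsd in (0.5, 2.0):
    print(f"{gsd} m GSD -> {pixels_per_seat(gsd):.2f} pixels per seat")
# prints:
# 0.5 m GSD -> 1.60 pixels per seat
# 2.0 m GSD -> 0.10 pixels per seat
```

Under these assumptions a spectator covers only one or two pixels at 50 cm, while at 2 m a single pixel spans about ten seats, which is why individuals can only just be resolved at the finest resolutions.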
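However, these satellites are in sun-synchronous orbits, which means they pass over a given site at about the same local solar time every day (around 10:30 AM for these sensors), and that is usually not when games are played. With a covered stadium it is even harder, since the roof hides some or all of the seating.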