Log-scale sensors have obvious disadvantages if you want good accuracy.
Isn't it easier (and much cheaper) to use two identical sensors of low dynamic range, one equipped with a neutral density (ND) filter that reduces the Sun's radiance to a measurable range and the other without a filter? Actually, the filter need not be that opaque to make the solar flux comparable to the clear-sky flux.
The only thing that bothers me in this setup is the constant overexposure of certain pixels of the second camera, which can affect their gain. If this really is a problem, I would use a solar tracker and/or mask the Sun for the second camera.
Actually, one can achieve a dynamic range of camera_dynamic_range x 1E5 using a single camera with a revolving set of commercially available ND filters, and there will be enough information to self-calibrate the system.
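As a rough sketch of the arithmetic behind that figure (the wheel contents and the camera's usable range below are assumptions for illustration, not values from a real setup): an ND filter of optical density d transmits a fraction 10^-d of the light, so a wheel spanning d = 0 to d = 5 stretches the usable range by 1E5. For self-calibration, neighbouring filters should differ by less than the camera's own range, so that consecutive snapshots share some well-exposed pixels.

```python
import math


def transmission(optical_density):
    """Fraction of light passed by an ND filter of the given optical density."""
    return 10.0 ** (-optical_density)


def combined_dynamic_range(camera_dynamic_range, optical_densities):
    """Brightest-to-dimmest measurable flux ratio over the whole filter set."""
    span = max(optical_densities) - min(optical_densities)
    return camera_dynamic_range * 10.0 ** span


def ranges_overlap(camera_dynamic_range, optical_densities):
    """True if every pair of neighbouring filters shares part of the usable range."""
    ods = sorted(optical_densities)
    limit = math.log10(camera_dynamic_range)
    return all(b - a < limit for a, b in zip(ods, ods[1:]))


# Hypothetical wheel: an open slot (OD 0) plus ND filters up to OD 5.
wheel = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
camera_dynamic_range = 1000.0  # assumed usable linear range of the sensor

total = combined_dynamic_range(camera_dynamic_range, wheel)
overlapping = ranges_overlap(camera_dynamic_range, wheel)
```

With these assumed numbers, `total` is camera_dynamic_range x 1E5, and each OD step of 1 is well inside the three decades the camera covers on its own.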
To give a better answer, one needs to know the intended application of your experimental setup.
In fact, I was more or less convinced by the approach of multiple exposure times combined with a neutral density filter wheel. But I was also trying to find this kind of information on log sensors. This morning I talked with a researcher in Singapore who told me about an unsuccessful attempt with a log-response sensor.
The ultimate goal is to measure the optical thickness and the radiance of the sky to retrieve the optical properties of the aerosol, just like a Sun photometer (see AERONET).
Would you have reference papers on the multi-exposure approach?
No, I don't, but it is fairly standard practice both in optical measurements and in photography. An obvious advantage of this approach is that the same sensor is used in the most favorable mode for the chosen range of flux intensity: one can pick the linear part of its sensitivity curve, above the noise level and below any kind of saturation.
The algorithm for cross-calibrating the sequence of images is also straightforward, and it will provide both the cross-calibration coefficients and a mask of "good" signal for each snapshot (the map of pixels that satisfy the aforementioned criteria).
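One possible way to write such a cross-calibration down, as a minimal sketch only: images are taken as flat lists of pixel values, `noise_floor` and `saturation` are assumed thresholds bounding the linear part of the response, and the median of pixel ratios is my choice of estimator for the coefficient between consecutive snapshots. All function names are hypothetical.

```python
from statistics import median


def good_mask(image, noise_floor, saturation):
    """Pixels above the noise level and below saturation."""
    return [noise_floor < v < saturation for v in image]


def cross_calibrate(images, noise_floor, saturation):
    """Return (gains, masks); gains[k] rescales snapshot k to the units of snapshot 0.

    Snapshots must be ordered so that neighbours share some good pixels.
    """
    masks = [good_mask(img, noise_floor, saturation) for img in images]
    gains = [1.0]
    for k in range(1, len(images)):
        # Ratio of the previous snapshot to this one, on pixels good in both.
        ratios = [
            a / b
            for a, b, ma, mb in zip(images[k - 1], images[k], masks[k - 1], masks[k])
            if ma and mb
        ]
        if not ratios:
            raise ValueError(f"snapshots {k - 1} and {k} share no good pixels")
        gains.append(gains[-1] * median(ratios))
    return gains, masks


def merge(images, gains, masks):
    """Average the rescaled good values per pixel; None where no snapshot is usable."""
    merged = []
    for px in range(len(images[0])):
        vals = [g * img[px] for img, g, m in zip(images, gains, masks) if m[px]]
        merged.append(sum(vals) / len(vals) if vals else None)
    return merged
```

Chaining the ratios from one snapshot to the next means no filter transmission has to be known in advance, which is the self-calibration mentioned earlier. The price is that errors accumulate along the chain, so a real implementation would probably want to weight pixels by their signal-to-noise ratio.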