We have run almost 400 laser experiments in which the measurements are taken through a photocell that registers voltage variations corresponding to variations in the intensity of the laser beam. When the raw data were plotted, we noticed a "raining" effect: there was a general line, and from that line several data points often dropped straight down (see, for example, the figure attached to this explanation, which includes a base line and the experimental line).

Upon close examination, we found that the "raining" effect arises because the Arduino, the laptop, or the combination of both registered the same time stamp for different data points. The sampling rate is 100 readings per second, i.e. one reading every 1/100 of a second. The time stamps do have a resolution of 1/1000 of a second (for example, in 11:09:06.676 -> 332 the last three digits before the arrow are the tenths, hundredths and thousandths of a second), but we still get the "raining" phenomenon.
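To make the diagnosis concrete, here is a minimal sketch of how the repeated time stamps could be counted. It assumes each logged line has the form "HH:MM:SS.mmm -> value" as in the example above; the helper name parse_line and the sample lines are made up for illustration and are not real measurements.

```python
from collections import Counter

def parse_line(line):
    """Split 'HH:MM:SS.mmm -> value' into (time stamp string, integer reading)."""
    stamp, _, value = line.partition("->")
    return stamp.strip(), int(value.strip())

# Made-up sample lines in the logged format; NOT real measurements.
sample = [
    "11:09:06.676 -> 332",
    "11:09:06.676 -> 298",
    "11:09:06.686 -> 331",
    "11:09:06.696 -> 333",
    "11:09:06.696 -> 305",
    "11:09:06.696 -> 290",
]

readings = [parse_line(line) for line in sample]

# Count how many readings share each time stamp and keep only the repeats.
counts = Counter(stamp for stamp, _ in readings)
repeated = {stamp: n for stamp, n in counts.items() if n > 1}
print(repeated)  # {'11:09:06.676': 2, '11:09:06.696': 3}
```

Running this over a full log would show how many time stamps carry more than one reading, and how many readings pile up on each.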

Here is the question: how should statistics be properly calculated for time-series data in which several values share the same time stamp?

Up to now, the descriptive statistics were computed on the data "as is", but later we wondered whether the "raining" effect should be removed. The way I came up with to remove the rain was to take the median of the data sharing the same time stamp (sort of inspired by what is done in quantum calculations), which would give graphics without rain. I then realized that this entails recalculating all the statistics to account for the modification. So the choices I am facing, which motivate this question, are:

  0) Keep both the graphics and the statistics as they are (no "raining" removed).
  1) Keep the statistics computed from all the original data, but draw the graphics with the raining removed, for better visualization of the difference between base lines and experimental lines.
  2) Recalculate all the statistics once the raining has been removed.

To me, either option 0) or 2) would be consistent in the way the data are handled, though I do not rule out 1). The point remains whether removing the "raining" and substituting it with the median of each subset of points that share a time stamp is a good idea, at least for the graphical representation of the data.
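For reference, the median substitution described above could be sketched as follows. This is only an illustration under stated assumptions: the readings are (time stamp, value) pairs, the function name collapse_by_timestamp is hypothetical, and the sample values are made up rather than taken from the experiments.

```python
from collections import defaultdict
from statistics import mean, median

def collapse_by_timestamp(readings):
    """Replace all readings that share a time stamp by their median.

    Time stamps are returned in the order in which they first appear.
    """
    groups = defaultdict(list)
    for stamp, value in readings:
        groups[stamp].append(value)
    return [(stamp, median(values)) for stamp, values in groups.items()]

# Made-up (time stamp, value) pairs; NOT real measurements.
readings = [
    ("11:09:06.676", 332),
    ("11:09:06.676", 298),
    ("11:09:06.686", 331),
    ("11:09:06.696", 333),
    ("11:09:06.696", 305),
    ("11:09:06.696", 290),
]

collapsed = collapse_by_timestamp(readings)

# The same descriptive statistic on the raw series and on the collapsed one.
raw_mean = mean(value for _, value in readings)
collapsed_mean = mean(value for _, value in collapsed)

print(len(readings), len(collapsed))  # 6 3
print(raw_mean, collapsed_mean)
```

Computing the same statistic on both series, side by side, shows how much the choice between options 0) and 2) would change the reported numbers. Note that the collapsed series has fewer points, so any statistic that depends on the sample size changes as well.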
