We have run almost 400 laser experiments in which the measurements are taken through a photocell that registers voltage variations corresponding to variations in the intensity of the laser beam. When the raw data were plotted, we noticed a "raining" effect: there is a general line, and from that line several data points often fall straight down (see for example the Figure attached to this explanation, which includes a baseline and the experimental line).
Upon close examination, we found that the "raining" effect occurs because the Arduino, the laptop, or the combination of both registers the same time stamp for different data points. The sampling frequency is one reading every 1/100 of a second, i.e., 100 readings per second. The time stamps have a resolution of 1/1000 of a second (for example, 11:09:06.676 -> 332, where the last three digits before the arrow represent tenths, hundredths, and thousandths of a second), yet we still get the said "raining" phenomenon.
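For concreteness, here is a minimal sketch (in Python with pandas) of how one could quantify the duplication, assuming the log is a CSV file with hypothetical column names `timestamp` and `voltage` (adjust to your actual file):

```python
import pandas as pd

# Hypothetical file and column names; adjust to the real log.
df = pd.read_csv("laser_run.csv")
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%H:%M:%S.%f")

# Count how many readings share each time stamp; any group larger
# than one is a candidate source of the "raining" effect.
group_sizes = df.groupby("timestamp").size()
print("readings sharing a time stamp:", int(group_sizes[group_sizes > 1].sum()))
print("distinct duplicated time stamps:", int((group_sizes > 1).sum()))
```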
Here is the question: how should statistics be properly calculated for a time series in which several data points share the same time stamp?
Up to now, the descriptive statistics were computed on the data "as is", but we later wondered whether the "raining" effect should be removed. The way I came up with to remove the rain was to take the median of the data sharing the same time stamp (loosely inspired by what is done in quantum calculations); this yields graphics without rain, but then I realized that it entails recalculating all the statistics to account for the modification. So, the choices I am facing, which are the motivation of this question, are:

0) Keep the data "as is" and compute all statistics on the raw series, raining included.

1) Replace each group of same-time-stamp readings with its median for the graphics only, while keeping the statistics computed on the raw data.

2) Replace each group of same-time-stamp readings with its median and recalculate all statistics on the de-rained series.
To me, either option 0) or option 2) would be consistent in how the data are handled, though I do not discard option 1). The point remains whether removing the "raining" and substituting the median across each same-time-stamp subset of points would be a good idea, at least for the graphical representation of the data.
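As a minimal sketch of what option 2) would look like, under the same assumptions about file and column names as above: collapse each repeated time stamp to the median of its readings, then recompute the descriptive statistics on the collapsed series, so that plots and statistics come from the same data.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the real log.
df = pd.read_csv("laser_run.csv")
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%H:%M:%S.%f")

# Option 2): one median value per time stamp (the "de-rained" series).
derained = df.groupby("timestamp", as_index=False)["voltage"].median()

print(df["voltage"].describe())        # statistics of the raw series
print(derained["voltage"].describe())  # statistics of the de-rained series

# Option 1) would instead plot `derained` while still reporting the
# statistics of the raw df["voltage"] column.
```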