I am trying to use the HAR dataset (link below) to test my activity recognition algorithm. Before doing so, however, I am trying to make sense of the data and understand how the provided features are extracted.

As a simple test I tried to reproduce them, starting with the mean. I calculated the arithmetic mean of the 128 values in the first row of the data/train/Inertial Signals/body_acc_x_train.txt file. If I understand the documentation correctly, this should equal the first value on the first line of data/train/X_train.txt. However, the mean I compute is 0.00226869, whereas the value in X_train.txt is 0.28858.
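
For reference, here is essentially the computation I am doing (a minimal sketch; the paths assume the archive is extracted into a local "UCI HAR Dataset" directory, and numpy is used purely for convenience):

```python
import numpy as np

# Each row of the inertial signal files is one window of 128 samples.
body_acc_x = np.loadtxt("UCI HAR Dataset/train/Inertial Signals/body_acc_x_train.txt")
first_window = body_acc_x[0]                # the 128 raw samples of the first window

# X_train.txt holds the precomputed feature vectors; according to features.txt
# the first column should be tBodyAcc-mean()-X.
X_train = np.loadtxt("UCI HAR Dataset/train/X_train.txt")

print(first_window.mean())                  # I get roughly 0.00226869
print(X_train[0, 0])                        # the file contains 0.28858
```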

This same discrepancy occurs for the y and z values. If I omit the division by 128 (the number of samples in a window), the result is closer (at least of the same order of magnitude), but still further off than floating-point error can account for. (Just to be sure, I used the bigfloat package in my Python code with a precision of 15 to rule out rounding errors on my side.)
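
To rule out accumulated rounding error on my side, I also redid the sum with high-precision decimal arithmetic (shown here with the standard decimal module instead of bigfloat, purely as an independent check):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50   # far more digits than float64 carries

with open("UCI HAR Dataset/train/Inertial Signals/body_acc_x_train.txt") as f:
    first_row = f.readline().split()        # the 128 samples of the first window

total = sum(Decimal(v) for v in first_row)  # sum the values exactly as written in the file

print(total / len(first_row))  # mean: still about 0.00227, nowhere near 0.28858
print(total)                   # sum alone: same order of magnitude as 0.28858, but still off
```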

I understand this is a rather niche question. I sent it to the maintainer of the dataset, but unfortunately he is out of office until the end of August, so I thought I'd ask here in case someone has experience with this dataset.

https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
