I am trying to use the HAR dataset (link below) to test my activity recognition algorithm. Before doing so, however, I am trying to make sense of the data and understand how the provided features are extracted.

As a simple test I tried to reproduce them, starting with the mean. I calculated the arithmetic mean of the 128 values in the first row of the data/train/Inertial Signals/body_acc_x_train.txt file. If I understand the documentation correctly, this should equal the first value on the first line of data/train/X_train.txt. However, the mean I compute is 0.00226869, whereas the value in X_train.txt is 0.28858.
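
For reference, here is essentially the computation I am doing (a minimal sketch; the paths assume the archive is extracted into a local "UCI HAR Dataset" directory, and numpy is used purely for convenience):

```python
import numpy as np

# Each row of the inertial signal files is one window of 128 samples.
body_acc_x = np.loadtxt("UCI HAR Dataset/train/Inertial Signals/body_acc_x_train.txt")
first_window = body_acc_x[0]                # the 128 raw samples of the first window

# X_train.txt holds the precomputed feature vectors; according to features.txt
# the first column should be tBodyAcc-mean()-X.
X_train = np.loadtxt("UCI HAR Dataset/train/X_train.txt")

print(first_window.mean())                  # I get roughly 0.00226869
print(X_train[0, 0])                        # the file contains 0.28858
```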

This same discrepancy occurs for the y and z values. If I omit the division by 128 (the number of samples in a window), the result is closer (at least of the same order of magnitude), but still further off than floating-point error can account for. (Just to be sure, I used the bigfloat package in my Python code with a precision of 15 to rule out rounding errors on my side.)
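
To rule out accumulated rounding error on my side, I also redid the sum with high-precision decimal arithmetic (shown here with the standard decimal module instead of bigfloat, purely as an independent check):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50   # far more digits than float64 carries

with open("UCI HAR Dataset/train/Inertial Signals/body_acc_x_train.txt") as f:
    first_row = f.readline().split()        # the 128 samples of the first window

total = sum(Decimal(v) for v in first_row)  # sum the values exactly as written in the file

print(total / len(first_row))  # mean: still about 0.00227, nowhere near 0.28858
print(total)                   # sum alone: same order of magnitude as 0.28858, but still off
```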

I understand this is a rather niche question. I sent it to the maintainer of the dataset, but unfortunately he is out of office until the end of August, so I thought I'd ask here in case someone has experience with this dataset.

https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
