# Human Activity Recognition

# Introduction:

A phone contains multiple sensors, such as the accelerometer and gyroscope. These sensors can be used to detect the activity of the phone's owner. The most common applications of this technique are smartwatches and fitness bands: these wearables detect the wearer's activity and use it to estimate the number of calories burnt.

This project builds a model that predicts human activities such as Walking, Walking_Upstairs, Walking_Downstairs, Sitting, Standing or Laying. The dataset was collected from 30 persons (referred to as subjects in this dataset), each performing the activities with a smartphone attached to their waist. The data was recorded by the sensors (accelerometer and gyroscope) in that smartphone, and the experiment was video-recorded so the data could be labelled manually. The dataset is taken from the UCI Machine Learning Repository.

# Knowing the Dataset:

The dataset was created from the signals of the sensors. I used data that the experts created by processing those signals. The following features were created by the experts: *tBodyAcc-XYZ, tGravityAcc-XYZ, tBodyAccJerk-XYZ, tBodyGyro-XYZ, tBodyGyroJerk-XYZ, tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag, fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccMag, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag.*

These features are in the following categories —

- **mean()**: Mean value
- **std()**: Standard deviation
- **mad()**: Median absolute deviation
- **max()**: Largest value in array
- **min()**: Smallest value in array
- **sma()**: Signal magnitude area
- **energy()**: Energy measure (sum of the squares divided by the number of values)
- **iqr()**: Interquartile range
- **entropy()**: Signal entropy
- **arCoeff()**: Autoregression coefficients with Burg order equal to 4
- **correlation()**: Correlation coefficient between two signals
- **maxInds()**: Index of the frequency component with the largest magnitude
- **meanFreq()**: Weighted average of the frequency components to obtain a mean frequency
- **skewness()**: Skewness of the frequency-domain signal
- **kurtosis()**: Kurtosis of the frequency-domain signal
- **bandsEnergy()**: Energy of a frequency interval within the 64 bins of the FFT of each window
- **angle()**: Angle between two vectors

Many of these signals are measured in three directions (X, Y, Z), giving a total of 561 features. In the dataset, the Y labels are represented as numbers from 1 to 6: WALKING as **1**, WALKING_UPSTAIRS as **2**, WALKING_DOWNSTAIRS as **3**, SITTING as **4**, STANDING as **5**, LAYING as **6**.
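The mapping can be written as a small lookup table (names as listed in the dataset's `activity_labels.txt`):

```python
# Mapping from numeric labels to activity names,
# as listed in the dataset's activity_labels.txt.
ACTIVITY_LABELS = {
    1: "WALKING",
    2: "WALKING_UPSTAIRS",
    3: "WALKING_DOWNSTAIRS",
    4: "SITTING",
    5: "STANDING",
    6: "LAYING",
}

def decode(y):
    """Translate a sequence of numeric labels into activity names."""
    return [ACTIVITY_LABELS[v] for v in y]

print(decode([1, 6]))  # → ['WALKING', 'LAYING']
```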

## Problem Framework

- The data from the 30 subjects (volunteers) is randomly split into 70% (21 subjects) training data and 30% (9 subjects) test data.
- Each datapoint corresponds to one of the 6 activities.

## Problem Statement

- Given a new datapoint, we have to predict the activity.

# Exploratory Data Analysis

Initially, the data provided by each user, segregated by activity, is plotted with its count.

It can be observed that most users have an almost equal number of datapoints per activity, with a few exceptions such as user 1. Across the whole dataset, however, the number of datapoints for each activity is almost equal, as shown in the graph below.

Now, using the t-body-acceleration feature, it can be seen that the stationary and the moving activities are easy to separate, as there is sufficient distance between the two groups. This is visible in the plot below.

Using the box plots, I tried to observe how acceleration can separate the activities.

We can see from this plot that moving and stationary activities are easily separable. Also, Walking and Walking_Downstairs can be separated with an acceleration threshold near -0.1.
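As a toy illustration of that threshold idea (the numbers below are made up, not taken from the dataset; the real feature would be `tBodyAccMag-mean()`, scaled to [-1, 1]):

```python
import numpy as np

# Hypothetical mean body-acceleration magnitude values; in the dataset,
# stationary activities cluster near -1 while moving ones sit well above.
acc_mag_mean = np.array([-0.98, -0.95, -0.05, 0.10, -0.97, 0.25])

# A single threshold already separates the two groups of activities.
is_moving = acc_mag_mean > -0.5
print(is_moving)  # stationary → False, moving → True
```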

The Laying activity can easily be separated from all other activities by the angle of the gravity mean from the X and Y axes.

Next, a t-SNE plot was used to check whether the activities can be clustered. It was observed that the Standing and Sitting activities are hard to separate, as most of the features are similar for both. This can be seen in the plot below, where the red and blue points overlap. The plot shown uses perplexity 50.
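A sketch of how such an embedding can be produced with scikit-learn; the random matrix below stands in for the 561-dimensional HAR feature matrix, which would normally be loaded from the dataset:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the 561-feature training matrix (random data here).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 561))

# Project to 2-D; perplexity=50 matches the plot discussed above.
X_2d = TSNE(n_components=2, perplexity=50, random_state=42).fit_transform(X)
print(X_2d.shape)  # (200, 2)
```

The resulting 2-D points are then scattered with one colour per activity label to produce the plot.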

# Machine Learning Models-

For evaluating the models, the following utility functions are used to build a confusion matrix and measure how successful a model is.

The first function, plot confusion matrix, compares the actual and the predicted values to check the accuracy, precision and other performance metrics of a given model. The second function, perform model, fits any model and also measures the time taken to train it. The third function directly prints all the calculated performance metrics, the best parameters, the cross-validation results and the best score of the model.
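A minimal sketch of what the `perform_model` utility might look like (the exact implementation in the project may differ; the plotting and printing helpers are omitted here):

```python
from datetime import datetime

from sklearn.metrics import accuracy_score, confusion_matrix

def perform_model(model, X_train, y_train, X_test, y_test):
    """Fit a model, time the training, and collect test-set results."""
    start = datetime.now()
    model.fit(X_train, y_train)          # train the estimator
    train_time = datetime.now() - start  # wall-clock training time

    y_pred = model.predict(X_test)
    return {
        "model": model,
        "training_time": train_time,
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
```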

## GridSearchCV

Hyperparameter tuning is the process of determining the optimal hyperparameter values for a given model. The performance of a model depends significantly on these values, and there is no way to know the best values in advance, so ideally we would try every candidate value. Doing this manually would take a considerable amount of time and resources, so we use GridSearchCV to automate the tuning.

GridSearchCV is a class in Scikit-learn's (or sklearn's) model_selection package. It loops through predefined hyperparameter values, fitting the estimator (model) on the training set for each combination, so that in the end we can select the best parameters from the listed candidates.
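A minimal, self-contained illustration of the API on toy data (the dataset and parameter values here are placeholders, not the project's):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the 561-feature HAR training set.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Try several values of the regularisation parameter C with 3-fold CV;
# GridSearchCV refits the best combination on the full training set.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1, 8, 16]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```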

## Logistic Regression with GridSearchCV

Logistic regression is a well-known basic classification model. It fits a linear model with coefficients w = (w1, …, wp) to the features and passes the result through a logistic (softmax) function to produce class probabilities; its regularisation strength is controlled by the inverse-regularisation parameter C.

On implementing Logistic Regression with GridSearchCV, an accuracy score of 0.9589 was observed. The model scored lower on the Sitting and Standing activities than on the other activities, as seen in the confusion matrix below.

The parameters of the best estimator were {'C': 8}.

## Linear SVC with GridSearchCV

Linear Support Vector Classification is similar to SVC with a linear kernel. It has more flexibility in the choice of penalties and loss functions and scales better to large numbers of samples.

On implementing LinearSVC with GridSearchCV, an accuracy score of 0.9671 was observed. The model scored lower on the Sitting and Standing activities than on the other activities, as seen in the confusion matrix below.

## Decision Trees with GridSearchCV

**Decision Trees (DTs)** are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.

On implementing Decision Trees with GridSearchCV, an accuracy score of 0.8775 was observed. The model scored lower on the Sitting and Standing activities than on the other activities, as seen in the confusion matrix below.

Parameters of best estimator: {'max_depth': 9}. Average cross-validation score of best estimator: 0.8466.

## Random Forest with GridSearchCV

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

On implementing Random Forest with GridSearchCV, an accuracy score of 0.9267 was observed. The model scored lower on the Sitting and Standing activities than on the other activities, as seen in the confusion matrix below.

Parameters of best estimator: {'max_depth': 13, 'n_estimators': 30}. Average cross-validation score of best estimator: 0.9234.
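The four grid searches above can be sketched in one loop. The data here is synthetic and the parameter grids only echo the best values reported above, so this illustrates the workflow rather than reproducing the results:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy multi-class data in place of the 561-feature HAR training set.
X, y = make_classification(n_samples=300, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)

# Candidate models with small parameter grids around the values above.
candidates = {
    "LogisticRegression": (LogisticRegression(max_iter=1000), {"C": [1, 8, 16]}),
    "LinearSVC": (LinearSVC(max_iter=5000), {"C": [0.125, 0.5, 1]}),
    "DecisionTree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 6, 9]}),
    "RandomForest": (RandomForestClassifier(random_state=0),
                     {"max_depth": [9, 13], "n_estimators": [10, 30]}),
}

# Run each grid search and keep the best cross-validation score per model.
scores = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=3).fit(X, y)
    scores[name] = search.best_score_
print(scores)
```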

It is thus observed that LinearSVC with GridSearchCV gave the best accuracy for predicting the activity.

# Deep Learning Models-

For the deep learning approach, the raw signals themselves are processed with an LSTM model to predict the activity. For the ML models, as mentioned, the features were generated by experts, who processed the signals using their deeper knowledge of signal processing.

## LSTM-

Long Short Term Memory networks — usually just called “LSTMs” — are a special kind of RNN, capable of learning long-term dependencies. LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn.

The following model structure was used:

The model gave an accuracy of 91.14% with a loss of 0.3281. This accuracy is lower than the accuracy obtained with the expert-generated features. The confusion matrix shows that the Standing and Sitting activities are still not easily separable.
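A sketch of a single-LSTM-layer classifier of the kind described, written with Keras. The input shape matches the raw inertial signals of the UCI HAR dataset (windows of 128 timesteps with 9 signal channels) and the output is one of 6 activities; the number of units and the dropout rate are assumptions, not the project's exact configuration:

```python
import numpy as np
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.models import Sequential

# 128 timesteps per window, 9 raw signal channels, 6 activity classes.
model = Sequential([
    Input(shape=(128, 9)),
    LSTM(32),                        # hidden size of 32 is an assumption
    Dropout(0.5),                    # regularisation before the output layer
    Dense(6, activation="softmax"),  # one probability per activity
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["accuracy"])

# A dummy batch just to show the expected input/output shapes.
probs = model.predict(np.zeros((2, 128, 9)), verbose=0)
print(probs.shape)  # (2, 6)
```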

# Conclusion and Future Work

The expert-generated features gave better performance, which shows that domain knowledge is important for building better-performing models.

For the DL models, feature engineering could help improve detection in the future.

Also, an end-to-end application of this concept will be implemented in the future.

# References-

Github — https://github.com/sakshamchecker/Human-Activity-Recognition

Google Colab Notebook — https://colab.research.google.com/drive/11sPthE4JSj9k0dDfaW0Ca5LTXTPTsrDv?usp=sharing

Dataset — https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones