Activity tracking using the phone and MATLAB
Are you physically active? Do you keep track of the time you spend walking or jogging? Recent studies have shown that nearly 20% of adults use some kind of technology to track their physical activity. Are you one of that 20%, analyzing your daily activity to get to know yourself better?
Article based on information from habrahabr.ru
This post describes how to use an Android device together with MATLAB and the machine learning algorithms in Statistics and Machine Learning Toolbox to track physical activity in real time.
Installation and configuration
We will need:
- a computer with MATLAB R2014b or later, with the MATLAB Support Package for Android Sensors and Statistics and Machine Learning Toolbox installed;
- the MATLAB Mobile app for Android;
- an Internet connection on the smartphone.
The smartphone is connected to MATLAB through MATLAB Connector; the plugin can be downloaded from the link.
More information about collecting data from smartphone sensors is available here.
Introduction
When we watch someone, we can easily tell what the person is doing, even if we are seeing them for the first time. We know how to recognize human activities by default: the brain compares the observed action with thousands of previously seen ones and returns the best match. Similarly, a computer (or a phone) recognizes the actions it has been taught.
With machine learning algorithms, a computer can be taught to recognize specific human activities and to improve this ability as new information becomes available. This kind of recognition, in which data is divided into separate "classes", is called classification. Another example of classification is diagnosing a patient based on the presence of certain symptoms.
A classification algorithm for such problems is applied in two stages: training and detection. During the training stage, a model is built that sorts the training data into the given categories. During the detection stage, new data is assigned to the existing categories.
The app uses the acceleration sensor (accelerometer) to determine the activity. A classifier based on the K-nearest neighbors (KNN) algorithm was chosen. It suits this application because it detects movement quickly and is very accurate on low-dimensional data (a small set of properties). It determines the category of a new data point by a majority vote among that point's K nearest neighbors in the training dataset.
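The majority vote described above can be illustrated with a minimal sketch in plain MATLAB. The training points, the numeric activity codes and K = 3 are made-up values for illustration only; this is not the classifier used in the article, which relies on the toolbox.

```matlab
% Toy training set: two properties per point, two activities (made-up values).
trainX = [0.10 0.20; 0.15 0.25; 0.20 0.10; 0.80 0.90; 0.85 0.80; 0.90 0.95];
trainY = [1; 1; 1; 2; 2; 2];          % hypothetical codes: 1 = idling, 2 = running
newPoint = [0.75 0.85];               % point to classify
K = 3;                                % number of neighbors to consult

% Euclidean distance from the new point to every training point.
diffs = trainX - repmat(newPoint, size(trainX, 1), 1);
dists = sqrt(sum(diffs .^ 2, 2));

% Take the K closest training points and let them vote.
[~, order] = sort(dists);
neighborLabels = trainY(order(1:K));
predicted = mode(neighborLabels);     % majority class among the K neighbors
```

Here all three nearest neighbors carry the code 2, so the new point is assigned to that class.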
Movement recognition was done in three stages:
- data collection: acceleration values in three dimensions were recorded with the accelerometer of the Android device;
- property extraction: for each monitored activity, distinguishing properties were identified and extracted from the accelerometer readings;
- activity classification: the properties extracted for the different activities were used to train a classifier, which was then applied to new accelerometer readings to determine which movements occurred.
Data collection
Since detection is performed by a classifier, it first had to be trained on a set of data points with known labels. MATLAB Mobile, in combination with the MATLAB Support Package for Android Sensors, makes it possible to collect accelerometer readings from the device and send the measurements to a MATLAB session on a PC.
After the connection with the Android device was established, an instance of the "mobiledev" object was created to record data from the device sensors. Then the accelerometer on the device was enabled, and MATLAB started collecting data.
mobileSensor = mobiledev() % create mobiledev object
mobileSensor.AccelerationSensorEnabled = 1; % enable accelerometer
mobileSensor.start; % start sending data
After executing the last command I did not move for 10 seconds. Then I stood up and walked for the next 70 seconds. Then I reached the stairs, went down them and ran for approximately 60 seconds. After that I walked for another 70 seconds, then climbed the stairs and returned to the office. Finally I sat down and stopped moving. I then retrieved the recorded three-dimensional accelerometer data and plotted it.
[accel time] = accellog(mobileSensor); % acquire data from logs
plot(time, accel); % plot data
The chart below contains data for all the activities: standing still, walking, running, and going up and down stairs. As you can see, the activities cannot be told apart just by looking at the graph. It is therefore necessary to identify and extract the properties that help recognize each activity and distinguish it from the others.
Extracting properties
Although the raw time-domain accelerometer readings look alike for every activity, they in fact contain unique characteristics that can be used to tell different kinds of movement apart. Examples are the maximum value of all data points, or the number of data points that fall outside some range. Such characteristics are called properties (or features, hence the Feature_N names used later).
Classification is based on these properties. Many properties could be considered:
- mean value
- median
- variance
- maximum
- minimum
- the magnitude of a frequency component, and so on.
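As a rough illustration of the time-domain candidates in the list above, the following sketch computes them for a short made-up signal. The values and the struct name candidate are assumptions for the example and do not come from the article's data.

```matlab
% Synthetic stand-in for one accelerometer axis (values in m/s^2, made up).
signal = [9.6 9.9 10.4 9.1 11.2 8.7 10.0 9.8];

% Candidate time-domain properties of the signal.
candidate.mean     = mean(signal);
candidate.median   = median(signal);
candidate.variance = var(signal);
candidate.maximum  = max(signal);
candidate.minimum  = min(signal);
disp(candidate)
```

All of these are cheap to compute, which matters because the properties have to be recomputed for every detection window.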
However, to extract them efficiently, it is necessary to find a minimal set of properties that distinguishes the different activities without being too resource intensive.
Of all the possible properties, the following six proved the most suitable. They are labeled "Feature_N":
- Feature_1: the mean value of the data.
- Feature_2: the square of the sum of the data values below the 25th percentile.
- Feature_3: the square of the sum of the data values below the 25th percentile.
- Feature_4: the peak frequency in the spectrum of the Y-axis data below 5 Hz.
- Feature_5: the number of peaks in the spectrum of the Y-axis data below 5 Hz.
- Feature_6: the integral of the spectrum of the Y-axis data from 0 to 5 Hz.
The data value is the square root of the sum of the squares of the acceleration readings along the Y and Z axes. The X-axis readings can be ignored because, given the position of the phone in the pocket, they differ little between activity types. The six properties were extracted from 5-second segments of the recording of each activity. A 5-second measurement window was chosen because it is long enough to give consistent and stable properties. The window length can be varied, but keep in mind that with windows that are too long (about a minute or more) the probability of error grows, because the activity may change within the window.
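A minimal sketch of this step, assuming the data has already been brought to a uniform sampling rate: it combines the Y and Z axes into one magnitude signal and cuts it into 5-second windows. The 50 Hz rate, the synthetic signal and the variable names are assumptions for illustration, not the article's actual code.

```matlab
fs = 50;                                  % assumed uniform sampling rate, Hz
t = (0:20*fs - 1)' / fs;                  % 20 s of synthetic timestamps
accel = [zeros(size(t)), 3*sin(2*pi*2*t), 9.8 + 4*cos(2*pi*2*t)];  % columns: X Y Z

% Magnitude from the Y and Z axes only; X is ignored as in the text.
mag = sqrt(accel(:, 2).^2 + accel(:, 3).^2);

% Cut the magnitude into non-overlapping 5-second windows (one per column).
windowLength = 5 * fs;
numWindows = floor(numel(mag) / windowLength);
windows = reshape(mag(1:numWindows * windowLength), windowLength, numWindows);

% Feature_1 from the list above: the mean of each window.
feature1 = mean(windows, 1);
```

Each column of windows yields one training data point, so a longer recording simply produces more points per activity.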
Once the above properties have been computed for each type of activity, the algorithm is given two inputs during training: the properties and the corresponding response (that is, the activity to which the properties belong). The variable feature is computed from the raw three-dimensional accelerometer measurements taken while running and holds the values of the properties listed above. For example:
feature = [30, 15, 7.6, 2.3, 5, 8];
activity = 'running';
Here the variable feature has the form [Feature_1, Feature_2, Feature_3, Feature_4, Feature_5, Feature_6]. Each property-activity pair is one training data point. To see in more detail how these training data points are obtained, refer to the MATLAB script recordTrainingData in the attached files. The raw accelerometer readings for each activity are stored in a MAT-file. To avoid collecting data while the activity is changing, prompts were added between the activities. This ensures that the source data used to extract the properties of each activity is clean and consistent.
To extract properties from the raw data stored in each MAT-file, use extractTrainingFeature from the downloaded archive. Note that the original accelerometer data is sampled at about 200 Hz, but the actual rate may be much lower because of the way Android works. In addition, the sampling rate may change during the measurement, which makes the data unevenly sampled.
To bring the data to a uniform sampling rate, a resampling algorithm was implemented. It helped improve property extraction and the subsequent classification.
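The article does not show its resampling implementation. As a hedged sketch of the general idea, unevenly sampled data can be brought onto a uniform grid with linear interpolation using interp1; the 50 Hz target rate, the synthetic timestamps and the variable names below are assumptions.

```matlab
% Unevenly spaced timestamps (seconds) and a synthetic Y-axis signal.
tRaw = cumsum([0; 0.004 + 0.012 * rand(499, 1)]);   % irregular gaps of 4 to 16 ms
yRaw = sin(2*pi*1.5*tRaw);

% Uniform time grid at a chosen target rate.
fsTarget = 50;                                       % Hz, assumed
tUniform = (tRaw(1):1/fsTarget:tRaw(end))';

% Linear interpolation onto the uniform grid.
yUniform = interp1(tRaw, yRaw, tUniform, 'linear');
```

A uniform grid matters mostly for the frequency-domain properties, since a discrete Fourier transform assumes equally spaced samples.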
In the chart below, the red line marked with 'x' shows the original, unevenly sampled Y-axis accelerometer data, and the blue line marked with 'o' shows the resampled data. Note that the first three properties listed above are in the time domain, while the rest are in the frequency domain.
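To make the frequency-domain properties more concrete, here is a minimal sketch that computes a peak frequency and a spectrum integral up to 5 Hz for one window of uniformly sampled Y-axis data. The synthetic signal, the 50 Hz rate and the scaling of the spectrum are assumptions; the article's extractTrainingFeature code may differ.

```matlab
fs = 50;                          % assumed uniform sampling rate, Hz
t = (0:5*fs - 1)' / fs;           % one 5-second window
y = 2*sin(2*pi*2*t) + 0.5*sin(2*pi*12*t);   % synthetic Y-axis data, 2 Hz dominant

% One-sided amplitude spectrum of the mean-removed window.
n = numel(y);
spectrum = abs(fft(y - mean(y))) / n;
half = 1:(floor(n/2) + 1);
spectrum = spectrum(half);
freq = (half - 1)' * fs / n;      % frequency of each bin, Hz

% Restrict to the band from 0 to 5 Hz.
band = freq <= 5;
bandFreq = freq(band);
bandSpec = spectrum(band);

[~, idx] = max(bandSpec);
peakFrequency = bandFreq(idx);             % compare with Feature_4
bandIntegral = trapz(bandFreq, bandSpec);  % compare with Feature_6
```

With a 5-second window the frequency resolution is 0.2 Hz, which is one practical reason not to make the window much shorter.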
The following graphs show that the properties of each activity type cluster together (for example, all the red data points correspond to properties associated with running). This behavior of the properties (distinct clusters for distinct activity types) makes it possible to determine the activity accurately from new accelerometer readings.
Classification of activities
A learning algorithm typically needs many training data points to build a reliable recognition model. To train the classifier, thousands of data points were collected for each activity. To start, the properties are stacked into one array in the following order: walking, running, idling, going upstairs and going downstairs:
data = [featureWalk; featureRun; featureIdle; featureUp; featureDown];
In the line of code above, featureWalk is a 1000×6 array of the six properties computed from the raw accelerometer data collected during walking. Similarly, featureRun is a 1000×6 array of the six properties computed from the raw accelerometer data collected during running, and so on.
Once the properties for all activities were combined into a single dataset, it turned out that the property values for running are larger than those for walking or idling. As a result, the properties of the different activity types are on different scales, which affects the algorithm's ability to determine the activity accurately from new data (which may be scaled differently again). It is therefore necessary to normalize the data so that the values lie in the range from 0 to 1.
The chart above shows the original Feature_1 values computed for all activity types, together with the normalized values. As can be seen, after normalization the Feature_1 values lie in [0, 1]. The other five properties are normalized in the same way.
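The normalization code itself is not shown in the article. A common way to scale each property to [0, 1] is column-wise min-max scaling, sketched below on a small made-up matrix; the variable names colMin, colRange and newRaw are assumptions.

```matlab
% Toy property matrix: rows are data points, columns are properties (made up).
data = [30 15 7.6; 12 4 2.1; 1 0.5 0.3; 18 9 5.0];

% Column-wise minimum and range, kept for reuse on new data.
colMin = min(data, [], 1);
colRange = max(data, [], 1) - colMin;

% Scale every column to [0, 1]; bsxfun keeps this compatible with older releases.
dataNorm = bsxfun(@rdivide, bsxfun(@minus, data, colMin), colRange);

% A new measurement has to be scaled with the same training statistics.
newRaw = [20 10 4.0];
newNorm = (newRaw - colMin) ./ colRange;
```

Reusing the training minimum and range for new data keeps new points on the same scale as the model's neighbors; values outside the training range will simply fall slightly outside [0, 1].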
Once the data has been normalized and prepared, the next step is to define how the algorithm should respond to the input data array. The input data and the corresponding responses are then used to train the algorithm to classify new data.
To build the output response vector, each activity is first assigned an integer: -1, 0, 1, 2, 3 for going downstairs, idling, climbing upstairs, walking and running, respectively. The desired response for each set of input properties is then created as a column vector of these integers, with as many elements as there are rows in the corresponding property array. To make the detected activities easy to read, the response vector is converted into a categorical array with the values "Going Downstairs", "Idling", "Climbing Upstairs", "Walking", "Running" and "Transition":
Down = -1 * ones(length(featureDown), 1);
Idle = zeros(length(featureIdle), 1);
Up = ones(length(featureUp), 1);
Walk = 2 * ones(length(featureWalk), 1);
Run = 3 * ones(length(featureRun), 1); % running is coded as 3
responseVector = [Walk; Run; Idle; Up; Down]; % same order as the rows of data
valueset = [-1:3, -10]; % -10 is reserved for 'Transition'
cateName = {'Going downstairs', 'Idling', 'Climbing upstairs', 'Walking', ...
    'Running', 'Transition'};
response = categorical(responseVector, valueset, ...
    cateName); % converting to a categorical array
With the response array above in place, the K-NN algorithm is trained to create a model. This is done with the fitcknn function from Statistics and Machine Learning Toolbox. After several attempts, the value of 'NumNeighbors' (K) was set to 30 for this app, since it provided the required performance and detection accuracy. The model obtained from the training data then has to be validated on new data coming from the phone.
mdl = fitcknn(data, response);
mdl.NumNeighbors = 30;
For this purpose, the extractFeatures function was created to compute the six properties required for classification. The computed properties (stored in the new variable newFeature) are then used together with the model to recognize the activity. The model was trained to distinguish 5 types of activity. A natural question arises: what happens if the activity being detected is not one of them?
newFeature = [0.15, 0.28, 0.2, 0.35, 0.65, 0.7]; % features for the new activity
result = predict(mdl,newFeature); % predicting the activity
In that case the detector still classifies the activity in the current detection window, but assigns a low probability to each of the 5 categories. Similarly low probabilities can occur when one activity changes to another within a single detection window, which makes classification harder. To avoid such incorrect detections, the predictor outputs "Transition" whenever the prediction probability is below 95%.
The same rule applies when switching from one classified activity to another, for example from walking ("Walking") to running ("Running"). This is reasonable, because property values can be unstable across successive detection windows during such a transition.
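The thresholding rule can be sketched as follows. The score rows are hypothetical values; with the model from this article they could come from the second output of predict, whose columns follow the order of mdl.ClassNames. The variable names are assumptions, and this is not the article's actual predictor code.

```matlab
classNames = {'Going downstairs', 'Idling', 'Climbing upstairs', 'Walking', 'Running'};
threshold = 0.95;

% Hypothetical per-class probabilities for two detection windows.
% With a trained model: [label, score] = predict(mdl, newFeature).
scores = [0.00 0.00 0.00 1.00 0.00;    % confidently walking
          0.00 0.00 0.10 0.50 0.40];   % mixed window

labels = cell(size(scores, 1), 1);
for k = 1:size(scores, 1)
    [p, idx] = max(scores(k, :));
    if p < threshold
        labels{k} = 'Transition';      % not confident enough in any class
    else
        labels{k} = classNames{idx};
    end
end
```

Lowering the threshold would report fewer transitions at the cost of more misclassified windows during activity changes.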
Below is a plot of the raw data collected from the phone over one minute, together with the detected activity. For convenience, the detected activity is shown at the bottom of the plot and is marked as follows:
x: walking (Walking), *: running (Running), o: standing still (Idling), climbing stairs (Walking Upstairs), going down stairs (Walking Downstairs), •: transition (Transition):
Note that the machine learning algorithm is trained on raw accelerometer data recorded while one particular person performed the various activities. If you train the algorithm with the ready-made data from the archive (see the link at the end of the article), detection on data collected from your own phone will probably not be accurate. Even if you carry the phone as the author did, in the right front pants pocket, the algorithm will still not be accurate. The reason is that gait differs from person to person, depending on height and weight.
To create an activity detector that identifies your own activities accurately, start by collecting several accelerometer datasets from your phone for the activities you want to recognize. Then extract the 6 properties listed above from each dataset with extractTrainingFeatures, and use the extracted properties to train the machine learning algorithm. The trained algorithm can then be used to recognize your new activities.
This post covers only one of many possible uses of the resulting application. The same approach can be applied to other recognition systems, for example on a mobile robot. You can also supplement the application with data from GPS sensors, gyroscopes and magnetometers to build a more capable tracker.
Code example