Software Design

Overview

The software subsystem processes and classifies forearm EMG signals into discrete gesture commands. It is responsible for ingesting raw sensor data, organizing it into temporally structured inputs, and interfacing with the trained Temporal Convolutional Network (TCN) to perform gesture inference. In addition to model execution, the software subsystem manages data preprocessing, train-validation splitting, performance evaluation (including confusion matrix generation), and result visualization. Together, these components enable reliable interpretation of muscle activity and serve as the computational bridge between the physical sensing hardware and higher-level control or interaction logic.

ML Code

TDNN

Time-delay neural networks are feedforward models that incorporate temporal context by concatenating time-shifted inputs, rather than explicitly modeling state over time. While computationally simple, this approach is generally less robust for complex temporal patterns and less flexible than more modern sequence models.
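The time-shifted-concatenation idea can be sketched in a few lines. This is an illustrative toy (the function name, delays, and tensor sizes are ours, not part of the project code): each delayed copy of the signal is stacked along the channel dimension so a plain feedforward layer can see a short window of history.

```python
import torch

# Hypothetical sketch of the TDNN idea: temporal context is built by
# concatenating time-shifted copies of the input, after which an
# ordinary feedforward layer can be applied. Sizes are illustrative.
def time_delay_features(x: torch.Tensor, delays=(0, 1, 2)) -> torch.Tensor:
    """x: (batch, time, channels) -> (batch, time, channels * len(delays))."""
    shifted = []
    for d in delays:
        # Shift the signal back by d steps, zero-padding the front so the
        # result stays causal and keeps the original length.
        pad = x.new_zeros(x.shape[0], d, x.shape[2])
        shifted.append(torch.cat([pad, x[:, : x.shape[1] - d, :]], dim=1))
    return torch.cat(shifted, dim=-1)

x = torch.randn(8, 100, 3)      # 8 windows, 100 time steps, 3 EMG channels
feats = time_delay_features(x)  # -> (8, 100, 9)
```

Because the temporal context is fixed by the chosen delays, widening it means widening the input layer, which is the inflexibility noted above.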

LSTM

LSTMs, a widely used form of recurrent neural network, were considered due to their strong ability to capture temporal dependencies and their extensive support in machine-learning frameworks such as PyTorch and TensorFlow. However, LSTMs rely on sequential processing, which makes them comparatively slower and less efficient on resource-constrained hardware. Although LSTMs are particularly well suited for forecasting tasks, the EMG signals in this project are relatively consistent over short time windows, reducing the need for long-term temporal memory.

TCN

Temporal Convolutional Networks combine the strengths of convolutional models with effective temporal modeling. TCNs use causal, dilated convolutions to capture temporal dependencies while allowing for parallel computation, making them significantly faster and more efficient than recurrent models.
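The causal, dilated convolution at the heart of a TCN can be demonstrated with a single PyTorch layer. This is a minimal illustration, not the project's model: left-padding by (kernel_size - 1) * dilation ensures each output sample depends only on current and past inputs while the whole sequence is processed in parallel.

```python
import torch
import torch.nn as nn

# One causal, dilated 1-D convolution (illustrative sizes).
kernel_size, dilation = 4, 2
conv = nn.Conv1d(in_channels=3, out_channels=32,
                 kernel_size=kernel_size, dilation=dilation)

x = torch.randn(1, 3, 100)                # (batch, channels, length)
pad = (kernel_size - 1) * dilation        # 6 steps of left padding
y = conv(nn.functional.pad(x, (pad, 0)))  # pad only the left side: causal
# y has the same length as x, and y[..., t] never sees x[..., t+1:]
```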

Final Choice

We selected a TCN because prior work shows they often match or slightly outperform LSTMs on sequence-classification tasks such as physiological signal analysis while maintaining lower inference latency. Even when performance gains are marginal, the improved computational efficiency and suitability for embedded deployment make TCNs the best fit for this project.

TCN Architecture

from pytorch_tcn import TCN

num_classes = 5  # five hand gesture categories

model = TCN(
  num_inputs = 3, # 3 sensors
  num_channels = [32, 32, 64],
  kernel_size = 4,
  dilations = None,
  dilation_reset = None,
  dropout = 0.1,
  causal = True,
  use_norm = 'weight_norm',
  activation = 'relu',
  kernel_initializer = 'xavier_uniform',
  use_skip_connections = False,
  input_shape = 'NCL',
  embedding_shapes = None,
  embedding_mode = 'add',
  use_gate = False,
  lookahead = 0,
  output_projection = num_classes,
  output_activation = None
)

Explanation of the Architecture

num_inputs = 3

Specifies the number of input channels per time step. In this project, the model receives EMG data from three forearm sensors, each treated as a separate input channel.

num_channels = [32, 32, 64]

Defines the number of convolutional filters in each temporal block of the network. The increasing channel depth allows the model to learn progressively higher-level temporal features from the EMG signals.

kernel_size = 4

Sets the temporal width of the convolutional kernels, meaning each filter processes windows of four consecutive time steps to capture short-term temporal dependencies in the signal.

dilations = None

Uses the default exponentially increasing dilation factors across layers, enabling the network to model long-range temporal dependencies without increasing kernel size.
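A back-of-the-envelope calculation shows why exponential dilations matter. Assuming three temporal blocks with dilations 1, 2, 4 and two causal convolutions of kernel size 4 per block (a common TCN layout; the library's internals may differ slightly), the receptive field grows well beyond the kernel width:

```python
# Approximate receptive field under the stated assumptions.
kernel_size = 4
dilations = [1, 2, 4]        # default: doubles with depth
convs_per_block = 2          # assumed two convolutions per temporal block
receptive_field = 1 + convs_per_block * (kernel_size - 1) * sum(dilations)
# -> 43 time steps of context from kernels only 4 steps wide
```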

dilation_reset = None

Indicates that dilation factors are not periodically reset and instead follow the default dilation progression through the network depth.

dropout = 0.1

Applies a 10% dropout rate during training to reduce overfitting by randomly disabling neurons and encouraging more robust feature learning.

causal = True

Enforces causal convolutions so that predictions at a given time step depend only on past and present inputs, which is essential for real-time EMG gesture recognition.

use_norm = 'weight_norm'

Applies weight normalization to stabilize training by decoupling the magnitude and direction of weight vectors, improving convergence behavior.

activation = 'relu'

Uses the Rectified Linear Unit activation function, introducing nonlinearity while maintaining computational efficiency.

kernel_initializer = 'xavier_uniform'

Initializes convolutional weights using Xavier uniform initialization, helping maintain stable signal variance across layers at the start of training.

use_skip_connections = False

Disables residual skip connections between layers. While skip connections can improve gradient flow, they were not used in this configuration to keep the architecture simpler.

input_shape = 'NCL'

Specifies the input tensor format as (batch size, number of channels, sequence length), which is appropriate for time-series EMG data.
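As an illustration of the 'NCL' layout, a raw EMG window read from a CSV typically arrives as (time steps, sensors) and must be transposed and batched before inference. The window size here is hypothetical:

```python
import numpy as np
import torch

# A raw window as read from a CSV: 200 rows (time steps) x 3 sensor columns.
window = np.random.randn(200, 3).astype(np.float32)   # (L, C)

# Transpose to channels-first and add a batch dimension -> (N, C, L).
x = torch.from_numpy(window).T.unsqueeze(0)           # torch.Size([1, 3, 200])
```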

embedding_shapes = None

Indicates that no additional learned embeddings (e.g., for categorical metadata) are incorporated into the model.

embedding_mode = 'add'

Defines how embeddings would be combined with inputs if present; here, embeddings would be added element-wise, though none are used.

use_gate = False

Disables gated activations (as in gated TCNs). While gating can improve expressiveness, it increases computational cost and was not required for this task.

lookahead = 0

Specifies zero lookahead, meaning the model does not access future time steps. Although this parameter is deprecated in the library, a value of zero reinforces strict causality.

output_projection = 5

Projects the final network output to five classes, corresponding to the five hand gesture categories used in the classification task.

output_activation = None

Applies no activation function at the output layer, producing raw logits that are later processed by a loss function such as softmax cross-entropy during training.
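The logits-plus-cross-entropy contract described above can be shown end to end. A stand-in classifier (not the real TCN) is used here so the sketch is self-contained; the batch size and window length are illustrative:

```python
import torch
import torch.nn as nn

# Stand-in for the TCN: any module mapping (N, C, L) to (N, num_classes)
# raw logits follows the same training contract.
stand_in = nn.Sequential(
    nn.Conv1d(3, 32, kernel_size=4),  # toy temporal feature extractor
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),          # pool over the time dimension
    nn.Flatten(),
    nn.Linear(32, 5),                 # project to the five gesture classes
)

x = torch.randn(16, 3, 200)           # (N, C, L)
logits = stand_in(x)                  # raw scores, shape (16, 5), no activation
labels = torch.randint(0, 5, (16,))
loss = nn.CrossEntropyLoss()(logits, labels)  # applies log-softmax internally
loss.backward()
```

Because CrossEntropyLoss applies log-softmax itself, adding a softmax at the output layer would be redundant during training, which is why output_activation is left as None.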

Confusion matrix for EMG gesture classification

Confusion matrix summarizing the performance of the EMG gesture classification model across all CSV files. Rows correspond to the true gesture labels and columns to the predicted labels. The model demonstrates near-perfect classification for active gestures, achieving 100% accuracy for full_fist, open_hand, and true_rest, and 90% accuracy for air_pinch, with a single misclassification as open_hand. The primary source of error is confusion between rest and true_rest: most rest samples are predicted as true_rest, indicating highly similar EMG signatures. Because these two classes differ mainly in whether the arm is supported on a flat surface rather than in forearm muscle activation, they may be better merged into a single class.

The confusion matrix was generated by evaluating the trained EMG gesture classification model on a held-out validation set created using an 80/20 split of the available training data. Specifically, 80% of the labeled EMG samples were used to train the model, while the remaining 20% were reserved for evaluation to assess generalization performance on unseen data. After training, the model’s predictions on this validation subset were compared against the corresponding ground-truth labels, and the resulting counts of correct and incorrect classifications were aggregated into the confusion matrix.
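The evaluation procedure can be sketched with scikit-learn. Dummy arrays stand in for the real EMG features and model predictions; the class names match the project's gesture labels, everything else is illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

classes = ["air_pinch", "full_fist", "open_hand", "rest", "true_rest"]

X = np.random.randn(500, 30)           # placeholder feature windows
y = np.random.randint(0, 5, size=500)  # placeholder gesture labels

# 80/20 train-validation split, stratified so class ratios are preserved.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# ... train the model on (X_tr, y_tr) ...

y_pred = np.random.randint(0, 5, size=len(y_val))  # stand-in for predictions
cm = confusion_matrix(y_val, y_pred, labels=range(5))  # rows: true, cols: predicted
```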

Next Steps

Code Architecture

.
├── README.md
├── activate_scripts
│   └── …
├── data
│   ├── air_pinch
│   │   └── …
│   ├── full_fist
│   │   └── …
│   ├── open_hand
│   │   └── …
│   ├── rest
│   │   └── …
│   └── true_rest
│       └── …
├── gpu-requirements.txt
├── ml
│   ├── checkpoints
│   │   └── best_model.pt
│   ├── confusion_matrix.py
│   ├── model.py
│   └── quantize_tcn.py
├── python
│   ├── clean_data.py
│   ├── main.py
│   └── read_sensor_values.py
└── requirements.txt