This paper proposes a framework for recognizing hand poses from images using a limited number of landmarks. This Hand Pose Recognition (HPR) system is composed of a signal processing module that extracts and processes the coordinates of specific points of the hand, called landmarks, and a deep neural network module that models and classifies the hand poses. These landmarks are extracted automatically with the MediaPipe software. Detecting hand poses from these points has two main advantages over traditional computer vision approaches: the information sent to the recognition module is smaller (point coordinates vs. a full image), and the classification is not affected by extraneous information in the images (such as the background). The experiments were carried out on two different datasets using the experimental setups of previous works. The proposed framework outperformed the best results reported in those works. For example, on the Tiny Hand Gesture Recognition Dataset, we obtained classification accuracies of 98.74 ± 0.08 % and 98.22 ± 0.06 % with simple and complex backgrounds, respectively, while the best accuracies reported in previous works (using the whole image) were 97.10 % and 85.30 %. The proposed solution provides high recognition performance independently of the background against which the image is taken.
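To illustrate the kind of signal processing such a pipeline involves, the following is a minimal sketch of turning the 21 hand landmarks that MediaPipe returns into a fixed-length feature vector for a classifier. The normalization strategy (translating so the wrist landmark is the origin and scaling by the largest coordinate magnitude) is an assumption for illustration, not the paper's exact processing.

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Convert 21 hand landmarks given as (x, y) pairs into a 42-dim vector.

    Hypothetical preprocessing: translate so the wrist (landmark 0, per
    MediaPipe's indexing) sits at the origin, then scale by the maximum
    absolute coordinate, making the features invariant to where the hand
    appears in the image and to its apparent size.
    """
    pts = np.asarray(landmarks, dtype=float)  # shape (21, 2)
    pts = pts - pts[0]                        # wrist becomes the origin
    scale = np.abs(pts).max()
    if scale > 0:
        pts = pts / scale                     # coordinates in [-1, 1]
    return pts.flatten()                      # 42-dimensional feature vector

# Usage with dummy landmark coordinates (real ones would come from MediaPipe)
demo = [(0.5 + 0.01 * i, 0.5 - 0.01 * i) for i in range(21)]
features = landmarks_to_features(demo)
print(features.shape)  # (42,)
```

The resulting vector would then be fed to the deep neural network module; because it carries only the hand's geometry, background pixels never reach the classifier.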