# ISL-translator
## Abstract
Sign language is the language of the deaf and mute. However, this particular population of the
world is unfortunately overlooked as sign language is not understood by the majority hearing population. In
this paper, an extensive comparative analysis of various gesture recognition techniques involving convolutional
neural networks and machine learning algorithms have been discussed and tested for real-time accuracy. Three
models: a pre-trained VGG16 with fine-tuning, VGG16 with transfer learning and a hierarchical neural network
were analysed based on number of trainable parameters. These models were trained on a self-developed dataset
consisting images of Indian Sign Language (ISL) representation of all 26 English alphabets. The performance
evaluation was also based on practical application of these models, which was simulated by varying lighting and
background environments. Out of the three, the Hierarchical model outperformed the other two models to give the
best accuracy of 98.52% for one-hand and 97% for two-hand gestures. Thereafter, a conversation interface was
built in Django using the best model (viz. hierarchical neural networks) for real-time gesture to speech conversion
and vice versa. This publicly accessible interface can be used by anyone who wishes to learn or converse in ISL.
![alt text](https://github.com/yatharth77/ISL-translator/blob/master/isl.png)
## Dataset
The dataset used for this work was based on ISL. According
to the best of the knowledge of the authors, there does not
exist an authentic and complete dataset for all the 26 alphabets of English language for ISL. Our dataset was manually
prepared by clicking various images of each finger-spelled
alphabet and applying different forms of data augmentation
techniques. At the end, the dataset contained over 1,50,000
images of all 26 categories. There were approximately 5,500
images of each alphabet. To keep the data consistent, the
same background was used for most of the images. Also, the
images were clicked in different lighting conditions to train a
robust model resistant of any such changes in the surroundings. The images in this dataset were clicked by a Redmi
Note 5 Pro, 20 megapixel camera. All the RGB images were
resized to 144×144 pixels per image so as to remove the possibility of varying sizes. Fig.2 shows a few sample images from this dataset.
![alt text](https://github.com/yatharth77/ISL-translator/blob/master/dataset.PNG)
## Methodology
In this section, we would discuss the architectures of various self-developed and pre-trained deep neural networks,
machine learning algorithms and their corresponding performances for the task of hand gesture to audio and audio to
hand gesture recognition. The complete implementation was
done on Keras using Tensorflow as the backend. A pictorial
overview of our entire framework is presented in Fig. 1. The
three individual models are briefly discussed as follows.
![alt text](https://github.com/yatharth77/ISL-translator/blob/master/flow.PNG)
• **Pre-trained VGG16 Model**: Under this approach, the
gestures were classified using a pre-trained VGG16
model based on the Imagenet dataset. We truncated
its last layer and then added custom designed layers to
provide a baseline comparison with the state of the art
networks.
• **Natural Language Based Output Networks**: For this
model, a Deep Convolutional Neural Network (DCNN)
with 26 categories was developed. Later, the output
was fed to an English Corpora based model for eradicating any errors during classification. This process
was based on the probability of the occurrence of the
particular word in the English vocabulary. Moreover,
only the top-3 accuracy scores provided by the neural
network was considered in this model.
• **Hierarchical Network**: Our final approach comprises
of a novel hierarchical model for classification which
resembles a tree-like structure. It involves initially classifying gestures into two categories (one-hand or twohand), and subsequently feeding them into further deep
neural networks. The corresponding outputs were utilized for categorizing them into the 26 English alphabets.
## Experimental Results
All the three networks were tested on our dataset of around
1,49,568 images. The testing dataset consists of hand images
clicked on a black background. The images were augmented,
resized and pre-processed as per the network requirements
mentioned earlier. The pre-trained VGG16 model with transfer learning produced an accuracy of 53% and the fine-tuned
model resulted in an accuracy of 94.52%. The reason for low
accuracy in VGG16 can be attributed to the fact that it was
initially trained for 1000 categories. However, the present
model works on 26 hand gestures and out of them, quite a
few are very similar to one another. The hierarchical model
resulted in a training loss of 0.0016, thus resulting in a training accuracy of 99% and a validation accuracy of 98.52% for
categorising one-hand features and 99% training accuracy
and 97% validation accuracy for two-hand features respectively. The SVM used in hierarchical model for categorising into one-hand and two-hand gestures, combined with the
HOG features, produced and accuracy of 96.79%. This result
is significantly better than any other machine learning algorithm for the 26 classes. The confusion matrix in Fig. 6 segregates the result obtained on the 6085 test samples into truepositives, true-negatives, false-positives and false-negatives.
![alt text](https://github.com/yatharth77/ISL-translator/blob/master/coil.PNG)
![alt text](https://github.com/yatharth77/ISL-translator/blob/master/coil_out1.PNG)
## Algorithm for formation of Valid English Words from given sequence of alphabets as input
The natural language based output network was developed
for rectifying errors made by the CNN model. The main
motive of this model is to correct the falsely predicted outcomes during ISL-conversation. Thus, a misspelled word
can be corrected by using an algorithm that takes into account the possible words in the English language that can be
formed by the predicted alphabets via intelligently changing a letter or two. Such algorithms are useful in practical terms to overcome the flaws of CNN. A 13-layer CNN
was developed which received these images, with their pix-
5
els scaled between -1 and +1. The neural net was a simple
network comprising of 3×3 convolutional filters followed by
max-pooling. The latter layers consisted dropout (0.3-0.4)
and batch normalisation for avoiding any overfitting. Adam
optimiser with a learning rate of 0.0002 was used to minimize the categorical cross-entropy loss function. The softmax layer provided output as 26 probabilities, each corresponding to the output being that particular alphabet. Exploiting this characteristic of the softmax layer, we calculated
the total probability for a given word, which is a collection
of alphabets as the sum of all the probabilities of the highest
predicted output for that alphabet.
For example, if the word that a user inputs alphabet of
‘cat’, then for each letter ‘c’, ‘a’ and ‘t’, the probabilities for
the top-3 predicted letters will be saved. This will be the
overall probability of the word being ‘cat’. Now, if the output probabilities that the CNN provided with respect to each
letter corresponded to ‘cet’, then this word will be searched
through a corpora of length=3 in the English dictionary. If no
such word exists, it will change the letters (one at a time) by
the next highest probable letter, and check it again in the dictionary. If such a word exists, then it is stored along with it’s
total probability. The model output the word with the highest
probability belonging in the English dictionary as the final
prediction. This model works on the idea that if a user wants
to converse in finger-spelled ISL, he/she is likely to depict
a word that exists in the English dictionary (apart from unu
没有合适的资源?快使用搜索试试~ 我知道了~
印度手语手势识别_HTML_Python_下载.zip
共396个文件
jpg:92个
wav:76个
pyc:50个
1.该资源内容由用户上传,如若侵权请联系客服进行举报
2.虚拟产品一经售出概不退款(资源遇到问题,请及时私信上传者)
2.虚拟产品一经售出概不退款(资源遇到问题,请及时私信上传者)
版权申诉
0 下载量 129 浏览量
2023-05-01
11:46:32
上传
评论
收藏 55.9MB ZIP 举报
温馨提示
印度手语手势识别_HTML_Python_下载.zip
资源推荐
资源详情
资源评论
收起资源包目录
印度手语手势识别_HTML_Python_下载.zip (396个子文件)
settings.cpython-36.pyc.54458832 2KB
voice_2014.bin 8KB
voice_7458.bin 7KB
creative.min.css 200KB
tp.css 4KB
main_page.css 3KB
index.css 2KB
about_project.css 1KB
Emergency.css 853B
main_page2.css 315B
load.gif 2.72MB
emergif.gif 1.03MB
personal issue.gif 641KB
no water.gif 564KB
urgent meet.gif 549KB
money lost.gif 507KB
lost location.gif 423KB
crime.gif 383KB
i need money.gif 368KB
amulance.gif 346KB
robbed.gif 334KB
bills.gif 330KB
fire.gif 311KB
bullied.gif 302KB
accident.gif 216KB
.gitattributes 173B
.gitattributes 128B
fintwo_handVGG.h5 134B
one_hand144.h5 134B
about_team.html 14KB
gest_keyboard.html 14KB
Emergency.html 10KB
home.html 8KB
login.html 7KB
instructions.html 3KB
about_project.html 3KB
register.html 2KB
index.html 2KB
O_BTP.jpeg 132KB
B_resized.jpeg 30KB
A_resized.jpeg 29KB
X_resized.jpeg 28KB
S_resized.jpeg 27KB
W_resized.jpeg 26KB
P_resized.jpeg 26KB
Y_resized.jpeg 25KB
K_resized.jpeg 25KB
O_resized.jpeg 25KB
F_resized.jpeg 25KB
Q_resized.jpeg 25KB
E_resized.jpeg 25KB
D_resized.jpeg 25KB
H_resized.jpeg 24KB
R_resized.jpeg 24KB
V_resized.jpeg 24KB
J_resized.jpeg 24KB
N_resized.jpeg 24KB
Z_resized.jpeg 24KB
M_resized.jpeg 23KB
T_resized.jpeg 22KB
C_resized.jpeg 21KB
I_resized.jpeg 21KB
G_resized.jpeg 21KB
L_resized.jpeg 20KB
U_resized.jpeg 20KB
8.jpeg 14KB
7.jpeg 13KB
4.jpeg 13KB
5.jpeg 13KB
3.jpeg 13KB
6.jpeg 13KB
2.jpeg 12KB
9.jpeg 12KB
0.jpeg 12KB
1.jpeg 11KB
login3.jpg 1.56MB
bg.jpg 242KB
login2.jpg 195KB
login1.jpg 181KB
B.jpg 114KB
A.jpg 109KB
X.jpg 103KB
S.jpg 101KB
W.jpg 97KB
P.jpg 96KB
Q.jpg 96KB
K.jpg 95KB
F.jpg 95KB
D.jpg 94KB
Y.jpg 94KB
O.jpg 94KB
E.jpg 94KB
H.jpg 93KB
R.jpg 93KB
J.jpg 90KB
V.jpg 90KB
Z.jpg 89KB
N.jpg 89KB
M.jpg 87KB
T.jpg 86KB
共 396 条
- 1
- 2
- 3
- 4
资源评论
快撑死的鱼
- 粉丝: 1w+
- 资源: 9154
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 206693250008_R01C02_Grn.idat
- 瞳孔跟踪-基于OpenCV+网络摄像头的瞳孔跟踪算法实现-附项目源码+流程教程-优质项目分享.zip
- ModStartCMS v8.4.0 框架稳定性持续迭代,修复部分已知问题
- bleder 教室学校学生教育室办公室考试
- 人脸检测-使用OpenCV实现的动漫+漫画人脸检测算法-附项目源码-优质项目实战.zip
- 道路贴图,材质材料免费
- 人脸检测-基于OpenCV+Node.js+WebSockets实现的实时人脸检测应用-附项目源码-优质项目实战.zip
- 一些常见的MySQL死锁案例-mysql-deadlocks-master(源代码+案例+图解说明)
- UE4动画烘焙器-ue4.27
- 新建文件夹.zip
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功