A Deep Neural Framework for Continuous Sign Language Recognition: Thesis Proposal


1. Research Purpose and Significance (literature review, including references)

1. Introduction: Sign language is an intricate, complete language that combines signs formed by hand movements with facial expressions and hand shapes. It is the primary means of communication for people with little or no hearing. Using sign language, letters, words, and even full sentences of everyday speech can be expressed through different hand signs and gestures. This form of communication lets hearing-impaired people express their views, and recognition systems of this kind act as a bridge between hearing and hearing-impaired people. Human speech captured in digital form is a 1D signal, whereas human signing produces 2D signals from image or video data. Gestures can be classified as static or dynamic: static gestures involve a time-invariant finger orientation, while dynamic gestures involve time-varying hand orientations and head positions. The proposed four-camera model for Sign Language Recognition (SLR) is a computer-vision-based approach and does not rely on motion sensors or colored gloves for gesture recognition. Efficient sign language recognition systems require knowledge of feature tracking and hand orientation. Researchers in this field have approached gesture classification in two major ways: glove-based and vision-based. Other methods used frequency gloves to tackle the problem; the glove-based method is simpler and faster to implement on computing devices, but brings complex hardware problems. Computer vision requires no extra electronic hardware, since advanced image processing algorithms can perform hand-shape matching and hand tracking on the captured video data. Attributes missing from the glove-based approach, such as facial expressions and sign articulation, are handled effectively by computer vision algorithms. Limited precision currently constrains the usability of computer vision techniques, which makes this a rich research field.

2. Process: This research introduces a novel method to bring video-based sign language recognition closer to real-time application. Pre-filtering, segmentation, and feature extraction on video frames create a sign language feature space. Artificial Neural Network (ANN) and Minimum Distance Classifier (MDC) classifiers operating on this feature space must be trained and tested repeatedly. The Sobel edge operator's (SEO) power is enhanced with morphology and adaptive thresholding, giving near-perfect segmentation of the hand and head regions that adapts to the captured camera image (a code sketch of such a segmentation stage is given after this paragraph). The Word Matching Score (WMS) measures the performance of the proposed method, with a mean WMS of about 85.58% for MDC and 90% for ANN, and a small difference of 0.3 s in classification time. Neural network classifiers with fast training algorithms should make this form of sign language recognition a practical application.
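As a rough illustration of the segmentation stage described above (Sobel edges strengthened by adaptive thresholding and morphology), a minimal OpenCV sketch might look as follows. The function name, kernel sizes, and threshold parameters are assumptions for illustration, not the project's actual implementation.

```python
import cv2
import numpy as np


def segment_hand_head(frame):
    """Rough hand/head segmentation: Sobel edges + adaptive threshold + morphology.

    A minimal sketch of the pre-filtering step described in the text;
    all parameter values here are illustrative.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Sobel gradients in x and y, combined into a single edge-magnitude map
    grad_x = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
    grad_y = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = cv2.convertScaleAbs(np.sqrt(grad_x ** 2 + grad_y ** 2))

    # Adaptive thresholding copes with uneven lighting across the frame
    binary = cv2.adaptiveThreshold(magnitude, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 2)

    # Morphological closing fills gaps so hand and head appear as solid regions
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    return mask
```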
Most SLR systems assume a simple, constant background, often with the signer's shirt matching the background. In cluttered video backgrounds, tracking the hands has become easier, but tracking individual finger movements remains a challenging task. Researchers believe the three-dimensional, body-centered space of the signer can be used effectively for extracting finger movements, with the 3D locations of the fingers referenced to knuckle points in space. Creating these 3D points as spatial-domain information for hand tracking and hand-shape estimation is challenging for computer vision engineers.

3. Methodology: To build the sign language recognition system, we introduce a selfie sign language recognition setup that captures signs with a smartphone front camera. The signer holds the selfie stick in one hand and signs with the other hand. A sentence in sign language is recorded with the camera, and the resulting video is split into several frames (a minimal frame-extraction sketch is given at the end of this section). Each sign in a frame taken from the set of frames is processed, and features are extracted so that they also apply to the nearby preceding and succeeding frames. The input sign video is converted into corresponding text or voice output, so that a hearing person can understand a hearing-impaired person without an interpreter. Further research can address selfie-based sign language recognition under real-time constraints such as non-uniform backgrounds, varied lighting, and signer independence, to make the system more robust. A basic sign language system is supported by five parameters: hand and head detection, hand and head orientation, hand movement, hand shape, and the location of the hand and head (which depends upon the background). Among these five parameters, the two most important are hand and head orientation and hand movement in a particular direction. These systems help recognize sign languages with better accuracy. The hand shapes and head are segmented to obtain feature vectors, which are then classified and given to neural networks for training. Two major problems surfaced during implementation. Phase one: the signs are preferably single-handed, and the video background varies because of the movement of the selfie stick in the signer's hand. Phase two: the background of the signer and the contrast of the lighting where the signer is present.
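The frame-extraction step mentioned above can be sketched with OpenCV's VideoCapture. The function name and the subsampling parameter are illustrative assumptions, not part of the authors' code.

```python
import cv2


def video_to_frames(video_path, step=1):
    """Split a recorded selfie sign video into individual frames.

    'step' optionally subsamples the video so that neighbouring frames
    remain available for the preceding/succeeding-frame features
    described in the methodology. Purely an illustrative sketch.
    """
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```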

2. Basic Research Content, Problem-Solving Measures, and Plan

Purpose: The major tasks in sign language recognition are signer identification, hand-shape extraction, hand-position extraction, signer facial expressions, and the signer's body posture. For a practical recognition system, these five attributes should all be provided as input. Our research deals mainly with selfie videos of sign language, which are divided into frames for processing. The proposed system concentrates on segmenting the hand and head from the given set of frames, covering the various signs in the video, and features are extracted for the different hand and head models. With selfie SLR, people can communicate with the help of their phones by recording a sign video and sending it to another person; the received video is decoded by the system we design and the signs are converted to text.

Image Preprocessing and Segmentation: The main objective of the segmentation phase is to remove the background and noise, leaving only the Region of Interest (ROI), which is the only useful information in the image. This is achieved via skin masking, defining a threshold in the RGB color space and then converting the RGB image to grayscale. Finally, the Canny edge technique is used to identify sharp discontinuities in the image, thereby detecting the edges of the figure in focus.

Feature Extraction: The Speeded Up Robust Features (SURF) technique is used to extract descriptors from the segmented hand gesture images. SURF is a feature extraction method that is robust against rotation, scaling, occlusion, and variation in viewpoint.

Classification: The SURF descriptors extracted from each image differ in number but share the same dimension (64). A multiclass SVM, however, requires feature vectors of uniform dimension as input. Bag of Features (BoF) is therefore used to represent the features as a histogram over a visual vocabulary rather than as raw descriptors. The extracted descriptors are first quantized into 150 clusters using K-means clustering, which groups the set of descriptors around K cluster centers. The clustered features form the visual vocabulary, where each feature corresponds to an individual sign language gesture. With this vocabulary, each image is represented by the frequency of occurrence of all clustered features; BoF thus represents each image as a histogram of features, in this case covering the 24 classes of sign language gestures.

Workflow (Bag of Features model): The following steps are performed (a code sketch of this pipeline appears after the Analysis paragraph below):
1. The extracted descriptors are first clustered into 150 clusters using K-means; K-means clustering groups m descriptors into x cluster centers.
2. The clustered features form the basis of the histogram, i.e. each image is represented by the frequency of occurrence of all clustered features.
3. BoF represents each image as a histogram of features; in our case, histograms for the 24 classes of sign language gestures are generated.

Analysis: In the sign language recognition process we will use CNN-LSTM, 3D CNN, GCN, and their variants. In this project we aim to analyze and recognize the various alphabets from a database of sign images. The database consists of images captured under different lighting conditions and with different hand orientations. With such a diverse data set, we are able to train our system well and thus obtain good results.
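The preprocessing, SURF, and Bag-of-Features steps described above can be sketched roughly as follows. This assumes an opencv-contrib build (SURF lives in the xfeatures2d module and is patented/non-free) plus scikit-learn for K-means and the SVM; all function names and threshold values are illustrative and not taken from the project's surf_image_processing.py or preprocessing_surf.py.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

N_CLUSTERS = 150      # visual vocabulary size stated in the text
DESCRIPTOR_DIM = 64   # SURF descriptor length


def preprocess(frame):
    """Skin masking on the RGB image, grayscale conversion, Canny edges.
    Threshold values are illustrative, not the project's actual settings."""
    lower, upper = np.array([0, 40, 60]), np.array([100, 180, 255])  # B, G, R
    mask = cv2.inRange(frame, lower, upper)
    roi = cv2.bitwise_and(frame, frame, mask=mask)
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # sharp discontinuities = gesture contour
    return gray, edges


def surf_descriptors(gray):
    """64-D SURF descriptors; requires an opencv-contrib build."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    _, desc = surf.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, DESCRIPTOR_DIM))


def build_vocabulary(descriptor_list):
    """K-means over all pooled descriptors gives the 150-word vocabulary."""
    kmeans = KMeans(n_clusters=N_CLUSTERS, random_state=0)
    kmeans.fit(np.vstack(descriptor_list))
    return kmeans


def bof_histogram(desc, kmeans):
    """Fixed-length Bag-of-Features histogram for one gesture image."""
    words = kmeans.predict(desc)
    hist, _ = np.histogram(words, bins=np.arange(N_CLUSTERS + 1))
    return hist / max(hist.sum(), 1)


def train_svm(histograms, labels):
    """Multiclass SVM trained on the uniform-dimension BoF histograms."""
    clf = SVC(kernel='linear')
    clf.fit(np.array(histograms), labels)
    return clf
```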
Working process: We investigate different machine learning techniques, such as Support Vector Machines (SVM), logistic regression, and K-nearest neighbors (KNN), as well as a neural network technique, Convolutional Neural Networks (CNN), for sign language detection (a minimal CNN sketch is given at the end of this section).

Our objectives during this process will be:
- validation
- loading a pretrained model
- confusion matrix / visualizing error output
- sequence-to-sequence learning
- attention mechanism
- CTC loss

To begin the project, we will need:
- a dataset of images to choose from
- Python 2.7
- pip (installer)
- OpenCV

Using the pip install command, we include the following dependencies:
- NumPy
- Pandas
- Sklearn
- SciPy
- OpenCV
- TensorFlow

Running: To run the project, perform the following steps:
1. Put the dataset folder and all the required Python files in the same folder.
2. The required files are surf_image_processing.py (Image preprocessing folder), preprocessing_surf.py (Bag of features folder), classification.py (classification folder), and visualize_submissions.py (visualization folder).
3. Run preprocessing_surf.py to create the CSV file of the training data set.
4. classification.py contains the code for SVM, KNN, and several other classifiers.
5. cnn.py contains the code for deep learning, as the name suggests.

Results: Results can be visualized by running visualize_submissions.py.
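As a rough illustration of the CNN mentioned above (cnn.py), a minimal Keras model for classifying the 24 gesture classes might look as follows. Layer sizes, input shape, and hyperparameters are assumptions for illustration, not taken from the project code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_cnn(input_shape=(64, 64, 1), num_classes=24):
    """A minimal CNN baseline: two convolution/pooling stages followed by a
    dense classifier over the 24 gesture classes. Illustrative only."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```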
