Detecting human hands in an image is a common task in computer vision, but it becomes extremely challenging when a scene contains many small hands in a crowd, even for state-of-the-art object detectors. Our model takes an alternative detection-by-segmentation approach, and the results outperform detection-based models in both accuracy and speed.
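As a minimal sketch of the detection-by-segmentation idea (not the project's exact pipeline), the snippet below turns a binary hand mask into per-hand bounding boxes with connected components; the mask here is only a placeholder for the segmentation model's output.

```python
# Sketch: convert a binary hand segmentation mask into per-hand bounding boxes.
import cv2
import numpy as np

def boxes_from_mask(mask: np.ndarray, min_area: int = 25):
    """Return (x, y, w, h) boxes for each connected blob in a binary mask."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask.astype(np.uint8), connectivity=8)
    boxes = []
    for i in range(1, num):               # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:              # drop tiny blobs that are likely noise
            boxes.append((int(x), int(y), int(w), int(h)))
    return boxes

# Placeholder mask with two "hands"
mask = np.zeros((120, 160), dtype=np.uint8)
mask[10:40, 20:50] = 1
mask[70:110, 90:140] = 1
print(boxes_from_mask(mask))  # -> [(20, 10, 30, 30), (90, 70, 50, 40)]
```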
A lightweight hand keypoint detector aimed mainly at mobile web browsers. Hand keypoint detection has numerous applications in human-interactive tasks, e.g. augmented/virtual reality. Our model takes a 2D RGB image as input and predicts the positions of 21 human hand joints as output.
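To make the input/output contract concrete, here is a rough sketch assuming the model takes a fixed-size RGB crop and regresses 21 normalized (x, y) joint coordinates; the tiny Keras network is only a stand-in for the real architecture.

```python
import numpy as np
import tensorflow as tf

INPUT_SIZE = 128          # assumed input resolution
NUM_JOINTS = 21           # one (x, y) pair per hand joint

# Stand-in network; the real model would be a lightweight CNN backbone.
model = tf.keras.Sequential([
    tf.keras.layers.Input((INPUT_SIZE, INPUT_SIZE, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_JOINTS * 2, activation="sigmoid"),  # normalized coords
])

def predict_joints(rgb_image: np.ndarray) -> np.ndarray:
    """rgb_image: HxWx3 uint8 -> (21, 2) joint coordinates in pixels."""
    h, w = rgb_image.shape[:2]
    x = tf.image.resize(rgb_image, (INPUT_SIZE, INPUT_SIZE)) / 255.0
    coords = model(x[None])[0].numpy().reshape(NUM_JOINTS, 2)
    return coords * [w, h]   # scale normalized coords back to image pixels

print(predict_joints(np.zeros((240, 320, 3), dtype=np.uint8)).shape)  # (21, 2)
```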
A lightweight model that predicts both hand rotation and gesture class in a single stage. It works as the final part of hand-gesture systems, taking cropped hand images as input. Because the model is so minimal, it can predict up to 32 hands in a scene at the same time and still run in real time alongside the hand segmentation model.
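Below is a hedged sketch of how a single-stage rotation + gesture head could be decoded, assuming a (sin, cos) pair per hand for orientation and a softmax over gesture classes; the actual output layout and class names of the model may differ.

```python
import numpy as np

GESTURES = ["open", "fist", "point", "ok"]     # illustrative class names

def decode(rotation_head: np.ndarray, gesture_head: np.ndarray):
    """rotation_head: (N, 2) sin/cos pairs, gesture_head: (N, num_classes)."""
    angles = np.degrees(np.arctan2(rotation_head[:, 0], rotation_head[:, 1]))
    classes = gesture_head.argmax(axis=1)
    return [(float(a), GESTURES[c]) for a, c in zip(angles, classes)]

# Up to 32 cropped hands can be decoded in one batch.
rot = np.array([[0.0, 1.0], [1.0, 0.0]])                        # 0 and 90 degrees
ges = np.array([[0.9, 0.05, 0.03, 0.02], [0.1, 0.7, 0.1, 0.1]])
print(decode(rot, ges))  # [(0.0, 'open'), (90.0, 'fist')]
```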
A lightweight neural network model that detects human speech segments in audio. Distinguishing speech from background noise is a fundamental step in most voice-related applications. Our model is designed to work with very long audio files such as movies and television shows, and it achieves faster-than-real-time performance, processing a 30-minute audio stream in 8 seconds.
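The sketch below illustrates the general long-audio workflow (frame the waveform, score each frame, merge adjacent speech frames into segments); the energy threshold is only a stand-in for the network's per-frame speech probability.

```python
import numpy as np

SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000

def speech_segments(waveform: np.ndarray, threshold: float = 0.01):
    """Return (start_sec, end_sec) spans where frames look like speech."""
    n_frames = len(waveform) // FRAME_LEN
    frames = waveform[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    is_speech = frames.std(axis=1) > threshold      # stand-in for the model's score
    segments, start = [], None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start * FRAME_MS / 1000, i * FRAME_MS / 1000))
            start = None
    if start is not None:
        segments.append((start * FRAME_MS / 1000, n_frames * FRAME_MS / 1000))
    return segments

# 2 seconds of silence followed by 1 second of "speech"-like noise
audio = np.concatenate([np.zeros(2 * SAMPLE_RATE), 0.1 * np.random.randn(SAMPLE_RATE)])
print(speech_segments(audio))   # roughly [(2.0, 3.0)]
```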
End-to-end lane keeping on mobile devices: a neural network controls the car's steering from a single front-facing camera. The model learned all the necessary road features and can take turns precisely and stably without a human taking control or guiding it at any point. Training data was collected entirely in a simulator with varied environments so the model can handle different lighting, backgrounds and other noise.
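As an illustration of the end-to-end approach, the stand-in network below maps a single camera frame directly to one steering value, with no explicit lane-marking detection in between; the layer sizes are assumptions, not the deployed architecture.

```python
import numpy as np
import tensorflow as tf

# Stand-in end-to-end steering model: frame in, one steering command out.
model = tf.keras.Sequential([
    tf.keras.layers.Input((66, 200, 3)),                  # cropped, resized camera frame
    tf.keras.layers.Conv2D(24, 5, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(36, 5, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(48, 5, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(1, activation="tanh"),          # steering in [-1, 1]
])

frame = np.random.rand(1, 66, 200, 3).astype(np.float32)  # placeholder camera frame
steering = float(model(frame)[0, 0])
print(f"steering command: {steering:+.3f}")
```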
Object detection underlies many computer vision tasks by providing a bounding box around each object, which is useful in surveillance, autonomous driving and many other applications. Our project uses TensorFlow's official object detection model designed specifically for the Google Edge TPU. Our custom-trained model runs at 167 FPS on a Google Pixel 4, which is more than enough for real-time applications.
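A hedged sketch of running an exported detector on the Edge TPU with tflite_runtime; the model filename and the SSD-style output order (boxes, classes, scores) are assumptions about the exported graph and would need to match your own export.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Assumed filenames: "model_edgetpu.tflite" and the standard Edge TPU delegate library.
interpreter = Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()

# Feed one frame (uint8, already resized to the model's input shape).
frame = np.zeros(inp["shape"], dtype=np.uint8)
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()

boxes = interpreter.get_tensor(outs[0]["index"])[0]    # (N, 4) normalized ymin, xmin, ymax, xmax
classes = interpreter.get_tensor(outs[1]["index"])[0]  # (N,) class ids
scores = interpreter.get_tensor(outs[2]["index"])[0]   # (N,) confidences
for box, cls, score in zip(boxes, classes, scores):
    if score > 0.5:
        print(int(cls), float(score), box)
```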
Estimating camera roll, pitch and height from an image of a concert crowd. This is a necessary step for mapping an object into a 2D image with the correct location, orientation and size without the help of a depth sensor.
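To show why these three quantities are enough, here is a small pinhole-projection sketch with assumed axis conventions and intrinsics: given roll, pitch and camera height, a 3D point can be mapped to a pixel location without any depth sensor.

```python
import numpy as np

def rotation(roll_deg: float, pitch_deg: float) -> np.ndarray:
    """World-to-camera rotation built from roll and pitch (assumed axis order)."""
    r, p = np.radians([roll_deg, pitch_deg])
    Rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r),  np.cos(r), 0],
                   [0,          0,         1]])   # roll about the optical axis
    Rx = np.array([[1, 0,          0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p),  np.cos(p)]])   # pitch about the x axis
    return Rx @ Rz

def project(point_world, roll_deg, pitch_deg, height_m,
            fx=800.0, fy=800.0, cx=640.0, cy=360.0):
    """Pinhole projection (camera axes assumed: x right, y down, z forward)."""
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])
    t = np.array([0.0, height_m, 0.0])            # camera mounted height_m above the ground
    p_cam = rotation(roll_deg, pitch_deg) @ (np.asarray(point_world, dtype=float) + t)
    u, v, w = K @ p_cam
    return u / w, v / w

# A ground point 5 m in front of a camera 3 m up, tilted by 10 degrees of pitch.
print(project([0.0, 0.0, 5.0], roll_deg=0.0, pitch_deg=10.0, height_m=3.0))
```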
A lightweight model that predicts guitar and microphone rotation within an image (or video frame). Normally, estimating object orientation from a 2D image is category-specific: each object type has its own method, difficulties and limitations, and everything has to be redone from scratch to fit each object.
Our model predicts 2D object rotation from an input image of a guitar or microphone with a single model, in real time, and can still work with other object types by learning their features.
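As a small usage sketch, the predicted angle (a placeholder value below) can be turned into a direction vector and drawn from the object centre to visualise the instrument's orientation.

```python
import numpy as np
import cv2

def draw_orientation(image, center, angle_deg, length=60):
    """Draw an arrow at `center` pointing along the predicted rotation."""
    theta = np.radians(angle_deg)
    tip = (int(center[0] + length * np.cos(theta)),
           int(center[1] - length * np.sin(theta)))   # image y-axis points down
    cv2.arrowedLine(image, center, tip, color=(0, 255, 0), thickness=2)
    return image

canvas = np.zeros((240, 320, 3), dtype=np.uint8)
draw_orientation(canvas, center=(160, 120), angle_deg=45.0)  # placeholder predicted angle
```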
The human visual system is not fully developed at birth and gradually improves during the first few years of life. The World Health Organization estimates that about 19 million children worldwide suffer from visual impairment, yet 70–80% of these cases could be prevented or treated.