A collection of Artificial Intelligence/Machine Learning (AI/ML) models

that bit.studio has researched over the years, along with a few open-source models that we have used in past projects.

Body Segmentation

Crowd Hand Segmentation

Detecting human hands in an image is a common computer vision task. However, it becomes extremely challenging when a crowd scene contains a large number of small hands, even with state-of-the-art object detectors, owing to their limitations. Our model takes an alternative approach, detection-by-segmentation, and outperforms other detection-based models in both accuracy and speed.
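As a rough illustration of the detection-by-segmentation idea, the sketch below turns a binary segmentation mask into one bounding box per connected blob. It is not the model's actual post-processing: the function name `boxes_from_mask`, the nested-list mask format and the choice of 4-connectivity are all assumptions made for this example.

```python
from collections import deque


def boxes_from_mask(mask):
    """Return one (x_min, y_min, x_max, y_max) box per 4-connected blob of 1s.

    `mask` is a hypothetical binary mask given as a list of rows (0 or 1).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill this blob while tracking its extent.
            queue = deque([(x, y)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cx, cy = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Two separate blobs in a tiny mask give two boxes.
example_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
print(boxes_from_mask(example_mask))
```

Because each pixel is visited once, the cost grows with image size rather than with the number of hands, which is one plausible reason a segmentation-first approach can scale to dense crowds.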

Facial Landmarks

Hand Keypoints

A lightweight hand keypoint detector aimed mainly at mobile web browsers. Hand keypoint detection has numerous applications in human-interaction tasks, e.g. augmented and virtual reality. Our detector takes a 2D RGB image as input and predicts 21 hand joints as output.

Hand Orientation

A lightweight model that predicts hand rotation and gesture in a single stage, covering both hand orientation and gesture classification. The model works as the final part of hand-gesture systems, taking cropped hand images as input. Because the model is so small, it can predict up to 32 hands in a scene at the same time and still run in real time alongside the hand segmentation model.

Human Voice Activity

A lightweight neural network model that detects human speech segments in audio. Distinguishing speech segments from background noise is a fundamental task in most voice-related applications. Our model is designed to work with very long audio files, such as movies and television shows. As a result, it runs faster than real time, processing a 30-minute audio stream in 8 seconds.
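To make the numbers concrete, here is a minimal sketch of two things: merging per-frame speech decisions into time segments, and computing the speed-up implied by the figures above. The per-frame boolean output and the frame length are assumptions for illustration, since the model's real output format is not described here; the function names are hypothetical as well.

```python
def speech_segments(frame_is_speech, frame_seconds):
    """Merge consecutive speech frames into (start, end) times in seconds.

    `frame_is_speech` is a hypothetical list of per-frame decisions.
    """
    segments = []
    start = None
    for i, is_speech in enumerate(frame_is_speech):
        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start * frame_seconds, i * frame_seconds))
            start = None  # the speech run ended on the previous frame
    if start is not None:
        # Audio ended while still inside a speech run.
        segments.append((start * frame_seconds, len(frame_is_speech) * frame_seconds))
    return segments


def real_time_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of compute."""
    return audio_seconds / processing_seconds


print(speech_segments([0, 1, 1, 0, 0, 1], 0.5))
print(real_time_factor(30 * 60, 8))
```

With the figures quoted above, 1800 seconds of audio in 8 seconds of processing works out to 225 times faster than real time.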

Self-Driving Car: Lane Keeping

End-to-end lane keeping on mobile devices: a neural network controls the car's steering using a single front-facing camera. The model learned all the necessary road features and can take turns precisely and stably without a human taking control or guiding it at any point. Training data was collected only from a simulator, across various environments, so that the model can handle different lighting, backgrounds and other noise.

Self-Driving Car: Sign Detection

Object detection powers many computer vision tasks by providing a bounding box around each object. This is useful in surveillance, autonomous driving and many other applications. Our project uses TensorFlow’s official object detection model designed specifically for the Google Edge TPU. As a result, our custom-trained model runs at 167 FPS on a Google Pixel 4, which is more than enough for real-time applications.

Learn to Compare

In this project, we question the subtle difference between "learning to classify" and "learning to recognize" through the use of deep learning on a hand shadow dataset.


Crowd Plane Orientation

Estimates camera orientation (roll and pitch) and height from an image of a concert crowd. This is a necessary step for mapping an object into a 2D image with the correct location, orientation and size without the help of a depth sensor.
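Once roll, pitch and height are known, placing an object on the crowd plane reduces to a standard pinhole projection. The sketch below shows one way this could look; it is not the project's own code. The focal length, principal point, axis conventions (world X right, Y forward, Z up; pitch positive when the camera looks down) and the function name `project_ground_point` are all assumptions for this example.

```python
import math


def project_ground_point(x, y, height, pitch, roll, focal_px, cx, cy):
    """Project a ground-plane point (x right, y forward, z = 0) into the image.

    The camera sits `height` above the plane, is tilted down by `pitch`
    radians and rotated about its optical axis by `roll` radians.
    Returns (u, v, pixels_per_unit), or None if the point is behind the camera.
    """
    sin_p, cos_p = math.sin(pitch), math.cos(pitch)
    # Camera frame: x right, y down, z along the optical axis.
    x_c = x
    y_c = height * cos_p - y * sin_p
    z_c = height * sin_p + y * cos_p
    if z_c <= 0:
        return None
    # Apply roll as a rotation in the image plane (sign convention assumed).
    sin_r, cos_r = math.sin(roll), math.cos(roll)
    x_r = x_c * cos_r - y_c * sin_r
    y_r = x_c * sin_r + y_c * cos_r
    # Perspective division; the same factor gives the on-screen object size.
    scale = focal_px / z_c
    return (cx + scale * x_r, cy + scale * y_r, scale)


# A camera 2 units high, pitched down 45 degrees, looks straight at the
# ground point 2 units ahead, so that point lands on the principal point.
print(project_ground_point(0.0, 2.0, 2.0, math.radians(45), 0.0, 1000.0, 640.0, 360.0))
```

The returned scale factor is what lets an overlaid object shrink correctly with distance, which is why height matters in addition to the two angles.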

Guitar Orientation

A lightweight model that predicts guitar and microphone rotation within an image (or video frame). Normally, recovering object orientation from a 2D image is category-specific: each object type has its own method, difficulties and limitations, and everything has to be redone from scratch to fit each new object.
Our model predicts 2D object rotation from an input image of a guitar or microphone using a single model, in real time, and it can still be extended to other types of object by learning their features.
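A predicted 2D rotation is typically consumed by rotating an overlay or a set of anchor points around the object's center. The sketch below assumes the prediction is a single angle in degrees, which the description above does not confirm; `rotate_points` is a hypothetical helper, not part of the model.

```python
import math


def rotate_points(points, center, angle_degrees):
    """Rotate 2D points around `center` by `angle_degrees`.

    Uses the standard rotation matrix; in image coordinates (y pointing
    down) a positive angle therefore appears clockwise on screen.
    """
    angle = math.radians(angle_degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return rotated


# Rotate the corners of a hypothetical overlay box by a predicted 30 degrees.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(rotate_points(corners, (2.0, 1.0), 30.0))
```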

Microphone Orientation

A lightweight model that predicts guitar and microphone rotation within an image (or video frame). Normally, recovering object orientation from a 2D image is category-specific: each object type has its own method, difficulties and limitations, and everything has to be redone from scratch to fit each new object.
Our model predicts 2D object rotation from an input image of a guitar or microphone using a single model, in real time, and it can still be extended to other types of object by learning their features.

Pose Estimation

Shirt Classification

Style Transfer

Visual Pathologies Detection

The human visual system is not fully developed at birth and gradually improves during the first few years of life. The World Health Organization estimates that about 19 million children in the world suffer from visual impairment. However, 70–80% of these cases could be prevented or treated.
