Deep learning was the theme of GTC 2015, and one idea being explored was “RGBD Occlusion Detection via Deep Convolutional Neural Networks,” presented by a group of researchers at United Technologies Research Center in East Hartford, Connecticut.
Here’s the talk: RGBD Occlusion Detection via Deep Convolutional Neural Networks. Actual information starts at the 2:13 mark.
Occlusion detection can help with feature selection, which is useful in Simultaneous Localization and Mapping (SLAM) problems in robotics, with applications such as mapping indoor environments, object recognition, grasping, and obstacle avoidance in UAVs.
The research team investigated several different ways of looking at scenes. As input, RGB + depth information was gathered from video frames. They used this information to train a neural net, using convolutional filters to generate feature maps from the data. This produced a good edge detector. Combined with optical flow, it seems like a promising area of research.
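To get an intuition for why convolutional filters over a depth channel pick out occlusion boundaries, here is a minimal sketch in plain numpy. This is not the researchers' method (their filters are learned by the network); it just applies hand-picked Sobel kernels to a toy depth map, where a near object sits in front of a far wall, and shows that the depth discontinuity produces a strong edge response. The threshold value is arbitrary for this toy example.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D sliding-window filter -- illustrative, not optimized."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-picked Sobel kernels stand in for the learned convolutional filters
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# Toy depth map: a near object (depth 1.0) in front of a far wall (depth 5.0)
depth = np.full((8, 8), 5.0)
depth[2:6, 2:6] = 1.0

gx = conv2d(depth, sobel_x)
gy = conv2d(depth, sobel_y)
edges = np.hypot(gx, gy)   # gradient magnitude: large at the depth discontinuity
occlusion = edges > 1.0    # arbitrary threshold for this toy example
```

A learned CNN would replace the fixed Sobel kernels with many filter banks trained end-to-end, and the talk's approach also folds in the RGB channels, but the core operation, convolving the input to produce feature maps that respond to boundaries, is the same.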
The whole idea of training a network to see makes one wonder: how long, and how much work, does training actually take? Take a robot, for example. If we think of it as a learning machine with little environmental knowledge, how do we send it out to ‘learn’ the environment? Is there some baseline, or pre-training that must be done manually? Or is the learning supplemental, where the robot carries general-purpose ‘old school’ computer vision algorithms on board and builds its neural network in parallel? That neural network would eventually supplement or supplant the ‘bootstrapping’ computer vision code.
This talk gave me more to think about than answers to the questions I had about the subject.