Sample-Efficient Learning
of Novel Visual Concepts

The Robotics Institute, Carnegie Mellon University
*Denotes equal contribution

Spotlight Presentation at the Conference on Lifelong Learning Agents - CoLLAs 2023

Abstract

Despite the advances made in visual object recognition, state-of-the-art deep learning models struggle to effectively recognize novel objects in a few-shot setting where only a limited number of examples are provided. Unlike humans who excel at such tasks, these models often fail to leverage known relationships between entities in order to draw conclusions about such objects. In this work, we show that incorporating a symbolic knowledge graph into a state-of-the-art recognition model enables a new approach for effective few-shot classification. In our proposed neuro-symbolic architecture and training methodology, the knowledge graph is augmented with additional relationships extracted from a small set of examples, improving its ability to recognize novel objects by considering the presence of interconnected entities. Unlike existing few-shot classifiers, we show that this enables our model to incorporate not only objects but also abstract concepts and affordances. The existence of the knowledge graph also makes this approach amenable to interpretability through analysis of the relationships contained within it. We empirically show that our approach outperforms current state-of-the-art few-shot multi-label classification methods on the COCO dataset and evaluate the addition of abstract concepts and affordances on the Visual Genome dataset.

Introduction Video (7 Minutes)

Learning Novel Concepts

This work presents an approach to few-shot recognition of novel objects that combines the processing capabilities of neural image pipelines with the interpretability and expandability of symbolic knowledge in the form of a knowledge graph. This combination allows our approach to use symbolic knowledge during inference to detect visible objects and identify their non-visual properties, including attributes and affordances. The architecture also allows novel concepts to be integrated easily by adding further domain knowledge to the graph.

Our proposed approach demonstrates state-of-the-art performance in novel object recognition, outperforming existing methods when evaluated on the COCO novel object detection dataset. Additionally, it goes beyond existing work by learning novel non-visual concepts in a few-shot manner. As an illustrative example, our approach successfully learns the concept of "edible" from just five sample images that demonstrate its usage in the context of hot dogs, pizza, and sandwiches.

Learning Novel Attributes

Novel attributes are learned in five-shot and fifteen-shot settings. Results are reported on 100 test images, 50 of which show the novel concept.

Learning Novel Affordances

Novel affordances are learned in five-shot and fifteen-shot settings. Results are reported on 100 test images, 50 of which show the novel concept.
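Because the test sets described here are balanced (50 of 100 images show the novel concept), a predictor that always answers "present" would score 50%. The sketch below illustrates that baseline. It assumes the reported metric is plain binary accuracy; the function name `accuracy` and the placeholder labels are hypothetical and are not taken from the project's evaluation code.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Balanced test set as described in the text: 50 positives, 50 negatives.
# These labels are placeholders, not real data from the paper.
labels = [True] * 50 + [False] * 50

# A trivial predictor that always says the concept is present.
always_positive = [True] * 100
baseline = accuracy(always_positive, labels)
print(f"always-positive baseline accuracy: {baseline:.2f}")
```

Reported accuracies on these test sets can be read against this 50% reference point.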

By integrating neural image processing and symbolic knowledge, our approach bridges the gap between visual recognition and conceptual understanding. This not only improves object recognition but also makes it possible to capture and use non-visual knowledge efficiently.

To add a novel concept, we append an output neuron for it to the final classifier and insert a corresponding node into the knowledge graph. To integrate the new node into the graph, we introduce a multimodal transformer called RelaTe. RelaTe uses GloVe word embeddings, together with contextual information from a small set of sample images, to identify relevant connections between the novel concept and the nodes already in the knowledge graph, eliminating the need for further human involvement.
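The two bookkeeping steps described above (one extra classifier output, one extra graph node with edges to related concepts) can be illustrated with a minimal sketch. This is not RelaTe: a simple cosine-similarity threshold stands in for the transformer, hand-written three-dimensional vectors stand in for GloVe embeddings, and no image context is used. All names here (`add_concept`, `threshold`, the toy graph) are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_concept(graph, embeddings, classifier_weights, name, vector, threshold=0.8):
    """Add `name` as a graph node and as one extra classifier output row.

    Edges are proposed by a similarity threshold, a stand-in for the
    learned edge prediction that the text attributes to RelaTe.
    """
    if name in graph:
        raise ValueError(f"concept {name!r} already exists")
    neighbors = {
        node for node in graph
        if cosine(vector, embeddings[node]) >= threshold
    }
    # Insert the node and connect it symmetrically to its neighbors.
    graph[name] = set(neighbors)
    for node in neighbors:
        graph[node].add(name)
    embeddings[name] = vector
    # New output neuron: one zero-initialized weight row of matching width.
    width = len(next(iter(classifier_weights.values())))
    classifier_weights[name] = [0.0] * width
    return neighbors


# Toy data (hypothetical): embeddings, undirected graph, classifier rows.
embeddings = {
    "pizza": [0.9, 0.1, 0.0],
    "sandwich": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
graph = {"pizza": {"sandwich"}, "sandwich": {"pizza"}, "car": set()}
weights = {"pizza": [0.1, 0.2], "sandwich": [0.3, 0.4], "car": [0.5, 0.6]}

linked = add_concept(graph, embeddings, weights, "edible", [0.85, 0.15, 0.05])
print(sorted(linked))
```

In this toy setup the new "edible" node is connected to "pizza" and "sandwich" but not to "car". In the approach described in the text, the edges are instead selected by a learned model that also takes the sample images into account.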

When evaluating our approach, we achieved an accuracy of 70.3% in detecting novel objects across 16 novel object classes. Moreover, our method demonstrated an accuracy of 66.7% when tasked with detecting novel non-visual concepts.

BibTeX

@misc{bhagat2023sampleefficient,
      title={Sample-Efficient Learning of Novel Visual Concepts}, 
      author={Sarthak Bhagat and Simon Stepputtis and Joseph Campbell and Katia Sycara},
      year={2023},
      eprint={2306.09482},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}