Deep Meta-Learning: Learning to learn in the concept space

Apr. 2018

Abstract: We propose a new meta-learning framework to integrate the representation power of deep learning into meta-learning, thus called deep meta-learning, and show it can substantially improve vanilla meta-learning algorithms on various few-shot image recognition problems. For example, on 5-way-1-shot image recognition on CUB-200, it improves Meta-SGD (Li et al., 2017) from 53.34% to 66.95%. We expect its widespread use in the near future.

Conventional machine learning relies on large amounts of labeled data, which is impractical for many real-world problems where labeled data is expensive to collect. In contrast, humans can learn new concepts rapidly from a single image by leveraging previously acquired knowledge (Lake et al., 2015). Recently, meta-learning, or learning to learn, pioneered by Schmidhuber (1987) and Bengio et al. (1990), has drawn renewed interest because of its promising performance on problems with little labeled data. It learns at the level of tasks instead of instances, and learns task-agnostic learning algorithms (e.g. SGD) instead of task-specific models (e.g. a CNN). Once trained, it can learn new tasks quickly from only a few examples (few-shot learning).
In meta-learning, one learns from a set of “labeled” tasks, as opposed to labeled instances, where each task consists of a training set and a test set (Figure 1). The goal is to fit a learning algorithm to these tasks so that it generalizes well to related new tasks: given the training data of a new task, it should produce a learner (a model) that performs well on that task's test data (Figure 2).
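To make the task structure concrete, here is a minimal sketch of how one few-shot task might be drawn from a labeled dataset. The function name `sample_task`, its parameters, and the toy data are illustrative assumptions, not part of the paper.

```python
import random
from collections import defaultdict

def sample_task(dataset, n_way=5, k_shot=1, n_query=2, rng=None):
    """Sample one N-way K-shot task: a small training set plus a test set.

    `dataset` is a list of (instance, class_label) pairs. All names here
    are hypothetical and chosen only for illustration.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    # Only classes with enough examples can supply both splits.
    eligible = sorted(c for c, xs in by_class.items() if len(xs) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    train, test = [], []
    for new_label, c in enumerate(classes):
        picks = rng.sample(by_class[c], k_shot + n_query)
        # Labels are re-indexed to 0..n_way-1 within each task.
        train += [(x, new_label) for x in picks[:k_shot]]
        test += [(x, new_label) for x in picks[k_shot:]]
    return train, test

# Toy dataset: 10 classes with 6 placeholder "images" each.
toy = [(f"img_{c}_{i}", c) for c in range(10) for i in range(6)]
train_set, test_set = sample_task(toy, n_way=5, k_shot=1, n_query=2,
                                  rng=random.Random(0))
```

A meta-learner would be fit on many such (training set, test set) pairs, and evaluated on tasks built from classes it has not seen.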
However, few-shot learning remains challenging for meta-learning as a few examples are far from sufficient to describe a sophisticated concept of interest. In this work, we argue that this is due to the lack of a good representation for meta-learning and propose a deep meta-learning framework to integrate the representation power of deep learning into meta-learning. The framework is composed of three modules, a concept generator, a meta-learner, and a concept discriminator, which are learned jointly in an end-to-end manner (Figure 3). By learning to learn in the concept space rather than in the complicated instance space, deep meta-learning can substantially improve vanilla meta-learning, which is demonstrated on various few-shot image recognition problems.
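The idea of learning in the concept space can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the hand-written `concept_generator` stands in for a deep network, the meta-learner is a logistic model adapted with one Meta-SGD-style step (an elementwise learning rate `alpha`), and `theta` and `alpha` are fixed by hand rather than meta-learned.

```python
import math

def concept_generator(x):
    """Hypothetical stand-in for the deep network: raw instance -> concept features."""
    x1, x2 = x
    return [x1 * x2, 1.0]  # second entry acts as a bias feature

def predict(theta, z):
    """Probability of class 1 under a logistic model on concept features."""
    s = sum(t * zi for t, zi in zip(theta, z))
    return 1.0 / (1.0 + math.exp(-s))

def loss_and_grad(theta, data):
    """Mean cross-entropy loss and its gradient with respect to theta."""
    loss, grad = 0.0, [0.0] * len(theta)
    for x, y in data:
        z = concept_generator(x)
        p = predict(theta, z)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        for i, zi in enumerate(z):
            grad[i] += (p - y) * zi
    n = len(data)
    return loss / n, [g / n for g in grad]

def adapt(theta, alpha, train):
    """One Meta-SGD-style step: theta' = theta - alpha * grad (elementwise)."""
    _, grad = loss_and_grad(theta, train)
    return [t - a * g for t, a, g in zip(theta, alpha, grad)]

# In practice theta and alpha would be meta-learned; here they are set by hand.
theta, alpha = [0.0, 0.0], [1.0, 0.1]
train = [((1.0, 1.0), 1), ((1.0, -1.0), 0)]    # XOR-like task in the raw space
test = [((-1.0, -1.0), 1), ((-1.0, 1.0), 0)]
theta_task = adapt(theta, alpha, train)
loss_before, _ = loss_and_grad(theta, test)
loss_after, _ = loss_and_grad(theta_task, test)
```

The toy task is not linearly separable in the raw inputs, yet a single gradient step on two training examples lowers the test loss once the instances are mapped to a suitable representation.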
On the one hand, we expect the concept generator to extract representations that capture the high-level concepts of instances from many related tasks, which can guide the meta-learner to perform task-specific few-shot learning quickly. On the other hand, we hope the concept generator can be enhanced by the concept discriminator, which handles concept discrimination tasks on external large-scale datasets (e.g. ImageNet). After observing a large number of instances and their concepts, the concept generator gradually learns the mapping from the raw instance space to the abstract concept space, and this high-capacity representation provider greatly eases the meta-learning process.
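One way to write the joint training of the three modules is as a weighted sum of a meta-learning loss and a concept discrimination loss. This is a sketch under assumed notation, not a formula taken from the text: $G$, $M$, and $D$ denote the concept generator, meta-learner, and concept discriminator with parameters $\theta_G$, $\theta_M$, $\theta_D$; $\mathcal{L}_{\mathcal{T}}$ is the test loss on task $\mathcal{T}$ of the learner that $M$ produces from the training set of $\mathcal{T}$ in the concept space; $\mathcal{D}_{\text{ext}}$ is the external labeled dataset; and $\lambda$ is an assumed trade-off weight.

```latex
\min_{\theta_G,\,\theta_M,\,\theta_D}\;
  \mathbb{E}_{\mathcal{T}}\!\left[\mathcal{L}_{\mathcal{T}}(\theta_M, \theta_G)\right]
  \;+\; \lambda\,
  \mathbb{E}_{(x,y)\sim\mathcal{D}_{\text{ext}}}\!\left[\ell\big(D_{\theta_D}(G_{\theta_G}(x)),\, y\big)\right]
```

Both terms depend on $\theta_G$, so gradients from few-shot tasks and from concept discrimination shape the same representation.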
The proposed deep meta-learning framework (DEML) can substantially improve vanilla meta-learning, including the popular Matching Nets (Vinyals et al., 2016), MAML (Finn et al., 2017), and Meta-SGD (Li et al., 2017), which is demonstrated on a number of few-shot image recognition problems (Table 1). We expect more applications of deep meta-learning.

Paper Link:
Zhou, Fengwei, Bin Wu, and Zhenguo Li. “Deep Meta-Learning: Learning to Learn in the Concept Space.” arXiv preprint arXiv:1802.03596, 2018.

Finn, Chelsea, Pieter Abbeel, and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.” International Conference on Machine Learning, 2017.
Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. “Human-level concept learning through probabilistic program induction.” Science, 2015.
Bengio, Yoshua, Samy Bengio, and Jocelyn Cloutier. “Learning a synaptic learning rule.” Université de Montréal, Département d'informatique et de recherche opérationnelle, 1990.
Li, Zhenguo, Fengwei Zhou, Fei Chen, and Hang Li. “Meta-SGD: Learning to Learn Quickly for Few-Shot Learning.” arXiv:1707.09835, 2017.
Ravi, Sachin, and Hugo Larochelle. “Optimization as a model for few-shot learning.” International Conference on Learning Representations, 2017.
Schmidhuber, Jürgen. “Evolutionary principles in self-referential learning. On learning how to learn: The meta-meta-... hook.” Diploma thesis, Institut f. Informatik, Tech. Univ. Munich, 1987.
Vinyals, Oriol, Charles Blundell, Tim Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. “Matching networks for one shot learning.” Advances in Neural Information Processing Systems, 2016.