DeepFM: A Deep Learning Model for App Recommendation in Huawei App Store

Apr. 2018


IJCAI (International Joint Conference on Artificial Intelligence) is a top-tier conference on artificial intelligence (AI) that receives a large number of submissions from both academia and industry. Research accepted at IJCAI has contributed greatly to the development of AI techniques.
A researcher from the recommendation and search team at Huawei Noah’s Ark Lab presented the team's deep learning model for recommendation at IJCAI 2017 in Melbourne, Australia. To improve the accuracy and personalization of recommendation in Huawei App Store, the team proposed an end-to-end deep learning model: a Factorization-Machine based neural network (DeepFM).
The paper is publicly available at https://arxiv.org/pdf/1703.04247.pdf and https://arxiv.org/pdf/1804.04950.pdf.

Presentation by a researcher from Huawei Noah’s Ark Lab in IJCAI 2017

It is important for a recommender system to learn the implicit feature interactions behind user click behavior. In our study of Huawei App Store, we found that people often download food-delivery apps at mealtimes, suggesting that the (order-2) interaction between app category and timestamp can be used as a signal for recommendation. As a second observation, male teenagers like shooting games and RPGs, which means that the (order-3) interaction of app category, user gender, and age is another signal for recommendation. In general, the feature interactions behind user click behavior can be highly sophisticated, and both low- and high-order feature interactions play important roles. According to the insights of the Wide & Deep model [1] from Google, considering low- and high-order feature interactions simultaneously brings additional improvement over considering either alone.
The key challenge in a recommender system is modeling feature interactions effectively. Some feature interactions are easy to understand and can therefore be designed by experts. However, most feature interactions are hidden in the data and difficult to identify a priori (for instance, the classic association rule “diaper and beer” was mined from data rather than discovered by experts), so they can only be captured automatically by machine learning. Even for easy-to-understand interactions, experts are unlikely to model them exhaustively, especially when the number of features is large.
Despite their simplicity, generalized linear models, such as FTRL [2], have shown decent performance in practice. However, a linear model lacks the ability to learn feature interactions, so a common practice is to manually include pairwise feature interactions in its feature vector. Such a method is hard to generalize to high-order feature interactions or to interactions that never or rarely appear in the training data. To leverage feature interactions automatically, Poly-2 [3] learns a weight for each pairwise feature interaction. However, Poly-2 may perform poorly when data is sparse, because the parameters associated with feature interactions cannot be estimated correctly when many features have never or seldom occurred together. To address this limitation, the Factorization Machine (FM) [4] represents each feature with a latent vector and models a feature interaction as the inner product of the corresponding latent vectors, and it shows very promising results. Although FM can in principle model high-order feature interactions, in practice usually only order-2 interactions are considered because of the high complexity.
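The order-2 FM term described above can be illustrated with a minimal sketch in plain Python. The function names (fm_pairwise_naive, fm_pairwise_fast, fm_predict) and the toy values are hypothetical and chosen only for illustration; they are not taken from the DeepFM paper or any library. The "fast" version uses the well-known rewriting from the FM paper [4] that computes the same sum in time linear in the number of features.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fm_pairwise_naive(x, V):
    """Sum over all pairs i < j of <V[i], V[j]> * x[i] * x[j]."""
    return sum(
        dot(V[i], V[j]) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )


def fm_pairwise_fast(x, V):
    """Same quantity in O(n * k) time:
    0.5 * sum_f [ (sum_i V[i][f] * x[i])^2 - sum_i (V[i][f] * x[i])^2 ].
    """
    k = len(V[0])
    total = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)
        total += s * s - sum(t * t for t in terms)
    return 0.5 * total


def fm_predict(x, w0, w, V):
    """Order-2 FM score: bias + linear term + pairwise interaction term."""
    return w0 + dot(w, x) + fm_pairwise_fast(x, V)


# Toy input (illustrative values only): 4 features, latent dimension k = 2.
x = [1.0, 0.0, 1.0, 1.0]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.0]]

# Both values agree (about 0.01 for this toy input).
print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))
```

Because each pairwise weight is an inner product of two latent vectors rather than a free parameter, two features can receive a non-trivial interaction weight even if they never co-occur in the training data, which is the advantage over Poly-2 in sparse settings.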
As a powerful approach to learning feature representations, deep neural networks have the potential to learn sophisticated feature interactions. Some approaches extend CNN and RNN for CTR prediction [5, 6], but CNN-based models are biased toward interactions between neighboring features, while RNN-based models are better suited to click data with sequential dependency. [7] studies feature representations and proposes the Factorization-machine supported Neural Network (FNN), which pre-trains an FM before applying a DNN and is thus limited by the capability of FM. Feature interaction is studied in [8], which introduces a product layer between the embedding layer and the fully connected layers and proposes the Product-based Neural Network (PNN).
As noted in [1], PNN and FNN, like other deep models, capture few low-order feature interactions, which are also essential for CTR prediction. To model both low- and high-order feature interactions, [1] proposes an interesting hybrid network structure (Wide & Deep) that combines a linear (“wide”) model and a deep model. In this model, the “wide” part and the “deep” part require two different inputs, and the input of the “wide” part still relies on expert feature engineering. The existing models are therefore either biased toward low- or high-order feature interactions or dependent on feature engineering. In our paper, we show that it is possible to derive a model that learns feature interactions of all orders in an end-to-end manner, without any feature engineering beyond the raw features.
To resolve these limitations, we propose DeepFM, which integrates the architectures of FM and DNN. It consists of two components, the FM component and the deep component, which share the same input. The FM component is a factorization machine, which models pairwise (order-2) feature interactions as the inner product of the respective feature latent vectors. The deep component is a feed-forward neural network, which is used to learn high-order feature interactions. It is worth pointing out that the FM component and the deep component share the same feature embedding, which brings two important benefits: (1) the model learns both low- and high-order feature interactions from raw features; (2) there is no need for expert feature engineering of the input.
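The structure described above can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name TinyDeepFM, the single ReLU hidden layer, the random initialization, and the assumption that each sample has exactly one active (one-hot) feature index per field are all choices made here for brevity. Training (joint backpropagation through both components) is omitted.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyDeepFM:
    """Forward pass only; hypothetical sizes and one hidden layer (assumption)."""

    def __init__(self, num_features, num_fields, k=4, hidden=8, seed=0):
        rng = random.Random(seed)

        def init():
            return rng.uniform(-0.1, 0.1)

        # First-order weight per feature (used by the FM component).
        self.w = [init() for _ in range(num_features)]
        # Embedding table SHARED by the FM component and the deep component.
        self.V = [[init() for _ in range(k)] for _ in range(num_features)]
        # Deep component: one hidden layer over the concatenated embeddings.
        self.W1 = [[init() for _ in range(num_fields * k)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [init() for _ in range(hidden)]
        self.b2 = 0.0

    def fm_logit(self, idx):
        """First-order term plus pairwise inner products of shared embeddings."""
        first = sum(self.w[i] for i in idx)
        emb = [self.V[i] for i in idx]
        second = 0.0
        for f in range(len(emb[0])):
            s = sum(e[f] for e in emb)
            second += 0.5 * (s * s - sum(e[f] ** 2 for e in emb))
        return first + second

    def deep_logit(self, idx):
        """Feed-forward network on the concatenation of the same embeddings."""
        h0 = [value for i in idx for value in self.V[i]]
        h1 = [max(0.0, dot(row, h0) + b) for row, b in zip(self.W1, self.b1)]
        return dot(self.w2, h1) + self.b2

    def predict(self, idx):
        """Predicted click probability: sigmoid(y_FM + y_DNN)."""
        return sigmoid(self.fm_logit(idx) + self.deep_logit(idx))


# One sample with three fields; each entry is the active feature index
# of one field (illustrative indices only).
model = TinyDeepFM(num_features=10, num_fields=3)
print(model.predict([0, 4, 9]))
```

Note that both fm_logit and deep_logit read from the same table self.V, so a gradient step on either component would update the embeddings used by the other. This is the sharing that removes the need for FM pre-training (as in FNN) or a separately engineered “wide” input (as in Wide & Deep).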

DeepFM Model

To verify the effectiveness of DeepFM, we conducted experiments on the Criteo Kaggle and Huawei App Store datasets. The Criteo Kaggle dataset is randomly split into two parts: 90% for training and the remaining 10% for testing. The Huawei App Store dataset consists of eight consecutive days of users’ click records from the game center: the first seven days are used for training and the last day for testing. The results show that DeepFM outperforms the state-of-the-art models by 0.36% to 0.86% in terms of AUC and by 0.34% to 1.1% in terms of logloss on the Huawei dataset.
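For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUC and logloss can be computed from labels and predicted click probabilities. The function names and toy values are hypothetical, and this is not the evaluation code used in the experiments. The AUC here uses the straightforward pairwise definition, which is quadratic in the number of samples and suitable only for small illustrations.

```python
import math


def logloss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = click) and predicted click probabilities.
labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.6]
print(auc(labels, scores), logloss(labels, scores))
```

Higher AUC is better (it measures ranking quality), while lower logloss is better (it measures how well calibrated the predicted probabilities are).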

Performance Comparison of DeepFM and baselines

The DeepFM model has now been serving users online for a few months, and it improves CTR by more than 10% compared to the baseline model. We will keep optimizing the DeepFM model while devising new models. We are collaborating with Prof. Yong Yu and Prof. Weinan Zhang from Shanghai Jiao Tong University to develop more advanced neural network structures for recommender systems. Our new result (PIN, Product-network In Network) will be introduced in our next chapter.


References
[1] Heng-Tze Cheng et al. Wide & Deep Learning for Recommender Systems. DLRS@RecSys 2016.
[2] H. Brendan McMahan et al. Ad Click Prediction: A View From the Trenches. KDD 2013.
[3] Yin-Wen Chang et al. Training and Testing Low-degree Polynomial Data Mappings via Linear SVM. JMLR 2010.
[4] Steffen Rendle. Factorization Machines. ICDM 2010.
[5] Qiang Liu et al. A Convolutional Click Prediction Model. CIKM 2015.
[6] Yuyu Zhang et al. Sequential Click Prediction for Sponsored Search with Recurrent Neural Networks. AAAI 2014.
[7] Weinan Zhang et al. Deep Learning over Multi-field Categorical Data – A Case Study on User Response Prediction. ECIR 2016.
[8] Yanru Qu et al. Product-based Neural Networks for User Response Prediction. ICDM 2016.