PIN: A More Advanced Deep Learning Model for Recommender System From Noah’s Ark Lab

Aug. 2018

Deep learning, one of the most advanced techniques in artificial intelligence, has achieved remarkable improvements in computer vision (CV) and natural language processing (NLP) by developing very deep neural network structures. In recent years, deep models have also been studied in classic information retrieval domains, such as personalized recommender systems and online advertising. Different from CV and NLP, in which deep learning has already achieved great success, the data in recommender systems and online advertising are in categorical form. For instance, a data instance describing a user browsing the App Store might be: “City: Shenzhen; Weekday: Friday; Time: 10 PM; Phone: Huawei Mate 20; Download history: an app for English study and an app for house renting”. We may want to predict whether this user would like to download an app for selling second-hand goods from the Huawei App Store.
Huawei App Store is one of the most important businesses in the Huawei Consumer Business Group (CBG), and at the same time one of the most important scenarios in which researchers on the recommendation and search team at Noah’s Ark Lab can verify their newly developed techniques. With over one million apps and billions of users globally, Huawei App Store has a large amount of data to support research on complicated machine learning models that need large amounts of data to train, including deep learning models. The DeepFM model, developed by researchers at Huawei Noah’s Ark Lab, attracted a lot of attention from academia and industry, and was deployed online in Huawei App Store at the beginning of 2018, achieving significant improvement over the baseline model (for details, please refer to the previous chapter “DeepFM: A Deep Learning Model for App Recommendation in Huawei App Store”).
We collaborated with Prof. Yong Yu and Prof. Weinan Zhang from the Apex Data & Knowledge Management Lab at Shanghai Jiao Tong University to devise a more advanced deep learning model for recommender systems, namely PIN (Product-network In Network) [1]. The corresponding research work has been accepted by TOIS (ACM Transactions on Information Systems), a top-tier information retrieval journal. The PIN model has also been deployed online and achieved significant improvement (details are presented later). This research work is publicly available at
The general architecture of deep learning models for recommender systems can be summarized as: a feature representation layer, a feature interaction layer, fully connected layers, a single-model output layer, an ensemble layer (optional), and an output layer (optional), as shown in the figure below.

General Architecture of Deep Learning Models for Recommender Systems

  • Feature representation layer: in recommendation scenarios, the input fields are in most cases categorical (a numerical field is usually transformed into categorical form by bucketing). One-hot encoding is applied to represent categorical data, and the resulting data is high-dimensional and very sparse. If such high-dimensional data were fed into a deep neural network directly, the number of parameters would be unacceptable. Therefore, the feature representation layer maps the one-hot encoded sparse, high-dimensional data to latent vectors that are dense and low-dimensional. Note that each field has its own latent vectors, i.e., the feature representation layer is not cross-field.
  • Feature interaction layer: experts can design flexible network structures to conduct preliminary explicit feature interactions in this layer, so that high-order implicit feature interactions can be learned more effectively by the fully connected layers later. Exploring advanced structures in this layer is of great interest to researchers.
  • Fully connected layer: after the feature interaction layer, this layer learns high-order implicit feature interactions as a black-box component. Still, researchers need to carefully design the number of hidden layers, the number of neurons per layer, the activation function, the dropout rate, etc. In most cases, the designs of the feature interaction layer and the fully connected layer are correlated with each other.
  • Single model output layer: for binary classification problems, Sigmoid is usually applied as the activation function, while for multi-class classification problems, Softmax is used. The value produced in this layer is the prediction of the corresponding single model. The loss between the predicted value and the true label is back-propagated to update the parameters of the neural network.
  • Ensemble layer: the functionality of this layer is to automatically learn the relationship between the outputs of multiple individual models and the true labels, so that the final output fits the true labels better. The input of this layer includes not only the output values of the individual models but also the raw input features, which help this layer choose or transform the individual models’ outputs.
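As a concrete sketch of the feature representation layer, the snippet below uses toy field names, vocabulary sizes, and an embedding dimension of 4 (all illustrative assumptions, not production values). The per-field table lookup is mathematically equivalent to multiplying the one-hot vector by an embedding matrix, but avoids ever materializing the sparse high-dimensional input.

```python
import numpy as np

# Hypothetical sizes: three categorical fields, each mapped to a dense
# 4-dimensional latent vector via its own table (the layer is not cross-field).
vocab_sizes = {"city": 100, "weekday": 7, "phone": 50}
embed_dim = 4
rng = np.random.default_rng(0)

tables = {field: rng.normal(scale=0.01, size=(size, embed_dim))
          for field, size in vocab_sizes.items()}

def represent(instance):
    """Map {field: category index} to dense latent vectors.

    Equivalent to (one-hot vector) @ (embedding table), done as a
    cheap row lookup instead of a sparse matrix multiplication.
    """
    return {field: tables[field][idx] for field, idx in instance.items()}

vecs = represent({"city": 42, "weekday": 5, "phone": 7})
print({field: v.shape for field, v in vecs.items()})  # each field -> (4,)
```

The explicit one-hot multiplication would produce exactly the same vectors, which is why frameworks implement this layer as an index lookup.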
  • The novelty of the PIN model comes from the design of the feature interaction layer. Ignoring the design of the feature interaction layer leaves the network’s learning uncontrolled, so that important feature interactions cannot be learned, as in FNN [3] and Wide & Deep [2]. Over-designing the feature interaction layer limits the learning space of the neural network, so that feature interactions not considered by experts cannot be learned effectively. For instance, PNN [4] and DeepFM [5] restrict the feature interaction layer to the inner product or outer product of pairwise features. Neither of the above two cases learns feature interactions effectively.
    The PIN model utilizes the learning power of neural networks to model feature interactions, i.e., the pairwise feature interaction of two fields is modeled by a sub-network (instead of the inner product or outer product in PNN or DeepFM). Different types of feature interactions can be learned via the parameters of the sub-networks, which are trained jointly with the overall prediction model. When there are m different fields in the data, there are m(m-1)/2 sub-networks. The network structure of the PIN model is presented in the figure below.

    Framework of Product-based Neural Network

    This figure presents the framework of product-based neural networks, where different definitions of the product operation lead to different models: defining it as the inner product leads to IPNN (b), defining it as a kernel (matrix) product leads to KPNN (c), and defining it as a sub-network leads to PIN (d).
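    The three product definitions can be sketched in plain NumPy. The sizes below (4 fields, embedding size 8, hidden size 16) and the single-hidden-layer ReLU sub-network are illustrative assumptions, not the paper’s exact configuration; the point is that PIN keeps one learnable sub-network per field pair, m(m-1)/2 in total.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
m, k, h = 4, 8, 16  # fields, embedding size, sub-network hidden size (assumed)

def inner_product(u, v):
    """IPNN: the interaction of a field pair is a plain dot product."""
    return float(u @ v)

def kernel_product(u, v, K):
    """KPNN: u^T K v with a learnable kernel matrix K."""
    return float(u @ K @ v)

def subnet_product(u, v, params):
    """PIN: a tiny network on the concatenated pair learns the interaction."""
    x = np.concatenate([u, v])
    hidden = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU layer
    return float(hidden @ params["w2"] + params["b2"])

# One parameter set per field pair: m(m-1)/2 sub-networks in total,
# trained jointly with the rest of the prediction model.
pairs = list(combinations(range(m), 2))
subnet_params = [
    {"W1": rng.normal(scale=0.1, size=(2 * k, h)), "b1": np.zeros(h),
     "w2": rng.normal(scale=0.1, size=h), "b2": 0.0}
    for _ in pairs
]

embeds = rng.normal(size=(m, k))  # latent vectors from the representation layer
pin_out = np.array([subnet_product(embeds[i], embeds[j], p)
                    for (i, j), p in zip(pairs, subnet_params)])
print(pin_out.shape)  # (6,) = m(m-1)/2 pairwise interactions for m = 4
```

    Swapping `subnet_product` for `inner_product` or `kernel_product` in the last loop recovers the IPNN and KPNN variants of the same framework.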

    We verify the effectiveness of PIN on three public datasets and one Huawei App Store dataset. The results consistently show the superiority of the PIN model.

    Offline evaluation of models on four datasets (Criteo, Avazu, iPinyou, Huawei)

    The PIN model has been serving online game recommendation in Huawei App Store for a few months. It outperforms the baseline by more than 30% in terms of CTR, and betters DeepFM by around 10%.

    To help researchers from both academia and industry implement the PIN model, we have made our source code publicly available on GitHub:
    [1] Qu, Yanru, et al. "Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data." To appear in TOIS 2018.
    [2] Cheng, Heng-Tze, et al. "Wide & Deep Learning for Recommender Systems." Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016.
    [3] Zhang, Weinan, Tianming Du, and Jun Wang. "Deep Learning over Multi-field Categorical Data." ECIR 2016.
    [4] Qu, Yanru, et al. "Product-based Neural Networks for User Response Prediction." ICDM 2016.
    [5] Guo, Huifeng, et al. "DeepFM: A Factorization-Machine based Neural Network for CTR Prediction." IJCAI 2017.