Machine learning applications in the e-commerce domain (4)


(Continued from the previous chapter)

Deep learning

  In relation to deep learning, the two-layer neural network Word2Vec, developed at Google, was revolutionary. Given text as input, it outputs a set of vectors: a feature vector for each word in the text. Because every word becomes a vector, you can measure the similarity between words and infer their meaning. Since it works purely with numbers, it can be processed in parallel and scales well, which is one of the great advantages of Word2Vec. It is applicable not only to text analysis but to other kinds of data as well. At Rakuten, we have developed an extended version called Category2Vec and released it as open-source software (ref: ). We are training it with various kinds of EC data and are beginning to see its potential for broad application in product and user analysis, categorization, and estimation of missing data. While Word2Vec is not deep learning itself, it is worth understanding as a technique for turning data into numerical form that can then be used in a deep neural network.
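
 The following is a minimal NumPy sketch of the skip-gram variant of Word2Vec. It makes the description above concrete: each word's vector is trained to predict the words around it, and similarity is then read off as the cosine similarity between vectors. This is an illustration under simplifying assumptions. It is neither Google's implementation nor Category2Vec. It uses a full softmax and plain SGD, whereas the real tool uses negative sampling or hierarchical softmax for speed. The function names (skipgram_pairs, train_skipgram, most_similar), the toy corpus and all hyperparameters are hypothetical.

```python
import numpy as np


def skipgram_pairs(token_ids, window=2):
    """Return (center, context) pairs for every token within +/- window positions."""
    pairs = []
    for i, center in enumerate(token_ids):
        lo, hi = max(0, i - window), min(len(token_ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, token_ids[j]))
    return pairs


def train_skipgram(sentences, dim=8, window=2, epochs=200, lr=0.05, seed=0):
    """Train word vectors with the skip-gram objective (full softmax, plain SGD)."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # layer 1: one vector per word
    w_out = rng.normal(scale=0.1, size=(dim, len(vocab)))  # layer 2: scores every context word
    pairs = [p for s in sentences for p in skipgram_pairs([index[w] for w in s], window)]
    for _ in range(epochs):
        for center, context in pairs:
            h = w_in[center].copy()              # hidden layer is just the center word's vector
            scores = h @ w_out
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()                 # softmax over the whole vocabulary
            grad = probs
            grad[context] -= 1.0                 # cross-entropy gradient w.r.t. the scores
            grad_h = w_out @ grad                # backpropagate using the not-yet-updated w_out
            w_out -= lr * np.outer(h, grad)
            w_in[center] -= lr * grad_h
    return vocab, w_in


def most_similar(word, vocab, vectors, topn=3):
    """Rank the other words by cosine similarity to `word`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    i = vocab.index(word)
    sims = unit @ unit[i]
    ranked = [j for j in np.argsort(-sims) if j != i]
    return [(vocab[j], float(sims[j])) for j in ranked[:topn]]


if __name__ == "__main__":
    # Hypothetical tokenized product titles, for illustration only.
    corpus = [
        ["red", "cotton", "shirt"],
        ["blue", "cotton", "shirt"],
        ["red", "leather", "bag"],
        ["blue", "leather", "bag"],
    ]
    vocab, vectors = train_skipgram(corpus)
    print(most_similar("red", vocab, vectors))
```

 On a realistic corpus, words that occur in similar contexts tend to end up with nearby vectors. With a toy corpus this small, the ranking should not be taken as meaningful.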

 Deep learning is a neural network approach that is loosely modeled on the structure of the human brain and builds a complex model in software from many stacked layers. Typical examples are the convolutional neural network (CNN), the recurrent neural network (RNN), and long short-term memory (LSTM), a powerful type of RNN. Neural networks can be used for both supervised and unsupervised learning, and deep learning exploits their multi-layer structure to learn representations ranging from concrete to abstract concepts. There are a variety of ways to use it. In unsupervised learning (e.g., autoencoders), it can be used for pre-training as a way to avoid overfitting, and the pre-trained network is then fine-tuned with supervised learning. There are also approaches, such as a CNN trained without pre-training, that skip this step entirely.
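
 The pre-training and fine-tuning workflow above can be sketched in NumPy with a one-hidden-layer autoencoder. First, the encoder is trained without labels to reconstruct its input. Its weights are then used to initialize a classifier that is fine-tuned on labeled data. This is a minimal illustration that assumes sigmoid units, full-batch gradient descent and a single logistic output. The function names and the toy data in the usage note are hypothetical, and real deep networks stack many such layers.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pretrain_autoencoder(X, hidden, epochs=300, lr=0.5, seed=0):
    """Unsupervised pre-training: learn an encoder sigmoid(X @ W + b) whose code can rebuild X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden))
    b = np.zeros(hidden)
    W_dec = rng.normal(scale=0.1, size=(hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W + b)                       # encode
        X_hat = sigmoid(H @ W_dec + b_dec)           # decode
        losses.append(float(np.mean((X_hat - X) ** 2)))
        # Backpropagate the squared reconstruction error (constant factors folded into lr).
        d_out = (X_hat - X) * X_hat * (1 - X_hat) / n
        d_hid = (d_out @ W_dec.T) * H * (1 - H)      # uses W_dec before it is updated
        W_dec -= lr * H.T @ d_out
        b_dec -= lr * d_out.sum(axis=0)
        W -= lr * X.T @ d_hid
        b -= lr * d_hid.sum(axis=0)
    return W, b, losses


def fine_tune(X, y, W, b, epochs=300, lr=0.5, seed=0):
    """Supervised fine-tuning: put a logistic output unit on the pre-trained encoder
    and update all weights with labeled data."""
    rng = np.random.default_rng(seed)
    W, b = W.copy(), b.copy()                        # start from the pre-trained encoder
    v = rng.normal(scale=0.1, size=W.shape[1])
    c = 0.0
    n = len(y)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)
        p = sigmoid(H @ v + c)
        d_out = (p - y) / n                          # cross-entropy gradient w.r.t. the output logit
        d_hid = np.outer(d_out, v) * H * (1 - H)     # uses v before it is updated
        v -= lr * H.T @ d_out
        c -= lr * d_out.sum()
        W -= lr * X.T @ d_hid
        b -= lr * d_hid.sum(axis=0)
    return W, b, v, c


def predict(X, W, b, v, c):
    return (sigmoid(sigmoid(X @ W + b) @ v + c) > 0.5).astype(int)
```

 For example, take hypothetical binary data X = (rng.random((100, 6)) < 0.3).astype(float) with labels y = X[:, 0]. Call pretrain_autoencoder(X, hidden=4), pass the returned W and b to fine_tune(X, y, W, b), and classify with predict.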

 For pre-training, the key is features. Features are the variables used as inputs to machine learning. They represent the characteristics of the target data that you want the model to learn. For example, in character recognition and image recognition, it is known that using the widely used descriptors SIFT (scale-invariant feature transform) and SURF (speeded-up robust features) as input features improves accuracy. In this approach, the features must be computed in advance as a pre-processing step. Understood broadly as improving accuracy by pre-processing data separately from the learning itself, this idea has a long history in computing and has been applied to many kinds of data. In some cases of deep learning, however, the features can be obtained automatically within the multi-layer structure. Thanks to this, accuracy has improved dramatically in image recognition and object recognition, bringing machines closer to the complex processing of the human brain.
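
 As an example of computing features in advance as a pre-processing step, the sketch below builds a simple hand-crafted descriptor: a histogram of gradient orientations, weighted by gradient magnitude. It is a deliberately simplified stand-in for the ideas behind SIFT, not SIFT or SURF themselves. Real implementations add keypoint detection, scale space and spatial binning. The function names and the 8-bin layout are assumptions for illustration. The resulting fixed-length vectors would then be fed to a separate classifier, and that hand-built step is what deep learning can instead learn inside its own layers.

```python
import numpy as np


def orientation_histogram(image, bins=8):
    """Hand-crafted feature: gradient-orientation histogram weighted by gradient magnitude.

    A heavily simplified relative of the descriptors inside SIFT; it is not SIFT or SURF.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                        # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)    # orientation in [0, 2*pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 2 * np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist       # normalize; flat images stay all zeros


def extract_features(images, bins=8):
    """Pre-processing step: turn each raw image into a fixed-length vector before any learning."""
    return np.stack([orientation_histogram(im, bins) for im in images])


if __name__ == "__main__":
    vertical_edge = np.zeros((8, 8))
    vertical_edge[:, 4:] = 1.0                       # dark left half, bright right half
    features = extract_features([vertical_edge])
    print(features.shape, features[0])               # all weight lands in the first (0-radian) bin
```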

 Deep learning became widely known through the news after a CNN without pre-training won ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge) by a significant margin. Partly because of that, many applications of deep learning are in the field of image recognition.

 At Rakuten, we have deployed the major algorithms CNN and RNN in QuickSell, the C2C app of our French group company PriceMinister. It automatically recognizes the object in the photo of the item a user wants to sell and displays the product category, making it much easier for users to list items (ref: ). In Japan, we have released the same technology as the Moshi Kore feature in our C2C app Rakuma (ref: ). Another example of image recognition is a digital signage demo by Rakuten Institute of Technology: an unbeatable AI rock-paper-scissors game that instantly detects the contestant's hand shape. It is also built on a CNN. The demo attracted even more attention than we expected and was featured on TV Tokyo’s Trend Tamago and on Akira Ikegami’s show on TBS (ref: ).
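
 To show what "built on a CNN" means for a task such as recognizing a product category or a hand shape, here is a minimal NumPy forward pass of a tiny CNN. It runs convolution, ReLU, max pooling, a fully connected layer and a softmax over categories. It is a sketch, not the QuickSell, Moshi Kore or rock-paper-scissors models. The weights are random rather than learned by backpropagation, and the layer sizes, category names and function names are hypothetical. Production systems use many more layers and a deep learning framework.

```python
import numpy as np


def conv2d(image, kernels):
    """Valid 2-D cross-correlation, the operation a CNN layer computes.

    image: (H, W); kernels: (K, kh, kw) -> feature maps: (K, H - kh + 1, W - kw + 1)
    """
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.empty((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def max_pool(maps, size=2):
    """Non-overlapping max pooling over each feature map."""
    k, h, w = maps.shape
    h2, w2 = h // size, w // size
    trimmed = maps[:, :h2 * size, :w2 * size]
    return trimmed.reshape(k, h2, size, w2, size).max(axis=(2, 4))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def classify(image, kernels, W, b, categories):
    """Conv -> ReLU -> pool -> fully connected -> softmax over categories."""
    maps = np.maximum(conv2d(image, kernels), 0.0)   # convolution + ReLU
    features = max_pool(maps).reshape(-1)            # pool, then flatten
    probs = softmax(W @ features + b)                # one probability per category
    return dict(zip(categories, probs.tolist()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    categories = ["rock", "paper", "scissors"]       # hypothetical label set
    image = rng.random((28, 28))                     # stand-in for a camera frame
    kernels = rng.normal(size=(4, 3, 3))             # untrained filters
    n_features = 4 * 13 * 13                         # 28 -> 26 after 3x3 conv, -> 13 after pooling
    W = rng.normal(scale=0.01, size=(len(categories), n_features))
    b = np.zeros(len(categories))
    print(classify(image, kernels, W, b, categories))
```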

 In addition to using deep learning for images, at Rakuten Institute of Technology we are also developing our own machine translation technology using the multilingual subtitle data of Viki, a video streaming service in the Rakuten group. The processing uses an attentional RNN, that is, a recurrent neural network with an attention mechanism. RNNs are suited to data with an inherent order, such as text, speech and video. Thanks to the high quality of Viki's subtitle data, our TV drama translation has reached a world-class level of accuracy. Viki has also started providing a service for learning Korean and Chinese by watching TV dramas, based on machine translation and NLP technology. The service currently offers Chinese/English and Korean/English but will be expanded to other languages in the near future. (ref. Viki’s Learn Mode Will Take Your Language Learning To A New Level)

(ref. Viki’s New Feature - Learn Mode! | Learn Korean & Chinese While Watching Dramas)
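
 The attention mechanism mentioned above can be illustrated with a short NumPy sketch. A vanilla RNN encodes the source sequence into one hidden state per position. Dot-product attention then weights those states by their similarity to the current decoder state to form a context vector. This is one common form of attention (Luong-style dot-product scoring), assumed here for simplicity. It does not describe Rakuten's translation system, and the dimensions, random weights and function names are hypothetical. A real translator would also train the weights and run a decoder RNN that consumes the context vector at every output step.

```python
import numpy as np


def rnn_encode(inputs, W_x, W_h, b):
    """Vanilla RNN encoder: h_t = tanh(W_x x_t + W_h h_{t-1} + b). Returns all states, shape (T, d)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)


def dot_attention(decoder_state, encoder_states):
    """Dot-product attention: score each source position against the current decoder state."""
    scores = encoder_states @ decoder_state          # (T,) one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax -> alignment weights summing to 1
    context = weights @ encoder_states               # (d,) weighted sum of source states
    return context, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d = 5, 4
    source = rng.normal(size=(6, d_in))              # 6 embedded source tokens (random stand-ins)
    W_x = rng.normal(scale=0.5, size=(d, d_in))
    W_h = rng.normal(scale=0.5, size=(d, d))
    enc = rnn_encode(source, W_x, W_h, np.zeros(d))
    context, weights = dot_attention(enc[-1], enc)   # enc[-1] stands in for a hypothetical decoder state
    print(weights.round(3), context.shape)
```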

 As these examples show, at Rakuten too, a relatively large share of applications are in image recognition. Although I can’t reveal the details yet, its use for marketing purposes is also well underway.

Releasing data to researchers

 Lastly, I would like to talk about data. With the rise of big data and the long tail, the volume and diversity of available data are increasing, and that trend is also encouraging the use of AI technology in EC. That said, there are so many types of data that no single company can come close to making use of all of its own data.

 To exploit such data, it is important to analyze it from different angles and build models, and that requires drawing on broad expertise. In academia, data sets are provided for research purposes by universities, research institutions and associations, and national and local governments release public data to expand public services and for use by businesses. The concept of open data is therefore relevant for businesses too. Of course, we take due care to protect users' personal information and privacy. We only provide data that has been cleared for copyright, portrait rights and similar concerns, and we provide it to universities and researchers for research purposes. When a wide range of researchers use our company's data, it contributes to academic research. It also lets us explore ideas that business alone could not have produced and learn analysis methods we had not practiced ourselves. Sharing the results then leads to further use of data and AI technology and encourages further creative innovation.

 At Rakuten, we provide our business data for researchers to use in their research (ref: ). In addition to EC product data (156 million items) and review data (64 million items), which are particularly useful for business, we also release the following:
 - hotel facility and review data
 - golf course facility and review data
 - recipe text and image data
 - Rakuten Viki video attribute and user behavior rating data

 We also release annotated data. This includes a corpus of Rakuten Travel reviews with sentence-level sentiment labels, provided by Tsukuba University, as well as product images annotated with category labels for image recognition research and character-based annotated data.

 The data released for research purposes has already been used by over 70 labs at universities and research institutions. In addition, in 2015 Rakuten Institute of Technology Singapore invited researchers and data scientists from all over the world to a data analysis contest (ref: ), and we also held a similar contest in France (event site: ). I currently serve as a trustee of the Database Society of Japan (DBSJ), responsible for industry-academia collaboration. Together with DBSJ, we are now planning a data competition this year with the aim of expanding the data provided for research purposes.

In Conclusion

 This brings to an end my summary of machine learning as it relates to AI, focusing on use in EC.

(The end)