Machine learning applications in the e-commerce domain (1)


As an introduction to AI, let’s look at the different applications of machine learning in e-commerce

 As a first step to explaining AI, which will without a doubt hugely transform our society over the next few years, let’s walk through an overview of machine learning. The way that machine learning (ML) is used in practice in Rakuten’s e-commerce (EC) platform Rakuten Ichiba, for example, is not very well known, so here I will cover the use of ML methodically section by section.

The importance of machine learning as an AI technology

 The term “AI” is often used rather ambiguously these days. (It is said to have first been used by John McCarthy at the Dartmouth Summer Research Project on Artificial Intelligence, the Dartmouth Conference, in 1956.) While its conceptual aspects and the way the debate has shifted over time may be complex, a widely accepted definition of AI is: “Software or systems aimed at having computers perform the intellectual tasks performed by the human brain. Specifically, programs for recognizing the environment and objects, understanding natural language used by humans, making logical inferences, and learning from experience.” The technologies that form the basis for realizing AI are wide-reaching and cover pretty much the whole history of computer science itself! They include natural language processing (NLP), pattern recognition, image processing, speech recognition, machine learning, robotics and so on; the list is endless.

 In the wave of big data in recent years, many activities of people and companies have been digitized, and as the amount of available data increases, so do the opportunities to make use of it. AI technologies are now used in a truly broad range of situations in our day-to-day lives, from medicine to transportation to power supply. Despite this, I sometimes hear people assume that AI is not much used in EC. In actual fact, AI is used in many ways in EC too. By the very nature of EC, salespeople do not normally interact directly with customers to provide high added-value services based on their wishes. Indeed, the very essence of the business is removing precisely those middlemen. Since it is an online business, increasing the value provided to customers requires personalized functions that offer information and services attuned to each customer: using data to understand what customers want, inferring which products and services to offer, and providing recommendations.

 In addition, to improve the accuracy of product search, we analyze product data and review data in detail using natural language processing, including sentiment analysis. Recently, with the growth in multimedia data, the use of AI also extends to product image/video search using image recognition technology, music/movie content search and recommendation using music recognition technology, and dialogue-based UI functions using speech recognition technology.

 As the big data trend has continued for close to ten years now, the stored data has become massive and diverse, and it has become difficult to handle such volume and variety by hand alone. Against the background of this explosion of data, machine learning, an AI technology present across all the above-mentioned fields of application including natural language processing and image recognition, is growing in importance. Now is the time for you to apply machine learning too!

 For example, in the case of recommendation, instead of using only customer purchase data, we can improve the conversion rate (CVR) of recommended products by combining it with customer attribute data, product browsing history, and so on. To maximize this effect, we aim to combine as many types of data as possible. As the number of data types increases, however, it becomes more difficult for humans to define models that integrate the data effectively. In the business world, there are various practical restrictions on time and budget, and under such unforgiving constraints, we are reaching the limit of the number of data combinations that humans can conceive of.
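
 To give a rough sense of how quickly the combinations grow, the short Python sketch below counts the non-empty subsets of a set of data sources. The source names are hypothetical placeholders chosen for illustration, not Rakuten's actual data sets.

```python
from itertools import combinations

# Hypothetical data sources; the names are illustrative only.
sources = ["purchases", "customer_attributes", "browsing_history",
           "search_queries", "reviews"]

def count_combinations(n):
    """Number of non-empty subsets of n data sources: 2**n - 1."""
    return 2 ** n - 1

# Enumerate explicitly for the small case to confirm the formula.
all_subsets = [c for k in range(1, len(sources) + 1)
               for c in combinations(sources, k)]

print(len(all_subsets))          # 31
print(count_combinations(20))    # 1048575
```

 With only five sources there are already 31 possible combinations, and with twenty there are more than a million, which is far more than anyone could evaluate by hand.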

Supervised learning

 Machine learning is applied extremely widely in EC. The first thing you need to know is that its methods fall into two major categories: supervised learning and unsupervised learning. In supervised learning, sample data provided in advance is treated as a “teaching sample”, i.e., training data, and the identification of data and the derivation of rules are based on it. For example (although I’ll start with one that you may think is not really machine learning), regression analysis, which is the most typical method of supervised learning and is also known as a statistical method, is used in EC for trend prediction such as forecasting which products will sell well. At Rakuten, we have had a department working with big data for about five years. It builds systems that predict product sales volume, taking account of seasonality and other events, using a non-linear regression model. Such a system is trained with sales volume as the response variable and with date, time, month-end, national holidays, sales promotions (campaigns, etc.), weather, temperature and so on as explanatory variables, and it draws out relationships and rules from them to predict the sales of each product. Notice that weather is one of the explanatory variables. Usually, when it rains or snows heavily, EC sales increase. That’s what’s interesting here. Even the sales volume of minor products can be predicted to a high degree of accuracy, so we often make surprising discoveries. Predictions made by humans are necessarily less accurate. Almost 200 million products are currently traded on Rakuten and the long tail is enormous, so people cannot cover all of them by hand. That’s why a platform like this is needed. Since it doesn’t rely on manpower, we can avoid huge losses caused by mistaken orders and can bring risk down to a controllable level.
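
 The idea of a response variable and explanatory variables can be sketched in a few lines of Python. This is a deliberately simplified stand-in: it fits an ordinary linear least-squares model rather than the non-linear model mentioned above, and the three features, the training data and the coefficients are all invented for illustration.

```python
from itertools import product

def fit_least_squares(rows, targets):
    """Fit y ~ w0 + w1*x1 + ... by solving the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(r) for r in rows]            # prepend an intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(weights, row):
    """Predicted sales volume for one row of explanatory variables."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

# Synthetic training data (invented): each row is
# (is_holiday, has_campaign, is_rainy), and sales follow a made-up rule.
rows = list(product([0, 1], repeat=3))
targets = [100 + 30 * h + 50 * c + 20 * r for h, c, r in rows]

weights = fit_least_squares(rows, targets)
forecast = predict(weights, (1, 0, 1))   # a rainy holiday without a campaign
print([round(w, 2) for w in weights], round(forecast, 2))
```

 Because the toy data contains no noise, the fitted weights recover the made-up rule, including the positive contribution of rainy days. Real sales data would be noisy and would call for many more variables and a richer model.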

 Although I have focused on prediction so far, another well-known type of supervised learning is the support vector machine (SVM), a pattern recognition model for binary classification. Such pattern recognition models are typically used to classify text or data. In EC, they are used for categorizing products and users. They are also used for security purposes. For example, there are cases where we feed the IP address, location, access pattern, query terms, etc. from previous fraudulent cases into machine learning as feature values and create a classifier that detects fraudulent access among the many users. SVM is not limited to classification: support vector regression (SVR) applies it to the regression analysis described above.
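
 As a minimal sketch of such a classifier, the Python below trains a linear soft-margin SVM by batch sub-gradient descent on the regularized hinge loss. It is not the system described above: the two numeric features (a scaled request rate and a failed-login ratio), the six training samples and the hyperparameters are all hypothetical, and a production system would normally rely on an established library instead.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=5000):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by batch sub-gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]          # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                       # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (fraudulent) or -1 (legitimate) for feature vector x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical features per access: (scaled request rate, failed-login ratio).
X = [(0.9, 0.8), (0.8, 0.9), (1.0, 0.7),      # past fraudulent accesses
     (0.1, 0.2), (0.2, 0.1), (0.0, 0.3)]      # legitimate accesses
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print(predict(w, b, (0.95, 0.85)), predict(w, b, (0.05, 0.10)))
```

 Categorical inputs such as IP address or location would first have to be encoded as numbers before they could be used as feature values in this way.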

 In classification, ensemble learning, the method of increasing accuracy by combining simple classifiers (weak learners), is attracting particular attention. Examples are the different kinds of boosting (AdaBoost, XGBoost) and Random Forests. At Rakuten, AdaBoost was used in the past to select the clearest image from among all of a product’s images, and XGBoost is applied on a large scale to increase the accuracy of data categorization. Random Forests, which can achieve a high level of accuracy with only limited domain knowledge, offer an interesting example of application to business. Rakuten operates more than 70 different businesses and has a great variety of data. One of these businesses is the horseracing business, which provides a service through which betting slips can be purchased for regional horseraces. Two years ago we held a hackathon in which we got together with young students and engineers to brainstorm services and apps that would attract people to regional horseracing. One team built a model to predict the winning horse. When this model was used to predict the order of horses at Oi Racecourse, it demonstrated an amazing level of accuracy. Those engineers had no knowledge of horseracing, but they tried out different methods, chose Random Forests as the best one, and made a system that surprised even the professional horseracing pundits who were there. Later in this article I will cover other cases in which results can be produced without domain knowledge; this topic deserves serious consideration.
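
 How combining weak learners raises accuracy can be illustrated with a small AdaBoost sketch in Python. The weak learner here is a decision stump (a single threshold), and the ten-point one-dimensional data set is a hand-made toy example chosen so that no single stump can classify it perfectly. It has nothing to do with the image or horseracing data mentioned above.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` left of the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity

def train_adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    s = sorted(xs)
    thresholds = [(a + b) / 2 for a, b in zip(s, s[1:])]   # midpoints between samples
    ensemble = []                                          # (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for t in thresholds:
            for p in (1, -1):
                err = sum(w for w, x, y in zip(weights, xs, ys)
                          if stump_predict(x, t, p) != y)
                if best is None or err < best[0]:
                    best = (err, t, p)
        err, t, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)              # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)            # vote of this weak learner
        ensemble.append((alpha, t, p))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, p))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data: labels alternate in blocks, so one threshold is never enough.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

def training_errors(rounds):
    model = train_adaboost(xs, ys, rounds)
    return sum(ensemble_predict(model, x) != y for x, y in zip(xs, ys))

print(training_errors(1), training_errors(3))   # 3 0
```

 A single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all of them correctly. Zero training error is, of course, no guarantee of accuracy on new data.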

 One issue that arises in supervised learning is that, as a result of learning from the sample training data, the model adapts too closely to it, and its classification or prediction accuracy deteriorates when it is applied to actual or unseen data. This is called over-fitting and is a very deep-rooted problem. Whatever the approach, there are always cases where the range of the sample data is too narrow, or where external factors not included in the model have been underestimated and the model cannot react to changes in the external environment. This is particularly difficult to avoid.

 For example, if we build a model to predict the price of financial instruments, the data sample is much smaller than, say, the image data available for object recognition. In practice, prices are significantly affected by current affairs, natural disasters, etc., so even if the back test is perfect, real predictions may not go as you expected. As such, when applying supervised learning, you need to check the flexibility, complexity and learning results of the model while avoiding over-fitting as much as possible.
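
 One common way to check for over-fitting is to compare accuracy on the training data with accuracy on data held back from training. The Python sketch below does this for two models on hand-made toy data, where the true rule is “label 1 when x >= 5” and two training labels have been deliberately corrupted. Both the data and the two models are assumptions made for illustration.

```python
# Hand-made toy data: the true rule is "label 1 when x >= 5",
# but the labels at x=2 and x=8 in the training set are noisy.
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
held_out = [(1.4, 0), (1.8, 0), (2.3, 0), (3.6, 0),
            (5.6, 1), (7.7, 1), (8.4, 1), (9.5, 1)]

def nearest_neighbor_predict(x):
    """Flexible model: copy the label of the closest training point (memorizes noise)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def fit_threshold():
    """Simple model: pick the single cut-off with the fewest training errors."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates,
               key=lambda t: sum((x >= t) != bool(y) for x, y in train))

threshold = fit_threshold()

def threshold_predict(x):
    return 1 if x >= threshold else 0

def accuracy(predict, data):
    """Fraction of samples in `data` that `predict` labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)

for name, model in [("nearest neighbor", nearest_neighbor_predict),
                    ("threshold", threshold_predict)]:
    print(name, round(accuracy(model, train), 2), round(accuracy(model, held_out), 2))
```

 On this toy data the nearest-neighbor model is perfect on the training set but right only half the time on the held-out set, whereas the simpler threshold model makes two training errors yet classifies every held-out point correctly. A gap of this kind between training and held-out accuracy is the typical symptom of over-fitting.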