Unsupervised learning, on the other hand, is a method that analyzes real data without any prior sample data, extracting the intrinsic structure and characteristics that exist within it. For example, recommendation, an important function in EC, often uses data clustering, an unsupervised learning method, to categorize the customers and products to recommend. Like supervised learning, unsupervised learning is also used for security purposes: when detecting login attacks, we deploy clustering to determine what kinds of attack patterns exist.
Typical unsupervised learning methods include the well-known k-means, text mining by latent semantic indexing (LSI), and the topic model method latent Dirichlet allocation (LDA). k-means is one of the major clustering methods: calculations are made iteratively and the data is partitioned into k clusters. It is a simple algorithm that everyone comes across at some point, and it is applied in many situations. When you need to start by dividing users into, say, three types, it is a very convenient method. Which aspect you use to divide the users then becomes extremely important.
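As a minimal sketch of the idea, the following partitions toy users into three clusters with scikit-learn's k-means. The feature columns (purchase count, average order value) and the user data are invented purely for illustration:

```python
# Illustrative sketch: dividing users into three types with k-means.
# The features [purchase_count, avg_order_value] are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

users = np.array([
    [1, 20], [2, 25], [1, 30],       # light buyers
    [10, 50], [12, 55], [11, 48],    # regulars
    [30, 200], [28, 210], [31, 190], # heavy spenders
], dtype=float)

# k-means iteratively assigns each point to the nearest centroid and
# recomputes the centroids until the assignments stabilize.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(users)
print(km.labels_)           # cluster id per user
print(km.cluster_centers_)  # centroid of each cluster
```

Note that the choice of features, the "aspect" mentioned above, determines what the three groups end up meaning.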
LSI is useful for collecting similar text and data and finding their commonalities (topics). It is called a topic model, which basically means that it can collect similar customers, products, recipes and so on, and then make use of those topic groups. Technically speaking, it is a dimensionality reduction of tf-idf, the first step in comparing textual similarity, performed with greater efficiency. For example, even if the words or values in the text or data differ, by considering the similarity of items with close meanings, the problem of ambiguity can be addressed, and categorization based on the meaning of the data also becomes possible. By identifying products which have different but similar descriptions, this can be used in many ways for product search and for expanding recommendation. Of course, it is possible to manually use a dictionary or thesaurus, but it would be difficult to cover every single word, so this method can be valuable.
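A minimal LSI sketch, using scikit-learn: tf-idf vectors are reduced with truncated SVD, and similarity is then measured in the latent space. The product descriptions below are invented for illustration:

```python
# LSI as dimensionality reduction of tf-idf: similar descriptions end up
# close together in the latent topic space even when the wording differs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "red wine from france dry taste",
    "french red wine with a dry finish",
    "wireless bluetooth headphones with mic",
    "bluetooth wireless earphones microphone",
]

tfidf = TfidfVectorizer().fit_transform(docs)        # documents -> tf-idf matrix
lsi = TruncatedSVD(n_components=2, random_state=0)   # reduce to 2 latent topics
vecs = lsi.fit_transform(tfidf)

# The two wine descriptions are far more similar to each other than to the
# headphone descriptions, despite sharing few exact words across the pair.
sim = cosine_similarity(vecs)
print(sim.round(2))
```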
Similar to LSI, LDA is also a topic model method, which estimates the meaning of the words and values used in text or data. In the sense that the dimensionality is reduced to topics, LDA is close to LSI, but LSI cannot consider the similarity of words that did not appear in the data it was applied to. This is addressed in a version called pLSI (probabilistic LSI), which enables data classification that considers the probability distribution of words, and LDA additionally makes it possible to handle the topics expressed in a text, i.e., the commonalities in collected text, together with their fluctuations. It may be easier to understand by looking at code rather than reading an explanation, but LDA also covers ambiguity of words in a text and text summarization. Generally speaking, you can understand it as being like a probabilistic extension of LSI.
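A hedged LDA sketch, again with scikit-learn on an invented toy corpus. The key difference from the LSI example is that each document receives a probability distribution over topics, which is how LDA expresses ambiguity:

```python
# LDA estimates, for each document, a probability distribution over topics.
# Corpus and vocabulary are toy examples, not real product data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "wine cheese wine grape",
    "grape wine cheese",
    "camera lens photo camera",
    "photo lens camera shutter",
]

counts = CountVectorizer().fit_transform(docs)  # LDA works on raw word counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row is a document's topic mixture; the rows sum to 1, so a document
# can belong partly to several topics rather than exactly one cluster.
doc_topics = lda.transform(counts)
print(doc_topics.round(2))
```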
LSI and LDA are common text mining methods and are often explained as document classification methods for natural language processing. As such, many people assume that they are not used very much in EC. In fact, in addition to recommendation, many of the tasks in the analysis of product and review data can be viewed as NLP document classification tasks, so unsupervised learning techniques are used relatively often (discriminant analysis from supervised learning is, of course, often used as well). A topic model, in other words, captures the relationship between words, texts, and the commonalities across texts, namely topics. When categorizing texts with common topics from their words, suppose we replace words with products, replace the texts formed from those words with the mass of user behavior data including product purchases, browsing and searches, and replace the topics of texts with the prospective objectives, preferences and styles of users and consumers. The topics would then correspond to the "hidden needs" of users and consumers which are not outwardly expressed. Based on that metaphor, document classification techniques could surely be applied to marketing.
Unsupervised learning is a method that by nature does not need sample data, so in the context of problem solving for business, it can be used to distance yourself from unspoken business practices and assumptions. For example, every EC business and retail business deploys a loyalty program which grades customers according to their repeat rates and purchase amounts, and they launch various initiatives based on that. However, when grouping customers using unsupervised learning and focusing purely on maximizing the effectiveness of initiatives, a segmentation emerges which is completely different from the loyalty program grades. The analysis shows that for certain genres, the initiatives by grade essentially have no meaning. In that respect, unsupervised learning, used boldly while maintaining an appropriate distance from business/domain knowledge, has the potential to revolutionize existing norms in marketing. This could be a very exciting area for the application of topic models as well. Particularly in the last 10 years, the development of the internet, the rise of mobile technology and the evolution of social media have greatly transformed the technology environment for consumers, and purchasing behavior has changed dramatically too. As such, there are many cases where existing business customs and assumptions are collapsing, which makes the approach of setting aside existing business norms and domain knowledge extremely important.
At Rakuten, we treat the task of building a general-purpose master catalog from the descriptions and data of all the products of the various merchants as a document classification problem, to which we apply various NLP methods. However, there are as many as 200 million products traded on Rakuten, stretching across many genres from foods to clothes, electrical goods, digital content, sports equipment, cars and so on. When considering the application of supervised learning, it is extremely difficult to find training data that covers all of these types of products, and that could also lead to frequent over-fitting. So, rather than using pure supervised learning, we employ a kind of bootstrapping approach in which we repeat resampling.
To illustrate what semi-supervised learning is: first we train the model using a limited sample of training data, and then we categorize the actual data that has been obtained. Highly accurate results from that step are then used as additional sample data and training is repeated. In doing so, even when it is difficult to provide much training data, we can still achieve the same kind of effect as with supervised learning. Semi-supervised learning is a particularly effective approach for cases such as Rakuten Ichiba, where the product data is diverse and massive, and it can prove to be a considerably intelligent method.
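The loop described above, train on a small labeled sample, label the rest, keep only the confident predictions, and repeat, can be sketched with scikit-learn's self-training wrapper. This is a generic illustration on synthetic data, not Rakuten's actual pipeline; the label value -1 marks items whose category is unknown:

```python
# Self-training sketch: 10 labeled points, 90 unlabeled, two clear classes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.RandomState(0)
# Two well-separated product "categories" in a 2-D feature space.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Pretend we only have labels for 5 products per category; -1 = unlabeled.
y_partial = np.full(100, -1)
y_partial[:5] = 0
y_partial[50:55] = 1

# Each round, predictions above the confidence threshold are added to the
# training set as pseudo-labels, and the base model is retrained.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
model.fit(X, y_partial)
print(model.score(X, y))  # accuracy against the true labels
```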
I talked about viewing data as a document classification problem, but there are also various other learning methods that can be applied by viewing it as a natural language processing problem. For example, if we view the task of extracting attributes from product descriptions for the purpose of building master catalog data as a sequence labeling problem and use an effective technique for it, we can take a more sophisticated approach to analyzing product data. In sequence labeling, the input is given as a sequence of data, and in the output a label is assigned to each element of the sequence. To give a specific example, if we give a text as the input and estimate the part of speech of each word in it, then we receive as output the words labeled with their parts of speech. There are cases where, rather than labeling the part of speech of each word individually, it is better to assign the parts of speech considering the structure of the whole text and produce the output all together. This is where structural learning comes in.
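A toy sketch of that whole-sequence idea: Viterbi decoding picks the label sequence that maximizes combined per-word and transition scores, instead of labeling each word in isolation. All the words and scores below are invented purely for illustration:

```python
# Sequence labeling over a whole sentence via Viterbi decoding.
from math import log

LABELS = ["NOUN", "VERB"]

# Emission scores: how well each word fits each label (hand-set toy values).
EMIT = {
    "dogs": {"NOUN": 0.9, "VERB": 0.1},
    "bark": {"NOUN": 0.4, "VERB": 0.6},  # ambiguous: "bark" can be a noun
}
# Transition scores: here NOUN -> VERB is likelier than NOUN -> NOUN.
TRANS = {("NOUN", "NOUN"): 0.2, ("NOUN", "VERB"): 0.8,
         ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4}

def viterbi(words):
    """Return the label sequence maximizing emission + transition scores."""
    # scores[label] = best log-score of any path ending in `label`
    scores = {l: log(EMIT[words[0]][l]) for l in LABELS}
    back = []  # backpointers, one dict per position after the first
    for w in words[1:]:
        new_scores, ptrs = {}, {}
        for l in LABELS:
            prev = max(LABELS, key=lambda p: scores[p] + log(TRANS[(p, l)]))
            new_scores[l] = scores[prev] + log(TRANS[(prev, l)]) + log(EMIT[w][l])
            ptrs[l] = prev
        scores, back = new_scores, back + [ptrs]
    # Recover the best path by walking the backpointers.
    path = [max(LABELS, key=scores.get)]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))

print(viterbi(["dogs", "bark"]))  # → ['NOUN', 'VERB']
```

Because the transition scores are part of the objective, the ambiguous word "bark" is resolved by the sentence as a whole rather than on its own.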
In such cases, at Rakuten we use structural learning methods such as the structured SVM and conditional random fields (CRFs). Structural learning is based on supervised learning, but there are also unsupervised and semi-supervised variants. Moreover, a CRF does not stop at assigning a category to the data; it also calculates the probability that certain data will be given a particular label, which enables more advanced estimation. As such, CRFs are often used in morphological analysis and named entity recognition. As an example of their use in Rakuten Ichiba, we apply a combination of LDA and CRF to extract product attributes and values from product descriptions and reviews, as well as review aspects and ratings (positive or negative) from review text (ref: http://www.aclweb.org/anthology/I13-1190 )
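To illustrate what it means for a CRF to output probabilities rather than a single labeling, the following computes per-position label probabilities on a tiny linear chain with the forward-backward algorithm, summing over every possible label sequence. The scores are hand-set toy values; a real CRF learns them from data:

```python
# Per-position label marginals on a linear chain (the probabilistic output
# a CRF provides on top of the single best labeling).
LABELS = ["NOUN", "VERB"]
EMIT = {"dogs": {"NOUN": 0.9, "VERB": 0.1},
        "bark": {"NOUN": 0.4, "VERB": 0.6}}
TRANS = {("NOUN", "NOUN"): 0.2, ("NOUN", "VERB"): 0.8,
         ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4}

def marginals(words):
    """P(label at each position), summing over all label sequences."""
    n = len(words)
    # fwd[i][l]: total score of all labelings of words[:i+1] ending in l
    fwd = [{l: EMIT[words[0]][l] for l in LABELS}]
    for w in words[1:]:
        fwd.append({l: sum(fwd[-1][p] * TRANS[(p, l)] for p in LABELS)
                       * EMIT[w][l] for l in LABELS})
    # bwd[i][l]: total score of all labelings of words[i+1:] given l at i
    bwd = [{l: 1.0 for l in LABELS}]
    for i in range(n - 1, 0, -1):
        bwd.insert(0, {l: sum(TRANS[(l, q)] * EMIT[words[i]][q] * bwd[0][q]
                              for q in LABELS) for l in LABELS})
    z = sum(fwd[-1][l] for l in LABELS)  # partition function (normalizer)
    return [{l: fwd[i][l] * bwd[i][l] / z for l in LABELS} for i in range(n)]

for word, dist in zip(["dogs", "bark"], marginals(["dogs", "bark"])):
    print(word, {l: round(p, 2) for l, p in dist.items()})
```

Having a full distribution rather than one hard label is what makes the more advanced estimation mentioned above possible, for example thresholding on confidence before accepting an extracted attribute.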