Learning Representations from Product Titles for Modeling Large-scale Transaction Logs
Shopping basket data analysis is significant in understanding the shopping behaviors of customers. Existing models such as association rules are poor at modeling products which have short purchase histories and cannot be applied to new products (the cold-start problem). In this paper, we propose BASTEXT, an efficient model of shopping baskets and the texts associated with the products (e.g., product titles). The model's goal is to learn the product representations from the textual contents, that can capture the relationships between the products in the baskets. Given the products already in a basket, a classifier identifies whether a potential product is relevant to the basket or not, based on their vector representations. This enables us to learn high-quality representations of the products. The experiments demonstrate that BASTEXT can efficiently model millions of baskets, and that it outperforms the state-of-the-art methods in the next product recommendation task. Besides, we will also show that BASTEXT is a strong baseline for keyword-based product search.
READ FULL TEXT