Zone24x7 is the Innovations Partner for a Tier 1 US department store retailer with sales of over $19 Bn across more than 1,100 locations. Zone24x7 has managed the retailer's innovation center in Milpitas, California, since 2012.
Through its Innovation Services, Zone24x7 conceptualizes, designs and implements technology proofs of concept, demos and prototypes to explore the capabilities of the latest technologies and evaluate their usefulness for the client organization.
Returned goods pose major challenges for return logistics management and drive up operational costs. The retailer was therefore interested in exploring technology solutions that could help stem the flow of returns without harming the customer experience. With these concerns in mind, we decided to explore the use of predictive analytics to identify goods that are likely to be returned.
The technical team faced two major challenges: how to build a predictive analytics model that could
- Predict, at the point of purchase, which items in a shopping cart a particular customer is most likely to return
- Predict when an item would be returned, once it has been predicted to be returned by a customer from a shopping cart
How We Helped
The problem was divided into three major areas:
- Explore and identify the purchase and returns behavior of both customers and products
- Build a predictive analytics model to predict, at the point of purchase, which items in a given cart a given customer is likely to return
- Build a predictive analytics model to predict when an item would be returned, once it has been predicted to be returned by a customer from a shopping cart
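The two prediction steps above chain naturally: the timing model only needs to run for items that the first model flags as likely returns. The following is a minimal sketch of that flow, not the project's actual implementation. The function names, the `toy_score` field, the 0.5 threshold and the fixed 14-day estimate are all hypothetical stand-ins for trained models.

```python
def score_cart(cart_items, return_model, timing_model, threshold=0.5):
    """Stage 1: estimate return probability per item.
    Stage 2: estimate days-to-return only for items flagged in stage 1."""
    results = []
    for item in cart_items:
        probability = return_model(item)
        flagged = probability >= threshold
        results.append({
            "product_code": item["product_code"],
            "return_probability": probability,
            "predicted_return": flagged,
            # The timing model is skipped for items not predicted to be returned.
            "expected_days_to_return": timing_model(item) if flagged else None,
        })
    return results


# Hypothetical stand-ins for trained models (assumptions for illustration only).
def toy_return_model(item):
    return item["toy_score"]


def toy_timing_model(item):
    return 14


cart = [
    {"product_code": "P10", "toy_score": 0.8},
    {"product_code": "P20", "toy_score": 0.1},
]
scored = score_cart(cart, toy_return_model, toy_timing_model)
for row in scored:
    print(row)
```

Passing the models in as callables keeps the cart-scoring logic independent of whichever classifier or regressor is eventually trained for each stage.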
Zone24x7 Data Science specialists performed an exploratory analysis of the existing data to understand purchase and returns behavior. They then built a set of features to capture the returns patterns and trained a machine learning model on top of them.
The main data source for this project was the point-of-sale transactions recorded in the client's enterprise data warehouse (EDW), covering both returned and non-returned sales. Its attributes included unique cart identifiers, transaction date, transaction time, product code, product subclass, quantity, tax amount, promotion scheme code, customer ID, DOB, gender and civil status. The first iteration of the model was built using this data, and its accuracy was not satisfactory.
The team observed that the data set did not contain enough product and customer data. Therefore, a product information data source and a customer information data source were mapped to the transactions to expand the attribute set.
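Mapping the extra sources onto the transactions amounts to a left join on product code and customer ID. Here is a minimal sketch of that idea in plain Python. The sample rows and the field names `brand`, `category` and `loyalty_tier` are hypothetical and do not reflect the client's actual schema.

```python
# Hypothetical sample data; field names are assumptions, not the client's schema.
transactions = [
    {"cart_id": "C1", "product_code": "P10", "customer_id": "U1", "quantity": 1},
    {"cart_id": "C1", "product_code": "P20", "customer_id": "U1", "quantity": 2},
    {"cart_id": "C2", "product_code": "P10", "customer_id": "U2", "quantity": 1},
]
products = {
    "P10": {"brand": "Acme", "category": "Shoes"},
    "P20": {"brand": "Zenith", "category": "Dresses"},
}
customers = {
    "U1": {"loyalty_tier": "gold"},
    # U2 is intentionally absent to show how unmatched keys are handled.
}


def enrich(rows, product_info, customer_info):
    """Left-join product and customer attributes onto each transaction row."""
    enriched = []
    for row in rows:
        merged = dict(row)  # copy so the input rows stay untouched
        merged.update(product_info.get(row["product_code"], {}))
        merged.update(customer_info.get(row["customer_id"], {}))
        enriched.append(merged)
    return enriched


enriched_rows = enrich(transactions, products, customers)
for row in enriched_rows:
    print(row)
```

Every transaction is kept even when a lookup fails, so rows with missing product or customer data can be handled explicitly later instead of silently disappearing.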
By the end of the second iteration, model accuracy had increased significantly with these new attributes. As a further improvement, customer history attributes such as return rate, returned amount, purchased amount and average order value were derived from the data at hand and added as new attributes.
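The customer history attributes named above can be derived by aggregating line items per customer. The sketch below assumes a simplified, hypothetical record layout of (customer ID, cart ID, amount, returned flag), and it treats return rate as the share of purchased items that were returned. The project's exact definitions are not given in the source.

```python
from collections import defaultdict

# Hypothetical line items: (customer_id, cart_id, amount, was_returned)
line_items = [
    ("U1", "C1", 40.0, False),
    ("U1", "C1", 60.0, True),
    ("U1", "C3", 100.0, False),
    ("U2", "C2", 50.0, True),
]


def customer_history(items):
    """Aggregate per-customer purchase and return history from line items."""
    stats = defaultdict(lambda: {
        "items": 0, "returned_items": 0,
        "purchased_amount": 0.0, "returned_amount": 0.0,
        "carts": set(),
    })
    for customer_id, cart_id, amount, was_returned in items:
        s = stats[customer_id]
        s["items"] += 1
        s["purchased_amount"] += amount
        s["carts"].add(cart_id)
        if was_returned:
            s["returned_items"] += 1
            s["returned_amount"] += amount

    features = {}
    for customer_id, s in stats.items():
        features[customer_id] = {
            "return_rate": s["returned_items"] / s["items"],
            "returned_amount": s["returned_amount"],
            "purchased_amount": s["purchased_amount"],
            # Average order value: total purchased amount per distinct cart.
            "avg_order_value": s["purchased_amount"] / len(s["carts"]),
        }
    return features


history = customer_history(line_items)
for customer_id, feats in history.items():
    print(customer_id, feats)
```

When features like these feed a model, they are normally computed only from transactions that precede the purchase being scored, so that the outcome being predicted does not leak into its own inputs.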
The team conducted statistical analysis to identify patterns and better understand the data sets and the domain. Compared with existing literature, the significance of this study lies in the high accuracy it achieved through ensemble learning.
This was achieved by applying advanced data science techniques such as ensemble modeling and hyperparameter tuning. The ensembling methods included stacking and blending. Blending similar models gave higher accuracy than single models, and hyperparameter tuning further increased the accuracy of the blended models. At present, the model achieves an overall prediction accuracy of 92% and an F1 score of 81%.
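To make the blending idea and the two reported metrics concrete, the sketch below averages the predicted return probabilities of two models and then computes accuracy and F1 on the blended predictions. Weighted averaging is only one simple form of blending and is an assumption here, since the source does not specify the exact scheme. The labels and probabilities are toy values chosen for illustration, not project data.

```python
def blend(prob_lists, weights=None):
    """Weighted average of per-model return probabilities, sample by sample."""
    n_models = len(prob_lists)
    weights = weights or [1.0 / n_models] * n_models
    return [sum(w * p for w, p in zip(weights, probs))
            for probs in zip(*prob_lists)]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels, where 1 means 'returned'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


def to_labels(probabilities, threshold=0.5):
    return [1 if p >= threshold else 0 for p in probabilities]


# Toy data (hypothetical): true labels and two models' predicted probabilities.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
model_a = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.2, 0.7, 0.1, 0.4]
model_b = [0.7, 0.1, 0.3, 0.7, 0.2, 0.6, 0.1, 0.2, 0.3, 0.2]

acc_a, f1_a = accuracy_and_f1(y_true, to_labels(model_a))
acc_b, f1_b = accuracy_and_f1(y_true, to_labels(model_b))
acc_blend, f1_blend = accuracy_and_f1(y_true, to_labels(blend([model_a, model_b])))

print(f"model A: accuracy={acc_a:.2f} f1={f1_a:.2f}")
print(f"model B: accuracy={acc_b:.2f} f1={f1_b:.2f}")
print(f"blended: accuracy={acc_blend:.2f} f1={f1_blend:.2f}")
```

In this toy data the two models make different mistakes, so averaging cancels some of them out and the blend scores higher than either model alone. F1 can sit below accuracy, as it does in the reported 92% and 81%, because it ignores correctly predicted non-returns, which typically dominate when most purchases are kept.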
Impact of Our Work
Improvements of Accuracy over model iterations
Accuracy of product return duration