MARKET BASKET ANALYSIS USING APRIORI ALGORITHM

Market

In the business intelligence world, “market basket analysis” helps retailers better understand – and ultimately serve – their users by predicting their purchasing behaviors.


In the retail and restaurant businesses, market basket analysis (MBA) is a set of statistical affinity calculations that help managers better understand – and ultimately serve – their customers by highlighting purchasing patterns. In simplest terms, MBA shows what combinations of products most frequently occur together in orders. These relationships can be used to increase profitability through cross-selling, recommendations, promotions, or even the placement of items on a menu or in a store.


Market basket analysis uses association rule mining under the hood to identify products frequently bought together. Before we get into the nitty gritty of market basket analysis, let us get a basic understanding of association rule mining. It finds association between different objects in a set. In the case of market basket analysis, the objects are the products purchased by a customer and the set is the transaction. In short, market basket analysis

  • is a unsupervised data mining technique
  • that uncovers products frequently brought together
  • and creates if-then scenario rules

    
        
I did my internship on "Market Basket Analysis using Apriori Algorithm" from suven consultants under the guidance of  ROCKY JAGTIANI.

    Well know let us begin with the analysis of Market Basket it is divided into two important parts that is 
  • ASSOCIATION RULE MINING
  • APRIORI ALGORITHM
                        
ASSOCIATION RULE MINING:

      Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It identifies frequent if-then associations called association rules which consists of an antecedent (if) and a consequent (then).Basket Data Analysis (or Market Basket Analysis) in retailing, clustering and classification.

An antecedent is something that’s found in data, and a consequent is an item that is found in combination with the antecedent. This key technique is used by large retailers like Amazon,
 Flipkart and and many others to analyze costumers buying habits by finding association 
between  different items that customers place in their “shopping baskets”. The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers. The strategies may include:
  • Changing the store layout according to trends
  • Customer behavior analysis
  • Catalog design
  • Cross marketing on online stores
  • What are the trending items customers buy
  • Customized emails with add-on sales etc..
DIFFERENCE BETWEEN ASSOCIATION AND RECOMMENDATION:

Association rules do not extract an individual's preference, rather find relationships between sets of elements of every distinct transaction. This is what makes them different than Collaborative filtering which is used in recommendation systems.

“Frequently Bought Together” → Association

“Customers who bought this item also bought” → Recommendation.


APRIORI ALGORITHM:


                                        
                                            Apriori Algorithm in Data Mining: Implementation With Examples


        
The above given flow chart is the exact implementation of Apriori Algorithm. Apriori algorithm assumes that any subset of a frequent itemset must be frequent. Its the algorithm behind Market Basket Analysis.

Apriori Algorithm uses various sets of rules we will be using the matrices to solve MARKET BASKET ANALYSIS.  Let us take an example using the following image.

                            

  1. SUPPORT:  It is the ratio of transactions involving a particular item to the total number of transactions. It defines the popularity of an item. It ranges between 0 and 1. This measure gives an idea of how frequent an itemset is in all the transactions. Consider itemset1 = {bread} and itemset2 = {shampoo}. There will be far more transactions containing bread than those containing shampoo. So as you rightly guessed, itemset1 will generally have a higher support than itemset2. Now consider itemset1 = {bread, butter} and itemset2 = {bread, shampoo}. Many transactions will have both bread and butter on the cart but bread and shampoo? Not so much. So in this case, itemset1 will generally have a higher support than itemset2. Mathematically, support is the fraction of the total number of transactions in which the itemset occurs.   
    2. CONFIDENCE:  It is the ratio of number of transactions involving two items X               and Y by the number of transactions involving X. Therefore, it tells the                           possibility of  how often items X and Y occur together, given the number of times X        has occurred. It ranges between 0 and 1.

                

  3. LIFT:  Lift indicates certainty of a rule. How much sale of X has increased when B         is sold?
        
    Example: A => B [support = 5%, confidence = 80%]


                    

 To summarize the three parts into mathematical formula the image is given below:


                        Association Rules – NoSimpler

                    

    IMPLEMENTATION:

    While implementing on the machine learning model we have used a standard dataset 
    GROCERIES.CSV.

    1. EXPLORATORY DATA ANALYSIS OVER THE DATA:


    TO CHECK THE DATA SET SHAPE.    


 FIND THE TOP 20 ITEMS THAT OCCUR IN THE DATA SET:

              
                


     From the above EDA we can comment that Top 5 (whole milk, other vegetables, rolls/buns, soda, yogurt)Items are responsible to 21.74% to of the entire sale! and Top 20 Items are contributing 50.37% to the over all sale! in a supermarket.



2. PRUNE THE DATA SET:

                 
    
                

Here length_trans=2 indicates that we are interested in transactions with at least two items 
and their cumulative sales should account for 50% of the total sales.
   This is a very good data set because it contains top 19 items and very good (effectively, 
half) amount of rows/ transactions from the original data set. So, we will proceed with this.
        

3. APPLY APRIORI ALGORITHM AND OBTAIN RULES:


          
           


The above code is used to find one percent of the transactions in the because we have given min_support = 0.01 .



                




Support = 0.031163. This means that the itemset of Tropical Fruit, Whole Milk and Bottled Water show up together in 3.1% of transactions.

Confidence = 0.220472. If customer buys Tropical fruit & Whole Milk they will also buy Bottled Water 22.0% of the time.
   

Lift = 2.857132. Bottled Water are 2.85 times more likely to be purchased alongside Tropical Fruit & Whole Milk than they are relative to their general purchase rate

Lastly, we can adjust and tune the algorithm's parameters to generate many other different rules.


This is a significant piece of information, as this can prompt a retailer to bundle specific products like these together or run a marketing scheme that offers discount on buying root vegetables along with these other three products

Comments

Post a Comment

Popular posts from this blog

SENTIMENT ANALYSIS ON IMDB