SQL Server 2005 Data Mining-Create a Web Cross-Sell Application


  • 8/8/2019 SQL Server 2005 Data Mining-Create a Web Cross-Sell Application

    1/10

    PageSQL Server 2005 Data Mining

    09/03/2005 17:34:http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp


    asp:feature
    LANGUAGES: VB.NET
    ASP.NET VERSIONS: ALL

    SQL Server 2005 Data Mining
    Create a Web Cross-sell Application

    By Raman Iyer and Jesper Lind

    The concept of cross-sell is familiar to most of us. What your friendly neighborhood McDonald's salesperson does when you order a cheeseburger is exactly what Amazon.com or buy.com are doing online when you add items to your shopping cart and you get a list of other items you might also like.

    You can add this functionality to your ASP.NET page by employing the power of data mining, using simple SQL-like queries to produce high-quality recommendations. Microsoft SQL Server 2005, currently available to over 200,000 MSDN subscribers in Beta 2, includes advanced data mining capabilities that are available programmatically via standard interfaces like ADO.NET. This will allow you to integrate cross-sell into your Web store application with minimal effort.

    Before developing a Web cross-sell application, we need to build the server-side intelligence that will enable the application to come up with smart product recommendations. This process involves:

    • Preparing the data you already have about past customers for mining;
    • Designing a mining model for the purpose of making recommendations to new customers;
    • Deploying the model to Analysis Server and training it with the data prepared earlier; and
    • Setting up security to allow ASP.NET to query the trained model.


    The first part of this article will explain key concepts and walk you through the above process using the SQL Server 2005 Data Mining tools. In the latter half we'll demonstrate the ease with which you can integrate the predictive power of the mining model into your ASP.NET Web application using straightforward database access code.

    Before You Start

    Install Microsoft SQL Server 2005 Analysis Services Beta 2. This will set up the server components as well as the design and management tools that we'll use in this article: Business Intelligence Development Studio and SQL Server Management Studio, respectively. We also recommend you go over the data mining tutorial included with Beta 2.

    Note: We'll refer to your running instance of SQL Server 2005 Analysis Services as "Analysis Server" in the rest of the article. The service shows up as "Analysis Services (MSSQLSERVER)" under Services in Administrative Tools.

    Mining Your Customer Purchase Data

    Here we outline the process of building the back-end framework that mines your historical customer movie purchase transactions and extracts the knowledge needed to make cross-sell recommendations to new customers. This knowledge is embedded in the mining model that we'll design.

    Data preparation is an important aspect of any data mining process. With SQL Server Data Mining, however, it is also possible to mine the transaction data in your relational database directly. For simplicity, we'll take this approach and assume that a single transaction table contains your customer purchase information, as shown in Figure 1.

    Figure 1: Movie purchase data used by sample application.

    The Analysis Services project in SQL Server 2005 Business Intelligence Development Studio provides the framework for modeling data and building a mining model that learns customer buying patterns from existing data gathered from prior purchases. We then use the trained model to generate recommendations for new customers.

    The first step is to identify the entity whose behavior we are interested in analyzing for the purpose of our cross-sell application. A case represents all information (also referred to as attributes) known about this entity. In this scenario, each distinct customer in the CustomerMovies table and the set of movies they purchased forms a case.

    SQL 2005 Data Mining uses the concept of a nested table to represent a variable-length collection of attributes of the same kind associated with a case. For each customer there is a set of rows containing the list of movies purchased, which can be represented as a nested table (as shown in Figure 2).

    Figure 2: The mining model's view of the customer movie purchase data.
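To make the case and nested-table idea concrete, here is a minimal Python sketch (not part of the article's sample project) that groups flat transaction rows into one case per customer. The column names CustomerId and Movie come from the CustomerMovies table; the sample rows and the function name build_cases are made up for illustration.

```python
from collections import OrderedDict


def build_cases(rows):
    """Group flat (CustomerId, Movie) rows into one case per customer.

    Each case is keyed by CustomerId and carries a nested table: the
    variable-length list of movies that customer purchased.
    """
    cases = OrderedDict()
    for customer_id, movie in rows:
        nested = cases.setdefault(customer_id, [])
        if movie not in nested:  # Movie acts as the key of the nested table
            nested.append(movie)
    return cases


# Hypothetical transaction rows, one row per purchase.
rows = [
    (1, "Star Wars"),
    (1, "The Matrix"),
    (2, "Blade Runner"),
    (1, "Blade Runner"),
    (2, "Star Wars"),
]

cases = build_cases(rows)
for customer_id, movies in cases.items():
    print(customer_id, movies)
```

Each dictionary entry plays the role of a case row, and the list attached to it plays the role of the nested table, so customers with different numbers of purchases fit the same shape.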


    The definition of a case and its associated attributes is known as a mining structure in SQL 2005 Data Mining.

    The next step is to build a mining model by selecting a mining algorithm and specifying how the columns in the mining structure will be used by the algorithm to process the input data and extract useful knowledge from it. The Association Rules algorithm is a good fit for our scenario. It learns which items are likely to be bought together and utilizes that information to predict other items given the items the customer has selected. Marking the Movies nested table as both Input and Predictable enables the model to make predictions using the movies provided as input. Note that the mining model in SQL Server 2005 Data Mining is a database object that holds both the definition of the input to the knowledge extraction process and the output, which consists of patterns or rules learned by applying the selected data mining algorithm to process the input data.

    As we'll see, the Mining Model Wizard in SQL Server 2005 Business Intelligence Development Studio builds a mining structure and a model using the algorithm selected on the first page.

    Next, the mining model definition is sent as part of a deployment package to the server where it is trained. Training cases consist of information we have collected from past purchases. The deployment package is generated from the Analysis Services project that we design the model in, and it includes bindings to the data source that Analysis Server uses for obtaining the training cases.

    To improve the quality of recommendations made for new customers the model can be periodically re-trained as more customer data is added to the transaction database. For large datasets this will typically be carried out during off-peak hours or against a replica of the transaction database. SQL Server Data Transformation Services (DTS) can be used to set up a package to perform such periodic updates.

    Finally, we must set up permissions so our application can query the trained model.

    Building a Cross-sell Mining Model in SQL Server 2005

    See the end of this article for information about downloading the complete Analysis Server project built using the steps outlined here for producing the mining model we'll utilize in the ASP.NET code sample later in this article:

    1) Create a new Analysis Services project named MovieRecommendations in Business Intelligence Development Studio.
    2) Add a new DataSource pointing to MovieData.mdb, the Access database included with the sample project (available for download; see end of article for details).
    3) Add a DataSource View based on the DataSource. Select the only table, CustomerMovies, present in the DataSource.
    4) Right-click on the Mining Models collection and select New Mining Model to launch the Mining Model Wizard.
    5) Pick "From existing relational database or data warehouse" on the Select Definition Method page and click Next.
    6) Pick Microsoft Association Rules as the data mining technique to use on the next page.
    7) Select the DataSource View created in step 3.


    8) On the next page, the single CustomerMovies table present in our DataSource View is shown. Mark it as both Case and Nested.
    9) Click Next to go to the Training Data page where we need to specify the columns we are interested in including from each selected table (see Figure 3). In our scenario, the CustomerMovies table serves as the source for our cases as well as the nested table data associated with each case (this is why it is listed twice). From the first instance pick CustomerId as the key since it uniquely identifies each customer. From the second instance pick Movie as Key, Input, and Predictable.
    10) Click Finish to complete the wizard and build the Movie Recommendations mining structure and model (see Figure 4).

    Figure 3: Selecting columns we are interested in modeling for cross-sell in the Data Mining Wizard.

    Figure 4: The cross-sell mining model in Business Intelligence Development Studio.

    The above steps create a definition of our cross-sell mining model and associated objects in the development environment. There are two steps to deploy the mining model to the Analysis Server and train it:

    1) Right-click on the MovieRecommendations project in the Solution Explorer and select Properties. Verify that the Server property in the Deployment section of Configuration Properties points to the server hosting your Analysis Services instance. Close the dialog box.
    2) Right-click again on the MovieRecommendations project in the Solution Explorer and select Deploy. This sends the client-side definitions to the server and initiates training of the mining model.

    We must set up access permissions in Analysis Server for IIS using the SQL management tool for our ASP.NET application to use the trained mining model:

    1) Open SQL Server Management Studio.


    2) Click Connect in the Object Explorer, select Analysis Server, and connect to the Analysis Server hosting your model.
    3) Locate the MovieRecommendations database in the Databases collection, right-click on Roles, and select New Role. This brings up the Create Role dialog box.
    4) On the General page enter Internet_User as the role name. Check the Read Definition checkbox under "Set the database permissions for this role".
    5) Click Membership in the left pane ("Select a page"). Add your IIS user (the default is IUSR_machinename) to the role by clicking Add.
    6) Now click Mining Structure in the left pane. This shows the Customer Movies mining structure and the Customer Movies mining model owned by the mining structure. Drop down the permission list under Access and select Read for both objects. Also check the Read Definition checkbox for both.
    7) Click OK to add the Role with the above permission set.

    Recommending Products Based on the User's Shopping Basket

    Now we're ready to produce movie recommendations in our Web application by running a SQL-like query against the Analysis Server that holds our trained mining model. We've put together a minimal application (shown in Figure 5) that demonstrates the ideas behind a real deployment, focusing on the generation of the prediction query for getting recommendations. The Web customer is assumed to have one or more items in the shopping basket, and for simplicity we have a text box where items can be entered manually (separated by semicolons). Clicking Add Items to Cart displays the items in the shopping basket and also shows a list of recommendations.

    Figure 5: A simple shopping basket application.

    The code behind the button click is shown in Figure 6; you can see that it's quite simple.

    Private Sub Button1_Click(ByVal sender As Object, _
        ByVal e As System.EventArgs) ' Handles Me.Button1.Click
      ' Parse the input into an ArrayList of strings.
      Dim alInputItems As New ArrayList()
      Dim splitchar As Char() = {";"c}
      Dim szInputItems As String() = Me.TextBox1.Text.Split(splitchar, 20)
      Dim i As Integer
      For i = 0 To szInputItems.Length - 1
        alInputItems.Add(szInputItems(i).Trim())
      Next i
      ' Add items to the shopping basket.
      dgShoppingBasket.DataSource = alInputItems
      dgShoppingBasket.DataBind()
      ' Get top 5 recommendations.
      Dim alRecommendedItems As New ArrayList(5)


      GetRecommendations(alInputItems, alRecommendedItems)
      ' Display recommendations.
      dgRecommendations.DataSource = alRecommendedItems
      dgRecommendations.DataBind()
    End Sub 'Button1_Click

    Figure 6: Populate shopping basket and recommendations.
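One detail of the parsing step in Figure 6 is worth noting: splitting on semicolons and trimming keeps empty entries, so a trailing semicolon adds a blank item to the basket. The following Python sketch shows one way the input could be cleaned. It is an illustration only; the function name parse_basket is hypothetical, and capping the basket at 20 items is an assumption loosely modeled on the limit passed to Split in Figure 6 (which limits the number of substrings rather than discarding extras).

```python
def parse_basket(text, max_items=20):
    """Split a semicolon-separated string into trimmed, non-empty item names."""
    items = []
    for raw in text.split(";"):
        name = raw.strip()
        if name:  # skip empty entries such as those left by a trailing ";"
            items.append(name)
        if len(items) == max_items:  # assumed cap on basket size
            break
    return items


print(parse_basket(" Star Wars ; The Matrix;; "))
```

Dropping blank entries up front keeps empty strings out of the input case that is later sent to the mining model.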

    The method builds an ArrayList of strings from the items in the shopping basket and passes it to the GetRecommendations subroutine, requesting the top five recommendations based on the input items. We use two DataGrid objects: dgShoppingBasket, to hold the items in the user's shopping basket; and dgRecommendations, to display the generated recommendations. The real workhorse is the GetRecommendations subroutine.

    The core of the GetRecommendations subroutine is the construction of the prediction join query (see Figure 7) that gets sent to Analysis Server and returns the list of five recommendations.

    SELECT FLATTENED
      TopCount(Predict([Customer Movies], INCLUDE_STATISTICS),
               $AdjustedProbability, 5)
    FROM [Movie Recommendations]
    NATURAL PREDICTION JOIN
      ( SELECT ( SELECT 'Star Wars' AS [Movie]
                 UNION
                 SELECT 'The Matrix' AS [Movie] )
        AS [Customer Movies] ) AS t

    Figure 7: Obtain recommendations using DMX prediction join against mining model.
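The query in Figure 7 hard-codes two movies. As a rough sketch of how the same statement could be assembled for an arbitrary basket, the Python below builds the nested SELECT ... UNION input case from a list of titles. The model and column names ([Movie Recommendations], [Customer Movies], [Movie]) are taken from Figure 7; the names DMX_HEAD, DMX_TAIL and build_prediction_query are hypothetical, and rejecting an empty basket is an added assumption.

```python
DMX_HEAD = (
    "SELECT FLATTENED TopCount("
    "Predict([Customer Movies], INCLUDE_STATISTICS), "
    "$AdjustedProbability, {top_n}) "
    "FROM [Movie Recommendations] "
    "NATURAL PREDICTION JOIN (SELECT ("
)
DMX_TAIL = ") AS [Customer Movies]) AS t"


def build_prediction_query(movies, top_n=5):
    """Return a DMX prediction join whose input case lists the given movies."""
    if not movies:
        # An empty basket would produce an empty nested SELECT.
        raise ValueError("the shopping basket is empty")
    selects = []
    for movie in movies:
        escaped = movie.replace("'", "''")  # double single quotes in literals
        selects.append("SELECT '{0}' AS [Movie]".format(escaped))
    return DMX_HEAD.format(top_n=int(top_n)) + " UNION ".join(selects) + DMX_TAIL


print(build_prediction_query(["Star Wars", "Ocean's Eleven"]))
```

Doubling single quotes keeps a title such as Ocean's Eleven from terminating the string literal early. It is a minimal precaution in this sketch, not a complete defense against malformed input.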

    The SQL-like query language supported by Analysis Server for querying mining models is called DMX. The DMX query in Figure 7 performs a prediction join that takes the two movies from the user's shopping basket, forms a case, and joins it with the mining model to produce an output rowset containing a list of predicted recommendations.

    The GetRecommendations subroutine, shown in Figure 8, stores most of the query in compile-time string templates. What needs to be filled in is the input data.

    Private Shared Sub GetRecommendations( _
        ByVal vInputItems As ArrayList, _
        ByRef vRecommendedItems As ArrayList)
      ' Templates for generating DMX prediction join statement.
      Dim strDMX1 As String = _
        "SELECT FLATTENED TopCount(" + _
        "Predict([Customer Movies], INCLUDE_STATISTICS)," + _
        "$AdjustedProbability, 5) From [Movie Recommendations] " + _
        "NATURAL PREDICTION JOIN (SELECT ("
      Dim strDMX2 As String = ") AS [Customer Movies]) AS t"
      ' Iterate shopping basket and produce input case.
      Dim cItems As Integer = vInputItems.Count
      Dim strDMX As String = ""
      Dim i As Integer
      For i = 0 To cItems - 1
        Dim item As String = vInputItems(i).ToString()
        item = item.Replace("'", "''")
        strDMX += "SELECT " + "'" + item + "' AS " + "[Movie]"
        If i < cItems - 1 Then
          strDMX += " UNION "
        End If
      Next i


    ... looks exactly like ADO.NET code but offers better performance. (To use ADOMD.NET you'll need to add a reference to Microsoft.AnalysisServices.AdomdClient.dll, installed by the SQL Server 2005 Beta 2 Client Components Setup. If it is not listed under .NET in the Add Reference dialog box in Visual Studio, browse to Program Files\Microsoft.NET\ADOMD.NET\90 and add it from that location.)

    The query results are fetched, again using a standard data reader interface exposed by ADOMD.NET, and the array list of output items is populated with the recommendations returned by the prediction join. Note that the DMX query we generate uses the FLATTENED keyword to avoid having to perform hierarchical result navigation to fetch the results.

    Under the Hood

    In this section we explain what happens on the server when it receives a prediction query and how the knowledge acquired by the mining model may be explored further.

    During the training process, the Association Rules model learns a set of rules that are used to generate product recommendations. If a rule such as [Camera, Film] -> Batteries was discovered and the customer's shopping basket contains Camera and Film, then this rule fires. Of course, there may be other rules that predict Batteries as well, in which case the rule with the highest score is used. The score assigned to a rule, also known as its Importance, takes into account and compensates for the fact that the probability for a rule may be high just because the target item is popular in the dataset. Assume that 10% of customers buy Star Wars irrespective of other purchases, and Blade Runner is bought by just 3%. If two rules predict Star Wars and Blade Runner with the same probability, the score for Star Wars will be lower since it's so popular. However, the advanced user can fine-tune the score using algorithm parameters.

    Finally, the top n highest scoring rules are used to generate the recommendations. The last parameter in the TopCount function (again, see Figure 7) sets an upper limit on the number of items returned by the prediction algorithm.

    The rules are organized by the items predicted and sorted in descending order based on the score. The prediction algorithm can avoid looking at very large sets of rules to achieve good prediction performance.

    Additional insight into a customer's purchasing behavior may be gained by using the viewers supplied as part of the Business Intelligence Development Studio. These include the Association Rules Viewer (shown in Figure 9) for browsing the rules and the Dependency Net Viewer that graphically shows the relationship between items. The graph layout algorithm illustrates how the strongest correlated products are clustered.


    Figure 9: Viewing model content.
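The rule-firing and scoring behavior described in this section can be imitated with a small Python sketch. This is not the Microsoft Association Rules implementation: the score used here is lift (rule probability divided by the target item's overall popularity), a stand-in chosen for illustration, and skipping items already in the basket is an added assumption. The rules are hypothetical; only the 10% and 3% popularity figures come from the example above.

```python
def recommend(basket, rules, popularity, top_n=5):
    """Return up to top_n (item, score) pairs for the given basket.

    rules: list of (antecedent_items, target_item, probability).
    popularity: fraction of all customers who buy each target item.
    The score is lift = probability / popularity (an assumption of this sketch).
    """
    basket = set(basket)
    best = {}
    for antecedent, target, probability in rules:
        # A rule fires when every item on its left side is in the basket.
        if not set(antecedent) <= basket or target in basket:
            continue
        score = probability / popularity[target]
        # If several rules predict the same item, keep the highest score.
        if score > best.get(target, 0.0):
            best[target] = score
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


popularity = {"Star Wars": 0.10, "Blade Runner": 0.03}
rules = [
    (["The Matrix"], "Star Wars", 0.5),
    (["The Matrix"], "Blade Runner", 0.5),
    (["Alien", "The Matrix"], "Blade Runner", 0.3),
]

for item, score in recommend(["The Matrix", "Alien"], rules, popularity):
    print(item, round(score, 2))
```

With equal rule probabilities, Blade Runner ranks above Star Wars because its low overall popularity makes the rule more informative, which is the effect the importance score is meant to capture.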

    The Association Rules viewer displays the importance for each rule and ranks the rules by this score. As explained earlier, importance serves as a better measure than probability for finding interesting rules in your data.

    Conclusion

    This article merely scratches the surface of the possibilities that SQL Server 2005 Data Mining presents for your ASP.NET applications. One idea for extending this sample would be to add other customer attributes such as demographics as inputs to potentially improve the quality of the recommendations. Other possible Web applications include:

    • Targeted ads based on analysis of browsing behavior using the Sequence_Clustering algorithm.
    • An online Help desk that finds the most appropriate answer for a user's query by using the Neural_Network algorithm in conjunction with text mining technologies available in SQL Server 2005 Data Transformation Services.

    The data mining tutorial for SQL Server 2005 Analysis Services is available on the Readiness Kit CD included with the Beta 2 package. Visit the Microsoft beta newsgroup at microsoft.beta.yukon.analysisservices.datamining or the Data Mining forum at http://www.sqljunkies.com/Forums/ShowForum.aspx?ForumID=38 if you have questions about SQL Server 2005 Data Mining. This article offers an outline of the simplified mining process used for this specific scenario. For a generalized version of this methodology, refer to CRISP-DM at http://www.crisp-dm.org.

    The sample code in this article is available for download.

    Raman Iyer is a Software Design Engineer at Microsoft Corp. and a founding member of the SQL Server Data Mining development team there. He can be reached at mailto:[email protected].

    Jesper Lind is a Research Software Design Engineer at Microsoft Research and a member of the Machine Learning and Statistics team. He can be reached at mailto:[email protected].


    Informant Communications Group, Inc.
    5105 Florin Perkins Road
    Sacramento, CA 95826
    Phone: (916) 379-0609 Fax: (916) 379-0610

    Copyright 2005 Informant Communications Group. All Rights Reserved.