8/8/2019 SQL Server 2005 Data Mining-Create a Web Cross-Sell Application
PageSQL Server 2005 Data Mining
09/03/2005 17:34:http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp
asp:feature
LANGUAGES: VB.NET
ASP.NET VERSIONS: ALL

SQL Server 2005 Data Mining
Create a Web Cross-sell Application
By Raman Iyer and Jesper Lind
The concept of cross-sell is familiar to most of us. What your friendly neighborhood McDonald's salesperson does when you order a cheeseburger is exactly what Amazon.com or buy.com are doing online when you add items to your shopping cart and you get a list of other items you might also like.

You can add this functionality to your ASP.NET page by employing the power of data mining, using simple SQL-like queries to produce high-quality recommendations. Microsoft SQL Server 2005, currently available to over 200,000 MSDN subscribers in Beta 2, includes advanced data mining capabilities that are available programmatically via standard interfaces like ADO.NET. This will allow you to integrate cross-sell into your Web store application with minimal effort.

Before developing a Web cross-sell application, we need to build the server-side intelligence that will enable the application to come up with smart product recommendations. This process involves:

Preparing the data you already have about past customers for mining;
Designing a mining model for the purpose of making recommendations to new customers;
Deploying the model to Analysis Server and training it with the data prepared earlier; and
Setting up security to allow ASP.NET to query the trained model.
The first part of this article will explain key concepts and walk you through the above process using the SQL Server 2005 Data Mining tools. In the latter half we'll demonstrate the ease with which you can integrate the predictive power of the mining model into your ASP.NET Web application using straightforward database access code.

Before You Start

Install Microsoft SQL Server 2005 Analysis Services Beta 2. This will set up the server components as well as the design and management tools that we'll use in this article: Business Intelligence Development Studio and SQL Server Management Studio, respectively. We also recommend you go over the data mining tutorial included with Beta 2.

Note: We'll refer to your running instance of SQL Server 2005 Analysis Services as Analysis Server in the rest of the article. The service shows up as Analysis Services (MSSQLSERVER) under Services in Administrative Tools.

Mining Your Customer Purchase Data

Here we outline the process of building the back-end framework that mines your historical customer movie purchase transactions and extracts the knowledge needed to make cross-sell recommendations to new customers. This knowledge is embedded in the mining model that we'll design.

Data preparation is an important aspect of any data mining process. With SQL Server Data Mining, however, it is also possible to mine the transaction data in your relational database directly. For simplicity, we'll take this approach and assume that a single transaction table contains your customer purchase information, as shown in Figure 1.
Figure 1: Movie purchase data used by sample application.
The Analysis Services project in SQL Server 2005 Business Intelligence Development Studio provides the framework for modeling data and building a mining model that learns customer buying patterns from existing data gathered from prior purchases. We then use the trained model to generate recommendations for new customers.

The first step is to identify the entity whose behavior we are interested in analyzing for the purpose of our cross-sell application. A case represents all information (also referred to as attributes) known about this entity. In this scenario, each distinct customer in the CustomerMovies table and the set of movies they purchased forms a case. SQL 2005 Data Mining uses the concept of a nested table to represent a variable-length collection of attributes of the same kind associated with a case. For each customer there is a set of rows containing the list of movies purchased, which can be represented as a nested table (as shown in Figure 2).

Figure 2: The mining model's view of the customer movie purchase data.
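
To make the case and nested-table idea concrete, here is a minimal sketch of the same grouping done in client code. It is only an illustration of the concept, not how Analysis Server builds cases internally. The PurchaseRow type, the BuildCases function, and the sample rows are hypothetical; only the CustomerId and Movie column names come from the article. It assumes VB 2005 or later for generic collections.

```vbnet
Imports System
Imports System.Collections.Generic

Module CaseGroupingSketch
    ' Hypothetical row type standing in for one record of the CustomerMovies table.
    Structure PurchaseRow
        Public CustomerId As Integer
        Public Movie As String

        Public Sub New(ByVal id As Integer, ByVal title As String)
            CustomerId = id
            Movie = title
        End Sub
    End Structure

    ' Group flat transaction rows into cases: one entry per customer,
    ' each holding a variable-length list of movies (the "nested table").
    Function BuildCases(ByVal rows As IEnumerable(Of PurchaseRow)) _
            As Dictionary(Of Integer, List(Of String))
        Dim cases As New Dictionary(Of Integer, List(Of String))()
        For Each row As PurchaseRow In rows
            Dim movies As List(Of String) = Nothing
            If Not cases.TryGetValue(row.CustomerId, movies) Then
                movies = New List(Of String)()
                cases.Add(row.CustomerId, movies)
            End If
            movies.Add(row.Movie)
        Next
        Return cases
    End Function

    Sub Main()
        ' Invented sample data for illustration only.
        Dim rows As New List(Of PurchaseRow)()
        rows.Add(New PurchaseRow(1, "Star Wars"))
        rows.Add(New PurchaseRow(1, "The Matrix"))
        rows.Add(New PurchaseRow(2, "Blade Runner"))

        For Each pair As KeyValuePair(Of Integer, List(Of String)) In BuildCases(rows)
            Console.WriteLine("Customer {0}: {1}", pair.Key, _
                String.Join(", ", pair.Value.ToArray()))
        Next
    End Sub
End Module
```

Each dictionary entry corresponds to one case: the key plays the role of the case key (CustomerId) and the list plays the role of the nested table of movies.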
The definition of a case and its associated attributes is known as a mining structure in SQL 2005 Data Mining. The next step is to build a mining model by selecting a mining algorithm and specifying how the columns in the mining structure will be used by the algorithm to process the input data and extract useful knowledge from it. The Association Rules algorithm is a good fit for our scenario. It learns which items are likely to be bought together and utilizes that information to predict other items given the items the customer has selected. Marking the Movies nested table as both Input and Predictable enables the model to make predictions using the movies provided as input. Note that the mining model in SQL Server 2005 Data Mining is a database object that holds both the definition of the input to the knowledge extraction process and the output, which consists of patterns or rules learned by applying the selected data mining algorithm to process the input data.

As we'll see, the Mining Model Wizard in SQL Server 2005 Business Intelligence Development Studio builds a mining structure and a model using the algorithm selected on the first page.

Next, the mining model definition is sent as part of a deployment package to the server where it is trained. Training cases consist of information we have collected from past purchases. The deployment package is generated from the Analysis Services project that we design the model in, and it includes bindings to the data source that Analysis Server uses for obtaining the training cases.

To improve the quality of recommendations made for new customers the model can be periodically re-trained as more customer data is added to the transaction database. For large datasets this will typically be carried out during off-peak hours or against a replica of the transaction database. SQL Server Data Transformation Services (DTS) can be used to set up a package to perform such periodic updates.

Finally, we must set up permissions so our application can query the trained model.

Building a Cross-sell Mining Model in SQL Server 2005

See the end of this article for information about downloading the complete Analysis Server project built using the steps outlined here for producing the mining model we'll utilize in the ASP.NET code sample later in this article:

1) Create a new Analysis Services project named MovieRecommendations in Business Intelligence Development Studio.
2) Add a new DataSource pointing to MovieData.mdb, the Access database included with the sample project (available for download; see end of article for details).
3) Add a DataSource View based on the DataSource. Select the only table, CustomerMovies, present in the DataSource.
4) Right-click on the Mining Models collection and select New Mining Model to launch the Mining Model Wizard.
5) Pick "From existing relational database or data warehouse" on the Select Definition Method page and click Next.
6) Pick Microsoft Association Rules as the data mining technique to use on the next page.
7) Select the DataSource View created in step 3.
8) On the next page, the single CustomerMovies table present in our DataSource View is shown. Mark it as both Case and Nested.
9) Click Next to go to the Training Data page where we need to specify the columns we are interested in including from each selected table (see Figure 3). In our scenario, the CustomerMovies table serves as the source for our cases as well as the nested table data associated with each case (this is why it is listed twice). From the first instance pick CustomerId as the key since it uniquely identifies each customer. From the second instance pick Movie as Key, Input, and Predictable.
10) Click Finish to complete the wizard and build the Movie Recommendations mining structure and model (see Figure 4).

Figure 3: Selecting columns we are interested in modeling for cross-sell in the Data Mining Wizard.

Figure 4: The cross-sell mining model in Business Intelligence Development Studio.
The above steps create a definition of our cross-sell mining model and associated objects in the development environment. There are two steps to deploy the mining model to the Analysis Server and train it:

1) Right-click on the MovieRecommendations project in the Solution Explorer and select Properties. Verify that the Server property in the Deployment section of Configuration Properties points to the server hosting your Analysis Services instance. Close the dialog box.
2) Right-click again on the MovieRecommendations project in the Solution Explorer and select Deploy. This sends the client-side definitions to the server and initiates training of the mining model.

We must set up access permissions in Analysis Server for IIS using the SQL management tool for our ASP.NET application to use the trained mining model:

1) Open SQL Server Management Studio.
2) Click Connect in the Object Explorer, select Analysis Server, and connect to the Analysis Server hosting your model.
3) Locate the MovieRecommendations database in the Databases collection, right-click on Roles, and select New Role. This brings up the Create Role dialog box.
4) On the General page enter Internet_User as the role name. Check the Read Definition checkbox under "Set the database permissions for this role".
5) Click Membership in the left pane and select a page. Add your IIS user (the default is IUSR_machinename) to the role by clicking Add.
6) Now click Mining Structure in the left pane. This shows the Customer Movies mining structure and the Customer Movies mining model owned by the mining structure. Drop down the permission list under Access and select Read for both objects. Also check the Read Definition checkbox for both.
7) Click OK to add the Role with the above permission set.

Recommending Products Based on the User's Shopping Basket

Now we're ready to produce movie recommendations in our Web application by running a SQL-like query against the Analysis Server that holds our trained mining model. We've put together a minimal application (shown in Figure 5) that demonstrates the ideas behind a real deployment, focusing on the generation of the prediction query for getting recommendations. The Web customer is assumed to have one or more items in the shopping basket, and for simplicity we have a text box where items can be entered manually (separated by semicolons). Clicking Add Items to Cart displays the items in the shopping basket and also shows a list of recommendations.
Figure 5: A simple shopping basket application.
The code behind the button click is shown in Figure 6; you can see that it's quite simple.

Private Sub Button1_Click(ByVal sender As Object, _
    ByVal e As System.EventArgs) ' Handles Me.Button1.Click
  ' Parse the input into an ArrayList of strings.
  Dim alInputItems As New ArrayList()
  Dim splitchar As Char() = {";"c}
  Dim szInputItems As String() = _
    Me.TextBox1.Text.Split(splitchar, 20)
  Dim i As Integer
  For i = 0 To szInputItems.Length - 1
    alInputItems.Add(szInputItems(i).Trim())
  Next i
  ' Add items to the shopping basket.
  dgShoppingBasket.DataSource = alInputItems
  dgShoppingBasket.DataBind()
  ' Get top 5 recommendations.
  Dim alRecommendedItems As New ArrayList(5)
  GetRecommendations(alInputItems, alRecommendedItems)
  ' Display recommendations.
  dgRecommendations.DataSource = alRecommendedItems
  dgRecommendations.DataBind()
End Sub 'Button1_Click
Figure 6: Populate shopping basket and recommendations.
The method builds an ArrayList of strings from the items in the shopping basket and passes it to the GetRecommendations subroutine, requesting the top five recommendations based on the input items. We use two DataGrid objects: dgShoppingBasket, to hold the items in the user's shopping basket; and dgRecommendations, to display the generated recommendations. The real workhorse is the GetRecommendations subroutine.

The core of the GetRecommendations subroutine is the construction of the prediction join query (see Figure 7) that gets sent to Analysis Server and returns the list of five recommendations.

SELECT FLATTENED
  TopCount(Predict([Customer Movies], INCLUDE_STATISTICS),
           $AdjustedProbability, 5)
FROM [Movie Recommendations]
NATURAL PREDICTION JOIN
  ( SELECT ( SELECT 'Star Wars' AS [Movie]
             UNION
             SELECT 'The Matrix' AS [Movie] )
    AS [Customer Movies] ) AS t
Figure 7: Obtain recommendations using DMX prediction join against mining model.
The SQL-like query language supported by Analysis Server for querying mining models is called DMX. The DMX query in Figure 7 performs a prediction join that takes the two movies from the user's shopping basket, forms a case, and joins it with the mining model to produce an output rowset containing a list of predicted recommendations.

The GetRecommendations subroutine, shown in Figure 8, stores most of the query in compile-time string templates. What needs to be filled in is the input data.

Private Shared Sub GetRecommendations( _
    ByVal vInputItems As ArrayList, _
    ByRef vRecommendedItems As ArrayList)
  ' Templates for generating DMX prediction join statement.
  Dim strDMX1 As String = _
    "SELECT FLATTENED TopCount(" + _
    "Predict([Customer Movies], INCLUDE_STATISTICS)," + _
    "$AdjustedProbability, 5) From [Movie Recommendations] " + _
    "NATURAL PREDICTION JOIN (SELECT ("
  Dim strDMX2 As String = ") AS [Customer Movies]) AS t"
  ' Iterate shopping basket and produce input case.
  Dim cItems As Integer = vInputItems.Count
  Dim strDMX As String = ""
  Dim i As Integer
  For i = 0 To cItems - 1
    Dim item As String = vInputItems(i).ToString()
    ' Escape single quotes so the item is a valid DMX string literal.
    item = item.Replace("'", "''")
    strDMX += "SELECT " + "'" + item + "' AS " + "[Movie]"
    If i < cItems - 1 Then
      strDMX += " UNION "
    End If
  Next i
... looks exactly like ADO.NET code but offers better performance. (To use ADOMD.NET you'll need to add a reference to Microsoft.AnalysisServices.AdomdClient.dll, installed by the SQL Server 2005 Beta 2 Client Components Setup. If it is not listed under .NET in the Add Reference dialog box in Visual Studio, browse to Program Files\Microsoft.NET\ADOMD.NET\90 and add it from that location.)

The query results are fetched, again using a standard data reader interface exposed by ADOMD.NET, and the array list of output items is populated with the recommendations returned by the prediction join. Note that the DMX query we generate uses the FLATTENED keyword to avoid having to perform hierarchical result navigation to fetch the results.

Under the Hood

In this section we explain what happens on the server when it receives a prediction query and how the knowledge acquired by the mining model may be explored further.

During the training process, the Association Rules model learns a set of rules that are used to generate product recommendations. If a rule such as [Camera, Film] -> Batteries was discovered and the customer's shopping basket contains Camera and Film, then this rule fires. Of course, there may be other rules that predict Batteries as well, in which case the rule with the highest score is used. The score assigned to a rule, also known as its Importance, takes into account and compensates for the fact that the probability for a rule may be high just because the target item is popular in the dataset. Assume that 10% of customers buy Star Wars irrespective of other purchases, and Blade Runner is bought by just 3%. If two rules predict Star Wars and Blade Runner with the same probability, the score for Star Wars will be lower since it's so popular. However, the advanced user can fine-tune the score using algorithm parameters.

Finally, the top n highest scoring rules are used to generate the recommendations. The last parameter in the TopCount function (again, see Figure 7) sets an upper limit on the number of items returned by the prediction algorithm.

The rules are organized by the items predicted and sorted in descending order based on the score. The prediction algorithm can avoid looking at very large sets of rules to achieve good prediction performance.

Additional insight into a customer's purchasing behavior may be gained by using the viewers supplied as part of the Business Intelligence Development Studio. These include the Association Rules Viewer (shown in Figure 9) for browsing the rules and the Dependency Net Viewer that graphically shows the relationship between items. The graph layout algorithm illustrates how the strongest correlated products are clustered.
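
The rule-firing and ranking steps described earlier in this section can be sketched in client code as follows. This is an illustration under stated assumptions, not the server's actual algorithm. The Rule class, the Fires and Recommend functions, and the sample rules are hypothetical. The score used in Main is probability divided by the item's overall popularity, chosen only as a stand-in that penalizes popular items; the article does not give the formula for Importance. Skipping items already in the basket is also an assumption. It assumes VB 2005 or later.

```vbnet
Imports System
Imports System.Collections.Generic

Module RuleFiringSketch
    ' Hypothetical in-memory rule: a set of antecedent items predicts one item.
    Class Rule
        Public Antecedent As String()
        Public Consequent As String
        Public Score As Double

        Public Sub New(ByVal items As String(), ByVal predicted As String, _
                       ByVal ruleScore As Double)
            Antecedent = items
            Consequent = predicted
            Score = ruleScore
        End Sub
    End Class

    ' A rule fires when every antecedent item is in the basket
    ' (case-sensitive comparison).
    Function Fires(ByVal r As Rule, ByVal basket As List(Of String)) As Boolean
        For Each item As String In r.Antecedent
            If Not basket.Contains(item) Then Return False
        Next
        Return True
    End Function

    Function CompareByScoreDescending( _
            ByVal a As KeyValuePair(Of String, Double), _
            ByVal b As KeyValuePair(Of String, Double)) As Integer
        Return b.Value.CompareTo(a.Value)
    End Function

    ' Return at most n predicted items, best score first.
    Function Recommend(ByVal basket As List(Of String), _
                       ByVal rules As List(Of Rule), _
                       ByVal n As Integer) As List(Of String)
        ' When several fired rules predict the same item, keep the highest score.
        Dim best As New Dictionary(Of String, Double)()
        For Each r As Rule In rules
            ' Assumption: do not recommend what is already in the basket.
            If basket.Contains(r.Consequent) OrElse Not Fires(r, basket) Then Continue For
            If Not best.ContainsKey(r.Consequent) OrElse r.Score > best(r.Consequent) Then
                best(r.Consequent) = r.Score
            End If
        Next

        Dim ranked As New List(Of KeyValuePair(Of String, Double))(best)
        ranked.Sort(AddressOf CompareByScoreDescending)

        Dim result As New List(Of String)()
        For i As Integer = 0 To Math.Min(n, ranked.Count) - 1
            result.Add(ranked(i).Key)
        Next
        Return result
    End Function

    Sub Main()
        ' Both movie rules share an invented probability of 0.5; the popularity
        ' figures (10% and 3%) follow the example in the text.
        Dim probability As Double = 0.5
        Dim rules As New List(Of Rule)()
        rules.Add(New Rule(New String() {"The Matrix"}, "Star Wars", probability / 0.1))
        rules.Add(New Rule(New String() {"The Matrix"}, "Blade Runner", probability / 0.03))
        rules.Add(New Rule(New String() {"Camera", "Film"}, "Batteries", 9.0))

        Dim basket As New List(Of String)()
        basket.Add("The Matrix")

        For Each movie As String In Recommend(basket, rules, 5)
            Console.WriteLine(movie)
        Next
    End Sub
End Module
```

With this stand-in score, Blade Runner ranks ahead of Star Wars even though both rules have the same probability, and the [Camera, Film] rule does not fire because neither item is in the basket.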
Figure 9: Viewing model content.
The Association Rules viewer displays the importance for each rule and ranks the rules by this score. As explained earlier, importance serves as a better measure than probability for finding interesting rules in your data.

Conclusion

This article merely scratches the surface of the possibilities that SQL Server 2005 Data Mining presents for your ASP.NET applications. One idea for extending this sample would be to add other customer attributes such as demographics as inputs to potentially improve the quality of the recommendations. Other possible Web applications include:

Targeted ads based on analysis of browsing behavior using the Sequence_Clustering algorithm.
An online Help desk that finds the most appropriate answer for a user's query by using the Neural_Network algorithm in conjunction with text mining technologies available in SQL Server 2005 Data Transformation Services.

The data mining tutorial for SQL Server 2005 Analysis Services is available on the Readiness Kit CD included with the Beta 2 package. Visit the Microsoft beta newsgroup at microsoft.beta.yukon.analysisservices.datamining or the Data Mining forum at http://www.sqljunkies.com/Forums/ShowForum.aspx?ForumID=38 if you have questions about SQL Server 2005 Data Mining. This article offers an outline of the simplified mining process used for this specific scenario. For a generalized version of this methodology, refer to CRISP-DM at http://www.crisp-dm.org.

The sample code in this article is available for download.

Raman Iyer is a Software Design Engineer at Microsoft Corp. and a founding member of the SQL Server Data Mining development team there. He can be reached at mailto:[email protected].

Jesper Lind is a Research Software Design Engineer at Microsoft Research and a member of the Machine Learning and Statistics team. He can be reached at mailto:[email protected].
Informant Communications Group, Inc.
5105 Florin Perkins Road
Sacramento, CA 95826
Phone: (916) 379-0609 Fax: (916) 379-0610
Copyright 2005 Informant Communications Group. All Rights Reserved.