PREDICTIVE ANALYTICS AND SAP HANA

RDP267

Hands-on Exercises SAP TechEd 2013

Getting started with your session

Login credentials and group numbers can be found in the ‘My Reservation’ tab on the SAP TechEd Virtual Hands-On Workshops website (https://saptechedhandson.sap.com/).

Important: Some of the sessions use placeholders for users (e.g. CD300_XX) or objects (e.g. ZCD400_Exercise_##). The placeholders XX or ## must be replaced with your assigned group number, which you find in the ‘My Reservation’ tab on the above-mentioned website.


  • 2

    INITIAL SETUP (page 4)

    CHAPTER 1 (page 10)
    Use HANA Studio and SQL Script to create a PAL procedure for C4.5 algorithm (page 11)
    Use HANA Studio and SQL Script to run the trained C4.5 Model (page 27)
    Use HANA Studio and AFM to create a PAL procedure for C4.5 algorithm (page 36)

    CHAPTER 2 (page 55)
    Use HANA Studio and SQL Script to create a PAL procedure for Outlier algorithm (page 56)
    Use HANA Studio and AFM to create a PAL procedure for Outlier algorithm (page 65)

    CHAPTER 3 (page 76)
    Use R Studio to develop a Generalized Linear Model (page 77)

  • 3

    HANA and Predictive

    BEFORE YOU START

    In the hands-on session RDP267 you can select your exercises according to your personal area of interest. The solutions to all exercises are provided as a reference, so you can also see the solutions of exercises you did not finish. Because time during the session is limited, we recommend that you first look through the different exercises and then decide which ones to work through first.

    HANA and Predictive

    Chapter 1: 45 minutes (4 exercises)
    Chapter 2: 30 minutes (2 exercises)
    Chapter 3: 15 minutes (2 exercises)

  • 4

    INITIAL SETUP

    During the exercises, you will work on a SAP HANA system with the following system properties:

    Host name: coe-he-084.wdf.sap.corp

    Instance number: 10

    SAP System ID (SID): M31

    Database user name: RDP267_# (# = your assigned student ID, which may be 1 or 2 digits)

    Password: Initial1

    Database Schema RDP267

    Student exercise package RDP267.sessionX.# (X = your assigned session number, # = student ID)

    Solution package RDP267.solution

    As preparatory steps, make sure a connection to the backend SAP HANA database system is defined with your assigned user (RDP267_#).

    Explanation Screenshot

    1. Start the SAP HANA Studio by clicking its desktop icon.

    2. Open the Development perspective from the SAP HANA Studio start screen.

    Either choose Open Development from the overview screen or select from the studio menu: Window > Open Perspective > Other > SAP HANA Development

    For the virtual hands-on workshops, the user and password are unique and were changed before you got access to the system. Therefore the password of the secure store needs to be recovered/unlocked.

  • 5

    You need to recover the password first. Choose Window > Preferences.

    Then open General > Security > Secure Storage.

    Click Recover Password.

  • 6

    Answer the questions:

    Question 1: 1972

    Question 2: Hoffenheim

    And click OK.

    And click OK.

    And click OK.

    Unlock the Secure Store by clicking on Unlock. Now you can continue with the exercises.

  • 7

    3. Connect to the HANA system.

    On the left, select the SAP HANA System view from the available views (Workspace, SAP HANA Repository Browser, SAP HANA System).

    In the SAP HANA System view, right-click on the background of the white area, then select Add System from the context menu.

  • 8

    4. Specify the connection details:

    Host Name: coe-he-084.wdf.sap.corp

    Instance Number: 10

    Click > Next

    Enter your assigned user and password credentials

    User Name: RDP267_XX (replace XX with your assigned student id)

    Password: Initial1

    Click > Finish.

    The SAP HANA System View will show the new connection
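For clients that connect by host and port rather than through the Add System dialog, the instance number maps to a SQL port. The sketch below assumes the common single-container convention of port 3<instance>15; the helper name `sql_port` is hypothetical, and your system may be configured differently.

```python
def sql_port(instance_number: int) -> int:
    """Return the SQL port under the assumed 3<NN>15 convention."""
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be between 0 and 99")
    return int(f"3{instance_number:02d}15")

HOST = "coe-he-084.wdf.sap.corp"  # host name from the workshop setup
INSTANCE = 10                     # instance number from the workshop setup

print(f"{HOST}:{sql_port(INSTANCE)}")
```

Within SAP HANA Studio itself you only need the host name and instance number; the port is derived for you.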

  • 9

    5. Explore the SAP HANA database catalog and repository content structure for the workshop.

    The database catalog for this workshop is the SAP HANA database schema RDP267, which contains the workshop tables.

    To explore the schema:

    Expand the Catalog folder > RDP267 > Tables

    Note: To browse a table, right-click the table and select Open Content from the context menu.

  • 10

    CHAPTER 1

    In this chapter we will perform hands-on exercises in HANA Studio using the HANA PAL library.

    Estimated time: 45 minutes

    Objective

    Use both SQL Script and the new Application Function Modeler (AFM) within HANA Studio to create and execute PAL procedures.

    What you will learn

    How to create and execute PAL Algorithms via SQL Script

    How to run the trained model via SQL script

    How to create and execute PAL Algorithms via AFM

    Exercise description

    Use HANA Studio and SQL Script to create a PAL procedure for C4.5 algorithm

    Use HANA Studio and SQL Script to run the trained C4.5 Model

    Use HANA Studio and AFM to create a PAL procedure for C4.5 algorithm
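As background for the C4.5 exercises listed above: C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information. The sketch below illustrates that criterion in plain Python on made-up rows. It is not PAL's implementation, and the column values are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(rows, attr_index, label_index=-1):
    """Gain ratio of splitting `rows` on the categorical column `attr_index`."""
    labels = [row[label_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy([row[attr_index] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

# Invented toy data: (stage, smoker, survived)
rows = [
    ("I", "no", "yes"), ("I", "yes", "yes"),
    ("II", "no", "no"), ("II", "yes", "no"),
]
print(gain_ratio(rows, 0))  # stage separates the two classes completely
print(gain_ratio(rows, 1))  # smoker carries no information in this toy data
```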

  • 11

    Use HANA Studio and SQL Script to create a PAL procedure for C4.5 algorithm

    1. Click the system M31 (RDP267_XX) to open the connection (where XX represents your group number).

    2. Click the SQL Console icon.

  • 12

    3. Right-click in the middle of the blank area, then click the Open File... menu item. You can also press o.

    Open the PAL C4.5 CREATEDT RDP267XX version1.sql template script file from the Student (Local) directory D:\Files\Session\RDP267 (type this share directory into the top line of the Open File dialog box).

  • 13

    4. Click the system M31 (RDP267_XX, where XX represents your group number) to select it, so that you can open another SQL Console.

    5. Click the SQL Console icon to launch another SQL Console window (separate tab).

    This opens a new SQL Console window in which to create your own script for your user and group number. You can either type the text yourself or paste in text from the template. In either case, remember to replace XX with your group number. Note: if you choose to copy and paste, be sure to stop and take the time to understand the copied code!

    6. Click the tab M31 - PAL C4.5 CREATEDT RDP267XX version1.sql to select it.

    Here you see the text that goes into the other SQL Console tab. You can select the relevant script text area and copy it with ctrl+c, or simply read the text and type exactly the same thing in the SQL Console in the other tab.
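The reminder to replace XX with your group number can be sketched as a small text substitution with a check for leftovers. This is only an illustration: the template lines and the object name `MY_RESULTS` are hypothetical, and the helper names are not part of the workshop material.

```python
import re

PLACEHOLDER = re.compile(r"XX\b")

def personalize(template: str, group_number: int) -> str:
    """Replace every XX placeholder (e.g. RDP267_XX) with the group number."""
    return PLACEHOLDER.sub(str(group_number), template)

def leftover_placeholders(script: str) -> list[str]:
    """Return the lines that still contain an XX placeholder."""
    return [line for line in script.splitlines() if PLACEHOLDER.search(line)]

# Hypothetical template text; the real script is in the workshop file.
template = (
    "SET SCHEMA RDP267_XX;\n"
    "CREATE TABLE RDP267_XX.MY_RESULTS (ID INTEGER);"
)

script = personalize(template, 7)
print(script)
print("lines still containing XX:", leftover_placeholders(script))
```

Running the leftover check before executing a script catches the most common mistake the exercises warn about, a placeholder that was never replaced.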

  • 14

    7. Click the tab M31 - SQL Console to select it.

    Type the text, or paste the copied text with ctrl+v; in either case, be sure to change XX to your group number.

    8. Click the tab M31 - PAL C4.5 CREATEDT RDP267XX version1.sql to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    9. Click the tab *M31 - SQL Console to select it.

  • 15

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    Here we define the set of columns that you will train the Decision Tree on. By convention, the final column (FIVEYEARSURVIVAL) is assumed to be the "Dependent Column" of this algorithm and all other columns are assumed to be "Independent".

    10. Click the tab M31 - PAL C4.5 CREATEDT RDP267XX version1.sql to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    11. Click the *M31 - SQL Console tab to select it.
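The column convention described on this page (the final column is the dependent one, all others are independent) can be sketched as follows. Only FIVEYEARSURVIVAL comes from the exercise text; the other column names are invented for illustration.

```python
def split_columns(columns):
    """Split a column list into (independent, dependent) by the last-column convention."""
    if len(columns) < 2:
        raise ValueError("need at least one independent column and one dependent column")
    return columns[:-1], columns[-1]

# Only FIVEYEARSURVIVAL is taken from the exercise; the rest are made-up names.
columns = ["PATIENT_AGE", "TUMOR_STAGE", "BIOMARKER_A", "FIVEYEARSURVIVAL"]

independent, dependent = split_columns(columns)
print("independent:", independent)
print("dependent:", dependent)
```

The practical consequence is that column order in the input table type matters: moving the target column away from the last position would change what the algorithm tries to predict.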

  • 16

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    Here we define the columns of the two output tables that will be populated by this CREATEDT algorithm. One output will be the Decision Tree in JSON format. The other will be the same tree in PMML format.

    12. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    13. Click the *M31 - SQL Console tab to select it.

  • 17

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    And finally, here we define the columns of the generic "Input Control Parameter" table type that is used by every PAL algorithm.

    14. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    15. Click the *M31 - SQL Console tab to select it.

  • 18

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v. In either case, make sure you change XX to your group number.

    Here we define and populate the "Signature table" for this algorithm. You define the 2 input tables and 2 output tables that this particular CREATEDT PAL algorithm expects. These are the table types you created in the script above.

    16. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    17. Click the *M31 - SQL Console tab to select it.
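The signature table described on this page lists the table types the procedure expects, in order, together with their direction. A minimal sketch of that idea follows. The row layout (position, type name, direction) is an assumption about how such a table is typically structured, and all type names are hypothetical placeholders, not the names used in the workshop script.

```python
from collections import Counter

# Assumed row layout: (position, table type name, direction).
# All type names below are hypothetical placeholders.
signature = [
    (1, "RDP267_7.CDT_DATA_T", "in"),
    (2, "RDP267_7.CDT_CONTROL_T", "in"),
    (3, "RDP267_7.CDT_JSON_MODEL_T", "out"),
    (4, "RDP267_7.CDT_PMML_MODEL_T", "out"),
]

def check_signature(rows, expected_in=2, expected_out=2):
    """Check that positions run 1..n and that the in/out counts match what is expected."""
    positions = [position for position, _, _ in rows]
    if positions != list(range(1, len(rows) + 1)):
        return False
    directions = Counter(direction for _, _, direction in rows)
    return directions["in"] == expected_in and directions["out"] == expected_out

print(check_signature(signature))
```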

  • 19

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v. In either case, change XX to your group number.

    Here we ensure that the SYSTEM user has select rights on your Signature table. This is because the AFL wrapper-generation procedure is owned by SYSTEM and runs with definer's rights.

    18. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    19. Click the *M31 - SQL Console tab to select it.

  • 20

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v. In either case, change XX to your group number.

    This part of the script calls the wrapper procedure to create your own PAL CREATEDT procedure.

    20. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    21. Click the *M31 - SQL Console tab to select it.

  • 21

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    Here we create a temporary table for the control parameters to be used during training of this CREATEDT model. The definition of the temporary table is in turn based on the table type definition you established earlier. See the PAL development guide for a full explanation of all parameters: http://help.sap.com/hana/SAP_HANA_Predictive_Analysis_Library_PAL_en.pdf

    22. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    23. Click the *M31 - SQL Console tab to select it.
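The temporary control parameter table described on this page can be pictured as a small name/value table. The sketch below uses SQLite as a stand-in for HANA purely so that it runs anywhere. The four-column layout (NAME, INTARGS, DOUBLEARGS, STRINGARGS) and the table name `CONTROL_TAB` are assumptions for illustration; MIN_NUMS_RECORDS is the one parameter this guide itself mentions, and the full list is in the PAL guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for HANA here

# Assumed layout: one row per parameter, with one typed value column filled in.
conn.execute("""
    CREATE TEMP TABLE CONTROL_TAB (
        NAME       TEXT,
        INTARGS    INTEGER,
        DOUBLEARGS REAL,
        STRINGARGS TEXT
    )
""")

params = [
    ("MIN_NUMS_RECORDS", 100, None, None),  # integer-valued parameter
]
conn.executemany("INSERT INTO CONTROL_TAB VALUES (?, ?, ?, ?)", params)

rows = conn.execute("SELECT NAME, INTARGS FROM CONTROL_TAB").fetchall()
print(rows)
```

Because the table is temporary, it disappears when the session ends, which suits parameters that only matter for a single training run.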

  • 22

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    Here we create the two physical output tables (based on the table type definitions you established earlier in the script).

    24. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    25. Click the *M31 - SQL Console tab to select it.

  • 23

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v. In either case, change XX to your group number.

    Here we define a new DB view that is a subset of the columns available from the column view sap.hhp.fnd/CA_INTERACTIONS_PRED and that matches the input data table type you defined earlier. Then we call your PAL CREATEDT procedure, passing in the two input tables/views it expects.

    26. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    27. Click the *M31 - SQL Console tab to select it.
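The idea of a view that exposes only the columns the input table type expects, in the expected order, can be sketched with SQLite standing in for HANA. The table, view, and column names other than FIVEYEARSURVIVAL are invented for the example and do not reflect the real column view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for HANA here

# Hypothetical wide source with more columns than the training input needs.
conn.execute(
    "CREATE TABLE SOURCE_DATA "
    "(PATIENT_ID INTEGER, AGE INTEGER, NOTES TEXT, FIVEYEARSURVIVAL TEXT)"
)
conn.execute("INSERT INTO SOURCE_DATA VALUES (1, 54, 'free text', 'YES')")

# The view keeps only the training columns, with the dependent column last.
conn.execute(
    "CREATE VIEW TRAINING_INPUT AS "
    "SELECT AGE, FIVEYEARSURVIVAL FROM SOURCE_DATA"
)

cursor = conn.execute("SELECT * FROM TRAINING_INPUT")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
print(view_rows)
```

A view avoids copying data: the procedure reads through it, and the source keeps its extra columns.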

  • 24

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    This part of the script will review both output tables.

    28. Click the Execute icon.

    Note: Upon execution, you may see numerous error messages for the DROP TYPE, DROP TABLE, and DROP VIEW statements. This is expected: the script drops each object first simply as best practice, so on a first run the objects do not exist yet, and these errors are not a problem. If you see other types of errors, review your code and look for discrepancies. In particular, look for places where you missed replacing XX with your group number.

    29. There are several Result tabs. Click the first Result tab to select it. Your content should look similar to the illustration here.
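The drop-first pattern mentioned in the note can be sketched as follows, again with SQLite standing in for HANA. The helper `run_script` and the object name `MY_RESULTS` are hypothetical; the point is only that a failing DROP on a first run is harmless, while any other failure deserves attention.

```python
import sqlite3

def run_script(conn, statements):
    """Run statements in order; a failing DROP is recorded and skipped, anything else is raised."""
    ignored = []
    for statement in statements:
        try:
            conn.execute(statement)
        except sqlite3.OperationalError as error:
            if statement.lstrip().upper().startswith("DROP"):
                ignored.append((statement, str(error)))
            else:
                raise
    return ignored

# Hypothetical object name; SQLite stands in for HANA here.
script = [
    "DROP TABLE MY_RESULTS",
    "CREATE TABLE MY_RESULTS (ID INTEGER)",
]

conn = sqlite3.connect(":memory:")
first_run = run_script(conn, script)   # DROP fails: the table does not exist yet
second_run = run_script(conn, script)  # DROP succeeds: the first run created the table
print(len(first_run), len(second_run))
```

Dropping first makes the script repeatable: you can execute it again after fixing a typo without hitting "already exists" errors.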

  • 25

    30. Click the next Result tab to select it.

    Review the output table. While the JSON and PMML formats are not easily human-readable, there are visualization options on top of this trained model. We can also use the JSON format to predict the outcome for a new patient via a SQL Script procedure, which is what we plan to do later in this TechEd hands-on workshop for SAP HANA and Predictive.

    31. Click the third Result tab to select it.
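To make the idea of predicting from a JSON tree concrete, here is a minimal sketch that walks a tree stored as JSON. The JSON structure, attribute names, and labels are invented for illustration and are not the format PAL actually writes; in the workshop the prediction is done by the PREDICTWITHDT procedure, not by client code.

```python
import json

# Invented tree format: inner nodes name an attribute and map its values to children.
MODEL_JSON = """
{
  "attribute": "TUMOR_STAGE",
  "children": {
    "I":  {"label": "YES"},
    "II": {"attribute": "PROTOCOL",
           "children": {"CAV": {"label": "YES"}, "OTHER": {"label": "NO"}}}
  }
}
"""

def predict(node, record):
    """Follow the branch matching the record's values until a leaf label is reached."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["children"][value]
    return node["label"]

tree = json.loads(MODEL_JSON)
new_patient = {"TUMOR_STAGE": "II", "PROTOCOL": "CAV"}
print(predict(tree, new_patient))
```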

  • 26

    32. Review the content of this Result tab; it should look similar to the depiction shown here.

    This concludes this exercise. Close the open SQL Console tabs to clean up your work area: simply click the X in the upper left of each tab. You can select No in response to the question Save Changes?

  • 27

    Use HANA Studio and SQL Script to run the trained C4.5 Model

    1. Click the M31 (RDP267_XX) system to select it.

    2. Click the SQL Console icon to open an SQL Console again.

    3. Right-click in the blank space in the SQL Console, then choose the Open File... menu item. Alternatively, you can press o.

    Open the PAL C4.5 PREDICTWITHDT XX version1.sql script from the file share.

  • 28

    4. Click the system M31 (RDP267_XX), where XX represents your group number.

    5. Click the SQL Console icon to open another SQL Console.

    This opens a new SQL Console window in which you create your own code for your user and group number. You can choose to type the code manually by looking at the provided script and reproducing each section precisely. Alternatively, you can copy and paste text from the template, but be sure to stop and understand the steps you are taking! In any case, remember to replace XX with your group number.

    6. Click the tab M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

  • 29

    7. Click the M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v. In either case, change XX to your group number.

    This step sets the schema for your user. It also creates the input table type for the data to be predicted.

    8. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.

    Note the comments in the template script. For the JSON model input table to the PREDICTWITHDT PAL algorithm, we can simply use the JSON model output table from the CREATEDT PAL algorithm that you created in the previous exercise in Chapter 1. This is included for informational purposes; no action needs to be taken regarding these comments.

  • 30

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    9. Click the *M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    In this step, we create the table types for the Input Control Parameters table and the Result table.

    10. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    11. Click the *M31 - SQL Console tab to select it.

  • 31

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v. In either case, change XX to your group number.

    Here we create and populate the Signature table and allow select access on it by the SYSTEM user.

    12. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    13. Click the M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v. In either case, change XX to your group number.

    Here we create the PAL PREDICTWITHDT procedure for your user group number.

    14. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.

  • 32

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    15. Click the M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    Here we create and populate a temporary table as the Input Control Parameter table.

    16. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    17. Click the M31 - SQL Console tab to select it.

  • 33

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v.

    This creates the physical output table for the results of the prediction.

    18. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If utilizing copy/paste, select the relevant script text area to be copied and copy with ctrl+c.

    19. Click the *M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with ctrl+v. In either case, change XX to your group number.

    This will call your PAL procedure, passing in some diagnosis and genomic biomarker information from a newly diagnosed patient, and then review the predicted results.

  • 34

    20. Click Execute (F8).

    Note: Upon execution, you may see numerous error messages for the DROP TYPE, DROP TABLE, and DROP VIEW statements. This is expected: the script drops each object first simply as best practice, and these errors are not a problem. If you see other types of errors, review your code and look for discrepancies. In particular, look for places where you missed replacing XX with your group number.

    21. Click the first Result tab to select it.

    The contents should resemble the depiction shown here.

    22. Click the second Result tab to select it.

  • 35

    The contents of your Result tab should resemble the depiction shown here. This prediction suggests that a drug chemotherapy protocol of CAV and a protocol timing of neoadjuvant (before surgery) would give this patient the best chance of 5-year survival, given their diagnosis and biomarker information.

    This concludes this exercise. Close the open SQL Console tabs to clean up your work area: simply click the X in the upper left of each tab. You can select No in response to the question Save Changes?

  • 36

    Use HANA Studio and AFM to create a PAL procedure for C4.5 algorithm

    1. Click the SAP HANA Development button.

    2. Click the Project Explorer tab to select it.

    Right-click in the blank part of the Project Explorer area to invoke the menu.

    3. Click the Project... menu item. You can also press r.

  • 37

    4. Click Project.

    5. Click Next.

  • 38

    6. Enter PROJ_RDP267_XX, where XX represents your student number, in the Project name: field.

    7. Click Next. You can also press Alt+n.

  • 39

    8. Do not select any referenced projects; instead simply click Finish.

    9. Click the Window menu item.

    10. Click the Preferences menu item.

  • 40

    11. Click SAP HANA Development.

    12. Click Repository Access.

    13. Check that the regi location in your preferences matches the illustration shown here. If it does, simply click the Cancel button. If it does not, enter C:\Program Files\sap\hdbclient\regi.exe in the Location: box and click the OK button.

    14.

  • 41

    15. Right-click PROJ_RDP267_XX, where XX refers to your student number.

    16. From the menu, choose Team, then click the Share Project menu item.

    In the first dialog box for Share Project, choose SAP HANA Repository, then click the Next button.

  • 42

    17. Click Add Workspace...

    18. Click M31 (RDP267_XX) Tech Ed 2013, where XX represents your student number.

    19. Enter WS_RDP267_XX, where XX represents your student number, in the Workspace Name: box. Leave the value for Workspace Root as the default.

  • 43

    20. Click Finish.

    21. Select the entry WS_RDP267_XX [M31 (RDP267_XX), coe-he-084.wdf.sap.corp, 10], where XX represents your student number, by clicking it. Do not click the Finish button yet; you still have to select the Repository Package in the next step.

  • 44

    22. Click the Browse... button next to the Repository Package field.

    23. Expand WS_RDP267_XX [M31 (RDP267_XX), coe-he-084.wdf.sap.corp, 10], where XX represents your student number.

    24. Click RDP267.

  • 45

    25. Click OK.

    26. Click Finish. You can also press Alt+f.

    27. Expand your project by clicking the arrow icon next to it.

  • 46

    28. Right-click PROJ_RDP267_XX (where XX represents your student number).

    29. Choose New, then click the Other... menu item (Ctrl+N).

    30. Click AFL Connector File.

    31. Click Next. You can also press Alt+n.

  • 47

    33. Select your project, and enter MY_AFM_CDT in the File name: box. Next, click Finish.

    34. Double-click MY_AFM_CDT.aflpmml.

    35. Click the arrow icon next to the Classification functio .

  • 48


    Select . and drag and drop to the main design panel

    36. Click the icon.

    37. Expand the hierarchy tree as shown under your project; Catalog > SAP_HHP > Tables. Select the CREATEDT_TECHED table.

  • 49

    Drag the table CREATEDT_TECHED to the main area.

    38. On the object for this table, click the Open Data Preview icon.

    39. Click in the area to the right of the scroll bar to scroll to the right.

    40. Click the *MY_AFM_CDT tab to select it.

  • 50

    41. Drag a connecting line from CREATEDT... and release it onto the Training space.

    42. Next, click on the object for JsonModel (upper right). This will launch the Properties tab.

    43. Use the Plus icon (+) on the right-hand side of the Properties tab to add two entries to the Output. Enter these values as shown here.

    44. Click the button.

    Again, make the entries as shown here.

  • 51

    Next, click anywhere in the white space of your model. This will launch the Procedure Properties dialog box below.

    45. Click Open. You can also press Alt+Down Arrow.

    Select your user's schema from the dropdown list.

  • 52

    46. Click the object.

    47. Enter 100 in the (INTEGER) box to set the MIN_NUMS_RECORDS parameter to 100.

    48. Click Save (Ctrl+S).

    49. Right-click MY_AFM_CDT.aflmodel.

  • 53

    50. Click the Activate menu item. You can also press a.

    51. Click RDP267.PROJ_RDP267_XX::MY_AFM_CDT.model, where XX represents your student number.

    Select the Call button at the top right of the AFM screen.

    52. Click OK.

    53. Click the object.

  • 54

    You should see data resembling the content shown in the example here.

    Chapter Summary: In this chapter you learned, through hands-on exercises, how to create and execute PAL algorithms via both SQL Script and AFM.

  • 55

    CHAPTER 2

    In this chapter we will perform hands-on exercises in HANA Studio using the HANA PAL library.

    Estimated time: 30 minutes

    Objective

    Use both SQL Script and the new Application Function Modeler (AFM) within HANA Studio to create and execute PAL procedures.

    What you will learn

    How to create and execute PAL Algorithms via SQL Script

    How to create and execute PAL Algorithms via AFM

    Exercise description

    Use HANA Studio and SQL Script to create a PAL procedure for Outlier algorithm

    Use HANA Studio and AFM to create a PAL procedure for Outlier algorithm

  • 56

    Use HANA Studio and SQL Script to create a PAL procedure for Outlier algorithm

    1. Click M31 (RDP267_XX), where XX represents your student number, to open another SQL Console.

    2. Click the icon for the SQL Console.

  • 57

    3. Right-click the blank space, and from the menu, select the Open File... menu item.

    Open the template script PAL Anomaly Detection RDP267XX version1.sql from the file share. Enter the share location D:\Files\Session\RDP267 in the top field of the Open File dialog box.

    4. Click M31 (RDP267_XX), where XX represents your group number, to open another SQL Console.

    5. Click the icon for the SQL Console.

    This opens a new SQL Console window in which to create your own script for your user and group number. You can either type the text yourself or paste in text from the template. In either case, remember to replace XX with your group number. Note: if you choose to copy and paste, be sure to stop and take the time to understand the copied code!

    6. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

    7. Click the M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V and change XX to your group number.

    Here we set the schema to your user's schema and create the table types that define the input table and the output table for this PAL Anomaly Detection algorithm.

    8. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

    The next block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

    9. Click the *M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V.

    Here we create the generic PAL table type for the input control parameters.

    10. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

    11. Click the *M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V and change XX to your group number.

    Here we create and populate the signature table for this algorithm (which in this case contains 2 input table types and 1 output table type). Here we also grant the SYSTEM user SELECT access on your signature table.

    12. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

    13. Click the *M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V and change XX to your group number.

    Here we are calling the AFL wrapper procedure to create your new PAL procedure.

    14. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

    The next block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

    15. Click the *M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V.

    Here we create and populate the Input Control Parameter table for this algorithm. See the PAL Development Guide for more details.

    16. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

    17. Click the *M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V.

    This step creates the physical output table based on the table type definition established earlier in this script.

    18. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

    19. Click the *M31 - SQL Console tab to select it.

    Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V and change XX to your group number.

    This will call your PAL procedure, passing in data from the mentioned column view.

    20. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

    The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

    21. Click the *M31 - SQL Console tab to select it.

    Paste the copied text with Ctrl+V.

    This block lets you review the output table to see which patients are statistical outliers based on the 4 clusters that were defined in the Input Control Parameters.

    22. Click Execute (F8). You may see some error messages about the Drop Type and Drop Table statements, but you can ignore those errors.

    23. Click the first Result tab to select it.

    Your output should look similar to the depiction shown here.

    24. Click the second Result tab to select it.

    These patients are the statistical outliers based on our model parameters. This may lead to insight and further analysis - e.g. why do some patients live longer than others after Diagnosis, and is their longevity only related to their age at Diagnosis?
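The idea of flagging outliers relative to clusters can be sketched outside of HANA. The following minimal Python illustration is not the PAL implementation: it groups points with a plain k-means loop and reports the points that lie farthest from their nearest cluster centre. The made-up 2-D sample points, the function names, the naive choice of the first k points as starting centres, and the use of 2 clusters instead of the 4 used in the exercise are all assumptions chosen for brevity.

```python
import math


def kmeans(points, k, max_iter=50):
    """Plain k-means; starts from the first k points (a naive, assumed choice)."""
    centroids = list(points[:k])
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def flag_outliers(points, k, n_outliers):
    """Return the n_outliers points farthest from their nearest centroid."""
    centroids = kmeans(points, k)

    def distance_to_nearest(p):
        return min(math.dist(p, c) for c in centroids)

    return sorted(points, key=distance_to_nearest, reverse=True)[:n_outliers]


# Hypothetical data: two tight groups plus one point far away from both.
sample = [(1, 1), (8, 8), (1, 2), (8, 9), (2, 1), (9, 8), (2, 2), (9, 9), (5, 20)]
print(flag_outliers(sample, k=2, n_outliers=1))
```

In this toy data the isolated point is the one reported. With a different starting choice an isolated point can end up as its own cluster centre and go unnoticed, which is one reason the number of clusters and the other control parameters matter.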

    Use HANA Studio and AFM to create a PAL procedure for Outlier algorithm

    Explanation Screenshot

    1. Click on the SAP HANA Development button.

    2. Click the Project Explorer tab to select it.

    3. Right-click PROJ_RDP267_XX (where XX represents your group number). This is the project you created in a previous exercise.

    4. From the menu, select New, and then choose the Other... menu item.

    5. Click AFL Connector File.

    6. Click Next.

    7. Enter MY_AFM_AD in the File name: box.

    8. Click Finish. You can also press Alt+F.

    9. Double-click on MY_AFM_AD.aflpmml.

    10. Open the Clustering group of PAL algorithms by clicking the arrow next to the icon for Clustering.

    Drag Anomaly Detection and drop it in the main space.

    11. Click the plus sign icon.

    12. In the left hand Catalog hierarchy under your project, in the schema SAP_HHP, under Tables, choose the ANOMALIES table and drag it into the main space.

    Drag an arrow from the ANOMALIES icon and drop it on the Data icon.

    13. On the ANOMALIES icon, click the icon for Open Data Preview.

    14. Click the Refresh icon. Change the max rows to 47000 and click Refresh again.

    15. Click the Analysis tab.

    Drag PATIENT_ID and drop it on the Labels axis space.

    Drag DAYS_DIAG_DEATH and drop it on the value axis space.

    Drag AGE_DIAG and drop it on the value axis space.

    16. Click the button for Scatter charts. Review the scatter plot. Notice some outliers.

    17. Click the *MY_AFM_AD tab to select it.

    18. Click the Result icon.

    19. Select the entry by clicking it.

    Select anywhere in the whitespace of your model.

    20. Click Open. You can also press Alt+Down Arrow.

    21. Select the entry RDP267_XX (where XX represents your group number) by clicking it.

    22. Click Save.

    23. In the right-hand hierarchy, select your project and right-click to invoke the menu. Choose Team, then the Activate menu item.

    24. Click on the procedure RDP267.PROJ_RDP267_XX::MY_AFM_AD.model (where XX represents your group number).

    25. Click the SQL tab to select it.

    26. Click the Overview tab to select it.

    Select the Call button at the top right of the AFM screen.

    27. In the Call Procedure Success dialog box, click the OK button.

    28. On the Result icon, select the Open Data Preview icon.

    In the result set, look for the outliers.

    Chapter Summary: In this chapter you learned, through hands-on exercises, how to create and execute PAL algorithms via both SQL Script and AFM.

    CHAPTER 3

    In this chapter we will perform hands-on exercises to run our trained predictive models.

    Estimated time: 20 minutes

    Objective

    The objective of this chapter is to give you an understanding of the fundamentals of the HANA/R connectivity through a real-life example and application of a widely used statistical method.

    What you will learn

    How to use R Studio for the Generalized Linear Model (GLM)

    How to create a HANA SQL Script that calls an R GLM Algorithm

    Exercise description

    Use R Studio to develop a Generalized Linear Model (GLM)

    Use R Studio to develop a Generalized Linear Model

    Explanation Screenshot

    1. Click Start.

    2. From the All Programs menu, expand the RStudio folder and click the RStudio menu item.

    3. Click the Open File icon to open the R script.

    4. Enter the share location D:\Files\Session\RDP267 in the top field of the Open File dialog and click the green arrow button next to it.

    Select the script named "R GLM GROUP XX template scrtipt v1.R", and then click the Open button.

    5. Click the Maximize icon in the upper right of the window showing the script.

    6. Locate the uid (user id) parameter in the script. Replace the XX with your group number.

    7. For the pwd parameter, change the value to Initial1.

    We now want to extract from HANA the data we need for developing statistical models. This requires setting the parameters and also creating the necessary connections. Please use the access data that has been provided to you in this workshop.

    8. Select the region of the script as shown (only items #1. through #5.).

    Next, click the Run button.

    9. In the Workspace tab of the right-hand window, click the item indicated in the screenshot. This opens up the GLM_Analysis dataset so you can view its contents.

    10. In the upper right part of the window containing the result set, click the icon to maximize the window and display the data.

    11. Scroll to the right. Here you can see the variables that were read from SAP HANA. The dataset contains the demographic information of the patient and also the type of cancer with which they have been diagnosed.

    13. Close the tab containing the result set by clicking on the X at the top right of the tab.

    14. Select the region shown here (#6. only) and click the Run button.

    15. In the Workspace tab of the right-hand window, click the item indicated in the screenshot. This will display the result set from the command previously executed.

    16. In the upper right part of the window containing the result set, click the icon to maximize the size of the result set window.

    17. Scroll to the right.

    19. Close the tab after reviewing the data.

    20. Select the region depicted here (a specific part of #7. only) and click the Run button.

    21. Maximize the Console window by clicking the icon in the upper right.

    Generalized Linear Models can be viewed as an extension to Regression Models in that they allow 2 fundamental additions:

    They allow for the error models to be extended beyond the normality assumption.

    They allow for a generic use of categorical variables (as opposed to continuous variables).

    In the current example we are using a binary categorical variable (ONEYEARSURVIVAL) and are modeling it with AGE_DIAG, the age at which a specific patient was diagnosed with cancer. In other words, we are trying to measure the effect that the age at diagnosis has on the probability of someone surviving a year.

    1.) The first section of the output we see here explains the main characteristics of the distribution of the residuals.

    2.) Following it, the estimates of the parameters are provided, each with an estimated standard error, a z value and an associated p-value.

    3.) Then the deviance section and the AIC (Akaike Information Criterion) are given. In theory, the deviance has a chi-squared distribution. The smaller the deviance, the better the model. Similarly, the AIC is a goodness-of-fit statistic that can be used to evaluate the adequacy of the model.

    4.) The number of Fisher scoring iterations is the number of iterations the fitting method needed to meet its convergence criterion and obtain the numerical result.
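To make the quantities in this output more concrete, here is a minimal sketch, in Python rather than R, of fitting a logistic model of a binary outcome on a single predictor by Fisher scoring. It is an illustration under stated assumptions, not the workshop script: the ages and outcomes are invented, the function names are hypothetical, and the stopping rule (relative change in deviance below a tolerance) is an assumed convention. Standard errors and p-values are left out.

```python
import math


def sigmoid(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_deviance(x, y, b0, b1):
    """Deviance for 0/1 outcomes: -2 times the log-likelihood."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -2.0 * total


def fit_logistic(x, y, max_iter=25, tol=1e-8):
    """Fit logit(p) = b0 + b1 * x by Fisher scoring (Newton steps)."""
    b0 = b1 = 0.0
    deviance = binomial_deviance(x, y, b0, b1)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # 2x2 Fisher information matrix and its determinant.
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Update: beta <- beta + inverse(information) * score.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
        new_deviance = binomial_deviance(x, y, b0, b1)
        converged = abs(new_deviance - deviance) / (abs(new_deviance) + 0.1) < tol
        deviance = new_deviance
        if converged:
            break
    return {
        "intercept": b0,
        "slope": b1,
        "deviance": deviance,
        "aic": deviance + 2 * 2,  # two estimated parameters
        "iterations": iterations,
    }


# Hypothetical ages at diagnosis and one-year survival flags (1 = survived).
ages = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
survived = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
print(fit_logistic(ages, survived))
```

For 0/1 outcomes the deviance equals minus twice the log-likelihood, which is why the AIC here is simply the deviance plus twice the number of estimated parameters.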

    22. For the generation of predictions, the function "predict" is used, indicating with which model the values are to be created. In the window showing the script, select the region depicted here (just that one line) and click "Run".

    23. In the Console window, type in the text View(predicted1) and press the Enter key. This allows you to view the predicted values.

    24. The window in the top left shows the result set. Maximize it by clicking the maximize icon.
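As a rough illustration of what such a prediction step computes for a logistic model, the sketch below (Python, with hypothetical coefficient values and a hypothetical function name) turns an intercept and slope into predicted values for a list of ages. In R, `predict` on a `glm` object returns values on the link (log-odds) scale by default, and `type = "response"` returns probabilities; the `scale` argument here mimics that distinction.

```python
import math


def predict_survival(intercept, slope, ages, scale="response"):
    """Predict from a logistic model: log-odds ("link") or probability ("response")."""
    linear = [intercept + slope * age for age in ages]
    if scale == "link":
        return linear
    if scale == "response":
        return [1.0 / (1.0 + math.exp(-eta)) for eta in linear]
    raise ValueError("scale must be 'link' or 'response'")


# Hypothetical coefficients, not taken from the workshop model.
for age, prob in zip([45, 60, 75], predict_survival(4.0, -0.06, [45, 60, 75])):
    print(age, round(prob, 3))
```

With a negative slope, the predicted probability of surviving one year falls as the age at diagnosis rises.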

    25. Use the vertical scroll bar to display the desired screen area.

    26. Click the icon indicated in the screenshot. After viewing the values, please close the tab.

    There are four more models provided for you to test (model2 through model5). You are welcome to test these using the techniques described previously, and are invited to attempt to interpret the results.

    Now that we have developed the model in RStudio, we want to use it in HANA, so we are going to develop an R procedure that is directly embedded and executable in HANA. For that purpose:

    29. Go back into the SAP HANA Studio. Click the button in the upper left for the Modeler perspective. In the SAP HANA Systems area in the upper left, right-click your system connection M31 (RDP267_XX) TechEd, where XX represents your group number.

    30. From the menu, choose the SQL Console menu item.

    31. Click the File menu item.

    32. Click the Open File... menu item.

    Enter the share location in the field at the top of the dialog box, and hit the green arrow button: D:\Files\Session\RDP267

    Select the file GLM scoring function calling R from HANA template XX.sql

    33. Click Open.

    34. This is the code you will see.

    35. Now copy the first section into the SQL console you opened before.

    36. Replace all XX with your user ID. Run the code. Here you are creating the table with which you will create the models.

    37. Go back to the code and select the next part and copy. Go back to your SQL console.

    38. Paste it. Replace all XX with your user ID and run the code. Here you will be creating the data of patients that are going to be scored with the created model. The new table will also contain the prediction.

    39. Go back to the code and copy the part where the R procedure is created. Go back to your SQL console.

    40. Paste the code and replace all XX with your user ID. Now run it. Here you are creating the R procedure. Note that we took the code we created in RStudio to perform this action.

    41. Finally, go back to the code and copy the last part. Return to your SQL console and paste it.

    42. Replace all XX with your user ID and run the code.

    43. Scroll to the right of the results to see the predicted values.
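Several of the steps above ask you to replace every XX in a template script before running it. If you prefer not to edit by hand, the substitution can be sketched in a few lines of Python. This is only an illustration: the helper name and the two template lines are hypothetical, and a blanket replacement assumes that XX appears in the script only as the placeholder.

```python
def fill_placeholder(script_text, group_number, placeholder="XX"):
    """Return script_text with every placeholder replaced by the group number."""
    if not group_number.isdigit():
        raise ValueError("group number must consist of digits only")
    return script_text.replace(placeholder, group_number)


# Hypothetical template lines, not copied from the workshop scripts.
template = "SET SCHEMA RDP267_XX;\nSELECT * FROM RDP267_XX.MY_TABLE;"
print(fill_placeholder(template, "07"))
```

Review the result before running it, since the replacement is purely textual.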

    Chapter Summary: In this chapter you learned the fundamentals of the HANA/R connectivity through a real-life hands-on example and application of Generalized Linear Models, a widely used statistical method.

    Thank you for participating in this SAP TechEd Virtual Hands-On Workshop! Please take a few minutes to answer a couple of feedback questions concerning your session.

    Find a shortcut to the survey on the desktop of your virtual laptop image or visit https://www.sapsurvey.com/cgi-bin/qwebcorporate.dll?idx=FSQCZ7

    2013 by SAP AG. All rights reserved. SAP and the SAP logo are registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company. Sybase and the Sybase logo are registered trademarks of Sybase Inc. Sybase is an SAP company. Crossgate is a registered trademark of Crossgate AG in Germany and other countries. Crossgate is an SAP company.
