
Data Modelling in SAS

How SAS is Used for Research and Teaching to Enable Students to Become More Marketable

Iveta Stankovičová
Comenius University, Faculty of Management
Bratislava, Slovakia
iveta.stankovicova@fm.uniba.sk


Data

• The current age is characterised by an information explosion
• Data are generated:
  - for research purposes (historically, for data analysis): experimental data
  - as operational data (today, in business): opportunistic data (Huber 1977)


Data

              Experimental Data      Opportunistic Data
Purpose       Research               Operational
Value         Scientific             Commercial
Generation    Actively controlled    Passively observed
Size          Small                  Massive
Hygiene       Clean                  Dirty
State         Static                 Dynamic


Data → Information

• Information must be obtained from massive amounts of operational data to support managers' decision making (business decision support)
• Relationships in the data must be explored and modelled: predictive modelling (the fundamental task)
• Data Modelling = Data Mining (ca. 1963)


Data Mining - Definition

• The process of selecting, exploring and modelling large volumes of data in order to detect previously unknown information patterns that give an advantage in a competitive environment
• Uses statistical methods and further methods bordering on artificial intelligence
• Multidisciplinary lineage


Data Mining - SAS definition

• Advanced methods for exploring and modelling relationships in large amounts of data

Characteristics:
1. data - massive, operational, opportunistic
2. users and sponsors - non-researchers, business oriented
3. methodology - multidisciplinary, via computer


Data Mining - Analytical tools

• Statistics
• Artificial intelligence (AI)
• Knowledge discovery in databases (KDD)
• Machine learning
• Pattern recognition methodology
• Neurocomputing


Data Mining - Steps, Cycle

1. Identifying the business problem
2. Transforming data into actionable results
3. Acting on the results achieved
4. Measuring the results

[Diagram: steps 1 to 4 arranged as a cycle]


Data Mining - Activities

• Classification
• Affinity grouping or association rules
• Clustering, segmentation
• Estimation
• Prediction
• Description and visualization


Data Mining - People

• Domain experts
• Data experts
• Analytical experts


Data Mining - Processes

1. Building the model
   • historical data:
     1. training
     2. test
     3. validation
2. Applying the model
   • new data
   • prediction

[Diagram: Data Mining System, with elements labelled Algorithm, Training, Test, Model, Score Model, Results, Eval and Prediction]
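
The two processes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the 60/20/20 split is an arbitrary choice, and the "algorithm" is a toy cut-off rule rather than anything SAS provides. The function names (partition, fit_threshold, score, accuracy) are hypothetical.

```python
import random


def partition(rows, fractions=(0.6, 0.2), seed=0):
    """Shuffle historical rows and split them into training, test and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_test = round(len(shuffled) * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


def fit_threshold(train):
    """Toy 'algorithm': cut-off halfway between the two class means of the input."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def score(threshold, inputs):
    """Apply the model: predict 1 above the cut-off, 0 otherwise."""
    return [1 if x > threshold else 0 for x in inputs]


def accuracy(threshold, rows):
    """Share of rows whose known outcome matches the prediction."""
    predictions = score(threshold, [x for x, _ in rows])
    return sum(p == y for p, (_, y) in zip(predictions, rows)) / len(rows)


# Hypothetical historical data: (input value, known outcome)
historical = [(float(x), 1 if x >= 50 else 0) for x in range(100)]

# 1. Building the model on historical data
train, test, validation = partition(historical)
model = fit_threshold(train)
test_accuracy = accuracy(model, test)
validation_accuracy = accuracy(model, validation)

# 2. Applying the model to new data to obtain predictions
predictions = score(model, [5.0, 95.0])
```

The held-out test and validation sets are never used for fitting, so their accuracy gives a fairer picture of how the model will behave on new data than the training accuracy would.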


Data Mining - Practice

1. Goal definition
2. Selection of data sources
3. Preparation of data for modelling
4. Selection and transformation of variables
5. Processing and evaluation of the model
6. Model verification
7. Implementation and model maintenance


Data Mining - SAS solution

SEMMA methodology:
1. Sample - identify input data sets, sample from a large data set (training, test and validation data sets)
2. Explore - explore the data set statistically and graphically
3. Modify - prepare the data for analysis (data manipulation and transformation)
4. Model - fit a predictive model
5. Assess - compare competing models
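
The five SEMMA steps can be mimicked outside SAS with a short Python sketch. Everything here is an assumption made for illustration: the synthetic data, the function names and the two competing models (a mean-only baseline and a least-squares line). It is not how SAS Enterprise Miner implements the steps.

```python
import random
import statistics


def sample(rows, n, seed=0):
    """Sample: draw n rows at random from a large data set."""
    return random.Random(seed).sample(rows, n)


def explore(rows):
    """Explore: summary statistics of the input variable."""
    xs = [x for x, _ in rows]
    return {"n": len(xs), "mean": statistics.mean(xs), "stdev": statistics.stdev(xs)}


def modify(rows, summary):
    """Modify: standardise the input using the explored mean and standard deviation."""
    return [((x - summary["mean"]) / summary["stdev"], y) for x, y in rows]


def fit_mean(rows):
    """Model A (baseline): always predict the mean response."""
    mean_y = statistics.mean([y for _, y in rows])
    return lambda x: mean_y


def fit_line(rows):
    """Model B: least-squares line y = intercept + slope * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def assess(model, rows):
    """Assess: mean squared error on rows not used for fitting."""
    return statistics.mean([(model(x) - y) ** 2 for x, y in rows])


# Hypothetical "large" data set: y depends linearly on x, plus noise
rng = random.Random(1)
population = []
for _ in range(1000):
    x = rng.uniform(0, 100)
    population.append((x, 3.0 + 2.0 * x + rng.gauss(0, 1)))

drawn = sample(population, 200)                       # Sample
train_raw, holdout_raw = drawn[:150], drawn[150:]
summary = explore(train_raw)                          # Explore
train = modify(train_raw, summary)                    # Modify
holdout = modify(holdout_raw, summary)
errors = {"mean": assess(fit_mean(train), holdout),   # Model + Assess
          "line": assess(fit_line(train), holdout)}
best = min(errors, key=errors.get)
```

The holdout rows are standardised with the training summary, so no information from the assessment data leaks into the Modify or Model steps.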


Data Mining - Methods

• Statistical methods - linear and logistic regression, multidimensional methods, time series analysis ...
• Non-statistical methods - neural networks, genetic algorithms ...
• Mixed methods - classification and regression trees ...
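
As one concrete instance of the statistical methods listed above, the following Python sketch fits a logistic regression with a single input by plain gradient descent. The data, the learning rate and the number of steps are assumptions chosen for illustration, and fit_logistic is a hypothetical helper; in SAS such a model would normally be fitted with a dedicated procedure rather than hand-written code.

```python
import math


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b + w * (x - mean_x)))) by gradient descent."""
    mean_x = sum(xs) / len(xs)
    centred = [x - mean_x for x in xs]  # centring keeps the descent well behaved
    w, b = 0.0, 0.0
    for _ in range(steps):
        probs = [1.0 / (1.0 + math.exp(-(b + w * x))) for x in centred]
        # Gradient of the average negative log-likelihood
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, centred)) / len(xs)
        grad_b = sum(p - y for p, y in zip(probs, ys)) / len(xs)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    def predict(x):
        """Estimated probability that the outcome is 1 for input x."""
        return 1.0 / (1.0 + math.exp(-(b + w * (x - mean_x))))

    return predict


# Hypothetical data: the outcome tends to be 1 for larger x, with some overlap
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
predict = fit_logistic(xs, ys)
low, high = predict(0), predict(7)
```

The fitted function returns a probability between 0 and 1, which can be turned into a class label by comparing it with a cut-off such as 0.5.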


SAS System at Comenius University Bratislava (CU)

• November 1999 - a licence contract was signed between CU Bratislava and SAS Institute GmbH for the provision of 50 licences of the SAS System
• November 2001 - addition of Enterprise Guide to the licence contract


SAS System at Faculty of Management Bratislava (FM)

• Faculty of Management - 25 licences
• SAS teaching began (V6.12) in the summer term of academic year 1999/2000
• Currently - SAS V8.2 and Enterprise Guide V2.0


Subjects of Statistics

3 compulsory subjects:
• Introduction to Statistics
  - (1st year, summer term - 4 hours/week)
• Statistics on PC
  - (2nd year, winter term - 2 hours/week)
• Statistical Methods
  - (2nd year, summer term - 4 hours/week)

1 elective subject:
• Quantitative Methods (in SAS System)
  - (3rd year, summer term - 2 hours/week)


Subjects contents

Contents of compulsory subjects:
  - mathematical statistics methods included in the basic modules (SAS/BASE, SAS/STAT, SAS/ETS)

Contents of elective subject:
  - logistic regression, principal components analysis (PCA), cluster analysis, factor analysis, discriminant analysis (SAS/STAT, EG)


SAS System - offered in Menu

• Overview of modules and applications of SAS System V8.2 for creating statistical analyses in menu mode (knowledge of SAS code is not required)

SAS/ASSIST software
SAS/INSIGHT software
SAS Analyst
SAS/Enterprise Guide


Activities

Outputs from SAS education:
• Projects - output from each subject
• Student Research Activity Competition - 3rd year, ca. 15 works per year
• Thesis works
  - information system (module AF)
  - data analysis (modules BASE, STAT, QC, ...)
  - Scorecard (Enterprise Guide, Enterprise Miner)
• SAS Forum conference - participation of teachers and students


Plans

Planned extension of SAS use to the following subjects:

• Multidimensional Methods of Analysis
• Time Series Analysis
• Marketing Research
• Data Mining
• Financial Analysis
• Quality Control
• Operational Management


Thanks for your attention!