Data Modelling in SAS
How SAS is Used for Research and Teaching to Enable Students to Become More Marketable
Iveta Stankovičová
Comenius University, Faculty of Management
Bratislava, Slovakia
iveta.stankovicova@fm.uniba.sk
Data
- The current age is characterized by an information explosion
- Data are generated:
  - for research purposes (historically, for data analysis): experimental data
  - as operational data (today, in business): opportunistic data (Huber 1977)
Data
                 Experimental data      Opportunistic data
Purpose          Research               Operational
Value            Scientific             Commercial
Generation       Actively controlled    Passively observed
Size             Small                  Massive
Hygiene          Clean                  Dirty
State            Static                 Dynamic
From Data to Information
- Managers need information extracted from massive amounts of operational data for decision making (business decision support)
- Relationships in the data must be explored and modelled: predictive modelling (the fundamental task)
- Data modelling = data mining (ca. 1963)
Data Mining - Definition
- A process of selecting, exploring and modelling large volumes of data in order to detect previously unknown patterns of information that give an advantage in a competitive environment
- Uses statistical methods and further methods bordering on artificial intelligence
- Multidisciplinary lineage
Data Mining - SAS definition
- Advanced methods for exploring and modelling relationships in large amounts of data
Characteristics:
1. data: massive, operational, opportunistic
2. users and sponsors: non-researchers, business oriented
3. methodology: multidisciplinary, computer-based
Data Mining - Analytical tools
- Statistics
- Artificial intelligence (AI)
- Knowledge discovery in databases (KDD)
- Machine learning
- Pattern recognition methodology
- Neurocomputing
Data Mining - Steps, Cycle
1. Identifying the business problem
2. Transforming data into actionable results
3. Acting according to the achieved results
4. Measuring the results
(Figure: steps 1 to 4 drawn as a closed cycle.)
Data Mining - Activities
- Classification
- Affinity grouping or association rules
- Clustering, segmentation
- Estimation
- Prediction
- Description and visualization
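Affinity grouping is usually quantified with two measures: the support of an itemset (the share of transactions that contain it) and the confidence of a rule A => C (the support of A and C together divided by the support of A). A minimal Python sketch under stated assumptions: the baskets are hypothetical, and the function names `support` and `confidence` are illustrative, not part of any SAS product.

```python
from typing import FrozenSet, List, Set


def support(baskets: List[Set[str]], items: FrozenSet[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    hits = sum(1 for basket in baskets if items <= basket)
    return hits / len(baskets)


def confidence(baskets: List[Set[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0  # rule never applies in this data
    return support(baskets, antecedent | consequent) / base


# Hypothetical transaction data (illustrative only)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

rule_a = frozenset({"bread"})
rule_c = frozenset({"milk"})
print(support(baskets, rule_a | rule_c))    # 0.5
print(confidence(baskets, rule_a, rule_c))  # about 0.667
```

A rule such as "bread => milk" is only interesting when both numbers are reasonably high; real association-rule tools search all itemsets above a minimum support rather than checking one rule by hand.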
Data Mining - People
- Domain experts
- Data experts
- Analytical experts
Data Mining - Processes
1. Model making
   - historical data:
     1. training
     2. test
     3. validation
2. Applying the model
   - new data
   - prediction
(Figure: data mining system. Training and test data pass through the algorithm (training, evaluation) to build the model; the model then scores new data to give prediction results.)
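The two processes, model making on historical data and applying the model to new data, can be sketched end to end in Python. Assumptions, none of which come from the talk: the historical data are a hypothetical single-predictor sample, the "algorithm" is an ordinary least-squares line, and the 60/20/20 split into training, test and validation sets is an illustrative choice.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pair from the historical data
Model = Tuple[float, float]  # (intercept a, slope b)


def partition(data: List[Point], seed: int = 1) -> Tuple[List[Point], List[Point], List[Point]]:
    """Shuffle, then split 60/20/20 into training, test and validation sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 6 // 10
    n_test = len(rows) * 2 // 10
    return rows[:n_train], rows[n_train:n_train + n_test], rows[n_train + n_test:]


def fit_line(train: List[Point]) -> Model:
    """Ordinary least squares for y = a + b*x, fitted on the training set only."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def score(model: Model, xs: List[float]) -> List[float]:
    """Apply the fitted model to (new) inputs."""
    a, b = model
    return [a + b * x for x in xs]


def mse(model: Model, rows: List[Point]) -> float:
    """Mean squared prediction error on a held-out set."""
    preds = score(model, [x for x, _ in rows])
    return sum((p - y) ** 2 for p, (_, y) in zip(preds, rows)) / len(rows)


# Hypothetical historical data: y = 2 + 3x plus small noise
rng = random.Random(0)
history = [(float(x), 2 + 3 * x + rng.gauss(0, 0.5)) for x in range(50)]

train, test, valid = partition(history)
model = fit_line(train)                  # 1. model making (training)
print("test MSE:", mse(model, test))     #    evaluation on held-out data
print("validation MSE:", mse(model, valid))
print(score(model, [60.0, 61.0]))        # 2. apply model to new data -> predictions
```

Fitting only on the training set and measuring error on data the model has not seen is what exposes overfitting. Conventions differ on which of the two held-out sets is used for tuning and which for the final check.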
Data Mining - Practice
1. Goal definition
2. Selection of data sources
3. Preparation of data for modelling
4. Selection and transformation of variables
5. Building and evaluation of the model
6. Model verification
7. Implementation and model maintenance
Data Mining - SAS solution
SEMMA methodology:
1. Sample - identify input data sets; sample from a large data set (training, test and validation data sets)
2. Explore - explore the data set statistically and graphically
3. Modify - prepare the data for analysis (data manipulation and transformation)
4. Model - fit a predictive model
5. Assess - compare competing models
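The Explore and Modify steps can be illustrated in a few lines: descriptive statistics for one variable, followed by mean imputation of missing values and z-score standardization. This is a minimal plain-Python sketch, not SAS code; it assumes a hypothetical numeric variable with missing values coded as `None`, and the function names `explore` and `modify` are illustrative.

```python
import statistics
from typing import List, Optional


def explore(values: List[Optional[float]]) -> dict:
    """Explore: basic descriptive statistics, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


def modify(values: List[Optional[float]]) -> List[float]:
    """Modify: impute missing values with the mean, then convert to z-scores."""
    mean = explore(values)["mean"]
    filled = [mean if v is None else v for v in values]
    mu = statistics.mean(filled)
    sd = statistics.pstdev(filled)  # population SD, so the z-scores have SD exactly 1
    return [(v - mu) / sd for v in filled]


# Hypothetical variable with two missing values
income = [1200.0, None, 950.0, 1800.0, 1100.0, None, 1450.0]
print(explore(income))
z = modify(income)
print([round(v, 2) for v in z])
```

Mean imputation keeps the variable's mean unchanged but shrinks its spread; it is used here only because it is the simplest transformation to show.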
Data Mining - Methods
- Statistical methods - linear and logistic regression, multivariate methods, time series analysis ...
- Non-statistical methods - neural networks, genetic algorithms ...
- Mixed methods - classification and regression trees ...
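Logistic regression, one of the statistical methods listed, models P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x))) and estimates b0 and b1 by maximum likelihood. In SAS/STAT this is done with PROC LOGISTIC; the sketch below instead uses plain batch gradient ascent on the log-likelihood, a simpler swapped-in optimizer chosen only for readability. The data (months as a customer vs. whether the customer renewed) and the learning-rate and step settings are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: List[float], ys: List[int],
                 lr: float = 0.1, steps: int = 20000) -> Tuple[float, float]:
    """Maximize the log-likelihood of P(y=1|x) = sigmoid(b0 + b1*x) by batch gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # residual on the probability scale
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data: x = months as a customer, y = 1 if the customer renewed
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(b0, b1)
print(sigmoid(b0 + b1 * 11))  # predicted renewal probability for a new customer
```

A quick sanity check: at the maximum-likelihood estimate of a model with an intercept, the fitted probabilities sum to the observed number of 1s (here 5). The data deliberately overlap between classes; with perfectly separable data the estimates would not converge.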
SAS System at Comenius University Bratislava (CU)
- November 1999: a licence contract was signed between CU Bratislava and SAS Institute GmbH providing 50 licences of the SAS System
- November 2001: the licence contract was extended with Enterprise Guide
SAS System at the Faculty of Management Bratislava (FM)
- Faculty of Management: 25 licences
- SAS teaching began (V6.12) in the summer term of academic year 1999/2000
- Currently: SAS V8.2 and Enterprise Guide V2.0
Statistics Subjects
3 compulsory subjects:
- Introduction to Statistics (1st year, summer term, 4 hours/week)
- Statistics on PC (2nd year, winter term, 2 hours/week)
- Statistical Methods (2nd year, summer term, 4 hours/week)
1 elective subject:
- Quantitative Methods in the SAS System (3rd year, summer term, 2 hours/week)
Subject Contents
Contents of the compulsory subjects:
- methods of mathematical statistics included in the basic modules (SAS/BASE, SAS/STAT, SAS/ETS)
Contents of the elective subject:
- logistic regression, principal component analysis (PCA), cluster analysis, factor analysis, discriminant analysis (SAS/STAT, EG)
SAS System - menu-driven tools
- Overview of the SAS System V8.2 modules and applications for creating statistical analyses in menu mode (knowledge of SAS code is not required):
  - SAS/ASSIST software
  - SAS/INSIGHT software
  - SAS Analyst
  - SAS Enterprise Guide
Activities
Outputs from SAS education:
- Projects: an output from each subject
- Student Research Activity Competition: 3rd year, ca. 15 works per year
- Theses:
  - information systems (module AF)
  - data analysis (modules BASE, STAT, QC, ...)
  - scorecards (Enterprise Guide, Enterprise Miner)
- SAS Forum conference: participation of teachers and students
Plans
Extending the use of SAS to the following subjects:
- Multidimensional Methods of Analysis
- Time Series Analysis
- Marketing Research
- Data Mining
- Financial Analysis
- Quality Control
- Operational Management
Thanks for your attention!