Not used
Topic (Source), with subtopics indented beneath

Business Problems and Data Science Solutions (O'Reilly - Data Science for Business)
  From Business Problems to Data Mining Tasks
  Supervised Versus Unsupervised Methods
  Data Mining and Its Results
  The Data Mining Process
  Implications for Managing the Data Science Team

The data science process (Manning - Practical Data Science with R)
  The roles in a data science project
  Stages of a data science project
  Setting expectations

Loading data into R (Manning - Practical Data Science with R)
  Working with data from files
  Working with well-structured data from files or URLs
  Using R on less-structured data
  Working with relational databases
  A production-size example

Exploring data (Manning - Practical Data Science with R)
  Using summary statistics to spot problems
  Spotting problems using graphics and visualization
  Visually checking distributions for a single variable
  Visually checking relationships between two variables

Managing data (Manning - Practical Data Science with R)
  Cleaning data
  Data transformations
  Sampling for modeling and validation
  Creating a sample group column

Data Mining Patterns (PACKT - R for Data Science)
  Cluster analysis
  Anomaly detection
  Association rules

Data Mining Sequences (PACKT - R for Data Science)
  Finding frequent items in a dataset
  Mining the Agrawal data for frequent sets
  Determining sequences in training and careers
  Similarities in the sequence

Text Mining (PACKT - R for Data Science)
  Using text processing
  Using text clusters
  Analyzing the XML text

Data Analysis - Regression Analysis (PACKT - R for Data Science)
  Using simple regression
  Using multiple regression
  Using multivariate regression analysis
  Using robust regression

Data Analysis - Correlation (PACKT - R for Data Science)
  Visualizing correlations
  Covariance
  Pearson correlation
  Polychoric correlation
  Tetrachoric correlation
  A heterogeneous correlation matrix

Data Analysis - Clustering (PACKT - R for Data Science)
  K-means clustering
  Selecting clusters based on Bayesian information
  Affinity propagation clustering
  Hierarchical clustering

Data Visualization - R Graphics (PACKT - R for Data Science)
  Interactive graphics
  Using the latticist package
  Using the ggplot2 package

Data Visualization - Plotting (PACKT - R for Data Science)
  Using scatter plots
  Using scatterplot matrices
  Using bar charts and plots

Data Visualization - 3D (PACKT - R for Data Science)
  Generating 3D graphics
  Lattice cloud 3D scatterplot
  Using scatter3d
  Using RgoogleMaps
  Big data with R

Used (HS)
Chapter, then Topic (Source), with subtopics indented beneath

Chapter: Business Problems & Data Science Solutions
(Source: O'Reilly - Data Science for Business; give a short introduction from here)
  The Data Science Process (Manning - Practical Data Science with R)
    The roles in a data science project
    Stages of a data science project
    Setting expectations
  Implications for managing the data science team (O'Reilly - Data Science for Business)

Chapter: R Package for Data Science
  Loading Data into R (Manning - Practical Data Science with R)
    Working with data from files
    Working with well-structured data from files or URLs
    Using R on less-structured data
    Working with relational databases
    A production-size example
  Exploring Data (Manning - Practical Data Science with R)
    Using summary statistics to spot problems
    Spotting problems using graphics and visualization
    Visually checking distributions for a single variable
    Visually checking relationships between two variables
  Managing Data (Manning - Practical Data Science with R)
    Cleaning data
    Data transformations
    Sampling for modeling and validation
    Creating a sample group column

Chapter: Data Mining with R
  Data Mining and Its Results (O'Reilly - Data Science for Business)
    Data Mining Tasks
    The Data Mining Process
  Data Mining Patterns (PACKT - R for Data Science)
    Cluster analysis
    Anomaly detection
    Association rules
  Data Mining Sequences (PACKT - R for Data Science)
    Finding frequent items in a dataset
    Mining the Agrawal data for frequent sets
    Determining sequences in training and careers
    Similarities in the sequence
  Text Mining (PACKT - R for Data Science)
    Using text processing
    Using text clusters
    Analyzing the XML text

Chapter: Data Analysis with R
  Data Analysis - Regression Analysis (PACKT - R for Data Science)
    Using simple regression
    Using multiple regression
    Using multivariate regression analysis
    Using robust regression
  Data Analysis - Correlation (PACKT - R for Data Science)
    Visualizing correlations
    Covariance
    Pearson correlation
    Polychoric correlation
    Tetrachoric correlation
    A heterogeneous correlation matrix
  Data Analysis - Clustering (PACKT - R for Data Science)
    K-means clustering
    Selecting clusters based on Bayesian information
    Affinity propagation clustering
    Hierarchical clustering

Chapter: Data Visualization with R
  Data Visualization - R Graphics (PACKT - R for Data Science)
    Interactive graphics
    Using the latticist package
    Using the ggplot2 package
  Data Visualization - Plotting (PACKT - R for Data Science)
    Using scatter plots
    Using scatterplot matrices
    Using bar charts and plots
  Data Visualization - 3D (PACKT - R for Data Science)
    Generating 3D graphics
    Lattice cloud 3D scatterplot
    Using scatter3D
    Using RgoogleMaps
    Big data with R

Examples
Example (number of pages), with contents indented beneath

Reading Data into R (21 pages)
  Reading from CSV
  Viewing the Data
  Identifying Missing Values
  Specifying Data Types
  Reading Microsoft Excel Spreadsheets
  Writing to CSV
  Saving Rdata
  Data from Internet Documents
  Data from Google Drive

Summarising Data (11 pages)
  Load the Data
  Dataset Indexing
  Textual Summaries
  PlyR: Summarise per Group to new Data Frame
  PlyR: Summarise per Group to Original Data Frame
  PlyR: Select One Observation Per Group

Exploring Data with GGPlot2 (63 pages)
  Preparing the Dataset
  Collecting Information
  Scatter Plot
  Adding a Smooth Fitted Curve
  Histograms and Bar Charts
  Density Distributions
  Box Plot Distributions
  Cumulative Distribution Plot
  Parallel Coordinates Plot

Transform and Manipulate Data (17 pages)
  Drop Unused Levels
  Reorder Levels
  Add a Column
  Transform Using DPlyR
  Summarise Data Using dplyr()
  Removing Columns

Cluster Analysis (57 pages)
  Introducing Cluster Analysis
  Load Weather Dataset for Modelling
  Distance Calculation
  K-Means Basics
  Animate Cluster Building
  Visualise the Cluster: Radial Plot Using GGPlot2

Association Rules (15 pages)
  The last.fm Dataset
  Understanding the Algorithm: Sample Dataset
  The arules package

Decision Trees (87 pages)
  Load Example Weather Dataset
  Summary of the Weather Dataset
  Build Tree to Predict RainTomorrow
  Visualise Decision Trees

Text Mining (41 pages)
  Loading a Corpus
  Exploring the Corpus
  Preparing the Corpus
  Creating a Document Term Matrix
  Exploring the Document Term Matrix
  Conversion to Matrix and Save to CSV
  Plotting Word Frequencies

Multivariate Adaptive Regression Splines (9 pages)
  Load and Configure
  Variables to Ignore
  Clean and Finalise
  Build Model
  Evaluate Model with Error Matrix

Social Network Analysis
  Load Data
  Transform Data into an Adjacency Matrix
  Build a Graph
  Plot the Graph

Dealing with Big Data (15 pages)
  Loading Big Data from CSV
  Efficient Data Manipulation with Dplyr

Case Studies
Each case study lists its source, contents, and related syllabus topics.

Case Study: Australian Government's open data web site
  Source: Togaware
  Contents:
    ATO Web Analytics
    ATO Entry Pages
    ATO Browser Data
    ATO Top 100 Keywords
  Topics:
    Cleaning data
    Using the ggplot2 package

Case Study: Driving Visual Analysis with Automobile Data with R
  Source: Cookbook
  Contents:
    Acquiring automobile fuel efficiency data
    Importing automobile fuel efficiency data into R
    Exploring and describing fuel efficiency data
    Analyzing automobile fuel efficiency over time
    Investigating the makes and models of automobiles
  Topics:
    Exploring Data
    Using the ggplot2 package

Case Study: Simulating American Football Data with R
  Source: Cookbook
  Contents:
    Acquiring and cleaning football data
    Analyzing and understanding football data
    Constructing indexes to measure offensive and defensive strength
    Simulating a single game with outcomes decided by calculations
    Simulating multiple games with outcomes decided by calculations
  Topics:
    Data transformations
    Visualizing correlations
    Using the ggplot2 package
    Using scatter plots

Case Study: Modeling Stock Market Data with R
  Source: Cookbook
  Contents:
    Summarizing the data
    Cleaning and exploring the data
    Generating relative valuations
    Screening stocks and analyzing historical prices
  Topics:
    Analyzing the XML text
    Exploring Data
    Cleaning data
    Using the ggplot2 package

Case Study: Visually Exploring Employment Data with R
  Source: Cookbook
  Contents:
    Importing employment data into R
    Exploring the employment data
    Obtaining and merging additional data
    Adding geographical information
    Visualizing geographical distributions of pay
    Animating maps for a geospatial time series
  Topics:
    Data transformations
    Anomaly detection
    Using the ggplot2 package

Case Study: Analyzing supercar data
  Source:
    http://www.sharpsightlabs.com/data-analysis-example-r-supercars-part1/
    http://www.r-bloggers.com/data-analysis-example-with-ggplot-and-dplyr-analyzing-supercar-data-part-2/
  Contents:
    Get the data
    Examine the data
    Remove duplicate records
    Data Exploration with ggplot2 and dplyr
  Topics:
    Data Exploration
    Cleaning data
    Using the ggplot2 package

Case Study: Plotting the Iris Data
  Source: http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/iris_plots/
  Contents:
    Load the Data
    Simple Scatter Plots
    Pairs Scatter Plots
  Topics:
    Data Exploration
    Using scatter plots

Case Study: Classifying Emails as Spam or Ham using RTextTools
  Source: http://www.r-bloggers.com/classifying-emails-as-spam-or-ham-using-rtexttools/
  Contents:
    Obtaining the data and loading it into R
    Split the data into train/test sets
    Build the model
    Comparing the model to the Benchmark
  Topics:
    Data transformations
    Using Text processing

Case Study: R Data Analysis Examples: Canonical Correlation Analysis
  Source: http://www.ats.ucla.edu/stat/r/dae/canonical.htm
  Contents:
    Description of the data
    Canonical correlation analysis
    R Canonical Correlation Analysis
    Sample Write-Up of the Analysis
  Topics:
    Data Exploration
    Canonical Correlation

Case Study: Market Basket Analysis with R
  Source: http://www.salemmarafi.com/code/market-basket-analysis-with-r/comment-page-1/
  Contents:
    Apriori Recommendation with R
    Sorting stuff out
    Targeting Items
    Visualization
  Topics:
    Association rules
    Data Visualization - R Graphics

Case Study: Customer Segmentation with R
  Source: http://www.salemmarafi.com/code/customer-segmentation-excel-and-r/#r-step1
  Contents:
    Pivot & Copy
    Distances and Clusters
    Solving for optimal cluster centers
    Top deals by clusters
  Topics:
    Cluster analysis
    K-means clustering

Exercise
Topics, with each exercise and its source listed beneath the topic it accompanies

Business Problems & Data Science Solutions
  The Data Science Process
    The roles in a data science project
    Stages of a data science project
    Setting expectations
  Implications for managing the data science team

R Package for Data Science
  Loading Data into R
    Working with data from files
      Exercise: Reading data into R. Source: http://rstudio-pubs-static.s3.amazonaws.com/1776_dbaebbdbde8d46e693e5cb60c768ba92.html
      Exercise: Importing Data into R Using Excel. Source: http://case.truman.edu/Documents/R%20Entering%20Data.pdf
      Exercise: Reading Data into R. Source: http://handsondatascience.com/ReadO.pdf
    Working with well-structured data from files or URLs
    Using R on less-structured data
    Working with relational databases
      Exercise: Using R with databases. Source: http://www.ibm.com/developerworks/data/library/techarticle/dm-1402db2andr/
    A production-size example
  Exploring Data
    Using summary statistics to spot problems
      Exercise: Summarising Data. Source: http://handsondatascience.com/SummaryO.pdf
    Spotting problems using graphics and visualization
      Exercise: Exploring Data with GGPlot2. Source: http://handsondatascience.com/GGPlot2O.pdf
    Visually checking distributions for a single variable
    Visually checking relationships between two variables
  Managing Data
    Cleaning data
    Data transformations
      Exercise: Transform and Manipulate Data. Source: http://handsondatascience.com/TransformO.pdf
    Sampling for modeling and validation
    Creating a sample group column

Data Mining with R
  Data Mining and Its Results
    Data Mining Tasks
    The Data Mining Process
  Data Mining Patterns
    Cluster analysis
      Exercise: Cluster Analysis. Source: http://handsondatascience.com/ClustersO.pdf
    Anomaly detection
    Association rules
      Exercise: Association Rules. Source: http://handsondatascience.com/ARulesO.pdf
  Data Mining Sequences
    Finding frequent items in a dataset
    Mining the Agrawal data for frequent sets
    Determining sequences in training and careers
    Similarities in the sequence
  Text Mining
    Using text processing
      Exercise: Text Mining. Source: http://handsondatascience.com/TextMiningO.pdf
    Using text clusters
    Analyzing the XML text

Data Analysis with R
  Data Analysis - Regression Analysis
    Using simple regression
    Using multiple regression
    Using multivariate regression analysis
      Exercise: Multivariate Adaptive Regression Splines. Source: http://handsondatascience.com/MarsO.pdf
    Using robust regression
      Exercise: R Data Analysis Examples: Robust Regression. Source: http://www.ats.ucla.edu/stat/r/dae/rreg.htm
  Data Analysis - Correlation
    Exercise: R Data Analysis Examples: Canonical Correlation Analysis. Source: http://www.ats.ucla.edu/stat/r/dae/canonical.htm
    Visualizing correlations
    Covariance
    Pearson correlation
    Canonical correlation
    Polychoric correlation
    Tetrachoric correlation
    A heterogeneous correlation matrix
  Data Analysis - Clustering
    Exercise: Cluster Analysis with R. Source: https://rpubs.com/gabrielmartos/ClusterAnalysis
    K-means clustering
    Selecting clusters based on Bayesian information
    Affinity propagation clustering
    Hierarchical clustering

Data Visualization with R
  Data Visualization - R Graphics
    Exercise: Visualizing Google Analytics Data With R. Source: http://online-behavior.com/analytics/r
    Interactive graphics
    Using the latticist package
    Using the ggplot2 package
      Exercise: Analyzing supercar data, part 1. Source: http://www.sharpsightlabs.com/data-analysis-example-r-supercars-part1/
      Exercise: Analyzing supercar data, part 2. Source: http://www.r-bloggers.com/data-analysis-example-with-ggplot-and-dplyr-analyzing-supercar-data-part-2/
  Data Visualization - Plotting
    Exercise: Plotting the Iris Data. Source: http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/iris_plots/
    Using scatter plots
    Using scatterplot matrices
    Using bar charts and plots
  Data Visualization - 3D
    Exercise: Scatterplot3d: 3D graphics - R software and data visualization. Source: http://www.sthda.com/english/wiki/scatterplot3d-3d-graphics-r-software-and-data-visualization
    Generating 3D graphics
    Lattice cloud 3D scatterplot
    Using scatter3D
    Using RgoogleMaps
    Big data with R
      Exercise: Dealing with Big Data. Source: http://handsondatascience.com/BigDataO.pdf

http://rstudio-pubs-static.s3.amazonaws.com/1776_dbaebbdbde8d46e693e5cb60c768ba92.htmlhttp://www.ibm.com/developerworks/data/library/techarticle/dm-1402db2andr/http://handsondatascience.com/SummaryO.pdfhttp://handsondatascience.com/TransformO.pdfhttp://handsondatascience.com/GGPlot2O.pdfhttp://handsondatascience.com/MarsO.pdfhttp://handsondatascience.com/TextMiningO.pdfhttp://handsondatascience.com/BigDataO.pdfhttp://handsondatascience.com/ARulesO.pdfhttp://handsondatascience.com/ClustersO.pdfhttp://online-behavior.com/analytics/rhttp://www.sharpsightlabs.com/data-analysis-example-r-supercars-part1/http://www.r-bloggers.com/data-analysis-example-with-ggplot-and-dplyr-analyzing-supercar-data-part-2/https://rpubs.com/gabrielmartos/ClusterAnalysishttp://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/iris_plots/http://www.sthda.com/english/wiki/scatterplot3d-3d-graphics-r-software-and-data-visualizationhttp://www.ats.ucla.edu/stat/r/dae/canonical.htmhttp://www.ats.ucla.edu/stat/r/dae/rreg.htm