Ma14 Data Sources

  • 8/10/2019 Ma14 Data Sources

    DATA SOURCES Data Mining / Knowledge Discovery from Databases / Machine Learning / Analytics

    These are popular data sources for machine intelligence in general:

    a) A book on R:

    http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf

    It has a good introduction to R, the most popular open-source data mining and statistical package. It gives a lot of references for R as well as for data. It also presents a few cases, so you can experiment with the data.

    b) Kaggle runs many analytics competitions. Participants get data on real problems and work on them. One needs to register to participate.

    www.kaggle.com

    c) The best-known site for experiments with machine learning:

    http://archive.ics.uci.edu/ml/

    d) A big data source:

    http://www.kdnuggets.com/datasets/index.html

    Acknowledgement: Dr. J. Sethuraman, FP Alumni.

    An extract on Cases and Data Sources is separately loaded (in .pdf format).

    Source: http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf

    You can open the Word file in Draft mode for proper viewing.

    Source: http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf

    Chapter 15. Online Resources

    This chapter presents links to online resources on R and data mining, including books, documents, tutorials and slides. A list of links is also available at http://www.rdatamining.com/resources/onlinedocs.

    15.1 R Reference Cards

    R Reference Card, by Tom Short
    http://cran.r-project.org/doc/contrib/Short-refcard.pdf

    R Reference Card for Data Mining, by Yanchang Zhao
    http://www.rdatamining.com/docs

    R Reference Card, by Jonathan Baron
    http://cran.r-project.org/doc/contrib/refcard.pdf

    R Functions for Regression Analysis, by Vito Ricci
    http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf

    R Functions for Time Series Analysis, by Vito Ricci
    http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf

    15.2 R

    Quick-R
    http://www.statmethods.net/

    R Tips: lots of tips for R programming
    http://pj.freefaculty.org/R/Rtips.html

    R Tutorial
    http://www.cyclismo.org/tutorial/R/index.html

    The R Manuals, including an Introduction to R, R Language Definition, R Data Import/Export, and other R manuals
    http://cran.r-project.org/manuals.html

    R You Ready?
    http://pj.freefaculty.org/R/RUReady.pdf

    R for Beginners
    http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf

    Econometrics in R
    http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf

    Using R for Data Analysis and Graphics - Introduction, Examples and Commentary
    http://www.cran.r-project.org/doc/contrib/usingR.pdf

    Lots of R Contributed Documents, including non-English ones

    http://cran.r-project.org/other-docs.html

    The R Journal
    http://journal.r-project.org/current.html

    Learn R Toolkit
    http://processtrends.com/Learn_R_Toolkit.htm

    Resources to help you learn and use R at UCLA
    http://www.ats.ucla.edu/stat/r/

    R Tutorial - An R Introduction to Statistics
    http://www.r-tutor.com/

    Cookbook for R
    http://wiki.stdout.org/rcookbook/

    Slides for a couple of R short courses
    http://courses.had.co.nz/

    Tips on memory in R
    http://www.matthewckeller.com/html/memory.html

    15.3 Data Mining

    Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach and Vipin Kumar
    Lecture slides (in both PPT and PDF formats) and three sample chapters on classification, association and clustering are available at the link below.
    http://www-users.cs.umn.edu/%7Ekumar/dmbook

    Tutorial on Data Mining Algorithms, by Ian Witten
    http://www.cs.waikato.ac.nz/~ihw/DataMiningTalk/

    Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman
    The whole book and lecture slides are free and downloadable in PDF format.
    http://infolab.stanford.edu/%7Eullman/mmds.html

    Lecture notes of data mining course, by Cosma Shalizi at CMU
    R code examples are provided in some lecture notes, and also in solutions to homework.
    http://www.stat.cmu.edu/%7Ecshalizi/350/

    Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze at Stanford University
    It covers text classification, clustering, web search, link analysis, etc. The book and lecture slides are free and downloadable in PDF format.
    http://nlp.stanford.edu/IR-book/

    Statistical Data Mining Tutorials, by Andrew Moore
    http://www.autonlab.org/tutorials/

    Tutorial on Spatial and Spatio-Temporal Data Mining
    http://www.inf.ufsc.br/%7Evania/tutorial_icdm.html

    Tutorial on Discovering Multiple Clustering Solutions
    http://dme.rwth-aachen.de/en/DMCS

    Time-Critical Decision Making for Business Administration
    http://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm

    A paper on Open-Source Tools for Data Mining, published in 2008
    http://eprints.fri.uni-lj.si/893/1/2008-OpenSourceDataMining.pdf

    An overview of data mining tools
    http://onlinelibrary.wiley.com/doi/10.1002/widm.24/pdf

    Textbook on Introduction to social network methods
    http://www.faculty.ucr.edu/~hanneman/nettext/

    Information Diffusion In Social Networks: Observing and Influencing Societal Interests, a tutorial at VLDB'11
    http://www.cs.ucsb.edu/~cbudak/vldb_tutorial.pdf

    Tools for large graph mining: structure and diffusion, a tutorial at WWW2008
    http://cs.stanford.edu/people/jure/talks/www08tutorial/

    Graph Mining: Laws, Generators and Tools
    http://www.stanford.edu/group/mmds/slides2008/faloutsos.pdf

    A tutorial on outlier detection techniques at ACM SIGKDD'10
    http://www.dbs.ifi.lmu.de/~zimek/publications/KDD2010/kdd10-outlier-tutorial.pdf

    A Taste of Sentiment Analysis - 105-page slides in PDF format
    http://statmath.wu.ac.at/research/talks/resources/sentimentanalysis.pdf

    15.4 Data Mining with R

    Data Mining with R - Learning by Case Studies
    http://www.liaad.up.pt/~ltorgo/DataMiningWithR/

    Data Mining Algorithms In R
    http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R

    Statistics with R
    http://zoonek2.free.fr/UNIX/48_R/all.html

    Data Mining Desktop Survival Guide
    http://www.togaware.com/datamining/survivor/

    15.5 Classification/Prediction with R

    An Introduction to Recursive Partitioning Using the RPART Routines
    http://www.mayo.edu/hsr/techrpt/61.pdf

    Visualizing classifier performance with package ROCR
    http://rocr.bioinf.mpi-sb.mpg.de/ROCR_Talk_Tobias_Sing.ppt

    15.6 Time Series Analysis with R

    An R Time Series Tutorial
    http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm

    Time Series Analysis with R

    http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf

    Using R (with applications in Time Series Analysis)
    http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf

    CRAN Task View: Time Series Analysis
    http://cran.r-project.org/web/views/TimeSeries.html

    15.7 Association Rule Mining with R

    Introduction to association rules: A computational environment for mining association rules and frequent item sets
    http://cran.csiro.au/web/packages/arules/vignettes/arules.pdf

    Visualizing Association Rules: Introduction to arulesViz
    http://cran.csiro.au/web/packages/arulesViz/vignettes/arulesViz.pdf

    Association Rule Algorithms In R
    http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining

    15.8 Spatial Data Analysis with R

    Applied Spatio-temporal Data Analysis with FOSS: R+OSGeo
    http://www.geostat-course.org/GeoSciences_AU_2011

    Spatial Regression Analysis in R - A Workbook
    http://geodacenter.asu.edu/system/files/rex1.pdf

    15.9 Text Mining with R

    Text Mining Infrastructure in R
    http://www.jstatsoft.org/v25/i05

    Introduction to the tm Package: Text Mining in R
    http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf

    Text Mining Handbook with R code examples
    http://www.casact.org/pubs/forum/10spforum/Francis_Flynn.pdf

    Distributed Text Mining in R
    http://epub.wu.ac.at/3034/

    15.10 Social Network Analysis with R

    R for networks: a short tutorial
    http://sites.stat.psu.edu/~dhunter/Rnetworks/

    Social Network Analysis in R

    http://files.meetup.com/1406240/sna_in_R.pdf

    A detailed introduction to Social Network Analysis with package sna
    http://www.jstatsoft.org/v24/i06/paper

    A statnet Tutorial
    http://www.jstatsoft.org/v24/i09/paper

    Slides on Social network analysis with R
    http://user2010.org/slides/Zhang.pdf

    Tutorials on using statnet for network analysis
    http://csde.washington.edu/statnet/resources.shtml

    15.11 Data Cleansing and Transformation with R

    Tidy Data and Tidy Tools
    http://vita.had.co.nz/papers/tidy-data-pres.pdf

    The data.table package in R
    http://files.meetup.com/1677477/R_Group_June_2011.pdf

    15.12 Big Data and Parallel Computing with R

    State of the Art in Parallel Computing with R
    http://www.jstatsoft.org/v31/i01/paper

    Taking R to the Limit, Part I - Parallelization in R
    http://www.bytemining.com/2010/07/taking-r-to-the-limit-part-i-parallelization-in-r/

    Taking R to the Limit, Part II - Large Datasets in R
    http://www.bytemining.com/2010/08/taking-r-to-the-limit-part-ii-large-datasets-in-r/

    Tutorial on MapReduce programming in R with package rmr
    https://github.com/RevolutionAnalytics/RHadoop/wiki/Tutorial

    Distributed Data Analysis with Hadoop and R
    http://www.infoq.com/presentations/Distributed-Data-Analysis-with-Hadoop-and-R

    Massive data, shared and distributed memory, and concurrent programming: bigmemory and foreach
    http://sites.google.com/site/bigmemoryorg/research/documentation/bigmemorypresentation.pdf

    High Performance Computing with R
    http://igmcs.utk.edu/sites/igmcs/files/Patel-High-Performance-Computing-with-R-2011-10-20.pdf

    R with High Performance Computing: Parallel processing and large memory
    http://files.meetup.com/1781511/HighPerformanceComputingR-Szczepanski.pdf

    Parallel Computing in R
    http://blog.revolutionanalytics.com/downloads/BioC2009%20ParallelR.pdf
