8/10/2019 Ma14 Data Sources
DATA SOURCES Data Mining / Knowledge Discovery from Databases / Machine Learning / Analytics
These are popular data sources for machine intelligence in general:
a) A book on R:
http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf
It has a good introduction to R, a popular open-source data mining and statistical package. It gives a lot of references for R as well as for data. It also gives a few cases, so you can play with the data.
b) Kaggle, which runs many analytics competitions:
http://www.kaggle.com/
Participants get data on real problems and work on them. One needs to register to participate.
c) Most well known site for experiments with Machine Learning:
http://archive.ics.uci.edu/ml/
d) A big data source:
http://www.kdnuggets.com/datasets/index.html
Acknowledgement: Dr. J. Sethuraman, FP Alumni.
An extract on cases and data sources is loaded separately (in .pdf format).
Source: http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf
You can open the Word file in Draft mode for proper viewing:
Chapter 15. Online Resources
This chapter presents links to online resources on R and data mining, including books, documents, tutorials and slides. A list of links is also available at http://www.rdatamining.com/resources/onlinedocs.
15.1 R Reference Cards
R Reference Card, by Tom Short
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
R Reference Card for Data Mining, by Yanchang Zhao
http://www.rdatamining.com/docs
R Reference Card, by Jonathan Baron
http://cran.r-project.org/doc/contrib/refcard.pdf
R Functions for Regression Analysis, by Vito Ricci
http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
R Functions for Time Series Analysis, by Vito Ricci
http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
15.2 R
Quick-R
http://www.statmethods.net/
R Tips: lots of tips for R programming
http://pj.freefaculty.org/R/Rtips.html
R Tutorial
http://www.cyclismo.org/tutorial/R/index.html
The R Manuals, including an Introduction to R, R Language Definition, R Data Import/Export, and other R manuals
http://cran.r-project.org/manuals.html
R You Ready?
http://pj.freefaculty.org/R/RUReady.pdf
R for Beginners
http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Econometrics in R
http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf
Using R for Data Analysis and Graphics - Introduction, Examples and Commentary
http://www.cran.r-project.org/doc/contrib/usingR.pdf
Lots of R Contributed Documents, including non-English ones
http://cran.r-project.org/other-docs.html
The R Journal
http://journal.r-project.org/current.html
Learn R Toolkit
http://processtrends.com/Learn_R_Toolkit.htm
Resources to help you learn and use R at UCLA
http://www.ats.ucla.edu/stat/r/
R Tutorial - An R Introduction to Statistics
http://www.r-tutor.com/
Cookbook for R
http://wiki.stdout.org/rcookbook/
Slides for a couple of R short courses
http://courses.had.co.nz/
Tips on memory in R
http://www.matthewckeller.com/html/memory.html
15.3 Data Mining
Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach and Vipin Kumar
Lecture slides (in both PPT and PDF formats) and three sample chapters on classification, association and clustering are available at the link below.
http://www-users.cs.umn.edu/%7Ekumar/dmbook
Tutorial on Data Mining Algorithms, by Ian Witten
http://www.cs.waikato.ac.nz/~ihw/DataMiningTalk/
Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman
The whole book and lecture slides are free and downloadable in PDF format.
http://infolab.stanford.edu/%7Eullman/mmds.html
Lecture notes of data mining course, by Cosma Shalizi at CMU
R code examples are provided in some lecture notes, and also in solutions to homework.
http://www.stat.cmu.edu/%7Ecshalizi/350/
Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze at Stanford University
It covers text classification, clustering, web search, link analysis, etc. The book and lecture slides are free and downloadable in PDF format.
http://nlp.stanford.edu/IR-book/
Statistical Data Mining Tutorials, by Andrew Moore
http://www.autonlab.org/tutorials/
Tutorial on Spatial and Spatio-Temporal Data Mining
http://www.inf.ufsc.br/%7Evania/tutorial_icdm.html
Tutorial on Discovering Multiple Clustering Solutions
http://dme.rwth-aachen.de/en/DMCS
Time-Critical Decision Making for Business Administration
http://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm
A paper on Open-Source Tools for Data Mining, published in 2008
http://eprints.fri.uni-lj.si/893/1/2008-OpenSourceDataMining.pdf
An overview of data mining tools
http://onlinelibrary.wiley.com/doi/10.1002/widm.24/pdf
Textbook on Introduction to social network methods
http://www.faculty.ucr.edu/~hanneman/nettext/
Information Diffusion in Social Networks: Observing and Influencing Societal Interests, a tutorial at VLDB'11
http://www.cs.ucsb.edu/~cbudak/vldb_tutorial.pdf
Tools for large graph mining: structure and diffusion, a tutorial at WWW2008
http://cs.stanford.edu/people/jure/talks/www08tutorial/
Graph Mining: Laws, Generators and Tools
http://www.stanford.edu/group/mmds/slides2008/faloutsos.pdf
A tutorial on outlier detection techniques at ACM SIGKDD'10
http://www.dbs.ifi.lmu.de/~zimek/publications/KDD2010/kdd10-outlier-tutorial.pdf
A Taste of Sentiment Analysis - 105-page slides in PDF format
http://statmath.wu.ac.at/research/talks/resources/sentimentanalysis.pdf
15.4 Data Mining with R
Data Mining with R - Learning by Case Studies
http://www.liaad.up.pt/~ltorgo/DataMiningWithR/
Data Mining Algorithms In R
http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R
Statistics with R
http://zoonek2.free.fr/UNIX/48_R/all.html
Data Mining Desktop Survival Guide
http://www.togaware.com/datamining/survivor/
15.5 Classification/Prediction with R
An Introduction to Recursive Partitioning Using the RPART Routines
http://www.mayo.edu/hsr/techrpt/61.pdf
Visualizing classifier performance with package ROCR
http://rocr.bioinf.mpi-sb.mpg.de/ROCR_Talk_Tobias_Sing.ppt
15.6 Time Series Analysis with R
An R Time Series Tutorial
http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm
Time Series Analysis with R
http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf
Using R (with applications in Time Series Analysis)
http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf
CRAN Task View: Time Series Analysis
http://cran.r-project.org/web/views/TimeSeries.html
15.7 Association Rule Mining with R
Introduction to association rules: A computational environment for mining association rules and frequent item sets
http://cran.csiro.au/web/packages/arules/vignettes/arules.pdf
Visualizing Association Rules: Introduction to arulesViz
http://cran.csiro.au/web/packages/arulesViz/vignettes/arulesViz.pdf
Association Rule Algorithms In R
http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining
15.8 Spatial Data Analysis with R
Applied Spatio-temporal Data Analysis with FOSS: R+OSGeo
http://www.geostat-course.org/GeoSciences_AU_2011
Spatial Regression Analysis in R - A Workbook
http://geodacenter.asu.edu/system/files/rex1.pdf
15.9 Text Mining with R
Text Mining Infrastructure in R
http://www.jstatsoft.org/v25/i05
Introduction to the tm Package: Text Mining in R
http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
Text Mining Handbook with R code examples
http://www.casact.org/pubs/forum/10spforum/Francis_Flynn.pdf
Distributed Text Mining in R
http://epub.wu.ac.at/3034/
15.10 Social Network Analysis with R
R for networks: a short tutorial
http://sites.stat.psu.edu/~dhunter/Rnetworks/
Social Network Analysis in R
http://files.meetup.com/1406240/sna_in_R.pdf
A detailed introduction to Social Network Analysis with package sna
http://www.jstatsoft.org/v24/i06/paper
A statnet Tutorial
http://www.jstatsoft.org/v24/i09/paper
Slides on Social network analysis with R
http://user2010.org/slides/Zhang.pdf
Tutorials on using statnet for network analysis
http://csde.washington.edu/statnet/resources.shtml
15.11 Data Cleansing and Transformation with R
Tidy Data and Tidy Tools
http://vita.had.co.nz/papers/tidy-data-pres.pdf
The data.table package in R
http://files.meetup.com/1677477/R_Group_June_2011.pdf
15.12 Big Data and Parallel Computing with R
State of the Art in Parallel Computing with R
http://www.jstatsoft.org/v31/i01/paper
Taking R to the Limit, Part I - Parallelization in R
http://www.bytemining.com/2010/07/taking-r-to-the-limit-part-i-parallelization-in-r/
Taking R to the Limit, Part II - Large Datasets in R
http://www.bytemining.com/2010/08/taking-r-to-the-limit-part-ii-large-datasets-in-r/
Tutorial on MapReduce programming in R with package rmr
https://github.com/RevolutionAnalytics/RHadoop/wiki/Tutorial
Distributed Data Analysis with Hadoop and R
http://www.infoq.com/presentations/Distributed-Data-Analysis-with-Hadoop-and-R
Massive data, shared and distributed memory, and concurrent programming: bigmemory and foreach
http://sites.google.com/site/bigmemoryorg/research/documentation/bigmemorypresentation.pdf
High Performance Computing with R
http://igmcs.utk.edu/sites/igmcs/files/Patel-High-Performance-Computing-with-R-2011-10-20.pdf
R with High Performance Computing: Parallel processing and large memory
http://files.meetup.com/1781511/HighPerformanceComputingR-Szczepanski.pdf
Parallel Computing in R
http://blog.revolutionanalytics.com/downloads/BioC2009%20ParallelR.pdf