Towards Successful Ph.D. Research in Database Systems and Data Mining. Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign, www.cs.uiuc.edu/~hanj. July 13, 2014.
04/22/23 1
Towards Successful Ph.D. Research in Database Systems and Data Mining
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
April 22, 2023
Outline
Database and data mining: highly promising themes
  Long history of strong and successful research
  Lots of new challenges
  Lots of research themes
Selection of promising directions and promising topics
  Giving your research a bigger impact
  Discussing, debating, and active brainstorming
  Capturing and harvesting the sparks of thought
Towards highly productive research
  Learning from others: reviews and judgment
  Collaborations and teamwork
DB and DM: Long History of Strong & Successful Research
Necessity is the mother of invention
  Driven by real application demand
  Constantly seeking new and extended applications
  Developing core technologies for information systems
A long history of success
  Real systems, numerous applications, and big industry
  Relational database systems → application-oriented DBMS (spatiotemporal, CRM, banking, health info, …) → data warehouses → data mining → Web search: Google
Depth and thoroughness in research
  Constant search for new, innovative methodologies and algorithms
  In-depth study of implementation, optimization, and user needs
  Scalability, uncertainty, approximation, streaming, ranking, aggregation, privacy, and security
Still Challenging and Promising
Huge amounts of data are mounting up rapidly
  Gigabytes → terabytes → petabytes at a very fast pace
  Data collection and dissemination: sensors, digital cameras, Web
Database and data mining: various new applications
  Data streams, RFID, sensor networks, video/audio data, text and Web, computer/software systems, social networks, biological data, and science/engineering data
  Searching, ranking, mining, uncertainty, noise, privacy, security
Database and data mining are still flourishing
  Scalable statistical and machine learning methods
  Pattern analysis methods
  Integrated with database systems, data warehouses, and Web as a natural, hidden process
Still many open research problems and multiple research frontiers
Research Frontiers in Data Mining
Information network analysis
Stream data warehousing & data mining
Pattern mining, pattern usage, and pattern understanding
Warehousing and mining of moving object data, RFID data, and data from sensor networks
Spatiotemporal and multimedia data mining
Biological data mining
Text and Web mining
Data mining for software engineering and system analysis
Data cube-oriented multidimensional online analysis
Classification and ranking everywhere: databases, Web, documents, and knowledge
A Multidimensional View of Research Themes
Data view
  relational data, transactional data, information network data, stream data, spatial, temporal, multimedia (video/audio), moving object data, RFID data, sensor data, biological data, text and Web data, software engineering and system data
Issue view
  modeling, management, indexing, retrieval (query), update, integration, warehousing, mining, data cube computation, multidimensional online analysis, security, privacy, …
Methodology view: incremental, parallel, distributed
  For mining: statistical, machine learning, decision-tree, MDL, HMM, Naïve-Bayes, …
Application view: different industries, governments, science & engineering
Adding dimensions: time, space, …
Relaxing assumptions: approximation, uncertainty, …
Selection of Promising Directions
Read survey papers, proceedings, etc., discuss with your friends and professors, and use your own reasoning
  Is the direction likely to be much needed and have a bright future?
  Do I have sufficient background to work on it?
  Am I truly interested in it?
  Does the direction attract long-term investigation?
Is it OK to change or adjust it?
  You may need to constantly adjust your research directions
  Ex. Myself, from deductive DBs (recursive query processing) to data mining
Giving Your Research a Bigger Impact
Necessity is the mother of invention
  What is most needed in the next several years?
  Will it have long-term impact or fade out soon?
Innovative and thorough research
  Is your approach fresh, innovative, somewhat ground-breaking?
  Have you examined it systematically? Have you considered alternative or previously studied methods?
  Can it be further improved?
Two kinds of research topics: creative vs. improvement
  Find new themes (new patterns, new methodologies, new directions)
  Improve the existing solutions
Never be tied to the existing solutions
  First think about it independently, and work it out independently
  Believe “you can always find new ways to improve it!”
Discussions, Sparks, and Technical Meat
Look before you leap
  Careful and thorough thinking should come before implementing and testing
Form small groups instead of working alone
  Slides, emails, and weekly theme-based meetings or teleconferences
  Questions on slides, related work, new design, proposed algorithms; try to find ways to improve them
Capture and harvest the sparks of thought
  Many good ideas may come from a “weak” spark of thinking
  Capture the sparks promptly and do not let them slip away
Case 1: ICDE’07 Best Student Paper Award
Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu, and Hong Cheng, “Mining Colossal Frequent Patterns by Core Pattern Fusion”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007 (Best Student Paper Award)
Identifying a problem that current technology cannot solve, and its applications
  Colossal patterns, bio-applications
How was the paper generated?
  Progressive refinement: slides → discussions → algorithms → discussions → experiments → new slides
  Smart ideas and technical innovation
Case 2: ICDE’06 Best Student Paper Award
Hector Gonzalez, Jiawei Han, Xiaolei Li, and Diego Klabjan, “Warehousing and Analysis of Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.
Necessity is the mother of invention
  Working on a key problem: RFID data warehousing
The key solution: deep compression
  How deep is deep? Maximal sharing of bulky movements
Multiple designs, refinements, testing and refinement again
  slides → discussions → algorithms → discussions → experiments → new slides
  Constant brainstorming
Learning from Others: Reviews and Judgment
A very important part of Ph.D. training is judgment: judging others' work as well as your own
A good researcher should first be a good judge of research
Reading a good research paper: first read the problem and try to solve it yourself
Be active in serving as a reviewer: see how others evaluate the work and learn from the good judges
Read survey papers and write your own simple surveys on the problems you intend to work on
Putting All the Eggs in One Basket?
Working on several research problems or only on one?
  Initially, more than one theme may help you test the water and settle on a promising theme that suits you
  Even after you have focused on one theme, it is good to try slightly different problems
  Productivity, alternative thoughts, adjustable solutions, and research collaborations
Working with your friends and colleagues
  Complement each other in strength and expertise
Seminar Course: Continuous Training/Education
Advanced seminars for the DAIS and DM groups
  Running every semester
Presenting your own work and getting feedback from the group
  Mostly recently accepted conference papers
  Requiring only a one-page summary/abstract
Presenting good papers from recent top conferences: selecting only SIGKDD, SIGMOD, VLDB, ICDE, ICDM, SDM, WWW, …, conference papers published in the last 12 months
Conference and Journal Reviews
Volunteering on conference and journal coordination
  For each conference where we serve as a PC member, one Ph.D. student volunteers as conference coordinator
  S/he communicates with the group members to select papers and collect reviews, and I have one or more rounds of thorough discussion with the coordinator to make sure the reviews are unbiased, comprehensive, and of high quality
  Also, the reviews will be relatively ranked and balanced
A good exercise for all the participants
Similar exercises for journals and proposal reviews
Semester Summary and Awards
Award summary as a way to promote excellence in research
  Summary meeting each semester
  Summary on each student’s Web page and presentation
  Award voting with multiple grades: gold, silver, bronze, and honorable mention
  Vote after the major conference evaluation results are out
  Publish the award voting summary
  Presents and Web publicity
Award competition also promotes collaborations
Questions
Thanks and Questions
Create a Productive Research Group
Selection of promising students
  Training and selection of students from classes
  Test run with research problems
  Watch for sparks and working attitude
  Written qualifications vs. oral ones
Team organization
  CS591 vs. meetings (start + ending meetings)
  Use students’ expertise, strength, and interests
  Division of group work: everyone is in charge
  Theme-based dynamic small research groups
  Encouraging students on their progress: papers, etc.
  Semester summary, Web pages
  Award competition
Group Administration/Public Relations Work (Sept. '06-Aug. '07)
Group Webmaster (news, group Web page, pictures, etc.): Tianyi Wu
Web-based research reference collections: Hong Cheng
Hardware, equipment, and software master: Sang Kim
TKDD Information Director: Xiaoxin Yin
DAIS seminar coordinator: Deng Cai
DAISY System administrator: Hector Gonzalez
IlliMine project coordinator: Xiaolei Li
Industry/visitor coordinator: Chao Liu
Conference and journal review coordinator (3): Dong Xin, Jing Gao and Chen Chen
Research proposal coordinator (2): Feida Zhu and Jianlin Feng
Social activity coordinator: Jaegil Lee, Ok-ran Jeong
Work on Promising Research Topics
Selection of promising research topics
  Select topics based on your strength and interest
  Putting all the eggs in one basket? You may work on 2-3 topics at the same time
  Discussion, debate, and active brainstorming
  Capture and harvest the sparks of thought
Two kinds of research topics: creative vs. improvement
  Find completely new themes (new patterns, new methodologies, new directions)
  Improve the existing solutions
Never be tied to the existing solutions
  First think about it independently, and work it out independently
  Believe “you can always find new ways to improve it!”