Towards Successful Ph.D. Research in Database Systems and Data Mining

Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign, www.cs.uiuc.edu/~hanj. July 13, 2014.

Towards Successful Ph.D. Research in Database Systems and Data Mining

Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj

April 22, 2023

Outline

  • Database and data mining: highly promising themes
      • Long history of strong and successful research
      • Lots of new challenges
      • Lots of research themes
  • Selection of promising directions and promising topics
      • Making your research have a bigger impact
      • Discussing, debating, and active brain-storming
      • Capturing and harvesting the sparks of thought
  • Towards highly productive research
      • Learning from others: reviews and judgment
      • Collaborations and team work

DB and DM: Long History of Strong & Successful Research

  • Necessity is the mother of invention
      • Coming from real application demand
      • Constantly seeking new and extended applications
      • Developing core technologies for information systems
  • A long history of success
      • Real systems, numerous applications, and big industry
      • Relational database systems → application-oriented DBMS (spatiotemporal, CRM, banking, health info, …) → data warehouses → data mining → Web search: Google
  • Depth and thoroughness in research
      • Constant search for new, innovative methodologies and algorithms
      • In-depth study of implementation, optimization, and user needs
      • Scalability, uncertainty, approximation, streaming, ranking, aggregation, privacy, and security

Still Challenging and Promising

  • Huge amounts of data are mounting up rapidly
      • Gigabytes → terabytes → petabytes at a very fast pace
      • Data collection and dissemination: sensors, digital cameras, Web
  • Database and data mining: various new applications
      • Data streams, RFID, sensor networks, video/audio data, text and Web, computer/software systems, social networks, biological data, and science/engineering data
      • Searching, ranking, mining, uncertainty, noise, privacy, security
  • Database and data mining are still flourishing
      • Scalable statistical and machine learning methods
      • Pattern analysis methods
      • Integrated with database systems, data warehouses, and Web as a natural, hidden process
  • Still many open research problems and multiple research frontiers

Research Frontiers in Data Mining

  • Information network analysis
  • Stream data warehousing & data mining
  • Pattern mining, pattern usage, and pattern understanding
  • Warehousing and mining of moving object data, RFID data, and data from sensor networks
  • Spatiotemporal and multimedia data mining
  • Biological data mining
  • Text and Web mining
  • Data mining for software engineering and system analysis
  • Data cube-oriented multidimensional online analysis
  • Classification and ranking everywhere: databases, Web, documents, and knowledge

A Multidimensional View of Research Themes

  • Data view
      • Relational data, transactional data, information network data, stream data, spatial, temporal, multimedia (video/audio), moving object data, RFID data, sensor data, biological data, text and Web data, software engineering and system data
  • Issue view
      • Modeling, management, indexing, retrieval (query), update, integration, warehousing, mining, data cube computation, multidimensional online analysis, security, privacy, …
  • Methodology view: incremental, parallel, distributed
      • For mining: statistical, machine learning, decision-tree, MDL, HMM, Naïve-Bayes, …
  • Application view: different industries, governments, science & engineering
  • Adding dimensions: time, space, …
  • Relaxing assumptions: approximation, uncertainty, …

Selection of Promising Directions

  • Read survey papers, proceedings, etc., discuss with your friends and professors, and use your own reasoning
      • Is the direction likely to be much needed and have a bright future?
      • Do I have sufficient background to work on it?
      • Am I truly interested in it?
      • Does the direction attract long-term investigation?
  • Is it OK to change or adjust it?
      • You may need to constantly adjust your research directions
      • Ex.: myself, from deductive DBs (recursive query processing) to data mining

Making Your Research Have a Bigger Impact

  • Necessity is the mother of invention
      • What is most needed in the next several years?
      • Will it have long-term impact or fade out soon?
  • Innovative and thorough research
      • Is your approach fresh, innovative, somewhat ground-breaking?
      • Have you examined it systematically? Have you considered alternative or previously studied methods? Can it be further improved?
  • Two kinds of research topics: creative vs. improvement
      • Find new themes (new patterns, new methodologies, new directions)
      • Improve the existing solutions
  • Never be tied to the existing solutions
      • First think about it independently, and work it out independently
      • Believe “one can always find new ways to improve it!”

Discussions, Sparks, and Technical Meat

  • Look before you leap
      • Careful and thorough thinking should come before implementing and testing
  • Form small groups instead of working alone
      • Slides, emails, and weekly theme-based meetings or teleconferences
      • Questions on slides, related work, new design, proposed algorithms; try to find ways to improve them
  • Capture and harvest the sparks of thought
      • Many good ideas may come from a “weak” spark of thinking
      • Capture the sparks promptly and do not let them slip away

Case 1: ICDE’07 Best Student Paper Award

  • Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu, and Hong Cheng, “Mining Colossal Frequent Patterns by Core Pattern Fusion”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007 (the BEST STUDENT award)
  • Identifying the problem that the current technology cannot solve and its applications
      • Colossal patterns, bio-applications
  • How was the paper generated?
      • Progressive refinement: slides → discussions → algorithms → discussions → experiments → new slides
      • Smart ideas and technical innovation

Case 2: ICDE’06 Best Student Paper Award

  • Hector Gonzalez, Jiawei Han, Xiaolei Li, and Diego Klabjan, “Warehousing and Analysis of Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.
  • Necessity is the mother of invention
      • Working on a key problem: RFID data warehousing
  • The key solution: deep compression
      • How deep is deep? Maximal sharing of bulky movements
  • Multiple designs, refinements, testing, and refinement again
      • Slides → discussions → algorithms → discussions → experiments → new slides
      • Constant brain-storming

Learning from Others: Reviews and Judgment

  • A very important part of Ph.D. training is judgment: judging others as well as judging yourself
      • A good researcher should first be a good judge of research
  • Reading a good research paper: first read the problem and try to solve it yourself
  • Be active in serving as a reviewer: see how others evaluate the work and learn from the good judges
  • Read survey papers and write your own simple surveys on the problems you intend to work on

Putting All the Eggs in One Basket?

  • Working on several research problems or only on one?
      • Initially, more than one theme may help you test the water and settle on a promising theme that matches you
      • Even after you have focused on one theme, it is good to try slightly different problems
      • Productivity, alternative thoughts, adjustable solutions, and research collaborations
  • Working with your friends and colleagues
      • Complement each other in strength and expertise

Seminar Course: Continuous Training/Education

  • Advanced seminars for DAIS and DM group
      • Running every semester
  • Presenting your own work and getting feedback from the group
      • Mostly recently accepted conference papers
      • Requiring only a one-page summary/abstract
  • Presenting good papers from recent, top conferences: selecting only SIGKDD, SIGMOD, VLDB, ICDE, ICDM, SDM, WWW, …, conference papers published in the last 12 months

Conference and Journal Reviews

  • Volunteering on conference and journal coordination
      • For each conference where we serve as a PC member, we have one Ph.D. student volunteering as conference coordinator
      • S/he communicates with the group members to select papers and collect reviews, and I have one or more rounds of thorough discussion with the coordinator to make sure the reviews are unbiased, comprehensive, and of high quality
      • Also, the reviews will be relatively ranked and balanced
  • A good exercise for all the participants
  • Similar exercises for journals and proposal reviews

Semester Summary and Awards

  • Award summary as a way to promote excellence in research
      • Summary meeting each semester
      • Summary on each student’s Web page and presentation
      • Award voting with multiple grades: gold, silver, bronze, and honorable mention
      • Vote after the major conference evaluation results are out
      • Publish the award voting summary
      • Presents and web publicity
  • Award competition also promotes collaborations

Questions

Thanks and Questions

Create a Productive Research Group

  • Selection of promising students
      • Training and selection of students from classes
      • Test run with research problems
      • Watch for sparks and working attitude
      • Written qualifications vs. oral ones
  • Team organization
      • CS591 vs. meetings (start + ending meetings)
      • Use students’ expertise, strengths, and interests
      • Division of group work: everyone is in charge
      • Theme-based dynamic small research groups
  • Encouraging students on their progress: papers, etc.
      • Semester summary, web pages
      • Award competition

Group Administration/Public Relation Work (Sept. '06-Aug. '07)

  • Group Webmaster (news, group Web page, pictures, etc.): Tianyi Wu
  • Web-based research reference collections: Hong Cheng
  • Hardware, equipment, and software master: Sang Kim
  • TKDD Information Director: Xiaoxin Yin
  • DAIS seminar coordinator: Deng Cai
  • DAISY System administrator: Hector Gonzalez
  • IlliMine project coordinator: Xiaolei Li
  • Industry/visitor coordinator: Chao Liu
  • Conference and journal review coordinator (3): Dong Xin, Jing Gao, and Chen Chen
  • Research proposal coordinator (2): Feida Zhu and Jianlin Feng
  • Social activity coordinator: Jaegil Lee, Ok-ran Jeong

Work on Promising Research Topics

  • Selection of promising research topics
      • Select topics based on your strength and interest
      • Putting all the eggs in one basket? You may work on 2-3 topics at the same time
      • Discussion, debate, and active brain-storming
      • Capture and harvest the sparks of thought
  • Two kinds of research topics: creative vs. improvement
      • Find completely new themes (new patterns, new methodologies, new directions)
      • Improve the existing solutions
  • Never be tied to the existing solutions
      • First think about it independently, and work it out independently
      • Believe “one can always find new ways to improve it!”