Building a data-set AJIT PHADNIS 14 TH JULY 2015

Building Your Own Data Set - Ajit Phadnis


Building a data-set

AJIT PHADNIS

14TH JULY 2015

When does research become significant?

1. Addresses a pressing issue which has been overlooked
2. Brings a new perspective to the way an issue has been looked at
3. Introduces new data to investigate an already examined or new issue

Many researchers begin with ambitions of ‘breaking in’ with new theories or challenging established ones, but often bringing in new data may be the best way to make a ‘mark’

‘Borrowed’ vs ‘Built’ data

‘Borrowed’ data-set - Where the data for all variables (dependent, independent) of interest come from one database; your data-set is therefore a subset of a larger database

‘Built’ data-set - Where more than one data source is used to construct a data-set

=> The researcher can build this data-set from a combination of primary and secondary sources
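
The idea of a ‘built’ data-set can be sketched in a few lines of Python: several sources, each keyed by the same unit of analysis, are combined into one record per unit. This is only an illustration under stated assumptions; the function name `build_dataset`, the unit identifiers and the variable names `x` and `y` are hypothetical placeholders, not real data.

```python
def build_dataset(*sources):
    """Merge several sources into one record per unit.

    Each source maps a unit identifier to a dict of variables.
    """
    merged = {}
    for source in sources:
        for key, variables in source.items():
            # Create the record on first sight, then add this source's variables.
            merged.setdefault(key, {}).update(variables)
    return merged


# Hypothetical placeholder sources; identifiers and values are invented.
secondary_source = {"unit_1": {"x": 3}, "unit_2": {"x": 5}}
primary_source = {"unit_1": {"y": 0}, "unit_2": {"y": 1}}

print(build_dataset(secondary_source, primary_source))
# {'unit_1': {'x': 3, 'y': 0}, 'unit_2': {'x': 5, 'y': 1}}
```

A ‘borrowed’ data-set, by contrast, would need no merge step at all, since every variable already sits in one source.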

Primary and secondary data

Primary data: Data collected by the investigator himself/herself for a specific purpose
Advantage: Control over quality of the data, can collect additional data
Disadvantage: Cost of gathering data

Secondary data: Data collected by someone else for some other purpose (but being utilized by the investigator for another purpose)
Advantage: Less cost, can get temporal data
Disadvantage: Cannot be sure of quality!

Caveat with secondary data

Illustration from the National Election Study (NES) 2014:

Q15 (a) Which party is better for administration?

Now look at this question against the backdrop of what was asked earlier in the questionnaire:

Q1 (a) Whom did you vote for?

Q15 (a) Which party is better for administration?

Do you think the responses would be identical if Q15 (a) were asked as an independent question?

Why do many empirical researchers NOT attempt to build data-sets?

• They believe that large ‘n’ sample sizes are a must to prove any point
• Borrowed data is a good means to deflect questions regarding data
• Time pressures of working within a ‘Publish or Perish’ culture
• What if I collect all that data and nothing proves ‘significant’?

But what they lose out on in the bargain:

• Data drives the research question rather than the other way round => Torture the data till it confesses!
• Pushes you to look for data from other countries where Indian researchers have little contextual knowledge. Further, we collectively end up with very little research on India
• There is far less enthusiasm in conducting research that you were not driven about in the first place => How is this different from a corporate job that you have quit to come to research?

* There are few lucky ones who manage to get ‘perfect’ borrowed data to answer their self-driven research question!

A personal journey with building data

Background of political science research (esp. parties) in India

Much of the research would qualify as historical narratives

Abundant empirical research in the qualitative domain: election studies, party studies, leadership studies

Relatively few attempts at quantitative research: mostly descriptive, with few statistical efforts, unlike the abundant quantitative literature in public policy

I estimate that a large portion of quantitative research on political parties uses data from the National Election Study

In general, there is a perception that data for cross-party comparisons in India is very difficult to get! (e.g.: no. of members, selection of candidates, intra-party careers, leadership selection)

Research interest

Investigating intra-party functioning of Indian political parties

NES gives limited information on party members and supporters, but not much can be derived about a party’s internal processes

Specifically, I wished to look at intra-party career paths that parties offered to their members

Proliferating literature on party switching in many countries; no such studies on party switching in India

This presented an opportunity for me to connect the phenomenon of party switching with intra-party career paths presented by parties

My specific contention: “Parties that offer systematic career paths are likely to experience lower levels of party switching”

Operationalizing intra-party careers

How does one represent ‘systematic intra-party careers’ in quantitative terms?

I conceptualized that systematic intra-party careers should have two properties:

Party career lengths should be long => implying that members rise through the ranks
Party career lengths should be predictable => members should follow a similar career path

Next question: Where do we get data for intra-party careers?

First step is to explore possible information sources where politician career data may be available: Election Commission, party Constitutions

It then occurred to me that it may be possible to locate information on the career backgrounds of party legislators => published on Lok Sabha website

Illustration of data source

Triangulated with other sources:
MP’s personal website
MP’s social media profile
Political party website
Candidate interviews on Mera Neta
National newspaper reports

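How does the data-set look?

| S.No. | State | Party | Name | Local assembly/ govt. | State assembly/ Council | State body/ Minister/ Chairman Comm. | National assembly | National body/ Minister/ Chairman Comm. | Party ancillary (Youth, women) | Caste/ Community in-charge | Local party | State party | National party | Career Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mah | SS | Adhalrao Patil, Shri Shivaji | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | WB | AITC | Adhikari, Shri Deepak | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | WB | AITC | Adhikari, Shri Sisir Kumar | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 4 |
| 4 | WB | AITC | Adhikari, Shri Suvendu | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 5 | UP | BJP | Adityanath, Shri Yogi | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 6 | Mah | SS | Adsul, Shri Anandrao | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 3 |
| 7 | Guj | BJP | Advani, Shri Lal | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 5 |
| 8 | UP | BJP | Agrawal, Shri Rajendra | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 3 |
| 9 | Ker | IUML | Ahamed, Shri E. | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 7 |
| 10 | Mah | BJP | Ahir, Shri Hansraj Gangaram | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 11 | Raj | BJP | Ahlawat, Smt. Santosh | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 5 |
| 12 | WB | BJP | Ahluwalia, Shri S.S. | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 3 |
| 13 | WB | AITC | Ahmed, Shri Sultan | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 6 |
| 14 | Ass | AIUDF | Ajmal, Maulana Badruddin | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 4 |
| 15 | Ass | AIUDF | Ajmal, Shri Sirajuddin | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 16 | WB | AITC | Ali, Shri Idris | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 17 | Kar | BJP | Ananth Kumar, Shri | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 5 |
| 18 | Kar | BJP | Angadi, Shri Suresh Chanabasappa | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 2 |
| 19 | Ker | INC | Antony, Shri Anto | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 4 |
| 20 | Bih | NCP | Anwar, Shri Tariq | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 7 |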

| No. | Position | Points |
|---|---|---|
| 1 | Local governing bodies | 1.0 |
| 2 | State assembly | 1.0 |
| 3 | State body/ minister | 1.0 |
| 4 | National assembly | 1.0 |
| 5 | National body/ minister | 1.0 |
| 6 | Party at local level | 1.0 |
| 7 | Party at state level | 1.0 |
| 8 | Party at national level | 1.0 |
| 9 | Ancillary party bodies | 1.0 |
| 10 | Party community groups | 1.0 |
| | Career Score | 0-10 |
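
The scoring rule above is additive: each of the ten positions is worth 1.0 point, so the Career Score counts how many types of position an MP has held. A minimal Python sketch of that rule follows. It assumes each MP is stored as a list of ten 0/1 indicators in the order of the data-set columns; the function name `career_score` and this list layout are illustrative assumptions, not the author's own code.

```python
# Points per position, taken from the points table: every position is worth 1.0.
POSITION_POINTS = [1.0] * 10


def career_score(indicators):
    """Return the Career Score (0-10) for one MP.

    `indicators` is a list of ten 0/1 values, one per position column.
    """
    if len(indicators) != len(POSITION_POINTS):
        raise ValueError("expected exactly ten position indicators")
    if any(value not in (0, 1) for value in indicators):
        raise ValueError("indicators must be coded 0 or 1")
    return sum(p * v for p, v in zip(POSITION_POINTS, indicators))


# Two rows copied from the data-set illustration (S.No. 3 and S.No. 9).
row_3 = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
row_9 = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]

print(career_score(row_3))  # 4.0
print(career_score(row_9))  # 7.0
```

Keeping the points in a separate list means the weights could later be changed (for example, if national positions were to count for more) without touching the scoring function.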

The data collection effort

Roughly, it took me 18-20 mins to gather the background profile of one MP

So far I have completed coding the profiles of 540 MPs from the 16th Lok Sabha
=> Approximate time taken for this effort ~ 10,000 mins => 170 hours

In order to beef up the sample size, I will also be coding the 540 MP profiles from the 15th Lok Sabha

So my data collection effort is only half done!
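
The effort estimate is simple arithmetic and can be checked as follows. The sketch uses only the figures quoted on this slide (540 profiles, 18 to 20 minutes each); the function name `coding_effort_hours` is illustrative.

```python
def coding_effort_hours(n_profiles, minutes_per_profile):
    """Total coding time in hours for `n_profiles` MP profiles."""
    return n_profiles * minutes_per_profile / 60


low = coding_effort_hours(540, 18)   # lower bound of the quoted range
high = coding_effort_hours(540, 20)  # upper bound of the quoted range
print(low, high)  # 162.0 180.0
```

The quoted figure of roughly 170 hours sits in the middle of this range, and coding the 15th Lok Sabha at the same pace would about double it.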

Gains from a built data-set

The data-set becomes a strong selling point for your paper
A good data-set has the potential to be used for more than one research project
Anyone who uses your data in future also cites you. So more citations!
It is easier to ask others for data if you have data to share
It is your contribution to the universe of data in a particular domain

Important do’s and don’ts

Your proposal is the basis on which you propose to gather data. Get brief proposals (2-3 pages) vetted by good academic minds before beginning data collection.

Ensure that you have created a Coding Manual before you start collecting data. Edit the manual as you come across data that do not fall under your initial classification

After collecting 10% of the data, check whether the data trend broadly matches your initial expectations. Repeat such checks as collection proceeds

Consider collecting additional data (with no additional cost) that may be useful for future research
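
The advice on a Coding Manual and on an early check can be sketched as a small validation routine. The rules encoded here (ten indicators, each coded 0 or 1, recorded score equal to their sum) are read off the data-set illustration; the function names, the `fraction` parameter and the choice of the mean score as the summary are assumptions made for this sketch, not the author's procedure.

```python
def validate_row(indicators, recorded_score):
    """Return a list of problems found in one coded row (empty if clean)."""
    problems = []
    if len(indicators) != 10:
        problems.append("expected ten position indicators")
    if any(v not in (0, 1) for v in indicators):
        problems.append("indicator outside the 0/1 coding")
    if sum(indicators) != recorded_score:
        problems.append("recorded score does not match the indicators")
    return problems


def interim_check(rows, fraction=0.10):
    """Validate the first `fraction` of coded rows and summarise their scores."""
    n = max(1, int(len(rows) * fraction))
    sample = rows[:n]
    errors = {}
    for index, (indicators, score) in enumerate(sample):
        problems = validate_row(indicators, score)
        if problems:
            errors[index] = problems
    mean_score = sum(score for _, score in sample) / n
    return n, errors, mean_score


# Rows S.No. 1-4 from the data-set illustration: (indicators, Career Score).
rows = [
    ([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], 1),
    ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0),
    ([1, 1, 0, 1, 1, 0, 0, 0, 0, 0], 4),
    ([1, 1, 0, 1, 0, 0, 0, 0, 0, 0], 3),
]

# fraction=0.5 is used only because this demo has four rows.
print(interim_check(rows, fraction=0.5))  # (2, {}, 0.5)
```

Whether the summary "broadly matches your initial expectations" remains a judgment for the researcher; the routine only makes the check cheap enough to repeat.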

Thank you