

How Public Should Public Data Be? Privacy & E-governance in India

Chintan Vaishnav
Sloan School, MIT
[email protected]

Karen Sollins
CSAIL, MIT
[email protected]

Nikita Kodali
EECS, MIT
[email protected]

Abstract

India is at a critical juncture: cities and governments in general are moving into the digital age, not only to provide enhanced services but also to improve transparency and efficiency, at a time when a Supreme Court bench has decided, in August 2017, that privacy is a constitutional right. In this context, this research asks the following question: how can cities (a state actor) use citizen data to maximize governance while protecting citizens’ fundamental right to privacy? We examine how this balance and its tradeoffs arise from the way Urban Local Bodies (ULBs) collect, use, and disclose citizen data. For the investigation, we collect ULB metadata by examining the architecture of the products of the eGovernments Foundation, one of the leading providers of digital tools for ULBs, and by interacting directly with ULBs. Based on this investigation, we define two new indices, a Governance Efficiency Index (GEI) and an Information Privacy Index (IPI), that allow measuring a city’s performance on the two dimensions, understanding where tensions arise in simultaneously improving performance on both, and where innovation will overcome such tensions. In addition to enabling specific observations and recommendations for the particular cases we examined, this work is a proof of concept that such analysis can and should be done more widely, in order to understand the tradeoffs between government transparency and efficiency on one hand, and privacy on the other.

1 Introduction

India as a nation is at a dual inflection point. On the one hand, there are many initiatives to move various aspects of Indian civil life into the digital age, including the work of the eGovernments Foundation beginning in 2003 [5], the work of the Unique Identification Authority of India (UIDAI) [22] beginning in 2009 and its realization in Aadhaar,¹ and the Smart Cities initiative [12] begun in 2015. Concurrently, the Supreme Court, in its decision of August 2017 [13], has defined privacy as a constitutional right, a position further expanded and explored by the Srikrishna panel of experts [25]; this will have tectonic societal and operational implications.

At the juxtaposition of these two forces, this paper analyzes the increased roles, rights, and responsibilities of citizens, businesses, and municipal governments in simultaneously providing increased transparency and efficiency of municipal government operation and privacy as defined by the Supreme Court decision and the subsequent Srikrishna experts’ white paper. The paper takes concrete steps in analyzing the types of data being collected by municipalities, the uses of that data to achieve governmental operations, and the tradeoffs between governmental effectiveness and privacy-driven constraints on data access.

Because this research is focused on those two major transformations in Indian municipal governance, it is important to note that until recently all government information was collected on paper forms. The first step in automating that has been to collect the same data as on the paper forms, but electronically. We begin by observing that paper-based data collection provided significant privacy for the citizen, in conjunction with much more limited transparency and efficiency of those operations. With a mandate to improve transparency and efficiency, cities are moving toward digitization of data collection, especially as enabled by the work of the eGovernments Foundation. This is occurring while, separately, the legal situation is being transformed by the Supreme Court decision on privacy and the succeeding committee of experts’ white paper. The final element in the evolving context is the explosive technological growth in big data and inference. It is at the crossroads of these that we proceed with our study.

As discussed by Barocas and Nissenbaum [15], questions of privacy can valuably be separated from questions of anonymity. There is increasingly sophisticated work on anonymity in large datasets, beginning with the early work on k-anonymity [27, 28] and proceeding to differential privacy [18] and searchable encryption [16]. But as Barocas and Nissenbaum point out, much of the challenge of privacy has to do with violations or infringements based on inference. With respect to inference, one interesting further path of research is to consider the basis for any inference, because that basis is often key to definitions of privacy-violating or unfair discrimination; an example is the work by Datta [17] on discovering the underlying data used for inference. These advances in technology and thinking allow us to be careful in defining the scope of our study.

The context for this study is municipal government operation, which is divided into more than two dozen primary areas of operation, ranging from grievances, to several kinds of taxes such as property and water, to government services such as water and sewage hookups, to record keeping such as births, deaths, and marriages. These areas are organized into four primary foci of operation: revenue, expenditure, citizen services, and administration. Each area has a defined set of data it collects, a rich record for each transaction, ranging from identification of the submitter of the record to the information needed to perform the transaction. Each of the primary areas performs a number of functions, based on its internal organizational structure, using some or all of the data and possibly data from other functions. Our objective is to define a framework for making choices between privacy and the governmental operations defined by transparency, efficiency, and effectiveness at doing the required tasks. To understand and evaluate this tradeoff, we define two new indices: a Governance Efficiency Index (GEI) and an Information Privacy Index (IPI). The central question we want to ask for each piece of data is about its utility in the context of its governmental area of operation, and the impact of either constraining or eliminating it. This will allow us to better understand the implications of privacy policies on governance. Our analysis enables us to answer questions about each data field collected, as well as aggregate information: its use, its importance for either efficiency or transparency, and its risk with respect to privacy. Furthermore, with more in-depth analysis we can understand the cultural, financial, and functional impact of exposure.

The paper proceeds as follows. In Section 2, we expand on the research questions, considering evaluation in the context of both existing, widely held privacy principles and our two indices. Because the context of this work is so tumultuous, Section 3 provides extensive background: Section 3.1 covers the situation on the ground in India (technical and cultural), and Section 3.2 covers the legal and constitutional decisions with respect to privacy in India, including a discussion of Aadhaar. Readers knowledgeable in one or the other of these areas may skip the corresponding section. Section 4 explains our data collection processes. Analysis and results are discussed in Section 5, and further synthesis, with additional discussion of the use and impact of our two new indices, occurs in Section 6. We conclude in Section 7 with a summary of our observations and lessons, a discussion of limitations of the work, and finally directions in which this work will move forward.

2 The Research Questions and Relevant Privacy Principles

In order to analyze the privacy implications inherent in the move to digitized Urban Local Body (ULB) governance, we first formulate the research questions of interest. Then, in Section 2.2, we evaluate the questions from the perspective of four key privacy principles, derived from a subset of those specified by the Supreme Court judgment. Finally, we introduce the two indices proposed as a result of this research, a Governance Efficiency Index (GEI) and an Information Privacy Index (IPI), in order to help the reader evaluate tradeoffs in collection, use, and disclosure, and possible opportunities for innovation, as they read the rest of the paper.

2.1 Research Questions

The overarching question of interest to this paper is this:

How can cities (a state actor) use citizen data to maximize governance while protecting citizens’ fundamental right to privacy?

The Srikrishna Committee white paper [?] identifies several privacy principles as important in India’s context. Given the focus of this research on cities, we focus on the following subset: Data Collection, Data Use, Data Disclosure, Data Security, and Data Anonymity. These are principles that affect both governance efficiency and privacy, and where cities, acting as Data Controllers, must determine what their policies ought to be. Below, these principles are presented as sub-questions serving the overarching question above:

1. How does limiting the collection of citizen data affect governance efficiency and information privacy?

2. How does loss of data integrity affect data use?

3. How can citizen data be disclosed or anonymized without affecting privacy in an undesirable way?


Answering these questions would then clarify city-level action on the remaining principles, which have more to do with how to implement privacy protection and are also identified by the Srikrishna Committee as relevant to India, such as notice and consent for data collection and use, and openness with respect to publication of these policies. These can be dealt with once the above questions are answered, and are therefore not a focus of this paper.

2.2 Evaluation in the context of privacy principles

The Supreme Court decision on privacy identified nine privacy principles as the basis for its decision.² In this work, we concentrate on four: collection limitation, purpose limitation, disclosure, and anonymity, the last two of which fall under the security principle of the Supreme Court decision. We quote the decision on these three principles:

(iii) Collection Limitation: A data controller shall only collect personal information from data subjects as is necessary for the purposes identified for such collection, regarding which notice has been provided and consent of the individual taken. Such collection shall be through lawful and fair means;

(iv) Purpose limitation: Personal data collected and processed by data controllers should be adequate and relevant to the purposes for which it is processed. A data controller shall collect, process, disclose, make available, or otherwise use personal information only for the purposes as stated in the notice after taking consent of individuals. If there is a change of purpose, this must be notified to the individual. After personal information has been used in accordance with the identified purpose it should be destroyed as per the identified procedures. Data retention mandates by the government should be in compliance with the National Privacy Principles.

...

(vii) Security: A data controller shall secure personal information that they have either collected or have in their custody, by reasonable security safeguards against loss, unauthorised access, destruction, use, processing, storage, modification, deanonymization, unauthorised disclosure [either accidental or incidental] or other reasonably foreseeable risks.

In this context, our first topic of concern is the question of what data is being collected, its uses, necessity, and requirements. As we discovered in our detailed examination of the data that is collected and its degree of necessity and importance, the current practice is to collect exactly the same data that was originally collected on paper. We began, for instance, by noticing that mobile phone numbers are collected everywhere. This led to a series of interesting questions, ranging from “Is this necessary?” to “Is this valuable?” to questions of how ‘identifying’ mobile phone numbers are, to whether mobile phone numbers need to be collected at all, since an increasing number of submissions come from mobile phones. As we will see below, we made some interesting discoveries in this area, and the answers are not necessarily what we had expected. In addition, since we will be analyzing our results for two types of potential violations, data integrity and data confidentiality, understanding the value and importance of the data is a component of evaluating the risks as traded against the value. One component of our data collection was therefore to identify all the kinds of data that are being collected.

Second, we analyze the types and nature of data collection from the perspective of the second principle above, purpose or use limitation. This principle focuses on the context in which the data is intended for use, which is closely related to Nissenbaum’s concept of privacy in context [23]. One of the important observations we make in our contextual analysis of the data is that, in analyzing for purpose limitation, the results are generally not binary. It is not the case that a piece of data is either absolutely critical or irrelevant. There is middle ground, where the presence of a particular data field will improve some aspect of the purpose and functionality. The mobile phone numbers above are an example. If the function in question is the assignment of a value to a piece of property for tax purposes, having the owner’s phone number simplifies setting up an appointment to inspect the property. Thus the assessment may in the end be both more efficient (one of the objectives in improving city services) and more accurate, because the inspector can get onto the property. In contrast, one might ask about the necessity or value of including mobile phone numbers in marriage records, as is current practice. Thus, a component of our research has been to collect information about which data is used for which purposes in which modules, and its degree of importance or necessity in achieving government functions. For this, we consider not only which data is needed, but exactly who within the governmental organization needs that access.

Our last two principles, disclosure and anonymity, both derive from the description of security above. Our focus on both unwanted or unauthorized disclosure and opportunities for deanonymization has led to an analysis of the data with respect to two parties at risk: the source of the record and, if there is one, the subject of the record. Thus, in a request for a water connection, the only person involved is the requester. In a grievance, there is the submitter of the grievance as well as the target. We notice that the target may be the ULB itself, as in a pothole or a street light being out; an individual, as in a complaint about nighttime noise; or a business, as in a complaint about unlicensed selling of food. We examine the exposure and opportunities for deanonymization along three types of impact (cultural, functional, and financial), with respect to both data integrity violations and data confidentiality violations. For each, we also consider the probability of its occurring.

This collection of data about the data, covering both the fields of data collection and the analysis of the use and implications of violations of either integrity or confidentiality, forms the basis for our analysis.
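The kind of per-field metadata described above can be pictured as a small record type. The following Python sketch is illustrative only: the class names, the role name, the graded necessity scale, and all values are assumptions introduced here, not the schema used in this study or in the eGovernments Foundation products.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Necessity(Enum):
    """Graded importance of a field to a function (not a binary yes/no)."""
    IRRELEVANT = 0
    HELPFUL = 1
    CRITICAL = 2


@dataclass
class Exposure:
    """One assessed risk for a field; all values used below are hypothetical."""
    violation: str      # "integrity" or "confidentiality"
    impact: str         # "cultural", "functional", or "financial"
    party: str          # "source" (submitter) or "subject" (target)
    probability: float  # assumed estimate in [0, 1]


@dataclass
class FieldMetadata:
    """Data about one collected data field within one module."""
    module: str
    name: str
    necessity: Necessity
    roles_with_access: List[str] = field(default_factory=list)
    exposures: List[Exposure] = field(default_factory=list)

    def exposures_for(self, violation: str) -> List[Exposure]:
        """Return the assessed exposures for one violation type."""
        return [e for e in self.exposures if e.violation == violation]


# Hypothetical entry: a mobile number collected in a grievance record.
phone = FieldMetadata(
    module="grievance",
    name="mobile_number",
    necessity=Necessity.HELPFUL,
    roles_with_access=["grievance_officer"],  # assumed role name
    exposures=[
        Exposure("confidentiality", "cultural", "source", 0.2),
        Exposure("integrity", "functional", "source", 0.1),
    ],
)

confidentiality_risks = phone.exposures_for("confidentiality")
```

Keeping necessity as a graded value rather than a boolean mirrors the observation that purpose-limitation results are generally not binary.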

2.3 Evaluation indices

This work arrives at two indices, a Governance Efficiency Index (GEI) and an Information Privacy Index (IPI), that together show exactly how data collection, use, disclosure, and anonymity affect cities’ performance on the dimensions of governance efficiency and privacy. More importantly, they show which types of data collection create a tension between raising performance on one index and lowering it on the other, thereby identifying a space where innovation can help keep both indices high.

The Governance Efficiency Index (GEI) arises from the multiplication of two components: Timeliness of Service and Accuracy of Service. A service is timely when delivered on or before the time limit published by the city. A service is accurate when the right service is delivered to the right citizen without any rework.

The second index, the Information Privacy Index (IPI), arises from the multiplication of three components: Right Collection, Right Use, and Right Disclosure of citizen data. Data Collection is right when it is minimalist, that is, limited to the data absolutely necessary for providing the requested service. Data Use is right when the data is accessible only to those functionaries at the city who need it to deliver the requested service. Data Disclosure is right when no disclosed data makes a citizen personally identifiable or allows any ‘undesirable’ inference about them. In Section 6 we define and formulate these indices precisely, and discuss how to measure them and their various implications.
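As a rough illustration of the multiplicative structure described above, the following Python sketch computes both indices. It assumes each component is expressed as a fraction between 0 and 1 (for example, the share of service requests delivered on time); that scaling, the function names, and the sample values are assumptions made here, since the precise formulation is deferred to Section 6.

```python
def _check_fraction(name: str, value: float) -> None:
    """Reject components outside the assumed [0, 1] range."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be in [0, 1], got {value}")


def governance_efficiency_index(timeliness: float, accuracy: float) -> float:
    """GEI as the product of Timeliness of Service and Accuracy of Service."""
    _check_fraction("timeliness", timeliness)
    _check_fraction("accuracy", accuracy)
    return timeliness * accuracy


def information_privacy_index(
    right_collection: float, right_use: float, right_disclosure: float
) -> float:
    """IPI as the product of Right Collection, Right Use, and Right Disclosure."""
    for name, value in (
        ("right_collection", right_collection),
        ("right_use", right_use),
        ("right_disclosure", right_disclosure),
    ):
        _check_fraction(name, value)
    return right_collection * right_use * right_disclosure


# Hypothetical scores for one city (not measured values).
gei = governance_efficiency_index(timeliness=0.5, accuracy=0.75)
ipi = information_privacy_index(
    right_collection=0.5, right_use=0.5, right_disclosure=1.0
)
```

Under this assumed scaling, a product means that a low score on any one component caps the whole index, and a component of zero drives the index to zero; a sum or average would not have that property.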

3 Background: Digitization and Constitutional Privacy

In this section we examine how India arrived at the point where we could and should seek answers to the above questions. We consider advances in technology and the legal and social basis for attention to privacy. With respect to technology, there are two thrusts that bear review: the increased accessibility of the Internet in the hands of individuals, and the evolution of computerized systems in urban governments, including the technical work on Aadhaar. With respect to privacy, the development of Aadhaar and its biometric-identity basis has raised increasing numbers of legal cases on questions of privacy, which in the longer term led to the Supreme Court Bench decision in August 2017 on the constitutionality of privacy in India. We examine these two topics separately below.

Again, for the reader familiar with technology and digitization developments in India, we recommend skipping the next section. For the reader familiar with the constitutional developments in India with respect to privacy, we recommend skipping Section 3.2 and going directly to Section 4.

3.1 Advances in Digitization of Cities

Over the last two decades or so, there has been a concerted effort to transform urban local bodies (ULBs), the legal term for municipal governments, with the intention of making them more agile in addressing citizens’ concerns, and more scalable, transparent, and efficient. We review this development here because it is culminating in digitizing and automating as much of the ULBs’ activities as possible. Although the ULBs themselves are responsible for their own governance, the federal government has also enhanced its support and encouragement of these advances.

At the federal level, the ministry currently responsible for urban governance is the Ministry of Housing and Urban Affairs (MOHUA). According to its current statement of responsibilities [7], it is the apex organization at the national level addressing housing and urban issues, formulating policy, and coordinating and sponsoring state-level programs, which are generally executed by state and local governments. In terms of ULB governance, one of the programs formulated by the MOHUA that is particularly interesting from the perspective of this project was a program to reorganize ULBs to develop and support satellite ULBs around the seven largest cities [9]. This is only one of its many programs and efforts.

In terms of the cities themselves, there are two primary trajectories that are important to note here. The first is the growth in urban population. The 2011 census [1] reported a total population of 1.2B people. There are three mega-cities of over 10M each and another 43 with populations over 1M. Recognizing that the metropolitan areas of the seven largest cities in India had serious organizational and service-provision issues [9], the government created the Jawaharlal Nehru National Urban Renewal Mission in 2005 [6] to create and improve governance in 63 cities across India, including these satellite areas, especially to improve service delivery to poorer citizens. Part of this effort, as can be seen in the checklists in the satellite town efforts, is to move their work to a digital rather than paper-based approach. It is important to note that it was during this time that the eGovernments Foundation, which had been founded in 2003, grew in importance, building its open source technologies for ULBs to use in providing citizen interfaces, both for initial requests for services and for tracking the progress of those requests.

Two interesting recent developments (missions) of the government of India, both begun in 2015, are the Atal Mission for Rejuvenation and Urban Transformation (AMRUT), which aims to provide many improved city services, including water, sewerage, transportation, and park services [2], especially for poorer citizens, in their homes and neighborhoods across the country, and the Smart Cities initiative [10], which aims to make city governance “smart”. The latter is bringing successive cadres of cities into its program in order to grow its base. There are also parallel programs to improve urban transportation, although those are less related to our particular study.

At the same time, both mobile and smart phone use and literacy continue to grow, making smart phone access to the sorts of government services addressed in this paper available to more of the population. According to the 2011 Census, literacy grew from over 64% in 2001 to over 74% in 2011, although the average number of years of schooling across the country is only 5.1 years [8]. Data from [11] indicates that there were 524.9M mobile phone users in 2013 and predicts that this number will reach 813.2M by 2019, with smart phone ownership growing from 199M to 383.9M between 2015 and 2019. This is important because the two methods of citizen interaction in e-government initiatives are the citizen using a smart phone to submit electronic forms directly, and direct contact in which a government representative enters an electronic form on behalf of the citizen. The intention is that smart phone interaction comprise the vast majority of these, which becomes increasingly viable with increased literacy and increased smart phone availability.

Another significant development in this same time period is the work of the Unique Identification Authority of India (UIDAI) on what has come to be called Aadhaar, as mentioned above: the move to biometric identity for all Indian residents. In an effort to create a comprehensive database of its citizens, in 2009 the Government of India created Aadhaar, the world’s largest biometric ID system, with data collected by the UIDAI. Nilekani and Shah [22] report that 900 million people were registered between 2009 and 2014.

3.2 The Constitutionality of Privacy

In 2009, India began down the path of registering every citizen, with their information and certain biometric data, under Aadhaar. While Aadhaar was used to verify identification and document citizens, there were numerous cases in which sensitive personal information was leaked or hacked and privacy was compromised. As petitions questioned the need to collect biometric information, data security, and personal privacy, a Supreme Court bench was convened to decide whether privacy is a guaranteed fundamental right under the Indian Constitution.

3.2.1 Supreme Court Decision

The nine-judge bench was asked to reflect on how the makers of the Constitution envisioned the nature of privacy, considering the following questions:

• Is privacy a guaranteed fundamental right in the Constitution?

• How is privacy defined?

• Is the right to privacy embedded in the right to liberty and personal dignity, or other guarantees of protected fundamental rights?

• In what parts of a citizen’s life is privacy guaranteed?

• How much should the government regulate privacy (nature of regulatory power)?

• What are the different aspects of privacy and does the Constitution cover some but not the others?

On August 24, 2017, the Bench unanimously decided that under the Indian Constitution privacy is a fundamental right, subject to exceptions for national security, protection against crime, and protection of revenue. Observing that the Indian Constitution is a dignitarian constitution focused on upholding every citizen’s personal dignity, the Bench outlined several reasons why privacy is important for ordered liberty: (1) privacy is a form of dignity; (2) privacy provides a limit on the government’s power as well as on private sector entities’ power; (3) privacy is key for freedom of thought and opinion; (4) it provides the right to control personal information as well as an incentive for the development of personality; (5) a guarantee of privacy prevents unreasonable intrusions by malicious public, private, or individual actors. It was determined that privacy is intrinsic to the values of Article 21, which gives citizens the right to life and personal liberty. Furthermore, privacy should apply to both physical and technological forms of information; rights to enter the home should be up to the individual, excepting security reasons listed in Article 14. Lastly, privacy serves eternal values and guarantees, as well as the foundation of ordered liberty. Consequently, the Bench formulated a three-fold requirement for a valid law on privacy:

1. A law stating that privacy is a fundamental right according to Article 21 should exist.

2. To guard against arbitrary state action, the restrictions imposed on the nature and content of the law should abide by Article 14’s exceptions to reasonableness.

3. The legislation must be proportional to the object and needs sought to be fulfilled by the law.

The Bench, recognizing that data protection and data privacy are complex issues that require expert opinion, mandated that the government create a Committee of Experts under the chairmanship of Justice B. N. Srikrishna, a former judge of the Indian Supreme Court, to deliberate on a data protection framework for the country. While the constitutionality of the right to privacy was decided, the complexity of regulating privacy derives from the context-dependent economics of privacy. To better understand existing models of privacy protection and enforcement, it is important to understand how the definition and value of privacy change with context. The committee’s report was delivered in late November 2017 to solicit feedback, with a more final report in July 2018.

3.2.2 Understanding the Value of Privacy in the Indian Context

The combination of big data and machine learning techniques can reveal a great deal about society and has the potential to bring about positive societal change, and digital records can allow for greater efficiency and accountability. Policymakers have to reconsider how open open data should be, and where the fine line lies between keeping information private and taking the most advantage of the large scale of digitized information. Often, data collected with informed consent for a particular purpose can be repurposed and analyzed for a different set of insights. In these cases, the economic value of the data changes and, especially in the big data realm, privacy regulation grapples with problems of unpredictability, externalities, probabilistic harms, and valuation difficulties [26].

The economics of privacy concerns the trade-offs associated with the balance of public and private spheres between individuals, organizations, and governments with respect to personal privacy, as discussed by Acquisti et al.[14] In the Indian situation, the data generated by the citizens, who are often the data subjects or providers, is passed sequentially to the data collector, data holder, and data users, who may be private or public entities providing a particular service to the citizens. The data collectors for e-governance data are the municipal governments, but the data holders in the backend differ from state to state, the e-Governments Foundation [5] being one of these hired by individual states independently. For true economic analysis, one would need to consider the full set of participants in the data including: (1) the data providers, often the subjects of the data; (2) the data collectors, generally the ULBs themselves; (3) the data storage managers, which may be the ULBs or private contractors; (4) the data analyzers and users, who again may be the ULBs themselves, or third party contractors; and, finally, (5) infrastructure providers such as cloud services and network providers. All will have access to the data in one way or another and all will have potential economic incentives in handling the data and possibly in the privacy of it (or not).

While citizens derive individual benefits and enjoy any common public goods produced using the assembled data, three key themes emerge from the flow and use of information about individuals by firms or governmental organizations, as discussed by Acquisti et al. First, a single unifying practice of privacy is difficult to formulate, as privacy issues of economic relevance arise in a wide variety of contexts and a variety of markets for personal information. Although the Smart Cities Mission mentions only security in the e-governance context, the Srikrishna Committee seeks to create an overarching data privacy protection framework that may minimize costs from standardizing across sectors, even if it may not be able to minimize trade-offs.

Second, it is difficult to conclude whether privacy protection entails a net positive or negative change in economic terms, because of the tradeoffs between protecting against fraud and identity theft, and the costs of anonymizing data, securing the storage of data, and so on. For instance, while revealing mobile location information can be beneficial in improving traffic conditions or transportation efficiency, it may be considered an intrusion upon privacy if the government continuously monitors citizens’ locations with the intent of surveillance.

Lastly, especially in a country like India where its 1.3 billion citizens lie on a broad spectrum of levels of education and income, a large number of poor or poorly educated people are at a disadvantage in accurately assessing the benefits or consequences of sharing or protecting personal information. Even the most educated citizens may not necessarily understand the power of analytics or apply machine learning. Furthermore, even the most conscientious organizations may not be able to lay out all of the scenarios in which the information will be used. Disclosing data causes a reversal of information asymmetries: before the information is released, the data subject holds greater knowledge about the information than the data holder. Afterwards, the data subject may not know what the data holder can do with the data and the consequences associated with sharing the data. While giving up privacy may allow a citizen to receive tangible benefits such as welfare approval, revealing the data may also incur intangible consequences, such as the loss of autonomy and the possibility of increased surveillance. The market cannot respond appropriately to information gaps where users cannot express their true preferences for privacy protection. [19] As a result of this information asymmetry, even specifically designed privacy regulations cannot necessarily cover unknown use cases or account for under-informed citizens.

In addition to the caveats that exist with creating privacy legislation, there are two basic tradeoffs that the government sees with the sharing of personal data, as discussed by Acquisti et al. First, individuals and communities can economically benefit from sharing data. One particular case in which the sharing of data is undoubtedly beneficial is India’s Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA), the world’s largest social welfare scheme [24] to alleviate poverty and provide benefits to impoverished and marginalized sections of society. At the same time, when Aadhaar is linked to welfare schemes and education scholarships, inappropriate access to the information could compromise personally identifiable information like banking information or culturally sensitive information such as caste.

Second, certain positive and negative externalities arise through data creation and transmission. The obvious positive externality is that specific aggregate and individual analysis of the data may reveal correlations between particular events in, for example, the health or education sector. For example, researchers with access to education data were able to discover that student attendance in low-income areas increased when in-school meals were provided. Negative externalities, however, can include intrusive surveillance by the government or targeted pricing by corporations, arising from a comfort with sharing one’s information. As a result, the economic value of one’s personal data continuously changes depending on context and how willing other people are to share their personal information.

3.2.3 Existing Models of Privacy

The constitutionality and contextual dependence of privacy pose a considerable challenge in formulating one set of standardized regulations on the conditions under which personal information can be shared and the methods by which to share and monitor the data. The two main components of international privacy regulations are guidelines, first, on how to protect the data and, second, on how to enforce privacy protection. Internationally, the three common models of privacy protection can be described as i) the “Command and Control” Model, ii) the Self-Regulation/Sectoral Model, and iii) the Co-Regulatory Model.[25] The Srikrishna Committee assessed the three models and concluded that the Co-Regulatory Model was appropriate for India, as its varying levels of government involvement and industry participation can be molded to the Indian context. The Command and Control Model, also known as the Comprehensive Model,[3] includes a general law that regulates the collection, use and dissemination of personal information in the private and public sectors, governed by an oversight body.

4 The Data and Data Collection

In this section we review the sources of our data, both in terms of use of the current tools for collecting data, through eGov, and our decisions about both site and data type selection.

4.1 e-Governments Foundation, Current installations and Digital Services

The eGovernments Foundation (eGov) develops platforms that enable city and state governments to improve accountability, transparency, and efficiency of the delivery of citizen services. The eGov platform is designed to aid in the management of four categories of government information: administration, revenue, expenditure, and citizen services. While administration and expenditure modules account for employee management, legal case management, payroll and pensions, assets, and so on, revenue and citizen service modules mainly include tax evaluations and registrations filed by citizens. Revenue sources include collection of property tax, water tax, trade licenses, advertisement tax and fees from government land and estates, while citizen services include birth and death registrations, marriage registrations, an online citizen portal, public grievance registrations, and building plan approvals. The platform allows municipal officials to enter information and view individual and cumulative data on quantitative and geospatial dashboards. The digital actions of each employee are logged in order to monitor performance and accountability. It also promotes citizen engagement by interfacing with an online citizen portal and mobile app where people can submit and view the status of their applications and registrations, improving transparency and accessibility. eGov’s clients include but are not limited to the state of Andhra Pradesh, the state of Punjab, Greater Chennai Corporation, and the state of Maharashtra.

4.2 Site and Module Selection

For this study, we chose research sites located in a single state. For confidentiality reasons, the identity of the state is kept private. Within this state there are over 325 ULBs, and the ULBs we chose are now equipped with the eGov platform. ULBs are classified into three types by population: Nagar Panchayats have a population of less than 100,000 people, municipalities have populations greater than 100,000 people, and municipal corporations have populations of greater than one million people. For this study, we chose two municipal corporations and one municipality to construct a representative sample. Administratively, the Director of Municipal Administration (DMA) oversees the eGov implementations in all ULBs.

Within a state, the DMA, who provides state level oversight for support services in the municipalities, manages the Additional Director, Joint Directors, and Assistant Directors who oversee various aspects of all municipalities. Then, each municipal corporation houses a commissioner who, with the ULB mayor, provides administration and governance of the operations of each district. Each ULB is assigned a commissioner depending on which district it resides in. The Commissioner defines access controls for employees and can monitor employee performance. Within every ULB, there exist the Administration, Revenue, Accounts, Public Health and Sanitation, Engineering, Town Planning and Poverty Alleviation departments. Each department is responsible for processing certain modules classified under Expenditure and Revenue. All departments, however, are responsible for the Public Grievance module depending on factors discussed in Section 4.2.3.

4.2.1 Classification of eGov Modules

Of the four categories of eGov modules, the Revenue and Citizen Services modules are public-facing and relevant to citizen data collection and citizen service delivery. This subset of modules can be grouped into four types based on what kind of information they may reveal about a citizen. First, modules may be revealing of personal identity, as they contain highly sensitive personal information. Birth and Death Registration, Marriage Licenses, and the Citizen Portal fall into this category. Second, modules such as Water Charges, Property Taxes, and Building Approval are revealing of personal assets. Third, the Trade License and Advertisement Tax modules are examples of modules that are revealing of a citizen’s commercial assets. Lastly, the Public Grievances module forms its own category, as its function does not necessarily require citizens to reveal sensitive personal information.

4.2.2 Modules Selected for This Study

The Water Charges Module, Property Tax Module, and Public Grievance Module (PGR) were chosen for this study. They are some of the earliest implemented eGov modules at the sites we visited, and so produce a large volume of transactions.

4.3 Data Available for Analysis

The Property Tax (PT) and Water Charges modules offer various services. For example, in the Water Charges Module, citizens can apply for New Connection, Re-Connection, Closure of Connection, and so on. For both modules, the new property tax assessment and new water connection applications and workflows are fairly representative of and the most comprehensive of all of the services in their respective modules. As such, we refer to the new property tax assessment and new water connection workflows as the generalized Property Tax Module and the Water Charges Module, respectively.

Each of the three selected modules has its own workflow. Once a citizen submits a form or a request, all of the information that they have submitted is passed through various levels of hierarchy in the appropriate department.

4.3.1 Property Tax Module

The Property Tax Module includes services to evaluate or change property tax. While the representative service is New Property Assessment, the module also includes services like Transfer of Title, Bifurcation, Addition/Alterations, Revision Petitions, Demolitions, and so on. The module requires the applicant to give owner details, property address details, assessment details, amenities, construction details, floor details, details of surrounding boundaries of the properties, court documents, and vacant land details if applicable.

The quantitative evaluation of property tax payment depends on the Usage, Classification, Zone, Age, and Occupancy Type data fields. Application particulars, such as contact details and address, are important in verifying personal identity and assets. Once a citizen submits an evaluation request, the data is verified by a Junior/Senior Assistant, then sent to a Bill Collector and Revenue Inspector who verify details and conduct site visits. A Revenue Officer validates the evaluation, at which point the application must be approved at the Commissioner level in order to be complete. In smaller ULBs, two or more of these functions may be completed by the same official. In larger ULBs, the process may be less uniform so that work is spread across multiple officials in the same level of hierarchy.

4.3.2 Water Charges Module

The Water Charges Module, which the Engineering department manages, includes services to evaluate or change water tax payments. The fields that are essential for the evaluation of water tax are Zone, Usage Type, Water Source, Pipe Size and, where applicable, the White Ration Card. A resident who holds a White Ration Card is eligible for subsidies. In that case, the name and address become important for verifying that the person holding the White Ration Card is the one living at the property. The Property Assessment ID must also be provided in the application, so that all of the information from that Property Tax (PT) assessment is available to the officials in the Water Charges workflow. Other services in the module include Change of Usage, Closure of Connection, Reconnection Service, and Additional Water Tap Connection. As in the PT module, application particulars are also necessary for verification.

The workflow generally looks similar to that of the PT module: a Junior/Senior Assistant verifies application details, an Assistant Engineer does a field verification and feasibility testing, a Deputy Executive Engineer/Executive Engineer/Superintendent Engineer scrutinizes the estimation details, and the Commissioner approves the evaluation.

4.3.3 Public Grievances Module

The Public Grievance Module allows citizens to submit a complaint to the municipality about issues such as sanitation, stray animals, illegal businesses, non-functioning of street lights, schools, voter lists, and so on. The module maps to a municipal administration department depending on the type of grievance submitted. This module requires the citizen to input their contact details and the grievance details, including location and photos. Once the complaint is submitted and reaches an official in the relevant department, the official has a certain number of days by which he must address the issue. This number of days, called the Service Level Agreement or SLA, is unique to the Indian state where we carried out our site visits. If an official does not complete his task within the given SLA, then the task will be escalated to the next level in the hierarchy. This accountability model promotes transparency and improves efficiency.
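The escalation rule described above can be sketched as follows. This is a minimal illustration and not eGov's implementation: the three-level hierarchy, the role names, and the three-day SLA are hypothetical placeholders, and we assume that each level receives a fresh SLA window after an escalation.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical hierarchy and SLA length, for illustration only.
HIERARCHY = ["Field Official", "Department Head", "Commissioner"]
SLA_DAYS = 3


@dataclass
class Grievance:
    filed_on: date


def missed_slas(grievance: Grievance, today: date) -> int:
    """Count how many full SLA windows have expired since filing."""
    elapsed = (today - grievance.filed_on).days
    # Days 1..SLA_DAYS fall inside the first window, so nothing is missed yet.
    return max(0, (elapsed - 1) // SLA_DAYS)


def current_owner(grievance: Grievance, today: date) -> str:
    """Return the level holding an unresolved grievance, escalating once per missed SLA."""
    level = min(missed_slas(grievance, today), len(HIERARCHY) - 1)
    return HIERARCHY[level]
```

Capping the level at the top of the hierarchy reflects the assumption that a grievance cannot escalate beyond the most senior official.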

5 Analysis and Results

In this section we discuss both our analysis of the data and the results we derive from it, with respect to collection minimization, loss of data integrity, and data disclosure in turn.

5.1 Understanding Data Use

While every data field collected in each of the three modules is visible to every official involved in the workflow of the module, officials do not necessarily need all the data available to them in order to complete their task. To understand the data viewable by each official, we began by creating a binary matrix to reflect which official actually used each piece of data. That let us conclude which officials actually needed access to each piece of data, and in particular what percentage of officials needed that data. We realized that there was more to learn from a matrix of data types and officials. In particular, our next step was to label each cell with one of four characteristics: (1) the data field is mandatory for this official to do their job; (2) the field is used by this official for operational use; (3) the field is used by this official for administrative use; or (4) the field is unused.

A data field can be utilized operationally or administratively, or not utilized at all. An official uses a data field operationally if his role cannot be completed without access to that data field. On the other hand, an official uses a data field administratively if he must, legally or in some circumstances, be able to view the data field, even if he does not directly need it to complete his role. This allowed us to aggregate the information into Figures 1, 33 and 6, for the public grievances, property tax, and water modules, by computing the percentage of officials for whom each of categories (1) through (4) is true. This analysis leads to the observation that data collection could be reduced without any impact on the operations and responsibilities of the officials.
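The per-field aggregation described above can be sketched as follows. The cell labels in this small matrix are hypothetical and serve only to show the computation; they are not the values recorded during our site visits.

```python
from collections import Counter

CATEGORIES = ("mandatory", "operational", "administrative", "unused")

# Hypothetical labels: rows are data fields, columns are officials.
USAGE_MATRIX = {
    "Pipe Size": {
        "Junior/Senior Assistant": "unused",
        "Assistant Engineer": "mandatory",
        "Executive Engineer": "operational",
        "Commissioner": "administrative",
    },
    "Email": {
        "Junior/Senior Assistant": "unused",
        "Assistant Engineer": "unused",
        "Executive Engineer": "unused",
        "Commissioner": "unused",
    },
}


def share_by_field(matrix):
    """For each field, return the percentage of officials in each category."""
    shares = {}
    for field, labels in matrix.items():
        counts = Counter(labels.values())
        total = len(labels)
        shares[field] = {c: 100.0 * counts[c] / total for c in CATEGORIES}
    return shares
```

A field whose "unused" share is 100 percent, such as the illustrative Email row here, is a candidate for removal from the form.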

The matrix also enabled us to summarize the data along the other axis by collecting the percentage of data items that fall into each of our four categories above for each official. For example, in the Water Charges module, the officials are Junior/Senior Assistant, Assistant Engineer, Deputy Executive/Executive/Superintendent Engineer, and Commissioner. A Junior/Senior Assistant in the Water Charges module only verifies application particulars, such as checking that the address details match those in the property assessment ID or that the name on the application matches the name on the White Ration Card, if submitted. According to our site visits, connection details are irrelevant to the Junior/Senior Assistants. In this case, only the data fields included in Application Particulars and the White Ration Card are used operationally. The rest of the data fields are not used at all to complete the role designated to the Junior/Senior Assistant. The Deputy Executive/Executive/Superintendent Engineer is the main decision-making authority in an evaluation and so reviews all details of the application. Hence, we could compute the percentage of fields that are mandatory, used for operational use, used for administrative use, or not used at all.

In the property tax module, at the Commissioner level, however, most of the data is used administratively. The Commissioner must legally be able to view all of the data, as he is the final approver for all evaluations. In most cases, though, he does not check application details, as the details have been verified multiple times during the workflow, and mistakes in evaluations of small properties do not significantly affect the municipal revenue. If an evaluation for a large commercial complex is submitted, then he may check that the evaluated tax value correlates with the size or geographic location of the property. In this case, he will operationally use information about the building details, photo of the property, and owner details. We then graphed this data in Figures 2, 5 and 7 for the public grievances, property tax and water modules respectively. From this analysis, one is led to the observation that different officials only need access to subsets of the data, which motivates purpose-driven access that restricts each official to only those fields they need.
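The per-official view and the purpose-driven access it motivates can be sketched as follows. The labels and the sample record are hypothetical, and the filtering function is an illustration of the idea rather than a feature of the eGov platform.

```python
# Hypothetical labels for one workflow; "unused" marks fields an official never consults.
FIELD_USE = {
    "Junior/Senior Assistant": {
        "Applicant Name": "operational",
        "Address": "operational",
        "Pipe Size": "unused",
        "Water Source": "unused",
    },
    "Commissioner": {
        "Applicant Name": "administrative",
        "Address": "administrative",
        "Pipe Size": "administrative",
        "Water Source": "administrative",
    },
}


def unused_share(official):
    """Percentage of fields that this official does not use at all."""
    labels = FIELD_USE[official]
    unused = sum(1 for label in labels.values() if label == "unused")
    return 100.0 * unused / len(labels)


def visible_record(record, official):
    """Return only the fields this official uses in some capacity."""
    labels = FIELD_USE[official]
    return {
        field: value
        for field, value in record.items()
        if labels.get(field, "unused") != "unused"
    }
```

Fields without a label default to "unused", so under this sketch an official sees a field only when a use has been explicitly recorded for it.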

The above led to determining which data fields are necessary, collected for efficiency or accuracy, or unnecessary, as indicated in Figure 8. This analysis in conjunction with the results above led to our final graph, Figure 9, in which we summarize the data utility of specific data to specific officials, overall. This final graph led us to summarize the overall utility of the data collected and used across all three of our modules: water charges, property tax, and public grievances.

5.2 Understanding Implications of Collection Minimization

The current online forms were transcribed from the previously existing paper forms. Pre-digitization, as much information as possible was collected from the citizen, even if some fields seemed unnecessary. Privacy was still conserved as data was not easily searchable, and citizens were fine with giving up more information so that they would not have to spend more time and money to return to the municipal office again to give more details. In the digital era where data is searchable and more easily accessible, collecting unnecessary data jeopardizes personal privacy.

From talking with various officials and gaining an in-depth understanding of the workflow, we understand that all citizen data collected for each module can be categorized broadly into three categories: necessary for completing the function of the module, collected for efficiency/accuracy of the workflow, and unnecessary to the module (see Figure 8). Necessary use means that a data field is either required for a quantitative valuation or for personal identity or asset verification purposes. In Water Charges for example, the attributes Property Type, Usage Type, Pipe Size, Water Source Type, Connection Type, and Address are the only factors used to quantitatively calculate the water tax. Attributes such as Property Assessment number, White Ration Card, and Name of Applicant are used to verify details of the request, and so are important to validate the request. Data fields collected for efficiency/accuracy are not necessarily needed in order to complete the function of the module, but give officials a clearer picture of the application. Some of these attributes include information that can be observable on the site, such as amenities or details of surrounding areas for the property tax module. Other information makes establishing contact with the applicant easier, such as the name and phone number of the citizen filing a public grievance. Lastly, there are data fields that are unnecessary to complete a function, but are still collected. In PGR for example, the address of the citizen complaining is not necessary to either contact him or address the grievance. For the Water Charges and Property Tax modules, email is not used to contact or give updates to the citizen, as the status of an application is communicated verbally or through the citizen portal or app.
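As a minimal sketch of collection minimization, the three categories can be recorded per field and the unnecessary fields dropped from the form. The category assignments below follow the examples given in the text and are deliberately partial; the function name and data layout are illustrative assumptions.

```python
# Partial category maps, taken from the examples discussed in the text.
WATER_CHARGES_FIELDS = {
    "Property Type": "necessary",
    "Usage Type": "necessary",
    "Pipe Size": "necessary",
    "Water Source Type": "necessary",
    "Connection Type": "necessary",
    "Address": "necessary",
    "Property Assessment Number": "necessary",
    "White Ration Card": "necessary",
    "Name of Applicant": "necessary",
    "Email": "unnecessary",
}

PGR_FIELDS = {
    "Name": "efficiency",
    "Phone Number": "efficiency",
    "Address": "unnecessary",
}


def minimized_form(fields):
    """Keep the fields that are necessary or collected for efficiency/accuracy."""
    return [name for name, category in fields.items() if category != "unnecessary"]
```

Whether the efficiency/accuracy fields should also be dropped is a policy choice; this sketch keeps them and removes only the fields the officials reported as unused.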

5.3 Understanding Implications of Loss of Data Integrity

In this section we focus on several sorts of implications that derive from loss of data integrity. We begin by considering privacy implications and then consider the functional and financial implications of loss of data integrity. The reasons for loss of integrity are left to different work.

5.3.1 Privacy Analysis I: Loss of Data Integrity

According to NIST,[21] the loss of data integrity is defined as data being altered in an unauthorized manner during storage, processing, or in transit. In order to understand the implications if data integrity is lost for each data field in the three modules, we built an Implications matrix. This matrix is too large to present in a paper, so we discuss it here. We determined three types of implications that the loss of integrity may have: cultural, financial, and functional. Cultural implications relate to what social inferences can be made about a person from a particular data field. There would be a financial implication if the citizen is affected financially, and a functional implication if the function of the module is unfulfilled.

5.3.2 Functional and Financial Implications of Loss of Data Integrity

For each data field in each of the modules, we evaluate what kind of implication may result if only that particular data field was altered in an unauthorized manner. For example, if only a person’s name in the Water Charges module was changed, and the name on the application no longer matches the name in the property assessment, then the water tax evaluation would be stalled. As a result, the loss of integrity of the Name attribute in the Water Charges module would have a functional implication. In the case of Water Charges and Property Tax, financial implications and functional implications go hand in hand. If a tax evaluation is incorrect, then the citizen is financially affected.

From the Implications Matrix, we understand that the loss of integrity is likely to have functional and financial implications. The loss of integrity can affect the verification of personal identity or assets, the quantitative valuation of the water or property tax, or hinder efficiency. As described in Section 6.2, necessary data fields can be separated into those required for verification and those required for the quantitative calculation that is the output of a module. If the integrity of either type of necessary field is lost, then by definition the function of the module cannot be completed. There may be an over-valuation or under-valuation of a property or water tax if the factors determining them are changed. This poses a financial implication. A functional implication is the function being stalled if the identifying information of a person, his assets, or a public grievance is inaccurate.

If the data fields collected for efficiency/accuracy are corrupted, then the function of the module may not be stalled, but may be severely hindered. If a Grievance Photo was changed to a different image, a functionary would still be able to find the location of the grievance by the Grievance Details or contacting the complainer, but his job would likely be severely sidetracked by an irrelevant image. Therefore, we understand that protecting the integrity of citizen data fields is important in order to protect against harmful functional and financial implications.
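The reasoning above can be condensed into a small lookup from a field's role to the implications of losing its integrity. This is a sketch under our reading of the text: the role names and the "hindered" label are our own shorthand, and the three entries are an excerpt chosen from the examples discussed, not the full Implications matrix.

```python
# Implications of integrity loss, keyed by the role a field plays in its module.
IMPLICATIONS_BY_ROLE = {
    "valuation": {"financial", "functional"},  # wrong tax amount, function not completed
    "verification": {"functional"},            # evaluation stalls on a mismatch
    "efficiency": {"hindered"},                # function continues but is slowed
}

# Illustrative excerpt of field roles, based on the examples in the text.
FIELD_ROLES = {
    ("Water Charges", "Pipe Size"): "valuation",
    ("Water Charges", "Name"): "verification",
    ("Public Grievances", "Grievance Photo"): "efficiency",
}


def integrity_implications(module, field):
    """Look up what may follow if only this field is altered without authorization."""
    return IMPLICATIONS_BY_ROLE[FIELD_ROLES[(module, field)]]
```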

5.4 Understanding the Implications of Data Disclosure on Privacy

In this section we consider the effects of data disclosure on privacy. We begin by considering who will be impacted, and the relative severity of that impact. We then discuss the financial and cultural implications, and conclude this section with a discussion of the challenges of inference over the grievance data.

5.4.1 Privacy Analysis II: Loss of Data Confidentiality

According to NIST, the loss of data confidentiality means that protected data is accessed by or disclosed to an unauthorized party. Similar to the Implications matrix for the loss of integrity, we built an Implications matrix for the loss of data confidentiality. For each data field, we evaluated what kind of implication may arise from the data field being exposed. At a high level, we can separate two dimensions of a citizen service: the type of service being requested or offered, and to whom or by whom the service is offered. The implications of the loss of confidentiality are the gravest when both aspects are leaked, meaning an unauthorized person knows not only what the service being offered is, but also who is being served. The type of implications can be functional or cultural.

5.4.2 Financial and Cultural Implications of Voluntary or Involuntary Data Disclosure

The next question we asked of the data was whether we could understand and categorize the types of implications from loss of confidentiality of the data. When we analyze the types of grievances and their implications, we find that some are solely financial. Figure 10 presents a subset of these that we found among the data. Typically, these complaints were made by individual citizens or businesses about other businesses, and we note their financial implications. If a restaurant has complaints about incorrect slaughtering or garbage disposal, whether or not they are true, the restaurant may suffer financially. In contrast, we also found some kinds of grievances that were solely cultural. If a citizen complained about noise at night, it was likely to be about another citizen at home or walking down the street. As a third category, we found some complaints that had both financial and cultural implications. Examples of these can be found in Figure 11. Noting that, generally, financial-only losses would derive from a complaint against a business, a citizen complaining about another citizen would lead to a cultural loss, and a business complaining about a citizen would lead to a combination of cultural and financial loss, we tabulated the number of each type of relationship, as shown in Figure ??. In addition, in that figure we noted the number of complaints where the respondent to the complaint would be the ULB, noting that exposure there would have neither cultural nor financial implications directly.

5.4.3 Grievance and Inference

For the PGR module, the types of inferences that can be made from the loss of confidentiality can be determined through an understanding of the actors and their roles. For a public grievance, there is a complainer, and the entity that must take action to fix the complaint. The complainer can either be a citizen or a business, and the entity who must take action to fix the problem can be another citizen, business, or the municipality. For example, a citizen may submit a complaint about the non-functioning of street lights. In that case, the municipal administration must take action. However, if the complaint is about the illegal slaughtering of animals, then the complainer could have been a citizen who may have been affected by the business or a legal competing business. The entity that must fix the issue is the illegal business.

From analyzing the two actors for the 110 types of public grievances published by the government at our research site, we found that we can assign certain types of implications depending on who the two actors are. We notice that a citizen-on-citizen complaint has cultural implications, a citizen-or-business-on-business complaint has financial implications, and a business-on-citizen complaint has cultural and financial implications. As shown in Figure 12, most complaints must be fixed by the ULBs, likely regarding public infrastructure and health and sanitation issues. The majority of complaints that have implications come from complaining about a business, resulting in financial implications. If a business is affected by a complaint and the entity who has complained is exposed, the complainer may endure financial consequences due to bias against them from the business. In contrast, complaints against citizens tend to result in cultural implications, as the person against whom the complaint has been lodged may develop political, social, or other types of cultural biases against the complaining entity.
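The assignment of implications to actor pairs, and the tabulation of complaint types by pair, can be sketched as follows. The rules follow those stated in Section 5.4.2; the four sample complaints are drawn from examples mentioned in the text and are not the 110 grievance types we analyzed.

```python
from collections import Counter

COMPLAINERS = {"citizen", "business"}
RESPONDENTS = {"citizen", "business", "ulb"}


def implications(complainer, respondent):
    """Map a (complainer, respondent) pair to the implications of disclosure."""
    if complainer not in COMPLAINERS or respondent not in RESPONDENTS:
        raise ValueError("unknown actor type")
    if respondent == "ulb":
        return frozenset()  # neither cultural nor financial, directly
    if respondent == "business":
        return frozenset({"financial"})
    if complainer == "citizen":
        return frozenset({"cultural"})  # citizen on citizen
    return frozenset({"cultural", "financial"})  # business on citizen


# Illustrative sample of (grievance type, complainer, respondent).
COMPLAINTS = [
    ("non-functioning street lights", "citizen", "ulb"),
    ("illegal slaughtering of animals", "citizen", "business"),
    ("illegal slaughtering of animals", "business", "business"),
    ("noise at night", "citizen", "citizen"),
]


def tally(complaints):
    """Count complaints by the set of implications their actor pair carries."""
    return Counter(implications(c, r) for _, c, r in complaints)
```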

5.5 Risks from combining data

As background, we also reflect here on the inherent risks not only from individual data items, but also from combining data in various ways. To do that, we first consider the risks to privacy that derive from combining data within a database in order to expose a richer picture of the individual and then consider briefly the risks associated with combining data from different sources. We consider these in light of two important documents: the collection of papers edited by Lane et al. [20] and the US National Institute of Standards and Technology’s report on guidance for protecting the confidentiality of personally identifiable information [21]. The first includes a number of papers on the issues of privacy in the context of Big Data across the board, and the second, by focusing on a framework, processes and procedures for improving the protection of confidentiality, highlights a number of the concerns we raise here.

5.5.1 Risks from data within a single database

In order to make our discussion of privacy risks more concrete, we begin by focusing on the public grievance database, the data collected there, and the risks associated with that data. In just this one database, the data can be divided into three categories: data about the person submitting the grievance, data about the location of the subject of the grievance, and data about the actual subject of the grievance. The record for the submitter of a grievance includes several pieces of personal information: name, mobile phone number, and email. The person’s name itself carries significant information about the individual beyond being just a name for the person, including language, religion, and caste. Thus simply including a name indicates significant cultural information about the person. Email and mobile phone numbers are used across many databases both inside these ULB operations and in other non-governmental settings, such as help desks, blogs, and so forth. In addition to the exposure of personal and possibly private cultural information, the combination of these three pieces of information can lead to creating a significantly richer picture of the individual. Furthermore, integrity violations of this data can lead to incorrect, and therefore perhaps additionally risky, assignment of attributes to the individual. For example, if an email address is corrupted into one that was used by someone else for posting on a blog, the original person may be incorrectly inferred to have made the posts.

When one considers location information such as locality, address and zone/ward/block, the subject may be at risk for other sorts of inferences that may be privacy violations. First, location information can lead to inferences about political perspectives by knowing the locale in which the person presumably votes, other personal cultural assumptions based on the location, and financial assumptions because people often live near other people of similar financial status. Second, inclusion of location information with a name can further identify the individual. Names may not be unique, but names in locations become increasingly unique. Thus, location information both alone and in conjunction with name can lead to yet further privacy violations. Finally, if a person submits a grievance about, for example, a street light being out at a particular location, that is a clear indication that that person was at that location after dark, leading perhaps to exposure of private information if the data is made public. In addition, integrity violations, as mentioned above, can lead to incorrect and perhaps unwanted or even dangerous inferences about the individual.

Loss of confidentiality or integrity in the data about the subject of a grievance poses yet further risks, in this case to the subject of a grievance. Consider a grievance about sewage overflow at a location. This may lead to inferences about the types of activities happening inside the location, water usage, with possible significant implications in areas of water shortage, and so forth. Such information could lead to denial of water access, leading to both functional and financial implications based on either loss of confidentiality or integrity of the data.

5.5.2 Cross-database implications: Implications of privacy in context

We must also briefly touch on some of the risks posed by the kind of data ULBs are collecting across the databases they support. First, consider that records of births, deaths and marriages must include names (for births, the parents’ names) and locations, and that this information is in general public. This raises the issue of combining things like name and address from these records with various fields from the grievances or property modules (property tax, water connection, water tax, etc.) to build a richer profile of an individual than any one database provides. Some of this information must be public (birth, death, and marriage records) and some is needed only to achieve the function but need not be public (the identity of a grievance submitter). Furthermore, one must consider combining this data with other databases such as voter rolls, bank information, and presence and activity on social networks. And the list goes on.
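To make the cross-database risk concrete, the sketch below joins two record sets on name and locality to produce a richer profile than either set provides alone. It is a minimal illustration under stated assumptions: the field names, the sample records, and the exact-match join on (name, locality) are all hypothetical and do not describe any actual ULB schema.

```python
from collections import defaultdict


def link_records(grievances, property_records):
    """Join two record sets on (name, locality), ignoring case and padding.

    Each matching pair is merged into one combined profile.
    """
    index = defaultdict(list)
    for rec in property_records:
        key = (rec["name"].strip().lower(), rec["locality"].strip().lower())
        index[key].append(rec)

    profiles = []
    for g in grievances:
        key = (g["name"].strip().lower(), g["locality"].strip().lower())
        for rec in index.get(key, []):
            # Fields from both sources end up in a single record.
            profiles.append({**rec, **g})
    return profiles


# Hypothetical sample data, for illustration only.
grievances = [
    {"name": "A. Example", "locality": "Ward 7",
     "grievance_type": "Street light not working"},
]
property_records = [
    {"name": "a. example", "locality": "ward 7",
     "door_no": "12", "property_type": "Residential"},
    {"name": "B. Sample", "locality": "Ward 7",
     "door_no": "40", "property_type": "Commercial"},
]
profiles = link_records(grievances, property_records)
```

Even this naive join attaches a door number and property type to the person who filed the grievance, which is the kind of enrichment the paragraph above warns about.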

What we note here is that governments collecting data about their citizens, whether as sources of information or as subjects of those submissions, are at risk with respect to the data from both loss of confidentiality and loss of integrity in four key dimensions: cultural, functional, financial, and privacy itself, especially as inference techniques become increasingly sophisticated.

6 Synthesis

We now synthesize the analysis above into two indices that help measure the efficiency of governance and information privacy. Defining these indices helps in highlighting the tension that arises when trying to maximize both indices simultaneously. This tension then carves out a space where innovation can help achieve a high level of governance efficiency, information privacy, and transparency.

6.1 Governance Efficiency Index (GEI)

We define the Governance Efficiency Index (GEI) as follows:

Governance Efficiency Index (GEI) = Timeliness of Service * Accuracy of Service

GEI is constructed such that it ranges from 0 to 1, where a value of 1 denotes the highest level of governance efficiency.

The definition of Timeliness of Service rests upon when a service is considered timely. We consider a service timely when it is delivered on or before the desired Service Level Agreement (SLA). The ULBs in India publish an SLA for each service they offer, as promised by the Citizen’s Charter [4] (see note 4). The Timeliness of Service component is measured as follows: for a given service, it is measured by the fraction of times the service is delivered on or before the SLA over a given unit of time (e.g., hour, day, or month). For a given group (a division within a ULB, or the ULB as a whole), Timeliness of Service is measured by averaging the timeliness of the services delivered by the group over a given unit of time.

The definition of Accuracy of Service rests upon when a service is considered accurate. We consider a service accurate when the right service is delivered to the right person without any rework. The Accuracy of Service component is measured as follows: for a given service, it is measured by the fraction of times the service is delivered without rework over a given unit of time (e.g., hour, day, or month). For a given group (a division within a ULB, or the ULB as a whole), Accuracy of Service is measured by averaging the accuracy of the services delivered by the group over a given unit of time.

GEI is calculated empirically and in real time using the ULB-level performance data.
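The GEI definition above can be sketched in a few lines of code. This is a minimal illustration, not the eGovernments implementation: the record layout (`days_taken`, `sla_days`, `reworked`) and the sample values are assumptions made for the example, and the unit of time is taken to be whatever period the request list covers.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ServiceRequest:
    service: str       # which service was requested
    days_taken: float  # time taken to deliver it
    sla_days: float    # published SLA for that service
    reworked: bool     # True if the delivery needed rework


def timeliness(requests):
    """Fraction of requests delivered on or before the SLA."""
    return sum(r.days_taken <= r.sla_days for r in requests) / len(requests)


def accuracy(requests):
    """Fraction of requests delivered without rework."""
    return sum(not r.reworked for r in requests) / len(requests)


def gei_for_service(requests):
    """GEI for one service: timeliness * accuracy."""
    return timeliness(requests) * accuracy(requests)


def gei_for_group(requests):
    """GEI for a group: average each component over services, then multiply."""
    by_service = {}
    for r in requests:
        by_service.setdefault(r.service, []).append(r)
    avg_timeliness = mean(timeliness(rs) for rs in by_service.values())
    avg_accuracy = mean(accuracy(rs) for rs in by_service.values())
    return avg_timeliness * avg_accuracy


# Hypothetical requests for one reporting period.
water = [
    ServiceRequest("water", 2, 3, False),
    ServiceRequest("water", 3, 3, False),
    ServiceRequest("water", 5, 3, True),
    ServiceRequest("water", 1, 3, False),
]
grievance = [
    ServiceRequest("grievance", 1, 2, False),
    ServiceRequest("grievance", 4, 2, False),
]
water_gei = gei_for_service(water)
group_gei = gei_for_group(water + grievance)
```

Averaging the two components separately before multiplying follows the group-level wording above; averaging per-service GEI values instead would be a different, equally plausible reading.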

6.2 Information Privacy Index (IPI)

We define the Information Privacy Index (IPI) as follows:

Information Privacy Index (IPI) = Right Collection * Right Use * Right Disclosure


IPI is constructed such that it ranges from 0 to 1, where a value of 1 denotes the highest level of information privacy.

We define Right Collection as collection of only those data fields that are necessary for delivering the service. In other words, without collecting these data fields, the requested service cannot be delivered. Right Collection is measured for a given service or for services offered by a given group as Necessary Data Fields / Total Data Fields Collected.

We define Right Use as granting access to data fields only to those (in the ULB) who need it for delivering the service. Right Use is measured for a given service or for services offered by a given group as Number of Data Fields to Which Access Is Necessary / Number of Data Fields to Which Access Is Granted.

We define Right Disclosure as public disclosure of data fields in a way that protects personal identity and prevents undesirable inference. Right Disclosure is measured for a given service or for services offered by a given group as (1 - (Number of Data Fields with PII or Undesirable Inference Disclosed / Total Number of Fields with PII or Undesirable Inference)).

IPI is determined based on the analysis of data collection, use, and disclosure policies of ULBs. The real-time value of IPI will rest upon the frequency and types of service requests a ULB serves.
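The three IPI components can be computed directly from sets of field names. The sketch below is illustrative only: the field names echo the public grievance form, but which fields count as necessary, accessible, sensitive, or disclosed is assumed here purely for the example, as is the convention of returning 1.0 when a denominator would be empty.

```python
def right_collection(necessary, collected):
    """Necessary data fields / total data fields collected."""
    if not collected:
        return 1.0  # assumption: nothing collected means nothing over-collected
    return len(necessary & collected) / len(collected)


def right_use(access_needed, access_granted):
    """Fields to which access is necessary / fields to which access is granted."""
    if not access_granted:
        return 1.0  # assumption: no access granted means no excess access
    return len(access_needed & access_granted) / len(access_granted)


def right_disclosure(sensitive, disclosed):
    """1 - (sensitive fields disclosed / total sensitive fields)."""
    if not sensitive:
        return 1.0  # assumption: no sensitive fields means nothing to expose
    return 1.0 - len(sensitive & disclosed) / len(sensitive)


def ipi(necessary, collected, access_needed, access_granted, sensitive, disclosed):
    """IPI = Right Collection * Right Use * Right Disclosure."""
    return (right_collection(necessary, collected)
            * right_use(access_needed, access_granted)
            * right_disclosure(sensitive, disclosed))


# Hypothetical classification of public grievance fields.
collected = {"name", "mobile_no", "email", "address", "grievance_type",
             "grievance_details", "grievance_location", "landmark"}
necessary = {"mobile_no", "grievance_type", "grievance_details",
             "grievance_location"}
access_granted = set(collected)   # every field is visible to the official
access_needed = set(necessary)    # only the necessary fields are needed
sensitive = {"name", "mobile_no", "email", "address"}
disclosed = {"name", "grievance_type", "grievance_location"}

example_ipi = ipi(necessary, collected, access_needed, access_granted,
                  sensitive, disclosed)
```

With these assumed sets, half the collected fields are necessary, half the granted access is needed, and one of four sensitive fields is disclosed, so the product is well below 1 even though no single component is zero.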

6.3 Trade-offs in maximizing GEI and IPI and a Space for Innovation

Setting up the two indices and understanding how data collection, use, and disclosure influence them reveals areas where trade-offs exist when maximizing the two indices.

6.3.1 Data Collection Trade-offs and Need for Innovation

In general, the less data is collected, used, and disclosed, the better the privacy protection. Less data, however, does not always improve governance efficiency, because the data collected for each service falls into three categories to a greater or lesser extent: Data Necessary for Offering Service, X_n; Data Collected for Efficiency or Accuracy Purposes, X_ea; and Data Unnecessary for Offering Service, X_u, which may be collected for legacy reasons. If we could collect just X_n, it would maximize both GEI and IPI; however, at present, service provisioning occasionally requires using X_ea. The need for using X_ea is arguably different in different sizes of ULBs (e.g., a larger municipal corporation vs. smaller Nagar Palikas) because their environments are at different levels of maturity in terms of adoption of communications technologies and digitization of identity and assets. Here then is one space for innovation, in answering the question: can we innovate to offer the requested service in a timely and accurate manner without collecting X_ea and X_u?

6.3.2 Data Disclosure Trade-offs and Need for Innovation

As discussed above, disclosing data could lead to generating various forms of inference. One may argue that any inference about individuals and their actions is potentially “bad” from the perspective of privacy protection. Ironically, however, being able to generate inference about underserved needs is critical to innovating and serving them. So, not all inference is “bad”.

From this perspective, we must distinguish between inference that is desirable and inference that is undesirable. At the outset, an undesirable inference may arise when someone can be socially, religiously, or in other ways discriminated against such that their rights are either denied or deferred. Ultimately, it is the legal framework that is to protect against the use of data for unlawful purposes; however, simply having easy access to undesirable inference increases the ability to accelerate unlawful behavior, especially the ease and precision with which it can be done.

One form of desirable inference arises when unmet or underserved needs of a society or segments of a society can be understood without revealing individual identities. Here then is another space for innovation. To realize this innovation, one must answer the following question: How does one disclose citizen data so that desirable inferences about unmet needs can be drawn, and those needs met through innovations by non-state actors?
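One simple way to picture disclosure for desirable inference is to publish only counts of grievances per ward and type, dropping every field about the submitter. The sketch below is a hedged illustration with hypothetical field names and records. It is not a privacy guarantee: counts over very small groups can still point to individuals, which is part of the open question posed above.

```python
from collections import Counter


def unmet_need_counts(grievances):
    """Count grievances per (ward, grievance_type).

    Only the two grouping fields are read; name, phone number, and any
    other submitter field never reach the output.
    """
    return Counter((g["ward"], g["grievance_type"]) for g in grievances)


# Hypothetical records, for illustration only.
grievances = [
    {"name": "A. Example", "mobile_no": "0000000000",
     "ward": "Ward 7", "grievance_type": "Street light"},
    {"name": "B. Sample", "mobile_no": "1111111111",
     "ward": "Ward 7", "grievance_type": "Street light"},
    {"name": "C. Person", "mobile_no": "2222222222",
     "ward": "Ward 3", "grievance_type": "Sewage overflow"},
]
counts = unmet_need_counts(grievances)
```

A non-state actor looking at `counts` could see that street lighting is a recurring need in one ward without learning who reported it.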

6.4 Transparency of Governance and Its Implications for Efficiency and Privacy

The analyses of the two indices, and the concern for transparency of governance that emerged during our field interviews, indicate that Transparency of Governance, when conceived as separate from data disclosure, is not affected by the pursuit of higher efficiency and privacy. Transparency of Governance may be required at three levels:

1. Transparency to the one requesting service.

2. Transparency to gauge the government’s performance.

3. Transparency from the perspective of Use Limitation, i.e., whether the data is being used for what it was collected for.

As such, for the person requesting service, transparency of governance should be at its maximum, meaning that they should be able to access the status of the service requested and know how much time is left for service completion and whom to contact in case of any questions. Such controlled access does not hamper efficiency or privacy.

Next, for gauging a ULB’s performance, it is possible to disclose data that will show how efficient the government is: the number of services fulfilled, the fraction of services fulfilled within SLA, etc. Doing so does not require disclosing any personal information or information that leads to undesirable inference.

The final element of transparency, about actual data use, seems difficult to deliver without incorporating the necessary engineering in the digital platform to track data use and to set and enforce policies.

While we believe level 2 of transparency, for gauging ULB performance, is easier to achieve, further research is required to understand how to avoid unintended disclosure of data when offering transparency at levels 2 and 3.

7 Conclusions, Limitations and Next Steps

7.1 Conclusions

In this paper, we examine the question of how cities (a state actor) can use citizen data to maximize governance while protecting the citizens’ fundamental right to privacy. We examine this question in the context of three cities in India. Our analysis comes at a critical juncture, as privacy was declared a fundamental right of every Indian by the Supreme Court of India in August 2017, and a high-level committee of the Central Government led by Retd. Justice Srikrishna is in the process of seeking public comments on the proposed data protection bill. Based on field research performed at the eGovernments Foundation and three of their client cities, we have analyzed the present data flow, use, and disclosure in these cities to define two indices that help us answer the above question: the Governance Efficiency Index (GEI) and the Information Privacy Index (IPI).

With the help of metadata analysis, we demonstrate that both efficiency and privacy are measurable concepts in the context of urban governance. Our analysis shows how exactly to measure them. More importantly, we argue that there does exist a tension when trying to maximize both governance efficiency and privacy simultaneously, and we identify the regions of data where these tensions are manifest, making this observation more than just a philosophical argument. The way to reduce these tensions is to promote innovations that could improve governance services and privacy simultaneously. Our analysis identifies issues with data collection and data disclosure, where we must experiment further in promoting innovation by non-state actors.

7.2 Limitations

There are several areas where this work requires further refinement. First, the recognition that inference generated from data can be either undesirable or desirable, and that this distinction bears on innovation, requires refinement. In particular, we need to define undesirable and desirable inference more rigorously and then reexamine where such inferences are generated.

Second, the current definition of GEI incorporates timeliness and accuracy, but does not say anything about the cost at which they are achieved. This is something we need to examine.

Third, the idea of Operational and Administrative Uses of data may need further classification because both operations and administration have multiple different purposes. We need to examine whether adding this additional granularity teaches us any new lessons.

We will address these limitations as our immediate next steps.

7.3 Next Steps

We believe the indices defined in this paper may be generalizable beyond Indian cities and also beyond any particular product for digital governance such as the eGovernments product suite. The basis for this hypothesis is the belief that the components of both indices rest upon what is universally acceptable in the realms of urban governance (timely and accurate service delivery) and privacy protection (right collection, use, and disclosure of data). The measurement of these indices, however, may experience different levels of difficulty in different cities depending upon how mature their e-governance paradigm is. Anecdotally, we believe the three Indian cities we studied may be ahead on e-governance as compared to most other cities of the world, including many in the developed world.

Our next step is to measure these indices with real data, and at different levels of aggregation: for individual services, for sub-divisions and divisions of ULBs, for ULBs, and for the whole state. We expect that this analysis will be revealing in new ways. Computing these indices at multiple levels would bring in both the time and space dimensions. For example, we may find that there is seasonality to how efficient and privacy sensitive the service requests are in, say, summer vs. monsoon. Similarly, one can analyze why a particular service is more efficient in one ULB and not another, or how the IPI is impacted in one ULB because a privacy-sensitive service is requested more often. Along similar lines, it could reveal how a particular division (e.g., Engineering) may have more efficiency- and privacy-sensitive requests as compared to other divisions. At the ULB level, one might also see how two ULBs that have the same policy for collection, use, and disclosure of data, and hence the same IPI if all else were equal, may differ in IPI because of different frequencies of service requests given their locations.
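The frequency effect described above can be sketched by weighting each service’s IPI by how often the service is requested. The request-weighted average used here is our own assumption for illustration; the paper does not fix an aggregation rule, and the per-service IPI values and request counts are hypothetical.

```python
def ulb_ipi(service_ipi, request_counts):
    """Request-weighted average of per-service IPI values.

    service_ipi maps service name -> IPI under the ULB's policy;
    request_counts maps service name -> number of requests in the period.
    """
    total = sum(request_counts.values())
    if total == 0:
        raise ValueError("no service requests in the period")
    weighted = sum(service_ipi[s] * n for s, n in request_counts.items())
    return weighted / total


# Two hypothetical ULBs with the same policy, hence the same per-service IPI.
service_ipi = {"water": 0.75, "grievance": 0.25}

# The mix of requests differs between the two locations.
ulb_a_requests = {"water": 90, "grievance": 10}
ulb_b_requests = {"water": 10, "grievance": 90}

ipi_a = ulb_ipi(service_ipi, ulb_a_requests)
ipi_b = ulb_ipi(service_ipi, ulb_b_requests)
```

Under this weighting, the ULB that mostly receives requests for the less privacy-protective service ends up with the lower aggregate IPI, even though both ULBs follow identical policies.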

Next, we hypothesize that the goal of providing transparent governance, construed in a specific way, need not be compromised by the pursuit of higher governance efficiency and privacy. One way to examine this hypothesis is to construct a third index, a Governance Transparency Index (GTI), and systematically study how to define it and what the tensions are between transparency, efficiency, and privacy. Finally, the most exciting next step is to take on the study of how to innovate to simultaneously keep efficiency and privacy high. This paper identifies the contours of such a study, which we will be undertaking next.

Acknowledgements

This work would not have been possible without the work and contributions of Gautham Ravichander and his colleagues at the eGovernments Foundation. In addition, we wish to acknowledge research funding as well as many probing questions from the MIT Internet Policy Research Initiative.

References

[1] 2011 Census, Office of the Registrar General and Census Commissioner, Ministry of Home Affairs, India. Available at http://www.censusindia.gov.in/2011census/.

[2] Atal Mission for Rejuvenation and Urban Transformation (AMRUT), Ministry of Housing and Urban Affairs, Government of India. Available at http://mohua.gov.in/cms/amrut.php.

[3] CIPP Guide: Comparing the Co-Regulatory Model, Comprehensive Laws and the Sectoral Approach. Available at by Privacy Commissioner of Canada at https://www.cippguide.org/2010/06/01/comparing-the-co-regulatory-model-comprehensive-laws-and-the-sectoral-approach/.

[4] Citizen’s Charter in Government of India, Dept. of Administrative Reforms and Public Grievances, Ministry of Personnel, Public Grievances and Pensions, Government of India. Available at https://goicharters.nic.in.

[5] eGovernments Foundation. Accessed at https://www.egovernments.org on August 12, 2018.

[6] Jawaharlal Nehru National Urban Renewal Mission, Ministry of Housing and Urban Affairs, Government of India. Available at http://mohua.gov.in/cms/jawaharlal-nehru-national-urban-renewal-mission.php.

[7] Mandate, Ministry of Housing and Urban Affairs, Government of India. Available at http://mohua.gov.in/cms/mandate.php.

[8] NationMaster: India vs United States Education Stats Compared. Available at http://www.nationmaster.com/country-info/compare/India/United-States/Education.

[9] Scheme for satellite towns around seven megacities, Ministry of Housing and Urban Affairs, Government of India. Available at http://mohua.gov.in/cms/scheme-for-satellite-towns-around-seven-megacities.php.

[10] Smart Cities, Ministry of Housing and Urban Affairs, Government of India. Available at http://mohua.gov.in/cms/smart-cities.php.

[11] Statista: Statistics and market data on telecommunications. Available at https://www.statista.com/markets/418/topic/481/telecommunications/. Accessed Aug. 14, 2018.

[12] Smart cities mission statement and guidelines, June 2015. Available at: http://smartcities.gov.in/upload/uploadfiles/files/SmartCityGuidelines(1).pdf.

[13] Writ petition (civil) no 494 of 2012. Supreme Court of India, August 2017. Available at https://www.thehindu.com/news/national/article19551816.ece/BINARY/RightToPrivacyVerdict.

[14] ACQUISTI, A., TAYLOR, C. R., AND WAGMAN, L. The economics of privacy. Journal of Economic Literature 52, 1 (2016). Sloan Foundation Economics Research Paper No. 2580411. Available at SSRN: https://ssrn.com/abstract=2580411 or http://dx.doi.org/10.2139/ssrn.2580411. 2.

[15] BAROCAS, S., AND NISSENBAUM, H. Big Data’s End Run around Anonymity and Consent. In Privacy, Big Data, and the Public Good, J. Lane, V. Stodden, S. Bender, and H. Nissenbaum, Eds.

[16] CURTMOLA, R., GARAY, J., KAMARA, S., AND OSTROVSKY, R. Searchable symmetric encryption: Improved definitions and efficient constructions. Journal of Computer Security 19, 5 (Nov. 2011), 895–934.

[17] DATTA, A., FREDRIKSON, M., KO, G., MARDZIEL, P., AND SEN, S. Use privacy in data-driven systems: Theory and experiments with machine learnt programs. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (New York, NY, USA, 2017), CCS ’17, ACM, pp. 1193–1210. Available at http://doi.acm.org/10.1145/3133956.3134097.

[18] DWORK, C. Differential privacy. In Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II (Berlin, Heidelberg, 2006), ICALP’06, Springer-Verlag, pp. 1–12.

[19] HIRSCH, D. D. The law and policy of online privacy: Regulation, self-regulation, or co-regulation? Seattle University Law Review 439, 440-441 (2011).

[20] LANE, J., STODDEN, V., BENDER, S., AND NISSENBAUM, H., Eds. Privacy, Big Data, and the Public Good. Cambridge University Press, New York, NY, 2014.

[21] MCCALLISTER, E., GRANCE, T., AND SCARFONE, K. Guide to protecting the confidentiality of personally identifiable information (PII). Special Publication 800-122, National Institute of Standards and Technology, Computer Security Division, Information Technology Laboratory, NIST, Gaithersburg, MD 20899-8930, April 2010. Available at https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-122.pdf.

[22] NILEKANI, N., AND SHAH, V. Rebooting India: Realizing a billion aspirations. Allen Lane, 2015.

[23] NISSENBAUM, H. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford University Press, 2009.

[24] SINGH, S. Evaluation of world’s largest social welfare scheme: An assessment using non-parametric approach. Evaluation and Program Planning 57 (August 2016), 16–29.

[25] SRIKRISHNA, B. N., SUNDARARAJAN, A., PANDEY, A. B., KUMAR, A., MOONA, R., RAI, G., KRISHNAN, R., SENGUPTA, A., AND VEDASHREE, R. White paper of the committee of experts on a data protection framework for India, Nov 2017. Available at http://meity.gov.in/writereaddata/files/white paper on data protection in india 18122017 final v2.1.pdf.


[26] STRANDBURG, K. J. Monitoring, Datafication, and Consent: Legal Approaches to Privacy in the Big Data Context. In Privacy, Big Data, and the Public Good, J. Lane, V. Stodden, S. Bender, and H. Nissenbaum, Eds.

[27] SWEENEY, L. Achieving k-anonymity privacy protection using generalization and suppression. International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 571–588.

[28] SWEENEY, L. k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 557–570.

Notes

1 The book of Nilekani and Shah [22] is one of many references on this work, but provides a solid, deeply knowledgeable and introspective review of these developments.

2 The nine privacy principles identified by the Supreme Court are: Notice, Choice and Consent, Collection Limitation, Purpose Limitation, Access and Correction, Disclosure of Information, Security, Openness, and Accountability.

3 Figure 4 contains the complete list of fields represented vertically in Figure 3.

4 Not all ULBs provide such SLAs.


Appendix: Figures

Figure 1: Public Grievance Data Usage
[Chart. Percentage axis: 0% to 100%. Fields: Name*, Mobile No*, Email, Address, Select from top grievance type, Grievance Type*, Grievance Details*, Grievance Location*, Landmark (if any). Legend: % Collection, % Operational Use, % Administrative Use, % Unused.]

Figure 2: Public Grievance Data Access
[Chart. Percentage axis: 0% to 100%. Roles: Citizen (Call/Website/Mobile App); Help Desk Officer (ULB Official); GO; ULB Official First Level Escalation. Legend: % Collection, % Operational Access, % Administrative Access, % Unused.]


Figure 3: Property Tax Data Usage: Complete Y-axis labels in Figure 4
[Chart. Percentage axis: 0% to 100%. One bar per property tax data field, from Category of Ownership* through Remarks. Legend: % Collection, % Operational Use, % Administrative Use, % Unused.]


Figure 4: Property Tax Data Categories and Fields: The Y-axis labels for Figure 3
[Table of Property Tax fields grouped by category. Categories: Owner Detail; Floor Details; Property Address; Assessment Details; Vacant Land Details; Details of Surrounding Boundaries of the Property; Documents; Document Enclosed Details; Forward to; Amenities. Fields: Remarks; Employee; Designation; Department; I hereby declare I have checked all application details and documents uploaded; Photo of Property with Holder; Registration Document; Decree Document; Will Deed; MRO Proceedings; Patta Certificate; Copy of Death Certificate / Succession Certificate / Legal Heir Certificate; Photo of Assessment*; Notarized Affidavit Cum Indemnity Bond on Rs. 100 Stamp Paper; Two Non-Judicial Stamp Paper of Rs. 10; Attested Copy of Property Document; Building Permission Copy; Testator and Two Witnesses Signed; MRO Proceeding Date*; Date*; Court Name*; MRO Proceeding Number*; No*; Document Type*; South*; West*; East*; North*; Layout Permit Date*; Layout Permit Number*; Layout Approval Authority*; Vacant Land Plot Area*; Effective Date*; Market Value (As per Registered Document)*; Vacant Land Area (in Sq. Mtrs.)*; Patta Number*; Survey Number*; Length*; Unstructured Land*; Effective from Date*; Construction Date*; Occupant Name; Occupancy*; Firm Name*; Number of Usage*; Classification of Usage*; Floor No*; Construction Type; Wood Type; Roof Type; Wall Type; Floor Type; Cable Connection; Water Harvesting; Water Tap; Attached Bathrooms; Toilets; Electricity; Lift; Occupancy Certificate Date; Extent of Site (Sq. Mtrs.)*; Occupation Certificate No; Reason for Creation*; Is correspondence address different from property address?; Pin Code*; Door No; Street; Ward No*; Enumeration Block; Election Ward*; Block No*; Zone No*; Locality*; Guardian*; Guardian Relation*; Email Address; Gender*; Owner Name*; Mobile No*; Aadhar No; Property Type*; Property Department; Apartment/Complex Name; Category of Ownership*.]


Figure 5: Property Tax Data Access
[Chart. Percentage axis: 0% to 120%. Roles: MeeSeva/CSC/Online; Junior Asst./Senior Asst.; Bill Collector; Revenue Inspector; Revenue Officer; Asst Comm, Zonal Comm, Additional Comm, Deputy Comm; Commissioner. Legend: % Collection, % Operational Access, % Administrative Access, % Unused.]


Figure 6: Water Data Usage
[Chart. Percentage axis: 0% to 100%. Fields: PT Assessment Number*, Name of Applicant, Mobile No, Email, Aadhar No, Locality, Address, Zone/Ward/Block, No of Floors, Property Tax, Connection Type*, Water Source Type*, Property Type*, Category*, Usage Type*, H.S.C Pipe Size (Inches)*, Pump Capacity (Litres), No. of Persons, P. Tax Receipt, Distribution Line Location Map, White Ration Card, 20 Rs Court Fee Stamp, Approver Department*, Approver Designation*, Approver*, Comments. Legend: % Collection, % Operational Use, % Administrative Use, % Unused.]


Figure 7: Water Data Access
[Chart. Percentage axis: 0% to 100%. Roles: MeeSeva/CSC/Online; Junior Asst./Senior Asst.; Assistant Engineer; Deputy Exe Engg, Exe Engg, Supertndt Engg; Commissioner. Legend: % Collection, % Operational Access, % Administrative Access, % Unused.]


Figure 8: Data fields sorted by utility for Water Charges, Property Tax, and Public Grievance modules
[Table. Columns: Water Charges, Property Tax, Public Grievances. Rows: Necessary; Collected for Efficiency/Accuracy; Unnecessary. Each cell lists the data fields of that module falling in that utility category.]


Figure 9: Overall Data Utility across Water, Property Tax and Public Grievances
[Chart. Percentage axis: 0% to 100%. Modules: Water Charges, Property Tax, Public Grievances. Legend: Necessary, For Efficiency/Accuracy, Unnecessary.]

Financial Implication/Inference

Complaints regarding restaurants/function halls [PHS]
Misuse of Community Hall [Town Planning]
Removal of shops in the footpath [Town Planning]
Complaints regarding Dispensary [PHS]
Unhygienic and improper transport of meat and livestock [PHS]
Unauthorized/Illegal Construction [Town Planning]
Unhygienic conditions because of Slaughter House [PHS]
Food adulteration: road side eateries [PHS]
Unauthorized advt. boards [Town Planning]
Complaints regarding all Sanctioned loans [UPA]
Complaints regarding Function halls [PHS]
Issues relation to advertisement Boards [Town Planning]

Figure 10: Examples of Public Grievance types for which loss of confidentiality leads to financial loss


Financial and Cultural Implication/Inference

Obstruction of water flow [Engineering]
Burning of garbage [PHS]
Misuse of Community Hall [Town Planning]
Illegal Slaughtering [PHS]
Issues relating to vacant lands [PHS]
Unauthorized sale of meat and meat product [PHS]
Illegal draining of sewage to SWD/Open site [PHS]
Violation of DCR/Building by-laws [Town Planning]
Unauthorized tree Cutting [PHS]
Encroachment on the public property [Town Planning]

Figure 11: Examples of Public Grievance types for which loss of confidentiality leads to financial and cultural loss

Type of relationship                          No. of grievance types in this category
Citizen --> Citizen (cultural)                8
Citizen --> Business (financial)              20
Business --> Citizen (cultural + financial)   5
Business --> Business (financial)             15
Only ULB Addressable                          75

Figure 12: Loss of confidentiality can be to either the complainer or the subject; the relationship is related to whether the loss is financial or cultural.
