
Google is in the Process of Taking Over the World


Data Mining is becoming Extremely Powerful, but Dangerous

N. Kulathuramaiyer, H. Maurer

Abstract

Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contribution of this invisible technology often goes unnoticed.

This paper addresses the various forms of data mining shedding light on its expanding role in enriching our life. Emerging forms of data mining are able to perform multidimensional mining on a wide variety of heterogeneous data sources, to provide solutions to many problems.

This paper highlights the advantages and disadvantages that arise from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge, empowering us to perform our daily tasks more effectively and efficiently. As the mining scope and capacity increase, users and organisations are now more willing to compromise privacy as a trade-off for gaining peace of mind and additional comforts. The huge data stores of the master miners allow them to gain deep insights into individual lifestyles, social and behavioural patterns, and business and financial trends, resulting in a disproportionate distribution of power. Is it then possible to constrain the scope of mining while delivering the promise of a better life?

Introduction

As we become overwhelmed by the influx of data, Data Mining presents a refreshing window to deal with the onslaught. Data Mining thus holds the key to many unresolved mysteries and age-old problems, whereby the availability of data and the power to analyse it present new possibilities. This paper explores this important technology, shedding light on its tremendous power and potential.

According to [Han and Kamber, 2007], data mining is defined as the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases. We take a broad understanding of data mining, in which we also include other related machine-based discoveries such as deductive query processing and visual data mining. Databases include structured data (in relational databases), semi-structured data (e.g. metadata in XML documents) and unstructured documents such as text documents and multimedia content. Visual Data Mining refers to the discovery of patterns in large data sets by using visualization techniques.

As an example, data mining has been widely employed for the learning of consumer behaviour based on historical data of purchases made at retail outlets. Demographic data collected from loyalty cards is combined with behavioural patterns of buyers to enable retailers to design promotional programmes for specific customer segments. Similarly, credit card companies use data mining to discover deviations in the spending patterns of customers in order to combat fraud. Through this, these companies are able to guarantee the highest quality of service to their customers. Another form of mining has been employed in tracing possible terrorist attacks through the mining of traffic patterns of chatter. Chatter is an electronic signal that is detected on phone lines. [] highlights that a surge in chatter followed by a sudden silence was recorded just before the September 11 incident as well as before the Bali bombing and other similar incidents.
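The idea of discovering deviations in spending patterns can be illustrated with a minimal sketch. It assumes a simple z-score rule over one customer's purchase history; the function name `flag_deviations`, the threshold of 3 standard deviations and the sample amounts are all hypothetical and are not taken from any credit card company's actual method.

```python
from statistics import mean, stdev

def flag_deviations(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` standard
    deviations away from the mean of the historical amounts."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All historical amounts are identical: any different amount deviates.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]

# Hypothetical purchase history for one customer (currency units).
history = [42.0, 38.5, 51.0, 45.2, 40.3, 47.8]
suspicious = flag_deviations(history, [44.0, 950.0])
print(suspicious)
```

Real fraud detection combines many more signals (merchant, location, timing); the sketch only shows the notion of a deviation from a learned norm.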

Despite the success stories in areas such as customer relationship modelling, fraud detection and banking [KDD], the majority of applications tend to employ generic approaches and lack due integration with workflow systems. As such, Data Mining is currently at a chasm state and has yet to become widely adopted by the large majority [Han and Kamber, 2007].

The subsequent section gives a broad overview of data mining technology, to provide the basis for the ensuing discussions on its impact.

Data Mining Technology

Mining involves the extraction of patterns from a collection of data via the use of machine learning algorithms. Sophisticated mining technologies of today integrate multiple machine learning algorithms to perform one or more of the following functions:

a) construct an aggregated or personalised predictive model of the systems, events and individuals being studied, and support decision making by employing these models in a number of ways (extraction of classification patterns)

b) identify similarity/dissimilarity in terms of distributional patterns of data items and their relationships with associated entities (clustering)

c) uncover associational and behavioural patterns based on relationships drawn from transactional data (associational pattern mining)

d) determine trends, highlighting both the norm and deviations based on observable patterns (e.g. mathematical data modelling)

e) determine sequential patterns amongst a number of events or states of data objects to depict behavioural patterns (sequential pattern mining)
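As a concrete illustration of function (c), the following minimal sketch computes the support of item pairs in a set of transactions, which is the basic quantity behind associational pattern mining. The function name `pair_support` and the sample baskets are hypothetical; production systems use more scalable algorithms than this exhaustive count.

```python
from collections import Counter
from itertools import combinations

def pair_support(transactions):
    """Return the fraction of transactions containing each item pair."""
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        # Count each unordered pair once per transaction.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items()}

# Hypothetical retail baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
support = pair_support(transactions)
print(support[("bread", "milk")])
```

Pairs with high support are candidates for rules such as "customers who buy bread also buy milk".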

Having access to data thus becomes a powerful capability which can then be effectively harnessed by sophisticated mining software. The statement by O’Reilly [O’Reilly] that ‘Data is the Next Intel Inside’ illustrates its hidden potency. Data in the hands of credit card companies will allow them to profile customers according to lifestyles, spending patterns and brand loyalty. Political parties are now able to predict with reasonable accuracy how voters are likely to vote [Rash, 2006].

Data Mining Process


In order to describe the processes involved in performing data mining, we divide it into three phases: domain focussing, model construction (actual mining using machine learning algorithms), and decision making (applying the model to unseen instances). Jenssen 2002 refers to these phases as Data Gathering, Data Mining and Decision Making.

Domain focussing

A traditional data mining architecture [Usama, Fayyad 1996] divides the first phase into smaller steps such as pre-processing, selection, cleaning, and transforming the dataset into focussed relations. Well-scoped mining in a well-defined domain area can be characterised by this traditional model.

However, in a more complex data mining application [as in Mobasher, 2005], this phase (referred to as data preparation) may incorporate the use of domain knowledge and site structure in the discovery of patterns in unstructured data. In this case, the preliminary phase involves activities such as data cleaning, validation of pageviews and detection of session boundaries. In mining unstructured data such as Web logs, there is a need to effectively identify basic units of user events, such as pageviews, which may be vague. These pageviews then need to be grouped together to form sessions, which may also have grey boundaries.
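The grouping of pageviews into sessions can be sketched as follows. This is a minimal illustration assuming a fixed inactivity timeout as the session boundary, which is one common heuristic and not necessarily the method of the cited work; the function name `sessionize`, the 30-minute timeout and the log entries are hypothetical.

```python
def sessionize(pageviews, timeout=1800):
    """Group (user, timestamp_seconds, url) pageviews into per-user sessions.

    A new session starts whenever the gap since the user's previous
    pageview exceeds `timeout` seconds.
    """
    sessions = {}
    last_seen = {}
    for user, ts, url in sorted(pageviews, key=lambda p: (p[0], p[1])):
        if user not in sessions or ts - last_seen[user] > timeout:
            # First pageview of this user, or the gap is too long.
            sessions.setdefault(user, []).append([])
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return sessions

# Hypothetical Web log entries.
views = [
    ("u1", 0, "/home"),
    ("u1", 600, "/search"),
    ("u1", 4000, "/home"),
    ("u2", 100, "/cart"),
]
sessions = sessionize(views)
print(sessions)
```

The "grey boundaries" mentioned above show up here as sensitivity to the timeout: a different value would split or merge the same pageviews differently.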

We describe this phase as domain focussing because, in mining applications such as Web search and domestic security, this phase itself involves the application of some form of clustering or incorporates intensive knowledge engineering. For search log mining, a model characterising user search behaviour is aggregated. User behaviour patterns [as described in Colle, Srivastava] can be employed to structure search logs into intention-related transactions. For anti-terrorism or domestic security [Anderson], this phase involves associational subject link analysis requiring a deep domain analysis (with a great deal of manual effort).

Model Construction Phase

The subsequent phase involves the development of a predictive and/or descriptive model based on the application of machine learning algorithms. At this model construction phase, a model of generalised patterns is constructed to capture the intrinsic patterns stored in the data. For instance, we could have a model of the spending patterns of loyalty cardholders, or a descriptive model characterising SPAM messages.

This (mining) phase could [Mobasher, 2005] involve the derivation of aggregated usage profiles based on a multidimensional mining of usage patterns organised according to clustered characterisations. Their mining phase employs multiple machine learning schemes to perform transaction clustering, pageview clustering, associational pattern mining and sequential pattern mining in extracting aggregate usage profiles. In the mining described by [Mobasher, 2005], a number of sessions over a period of time are combined to characterise a user profile.

Clickstream data has been used to model global profiles of buyers, indicating details such as the intensity and urgency of the buyer in acquiring a product [Hofgesang & Kowalczyk, 2005]. Amazon makes use of clickstream data in this manner to profile users based on transactions of book purchases. Their session identification is simpler in that users maintain accounts and all purchase transactions are bounded by secure sessions. Apart from that, Amazon is also able to employ other metadata captured in user accounts and other contributions of users (editing, reviews) to characterise profiles.

We describe this phase as model construction to also highlight the data integration from multiple data sources that is performed in emerging applications. It has to be noted that database matching or integration is performed across all three phases.

As highlighted in [Mobasher], e-commerce applications integrate user data such as demographics, ratings and purchase histories with product attributes from operational databases to enable the discovery of important business intelligence metrics.

Decision Making Phase

The third phase involves the application of the generated model to perform decision making. This is an important phase where profiling and user modelling are applied to real-life situations. Simplistic applications of data mining tend to merely employ the model to predict the likelihood of events and occurrences, based largely on past patterns. Amazon, for example, is able to recommend books according to a user’s profile. Similarly, network operators are able to track fraudulent activities in the usage of phone lines by tracking patterns that deviate from a standard usage characterisation. User profiling in complex applications can be used as a basis for conviction and to make further discoveries.

The next section focuses on Web search as a complex form of knowledge discovery where some form of mining is performed in almost every stage within the three phases of mining discussed. Google, for example, employs spell checking and automatic suggestions (Google Suggest) at the data cleaning stage (incorporating clustering). Performing mining at this stage allows the filtering of queries and the caching of results to reduce the load on the ‘full-search’ miner.

Web Search as Data Mining

Web Mining can typically be divided into Web page content mining, Web structure mining and Web log mining (including search logs). Traditional search engines utilised web content only for building their index of the Web. Web structure has become important in current search engines, which use web structure patterns to determine the popularity of websites. Web log mining has already been addressed in the previous section. Leading search engines of today combine these three forms of mining to provide results that are better able to meet users’ needs.
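How web structure can indicate popularity may be sketched with a simplified PageRank-style iteration. The text does not name a specific algorithm, so the choice of a PageRank-like scheme is an assumption made for illustration; the function name `link_popularity`, the damping factor of 0.85 and the toy link graph are hypothetical.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Score pages by incoming links using a simplified PageRank-style
    power iteration. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Distribute this page's score evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages link to "c", which links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = link_popularity(links)
print(max(rank, key=rank.get))
```

The page that receives the most links ends up with the highest score, independently of its textual content.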

The web has emerged as a massive repository of information with billions of web pages, massive collections of multimedia documents, millions of digitised books, decades of financial documents, world news in almost all languages, massive collections of community-tagged multimedia objects, and the list goes on. Search engines have turned this repository into a massive data warehouse as well as a playground for automated discovery of hidden treasures. Web Search is thus viewed as an extensive form of multidimensional heterogeneous mining of largely unstructured data, uncovering an unlimited number of mind-boggling discoveries. The scale of data available is in the range of petabytes [Witten], and it is much greater than the terabytes of data in the hands of large global corporations such as Walmart.

As compared to the Data Mining process described in section [], Web search is a much more complex process. Figure 1 illustrates the scope and extent of mining performed by search engines.

Fig. 1 Extensive Mining of Heterogeneous Data sources

The strength of search engines stems from their absolute control over vast collections of data and the various analytical tools they have. These tools include text processing and mining systems, translators, content aggregators, data visualisation tools, data integration tools, data traffic analyzers, financial trend analyzers, context-aware systems, etc. The analytical tools provide alternative mining resources for enhancing the quality of discoveries.

The data at their disposal include, but are not limited to, web pages, email collections, discussion groups and forums, images, video, books, financial data, news, desktop content, scholarly papers, patents, geographical data, chronological data, community generated dynamic tagged content (video, music, writings), product offerings, local business data, shared documents, and user profiles. The expanse of data resources is effectively exploited in their ability to support users’ decision-making process, as well as in providing alternative channels for further investigation. Search engines can either simultaneously or incrementally mine these datasets to provide a variety of search results which include phone contacts, street addresses, news feeds, dynamic web content, images, video, audio, speech, books, artefacts, etc.

By analyzing search history over a period of time, search engines gain a great deal of insight into the lives of presumably ‘anonymous’ searchers. A search query indicates the intent of a user to acquire particular information to accomplish a task that relates to some aspect of his or her lifestyle. This ability to capture intent opens up a great many possibilities for search engines. The sensitive nature of this data is described in section []. Global patterns in search query logs also provide insights into the usefulness of particular keywords for an inquiry.

Search traffic patterns are another data source that can be applied to highlight relationships between search terms and events. For instance, the number of searches for “Christmas presents” peaks in the early part of December [Heather Hopkins, 2007]. Search traffic data analysis has also been shown to reveal social and market patterns such as unemployment and property market trends (see Trancer, 2007). Apart from that, the intentions of global users can be modelled by the terms employed in search. Sudden bursts in search term frequency have been observed from users seeking quick answers to questions posed in reality shows such as “Who Wants to Be a Millionaire” [Witten]. An emerging paradigm, mashups (see Kulathuramaiyer, Maurer, 2007), together with mobile web services, allows the discovery of localised contextual profiles.

Targeted advertisements based on keyword bidding are currently employed by search engines. In the near future, complex mining capabilities will provide personalised, context-specific advertisements [Lenssen]. It would be possible via RFID technology for a user passing by an intelligent billboard [Google smart billboards] to encounter a highly personalized message such as ‘Nara, you have not purchased your airline ticket yet, you have only 2 weeks for your intended flight. I know of a discount you can’t refuse.’ This level of user profiling could be achieved merely by utilizing shopping cart analysis, together with cookies and calendar entries. Figure 2 illustrates the layered mining that could be employed to facilitate such a discovery. This is described by [Kulathuramaiyer and Balke] as connecting the dots, to illustrate the ability to extract and harness knowledge from massive databases at an unprecedented level.

Figure 2: Connected Mining based on Database Matching

The next section describes emerging forms of complex data mining, which would need to combine many of the above mining functions and more.

Applications of Data Mining

Environmental Modelling applications

There are complex problems for which data mining could be used to provide answers by uncovering patterns hidden beneath layers of data. In many cases, domain focussing has in the past been the biggest challenge. The layered mining of heterogeneous data as described in the previous section presents new possibilities towards the unearthing of deep-rooted mysteries. As an example, data mining could be employed for the modelling of environmental conditions in the development of an early warning system to address a wide range of natural disasters such as avalanches, landslides and tsunamis, and other environmental events such as global warming. The main challenge in addressing such a problem lies in the lack of understanding of the structural patterns characterising various parameters, which may currently not be known.

As highlighted by [Maurer, et al], although a large variety of computer based methods have been used for the prediction of natural disasters, the ideal instrument for forecasting has not been found yet. As highlighted in their paper, there are also situations whereby novel techniques have been employed but only to a narrow domain of limited circumstances.

Integration of multiple databases and the compilation of new sources of data are required in the development of full-scale environmental systems. As advances in technology allow the construction of massive databases through the availability of new sources of data such as multimedia data and other forms of sensory data, data mining could well provide a solution. In order to shed light on a complex problem such as this, massive databases that were not previously available need to be incorporated, e.g. data about after-event situations of the past [Maurer, et al]. Such data on past events could be useful in highlighting patterns related to potentially in-danger sites. Data to be employed in this mining will thus comprise both weather and terrestrial parameters together with other human-induced parameters such as vegetation and deforestation over a period of time [Maurer, et al].

Domain focussing will be concerned with the discovery of causal relationships (e.g. using Bayes networks) as a modelling step. Multiple sources of data, including new sources, need to be applied in the discovery of likely causal relationship patterns. A complex form of data mining is required even at the phase of domain focussing. This will involve an iterative process whereby hypothesis generation could be employed to narrow the scope of the problem to allow for a constrained but meaningful data collection. For complex domains such as this, unconstrained data collection may not always be the best solution. Domain focussing would thus perform problem detection, find deterministic factors and hypothesise relationships that will be applied in the model. [Beulens et al, 2006] describe a similarly complex representation system for an early warning system for food supply networks.

Subsequently, the model construction phase will employ a variety of learning algorithms to profile the events or entities being modelled. As this stage may negate model relationships, domain focussing may need to be repeated iteratively to refine the model further. The model construction phase should allow the incremental development of a model, based on a complex representation of the causal networks [Beulens et al, 2006].

The model construction phase will explore the use of mining methods such as clustering, associational rule mining, neural networks, etc. to verify the validity of causal associations. Once a potential causal link is hypothesised, verification can be done by employing various data mining methods. [Beulens, et al] have proposed a combination of approaches which includes deviation detection, classification, dependence models and causal model generation.

The Decision Making phase will then employ the validated causal relationship model in exploring real-life case studies. Data visualisation will need to be employed in such a scenario to contrast between the two clusters. An environment for interactive, explorative visual domain focussing is crucial to highlight directions for further research. Data mining could serve as a means of characterising profiles both for areas prone to disasters and for those which are safe. Until domain focussing is effectively achieved, a semi-automated solution [Pillmann, 2002] may be the best solution. Alternatively, software agents could be employed to perform autonomous discovery for tasks such as validating causal links.

Medical Applications

We will briefly discuss another form of mining that has a high impact. In the medical domain, data mining can be applied to discover unknown causes of conditions such as ‘sudden death’ syndrome or heart attacks, which remain unresolved. The main difficulty in performing such discoveries is in collecting the data necessary to make rational judgements. Large databases need to be developed to provide the modelling capabilities. These databases will comprise clinical data on patients found to have the disease and on those who are free of it. Additionally, non-traditional data, such as retail sales to determine purchases of drugs and calls to emergency rooms, together with auxiliary data such as microarray data in genomic databases and environmental data, would also be required [Li].

Non-traditional data could also incorporate major emotional states of patients by analyzing and clustering the magnetic field of human brains, which can be measured non-invasively using electrodes attached to a person’s head [Maurer, et al]. Social patterns can also be determined through the profile mining described in the previous section to augment the findings of this system. Findings on the functional behaviour of humans via genomic database mining would also serve as a meaningful input.

The development of large databases for medical explorations will also open possibilities for other discoveries, such as mining family medical history and survival analysis to predict life span [Han and Kamber].

Advantages of Data Mining

Data mining has crept into our lives in a variety of forms. It has empowered individuals across the world to vastly improve their capacity for decision making in focussed areas. Powerful mining tools are going to become available to a large number of people in the near future. This section describes the advantages of data mining.

Data mining will enhance our lives in a number of ways, which include enabling domestic security through a number of surveillance systems, better health through medical mining applications, protection against many forms of intriguing dangers, and access to just-in-time technology to address most of our needs. Mining will provide companies with effective means of managing and utilising resources. People and organizations will acquire the ability to perform well-informed (and possibly well-researched) decision making. Data mining also provides enlightening answers by sifting through multiple sources of information which were never known to exist, or which could not conceivably have been acquired.

Data mining could be combined with collaborative tools to further facilitate and enhance decision making in a variety of ways. Data mining is thus able to transform personal or organizational knowledge, which may be locked in the heads of individuals (tacit knowledge) or in legacy databases, so that it becomes publicly available. Many more new benefits will emerge as technology advances.

Disadvantages of Data Mining

Having seen the powers of this fascinating technology and its profound impact and influence on our lifestyles, we will now explore the potential dangers of this technology. As with all forms of technology, there is a need to explore both sides of the coin.

In order to illustrate the privacy concerns of data mining, we describe the sensitive nature of web search history. Search history data represents an extremely personal flow of thought patterns of users that reflects one’s quest for knowledge, curiosity, desires and aspirations, as well as social inclinations and tendencies. As such, it is not surprising that a large amount of psychographic data, such as users’ attitudes towards topics, interests, lifestyles, intent and beliefs, can be detected from these logs. The extent of the possible discoveries was clearly illustrated by the incident in which AOL released the personal search data of 658,000 subscribers [Jones, 2006]. This incident exposed the sensitivity of the information in the hands of search engines.

A great deal of knowledge about users is also being maintained by governments, airlines, medical miners and shopping consortiums. A valid concern is that the slightest leak could be disastrous. Figure 3 illustrates the amount of knowledge about anonymous users that could be established by global search engines via the connection of dots (see Kulathuramaiyer and Balke 2006).

Fig. 3 Search History can reveal a great deal of information about users

Other forms of mining that may be capable of even more dramatic privacy infringements include Real-time outbreak and disease Surveillance program as an early warning for bioterrorism, [Spice] Total Information Awareness program,[Anderson] and The Automated Targeting System [ATS].

Particularly in these types of applications, another common danger is profiling, where the mining results may have drastic implications such as an arrest. There is a danger of generalizations being based on factors such as race, ethnicity or gender rather than on deeper, more meaningful indicators. Another danger is the prevalence of false positives, where an entirely innocent individual or group is targeted for investigation because of poor decision making. To illustrate the danger of false positives, a reasonable success rate of 80% was considered for an application such as TIA [b]. This would result in 20% of US citizens (48 million) being considered false positives [b].
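The arithmetic behind this illustration can be made explicit. The sketch below assumes a screened population of 240 million, a figure that is not stated in the text but is implied by its own numbers (48 million being 20% of the total); the function name `expected_misclassified` is hypothetical.

```python
def expected_misclassified(population, accuracy):
    """Number of people wrongly classified when a screening system
    is correct for the given fraction of the population."""
    return round(population * (1.0 - accuracy))

# Assumption: population back-derived from the 48 million quoted in the text.
assumed_population = 240_000_000
flagged = expected_misclassified(assumed_population, 0.80)
print(flagged)
```

Even a seemingly reasonable success rate therefore translates into tens of millions of wrongly classified people when applied to an entire population.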

Data mining will empower mining giants to go beyond the ability to PREDICT what is going to happen in a number of areas of economic importance, and actually to have the power to KNOW what will happen; hence they can, for example, exploit the stock market in an unprecedented way. They also have the capacity to make judgements on issues and persons with scary accuracy.

Data mining has thus put in the hands of a few large companies the power to affect the lives of millions through the control they have over the universe of information. The unconstrained expansion of their business scope endows them with the omniscience to affect our lives.

The next section discusses solutions to constrain the scope and visibility of mining without compromising on the extent of discovery.

What can we do?

In order to avoid the dangers of connecting the dots, two approaches have been proposed which include keeping databases separate and anonymous mining. [Kulathuramaiyer and Maurer, 2007] describe an approach to effectively keep databases separate.

Distributed Specialised mining

In this distributed approach, separate facilities will be adopted for the development of software for document similarity detection (a similar capability is found in search engines). Each distributed site is responsible for performing deep but focussed mining of a single domain of specialisation (e.g. Computer Science, Psychology). Facilities such as this can be established in numerous localities throughout Europe and even across the world to effectively address multiple disciplines and languages. This will also address the current problem with search engines, which tend to be too generic [S.J. Vaughan-nichols]. This proposal ensures that no central agency will have exclusive control over powerful technology and all resources. In order to ensure the neutrality of content, all such facilities will need to be managed by not-for-profit agencies such as universities and public libraries.

Anonymous Mining

[Kovatcheva] has proposed a means of protecting the anonymity of surfers through the use of anonymity agents and pseudonym agents, which remove the need for users to be identified. Their paper also proposes the use of negotiation agents and trust agents to assist users in reviewing a request from a service, so that they can make a rational decision about allowing the use of particular personal data.

A similar agent-based approach is highlighted by [Ka Taiplae] via rule-based processing. First, an "intelligent agent" is used for dispatching a query to distributed databases. The agent will then negotiate access and permitted uses for each database. Secondly, data items themselves are labelled with metadata describing how that item must be processed. Thus, even if a data item is removed or copied to a central database, it retains the relevant rules by which it must be processed.
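The notion of data items that carry their own processing rules can be sketched as follows. This is only an illustration of the idea, not the cited framework: the class `LabelledItem`, the function `process` and the purpose names are hypothetical.

```python
import copy
from dataclasses import dataclass

@dataclass(frozen=True)
class LabelledItem:
    """A data item labelled with the uses that are permitted for it."""
    value: str
    permitted_uses: frozenset

def process(item, purpose):
    """Process an item only if its own metadata permits the purpose."""
    if purpose not in item.permitted_uses:
        raise PermissionError(f"use '{purpose}' is not permitted for this item")
    return f"{purpose}: {item.value}"

# Hypothetical item that may only be used for aggregate statistics.
record = LabelledItem("postcode 8010", frozenset({"aggregate_statistics"}))

# Copying the item to another store keeps its rules attached.
central_copy = copy.copy(record)
result = process(central_copy, "aggregate_statistics")
print(result)
```

Because the rules travel with the item, a request such as `process(central_copy, "marketing")` is refused even after the item has left its original database.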

Value Sensitive Design has been proposed by [friedman] which employs logical modelling to account for human values in a principled and comprehensive manner throughout the design process. Another anonymisation step has also been proposed through a framework by [e].

The main challenge lies in coming up with guidelines and rules that site administrators or software agents can use to direct various analyses on data without compromising the identity of an individual user. Furthermore, there should be strict regulations to prevent usage data from being exchanged inappropriately or sold to other sites. Users should also be made aware of the privacy policies of any given site, so that they can make an informed decision about revealing their personal data. The success of such guidelines can only be guaranteed if they are backed up by a legal framework.

Conclusion

As data mining matures and becomes deployed in ever more encompassing ways, we need to become aware of how it can effectively enrich our lives. At the same time, the dangers associated with this technology need to be minimised through deliberate efforts on the part of enforcement agencies, miners and the users of the system.

The power to enhance our lives with the promise of unlimited knowledge will make the world much more exciting by opening up numerous possibilities. However, as the degree of user profiling by BSEs can be mind-boggling, drastic action is required fast. Effective measures are required to curtail the dissemination of private information. Apart from that, international laws need to be in place to ensure balanced growth and control of resources.

References

Battelle, J., 2005, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, Portfolio, Penguin Group, New York

Tancer, B., 2007, July Unemployment Numbers (U.S.) - Calling All Economists, Website: http://weblogs.hitwise.com/bill-tancer/2006/08/july_unemployment_numbers_us_c.html Accessed 17 January 2007

Vise, D.A., Malseed, M., 2006, The Google Story: Inside the Hottest Business, Media and Technology Success of Our Time, Pan MacMillan Books, Great Britain

Witten, I.H., Gori, M., Numerico, T., 2007, Web Dragons: Inside the Myths of Search Engine Technology, Morgan Kaufmann, San Francisco

S.R. Anderson, Total Information Awareness and Beyond: The Dangers of Using Data Mining Technology to Prevent Terrorism, Bill of Rights Defense Committee White Paper

Kelly Shermach, Data Mining: Where Legality and Ethics Rarely Meet, http://www.crmbuyer.com/story/52616.html, accessed 18 January

Elizabeth Kovatcheva, Helena Tadinen, The Technological and Social Aspects of Data Mining by Means of Web Server Access Logs, http://www.pafis.shh.fi/~elikov02/SFISWS2/SFIS2.html, accessed 18 January

George R. Milne, Privacy and ethical issues in database/interactive marketing and public policy: A research framework and overview of the special issue, Journal of Public Policy & Marketing, Spring 2000

Hofgesang, P.I., and Kowalczyk, W., 2005, Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling, ECML/PKDD Discovery Challenge 2005, Website: http://www.cs.vu.nl/ci/DataMine/DIANA/papers/hofgesang05pkdd.pdf

Poker & Fantasy Football - Lessons on Finding Affiliate Partnerships, http://weblogs.hitwise.com/heather-hopkins/2005/11/, accessed 22 January

K.C. Jones, 2007, Fallout From AOL's Data Leak Just Beginning, http://www.informationweek.com/news/showArticle.jhtml?articleID=191900935, accessed

Wayne Rash, Political Parties Reap Data Mining Benefits, eWeek.com Enterprise News and Reviews, November 16, 2006, http://www.eweek.com/article2/0,1895,2060543,00.asp, accessed 18 January

James Graham, C. David Page, Ahmed [...], Accelerating the Drug Design Process through Parallel Inductive Logic Programming Data Mining, Proceedings of the Computational Systems Bioinformatics (CSB'03), 0-7695-2000-6/03, 2003, IEEE Computer Society, http://ieeexplore.ieee.org/iel5/8699/27543/01227345.pdf

KDnuggets: Polls: Successful Data Mining Applications (July 2005), http://www.kdnuggets.com/polls/2005/successful_data_mining_applications.htm

K.A. Taipale (December 15, 2003). "Data Mining and Domestic Security: Connecting the Dots to Make Sense of Data". Colum. Sci. & Tech. L. Rev. 5 (2). SSRN 546782 / OCLC 45263753.

David Jensen, "Data mining in networks." Invited talk to the Roundtable on Social and Behavior Sciences and Terrorism. National Research Council, Division of Behavioral and Social Sciences and Education, Committee on Law and Justice. Washington, DC. December 11, 2002

Mobasher, B., Data Mining for Personalization. In The Adaptive Web: Methods and Strategies of Web Personalization, Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.). Lecture Notes in Computer Science, Vol. 4321. Springer-Verlag, Berlin Heidelberg, 2006, to appear. http://maya.cs.depaul.edu/~mobasher/papers/aw06-mobasher.pdf

Doru Tanasa and Brigitte Trousse, Advanced Data Preprocessing for Intersites Web Usage Mining, AxIS Project Team, INRIA Sophia Antipolis

[b] Is Big Brother Our Only Hope Against Bin Laden?, Salon.com, Dec. 3, 2002 at http://www.salon.com/tech/feature/2002/12/03/tia/index_np.html

[c] Senator: U.S. didn't connect 'dots' before 9/11, May 15, 2002, http://archives.cnn.com/2002/US/05/15/inv.fbi.terror/

R. Cooley, B. Mobasher, J. Srivastava, Grouping Web Page References into Transactions for Mining World Wide Web Browsing Patterns, Proceedings of the 1997 IEEE Knowledge and Data Engineering Exchange Workshop, Page 2, 1997, ISBN 0-8186-8230-2, IEEE Computer Society

Matthew Fordahl, Researchers Seek to Safeguard Privacy in Anti-terrorism Plan, Seattle Times, July 14, 2003, available at http://seattletimes.nwsource.com/cgi-bin/PrintStory.pl?document_id=135262838&zsection_id=268448455&slug=btprivacy14&date=20030714; see also IAO Report, supra note 88, at A-13 ("DARPA is examining the feasibility of a privacy appliance . . . to enforce access rules and accounting policy.").

Ryan Singel, Sep. 18, 2003, http://www.wired.com/news/privacy/0,1848,60489,00.html

Bamshad Mobasher, Web Usage Mining and Personalisation, in Munindar P. Singh (editor), Practical Handbook of Internet Computing, Chapman & Hall/CRC Press, 2005, http://maya.cs.depaul.edu/~mobasher/papers/IC-Handbook-04.pdf

Gareth Cook, Software Helps Police Draw Crime Links, Boston Globe, July 17, 2003, at A1

Jim Goldman, Google for Cops: Revolutionary software helps cops bust criminals (TechTV broadcast Apr. 12, 2003, modified Apr. 17, 2003), available at http://www.techtv.com/news/scitech/story/0,24195,3424108,00.html

BBC News, Wednesday, 21 May 2003, http://news.bbc.co.uk/2/hi/uk_news/3041151.stm (describes how chatter is detected to determine possible attacks)

Batya Friedman et al., Value Sensitive Design: Theory and Methods (Draft of June 2003), at http://www.ischool.washington.edu/vsd/vsd-theory-methods-draft-june2003.pdf

A. Beulens, Y. Li, M. Kramer, J. van der Vorst, Possibilities for Applying Data Mining for Early Warning in Food Supply Networks, CSM'06, 20th Workshop on Methodologies and Tools for Complex System Modeling and Integrated Policy Assessment, 28-30 August 2006, http://www.iiasa.ac.at/~marek/ftppub/Pubs/csm06/beulens_pap.pdf

Byron Spice (Post-Gazette Science Editor), Privacy in age of data mining topic of workshop at CMU, Friday, March 28, 2003, http://www.post-gazette.com/nation/20030328snoopingnat4p4.asp

Chung-Sheng Li, Survey of Early Warning Systems for Environmental and Public Health Applications, Chapter 1 in Life Science Data Mining, edited by Stephen Wong (Harvard Medical School, USA) & Chung-Sheng Li (IBM Thomas J Watson Research Center), Science, Engineering, and Biology Informatics, Vol. 2, http://www.worldscibooks.com/compsci/etextbook/6268/6268_chap01.pdf

The Automated Targeting System (ATS), http://www.eff.org/news/archives/2006_11.php

S.J. Vaughan-Nichols, Researchers Make Search More Intelligent, Industry Trends in Computer, (Ed.) Lee Garber, IEEE Computer Society, December 2006

[e] http://ieeexplore.ieee.org/iel5/9670/28523/01274912.pdf?arnumber=1274912