NWO-MVI Proposal 020908



Emerging quality
Creating dynamic user and content profiles in online knowledge networks

    Thieme Hennis

    02-Sep-08


Type, theme, and length of project

Classification: 1A
This project can be classified as research treating the ethical and societal aspects of concrete technological developments.

A virtual identity has both psychological and economic significance. In my research, the motivation to share knowledge and help others is an important aspect of the way the virtual identity of an individual is built. It also presumes and supports other economic structures, such as a more flexible employer-employee relationship.

Theme: Virtual reality

The Internet has become an essential element of many economies and societies. Similarly, we are intrinsically attached to and have become part of the Web (Kelly, 2005). In the last few years, we have seen an enormous increase in people being active online: connecting, creating, sharing, and building up identities. Smart data-mining systems are able to create dynamic profiles of people and content representing expertise, relevance, and quality.

Internet pioneer Wendy Hall describes it as follows:

Every time you do something on the internet, it is effectively logged, building up this profile that is with you for your life. We will be able to build software that can interpret that profile to help get the answer that you need in the context that you're in (Smith, 2006).

Length of research: 4 years

The author applies for a single Ph.D. research project (length: 4 years).

Research team

Quality is a socially defined concept. I will try to make it quantifiable by measuring certain use & user relationships within decentralized networks. At present, the research team consists of the following people:

- Professor Wim Veen (TU Delft). Wim has been involved in research into learning and innovation in education for many years. He developed a very relevant and useful model concerning networked learning. Alpha (sociological-educational theories).

- Dr. Jaco Appelman (TU Delft). Jaco has been involved in collaboration software research for many years. He will assist with methodological and content issues. Beta (collaborative software & systems engineering).

- Job Timmermans, M.Sc. (PEERS). Job is co-founder of PEERS and holds master's degrees in philosophy and systems engineering. His role in the project is to discuss the true practical application of the developed system. Alpha (philosophy of quality) & Beta (collaborative software & application interface).

Research description

In decentralized (virtual) networks with tools and technologies that allow anyone to contribute anything, it is increasingly problematic to determine the reliability of content and people online. The research I propose must bring forward rules and variables that can be used (by, for example, software engineers) to let quality and expertise emerge over time and become visible.

What represents quality?

In the last decade, the internet has evolved into a platform allowing any person to participate and contribute. Various easy-to-use web technologies empower people to share interests and knowledge, search and structure content, and connect with friends and peers.


    Web 2.0 represents a blurring of the boundaries between Web users and producers, consumption and

    participation, authority and amateurism, play and work, data and the network, reality and virtuality

    (Zimmer, 2008).

    With the increase of online participation, a number of issues have emerged regarding quality, authority, expertise,

    and trust (Keen, 2007). With organizations becoming more open and seeking ways to make use of the contributions

of people around the world, these issues become even more prevalent (Abbott, 2000). Just as there are many new tools for publishing and creating new content, there are tools specifically made to search, filter, rate, evaluate, and recommend content to people in certain contexts. Still, filtering through more and more resources hidden online or in internal networks remains difficult (Benkler, 2006; Howe, 2006).

    Finding the right resources and people

Search algorithms of popular search engines focus on popularity (or authority) rather than on what is commonly regarded as quality. No human reviews are involved in this process, and results are thus created in a sub-optimal manner (Lewandowski & Höchstötter, 2008). Because similar search algorithms and ineffective content management systems are used within organizations, much of knowledge workers' time is spent recreating already existing (and in-house available) information:

    A lot of money and intellectual power is spent on reinventing the wheel and searching for knowledge. This

    is a huge problem for companies and a central challenge for KM research (Swaak, Ifamova, Kempen, &

    Graner, 2004).

People define quality. Usually this involves relying on others, such as experts or people you trust. This should also be the case for the way search engines and content management systems determine quality. Specifically, this means including human reviews and other metadata generated by people (times used, favorited, tagged, or recommended) in structuring and managing content. In doing so, quality is linked to context, becomes more transparent for the user, and is related to certain context variables (which may be user input variables in search engines).
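As a minimal sketch of this idea (my own illustration, not part of the original proposal), the function below blends the mean human review score with normalized usage metadata into one quality score. The signal names, weights, and squashing constant are all hypothetical assumptions:

```python
# Minimal sketch: combine explicit human reviews with usage metadata
# (times used, favorited, tagged, recommended) into one quality score.
# All weights and the squashing constant are hypothetical choices.

def squash(count, scale=10.0):
    """Map a raw usage count into 0..1 so no single signal dominates."""
    return count / (count + scale)

def quality_score(reviews, used, favorited, tagged, recommended):
    """reviews: list of human review scores in 0..1; the rest are counts."""
    review_score = sum(reviews) / len(reviews) if reviews else 0.0
    usage_score = (squash(used) + squash(favorited)
                   + squash(tagged) + squash(recommended)) / 4.0
    return 0.6 * review_score + 0.4 * usage_score  # assumed 60/40 blend

print(quality_score(reviews=[0.8, 0.9], used=120,
                    favorited=15, tagged=8, recommended=4))
```

In a real system the weights would themselves be context variables, so that what counts as "quality" can differ per user and per search context.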

Standards

A number of initiatives, such as PICS (Resnick & Miller, 1996; Armstrong, 1997) and Resource Profiles (Downes, 2005), propose protocols or frameworks that can be used to evaluate, rate, or structure online content. Many websites have implemented rating and reputation mechanisms to increase transparency and indicate trust in content and people. Still, a general standard for online content does not exist.

Wikipedia co-founder Larry Sanger has recently called for a system for syndicating and rating online data, claiming it to be the obvious next step (and Big Idea) for the Internet. It would enable systems to weight data not just on Google-style PageRank algorithms, but also on things like

quality according to generally trusted sources; or quality according to your peer group; or quality according to academic and academic-endorsed sources; etc. (Sanger, 2008)

What Sanger proposes is a system that includes relevant information about a person along with the rating, in order to add context and enrich the information about pieces of content with relevant metadata (such as quality according to peer group, evaluations, and usage).

As the Web is populated with more data, it becomes easier to automatically mine these kinds of user and usage statistics about people and their behavior online, popularity and interest, friends and activities, and to turn them into valuable metadata. For example, APML (Attention Profile Markup Language) and ULML (User Labor Markup Language) intend to set standards for capturing and sharing information about people online. When you combine this people-metadata with active feedback generated by users (through ratings and evaluations), profiles of people and content can be made automatically (through use) that can be used to increase motivation to contribute and share, enhance flexibility for freelance workers and organizations, and improve efficiency in finding people and content (Choi, Kruk, Grzonkowski, Stankiewicz, Davis, & Breslin, 2006).
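A hedged sketch of what such a combined profile might look like follows; the field names and blend rule are my own illustrative assumptions, not anything defined by APML or ULML:

```python
# Sketch: merge machine-mined attention data (in the spirit of APML)
# with explicit feedback received on contributions into one profile.
# Field names and the blend rule are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class Profile:
    user_id: str
    attention: dict = field(default_factory=dict)  # topic -> implicit interest, 0..1
    feedback: dict = field(default_factory=dict)   # topic -> mean explicit rating, 0..1

    def interest(self, topic: str, alpha: float = 0.6) -> float:
        """Blend implicit attention with explicit feedback for one topic."""
        implicit = self.attention.get(topic, 0.0)
        explicit = self.feedback.get(topic, 0.0)
        return alpha * implicit + (1 - alpha) * explicit

p = Profile("thieme", attention={"sustainability": 0.9},
            feedback={"sustainability": 0.7})
print(p.interest("sustainability"))  # 0.6 * 0.9 + 0.4 * 0.7 = 0.82
```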

Hypothesis, research questions, instruments, and methodology

Both the user profile (expertise level & domain) and usage (number of views, clicks, ratings, recommendations, etc.) are relevant and should be utilized to determine the quality and relevance of content. Furthermore, using this information to profile the original contributor leads to a system of dynamic and up-to-date expertise profiles based on the value of contributions. The hypothesis I want to falsify is that in virtual knowledge networks, findable expert and content profiles can be made by analyzing how content is used, and by whom, and linking the results to the original contributor. The two fundamental assumptions that make up the hypothesis are:

1. User and usage information determine the quality and domain of creations; and
2. The quality of creations determines the creator's expertise.

Much has already been written and done regarding these two assumptions. The recent increase in people being active online and sharing content allows for complex data-retrieval and profiling algorithms for dynamically determining quality. How this translates into research is described in the next section.
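To make the assumed feedback loop concrete, here is a toy sketch (my own illustration, not the proposed system): content quality is computed as a rater-expertise-weighted average of ratings (assumption 1), creator expertise follows from the quality of the creator's contributions (assumption 2), and the two are iterated until they stabilize, in the spirit of HITS-style mutual reinforcement. The data, initial values, and damping factor are hypothetical:

```python
# Toy sketch of the hypothesized loop: ratings -> content quality ->
# creator expertise -> (back into) rating weight.

def propagate(ratings, authors, iterations=20, damping=0.5):
    """ratings: {item: [(rater, score in 0..1), ...]}; authors: {item: creator}."""
    users = set(authors.values()) | {r for rs in ratings.values() for r, _ in rs}
    expertise = {u: 0.5 for u in users}  # neutral starting expertise
    quality = {}
    for _ in range(iterations):
        # Assumption 1: quality = expertise-weighted average of ratings.
        for item, rs in ratings.items():
            total = sum(expertise[r] for r, _ in rs) or 1.0
            quality[item] = sum(expertise[r] * s for r, s in rs) / total
        # Assumption 2: expertise drifts toward the mean quality of one's items.
        for user in users:
            own = [quality[i] for i, a in authors.items() if a == user and i in quality]
            if own:
                expertise[user] = (damping * expertise[user]
                                   + (1 - damping) * sum(own) / len(own))
    return quality, expertise

quality, expertise = propagate({"post1": [("ann", 0.9), ("bob", 0.4)]},
                               {"post1": "carol"})
print(quality["post1"], expertise["carol"])  # carol's expertise tracks her post
```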

    Research questions and instruments

    As mentioned, the above assumptions are addressed by various reputation systems and rating, quality, and

    profiling mechanisms. I will first investigate the most important relationships, rules, and (upcoming) standards in

    the generation of metadata about quality, authority, and expertise by these (stand-alone) systems. Concurrently I

will look into the processes of knowledge workers using different types of publishing and rating tools, and find out the most important variables of quality in knowledge networks. These variables are then ordered along different levels: i.e. personal (expertise, competencies, passion, etc.), relational (quality of interactions), and informational (usefulness of content, reliability of the source). This first step of literature research and case studies is inductive, and results in a model that will be validated by conducting a large-scale survey. Through regression analysis, personal biases will be filtered out and an empirical foundation will be created for the interpretatively developed model.
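As a hedged illustration of this bias-filtering step (one possible realization, not the committed design), a fixed-effects regression can separate each rater's personal leniency from the underlying item quality; the survey data below is fabricated purely for the example:

```python
# Sketch: least-squares regression with rater dummies (fixed effects) to
# separate personal rating bias from item quality. Data is fabricated.

import numpy as np

# (rater, item, score) observations, e.g. from the large-scale survey.
obs = [(0, 0, 0.9), (0, 1, 0.8), (1, 0, 0.5), (1, 1, 0.4), (2, 0, 0.7)]
n_raters, n_items = 3, 2

X = np.zeros((len(obs), n_raters + n_items))
y = np.array([score for _, _, score in obs])
for row, (rater, item, _) in enumerate(obs):
    X[row, rater] = 1.0            # dummy for the rater's personal bias
    X[row, n_raters + item] = 1.0  # dummy for the item's quality

# Minimum-norm least squares handles the inherent collinearity here.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("rater leniency:", coef[:n_raters])
print("bias-adjusted item quality:", coef[n_raters:])
```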

Step 1. CONTENT PROFILES: What are the variables and (metadata) standards and initiatives for defining the quality of content? (User-driven: active rating and evaluation. Machine-driven: measuring usage.)
Instrument/Method: literature and desk research.
Outcome: authoritative paper(s) about metadata standards and quality in decentralized online networks, and/or metadata standards, profiling mechanisms, and authority/expertise protocols and rules in decentralized online networks.

Step 2. USER PROFILES: What are the variables and (metadata) standards and initiatives for defining the expertise of persons? (User-driven: recommendations etc. Machine-driven: determination of authority, based on several factors.)
Instrument/Method: idem.
Outcome: idem.

Step 3. A first case study will provide insight into the criteria, possibilities, and constraints of using different tools. How can the standards and variables be measured using existing tools?
Instrument/Method: case study (interview, survey, experiment).
Outcome: criteria, possibilities, and constraints & toolbox.

Step 4. Using the outcomes of the first three steps, I will describe the most important variables and requirements for determining quality and expertise in online networks. How can user-driven and machine-driven metadata about the quality of content translate to dynamic expertise profiles of content creators (or: how should content profiles influence user profiles?) How should the expertise profiles influence content profiles? Additionally, I will clarify the requirements for the case studies and the research that follows, to test the hypothesis. These requirements include instruments/technologies used, user participation, size of the network, and more.

Step 5. VALIDATION MODEL: Does the interrelation of content metadata and user metadata in determining quality and expertise improve the finding of people and resources in organizations? What are the critical success factors?
Instrument/Method: case study (interview, survey).
Outcome: framework for measuring search quality within organizations & critical success factors for the model.

Step 6. Describing the outcomes of the research.
Outcome: report and functional design for the proposed system.

    Timeline

The steps above are ordered chronologically. The timeline below describes the structure in more detail:

1. Year 1 (Steps 1, 2, 3): Literature research, creating the research framework, quality model, and theory, conducting an exploratory case study, preparing further case studies, and writing papers.

2. Year 2 (Steps 4 & 5): Developing and deploying the model in research communities and evaluating the model. More specifically:

   - Describing how different tools are used to create and share information, and how these tools define quality/expertise.

   - Evaluating and refining the model and theory. This means describing (i) how usage (popularity, rating, reviewing, etc.) and users (experts versus laymen) together determine the quality of content, and (ii) how this translates to the expertise or authority of the content creator.

3. Year 3 (Step 5): Similar to the second year, but with more focus on converging research results in order to create an improved and more abstract model for quality and expertise in online knowledge networks. The two main requirements are that the model functions as desired and that it can be used as a basis for creating metadata-generating software.

4. Year 4 (Step 6): Describing and finalizing my research: making it useful for practical solutions.

Methodology: Grounded theory

Because I will develop a new theory about quality based on existing literature and research, the chosen methodology is grounded theory. Grounded theory can be described as a research method in which the theory is developed from the data, rather than the other way around. It is an inductive approach, meaning that it moves from the specific to the more general. Because theories about virtual identities, quality and rating systems, and constitutions around the increased empowerment of people are currently taking shape, this is the best approach for creating a better model.

Societal impact and valorization

My objective is to create a system that measures people's activities and contributions online and automatically translates these into a virtual identity (or karma) that can be found by the right persons in the right context. Such a system allows people to be found and employed more directly and flexibly (Malone & Laubacher, 1998). Depending on how efforts are valued and used by the community, the virtual identity of the contributor changes. I suppose this leads to two things:

- People contribute valuable content to the community (otherwise it will not add value to their ID);

- People are more intrinsically motivated to contribute (fun, community feeling) rather than by financial reward. Still, the virtual ID forms a bridge to future job opportunities or assignments based on (motivation-based) contributions.


    Such a system will change organizational structures, and create a more flexible and free economy, as speculated by

    Pekka Himanen:

Could there be a free market economy in which competition would not be based on controlling information but on other factors, an economy in which competition would be on a different level (and, of course, not just in software, but in other fields, too)?

Pekka Himanen, The Hacker Ethic and the Spirit of the Information Age (2001)

Competition, then, would be based on the contrary: the sharing of information and resources between people and within flexible networks and communities. I realize that this is another testable assumption, but testing it could be done in further research. Before we can do that, though, we must build the foundation of this system.

Case study: Sustainable network

My analysis of quality and expertise in virtual environments (like online communities) will be the basis of the PEERS Interaction Management System [1]: software that analyzes the interaction of users with each other and with online content. All described relationships, rules, and standards will be built into it, so it can be tested and applied immediately. Currently, we are deploying our software at different organizations in different settings. The following will serve as an exploratory case study in the research:

Sustainability network (100-250 professionals) consisting of DKA (De Kleine Aarde), Enviu, OSIRIS, and the TU Delft Sustainability department (SEPAM faculty). These organizations, concerned with sustainability and alternative technologies, have clearly expressed their interest in and commitment to contributing and being part of the proposed research. I will deploy different software tools within these organizations, and use the PEERS Interaction Management System to create dynamic, exchangeable profiles of people and content. These tools allow users to make use of content and connect with people outside of their own organization. Tools and technologies already used by the organizations will be part of the research, provided they allow measurement of use and users by PEERS IMS.

[1] http://aboutpeers.com


Works Cited

Abbott, V. (2000). Web page quality: can we measure it and what do we find? A report of exploratory findings. J Public Health, 22(2), 191-197.

    Armstrong, C. (1997, May 19). Metadata, PICS and Quality. Retrieved August 10, 2008, from Ariadne magazine:

    http://www.ariadne.ac.uk/issue9/pics/

Benkler, Y. (2006). The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven, CT: Yale University Press.

    Choi, H. C., Kruk, S. R., Grzonkowski, S., Stankiewicz, K., Davis, B., & Breslin, J. G. (2006). Trust Models for

    Community-Aware Identity Management. Identity, Reference, and the Web Workshop at the WWW Conference,

    2006.

Downes, S. (2005). Resource Profiles. Journal of Interactive Media in Education, 5.

    Himanen, P. (2001). The Hacker Ethic and the Spirit of the Information Age. New York: Random House.

    Howe, J. (2006, June). The Rise of Crowdsourcing. Retrieved August 10, 2008, from Wired Magazine (14):

    http://www.wired.com/wired/archive/14.06/crowds.html

    Keen, A. (2007). The Cult of the Amateur. New York: Doubleday Business.

    Kelly, K. (2005, August). We Are the Web. Retrieved August 08, 2008, from Wired Magazine (13):

    http://www.wired.com/wired/archive/13.08/tech.html

    Lewandowski, D., & Hchsttter, N. (2008). Web Searching: A Quality Measurement Perspective. In A. Spink, & M. (.

    Zimmer, Web Searching: Interdisciplinary Perspectives (pp. 309-343). Dordrecht: Springer.

    Malone, T., & Laubacher, R. (1998, September-October). The dawn of the E-lance economy. Harvard Business

    Review, 144-152.

    Resnick, P., & Miller, J. (1996). PICS: Internet Access Controls Without Censorship. Communications of the ACM, 39,

    87-93.

Sanger, L. (2008, July 8). Syndicated Web ratings - an idea whose time has come? Retrieved August 8, 2008, from Citizendium Blog: http://blog.citizendium.org/2008/07/09/syndicated-web-ratings-an-idea-whose-time-has-come/

Smith, D. (2006, May 21). All set for a baby.com revolution. Retrieved August 10, 2008, from Guardian - The Observer: http://www.guardian.co.uk/technology/2006/may/21/news.theobserver

Swaak, J., Ifamova, L., Kempen, M., & Graner, M. (2004). Finding in-house knowledge: patterns and implications. I-KNOW '04. Graz, Austria: Telematica Institute. Available at https://doc.telin.nl/dscgi/ds.py/Get/File-40767.

    Zimmer, M. (2008). Preface: Critical Perspectives on Web 2.0. First Monday (online), 13 (3).


Preliminary budget

At this stage, I request the full amount needed to complete this research: €300,000 for a full-time (4-year) PhD position, including research team, logistics and travel support, accommodation, and all other expenses.

Valorization workshop

The valorization workshop consists of two parts:

- In November, a modest online conference will be held, using free conferencing and collaboration technologies. I will put four important questions forward, which are addressed by the invited speakers (15 minutes per speaker). A discussion with participants follows, with my research and research question as the main topic.

- An offline meeting will be held in December with all stakeholders, including individuals from PEERS, the research committee, and potential case organizations. Depending on the possibility of having this hosted by an institution, a maximum of €1,000 is needed to hire office space and arrange beverages.

Summary for laymen

Over the last 10 years, the internet has developed technologies that increasingly enable people to create, add, and rate content. Almost anyone with a computer and an internet connection can share his or her passions, interests, and knowledge, and that is exactly what is happening. Various mechanisms exist to filter and categorize this abundance of content, but it remains very difficult to separate the wheat from the chaff online or in virtual networks. This applies both to content (what is reliable/of high quality?) and to people (is this person really an expert in this field?).

The increase in people's online activities creates not only more content, but also better possibilities to structure and rate this content. This can be done in different ways:

- First, the use of content can be measured: this covers both the passive reading and the active structuring/rating/evaluating of content;
- Second, it can be measured by whom the content is used.

The hypothesis is that by measuring and analyzing both the usage and the user, very specific and up-to-date profiles of content and people can be created. These profiles are dynamic and depend on the activities surrounding a person or piece of content. The more active someone is, the richer (though not necessarily better) his or her profile becomes, and the more a piece of content is used, the better it can be profiled. Such a system supports the decentralized and flexible working of knowledge workers in virtual or open organizations.