WIGI: Wikipedia Gender Index
Initial community feedback survey
Prepared by: User Masssly
31 May 2015
https://github.com/notconfusing/WIGI
https://meta.wikimedia.org/wiki/Grants:IdeaLab/WIGI:_Wikipedia_Gender_Index
Table of Contents
Introduction
Executive Summary
Background and Objectives
Body
Survey Method
Participants
Survey Instrument
Procedure
Analysis
Results
Conclusion
References
Appendix
Survey Questions and Responses
Links to Survey Data
1.0 Introduction
1.1 Executive Summary
Wikimedia’s gender gap is widespread and well documented. Statistics on the gender gap in
Wikipedia biographies are often presented as an observation of editorship trends, on the
assumption that editor gender and article gender may be related. However, it is worth moving
beyond that to investigate biography articles for their own sake and to analyze the biography
gender gap by variables such as date of birth, citizenship, and language. It is even more useful
to sample these data repeatedly over different time periods and view the trends that emerge.
The Wikipedia Gender Inequality Index (WIGI) seeks to automate the production and graphing of
these statistics on a publicly viewable website with open-data downloads and, at the end of
the year (2015), to provide a final report on the observed trends.
Community input was solicited from Wikipedia editors to identify the variables of concern that
might interest researchers or Wikipedia community members and help them understand the
behavior of gender inequality in Wikipedia articles. Issues identified ranged from the length
and the number of sources/references in gendered Wikipedia biography articles to a demand for
affirmative action and for a position statement from the Wikimedia Foundation regarding the
gender gap in both articles and edits and the treatment of women and minority editors.
Respondents also placed strong emphasis on including in WIGI a measurement of the professional
and occupational fields of women.
1.2 Background and Objectives
The Wikimedia Foundation launched the Inspire Campaign in early 2015 to support innovative
ideas to address the gender gap and increase gender diversity on Wikipedia and its sister
projects. The Wikipedia Gender Inequality Index (WIGI) was among the 16 of 266 projects that
were recommended and approved for funding. WIGI is an Individual Engagement Grant project that
seeks to automate the production and graphing of statistics on the representation of gender in
articles, by variables such as date of birth, citizenship, and occupation, on a publicly
viewable website with open-data downloads.
The purpose of this survey is to provide initial insights into what kind of information users
would like captured about the state of Wikipedia biographies. That information will be used to
inform which statistics the WIGI portal will show. The survey also provides insights into
respondents’ experience with current Wikimedia-related analytical tools and what they expect
from them. Finally, the survey is an opportunity to engage the community and benefit from its
feedback even at the initial stages of the project.
2.0 Body
2.1 Survey Method
A self-administered questionnaire was used to gather information about what users would like
captured about Wikipedia biographies and which statistics they would like the WIGI portal to
show. Surveys provide an efficient method for collecting data from a large population in order
to better understand respondents’ expectations about an ongoing project (Bascos-Deveza, 2010;
Babbie, 2013). Additionally, web-based surveys are the most commonly used method of collecting
information about internet-based or online communities such as Wikipedia (Sax et al., 2003;
Andrews et al., 2003). The researcher therefore determined that a self-administered web-based
survey would be the most appropriate method, owing to its versatility and its ability to
collect data from a wide range of people in many locations (Dillman, Smyth, & Christian, 2009).
The questionnaire was posted online, and invitations were sent to prospective participants
asking them to click the link and respond.
2.2 Participants
The participants in this research were Wikim[p]edians. This population was selected because
its members are the users of Wikimedia projects and are therefore in the best position to
determine the type of data they would like captured about Wikipedia biography articles and the
statistics WIGI should show. In principle, all Wikim[p]edians were eligible to take the survey,
but the link was posted on listservs and project talk pages whose users are particularly
interested in studies of gender inequality in Wikipedia articles. After the links were posted,
individual email invitations were sent to users identified as active participants in community
discussions about gender inequality on Wikipedia.
2.3 Survey Instrument
The survey was developed around variables relevant to WIGI, with input from members of the
project. It was divided into three sections, which:
(A) Assessed respondents’ knowledge of other analytic websites
(B) Asked what they wished to see in, or expected from, WIGI
(C) Assessed their experience editing Wikip[m]edia
The survey asked questions that enable us to measure respondents’ knowledge of other analytic
websites. Aside from experience with Wikipedia, respondents who have encountered similar
analytical websites will have a better understanding of what WIGI is trying to accomplish and
are in a better position to give advice based on their experience.
The remainder of the survey focused on gathering data about what kind of statistics respondents
would like to see on WIGI and the variables they are interested in measuring.
Finally, the survey asked questions that sought to assess respondents’ experience on Wikipedia,
on the assumption that those who are more experienced with Wikipedia will provide the most
useful answers.
2.4 Procedure
The survey ran for approximately three and a half days, between 27 May and 30 May 2015.
The procedure used to reach prospective participants did not meet the requirements of
probability or random sampling. As such, the results and conclusions drawn cannot necessarily
be generalized from the sample to the general population; they serve only as insights into what
the community might expect when the results of WIGI are presented.
Below are the persons and locations where the survey link was posted and prospective
respondents were asked to click on it:
#    Email lists    Personal email invites    Project talk pages             User talk pages
1    Wikimedia-GH   User:Yantang Li           WikiProject Women's History    EvergreenFir
2    Gendergap      User:Weatherby55          WikiProject Gender Studies     Dr. Blofeld
3                   User:Turn685              WikiProject Women writers
4                   User:Scribbling woman
5                   User:Shaslangu
6                   User:SPECIFICO
7                   User:Stelpa
8                   User:Switchercat
9                   User:Enock4seth
10                  User:Gobonobo
11                  User:Nkansahrexford
12                  User:Rberchie
13                  User:Lambiam
14                  User:SlimVirgin
2.5 Analysis
Frequency or univariate tables represent the simplest method for analyzing categorical data and
are often used to review how different categories of values are distributed (Vaughn, 2001).
The collected qualitative data were therefore sorted into tables, which allowed for comparison
and for the calculation of total participant responses and frequency percentages.
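As an illustration of the frequency-table approach described above, the following is a minimal Python sketch, not the tooling actually used for this report. The function name `frequency_table` is hypothetical, and the raw answer list is reconstructed from the response counts reported for question 3 in the appendix.

```python
from collections import Counter


def frequency_table(responses, categories):
    """Return (category, count, percent) rows for single-choice answers.

    `frequency_table` is a hypothetical helper written for this sketch.
    """
    counts = Counter(responses)
    total = len(responses)
    rows = []
    for category in categories:
        count = counts.get(category, 0)
        # Whole-number percentage of all responses to the question
        percent = round(100 * count / total) if total else 0
        rows.append((category, count, percent))
    return rows


categories = ["Far too Much", "Too Much", "About Right",
              "Too Little", "Far too Little"]

# One entry per respondent, rebuilt from the counts reported for question 3
responses = (["Far too Much"] * 7 + ["Too Much"] * 3 + ["About Right"] * 5
             + ["Too Little"] * 5 + ["Far too Little"] * 1)

table = frequency_table(responses, categories)
for category, count, percent in table:
    print(f"{category:<15} {count:>3} {percent:>3}%")
```

Passing the category list explicitly keeps the rows in questionnaire order and makes options that nobody selected appear with a count of zero.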
2.6 Results
The survey had a mean completion rate of 55 percent. A total of 36 surveys were started, of
which 50% were completed. “Surveys Started” is the total number of responses collected. This
number includes responses that were submitted by the respondent and incomplete responses that
were collected by the system after the survey was closed. “Surveys Completed” is the number of
surveys submitted by respondents, meaning that the respondent was screened out of the survey
or reached the final page and clicked the submit button. This number does not include anyone
who did not submit a survey. It must be stressed that a completed survey does not necessarily
mean that the respondent answered all the questions.
Respondents to the survey were mostly active and experienced Wikip[m]edians. In gauging their
experience and understanding of biographies, editing and Wikipedia in general, the survey
found that about eighty-nine percent of respondents have been Wikip[m]edians for well over
one year and edit the encyclopedia at least once a week.
Eighty percent of respondents edited at least two to three times a week, while 82 percent
stated that they had been editing for more than two years.
Concerning existing analytical websites, the survey asked respondents whether they had had
prior interaction with the following tools: reportcard.wmflabs.org, datavis.wmflabs.org/where/,
datavis.wmflabs.org/agents/ and www.wikipediatrends.com/. Respondents were generally familiar
with the listed tools, while others indicated that they had used other tools, including
http://stats.wikimedia.org/ and http://stats.grok.se/. Seventy-six percent of respondents said
they have used analytical websites to observe graphical data trends. Twenty-nine percent use
such websites to decide what to edit on Wikipedia, while 18% have downloaded data from them to
do their own analysis. Others have used analytical websites “To share information about trends
and the situation right now” and as references for “Academic publishing”.
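Because respondents could tick more than one option for these questions, each percentage is a share of respondents rather than a share of ticks, so the figures can add up to more than 100%. The sketch below shows that calculation under this assumption. The helper name `multi_select_percentages` and the shortened option labels are hypothetical; the counts and the total of 17 responses are those reported for question 2 in the appendix.

```python
def multi_select_percentages(option_counts, total_respondents):
    """Percent of respondents who ticked each option of a multi-select question.

    Hypothetical helper: each count is divided by the number of respondents,
    not by the total number of ticks.
    """
    return {
        option: round(100 * count / total_respondents)
        for option, count in option_counts.items()
    }


# Counts as reported for question 2 (labels shortened for this sketch)
usage_counts = {
    "Observed graphical data trends": 13,
    "Downloaded datasets for own analysis": 3,
    "Used insights to decide what to edit": 5,
    "Other": 6,
}

percentages = multi_select_percentages(usage_counts, total_respondents=17)
for option, percent in percentages.items():
    print(f"{option:<40} {percent:>3}%")

# The shares overlap, so the column total is above 100
print("Sum of percentages:", sum(percentages.values()))
```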
Seventy-one percent of respondents thought either that gender inequality on Wikipedia exactly
reflected gender inequality in the real world or that gender inequality on Wikipedia was far
worse than in the real world. The wording of the question made it difficult for some
respondents to understand; one respondent stated, “I don't understand question 3 of this
survey”. Nonetheless, seventy-one percent of responses is enough to suggest that respondents
see the gender inequality situation on Wikipedia as no better than, and often worse than, that
in the real world. It is also an indication of how poorly respondents regard the gender
situation on Wikipedia and of their expectation that WIGI will help address it.
When asked what they were most interested in seeing in WIGI and which variables they would
like measured in the WIGI portal, eighty-nine percent of respondents said they would like to
see profession included. Only half thought WIGI should include a measurement of date of death.
At least seventy-two percent were interested in the variables “Date of Birth”, “Citizenship”,
“Ethnicity” and “Article in which Wikipedia language”. Twenty-eight percent expressed an
additional interest in seeing measurements of the following variables:
#    Proposed variable
1    Whether avowed feminist
2    Article length
3    Featured/Good Article status
4    Number of editors on each article
5    Comparison across languages
6    Missing features in article (such as no image, no sources, POV and other problem templates)
7    Censored edits/Reverts/Edit wars
8    Gender
9    Links to/from articles of the same and different genders (this would show how much of a walled garden gendered data is)
Respondents expressed not only a keen interest in knowing about gender inequality on Wikipedia
but also a willingness to act on it. All respondents (100%) stated that they are willing to
spend at least 10 minutes on an analytical website to understand gender inequality on
Wikipedia, and eighty-two percent are willing to spend several hours improving articles about
women. Such enthusiasm can probably come about through respondents deriving motivation from
the statistics presented to them on WIGI.
The survey went on to ask respondents what type of insights they would broadly like to gather
from WIGI. Below is a word cloud of what they wanted to gather:
More specifically, below is a table summary of their responses about insights they would like
to gather:
#    Insights respondents wish to gather from WIGI
1    Any
2    How big the problem of Inequality is
3    How to fix the problem of Inequality
4    Where the problem is most offensive?
5    Where there are less problems.
6    How to learn from successful projects.
7    How many known women edited the articles?
8    What articles about women have been put up for deletion and how many times?
9    The intersections of race, class, gender, and sexual orientation. For example, if there is a wider gap for Black woman than White woman.
10   Article length as a metric of how thorough the article is.
11   Number of sources in an article as a metric of how thorough the article is.
12   With reference to NPOV: how many ANTI-pornography / ANTI-pornography articles are there? (Note: How can Wikipedia be considered neutral if there is a WikiProject Pornography and a Pornography Portal but not the opposing POV?)
13   Affirmative action. There needs to be reflection upon the fact that women are again and again dropped from history writing. This systemic bias needs to be encountered.
14   Information about what professional fields are represented
15   Information about how gender varies in the different professional fields
16   Statistical network analysis of the ways in which men's and women's article reference each other as links, i.e. a dynamic network diagram that shows the links between men and women's articles, and to be able to limit the articles in those sets based on categories that they are in in order to explore representation by profession
17   Causes of inequality
18   Ideas for solutions to the problem of Inequality
19   Coverage of Ancient Greek and Roman women writers.
20   Bias due to lack of articles about women who are scientists, engineers, attorneys, or members of IEEE or ACM.
Finally, respondents were given the opportunity to leave comments at the end of the survey.
Among these comments, one respondent demanded a “position statement, from the WMF regarding
the gender gap of both articles and edits and the treatment of women and minority editors”.
Another respondent suggested shifting the focus of editathons from women in science, art and
architecture to women in sociology, philosophy, politics, economics and the care professions.
The same respondent also bemoaned the lack of projects centered on childcare and care of the
elderly.
Email and user talk pages were the most popular channels through which respondents chose to be
contacted again about WIGI, at 53% and 60% respectively. Eleven respondents agreed that we
could contact them further and provided their email addresses or Wikipedia usernames. Others
stated that they could always be reached through the Wikimedia Gender Gap Mailing List. The
list of respondents who will be followed up with for input on beta releases throughout the
development cycle of WIGI has been excluded from this report.
3.0 Conclusions
Participants in the survey proposed many more variables to be included and displayed on the
WIGI portal than initially envisioned. For lack of time to deliver according to the timeline,
this phase of the project is only able to show a limited number of the most frequently
proposed variables, those that would be most useful and beneficial to researchers and the
community at large. Below are the main things the portal will do:
View Graphs and Charts
  o Two modes:
      Current
      Changes since last week
  o Graphs
      By Gender
      By Country
      By Date of Death
      By Place of Birth
      By Culture (female % vs. total biographies)
      By Language of the Biography
      Overall view of citizen and place of birth
      World population compared
      Gender Range over time
      Article size/length (in bytes and word count)
      Celebrity Terms
  o Charts
      Comparing how WIGI ranks countries vs. other Gender indices
      By-Profession: a breakdown of the jobs the women written about have
Download Datasets
  o Navigate snapshot points
      Download current dataset
      Download dataset from user-specified timeframe
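To make the planned "female % vs. total biographies" graphs and the "Changes since last week" mode more concrete, the following is a minimal Python sketch of how such figures could be derived from two snapshots of biography records. It is an illustration under stated assumptions, not the WIGI implementation: the record layout (`gender` and `country` keys), the helper names `female_share_by_group` and `share_change`, and the sample records are all invented for this example.

```python
from collections import defaultdict


def female_share_by_group(records, key):
    """Percent of biographies tagged 'female' within each value of `key`.

    Hypothetical helper; records missing `key` are skipped.
    """
    totals = defaultdict(int)
    females = defaultdict(int)
    for record in records:
        group = record.get(key)
        if group is None:
            continue
        totals[group] += 1
        if record.get("gender") == "female":
            females[group] += 1
    return {group: 100 * females[group] / totals[group] for group in totals}


def share_change(current, previous):
    """Percentage-point change for groups present in both snapshots."""
    return {g: current[g] - previous[g] for g in current if g in previous}


# Made-up snapshot data, for illustration only
last_week = [
    {"gender": "female", "country": "Ghana"},
    {"gender": "male", "country": "Ghana"},
    {"gender": "male", "country": "Ghana"},
    {"gender": "male", "country": "Ghana"},
    {"gender": "female", "country": "Peru"},
    {"gender": "male", "country": "Peru"},
]
# One new biography of a woman is added between the two snapshots
this_week = last_week + [{"gender": "female", "country": "Ghana"}]

previous = female_share_by_group(last_week, "country")
current = female_share_by_group(this_week, "country")
change = share_change(current, previous)

for country in sorted(current):
    print(f"{country}: {current[country]:.1f}% female "
          f"({change[country]:+.1f} points since last snapshot)")
```

The same grouping function could be reused with any other key, such as a language or place-of-birth field, which is why the grouping key is passed as a parameter rather than hard-coded.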
References
Andrews, D., Nonnecke, B., & Preece, J. (2003). "Conducting Research on the Internet: Online
Survey Design, Development and Implementation Guidelines" (PDF). International
Journal of Human-Computer Interaction 16 (2): 185–210.
Babbie, E. (2013). The Practice of Social Research. Belmont, Calif.: Wadsworth Cengage
Learning.
Bascos-Deveza, T. "Quantifying qualitative data from expectation surveys: how well do
expectation surveys forecast inflation?" (PDF). IFC Bulletin (34).
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2008). Internet, Mail, and Mixed-Mode
Surveys: The Tailored Design Method. Wiley; 3rd edition. p. 512.
ISBN-13: 978-0471698685.
Sax, L., Gilmartin, S., Lee, J. J., & Hagedorn, L. (May 2003). Using Web Surveys to
Reach Community College Students: An Analysis of Response Rates and Response
Bias (PDF). Association of Institutional Research. p. 27.
Vaughan, M. E. (2011). The Design, Fabrication, and Modeling of a Piezoelectric Linear
Motor. Master's thesis, Virginia Polytechnic Institute and State University.
Appendix
1.1 Survey Questions and Responses
My Report
Last Modified: 05/31/2015
1/11. Which Wikipedia Analytic Website tools have you used? (You may tick more than 1)
#    Answer                         Response    %
1    reportcard.wmflabs.org         8           47%
2    datavis.wmflabs.org/where/     4           24%
3    datavis.wmflabs.org/agents/    3           18%
4    www.wikipediatrends.com/       7           41%
5    Other                          8           47%
Other
None
None
http://stats.wikimedia.org/
http://stats.grok.se/
none
lots of others
none
Statistic Value
Min Value 1
Max Value 5
Total Responses 17
2/11. How have you used the tools? (You may tick more than 1)
#    Answer                                                              Response    %
1    I have observed the graphical data trends                           13          76%
2    I have downloaded the datasets for my own analysis                  3           18%
3    I have used the insights to help me decide what articles to edit    5           29%
4    Other                                                               6           35%
Other
N/A
-
To share information about trends and the situation right now
none
academic publishing
none
Statistic Value
Min Value 1
Max Value 4
Total Responses 17
3/11. How well do you think Gender Inequality on Wikipedia Biographies reflects Gender Inequality in
the real World?
#    Answer            Response    %
1    Far too Much      7           33%
2    Too Much          3           14%
3    About Right       5           24%
4    Too Little        5           24%
5    Far too Little    1           5%
     Total             21          100%
Statistic Value
Min Value 1
Max Value 5
Mean 2.52
Variance 1.76
Standard Deviation 1.33
Total Responses 21
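The summary statistics in this table treat each answer option as its numeric code (1 for "Far too Much" through 5 for "Far too Little"). The short Python sketch below recomputes them from the reported counts. It assumes the survey tool used the sample variance (dividing by n - 1); that choice is an assumption, not something the report states, although it does reproduce the published figures.

```python
import statistics

# Answer code -> number of respondents, as reported for question 3
counts = {1: 7, 2: 3, 3: 5, 4: 5, 5: 1}

# Expand the counts into one coded value per respondent
coded = [code for code, n in counts.items() for _ in range(n)]

mean = statistics.mean(coded)
variance = statistics.variance(coded)  # sample variance, divides by n - 1
std_dev = statistics.stdev(coded)      # square root of the sample variance

print(f"Total responses:    {len(coded)}")
print(f"Mean:               {mean:.2f}")
print(f"Variance:           {variance:.2f}")
print(f"Standard deviation: {std_dev:.2f}")
```

Since the codes are ordinal labels rather than measured quantities, the mean is best read as a rough indicator of where answers cluster on the scale.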
4/11. Which of these variables are you interested in and would like to see included in the inequality study?
#    Answer                                 Response    %
1    Date of Birth                          13          72%
2    Date of Death                          9           50%
3    Citizenship                            14          78%
4    Ethnicity                              13          72%
5    Profession                             16          89%
6    Article in which Wikipedia language    14          78%
7    Other                                  5           28%
Other
whether avowed feminist
Article length, Featured/Good Article status, Number of editors on each article, Comparison across
languages, Missing features in article (such as no image, no sources, POV and other problem templates),
Censored edits/Reverts/Edit wars
gender
Links to/from articles of the same and different genders (this would show how much of a walled garden
gendered data is)
Statistic Value
Min Value 1
Max Value 7
Total Responses 18
5/11. How much time would you like to spend on an Analytical Website in order to understand Gender
Inequality on Wikipedia?
#    Answer                      Response    %
1    Not more than 5 minutes     0           0%
2    Not more than 10 minutes    4           25%
3    Not more than 30 minutes    4           25%
4    About an hour               5           31%
5    Several hours               3           19%
     Total                       16          100%
Statistic Value
Min Value 2
Max Value 5
Mean 3.44
Variance 1.20
Standard Deviation 1.09
Total Responses 16
6/11. How much time would you like to spend writing or improving Wikipedia articles about women?
#    Answer                      Response    %
1    Not more than 5 minutes     2           12%
2    Not more than 10 minutes    0           0%
3    Not more than 30 minutes    1           6%
4    About an hour               0           0%
5    Several hours               14          82%
     Total                       17          100%
Statistic Value
Min Value 1
Max Value 5
Mean 4.41
Variance 1.88
Standard Deviation 1.37
Total Responses 17
7/11. What insights would you be most interested in gathering from Wikipedia Gender Inequality Index tool?
Text Response
Any.

How many known women edited the articles. What articles about women have been put up for deletion and how many times?

The intersections of race, class, gender, and sexual orientation. I want to know if, for example, there is a wider gap for Black woman than White woman. I would also be interested in article length, number of sources, etc. as a metric of how thorough the article is.

Sliced and diced every way possible. I'm presenting a talk at Wikimania Mexico City on "Content Gender Gap". This information would be extremely helpful for me.

With reference to NPOV: how many ANTI-pornography / ANTI-pornography articles are there? (Note: How can Wikipedia be considered neutral if there is a WikiProject Pornography and a Pornography Portal but not the opposing POV?)

There needs to be reflection upon the fact that women are again and again dropped from history writing. This systemic bias needs to be encountered. What about affirmative action in Wikipedia?

See question 4. How big the problem is. How to fix it. Where the problem is most offensive? Where there are less problems. How to learn from successful projects.

insights about professions - less interested in biographies than information about what fields are represented and how gender varies in different fields

I have been dreaming (for a while since the big "Women novelists" hullabullo in the press, to do a statistical network analysis of the ways in which men's and women's article reference eachother as links. I think it would be really powerful to have a dynamic network diagram that shows the links between men and women's articles, and to be able to limit the articles in those sets based on categories that they are in in order to explore representation by profession (I think this might be an incentive for people in underepreresented communities in professions like history or literature to contribute to Wikipedia)

causes of inequality and ideas for solutions

I am most interested in coverage of Ancient Greek and Roman women writers. I suspect that lack of coverage is most closely related to lack of coverage in ancient source material, but could also be influenced by the relative lack of female contributors to Wikipedia and lack of interest in Doric Greek culture and literature.

Bias due to lack of articles about women who are scientists, engineers, attorneys or members of IEEE or ACM.

Statistic Value
Total Responses 12
8/11. Your experience with Wikipedia. How long have you been editing on Wikipedia?
#    Answer                    Response    %
1    Not more than 3 months    2           12%
2    Not more than 6 months    0           0%
3    Not more than 1 year      0           0%
4    Not more than 2 years     1           6%
5    Over 2 years              14          82%
     Total                     17          100%
Statistic Value
Min Value 1
Max Value 5
Mean 4.47
Variance 1.76
Standard Deviation 1.33
Total Responses 17
9/11. How often do you edit Wikipedia?
#    Answer                    Response    %
1    Never                     1           6%
2    Less than Once a Month    1           6%
3    Once a Month              0           0%
4    2-3 Times a Month         0           0%
5    Once a Week               1           6%
6    2-3 Times a Week          4           24%
7    Daily                     10          59%
     Total                     17          100%
Statistic Value
Min Value 1
Max Value 7
Mean 6.00
Variance 3.25
Standard Deviation 1.80
Total Responses 17
10/11. Would you like to hear more about the WIGI project? Please tell us how to reach you. (Contacts removed)
#    Answer                       Response    %
1    Email                        8           53%
2    Wikimedia Username           9           60%
3    Instant messaging            1           7%
4    Schedule a Telephone call    2           13%
5    Schedule a Video call        2           13%
6    Other                        2           13%
Statistic Value
Min Value 1
Max Value 6
Total Responses 15
11/11. You have reached the end of the survey. Is there anything else you might like to add? Please feel
free to leave your comments and Click on the arrow below >> to submit when you are done.
Text Response
I would like to see more action, or perhaps even a position statement, from the WMF regarding the gender gap of both articles and edits and the treatment of women and minority editors. There are a number of long-term users (e.g., Eric Corbett) and even admins (e.g., Drmies on my user talk page today) who feel it necessary to be antagonistic toward any others the see as "activists" or minorities. It creates a toxic environment.

This is very important work. I'm looking forward to learning more about it, and seeing the results.

Editathons have focussed on women in science, arts (including writing), and more recently architecture. I would like to see more editathons for women in sociology, philosophy, politics, economics (including feminist economics / development economics), care professionals. There are currently no projects centered around child care, care of the elderly etc., that is quite shameful after all the years Wikipedia has been operating.

I got an error message when I tried to leave questions unanswered.

This is such a good opportunity to find out so many things. Thank you for your work.

I have some experience in visualization and other research concepts through the Digital Humanities, and would love to help brainstorm some of your goals/methods. I think one of the most interesting tools from a gender gap item, would be the ability to see how a network of gendered articles become walled gardens (or interconnect with other concepts, or get mentioned in survey articles etc). Too many of the convresations about the gender gap relate to purely quantitative differences between reprsentation: the bigger problem are more of the qualitative questions, especially if we want to advance actual gender representation equalty (how fleshed out is the article? what does it connect to? How often does it get viewed? )

I don't understand question 3 of this survey, "Q3. How well do you think Gender Inequality on Wikipedia Biographies reflects Gender Inequality in the real World?"

I was encouraged to see Wagner et al. assessing gender inequality earlier this year and thought that their model for assessing visibility bias could be adapted to a dataset. Do you think the WIGI project could include a module devoted to gender representation on the main page? I'm envisioning a dataset for biographies stovepiped by section (DYK, ITN, etc.) as well as time.

Statistic Value
Total Responses 8
1.2 Links to Survey Data
The raw data CSV and SPSS files are located here:
https://drive.google.com/file/d/0B0jCIl911eXacS1HSk01dTBQZTg/view?usp=sharing
Basic Survey stats are available here:
https://drive.google.com/file/d/0B0jCIl911eXaejR1ak9pRzd0QTA/view?usp=sharing
Cover page Photo: Maximilianklein, CC-BY-SA 3.0
Source: https://meta.wikimedia.org/wiki/File:Wigi_map.png
For more information about the survey and data, contact [User:Masssly], WIGI Research
Team bottom-liner, at [email protected]