Wikipedia:Statistics Department
From Wikipedia, the free encyclopedia
Related pages include: Wikipedia:WikiProject Wikidemia, m:Statistics, Wikipedia:Size comparisons, Wikipedia:Largest encyclopedia...
This project, the Statistics Department, provides a space for contributors interested in statistics to discuss what to measure, when, and how.
This page and project are still very preliminary. If you would like to help, please add your name below and introduce yourself on the talk page. The to-do list below is just a start...
Title
Statistics Department
Scope
This WikiProject aims primarily to design, implement, and discuss the collection of statistics about Wikipedia content, metacontent, contributors, and visitors. We seek to better understand how people use Wikipedia and its community, and what is most useful to them. We also seek to explore new ways of streamlining the generation of timely statistics.
Participants
Please add your name here!
- +sj + 20:21, 7 Dec 2004 (UTC)
- Tobacman 23:18, 8 Dec 2004 (UTC)
- Wile E. Heresiarch 21:02, 5 Jan 2005 (UTC)
- Joo + 7 Apr 2005
- --Alterego 22:19, Jun 18, 2005 (UTC)
- Doppelganger 18:08, 23 Apr 2005 (UTC)
- Quinobi 15:32, 23 Jun 2005 (UTC)
- Dude 03:12, 5 March 2006 (UTC)
- bluesnj
- Prairie_Dad
- odoketa 12 Dec 2006
- AttishOculus 06:14, 11 January 2007 (UTC)
- hzenilc 23 March 2007 (UTC)
Completed Statistics
- See WP:S, m:Statistics, and more.
Research Questions
Context
- What experts come to the site?
- What makes people feel they are "in a position to contribute"? (as one person recently put it to me. +sj +)
- How can exposing gaps in coverage encourage contribution?
- Who comes to the site expecting garbage? What is their experience like?
- Who comes expecting a world of perfect, free content? What is theirs like?
Contribution
- Who contributes to Wikipedia, when during the day/week, and how often?
- What causes sudden spikes in readers, contributors, vandals?
- Are there patterns in the contributions, e.g. contributors' age, gender, race, and nationality versus the categories they edit?
- What motivated the top contributors? E.g. repute, reciprocity, altruism, relationships, roles? Free content, neutrality, software design, democracy, community, others?
- How are the quality, validity and reliability of content maintained? By whom, and to what extent?
- How does server load affect user activity, for example in the hours/days after a slowdown?
- Where (on Earth!) are the contributors? Are contributors to en.wikipedia in English-speaking countries; Spanish/Portuguese-language contributors in Iberia, Latin America, or elsewhere; German-language contributors in Germany, Austria, Switzerland, or elsewhere; and so on?
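For the first question in this list (who contributes, and when during the day or week), the tallying step could look something like the minimal sketch below. It assumes edit timestamps have already been exported as ISO 8601 UTC strings such as "2005-06-18T22:19:00Z"; that input format and the name `edit_activity` are hypothetical. Because the hours are UTC, they are not contributors' local hours; the "where" question above would need an answer before converting them.

```python
from collections import Counter
from datetime import datetime

# Short weekday labels indexed by datetime.weekday() (Monday == 0).
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def edit_activity(timestamps):
    """Tally edits by UTC hour of day and by weekday.

    Hypothetical helper: `timestamps` is assumed to be an iterable of
    ISO 8601 UTC strings like "2005-06-18T22:19:00Z".
    """
    by_hour = Counter()
    by_weekday = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        by_hour[t.hour] += 1
        by_weekday[WEEKDAYS[t.weekday()]] += 1
    return by_hour, by_weekday


if __name__ == "__main__":
    sample = [
        "2005-06-18T22:19:00Z",  # a Saturday
        "2005-06-18T22:45:10Z",
        "2005-06-20T03:12:00Z",  # a Monday
    ]
    hours, days = edit_activity(sample)
    print(sorted(hours.items()))  # [(3, 1), (22, 2)]
    print(sorted(days.items()))   # [('Mon', 1), ('Sat', 2)]
```

Plotting the two counters as histograms would also make sudden spikes in activity easy to spot.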
Promoting Readership/Consumption
- Who reads Wikipedia articles, and when?
- What linkpaths do they follow through the site?
- What are common first pages visited?
- What are common pages visited from the Main Page?
- How have changes to the Recent Changes page and the Main Page historically affected user clickthroughs from those pages?
- How often do anonymous visitors/readers (or visitors from Google/Yahoo) visit pages like Recent Changes, Random article, or the Community Portal?
- What are the readers' ratings of the quality or usefulness of each page?
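Questions like "What are common first pages visited?" reduce to counting the earliest page in each visit once the server logs have been grouped into sessions. The sketch below assumes that grouping has already been done (how to define a session, e.g. by an inactivity timeout, is itself a methodological choice); the record layout and the name `common_first_pages` are hypothetical. The same per-session ordering would also be a starting point for reconstructing the link paths readers follow.

```python
from collections import Counter


def common_first_pages(hits, top=10):
    """Return the most common first page seen across visitor sessions.

    Hypothetical helper: `hits` is assumed to be an iterable of
    (session_id, timestamp, page_title) tuples in which sessions have
    already been defined and timestamps are mutually comparable.
    """
    first = {}  # session_id -> (earliest timestamp, page viewed at that time)
    for session, ts, page in hits:
        if session not in first or ts < first[session][0]:
            first[session] = (ts, page)
    return Counter(page for _, page in first.values()).most_common(top)


if __name__ == "__main__":
    # Hypothetical log records, not real traffic data.
    sample = [
        ("a", 2, "Main_Page"),
        ("a", 1, "Physics"),
        ("b", 5, "Main_Page"),
        ("c", 3, "Main_Page"),
        ("c", 4, "Physics"),
    ]
    print(common_first_pages(sample))  # [('Main_Page', 2), ('Physics', 1)]
```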
Curtailing Mischief
- How can we quantify vandalism? Trolling?
- How many admins are online at a given time?
- How does the number of admins online relate to the amount of vandalism that takes place?
- Are vandals deterred by quick response times?
- How effective are bans and blocks? How often do vandals come back right away as anonymous users or from another IP address?
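A simple first pass at relating the number of admins online to the amount of vandalism is a Pearson correlation between paired hourly counts. This is a sketch under assumptions: the hourly series, how a "vandal edit" is operationalised (for instance, edits later reverted as vandalism), and the name `pearson_r` are all hypothetical. A correlation alone cannot show that admins deter vandals, since both counts may simply track overall site traffic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical hourly counts, not real Wikipedia data.
    admins_online = [1, 2, 3, 4]
    vandal_edits = [8, 6, 4, 2]
    print(pearson_r(admins_online, vandal_edits))  # -1.0
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient.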
Processes
- How do different people add content? (Elaboration needed: what does this mean, other than using "Edit this page"?)
- Slow vs. fast contributors; people who write offline vs. online
- How many use offline editors, and upload in blocks?
- How many people migrate content from other free repositories to Wikimedia sites?
  - e.g. photos (to Commons), text (to Wikisource)
Methodology
This section should cover how the research data will be collected and analysed, not Wikipedia context or processes (those have been moved to the sections above).
Data Collection
- Webalizer statistics
- Add optional fields to every member's profile form for age, gender, race, and nationality (perhaps with a privacy option, so that the system can collect the data without making it visible to the general public)
- Polls for all in Community Portal
- Surveys/Interviews of top contributors
- Constructs needed for the different motivational factors
Data Analysis
- Define and select uniform data structures and software (e.g. SPSS, SAS)
- Define variables
  - Outcome measures
- Correlational designs
- t-tests
- ANOVA/MANOVA (for comparing group means)
  - Post hoc tests (e.g. Fisher's least significant difference, LSD)
- Factor analysis
- Non-parametric tests (e.g. chi-square)
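As an illustration of the non-parametric item, a chi-square test of independence could check whether two categorical variables are related, such as a contributor attribute from the optional profile fields and the category of articles edited. The sketch computes the Pearson statistic and degrees of freedom for any contingency table, plus an exact p-value for the one-degree-of-freedom (2×2) case, where P(χ² > x) = erfc(√(x/2)). The function names and counts are hypothetical. For larger tables, SciPy's `scipy.stats.chi2_contingency` returns the p-value directly; note that it applies Yates' continuity correction by default when there is one degree of freedom, so its statistic will differ from the uncorrected one below.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts (hypothetical layout);
    every row and column total is assumed to be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts: rows = two contributor groups, columns = two article categories.
    observed = [[10, 20], [20, 10]]
    stat, dof = chi_square_independence(observed)
    print(round(stat, 3), dof)  # 6.667 1
    if dof == 1:
        print(round(p_value_one_dof(stat), 4))
```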
Research Schedule
- Jul05:
- Sep05:
- Dec05:
- Jan06:
Caveats?
- Privacy
- Consent to participate in certain surveys
- Feedback effects of certain metrics (e.g. edit count) via social loops (people editing for the sake of their edit count)