EditSubmitRatios

From Wikimedia Usability Initiative

Squid Log Scan Fact Sheet

The idea is to extract, from the 1:1000 sampled Squid log, the edit and submit calls that are relevant to the usability project, and to track the edit/save ratio of those calls over time. A call is kept only if it meets all of the following criteria:

  1. Only script index.php
  2. Only mime type "text/html"
  3. Only article namespace 0 (few newbies edit other namespaces)
  4. No failed calls
  5. No edit calls that result from clicking a link to a non-existent page (a red link): probably some 95% of these are unintentional, made by people who do not know what a red link implies
  6. No requests issued by bots
  7. No incomplete log records (e.g. where the destination wiki is masked)
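The seven criteria can be sketched as a single Python predicate. This is a minimal illustration, not the script actually used: the LogRecord fields, the assumption that namespace and wiki have already been resolved from the raw log line, and the rule that TCP_DENIED or an HTTP status of 400 and up counts as "failed" are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class LogRecord:
    # Hypothetical, already-parsed view of one sampled Squid log line;
    # the field names are illustrative, not the real log layout.
    url: str                  # full request URL
    mime_type: str            # e.g. "text/html"
    status: str               # e.g. "TCP_MISS/200"
    agent: str                # user agent string
    namespace: Optional[int]  # namespace number, assumed resolved elsewhere
    wiki: Optional[str]       # e.g. "wp:en"; None when masked in the log


def is_relevant(rec: LogRecord, is_bot: Callable[[str], bool]) -> bool:
    parts = urlsplit(rec.url)
    query = parse_qs(parts.query)
    if not parts.path.endswith("/index.php"):            # 1. index.php only
        return False
    if query.get("action", [""])[0] not in ("edit", "submit"):
        return False
    if rec.mime_type != "text/html":                     # 2. text/html only
        return False
    if rec.namespace != 0:                               # 3. article namespace
        return False
    squid_code, _, http_code = rec.status.partition("/")
    # 4. Assumption: TCP_DENIED or HTTP status >= 400 means "failed".
    if squid_code == "TCP_DENIED" or not http_code.isdigit() or int(http_code) >= 400:
        return False
    if query.get("redlink") == ["1"]:                    # 5. red-link clicks
        return False
    if is_bot(rec.agent):                                # 6. bots
        return False
    return rec.wiki is not None                          # 7. incomplete records


rec = LogRecord("http://en.wikipedia.org/w/index.php?title=Foo&action=edit",
                "text/html", "TCP_MISS/200", "Mozilla/5.0", 0, "wp:en")
print(is_relevant(rec, lambda agent: False))  # True
```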

The following are the frequencies with which index.php result codes occur in the 1:1000 sampled Squid logs, covering just over 6 months:

Note: the counts need updating; criteria 1-3 were not yet applied.

TCP_DENIED/403,action=edit  321390 
TCP_DENIED/403,action=submit    33
.
TCP_MISS/000,action=edit      7352
TCP_MISS/000,action=submit    1186
.
TCP_MISS/200,action=edit    800200
TCP_MISS/200,action=submit   75768
TCP_MISS/206,action=edit        20
TCP_MISS/206,action=submit     269
.
TCP_MISS/301,action=edit       662
TCP_MISS/302,action=edit    184217
TCP_MISS/302,action=submit  116141
.
TCP_MISS/400,action=edit         6
TCP_MISS/403,action=edit      2746
TCP_MISS/404,action=edit       119
TCP_MISS/404,action=submit     206
TCP_MISS/417,action=edit        53
TCP_MISS/417,action=submit     716
.
TCP_MISS/500,action=edit       362
TCP_MISS/500,action=submit      81
TCP_MISS/502,action=submit      87
TCP_MISS/503,action=edit         7
TCP_MISS/503,action=submit    5878
TCP_MISS/504,action=edit        53
TCP_MISS/504,action=submit      91

Here are the relevant HTTP status codes from the Squid FAQ (see also the W3C HTTP specification):

 000 Used mostly with UDP traffic.
 200 OK
 206 Partial Content
 301 Moved Permanently
 302 Moved Temporarily
 400 Bad Request
 403 Forbidden
 404 Not Found
[417 Expectation Failed]
 500 Internal Server Error
 502 Bad Gateway
 503 Service Unavailable
 504 Gateway Timeout
Discard errors
  • TCP_DENIED results are almost exclusively bad requests from ill-behaved browsers.
  • Status codes of 400 or above mark valid but failed requests, e.g. the Squid server could not pass the request on to a backend server or service (TCP_MISS/503).
Discard cruft
  • Status codes that occur in less than 1% of all requests (000, 206 and 301) have not been investigated further and are discarded.
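The 1% cut-off can be checked against the frequency table above. This Python sketch sums the counts per HTTP status code, skips the codes already discarded as errors, and reports the remaining codes whose share of all requests stays below 1%; on these counts that yields 000, 206 and 301.

```python
from collections import Counter

# (Squid result, action, count), copied from the frequency table above;
# these counts were taken before criteria 1-3 were applied.
TABLE = [
    ("TCP_DENIED/403", "edit", 321390), ("TCP_DENIED/403", "submit", 33),
    ("TCP_MISS/000", "edit", 7352), ("TCP_MISS/000", "submit", 1186),
    ("TCP_MISS/200", "edit", 800200), ("TCP_MISS/200", "submit", 75768),
    ("TCP_MISS/206", "edit", 20), ("TCP_MISS/206", "submit", 269),
    ("TCP_MISS/301", "edit", 662),
    ("TCP_MISS/302", "edit", 184217), ("TCP_MISS/302", "submit", 116141),
    ("TCP_MISS/400", "edit", 6), ("TCP_MISS/403", "edit", 2746),
    ("TCP_MISS/404", "edit", 119), ("TCP_MISS/404", "submit", 206),
    ("TCP_MISS/417", "edit", 53), ("TCP_MISS/417", "submit", 716),
    ("TCP_MISS/500", "edit", 362), ("TCP_MISS/500", "submit", 81),
    ("TCP_MISS/502", "submit", 87),
    ("TCP_MISS/503", "edit", 7), ("TCP_MISS/503", "submit", 5878),
    ("TCP_MISS/504", "edit", 53), ("TCP_MISS/504", "submit", 91),
]


def rare_codes(rows, threshold=0.01):
    """HTTP codes that are not errors but occur in < threshold of all requests."""
    total = sum(count for _, _, count in rows)
    per_code = Counter()
    for result, _action, count in rows:
        squid_code, http_code = result.split("/")
        if squid_code == "TCP_DENIED" or int(http_code) >= 400:
            continue  # already dropped as errors
        per_code[http_code] += count
    return sorted(code for code, n in per_code.items() if n / total < threshold)


print(rare_codes(TABLE))  # ['000', '206', '301']
```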

This first selection leaves:

TCP_MISS/200,action=edit    800200 [1]
TCP_MISS/302,action=edit    184217 [2]
.
TCP_MISS/200,action=submit   75768 [3]
TCP_MISS/302,action=submit  116141 [4]
Discarding clicks to non-existent pages

Such clicks are recognized by 'redlink=1' in the URL. Discarding them leaves:

TCP_MISS/200,action=edit   434602 [1]
TCP_MISS/302,action=edit      492 [2]
.
TCP_MISS/200,action=submit  75768 [3]
TCP_MISS/302,action=submit 116141 [4]

Explanation: almost all TCP_MISS/302 edit calls (2) are the result of clicking a missing link, and so are roughly half of the TCP_MISS/200 edit calls (1).

So a click on a missing link can result in either a 200 or a 302, which was initially somewhat confusing. On further inspection this turns out to depend on the language of the wiki (a configuration setting?).

Here are the counts for the wikis with the most hits:
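MediaWiki appends redlink=1 to the edit URL behind a red link, which is what makes this filter possible. A minimal Python sketch of the test; parsing the query string instead of searching the raw URL text is a choice of this sketch, so that only the actual redlink parameter matches.

```python
from urllib.parse import parse_qs, urlsplit


def is_redlink_click(url: str) -> bool:
    """True if an edit URL carries the redlink=1 parameter."""
    query = parse_qs(urlsplit(url).query)
    return query.get("redlink") == ["1"]


print(is_redlink_click(
    "http://de.wikipedia.org/w/index.php?title=Foo&action=edit&redlink=1"))  # True
print(is_redlink_click(
    "http://de.wikipedia.org/w/index.php?title=Foo&action=edit"))            # False
```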

 30108,wp:de,action=edit,redlink=1,TCP_MISS/200
  1759,wp:de,action=edit,redlink=1,TCP_MISS/302
  6756,wp:en,action=edit,redlink=1,TCP_MISS/200
129361,wp:en,action=edit,redlink=1,TCP_MISS/302
 56032,wp:es,action=edit,redlink=1,TCP_MISS/200
  2332,wp:es,action=edit,redlink=1,TCP_MISS/302
 23094,wp:fr,action=edit,redlink=1,TCP_MISS/200
   493,wp:fr,action=edit,redlink=1,TCP_MISS/302
 19301,wp:it,action=edit,redlink=1,TCP_MISS/200
   646,wp:it,action=edit,redlink=1,TCP_MISS/302
 49288,wp:ja,action=edit,redlink=1,TCP_MISS/200
  5569,wp:ja,action=edit,redlink=1,TCP_MISS/302
  9585,wp:nl,action=edit,redlink=1,TCP_MISS/200
   558,wp:nl,action=edit,redlink=1,TCP_MISS/302
 13210,wp:pl,action=edit,redlink=1,TCP_MISS/200
   121,wp:pl,action=edit,redlink=1,TCP_MISS/302
 18140,wp:pt,action=edit,redlink=1,TCP_MISS/200
   318,wp:pt,action=edit,redlink=1,TCP_MISS/302
 26393,wp:ru,action=edit,redlink=1,TCP_MISS/200
  1345,wp:ru,action=edit,redlink=1,TCP_MISS/302
   398,wp:zh,action=edit,redlink=1,TCP_MISS/200
 15875,wp:zh,action=edit,redlink=1,TCP_MISS/302
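The language dependence shows up clearly when computing, per wiki, the share of redlink edit calls that ended in a 302. A minimal Python sketch using the counts above: only wp:en (about 95%) and wp:zh (about 98%) answer most redlink clicks with a redirect, while the other wikis mostly answer with a 200.

```python
# (wiki, redlink hits answered with 200, redlink hits answered with 302),
# copied from the table above.
REDLINK_COUNTS = [
    ("wp:de", 30108, 1759), ("wp:en", 6756, 129361),
    ("wp:es", 56032, 2332), ("wp:fr", 23094, 493),
    ("wp:it", 19301, 646), ("wp:ja", 49288, 5569),
    ("wp:nl", 9585, 558), ("wp:pl", 13210, 121),
    ("wp:pt", 18140, 318), ("wp:ru", 26393, 1345),
    ("wp:zh", 398, 15875),
]


def share_302(rows):
    """Fraction of redlink edit calls per wiki that ended in a 302."""
    return {wiki: n302 / (n200 + n302) for wiki, n200, n302 in rows}


shares = share_302(REDLINK_COUNTS)
for wiki, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{wiki}  {share:6.1%}")
```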
Discard previews and unsuccessful saves

When a user clicks the 'Save Page' or 'Preview' button, script index.php is invoked with the parameter 'action=submit'.

The extra information that tells index.php how to respond (i.e. which button was clicked) is not available in the Squid log, but much can be learned from the result code:

TCP_MISS/200 (3): result of a preview request or an unsuccessful save (e.g. an edit conflict)
TCP_MISS/302 (4): result of a successful save, where the user is redirected to the saved page
Discarding (3) leaves:
TCP_MISS/200,action=edit   434602 [1]
TCP_MISS/302,action=edit      492 [2]
.
TCP_MISS/302,action=submit 116141 [4]
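The interpretation of result codes used so far fits in a small lookup. This is a sketch only: the category labels are hypothetical names, and the mapping encodes nothing beyond what the text infers from the result code.

```python
from typing import Optional


def classify(action: str, status: str) -> Optional[str]:
    """Map an (action, Squid result) pair to the categories used on this page.

    Returns None for combinations that are discarded.
    """
    if action == "edit" and status == "TCP_MISS/200":
        return "edit"               # (1) edit screen opened
    if action == "submit" and status == "TCP_MISS/302":
        return "save"               # (4) successful save, redirect to page
    if action == "submit" and status == "TCP_MISS/200":
        return "preview_or_failed"  # (3) preview, or e.g. an edit conflict
    return None                     # e.g. (2) and everything filtered earlier


print(classify("submit", "TCP_MISS/302"))  # save
```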
Discard further cruft (2)
TCP_MISS/200,action=edit   434602 [1]
TCP_MISS/302,action=submit 116141 [4]
Breakdown by User/Bot
bot=N,TCP_MISS/200,action=edit   300183
bot=Y,TCP_MISS/200,action=edit   134419
.
bot=N,TCP_MISS/302,action=submit  83814
bot=Y,TCP_MISS/302,action=submit  32327

Code used to detect bots is roughly this:

if agent string contains 'http://'                       # url only allowed/expected with bots
and agent string does not contain 'bsalsa.com'           # exception: bsalsa messed up, see http://www.bsalsa.com/forum/showthread.php?t=724
and agent string does not contain MSIE + version number  # most likely false positives
then bot is true

if agent contains 'bot' or 'crawl(er)' or 'spider' 
then bot is true
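A runnable Python version of the rules above, as a minimal sketch. Two details are assumptions, since the pseudocode leaves them open: "MSIE + version number" is read as the regular expression "MSIE \d", and the keyword rule is matched case-insensitively.

```python
import re

# "MSIE" followed by a version number, e.g. "MSIE 7.0" (assumed pattern).
MSIE_VERSION = re.compile(r"MSIE \d")
# Keyword rule; case-insensitive matching is an assumption of this sketch.
BOT_KEYWORDS = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def is_bot(agent: str) -> bool:
    # Rule 1: a URL in the agent string is expected only from bots,
    # except for bsalsa.com and agents that also carry an MSIE version
    # (most likely false positives).
    if ("http://" in agent
            and "bsalsa.com" not in agent
            and not MSIE_VERSION.search(agent)):
        return True
    # Rule 2: typical bot keywords ("crawl" also matches "crawler").
    return bool(BOT_KEYWORDS.search(agent))


print(is_bot("Googlebot/2.1 (+http://www.google.com/bot.html)"))     # True
print(is_bot("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"))  # False
```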

Note: the large edit:save ratio for bots (134419:32327, about 4.2, against about 3.6 for users) obviously has nothing to do with usability. Possibly some bots open the edit page without intending to save, e.g. to harvest the raw wikitext for infobox parameters or the list of templates used.

Discard bots
bot=N,TCP_MISS/200,action=edit   300183
bot=N,TCP_MISS/302,action=submit  83814
Count of edits and submits, and their ratio, per wiki (shown only where edits + submits total 100 or more)
wk:lt         edits    133, submit      3, ratio  44.3
wx:mw         edits    411, submit     30, ratio  13.7
wk:es         edits    194, submit     24, ratio   8.1
wp:id         edits   1758, submit    233, ratio   7.5
wp:ms         edits    530, submit     71, ratio   7.5
wb:de         edits    245, submit     34, ratio   7.2
wb:pt         edits    125, submit     18, ratio   6.9
wp:sw         edits    239, submit     41, ratio   5.8
wp:af         edits    154, submit     27, ratio   5.7
wx:meta       edits    449, submit     86, ratio   5.2
wb:es         edits    108, submit     22, ratio   4.9
wp:ja         edits  13904, submit   2813, ratio   4.9
wp:pt         edits   8977, submit   1875, ratio   4.8
wb:fr         edits     82, submit     18, ratio   4.6
wp:es         edits  18430, submit   4038, ratio   4.6
wp:eo         edits    420, submit     93, ratio   4.5
wk:no         edits    119, submit     27, ratio   4.4
wp:fa         edits   1220, submit    275, ratio   4.4
wp:el         edits    636, submit    148, ratio   4.3
wp:tl         edits    287, submit     69, ratio   4.2
wk:ru         edits    243, submit     66, ratio   3.7
wp:ro         edits    956, submit    260, ratio   3.7
wp:th         edits    949, submit    265, ratio   3.6
wp:bs         edits    114, submit     33, ratio   3.5
wp:ta         edits    127, submit     36, ratio   3.5
wb:en         edits    439, submit    131, ratio   3.4
wk:it         edits     82, submit     24, ratio   3.4
wp:vi         edits    845, submit    257, ratio   3.3
wk:de         edits    205, submit     66, ratio   3.1
wk:fr         edits    577, submit    193, ratio   3.0
wp:sq         edits    116, submit     39, ratio   3.0
wp:hr         edits    478, submit    163, ratio   2.9
wp:bg         edits    705, submit    256, ratio   2.8
wp:da         edits    614, submit    222, ratio   2.8
wp:mr         edits     84, submit     30, ratio   2.8
wp:pl         edits   4603, submit   1635, ratio   2.8
wp:tr         edits   2803, submit    992, ratio   2.8
wp:ar         edits   1852, submit    689, ratio   2.7
wp:en         edits  88576, submit  32352, ratio   2.7
wp:ko         edits   1670, submit    618, ratio   2.7
wp:nl         edits   4226, submit   1565, ratio   2.7
wp:sk         edits    330, submit    121, ratio   2.7
wp:de         edits  14306, submit   5505, ratio   2.6
wp:ru         edits   9094, submit   3541, ratio   2.6
wp:sr         edits    620, submit    242, ratio   2.6
wp:zh         edits   3273, submit   1254, ratio   2.6
wk:en         edits   1184, submit    477, ratio   2.5
wp:cs         edits   1005, submit    408, ratio   2.5
wp:lv         edits    143, submit     57, ratio   2.5
wx:species    edits    221, submit     90, ratio   2.5
wk:pt         edits    156, submit     64, ratio   2.4
wp:fr         edits  10463, submit   4433, ratio   2.4
wp:lb         edits     79, submit     33, ratio   2.4
wv:en         edits    123, submit     51, ratio   2.4
wp:it         edits   7249, submit   3138, ratio   2.3
wk:tr         edits     85, submit     39, ratio   2.2
wn:en         edits    134, submit     61, ratio   2.2
wp:ka         edits    172, submit     82, ratio   2.1
wp:mk         edits    176, submit     83, ratio   2.1
wk:ko         edits     69, submit     34, ratio   2.0
wp:fi         edits   1222, submit    603, ratio   2.0
wp:hu         edits   1536, submit    776, ratio   2.0
wp:lt         edits    316, submit    159, ratio   2.0
wp:sl         edits    225, submit    113, ratio   2.0
wp:no         edits    855, submit    444, ratio   1.9
wp:simple     edits    332, submit    177, ratio   1.9
wp:sv         edits   1733, submit    895, ratio   1.9
wp:he         edits   1332, submit    722, ratio   1.8
wp:ml         edits    119, submit     67, ratio   1.8
wq:en         edits    159, submit     89, ratio   1.8
wp:eu         edits    122, submit     71, ratio   1.7
wp:uk         edits    754, submit    456, ratio   1.7
wm:incubator  edits     81, submit     51, ratio   1.6
wp:br         edits     71, submit     48, ratio   1.5
wp:ca         edits    700, submit    454, ratio   1.5
wp:et         edits    226, submit    148, ratio   1.5
wp:la         edits    109, submit     74, ratio   1.5
wp:az         edits    108, submit     75, ratio   1.4
ws:de         edits    151, submit    111, ratio   1.4
ws:en         edits    190, submit    134, ratio   1.4
wx:commons    edits   4710, submit   3445, ratio   1.4
wk:fi         edits    100, submit     75, ratio   1.3
wp:gl         edits    129, submit     98, ratio   1.3
wp:hi         edits    198, submit    147, ratio   1.3
wp:nn         edits     68, submit     54, ratio   1.3
ws:fr         edits    176, submit    138, ratio   1.3
ws:ru         edits    100, submit     80, ratio   1.3
wk:nl         edits     77, submit     78, ratio   1.0
wk:pl         edits    105, submit    174, ratio   0.6
.
total shown   edits 223571, submit  79506, ratio   2.8
.
wb: Wikibooks, wk: Wiktionary, wn: Wikinews, wp: Wikipedia, wq: Wikiquote, ws: Wikisource, wv: Wikiversity, wx: special wikis (mw = MediaWiki wiki, meta, commons, species)
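The per-wiki table can be reproduced with a few lines of Python. The sketch below uses five rows copied from the table. Note that the "total shown" ratio is the pooled ratio (sum of edits divided by sum of submits), not the average of the per-wiki ratios, which small wikis such as wk:lt would otherwise dominate.

```python
# (wiki, edits, submits): a few rows copied from the table above.
ROWS = [
    ("wk:lt", 133, 3),
    ("wx:mw", 411, 30),
    ("wp:en", 88576, 32352),
    ("wp:de", 14306, 5505),
    ("wk:pl", 105, 174),
]


def ratio_table(rows, min_combined=100):
    """Rows with edits + submits >= min_combined, highest edit/submit ratio first.

    Wikis with zero submits are skipped to avoid division by zero
    (an assumption of this sketch; the table above shows none).
    """
    kept = [(w, e, s) for w, e, s in rows if e + s >= min_combined and s > 0]
    kept.sort(key=lambda r: r[1] / r[2], reverse=True)
    return kept


def print_table(rows):
    for wiki, edits, submits in rows:
        print(f"{wiki:13} edits {edits:6}, submit {submits:6}, ratio {edits / submits:5.1f}")
    total_e = sum(e for _, e, _ in rows)
    total_s = sum(s for _, _, s in rows)
    # Pooled ratio, as in the "total shown" line.
    print(f"{'total shown':13} edits {total_e:6}, submit {total_s:6}, ratio {total_e / total_s:5.1f}")


print_table(ratio_table(ROWS))
```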

Caveat 1: strangely, wx:mw (the MediaWiki wiki) and wx:meta have high edit/submit ratios (13.7 and 5.2). One would expect mostly experienced editors there, and therefore an edit/submit ratio closer to 1 than on other wikis. Further research and cross-checking are needed.

Caveat 2: if certain user-assisted (semi-automatic) bots are not recognized as bots (i.e. they leave no clue in the agent string of the Squid log record), they could seriously pollute the counts. For instance, if a spell-checker bot asks the user to confirm each suggested correction directly in the edit screen, it could generate hundreds of aborted edits in a single evening session. Further research and cross-checking are needed.