EditSubmitRatios
Squid Log Scan Fact Sheet
The idea is to extract from the 1:1000 sampled squid log the edit and submit calls that are relevant to the usability project, and to track the edit/save ratio of these calls over time. Only calls that meet all of the following criteria are counted:
- Only script index.php
- Only mime type "text/html"
- Only article namespace 0 (few newbies edit other namespaces)
- No failed calls
- No edit calls that result from clicking a link to a non-existent page (a red link): probably some 95% of these are unintentional, made by people who do not know what a red link implies.
- No requests issued by bots
- No incomplete log records (e.g. where the destination wiki is masked)
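The URL-level criteria above can be sketched in Python. This is a minimal illustration under stated assumptions, not the script that produced the counts on this page: it assumes the request URL and mime type have already been split out of the squid log record, and both the function name `url_is_relevant` and the namespace prefix list are hypothetical (a real filter would need each wiki's localized namespace names). Status-code and bot checks are left out here.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical, incomplete list of English namespace prefixes; each wiki has
# its own localized names, so a real filter would need a per-wiki list.
NON_ARTICLE_PREFIXES = {
    "talk", "user", "user talk", "wikipedia", "wikipedia talk", "file",
    "image", "mediawiki", "template", "help", "category", "portal", "special",
}

def url_is_relevant(url: str, mime_type: str) -> bool:
    """Apply the URL-level criteria: index.php, text/html, an edit or submit
    action, article namespace 0, and no red-link click."""
    # Ignore any "; charset=..." suffix on the mime type.
    if mime_type.split(";")[0].strip() != "text/html":
        return False
    parts = urlsplit(url)
    if not parts.path.endswith("/index.php"):
        return False
    query = parse_qs(parts.query)
    if query.get("action", [""])[0] not in ("edit", "submit"):
        return False
    if query.get("redlink", [""])[0] == "1":
        return False
    title = query.get("title", [""])[0].replace("_", " ")
    prefix, colon, _rest = title.partition(":")
    # A known namespace prefix means the page is outside article space 0.
    if colon and prefix.strip().lower() in NON_ARTICLE_PREFIXES:
        return False
    return bool(title)
```

Titles such as "Star Wars: Episode IV" still count as articles, because only known prefixes are rejected rather than every title containing a colon.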
The following are the frequencies of index.php result codes found in the 1:1000 sampled squid logs covering just over 6 months:
Note: the counts need updating; criteria 1-3 were not yet applied.
TCP_DENIED/403,action=edit 321390
TCP_DENIED/403,action=submit 33

TCP_MISS/000,action=edit 7352
TCP_MISS/000,action=submit 1186

TCP_MISS/200,action=edit 800200
TCP_MISS/200,action=submit 75768
TCP_MISS/206,action=edit 20
TCP_MISS/206,action=submit 269

TCP_MISS/301,action=edit 662
TCP_MISS/302,action=edit 184217
TCP_MISS/302,action=submit 116141

TCP_MISS/400,action=edit 6
TCP_MISS/403,action=edit 2746
TCP_MISS/404,action=edit 119
TCP_MISS/404,action=submit 206
TCP_MISS/417,action=edit 53
TCP_MISS/417,action=submit 716

TCP_MISS/500,action=edit 362
TCP_MISS/500,action=submit 81
TCP_MISS/502,action=submit 87
TCP_MISS/503,action=edit 7
TCP_MISS/503,action=submit 5878
TCP_MISS/504,action=edit 53
TCP_MISS/504,action=submit 91
Here are the relevant HTTP status codes from the Squid FAQ (see also the W3C HTTP specification):
000 Used mostly with UDP traffic.
200 OK
206 Partial Content
301 Moved Permanently
302 Moved Temporarily
400 Bad Request
403 Forbidden
404 Not Found
[417 Expectation Failed]
500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
- Discard errors
- TCP_DENIED results are almost exclusively bad requests from ill-behaving browsers.
- Status codes of 400 or above denote valid but failed requests, e.g. squid could not pass the request on to a server or service (TCP_MISS/503).
- Discard cruft
- Status codes that occur in less than 1% of all requests (000, 206 and 301) have not been investigated further and are discarded.
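A minimal sketch of these two discard rules, assuming result codes in squid's `TCP_MISS/200` form. The names `keep_result_code` and `rare_codes` are hypothetical. The second function shows how the "less than 1% of all requests" test can be computed from per-code counts like those in the frequency list; counts for the same HTTP status are pooled across squid result types and actions.

```python
# HTTP status codes discarded as cruft (too rare to investigate).
CRUFT_STATUSES = {0, 206, 301}

def keep_result_code(result_code: str) -> bool:
    """Return True if a squid result code such as 'TCP_MISS/200'
    survives the 'discard errors' and 'discard cruft' steps."""
    squid_status, _, http_status = result_code.partition("/")
    if squid_status == "TCP_DENIED":   # bad requests from ill-behaving browsers
        return False
    try:
        status = int(http_status)
    except ValueError:                 # malformed or incomplete record
        return False
    if status >= 400:                  # valid but failed requests
        return False
    return status not in CRUFT_STATUSES

def rare_codes(counts: dict[str, int], threshold: float = 0.01) -> list[str]:
    """Return the HTTP status codes whose pooled share of all requests
    is below `threshold`, given counts keyed like 'TCP_MISS/000,action=edit'."""
    per_status: dict[str, int] = {}
    for key, n in counts.items():
        status = key.split(",")[0].split("/")[1]
        per_status[status] = per_status.get(status, 0) + n
    total = sum(per_status.values())
    return sorted(s for s, n in per_status.items() if n / total < threshold)
```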
This first selection leaves
TCP_MISS/200,action=edit 800200 [1]
TCP_MISS/302,action=edit 184217 [2]

TCP_MISS/200,action=submit 75768 [3]
TCP_MISS/302,action=submit 116141 [4]
- Discard clicks on links to non-existent pages
Such clicks are recognized by 'redlink=1' in the URL.
TCP_MISS/200,action=edit 434602 [1]
TCP_MISS/302,action=edit 492 [2]

TCP_MISS/200,action=submit 75768 [3]
TCP_MISS/302,action=submit 116141 [4]
Explanation: almost all TCP_MISS/302 edit calls [2] are the result of clicking a red link (184217 before this step, 492 after). Roughly half of the TCP_MISS/200 edit calls [1] are as well (800200 before, 434602 after).
So a red-link click can result in either a 200 or a 302, which was initially somewhat confusing. On further inspection this turns out to depend on the language of the wiki (perhaps a configuration setting?).
Here are the red-link counts for the wikis with the most hits:
wiki    TCP_MISS/200   TCP_MISS/302
wp:de          30108           1759
wp:en           6756         129361
wp:es          56032           2332
wp:fr          23094            493
wp:it          19301            646
wp:ja          49288           5569
wp:nl           9585            558
wp:pl          13210            121
wp:pt          18140            318
wp:ru          26393           1345
wp:zh            398          15875
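The pattern in these counts (most wikis answer a red-link click with 200, while wp:en and wp:zh mostly answer with 302) can be checked mechanically. A small sketch, assuming the counts arrive as comma-separated records such as `30108,wp:de,action=edit,redlink=1,TCP_MISS/200`; `pivot_redlinks` and `dominant_code` are hypothetical names.

```python
from collections import defaultdict

def pivot_redlinks(records: list[str]) -> dict[str, dict[str, int]]:
    """Turn records like '30108,wp:de,action=edit,redlink=1,TCP_MISS/200'
    into {'wp:de': {'200': 30108, ...}, ...}."""
    table: dict[str, dict[str, int]] = defaultdict(dict)
    for record in records:
        count, wiki, _action, _redlink, result_code = record.split(",")
        http_status = result_code.split("/")[1]
        table[wiki][http_status] = table[wiki].get(http_status, 0) + int(count)
    return dict(table)

def dominant_code(row: dict[str, int]) -> str:
    """Return the status code that answers most red-link clicks on one wiki."""
    return max(row, key=lambda code: row[code])
```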
- Discard previews and unsuccessful saves
When a user clicks the 'Save Page' or 'Preview' buttons, script index.php is invoked with parameter 'action=submit'.
The extra information that tells index.php how to respond (which button was pressed, sent in the POST body) is not available in the squid log, but much can be learned from the result code:
- TCP_MISS/200 [3]: result of a preview request or an unsuccessful save (e.g. an edit conflict)
- TCP_MISS/302 [4]: result of a successful save, where the user is redirected to the saved page.
TCP_MISS/200,action=edit 434602 [1]
TCP_MISS/302,action=edit 492 [2]

TCP_MISS/302,action=submit 116141 [4]
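Taken together, the rules for edit and submit calls reduce each remaining record to an "edit", a "save" or nothing. A hypothetical sketch (`classify_call` and `count_calls` are illustrative names), which also drops the residual TCP_MISS/302 edit calls as cruft:

```python
from collections import Counter
from typing import Iterable, Optional

def classify_call(action: str, result_code: str) -> Optional[str]:
    """Map an (action, squid result code) pair to 'edit', 'save' or None.

    action=edit   + TCP_MISS/200 -> edit form served          -> 'edit'
    action=submit + TCP_MISS/302 -> successful save, redirect -> 'save'
    action=submit + TCP_MISS/200 -> preview or failed save    -> ignored
    action=edit   + TCP_MISS/302 -> remaining cruft           -> ignored
    """
    if action == "edit" and result_code == "TCP_MISS/200":
        return "edit"
    if action == "submit" and result_code == "TCP_MISS/302":
        return "save"
    return None

def count_calls(records: Iterable[tuple[str, str]]) -> Counter:
    """Tally (action, result_code) pairs into edit and save counts."""
    kinds = (classify_call(action, code) for action, code in records)
    return Counter(kind for kind in kinds if kind is not None)
```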
- Discard further cruft (the few remaining TCP_MISS/302 edit calls [2])
TCP_MISS/200,action=edit 434602 [1]
TCP_MISS/302,action=submit 116141 [4]
- Breakdown by User/Bot
bot=N,TCP_MISS/200,action=edit 300183
bot=Y,TCP_MISS/200,action=edit 134419

bot=N,TCP_MISS/302,action=submit 83814
bot=Y,TCP_MISS/302,action=submit 32327
The code used to detect bots is roughly the following:
if agent string contains 'http://'                      # URL only allowed/expected with bots
   and agent string does not contain 'bsalsa.com'       # exception: bsalsa messed up, see http://www.bsalsa.com/forum/showthread.php?t=724
   and agent string does not contain MSIE + version number   # most likely false positives
then bot is true
if agent string contains 'bot' or 'crawl(er)' or 'spider'
then bot is true
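The rules above translate fairly directly into Python. This is a sketch, not the original code: it assumes the agent string is available as a plain `str`, it makes the 'bot', 'crawl' and 'spider' matches case-insensitive (the rules above do not say), and `is_bot` is a hypothetical name.

```python
import re

# Any MSIE version token, e.g. "MSIE 6.0" or "MSIE 8".
MSIE_VERSION = re.compile(r"MSIE \d+(\.\d+)?")

def is_bot(agent: str) -> bool:
    """Rough bot test on a user-agent string."""
    # A URL in the agent string is only expected from bots, except for the
    # bsalsa embedded browser and MSIE agents (most likely false positives).
    if ("http://" in agent
            and "bsalsa.com" not in agent
            and not MSIE_VERSION.search(agent)):
        return True
    # Otherwise fall back to tell-tale words; 'crawl' also covers 'crawler'.
    lowered = agent.lower()
    return any(word in lowered for word in ("bot", "crawl", "spider"))
```

Plain substring tests are deliberately crude: they match anything containing "bot", which is in keeping with the "roughly" above but may misclassify the odd human agent.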
Note: the high edit/save ratio for bots (134419 / 32327 ≈ 4.2) obviously has nothing to do with usability. Possibly some bots open articles in the edit view without any intention to save, e.g. to harvest raw wikitext such as infobox parameters or the list of templates used.
- Discard bots
bot=N,TCP_MISS/200,action=edit 300183
bot=N,TCP_MISS/302,action=submit 83814
- Count edits, submits and the edit/submit ratio per wiki (shown here only where edits and submits together total 100 or more)
wk:lt edits 133, submit 3, ratio 44.3
wx:mw edits 411, submit 30, ratio 13.7
wk:es edits 194, submit 24, ratio 8.1
wp:id edits 1758, submit 233, ratio 7.5
wp:ms edits 530, submit 71, ratio 7.5
wb:de edits 245, submit 34, ratio 7.2
wb:pt edits 125, submit 18, ratio 6.9
wp:sw edits 239, submit 41, ratio 5.8
wp:af edits 154, submit 27, ratio 5.7
wx:meta edits 449, submit 86, ratio 5.2
wb:es edits 108, submit 22, ratio 4.9
wp:ja edits 13904, submit 2813, ratio 4.9
wp:pt edits 8977, submit 1875, ratio 4.8
wb:fr edits 82, submit 18, ratio 4.6
wp:es edits 18430, submit 4038, ratio 4.6
wp:eo edits 420, submit 93, ratio 4.5
wk:no edits 119, submit 27, ratio 4.4
wp:fa edits 1220, submit 275, ratio 4.4
wp:el edits 636, submit 148, ratio 4.3
wp:tl edits 287, submit 69, ratio 4.2
wk:ru edits 243, submit 66, ratio 3.7
wp:ro edits 956, submit 260, ratio 3.7
wp:th edits 949, submit 265, ratio 3.6
wp:bs edits 114, submit 33, ratio 3.5
wp:ta edits 127, submit 36, ratio 3.5
wb:en edits 439, submit 131, ratio 3.4
wk:it edits 82, submit 24, ratio 3.4
wp:vi edits 845, submit 257, ratio 3.3
wk:de edits 205, submit 66, ratio 3.1
wk:fr edits 577, submit 193, ratio 3.0
wp:sq edits 116, submit 39, ratio 3.0
wp:hr edits 478, submit 163, ratio 2.9
wp:bg edits 705, submit 256, ratio 2.8
wp:da edits 614, submit 222, ratio 2.8
wp:mr edits 84, submit 30, ratio 2.8
wp:pl edits 4603, submit 1635, ratio 2.8
wp:tr edits 2803, submit 992, ratio 2.8
wp:ar edits 1852, submit 689, ratio 2.7
wp:en edits 88576, submit 32352, ratio 2.7
wp:ko edits 1670, submit 618, ratio 2.7
wp:nl edits 4226, submit 1565, ratio 2.7
wp:sk edits 330, submit 121, ratio 2.7
wp:de edits 14306, submit 5505, ratio 2.6
wp:ru edits 9094, submit 3541, ratio 2.6
wp:sr edits 620, submit 242, ratio 2.6
wp:zh edits 3273, submit 1254, ratio 2.6
wk:en edits 1184, submit 477, ratio 2.5
wp:cs edits 1005, submit 408, ratio 2.5
wp:lv edits 143, submit 57, ratio 2.5
wx:species edits 221, submit 90, ratio 2.5
wk:pt edits 156, submit 64, ratio 2.4
wp:fr edits 10463, submit 4433, ratio 2.4
wp:lb edits 79, submit 33, ratio 2.4
wv:en edits 123, submit 51, ratio 2.4
wp:it edits 7249, submit 3138, ratio 2.3
wk:tr edits 85, submit 39, ratio 2.2
wn:en edits 134, submit 61, ratio 2.2
wp:ka edits 172, submit 82, ratio 2.1
wp:mk edits 176, submit 83, ratio 2.1
wk:ko edits 69, submit 34, ratio 2.0
wp:fi edits 1222, submit 603, ratio 2.0
wp:hu edits 1536, submit 776, ratio 2.0
wp:lt edits 316, submit 159, ratio 2.0
wp:sl edits 225, submit 113, ratio 2.0
wp:no edits 855, submit 444, ratio 1.9
wp:simple edits 332, submit 177, ratio 1.9
wp:sv edits 1733, submit 895, ratio 1.9
wp:he edits 1332, submit 722, ratio 1.8
wp:ml edits 119, submit 67, ratio 1.8
wq:en edits 159, submit 89, ratio 1.8
wp:eu edits 122, submit 71, ratio 1.7
wp:uk edits 754, submit 456, ratio 1.7
wm:incubator edits 81, submit 51, ratio 1.6
wp:br edits 71, submit 48, ratio 1.5
wp:ca edits 700, submit 454, ratio 1.5
wp:et edits 226, submit 148, ratio 1.5
wp:la edits 109, submit 74, ratio 1.5
wp:az edits 108, submit 75, ratio 1.4
ws:de edits 151, submit 111, ratio 1.4
ws:en edits 190, submit 134, ratio 1.4
wx:commons edits 4710, submit 3445, ratio 1.4
wk:fi edits 100, submit 75, ratio 1.3
wp:gl edits 129, submit 98, ratio 1.3
wp:hi edits 198, submit 147, ratio 1.3
wp:nn edits 68, submit 54, ratio 1.3
ws:fr edits 176, submit 138, ratio 1.3
ws:ru edits 100, submit 80, ratio 1.3
wk:nl edits 77, submit 78, ratio 1.0
wk:pl edits 105, submit 174, ratio 0.6

total shown: edits 223571, submit 79506, ratio 2.8

wb: Wikibooks, wk: Wiktionary, wn: Wikinews, wp: Wikipedia, wq: Wikiquote, ws: Wikisource, wv: Wikiversity, wx: special wiki
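The per-wiki figures follow from simple arithmetic: ratio = edits / submits, rounded to one decimal (e.g. wp:en: 88576 / 32352 ≈ 2.7). A sketch that reproduces the selection, ordering and total line, assuming per-wiki (edits, saves) counts are already available; `ratio_table` and `totals` are hypothetical names, and keys without any save are skipped because their ratio is undefined.

```python
def ratio_table(counts: dict[str, tuple[int, int]],
                min_total: int = 100) -> list[tuple[str, int, int, float]]:
    """Return (key, edits, saves, ratio) rows, highest edit/save ratio first,
    keeping only keys whose edits + saves reach min_total."""
    rows = []
    for key, (edits, saves) in counts.items():
        # Skip small samples and keys without saves (ratio undefined).
        if edits + saves < min_total or saves == 0:
            continue
        rows.append((key, edits, saves, round(edits / saves, 1)))
    # Sort on the unrounded ratio so that ties in the rounded value keep a
    # meaningful order.
    rows.sort(key=lambda row: row[1] / row[2], reverse=True)
    return rows

def totals(rows: list[tuple[str, int, int, float]]) -> tuple[int, int, float]:
    """Sum edits and saves over the shown rows and compute the overall ratio."""
    edits = sum(row[1] for row in rows)
    saves = sum(row[2] for row in rows)
    return edits, saves, round(edits / saves, 1)
```

Keyed by month instead of by wiki, the same function would give the edit/save trend over time that this scan is meant to track.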
Caveat 1: Strangely, wx:mw (the MediaWiki wiki) and meta have high edit/submit ratios (13.7 and 5.2). One would expect mostly experienced editors there, and therefore an edit/submit ratio closer to 1 than on other wikis. Further research and cross-checking are needed.
Caveat 2: If certain user-assisted (semi-automatic) bots were not recognized as bots (i.e. gave no clue in the agent string of the squid log record), they would seriously pollute the counts. For instance, if a spell-checker bot asks the user to confirm a suggested correction directly in the edit screen, that could generate hundreds of aborted edits in a single evening session. Further research and cross-checking are needed.