There are several reasons why we want to develop metrics regarding multimedia content on/from Wikimedia websites:
- We need data about users to design or improve the software, as part of the User research, or for ethnographic reasons.
- We need data to facilitate communication / outreach / relationships with GLAMs.
- We need data to measure the changes induced by the Multimedia Usability project, and more generally assess the impact of changes we try.
What do we need/want to measure, and why?
- Participation: How many people contribute multimedia content? How much? How often? How many people improve Commons (classification, description, etc.)
- Impact: How many people use and reuse our multimedia content? How much? How often?
Ideally, each metric would be tracked over time. That means recording it on a regular basis, probably on the toolserver.
Inventory & temporal evolution
|How many files do we host on Commons?||Can be measured with countless tools, e.g. here or here.|
|How many files are uploaded on Commons each month?||Here and here (experimental!).|
|How many files are uploaded to all Wikimedia projects each month?|
|What are the topics covered by our content? (distribution)||Count files in the subcategories of Category:Topics||Breakdown of media files by super topical category, e.g. art, science etc. We may need to do some clean-up in the Category:Topics first though.|
|What are the media types of our content? (distribution)||E.g. medium: maps, technical drawings, animations, photos. This may be derived from . We could count files in the subcategories of Category:Media types but it doesn't seem to be reliable. We also have historical data in Commons:MIME type statistics. We might have to wait until we actually record this somewhere (and extract/migrate existing data).|
|Where does our content come from? (own work, etc.) (distribution)||Count files in the subcategories of Category:Pictures and images by source? This doesn't look reliable.|
|What location does our content come from? (map)||Extract location information from geotagging templates on file pages & plot it||See http://poulpy.blogspot.com/2010/02/elles-sont-ou-les-photos-de-commons.html|
|Under which licenses is our content released? (distribution)||Count files in the subcategories of Category:Copyright statuses, Category:Free licenses, Category:Creative Commons licenses||When we count in categories, we should probably automatically report separately any subcategory that contains more than XX% of the parent category's files. This would give a more accurate overview without our having to decide manually which subcategories to count separately.|
|What is the size of our files? (distribution)||DB query|
|What upload medium was used? (distribution)||"Old" upload form, new upload form, API (bot, desktop applications, add-media-wizard, etc.). Needs schema change & minor change to the upload API.|
|How many edits are performed on Commons? And in which namespace?||Broken down by namespace|
|How many edits are performed on all Wikimedia projects? Particularly, in the File namespace?||in order to be able to compare the evolution of Commons|
|How fast are files categorized on Commons?||User:Multichill/Categorization stats|
|How many files are deleted on Commons each month?||Similar to the upload deletion ratio but where deletions would be only those of files uploaded during that same period.|
|How long have files been online before they were deleted?|
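The counting rule suggested in the licenses row above (report a subcategory separately once it holds more than a given share of the parent category's files) could be sketched as follows. This is only an illustration under stated assumptions, not an existing tool: the function name `split_large_subcategories`, the input dictionary of per-subcategory counts, and the idea that each file is counted in exactly one subcategory are all hypothetical. The threshold is left as a parameter because the page does not fix its value.

```python
def split_large_subcategories(parent_total, subcategory_counts, threshold):
    """Return (separate, other).

    parent_total       -- number of files under the parent category (hypothetical input)
    subcategory_counts -- dict mapping subcategory name to its file count
    threshold          -- share of the parent (0..1) above which a subcategory is
                          reported on its own; the page leaves this value open
    Assumes each file is counted in exactly one subcategory.
    """
    if parent_total <= 0:
        raise ValueError("parent_total must be positive")
    separate = {
        name: count
        for name, count in subcategory_counts.items()
        if count / parent_total > threshold
    }
    # Everything not reported separately is lumped together.
    other = parent_total - sum(separate.values())
    return separate, other


# Illustrative numbers only, not real Commons data; 0.2 is an arbitrary threshold.
counts = {"CC-BY-SA": 600, "GFDL": 250, "PD-old": 100, "CC-BY": 50}
separate, other = split_large_subcategories(1000, counts, threshold=0.2)
print(separate, other)
```

On Commons a file can sit in several subcategories at once, so a real implementation would need to deduplicate files before computing the remainder.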
Relevance (where is it shown)
How often do we serve images in their original, very large size? We would like a breakdown of images hosted/served by thumbnail size.
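One possible way to get such a breakdown is to bucket requested file paths by the width encoded in the thumbnail name (thumbnail paths contain a `thumb/` segment and end in `<width>px-<name>`). The sketch below assumes we have a list of requested paths from some request log; that log, the function names, and the sample paths are hypothetical. Thumbnail names with extra prefixes (for example page numbers of multi-page files) are not parsed and are counted as "unknown".

```python
import re
from collections import Counter

# Matches ".../thumb/<hash dirs>/<original name>/<width>px-<thumbnail name>".
THUMB_WIDTH = re.compile(r"/thumb/.+/(\d+)px-[^/]+$")


def size_bucket(url_path):
    """Return the thumbnail width in pixels, "original", or "unknown"."""
    if "/thumb/" not in url_path:
        return "original"
    match = THUMB_WIDTH.search(url_path)
    return int(match.group(1)) if match else "unknown"


def breakdown_by_size(url_paths):
    """Count requests per thumbnail width (or "original" / "unknown")."""
    return Counter(size_bucket(path) for path in url_paths)


# Made-up sample paths, for illustration only.
sample = [
    "/wikipedia/commons/thumb/a/a9/Example.jpg/220px-Example.jpg",
    "/wikipedia/commons/thumb/a/a9/Example.jpg/220px-Example.jpg",
    "/wikipedia/commons/thumb/a/a9/Example.jpg/800px-Example.jpg",
    "/wikipedia/commons/a/a9/Example.jpg",
]
result = breakdown_by_size(sample)
print(result)
```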
Internal (Wikimedia projects)
|How are files from a specific category used across Wikimedia projects?||glamorous||GlobalUsage-based|
|How are files from cultural partnerships used across Wikimedia projects?||AmalGLAMate||a GLAM-specific aggregation of images-in-category usage statistics|
- Discussion about other GlobalUsage weekly stats.
External (other sites)
|Can we track external use of content?||There is no reliable way to record usage from websites that use a local copy of files they found on Wikimedia Commons. As a consequence, we can only track usage from websites that fetch media files directly from Commons.|
|How many websites use Commons as file repository?|
|How many files from Commons are used on Wikimedia websites?|
|How many files from Commons are used on MediaWiki websites using InstantCommons?||needs development to integrate InstantCommons with GlobalUsage|
|How many files from Commons are used on websites using other CMSes?||to be discussed when we actually find a way to extend InstantCommons to other CMSes.|
We use a typology similar to the one already used on Wikistats & the report card:
- active participants: 5+ edits per month (Report Card)
- very active participants: 100+ edits per month (Report Card)
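As a minimal sketch, the two thresholds above can be applied to per-user monthly edit counts like this. The function names, the "below threshold" label, and the sample data are assumptions made for illustration. Each user gets a single label here, whereas a report may prefer to count very active participants among the active ones as well.

```python
from collections import Counter

ACTIVE_MIN_EDITS = 5         # "active participants": 5+ edits per month
VERY_ACTIVE_MIN_EDITS = 100  # "very active participants": 100+ edits per month


def activity_level(edits_in_month):
    """Classify one user from their edit count in a given month."""
    if edits_in_month >= VERY_ACTIVE_MIN_EDITS:
        return "very active"
    if edits_in_month >= ACTIVE_MIN_EDITS:
        return "active"
    return "below threshold"


def count_by_level(edits_per_user):
    """Count users per activity level for one month."""
    return Counter(activity_level(n) for n in edits_per_user.values())


# Made-up monthly edit counts, for illustration only.
sample = {"UserA": 3, "UserB": 5, "UserC": 99, "UserD": 100, "UserE": 250}
levels = count_by_level(sample)
print(levels)
```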
|How many new accounts are created at Commons each month?||with distinction between accounts created directly on Commons and SUL accounts created automatically|
|How often do uploads succeed?||Ratio upload screen requested / actual transfers|
|Who uploads files?||user: new, active, very active participants on Commons, also depending on whether they're new, active or very active on another Wikimedia project|
|What is the language used by our viewers? (distribution)||example|
|What is the location of our viewers? (map)|
|How many viewers see a given image, and at what resolution?||image view statistics (image usage coupled to page views); see mw:Hit stats aggregation. Something is apparently in the works with Domas & WMDE.|
- ideally, we would be able to break down results using all filters, e.g. for a given file, see how many people viewed it, from where, using what language
- ideally, we would also be able to collect similar statistics for a set of files (e.g. inside a category).
We can measure this only for Wikimedia websites.
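The kind of filtering described above (views of a given file, or of a set of files such as a category, broken down by location or language) might look like the following sketch. It assumes view records are already available in some form; the `View` record, its fields, and the sample data are hypothetical, since no such data source exists yet.

```python
from collections import Counter
from typing import NamedTuple


class View(NamedTuple):
    """One image view (hypothetical record format)."""
    file: str
    country: str
    language: str


def breakdown(views, key, files=None):
    """Count views by `key` ("file", "country" or "language").

    If `files` is given, only views of those files are counted; passing the
    members of a category gives statistics for a set of files.
    """
    counts = Counter()
    for view in views:
        if files is not None and view.file not in files:
            continue
        counts[getattr(view, key)] += 1
    return counts


# Made-up records, for illustration only.
views = [
    View("File:A.jpg", "FR", "fr"),
    View("File:A.jpg", "DE", "de"),
    View("File:A.jpg", "FR", "en"),
    View("File:B.jpg", "US", "en"),
]
by_country = breakdown(views, "country", files={"File:A.jpg"})
by_language = breakdown(views, "language")
print(by_country, by_language)
```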