Multimedia:Metrics

This is a Draft

Although this document may have useful content, it should not be considered final. Please take this into consideration when discussing this page.

This is an internal document

While making small fixes to this page like fixing typos and dead-links is encouraged, any changes which significantly modify the information of this page should be suggested on the discussion page instead, as this is an internal document.

There are several reasons why we want to develop metrics regarding multimedia content on/from Wikimedia websites:

We need data about users to design or improve the software, as part of the User research, or for ethnographic reasons.
We need data to facilitate communication / outreach / relationships with GLAMs.
We need data to measure the changes induced by the Multimedia Usability project, and more generally assess the impact of changes we try.

Questions:

What do we need/want to measure, and why?
- Participation: How many people contribute multimedia content? How much? How often? How many people improve Commons (classification, description, etc.)?
- Impact: How many people use and reuse our multimedia content? How much? How often?

Content

Ideally, these measures are tracked over time, which means recording them on a regular basis, probably on the toolserver.

Inventory & temporal evolution

Question	Implementation	Notes
How many files do we host on Commons?		Can be measured with countless tools, e.g. here or here.
How many files are uploaded on Commons each month?		Here and here (experimental!).
How many files are uploaded to all Wikimedia projects each month?
What are the topics covered by our content? (distribution)	Count files in the subcategories of Category:Topics	Breakdown of media files by super topical category, e.g. art, science etc. We may need to do some clean-up in the Category:Topics first though.
What are the media types of our content? (distribution)		E.g. medium: maps, technical drawings, animations, photos. This may be derived from . We could count files in the subcategories of Category:Media types but it doesn't seem to be reliable. We also have historical data in Commons:MIME type statistics. We might have to wait until we actually record this somewhere (and extract/migrate existing data).
Where does our content come from? (own work, etc.) (distribution)		Count files in the subcategories of Category:Pictures and images by source? This does not look reliable.
What location does our content come from? (map)	Extract location information from geotagging templates on file pages & plot it	See http://poulpy.blogspot.com/2010/02/elles-sont-ou-les-photos-de-commons.html
Under which licenses is our content released? (distribution)	Count files in the subcategories of Category:Copyright statuses, Category:Free licenses, Category:Creative Commons licenses	When we count in categories, we should probably automatically count separately subcategories that contain more than XX% files of the parent category. It would allow us to have a more accurate overview without having to manually decide which subcategories to count separately.
What is the size of our files? (distribution)	DB query
What upload medium was used? (distribution)		"Old" upload form, new upload form, API (bot, desktop applications, add-media-wizard, etc.). Needs schema change & minor change to the upload API.
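
The rule suggested in the licenses row (report a subcategory separately once it holds more than XX% of the parent category's files) can be sketched as below. This is a minimal illustration, not an existing tool: the function name, its parameters and the idea of lumping the remaining subcategories into a single "other" figure are assumptions, and the threshold is left to the caller because the page does not fix a value for XX.

```python
from typing import Dict, Tuple


def split_subcategories(
    parent_total: int,
    subcategory_counts: Dict[str, int],
    threshold: float,
) -> Tuple[Dict[str, int], int]:
    """Separate large subcategories from small ones.

    parent_total: number of files in the parent category tree.
    subcategory_counts: files per direct subcategory (hypothetical input,
        e.g. gathered from the category tables).
    threshold: fraction between 0.0 and 1.0 standing in for "XX%".

    Returns (subcategories to report separately, files in all the others).
    """
    if parent_total <= 0:
        return {}, 0
    separate = {
        name: count
        for name, count in subcategory_counts.items()
        if count / parent_total > threshold
    }
    # Everything at or below the threshold is lumped into one figure.
    other = sum(
        count
        for name, count in subcategory_counts.items()
        if name not in separate
    )
    return separate, other
```

A file that sits in several subcategories is counted once per subcategory here, so the figures are only as clean as the categorization itself.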

Maintenance

Question	Implementation	Notes
How many edits are performed on Commons? And in which namespace?		Broken down by namespace
How many edits are performed on all Wikimedia projects? Particularly, in the File namespace?		in order to be able to compare the evolution of Commons
How fast are files categorized on Commons?		User:Multichill/Categorization stats
How many files are deleted on Commons each month?		Similar to the upload/deletion ratio, but counting only deletions of files that were uploaded during that same period.
How long have files been online before they were deleted?
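
The two deletion questions above can be sketched as follows, assuming we can obtain one record per file with its upload time and, if applicable, its deletion time. The record shape and function names are hypothetical, and a calendar month is used as the period.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (upload time, deletion time or None if the file is still online)
FileRecord = Tuple[datetime, Optional[datetime]]


def same_period_deletion_ratio(
    files: List[FileRecord], year: int, month: int
) -> float:
    """Share of files uploaded in a month that were also deleted in that month."""
    uploaded = [f for f in files if (f[0].year, f[0].month) == (year, month)]
    if not uploaded:
        return 0.0
    deleted = [
        f
        for f in uploaded
        if f[1] is not None and (f[1].year, f[1].month) == (year, month)
    ]
    return len(deleted) / len(uploaded)


def days_online_before_deletion(files: List[FileRecord]) -> List[float]:
    """Lifetime in days of every deleted file, in input order."""
    return [
        (deleted - uploaded).total_seconds() / 86400
        for uploaded, deleted in files
        if deleted is not None
    ]
```

The list returned by the second function can be fed into any histogram tool to get the distribution of time spent online.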

Relevance (where is it shown)

How often do we serve images in their original huge size? Breakdown of images hosted/served by thumbnail size.
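
A breakdown by served size could be derived from request paths, as in the rough sketch below. It assumes thumbnails are requested under a `/thumb/` path whose last segment starts with the width, such as `220px-Example.jpg`; that pattern, the function name and the bucket labels are assumptions made for illustration and would need checking against the real request logs.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed thumbnail path shape: .../thumb/<dirs>/<name>/<width>px-<name>
THUMB_WIDTH = re.compile(r"/thumb/.+/(\d+)px-[^/]+$")


def breakdown_by_served_size(paths: Iterable[str]) -> Counter:
    """Count requests per thumbnail width, plus requests for originals."""
    counts: Counter = Counter()
    for path in paths:
        match = THUMB_WIDTH.search(path)
        if match:
            counts[match.group(1) + "px"] += 1
        elif "/thumb/" in path:
            # A thumbnail whose name does not follow the assumed pattern.
            counts["unknown thumbnail"] += 1
        else:
            counts["original"] += 1
    return counts
```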

Internal (Wikimedia projects)

Question	Implementation	Notes
How are files from a specific category used across Wikimedia projects?	glamorous	GlobalUsage-based
How are files from cultural partnerships used across Wikimedia projects?	AmalGLAMate	a GLAM-specific aggregation of images-in-category usage statistics

External (other sites)

Question	Implementation	Notes
Can we track external use of content?		There is no reliable way to record usage from websites that use a local copy of files they found on Wikimedia Commons. As a consequence, we can only track usage from websites that fetch media files directly from Commons.
How many websites use Commons as file repository?
How many files from Commons are used on Wikimedia websites?
How many files from Commons are used on MediaWiki websites using InstantCommons?		needs development to integrate InstantCommons with GlobalUsage
How many files from Commons are used on websites using other CMSes?		to be discussed when we actually find a way to extend InstantCommons to other CMSes.

Users

Typology

We use a typology similar to the one already used on Wikistats & the report card:

active participants: 5+ edits per month (Report Card)
very active participants: 100+ edits per month (Report Card)
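
As a minimal sketch, the typology above maps a monthly edit count to a tier as follows. The thresholds come from the list above; the function name and the "other" label for users below 5 edits are assumptions. Since 100+ edits also satisfies 5+, the function returns the most specific tier.

```python
def classify_participant(edits_in_month: int) -> str:
    """Return the most specific tier for a user's edit count in one month."""
    if edits_in_month >= 100:
        return "very active"
    if edits_in_month >= 5:
        return "active"
    return "other"  # label not defined by the typology; assumed here
```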

Participation

Question	Implementation	Notes
How many new accounts are created at Commons each month?		with distinction between accounts created directly on Commons and SUL accounts created automatically
How often do uploads succeed?	Ratio upload screen requested / actual transfers
Who uploads files?		user: new, active, very active participants on Commons, also depending on whether they're new, active or very active on another Wikimedia project
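
The upload success question can be expressed as a simple rate. The table gives the ratio as upload screen requested / actual transfers; the sketch below uses the inverse (transfers divided by requests) so that the result reads as a fraction of successful attempts. Both input counts and the function name are hypothetical.

```python
from typing import Optional


def upload_success_rate(
    form_requests: int, completed_transfers: int
) -> Optional[float]:
    """Fraction of upload screen requests that ended in an actual transfer.

    Returns None when no upload screen was requested in the period.
    """
    if form_requests <= 0:
        return None
    return completed_transfers / form_requests
```

Uploads that never request the upload screen (for example through the API) would need to be left out of the transfer count, otherwise the rate could exceed 1.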

Reach

Question	Implementation	Notes
What is the language used by our viewers? (distribution)		example
What is the location of our viewers? (map)
How many viewers see a given image, and at what resolution?		image view statistics (image usage coupled to page views); see mw:Hit stats aggregation. Something is apparently in the works with Domas & WMDE.

Ideally, we would be able to break down results using all filters, e.g. for a given file, see how many people viewed it, from where, and in what language.
Ideally, we would also be able to collect similar statistics for a set of files (e.g. inside a category).

We can measure this only for Wikimedia websites.
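
"Image usage coupled to page views" could be approximated as in the sketch below, which works for a single file or for a set of files such as a category. The two input mappings are hypothetical: one lists the pages each file is used on (the kind of data GlobalUsage holds), the other gives view counts per page. A page view is treated as a view of every image on that page, which is an assumption and likely an overestimate.

```python
from typing import Dict, Iterable, Set, Tuple

Page = Tuple[str, str]  # (wiki, page title)


def estimated_views(
    files: Iterable[str],
    usage: Dict[str, Set[Page]],
    page_views: Dict[Page, int],
) -> int:
    """Sum the views of all pages that show at least one of the given files."""
    pages: Set[Page] = set()
    for name in files:
        pages.update(usage.get(name, set()))
    # Each page is counted once, even if it shows several files of the set.
    return sum(page_views.get(page, 0) for page in pages)
```

Breaking the result down by language or location would require the page view counts to be keyed by those dimensions as well.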

Resources