Multimedia:Meeting in Paris/Notes/Mass Upload

Workflow analysis

What's the current workflow related to this topic?

  • ...
  • ...

Major issues

What are the major issues related to this topic?

  • There are several different possible sources
    • and we have to accommodate them all (or do we?)
    • GLAM
    • Commons users
    • Contests
    • Already on the web (LOC, Flickr, etc.)
  • Streamline multifile uploading process
  • Copyright verification
  • Poor metadata handling
    • Machine-readable (import various types and merge them into "our format", then export one standard type so third parties can import the data into their own collections)
    • Human-entered
  • Organization and integration of images (once the image metadata is correct, we need to engage the community effectively to categorise the images, improve them, and incorporate them into Wikipedia)
  • Processes/guidelines are inconsistent
  • Role accounts (for one-time uploads): these give the GLAM "ownership" of its files and make reporting easier (reports based on uploads by user)
  • Making bots easier to build, with shareable components (rather than creating bespoke software for each project)
  • Documentation
    • Best practices
  • We are currently not able to provide feedback
    • Usage data (e.g. Dopplr summary) that lists number of page hits, images used in which articles, number of clickthroughs...
    • Change data - doesn't have to be "pretty" but enables export of changes (e.g. categories, dates) so they can incorporate back into their collection. Flickr does this.
  • Custom templates with the right format of data for the institution. E.g. some institutions prefer we list the "accession number", whilst others prefer "record number", "image number", URL, etc. The fields should be relevant to the institution.
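
The metadata and custom-template points above could be sketched roughly as follows. This is only an illustration, not an agreed design: the source names, the field maps, and the institution labels are all hypothetical. Each source's fields are mapped onto one common record, and the identifier is then labelled the way the institution prefers.

```python
# Hypothetical field maps: source-specific field name -> common field name.
FIELD_MAPS = {
    "flickr": {"title": "title", "ownername": "author", "datetaken": "date"},
    "glam_csv": {"Title": "title", "Creator": "author", "Accession No.": "identifier"},
}

# Hypothetical per-institution label for the identifier field.
IDENTIFIER_LABELS = {
    "Museum A": "Accession number",
    "Archive B": "Record number",
}


def to_common(source, record):
    """Map a source-specific record onto the common format, skipping empty fields."""
    mapping = FIELD_MAPS[source]
    return {
        common: record[raw]
        for raw, common in mapping.items()
        if record.get(raw)
    }


def identifier_line(common, institution):
    """Render the identifier using the label the institution prefers."""
    label = IDENTIFIER_LABELS.get(institution, "Identifier")
    return f"{label}: {common['identifier']}"


record = to_common("glam_csv", {"Title": "Vase", "Creator": "", "Accession No.": "1999.12"})
print(record)
print(identifier_line(record, "Museum A"))
```

Keeping the mappings as plain data means a new source or institution would need a new table entry rather than new code.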

Tools and workarounds

What are the existing tools or workarounds used to circumvent these issues?

  • Batch uploading documentation[1]
  • Commonist[2] - Java program (Datura)
  • Commonplace[3] - Windows/Linux program (IlyaHaykinson)
  • Upload API[4] (Michael, Bryan TM)
  • External upload tool developed for photo contests (Kaldari)
  • Flickr (as a staging server)
  • (as a staging server)
  • Imagecopy built by Multichill (wrapper around Commons Helper)[5]
  • FlickrRipper[6] - transfer multiple Flickr images at once
  • Custom bots/tools
    • pyWikipedia bots[7]
    • Eloquence upload script[8]
    • Nichalp's Upload Script[9]

Other possible solutions

What other solutions could be used to circumvent these issues? (Be bold and creative)

  • Enhance Commons
  • Third-party front-end sites
  • Client programs (like Commonist or Flickr Uploader)

FlickrBot 2.0

Create a bot that allows users to upload an entire photostream from Flickr to Commons. Allow searching Flickr by user, group, set, or pool.

Automatically populate categories, titles, etc. from the Flickr info.

Allow user to browse the images and choose which ones are good for uploading before they are passed to the upload script.

Use duplicate checking.

Could be implemented by combining Multichill's pywiki FlickrBot for pulling files from photostreams with MM's FlickBot for uploading them to Commons.
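
One possible shape for the duplicate check is sketched below, assuming that exact byte-for-byte matches are what we want to catch. The function names and the idea of passing in a set of already-known hashes are assumptions for illustration; how those hashes would be obtained from Commons is left open.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Return the hex SHA-1 digest of a file's contents."""
    return hashlib.sha1(data).hexdigest()


def split_duplicates(candidates, known_hashes):
    """Split candidate files into new ones and duplicates.

    candidates: dict mapping file name -> file contents (bytes).
    known_hashes: SHA-1 hex digests of files that already exist.
    A file also counts as a duplicate if an earlier candidate in the
    same batch has identical contents.
    """
    seen = set(known_hashes)
    new, duplicates = [], []
    for name, data in candidates.items():
        digest = sha1_of(data)
        if digest in seen:
            duplicates.append(name)
        else:
            seen.add(digest)
            new.append(name)
    return new, duplicates


existing = {sha1_of(b"already on Commons")}
batch = {
    "a.jpg": b"already on Commons",
    "b.jpg": b"fresh photo",
    "c.jpg": b"fresh photo",
}
print(split_duplicates(batch, existing))
```

A hash comparison like this only finds identical files; resized or re-encoded copies would need a different technique.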

Commons Helper 2.0

New version of Commons Helper that will be developed by Magnus.

Problem with old Commons Helper: Translating Wikipedia-specific templates to Commons templates.

Solution: Use a script to create a parse tree from the remote template, then construct a new Commons template from the parse tree.
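
A very reduced sketch of that idea is shown below, under strong assumptions: it only handles a flat template call (no nested templates or piped links inside parameter values), and the source template name and field mapping are hypothetical examples. The "parse tree" here is just a name plus a parameter dictionary, which is then rewritten as a Commons {{Information}} call.

```python
def parse_template(wikitext):
    """Parse a flat {{Name|key=value|...}} call into (name, params).

    Assumption: parameter values contain no "|" characters, so nested
    templates and piped links are not supported by this sketch.
    """
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    parts = text[2:-2].split("|")
    name = parts[0].strip()
    params = {}
    positional = 0
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
        else:
            # Unnamed parameters are numbered 1, 2, 3, ...
            positional += 1
            params[str(positional)] = part.strip()
    return name, params


# Hypothetical mapping from a local wiki's field names to Commons fields.
FIELD_MAP = {
    "Beschreibung": "description",
    "Quelle": "source",
    "Urheber": "author",
    "Datum": "date",
}


def to_commons_information(params, field_map):
    """Build an {{Information}} call from the mapped parameters."""
    lines = ["{{Information"]
    for local_name, commons_name in field_map.items():
        if local_name in params:
            lines.append(f"|{commons_name}={params[local_name]}")
    lines.append("}}")
    return "\n".join(lines)


name, params = parse_template("{{Bild|Beschreibung=Eiffelturm|Urheber=Alice}}")
print(name)
print(to_commons_information(params, FIELD_MAP))
```

A real implementation would need a proper wikitext parser to cope with nesting; the point of the sketch is only the two-step shape of parsing first and rebuilding second.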

Staging server

Set up a server to house files to be organized before they are pushed to Commons. This server would also host custom uploading interfaces tailored to specific users/institutions.

  • Make uploading easier:
    • FTP
    • Drag and Drop
    • Flash multi-file uploading
  • File format for mass upload description data and structured information
    • This may depend on having clean structured license metadata!
  • Allow creation of role accounts for institution representatives, contest participants, etc.
  • Feedback opportunity for Wikimedia users in the Staging Area regarding the Previews
  • Report Sheet: Number of images, number of images used, number of page views
  • Generic Reports on single images
  • Reports for Groups, set of images
  • Capacity planning for server and storage deployment as mass uploads increase
  • Allowing and Creating Role Accounts for Mass uploads
  • Reporting and exporting of changes back to the GLAM
  • Clickthrough statistics for links clicked to the GLAM institutions
  • Export images to Flickr
  • Implement institution specific templates
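
As an illustration of the "file format for mass upload description data" point, the sketch below assumes a simple CSV manifest with one row per file. The column names and the rule that a row without a license is held back are assumptions, not a decided format; they are only meant to show how the dependency on clean license metadata could be enforced in the staging area.

```python
import csv
import io

# Hypothetical required columns for a mass-upload manifest.
REQUIRED = ("filename", "title", "license")


def read_manifest(text):
    """Read a CSV manifest and split rows into accepted and rejected.

    A row is rejected when any required column is empty, for example
    when the license is missing.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")
    accepted, rejected = [], []
    for row in reader:
        if all((row[col] or "").strip() for col in REQUIRED):
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


manifest = (
    "filename,title,license\n"
    "a.jpg,Harbour at dawn,CC-BY-SA-3.0\n"
    "b.jpg,Untitled,\n"
)
accepted, rejected = read_manifest(manifest)
print(len(accepted), "accepted;", len(rejected), "held back for review")
```

Rejected rows could stay on the staging server until someone fills in the missing fields.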

Solutions analysis

What are the advantages and drawbacks of these solutions? (existing or not yet)

  • Efficiency in solving the issue
  • Drawbacks
  • Feasibility
  • ...


What issues should we focus on?

Short term

  • Improve FlickrRipper
  • More active development of Commonist
  • Make a web-based contests tool

Medium term

  • Create a web-based tool to upload lots of files
  • Create a web-based tool to upload lots of Flickr files

Long term

  • Integrate the web-based tools into Commons