Submissions/Editing challenges on multi-script wikis
This is an accepted submission for Wikimania 2017.
- Submission no. 3045 - T1, T3, TC
- Title of the submission
- Editing challenges on multi-script wikis
- Type of submission (lecture, panel, tutorial/workshop, roundtable discussion, lightning talk, poster, birds of a feather discussion)
- Lecture
- Author of the submission
- C. Scott Ananian (cscott)
- Language of presentation
- English
- E-mail address
- cananian@wikimedia.org
- Username
- cscott
- Country of origin
- USA
- Affiliation, if any (organisation, company etc.)
- Wikimedia Foundation
- Personal homepage or blog
- https://cscott.net
- Abstract (up to 300 words to describe your proposal)
- Editors who write in English, French, or another Western European language probably don't stop to think about the fact that they share the same writing system, the Latin script. But that's not the only way to write. Many may have heard of the Cyrillic script, used across eastern Europe and north and central Asia, including Russia. But there are hundreds of scripts in use around the world, and we have Wikipedia projects in over 50 different scripts.
- This talk is concerned with the subset of these projects where multiple scripts are used on the same wiki. In some places the same language is written in different ways by different speakers. For example, Serbian is written in either Latin or Cyrillic script. Standard Chinese is written in traditional or simplified scripts. Kurdish uses Latin, Arabic, or Cyrillic scripts.
- MediaWiki uses a technology called LanguageConverter to automatically transliterate between scripts, so that you can read one of these wikis in your choice of writing system. This avoids unnecessary forks of the wikis and makes our content available to more readers.
- LanguageConverter also allows you (to a slightly lesser degree) to create and edit articles in your choice of writing system. However, this quickly leads to wikitext that is a mixture of different writing systems. Unless all editors can read (and proofread) all the writing systems, article editability suffers as the number of interleaved contributions increases.
- This talk will describe how the Parsing team is updating LanguageConverter to better integrate it into core, how we are translating LanguageConverter markup into Parsoid's HTML5-based representation, and how we hope to use Parsoid technology to make a substantial improvement in native script editing, finally untangling the jumble of scripts.
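- As a rough illustration of what transliterating between scripts involves, the sketch below maps Serbian Cyrillic text to Latin one character at a time. It is not LanguageConverter's implementation: the table name `CYR_TO_LAT` and the function `to_latin` are hypothetical, and the approach is deliberately naive.

```python
# Hypothetical sketch: naive per-character Serbian Cyrillic -> Latin mapping.
# This is NOT how LanguageConverter is implemented; it only shows the idea.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic letters; leave everything else as-is."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            # Preserve capitalisation: an uppercase digraph letter such as
            # "Љ" becomes "Lj" (first letter capitalised only).
            if ch != lower:
                latin = latin.capitalize()
            out.append(latin)
        else:
            # Latin text, digits, punctuation and markup pass through untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_latin("Србија"))
    print(to_latin("Њујорк"))
```

The reverse direction is harder for a table like this: the Latin digraph "nj" can correspond either to the single letter "њ" or to the two letters "нј", so a plain lookup cannot always recover the Cyrillic spelling. Ambiguities of this kind are one reason mixed-script wikitext is difficult to convert and edit reliably.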
- What will attendees take away from this session?
- Attendees whose native language uses a single (Latin) script will learn a bit about our projects which use multiple writing systems.
- Everyone will come away with an understanding of the technology which lets MediaWiki convert between writing systems, some of its limitations, and some exciting improvements planned for the future!
- Theme of presentation
- Technology, Interface & Infrastructure
- For workshops and discussions, what level is the intended audience?
- Intermediate
- Length of session (if other than 25 minutes, specify how long)
- 25 minutes
- Will you attend Wikimania if your submission is not accepted?
- Yes
- Slides or further information (optional)
- Presentation slides, Slides w/ speaker notes, phab:T17161, phab:T113002, phab:T87652
- Special requests
- Is this Submission a Draft or Final?
Interested attendees
If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes. (# ~~~~).
- Amir É. Aharoni (talk) 09:56, 9 April 2017 (UTC)
- Birgit Müller (WMDE) (talk) 21:39, 25 April 2017 (UTC)
- Christoph Jauera (WMDE) (talk) 07:02, 26 April 2017 (UTC)
- SSastry (WMF) (talk) 18:43, 16 May 2017 (UTC)
- --Elitre (WMF) (talk) 13:34, 23 June 2017 (UTC)
- --Ziko (talk) 13:56, 12 August 2017 (UTC)
- -Krish Dulal (talk) 15:01, 12 August 2017 (UTC)