Computing and Language Variation: International Journal of Humanities and Arts Computing Volume 2
John Nerbonne and Charlotte Gooskens

Print publication date: 2009

Print ISBN-13: 9780748640300

Published to Edinburgh Scholarship Online: September 2012

DOI: 10.3366/edinburgh/9780748640300.001.0001

Printed from Edinburgh Scholarship Online (www.edinburgh.universitypressscholarship.com). (c) Copyright Edinburgh University Press, 2021. All Rights Reserved. An individual user may print out a PDF of a single chapter of a monograph in ESO for personal use. Date: 27 February 2021

Representing Tone in Levenshtein Distance

Cathryn Yang

Andy Castro

Edinburgh University Press

Levenshtein distance, also known as string edit distance, correlates strongly with both perceived distance and intelligibility in various Indo-European languages. This chapter describes the application of Levenshtein distance to dialect data from Bai, a Sino-Tibetan language, and Hongshuihe Zhuang, a Tai language. In applying Levenshtein distance to languages with contour tone systems, the chapter asks two questions: How much variation in intelligibility can tone alone explain? Which representation of tone results in the Levenshtein distance that shows the strongest correlation with intelligibility test results? The chapter evaluates six representations of tone: onset, contour and offset; onset and contour only; contour and offset only; target approximation; autosegments of H (high) and L (low); and Chao's (1930) pitch numbers. For both languages, the more fully explicit onset-contour-offset and onset-contour representations show significantly stronger inverse correlations with intelligibility. This suggests that, for cross-dialectal listeners, the optimal representation of tone in Levenshtein distance should be at a phonetically explicit level and include information on both onset and contour.
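
To illustrate why the choice of tone representation matters, the following is a minimal sketch of unit-cost Levenshtein distance applied to two encodings of the same tones. It is not the chapter's implementation: the function names (`levenshtein`, `chao_digits`, `onset_contour_offset`), the symbol inventory ("level", "rise", "fall"), and the example tone values are all assumptions made for illustration, and the onset-contour-offset encoding shown here looks only at the first and last pitch number, which oversimplifies complex contours.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two sequences of symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def chao_digits(tone: str) -> List[str]:
    """Hypothetical encoding: one symbol per Chao pitch number, e.g. '35' -> ['3', '5']."""
    return list(tone)


def onset_contour_offset(tone: str) -> List[str]:
    """Hypothetical encoding: onset pitch, contour shape, offset pitch.

    Assumption: the contour is derived only from the first and last pitch number.
    """
    first, last = int(tone[0]), int(tone[-1])
    if first == last:
        contour = "level"
    elif last > first:
        contour = "rise"
    else:
        contour = "fall"
    return [tone[0], contour, tone[-1]]


# A high level tone (55) compared with a mid rising tone (35).
d_digits = levenshtein(chao_digits("55"), chao_digits("35"))                 # 1
d_oco = levenshtein(onset_contour_offset("55"), onset_contour_offset("35"))  # 2
print(d_digits, d_oco)
```

Under these assumed encodings, the pitch-number representation registers only the difference in onset, while the onset-contour-offset representation also counts the change from a level to a rising contour. This is the kind of difference between representations that the chapter evaluates against intelligibility test results.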

Keywords: Levenshtein distance, tone, intelligibility, Bai, Hongshuihe Zhuang, contour, autosegments, pitch numbers, onset, offset
