On Recent Activities of Oriental COCOSDA

Institute of Information Sciences and Electronics,
University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305, Japan

Department of Applied Electronics, Science University of Tokyo
2641 Yamazaki, Noda, Chiba 278, Japan

At the COCOSDA meeting in Yokohama, Japan, on Sept. 23, 1994, S. Itahashi proposed that oriental countries set up an organization through which those concerned can exchange ideas, share information, and discuss regional matters in spoken language processing. The basic ideas are as follows:

1)Regional problems should be settled by regional effort.
2)There has been growing interest in oriental languages from Western countries.
3)There are many peculiarities in oriental languages which are different from European languages:

a)They are of much variety; they belong to different language families.
b)They use different orthographic systems such as Chinese characters, Korean syllabic alphabets and Japanese Kana alphabets.
c)There are various systems of romanization.
d)Correspondence between orthography and phonemic or phonetic description is not necessarily clear.
4)Oriental countries have regional continuity.

Several organizations already exist in each oriental country, as shown in Table 1, but so far there has been no communication among them. S. Itahashi discussed the above points with researchers from Korea and China. They agreed to the proposal that oriental countries should have an organization which coordinates efforts on problems related to speech corpora, speech recognition, speech synthesis, and speech I/O systems assessment methods.

The Oriental COCOSDA Preparatory Meeting was held at the University of Hong Kong on Friday, March 21, 1997. Eleven people from five countries and territories gathered in Hong Kong and discussed speech corpora development in oriental countries. The outline of the discussions is as follows:

H. Fujisaki gave an overview of COCOSDA and pointed out general and regional problems in corpus studies. He also presented a new large-scale Japanese JSPS project funded by a special grant for Research for the Future. He then clarified that "Oriental" could be defined in two ways, regional and linguistic (non-European). It was discussed that members of Oriental COCOSDA should be either those who live and work in oriental districts, speak and study oriental languages, or those who are interested in oriental language corpora and speech I/O standardization.

N. Campbell stressed the need to clarify the relationship between Oriental COCOSDA and the parent COCOSDA. It was understood that Oriental COCOSDA is a sub-organization of COCOSDA in the sense that the members of the former attend the meetings of the latter to report and discuss their activities. It was also said that COCOSDA should not be a funding organization but a pressure group to promote speech corpora creation and speech I/O standardization.

S. Itahashi presented several Japanese speech corpora including a noise database and a compact disc containing Japanese synthetic speech. T. Takezawa reported on speech and text corpora developed by ATR.

J. Zhang explained various Chinese projects on speech research and corpora as well as Chinese speech databases. He also gave a report on the National Performance Assessment of Speech I/O Systems in China.

Y. Lee introduced various organizations concerned with speech research in Korea and then presented Korean speech and text corpora.

H. Wang gave an overview of speech research and databases in Taiwan.

K. Shikano presented activities of the Large Vocabulary Continuous Speech Database Working Group established in the Information Processing Society of Japan.

S. Hayamizu explained activities in the Real World Computing Project by MITI, Japan.

C. Chan introduced the Chinese Lexicon and Speech/Character Databases at the University of Hong Kong. He gave a demonstration of a handwritten Chinese character recognition system.

A. Kurematsu detailed various issues in transcribing Japanese into Roman characters and problems in segmentation.

N. Campbell asked us to collect long-duration speech corpora in various languages for use in synthesis-by-compilation. He also reported on the results of a synthetic speech quality evaluation, which showed a separation between "likeable" and "intelligible" voices.

We plan to hold an Oriental COCOSDA Workshop in Tsukuba, Japan, in May 1998. Topics of interest are as follows:

1. Speech and Text Corpora
2. Speech Input Systems Assessment
3. Speech Output Systems Assessment
4. Romanization of Non-Roman Characters
5. Orthographic System of Oriental Languages
6. Prosodic and Segmental Notations

Table 1 Present organizations and projects related to spoken language corpora and speech I/O systems assessment
1)Speech Database Committee, Acoustical Society of Japan
2)Speech Input/Output Systems Expert Committee, JEIDA
JEIDA: Japan Electronic Industry Development Association
3)Intellectual Resources Working Group of RWCP
RWCP: Real World Computing Partnership, MITI(1992-2001)
4)JSPS Research Project by Special Grant for Promotion of Science and Technology for Exploration of Future "Man-Machine Dialogue Systems Through Spoken Language"
JSPS: Japan Society for the Promotion of Science
5)MESSC International Scientific Research Program: Joint Research on "Spoken Language Databases and Prosodic Labeling"
MESSC: Ministry of Education, Science, Sports and Culture
6)ATR Interpreting Telecommunications Research Laboratories
7)LRSI: Linguistic Resources Sharing Initiative
8)KCCSLP: Korean Coordinating Committee for Spoken Language Processing
9)KLE: Center for Korean Language Engineering
10)Chinese COCOSDA
11)Hong Kong COCOSDA
12)MAT: Mandarin Speech Across Taiwan Project

Last modified: Thu Oct 9 13:16:39 JST 1997