This is the page for the NAIST-NTT Ted Talk Treebank, a manually annotated treebank of TED talks created through a joint research project of NAIST and the NTT CS Lab. More information can be found in the following paper:
Graham Neubig, Katsuhito Sudoh, Yusuke Oda, Kevin Duh, Hajime Tsukada, Masaaki Nagata.
The NAIST-NTT Ted Talk Treebank
International Workshop on Spoken Language Translation (IWSLT). Lake Tahoe, USA. December 2014.
The first version of the corpus contains 1,217 sentences and 23,158 words, all manually annotated with parse trees. Specifically, the data includes:
You can download the data below. Like TED content, the data is available under the Creative Commons Attribution-NonCommercial-ShareAlike License.
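After downloading, one quick sanity check is to recount the sentences and words in the release. The sketch below assumes, without confirmation from this page, that the trees are stored in Penn Treebank-style bracketed form, where a file holds one or more trees and each tree may span several lines. The names `iter_trees`, `count_tokens`, and `corpus_stats` are hypothetical, and treating every `(TAG word)` leaf as one word is also an assumption, so your totals may differ from the official figures if the release counts punctuation or empty elements differently.

```python
import re

# Assumption: a leaf looks like "(TAG word)", i.e. a tag followed by a single
# token with no nested brackets. Punctuation leaves such as "(. .)" are counted.
LEAF = re.compile(r"\(\s*[^\s()]+\s+([^\s()]+)\s*\)")


def iter_trees(text):
    """Yield each top-level bracketed tree in `text`.

    Trees may span several lines; anything between top-level trees is ignored.
    Raises ValueError if the brackets are unbalanced.
    """
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # beginning of a new top-level tree
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]
            elif depth < 0:
                raise ValueError("unbalanced ')' at offset %d" % i)
    if depth != 0:
        raise ValueError("unbalanced '(' at end of input")


def count_tokens(tree):
    """Count the (TAG word) leaves in one bracketed tree."""
    return len(LEAF.findall(tree))


def corpus_stats(text):
    """Return (number_of_trees, number_of_leaf_tokens) for a treebank string."""
    trees = list(iter_trees(text))
    return len(trees), sum(count_tokens(t) for t in trees)


# Hypothetical sample in Penn Treebank-style bracketing (not taken from the corpus).
SAMPLE = """
( (S (NP (PRP I))
     (VP (VBP love) (NP (NNP TED)))
     (. .)) )
(S (INTJ (UH Thank) (PRP you)) (. .))
"""

print(corpus_stats(SAMPLE))
```

To check a downloaded file, you could pass its contents in, for example `corpus_stats(open(path, encoding="utf-8").read())`, where `path` is whatever file name the release actually uses. A simple bracket-depth scan is used instead of a full tree parser so the sketch has no dependencies; a library such as NLTK would give richer tree objects if you need more than counts.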
Any questions about the corpus can be directed to Graham Neubig (neubig at is.naist.jp).