- All Implemented Interfaces:
- Function<List<HasWord>,List<HasWord>>
public class ChineseEscaper
extends Object
implements Function<List<HasWord>,List<HasWord>>
An Escaper that normalizes Chinese text to match the Treebank.
Currently it converts "ASCII" characters into the full-width
range used in the Penn Chinese Treebank (CTB).
Notes: Smart quotes appear in CTB and are left unchanged.
I think various hyphen types from the U+2000 range occur too; certainly,
Roger lists them in LanguagePack.
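The ASCII-to-full-width normalization described above can be sketched as follows. This is a minimal, hypothetical example, not ChineseEscaper's actual code: the class name FullWidthSketch is made up, it works on plain String tokens instead of CoreNLP's HasWord so that it has no dependencies, and it assumes that every printable non-space ASCII character (U+0021 to U+007E) is mapped into the Unicode full-width forms block by adding the fixed offset 0xFEE0. Everything else, including smart quotes and the U+2000-range hyphens, passes through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch (not the real ChineseEscaper): maps printable ASCII
 * (U+0021..U+007E) to the full-width forms block (U+FF01..U+FF5E).
 * Uses String tokens instead of CoreNLP's HasWord to stay self-contained.
 */
public class FullWidthSketch implements Function<List<String>, List<String>> {

  // Distance from an ASCII character to its full-width counterpart,
  // e.g. '!' (U+0021) + 0xFEE0 = U+FF01 (full-width exclamation mark).
  private static final int FULL_WIDTH_OFFSET = 0xFEE0;

  /** Converts every printable, non-space ASCII character in s to full width. */
  public static String toFullWidth(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= '!' && c <= '~') {
        // Assumption: letters, digits and punctuation are all converted.
        sb.append((char) (c + FULL_WIDTH_OFFSET));
      } else {
        // Spaces, Chinese characters, smart quotes, etc. are kept as-is.
        sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Applies toFullWidth to each token, returning a new list. */
  @Override
  public List<String> apply(List<String> tokens) {
    List<String> out = new ArrayList<>(tokens.size());
    for (String token : tokens) {
      out.add(toFullWidth(token));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> tokens = List.of("GDP", "增长", "3.5%", "\u201C好\u201D");
    // The ASCII tokens come back in full width; the Chinese characters
    // and the smart-quoted token are unchanged.
    System.out.println(new FullWidthSketch().apply(tokens));
  }
}
```

A fixed offset works because the full-width block U+FF01 to U+FF5E mirrors the order of ASCII U+0021 to U+007E. The ASCII space has no counterpart in that block (its closest analogue is the ideographic space U+3000), so the sketch leaves it alone.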
- Author:
- Christopher Manning