T - The class of the returned tokens
public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord> extends Object implements TokenizerFactory<T>
See PTBTokenizer for details of the parameters and options.
See also: PTBTokenizer, Serialized Form

Modifier and Type | Field and Description |
---|---|
protected LexedTokenFactory<T> | factory |
protected String | options |

Modifier and Type | Method and Description |
---|---|
Iterator<T> | getIterator(Reader r): Returns a tokenizer wrapping the given Reader. |
Tokenizer<T> | getTokenizer(Reader r): Returns a tokenizer wrapping the given Reader. |
Tokenizer<T> | getTokenizer(Reader r, String extraOptions): Gets a tokenizer for this reader. |
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> | newCoreLabelTokenizerFactory(String options): Constructs a new PTBTokenizerFactory whose tokenizers return CoreLabel objects and use the options passed in. |
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> | newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible) |
static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> | newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, String options): Constructs a new PTBTokenizerFactory whose tokenizers use the LexedTokenFactory and options passed in. |
static TokenizerFactory<Word> | newTokenizerFactory(): Constructs a new TokenizerFactory that returns Word objects and treats carriage returns as normal whitespace. |
static PTBTokenizer.PTBTokenizerFactory<Word> | newWordTokenizerFactory(String options): Constructs a new PTBTokenizerFactory whose tokenizers return Word objects and use the options passed in. |
void | setOptions(String options): Sets default options for how tokenizers built from this factory should behave. |
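
As a minimal sketch of how the methods summarized above fit together, the following creates a factory with newCoreLabelTokenizerFactory, wraps a Reader with getTokenizer, and walks the resulting tokens. It assumes Stanford CoreNLP is on the classpath and that the classes live in the edu.stanford.nlp.process and edu.stanford.nlp.ling packages. The class name PtbFactoryQuickStart, the helper tokenize, and the sample sentence are illustrative only.

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.process.TokenizerFactory;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; requires the Stanford CoreNLP jar on the classpath.
class PtbFactoryQuickStart {

  // Tokenizes text with default options and returns the surface form of each token.
  static List<String> tokenize(String text) {
    // An empty options String requests the default PTB-style behavior.
    TokenizerFactory<CoreLabel> factory =
        PTBTokenizer.PTBTokenizerFactory.newCoreLabelTokenizerFactory("");

    // The factory wraps any Reader; a StringReader is used here for brevity.
    Tokenizer<CoreLabel> tokenizer = factory.getTokenizer(new StringReader(text));

    // A Tokenizer is an Iterator over tokens, so it can be drained with hasNext/next.
    List<String> words = new ArrayList<>();
    while (tokenizer.hasNext()) {
      words.add(tokenizer.next().word());
    }
    return words;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("Hello, world."));
  }
}
```

One factory can be reused for many Readers, which is the main reason to hold a factory rather than constructing tokenizers directly.
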
protected final LexedTokenFactory<T extends HasWord> factory
protected String options
public static TokenizerFactory<Word> newTokenizerFactory()
public static PTBTokenizer.PTBTokenizerFactory<Word> newWordTokenizerFactory(String options)
options - A String of options
public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newCoreLabelTokenizerFactory(String options)
options - A String of options. For the default, recommended options for PTB-style tokenization compatibility, pass in an empty String.
public static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, String options)
tokenFactory - The LexedTokenFactory
options - A String of options
public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible)
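
The static factory methods above differ mainly in the token type they produce and in how options are supplied. The sketch below contrasts two of them. It assumes Stanford CoreNLP is on the classpath, that CoreLabelTokenFactory (from edu.stanford.nlp.process) is a suitable LexedTokenFactory<CoreLabel>, and that the option strings "tokenizeNLs=true" and "invertible=true" mirror the two boolean parameters of newPTBTokenizerFactory(boolean, boolean). The class and method names are hypothetical.

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.TokenizerFactory;

import java.io.StringReader;
import java.util.List;

// Hypothetical demo class; requires the Stanford CoreNLP jar on the classpath.
class PtbFactoryVariants {

  // Word tokens, with line ends turned into tokens instead of being treated as whitespace.
  static List<Word> wordsKeepingNewlines(String text) {
    TokenizerFactory<Word> factory =
        PTBTokenizer.PTBTokenizerFactory.newWordTokenizerFactory("tokenizeNLs=true");
    return factory.getTokenizer(new StringReader(text)).tokenize();
  }

  // CoreLabel tokens built by an explicitly supplied LexedTokenFactory.
  // With invertible=true each token also records its original text and surrounding whitespace.
  static List<CoreLabel> invertibleLabels(String text) {
    TokenizerFactory<CoreLabel> factory =
        PTBTokenizer.PTBTokenizerFactory.newPTBTokenizerFactory(
            new CoreLabelTokenFactory(), "invertible=true");
    return factory.getTokenizer(new StringReader(text)).tokenize();
  }

  public static void main(String[] args) {
    for (Word w : wordsKeepingNewlines("one\ntwo")) {
      System.out.println(w.word());
    }
    for (CoreLabel label : invertibleLabels("Hello, world.")) {
      System.out.println(label.word() + " <- " + label.originalText());
    }
  }
}
```

Choosing Word keeps tokens lightweight, while CoreLabel is the better fit when later processing needs offsets or the original text.
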
public Iterator<T> getIterator(Reader r)
Specified by: getIterator in interface IteratorFromReaderFactory<T extends HasWord>
r - Where to read objects from
public Tokenizer<T> getTokenizer(Reader r)
Specified by: getTokenizer in interface TokenizerFactory<T extends HasWord>
r - A Reader (which is assumed to already be buffered, if appropriate)
public Tokenizer<T> getTokenizer(Reader r, String extraOptions)
Specified by: getTokenizer in interface TokenizerFactory<T extends HasWord>
r - A Reader (which is assumed to already be buffered, if appropriate)
extraOptions - Options for how this tokenizer should behave
public void setOptions(String options)
Specified by: setOptions in interface TokenizerFactory<T extends HasWord>
options - Options for how this tokenizer should behave
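
Because getTokenizer assumes the Reader it receives is already buffered where that matters, callers reading from files or sockets may want to wrap the Reader first. The helper below is one possible way to do that using only the Java standard library. The class name ReaderBuffering and the method ensureBuffered are hypothetical and not part of the tokenizer API.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of the tokenizer API.
class ReaderBuffering {

  // Returns r unchanged if it is already a BufferedReader; otherwise wraps it in one.
  static Reader ensureBuffered(Reader r) {
    return (r instanceof BufferedReader) ? r : new BufferedReader(r);
  }

  public static void main(String[] args) {
    Reader raw = new StringReader("Some text to tokenize.");
    Reader buffered = ensureBuffered(raw);
    // The buffered Reader is what would be handed to factory.getTokenizer(buffered).
    System.out.println(buffered instanceof BufferedReader);
  }
}
```

For in-memory sources such as a StringReader the extra buffer brings little benefit, which is presumably why the parameter description says "if appropriate".
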