public class ChineseNumberSequenceClassifier extends AbstractSequenceClassifier<CoreLabel>
A rule-based counterpart of NumberSequenceClassifier (without using SUTime) that works on Chinese token sequences.
TODO: An interface needs to be used to reuse code for NumberSequenceClassifier
TODO: Ideally a Chinese version of SUTime needs to be used to provide more flexibility and accuracy.

Modifier and Type | Field and Description |
---|---|
static Pattern | CURRENCY_WORD_PATTERN |
static String[] | CURRENCY_WORDS_VALUES |
static Pattern | DATE_PATTERN1 |
static Pattern | DATE_PATTERN2 |
static Pattern | DATE_PATTERN3 |
static Pattern | DATE_PATTERN4 |
static Pattern | DATE_PATTERN5 |
static String | DATE_TAG |
static HashSet<String> | DATE_WORDS |
static String[] | DATE_WORDS_VALUES |
static String | MONEY_TAG |
static String | NUMBER_TAG |
static String | ORDINAL_TAG |
static String | PERCENT_TAG |
static Pattern | PERCENT_WORD_PATTERN1 |
static Pattern | PERCENT_WORD_PATTERN2 |
static String | SUTIME_PROPERTY |
static Pattern | TIME_PATTERN1 |
static String | TIME_TAG |
static HashSet<String> | TIME_WORDS |
static String[] | TIME_WORDS_VALUES |
static boolean | USE_SUTIME_DEFAULT |
static String | USE_SUTIME_PROPERTY |
static String | USE_SUTIME_PROPERTY_BASE |
Fields inherited from class AbstractSequenceClassifier: classIndex, featureFactories, flags, knownLCWords, pad, windowSize
Constructor and Description |
---|
ChineseNumberSequenceClassifier() |
ChineseNumberSequenceClassifier(boolean useSUTime) |
ChineseNumberSequenceClassifier(Properties props, boolean useSUTime, Properties sutimeProps) |
Modifier and Type | Method and Description |
---|---|
List<CoreLabel> | classify(List<CoreLabel> document): Use a set of heuristic rules to assign NER tags to tokens. |
List<CoreLabel> | classifyWithGlobalInformation(List<CoreLabel> tokenSequence, CoreMap document, CoreMap sentence) |
void | loadClassifier(ObjectInputStream in, Properties props): Load a classifier from the specified input stream. |
static void | main(String[] args) |
void | serializeClassifier(ObjectOutputStream oos): Serialize a sequence classifier to an object output stream. |
void | serializeClassifier(String serializePath): Serialize a sequence classifier to a file on the given path. |
void | train(Collection<List<CoreLabel>> docs, DocumentReaderAndWriter<CoreLabel> readerAndWriter): Trains a classifier from a Collection of sequences. |
Methods inherited from class AbstractSequenceClassifier: apply, backgroundSymbol, classify, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswersKBest, classifyAndWriteAnswersKBest, classifyAndWriteViterbiSearchGraph, classifyFile, classifyFilesAndWriteAnswers, classifyFilesAndWriteAnswers, classifyKBest, classifyRaw, classifySentence, classifySentenceWithGlobalInformation, classifyStdin, classifyStdin, classifyToCharacterOffsets, classifyToString, classifyToString, classifyWithInlineXML, countResults, countResultsSegmenter, defaultReaderAndWriter, dumpFeatures, finalizeClassification, getKnownLCWords, getSampler, getSequenceModel, labels, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, makeObjectBankFromFile, makeObjectBankFromFile, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromReader, makeObjectBankFromString, makePlainTextReaderAndWriter, makeReaderAndWriter, plainTextReaderAndWriter, printFeatureLists, printFeatures, printProbs, printProbs, printProbsDocument, printProbsDocuments, printResults, reinit, segmentString, segmentString, train, train, train, train, train, train, windowSize, writeAnswers
public static final boolean USE_SUTIME_DEFAULT
public static final String USE_SUTIME_PROPERTY
public static final String USE_SUTIME_PROPERTY_BASE
public static final String SUTIME_PROPERTY
public static final String NUMBER_TAG
public static final String DATE_TAG
public static final String TIME_TAG
public static final String MONEY_TAG
public static final String ORDINAL_TAG
public static final String PERCENT_TAG
public static final Pattern CURRENCY_WORD_PATTERN
public static final Pattern PERCENT_WORD_PATTERN1
public static final Pattern PERCENT_WORD_PATTERN2
public static final Pattern DATE_PATTERN1
public static final Pattern DATE_PATTERN2
public static final Pattern DATE_PATTERN3
public static final Pattern DATE_PATTERN4
public static final Pattern DATE_PATTERN5
public static final Pattern TIME_PATTERN1
public static final String[] CURRENCY_WORDS_VALUES
public static final String[] DATE_WORDS_VALUES
public static final String[] TIME_WORDS_VALUES
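The Pattern and tag constants above suggest that each entity category is recognized by one or more regular expressions. The following is a minimal sketch of that idea, not the class's actual implementation: the pattern bodies, the tag values ("DATE", "PERCENT", "O"), and the names PatternTagSketch, DATE_SKETCH, and PERCENT_SKETCH are all assumptions, since this page lists the constants without their values. Chinese characters are written as Unicode escapes so the source compiles under any file encoding: \u5e74, \u6708, \u65e5 are 年, 月, 日, and \u767e\u5206\u4e4b is 百分之.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in; the real pattern values are not shown on this page.
class PatternTagSketch {
    static final String DATE_TAG = "DATE";       // assumed tag value
    static final String PERCENT_TAG = "PERCENT"; // assumed tag value
    static final String OTHER = "O";             // assumed background symbol

    // Assumed shape: Arabic digits followed by nian (year), yue (month) or ri (day).
    static final Pattern DATE_SKETCH =
        Pattern.compile("[0-9]+[\u5e74\u6708\u65e5]");

    // Assumed shape: "bai fen zhi" (percent) followed by at least one character.
    static final Pattern PERCENT_SKETCH =
        Pattern.compile("\u767e\u5206\u4e4b.+");

    // Returns the tag of the first pattern that matches the whole token.
    static String tag(String token) {
        if (DATE_SKETCH.matcher(token).matches()) return DATE_TAG;
        if (PERCENT_SKETCH.matcher(token).matches()) return PERCENT_TAG;
        return OTHER;
    }

    public static void main(String[] args) {
        String[] tokens = {"2015\u5e74", "\u767e\u5206\u4e4b\u4e94", "\u5317\u4eac"};
        for (String token : tokens) {
            System.out.println(tag(token));
        }
    }
}
```

Each pattern is tested with `matches()`, so a token is tagged only when the whole token fits the pattern, not a substring of it.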
public ChineseNumberSequenceClassifier()
public ChineseNumberSequenceClassifier(boolean useSUTime)
public ChineseNumberSequenceClassifier(Properties props, boolean useSUTime, Properties sutimeProps)
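The three-argument constructor takes a Properties object alongside an explicit useSUTime flag, and the class exposes USE_SUTIME_PROPERTY and USE_SUTIME_DEFAULT constants. A caller might resolve such a flag from Properties with a fallback default, as in the sketch below. The key string "ner.useSUTime" and the default of false are assumptions made for illustration (this page does not show the constant values), and UseSUTimeFlagSketch is a hypothetical helper, not part of the library.

```java
import java.util.Properties;

// Hypothetical helper; not part of the documented class.
class UseSUTimeFlagSketch {
    static final String FLAG_KEY = "ner.useSUTime"; // assumed key, not confirmed by this page
    static final boolean FLAG_DEFAULT = false;      // assumed default, not confirmed by this page

    // Reads the flag from props, falling back to the default when the key is absent.
    static boolean resolve(Properties props) {
        String raw = props.getProperty(FLAG_KEY, Boolean.toString(FLAG_DEFAULT));
        return Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve(props)); // key absent, default applies
        props.setProperty(FLAG_KEY, "true");
        System.out.println(resolve(props)); // explicit value wins
    }
}
```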
public List<CoreLabel> classify(List<CoreLabel> document)
Use a set of heuristic rules to assign NER tags to tokens.
classify in class AbstractSequenceClassifier<CoreLabel>
Parameters: document - A List of something that extends CoreMap
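To make "heuristic rules" concrete, here is a minimal sketch of a sequence-level rule in which a token's tag depends on its neighbor: a number followed by a currency word is tagged as money together with that word. This is an illustration only and makes no claim about the rules the class actually applies. The class name HeuristicSequenceSketch, the currency word set (\u5143 is 元, \u7f8e\u5143 is 美元), and the tag strings are all assumptions, and plain strings stand in for CoreLabel tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; plain strings stand in for CoreLabel tokens.
class HeuristicSequenceSketch {
    // Assumed currency words: yuan and meiyuan (US dollar).
    static final Set<String> CURRENCY_WORDS =
        new HashSet<>(Arrays.asList("\u5143", "\u7f8e\u5143"));

    // Returns one tag per input token, in order.
    static List<String> classify(List<String> tokens) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            boolean isNumber = token.matches("[0-9]+(\\.[0-9]+)?");
            boolean nextIsCurrency = i + 1 < tokens.size()
                && CURRENCY_WORDS.contains(tokens.get(i + 1));
            boolean followsMoneyNumber = i > 0
                && CURRENCY_WORDS.contains(token)
                && "MONEY".equals(tags.get(i - 1));

            if (isNumber && nextIsCurrency) {
                tags.add("MONEY");   // number directly before a currency word
            } else if (followsMoneyNumber) {
                tags.add("MONEY");   // currency word that closes a money expression
            } else if (isNumber) {
                tags.add("NUMBER");  // number with no currency context
            } else {
                tags.add("O");       // assumed background symbol
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("100", "\u5143")));
    }
}
```

A currency word on its own stays untagged in this sketch, because the rule fires only when the preceding token was already tagged as money.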
public List<CoreLabel> classifyWithGlobalInformation(List<CoreLabel> tokenSequence, CoreMap document, CoreMap sentence)
Description copied from class: AbstractSequenceClassifier
Classify a List of something that extends CoreMap, using as additional information whatever is stored in the document and sentence. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.
classifyWithGlobalInformation in class AbstractSequenceClassifier<CoreLabel>
Parameters: tokenSequence - A List of something that extends CoreMap
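The need for a document date can be illustrated with a small sketch: a relative expression such as "tomorrow" has no fixed calendar value until it is anchored to the date of the document. The code below is neither SUTime nor this class's logic. RelativeDateSketch, the DAY_OFFSETS table, and the three example words (\u6628\u5929 is 昨天, \u4eca\u5929 is 今天, \u660e\u5929 is 明天) are assumptions chosen for illustration.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of anchoring relative date words to a document date.
class RelativeDateSketch {
    // Assumed day offsets: zuotian (yesterday), jintian (today), mingtian (tomorrow).
    static final Map<String, Integer> DAY_OFFSETS = Map.of(
        "\u6628\u5929", -1,
        "\u4eca\u5929", 0,
        "\u660e\u5929", 1);

    // Resolves a relative date word against the document date, if the word is known.
    static Optional<LocalDate> resolve(String word, LocalDate documentDate) {
        Integer offset = DAY_OFFSETS.get(word);
        if (offset == null) {
            return Optional.empty();
        }
        return Optional.of(documentDate.plusDays(offset));
    }

    public static void main(String[] args) {
        LocalDate documentDate = LocalDate.of(2015, 12, 31);
        System.out.println(resolve("\u660e\u5929", documentDate));
    }
}
```

The same word resolves to different dates for different documents, which is why the date has to travel with the document rather than with the token sequence.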
public void train(Collection<List<CoreLabel>> docs, DocumentReaderAndWriter<CoreLabel> readerAndWriter)
Description copied from class: AbstractSequenceClassifier
Trains a classifier from a Collection of sequences.
train in class AbstractSequenceClassifier<CoreLabel>
Parameters:
docs - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files
public void serializeClassifier(String serializePath)
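Description copied from class: AbstractSequenceClassifier
Serialize a sequence classifier to a file on the given path.
serializeClassifier in class AbstractSequenceClassifier<CoreLabel>
Parameters: serializePath - The path/filename to write the classifier to.
public void serializeClassifier(ObjectOutputStream oos)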
Description copied from class: AbstractSequenceClassifier
Serialize a sequence classifier to an object output stream.
serializeClassifier in class AbstractSequenceClassifier<CoreLabel>
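The stream-based serializeClassifier(ObjectOutputStream) and loadClassifier(ObjectInputStream, Properties) signatures follow the standard Java object serialization contract. As a rough sketch of that contract, the code below round-trips a small Serializable object through an in-memory byte array. ToyModel and SerializationSketch are hypothetical stand-ins, not CoreNLP classes, and nothing here describes what this rule-based classifier actually writes to the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch of an ObjectOutputStream / ObjectInputStream round trip.
class SerializationSketch {
    // Stand-in for classifier state; not a CoreNLP class.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String[] words;

        ToyModel(String[] words) {
            this.words = words;
        }
    }

    // Writes the model to an in-memory buffer instead of a file.
    static byte[] serialize(ToyModel model) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(model);
        }
        return buffer.toByteArray();
    }

    // Mirrors the exception list of loadClassifier: the cast fails with
    // ClassCastException if the stream holds an object of another type.
    static ToyModel load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ToyModel) in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        ToyModel restored = load(serialize(new ToyModel(new String[] {"a", "b"})));
        System.out.println(restored.words.length);
    }
}
```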
public void loadClassifier(ObjectInputStream in, Properties props) throws IOException, ClassCastException, ClassNotFoundException
Description copied from class: AbstractSequenceClassifier
Load a classifier from the specified input stream.
loadClassifier in class AbstractSequenceClassifier<CoreLabel>
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
IOException - If there are problems accessing the input stream
ClassCastException - If there are problems interpreting the serialized data
ClassNotFoundException - If there are problems interpreting the serialized data
public static void main(String[] args) throws IOException
Throws: IOException