public class LexerUtils extends Object
Modifier and Type | Class and Description |
---|---|
static class | LexerUtils.DashesEnum |
static class | LexerUtils.EllipsesEnum |
static class | LexerUtils.QuotesEnum |
Modifier and Type | Method and Description |
---|---|
static String | asciiQuotes(String in): Converts all single- and double-quote-like characters to the ASCII quote characters ' and ". |
static String | escapeChar(String s, char c): Escapes a character with a backslash, unless the character is already preceded by a backslash. |
static String | handleDashes(String tok, LexerUtils.DashesEnum dashesStyle) |
static String | handleEllipsis(String tok, LexerUtils.EllipsesEnum ellipsesStyle) |
static String | handleQuotes(String tok, boolean probablyLeft, LexerUtils.QuotesEnum quoteStyle) |
static String | minimallyNormalizeCurrency(String in): Minimal currency normalization that still converts at least the cp1252 euro symbol to the Unicode one. |
static String | normalizeAmp(String in): Converts an XML-escaped ampersand back into an ampersand. |
static String | normalizeCurrency(String in) |
static String | normalizeFractions(boolean normalizeFractions, boolean escapeForwardSlashAsterisk, String in): Changes precomposed fraction characters to spelled-out letter forms. |
static String | pennNormalizeParens(String input, boolean normalizeParentheses) |
static String | processCp1252misc(String arg) |
static String | removeSoftHyphens(String in) |
public static String normalizeFractions(boolean normalizeFractions, boolean escapeForwardSlashAsterisk, String in)
normalizeFractions - If false, do nothing; if true, normalize to an ASCII character sequence
escapeForwardSlashAsterisk - If true, also escape the forward slash with a backslash (deprecated historical PTB convention)
in - The input string to normalize
public static String minimallyNormalizeCurrency(String in)
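The summary for minimallyNormalizeCurrency only says that the cp1252 euro symbol is turned into the Unicode one. A minimal sketch of that single step follows. It assumes the cp1252 euro byte (0x80) was decoded without cp1252 awareness and therefore appears in the string as U+0080. The class and method names are hypothetical, and this is not the library's implementation.

```java
// Hypothetical sketch; not the LexerUtils source.
class EuroSketch {

    // Assumption: a cp1252 euro byte (0x80) read as Latin-1 shows up
    // in a Java String as the control character U+0080.
    static String fixCp1252Euro(String in) {
        // Replace every U+0080 with the Unicode euro sign U+20AC.
        return in.replace('\u0080', '\u20AC');
    }

    public static void main(String[] args) {
        String broken = "\u0080" + "50";
        System.out.println(fixCp1252Euro(broken));
    }
}
```

Strings that contain no U+0080 character pass through unchanged.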
public static String normalizeAmp(String in)
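normalizeAmp is summarized as converting an XML-escaped ampersand back into an ampersand. The sketch below illustrates that idea under one assumption that the summary does not state: the entity is matched case-insensitively. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the LexerUtils source.
class AmpSketch {

    // Assumption: "&amp;" is matched regardless of letter case.
    private static final Pattern AMP =
        Pattern.compile("&amp;", Pattern.CASE_INSENSITIVE);

    static String unescapeAmp(String in) {
        // The replacement "&" contains no '$' or '\', so it is taken literally.
        return AMP.matcher(in).replaceAll("&");
    }

    public static void main(String[] args) {
        System.out.println(unescapeAmp("AT&amp;T"));
    }
}
```

A bare ampersand that is not part of the entity is left as it is.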
public static String escapeChar(String s, char c)
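escapeChar is summarized as putting a backslash before a character unless that character is already preceded by one. A minimal sketch of that rule is shown here. The class and method names are hypothetical, and the handling of edge cases (such as a doubled backslash before the character) is an assumption rather than documented behavior.

```java
// Hypothetical sketch; not the LexerUtils source.
class EscapeSketch {

    static String escapeCharSketch(String s, char c) {
        StringBuilder sb = new StringBuilder(s.length() + 4);
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // Insert a backslash only when the previous character is not one.
            boolean alreadyEscaped = i > 0 && s.charAt(i - 1) == '\\';
            if (ch == c && !alreadyEscaped) {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeCharSketch("a/b", '/'));
    }
}
```

Applying the sketch twice gives the same result as applying it once, because an escaped character is skipped on the second pass.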
public static String asciiQuotes(String in)
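asciiQuotes is summarized as mapping single- and double-quote-like characters to ' and ". The sketch below shows the general shape of such a mapping. The set of characters it covers (a few Unicode curly and low-9 quotes) is an assumption chosen for illustration, and the set handled by the library may differ. The class and method names are hypothetical.

```java
// Hypothetical sketch; not the LexerUtils source.
class AsciiQuotesSketch {

    static String toAsciiQuotes(String in) {
        StringBuilder sb = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            switch (ch) {
                // Assumed single-quote-like characters.
                case '\u2018': case '\u2019': case '\u201A': case '\u201B':
                    sb.append('\'');
                    break;
                // Assumed double-quote-like characters.
                case '\u201C': case '\u201D': case '\u201E':
                    sb.append('"');
                    break;
                default:
                    sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAsciiQuotes("\u201CHi\u201D"));
    }
}
```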
public static String handleQuotes(String tok, boolean probablyLeft, LexerUtils.QuotesEnum quoteStyle)
public static String handleEllipsis(String tok, LexerUtils.EllipsesEnum ellipsesStyle)
public static String handleDashes(String tok, LexerUtils.DashesEnum dashesStyle)