This article collects typical usage examples of the Java class cmu.arktweetnlp.Twokenize. If you are wondering how the Twokenize class is used in practice, the selected examples below may help.
The Twokenize class belongs to the cmu.arktweetnlp package. Four code examples of the class are shown below, sorted by popularity by default.
Example 1: process
Likes: 3
import cmu.arktweetnlp.Twokenize; // import the required package/class
import java.util.List;
@Override
public void process(CAS cas)
    throws AnalysisEngineProcessException
{
    String text = cas.getDocumentText();
    // NOTE: Twokenize also provides an API call that normalizes the text first, but that
    // would require mapping the normalized tokens back to the text as it appears in the
    // CAS object. Because HTML escaping makes that mapping messy, we use the call that
    // performs no normalization.
    List<String> tokens = Twokenize.tokenize(text);
    int offset = 0;
    for (String t : tokens) {
        // Search from the end of the previous token so repeated tokens map in order.
        int start = text.indexOf(t, offset);
        if (start < 0) {
            // The token is not a literal substring of the remaining text; skip it
            // rather than create an annotation with a negative offset.
            continue;
        }
        int end = start + t.length();
        createTokenAnnotation(cas, start, end);
        offset = end;
    }
}
Developer: UKPLab,
Project: argument-reasoning-comprehension-task,
Lines of code: 20,
Source file: ArkTweetTokenizerFixed.java
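The offset bookkeeping in the example above can be tried in isolation. The following is a minimal sketch, not the project's code: the class name `TokenOffsets` and the method `align` are hypothetical, and the token list is written by hand as a stand-in for tokenizer output so that the snippet runs without the ark-tweet-nlp dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
final class TokenOffsets {

    /** Returns one {start, end} pair per token; end is exclusive. */
    static List<int[]> align(String text, List<String> tokens) {
        List<int[]> spans = new ArrayList<>();
        int offset = 0;
        for (String token : tokens) {
            // Search only from the end of the previous token, so a repeated
            // token maps to its next occurrence instead of the first one.
            int start = text.indexOf(token, offset);
            if (start < 0) {
                throw new IllegalArgumentException(
                        "Token not found at or after offset " + offset + ": " + token);
            }
            int end = start + token.length();
            spans.add(new int[] {start, end});
            offset = end;
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "so cool :) so cool";
        // Assumption: a hand-written token list standing in for tokenizer output.
        List<String> tokens = Arrays.asList("so", "cool", ":)", "so", "cool");
        for (int[] span : align(text, tokens)) {
            System.out.println(span[0] + "-" + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

This approach only works when every token is a literal substring of the text, which is why the example above avoids the normalizing call: once HTML entities are unescaped, a token such as `&` no longer matches the `&amp;` in the original document.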
Example 2: TweetObject
Likes: 2
import cmu.arktweetnlp.Twokenize; // import the required package/class
public TweetObject(String text) {
    // Drop every character outside printable ASCII (space through '~') before tokenizing.
    text = text.replaceAll("[^ -~]", "");
    this.tokens = Twokenize.tokenizeRawTweetText(text);
    // Alternative: plain whitespace split.
    // this.tokens = Arrays.asList(text.split("\\s"));
}
Developer: uiuc-ischool-scanr,
Project: SAIL,
Lines of code: 7,
Source file: DictionaryFeatures.java
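The regular expression `[^ -~]` in the constructor above matches any character outside the range from space (U+0020) to tilde (U+007E). The sketch below shows its effect on a sample string; the class name `AsciiFilter` and the method `stripNonPrintableAscii` are hypothetical names introduced for illustration.

```java
// Hypothetical helper, for illustration only.
final class AsciiFilter {

    /** Removes every character outside the printable ASCII range U+0020..U+007E. */
    static String stripNonPrintableAscii(String text) {
        return text.replaceAll("[^ -~]", "");
    }

    public static void main(String[] args) {
        // Contains an accented letter, a non-ASCII symbol, and a trailing newline.
        String raw = "caf\u00e9 \u2615 time\n";
        String cleaned = stripNonPrintableAscii(raw);
        // Brackets make the surviving whitespace visible.
        System.out.println("[" + cleaned + "]");
    }
}
```

Note that this filter also deletes tabs and newlines, because they fall below U+0020, and it removes accented letters and emoji outright instead of transliterating them. That may or may not be desirable for tweet text, where emoji often carry sentiment.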
Example 3: tokenize
Likes: 1
import cmu.arktweetnlp.Twokenize; // import the required package/class
import java.util.List;
/**
 * Sets the string to tokenize. Tokenization happens immediately.
 *
 * @param s the string to tokenize
 */
@Override
public void tokenize(String s) {
    List<String> words = Twokenize.tokenizeRawTweetText(s);
    this.m_tokenIterator = words.iterator();
}
Developer: felipebravom,
Project: AffectiveTweets,
Lines of code: 14,
Source file: TweetNLPTokenizer.java
Example 4: TweetNLPTokenizer
Likes: 1
import cmu.arktweetnlp.Twokenize; // import the required package/class
/**
 * Initializes the tokenizer and tokenizes the given content immediately.
 *
 * @param content the string to tokenize
 */
public TweetNLPTokenizer(String content) {
    this.tokens = Twokenize.tokenizeRawTweetText(content);
    this.iterator = tokens.iterator();
}
Developer: Waikato,
Project: wekaDeeplearning4j,
Lines of code: 10,
Source file: TweetNLPTokenizer.java
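The last two examples share one pattern: tokenize the whole string eagerly, then hand out tokens through an iterator. The sketch below shows that pattern on its own. The class `IteratingTokenizer` and its method names are hypothetical, and a whitespace split is used as a stand-in for `Twokenize.tokenizeRawTweetText` so that the snippet runs without the ark-tweet-nlp dependency.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper, for illustration only.
final class IteratingTokenizer {

    private Iterator<String> tokenIterator;

    /** Tokenizes immediately and resets the iterator to the first token. */
    void tokenize(String s) {
        // Assumption: whitespace split as a stand-in. The examples above call
        // Twokenize.tokenizeRawTweetText(s) at this point.
        String trimmed = s.trim();
        List<String> words = trimmed.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(trimmed.split("\\s+"));
        tokenIterator = words.iterator();
    }

    /** Returns true if at least one token is left. */
    boolean hasMoreElements() {
        return tokenIterator != null && tokenIterator.hasNext();
    }

    /** Returns the next token; tokenize must have been called first. */
    String nextElement() {
        if (tokenIterator == null) {
            throw new IllegalStateException("tokenize() has not been called");
        }
        return tokenIterator.next();
    }

    public static void main(String[] args) {
        IteratingTokenizer tokenizer = new IteratingTokenizer();
        tokenizer.tokenize("  hello   world ");
        while (tokenizer.hasMoreElements()) {
            System.out.println(tokenizer.nextElement());
        }
    }
}
```

Eager tokenization keeps the wrapper simple at the cost of holding all tokens in memory, which is usually acceptable for tweet-length input.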