
Typical Usage and Code Examples of the Java Twokenize Class


This article collects typical usage examples of cmu.arktweetnlp.Twokenize in Java. If you are wondering how the Twokenize class is used in practice, the selected examples below may help.

The Twokenize class belongs to the cmu.arktweetnlp package. Four code examples are shown below, sorted by popularity by default.

Example 1: process

Likes: 3

import cmu.arktweetnlp.Twokenize; // import the required package/class
@Override
public void process(CAS cas)
        throws AnalysisEngineProcessException
{
    String text = cas.getDocumentText();

    // NOTE: Twokenize also provides an API call that normalizes the text first. Using it would
    // require mapping the tokens back to the text as it is present in the CAS object, which
    // becomes messy because of HTML escaping, so we use the call that performs no normalization.
    List<String> tokenize = Twokenize.tokenize(text);
    int offset = 0;
    for (String t : tokenize) {
        int start = text.indexOf(t, offset); // -1 if the token does not occur verbatim
        int end = start + t.length();
        createTokenAnnotation(cas, start, end);
        offset = end;
    }

}
 

Developer: UKPLab,
Project: argument-reasoning-comprehension-task,
Lines of code: 20,
Source file: ArkTweetTokenizerFixed.java
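The offset bookkeeping in the process method above can be isolated and exercised without UIMA or ark-tweet-nlp. The following is a minimal sketch, not the project's code: the class name TokenOffsets and the method align are hypothetical, and the token list is passed in by the caller (in Example 1 it would come from Twokenize.tokenize(text)). Unlike the original loop, this sketch skips a token when indexOf returns -1, which is one possible way to handle tokens that do not occur verbatim in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of ark-tweet-nlp or the project above. */
class TokenOffsets {

    /**
     * Returns one {start, end} pair for each token that can be located verbatim
     * in the text, scanning left to right. Tokens that are not found are skipped.
     */
    static List<int[]> align(String text, List<String> tokens) {
        List<int[]> spans = new ArrayList<>();
        int offset = 0;
        for (String token : tokens) {
            int start = text.indexOf(token, offset);
            if (start < 0) {
                // Assumption: skipping is acceptable here. The search position is
                // left unchanged so that later tokens can still be found.
                continue;
            }
            int end = start + token.length();
            spans.add(new int[] {start, end});
            offset = end;
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hand-written token list standing in for the tokenizer output.
        List<String> tokens = Arrays.asList("Hello", "world", ":)");
        for (int[] span : align("Hello   world :)", tokens)) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```

Whether skipping, throwing, or falling back to a fuzzy match is the right reaction to a missing token depends on the application; the sketch only shows where the decision has to be made.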

Example 2: TweetObject

Likes: 2

import cmu.arktweetnlp.Twokenize; // import the required package/class
public TweetObject(String text) {
	// Remove every character outside printable ASCII (space through '~') before tokenizing
	text = text.replaceAll("[^ -~]", "");
	this.tokens = Twokenize.tokenizeRawTweetText(text);
	// Alternative: this.tokens = Arrays.asList(text.split("\\s"));
}
 

Developer: uiuc-ischool-scanr,
Project: SAIL,
Lines of code: 7,
Source file: DictionaryFeatures.java
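The regular expression used in the constructor above, "[^ -~]", matches any character that is not between the space character and the tilde, so the replaceAll call removes everything outside printable ASCII. The sketch below isolates that step; the class name AsciiFilter and the method stripNonPrintableAscii are hypothetical and exist only for illustration.

```java
/** Hypothetical helper illustrating the regex used in the constructor above. */
class AsciiFilter {

    /** Removes every character outside the printable ASCII range U+0020 to U+007E. */
    static String stripNonPrintableAscii(String text) {
        return text.replaceAll("[^ -~]", "");
    }

    public static void main(String[] args) {
        // Input contains an accented letter, an emoji (surrogate pair) and a newline.
        String raw = "caf\u00e9 \uD83D\uDE00ok\n";
        System.out.println("[" + stripNonPrintableAscii(raw) + "]"); // prints [caf ok]
    }
}
```

Note that this filter also discards accented letters, emoji, tabs, and newlines, which may or may not be desirable for tweet text.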

Example 3: tokenize

Likes: 1

import cmu.arktweetnlp.Twokenize; // import the required package/class
/**
 * Sets the string to tokenize. Tokenization happens immediately.
 *
 * @param s the string to tokenize
 */
@Override
public void tokenize(String s) {

	List<String> words = Twokenize.tokenizeRawTweetText(s);
	this.m_tokenIterator = words.iterator();


}
 

Developer: felipebravom,
Project: AffectiveTweets,
Lines of code: 14,
Source file: TweetNLPTokenizer.java

Example 4: TweetNLPTokenizer

Likes: 1

import cmu.arktweetnlp.Twokenize; // import the required package/class
/**
 * Initializes the tokenizer and tokenizes the content immediately.
 *
 * @param content the String to tokenize
 */
public TweetNLPTokenizer(String content) {
  this.tokens = Twokenize.tokenizeRawTweetText(content);
  this.iterator = tokens.iterator();
}
 

Developer: Waikato,
Project: wekaDeeplearning4j,
Lines of code: 10,
Source file: TweetNLPTokenizer.java
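The last two snippets share one pattern: tokenize the whole string eagerly, then hand the tokens out through an iterator. The following self-contained sketch shows that pattern under stated assumptions. The class name IteratorTokenizer is hypothetical, and the splitting function is injected so the example runs without ark-tweet-nlp; a whitespace splitter is used as a stand-in, whereas the projects above call Twokenize.tokenizeRawTweetText.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the tokenize-then-iterate pattern shown above. */
class IteratorTokenizer implements Iterator<String> {
    private final Function<String, List<String>> splitter;
    // Empty until tokenize() is called, so hasNext() is safe to call at any time.
    private Iterator<String> iterator = Collections.emptyIterator();

    IteratorTokenizer(Function<String, List<String>> splitter) {
        this.splitter = splitter;
    }

    /** Tokenizes immediately and resets the iterator to the first token. */
    void tokenize(String s) {
        this.iterator = splitter.apply(s).iterator();
    }

    @Override
    public boolean hasNext() {
        return iterator.hasNext();
    }

    @Override
    public String next() {
        return iterator.next();
    }

    public static void main(String[] args) {
        // Stand-in splitter (assumption). With ark-tweet-nlp on the classpath,
        // Twokenize::tokenizeRawTweetText could be passed here instead.
        IteratorTokenizer t =
                new IteratorTokenizer(s -> Arrays.asList(s.trim().split("\\s+")));
        t.tokenize("good morning :)");
        while (t.hasNext()) {
            System.out.println(t.next());
        }
    }
}
```

Injecting the splitter keeps the iteration logic testable on its own and makes it easy to swap tokenizers.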

