
Typical usage and code examples of the Java SequenceFileIterable class


This article collects typical usage and code examples of the Java class org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable. If you are wondering what SequenceFileIterable does, how to use it, or where to find usage examples, the curated class examples below may help.

SequenceFileIterable belongs to the org.apache.mahout.common.iterator.sequencefile package. Eleven code examples of the class are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your votes help the system recommend better Java code examples.
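All eleven examples share one core pattern: a SequenceFileIterable wraps a Hadoop SequenceFile and yields Pair&lt;K, V&gt; records in an enhanced for loop. As a rough, self-contained sketch of that loop using only the standard library (the class and method names here are hypothetical, and Map.Entry stands in for Mahout's Pair and the Hadoop Writable types):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterationSketch {
  // Mimics the setup() pattern in the examples below: iterate over
  // (word, id) records and build an in-memory dictionary. In the real code
  // the records come from
  // new SequenceFileIterable<Writable, IntWritable>(path, true, conf).
  static Map<String, Integer> buildDictionary(List<Map.Entry<String, Integer>> records) {
    Map<String, Integer> dictionary = new HashMap<>();
    for (Map.Entry<String, Integer> record : records) {
      dictionary.put(record.getKey(), record.getValue());
    }
    return dictionary;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> records = List.of(
        new SimpleEntry<>("apache", 0), new SimpleEntry<>("mahout", 1));
    System.out.println(buildDictionary(records).get("mahout")); // prints 1
  }
}
```

The `true` flag in the real constructor asks the iterator to reuse key/value instances between records, which is why the examples copy values out (e.g. `toString()`, `get()`) instead of storing the Writable objects themselves.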

Example 1: loadDictionary

Upvotes: 3

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
private static String[] loadDictionary(String dictionaryPath, Configuration conf) {
  if (dictionaryPath == null) {
    return null;
  }
  Path dictionaryFile = new Path(dictionaryPath);
  List<Pair<Integer, String>> termList = Lists.newArrayList();
  int maxTermId = 0;
  // key is the word, value is its id
  for (Pair<Writable, IntWritable> record
          : new SequenceFileIterable<Writable, IntWritable>(dictionaryFile, true, conf)) {
    termList.add(new Pair<Integer, String>(record.getSecond().get(),
        record.getFirst().toString()));
    maxTermId = Math.max(maxTermId, record.getSecond().get());
  }
  String[] terms = new String[maxTermId + 1];
  for (Pair<Integer, String> pair : termList) {
    terms[pair.getFirst()] = pair.getSecond();
  }
  return terms;
}
 

Author: saradelrio
Project: Chi-FRBCS-BigDataCS
Lines of code: 21
Source file: InMemoryCollapsedVariationalBayes0.java
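The two-pass idiom above is the part worth remembering: first collect (id, term) pairs while tracking the maximum id seen, then allocate a String[maxTermId + 1] and fill it by index. A minimal sketch of just that idiom with no Mahout or Hadoop dependency (the class name is hypothetical, and Map.Entry stands in for Mahout's Pair):

```java
import java.util.List;
import java.util.Map;

public class DictionaryArraySketch {
  // Rebuild a term array indexed by id from unordered (id, term) pairs,
  // sizing the array from the maximum id -- the same idiom loadDictionary uses.
  static String[] toTermArray(List<Map.Entry<Integer, String>> pairs) {
    int maxTermId = 0;
    for (Map.Entry<Integer, String> pair : pairs) {
      maxTermId = Math.max(maxTermId, pair.getKey());
    }
    String[] terms = new String[maxTermId + 1];
    for (Map.Entry<Integer, String> pair : pairs) {
      terms[pair.getKey()] = pair.getValue(); // ids with no term stay null
    }
    return terms;
  }
}
```

Note that ids that never appear in the input are left as null slots, exactly as in the original method, so callers must tolerate gaps.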

Example 2: setup

Upvotes: 3

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
@Override
protected void setup(Context context) throws IOException, InterruptedException {
  super.setup(context);
  Configuration conf = context.getConfiguration();
  URI[] localFiles = DistributedCache.getCacheFiles(conf);
  Preconditions.checkArgument(localFiles != null && localFiles.length >= 1,
          "missing paths from the DistributedCache");

  dimension = conf.getInt(PartialVectorMerger.DIMENSION, Integer.MAX_VALUE);
  sequentialAccess = conf.getBoolean(PartialVectorMerger.SEQUENTIAL_ACCESS, false);
  namedVector = conf.getBoolean(PartialVectorMerger.NAMED_VECTOR, false);
  maxNGramSize = conf.getInt(DictionaryVectorizer.MAX_NGRAMS, maxNGramSize);

  Path dictionaryFile = new Path(localFiles[0].getPath());
  // key is the word, value is its id
  for (Pair<Writable, IntWritable> record
          : new SequenceFileIterable<Writable, IntWritable>(dictionaryFile, true, conf)) {
    dictionary.put(record.getFirst().toString(), record.getSecond().get());
  }
}
 

Author: saradelrio
Project: Chi-FRBCS-BigDataCS
Lines of code: 21
Source file: TFPartialVectorReducer.java

Example 3: setup

Upvotes: 3

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
@Override
protected void setup(Context context) throws IOException, InterruptedException {
  super.setup(context);
  Configuration conf = context.getConfiguration();
  URI[] localFiles = DistributedCache.getCacheFiles(conf);
  Preconditions.checkArgument(localFiles != null && localFiles.length >= 1, 
      "missing paths from the DistributedCache");

  vectorCount = conf.getLong(TFIDFConverter.VECTOR_COUNT, 1);
  featureCount = conf.getLong(TFIDFConverter.FEATURE_COUNT, 1);
  minDf = conf.getInt(TFIDFConverter.MIN_DF, 1);
  maxDf = conf.getLong(TFIDFConverter.MAX_DF, -1);
  sequentialAccess = conf.getBoolean(PartialVectorMerger.SEQUENTIAL_ACCESS, false);
  namedVector = conf.getBoolean(PartialVectorMerger.NAMED_VECTOR, false);

  Path dictionaryFile = new Path(localFiles[0].getPath());
  // key is feature, value is the document frequency
  for (Pair<IntWritable,LongWritable> record 
       : new SequenceFileIterable<IntWritable,LongWritable>(dictionaryFile, true, conf)) {
    dictionary.put(record.getFirst().get(), record.getSecond().get());
  }
}
 

Author: saradelrio
Project: Chi-FRBCS-BigDataCS
Lines of code: 23
Source file: TFIDFPartialVectorReducer.java

Example 4: setup

Upvotes: 3

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
@Override
protected void setup(Context context) throws IOException, InterruptedException {
  super.setup(context);
  Configuration conf = context.getConfiguration();
  URI[] localFiles = DistributedCache.getCacheFiles(conf);
  Preconditions.checkArgument(localFiles != null && localFiles.length >= 1,
          "missing paths from the DistributedCache");

  maxDf = conf.getLong(HighDFWordsPruner.MAX_DF, -1);

  Path dictionaryFile = new Path(localFiles[0].getPath());
  // key is feature, value is the document frequency
  for (Pair<IntWritable, LongWritable> record :
          new SequenceFileIterable<IntWritable, LongWritable>(dictionaryFile, true, conf)) {
    dictionary.put(record.getFirst().get(), record.getSecond().get());
  }
}
 

Author: saradelrio
Project: Chi-FRBCS-BigDataCS
Lines of code: 18
Source file: WordsPrunerReducer.java

Example 5: processOutput

Upvotes: 3

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
protected RuleBase processOutput(JobContext job, Path outputPath) throws IOException {
 
  Configuration conf = job.getConfiguration();

  FileSystem fs = outputPath.getFileSystem(conf);

  Path[] outfiles = Chi_RWCSUtils.listOutputFiles(fs, outputPath);
  
  RuleBase ruleBase = null;
  
  // read all the outputs
  for (Path path : outfiles) {
    for (Pair<LongWritable,RuleBase> record : new SequenceFileIterable<LongWritable, RuleBase>(path, conf)) {
      if (ruleBase == null) {
        ruleBase = record.getSecond();
      }
    }
  }
  
  return ruleBase;
}
 

Author: saradelrio
Project: Chi-FRBCS-BigDataCS
Lines of code: 22
Source file: PartialBuilder.java

Example 6: processOutput

Upvotes: 3

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
protected RuleBase processOutput(JobContext job, Path outputPath) throws IOException {
  	  
  Configuration conf = job.getConfiguration();

  FileSystem fs = outputPath.getFileSystem(conf);

  Path[] outfiles = Chi_RWUtils.listOutputFiles(fs, outputPath);
  
  RuleBase ruleBase = null;
  
  // read all the outputs
  for (Path path : outfiles) {
    for (Pair<LongWritable,RuleBase> record : new SequenceFileIterable<LongWritable, RuleBase>(path, conf)) {
      if (ruleBase == null) {
        ruleBase = record.getSecond();
      }
    }
  }
  
  return ruleBase;
}
 

Author: saradelrio
Project: Chi-FRBCS-BigData-Ave
Lines of code: 22
Source file: PartialBuilder.java

Example 7: loadModel

Upvotes: 2

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
public static Pair<Matrix, Vector> loadModel(Configuration conf, Path... modelPaths)
    throws IOException {
  int numTopics = -1;
  int numTerms = -1;
  List<Pair<Integer, Vector>> rows = Lists.newArrayList();
  for (Path modelPath : modelPaths) {
    for (Pair<IntWritable, VectorWritable> row :
        new SequenceFileIterable<IntWritable, VectorWritable>(modelPath, true, conf)) {
      rows.add(Pair.of(row.getFirst().get(), row.getSecond().get()));
      numTopics = Math.max(numTopics, row.getFirst().get());
      if (numTerms < 0) {
        numTerms = row.getSecond().get().size();
      }
    }
  }
  if (rows.isEmpty()) {
    throw new IOException(Arrays.toString(modelPaths) + " have no vectors in it");
  }
  numTopics++;
  Matrix model = new DenseMatrix(numTopics, numTerms);
  Vector topicSums = new DenseVector(numTopics);
  for (Pair<Integer, Vector> pair : rows) {
    model.viewRow(pair.getFirst()).assign(pair.getSecond());
    topicSums.set(pair.getFirst(), pair.getSecond().norm(1));
  }
  return Pair.of(model, topicSums);
}
 

Author: saradelrio
Project: Chi-FRBCS-BigDataCS
Lines of code: 28
Source file: TopicModel.java

Example 8: loadVectors

Upvotes: 2

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
private static Matrix loadVectors(String vectorPathString, Configuration conf)
  throws IOException {
  Path vectorPath = new Path(vectorPathString);
  FileSystem fs = vectorPath.getFileSystem(conf);
  List<Path> subPaths = Lists.newArrayList();
  if (fs.isFile(vectorPath)) {
    subPaths.add(vectorPath);
  } else {
    for (FileStatus fileStatus : fs.listStatus(vectorPath, PathFilters.logsCRCFilter())) {
      subPaths.add(fileStatus.getPath());
    }
  }
  List<Pair<Integer, Vector>> rowList = Lists.newArrayList();
  int numRows = Integer.MIN_VALUE;
  int numCols = -1;
  boolean sequentialAccess = false;
  for (Path subPath : subPaths) {
    for (Pair<IntWritable, VectorWritable> record
        : new SequenceFileIterable<IntWritable, VectorWritable>(subPath, true, conf)) {
      int id = record.getFirst().get();
      Vector vector = record.getSecond().get();
      if (vector instanceof NamedVector) {
        vector = ((NamedVector)vector).getDelegate();
      }
      if (numCols < 0) {
        numCols = vector.size();
        sequentialAccess = vector.isSequentialAccess();
      }
      rowList.add(Pair.of(id, vector));
      numRows = Math.max(numRows, id);
    }
  }
  numRows++;
  Vector[] rowVectors = new Vector[numRows];
  for (Pair<Integer, Vector> pair : rowList) {
    rowVectors[pair.getFirst()] = pair.getSecond();
  }
  return new SparseRowMatrix(numRows, numCols, rowVectors, true, !sequentialAccess);

}
 

Author: saradelrio
Project: Chi-FRBCS-BigDataCS
Lines of code: 41
Source file: InMemoryCollapsedVariationalBayes0.java

Example 9: parseOutput

Upvotes: 2

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
/**
 * Extracts the predictions from each mapper's output and writes them to the corresponding
 * output file. The name of the output file is based on the name of the corresponding input file.
 * Computes the ConfusionMatrix if necessary.
 */
private void parseOutput(JobContext job) throws IOException {
  Configuration conf = job.getConfiguration();
  FileSystem fs = mappersOutputPath.getFileSystem(conf);

  Path[] outfiles = Chi_RWCSUtils.listOutputFiles(fs, mappersOutputPath);

  // read all the output
  List<double[]> resList = new ArrayList<double[]>();
  for (Path path : outfiles) {
    FSDataOutputStream ofile = null;
    try {
      for (Pair<DoubleWritable,Text> record : new SequenceFileIterable<DoubleWritable,Text>(path, true, conf)) {
        double key = record.getFirst().get();
        String value = record.getSecond().toString();
        if (ofile == null) {
          // this is the first value, it contains the name of the input file
          ofile = fs.create(new Path(outputPath, value).suffix(".out"));
        } else {
          // The key contains the correct label of the data. The value contains a prediction
          ofile.writeChars(value); // write the prediction
          ofile.writeChar('\n');

          resList.add(new double[]{key, Double.valueOf(value)});
        }
      }
    } finally {
      Closeables.closeQuietly(ofile);
    }
  }
  results = new double[resList.size()][2];
  resList.toArray(results);
}
 

Author: saradelrio
Project: Chi-FRBCS-BigDataCS
Lines of code: 38
Source file: Chi_RWCSClassifier.java
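The protocol encoded in the loop above is easy to miss: within each mapper's output file, the first record's value is the name of the input file, and every later record is a (correct label, prediction) pair. A stdlib-only sketch of that split (the class name is hypothetical, Map.Entry stands in for Mahout's Pair, and the HDFS write is omitted):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapperOutputSketch {
  // Mirrors parseOutput's convention: skip the first record (it only names
  // the input file), then collect every later (correct label, prediction)
  // pair into a results array.
  static double[][] collectResults(List<Map.Entry<Double, String>> records) {
    List<double[]> resList = new ArrayList<>();
    boolean first = true;
    for (Map.Entry<Double, String> record : records) {
      if (first) {
        first = false; // the file-name record carries no prediction
        continue;
      }
      resList.add(new double[] {record.getKey(), Double.parseDouble(record.getValue())});
    }
    return resList.toArray(new double[0][]);
  }
}
```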

Example 10: parseOutput

Upvotes: 2

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
/**
 * Extracts the predictions from each mapper's output and writes them to the corresponding
 * output file. The name of the output file is based on the name of the corresponding input file.
 * Computes the ConfusionMatrix if necessary.
 */
private void parseOutput(JobContext job) throws IOException {
  Configuration conf = job.getConfiguration();
  FileSystem fs = mappersOutputPath.getFileSystem(conf);

  Path[] outfiles = Chi_RWUtils.listOutputFiles(fs, mappersOutputPath);

  // read all the output
  List<double[]> resList = new ArrayList<double[]>();
  for (Path path : outfiles) {
    FSDataOutputStream ofile = null;
    try {
      for (Pair<DoubleWritable,Text> record : new SequenceFileIterable<DoubleWritable,Text>(path, true, conf)) {
        double key = record.getFirst().get();
        String value = record.getSecond().toString();
        if (ofile == null) {
          // this is the first value, it contains the name of the input file
          ofile = fs.create(new Path(outputPath, value).suffix(".out"));
        } else {
          // The key contains the correct label of the data. The value contains a prediction
          ofile.writeChars(value); // write the prediction
          ofile.writeChar('\n');

          resList.add(new double[]{key, Double.valueOf(value)});
        }
      }
    } finally {
      Closeables.closeQuietly(ofile);
    }
  }
  results = new double[resList.size()][2];
  resList.toArray(results);
}
 

Author: saradelrio
Project: Chi-FRBCS-BigData-Ave
Lines of code: 38
Source file: Chi_RWClassifier.java

Example 11: readFList

Upvotes: 2

import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable; // import the dependent package/class
/**
 * Generates the header table from the serialized string representation
 * 
 * @return Deserialized header table
 */
public static List<Pair<String, Long>> readFList(Configuration conf)
        throws IOException {
    List<Pair<String, Long>> list = new ArrayList<Pair<String, Long>>();
    Path[] files = DistributedCache.getLocalCacheFiles(conf);
    if (files == null) {
        throw new IOException(
                "Cannot read Frequency list from Distributed Cache");
    }
    if (files.length != 1) {
        throw new IOException(
                "Cannot read Frequency list from Distributed Cache ("
                        + files.length + ')');
    }
    FileSystem fs = FileSystem.getLocal(conf);
    Path fListLocalPath = fs.makeQualified(files[0]);
    // Fallback if we are running locally.
    if (!fs.exists(fListLocalPath)) {
        URI[] filesURIs = DistributedCache.getCacheFiles(conf);
        if (filesURIs == null) {
            throw new IOException(
                    "Cannot read header table from Distributed Cache");
        }
        if (filesURIs.length != 1) {
            throw new IOException(
                    "Cannot read header table from Distributed Cache ("
                            + filesURIs.length + ')');
        }
        fListLocalPath = new Path(filesURIs[0].getPath());
    }
    for (Pair<Text, LongWritable> record : new SequenceFileIterable<Text, LongWritable>(
            fListLocalPath, true, conf)) {
        list.add(new Pair<String, Long>(record.getFirst().toString(),
                record.getSecond().get()));
    }
    return list;
}
 

Author: navxt6
Project: SEARUM
Lines of code: 42
Source file: ARM.java
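The fallback in readFList is worth isolating: prefer the DistributedCache's local copy of the frequency list, but when it does not exist (the running-locally case), fall back to the original cache URI. A sketch of that single decision using java.nio.file instead of Hadoop's FileSystem (the class name is hypothetical):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheFallbackSketch {
  // Mirrors readFList's logic: use the local cache copy when it exists;
  // otherwise fall back to the original path (the local-mode case).
  static Path resolve(Path localCopy, Path fallback) {
    return Files.exists(localCopy) ? localCopy : fallback;
  }
}
```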

