Scanner and Tokenizer for file indexing - Java Snipplr Social Repository

Revision: 64631

at August 30, 2013 18:09 by jarlah

Initial Code

package com.java.test;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.StringTokenizer;
 
/**
 * input file format:
 * 
 *  foo|bar
 *  foo2|bar2
 *  .
 *  .
 */
public class Indexer {
	// Assigned once in the constructor, so no throwaway initial HashMap is needed
	private final Map<String, String> indexerMap;
	
	public Indexer(String filename) throws IOException{
		indexerMap = parseFileIntoMap(filename);
	}
	
	private Map<String, String> parseFileIntoMap(String filename) throws IOException {
		Map<String, String> map = new HashMap<String, String>();
		
		Scanner scanner = null;
		try {
			scanner = new Scanner(new File(filename), "UTF-8");
			// Split the input on runs of line terminators, so each token is one line
			scanner.useDelimiter("[\\r\\n]+");
			while (scanner.hasNext())
			{
				// StringTokenizer takes a set of delimiter characters, not a regex, so "|" needs no escaping
				StringTokenizer tokenizer = new StringTokenizer(scanner.next(), "|");
				if (tokenizer.countTokens() == 2) {
					String key = tokenizer.nextToken();
					String value = tokenizer.nextToken();
					map.put(key, value);
				}
			}
		} catch (FileNotFoundException e) {
			throw new IllegalArgumentException("Could not load file indexing! The file "+filename+" does not exist.", e);
		}finally {
			if(scanner != null)
				scanner.close();
		}
		
		return map;
	}
	
	/**
	 * Returns the value mapped to the given key, or null if the key is null or not present.
	 */
	public String getValue(String key) {
		if (key == null) {
			return null;
		} else {
			return indexerMap.get(key);
		}
	}	
}

Initial URL

Initial Description

Uses Scanner instead of BufferedReader to read the file, and StringTokenizer to parse each line. I see potential for using Scanner for both use cases, but getting rid of the arrays produced by split was already a major improvement. In addition, the BufferedReader was not closed before; the Scanner is now closed in the finally block. Use of static is also highly discouraged: an object-oriented approach is better, and there is no reason to expose the map or list, or the parse method. Both the map and the parse method are now private. There are strict expectations about file encoding: let's assume we only handle UTF-8 files. No fishy Norwegian characters.
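
The idea of using Scanner for both use cases could look roughly like the sketch below. It is only an illustration, not part of the snippet above: the class name ScannerOnlyIndexer and the choice to take a Readable instead of a filename are assumptions made here so the example can run without a file on disk. An outer Scanner reads lines and an inner Scanner splits each line on "|". Note that Scanner's delimiter is a regex, unlike StringTokenizer's, so the pipe has to be escaped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical variant of Indexer that uses Scanner for both lines and fields.
public class ScannerOnlyIndexer {
    private final Map<String, String> index;

    // Takes any Readable (e.g. a StringReader or a FileReader); an assumption for this sketch.
    public ScannerOnlyIndexer(Readable source) {
        this.index = parse(source);
    }

    private static Map<String, String> parse(Readable source) {
        Map<String, String> map = new HashMap<>();
        // try-with-resources closes the outer Scanner (and the underlying source if it is Closeable)
        try (Scanner lines = new Scanner(source)) {
            while (lines.hasNextLine()) {
                // The inner Scanner splits one line into fields on runs of '|'
                try (Scanner fields = new Scanner(lines.nextLine())) {
                    fields.useDelimiter("\\|+");
                    if (!fields.hasNext()) {
                        continue; // empty line
                    }
                    String key = fields.next();
                    if (!fields.hasNext()) {
                        continue; // only one field
                    }
                    String value = fields.next();
                    // Accept the line only if it has exactly two fields
                    if (!fields.hasNext()) {
                        map.put(key, value);
                    }
                }
            }
        }
        return map;
    }

    // Returns the mapped value, or null if the key is null or absent
    public String getValue(String key) {
        return key == null ? null : index.get(key);
    }
}
```

For example, new ScannerOnlyIndexer(new java.io.StringReader("foo|bar\nfoo2|bar2")) would index two entries, and lines with one field or more than two fields would be skipped, as in the StringTokenizer version.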

Initial Title

Scanner and Tokenizer for file indexing

Initial Tags

Initial Language

Java