Java Software: 1-Line Sort & Uniq Utility
By Angsuman Chakraborty, Gaea News Network
Tuesday, October 18, 2005
I had to sort and uniq (create a unique set of strings) a large list with lots of duplicates. My options were to write it in Java or to download Cygwin and run: cat file | sort | uniq > result
The Cygwin download never works for me. After I spend lots of time selecting a juicy list of utilities, it always fails somewhere in the download process. I liked it much better when it was a single download.
Naturally, I opted for the Java route. After all, it was really just a single line of code:
for(String item:getFileAsSet(args[0])) System.out.println(item);
Obviously you are wondering where the heck getFileAsSet comes from. It is just one of the many reusable Java utilities I created to make my job easier. The crucial part of the code for this utility is:
TreeSet<String> set = new TreeSet<String>();
while((temp = reader.readLine()) != null) {
    temp = temp.trim(); // Hate spaces; Your mileage may vary
    if(temp.length() > 0) {
        set.add(temp);
    }
}
The beauty of this code is that TreeSet is a SortedSet implementation. Adding to the set automatically eliminates duplicates and keeps the elements sorted in the natural order of String. All I had to do was print the result.
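For readers who want to try it, here is a minimal, self-contained sketch of how the pieces might fit together. Only the loop body and the one-line main come from the post. The class name SortUniq, the helper readAsSet, the UTF-8 encoding, and the try-with-resources block (Java 7 or later) are my assumptions, not the original utility.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

public class SortUniq {

    // Hypothetical reconstruction of getFileAsSet: the original utility is
    // not shown in full, so the file handling here is an assumption.
    static Set<String> getFileAsSet(String fileName) throws IOException {
        // Assumes the input file is UTF-8 encoded.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            return readAsSet(reader);
        }
    }

    // Hypothetical helper wrapping the loop from the post so that it can be
    // exercised with any reader, not just a file.
    static Set<String> readAsSet(BufferedReader reader) throws IOException {
        Set<String> set = new TreeSet<String>();
        String temp;
        while ((temp = reader.readLine()) != null) {
            temp = temp.trim();        // strip leading and trailing whitespace
            if (temp.length() > 0) {   // skip blank lines
                set.add(temp);         // TreeSet drops duplicates and keeps order
            }
        }
        return set;
    }

    public static void main(String[] args) throws IOException {
        for (String item : getFileAsSet(args[0])) System.out.println(item);
    }
}
```

After compiling with javac SortUniq.java, running java SortUniq file > result should give much the same result as the shell pipeline. One difference to keep in mind: TreeSet orders strings by String.compareTo, which compares character codes, so uppercase ASCII letters sort before lowercase ones, and the order may not match a locale-aware sort.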