java - How can I evaluate a hash table implementation? (Using HashMap as reference) -


Problem:

  • I need to compare two hash table implementations (well, one of them is HashMap) and reach a reasonable conclusion.

  • I am not interested in 100% accuracy, just in being in the right direction with my estimation.

  • I am interested in the difference not per operation, but for the hash table as a "whole".

  • I don't have a strict requirement on speed: if the other implementation is reasonably slower, I can accept that, but I do expect/require the memory usage to be better (since one of the hash tables is backed by a primitive table).

What I did so far:

Originally I created my own custom "benchmark" with loops and many calls to hint the GC, to get a feeling of the difference, but reading online it seems that using a standard tool is more reliable/appropriate.
Example of my approach (HashMapInterface is a wrapper so I can switch among implementations):

    Integer[] keys = new Integer[10000000];
    String[] values = new String[10000000];

    for (int i = 0; i < keys.length; ++i) {
        keys[i] = i;
        values[i] = "" + i;
    }

    if (operation.equals("put")) {
        runOperation(map, keys, values);
    }

    public long[] runOperation(HashMapInterface<String> map, Integer[] keys, String[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long run = 0;
        for (int i = 0; i < 10; ++i) {
            long start = System.currentTimeMillis();
            for (int j = 0; j < keys.length; ++j) {
                map.put(keys[j], values[j]);
            }
            long total = System.currentTimeMillis() - start;
            System.out.println(total / 1000d + " seconds");
            if (total < min) {
                min = total;
            }
            if (total > max) {
                max = total;
            }
            run += total;
            map = null;                 // drop the populated map
            map = createNewHashMap();   // fresh map for the next run
            hintsToGC();
        }
        return new long[] {min, max, run};
    }

    public void hintsToGC() {
        // Repeatedly hint the GC and sleep, hoping the heap is cleaned up between runs.
        for (int i = 0; i < 20; ++i) {
            System.out.print(". ");
            System.gc();
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    private HashMapInterface<String> createNewHashMap() {
        if (jdk) {
            return new JDKHashMapWrapper<String>();
        } else {
            return new AlternativeHashMapWrapper<String>();
        }
    }

    public class JDKHashMapWrapper implements HashMapInterface<String> {
        HashMap<Integer, String> hashMap;

        JDKHashMapWrapper() {
            hashMap = new HashMap<Integer, String>();
        }

        public String put(Integer key, String value) {
            return hashMap.put(key, value);
        }
        //etc
    }

(I want to test put, get, contains and memory utilization.)
Can I be sure that with this approach I will get reasonable measurements?
If not, what is an appropriate tool to use, and how?
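
Since the "standard tool" route was mentioned above, here is a minimal sketch of what a JMH microbenchmark of the put operation might look like. JMH being on the classpath, the smaller key count, and the benchmark settings are assumptions for illustration, not part of the original code:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    // Sketch only: measure bulk put with JMH instead of hand-rolled timing loops.
    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public class MapPutBenchmark {

        private Integer[] keys;
        private String[] values;

        @Setup
        public void prepare() {
            keys = new Integer[1_000_000];   // smaller than 10M to keep each iteration short
            values = new String[keys.length];
            for (int i = 0; i < keys.length; ++i) {
                keys[i] = i;
                values[i] = "" + i;
            }
        }

        @Benchmark
        public Map<Integer, String> putAll() {
            // Swap in the alternative implementation here for the comparison run.
            Map<Integer, String> map = new HashMap<Integer, String>();
            for (int i = 0; i < keys.length; ++i) {
                map.put(keys[i], values[i]);
            }
            return map;   // returning the map prevents dead-code elimination
        }
    }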

Update:
- I also test with random numbers (~10M random numbers) generated using SecureRandom.
- When the hash table resizes, I print the logical size of the hash table / the size of the actual table, to get the load factor.
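
A minimal sketch of how those random keys might be generated; the array size and the use of plain nextInt() (allowing negatives and duplicates) are illustrative assumptions:

    import java.security.SecureRandom;

    // Sketch: generate ~10M random integer keys with SecureRandom.
    public class RandomKeys {
        public static void main(String[] args) {
            SecureRandom random = new SecureRandom();
            int[] keys = new int[10_000_000];
            for (int i = 0; i < keys.length; ++i) {
                keys[i] = random.nextInt();   // may contain negatives and duplicates
            }
            System.out.println("generated " + keys.length + " random keys");
        }
    }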

Update:
For my specific case I am interested in integers. What pitfalls can there be with my approach?

Update after @dimo414's comments:

"Well, at a minimum the hashtable 'as a whole' isn't meaningful"

I mean how the hashtable behaves under various loads, both at runtime and in memory consumption.

"Every data structure is a tradeoff of different methods"

I agree. My trade-off is an acceptable access penalty in exchange for a memory improvement.
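
One rough way the memory side of that trade-off could be estimated is with Runtime; a sketch under the assumption that System.gc() hints are honored well enough for a ballpark figure (the key count below is illustrative):

    import java.util.HashMap;
    import java.util.Map;

    // Rough sketch: estimate the heap used by a populated map via Runtime.
    // System.gc() is only a hint, so treat the result as an approximation.
    public class MapMemoryEstimate {
        public static void main(String[] args) throws InterruptedException {
            long before = usedMemory();
            // Swap in the alternative implementation here for the comparison run.
            Map<Integer, String> map = new HashMap<Integer, String>();
            for (int i = 0; i < 10_000_000; ++i) {
                map.put(i, "" + i);
            }
            long after = usedMemory();
            System.out.println("approximate map footprint: " + (after - before) / (1024 * 1024) + " MB");
            System.out.println(map.size());   // keep the map reachable until after the measurement
        }

        private static long usedMemory() throws InterruptedException {
            Runtime runtime = Runtime.getRuntime();
            for (int i = 0; i < 5; ++i) {   // hint the GC a few times before sampling
                System.gc();
                Thread.sleep(100);
            }
            return runtime.totalMemory() - runtime.freeMemory();
        }
    }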

"You need to identify the features you're interested in verifying"

1) put(key, value);
2) get(key);
3) containsKey(key);
4) all of the above when having many entries in the hash table

Some key considerations when using hash tables are the size of the "buckets" allocation, the collision resolution strategy, and the shape of the data. Essentially, a hash table takes the key supplied by the application and hashes it to a value less than or equal to the number of allocated buckets. When two key values hash to the same bucket, the implementation has to resolve the collision and return the right value. For example, one could have a sorted linked list for each bucket, and that list is searched.
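
To make the bucket/collision idea concrete, here is a toy sketch of separate chaining. It only illustrates the strategy described above; it is not how java.util.HashMap is actually implemented:

    import java.util.LinkedList;

    // Toy illustration of separate chaining: each bucket holds a list of entries,
    // and colliding keys are resolved by searching that list.
    public class ChainedTable {
        private static class Entry {
            final int key;
            String value;
            Entry(int key, String value) { this.key = key; this.value = value; }
        }

        private final LinkedList<Entry>[] buckets;

        @SuppressWarnings("unchecked")
        ChainedTable(int bucketCount) {
            buckets = new LinkedList[bucketCount];
            for (int i = 0; i < bucketCount; ++i) {
                buckets[i] = new LinkedList<Entry>();
            }
        }

        private int bucketFor(int key) {
            return Math.abs(key % buckets.length);   // map the key to a bucket index
        }

        public void put(int key, String value) {
            for (Entry e : buckets[bucketFor(key)]) {
                if (e.key == key) { e.value = value; return; }   // key already present
            }
            buckets[bucketFor(key)].add(new Entry(key, value));  // new key or collision: append
        }

        public String get(int key) {
            for (Entry e : buckets[bucketFor(key)]) {
                if (e.key == key) { return e.value; }
            }
            return null;
        }
    }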

If the data happens to have a lot of collisions, performance will suffer, because the hash table implementation will spend time resolving them. On the other hand, if you have a very large number of buckets, you solve the collision problem at the expense of memory. Also, Java's built-in HashMap implementation will "rehash" if the number of entries gets larger than a certain amount - I imagine this is an expensive operation that is worth avoiding.

Since your key data are positive integers from 1 to 10M, your test data looks good. Also ensure the different hash table implementations are initialized to the same bucket size for a given test, otherwise it's not a fair comparison. Finally, I would vary the bucket size over a pretty significant range and rerun the tests to see how the implementations change their behavior.
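
A sketch of how that bucket-size sweep could be driven on the JDK side, using HashMap's (initialCapacity, loadFactor) constructor; the capacity values and key count below are an arbitrary example range:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: rerun the same put workload while sweeping the initial capacity,
    // so both implementations can be started from comparable bucket sizes.
    public class CapacitySweep {
        public static void main(String[] args) {
            int keyCount = 10_000_000;
            int[] initialCapacities = {16, 1 << 20, 1 << 24};   // default, ~1M, ~16M buckets
            for (int capacity : initialCapacities) {
                Map<Integer, String> map = new HashMap<Integer, String>(capacity, 0.75f);
                long start = System.currentTimeMillis();
                for (int i = 0; i < keyCount; ++i) {
                    map.put(i, "" + i);
                }
                long elapsed = System.currentTimeMillis() - start;
                System.out.println("capacity " + capacity + ": " + elapsed + " ms, size " + map.size());
            }
        }
    }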

