String intern pool
From Wikipedia, the free encyclopedia
In some modern object-oriented programming languages, including Java and .NET languages, the string intern pool is a data structure managed internally by the platform or virtual machine to facilitate efficient implementation of certain string processing tasks. The pool contains a single copy (called the intern) of each distinct string that is currently represented by a string object in the system. By invoking a method of the string class (for example String.intern()
in Java), the programmer has access to this unique string object.
The most common use for the string intern pool is for doing large numbers of string comparisons. For example, if we have n strings and we wish to find any duplicates, O(n2) string comparisons are needed. Unfortunately, string comparison can be slow (especially if the strings are long and don't vary much), and this many string comparisons may be unacceptable. The solution is to use the string intern pool to get the intern of each of these strings, and then we only need to do O(n2) reference comparisons, which are much faster. Even if retrieving the interns is fairly slow (and with proper implementation, e.g. using a hash table, it need not be), only O(n) of these operations must be performed, meaning that for large enough n, using the intern pool will be faster.
Another use of the string intern pool is to reduce memory usage if a large number of active objects store a given string created dynamically during the execution of a program and is often repeated (for instance, it is read from the network, or from storage). Such strings may include magic numbers or network protocol information.
[edit] The intern pool in other programming languages
Even if the standard library of a programming language does not support string interning, it is possible to implement an intern pool in user code.
An example of such an implementation follows, in the C++ programming language. Note that this implementation returns pointers to arrays of characters, not String objects.
class StringTable { char** array; int* count; int size; int numStrings; public: StringTable() { size = 20; array = new char*[size]; count = new int[size]; numStrings = 0; } int addString(const char* ch) { // this method adds a String to the pool and returns its index if (numStrings == 0) { // if there is no string in the pool, add first array[0] = new char[strlen(ch)+1]; strcpy(array[0], ch); numStrings++; return 0; } int i = 0; for (i; i < numStrings; i++) { // compares, if the String is already in the pool if (strcmp(ch, array[i]) == 0) { count[i]++; return i; } } if (i == size) { return -1; } array[i] = new char[strlen(ch)+1]; // if not, add String to the pool and return its index strcpy(array[i], ch); numStrings++; return numStrings-1; } char* getString(int index) const { return array[index]; } };
class String { int length; char* text; int index; public: static StringTable* table; String(const char* ch) { length = strlen(ch); text = new char[length+1]; strcpy(text,ch); index = table->addString(text); } char* intern() { // return pointer to the String in pool if (index == -1) return 0; return table->getString(index); } }; StringTable* String::table = new StringTable();