What would you do if your professor ask you to come up with a solution for a symbol table that is targeted only for string-based keys which are Faster than Hashing and more Flexible than BSTs?
Trie Data Structure is the solution that you will be looking for.
Trie is a fast search and miss data structure when it comes to storing key-value pairs where key is a string. This is very efficient for maintaining a small symbol table.
In this article, I will implement a R-Way Trie with common apis like GET, PUT and DELETE.
Table of Contents
Basic Concept
Think of Trie as a tree with finite branches. And after traversing the tree based on the “key” you either reach till the value or a null node. Either way you have found the answer to the query.
Let’s take an example to understand how can we construct this data structure.
Statement – I LIKE APPLES.
Let’s say for the above statement I want to count the length of each word and maintain a symbol table like the following,
I | 1 |
LIKE | 4 |
APPLES. | 7 |
Trie Representation
If I have to represent the above words into a Trie, then that Trie would look like this,
The leaf node contains the value associated with the key.
The links of the tree creates the ‘key’ and the leaf node holds the ‘value’ associated with that key.
This is how a Trie Node would look like,
public class TrieNode {
private static final int R = 256; // extended ascii
Object value;
TrieNode[] next = new TrieNode[R];
}
Later in this article, I will show you how to write the logic to put and get the information into the trie. For now let’s examine the given problem statement.
Extended ASCII Code For Given String
Now, let’s examine the given statement and find out what all information it holds which can be useful in constructing the Trie.
I | L | I | K | E | A | P | P | L | E | S | . | ||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
The above statement comprises 14 chars (13 if indexed from 0).
This is a simple string encoded in ASCII.
Therefore, every character is in the range of 0 to 256.
Below is the ASCII encoded value for the same,
I | L | I | K | E | A | P | P | L | E | S | . | ||
73 | 32 | 76 | 79 | 86 | 69 | 32 | 65 | 80 | 80 | 76 | 69 | 83 | 46 |
So, in the Trie data structure, the length of the “key” will define the depth and the “ASCII” code will be used as the index for storing the character at that depth.
Since we only have to worry about storing strings as key, extended ASCII could be used as a possible charset for storing keys. There are 256 chars in extended ASCII so at every intersection, there will be 256 possible paths.
PUT Key-Value Pair Into Trie
Let’s combine the knowledge that we have so far to add a new node into the Trie.
There are 3 basic steps that you need to remember in order to construct a Trie:
- Step 1. Create a new node if the existing node is null
- Step 2. If the current depth is equal to the length of the ‘key’ then set the value to that node.
- Step 3. Call the above steps recursively with
depth + 1
.
Here is an elegant recursive solution to put key and associate a value to it,
public class TrieST<Value> {
private static final int R = 256; // extended ASCII
private TrieNode root = new TrieNode();
public void put(String key, Value value) {
root = put(root, key, value, 0);
}
private TrieNode put(TrieNode x, String key, Value value, int d) {
// Step 1
if (x == null) x = new TrieNode();
// Step 2
if (d == key.length()) {
x.value = value;
return x;
}
// Step 3
char c = key.charAt(d);
x.next[c] = put(x.next[c], key, value, d + 1);
return x;
}
}
In order to call the above code,
- Create a new object from
TrieST
class. - Pass the
key
andvalue
to theput()
method.
var trie = new TrieST<Integer>();
trie.put("I", "I".length());
trie.put("LIKE", "LIKE".length());
trie.put("APPLES", "APPLES.".length());
// programatically you can split and perform the operations as below:
String str = "I LIKE APPLES.";
var trie = new TrieST<Integer>();
Arrays.stream(str.split("\\s+"))
.forEach(word -> trie.put(word, word.length()));
assertTrue(trie.contains("LIKE")); // true
This is not much of a use as of now. To make it useful, we will have to implement GET
functionality.
GET Value By Key From Trie
Reading the value by key is similar to the way we save it. We traverse the Trie on the basis of the passed key and check for the value in the leaf node.
- If the value is null then it means that the value has been deleted or does not exist.
- You hit the null node midway. It means that there is no node with the given key.
Let’s see how to implement GET.
public boolean contains(String key) {
return get(key) != null;
}
public Value get(String key) {
TrieNode x = get(root, key, 0);
if (Objects.isNull(x)) return null;
return (Value) x.value;
}
private TrieNode get(TrieNode x, String key, int d) {
if (x == null) return null;
if (d == key.length()) return x;
char c = key.charAt(d);
return get(x.next[c], key, d + 1);
}
Go through the code and let me know if you need any explanation regarding any piece.
At this point we’ve almost implemented Trie except for the Delete part. Let’s see how delete can be performed in a trie.
DELETE Key-Value From Trie
In order to delete an entry from trie you can set the value of that node to null
. But that would be inefficient when you have to search by that key in the Trie.
That is because then it would traverse all the way to the leaf node just to find out that the value is deleted already.
Instead, you should check if the node has any other branches that contains values and if there is none, it’s better to delete the entire link.
This is how you can perform delete in Trie,
public void delete(String key) {
delete(root, key, 0);
}
private void delete(TrieNode x, String key, int d) {
if (x == null) return;
if (d == key.length()) {
x.value = null;
deleteLink(x);
return;
}
char c = key.charAt(d);
delete(x.next[c], key, d+1);
}
private void deleteLink(TrieNode x) {
int nullCount = 0;
for (TrieNode i : x.next)
if (i == null) nullCount++;
if (nullCount == x.next.length)
x.next = null;
}
Great!!!
At this point we have a working implementation of a Trie data structure.
Next I was planning to test is against Hash Symbol Table to check how fast it is. For that I would use the unix dictionary that contains more than a million words. I will post my find here later. To get updated do not forget to subscribe.
I’m embedding my notes along with the code below for your reference.