Oh...
Please note: the search index is not a small list of all the words written in the 1,200,000 posts here... It is a list of the post IDs that contain those words!
The ENW search index is most certainly 15 million rows or larger. Please do not tamper with the search index unless it's absolutely necessary. A complete rebuild of the search index will probably take several hours, and when it runs, the server will go down.
There is no other way to include the short words mentioned here than to rebuild the search index, due to the nature of the message board software.
Allow me to explain:
----
The search function consists of two tables in the database. Think of a table as a two-dimensional array, such as an Excel spreadsheet.
First we have the "word table" (the name of this particular table), a table of all the words that exist in the database. This table probably has an "id" field and a "word" field. The word table is probably a couple of thousand rows, and will perhaps look like this:
Code:
the word table
+-------+-----------------+
| id    | word            |
+-------+-----------------+
| ...   | ...             |
| 1091  | crawl           |
| 1092  | claustrophobic  |
| 1093  | bovd            |
| 1094  | monte           |
| ...   | ...             |
+-------+-----------------+
Whenever you post a new thread, the forum software will go through every word in the post and check if it exists in the word table. If it doesn't, the word is added to the end of the list. Say, now that I write "goonsnargish", it will be included in the word list because probably no one has mentioned it before.
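To make the "check, then add" step concrete, here is a minimal sketch. It assumes the word table behaves like a plain word-to-id lookup where new ids simply continue from the highest one in use; the function name get_or_add_word is made up for illustration and is not the forum software's actual code.

```python
# Hypothetical in-memory stand-in for the word table: word -> id.
word_table = {"crawl": 1091, "claustrophobic": 1092, "bovd": 1093, "monte": 1094}


def get_or_add_word(word_table, word):
    """Return the id of `word`, appending it to the table if it is new."""
    word = word.lower()
    if word not in word_table:
        # Assumption: new ids continue from the highest id already in use.
        word_table[word] = max(word_table.values(), default=0) + 1
    return word_table[word]


monte_id = get_or_add_word(word_table, "monte")       # already in the table
new_id = get_or_add_word(word_table, "goonsnargish")  # added at the end
```

In this sketch "monte" keeps its existing id, while "goonsnargish" gets the next free one and stays in the table from then on.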
Think that's bad? Then meet the "search index" (as it's often called). The search index is a table of two fields (perhaps three): one field is the "post id" and the other is the "word id".
Let's assume this post is postid 1509509. When I press the "submit reply" button, the search index will update itself.
Consider the following short sentence: "All of monte cooks base are belong to piratecat". Assume "monte" is word 1094 (look at the word table above), "cooks" is word 10944, "base" is word 3, "belong" is word 1998 and "piratecat" is word 13. The short words ("all", "of", "are" and "to") are not indexed, so the search index will add this info to the end of the list:
Code:
+----------+-----------+
| postid   | wordid    |
+----------+-----------+
| 1509509  | 1094      |
| 1509509  | 10944     |
| 1509509  | 3         |
| 1509509  | 1998      |
| 1509509  | 13        |
+----------+-----------+
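Here is a rough sketch of how rows like the ones above could be produced. It is only an illustration under a few assumptions: words shorter than four letters are skipped, a repeated word is indexed once per post, and the names index_post and MIN_WORD_LENGTH are hypothetical, not taken from the forum software.

```python
import re

MIN_WORD_LENGTH = 4  # assumed rule: shorter words are skipped

# Hypothetical stand-ins for the two database tables.
word_table = {"monte": 1094, "cooks": 10944, "base": 3, "belong": 1998, "piratecat": 13}
search_index = []  # rows of (postid, wordid)


def index_post(post_id, text, word_table, search_index):
    """Append one (postid, wordid) row per distinct indexable word in the post."""
    seen = set()
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if len(word) < MIN_WORD_LENGTH or word in seen:
            continue
        seen.add(word)
        if word not in word_table:
            # Unknown word: add it to the end of the word table first.
            word_table[word] = max(word_table.values(), default=0) + 1
        search_index.append((post_id, word_table[word]))


index_post(1509509, "All of monte cooks base are belong to piratecat",
           word_table, search_index)
```

After the call, search_index holds five rows for post 1509509, one for each of "monte", "cooks", "base", "belong" and "piratecat".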
Think about it. The average post here is probably a hundred words or so. That's why the search index is several million rows!
Okay, this is actually not as bad as it sounds, because it makes the forum search very smooth. There is simply no better way to do it.
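Why is the search smooth? A search only has to look up one word id and then collect the matching post ids, instead of reading through every post. A minimal sketch, assuming the same two-table layout (the function name search and the extra post 1509510 are made up for illustration):

```python
# Hypothetical stand-ins for the two tables.
word_table = {"monte": 1094, "piratecat": 13}
search_index = [(1509509, 1094), (1509509, 13), (1509510, 13)]  # (postid, wordid)


def search(word, word_table, search_index):
    """Return the ids of all posts whose index rows mention `word`."""
    word_id = word_table.get(word.lower())
    if word_id is None:
        # A word that was never indexed can never be found.
        return []
    return sorted(post_id for post_id, wid in search_index if wid == word_id)


piratecat_posts = search("piratecat", word_table, search_index)
dm_posts = search("DM", word_table, search_index)
```

Note the second lookup: a word that is missing from the word table gives no hits at all, no matter how many posts actually contain it. A real database would use an index on the wordid column rather than scanning a list, but the idea is the same.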
But... here is our problem: the search index updates itself when you post a reply or write a new thread. That is, every search index begins at zero rows and grows as more posts are added to the database. After two thousand posts the search index has updated itself two thousand times. There are about 1,200,000 posts here at ENW; the search index has grown for a couple of years and has updated itself over a million times.
Now... There's "rebuilding".
See, whenever you update the rules of the search indexing (such as only adding words of four letters or longer), the search index (and the word table) gets cluttered with words that shouldn't be there. If it was once okay to index two-letter words and you suddenly change this to four-letter words, the word table will still include the two- and three-letter words added before you changed the rules. So will the search index.
It works the other way too. When Eric added the word "DM" to the word table, new posts containing the word will be added to the search index. However, older posts will not.
That's when you rebuild the index. Think of it as removing the search index and word table completely, all 20 million rows or so. Now, the forum software will begin at the first post in the database (let's assume this is post 1) and, post after post, update the word table and search index until all words from all the 1,200,000 posts have been added.
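A toy version of the whole story, to show why only a rebuild picks up old posts. This is a sketch under assumptions, not the board's real code: the two sample posts are invented, the four-letter minimum is assumed, and the names index_post, rebuild and posts_containing are hypothetical.

```python
import re

MIN_WORD_LENGTH = 4  # assumed rule: shorter words are normally skipped


def index_post(post_id, text, word_table, search_index, short_words_allowed):
    """Append (postid, wordid) rows for one post, adding new words as needed."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for word in dict.fromkeys(words):  # drop repeats, keep first-seen order
        if len(word) < MIN_WORD_LENGTH and word not in short_words_allowed:
            continue
        if word not in word_table:
            word_table[word] = len(word_table) + 1
        search_index.append((post_id, word_table[word]))


def rebuild(posts, short_words_allowed):
    """Throw both tables away and re-index every post, starting at the first."""
    word_table, search_index = {}, []
    for post_id in sorted(posts):
        index_post(post_id, posts[post_id], word_table, search_index,
                   short_words_allowed)
    return word_table, search_index


def posts_containing(word, word_table, search_index):
    word_id = word_table.get(word.lower())
    return sorted(p for p, w in search_index if w == word_id)


# Two made-up posts. Post 1 was indexed before "dm" was allowed, post 2 after.
posts = {1: "The DM rolled a die", 2: "Our DM loves monte"}
word_table, search_index = {}, []
index_post(1, posts[1], word_table, search_index, short_words_allowed=set())
index_post(2, posts[2], word_table, search_index, short_words_allowed={"dm"})
before = posts_containing("DM", word_table, search_index)

# Only a full rebuild under the new rule picks up the older post.
word_table2, search_index2 = rebuild(posts, short_words_allowed={"dm"})
after = posts_containing("DM", word_table2, search_index2)
```

Before the rebuild, a search for "DM" finds only post 2; afterwards it finds both posts. With two posts this is instant. With 1,200,000 posts, the loop inside rebuild is the part that takes hours.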
Resource heavy? Indeed. In fact, the index will become incredibly unstable and might corrupt itself if someone posts something during the update process.
So... please be careful.