Stopping By Words On A...
Jan. 27th, 2012 08:25 am
Yesterday's main task was getting the basics of a stop word list in place to support text search. (For those who haven't lived in the world of text indexing and retrieval, stop words are those that you don't care about, the ones that appear in just about every piece of text and therefore are useless for locating important information.)
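For anyone who would rather see it than read about it, here is a minimal sketch of what stop word filtering looks like at indexing time. It is not the application's actual code: the tiny `STOP_WORDS` set, the `index_terms` name, and the simple regex tokenizer are all assumptions made for illustration.

```python
import re

# Hypothetical sample list; a real default list would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is"})


def index_terms(text, stop_words=STOP_WORDS):
    """Return the lowercase words in `text` that are not stop words."""
    # Assumed tokenizer: runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in stop_words]


print(index_terms("The status of the build is in the report"))
```

Only the words that survive the filter would go into the search index, which keeps the index smaller and the matches more meaningful.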
I intend to supply a default list with the application but also allow it to be customized. This means the provided list needs to be copied to a manageable location (the database) when the application is installed. The app also needs a screen for additions and deletions. An easy way to handle additions is to show the user the words that have already been indexed and let them move those into the stop word list.
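The install-time copy plus add and delete operations could be sketched roughly as follows. This assumes SQLite via Python's standard `sqlite3` module purely for illustration; the `stop_words` table, the function names, and the short `DEFAULT_STOP_WORDS` tuple are hypothetical and say nothing about the real schema.

```python
import sqlite3

# Hypothetical default list shipped with the application.
DEFAULT_STOP_WORDS = ("a", "an", "and", "the", "of", "to")


def install_stop_words(conn, defaults=DEFAULT_STOP_WORDS):
    """Copy the shipped defaults into the database at install time."""
    conn.execute("CREATE TABLE IF NOT EXISTS stop_words (word TEXT PRIMARY KEY)")
    # INSERT OR IGNORE makes a repeated install harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO stop_words (word) VALUES (?)",
        [(w,) for w in defaults],
    )
    conn.commit()


def add_stop_word(conn, word):
    """Add a word, e.g. one the user picked from the already-indexed terms."""
    conn.execute(
        "INSERT OR IGNORE INTO stop_words (word) VALUES (?)", (word.lower(),)
    )
    conn.commit()


def remove_stop_word(conn, word):
    """Delete a word from the customizable list."""
    conn.execute("DELETE FROM stop_words WHERE word = ?", (word.lower(),))
    conn.commit()


def load_stop_words(conn):
    """Return the current stop word list as a set."""
    return {row[0] for row in conn.execute("SELECT word FROM stop_words")}
```

A maintenance screen would then only need to call `add_stop_word` and `remove_stop_word` and redisplay the result of `load_stop_words`.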
Removing a stop word is somewhat problematic in that it would only take effect for messages indexed after the removal. That means either re-indexing existing messages or living with inconsistency. Another option is to disallow removal once any messages have been indexed...which might not be a bad thing.
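The "disallow removal once anything is indexed" option might look something like the sketch below. It is only one way to enforce the rule, and it assumes two hypothetical SQLite tables, `stop_words` and `indexed_messages`, that already exist; the exception and function names are invented for the example.

```python
import sqlite3


class RemovalNotAllowed(Exception):
    """Raised when a stop word cannot be removed without re-indexing."""


def remove_stop_word_if_safe(conn, word):
    """Remove a stop word only while no messages have been indexed."""
    # Assumed table: one row per message that has already been indexed.
    (indexed,) = conn.execute("SELECT COUNT(*) FROM indexed_messages").fetchone()
    if indexed > 0:
        # Existing index entries were built without this word, so removing
        # it now would make old and new messages searchable differently.
        raise RemovalNotAllowed(
            f"{indexed} message(s) already indexed; re-index before removing"
        )
    conn.execute("DELETE FROM stop_words WHERE word = ?", (word.lower(),))
    conn.commit()
```

The re-indexing alternative would replace the exception with a pass over the existing messages, at the cost of a potentially long-running operation.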
At this point, database access for stop words is working; only the maintenance function is still to be done.