Monday, July 14, 2008

One concern I have heard about using the CLR regex support in SQL Server 2005 is performance. One way to reduce the cost of expensive regex queries is to search a bit smarter. One could start with the LIKE operator, or its equivalent in other systems, and then sample the rows that the LIKE operation returned. Working from that sample rather than the entire table, one could then run the regex on a separate system, or in a separate thread on the same system. With this approach, very complex patterns could be searched for, and one could create a separate repository to be processed in chunks. This would work not only for text, but also for images and other data, as long as the parser can read and understand the format.
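To make the idea concrete, here is a rough sketch of the prefilter-then-sample approach. It is only an illustration under some assumptions: it uses Python with an in-memory SQLite database rather than SQL Server's CLR integration, and the table name `docs`, its columns, and the helper `prefilter_then_regex` are all made up for the example.

```python
import random
import re
import sqlite3


def prefilter_then_regex(conn, like_pattern, regex, sample_size, seed=0):
    """Hypothetical helper: cheap LIKE prefilter, then a sample, then the regex."""
    # Step 1: let the database narrow the candidates with the cheap LIKE operator.
    rows = conn.execute(
        "SELECT id, body FROM docs WHERE body LIKE ?", (like_pattern,)
    ).fetchall()

    # Step 2: take a sample of the candidates instead of processing all of them.
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)

    # Step 3: run the expensive regex only on the sample, outside the database.
    compiled = re.compile(regex)
    return sorted(row_id for row_id, body in sample if compiled.search(body))


# Assumed demo data; a real table would be far larger.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [
        (1, "order 12345 shipped"),
        (2, "order pending"),
        (3, "invoice 99"),
        (4, "order 777 cancelled"),
    ],
)

matches = prefilter_then_regex(conn, "%order%", r"order \d{3,}", sample_size=10)
print(matches)
```

Here the regex step runs in the calling process, so it could just as well be handed to a worker thread or another machine, and the sampled rows could be written to a separate store and worked through in chunks.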
