Thursday, January 29, 2009

Regular search does not account for the fact that sensitive documents are typically found in clusters. If your DLP search engine has found one sensitive document in a location such as a file share or a laptop, the probability that there are more is very high, yet those remaining documents often slip through as false negatives. For example, if a sensitive document is found on a file share, there is a high likelihood that other documents of equal sensitivity sit alongside it but are not covered by the filter. A typical scenario is an HR professional storing documents in a folder for a specific task. If the filter only finds one of them, the current assumption in DLP is that no other file in that folder is sensitive. Based on my observations of real incidents, that assumption is false.

How to remedy this?
A manual review can be done of the rest of the folder and the folders in the tree
The folder can be marked sensitive, and all documents in it are then considered sensitive
The folder can be automatically re-reviewed with a broader capture filter (production filters are usually tuned to reduce false positives, which drives up the number of false negatives); see the sketch after this list
Fingerprinting (full or partial) can be used to see whether these documents reside elsewhere
Pattern creation can be used to improve the search patterns
Etc.
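
To make the broader-capture remedy concrete, here is a minimal two-pass sketch. The regular expressions, the .txt-only crawl, and the function name are illustrative assumptions on my part, not any particular DLP product's filter definitions or API:

```python
import re
from pathlib import Path

# Tight pattern tuned for low false positives (dash-delimited SSNs only),
# plus a broader pattern that also matches loosely delimited digit runs.
# Both patterns are illustrative, not production DLP filters.
TIGHT = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BROAD = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

def scan_share(root: str) -> set[Path]:
    """Two-pass scan: any folder with one tight hit is re-scanned broadly."""
    flagged: set[Path] = set()
    hot_folders: set[Path] = set()

    # Pass 1: tight filter across the whole tree.
    for path in Path(root).rglob("*.txt"):
        if TIGHT.search(path.read_text(errors="ignore")):
            flagged.add(path)
            hot_folders.add(path.parent)

    # Pass 2: broader filter, but only inside folders that already
    # produced a confirmed hit, per the cluster assumption.
    for folder in hot_folders:
        for path in folder.glob("*.txt"):
            if BROAD.search(path.read_text(errors="ignore")):
                flagged.add(path)

    return flagged
```

Restricting the second pass to folders with a confirmed hit means the noisier pattern never touches the rest of the share, so the overall false-positive rate stays manageable.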

The true solution is a combined approach: manual inspection, machine learning, and the working assumption that the likelihood of one single sensitive document residing alone in a repository is low, while the likelihood of more than one is high. The risk is then mitigated by classifying, tagging, and protecting the cluster instead of a single document.
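
As a minimal sketch of that last step (the function, the folder-as-cluster boundary, and the tag-everything policy are my assumptions, not a specific product feature), classification can be lifted from the file to the folder once a hit is confirmed:

```python
from pathlib import Path

def tag_sensitive_clusters(hits: set[Path]) -> dict[Path, list[Path]]:
    """Lift classification from individual files to their parent folder,
    so the whole cluster is tagged and protected as one unit."""
    clusters: dict[Path, list[Path]] = {}
    for folder in {p.parent for p in hits}:
        # Every document co-located with a confirmed hit inherits the
        # sensitive tag, mirroring the cluster assumption above.
        clusters[folder] = sorted(p for p in folder.iterdir() if p.is_file())
    return clusters
```

The downstream protection (encryption, access restriction, quarantine) then operates on each folder's full file list rather than on the one document the filter happened to catch.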
