Monday, July 14, 2008

How to go about creating a better mousetrap (DLP)

If we go through the four questions to ask (where is it, what is it, who has access, and how is it protected), we can see that the answer to each one can be used to answer the others with a high probability.

Take the first question: where is it? If we look at one particular user, that user will use a limited number of resources to create and store information. She might have a local laptop used for daily work, a handful of SharePoint sites she visits, a few file shares, maybe two or three databases typically accessed via a line-of-business application, and finally, and very importantly, instant messaging and email.

If we expand the view and try to place this person in a network, we can look at organizational and hierarchical views of her, and we can see how frequently she communicates via SharePoint, file shares, email and IM. With that information, we can create a social network of nodes connecting her to her co-workers and contacts. If we know that she frequently uses highly sensitive information, we can assign a higher probability that the people in her network also work on highly sensitive information, or at least have a greater opportunity to receive it. Each node further out will have a reduced opportunity to receive sensitive information, unless that node also works on sensitive information. Of course, a highly connected node will have a higher probability than a less connected node.
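As a rough sketch of the "each node further out gets a reduced opportunity" idea, the snippet below spreads a score outward from people known to handle sensitive information, shrinking it at every hop. The function name, the `decay` factor and the sample people are all made up for illustration, and this simple version deliberately ignores the extra weight a highly connected node would deserve.

```python
from collections import deque

def exposure_scores(graph, known, decay=0.5):
    """Spread an exposure score outward from known handlers of sensitive data.

    graph: dict mapping each person to the set of people they communicate with.
    known: dict mapping known handlers to a starting score between 0 and 1.
    decay: assumed multiplier applied for every hop away from a known handler.
    """
    scores = dict(known)
    queue = deque(known)
    while queue:
        person = queue.popleft()
        passed_on = scores[person] * decay
        for contact in graph.get(person, ()):
            # Keep the strongest score seen so far for each contact.
            if passed_on > scores.get(contact, 0.0):
                scores[contact] = passed_on
                queue.append(contact)
    return scores

# Hypothetical network: Alice handles sensitive data, Dave is two hops away.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}
scores = exposure_scores(graph, {"alice": 0.8})
for person in sorted(scores):
    print(person, round(scores[person], 2))
```

Bob and Carol, who talk to Alice directly, end up with half her score, and Dave with a quarter. A real model would also want to weight edges by how often people communicate.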

With this, we can create network models and assign a base probability that each node accesses, or has the potential to access, sensitive information. This network diagram would be created by correlating information from email systems, logon events and so on, and then correlating that with known repositories of sensitive information. Of course, this approach will take several iterations, since one would assume that in the beginning few of the repositories would be classified and catalogued.
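To make the correlation step a little more concrete, here is a minimal sketch that assumes the email and logon logs have already been reduced to simple tuples. The record shapes, the function name and the repository names are assumptions for illustration, not the format of any particular mail or directory system.

```python
from collections import Counter, defaultdict

def build_network(messages, access_events, sensitive_repos):
    """Correlate communication logs with access to known sensitive repositories.

    messages: iterable of (sender, recipient) pairs, e.g. from email or IM logs.
    access_events: iterable of (user, repository) pairs, e.g. from logon events.
    sensitive_repos: set of repositories already classified as sensitive.
    """
    # Undirected edge weights: how often two people communicate.
    edges = Counter()
    for sender, recipient in messages:
        edges[frozenset((sender, recipient))] += 1

    # Which users have touched which known sensitive repositories.
    touches = defaultdict(set)
    for user, repo in access_events:
        if repo in sensitive_repos:
            touches[user].add(repo)
    return edges, dict(touches)

# Hypothetical log extracts.
messages = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]
access_events = [("alice", "hr-db"), ("bob", "wiki"), ("carol", "hr-db")]
edges, touches = build_network(messages, access_events, {"hr-db"})
print(edges[frozenset(("alice", "bob"))], sorted(touches))
```

As more repositories get classified, rerunning this with a larger `sensitive_repos` set is one way to picture the iterations mentioned above.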

Now, if we start looking at Alice and what information she receives, we could chunk the sensitive information she receives from, let's say, a database, and then see if there are hits on these chunks in email, IM, or in documents she creates. If there are, we can assign a probability that the new content is sensitive. If we have enough information that the probability is higher than a preset threshold, we could then automatically assign the appropriate classification, annotate the information with the appropriate metadata, and assign the correct protection using, for example, DRM or other encryption technologies, or just set the appropriate access control list permissions on the document.
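One way to picture the chunking step is to hash overlapping runs of words from the sensitive record and count how many of those hashes show up again in an email or document. The sketch below does that; the chunk size, the 0.5 threshold, the helper names and the sample text are assumptions chosen for illustration, and the match ratio is only a crude stand-in for a real probability.

```python
import hashlib

def shingles(text, size=4):
    """Return hashes of every run of `size` consecutive words in the text."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }

def match_ratio(sensitive_text, candidate_text, size=4):
    """Fraction of the sensitive chunks that reappear in the candidate text."""
    source = shingles(sensitive_text, size)
    if not source:
        return 0.0
    return len(source & shingles(candidate_text, size)) / len(source)

THRESHOLD = 0.5  # assumed cut-off; would need tuning on real data

def classify(sensitive_text, candidate_text):
    """Label the candidate once enough sensitive chunks are found in it."""
    ratio = match_ratio(sensitive_text, candidate_text)
    return "sensitive" if ratio >= THRESHOLD else "unclassified"

# Made-up database record and two made-up messages.
record = "customer 4411 account balance 120000 credit limit 50000"
email = "fyi customer 4411 account balance 120000 credit limit 50000 as discussed"
memo = "lunch is at noon in the main cafeteria today"
print(classify(record, email), classify(record, memo))
```

Storing only the hashes means the matching side never needs to hold the sensitive text itself, which seems like a sensible property for this kind of system.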

Assigning rights to a document or repository then becomes a bit easier, as you can glean information from previous transactions. With entitlement monitoring on repositories and in AD, you can then see whether Alice should still have this access or not. A further development would be a view into the social network that shows whether communication between nodes is increasing or decreasing. If there has been a decrease, the organizational chart may not have been updated, but the node's work may have changed, and the node may therefore no longer need access to this information. In this case, if Alice owns one or more of these repositories, she could be notified and asked whether this node, Bob, still needs access. This system could of course also be used to monitor for abnormalities and anomalies.
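The access review idea could be sketched by comparing message counts between two periods and flagging the contacts whose communication with a repository owner has dropped sharply. The data shapes, the function name and the 25% cut-off below are assumptions for illustration only.

```python
def access_review_candidates(previous, recent, holders, drop_ratio=0.25):
    """List (owner, contact) pairs the owner should be asked about.

    previous, recent: dicts mapping (owner, contact) to message counts
        for an earlier and a later period.
    holders: dict mapping each repository owner to the set of contacts
        who currently have access to that owner's repositories.
    drop_ratio: assumed cut-off; flag when recent traffic falls to this
        fraction of the earlier traffic or below.
    """
    flagged = []
    for owner, contacts in holders.items():
        for contact in sorted(contacts):
            before = previous.get((owner, contact), 0)
            after = recent.get((owner, contact), 0)
            # Only flag pairs that used to communicate and now barely do.
            if before > 0 and after <= before * drop_ratio:
                flagged.append((owner, contact))
    return flagged

# Made-up counts: Bob's traffic with Alice has collapsed, Carol's has not.
previous = {("alice", "bob"): 40, ("alice", "carol"): 30}
recent = {("alice", "bob"): 3, ("alice", "carol"): 28}
holders = {"alice": {"bob", "carol"}}
to_review = access_review_candidates(previous, recent, holders)
print(to_review)
```

Here only Bob is flagged, so Alice would get a single question about whether Bob still needs access. The same comparison run in the other direction, looking for sudden increases, is one way to approach the anomaly monitoring mentioned above.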

We can also make assumptions about the sensitivity of information based on the protections on the system hosting it (this may not hold true for end systems, but will generally hold true for financial systems, HR systems and the like). If the system is encrypted, or has other security measures in place, the probability that it contains sensitive information may be higher. However, this is a weak assumption in many cases, especially before a program has been put in place to safeguard sensitive information in an organization.
