Thursday, July 31, 2008

Government risk of loss of sensitive information still high:

According to an article in Computerworld (http://computerworld.com/action/article.do?command=viewArticleBasic&articleId=9110983&intsrc=hm_list), only 30% of laptops containing sensitive information are encrypted.

Since it is taking the government such a long time to encrypt, I would suggest they deploy encryption based on the sensitivity of the documents stored on each laptop. They should start by searching with DLP, make some assumptions based on employee roles, and mandate encryption accordingly.
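
As a rough sketch of what I mean, and assuming made-up role weights and a hypothetical count of DLP hits per laptop, the prioritization could be as simple as:

```python
# Hypothetical sketch: prioritize laptop encryption rollout by combining
# DLP scan hits with an assumed sensitivity weight per employee role.
ROLE_WEIGHT = {"hr": 3, "finance": 3, "legal": 2, "engineering": 1, "other": 1}

def encryption_priority(dlp_hits: int, role: str) -> int:
    """Higher score = encrypt this laptop sooner."""
    return dlp_hits * ROLE_WEIGHT.get(role, 1)

laptops = [
    {"host": "lt-001", "role": "hr", "dlp_hits": 40},
    {"host": "lt-002", "role": "engineering", "dlp_hits": 5},
    {"host": "lt-003", "role": "finance", "dlp_hits": 12},
]

# Encrypt the highest-scoring laptops first.
rollout = sorted(laptops, key=lambda l: encryption_priority(l["dlp_hits"], l["role"]), reverse=True)
for lt in rollout:
    print(lt["host"], encryption_priority(lt["dlp_hits"], lt["role"]))
```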

Monday, July 28, 2008

What were they thinking? I understand the need to provide the court with adequate evidence (I am not a lawyer), but you would think the prosecutor would at least ask the court to conceal the information when it exposes an entire city's network.

San Francisco DA exposes the city's network passwords: http://computerworld.com/action/article.do?command=viewArticleBasic&articleId=9110758&intsrc=hm_list

Maybe it is time to run documents through a sensitivity review before they are actually submitted to court? In my own experience, documents containing health information or information about children become sealed, and the court has the discretion to seal any information it finds necessary, as long as doing so does not violate the public's right to access information. Clearly, the public does not need to know San Francisco's network passwords, and taxpayers certainly do not need to see their hard-earned money spent on resetting all of them.

Friday, July 18, 2008

I have earlier described pattern matching and "smart" information retrieval: first look at broad groupings of information to create a set, then search the resultant set with finer-grained search terms.

If we use the neocortex's processing as an example, low-level information is detected by our sensory organs and processed at a low level, and only a fraction of that information is actually processed by higher-level structures. If we were to process information this way, we could do the following: for each search term, with keywords being the lowest level, we assign a probability of the document's relevance, and then search the resultant set with bigrams. That result set would then be searched with trigrams, and the resultant set again assigned a probability of relevance. The finest search, using complex patterns, would only be run on the final set.
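
A minimal sketch of that cascade, with made-up documents, terms and weights, and a simple score standing in for the probability of relevance:

```python
import re

def stage_filter(docs, terms, weight):
    """Keep documents that match any term and bump their relevance score."""
    survivors = {}
    for doc_id, (text, score) in docs.items():
        hits = sum(text.count(t) for t in terms)
        if hits:
            survivors[doc_id] = (text, score + weight * hits)
    return survivors

corpus = {
    1: ("patient record number 1234", 0.0),
    2: ("meeting notes about lunch", 0.0),
    3: ("patient record contains ssn 078-05-1120", 0.0),
}

# Stage 1: cheap keyword filter over the whole corpus.
stage1 = stage_filter(corpus, ["patient", "ssn"], weight=0.1)
# Stage 2: bigrams, run only on the survivors of stage 1.
stage2 = stage_filter(stage1, ["patient record"], weight=0.2)
# Stage 3: trigrams, run on an even smaller set.
stage3 = stage_filter(stage2, ["patient record contains"], weight=0.3)
# Final stage: the expensive complex pattern, run only on the handful that remain.
ssn_pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
final = {d: (t, s + 1.0) for d, (t, s) in stage3.items() if ssn_pattern.search(t)}
print(final)
```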

For each of these searches, a registry (database) would serve as the index of this information, and it should correlate to a taxonomy. That taxonomy would then be used to create metadata that is assigned to the document. With this in place, searching for hidden patterns becomes possible via data mining techniques.
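
A toy version of such a registry, with an assumed taxonomy that maps search terms to categories and classification labels:

```python
from collections import defaultdict

# Hypothetical taxonomy: term -> (category, classification label).
TAXONOMY = {
    "ssn": ("PII", "Confidential"),
    "diagnosis": ("PHI", "Restricted"),
    "invoice": ("Financial", "Internal"),
}

registry = []  # stands in for the index table in the database

def record_hit(doc_id: str, term: str):
    category, label = TAXONOMY.get(term, ("Unclassified", "Public"))
    registry.append({"doc": doc_id, "term": term, "category": category, "label": label})

record_hit("report.docx", "ssn")
record_hit("claims.xlsx", "diagnosis")

# The registry can now be mined for patterns, e.g. which documents
# accumulate hits across multiple taxonomy categories.
by_doc = defaultdict(set)
for row in registry:
    by_doc[row["doc"]].add(row["category"])
print(dict(by_doc))
```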

Monday, July 14, 2008

One concern I have heard against using the CLR regex support in SQL Server 2005 is performance. One way to overcome the cost of expensive regex queries is to do the search a bit smarter. One could start with the LIKE operator, or its equivalent in other systems, and then take a sampling of the rows that the LIKE operation returned. After obtaining a sample rather than the entire table, one could perform the regex operation on a separate system, or in a separate thread on the same system. With this approach, very complex patterns could be searched for, and one could create a separate repository from which chunking could be used. This would work not only for text but also for images and other information, as long as the parser can read and understand the format.
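
A minimal sketch of this staged approach, using SQLite and Python's re module to stand in for SQL Server's LIKE operator and a CLR regex, with a worker thread playing the part of the separate system; the table, column and patterns are made up:

```python
import random
import re
import sqlite3
from concurrent.futures import ThreadPoolExecutor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [("card 4111-1111-1111-1111 on file",), ("lunch order",), ("card number pending",)],
)

# Step 1: cheap LIKE prefilter inside the database.
candidates = conn.execute("SELECT id, body FROM notes WHERE body LIKE '%card%'").fetchall()

# Step 2: sample the candidate rows instead of scanning the whole table.
sample = random.sample(candidates, k=min(2, len(candidates)))

# Step 3: run the expensive regex outside the database engine (here, worker threads).
card_pattern = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def scan(row):
    row_id, body = row
    return row_id if card_pattern.search(body) else None

with ThreadPoolExecutor(max_workers=2) as pool:
    hits = [r for r in pool.map(scan, sample) if r is not None]
print("rows with probable card numbers:", hits)
```
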
Symantec releases database support for sensitive information: http://biz.yahoo.com/iw/080624/0409691.html
For further thinking about DAM (database activity monitoring) and scanning databases for sensitive information, see: http://securosis.com/
How to go about creating a better mousetrap (DLP)

If we go through the questions to ask: where is it, what is it, who has access, and how is it protected, we can see that the answers to each of these four questions can be used to answer the others with high probability.

Think first about the "where is it" question. If we look at one particular user, that user will use a limited number of resources to create and store information. She might have a local laptop used for daily work, a handful of SharePoint sites she visits, a few file shares, maybe two or three databases typically accessed via a line-of-business application, and finally, and very importantly, instant messaging and email.

If we expand the view of this person and try to place her in a network, we can look at organizational/hierarchical views of her, and we can see the frequency of communication via SharePoint, file shares, email and IM. With that information, we can create a social network of nodes between her and her co-workers and contacts. If we know that she frequently uses highly sensitive information, we can assign a higher probability that her network also works on highly sensitive information, or at least has a greater opportunity to receive it. Each node further out has a reduced opportunity to receive sensitive information, unless it also works on sensitive information, and of course a highly connected node will have a higher probability than a less connected node.
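
A rough sketch of that propagation, with a made-up communication graph and assumed decay and connectivity weights:

```python
from collections import deque

# Hypothetical adjacency list built from email/IM/SharePoint traffic.
graph = {
    "alice": ["bob", "carol", "dave"],
    "bob": ["alice", "erin"],
    "carol": ["alice"],
    "dave": ["alice", "erin"],
    "erin": ["bob", "dave"],
}

def propagate(seed: str, seed_prob: float = 0.9, decay: float = 0.5):
    """Spread the probability of handling sensitive information outward from a known user."""
    prob = {seed: seed_prob}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            # Connectivity bonus: better-connected nodes get a slightly higher score.
            bonus = 1 + 0.1 * (len(graph[neighbor]) - 1)
            candidate = min(prob[node] * decay * bonus, 1.0)
            if candidate > prob.get(neighbor, 0.0):
                prob[neighbor] = candidate
                queue.append(neighbor)
    return prob

print(propagate("alice"))
```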

With this, we can create network models and base the probability of each node accessing, or having the potential to access, sensitive information on its position in that network. The network diagram would be created by correlating information from email systems, logon events and so on, and then correlating that with known repositories of sensitive information. Of course, this approach will take several iterations, since one would assume that in the beginning few of the repositories would be classified and catalogued.

Now, if we start looking at Alice and the information she receives, we could chunk the sensitive information she receives from, let's say, a database, and then see if there are hits on those chunks in email, IM, or in documents she creates. If there are, we can assign a probability that the information is sensitive. If we have enough information that the probability exceeds a preset threshold, we could then automatically assign the appropriate classification, annotate the information with the appropriate metadata, and apply the correct protection, using for example DRM or other encryption technologies, or simply setting the appropriate access control list permissions on the document.
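
A rough sketch of the chunking and threshold idea, with hashed word shingles standing in for the chunks; the sensitive source row, chunk size and threshold are all assumptions:

```python
import hashlib

def chunks(text: str, size: int = 5) -> set:
    """Fingerprint text as a set of hashed word shingles ("chunks")."""
    words = text.lower().split()
    return {
        hashlib.sha1(" ".join(words[i:i + size]).encode()).hexdigest()
        for i in range(max(1, len(words) - size + 1))
    }

# Pretend this row came from a sensitive database.
sensitive_source = "member id 55231 john doe diagnosis type 2 diabetes coverage plan gold"
sensitive_chunks = chunks(sensitive_source)

def classify(document: str, threshold: float = 0.3) -> str:
    overlap = len(chunks(document) & sensitive_chunks) / max(1, len(sensitive_chunks))
    if overlap >= threshold:
        # In a real system this is where metadata, DRM/encryption or ACLs
        # would be applied automatically.
        return "Confidential"
    return "Unclassified"

print(classify("fyi member id 55231 john doe diagnosis type 2 diabetes"))
print(classify("notes from the team lunch"))
```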

Assigning rights to a document or repository then becomes a bit easier, as you can glean information from previous transactions. With entitlement monitoring on repositories and in AD, you can then see whether Alice should still have this access or not. A further development would be a view into the social network to see if there is an increase or decrease in communications between nodes. If there has been a decrease, the organizational chart may not have been updated, but the node's work may have changed, and it may therefore no longer need access to this information. In this case, if Alice owns one or more of these repositories, she could be notified and asked whether this node, Bob, still needs access. Such a system could of course also be used to monitor for abnormalities and anomalies.
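
A rough sketch of flagging entitlements for review when communication drops off, with made-up message counts and an assumed drop ratio:

```python
# Hypothetical message counts between Alice and each node, per quarter.
last_quarter = {"bob": 48, "carol": 30, "erin": 2}
this_quarter = {"bob": 3, "carol": 28, "erin": 1}

def entitlements_to_review(drop_ratio: float = 0.25):
    """Flag nodes whose communication with Alice has dropped sharply."""
    flagged = []
    for node, before in last_quarter.items():
        after = this_quarter.get(node, 0)
        if before > 0 and after / before < drop_ratio:
            flagged.append(node)
    return flagged

for node in entitlements_to_review():
    print(f"Ask the repository owner whether {node} still needs access.")
```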

We can also make assumptions about the sensitivity of information based on the protections on the system hosting it (this may not hold true for end systems, but it will generally hold true for financial systems, HR systems and so on). If the system is encrypted or has other security measures in place, the probability of it containing sensitive information may be higher. However, this is a weak assumption in many cases, especially before a program has been put in place to safeguard sensitive information across the organization.

Thursday, July 10, 2008

It has been a while since I have updated the blog, but here is an article from MSNBC I found stressing the need for inspection of information leaving your network: "Last year, a Virginia investment firm employee decided to trade music or a movie on the file-sharing network LimeWire on a company computer. He inadvertently shared his firm's files, including personal data of clients, one of them Supreme Court Justice Stephen Breyer." It seems that no one, including our Supreme Court justices, is safe from loss of PII.