Monday, December 31, 2007

IPv6 support for Data Loss Prevention from Fidelis (http://www.fidelissecurity.com/), a data-in-motion vendor and the first DLP vendor I know of with IPv6 support: http://www.eweek.com/article2/0%2c1895%2c2238457%2c00.asp
What has classification to do with it?

Why do you need to classify your data? Isn't classification just for secret government and military organizations? I believe information classification is now needed in business as well. Today, organizations are under pressure to prevent the loss of sensitive personal information, both from regulatory compliance requirements and from a public that is getting tired of companies losing their information.

So if you have to decide on a classification scheme, what should you do? You can go from simple to complex, but your best bet is somewhere in between. A three-level classification could be: Secret, Sensitive, and Public.

The value of a classification system is of course that when your information is classified, and you know where it is, you can apply the right set of controls to it. Think about being able to target your encryption efforts. This can mean the difference between being able to deploy encryption and not, since protecting everything is usually cost prohibitive.

The beauty of combining your information loss prevention program with a classification system is that as you discover sensitive information in your organization, you can apply the classification scheme and, with it, the right set of controls. You protect what needs to be protected, and worry less about information whose loss would not cause a material loss to your organization.
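To make this concrete, here is a minimal sketch in Python of how a three-level scheme can drive which controls get applied. The control names are my own illustrations, not taken from any particular product:

from enum import Enum

class Classification(Enum):
    SECRET = 3
    SENSITIVE = 2
    PUBLIC = 1

# Hypothetical control sets per level: target expensive controls
# (such as encryption) only where the classification warrants them.
CONTROLS = {
    Classification.SECRET:    ["encrypt-at-rest", "encrypt-in-transit", "restrict-access"],
    Classification.SENSITIVE: ["encrypt-in-transit", "restrict-access"],
    Classification.PUBLIC:    [],
}

def controls_for(classification):
    """Return the controls to apply to a document of the given class."""
    return CONTROLS[classification]

print(controls_for(Classification.SENSITIVE))
# ['encrypt-in-transit', 'restrict-access']

The point is not the table itself, but that the expensive controls are applied only where they pay for themselves.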
Pattern Matching

How do you discover patterns in sensitive data that enable you not only to find what you already know about, but also to discover sensitive information you didn't know you had?

First off, you have to start with a corpus of known sensitive information. There are many algorithms to choose from. The simplest is keyword searching. Then there is regular expression matching, implemented with NFA or DFA engines. You can also use exact string matching, or hash parts of the content you are looking for and check whether those hashes occur elsewhere. A new and exciting approach comes from genetics: genetic algorithms can show how information mutates (e.g., how information is transformed as it moves from a database into email or documents).
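Here is a minimal sketch in Python of two of these techniques: regular expression matching for well-known patterns, and hashing chunks of a known-sensitive document to detect its reuse elsewhere. The patterns and chunk size are illustrative assumptions:

import hashlib
import re

# Simple illustrative patterns: US Social Security numbers and
# 16-digit card numbers.
PATTERNS = {
    "ssn":  re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def regex_hits(text):
    """Return the matches for each known pattern found in the text."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items() if rx.search(text)}

def chunk_hashes(text, size=64):
    """Hash overlapping fixed-size chunks of a known-sensitive document."""
    step = size // 2
    return {hashlib.sha1(text[i:i + size].encode()).hexdigest()
            for i in range(0, max(len(text) - size, 0) + 1, step)}

def shares_content(candidate, known_hashes, size=64):
    """True if any chunk of the candidate matches a known-sensitive chunk."""
    return bool(chunk_hashes(candidate, size) & known_hashes)

Note that fixed-offset chunking only catches aligned copies; real products typically use rolling hashes so that matches survive insertions and edits.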

When you have a corpus, you can then train your rules against it. It sounds straightforward, but in some instances I have seen over a million false positives on a group of computers. Rules need tweaking, and it can be time-consuming work. Unfortunately, I have not seen much automation in this area, so that is something we are currently working on.
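Any tuning effort needs a feedback loop. A minimal sketch, assuming a labeled corpus of (text, is_sensitive) pairs:

def score_rule(rule, corpus):
    """Score a detection rule (a callable text -> bool) against a
    labeled corpus of (text, is_sensitive) pairs."""
    tp = fp = fn = 0
    for text, is_sensitive in corpus:
        hit = rule(text)
        if hit and is_sensitive:
            tp += 1
        elif hit and not is_sensitive:
            fp += 1
        elif not hit and is_sensitive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

Tweak a rule, re-run the score, and keep the variant that raises precision without giving up too much recall; that loop is what I would like to see automated.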

A good resource for pattern matching can be found here: http://www.cs.ucr.edu/~stelo/pattern.html
IT Governance is becoming more of a buzzword nowadays, for a good reason. More and more of a company's financial transactions are fully automated, and with that come ample opportunities for theft from a company. IT systems also hold most of the intellectual property a company has, and theft of IP has been listed as one of the top issues for US companies in Asia: http://www.mytelus.com/money/news/article.do?pageID=ex_business/home&articleID=2844426.

IT Governance should be part of the company's overall Governance, Risk, and Compliance efforts. GRC should drive the investments, divestments, and strategy for IT to ensure the competitiveness of the company. This includes protecting a company's valuable assets. IP protection will become more and more important as IP moves from paper to digitized form. The question is, how do you identify and protect your IP?

The only way to identify IP is to evaluate the business processes that create, use, and store it. In most instances, IP "floats" around an organization in email and documents, even when there are safeguards in place governing who can access the IP initially.

When these business processes are understood, a process redesign might be necessary, and if so, it should be risk driven. If you have IP and you are concerned about losing it, the first step should be to go over your current policies. Are they adequate? If they are, have you put security controls in place that enable you to measure compliance against the policies? If you are missing controls, or have less than optimal controls, it is well worth spending the time to quantify the risk of non-adherence to the policy, and to tackle the areas with the highest risk first.
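As a minimal illustration (my own, not a formal methodology), risk-ranking the gaps can be as simple as estimating likelihood and impact per policy gap and working the largest expected losses first:

policy_gaps = [
    # (policy area, estimated likelihood per year, impact in dollars)
    # -- all values here are made-up examples
    ("IP in outbound email",     0.30, 5_000_000),
    ("Unencrypted laptops",      0.10, 2_000_000),
    ("Shared repository access", 0.05,   500_000),
]

ranked = sorted(policy_gaps, key=lambda g: g[1] * g[2], reverse=True)
for area, likelihood, impact in ranked:
    print(f"{area}: expected annual loss ${likelihood * impact:,.0f}")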
2007 is going down as the year with record losses of sensitive data; see this article from MSNBC: http://www.msnbc.msn.com/id/22420774/
I figured I needed to explain what Information Loss Prevention (ILP), Data Loss Prevention (DLP), and Content Loss Prevention (CLP) are. The terms are used interchangeably for a technology and methodology for searching endpoints, repositories, the network, IM, and email for sensitive information such as Personally Identifiable Information (PII), Personal Health Information (PHI), credit card (CC) information, Intellectual Property (IP), Business Intelligence (BI), etc.

There are several vendors providing solutions in this space. Some of the best-known names in this nascent but growing market are Vontu, Tablus, Vericept, and Reconnex. There are more, and I might put together a full list later on. There are two other vendors not spoken of as much for enterprise solutions: Orchestria and Workshare. Orchestria is mostly known for solutions for financial institutions.

The solutions these companies provide use search technology to ferret out sensitive information that could be damaging to a company if it were lost. Due to NDAs signed with each company, I am hesitant to discuss the strengths and weaknesses of each one at this time. If I get approval from any of these companies to discuss their strengths and weaknesses publicly, I will do so. In the meantime, the best way to get information about these companies is to look at the Forrester and Gartner reports, which discuss the strengths and weaknesses. Here is a link to Wikipedia:
http://en.wikipedia.org/wiki/Information_Leak_Prevention

Furthermore, if you are investigating which solution to purchase, or if you see a need for these types of solutions in your organization, you should ask each company you are evaluating for referrals, and compare those customers' results and needs with your own.

You should probably not start an evaluation without conferring with your legal counsel, due to the increased risk to the organization if you cannot remediate what you find (this is not legal advice, as I am not an attorney).

Thursday, December 27, 2007

Books that I have read, or am reading, that I think are helpful in understanding how information is created, used, and retrieved:

Glut:

This book explains the origin of text and how, throughout history, information has been created, stored, and retrieved. Although it is not a book about finding sensitive information, I don't believe there is a fundamental difference between finding one type of information and another, whether the purpose is identifying sensitive information or retrieving other types of information. The problems are the same.

We Are Smarter Than Me:

This book was written by many authors and edited by a core group. I believe that by reading it you will gain insights into how documents and information will be created in corporations: not by single individuals, but by many individuals in a collaborative effort.

The third book I am currently reading is The Stuff of Thought:

This book explains the process of thought and how it affects language. I believe that without understanding how language works, you cannot properly understand how to retrieve the information users create.

Going forward, I will add other books that I believe are helping me understand information leakage prevention solutions.
What areas must be addressed for ILP solutions to be successful?

1. Document searches must be easy. This sounds trite, but it is actually a big challenge. It is hard to distinguish one document from another unless you know what you are looking for, and knowing what to look for becomes increasingly difficult as the number of documents outgrows what a person, or a team of people, can be expected to read. The standard approach has been to search for regulatory compliance terms, or terms one associates with sensitive information. What has not been done yet is to understand how these documents are related to other documents. If this relationship were understood, automation could be used to find groups of documents and assign sensitivity to them by assigning "social" values (such as "documents created in a finance group"). The problem is to establish a hierarchy of documents based on sensitivity. This cannot be achieved just by looking at syntax, context, keywords, or regular expressions. That combination will catch a subset of documents that look like what you are afraid of, but it cannot tell you about documents that need protection when you don't yet know what those documents look like. Without understanding the chronology of how these documents are created, and by whom, the problem will not be solved. Most of the documents I create are either created from a template, reuse text from other documents, or incorporate information downloaded from websites or databases. I also rely on prior knowledge obtained through reading. It is easy to see that a strict hierarchy is impossible to create unless the origin of the information is understood. I believe the best approach is to create metadata that follows documents as they are incorporated into other documents and as new information is created. The only ones who can do this are the users themselves. Today's document creation tools allow for some of this, but they do not allow for assigning sensitivity. A sketch of this metadata idea follows below.
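To illustrate the metadata idea, here is a minimal sketch in Python; all the names and sensitivity levels are illustrative assumptions:

from dataclasses import dataclass, field

@dataclass
class DocMeta:
    doc_id: str
    sensitivity: str = "public"                   # public / sensitive / secret
    sources: list = field(default_factory=list)   # ids of source documents

LEVELS = ["public", "sensitive", "secret"]

def derive(new_id, *parents):
    """Create metadata for a document built from existing ones; the
    new document inherits the highest sensitivity among its sources."""
    top = max((p.sensitivity for p in parents), key=LEVELS.index, default="public")
    return DocMeta(new_id, top, [p.doc_id for p in parents])

forecast = DocMeta("q4-forecast-extract", "secret")
template = DocMeta("report-template", "public")
report = derive("q4-report", forecast, template)
print(report.sensitivity)  # 'secret' -- inherited from the database extract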
I have released a new whitepaper on Microsoft's TechNet pages. You can read about how we implemented an information loss prevention program at Microsoft by clicking this link: http://technet.microsoft.com/en-us/library/bb897856.aspx. In addition, you can hear a webcast about our deployment here: http://www.microsoft.com/winme/0512/25568/TechNet_Radio_MP3.xml