Thursday, January 29, 2009

NIST and DLP vendor opportunities

NIST has published a draft guide for protecting PII, and once the draft becomes a full standard it will shape best practices and technology choices for years to come. The guide advises organizations on how to manage PII stored or processed in their systems according to its level of sensitivity.

If the draft becomes a released standard, organizations will use it to demonstrate whether they can comply with best practices. Mapping technology and policies to the standard is therefore important, and it is equally important to understand that no single product can solve all of the issues; a set of complementary products can. DLP products help in many ways, and DLP vendors would do well to start defining best practices that span beyond DLP to include identity management, storage, policy and policy management, encryption, and risk management. NIST's statement that not all PII is to be treated the same is very telling: classifying and tagging the data would help apply the right set of controls to the high-value items without overdoing the controls on lower-value data.

One observed issue with the NIST publication is that it defines PII but does not provide an exhaustive list. The Census Bureau, for example, may specify additional types of PII that are subject to stricter handling.

NIST recommends that each organization create policies and procedures, conduct training, de-identify PII, employ proper access enforcement, ensure transmission confidentiality, and audit events.

So, similar to PCI, DLP may not be the whole answer, but it can provide insight that helps enable compliance in several of these areas. For de-identifying PII, DLP helps by discovering the PII; it is then up to the organization to de-identify it. This is of course not a straightforward process and will need some thought before being implemented. With DLP, the organization learns which business units or groups are having the most issues and can focus its training activities accordingly. The same goes for creating policies and procedures: this falls into the realm of understanding the PII inventory and its priority levels.
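
As a minimal sketch of the kind of discovery DLP provides here, the snippet below finds and masks US Social Security numbers in text. The regex and the masking policy are illustrative assumptions, not anything prescribed by NIST or by a specific product.

    import re

    # Illustrative pattern for US Social Security numbers (assumption: the
    # common NNN-NN-NNNN format; real DLP filters use validation logic and
    # many additional PII patterns).
    SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

    def discover_pii(text: str) -> list[str]:
        """Return the SSN-like strings found in a block of text."""
        return SSN_PATTERN.findall(text)

    def de_identify(text: str) -> str:
        """Mask all but the last four digits, one possible de-identification step."""
        return SSN_PATTERN.sub(lambda m: "XXX-XX-" + m.group(0)[-4:], text)

    if __name__ == "__main__":
        sample = "Employee record: John Doe, SSN 123-45-6789, start date 2009-01-05."
        print(discover_pii(sample))   # ['123-45-6789']
        print(de_identify(sample))    # ... SSN XXX-XX-6789 ...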

The new collaboration between RSA and Microsoft for DLP solutions coupled with DRM is clearly a step in the right direction.
Regular search does not account for the fact that sensitive documents are typically found in clusters. If your DLP search engine has found one sensitive document in a location such as a file share or a laptop, the probability that there are more is very high, yet the rest often go undetected as false negatives. For example, if a sensitive document is found in a file share, there is a high likelihood that other documents of equal sensitivity are not covered. The usage scenario could be an HR professional storing documents in a folder for a specific task. If the filter only finds one, the current assumption with DLP is that no other files in that folder are sensitive. Based on my observations of real incidents, this assumption is false.

How can this be remedied?
A manual review can be done for the rest of the folder and the folders in the tree
The folder can be marked sensitive, so that all documents in it are considered sensitive
The folder can be automatically reviewed with a broader capture filter (filters are usually tuned to reduce false positives, which leads to a higher number of false negatives)
Fingerprinting (full or partial) can be used to see if these documents reside elsewhere
Pattern creation can be used to improve the search patterns
Etc.

The true solution is a combined approach: manual inspection, machine learning, and the working assumption that the likelihood of a single sensitive document residing alone in a repository is low while the likelihood of more than one is high. The risk can then be mitigated by classifying, tagging, and protecting the cluster instead of a single document.
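
As a rough sketch of the cluster assumption, the outline below rescans any folder in which a narrow, precision-tuned filter has scored a hit, this time with a broader filter, and treats the whole folder as the unit to protect. Both patterns and the escalation policy are assumptions for illustration, not a particular product's behavior.

    import os
    import re

    # Illustrative filters: the narrow pattern is tuned for precision, the broad
    # pattern trades false positives for recall.
    NARROW = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")                    # e.g. formatted SSNs
    BROAD = re.compile(r"\b(salary|ssn|social security|dob)\b", re.IGNORECASE)

    def scan_file(path: str, pattern: re.Pattern) -> bool:
        try:
            with open(path, "r", errors="ignore") as handle:
                return bool(pattern.search(handle.read()))
        except OSError:
            return False

    def scan_share(root: str) -> dict[str, list[str]]:
        """Return folders flagged as sensitive clusters and the files within them."""
        clusters = {}
        for folder, _dirs, files in os.walk(root):
            paths = [os.path.join(folder, name) for name in files]
            # Pass 1: narrow, high-precision filter.
            if any(scan_file(path, NARROW) for path in paths):
                # Pass 2: one hit makes the whole folder a suspected cluster, so
                # rescan it with the broader filter and keep every match.
                clusters[folder] = [p for p in paths
                                    if scan_file(p, BROAD) or scan_file(p, NARROW)]
        return clusters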

Monday, January 26, 2009

Content protection should be tied into access certification. According to the Burton Group, companies are now improving their compliance by implementing provisioning technologies. Considering how hard it is to control who should have access to what, I believe that coupling provisioning tools with DLP is the next logical step.

The DLP system should notify the content custodian of the type of content and present the custodian with choices of protection measures; marrying this with provisioning systems would lessen the burden on the custodian.
Not long after the public notification of the breach at Heartland Payment Systems, attorney firms such as Girard Gibbs LLP started investigating the breach and soliciting individuals who may be affected by it. This may have been the largest breach ever; according to this article in the Washington Post, the numbers may reach tens of millions of credit and debit card transactions.

Sunday, January 25, 2009

President Obama is embarking on wide-reaching changes to the regulatory environment, according to the New York Times. This should translate into busy times for any IT department managing regulatory and compliance issues for its company.

Friday, January 23, 2009

Researchers have found a relationship between word choices in communications and how well a relationship is functioning. In other words, content in communications can be used to establish the overall health of a relationship: http://www.msnbc.msn.com/id/28814669/

If the choice of words can be used to determine the strength of a relationship based on the frequency of certain words, it is not far-fetched to conclude that foul play could be detected by the same type of study. The choice of words would of course be different, but if a large collection of communications between criminals could be mined, it should be possible to use pattern recognition to ferret out such communications in network traffic.
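
A toy sketch of what such mining might look like: score each message by the weighted frequency of watch-list words. The vocabulary and weights below are purely hypothetical; a real study would derive them from a labeled corpus rather than hand-pick them.

    from collections import Counter

    # Hypothetical watch list with illustrative weights.
    WATCH_WORDS = {"transfer": 2.0, "offshore": 3.0, "delete": 1.5, "cash": 1.0}

    def score_message(text: str) -> float:
        counts = Counter(word.strip(".,").lower() for word in text.split())
        return sum(weight * counts[word] for word, weight in WATCH_WORDS.items())

    messages = [
        "Please transfer the cash to the offshore account and delete this email",
        "Lunch at noon on Friday?",
    ]
    for msg in messages:
        print(round(score_message(msg), 1), msg)   # 7.5 for the first, 0.0 for the second
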
According to Network World, Forrester Research predicts big opportunities for tech firms under the Obama cybersecurity plan.

Wednesday, January 21, 2009

Interesting link to coverage of data breaches in 2008: http://www.insideidtheft.info/breaches.aspx?gclid=CNrSt4epnpgCFSMSagodjCXQmg
According to an article in the Washington Post, Heartland Payment Systems may have had the largest breach ever. The numbers may reach tens of millions of credit and debit card transactions.

Friday, January 16, 2009

Two areas of concern for '09 will be sensitive information moving into virtualized environments and into the cloud.

According to this article in the WSJ, a report from the Center for Strategic and International Studies points to a trend toward greater industrial espionage. Quote from the WSJ article: "Supposedly confidential corporate information, the report warns, is almost certainly being hacked. As more individuals and companies rely on "cloud computing" -- storing information and services such as email remotely on supposedly secure servers -- foreign intelligence agencies and commercial snoops may have access." This is a troubling statement.

According to CIO magazine, CIOs are looking toward virtualization and the cloud in '09 to reduce operating and capital expenses. If these are the areas of investment, this is also where criminals will spend their resources to wrestle valuable information from its rightful owners.

Internet News is running an article on this subject today.
Using models from nature to identify sensitive information

One interesting hypothesis would be to evaluate sensitive information with a predator-prey model. Information within an organization is bound by its physical and social networks; in other words, there is a topology that can be mapped, and its contours can be described with differential equations. The topology of interest is then mapped with a modified trophic web for the dispersal of information of value. The challenge, of course, will be to create a nonlinear system with the right set of variables. The questions are what the driving factors for those variables are, and what would count as an anomaly versus a genuine change point.
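
For concreteness, the classical Lotka-Volterra system is one possible starting point for such a model, reading the "prey" as exposed sensitive information and the "predator" as the population actively seeking it. The sketch below integrates that system with illustrative, unfitted coefficients; it is only an analogy, not a claim that these are the right variables.

    from scipy.integrate import solve_ivp

    # Classical Lotka-Volterra form, used here only as an analogy: x is the stock
    # of exposed sensitive information, y the population actively hunting for it.
    # All coefficients below are illustrative assumptions, not fitted values.
    alpha, beta, delta, gamma = 0.6, 0.025, 0.01, 0.5

    def dynamics(_t, state):
        x, y = state
        return [alpha * x - beta * x * y,      # information exposed vs. captured
                delta * x * y - gamma * y]     # seekers attracted vs. losing interest

    solution = solve_ivp(dynamics, (0, 100), [40.0, 9.0], max_step=0.1)
    print(solution.y[:, -1])   # state of the system at the end of the horizon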

Time is of course the great equalizer. A patent expires and so does a copyright, although the value of a copyrighted item decays over a much longer period than that of a patent. The same goes for financial information: the value of a 10-Q or 10-K is drastically reduced upon publication, which happens quarterly or annually, respectively.

Thursday, January 15, 2009

Social Networking, DLP, and Identity Management opportunities

A new area that may lend itself well to understanding the flow of information is social network theory: power-law distributions, Mandelbrot statistics, and so on. The problem then becomes one of information overload. The amount of data in such an analysis grows large quickly, and the difficulty is inspecting the findings. To make such a system scalable, it should create local accountability.

By local accountability I mean that either the individual or the manager will have to sign off on a compliance statement on a regular basis, as they are the closest to knowing whether the access is appropriate or excessive.

Another interesting concept would be to look for change points and flag them for further inspection. If change suddenly occurs, it should be possible to capture it. Inspection of file share access, SharePoint access, line-of-business application access, and so on should be able to reveal a change in behavior such as the one in the Boeing data theft example.
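
A minimal sketch of one way to flag such change points is a one-sided CUSUM over daily access counts, as below. The baseline, slack, and threshold values are assumptions for illustration; in practice they would be estimated from historical access logs.

    # One-sided CUSUM over daily access counts for a single user.

    def cusum_alerts(counts, baseline=20.0, slack=5.0, threshold=40.0):
        """Yield the day indices where cumulative excess access crosses the threshold."""
        score = 0.0
        for day, count in enumerate(counts):
            score = max(0.0, score + (count - baseline - slack))
            if score > threshold:
                yield day
                score = 0.0   # reset after raising an alert

    daily_access = [18, 22, 19, 21, 95, 120, 110, 20, 17]   # spike around a departure date
    print(list(cusum_alerts(daily_access)))                  # [4, 5, 6] -- the spike days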

So, what is needed to evaluate if access is appropriate or if it is misused?

To begin with, each individual with access to the network must be managed and their access monitored. However, since most information is not confidential, access to it can be ignored once the sensitive information has been identified and cataloged.

To catalog the information, you will have to search across your repositories for sensitive information. I believe the information must also be tagged as it is found. Tagging via the alternate data stream is interesting, but in most cases this tag is lost when the information leaves the network. A second approach is to tag the metadata of the file itself, which does not get lost when the information leaves the network.
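
On NTFS, an alternate data stream can be written with nothing more than a qualified file name, as in the sketch below (Windows and NTFS only; the stream name "classification" is my own choice for illustration).

    def write_ads_tag(path: str, tag: str) -> None:
        """Attach a classification tag as an NTFS alternate data stream."""
        with open(f"{path}:classification", "w", encoding="utf-8") as stream:
            stream.write(tag)

    def read_ads_tag(path: str) -> str:
        with open(f"{path}:classification", "r", encoding="utf-8") as stream:
            return stream.read()

    # write_ads_tag(r"\\share\hr\salaries.xlsx", "Confidential-PII")
    # The stream survives copies between NTFS volumes, but is stripped when the
    # file is emailed, zipped, or copied to FAT/exFAT media -- the loss described above.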

An interesting approach would be to create a hash of the file once it has been classified and tagged. However, if the tag also holds the hash and is placed in the metadata of the file, the hash of the file itself is altered. This is not a problem when the tag is placed in the alternate data stream. If you do create a hash and place it in the metadata, you could then simply sign the file instead.

If these hashes are stored in a central repository, a hash can then be used to evaluate whether copies of the file exist elsewhere. If copies exist, they should be tagged according to the first file found. This process could also be used to remove the copies.
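
A simple sketch of that duplicate check: hash each file and look the digest up against the hashes recorded for already-classified originals. The repository here is just an in-memory mapping for illustration.

    import hashlib
    import os

    def file_hash(path: str) -> str:
        """SHA-256 of the file content (a signature would be needed if the tag
        itself were embedded in the file's metadata)."""
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def find_copies(root: str, known_hashes: dict[str, str]) -> dict[str, str]:
        """Map each file under root to the classified original it duplicates."""
        copies = {}
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                original = known_hashes.get(file_hash(path))
                if original:
                    copies[path] = original   # tag (or remove) according to the original
        return copies
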
New emphasis on the SEC's role in policing the financial markets is on its way.

Obama's SEC choice vows aggressive action, according to this article on MSNBC: http://www.msnbc.msn.com/id/28674370/. What will this do to DLP? I believe it will be a boon to the industry, as it will require much better detection technologies for fraud and misuse of sensitive financial information.
Information protection, DLP, identity management, outsourcing, and vendor management: what is in store for the enterprise in 2009 and beyond?

Information gleaned from several surveys gives a dismal outlook for data breaches.

In a survey by the Enterprise Strategy Group, 50% of respondents said internal breaches were the direct cause of loss of confidential data, while 19% were caused by external attacks and 11% by a combination of external and internal attacks. 14% of respondents said data loss came as a result of losing a device containing confidential data.

In a 2007 study by the Ponemon Institute, "the notification cost for a first party data breach is $197 per record lost and for a third party data breach is $231 per record lost. (A third party organization includes professional services, outsourcers, vendors, business partners and others who possessed the data and were responsible for its protection.)"

A November survey by SailPoint Technologies of Fortune 1,000 companies shows that most of them are grossly unprepared to manage information technology (IT) security risk. They polled IT managers and directors and found that, out of 116 respondents, 44 percent said they could not “immediately remove all access privileges for terminated employees” if the company had a massive layoff. More than 65 percent reported that they would not be able to “present a complete record of user access privileges for each employee” if the company’s chief information officer wanted it that same day. And 46 percent said their company “failed an IT or security audit because of a lack of control around user access” in the past five years.

The good news is that DLP vendors have started to integrate with identity management systems, but there is a long way to go before the problem is solved. The not-so-good news is that most enterprises do not have a good understanding of who has access to what information. This means that a loss could go undetected for a long time and cause a higher cost to the enterprise. With the current financial situation and large layoffs, this becomes even more critical to solve.

The approach I would recommend for solving this issue is to start cataloging and classifying information and information systems, and to tie that to identity management information. Then, as the business processes become understood, the principle of least privilege should be used to manage access to these systems.


Even though this is a standalone case, former Boeing employee charged in data theft case, it shows that actively monitoring who has access to sensitive information, and evaluating whether that access is appropriate, is paramount. This is an established best practice for fraud prevention and a requirement for SOX compliance for financial systems. The issue, of course, is that enterprises today do not safeguard critical business information in the same manner as they safeguard SOX information.

This of course leads one to look at Governance, Risk, and Compliance, to see how risk management can be streamlined for all sensitive information, not just information required by law or regulation to be safeguarded. This will drive down the cost of compliance, improve governance, and reduce the overall risk of loss of information.
NIST has published a draft guide for protecting PII

The NIST draft builds on the definition from a 2007 OMB (Office of Management and Budget) memo: “information which can be used to distinguish or trace an individual’s identity”. NIST provides a practical guide for organizations on how to handle PII, distinguishing the varying levels of sensitivity of PII as well as how it should be protected: http://csrc.nist.gov/publications/drafts/800-122/Draft-SP800-122.pdf
While looking at new players in the DLP space, I ran into Illumant. They have two interesting documents for download if you are in the market for DLP: http://illumant.com/Global/Solutions/DLP.php?gclid=CO6atfmJipgCFRsRagodRAO_DQ. There is a white paper describing what should be considered when evaluating DLP vendors, as well as a matrix of vendors and their capabilities. The white paper could have been more in depth, but it is a good overview before starting to look in earnest. Both Forrester and Gartner provide much more in-depth coverage, and it is well worth purchasing both companies' reports prior to investing in a DLP product.

Monday, January 12, 2009

Using the information already provided by users to make assumptions about who should have access after a document is protected with DRM.

If a file share owner has granted read, read-write, and admin access to a share, a group could be created dynamically that includes these members, and rights could be assigned according to the original ACLs on the file share.

This would allow a group owner (the share owner) to add and remove users from a document, or from sets of documents, after they leave the file share. This would solve the problem of managing DRM rights: currently it is hard to manage granular sets of rights, as they are not readily automatable. With this approach, however, groups can be built on the fly based on the sensitivity of the information and who already has access. For example, if certain PCI information is currently available to a PCI group, DRM rights would be granted to that PCI group on the fly for any document extracted from the central repository.
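
A sketch of how such a mapping might look: group the share's principals by the DRM right implied by their ACL entry. The ACL table and the right names below are placeholders; a real implementation would read the NTFS ACLs (for example via pywin32) and call the rights-management product's own API.

    # Placeholder mapping from share access levels to DRM rights (assumption).
    ACL_TO_DRM_RIGHT = {
        "read":       "View",
        "read_write": "View+Edit",
        "admin":      "FullControl",
    }

    def build_drm_policy(share_acl: dict[str, str]) -> dict[str, list[str]]:
        """Group share principals by the DRM right implied by their ACL entry."""
        policy: dict[str, list[str]] = {}
        for principal, access in share_acl.items():
            right = ACL_TO_DRM_RIGHT.get(access)
            if right:
                policy.setdefault(right, []).append(principal)
        return policy

    # The share_acl mapping stands in for ACL entries read from the file server.
    share_acl = {"hr-managers": "read_write", "auditors": "read", "share-owner": "admin"}
    print(build_drm_policy(share_acl))
    # {'View+Edit': ['hr-managers'], 'View': ['auditors'], 'FullControl': ['share-owner']}
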
Banks falling behind in protecting customer financial data according to a study done by PwC: http://security.cbronline.com/news/banks_falling_behind_on_data_security_090109
California Senator Dianne Feinstein (D-Calif.) is again proposing data breach legislation in the US Congress. The bills are S. 139, the Notification of Risk to Personal Data Act, and S. 141, the Social Security Number Misuse Prevention Act. This is her second attempt at creating a federal law setting requirements for the handling of personally identifiable information. It would require federal agencies as well as businesses to notify both the media and the private person whose information was lost.

http://www.internetnews.com/government/article.php/3795191/New+Data+Breach+Privacy+Bills+in+Congress.htm

Thursday, January 08, 2009

Plenty of newsworthy items this week in the DLP space. According to Network World, CA will buy DLP vendor Orchestria: CA to Buy Data-Leak Prevention Vendor, and Byte and Switch publishes an article on DLP vendor and DRM partnerships: Partnerships Spark New Life into Enterprise DRM. Of course, there is data breach news as well. US businesses reported close to a 50% increase in breaches in 2008: Data Breaches Rise Almost 50 Percent in 2008, and CheckFree has to warn 5 million customers: CheckFree Warns 5 Million Customers after Hack

Thursday, January 01, 2009

Considerations when building queries for DLP products

Term weight is normally reduced the longer the document is. This may run counter to the needs of scanning a document for compliance issues such as PCI, where a document with many occurrences of a term may represent a higher risk than a document with fewer. So when searching an inverse index, it is important not to dampen the score, either by taking one plus the log of the term frequency or by using cosine normalization in a vector-based search.
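
To make the point concrete, the snippet below contrasts a raw term count with the common 1 + log(tf) damping; the documents and the term are illustrative.

    import math

    def raw_count(term: str, tokens: list[str]) -> int:
        return sum(1 for token in tokens if token == term)

    def sublinear_weight(term: str, tokens: list[str]) -> float:
        """The 1 + log(tf) damping typical of search scoring (natural log here)."""
        tf = raw_count(term, tokens)
        return 1.0 + math.log(tf) if tf > 0 else 0.0

    short_doc = "cardholder data".split()
    long_doc = ("cardholder " * 40 + "data " * 40).split()

    # For compliance scanning, the long document is the riskier one, but sublinear
    # scaling compresses 40 occurrences down to roughly 4.7.
    print(raw_count("cardholder", short_doc), sublinear_weight("cardholder", short_doc))
    print(raw_count("cardholder", long_doc), sublinear_weight("cardholder", long_doc))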

By doing this, however, the terms in the query become more important. A term that occurs frequently both in a set of sensitive documents and in the corresponding set of non-sensitive documents will lead to a high rate of false positives. Because of this, the effectiveness of each term must be calculated and tracked over time. A term with low effectiveness should either be eliminated from the query or be given a lower weight.
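
One simple way to express term effectiveness, sketched below, is the smoothed ratio of how often a term appears in documents confirmed sensitive versus documents confirmed benign. The corpora and the add-one smoothing are assumptions for illustration.

    def term_effectiveness(term: str, sensitive_docs: list[str], benign_docs: list[str]) -> float:
        def doc_frequency(docs: list[str]) -> float:
            hits = sum(1 for doc in docs if term in doc.lower())
            return (hits + 1) / (len(docs) + 1)     # add-one smoothing
        return doc_frequency(sensitive_docs) / doc_frequency(benign_docs)

    sensitive = ["employee ssn and salary table", "cardholder data export"]
    benign = ["cafeteria menu", "salary negotiation tips article", "holiday schedule"]

    for term in ("ssn", "salary"):
        print(term, round(term_effectiveness(term, sensitive, benign), 2))
    # "ssn" scores high; "salary" appears in both sets, so its ratio -- and its
    # query weight -- should be lower.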

Several solutions may be available here; one is to combine highly effective terms with less effective terms in a larger pattern. The open question is whether the distribution of terms in sensitive documents is Gaussian, with a bell curve, or follows a power law. I don't know the answer yet, but I have noticed in practice that the distribution of the documents themselves follows a power law. This can be used in a query strategy: an initial query with a high false negative rate is used to ferret out areas with a high probability of containing sensitive documents, and a broader query is then applied within that space.

When considering a space, it can be a geographical space such as a site, a logical space such as a file server supporting the HR department, or a space in time. Most likely it is a combination of the above, and it may have further dimensions such as user identity, frequency, and so on. So far this is a trial-and-error approach; to improve on it, large data sets would need to be collected and analyzed.