Archive for February, 2009

Digital Reef provides technology that I call a similarity engine. The similarity concept within search is a powerful one: the ability to find both exact duplicates and near-duplicate content. Being able to set the level of similarity, whether 10% or 90%, gives users a great deal of utility. Finding similarity between different types of content, for example a Word document and a PDF, can also be critically important. Perhaps a table embedded in a presentation also appears in an Excel spreadsheet. Or someone may have cut text from a document and pasted it into an email.

Finding exact duplicates is useful, but adding near duplicates has potentially even greater implications. Near duplicates allow you to find the same content in different types of files. You may be able to find different revisions of the same document. Information from a document may have been quoted, cut and pasted, and reused in any number of other documents.

The practical use of similarity is important. Above and beyond the casual quest for content or information, for which people often use general-purpose and consumer search tools, there is real “utility” that can have implications for your business. One such utility is day-to-day workflow: getting better use and productivity out of the content already created within your company. Similarity can also help with research across a wide range of fields: medical, legal, science, consumer, business, technology, etc. There is a concept that I call content mining (more on this in my next blog post) that could be enabled by similarity technology. And probably the greatest driver for using similarity technology in today’s environment is e-Discovery. As my fellow blogger Daniel Garrie astutely points out, “The onslaught that is sure to come–companies large and small are going to be under legal siege in 2009. Whether it is white collar crime, bankruptcy or wrongful termination, we’ll see surges maybe even a Tsunami of new lawsuits.”

Similarity technology is different from a search engine that throws anything and everything at you based on a keyword. Rather, it is sophisticated technology with intelligent algorithms that scan content and analyze similarity based on user-defined ratios. It also reports back the level of similarity: 100%, 95%, 70%, 60%, and so on.
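To make the idea of a user-defined similarity level concrete, here is a minimal sketch in Python. It is not Digital Reef's algorithm, which I have not seen; it assumes a common textbook approach (overlapping word "shingles" compared with Jaccard similarity), and the function names, the three-word shingle size and the sample documents are all my own illustrative choices.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)

def find_similar(query, documents, threshold=0.6, k=3):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, similarity(query, text, k)) for name, text in documents.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample content: an exact copy, a pasted quote, and noise.
query = "the quarterly revenue table shows growth in every region"
documents = {
    "copy.doc": "The quarterly revenue table shows growth in every region",
    "email.txt": "FYI the quarterly revenue table shows growth in every region as discussed",
    "menu.txt": "lunch menu for friday",
}

for name, score in find_similar(query, documents, threshold=0.6):
    print(f"{name}: {score:.0%} similar")
```

Raising the threshold toward 1.0 narrows the results to exact duplicates; lowering it pulls in looser matches such as the email that quotes the original text. Comparing a Word document with a PDF would additionally require extracting the text from each format first, which this sketch leaves out.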

I believe that Digital Reef’s similarity technology can elevate and accelerate access to critical data within the enterprise. It is extremely valuable and unique technology; in fact, I don’t know of anything similar.


Many people pontificate on keywords and search technologies and the need to grasp the fundamentals of how they operate. Others discuss cost savings or the destructive effect eDiscovery is having on corporate America. All of these present valid and key concepts to be discussed, but underlying all of them exists a larger issue: Communication. Most problems that arise from the electronic discovery abyss derive from poor communication.

To ensure that preservation is executed within an organization, it is imperative that key legal and litigation support stakeholders communicate with internal technology teams. This is especially true when it comes to the initial deployment of preservation-driven technology. Once installed, technology teams need to both train and communicate with the litigation support individuals so they know how to execute a preservation hold.

In the preservation arena, an effective solution will empower lawyers to be self-sufficient, without having to rely on technologists each and every time they need to execute a court-ordered preservation hold. Because of the recession, most companies simply do not have sufficient assets to build their own in-house solution. However, the companies that do successfully empower their lawyers can realize substantial cost savings. Giving lawyers the right technology tools to execute a legal hold saves companies people-hours and service fees, and helps eliminate error.

The elimination of human communication errors may represent the largest savings. A slew of case law suggests that the execution of preservation is critical. For example:

Anadarko Petroleum Corp. v. Davis, 2006 U.S. Dist. LEXIS 93594 (S.D. Tex. Dec. 28, 2006); http://www.ca10.uscourts.gov/conference/downloads/ediscovery7.pdf

Best Buy Stores, L.P. v. Developers Diversified Realty Corp., 247 F.R.D. 567 (D.Minn. 2007); http://www.ca10.uscourts.gov/conference/downloads/ediscovery7.pdf

In re Intel Corp. Microprocessor Antitrust Litig., 2008 WL 2310288 (D. Del. June 4, 2008); http://www.ca10.uscourts.gov/conference/downloads/ediscovery7.pdf

Johnson v. Big Lots Stores, Inc., 2008 WL 2191357 (E.D. La. May 7, 2008); http://www.ca10.uscourts.gov/conference/downloads/ediscovery7.pdf

While these cases vary in outcome regarding whether the litigants executed preservation orders to the extent set forth by the courts, their collective outcomes demonstrate the importance of being able to execute a defensible preservation order.

While the manual and laborious process of preservation sounds like a great way to spend weekends, it still leaves a company open to a sizable risk of human error. It might make sense for companies to consider empowering the lawyers to execute preservation holds. In other words, let the lawyers be lawyers and let the technologists be technologists.

In my experience, most preservation issues can be avoided if enterprise systems are configured so that lawyers can seamlessly implement a preservation hold and not have to use the technology 911 pager to effectuate a preservation order.


Patternicity, as defined by Michael Shermer, a writer for Scientific American, is the tendency to find meaningful patterns in meaningless noise. When I read this article on Patternicity, I immediately related it to the challenges we face with information access.

Patternicity deals with false positives, and search tools have a comparable problem: too many responses that may or may not be what we are looking for. Human Patternicity is meant to err on the side of caution because, as Shermer points out, “the cost of believing that the rustle in the grass is a dangerous predator when it is just the wind is relatively low compared with the opposite. Thus, there would have been a beneficial selection for believing that most patterns are real.”

Digital Patternicity is also meant to err on the side of caution because the cost of believing that the keyword matches your intentions is relatively low compared with returning a false negative.  Therefore returning a false positive is better than returning a false negative. 
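The trade-off between false positives and false negatives can be put into numbers. Here is a small sketch that counts both for a single search and derives precision and recall, the standard information-retrieval measures. The function name and the document IDs are hypothetical, chosen only for illustration.

```python
def search_quality(returned, relevant):
    """Count false positives/negatives and compute precision and recall."""
    returned, relevant = set(returned), set(relevant)
    true_pos = returned & relevant    # returned and actually wanted
    false_pos = returned - relevant   # returned but not wanted (noise)
    false_neg = relevant - returned   # wanted but never returned (misses)
    precision = len(true_pos) / len(returned) if returned else 0.0
    recall = len(true_pos) / len(relevant) if relevant else 0.0
    return {
        "false_positives": sorted(false_pos),
        "false_negatives": sorted(false_neg),
        "precision": precision,
        "recall": recall,
    }

# Hypothetical keyword search: four hits, only two of which were wanted,
# and one wanted document was missed entirely.
report = search_quality(returned=["d1", "d2", "d3", "d4"],
                        relevant=["d1", "d2", "d5"])
print(report)
```

A tool that errs on the side of caution pushes recall up by accepting lower precision: fewer missed documents, at the price of more noise to wade through.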

The problem in both Human and Digital Patternicity is that the algorithms are limited and have stopped evolving because they don’t need to improve.   Human beings are very successful and don’t require more sophisticated methods for returning fewer false positives.  Likewise, search companies like Google are very successful and have built a huge business in spite of the number of false positives they return. 

However, increasingly within the world of business – where information equates to revenue, competitive advantage and market growth – there is a big price to pay with false positives and a shift in the evolution of Digital Patternicity must occur.  There will always be a place for acceptable false positives in the mass market – but when you get to specialization, when the stakes become too high, when survival is at risk – then evolution aggressively adapts.


According to industry analysts, a company with more than 10 legal matters a year should evaluate the acquisition of an electronic discovery solution. Besides the potential financial savings, a company can achieve the peace of mind of controlling its own data.

A dollar saved is a dollar earned. The costs of third-party technical and legal experts add up, and the potential impact on the business is sizable, with the sky being the limit. See Sean M. McNee, Productivity as a metric for visual analytics: reflections on e-discovery. It is important to realize that the costs of getting, saving, searching and producing documents vary with the number of documents at issue. See J DeBono. Preventing And Reducing Costs And Burdens Associated With E-Discovery: The 2006 Amendments To The Federal Rules Of Civil Procedure, 59 Mercer Law Review 963 (Spring 2008). This means that if a company implements an in-house solution that synchronizes its document retention schedules with its systems, and so gains control of the information that is stored and archived, its costs can fall in step with the number of documents retained.

Buying and executing an in-house solution is not always a simple feat. It requires collaboration and energy from the legal, technology, and business stakeholders to be successful. However, buying it certainly gives a company substantially more control over the costs associated with responding to a regulatory investigation, judicial case, or any document intensive production process.

While it is important not to discount the utility of outsourced solutions when the fact pattern of a case requires them, such solutions can be costly, create a dependency on the vendor, and force an organization to relinquish control over its information and its legal and business autonomy. Of course, bringing eDiscovery in-house requires an investment, but looking past a single quarter of earnings, an in-house solution can provide organizations with substantial tangible benefits year after year.

Renting a house is simply not as cost effective as buying it (discussing the tax benefits of buying a house v. renting is beyond the scope of this discussion). The technology benefits realized by buying instead of renting eDiscovery solutions vary by company. The benefits might include a reduction in storage costs and/or better implementation of strategic storage initiatives, an increase in data security, better responsiveness to requests for electronic information, and possibly even happier workers. The savings, while difficult to quantify in a general way, can range from the thousands to the millions annually.

I welcome any comments, additional discussion or points on this benefit v. burden of buying a solution and bringing the functionality in-house.

One other key point: how often do you get a great deal when the seller knows the buyer has a pressing timeline? When you have the functionality in-house, the costs around the eDiscovery process are certainly more predictable. For example, a company that does not have an in-house solution might find itself paying a substantial premium because the supply of competent document review companies cannot meet the demand, skewing the price curve at that point in time. A company that has an in-house solution can predictably control its costs and scale up and down as appropriate without paying substantial premiums. A services vendor who has already made commitments to other customers is in no position to cut you a better price when you’re under tight time constraints.


Alright, we all know that we have a ton of data and it’s growing and growing. And maybe you are sick of hearing about it. But you should really listen. I liken the growth of data in business to the human body taking on too much weight. We may be able to function for a long time, but eventually there will be serious ramifications if we don’t do the right things to become healthy.

There is an interesting IDC report that was published in 2007. It is a bit old, but it has some compelling information and insight. Let’s break down some of it:

  •  In 2006, the amount of digital information created, captured, and replicated was 161 exabytes or 161 billion gigabytes. This is about 3 million times the information in all the books ever written. Between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold from 161 exabytes to 988 exabytes.

My observation:  These numbers illustrate the sheer volume of digital data being created and, further, tell you that we ain’t seen nothing yet.
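For anyone who wants to check the sixfold figure, here is a quick back-of-the-envelope calculation. The two exabyte numbers come from the IDC quote above; the yearly rate is my own derivation and assumes smooth compound growth between 2006 and 2010, which IDC does not state.

```python
# Figures quoted from the IDC report above.
start_year, end_year = 2006, 2010
start_eb, end_eb = 161, 988  # exabytes added to the digital universe per year

# Overall multiple between the two years.
growth_factor = end_eb / start_eb

# Assumption (mine, not IDC's): growth compounds evenly each year.
years = end_year - start_year
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor {start_year}-{end_year}: {growth_factor:.2f}x")
print(f"Implied compound annual growth: {annual_rate:.1%}")
```

Under that assumption, the amount of information added each year grows by more than half every single year.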

  •  IDC predicts that by 2010, while nearly 70% of the digital universe will be created by individuals, organizations (businesses of all sizes, agencies, governments, associations, etc.) will be responsible for the security, privacy, reliability, and compliance of at least 85% of that same digital universe.

My observation:  The importance of this is that organizations will have to manage data created by their customers and employees – which will have a real business impact.  And IDC left a few things out – accessing the data and protecting it. 

  • The cost of not responding to the avalanche of information can add up, yet not be immediately visible to CEOs and CFOs.

My observation:  This goes back to my unhealthy body analogy – you may not know what vital organ or system is going to collapse – it may be more than one – and you won’t know until something bad happens. 

  • In surveys of U.S. companies, we have found that information workers spend 14.5 hours per week reading and answering email, 13.3 hours creating documents, 9.6 hours searching for information, and 9.5 hours analyzing information.
  • We estimate that an organization employing 1,000 knowledge workers loses $5.7 million annually just in time wasted having to reformat information as they move among applications.  Not finding information costs that same organization an additional $5.3 million a year.

IDC is saying that poor data management can cost you $11 million annually just based on your users wasting time.  That doesn’t take into account other costs – such as outside audits, e-Discovery, litigation, etc.  You’ve just been told you have a severe case of diabetes and need to do something about it.
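The arithmetic behind that $11 million is simple enough to check yourself. The dollar and hour figures below are the ones quoted from IDC above; the per-worker breakdown and the share of time spent searching are my own derived numbers, not IDC's.

```python
# Figures quoted from the IDC report above.
knowledge_workers = 1_000
reformatting_cost = 5_700_000   # dollars per year lost reformatting information
not_finding_cost = 5_300_000    # dollars per year lost not finding information

total_cost = reformatting_cost + not_finding_cost
cost_per_worker = total_cost / knowledge_workers

# Weekly hours per information worker, by activity (also from the IDC quote).
weekly_hours = {
    "email": 14.5,
    "creating documents": 13.3,
    "searching for information": 9.6,
    "analyzing information": 9.5,
}
total_hours = sum(weekly_hours.values())
search_share = weekly_hours["searching for information"] / total_hours

print(f"Total annual cost: ${total_cost:,}")
print(f"Cost per knowledge worker: ${cost_per_worker:,.0f}")
print(f"Hours per week across the four activities: {total_hours:.1f}")
print(f"Share of those hours spent searching: {search_share:.0%}")
```

Plug in your own headcount to scale the estimate, keeping in mind that it assumes the cost per worker stays the same at any company size.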

We need greater levels of integration between applications, storage systems and data management tools to turn data into information and then to get us the right information when we need it.  Okay?  Go make it happen 😉

Certainly this is easier said than done.  But the ecosystem – customers and the various vendors – must all move towards this objective.  We already have better tools to accomplish these tasks but we have a long way to go before reaching information utopia.  The first step is to recognize that there is an issue – a problem – and make it a priority to research and begin to address the unhealthiness and the short and long term ramifications.
