I suggest that if you are interested in intelligent Enterprise search, you listen to Steve Akers being interviewed by BeyeNetworks.  It is a long session but well worth the time.  

Steve discusses a number of things including the challenges faced by Enterprises, how Digital Reef solves these problems and some customer use cases.  There were also two slides that I thought were a great overview of the Digital Reef solution:


  • Automatically identify and index all unstructured data
  • Provide tools to find and understand the data: 
      • Boolean searches (freeform, fuzzy, metadata, phrase, proximity)
      • Similarity searches using example files
      • Email thread reconstruction
      • Exact and near duplicate identification
      • Pattern expression recognition
  • Organize the data using automatic classification


  • Transform files into common file types
  • Collect and move data
  • Manage data retention policies


Designed for Scale and Security

  • Grid-based, distributed architecture provides performance and resiliency
  • Multi-tenant, role-based security model
  • Easily deployed and maintained
  • Indexes and prepares the full content and metadata of up to 10TBs of data in 24 hours with a standard configuration
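Digital Reef has not published how its near-duplicate identification works, but a common baseline for this class of problem is word-shingling combined with Jaccard similarity.  Here is a minimal, illustrative sketch (the sample documents are invented for the example):

```python
def shingles(text, k=5):
    """Split text into overlapping word k-grams ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union| of the two shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
doc2 = "the quick brown fox jumps over the lazy dog near the river shore"
doc3 = "completely unrelated text about quarterly financial reporting"

print(jaccard(doc1, doc2))  # high score -- near duplicates
print(jaccard(doc1, doc3))  # zero -- unrelated documents
```

At the scale described above, a production engine would hash the shingles (e.g. with MinHash) rather than compare full sets, but the underlying similarity measure is the same idea.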


Whither Thou ILM?

Some of you may remember the big buzz around Information Lifecycle Management, or ILM.  EMC pushed the concept of ILM a few years back and many of their competitors followed them down this winding road.  You know that marketing campaigns are working when customers talk about ILM strategies using some of the same language as their vendors.  I witnessed this fairly extensively with ILM, but the reality never matched the rhetoric.  

By some measures ILM has been successful.  A number of customers went to a multi-tier storage environment.  Some never moved data but became smarter about where they placed it to begin with.  Others would actually move data, either at the volume level or, if they were using file systems, at the file or file-system level.  During this time a number of technologies and vendors came and went, and when the dust settled there were modest levels of ILM, but nowhere near the promise of the hype.  

The term ILM is rarely used these days and it is not going to open any doors for you.  However, just because we never reached the nirvana of ILM doesn’t mean that there wasn’t real value in the concept.  

In my view, the goal of ILM was to move data transparently to the appropriate storage tier, balancing performance, protection and cost.  The end result of implementing ILM included significant cost reductions and better utilization of your expensive IT infrastructure.  

But the mega-hype around ILM actually over-complicated it and created confusion.  There was, and is, no magic application or technology that could make it all happen at the push of a button.  However, with the combination of people, process and technology, great strides can be made.  In fact, I know of IT professionals who have saved tons of money by implementing some form of ILM.  I submit that some level of ILM – regardless of what you call it – should be a requisite part of every data center.  In fact, it should be as fundamental a part of the data center as disaster recovery.

IT Challenges

Over the years I’ve spoken to hundreds of IT professionals and, via research studies, gained insight from thousands more.  The following are some observations I’ve made that seem fairly consistent:


Change Management Often Means Don’t Change Anything.  On some level this makes sense, because whenever we introduce change into our environments there is a risk that problems will occur.  However, it is important not to create a culture that is anti-change.  One IT professional I was working with claimed that he was a champion of implementing external storage virtualization but others in his group were opponents and squashed the project.  Another IT professional was battling his backup admin, who was dead set against implementing a disk-to-disk backup solution because he still clung to his tape library.  Change must always be weighed in terms of risk and reward, but a culture of no change can be destructive.  I do agree that “no” is a viable answer but it shouldn’t be the default.  


Innovation Often Causes Disruption.  There is actually a misperception that IT is typically ahead of the curve in implementing innovative solutions in the data center.  In reality it often takes years for innovation to reach the masses.  We must also consider that IT resources are constrained, which limits the ability to actually evaluate and implement new systems.  


Incumbency Matters.  There is a legitimate reason why incumbency matters: IT professionals invest a ton of time, money and resources in getting their infrastructure to work the way they want it to.  And in the course of doing so they become experts on these systems.  To rip and replace with something completely new, in which they have little or no expertise, can be counter-productive.  Having said that – it is still important to look at new solutions, and it is critical that we don’t let incumbency trump excellence.  


We Often Over-Engineer To Address Requirements.  We do this with storage, networks, servers, etc. – because the cost of risk is usually higher than the cost of capital.  However, in an economy as bad as the one we are facing, the cost of capital is arguably higher than the cost of risk.  The priorities have shifted, with further and continued emphasis on optimization and utilization.  


Urgent and Important.  In The 7 Habits of Highly Effective People, Stephen Covey presented the idea that we should focus on those things that are important but not urgent.  However, any IT professional will tell you this is not the world they live in.  Putting out fires is necessary and unavoidable.  But consider his point.  Let’s say someone has a heart attack.  That is both important and urgent and obviously must be tended to immediately.  However, if that person had eaten right, exercised and gone for annual check-ups – all IMPORTANT things to do – then perhaps the heart attack would have been avoided.  


    “Many IT departments are kicked around by the circumstances: fighting fires wherever they appear, dealing with botched-up, antiquated systems, heterogeneous infrastructures, incompatible interfaces, undocumented specifications and shattered, often overlapping applications. Between two breaths, business and IT people may find a few moments to discuss requirements, ideas, plans (speed-dating, really). Then it is back to the usual.”  


    The above quote is from Ron Tolido, CTO of Capgemini and a frequent IT industry blogger (I highly recommend his blog).  It highlights the challenge of IT being perceived as tactical rather than strategic to the business.  


    In many cases IT is seen as an operational function that provides services to the “business” people in their respective companies.  As such, IT staff are considered a great asset, having technical skills and knowledge that others in the organization lack.  But even though they are skilled and respected, they are often seen as just service providers and fundamentally a necessary overhead to the business.  When IT combines right-brain creative thinking with left-brain logical thinking, greater leaps in value are achieved.  I hate to default to Google as the best example of this – but fundamentally IT is their business, and as such it informs nearly everything that they do.  


    IT needs to be invited to the business table not just to provide operational services – however valuable those are.  IT should also be consulted on new ways technology can increase market share, raise brand awareness, improve revenue, increase profitability and develop new products, services and markets.  


    That is a tall order but if it can be achieved great things can happen.  Asking IT professionals to be more creative in their day-to-day business isn’t crazy.  Most IT guys that I know have dozens of ideas and opinions on improving products and services.  They are constantly pushing their vendors for better features, they are often adopting new gadgets for personal use, they are up on the latest technologies and trends, and they love mixing it up with their peers.  The problem is that they aren’t often asked (or more to the point – listened to) to provide feedback for the companies they work for specific to the businesses they are in.  


    We are in a tough economy, and it is at these times that we can instigate change.  Companies should look to IT for new ideas and should create a culture that includes IT as part of the business process.  CEOs need to overcome their intimidation of IT – which I think is one of the big reasons for the divide.  More CIOs need to become CEOs – which is not the typical path.  Additionally, CIOs need to see themselves not just as service providers whose primary job is keeping the lights on, but also as helping to make sure the light bill is paid by generating income for the company.  Essentially it is easier to just be overhead – and do your job well – than to go out and be responsible for revenue.  An old manager of mine once told me that there is nothing more strategic to a business than revenue.  If IT wants to become truly strategic to the business – beyond the necessary role of keeping the databases running and email working – then it must also generate revenue for the company.  

The recent announcement of Digital Reef and Microsoft FAST is an important one.  At first glance it might be a bit confusing, since both companies provide search capabilities.  Digital Reef does provide a search engine, but it also offers capabilities that make its search smarter than other solutions, along with features that live above the search stack. 

As a result, Digital Reef can be a totally turn-key solution and can also work with third-party search and indexing engines in order to protect the investment that customers have already made.  If an organization has already indexed tons and tons of content, why go through the process again?  In addition to its own indexing capability, Digital Reef will also leverage existing popular indexes – in this case the FAST index – and bring the Digital Reef federation, performance, scalability, archiving and similarity engine to Microsoft FAST and SharePoint customers.

Digital Reef is a search and indexing solution, but it is much more – if it weren’t, why bother building a new company?  Digital Reef is Enterprise-class search and an information application with the goal of providing relevant data to its users – rapidly and efficiently.

Special thanks to Yoav Griver and Siddartha Rao for their contributions to this series.

ESI and technology issues relating to data storage and retrieval are often critical to litigation; there are many examples of high-stakes litigation that has turned on issues involving data management and e-discovery. See, e.g., United States v. Microsoft, 253 F.3d 34, 71–74 (D.C. Cir. 2001).  New legal frameworks have been created to deal with the reality of electronic data in litigation, and parties considering M&A deals should be aware of the potential litigation issues involving a merging counterparty or target company’s ESI and data management systems.

Data Storage and Potential Litigation Issues

Counsel must perform data due diligence that includes identification of existing legacy systems and the data stored within them.  Failure to do so may create integration issues, as well as data loss and data recovery issues that will create substantial costs and dangers in the event of future litigation.

For example, the ability to present data in multiple forms can raise the cost of discovery, because courts can order litigants to convert discovery data into new formats.  This makes it all the more important that parties to M&A transactions conduct data due diligence to discover the location and formats of ESI in legacy data systems of M&A counterparties.  In the 1980 case of National Union Electric Corp. v. Matsushita Electric Industrial Co., 494 F. Supp. 1257 (E.D. Pa. 1980), the defendants requested that National Union provide a “computer readable tape” copy of documents already produced in paper form. See National Union, 494 F. Supp. at 1258.  National Union resisted the motion on the grounds that, under the discovery rules, it had an obligation to produce already existing documents but no obligation to manufacture data in a new format. Id. at 1259.  The court acknowledged the distinction, but ultimately rejected the argument as inconsistent with the realities of data use and storage:

We now live in an era when much of the data our society desires to retain is stored in computer discs.  This process will escalate in years to come. We suspect that by the year 2000,  virtually all data will be stored in some form of computer memory.  To interpret the Federal Rules which, after all, are to be construed to “secure the just, speedy, and inexpensive determination of every action,” in a manner which would preclude the production of material such as is requested here, would eventually defeat their purpose. Id. at 1261–63.

At the time of this opinion, the court could confidently state that it found “no case in which the court has ordered the programming of a computer to manufacture a computer tape not theretofore in physical existence.” Id. at 1261.  In contrast, today, “[t]he law is clear that data in computerized form is discoverable even if paper ‘hard copies’ of the information have been produced, and . . . the producing party can be required to design a computer program to extract the data from its computerized business records, subject to the Court’s discretion as to the allocation of the costs of designing such a computer program.” See Anti-Monopoly, Inc. v. Hasbro, Inc., 1995 U.S. Dist. LEXIS 16355, 1 (S.D.N.Y. Nov. 3, 1995).

When ordering the preservation or production of ESI, courts are sensitive to the relevance of the ESI to the litigation, the value of the ESI to the requesting party, and the cost to the producing party—courts will not foist irrational discovery requirements and costs upon litigants. See Wright v. AmSouth Bancorp, 320 F.3d 1198 (11th Cir. 2003).

Nonetheless, where it is the producing party’s own document retention scheme that escalates the costs of production, courts may order the producing party to bear these costs.  For example, in In re Brand Name Prescription Drugs Antitrust Litigation, 1995 U.S. Dist. LEXIS 8281, defendant CIBA-Geigy Corporation argued that the class plaintiffs’ motion to compel the production of inter-corporate emails was overly broad, burdensome, and expensive, and that the class plaintiffs should bear the estimated $50,000–$70,000 cost of culling through over 30 million stored email documents. Id. at 2–4.  The court rejected this argument, noting that at least four other defendant manufacturers had produced emails without requesting payment of costs and succinctly stating that “[c]lass plaintiffs should not be forced to bear a burden caused by CIBA’s choice of electronic storage.” Id. at 6–7.

Not surprisingly, the course of events has vindicated the predictions of the National Union court, and requests to produce data in specific formats are no longer unusual. See L.H. v. Schwarzenegger, 2008 U.S. Dist. LEXIS 86829 (E.D.Cal. May 14, 2008).

However, without proper data due diligence that accounts for document retention or legacy data management systems, such routine requests can create large litigation costs.  To the extent such costs are avoidable with proper data due diligence, the failure to conduct data due diligence on a counterparty’s legacy systems or ESI is tantamount to ignoring a potentially large liability when valuing a merging counterparty or target company.

I came across an article on Digital Reef that I think has some excellent points.

Here is an excerpt:

“…it turns out that Digital Reef has built something fairly new and interesting, a “similarity search engine” for big corporate networks that can start with one document—say, a Word or Excel file—and find others that resemble it.

That could be very useful if, for instance, you were a compliance officer at a big health plan and you wanted to see whether any of your employees had unsecured patient records sitting around on their laptop hard drives (which would be a big violation of federal healthcare privacy regulations). Just plop an example of a patient record into the Digital Reef system, and it will scour the network for other examples. Or say you were a lawyer at a big firm writing a brief in an employment case and you wanted to find out whether any of your colleagues working on similar cases in the past had already assembled the relevant citations. You could simply submit your entire draft to Digital Reef, and see what washed up.”

The author, Wade Roush, does an excellent job of explaining the value of the Digital Reef similarity engine.  He cites two examples, but the possibilities are numerous.  The power of similarity helps with discovery, compliance, efficient workflow, research, analysis, etc.
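Digital Reef’s internals aren’t public, but a standard baseline for ranking documents by resemblance to an example file – the “plop in a patient record and see what washes up” scenario above – is TF-IDF weighting compared by cosine similarity.  A minimal sketch, with an invented three-document corpus for illustration:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF vector (stored as a dict) for each document."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

corpus = [
    "patient record diagnosis insurance claim medical history",   # the example file
    "medical history patient diagnosis insurance record notes",   # similar document
    "quarterly revenue forecast marketing budget spreadsheet",    # unrelated document
]
vecs = tfidf_vectors(corpus)
scores = sorted(((cosine(vecs[0], v), i) for i, v in enumerate(vecs[1:], 1)),
                reverse=True)
print(scores)  # the medical document ranks above the financial one
```

A production similarity engine operating across a corporate network would layer indexing, federation and scale on top, but ranking candidates against an example document by vector similarity is the core idea.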