
If you are interested in intelligent Enterprise search, I suggest that you listen to Steve Akers being interviewed by BeyeNetworks – it is a long session but well worth the time.  

Steve discusses a number of things including the challenges faced by Enterprises, how Digital Reef solves these problems and some customer use cases.  There were also two slides that I thought were a great overview of the Digital Reef solution:

Discover

  • Automatically identify and index all unstructured data
  • Provide tools to find and understand the data:
      • Boolean searches (freeform, fuzzy, metadata, phrase, proximity)
      • Similarity searches using example files
      • Email thread reconstruction
      • Exact and near duplicate identification
      • Pattern expression recognition
  • Organize the data using automatic classification 
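
The list above mentions exact duplicate identification. As a rough illustration only (this is a generic content-hashing sketch, not a description of how Digital Reef does it), exact duplicates can be found by grouping files on a digest of their bytes. The function name and the sample file names are made up for the example.

```python
import hashlib
from collections import defaultdict

def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a (hypothetical) file name to its raw content.
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        # Identical content always produces the same SHA-256 digest.
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

# Illustrative data, not taken from any real system.
sample = {
    "a/report.doc": b"quarterly numbers",
    "b/report-copy.doc": b"quarterly numbers",
    "c/memo.txt": b"lunch on friday",
}
duplicates = group_exact_duplicates(sample)
```

Near-duplicate identification needs a fuzzier comparison than a hash, since a single changed byte produces a completely different digest.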

 Manage

  • Transform files into common file types
  • Collect and move data
  • Manage data retention policies

 

Designed for Scale and Security

  • Grid-based, distributed architecture provides performance and resiliency
  • Multi-tenant, role-based security model
  • Easily deployed and maintained
  • Indexes and prepares the full content and metadata of up to 10 TB of data in 24 hours with a standard configuration
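
To put the 10 TB in 24 hours figure in perspective, here is a back-of-the-envelope calculation of the sustained rate it implies. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which the slide does not specify, and the function name is just for illustration.

```python
# Assumption: decimal terabytes; the source does not say which unit is meant.
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6

def sustained_rate_mb_per_s(terabytes: float, hours: float = 24.0) -> float:
    """Average rate in MB/s needed to process `terabytes` within `hours`."""
    seconds = hours * 3600
    return terabytes * BYTES_PER_TB / seconds / BYTES_PER_MB

rate = sustained_rate_mb_per_s(10)
print(f"{rate:.1f} MB/s")  # prints "115.7 MB/s"
```

That is an average across the whole grid, not a per-node number.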

 

Whither Thou ILM?

Some of you may remember the big buzz around Information Life-cycle Management or ILM.  EMC pushed the concept of ILM a few years back and many of their competitors followed them down this winding road.  You know that marketing campaigns are working when customers talk about ILM strategies using some of the same language as their vendors.  I witnessed this fairly extensively with ILM but the reality never matched the rhetoric.  

On some measure ILM has been successful.  A number of customers went to a multi-tier storage environment.  Some never moved data but became smarter about where they placed it to begin with.  Others would move data at the volume level or, if they were using file systems, at the file or file system level.  During this time a number of technologies and vendors came and went, and when the dust settled there were modest levels of ILM but nowhere near the promise of the hype.  

The term ILM is rarely used these days and it is not going to open any doors for you.  However, just because we never reached the nirvana of ILM doesn’t mean that there wasn’t real value in the concept.  

In my view, the goal of ILM was to move data transparently to the appropriate storage tier balancing performance, protection and cost.  And the end result of implementing ILM included significant cost reductions and better utilization of your expensive IT infrastructure.  
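
To make the idea concrete, here is a minimal sketch of a tier placement rule. Everything in it is an assumption for illustration: the tier names, the 30-day and 365-day thresholds, and the `business_critical` flag are invented, and a real policy would also weigh protection requirements and cost per tier.

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    path: str
    days_since_access: int
    business_critical: bool = False

def choose_tier(f: FileInfo) -> str:
    """Pick a storage tier from access recency and criticality.

    Thresholds and tier names are hypothetical.
    """
    if f.business_critical or f.days_since_access <= 30:
        return "tier1-fast"       # highest performance, highest cost
    if f.days_since_access <= 365:
        return "tier2-capacity"   # cheaper disk for cooler data
    return "tier3-archive"        # lowest cost, slowest access

files = [
    FileInfo("/db/orders.dat", 400, business_critical=True),
    FileInfo("/home/report.doc", 90),
    FileInfo("/old/scan.tif", 900),
]
placement = {f.path: choose_tier(f) for f in files}
```

Even a simple rule like this only helps if people and process are in place to review and act on it.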

But the mega-hype around ILM actually over-complicated it and created confusion.  There was and is no magic application or technology that could just make it all happen with a push of a button.  However, with the combination of people, process and technology there are great strides that can be made.  In fact, I know of IT professionals that have saved tons of money by implementing some form of ILM.  I submit that some level of ILM – regardless of what you call it – should be a requisite part of every data center.  In fact, it should be as fundamental a part of the data center as disaster recovery.

IT Challenges

Over the years I’ve spoken to hundreds of IT professionals and via research studies gained insight from thousands.  The following are some observations that I’ve made that seem somewhat consistent:

 

Change Management Often Means Don’t Change Anything.  On some level this makes sense because whenever we interject change into our environments there is a risk that problems will occur.  However, it is important not to create a culture that is anti-change.  One IT professional that I was working with claimed that he was a champion of implementing external storage virtualization but others in his group were opponents and squashed the project.  Another IT professional was fighting his backup admin, who was dead set against implementing a disk-to-disk backup solution because he still clung to his tape library.  Change must always be weighed in terms of risk and reward but a culture of no change can be destructive.  I do agree that “no” is a viable answer but it shouldn’t be the default.  

 

Innovation Often Causes Disruption.  There is actually a bit of a misperception that IT is typically ahead of the curve on implementing innovative solutions in the data center.  In reality it often takes years for innovation to permeate the masses.  We must also consider the fact that IT resources are limited and as such there is limited ability to actually evaluate and implement new systems.  

 

Incumbency Matters.  There is a legitimate reason why incumbency matters: IT professionals invest a ton of time, money and resources in getting their infrastructure to work the way they want it to.  In the course of doing so they become experts on these systems.  To rip and replace with something completely new, in which they have little or no expertise, can be counter-productive.  Having said that – it is still important to look at new solutions and it is critical that we don’t let incumbency trump excellence.  

 

We Often Over Engineer To Address Requirements.  We do this with storage, networks, servers, etc. – because the cost of risk is usually higher than the cost of capital.  However, in an economy as bad as the one we are facing – the cost of capital is arguably higher than the cost of risk.  The priorities have shifted with further and continued emphasis on optimization and utilization.  

 

Urgent and Important.  In The 7 Habits of Highly Effective People, Stephen Covey presented the idea that we should focus on those things that are important but not urgent.  However, any IT professional will tell you this is not the world that they live in.  Putting out fires is necessary and unavoidable.  But consider his point.  Let’s say someone has a heart attack.  That is both important and urgent and obviously must be tended to immediately.  However, if that person had eaten right, exercised and gone for annual checkups – all IMPORTANT things to do – then perhaps the heart attack would have been avoided.  

 

    “Many IT departments are kicked around by the circumstances: fighting fires wherever they appear, dealing with botched-up, antiquated systems, heterogeneous infrastructures, incompatible interfaces, undocumented specifications and shattered, often overlapping applications. Between two breaths, business and IT people may find a few moments to discuss requirements, ideas, plans (speed-dating, really). Then it is back to the usual.”  

 

    The above quote was from Ron Tolido, CTO of Capgemini and a frequent IT industry blogger (I highly recommend his blog). It brings up an issue with the challenges of IT as being perceived as more tactical versus strategic to the business.  

 

     In many cases IT is seen as an operational function that provides services to the “business” people in their respective companies.   As such, they are considered a great asset having technical skills and knowledge that others in the organization lack.  However, even though they are skilled and respected they are often just seen as service providers and fundamentally a necessary overhead to the business.  However, when IT combines both right brain creative thinking with left brain logical thinking greater leaps in value are achieved.  I hate to go to the default of Google as a company as the best example of this – but fundamentally IT is their business and as such it informs nearly everything that they do.  

 

     IT needs to be invited to the business table not just to provide operational services – however valuable they are.  Additionally, IT should also be consulted on what new ways technology can increase market share, raise brand awareness, improve revenue, increase profitability and develop new products, services and markets.  

 

    That is a tall order but if it can be achieved great things can happen.  Asking IT professionals to be more creative in their day-to-day business isn’t crazy.  Most IT guys that I know have dozens of ideas and opinions on improving products and services.  They are constantly pushing their vendors for better features, they are often adopting new gadgets for personal use, they are up on the latest technologies and trends, and they love mixing it up with their peers.  The problem is that they aren’t often asked (or more to the point – listened to) to provide feedback for the companies they work for specific to the businesses they are in.  

 

    We are in a tough economy and it is at these times when we can instigate change.  Companies should look to IT for new ideas and should create a culture that includes them as a part of the business process.  CEOs need to overcome their intimidation of IT – which I think is one of the big reasons for the divide.  More CIOs need to become CEOs – which is not the typical trend.  Additionally, CIOs need to see themselves not just as service providers whose primary job is keeping the lights on – but also as helping to make sure the light bill is paid by generating income for the company.  Essentially it is easier to just be overhead – and do your job well – than to go out and be responsible for revenue.  An old manager of mine told me once that there was nothing more strategic than revenue to a business.  If IT wants to become truly strategic to the business – beyond the necessary role of keeping the databases running and email working – then it must also generate revenue for the company.  


The recent announcement of Digital Reef and Microsoft FAST is an important one.  At first glance it might be a bit confusing since both companies provide search capabilities.  Digital Reef does provide a search engine and additionally they offer capabilities that make their search smarter than other solutions and they also have features that live above the search stack. 

As a result Digital Reef can be a totally turn-key solution and also work with other third party search and indexing engines in order to protect the investment that customers have already made.  If an organization has already indexed tons and tons of content – then why go through the process again?  In addition to their own indexing capability, Digital Reef also will leverage existing popular indexes – in this case the FAST index – and bring the Digital Reef federation, performance, scalability, archiving and similarity engine to Microsoft FAST and SharePoint customers.

Digital Reef is a search and indexing solution but it is much more – if it wasn’t then why bother building a new company?  Digital Reef is Enterprise-class search and an information application with the goal of providing relevant data to its users – rapidly and efficiently.

Special thanks to Yoav Griver and Siddartha Rao for their contributions to this series.

ESI and technology issues relating to data storage and retrieval are often critical to litigation; there are many examples of high-stakes litigation that has turned on issues involving data management and e-discovery. See, e.g., United States v. Microsoft, 253 F.3d 34, 71–74 (D.C. Cir. 2001).  New legal frameworks have been created to deal with the reality of electronic data in litigation, and parties considering M&A deals should be aware of the potential litigation issues involving a merging counterparty or target company’s ESI and data management systems.

Data Storage and Potential Litigation Issues

Counsel must perform data due diligence that includes identification of existing legacy systems and the data stored within them.  Failure to do so may create integration issues, as well as data loss and data recovery issues that will create substantial costs and dangers in the event of future litigation.

For example, the ability to present data in multiple forms can raise the cost of discovery because courts can order litigants to convert discovery data into new formats.  This makes it all the more important that parties to M&A transactions conduct data due diligence to discover the location and formats of ESI in legacy data systems of M&A counterparties.  In the 1980 case of National Union Electric Corp. v. Matsushita Electric Industrial Co., 494 F. Supp. 1257 (E.D. Pa. 1980),  the defendants requested National Union to provide a “computer readable tape” copy of documents already produced in paper form. See National Union, 494 F. Supp. at 1258.   National Union resisted the motion on the grounds that under discovery rules National Union had an obligation to produce already existing documents, but had no such obligation to manufacture data in a new format. Id. at 1259.  The court acknowledged the distinction, but ultimately rejected the argument as inconsistent with the realities of data use and storage:

We now live in an era when much of the data our society desires to retain is stored in computer discs.  This process will escalate in years to come. We suspect that by the year 2000,  virtually all data will be stored in some form of computer memory.  To interpret the Federal Rules which, after all, are to be construed to “secure the just, speedy, and inexpensive determination of every action,” in a manner which would preclude the production of material such as is requested here, would eventually defeat their purpose. Id. at 1261–63

At the time of this opinion, the court could confidently state that it found “no case in which the court has ordered the programming of a computer to manufacture a computer tape not theretofore in physical existence.” Id. at 1261.  In contrast, today, “[t]he law is clear that data in computerized form is discoverable even if paper ‘hard copies’ of the information have been produced, and . . . the producing party can be required to design a computer program to extract the data from its computerized business records, subject to the Court’s discretion as to the allocation of the costs of designing such a computer program.” See Anti-Monopoly, Inc. v. Hasbro, Inc., 1995 U.S. Dist. LEXIS 16355, 1 (S.D.N.Y. Nov. 3, 1995).

When ordering the preservation or production of ESI, courts are sensitive to the relevance of the ESI to the litigation, the value of the ESI to the requesting party, and the cost to the producing party; courts will not foist irrational discovery requirements and costs upon litigants. See Wright v. AmSouth Bancorp, 320 F.3d 1198 (11th Cir. 2003).

Nonetheless, where it is the producing party’s own document retention scheme which escalates the costs of production, courts may order the producing party to bear these costs.  For example, in In re Brand Name Prescription Drugs Antitrust Litigation, Brand, 1995 U.S. Dist. LEXIS 8281, defendant CIBA-Geigy Corporation argued that the class plaintiffs’ motion to compel the production of inter-corporate emails was overly broad, burdensome, and expensive and that the class plaintiffs should bear the estimated $50,000–$70,000 costs of culling through over 30 million stored email documents. Id. at 2–4.  The court rejected this argument, noting that at least four other defendant manufacturers had produced emails without requesting payment of costs and succinctly stating that “Class plaintiffs should not be forced to bear a burden caused by CIBA’s choice of electronic storage.” Id. at 6–7.

Not surprisingly, the course of events has vindicated the predictions of the National Union court, and requests to produce data in specific formats are no longer unusual. See L.H. v. Schwarzenegger, 2008 U.S. Dist. LEXIS 86829 (E.D.Cal. May 14, 2008).

However, without proper data due diligence that accounts for document retention or legacy data management systems, such routine requests can create large litigation costs.  To the extent such costs are avoidable with proper data due diligence, the failure to conduct data due diligence on a counterparty’s legacy systems or ESI is tantamount to ignoring a potentially large liability when valuing a merging counterparty or target company.

I came across an article on Digital Reef that I think has some excellent points.

Here is an excerpt:

“…it turns out that Digital Reef has built something fairly new and interesting, a “similarity search engine” for big corporate networks that can start with one document—say, a Word or Excel file—and find others that resemble it.

That could be very useful if, for instance, you were a compliance officer at a big health plan and you wanted to see whether any of your employees had unsecured patient records sitting around on their laptop hard drives (which would be a big violation of federal healthcare privacy regulations). Just plop an example of a patient record into the Digital Reef system, and it will scour the network for other examples. Or say you were a lawyer at a big firm writing a brief in an employment case and you wanted to find out whether any of your colleagues working on similar cases in the past had already assembled the relevant citations. You could simply submit your entire draft to Digital Reef, and see what washed up.”

The author, Wade Roush does an excellent job of explaining the value of the Digital Reef similarity engine.  He cites two examples but the possibilities are numerous.  The power of similarity helps with discovery, compliance, efficient workflow, research, analysis, etc.

http://www.xconomy.com/boston/2009/03/03/digital-reefs-similarity-based-search-helps-corporate-data-speak-for-itself/
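
For readers who want a feel for what "find documents that resemble this one" can mean in practice, here is a toy sketch using cosine similarity over word counts. This is a textbook technique that I am using for illustration; it is not a description of the Digital Reef similarity engine. The function names, the 0.3 threshold and the sample documents are all made up.

```python
import math
import re
from collections import Counter

def term_vector(text: str) -> Counter:
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_similar(example: str, corpus: dict[str, str],
                 threshold: float = 0.3) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best first."""
    ex = term_vector(example)
    scored = [(name, cosine_similarity(ex, term_vector(text)))
              for name, text in corpus.items()]
    return sorted((p for p in scored if p[1] >= threshold),
                  key=lambda p: p[1], reverse=True)

# Hypothetical documents standing in for files found on the network.
corpus = {
    "laptop/notes.txt": "patient name john doe diagnosis asthma prescription inhaler",
    "share/menu.txt": "lunch menu soup salad sandwich coffee",
}
example = "patient name jane roe diagnosis diabetes prescription insulin"
matches = find_similar(example, corpus)
```

Here the example record and the notes file share four of their eight terms, so the notes file is flagged while the lunch menu is not. A production engine would need to do far more (file format handling, weighting, scale), but the "plop in an example" workflow is the same.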

One of the persistent puzzles surrounding mergers and acquisitions (“M&A”) activity is its propensity for failure. In fact, hundreds of studies suggest that fifty to eighty percent of mergers and acquisitions are failures. Thus, while the goal of an M&A deal is that the whole is worth more than the sum of its parts, the converse is frequently true. An important determinant of any M&A transaction’s post-integration success is data due diligence. In today’s M&A environment, where transactions experience substantial scrutiny and technology plays a crucial role, data due diligence is paramount.

Nonetheless, merging or acquiring companies often fail to perform adequate data due diligence and fail to consider the electronically stored information (“ESI”) and data storage systems of the target company or merging counterpart. This oversight presents substantial risks and can cause substantial post-integration problems and, in turn, increase the likelihood of M&A failure.

Creating an E-Discovery Checklist

One of the crucial ways that in-house and outside counsel can fail to conduct proper data due diligence is by ignoring potential eDiscovery issues as part of the M&A deal.

Why is this important?

eDiscovery issues may well affect the value of the company being acquired, the cost and difficulty of merging the two companies, or heighten litigation risk going forward. Corporations and law firms have fine-tuned due diligence checklists to account for various traditional business risks such as legal, contractual, regulatory, securities, financial and undisclosed liabilities, yet eDiscovery is noticeably absent.

This failure of counsel to conduct data due diligence on a target company’s e-discovery issues, e.g. preservation and cost obligations regarding its ESI, can cause substantial losses for the acquiring company, impacting the expected return.

An e-discovery checklist could have many elements and would vary with respect to the industry and company, but regardless, it should account for:

  1. The state of the target company’s ESI, ensuring that it has been thoroughly identified, categorized, and sourced;
  2. Existing preservation and litigation holds;
  3. The cost of preserving data for existing or anticipated legal holds; and
  4. Both structured and unstructured data.

I would like to hear your comments on this checklist, including additions. More thoughts on M&A coming in future posts.

For years there has been a dialog about turning data into information and the usual reaction is the slight tilt of the head and a glassy gaze from the listener that is akin to a dog’s reaction to a high-pitched whistle.  We then go into the definitions of data and information discussing the differences between the two. 

EMC is one of the leading storage vendors and they have a smart and pithy tagline – “Where Information Lives”.  Information does live within storage systems and they house information, protect it and make it accessible to users and applications.  However, storage systems do not make the information useful to the front-line businesses that own them. 

Many storage system folks will respond, “That isn’t true.  We have CAS and NAS solutions that make use of information.”  This might be true on a limited basis but overall we have failed to bridge the gap between storage infrastructure and information.  Once we had high hopes that this would occur, but it has yet to be realized on a mass scale. 

The reason we have not achieved this is because it is the job of applications to deal with the use of information.  Therefore, information lives on storage systems but the greater use of that information is via applications. 

The problem with applications is that there are so many of them and they each generate information that is proprietary.  There is no correlation of information.  There is no single pool of information.  All of our information is stovepiped.  As such, we limit the use because our information is bounded and boxed. 

I submit that we need information applications that have a universal or federated view of all the information within a company.  The information application (IA) would provide users the ability to access data regardless of what application created it.  At the heart of the IA would be search and indexing – consider it middleware or the engine – that sits between the IA and the company’s storage.  The IA would allow for analysis and cross correlation of information.  The IA would provide tools for the reuse of data.
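
As a thought experiment, the "single pool" can be pictured as one index that records which application each item came from. The sketch below is a deliberately tiny inverted index; the class name, the source labels and the sample documents are all hypothetical, and a real IA would sit on a proper search and indexing engine.

```python
from collections import defaultdict

class FederatedIndex:
    """Toy inverted index spanning several source applications."""

    def __init__(self) -> None:
        # term -> set of (source, doc_id) pairs that contain the term
        self._postings = defaultdict(set)

    def add(self, source: str, doc_id: str, text: str) -> None:
        """Index a document under the application that created it."""
        for term in text.lower().split():
            self._postings[term].add((source, doc_id))

    def search(self, query: str) -> set:
        """Return documents containing every query term, from any source."""
        terms = query.lower().split()
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results

# Illustrative content from three made-up source applications.
index = FederatedIndex()
index.add("email", "msg-17", "Q3 pricing proposal for Acme")
index.add("sharepoint", "doc-4", "Acme pricing spreadsheet")
index.add("crm", "acct-9", "Acme renewal notes")
hits = index.search("acme pricing")
```

The query returns the email and the SharePoint document together, which is the point: the user asks one question and does not care which application created the answer.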

Search already is an application providing tools and use of information.  However, I am also talking about using search and indexing for deeper integration within existing applications and as an engine for new applications.  There will be applications that use the concept of IA as a component and pure IA applications. 

This is a new idea and like so many new ideas it will be met with some misunderstanding (we can already do that), with lots of questions (how much will this cost and whose budget does it come out of?), and hopefully with the inspiration to take it further (we can build it).  Information applications will turn the Information Age into the Useful Information Age.

As many of you may know Microsoft acquired FAST Search and Transfer in 2008 for a sizable sum of money ($1.2 billion).   For those of you that aren’t familiar with FAST, they were a fairly successful search and indexing company whose products were used by Enterprise end user companies.  From the outset there seemed to be a real desire to marry FAST with Microsoft SharePoint and after some twists and turns that is still the strategy – see this Beyond Search blog entry for more insights.

Certainly it makes sense for Microsoft to integrate FAST with SharePoint but what about the rest of the Enterprise?  Yes, Microsoft wants to have all unstructured content be stored in SharePoint but since that hasn’t happened yet and it may never be a fully realized goal – it is important to have Enterprise search be heterogeneous. 

One thing that many readers may not know is that FAST did a great job partnering with storage vendors including EMC and HDS – just to name two of the biggest.  It is uncertain what is going to happen with these relationships based on the fact that Microsoft seems to be focused on making FAST work only with Microsoft products.

It appears that Microsoft still really doesn’t know what to do with its consumer or Enterprise search strategy.  They have a ton of smart people, lots of resources at their finger tips, etc, etc.  However, search software seems to be a perpetual stumbling block.
