Archive for the ‘Storage’ Category

If you are interested in intelligent Enterprise search, I suggest you listen to Steve Akers being interviewed by BeyeNetworks – it is a long session but well worth the time.

Steve discusses a number of topics, including the challenges Enterprises face, how Digital Reef solves these problems, and some customer use cases. There were also two slides that I thought were a great overview of the Digital Reef solution:


  • Automatically identify and index all unstructured data
  • Provide tools to find and understand the data:
      • Boolean searches (freeform, fuzzy, metadata, phrase, proximity)
      • Similarity searches using example files
      • Email thread reconstruction
      • Exact and near duplicate identification
      • Pattern expression recognition
  • Organize the data using automatic classification
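The slide lists "pattern expression recognition" without saying how it works. Assuming it means matching regular-expression patterns against text extracted from files, a minimal sketch could look like this. The two patterns (a US Social Security-style number and an email address) are hypothetical examples, not Digital Reef's rule set.

```python
import re

# Hypothetical patterns for illustration only; production rules need careful tuning.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def find_patterns(text: str) -> dict[str, list[str]]:
    """Return every match of each named pattern found in the text."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

sample = "Contact jane.doe@example.com; employee SSN 123-45-6789 on file."
print(find_patterns(sample))
# {'us_ssn': ['123-45-6789'], 'email': ['jane.doe@example.com']}
```

In a real deployment, matches like these would typically be stored as metadata on the indexed document so they can be searched or used for classification.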


  • Transform files into common file types
  • Collect and move data
  • Manage data retention policies
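To make "manage data retention policies" concrete, here is a minimal sketch. It assumes a policy is simply a maximum age per content category. The categories and periods are hypothetical, not settings taken from the product.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods per content category (assumption, not a product setting).
RETENTION = {
    "email": timedelta(days=365 * 3),
    "temp": timedelta(days=30),
}

def expired(category: str, last_modified: datetime, now: datetime) -> bool:
    """True if an item has outlived its category's retention period.

    Categories without a policy are retained indefinitely.
    """
    period = RETENTION.get(category)
    if period is None:
        return False
    return now - last_modified > period

now = datetime(2009, 6, 1)
print(expired("temp", datetime(2009, 4, 1), now))   # True: 61 days > 30 days
print(expired("email", datetime(2008, 1, 1), now))  # False: well within 3 years
```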


Designed for Scale and Security

  • Grid-based, distributed architecture provides performance and resiliency
  • Multi-tenant, role-based security model
  • Easily deployed and maintained
  • Indexes and prepares the full content and metadata of up to 10 TB of data in 24 hours with a standard configuration
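To put the "10 TB in 24 hours" figure in perspective, the short calculation below restates it as an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and uniform throughput over the day. The figure itself is the vendor's claim; the arithmetic only re-expresses it.

```python
BYTES_PER_TB = 10**12  # decimal terabyte (assumption; binary TiB would raise the rate ~10%)
BYTES_PER_MB = 10**6

def required_rate_mb_s(terabytes: float, hours: float) -> float:
    """Average rate in MB/s needed to index `terabytes` of data in `hours`."""
    return terabytes * BYTES_PER_TB / (hours * 3600) / BYTES_PER_MB

rate = required_rate_mb_s(10, 24)
print(f"{rate:.1f} MB/s, about {rate * 3600 / 1000:.0f} GB per hour")
# 115.7 MB/s, about 417 GB per hour
```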



Some of you may remember the big buzz around Information Life-cycle Management (ILM). EMC pushed the concept of ILM a few years back, and many of its competitors followed it down this winding road. You know a marketing campaign is working when customers talk about their ILM strategies using some of the same language as their vendors. I witnessed this fairly extensively with ILM, but the reality never matched the rhetoric.

By some measures ILM has been successful. A number of customers moved to a multi-tier storage environment. Some never moved data but became smarter about where they placed it to begin with. Others actually moved data, either at the volume level or, if they were using file systems, at the file or file system level. During this time a number of technologies and vendors came and went, and when the dust settled there were modest levels of ILM, but nowhere near what the hype promised.

The term ILM is rarely used these days and it is not going to open any doors for you.  However, just because we never reached the nirvana of ILM doesn’t mean that there wasn’t real value in the concept.  

In my view, the goal of ILM was to move data transparently to the appropriate storage tier, balancing performance, protection and cost. The end result of implementing ILM included significant cost reductions and better utilization of your expensive IT infrastructure.

But the mega-hype around ILM actually over-complicated it and created confusion. There was, and is, no magic application or technology that can make it all happen at the push of a button. However, by combining people, process and technology, great strides can be made. In fact, I know of IT professionals who have saved tons of money by implementing some form of ILM. I submit that some level of ILM, regardless of what you call it, should be a requisite part of every data center. It should be as fundamental a part of the data center as disaster recovery.
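This post describes the ILM goal as placing data on the tier that balances performance, protection and cost. A simple age-based placement rule can illustrate that goal. This is a minimal sketch under assumed tier names, thresholds and a hypothetical `regulated` flag. Real ILM policies would also weigh capacity, service levels and compliance requirements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileInfo:
    path: str
    last_access: datetime
    regulated: bool = False  # hypothetical flag: keep off the archive tier

def choose_tier(f: FileInfo, now: datetime) -> str:
    """Pick a storage tier from idle time (tier names and thresholds are assumptions)."""
    idle = now - f.last_access
    if idle < timedelta(days=30):
        return "tier1-fast"      # hot data stays on high-performance storage
    if f.regulated or idle < timedelta(days=365):
        return "tier2-capacity"  # warm data, or regulated data of any age
    return "tier3-archive"       # cold, unregulated data goes to the cheapest tier

now = datetime(2009, 6, 1)
for f in [FileInfo("/proj/plan.doc", datetime(2009, 5, 28)),
          FileInfo("/hr/2005-review.pdf", datetime(2005, 3, 1), regulated=True),
          FileInfo("/tmp/old.iso", datetime(2006, 1, 1))]:
    print(f.path, "->", choose_tier(f, now))
```

The point of the sketch is that the policy, not the storage hardware, does the work. The hard part in practice is agreeing on thresholds and moving data transparently to applications.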


As many of you may know, Microsoft acquired FAST Search and Transfer in 2008 for a sizable sum of money ($1.2 billion). For those of you who aren’t familiar with FAST, it was a fairly successful search and indexing company whose products were used by Enterprise end-user companies. From the outset there seemed to be a real desire to marry FAST with Microsoft SharePoint, and after some twists and turns that is still the strategy – see this Beyond Search blog entry for more insights.

Certainly it makes sense for Microsoft to integrate FAST with SharePoint, but what about the rest of the Enterprise? Yes, Microsoft wants all unstructured content to be stored in SharePoint, but that hasn’t happened yet and may never be fully realized, so it is important for Enterprise search to be heterogeneous.

One thing many readers may not know is that FAST did a great job partnering with storage vendors, including EMC and HDS, to name just two of the biggest. It is uncertain what will happen to these relationships, given that Microsoft seems focused on making FAST work only with Microsoft products.

It appears that Microsoft still doesn’t really know what to do with its consumer or Enterprise search strategy. It has a ton of smart people, lots of resources at its fingertips, and so on. However, search software seems to be a perpetual stumbling block.


I was an industry analyst for many years. I focused heavily on storage systems and was convinced that search and storage would eventually go together like peanut butter and jelly. Although we have not yet seen this happen, I am still convinced that it needs to and ultimately will. However, like all things in the data center, it just takes time.

There are practical reasons why storage and search aren’t more tightly bound together. First, search solutions haven’t been scalable or intelligent enough to provide the value that IT professionals are looking for. Second, most search solutions have been tied to specific storage systems rather than the entire storage complex, which is very limiting. We need Enterprise search solutions that can access all storage within an organization. The third big issue is that storage administrators haven’t figured out why they need it. Some applications and use cases, such as eDiscovery, are a priority. But storage admins have not found the killer app that gives them that “aha” moment where they simply must have it and are willing to invest time and money.

What is the killer app for search and storage? I believe one killer app is a universal search application that gives Enterprise end users greater access to the company’s data. We create so much content, using any number of applications. Instead of hunting for data through each application’s own interface, having a single pane of glass to get to any and all content in the Enterprise would provide huge increases in productivity and efficiency.

This concept should not be a leap for most people, but since no one is complaining about it or demanding it, it isn’t a priority. However, if you think about the power of being able to easily access content, data and information, we all know that mountains can be moved when this ability is provided. This is where storage admins have to transcend their nuts-and-bolts view of the world and think about the business and how they can apply technology to elevate the companies they work for. It is “right brain” (creative) thinking versus the logical and rational “left brain” thinking usually needed in the data center. Only by combining the creative and the logical can real leaps forward be made.
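The "single pane of glass" idea in this post amounts to one index that spans content from many sources. A toy inverted index can sketch it. Each posting records which source a document came from, so one query returns hits from file servers, email and portals together. The source names and document IDs are hypothetical, and a production system would add ranking, security trimming and incremental updates.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class UnifiedIndex:
    """Toy inverted index spanning several content sources (names are hypothetical)."""

    def __init__(self) -> None:
        # term -> set of (source, document id) pairs
        self.postings: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, source: str, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add((source, doc_id))

    def search(self, query: str) -> set[tuple[str, str]]:
        """Return documents containing every query term (Boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results

idx = UnifiedIndex()
idx.add("file-server", "q3/forecast.xls", "Q3 revenue forecast by region")
idx.add("email", "msg-1042", "Re: revenue forecast for Q3")
idx.add("sharepoint", "wiki/onboarding", "New hire onboarding checklist")
print(sorted(idx.search("revenue forecast")))
# [('email', 'msg-1042'), ('file-server', 'q3/forecast.xls')]
```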


Digital Reef provides technology that I call a similarity engine. The similarity concept within search is a powerful one: the ability to identify both exact duplicates and near-duplicate content. Being able to set the level of similarity, whether 10% or 90%, provides a great deal of utility for users. Additionally, finding similarity between different types of content, for example a Word document and a PDF, can be critically important. Or perhaps a table is embedded in a presentation and that same table is in an Excel spreadsheet. Someone could have cut and pasted text from a document into an email.

Finding exact duplicates can be useful, but add near duplicates to this and the implications are potentially even greater. Near duplicates allow you to find the same content in different types of files. You may be able to find different revisions of the same document. There may be information from a document that has been quoted, cut and pasted, and used within any number of other documents.
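Exact duplicates are the easy half of the problem. A common approach, assumed here rather than taken from Digital Reef, is to group files by a cryptographic digest of their bytes. The sketch also shows the limitation this post points to: a file whose text differs even slightly, or the same text saved as Word versus PDF, produces a different digest and is not caught.

```python
import hashlib
from collections import defaultdict

def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose bytes are identical, using SHA-256 digests."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Only digests shared by more than one file indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

files = {
    "a/report.txt": b"Quarterly results",
    "b/report-copy.txt": b"Quarterly results",
    "c/notes.txt": b"Quarterly results, draft",
}
print(group_exact_duplicates(files))  # [['a/report.txt', 'b/report-copy.txt']]
```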

The practical use of similarity matters. Above and beyond the casual quest for content or information, for which people often use general-purpose and consumer search tools, there is real “utility” that can have implications for your business. One such utility is day-to-day workflow: getting better use and productivity out of the content already created within your company. Similarity can help with research in a wide range of fields: medical, legal, science, consumer, business, technology, etc. There is a concept that I call content mining (more on this in my next blog) that could be enabled by similarity technology. And probably the greatest driver for using similarity technology in today’s environment is e-Discovery. As my fellow blogger Daniel Garrie astutely points out: “The onslaught that is sure to come–companies large and small are going to be under legal siege in 2009. Whether it is white collar crime, bankruptcy or wrongful termination, we’ll see surges maybe even a Tsunami of new lawsuits.”

Similarity technology is different from a search engine throwing anything and everything at you based on a keyword. Rather, it is sophisticated technology with intelligent algorithms that scan content and analyze similarity based on user-defined ratios. Additionally, it presents back to you the level of similarity: 100%, 95%, 70%, 60%, and so on.
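The post does not say which algorithms Digital Reef uses. As a stand-in, a widely used near-duplicate technique is word shingling with Jaccard similarity. This sketch scores each document against an example, keeps those at or above a user-defined threshold, and reports the similarity as a percentage. It assumes text has already been extracted from the files, so a Word document and a PDF with the same wording compare as equal. The documents are hypothetical.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Overlapping k-word sequences ("shingles") from normalized text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, as a percentage."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 100.0
    return 100.0 * len(sa & sb) / len(sa | sb)

def similar_documents(example: str, corpus: dict[str, str],
                      threshold: float) -> list[tuple[str, float]]:
    """Documents at or above a user-defined similarity threshold, most similar first."""
    scored = [(name, similarity(example, text)) for name, text in corpus.items()]
    hits = [(name, round(score, 1)) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

example = "the merger agreement was signed on friday by both parties"
corpus = {
    "contract.pdf": "The merger agreement was signed on Friday by both parties.",
    "email-17": "FYI: the merger agreement was signed on Monday by both parties",
    "menu.doc": "Lunch menu for the week",
}
print(similar_documents(example, corpus, threshold=40))
# [('contract.pdf', 100.0), ('email-17', 41.7)]
```

Raising the threshold narrows results toward exact copies. Lowering it surfaces revisions and pasted fragments, which mirrors the 10% to 90% range described above.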

I believe that Digital Reef’s similarity technology can elevate and accelerate access to critical data within the enterprise. It is extremely valuable and unique technology; in fact, I don’t know of anything similar.


Alright, we all know that we have a ton of data and it’s growing and growing. Maybe you are sick of hearing about it. But you should really listen. I liken the growth of data in business to the human body taking on too much weight. We may be able to function for a long time, but eventually there will be serious ramifications if we don’t do the right things to become healthy.

There is an interesting IDC report, published in 2007, that is a bit old but has some compelling information and insight. Let’s break down some of it:

  •  In 2006, the amount of digital information created, captured, and replicated was 161 exabytes, or 161 billion gigabytes. This is about 3 million times the information in all the books ever written. Between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold, from 161 exabytes to 988 exabytes.

My observation:  These numbers illustrate the sheer volume of digital data being created and, further, tell you that we ain’t seen nothing yet.
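The IDC bullet gives 161 EB for 2006 and 988 EB for 2010. The arithmetic behind "more than sixfold" is shown below, along with the implied compound annual growth rate of roughly 57% per year. The annual rate is derived here from IDC's two endpoints. It is not a figure IDC states in the excerpt.

```python
start_eb, end_eb, years = 161, 988, 4  # IDC figures for 2006 and 2010

growth_factor = end_eb / start_eb
annual_growth = growth_factor ** (1 / years) - 1  # compound annual growth rate
gigabytes = start_eb * 10**9                      # 1 EB = 10^9 GB (decimal units)

print(f"{growth_factor:.2f}x overall, {annual_growth:.0%} per year, "
      f"{gigabytes:,} GB in 2006")
# 6.14x overall, 57% per year, 161,000,000,000 GB in 2006
```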

  •  IDC predicts that by 2010, while nearly 70% of the digital universe will be created by individuals, organizations (businesses of all sizes, agencies, governments, associations, etc.) will be responsible for the security, privacy, reliability, and compliance of at least 85% of that same digital universe.

My observation:  The importance of this is that organizations will have to manage data created by their customers and employees – which will have a real business impact.  And IDC left a few things out – accessing the data and protecting it. 

  • The cost of not responding to the avalanche of information can add up, yet not be immediately visible to CEOs and CFOs.

My observation:  This goes back to my unhealthy body analogy – you may not know what vital organ or system is going to collapse – it may be more than one – and you won’t know until something bad happens. 

  • In surveys of U.S. companies, we have found that information workers spend 14.5 hours per week reading and answering email, 13.3 hours creating documents, 9.6 hours searching for information, and 9.5 hours analyzing information.
  • We estimate that an organization employing 1,000 knowledge workers loses $5.7 million annually just in time wasted having to reformat information as they move among applications.  Not finding information costs that same organization an additional $5.3 million a year.

IDC is saying that poor data management can cost an organization with 1,000 knowledge workers $11 million annually, just in wasted user time. That doesn’t take into account other costs, such as outside audits, e-Discovery and litigation. You’ve just been told you have a severe case of diabetes and need to do something about it.
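The $11 million figure is the sum of IDC's two estimates for a 1,000-person organization. The sketch below reproduces that sum, the cost per worker, and the share of the four surveyed weekly activities that goes to searching. All inputs are IDC's numbers quoted above. The per-worker and percentage values are simple derivations.

```python
workers = 1_000
reformatting_loss = 5_700_000  # USD per year, IDC estimate
not_finding_loss = 5_300_000   # USD per year, IDC estimate

total = reformatting_loss + not_finding_loss
per_worker = total / workers

# Hours per week from IDC's survey of U.S. information workers.
weekly_hours = {"email": 14.5, "creating documents": 13.3,
                "searching": 9.6, "analyzing": 9.5}
tracked = sum(weekly_hours.values())
search_share = weekly_hours["searching"] / tracked

print(f"${total:,} per year, ${per_worker:,.0f} per worker")
print(f"searching is {search_share:.0%} of the {tracked:.1f} tracked hours")
# $11,000,000 per year, $11,000 per worker
# searching is 20% of the 46.9 tracked hours
```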

We need greater levels of integration between applications, storage systems and data management tools to turn data into information and then to get us the right information when we need it.  Okay?  Go make it happen 😉

Certainly this is easier said than done. But the ecosystem, customers and the various vendors, must all move toward this objective. We already have better tools to accomplish these tasks, but we have a long way to go before reaching information utopia. The first step is to recognize that there is an issue, a problem, and make it a priority to research and begin to address the unhealthiness and its short- and long-term ramifications.


I’m a senior consultant and founder of the INI Group and am working very closely with Digital Reef as a consultant, advisor and blogger. I’ve been in the high tech industry for over 23 years, focusing on the data management and storage arena, and you can find out more about me at www.contemplatingIT.com. I believe Digital Reef has brought to the table an extremely impressive solution at a critical time, when our unstructured data content is growing to massive levels.

In addition to being an advisor and consultant for Digital Reef, I’m going to be blogging for them on a regular basis, discussing a wide range of topics, from business issues and compelling technology to market dynamics and visions going forward.

Who and what is Digital Reef? It is a startup, an emerging vendor, that came to the right conclusion that Enterprise search is woefully inadequate on multiple levels: the mechanics of making it work efficiently and intelligently in environments with massive amounts of content, and the ability to get relevant data to the user rapidly without drowning them in irrelevant results.

I describe the Digital Reef solution as a data and content management platform leveraging intelligent and scalable search and indexing technologies. Digital Reef provides appliances with a grid architecture that ingests and indexes massive amounts of content spread across heterogeneous storage throughout the Enterprise, creating a global federated index. Some of the biggest challenges with indexing include scalability, transparency and true global federation, and Digital Reef solves all three.
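The post does not describe how Digital Reef's grid distributes work. One common way a grid index scales is to hash-partition documents across nodes and fan each query out to every node, merging the partial answers into one global result. The toy sketch below shows that approach. The node count, routing and class are illustrative assumptions, not Digital Reef's design.

```python
import hashlib

class GridIndex:
    """Toy hash-partitioned index: each node holds one shard; queries fan out to all."""

    def __init__(self, nodes: int) -> None:
        # One {term: set(doc ids)} shard per grid node.
        self.shards: list[dict[str, set[str]]] = [{} for _ in range(nodes)]

    def _node_for(self, doc_id: str) -> int:
        # hashlib gives a stable hash across processes (built-in hash() is randomized),
        # so the same document always lands on the same node.
        digest = hashlib.sha256(doc_id.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.shards)

    def add(self, doc_id: str, text: str) -> None:
        shard = self.shards[self._node_for(doc_id)]
        for term in text.lower().split():
            shard.setdefault(term, set()).add(doc_id)

    def search(self, term: str) -> set[str]:
        # Fan out to every shard and merge the partial results into one answer.
        results: set[str] = set()
        for shard in self.shards:
            results |= shard.get(term.lower(), set())
        return results

grid = GridIndex(nodes=4)
for i in range(100):
    grid.add(f"doc-{i}", "contract" if i % 10 == 0 else "memo")
print(len(grid.search("contract")))  # 10
```

Adding nodes spreads ingest and index storage across more machines. Because every query still visits all shards, the merge step is where a real system would add ranking and fault handling.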

Once you have all of your unstructured data indexed, what are you going to do with it? Another big challenge in managing unstructured data is making order out of chaos. If you just use keyword searches, there will be a large number of irrelevant returns that obscure what you really need.

The problem with keywords is that they carry very little useful context. Digital Reef’s magic ingredient is its similarity engine: the ability to analyze content, including documents, email threads and terms, and return results based on a user-defined similarity ratio. Digital Reef’s similarity engine is sophisticated technology that uses not only keywords but also the associations of terms and the context in which they are used within unstructured data, providing relevant results.
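The post does not disclose how the similarity engine weighs term associations and context. As a much simpler, swapped-in illustration of moving beyond raw keyword matching, TF-IDF cosine similarity ranks documents against an example document. Terms are weighted by how rare they are across the corpus, and results are filtered by a user-defined threshold. The documents and function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Weight each term by its count in a document and its rarity across the corpus."""
    counts = {name: Counter(tokens(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    return {name: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
            for name, c in counts.items()}

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def more_like_this(example: str, docs: dict[str, str],
                   threshold: float) -> list[tuple[str, float]]:
    """Documents whose cosine similarity to `example` meets the threshold, best first."""
    vecs = tfidf_vectors(docs)
    target = vecs[example]
    scored = [(name, cosine(target, vec)) for name, vec in vecs.items() if name != example]
    return sorted((s for s in scored if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)

docs = {
    "brief.doc": "patent infringement claim against widget design",
    "email-3": "widget design patent review meeting",
    "memo.txt": "holiday party planning meeting",
}
print(more_like_this("brief.doc", docs, threshold=0.1))  # only email-3, score about 0.18
```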

Companies are frustrated because information is really three-dimensional, but we are using two-dimensional tools to access and manage it. The first step is to implement solutions that give us rapid access to relevant data for reactive purposes such as discovery, audits, research, customer support, projects, etc. However, think of the potential of also using information to build revenue-generating products and services that leverage existing intellectual property. The potential is compelling and landscape-changing.
