Archive for the ‘Search’ Category

If you are interested in intelligent Enterprise search, I suggest listening to Steve Akers being interviewed by BeyeNetworks – it is a long session but well worth the time.

Steve discusses a number of things, including the challenges faced by Enterprises, how Digital Reef solves these problems, and some customer use cases. There were also two slides that I thought were a great overview of the Digital Reef solution:


  • Automatically identify and index all unstructured data
  • Provide tools to find and understand the data:
      • Boolean searches (freeform, fuzzy, metadata, phrase, proximity)
      • Similarity searches using example files
      • Email thread reconstruction
      • Exact and near duplicate identification
      • Pattern expression recognition
  • Organize the data using automatic classification 


  • Transform files into common file types
  • Collect and move data
  • Manage data retention policies


Designed for Scale and Security

  • Grid-based, distributed architecture provides performance and resiliency
  • Multi-tenant, role-based security model
  • Easily deployed and maintained
  • Indexes and prepares the full content and metadata of up to 10 TB of data in 24 hours with a standard configuration



The recent announcement involving Digital Reef and Microsoft FAST is an important one. At first glance it might be a bit confusing, since both companies provide search capabilities. Digital Reef does provide a search engine, but it also offers capabilities that make its search smarter than other solutions, as well as features that live above the search stack.

As a result, Digital Reef can be a totally turn-key solution, and it can also work with third-party search and indexing engines to protect the investment that customers have already made. If an organization has already indexed tons and tons of content, why go through the process again? In addition to its own indexing capability, Digital Reef will also leverage existing popular indexes – in this case the FAST index – and bring the Digital Reef federation, performance, scalability, archiving and similarity engine to Microsoft FAST and SharePoint customers.

Digital Reef is a search and indexing solution, but it is much more – if it weren’t, why bother building a new company? Digital Reef is Enterprise-class search and an information application with the goal of providing relevant data to its users rapidly and efficiently.


As many of you may know, Microsoft acquired FAST Search and Transfer in 2008 for a sizable sum of money ($1.2 billion). For those of you who aren’t familiar with FAST, it was a fairly successful search and indexing company whose products were used by Enterprise end users. From the outset there seemed to be a real desire to marry FAST with Microsoft SharePoint, and after some twists and turns that is still the strategy – see this Beyond Search blog entry for more insights.

Certainly it makes sense for Microsoft to integrate FAST with SharePoint, but what about the rest of the Enterprise? Yes, Microsoft wants all unstructured content to be stored in SharePoint, but that hasn’t happened yet and may never be fully realized, so it is important for Enterprise search to be heterogeneous.

One thing that many readers may not know is that FAST did a great job partnering with storage vendors, including EMC and HDS, just to name two of the biggest. It is uncertain what will happen to these relationships, given that Microsoft seems to be focused on making FAST work only with Microsoft products.

It appears that Microsoft still doesn’t really know what to do with its consumer or Enterprise search strategy. They have a ton of smart people and lots of resources at their fingertips. However, search software seems to be a perpetual stumbling block.


I was an industry analyst for many years. I focused heavily on storage systems and was convinced that search and storage would eventually be like peanut butter and jelly. Although we have not seen the realization of this yet, I am still convinced that it needs to and will ultimately occur. However, like all things in the data center, it just takes time.

There are practical reasons why storage and search aren’t more tightly bound together. First, search solutions haven’t been scalable or intelligent enough to provide the value that IT professionals are looking for. Second, most search solutions have been associated with specific storage systems and not the entire storage complex. That is very limiting. We need Enterprise search solutions that can access all storage within an organization. The third big issue is that storage administrators haven’t figured out why they need it. There are some applications and use cases that are a priority, such as eDiscovery. But storage admins have not found the killer app that gives them that “aha” moment where they just need to have it and are willing to invest time and money.

What is the killer app for search and storage? I believe one killer app is using a universal search application as a tool to give Enterprise end users greater access to the company’s data. We create so much content, using any number of applications. Instead of looking for data via the various application interfaces, having a single pane of glass to get to any and all content in the Enterprise would provide huge increases in productivity and efficiency.

This concept should not be a leap for most people, but since no one is complaining about it or demanding it, it isn’t a priority. However, if you think about the power of being able to easily access content – data – information – we all know that mountains can be moved when this ability is provided. This is where storage admins have to transcend their nuts-and-bolts view of the world and think about the business and how they can apply technology to elevate the companies they work for. It is “right brain” thinking (creative) versus the “left brain” logical and rational thinking that is typically needed in the data center. Only by combining the creative and the logical can real leaps forward be made.


Digital Reef recently came out of stealth mode and is now talking openly to the press about their solution. I spoke to a few trade press editors about Digital Reef and they wanted to know what made Digital Reef uniquely valuable in a market that seems to have a wide range of solutions for customers to choose from. It is not enough for a vendor to be valuable or unique; it has to be both. If competitors offer the same value then the solution may have no real market traction. If a solution is unique but that singular capability offers no real value then customers will not pay for it.

In the world of high-tech there is often confusion because we use the same terms to mean different things and different terms to mean the same thing. Any vendor may say that they provide Enterprise-class search and indexing and are able to scale and provide rapid access to content for users. Therefore, when Digital Reef states that they are “Enterprise-class”, it is important to distinguish and articulate what makes them uniquely valuable.

The ability to provide Enterprise-class search and indexing requires two very different core competencies. The first requirement is to build a platform – IT infrastructure – to address the needs of the Enterprise. These include massive amounts of content that is stored on heterogeneous storage that is most likely geographically dispersed. How do you index all of the existing content – which consists of hundreds of terabytes and perhaps even petabytes – while new data is created continuously? How long will it take the solution to catch up? Days? Weeks? Months? Years? Ever?

Digital Reef has built a scalable system that works like a grid or cluster – enabling you to add more compute resources to tackle this huge challenge. In other words, they have developed and provide sophisticated infrastructure – applying grow-able grid technology leveraging massive amounts of compute power in a unified fashion to index mountains of content.
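
The catch-up question comes down to simple arithmetic: the backlog shrinks only as fast as the indexing rate exceeds the rate at which new data arrives. Here is a rough Python sketch of that reasoning. The 10 TB per day figure is the one quoted for a standard configuration in the slide list at the top of this page; the 500 TB backlog, the 2 TB of new data per day, and the idea that throughput scales linearly as configurations are added to the grid are all my own assumptions for illustration, not Digital Reef numbers.

```python
import math


def days_to_catch_up(backlog_tb, index_rate_tb_per_day, growth_tb_per_day):
    """Days until the index covers the backlog plus all data created meanwhile.

    Returns math.inf when new data arrives at least as fast as it is indexed,
    which is the "Ever?" case.
    """
    net_rate = index_rate_tb_per_day - growth_tb_per_day
    if net_rate <= 0:
        return math.inf
    return backlog_tb / net_rate


# Hypothetical enterprise: 500 TB already stored, 2 TB of new data per day.
BACKLOG_TB = 500
GROWTH_TB_PER_DAY = 2
# Quoted elsewhere on this page for one standard configuration.
TB_PER_DAY_PER_CONFIGURATION = 10

for configurations in (1, 2, 4, 8):
    # Assumption: throughput scales linearly with the number of configurations.
    rate = TB_PER_DAY_PER_CONFIGURATION * configurations
    days = days_to_catch_up(BACKLOG_TB, rate, GROWTH_TB_PER_DAY)
    print(f"{configurations} configuration(s): {days:.1f} days to catch up")
```

If the indexing rate ever drops to the growth rate or below, the function returns infinity: the solution never catches up, no matter how long you wait. That is why being able to add compute to the grid matters so much.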

The other core competency is to quickly access relevant content. Digital Reef provides this through keyword search and their unique similarity engine – I discussed this in greater depth in my last blog, The Power of Similarity. Their search capability enables you to get results based on context. Consider the sentence “I’m feeling blue”. It has nothing to do with the actual color, but a pure keyword search would return results swimming with references to the color blue, including paints, fabrics, the sky, the ocean, etc.

Digital Reef excels when looking for abstract concepts, metaphors, idioms, specifics, vertical terminology, and word associations. And the magic of all of this is mathematics – complex, reasoned, considered and sophisticated algorithms.

It is the combination of their scalable clustered architecture and similarity engine that makes Digital Reef uniquely valuable.


Digital Reef provides technology that I call a similarity engine. The similarity concept within search is a powerful one: the ability to determine both exact duplicates and near-duplicate content. Being able to set the level of similarity, whether it is 10% or 90% similar, provides a great deal of utility for the users. Additionally, finding similarity between different types of content (for example, a Word document and a PDF) can be critically important. Or perhaps there is a table embedded in a presentation and that same table is in an Excel spreadsheet. Someone could have cut and pasted text from a document and put it in an email.

Finding exact duplicates can be useful, but add near duplicates and there are potentially even greater implications. Near duplicates allow you to find the same content in different types of files. Perhaps you can find different revisions of the same document. There may be information from a document that has been quoted, cut and pasted, and used within any number of other documents.

The practical use of similarity is important. Above and beyond the casual quest for content or information, for which people often use general-purpose and consumer search tools, there is real “utility” that can have implications for your business. One such utility is day-to-day workflow – getting better use and productivity out of the content already created within your company. Similarity can help with research that has a wide range of applicability – medical, legal, science, consumer, business, technology, etc. There is a concept that I call content mining (more on this in my next blog) that could be enabled by similarity technology. And probably the greatest driver for using similarity technology in today’s environment is e-Discovery. As my fellow blogger Daniel Garrie astutely points out: “The onslaught that is sure to come–companies large and small are going to be under legal siege in 2009. Whether it is white collar crime, bankruptcy or wrongful termination, we’ll see surges maybe even a Tsunami of new lawsuits.”

Similarity technology is different from a search engine throwing anything and everything at you based on a keyword. Rather, it is sophisticated technology with intelligent algorithms that scans content and analyzes similarity based on ratios that are user defined. Additionally, it will present back to you the level of similarity – 100%, 95%, 70%, 60%, and so on.
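
To make the idea of a user-defined similarity ratio concrete, here is a minimal Python sketch. I do not know what algorithms Digital Reef actually uses, so this swaps in a common textbook technique: break each document into overlapping three-word “shingles” and score two documents by the Jaccard ratio of shared shingles to total shingles. The function names, the sample documents and the 50% threshold are all made up for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def find_similar(query, documents, threshold):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = ((name, similarity(query, text)) for name, text in documents.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical content, already extracted to plain text from each file type.
documents = {
    "contract_v1.doc": "the supplier shall deliver the goods within thirty days of the order",
    "contract_v2.pdf": "the supplier shall deliver the goods within sixty days of the order",
    "newsletter.txt": "join us for the company picnic on friday afternoon",
}

query = documents["contract_v1.doc"]
for name, score in find_similar(query, documents, threshold=0.5):
    print(f"{name}: {score:.0%} similar")
```

With the threshold at 50%, the exact duplicate and the revised contract both come back with their scores, while the unrelated newsletter is left out. Raise the threshold toward 100% and only exact duplicates survive. A real system would need something far more scalable than comparing every pair of documents, but the user-facing idea is the same.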

I believe that Digital Reef’s similarity technology can elevate and accelerate the access of critical data within the enterprise.  It is extremely valuable and unique technology – in fact – I don’t know of anything similar.


Patternicity – defined by Michael Shermer, a writer for Scientific American – is the tendency to find meaningful patterns in meaningless noise. When I read this article on Patternicity I immediately related it to the challenges we face with information access.

Patternicity deals with false positives, and we have a comparable problem with search tools: too many responses that may or may not be what we are looking for. Human Patternicity is meant to err on the side of caution because, as Shermer points out, “the cost of believing that the rustle in the grass is a dangerous predator when it is just the wind is relatively low compared with the opposite. Thus, there would have been a beneficial selection for believing that most patterns are real.”

Digital Patternicity is also meant to err on the side of caution because the cost of believing that the keyword matches your intentions is relatively low compared with returning a false negative.  Therefore returning a false positive is better than returning a false negative. 
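
The trade-off between false positives and false negatives is usually measured with two numbers: precision (how much of what came back was wanted) and recall (how much of what was wanted came back). Here is a small Python sketch with made-up document IDs. The “cautious” and “strict” result sets are hypothetical and are not output from any real search engine.

```python
def precision_recall(returned, relevant):
    """Compare the documents a search returned with the truly relevant ones."""
    returned, relevant = set(returned), set(relevant)
    true_positives = returned & relevant
    false_positives = returned - relevant  # returned but not wanted
    false_negatives = relevant - returned  # wanted but never returned
    # By convention, an empty result set or empty relevant set scores 1.0.
    precision = len(true_positives) / len(returned) if returned else 1.0
    recall = len(true_positives) / len(relevant) if relevant else 1.0
    return precision, recall, false_positives, false_negatives


# Hypothetical example: only three documents are what the user really wants.
relevant = {"d1", "d2", "d3"}

# A cautious keyword match returns everything that might be relevant.
cautious = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9"}
# A strict match returns little noise but misses one wanted document.
strict = {"d1", "d2"}

for label, results in (("cautious", cautious), ("strict", strict)):
    precision, recall, fp, fn = precision_recall(results, relevant)
    print(f"{label}: precision {precision:.0%}, recall {recall:.0%}, "
          f"{len(fp)} false positives, {len(fn)} false negatives")
```

The cautious search never misses a wanted document, but two thirds of what it returns is noise. The strict search returns no noise, but it silently drops a document the user needed.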

The problem in both Human and Digital Patternicity is that the algorithms are limited and have stopped evolving because they don’t need to improve.   Human beings are very successful and don’t require more sophisticated methods for returning fewer false positives.  Likewise, search companies like Google are very successful and have built a huge business in spite of the number of false positives they return. 

However, increasingly within the world of business – where information equates to revenue, competitive advantage and market growth – there is a big price to pay with false positives and a shift in the evolution of Digital Patternicity must occur.  There will always be a place for acceptable false positives in the mass market – but when you get to specialization, when the stakes become too high, when survival is at risk – then evolution aggressively adapts.
