Archive for the ‘Search & Indexing’ Category

As many of you may know, Microsoft acquired FAST Search and Transfer in 2008 for a sizable sum of money ($1.2 billion). For those of you who aren’t familiar with FAST, it was a fairly successful search and indexing company whose products were used by Enterprise customers. From the outset there seemed to be a real desire to marry FAST with Microsoft SharePoint, and after some twists and turns that is still the strategy – see this Beyond Search blog entry for more insights.

Certainly it makes sense for Microsoft to integrate FAST with SharePoint, but what about the rest of the Enterprise? Yes, Microsoft wants all unstructured content to be stored in SharePoint, but since that hasn’t happened yet and may never be fully realized, it is important for Enterprise search to be heterogeneous.

One thing that many readers may not know is that FAST did a great job partnering with storage vendors, including EMC and HDS, just to name two of the biggest. It is uncertain what will happen to these relationships, given that Microsoft seems to be focused on making FAST work only with Microsoft products.

It appears that Microsoft still doesn’t really know what to do with its consumer or Enterprise search strategy. They have a ton of smart people, lots of resources at their fingertips, and so on. However, search software seems to be a perpetual stumbling block.


I was an industry analyst for many years. I focused heavily on storage systems and was convinced that search and storage would eventually go together like peanut butter and jelly. Although we have not seen this happen yet, I am still convinced that it needs to and ultimately will. However, like all things in the data center, it just takes time.

There are practical reasons why storage and search aren’t more tightly bound together. First, search solutions haven’t been scalable or intelligent enough to provide the value that IT professionals are looking for. Second, most search solutions have been associated with specific storage systems and not the entire storage complex. That is very limiting. We need Enterprise search solutions that can access all storage within an organization. The third big issue is that storage administrators haven’t figured out why they need it. There are some applications and use cases that are a priority, such as eDiscovery. But storage admins have not found the killer app that gives them that “aha” moment where they just need to have it and are willing to invest time and money.

What is the killer app for search and storage? I believe one killer app is using a universal search application as a tool to give Enterprise end users greater access to the company’s data. We create so much content, using any number of applications. Instead of looking for data through each application’s own interface, having a single pane of glass to get to any and all content in the Enterprise would provide huge increases in productivity and efficiency.

This concept should not be a leap for most people, but since no one is complaining about it or demanding it, it isn’t a priority. However, if you think about the power of being able to easily access content, data and information, we all know that mountains can be moved when this ability is provided. This is where storage admins have to transcend their nuts-and-bolts view of the world and think about the business and how they can apply technology to elevate the companies they work for. It is “right brain” (creative) thinking versus the “left brain” logical and rational thinking that is typically needed in the data center. Only by combining the creative and the logical can real leaps forward be made.


Digital Reef recently came out of stealth mode and is now talking openly to the press about their solution. I spoke to a few trade press editors about Digital Reef and they wanted to know what made Digital Reef uniquely valuable in a market that seems to have a wide range of solutions for customers to choose from. It is not enough for a vendor to be only valuable or only unique. If competitors offer the same value, then the solution may gain no real market traction. If a solution is unique but that singular capability offers no real value, then customers will not pay for it.

In the world of high-tech there is often confusion because we use the same terms to mean different things and different terms to mean the same thing. For example, XYZ vendor may say they provide Enterprise-class search and indexing and are able to scale and provide rapid access to content for users. So when Digital Reef states that they are “Enterprise-class”, it is important to distinguish and articulate what makes them uniquely valuable.

The ability to provide Enterprise-class search and indexing requires two very different core competencies. The first requirement is to build a platform – IT infrastructure – to address the needs of the Enterprise. These include massive amounts of content that is stored on heterogeneous storage that is most likely geographically dispersed. How do you index all of the existing content – which consists of hundreds of terabytes and perhaps even petabytes – while new data is created continuously? How long will it take the solution to catch up? Days? Weeks? Months? Years? Ever?

Digital Reef has built a scalable system that works like a grid or cluster, enabling you to add more compute resources to tackle this huge challenge. In other words, they have developed sophisticated infrastructure: expandable grid technology that applies massive amounts of compute power in a unified fashion to index mountains of content.
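To put rough numbers on the "how long will it take to catch up?" question, here is a minimal back-of-the-envelope sketch. It is not Digital Reef's sizing model. The function name and every figure in it (backlog size, daily data growth, per-node indexing rate) are hypothetical, and it assumes indexing throughput scales linearly with the number of nodes, which real clusters only approximate.

```python
import math

def days_to_catch_up(backlog_tb, new_tb_per_day, tb_per_node_per_day, nodes):
    """Estimate days until the index covers all content.

    Assumes (hypothetically) that throughput scales linearly with node count.
    Returns math.inf when indexing cannot outpace new data, i.e. "never".
    """
    throughput = tb_per_node_per_day * nodes
    net_progress = throughput - new_tb_per_day  # backlog reduction per day
    if net_progress <= 0:
        return math.inf
    return backlog_tb / net_progress

# Made-up scenario: 500 TB backlog, 2 TB of new data per day,
# each node indexes 1 TB per day.
for nodes in (2, 4, 8, 16):
    print(nodes, "nodes:", days_to_catch_up(500, 2, 1, nodes), "days")
```

With these made-up numbers, two nodes never catch up because they only keep pace with new data, while four nodes need 250 days. That is the practical argument for an architecture that lets you keep adding compute.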

The other core competency is to quickly access relevant content. Digital Reef provides this through keyword search and their unique similarity engine, which I discussed in greater depth in my last blog, The Power of Similarity. Their search capability enables you to get results based on context. Consider the sentence “I’m feeling blue”. It has nothing to do with the actual color, but a pure keyword search on “blue” would be swimming with content that includes a myriad of references to the color blue: paints, fabrics, the sky, the ocean, etc.
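The keyword problem is easy to reproduce. The sketch below builds a tiny inverted index over four invented documents and runs a pure keyword search for "blue". The documents, function names and tokenizer are all assumptions for illustration and have nothing to do with any vendor's implementation.

```python
import re
from collections import defaultdict

# Invented sample documents for illustration only.
DOCS = {
    "d1": "I'm feeling blue after the layoffs were announced",
    "d2": "Blue paint samples for the lobby walls",
    "d3": "The fabric comes in navy blue and sky blue",
    "d4": "Morale is low and the team seems sad this quarter",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))

index = build_index(DOCS)
print(sorted(keyword_search(index, "blue")))
```

The search returns d1, d2 and d3. Only d1 is about mood; the paint and fabric documents are noise, and d4, which is about the same mood without using the word, is missed entirely.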

Digital Reef excels when looking for abstract concepts, metaphors, idioms, specifics, vertical terminology, and word associations. And the magic of all of this is mathematics – complex, reasoned, considered and sophisticated algorithms.

It is the combination of their scalable clustered architecture and similarity engine that makes Digital Reef uniquely valuable.


Alright, we all know that we have a ton of data and it’s growing and growing. And maybe you are sick of hearing about it. But you should really listen. I liken the growth of data in business to the human body taking on too much weight. We may be able to function for a long time, but eventually there will be serious ramifications if we don’t do the right things to become healthy.

There is an interesting IDC report that was published in 2007. It is a bit old, but it has some compelling information and insight. Let’s break down some of it:

  •  In 2006, the amount of digital information created, captured, and replicated was 161 exabytes or 161 billion gigabytes. This is about 3 million times the information in all the books ever written. Between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold from 161 exabytes to 988 exabytes.

My observation:  These numbers illustrate the sheer volume of digital data that is being created, and they tell you that we ain’t seen nothing yet.
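For a sense of what "more than sixfold in four years" means year over year, here is a quick calculation using only the two IDC figures quoted above. The one assumption, which is mine and not IDC's, is that growth compounds at a steady rate between 2006 and 2010.

```python
# IDC figures quoted above: exabytes added per year in 2006 and 2010.
start_eb = 161
end_eb = 988
years = 2010 - 2006

growth_factor = end_eb / start_eb
# Assumption: steady compound growth between the two data points.
annual_rate = growth_factor ** (1 / years) - 1

print(f"{growth_factor:.2f}x overall, roughly {annual_rate:.0%} per year")
```

Under that assumption, the amount of new data would have to grow by well over half every single year to reach IDC's 2010 number.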

  •  IDC predicts that by 2010, while nearly 70% of the digital universe will be created by individuals, organizations (businesses of all sizes, agencies, governments, associations, etc.) will be responsible for the security, privacy, reliability, and compliance of at least 85% of that same digital universe.

My observation:  The importance of this is that organizations will have to manage data created by their customers and employees – which will have a real business impact.  And IDC left a few things out – accessing the data and protecting it. 

  • The cost of not responding to the avalanche of information can add up, yet not be immediately visible to CEOs and CFOs.

My observation:  This goes back to my unhealthy body analogy – you may not know what vital organ or system is going to collapse – it may be more than one – and you won’t know until something bad happens. 

  • In surveys of U.S. companies, we have found that information workers spend 14.5 hours per week reading and answering email, 13.3 hours creating documents, 9.6 hours searching for information, and 9.5 hours analyzing information.
  • We estimate that an organization employing 1,000 knowledge workers loses $5.7 million annually just in time wasted having to reformat information as they move among applications.  Not finding information costs that same organization an additional $5.3 million a year.

IDC is saying that poor data management can cost an organization with 1,000 knowledge workers $11 million annually just based on users wasting time. That doesn’t take into account other costs such as outside audits, e-Discovery, litigation, etc. You’ve just been told you have a severe case of diabetes and need to do something about it.
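The arithmetic behind that figure is simple enough to redo for your own headcount. The sketch below adds the two IDC estimates quoted above and scales them per worker. The function name is hypothetical, and scaling the estimate linearly to other organization sizes is my assumption, not something the IDC report states.

```python
# IDC estimates quoted above, for an organization of 1,000 knowledge workers.
REFORMATTING_COST = 5_700_000   # time wasted reformatting information
NOT_FINDING_COST = 5_300_000    # cost of not finding information
IDC_WORKERS = 1_000

TOTAL_COST = REFORMATTING_COST + NOT_FINDING_COST
COST_PER_WORKER = TOTAL_COST // IDC_WORKERS

def estimated_annual_cost(workers):
    """Scale the IDC estimate to another headcount (linear scaling is an assumption)."""
    return COST_PER_WORKER * workers

print(f"Total: ${TOTAL_COST:,} per year, ${COST_PER_WORKER:,} per worker")
print(f"250 workers: ${estimated_annual_cost(250):,} per year")
```

That works out to $11,000 per knowledge worker per year, before any of the other costs are counted.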

We need greater levels of integration between applications, storage systems and data management tools to turn data into information and then to get us the right information when we need it.  Okay?  Go make it happen 😉

Certainly this is easier said than done.  But the ecosystem – customers and the various vendors – must all move towards this objective.  We already have better tools to accomplish these tasks but we have a long way to go before reaching information utopia.  The first step is to recognize that there is an issue – a problem – and make it a priority to research and begin to address the unhealthiness and the short and long term ramifications.


I’m a senior consultant and founder of the INI Group and am working very closely with Digital Reef as a consultant, advisor and blogger. I’ve been in the high tech industry for over 23 years with a focus on the data management and storage arena, and you can find out more about me at www.contemplatingIT.com. I believe Digital Reef has brought to the table an extremely impressive solution at a critical time when our unstructured data content is growing to massive levels.

In addition to being an advisor and consultant for Digital Reef, I’m going to be blogging for them on a regular basis, discussing a wide range of topics from business issues and compelling technology to market dynamics and visions going forward.

Who and what is Digital Reef? They are a startup, an emerging vendor, that came to the right conclusion that Enterprise search is woefully inadequate on multiple levels: the mechanics of making it work efficiently and intelligently in environments with massive amounts of content, and the ability to get relevant data to the user rapidly without drowning them in irrelevant results.

I describe the Digital Reef solution as a data and content management platform leveraging intelligent and scalable search and indexing technologies. Digital Reef provides appliances with a grid architecture that ingests and indexes massive amounts of content spread across heterogeneous storage throughout the Enterprise creating a global federated index. Some of the biggest challenges with indexing include scalability, transparency and true global federation – and Digital Reef solves all three.

Once you have all of your unstructured data indexed – what are you going to do with it? Another big challenge with management of unstructured data is making order out of chaos. If you just use keyword searches there will be a large number of irrelevant returns that obscure what you really need.

The problem with keywords is that there is very little useful context. Digital Reef’s magic ingredient is its similarity engine: the ability to analyze content, including documents, email threads and terms, and return results based on a user-defined similarity ratio. Digital Reef’s similarity engine is sophisticated technology that uses not only keywords but also the associations of terms and the context in which they are used within unstructured data, providing relevant results.
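To make the idea of a similarity ratio concrete, here is a minimal sketch using plain bag-of-words cosine similarity with a user-chosen threshold. This is a textbook technique that I am swapping in for illustration. It is not Digital Reef's engine, and it does not capture term associations or context the way their technology is described as doing. All function names and sample text are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_documents(example_text, corpus, threshold):
    """Return (doc_id, score) pairs scoring at or above the threshold, best first."""
    query = vectorize(example_text)
    scored = [(doc_id, cosine(query, vectorize(text)))
              for doc_id, text in corpus.items()]
    matches = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Invented sample corpus for illustration only.
corpus = {
    "a": "contract renewal terms for the storage vendor",
    "b": "vendor contract terms and renewal dates",
    "c": "lunch menu for the cafeteria",
}
for doc_id, score in similar_documents("storage vendor contract renewal", corpus, 0.4):
    print(doc_id, round(score, 2))
```

Raising the threshold narrows the results to near-duplicates, while lowering it casts a wider net. That is the kind of dial a user-defined similarity ratio gives you, even in a toy version like this one.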

Companies are frustrated because information is really three-dimensional but we are using two-dimensional tools to access and manage it. The first step is to implement solutions that give us rapid access to relevant data for reactive purposes such as a discovery process, audits, research, customer support, projects, etc. However, think of the potential of really using information to also build revenue-generating products and services that leverage existing intellectual property. The potential is compelling and landscape-changing.
