Digital Reef provides technology that I call a similarity engine. The similarity concept within search is a powerful one: the ability to identify both exact duplicates and near-duplicate content. Being able to set the level of similarity, whether it is 10% or 90%, provides a great deal of utility for users. Additionally, finding similarity between different types of content, for example a Word document and a PDF, can be critically important. Perhaps there is a table embedded in a presentation and that same table is in an Excel spreadsheet. Or someone could have cut and pasted text from a document and put it in an email.
Finding exact duplicates can be useful, but add near duplicates and the implications are potentially even greater. Near duplicates allow you to find the same content in different types of files. You may find different revisions of the same document. There may be information from a document that has been quoted, cut and pasted, and used within any number of other documents.
The practical use of similarity is important. Above and beyond the casual quest for content or information, for which people often use general-purpose and consumer search tools, there is real “utility” that can have implications for your business. One such utility is day-to-day workflow: getting better use and productivity out of the content already created within your company. Similarity can help with research across a wide range of fields – medical, legal, science, consumer, business, technology, etc. There is a concept that I call content mining (more on this in my next blog post) that could be enabled by similarity technology. And probably the greatest driver for using similarity technology in today’s environment is e-Discovery. As my fellow blogger Daniel Garrie astutely points out, “The onslaught that is sure to come–companies large and small are going to be under legal siege in 2009. Whether it is white collar crime, bankruptcy or wrongful termination, we’ll see surges maybe even a Tsunami of new lawsuits.”
Similarity technology is different from a search engine throwing anything and everything at you based on a keyword. Rather, it is sophisticated technology with intelligent algorithms that scan content and analyze similarity based on user-defined ratios. Additionally, it reports back the level of similarity for each result – 100%, 95%, 70%, 60%, and so on.
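To make the idea of a user-defined similarity ratio a little more concrete, here is a minimal sketch in Python. It is not Digital Reef’s algorithm, which this post does not describe; it assumes one common textbook approach (overlapping word “shingles” compared with Jaccard similarity), and the function names and the sample threshold are purely illustrative. It also assumes the text has already been extracted from the Word, PDF, or email files being compared.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two texts, from 0.0 (nothing shared) to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


def find_similar(query, documents, threshold=0.7, k=3):
    """Return (name, score) pairs at or above the user-chosen threshold,
    with the most similar documents first."""
    scored = [(name, similarity(query, text, k))
              for name, text in documents.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

With a threshold of 1.0 this sketch returns only documents whose extracted text matches exactly; lowering the threshold pulls in revisions and documents that quote or paste from the original, each with its score attached.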
I believe that Digital Reef’s similarity technology can elevate and accelerate access to critical data within the enterprise. It is extremely valuable and unique technology. In fact, I don’t know of anything similar.