I came across an article on Digital Reef that I think has some excellent points.
Here is an excerpt:
“…it turns out that Digital Reef has built something fairly new and interesting, a “similarity search engine” for big corporate networks that can start with one document—say, a Word or Excel file—and find others that resemble it.
That could be very useful if, for instance, you were a compliance officer at a big health plan and you wanted to see whether any of your employees had unsecured patient records sitting around on their laptop hard drives (which would be a big violation of federal healthcare privacy regulations). Just plop an example of a patient record into the Digital Reef system, and it will scour the network for other examples. Or say you were a lawyer at a big firm writing a brief in an employment case and you wanted to find out whether any of your colleagues working on similar cases in the past had already assembled the relevant citations. You could simply submit your entire draft to Digital Reef, and see what washed up.”
The author, Wade Roush, does an excellent job of explaining the value of the Digital Reef similarity engine. He cites two examples, but the possibilities are numerous. The power of similarity helps with discovery, compliance, efficient workflow, research, analysis, and more.
Digital Reef provides technology that I call a similarity engine. The similarity concept within search is a powerful one: the ability to identify both exact duplicates and near-duplicate content. Being able to set the level of similarity, whether 10% or 90%, gives users a great deal of utility. Additionally, finding similarity between different types of content, for example a Word document and a PDF, can be critically important. Perhaps there is a table embedded in a presentation and that same table is in an Excel spreadsheet. Or someone could have cut and pasted text from a document and put it in an email.
Finding exact duplicates can be useful, but add near duplicates and the implications are potentially even greater. Near duplicates allow you to find the same content in different types of files. You may be able to find different revisions of the same document. Or information from a document may have been quoted, cut and pasted, and reused within any number of other documents.
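To make the exact versus near duplicate distinction concrete, here is a minimal sketch in Python. It is not Digital Reef's algorithm, which I have not seen published; it assumes one common textbook approach, comparing overlapping three-word "shingles" with Jaccard similarity. The function names and sample sentences are hypothetical.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Texts shorter than one shingle are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts count as identical
    return len(a & b) / len(a | b)


# Hypothetical sample content: an original, a light revision, and an unrelated text.
original = "The patient was admitted on Monday and discharged on Friday."
revision = "The patient was admitted on Monday and discharged on Saturday."
unrelated = "Quarterly revenue grew faster than the board had expected."

exact = jaccard_similarity(original, original)   # exact duplicate
near = jaccard_similarity(original, revision)    # near duplicate
far = jaccard_similarity(original, unrelated)    # no shared phrasing

for label, score in [("exact", exact), ("near", near), ("unrelated", far)]:
    print(f"{label}: {score:.0%}")
```

The one-word revision still shares most of its shingles with the original, so it scores high without being identical. That is the kind of signal that lets revisions and cut-and-paste reuse surface.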
The practical use of similarity is important. Above and beyond the casual quest for content or information, for which people often use general-purpose and consumer search tools, there is real “utility” that can have implications for your business. One such utility is day-to-day workflow: getting better use and productivity out of the content already created within your company. Similarity can help with research across a wide range of fields: medical, legal, science, consumer, business, technology, etc. There is a concept that I call content mining (more on this in my next blog) that could be enabled by similarity technology. And probably the greatest driver for using similarity technology in today’s environment is e-Discovery. As my fellow blogger Daniel Garrie astutely points out, “The onslaught that is sure to come–companies large and small are going to be under legal siege in 2009. Whether it is white collar crime, bankruptcy or wrongful termination, we’ll see surges maybe even a Tsunami of new lawsuits.”
Similarity technology is different from a search engine throwing anything and everything at you based on a keyword. Rather, it is sophisticated technology with intelligent algorithms that scan content and analyze similarity based on user-defined ratios. Additionally, it presents the level of similarity back to you: 100%, 95%, 70%, 60%, and so on.
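The idea of a user-defined ratio with a reported similarity level can be sketched in a few lines of Python. This is an illustration under my own assumptions, not a description of how Digital Reef works: it scores documents with cosine similarity over simple term counts, and `find_similar`, `min_similarity`, and the sample file names are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count how often each lowercase term occurs in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two term-count vectors, from 0.0 to 1.0."""
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query, documents, min_similarity=0.6):
    """Return (name, score) pairs at or above the threshold, best match first."""
    query_vec = term_vector(query)
    scored = [
        (name, cosine_similarity(query_vec, term_vector(text)))
        for name, text in documents.items()
    ]
    matches = [(name, score) for name, score in scored if score >= min_similarity]
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Hypothetical corpus for illustration only.
query = "employee termination agreement draft"
documents = {
    "copy.txt": "employee termination agreement draft",
    "revised.txt": "employee termination agreement final draft",
    "memo.txt": "lunch menu for the cafeteria",
}

for name, score in find_similar(query, documents, min_similarity=0.6):
    print(f"{name}: {score:.0%} similar")
```

Raising `min_similarity` toward 1.0 narrows the results to near-exact copies, while lowering it casts a wider net. That is the trade-off the user controls when setting the ratio.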
I believe that Digital Reef’s similarity technology can elevate and accelerate access to critical data within the enterprise. It is extremely valuable and unique technology. In fact, I don’t know of anything similar.
January is always a time for reflecting on the previous year, but picking the top judicial cases for electronic discovery is a complex and perhaps insurmountable task.
My key picks for 2008 are just a small sampling–the very tip of a giant legal-case iceberg. It would be great to hear your thoughts on the cases and on my views:
- Common Sense and Respect for the Courts and Opposing Counsel
Qualcomm Inc. v. Broadcom is a case that lays out why counsel must apply common sense when responding to an e-discovery order and be mindful that a court does not consider digital incompetency a defense. The Qualcomm dispute was about patent issues; the discovery issues came about when Qualcomm had the audacity to both lie about and hide 21 emails, as well as over 46,000 emails with attachments, totaling 20,000-plus pages of relevant evidence (the legal term for relevant information). The key lesson here is simple: do not lie and hide stuff, and if you do and get caught, get ready to be punished. While the elegance of the adversarial litigation system certainly does not require litigants to hold hands and sing songs around a campfire, the other extreme, akin to stealing a million dollars to test the security measures of a bank, is simply not a tenable position for a company.
- Hiding It Will Cost You Plenty & They Will Find It
The Qualcomm case provides tons of fodder for this critical piece of information. The electronic age leaves copies everywhere and anywhere, making it a challenge to eliminate 100% of the traces of information being created, sent, or received. Investigators with the right tools can pick from a basket of sources, such as the sender’s or recipient’s personal machine, external storage devices, company or third-party servers, internal or external databases, and so on. The best course of action is for a company’s legal and technology teams to be transparent in their execution of a preservation order and/or delivery of an enterprise records management solution. Judge Grimm in Mancia v. Mayflower Textile Services Co., Civ. No. 1:08-CV-00273-CCB (D. Md. October 15, 2008) lays out why playing ‘hide the information’ and not playing nice in the sandbox is bad for your client; more importantly, the other side eventually found the information. Companies should collaborate with one another to define process and, more important, make sure that the document retention schedules and litigation holds that exist on paper are executed operationally.
- Finding It Requires A Bit More than Just Common Sense
Victor Stanley Inc. v. Creative Pipe is a copyright infringement case whose relevance to electronic discovery is that keywords can be a game changer. Judge Grimm in Victor Stanley provides some great ESI 101 concepts that litigators, and attorneys as a whole, should have some operational knowledge of. In Victor Stanley, the defendants relinquished their attorney-client and work product privileges to 165 ESI files because they screwed up electronic search and review in the production dance. Judge Grimm’s opinion certainly implies that lawyers utilizing keyword searches alone are in for a big shock. Lawyers, I suggest you reach out to your technologists and collaborate on search strategy.
In 2009, it is critical that lawyers and technologists realize that finding information in an enterprise requires more than just some well-crafted keywords; it requires a comprehensive strategy.
The onslaught that is sure to come–companies large and small are going to be under legal siege in 2009. Whether it is white collar crime, bankruptcy or wrongful termination, we’ll see surges maybe even a Tsunami of new lawsuits. The current mentality, “if you build it they will sue,” is sure to provide fodder and, dare I say, job security for a great many of my legal and technology peers.
The concept of preparation, while foreign to many companies, is critical because of its many possible benefits: saving money, reducing negative press, avoiding a drop in share price, meeting compliance requirements, and achieving better legal outcomes. This concept of preparation is analogous to the idea of not lending money to people who cannot pay it back. Just as failure to follow this seemingly simple construct has placed many people and companies in an untenable position, failure to prepare for electronic discovery will leave many marquee companies in a high-risk, business-damaging position, or worse.
Posted in Compliance, Corporate Governance, Data Management, eDiscovery, IT Business, Network storage, Search, Search & Indexing, Storage, tagged Data Management, Digital Reef, Indexing, Network Storage, Search, Tony Asaro on January 21, 2009
I’m a senior consultant and founder of the INI Group, and I am working very closely with Digital Reef as a consultant, advisor and blogger. I’ve been in the high-tech industry for over 23 years with a focus on the data management and storage arena, and you can find out more about me at www.contemplatingIT.com. I believe Digital Reef has brought to the table an extremely impressive solution at a critical time, when our unstructured data content is growing to massive levels.
In addition to being an advisor and consultant for Digital Reef, I’m going to be blogging for them on a regular basis, discussing a wide range of topics including business issues, compelling technology, market dynamics and visions going forward.
Who and what is Digital Reef? They are a startup, an emerging vendor, that came to the right conclusion: enterprise search is woefully inadequate on multiple levels. These include the mechanics of making it work efficiently and intelligently in environments with massive amounts of content, and the ability to get relevant data to the user rapidly without drowning them in irrelevant results.
I describe the Digital Reef solution as a data and content management platform leveraging intelligent and scalable search and indexing technologies. Digital Reef provides appliances with a grid architecture that ingests and indexes massive amounts of content spread across heterogeneous storage throughout the Enterprise creating a global federated index. Some of the biggest challenges with indexing include scalability, transparency and true global federation – and Digital Reef solves all three.
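For readers who want a feel for what a federated index means in practice, here is a toy sketch in Python. It is purely illustrative and assumes nothing about Digital Reef's appliances or grid architecture: each storage source builds its own inverted index, and the per-source indexes are merged into one global lookup. The source names, file names, and function names are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Build an inverted index (term -> set of document names) for one source."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[term].add(name)
    return index


def federate(source_indexes):
    """Merge per-source indexes into one index of term -> {(source, document)}."""
    global_index = defaultdict(set)
    for source, index in source_indexes.items():
        for term, docs in index.items():
            global_index[term].update((source, doc) for doc in docs)
    return global_index


def search(global_index, term):
    """Return every (source, document) pair containing the term, sorted."""
    return sorted(global_index.get(term.lower(), set()))


# Two hypothetical storage sources holding different kinds of content.
file_server = {
    "q4_report.doc": "Quarterly revenue report",
    "notes.txt": "Meeting notes",
}
mail_archive = {"msg_001.eml": "Please review the revenue report"}

global_index = federate({
    "file_server": build_index(file_server),
    "mail_archive": build_index(mail_archive),
})

hits = search(global_index, "revenue")
print(hits)
```

A single query against the merged index returns hits from both sources without the user needing to know where each document lives. Scaling this to massive amounts of content is, of course, the hard part that a toy like this glosses over.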
Once you have all of your unstructured data indexed – what are you going to do with it? Another big challenge with management of unstructured data is making order out of chaos. If you just use keyword searches there will be a large number of irrelevant returns that obscure what you really need.
The problem with keywords is that they carry very little useful context. Digital Reef’s magic ingredient is its similarity engine: the ability to analyze content, including documents, email threads and terms, and return results based on a user-defined similarity ratio. Digital Reef’s similarity engine is sophisticated technology that uses not only keywords but also the associations of terms and the context in which they are used within unstructured data, providing relevant results.
Companies are frustrated because information is really three-dimensional but we are using two-dimensional tools to access and manage it. The first step is to implement solutions that give us rapid access to relevant data for reactive purposes such as a discovery process, audits, research, customer support, projects, etc. However, think of the potential of really using information to also build revenue-generating products and services that leverage existing intellectual property. The potential is compelling and landscape-changing.