http://www.artifactmanager.com/papers/ArtifactManager_Organize-n-Search.pdf
WHITE PAPER
Organize-n-Search
State-of-the-art Low-budget Document Management Solutions
“We are living in the information age… The information explosion…” We have heard these phrases so many times that we have stopped paying attention to them. Yet information penetrates every aspect of our lives. We are constantly trying to acquire new knowledge and looking for opportunities to benefit from it.
Users who actively work with documents and information frequently face problems related to the search, organization, and efficient use of documents. Copyeditors, writers, journalists, researchers, analysts, consultants, lawyers, medical workers, and students all run into the same challenges at home and at work.
This paper is intended for a wide range of people who, for personal or business needs, work with a large number of documents and other information. We take a close look at the problems of information management, the benefits of using advanced technologies in a low-budget personal information management system, and the system selection criteria that meet the personal and professional needs of information workers.
Challenges of Document Management
Nowadays a large part of information is stored in the form of text: books, articles, reports, memos, notes, specifications, descriptions, white papers, and manuals, not to mention a huge amount of time-sensitive information such as invoices, bank statements, schedules, contracts, and tax returns.
Yesterday, papers, photo albums, music disks, and video tapes were kept in drawers, boxes, and cabinets. But the development of personal computers and the Internet has started the era of digital information.
The development of electronic formats has significantly increased system storage capacity and allowed the accumulation of large volumes of information. However, recent developments in computer systems and data storage have led to a new question: how can we effectively manage digital information?
A study by IDC (Susan Feldman, Joshua Duhl, Julie Rahal Marobella, Alison Crawford. The Hidden Costs of Information Work. March 2005) found that, on average, 13 hours of every 40-hour work week are spent creating documents, 9.5 hours searching for information, and almost 9.6 hours analyzing it. A further 6.5 hours are wasted searching for information that is never found, which leads to the need to recreate the content. Reformatting information between different applications takes about 3.8 hours per week, and version control issues take 2.2 hours.
The issues, effects, and implications of information management are summarized in Figure 1.
Issues
* Slow search
* Search without desired results
* Redundant search
* Recreation of documents
* Difficulty of use of the found information
Effects on the employer
* Unplanned-for wasted time
* Work slowdown
* Decrease in productivity
* Decline in quality
Effects on the employee
* Increased workload
* Negative attitude towards work
* Decline in the level of satisfaction from the job
Implications
* Missed deadlines
* Project failure
* Lost revenue
* Loss of employee
Figure 1: Issues, effects and implications of information management
* What is the best way to organize the information to find it faster in the future?
* How to easily find information inside a large volume of materials?
* How to find documents that are related?
* How to save the search results and view them in the future?
* How to share found information with colleagues and friends?
* How to effectively use found information?
The importance of these problems is a major factor stimulating the development of new solutions and information management systems. Information Retrieval, Data and Knowledge Bases, and Document & Content Management, to name a few, are the branches of information technology that deal with the problems of information management.
Solutions to Document Management Problems
Solutions to document management problems are tightly linked to the following challenges: improving the efficiency of information access, improving the quality and speed of search, improving the efficiency of information processing, and improving the reliability and safety of storage.
Efficient Access to Information
Users need to quickly and easily extract the text documents that meet certain criteria from the body of available information. These criteria are diverse and constantly changing. For example, original sources for articles, data for reports, textbooks to prepare for an exam, a patient’s medical records, or precedents for a court case all have high but temporary value in resolving pressing challenges.
After finding the required documents, working through them, and creating a number of versions, the user will need to consolidate and store the results. For example, one may need to save a set of documents, or add comments to a set of documents for future use. One possible solution to meet the changing needs is to place a document in several groups. A group could consist of documents on a certain topic, papers by the same author, articles from the same journal issue, previous versions of an article, or materials used to write an article.
Searching and organizing information in a meaningful way takes up a lot of time. To shorten the cycle and make the process more enjoyable, a number of solutions have been proposed.
Quality and Speed of Search
In some cases users can find the documents they need by using a query – a word or combination of words that might be in those documents.
In the past, search required scanning all files on the computer's drives and comparing the key words of the query with the words in each document. Every request triggered a sequential scan of all files, and as the size and number of files grew, search slowed down dramatically. In addition, morphology was neglected, so multiple queries, one for each word form, were needed to find a document.
The best solutions for effective information search are based on search engines and information retrieval technologies. The entire collection of files is pre-processed, and information about the documents and their key words is stored in index files. Indexing works across various file formats and takes into account the different forms of the same word. This “smart” pre-processing mechanism significantly accelerates search and improves its quality.
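The pre-processing idea can be illustrated with a minimal Python sketch of an inverted index. It is not the implementation of any particular product. The file names, the sample texts, and the `normalize` helper are hypothetical, and the suffix stripping is a deliberately crude stand-in for real morphological analysis.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude normalizer: lowercases a word and strips
# one common English suffix so that "reports", "reported" and "reporting"
# share the same index key. Real systems use proper stemming or lemmatization.
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Split text into normalized alphabetic terms."""
    return [normalize(w) for w in re.findall(r"[A-Za-z]+", text)]


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Pre-process the whole collection once: term -> names of documents."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical sample collection.
documents = {
    "memo.txt": "Quarterly reports are due on Friday.",
    "notes.txt": "She reported the invoice totals.",
    "todo.txt": "Renew the contract and file tax returns.",
}
index = build_index(documents)

by_word_form = search(index, "reporting")   # matches "reports" and "reported"
by_two_terms = search(index, "tax return")  # both terms must be present
```

Each query now costs a few dictionary lookups instead of a pass over every file, and a single query finds documents that use a different form of the word.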
Organization
In many cases the user does not know which words the document of interest contains. It is also possible that the user cannot formulate a query that returns the desired results, that the number of matching documents is too large, or that some documents do not contain the right words. In these scenarios the user has no choice but to look for the desired document manually. To save the results of a manual search, many use systems designed specifically for organizing information.
Simplified versions of organization systems use fields and registration cards to link documents with accompanying information (date, author, title, a brief description, etc.). However, the field sets are fixed and limited, and often do not allow documents to be grouped to accommodate the changing needs of users.
Enhanced systems use a hierarchy of folders (also called catalogs or directories). However, when a document belongs to multiple topics, the user may face several problems. For example, in a hierarchy of file system folders, a document cannot be assigned to several folders without duplication. Duplication unnecessarily increases the volume of information and leads to inconsistent content once one of the copies has been modified.
Top-notch tools for organizing information use multiple hierarchical categorizations, an approach that comes from the domain of knowledge bases and ontologies.
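As a rough illustration of multiple hierarchical categorization, the Python sketch below keeps each document once and links it to any number of category paths. The `Catalog` class, the category names, and the file names are all hypothetical. They show one simple way to model the idea, not how any specific tool works.

```python
from collections import defaultdict


class Catalog:
    """Keeps one entry per document and links it to many category paths."""

    def __init__(self) -> None:
        # Category path, e.g. ("Topics", "Search"), -> names of documents.
        self._docs_by_category: dict[tuple[str, ...], set[str]] = defaultdict(set)

    def assign(self, document: str, category: str) -> None:
        """Link a document to a category such as "Topics/Search"."""
        self._docs_by_category[tuple(category.split("/"))].add(document)

    def documents_in(self, category: str) -> set[str]:
        """Documents in the category or in any of its subcategories."""
        prefix = tuple(category.split("/"))
        found: set[str] = set()
        for path, docs in self._docs_by_category.items():
            if path[: len(prefix)] == prefix:
                found |= docs
        return found

    def categories_of(self, document: str) -> list[str]:
        """All category paths the document is linked to, sorted."""
        return sorted(
            "/".join(path)
            for path, docs in self._docs_by_category.items()
            if document in docs
        )


# Hypothetical example: two documents, two independent hierarchies.
catalog = Catalog()
catalog.assign("indexing.pdf", "Topics/Search/Indexing")
catalog.assign("indexing.pdf", "Authors/Author A")
catalog.assign("ontologies.pdf", "Topics/Organization")
catalog.assign("ontologies.pdf", "Authors/Author A")

all_topics = catalog.documents_in("Topics")
search_only = catalog.documents_in("Topics/Search")
```

Because categories hold links rather than copies, a document can appear under a topic and under an author at the same time without the duplication and inconsistency problems of a folder hierarchy.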
Version Control
Authoring a complex document is a long process that requires many edits, corrections, and rewrites. To avoid confusion, it is necessary to maintain a history of changes to the document. The old-fashioned solution was to save each change in a separate file with a unique name, which often resulted in lost files, wasted storage space, and difficulty finding the right version of the document. These and other problems related to tracking the history of the content, storing different versions of the document, and returning to previous versions have been addressed by the invention of versioning systems. These systems are designed to provide access to previous versions and to the history of changes.
Figure 2: Authoring a document
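The core of a versioning system, access to previous versions and to the history of changes, can be sketched in a few lines of Python. The `VersionedDocument` class and the sample text are hypothetical. The sketch stores full snapshots for simplicity, whereas real versioning systems typically store changes more compactly.

```python
import difflib


class VersionedDocument:
    """Keeps every saved state of a text; versions are numbered from 1."""

    def __init__(self, text: str) -> None:
        self._versions = [text]

    @property
    def latest(self) -> int:
        return len(self._versions)

    def save(self, text: str) -> int:
        """Store a new version if the text changed; return the latest number."""
        if text != self._versions[-1]:
            self._versions.append(text)
        return self.latest

    def get(self, version: int) -> str:
        """Return the text exactly as it was in the given version."""
        if not 1 <= version <= self.latest:
            raise IndexError(f"no such version: {version}")
        return self._versions[version - 1]

    def changes(self, old: int, new: int) -> list[str]:
        """Line-by-line differences between two versions (unified diff)."""
        return list(
            difflib.unified_diff(
                self.get(old).splitlines(),
                self.get(new).splitlines(),
                fromfile=f"v{old}",
                tofile=f"v{new}",
                lineterm="",
            )
        )


# Hypothetical editing session.
doc = VersionedDocument("Draft title\nFirst paragraph.")
doc.save("Final title\nFirst paragraph.")
doc.save("Final title\nFirst paragraph.")  # unchanged text, no new version

history = doc.changes(1, 2)
```

The author works on one named document while every earlier state stays retrievable through `get`, which removes the need for a growing pile of uniquely named files.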
Effective Work with Information
Search, organization, and version control each significantly simplify work with information. Until now, however, most of these functions were provided only by separate software tools: one program implements search, a second organizes information, a third edits it, a fourth keeps the version history, and so on.
A user has to run multiple applications, toggle between them, import and export documents, and move and copy the files. This process dramatically slows down the work, decreases productivity, increases pressure, and therefore leads to mistakes and reduces work