Date Posted: December 7, 2006
Update: September 18, 2008. The new version integrates part-of-speech (POS) tagging capabilities for an important set of languages: Arabic, Danish, Dutch, English, Spanish, French, Italian, German, Portuguese, Japanese, and Chinese.
What is Text Analytics Tools and Runtime for IBM LanguageWare?
IBM® LanguageWare® is a set of run-time libraries and an easy-to-use Eclipse-based development environment for building custom text analyzers in various languages. Deployable in Apache UIMA, these analyzers can expose the information buried in text to any application. The Eclipse-based tools make creating analyzers simple and fast, even for non-technical users. The tools make it easy to build dictionaries, ontologies, and rules for identifying key information, relationships, and meaning.
The package is a complete development environment that requires no specialist knowledge of the underlying technologies of natural language processing or UIMA; therefore, you can focus on concepts and relationships of interest and develop analyzers that extract them from text without writing any code. The resulting analyzers are wrapped as UIMA annotators, which can be seamlessly plugged into any UIMA-compliant application. (See further information about UIMA here at alphaWorks® and at the Apache UIMA site.)
IBM LanguageWare® provides a full range of text analysis functions. It is useful in solutions that mine facts from large repositories of text, and it makes it easy to create, manage, and deploy analysis engines and their resources. LanguageWare is used in a variety of products, including Lotus Notes®, Domino®, and Information Integrator OmniFind™ Edition (IBM's search technology).
LanguageWare Resource Workbench runs on Windows® and Linux®. (The core LanguageWare libraries support many more platforms; for details, please see the product documentation.)
How does it work?
LanguageWare Resource Workbench allows users to easily
- develop rules to spot facts, entities, and relationships using a simple drag-and-drop paradigm
- build language and domain resources into a LanguageWare dictionary or ontology
- import and export dictionary data to or from a database
- browse the dictionaries to assess their content and quality
- test rules and dictionaries in real time on documents
- create UIMA annotators for annotating text with the contents of dictionaries and rules
- annotate text and browse the contents of each annotation.
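To make the idea of annotating text with the contents of a dictionary more concrete, here is a minimal conceptual sketch in Python. It is not LanguageWare or UIMA code and does not reflect how the product is implemented; the `Annotation` class, the `annotate` function, and the sample dictionary are all hypothetical names chosen for illustration. It assumes a dictionary is simply a mapping from surface terms to type labels and that matching is case-insensitive on whole words.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    begin: int  # character offset where the match starts
    end: int    # character offset just past the match
    type: str   # label taken from the dictionary entry
    text: str   # the covered text


def annotate(text, dictionary):
    """Return an annotation for every dictionary term found in the text.

    Hypothetical helper: matches whole words, ignoring case.
    """
    annotations = []
    for term, label in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                Annotation(match.start(), match.end(), label, match.group(0))
            )
    # Sort by position so the annotations can be browsed in reading order.
    return sorted(annotations, key=lambda a: (a.begin, a.end))


# Illustrative dictionary and text (assumptions, not product data).
dictionary = {"IBM": "Company", "Lotus Notes": "Product", "Domino": "Product"}
text = "IBM ships Lotus Notes and Domino."

results = annotate(text, dictionary)
for a in results:
    print(a.begin, a.end, a.type, a.text)
```

Each annotation records the span of text it covers and a type label, which is the general shape of information that an annotator attaches to a document.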
The package contains the following tools:
- a dictionary viewer/editor
- an XML-based dictionary builder
- a database-based dictionary builder (DB2® and Apache Derby support are provided)
- a dictionary comparison tool
- a rule viewer/editor/builder
- a UIMA annotator generator, which allows text documents to be annotated and the results displayed (details provided in the LanguageWare Annotators Guide)
- a UIMA CAS (Common Analysis Structure) comparator, which allows you to compare the results of two different analyses by comparing the CASes generated by each run.
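The idea behind comparing the results of two analysis runs can be sketched in a few lines of Python. This is a conceptual illustration only, not the comparator shipped with the Workbench and not the UIMA API; it assumes each run is reduced to a collection of hypothetical `(begin, end, type)` tuples, and the function name `compare_annotations` is invented for this example.

```python
def compare_annotations(run_a, run_b):
    """Split two runs of (begin, end, type) tuples into shared and run-specific sets."""
    a, b = set(run_a), set(run_b)
    return {
        "in_both": sorted(a & b),     # annotations both runs agree on
        "only_in_a": sorted(a - b),   # produced by the first run only
        "only_in_b": sorted(b - a),   # produced by the second run only
    }


# Illustrative results from two hypothetical analysis runs over the same text.
run_a = [(0, 3, "Company"), (10, 21, "Product")]
run_b = [(0, 3, "Company"), (10, 15, "Product"), (26, 32, "Product")]

diff = compare_annotations(run_a, run_b)
for key, items in diff.items():
    print(key, items)
```

A set-based comparison like this treats two annotations as equal only when their offsets and type match exactly; a real comparison tool might also report partial overlaps or feature-level differences.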
LanguageWare Resource Workbench is documented in the Getting Started Guide. It can be installed by using the Windows or Linux installers or by using the respective .zip files.
LanguageWare technology can be used in any application that uses text analytics. Good examples are
- business intelligence
- information search and retrieval
- the semantic Web (in particular, LanguageWare supports semantic analysis of documents based on ontologies)
- analysis of social networks
- semantic tagging applications
- semantic search applications
- any application wishing to extract useful data from unstructured text.
About the technology author(s)
LanguageWare is a worldwide organization comprising a highly qualified team of specialists with a diverse combination of backgrounds: linguists, computer scientists, mathematicians, cognitive scientists, physicists, and computational linguists. This team is responsible for developing innovative Natural Language Processing technology for IBM's Software Group.
LanguageWare, along with LanguageWare Resource Workbench, is a collaborative project combining skills, technologies, and ideas gathered from various IBM product teams and the IBM Research division.
