TRACE: Clinical Text Analytics using a Big Data Platform

TRACE (Tactical Rules-based AQL Clinical Extractor) is an information extractor, developed by IMT, that converts unstructured prescription documents into structured annotations. The annotations cover drug information, such as RxNorm concept codes, and prescription attributes, such as frequency and duration.

TRACE is written in AQL (Annotation Query Language), which is part of the IBM BigInsights platform for text analytics. BigInsights Text Analytics is an information extraction system for analyzing large volumes of data and producing annotations.

TRACE produces a set of concept annotations using a hierarchy of rules. The base-level rules perform dictionary and regular expression matching. For drug name lookup and concept code matching, TRACE uses the RxTerms database. Higher-level AQL rules then build on the base-level annotations to extract entities or recognize longer patterns. For example, a Strength annotation is a number reference immediately followed by a strength unit.
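
The layering described above can be illustrated with a minimal sketch. It is written in Python rather than AQL and is not taken from TRACE: the unit list, the number pattern, the choice to allow one optional space between number and unit, and the function name `extract_strengths` are all assumptions made for illustration. A small dictionary and a regular expression play the role of base-level rules, and a combined pattern plays the role of the higher-level Strength rule.

```python
import re

# Hypothetical base-level dictionary of strength units (illustrative only,
# not the dictionary TRACE uses).
STRENGTH_UNITS = ["mg", "mcg", "g", "ml", "units"]

# Base-level regular expression for a number reference, e.g. "500" or "2.5".
NUMBER = r"\d+(?:\.\d+)?"

# Build a unit alternation, longest entries first, escaping each entry.
_UNITS = "|".join(
    re.escape(u) for u in sorted(STRENGTH_UNITS, key=len, reverse=True)
)

# Higher-level rule: a number followed by a strength unit.
# Assumption: at most one whitespace character may separate the two.
STRENGTH = re.compile(
    rf"(?P<value>{NUMBER})\s?(?P<unit>{_UNITS})\b", re.IGNORECASE
)


def extract_strengths(text):
    """Return one dict per Strength match, with value, unit, and character span."""
    return [
        {"value": m.group("value"), "unit": m.group("unit"), "span": m.span()}
        for m in STRENGTH.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Amoxicillin 500 mg capsule, take 1 capsule 3 times daily for 10 days"
    for annotation in extract_strengths(sample):
        print(annotation)
```

In this sketch, the numbers in "1 capsule", "3 times", and "10 days" are not annotated as strengths because no strength unit follows them. This mirrors how a higher-level rule narrows the base-level number matches.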

We presented TRACE, including its annotation performance on a set of documents and HL7 messages containing prescription information, to the research community at the 2014 IEEE International Congress on Big Data (June 27 – July 2, Anchorage, USA). Additional details on TRACE may be found in our congress proceedings paper, "Tactical Clinical Text Mining for Improved Patient Characterization."


Figure 1: Example of Entity Extraction from a Prescription Document

Diagram showing the TRACE process: unstructured data from a prescription is tagged and identified.