Automated Data Labeling & Curation

Customer Challenge

The Army Intelligence Community (IC) seeks an automated, AI-based system for reliable data labeling and curation that allows users to quickly search, filter, and select datasets for further downstream analysis.

Innovative Solution

For this effort, Illumination Works prototyped Theia, our automated data labeling solution. Theia’s automated pipelines apply natural language processing (NLP) and computer vision (CV) algorithms to curate datasets and to extract and label key information of interest. The solution processes each dataset and identifies important labels and topics in its text and image components. Labels are stored in a graph database for downstream analytics, processing, and filtering, and are served to end users through an interactive interface that gives Army IC analysts quick insight into the content and context of datasets without manual review.
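As a rough illustration of how labels stored as a graph can support search and filtering, the sketch below links dataset nodes to label nodes and looks up datasets by label. It is a minimal in-memory stand-in, not Theia's implementation: the LabelGraph class, its method names, and the sample dataset IDs are all hypothetical, and a production system would use an actual graph database.

```python
from collections import defaultdict


class LabelGraph:
    """Hypothetical in-memory stand-in for a graph database.

    Datasets and labels are nodes; each edge links a dataset to a label.
    """

    def __init__(self):
        self._labels_by_dataset = defaultdict(set)
        self._datasets_by_label = defaultdict(set)

    def add_label(self, dataset_id, label):
        """Create an edge between a dataset and a (case-insensitive) label."""
        label = label.lower()
        self._labels_by_dataset[dataset_id].add(label)
        self._datasets_by_label[label].add(dataset_id)

    def labels_for(self, dataset_id):
        """Return the sorted labels attached to one dataset."""
        return sorted(self._labels_by_dataset.get(dataset_id, set()))

    def find(self, *labels):
        """Return the sorted datasets that carry every requested label."""
        if not labels:
            return []
        matches = [self._datasets_by_label.get(l.lower(), set()) for l in labels]
        return sorted(set.intersection(*matches))


# Illustrative data only; the dataset IDs and labels are made up.
graph = LabelGraph()
graph.add_label("report_001", "tank")
graph.add_label("report_001", "person")
graph.add_label("image_set_7", "Tank")

print(graph.find("tank"))            # datasets labeled "tank"
print(graph.find("tank", "person"))  # datasets carrying both labels
```

Keeping an index in both directions (labels per dataset and datasets per label) makes both "what is in this dataset?" and "which datasets mention this?" cheap lookups, which is the kind of query an analyst-facing filter needs.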


  • Identified techniques to extract entities from textual data, including person, place, date, and business-specific entities such as military equipment
  • Applied methods to identify people and military equipment from images, yielding a self-learning and semi-supervised approach that can expand to other focus areas
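To make the text side of the items above concrete, here is a minimal sketch of entity extraction using a fixed term list and a date pattern. It is a deliberately simple rule-based stand-in, not the technique Theia uses (the source does not specify it): EQUIPMENT_TERMS, the ISO date format, and extract_entities are assumptions for illustration. Person and place entities generally require a trained named entity recognition model and are not covered here, nor is the image side.

```python
import re

# Hypothetical gazetteer of domain-specific terms (illustrative only).
EQUIPMENT_TERMS = {"tank", "helicopter", "radar", "howitzer"}

# Assumes dates appear in ISO form (YYYY-MM-DD); real text would need more patterns.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def extract_entities(text):
    """Return unique (entity, type) pairs: dates first, then equipment terms."""
    entities = []

    for match in DATE_PATTERN.finditer(text):
        entities.append((match.group(), "DATE"))

    for match in WORD_PATTERN.finditer(text):
        word = match.group().lower()
        # Crude plural handling: "tanks" -> "tank" when the stem is a known term.
        if word.endswith("s") and word[:-1] in EQUIPMENT_TERMS:
            word = word[:-1]
        if word in EQUIPMENT_TERMS:
            entities.append((word, "EQUIPMENT"))

    # Drop duplicates while preserving first-seen order.
    return list(dict.fromkeys(entities))


print(extract_entities("Two tanks and a radar were observed on 2021-03-14."))
```

Each (entity, type) pair returned here could then be attached to its source dataset as a label for later filtering.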


  • Open-source Python solution
  • Application visualization: Adobe XD wireframes
  • Data science: machine learning, CV, NLP, named entity recognition, knowledge graph, subject-verb-object extraction, self-learning network

Business Value

  • Saves significant time over manual labeling
  • Informs analysts of dataset content without requiring manual review
  • Facilitates interconnected insights between textual and visual data sources
  • Enables extension to other domains such as medicine, manufacturing, repair, and agriculture via self-learning approaches

Domain Expertise

  • DoD
  • Intelligence Community
  • Healthcare
  • Market Intelligence
