Automated Knowledge Curation & Data Labeling
Customer Challenge
The Army needed automated labeling of web data so it could use the large volumes of available data to train explosive ordnance disposal classification models.
Innovative Solution
Illumination Works prototyped Theia™, our automated knowledge curation and data labeling solution. Theia’s automated pipelines apply natural language processing and unsupervised computer vision to curate datasets and to extract and label key information from their text and image components. Cleaning and deconfliction steps reconcile data points, and the resulting metadata is stored in a graph database to build and maintain an authoritative ontology for downstream analytics. An interactive user interface supports human-machine teaming, letting users search by text or image to focus on the data most informative for decision making.
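The deconfliction and graph storage step described above can be illustrated with a minimal sketch. This is not Theia's implementation: the record layout, the function names (`normalize`, `build_graph`, `conflicts`), and the sample data are all hypothetical, and a plain in-memory dictionary stands in for the graph database.

```python
from collections import defaultdict

def normalize(text):
    """Lowercase and collapse whitespace so near-duplicate mentions match."""
    return " ".join(text.lower().split())

def build_graph(records):
    """Merge (subject, relation, object, source) records into an adjacency map.

    Repeated facts collapse into one edge that keeps every source,
    which is one simple way to deconflict duplicate data points.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj, source in records:
        edge = (normalize(relation), normalize(obj))
        graph[normalize(subject)][edge].add(source)
    return graph

def conflicts(graph, relation):
    """Return subjects that carry more than one distinct object for a relation."""
    found = {}
    for subject, edges in graph.items():
        objects = sorted(o for (r, o) in edges if r == relation)
        if len(objects) > 1:
            found[subject] = objects
    return found

# Hypothetical extracted records; real input would come from the NLP pipeline.
records = [
    ("Item A", "has color", "green", "page-1"),
    ("item a", "has color", "Green", "page-2"),
    ("Item A", "has color", "olive", "page-3"),
    ("Item B", "has shape", "cylinder", "page-1"),
]
graph = build_graph(records)
```

In this sketch the two "green" records merge into a single edge with both sources attached, while `conflicts(graph, "has color")` flags the subject whose sources disagree, so a reviewer can resolve it.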
Benefits/Outcomes
- Self-learning and semi-supervised approach
- Fully automated data curation across text, images, and metadata
- Machine learning-driven labeling and reasoning within a unified graph
- Unsupervised computer vision for ordnance identification, requiring no labeled training data
- Human-in-the-loop oversight to preserve domain authority
- User-friendly, camera-enabled mobile interface
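As an illustration of how images can be grouped without labeled training data, the sketch below clusters feature vectors with k-means. The source does not say which algorithm Theia uses, so k-means is an assumption chosen for simplicity. The `kmeans` function and the two-dimensional sample vectors are hypothetical; in practice the vectors would come from an image embedding step.

```python
import math

def kmeans(points, k, iterations=20):
    """Group feature vectors into k clusters without using any labels."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical image feature vectors (assumed output of an embedding model).
features = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9)]
assignments, centroids = kmeans(features, k=2)
```

The resulting cluster indices are not labels in themselves. A human reviewer would still name each group, which is where human-in-the-loop oversight fits.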
Toolbox
- Open-source Python solution
- Application visualization: React library
- Data science: machine learning, computer vision, NLP, named entity recognition, knowledge graph, subject-verb-object extraction, self-learning network, AI assistant
Business Value
- Save significant time over manual labeling
- Inform analysts of dataset contents without requiring manual review
- Connect insights across textual and visual data sources
- Enable extension to other domains such as medicine, manufacturing, repair, and agriculture via self-learning approaches
Domain Expertise
- DoD
- Army
- Explosive ordnance disposal