Semiring Inc. was a finalist in the VA AI Tech Sprint 2024!
During April and May 2024, the Semiring team participated in the VA AI Tech Sprint and, as a finalist, passed all three gates!
The challenge consisted of three gates of increasing complexity, each requiring NLP and AI processing of medical faxes and reports in TIFF or PDF format. The tasks were to receive a batch of previously unseen documents, extract the information they contained, map it to databases and other information sources, generate a summary paragraph, and link the terminology to approved Knowledge Graphs and terminological databases. At every gate, neither the number of documents nor the file formats were specified in advance.
We used our NLP pipelines and various Large Language Models (LLMs) for information extraction and summarization. The document pre-processing phase involved:
- File-format detection and conversion for image and document formats (e.g., TIFF, PNG, and JPEG, as well as PDF, Word, LibreOffice, and related formats)
- Optical Character Recognition (OCR) to convert all image formats, as well as image-based PDF files, to text
- Text cleaning and identification of paragraph boundaries and types
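To illustrate the routing part of pre-processing, here is a minimal Python sketch that identifies a file by its leading "magic bytes" and decides whether it goes to OCR, text extraction, or document conversion. It is not Semiring's implementation: the signature table covers only the formats named above, the branch labels are hypothetical, and the `pdf_has_text_layer` flag is an assumption standing in for a real check of whether a PDF is image-based.

```python
from pathlib import Path

# Well-known file signatures ("magic bytes") found at offset 0.
SIGNATURES = [
    (b"II*\x00", "tiff"),                            # little-endian TIFF
    (b"MM\x00*", "tiff"),                            # big-endian TIFF
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),                          # DOCX, ODT and other ZIP-based formats
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),   # legacy .doc and other OLE2 files
]

IMAGE_FORMATS = {"tiff", "png", "jpeg"}


def detect_format(header: bytes) -> str:
    """Return a coarse format label from the first bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def route(header: bytes, pdf_has_text_layer: bool = True) -> str:
    """Pick a (hypothetical) pre-processing branch for a document.

    `pdf_has_text_layer` is an assumed input; in practice it would come
    from first trying to extract embedded text from the PDF.
    """
    fmt = detect_format(header)
    if fmt in IMAGE_FORMATS:
        return "ocr"
    if fmt == "pdf":
        return "text_extraction" if pdf_has_text_layer else "ocr"
    if fmt in {"zip", "ole2"}:
        return "document_conversion"
    return "reject"


def route_file(path: Path) -> str:
    # Eight bytes are enough to cover the longest signature above.
    with path.open("rb") as f:
        return route(f.read(8))
```

DOCX and ODT files are both ZIP containers, so a fuller detector would also look inside the archive to tell them apart.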
The Semiring Natural Language Processing (NLP) pipelines include:
- Tokenization
- Sentence and clause segmentation
- Lemmatization
- Part-of-speech (PoS) tagging
- Named Entity Recognition (NER) and labeling, extended to medical terminology and other domain-specific vocabularies
- Sentiment analysis
- Temporal logic and sequencing analysis
- Relation extraction for RDF triples and Knowledge Graphs
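The toy Python sketch below chains three of these stages: naive sentence segmentation, regex tokenization, and pattern-based relation extraction into (subject, predicate, object) triples of the kind that can be stored as RDF. The patterns and relation names (`prescribed`, `diagnosedWith`) are hypothetical; the sketch only illustrates the data flow and is not the Semiring pipeline.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive rule: split after ".", "!" or "?" followed by whitespace.
    # Clinical text would also need rules for abbreviations such as "Dr." or "mg.".
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Words (allowing internal hyphens) and single punctuation marks.
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)


# Hypothetical surface patterns mapped to relation names.
RELATION_PATTERNS = [
    (re.compile(r"^(?P<subj>.+?) was prescribed (?P<obj>.+?)\.?$"), "prescribed"),
    (re.compile(r"^(?P<subj>.+?) was diagnosed with (?P<obj>.+?)\.?$"), "diagnosedWith"),
]


def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples found sentence by sentence."""
    triples = []
    for sentence in split_sentences(text):
        for pattern, predicate in RELATION_PATTERNS:
            m = pattern.match(sentence)
            if m:
                triples.append((m["subj"], predicate, m["obj"]))
    return triples
```

For example, `extract_triples("The patient was diagnosed with hypertension. The patient was prescribed lisinopril.")` yields one `diagnosedWith` and one `prescribed` triple for "The patient".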
The medical reports included scanned faxes whose visual noise degraded the quality of the extracted text. Repairing damaged words and text fragments with Semiring-specific strategies and algorithms is essential for high-quality information extraction.
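The Semiring repair strategies are not described here, so the following is only a generic Python sketch of two common OCR repairs: re-joining words hyphenated across line breaks, and snapping unknown words to the closest entry in a domain vocabulary with `difflib.get_close_matches`. The vocabulary and the 0.8 similarity cutoff are assumptions.

```python
import difflib
import re

# Hypothetical domain vocabulary; a real system would use a much larger lexicon.
VOCABULARY = ["hypertension", "metoprolol", "patient", "diagnosed", "with", "the", "was"]


def dehyphenate(text: str) -> str:
    # Re-join words broken across lines: "hyper-\ntension" -> "hypertension".
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def correct_token(token: str, cutoff: float = 0.8) -> str:
    # Keep known words; otherwise replace with the closest vocabulary entry
    # whose similarity ratio reaches `cutoff`, or keep the token unchanged.
    if token.lower() in VOCABULARY:
        return token
    matches = difflib.get_close_matches(token.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def repair(text: str) -> str:
    text = dehyphenate(text)
    # Correct each run of letters, leaving digits, spacing and punctuation as they are.
    return re.sub(r"[^\W\d_]+", lambda m: correct_token(m.group()), text)
```

Blind de-hyphenation also merges legitimately hyphenated terms such as "follow-up" when they fall at a line break, which is one reason practical repair needs more context than this sketch uses.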
The Named Entities and relations extracted from the medical reports were linked to the terminology in Knowledge Graphs such as the Unified Medical Language System (UMLS).
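Dictionary-based entity linking can be sketched as a greedy longest-match lookup over tokens, so that "type 2 diabetes" is preferred over the shorter "diabetes". The gazetteer below is hypothetical and its concept IDs are placeholders, not real UMLS identifiers; a real system would load synonyms and identifiers from a UMLS release and add disambiguation.

```python
# Hypothetical gazetteer: lowercase token sequences -> (concept ID, preferred name).
# The IDs are placeholders, NOT real UMLS concept identifiers.
GAZETTEER = {
    ("high", "blood", "pressure"): ("CUI-PLACEHOLDER-1", "Hypertension"),
    ("hypertension",): ("CUI-PLACEHOLDER-1", "Hypertension"),
    ("htn",): ("CUI-PLACEHOLDER-1", "Hypertension"),
    ("type", "2", "diabetes"): ("CUI-PLACEHOLDER-2", "Type 2 diabetes mellitus"),
    ("diabetes",): ("CUI-PLACEHOLDER-3", "Diabetes mellitus"),
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def link_entities(tokens: list[str]) -> list[tuple[int, int, str, str]]:
    """Greedy longest match; returns (start, end, concept_id, preferred_name)."""
    lowered = [t.lower() for t in tokens]
    results = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in GAZETTEER:
                concept_id, name = GAZETTEER[key]
                results.append((i, i + n, concept_id, name))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return results
```

With the tokens of "History of HTN and type 2 diabetes", this links "HTN" and the full span "type 2 diabetes", and does not additionally report "diabetes" on its own.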
Summaries were generated with LLMs as well as from the extracted Named Entities and entity relations.
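For the entity-based route, one simple option is to render extracted triples through sentence templates. This Python sketch assumes the hypothetical relation names `diagnosedWith` and `prescribed` and merges objects that share a subject and relation; it is a generic illustration, not the Semiring method.

```python
from collections import defaultdict

# Hypothetical phrasing for each relation type.
TEMPLATES = {
    "diagnosedWith": "was diagnosed with {}",
    "prescribed": "was prescribed {}",
}


def join_items(items: list[str]) -> str:
    # "a", "a and b", "a, b, and c"
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def summarize(triples: list[tuple[str, str, str]]) -> str:
    # Group objects by (subject, predicate), keeping first-seen order and dropping duplicates.
    grouped = defaultdict(list)
    for subj, pred, obj in triples:
        if obj not in grouped[(subj, pred)]:
            grouped[(subj, pred)].append(obj)
    sentences = []
    for (subj, pred), objs in grouped.items():
        template = TEMPLATES.get(pred, pred + " {}")  # fall back to the raw relation name
        sentences.append(f"{subj} {template.format(join_items(objs))}.")
    return " ".join(sentences)
```

Templates keep every statement traceable to an extracted triple, which is a useful complement to free-form LLM summaries.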
Producing high-quality output from noisy input is one of the strengths of the Semiring system. Another clear advantage of the Semiring technologies is high performance: processing 10 documents in 2 minutes, or about 12 seconds per document on average.
Please get in touch with us to schedule a discussion and a demo!
Your Semiring Team!