
Contact

gus@parsertongue.org
+1 (520) xxx-xxxx
Tucson, AZ

Gus Hahn-Powell

Computational Linguist

About

I am a computational linguist with over 10 years of experience in Natural Language Processing (NLP) and Machine Learning (ML), 20+ peer-reviewed publications, and more than 10 years as a professional software developer responsible for designing, implementing, and deploying custom ML solutions.

My expertise centers around information extraction and assembling knowledge graphs from unstructured text. I believe in choosing the right tool for the job and pride myself on identifying simple and effective solutions.

I have a track record of multi-disciplinary collaborative research as a tenure-track professor in academia, funded by agencies such as DARPA, NSF, and the CDC. I also bring years of applied research experience in industry, both as a co-founder of a bootstrapped startup and as an individual contributor at a large corporation, developing new IP and production-quality code in languages like Python and Scala.

Experience

Key accomplishments
  • Investigator who helped secure over $9M in grant-based funding from multiple federal agencies.
  • Managed and trained remote and in-person teams of junior NLP researchers and software developers.
  • Developed, documented, and deployed CI pipelines and NLP software for multiple federal agencies on AWS and AWS GovCloud.
  • Within a year of being hired, during the University of Arizona's largest budget cuts, I designed a graduate curriculum, oversaw the development of its courses, and launched a fully online MS program in Human Language Technology that has attracted a global body of students.

As an Applied Scientist on Amazon's Product Graph team within Personalization, I work on automatically discovering novel product dimensions and enriching the Amazon Catalog to help customers make informed purchase decisions.

Key accomplishments
  • Designed and implemented a neuro-symbolic system to discover novel attributes and compatibility information from product profiles in support of a multi-team knowledge graph initiative.
  • In less than one week, I designed and implemented a scalable system for weakly labeling sections of e-Commerce web pages for document layout analysis and targeted information extraction.
  • Implemented and deployed a series of scheduled Amazon-scale ETL tasks using Apache Spark and PySpark to improve automation and prevent data drift.
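The production jobs above are Spark-based and internal, but the drift check they guard against can be sketched in a few lines. The following is a toy, plain-Python illustration (not the deployed PySpark code; the field values are hypothetical) of comparing a numeric field's distribution between a reference batch and a new batch:

```python
from statistics import mean, stdev

def drift_score(reference, current):
    """Mean shift of a numeric field between two batches,
    measured in units of the reference standard deviation."""
    ref_mean, ref_sd = mean(reference), stdev(reference)
    return abs(mean(current) - ref_mean) / ref_sd if ref_sd else 0.0

def has_drifted(reference, current, threshold=2.0):
    """Flag the new batch when its mean shifts beyond the threshold."""
    return drift_score(reference, current) > threshold

# Example: prices from a reference batch vs. today's batch
last_week = [9.99, 10.49, 10.05, 9.75, 10.20]
today = [19.99, 21.49, 20.05, 19.75, 20.20]
print(has_drifted(last_week, today))  # a large mean shift flags drift
```

A scheduled job would compute such statistics per field and alert (or halt downstream tasks) when a threshold is exceeded.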

I am co-founder of Lum AI, a small bootstrapped NLP startup focusing on large-scale machine reading and rapid text annotation/data labeling.

In addition to product development, I manage our AWS deployments (mostly ECS + Terraform + GitHub Actions).

Key accomplishments
  • Designed and implemented a resilient actor-based distributed version of the Odinson information extraction system that requires less than 30% of the resources of an Elasticsearch cluster.
  • Designed, implemented, and deployed horizontally-scalable containerized services for information extraction on AWS with rolling deployments triggered via changes to the default branch of a Git repository.
  • Managed a small, distributed team of software developers and applied scientists over multiple projects.

Publications

For a complete list of my publications, please see my CV or ORCID profile.

Course and workshop development

For a complete list of courses and workshops I've developed, please see https://parsertongue.org/courses/.

Education

2014-2018

PhD (Computational Linguistics)

Institution: University of Arizona
Minor: Information
The aim of this work is to accelerate scientific discovery by advancing machine reading approaches designed to extract claims and assertions made in the literature, assemble these statements into cohesive models, and generate novel hypotheses that synthesize findings from isolated research communities. [...]
2012-2014

MS (Human Language Technology)

Institution: University of Arizona
BioNER: A hybrid approach to identifying mentions of protein complexes
2008-2010

MA (Applied Linguistics)

Institution: University of Alabama
The 'Worthy of Attention' Collostruction: Frequency, synonymy, and learnability
Modeling L2 synonym learning using ART-2 neural networks.
2004-2008

BA (Japanese)

Institution: University of Alabama
Language in Zen poetry

Projects and Software

2024-present

Daedalus

Vision-based Piping & Instrumentation Diagram (P&ID) understanding system for the construction industry.

Lum AI, AWS Lambda, computer vision, Docker
2023-present

Odinsynth-LLM

Interactive system that generates explainable, editable, and generalizable information extraction rules from a single example using LLMs, preference optimization, and reinforcement learning with machine feedback (RLMF).

LLMs, neuro-symbolic AI
2022-2024

Odinsynth

A neural program synthesis approach to generating explainable and editable information extraction rules from a handful of examples.

AWS ECS, Docker, neuro-symbolic AI, program synthesis, Python, PyTorch, Scala, transformers
2022-2023

clu-azahead

A toolkit to improve information access and aggregation for public health.

ASR, Docker, FastAPI, neural search, Python, PyTorch, question answering, SBERT, Streamlit, transformers, whisper
2021-2022

Annotaurus Tex(t)

A web-based platform (hosted solution) for rapid text annotation and data labeling for NLP.

Lum AI, active learning, annotation, AWS ECS, data labeling, information extraction, JavaScript, Odinson, PostgreSQL
2019-present

Odinson

A fast and highly scalable language and runtime system for information extraction that supports patterns composed of graph traversals and token-level constraints. The successor to Odin. Odinson is four orders of magnitude faster than the previous state of the art.
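To give a flavor of the pattern language, here is a single rule in the spirit of Odinson's documented syntax (an illustrative sketch, not taken from a deployed grammar; the entity labels are assumptions): it starts from a candidate cause, traverses an incoming nsubj edge to the verb, then an outgoing dobj edge to the theme, with token-level constraints on each piece.

```
(?<cause> [entity=PROTEIN]+) <nsubj [lemma=phosphorylate] >dobj (?<theme> [entity=PROTEIN]+)
```

Consult the Odinson documentation for the exact traversal and constraint syntax supported by a given release.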

Key contributions
  • IDE design, development, and deployment (closed source)
  • a REST API and companion Python library
  • language features and testing
  • development of a distributed version for web-scale information extraction using Akka (development of this component was funded by DARPA's Causal Exploration program).
actor-based concurrency, Akka, Apache Lucene, information extraction, Scala
2017-2019

Influence Search

A platform for literature-based discovery that incorporates multi-domain extractions of causal interactions into a single searchable knowledge graph. Originally developed to support the Bill and Melinda Gates Foundation's efforts to improve child and maternal health. Users create conceptual models (interest maps) by searching for direct and indirect influence relations, merging concepts, injecting their own expertise, and collaboratively editing models.

Key contributions
  • system architecture and deployment (AWS)
  • open domain machine reader and assembly system
  • incorporation and alignment of citation graph (MAG) information and clinical trials (this component was funded by the Bill and Melinda Gates Foundation as part of their KI Platform Prototype)
Apache Spark, JavaScript, knowledge graphs, Neo4j, operational transformation, Scala
2014-2018

Reach

Information extraction system for BioNLP that includes components for event extraction, NER, domain-specific coreference resolution, causal event ordering, and grounding. Reach was the most precise and highest throughput machine reading system in DARPA's Big Mechanism program, and has been used by biologists to discover novel and plausible biological hypotheses for multiple cancers.

Key contributions
BioNLP, causal ordering, deduplication, information extraction, Python, Scala

Grants and Contracts

2022-present

ADHS-CDC COVID Disparities Initiative

Agency: CDC & Arizona Department of Health Services
Role: Co-I
URL: https://crh.arizona.edu/programs/covid-disparities-initiative
Award: $8M

Address COVID-19 health disparities among underserved and high-risk populations in Arizona, including racial and ethnic minorities as well as rural communities.

Key contributions
  • personalized question-answering and semantic search systems for different audiences (community health workers, patients, etc.) that operate over curated document collections
  • automatic speech recognition (ASR)
  • monitoring trusted information sources to detect policy changes
  • machine translation, summarization, and customized message generation
  • modernizing cyberinfrastructure for health communication (e.g., telemedicine systems)
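The deployed question-answering systems use neural (SBERT-style) embeddings over curated document collections. As a self-contained sketch of the retrieval step, with toy bag-of-words count vectors standing in for neural embeddings and hypothetical documents:

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def search(query, documents):
    """Rank documents by similarity to the query, best first."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)

docs = [
    "You can get a vaccine at rural community clinics.",
    "Telemedicine visits can be scheduled online.",
]
print(search("where can I get a vaccine", docs)[0])  # the vaccine sentence ranks first
```

In the production setting, `embed` is replaced by a transformer sentence encoder and the ranking runs over a vector index rather than an in-memory list.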
2020-present

Democratizing machine reading for non-experts

Agency: NSF
Role: Co-PI
URL: https://mr4all.parsertongue.org/about
Award: $499K

Democratizing machine reading for non-experts: Easy and interpretable methods to extract structured information from text

This work aims to democratize machine reading technology to make it accessible to subject matter experts (e.g., molecular biologists) who may be entirely unfamiliar with natural language processing and machine learning. In an effort to hybridize symbolic and statistical approaches, my collaborators and I are leveraging neural methods for program synthesis and reinforcement learning to generate executable, human-editable rules for rapid information extraction.

Key contributions
  • Determining research direction
  • System design and implementation
  • Deployment of GPU-accelerated software demos on AWS
2020

Supply chain Quantification Using Imperfect Data (SQUID)

Agency: DARPA
Role: PI (Phase I subcontract through Raytheon BBN)
URL: https://darpa.mil/program/logx
Award: $282K

Improve the efficiency of the military supply chain by constructing operational process models from fragmented data.

Key contributions
  • information and event extraction related to logistics processes (supply chain events) and event ordering
  • query parsing and intent understanding to power a chatbot interface for logisticians
  • document layout analysis for PDFs

Awards

2019

Best System Demonstration

Multilingual extension of Influence Search that adds Portuguese support to the open-domain machine reader.

Organization: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

Skills

Apache Spark
AWS (ECS, Fargate, EMR, Lambda)
Deep Learning (DL)
Docker
Machine Learning (ML)
Natural Language Processing (NLP)
Neuro-symbolic AI
OpenAPI
Python
Scala
Terraform