Gus Hahn-Powell

Computational Linguist

About

I am a computational linguist with over 10 years of experience in Natural Language Processing (NLP) and Machine Learning (ML), 20+ peer-reviewed publications, and more than 10 years as a professional software developer responsible for designing, implementing, and deploying custom ML solutions.

My expertise centers around information extraction and assembling knowledge graphs from unstructured text. I believe in choosing the right tool for the job and pride myself on identifying simple and effective solutions.

I have a track record of multi-disciplinary collaborative research as a tenure-track professor in academia funded by agencies such as DARPA, NSF, and the CDC. I also bring years of applied research experience in industry as both a co-founder of a bootstrapped startup and an individual contributor at a large corporation developing new IP and production-quality code in languages like Python and Scala.

Experience

2019-present

Assistant Professor @ University of Arizona

Assistant Professor (tenure-track) of Computational Linguistics at the University of Arizona and founding director of both the online MS in Human Language Technology (HLT) and the Graduate Certificate Program in Natural Language Processing (NLP).

I design and teach graduate-level courses in statistical natural language processing (NLP) that cover both "classical" machine learning (ML) and deep learning (DL) methods for NLP.

My research is funded by agencies such as DARPA, NSF, and the CDC. I also hold appointments in the Cognitive Science GIDP and the Computational Social Science Graduate Certificate Program.

Key accomplishments

Investigator who helped secure over $9M in grant-based funding from multiple federal agencies.
Managed and trained remote and in-person teams of junior NLP researchers and software developers.
Developed, documented, and deployed CI pipelines and NLP software for multiple federal agencies on AWS and AWS GovCloud.
Within a year of being hired during the University of Arizona's largest budget cuts, I designed a graduate curriculum, oversaw the development of its courses, and launched a fully online MS program in Human Language Technology that has attracted a global body of students.

2022-2023

Applied Scientist @ Amazon

As an Applied Scientist with the Product Graph team within Personalization, I worked on automatically discovering novel product dimensions and enriching the Amazon Catalog to help customers make informed purchase decisions.

Key accomplishments

Designed and implemented a neuro-symbolic system to discover novel attributes and compatibility information from product profiles in support of a multi-team knowledge graph initiative.
In less than one week, I designed and implemented a scalable system for weakly labeling sections of e-Commerce web pages for document layout analysis and targeted information extraction.
Implemented and deployed a series of scheduled Amazon-scale ETL tasks using Apache Spark and PySpark to improve automation and prevent data drift.

2017-present

Co-founder and Applied Scientist @ Lum AI

I am co-founder of Lum AI, a small bootstrapped NLP startup focusing on large-scale machine reading and rapid text annotation/data labeling.

In addition to product development, I manage our AWS deployments (mostly ECS + Terraform + GitHub Actions).

Key accomplishments

Designed and implemented a resilient actor-based distributed version of the Odinson information extraction system that requires less than 30% of the resources of an Elasticsearch cluster.
Designed, implemented, and deployed horizontally scalable containerized services for information extraction on AWS with rolling deployments triggered by changes to the default branch of a Git repository.
Managed a small, distributed team of software developers and applied scientists over multiple projects.

Publications

For a complete list of my publications, please see my CV or ORCID profile.

Course and workshop development

For a complete list of courses and workshops I've developed, please see https://parsertongue.org/courses/.

Education

2014-2018

PhD (Computational Linguistics)

Institution	University of Arizona

Minor	Information

Machine Reading for Scientific Discovery

The aim of this work is to accelerate scientific discovery by advancing machine reading approaches designed to extract claims and assertions made in the literature, assemble these statements into cohesive models, and generate novel hypotheses that synthesize findings from isolated research communities. [...]

2012-2014

MS (Human Language Technology)

Institution	University of Arizona

BioNER: A hybrid approach to identifying mentions of protein complexes

2008-2010

MA (Applied Linguistics)

Institution	University of Alabama

The 'Worthy of Attention' Collostruction: Frequency, synonymy, and learnability

Modeling L2 synonym learning using ART-2 neural networks.

2004-2008

BA (Japanese)

Institution	University of Alabama

Language in Zen poetry

Projects and Software

2024-present

Daedalus

Vision-based Piping & Instrumentation Diagram (P&ID) understanding system for the construction industry.

Lum AI, AWS Lambda, computer vision, Docker

2023-present

Odinsynth-LLM

Interactive system that generates explainable, editable, and generalizable information extraction rules from a single example using LLMs, preference optimization, and reinforcement learning with machine feedback (RLMF).

LLMs, neuro-symbolic AI

2022-2024

Odinsynth

A neural program synthesis approach to generating explainable and editable information extraction rules from a handful of examples.

AWS ECS, Docker, neuro-symbolic AI, program synthesis, Python, PyTorch, Scala, transformers

Code, Demo, Video, Paper

2022-2023

clu-azahead

A toolkit to improve information access and aggregation for public health.

ASR, Docker, FastAPI, neural search, Python, PyTorch, question answering, SBERT, Streamlit, transformers, whisper

Code

2021-2022

Annotaurus Tex(t)

A web-based platform (hosted solution) for rapid text annotation and data labeling for NLP.

Lum AI, active learning, annotation, AWS ECS, data labeling, information extraction, JavaScript, Odinson, PostgreSQL

2019-present

Odinson

A fast and highly scalable language and runtime system for information extraction that supports patterns composed of graph traversals and token-level constraints. The successor to Odin. Odinson is four orders of magnitude faster than the previous state of the art.

Key contributions

IDE design, development, and deployment (closed source)
a REST API and companion Python library
language features and testing
development of a distributed version for web-scale information extraction using Akka (development of this component was funded by DARPA's Causal Exploration program).

actor-based concurrency, Akka, Apache Lucene, information extraction, Scala

Code, Paper

2017-2019

Influence Search

A platform for literature-based discovery that incorporates multi-domain extractions of causal interactions into a single searchable knowledge graph. Originally developed to support the Bill and Melinda Gates Foundation's efforts to improve child and maternal health. Create conceptual models (interest maps) by searching for direct and indirect influence relations, merging concepts, injecting your own expertise, and collaboratively editing models.

Key contributions

system architecture and deployment (AWS)
open domain machine reader and assembly system
incorporation and alignment of citation graph (MAG) information and clinical trials (this component was funded by the Bill and Melinda Gates Foundation as part of their KI Platform Prototype)

Apache Spark, JavaScript, knowledge graphs, Neo4j, operational transformation, Scala

Demo, Video, Paper

2014-2018

Reach

Information extraction system for BioNLP that includes components for event extraction, NER, domain-specific coreference resolution, causal event ordering, and grounding. Reach was the most precise and highest-throughput machine reading system in DARPA's Big Mechanism program, and has been used by biologists to discover novel and plausible biological hypotheses for multiple cancers.

Key contributions

broad coverage and extensible information extraction of biomolecular statements described in scholarly documents. These statements often describe complex nested relations (e.g., a positive regulation involving a particular post-translational modification)
assembly and causal ordering of model fragments of cell signaling pathways
coreference resolution tailored to the biomedical domain ("how can we automatically determine the antecedent of an expression like the protein?")

BioNLP, causal ordering, deduplication, information extraction, Python, Scala

Code, Paper

Grants and Contracts

2022-present

ADHS-CDC COVID Disparities Initiative

Agency	CDC & Arizona Department of Health Services
Role	Co-I
URL	https://crh.arizona.edu/programs/covid-disparities-initiative
Award	$8M

Address COVID-19 health disparities among underserved and high-risk populations in Arizona, including racial and ethnic minorities as well as rural communities.

Key contributions

personalized question-answering and semantic search systems for different audiences (community health workers, patients, etc.) that operate over curated document collections
automatic speech recognition (ASR)
monitoring trusted information sources to detect policy changes
machine translation, summarization, and customized message generation
modernizing cyberinfrastructure for health communication (e.g., telemedicine systems)

Code

2020-present

Democratizing machine reading for non-experts

Agency	NSF
Role	Co-PI
URL	https://mr4all.parsertongue.org/about
Award	$499K

Democratizing machine reading for non-experts: Easy and interpretable methods to extract structured information from text

This work aims to democratize machine reading technology to make it accessible to subject matter experts (e.g., molecular biologists) who may be entirely unfamiliar with natural language processing and machine learning. In an effort to hybridize symbolic and statistical approaches, my collaborators and I are leveraging neural methods for program synthesis and reinforcement learning to generate executable, human-editable rules for rapid information extraction.

Key contributions

Determining research direction
System design and implementation
Deployment of GPU-accelerated software demos on AWS

Code, Demo, Video

2020

Supply chain Quantification Using Imperfect Data (SQUID)

Agency	DARPA
Role	PI (Phase I subcontract through Raytheon BBN)
URL	https://darpa.mil/program/logx
Award	$282K

Improve the efficiency of the military supply chain by constructing operational process models from fragmented data.

Key contributions

information and event extraction related to logistics processes (supply chain events) and event ordering
query parsing and intent understanding to power a chatbot interface for logisticians
document layout analysis for PDFs

Code

Awards

2019

Best System Demonstration

Multilingual extension of Influence Search that extends the open-domain machine reader to Portuguese.

Organization	Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

Demo, Paper

Skills

Apache Spark

AWS (ECS, Fargate, EMR, Lambda)

Deep Learning (DL)

Docker

Machine Learning (ML)

Natural Language Processing (NLP)

Neuro-symbolic AI

OpenAPI

Python

Scala

Terraform
