# Overview

Welcome to Human Language Technology I!

This fully online offering of the course has a compressed format with a full semester's worth of content delivered asynchronously over 7.5 weeks.

## Description

From the course catalog:

This class serves as an introduction to human language technology (HLT), an emerging interdisciplinary field that encompasses most subdisciplines of linguistics, as well as computational linguistics, natural language processing, computer science, artificial intelligence, psychology, philosophy, mathematics, and statistics. Content includes a combination of theoretical and applied topics such as (but not limited to) tokenization across languages, $n$-grams, word representations, basic probability theory, introductory programming, and version control.

The main programming language used in the course will be Python (3.8).

## Course Objectives

In this course, we will ...

• cover fundamental concepts related to human language technology, such as ...
    • an overview of human language technology applications
    • tokens and their attributes
    • text normalization techniques
    • tokenization and regular expressions
    • character and word $n$-grams
    • basics of probability theory
    • representing words and documents as vectors
    • vector-based comparisons
    • regular expressions
• foster technical skills, such as ...
    • Linux command line basics
    • virtualization and containerization technologies
    • version control (e.g., Git) and the feature branch workflow
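
As a rough taste of a few of the topics listed above (regex-based tokenization, character and word $n$-grams, and vector-based comparison), here is a minimal Python sketch. The function names (`tokenize`, `ngrams`, `cosine`) and the example sentences are illustrative assumptions, not course materials, and the course may well approach each topic differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens with a simple regex."""
    return re.findall(r"\w+", text.lower())


def ngrams(items, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


tokens = tokenize("The cat sat on the mat.")
print(tokens)             # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(ngrams(tokens, 2))  # word bigrams
print(ngrams("cat", 2))   # character bigrams: [('c', 'a'), ('a', 't')]

# Represent two tiny "documents" as bag-of-words count vectors and compare them.
doc1 = Counter(tokenize("the cat sat"))
doc2 = Counter(tokenize("the cat ran"))
print(round(cosine(doc1, doc2), 3))  # 0.667
```

The sketch sticks to the standard library so that it runs on a plain Python 3.8 install. A `\w+` pattern is a deliberately naive tokenizer: it drops punctuation and will not behave sensibly for every language.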

## Locations and Times

This is an online, asynchronous course. Content is released in a staggered fashion via the course home page.

1. Office hours will be held virtually.
2. Course lectures will be prerecorded and delivered asynchronously.

## Instructor

Hi! My name is Gus Hahn-Powell. I'll be your instructor for this course.

I'm a computational linguist interested in ways we can use natural language processing to accelerate scientific discovery by mining millions of scholarly documents.

In my free time, I enjoy climbing (mostly bouldering recently), reading/watching scifi & historical fiction, "fixing" things, communicating with cats, and playing video games (mostly indie stuff).

| | |
|---|---|
| Name | Gus Hahn-Powell |
| Email | hahnpowell AT arizona DOT edu |
| Office Hours | See our course page on D2L |
| Appointments | https://calendar.parsertongue.com |

## Frequently Asked Questions

### Am I eligible to take this course?

While helpful, a background in linguistics or advanced math is not required to take this course. We'll cover the necessary pieces in class.

To take this class, you must be comfortable programming ($\geq$ 1 semester's worth of programming coursework or equivalent experience). Familiarity with Python, including using and defining classes, is ideal. If you've never used Python, you'll need to learn the basics to complete the programming assignments (some basic exercises/tutorials will be provided).

### Why do we have to use Python?

Python is an open-source programming language that is widely used in both academia and industry. It has some very useful and popular libraries for linear algebra and machine learning (NumPy, TensorFlow, PyTorch, MXNet, etc.). We'll learn how to use some of these in this course.

### Why do we have to use Linux and Docker?

A great deal of time can be wasted trying to install and configure software. Most of these issues relate to differences in the starting configuration of users (operating systems, existing software installations, etc.).

While the focus of this class is on statistical natural language processing, we elect to take a bit of time at the beginning of the course to walk everyone through setting up a uniform development environment known to be compatible with all assignments and tutorials used in this course. Adopting a uniform development environment helps us provide better technical support and a single set of up-to-date instructions.

We feel it is important that the development environment be something freely available to everyone. There are many free and open source distributions of Linux that are lightweight and run on a variety of hardware configurations. We'll be using one of these distributions.

As an added bonus, familiarizing yourself with the technologies we'll use in this course for things like version control and reproducibility may make you more productive in the future.

### What's the turnaround time for grading assignments?

Normally, we should have assignments graded and posted within a week.

### What is the policy on due dates and late work?

Make sure that you don’t wait until the last minute to start your work. Late work will not be accepted.

### Where should I ask questions?

If you have questions about the course, I'd prefer you share them through the course forum. If it's something you don't want others to see, you can send me a direct message (and consider including our TA(s)).

If the forum is ever down and you need to reach me about the course, you can send me an email with [LING 529] in the subject line (but think of email as a last resort).

For planning purposes, please note that your instructor responds to posted questions on Mondays, Wednesdays, and Fridays from 9 AM to 5 PM (MST). Typically, you can expect a response within a day.

### Where can I find information on what assignments are due each week?

Under the Content link on the course D2L page, you will find units covering the material that you will complete each week of the course. You'll start with Unit 0, where you'll set up your development environment for the course. Due dates will be listed in each unit and in the course calendar. To keep things predictable, all units have the same general structure.

Each unit has links to lessons, videos, readings, and assignments/activities. Be sure to check out the Unit Overview link for each new unit.

The course calendar (accessible from the nav bar) provides a good overview of all due dates. To help avoid missing important deadlines, I recommend that you enter all due dates in the calendar system you are most comfortable using (e.g., Google Calendar or Microsoft Outlook) and set reminders.

### How is attendance taken?

We don't take attendance in this course.

### So this class will be less work than a traditional face-to-face class?

It is a common misconception that online classes are inherently easier than face-to-face classes. In reality, they differ in structure, pacing, and the way you have to manage your time in order to be successful. While you will be meeting the same outcomes and learning the same material as in a face-to-face class, the way that you participate changes. You lose the synchronous element (i.e., sitting in a chair in a room on campus with me and your classmates), but you gain a good deal more asynchronous reading, writing, listening, thinking, and responding.

Note too that this is a 3-unit, 7.5-week class. That means that the expectations for content and workload are the same as for a 3-unit, 15-week class, but in half the time.

### Do I need any special equipment?

The assignments are designed to be run using Docker in a Linux environment (the first unit guides you through configuring your development environment). While it may be possible to run the assignments on Windows or another OS with some modifications, technical support will only be provided for the Linux environment introduced in the first unit. If you decide to use a virtual machine (as demonstrated in the first unit), we recommend you run it on a system with a minimum of 8 GB of RAM and 50 GB of storage. If you opt to install Linux natively (not required), you can probably get by with half that amount of RAM.