Welcome to Statistical Natural Language Processing!
This fully online offering of the course has a compressed format with a full semester's worth of content delivered asynchronously over 7.5 weeks.
From the course catalog:
This course introduces the key concepts underlying statistical natural language processing. Students will learn a variety of techniques for the computational modeling of natural language, including: n-gram models, smoothing, Hidden Markov models, Bayesian Inference, Expectation Maximization, Viterbi, Inside-Outside Algorithm for Probabilistic Context-Free Grammars, and higher-order language models.
The main programming language used in the course will be Python (3.8).
This is an online, asynchronous course. Content is released in a staggered fashion via the course home page.
Hi! My name is Gus Hahn-Powell. I'll be your instructor for this course.
I'm a computational linguist interested in ways we can use natural language processing to accelerate scientific discovery by mining millions of scholarly documents.
In my free time, I enjoy climbing (mostly bouldering these days), reading/watching scifi & historical fiction, "fixing" things, communicating with cats, and playing video games (mostly indie stuff).
hahnpowell AT arizona DOT edu
See our course page on D2L
While helpful, a background in linguistics or advanced math is not required to take this course. We'll cover the necessary pieces in class.
This is a hands-on algorithms course. As such, you must be comfortable programming (two semesters' worth of programming coursework or equivalent experience). Familiarity with Python and with using and defining classes is ideal. If you've never used Python, you'll need to learn the basics to complete the programming assignments (some basic exercises/tutorials will be provided).
ISTA 355: Intro to NLP
LING 529: HLT I
Python is an open-source programming language that is widely used in both academia and industry. It has some very useful and popular libraries for linear algebra and machine learning (NumPy, TensorFlow, PyTorch, MXNet, etc.). We'll learn how to use some of these in this course.
A great deal of time can be wasted trying to install and configure software. Most of these issues relate to differences in the starting configuration of users (operating systems, existing software installations, etc.).
While the focus of this class is on statistical natural language processing, we elect to take a bit of time at the beginning of the course to walk everyone through setting up a uniform development environment known to be compatible with all assignments and tutorials used in this course. Adopting a uniform development environment helps us provide better technical support and a single set of up-to-date instructions.
We feel it is important that the development environment be something freely available to everyone. There are many free and open source distributions of Linux that are lightweight and run on a variety of hardware configurations. We'll be using one of these distributions.
As an added bonus, familiarizing yourself with the technologies we'll use in this course for things like version control and reproducibility may make you more productive in the future.
Normally, we should have assignments graded and posted within a week.
Make sure that you don’t wait until the last minute to start your work. Late work will not be accepted.
If you have questions about the course, I'd prefer you share them through the course forum. If it's something you don't want others to see, you can send me a direct message (and consider including our TA(s)).
If the forum is ever down and you need to reach me about the course, you can send me an email with [LING 439/539] in the subject line (but think of email as a last resort).
For planning purposes, please note that your instructor responds to posted questions Monday & Friday between 9 AM and 5 PM (MST). Typically, you can expect a response within a day.
From the Content link on the course D2L page, you will find the units containing the material that you will complete each week of the course. You'll start with Unit 0, where you'll set up your development environment for the course. Due dates will be listed in each unit and in the course calendar. To keep things predictable, all units have the same general structure.
Each unit has links to lessons, videos, readings, and assignments/activities. Be sure to check out the Unit Overview link for each new unit.
The course calendar (accessible from the nav bar) provides a good overview of all due dates. To help avoid missing important deadlines, I recommend that you enter all due dates in the calendar system you are most comfortable using (e.g., Google Calendar, Microsoft Outlook, etc.) and set reminders.
You will be able to access your grades via the D2L Grades tab.
We don't take attendance in this course.
It is a common misconception that online classes are inherently easier than face-to-face classes. In actuality, they are quite different in structure, pacing, and the way that you have to manage your time in order to be successful. While you will be meeting the same outcomes and learning the same material as a face-to-face class, the way that you participate changes. You lose the synchronous element (i.e., sitting in a chair in a room on campus with me and your classmates), but you gain a good deal more asynchronous reading, writing, listening, thinking, and responding.
Note too that this is a 3-unit 7.5-week class. That means that the expectations for content and workload are the same as a 3-unit 15-week class, but in half the time.
The assignments are designed to be run using Docker in a Linux environment (the first unit guides you through configuring your development environment). While it may be possible to run the assignments on Windows or another OS with some modifications, technical support will only be provided for the Linux environment introduced in the first unit. If you decide to use a virtual machine (as demonstrated in the first unit), we recommend you run it on a system with a minimum of 8 GB of RAM and 50 GB of storage. If you opt to install Linux natively (not required), you can probably get by with half that amount of RAM.