Posts tagged "ai"
Transcript Episode 98: Helping computers decode sentences - Interview with Emily M. Bender
This is a transcript for Lingthusiasm episode ‘Helping computers decode sentences - Interview with Emily M. Bender’. It’s been lightly edited for readability. Listen to the episode here or wherever you get your podcasts. Links to studies mentioned and further reading can be found on the episode show notes page.
[Music]
Lauren: Welcome to Lingthusiasm, a podcast that’s enthusiastic about linguistics! I’m Lauren Gawne. Today, we’re getting enthusiastic about computers and linguistics with Professor Emily M. Bender.
But first, November is our traditional anniversary month! This year, we’re celebrating eight years of Lingthusiasm. Thank you for sharing your enthusiasm for linguistics with us. We’re also running a Lingthusiasm listener survey for the third and final time. As part of our anniversary celebrations, we’re running the survey as a way to learn more about our listeners, get your suggestions for topics, and to run some linguistics experiments. If you did the survey in a previous year, there’re new questions, so you can totally participate again this year. There’s also a spot for asking us your linguistics advice questions, since our first linguistics advice bonus episode was so popular.
You can hear about the results of the previous surveys in two bonus episodes, which we’ll link to in the show notes. We’ll have the results from this year’s survey in an episode for you next year. To do the survey or read more details, go to bit.ly/lingthusiasmsurvey24 – that’s bit.ly/lingthusiasmsurvey24 (the numbers 2 and 4) – before December 15 anywhere on Earth. This project has ethics board approval from La Trobe University, and we’re already including results from previous surveys into some academic papers. You, too, could be part of science if you do the survey.
Our most recent bonus episode was a linguistics travelogue. We discuss Gretchen’s recent trip to Europe where she saw cool language museums, and what she did to prepare for encountering several different languages on the way, as well as planning our fantasy linguistic excursion to Martha’s Vineyard. Go to patreon.com/lingthusiasm to hear this and many more bonus episodes and to help keep the show running ad-free.
Also, very exciting news from Patreon, which is that they’re finally adding the ability to buy Patreon memberships as a gift for someone else. If you’d be excited to receive a Patreon membership to Lingthusiasm as a gift, we’ll have a link in the show notes for you to forward to your friends and/or family with a little wink wink, nudge nudge. We also have lots of Lingthusiasm merch that makes a great gift for the linguistics enthusiast in your life.
[Music]
Lauren: Today, I am delighted to be joined by Emily M. Bender who is a professor at the University of Washington in the Department of Linguistics. She is the director of the Computational Linguistics Laboratory there. Emily’s research and teaching expertise is in multilingual grammar engineering and societal impacts of language technologies. She runs the live-streaming podcast Mystery AI Hype Theater 3000 with sociologist Dr. Alex Hanna. Welcome to the show, Emily!
Emily: I am so enthusiastic to be on Lingthusiasm.
Lauren: We are so delighted to have you here today. Before we ask you about some of your current work with computational linguistics, how did you get into linguistics?
Emily: It was a while ago. Back when I was in high school, we didn’t have things like the Lingthusiasm podcast – or podcasts for that matter – to spread the word about what linguistics was. I actually hadn’t heard about linguistics until I got to university. Someone gave me the excellent advice to get the course catalogue ahead of time – it was a physical book in those days – and just flip through it and circle anything that looked interesting. There was this one class called “An Introduction to Language.” In my second term, I was looking for a class that would fulfil some kind of requirements, and it did, and I took it. Let me tell you, I was hooked on the first day. Even though the first day was actually about the bee dance and other animal communication, I just fell in love with it immediately. I think, honestly, I had always been a linguist. I loved studying languages. My ideal undergraduate course of study would’ve been, like, take the first year of all the languages I could.
Lauren: That would be an amazing degree. Just like, “I have a bachelors in introductory language.”
Lingthusiasm Episode 98: Helping computers decode sentences - Interview with Emily M. Bender
When a human learns a new word, we’re learning to attach that word to a set of concepts in the real world. When a computer “learns” a new word, it is creating some associations between that word and other words it has seen before, which can sometimes give it the appearance of understanding, but it doesn’t have that real-world grounding, which can sometimes lead to spectacular failures: hilariously implausible from a human perspective, just as plausible from the computer’s.
In this episode, your host Lauren Gawne gets enthusiastic about how computers process language with Dr. Emily M. Bender, who is a linguistics professor at the University of Washington, USA, and cohost of the podcast Mystery AI Hype Theater 3000. We talk about Emily’s work trying to formulate a list of rules that a computer can use to generate grammatical sentences in a language, the differences between that and training a computer to generate sentences using the statistical likelihood of what comes next based on all the other sentences, and the further differences between both those things and how humans map language onto the real world. We also talk about paying attention to communities not just data, the labour practices behind large language models, and how Emily’s persistent questions led to the creation of the Bender Rule (always state the language you’re working on, even if it’s English).
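To make the contrast above concrete, here is a minimal Python sketch of the two approaches: a hand-written list of rules that generates grammatical sentences, and a count of which word most often follows another in a set of example sentences. The toy grammar, the vocabulary, and the function names are all invented for illustration; this is not Emily's grammar engineering system or any real language model, just a small picture of the difference in kind.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only):
# each category maps to a list of possible expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["Vt", "NP"], ["Vi"]],
    "Det": [["the"], ["a"]],
    "N": [["linguist"], ["computer"], ["sentence"]],
    "Vt": [["decodes"], ["sees"]],
    "Vi": [["sleeps"]],
}


def generate(symbol, rng):
    """Rule-based approach: expand a category by applying a randomly chosen rule."""
    if symbol not in GRAMMAR:
        return [symbol]  # an actual word, not a category
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(part, rng))
    return words


def bigram_counts(sentences):
    """Statistical approach: count how often each word follows each other word."""
    counts = {}
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            followers = counts.setdefault(prev, {})
            followers[nxt] = followers.get(nxt, 0) + 1
    return counts


def most_likely_next(counts, word):
    """Return the word seen most often after `word`, or None if it was never followed."""
    followers = counts.get(word)
    if not followers:
        return None
    return max(followers, key=followers.get)


if __name__ == "__main__":
    print(" ".join(generate("S", random.Random())))
    counts = bigram_counts(["the computer decodes the sentence", "the computer sleeps"])
    print(most_likely_next(counts, "the"))
```

Neither half of the sketch connects any word to anything in the real world: the first only knows which categories can combine, and the second only knows which words have appeared next to each other.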
Click here for a link to this episode in your podcast player of choice or read the transcript here.
Announcements:
The 2024 Lingthusiasm Listener Survey is here! It’s a mix of questions about who you are as our listener, as well as some fun linguistics experiments for you to participate in. If you have taken the survey in previous years, there are new questions, so you can participate again this year.
In this month’s bonus episode we get enthusiastic about three places where we can learn things about linguistics! We talk about two linguistically interesting museums that Gretchen recently visited: the Estonian National Museum, as well as Mundolingua, a general linguistics museum in Paris. We also talk about Lauren’s dream linguistics travel destination: Martha’s Vineyard.
Join us on Patreon now to get access to this and 90+ other bonus episodes. You’ll also get access to the Lingthusiasm Discord server where you can chat with other language nerds.
Also, Patreon now has gift memberships! If you’d like to get a gift subscription to Lingthusiasm bonus episodes for someone you know, or if you want to suggest them as a gift for yourself, here’s how to gift a membership.
Here are the links mentioned in the episode:
- Emily Bender
- Emily Bender on Bluesky and Twitter
- Mystery AI Hype Theater 3000
- Mystery AI Hype Theater 3000: The Newsletter
- The AI Con by Emily M. Bender and Alex Hanna
- ‘Data Sovereignty and the Kaitiakitanga License’ on Te Hiku
- wordfreq by Robyn Speer on GitHub
- Lingthusiasm Episode ‘Making machines learn language - Interview with Janelle Shane’
- Bonus with Janelle Shane: we do a dramatic reading of the funniest auto-generated Lingthusiasm episodes
You can listen to this episode via Lingthusiasm.com, Soundcloud, RSS, Apple Podcasts/iTunes, Spotify, YouTube, or wherever you get your podcasts. You can also download an mp3 via the Soundcloud page for offline listening.
To receive an email whenever a new episode drops, sign up for the Lingthusiasm mailing list.
You can help keep Lingthusiasm ad-free, get access to bonus content, and more perks by supporting us on Patreon.
Lingthusiasm is on Bluesky, Twitter, Instagram, Facebook, Mastodon, and Tumblr. Email us at contact [at] lingthusiasm [dot] com
Gretchen is on Bluesky as @GretchenMcC and blogs at All Things Linguistic.
Lauren is on Bluesky as @superlinguo and blogs at Superlinguo.
Lingthusiasm is created by Gretchen McCulloch and Lauren Gawne. Our senior producer is Claire Gawne, our production editor is Sarah Dopierala, our production assistant is Martha Tsutsui Billins, our editorial assistant is Jon Kruk, and our technical editor is Leah Velleman. Our music is ‘Ancient City’ by The Triangles.
This episode of Lingthusiasm is made available under a Creative Commons Attribution Non-Commercial Share Alike license (CC 4.0 BY-NC-SA).
“
Gretchen: I’ve got the blogpost up about the ice cream flavours from the middle school students, and some of them are really good. There are these whimsical flavours like “It’s Sunday” and “Cherry Poet” and “Brittle Cheesecake” and “Honey Vanilla Happy.” These seem like kind of reasonable ice cream flavours, right?
Lauren: I’d be open to ordering a “Vanilla Nettle.”
Gretchen: “Cherry Cherry Cherry.” If you like cherries, this is the flavour for you. There are also some weirder flavours from this data set like, “Chocolate Finger” and “Caramel Book” and –
Lauren: “Washing Chocolate.”
Gretchen: “Texas Charlie Covered Stunt.” Then, there’s this even weirder category, “Nuts with Mattery,” “Brown Crunch,” “Cookies and Green.”
Lauren: Aww, so close, and yet…
Gretchen: “Mango Cats.”
Lauren: They’re weird to us because of the semantics of them – just to be linguist-y and spoil the moment for a second – but they still are English words, or they look like something we’d recognise as English words, even though I don’t think “mattery” is a word that I know of. I think it’s worth saying artificial intelligence doesn’t know what ice cream is, right, it’s just using this list of flavours to figure out what kind of patterns could fit into that list.
Janelle: Exactly. It’s doing it at a very basic level. Like, what kinds of letters tend to come after other letters? What letters are we often finding in combination? Which letters are we never finding in combination? It’ll learn frequent words like “chocolate” or something. It’ll learn how to spell that after some false starts during training, but, yeah, without any concept of what chocolate is.
Gretchen: If it ends up with something like “Vervette’s Caramel Borfle,” it learned “caramel” but who “Vervette” and “borfle” are, I don’t know. That’s just randomly combining some letters in ways that are probable as English words.
Janelle: Yeah, it’s like a kid who learns how to write and immediately starts putting down letters on paper like, “Is this a word? Is this a word? How do you pronounce this?”
”—
Excerpt from Episode 40 of Lingthusiasm: Making machines learn language - Interview with Janelle Shane
Listen to the episode, read the full transcript, or check out more links about pragmatics, language and society, and further interviews.
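The letter-by-letter idea Janelle describes in the excerpt ("what kinds of letters tend to come after other letters?") can be sketched in a few lines of Python. Note the swap: her generators are neural nets, while this sketch uses a much simpler character-level Markov chain, which only looks at the previous two letters. The handful of training flavours, the function names, and the start and end markers are assumptions made up for the example.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical markers for the start and end of a name


def train(names, order=2):
    """Record, for every run of `order` letters, which letters followed it."""
    model = defaultdict(list)
    for name in names:
        padded = START * order + name.lower() + END
        for i in range(len(padded) - order):
            context = padded[i:i + order]
            model[context].append(padded[i + order])
    return model


def generate(model, rng, order=2, max_len=30):
    """Build a new name one letter at a time by sampling a letter seen after the current context."""
    context = START * order
    letters = []
    while len(letters) < max_len:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = context[1:] + nxt
    return "".join(letters)


if __name__ == "__main__":
    # Invented example flavours, not the middle-schoolers' data set.
    flavours = ["chocolate chip", "cherry vanilla", "caramel crunch",
                "honey vanilla", "vanilla bean", "mango sorbet"]
    model = train(flavours)
    for _ in range(5):
        print(generate(model, random.Random()))
```

With so few examples, the output will mostly be fragments and blends of the training names, which echoes the point in the full interview that a much bigger data set made a huge difference. At no point does the program have any concept of what chocolate is.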
Transcript Episode 40: Making machines learn language - Interview with Janelle Shane
This is a transcript for Lingthusiasm Episode 40: Making machines learn language - Interview with Janelle Shane. It’s been lightly edited for readability. Listen to the episode here or wherever you get your podcasts. Links to studies mentioned and further reading can be found on the Episode 40 show notes page.
[Music]
Lauren: Welcome to Lingthusiasm, a podcast that’s enthusiastic about linguistics! I’m Lauren Gawne.
Gretchen: I’m Gretchen McCulloch. Today, we’re getting enthusiastic about artificial intelligence – teaching computers language – with special guest Dr Janelle Shane, who runs the blog AIweirdness.com and is the author of You Look Like a Thing and I Love You, which is a fun new book about A.I. But first, we have some announcements.
Lauren: It’s a new year and we have new, big, exciting plans for the Lingthusiasm Patreon page. We are introducing a Discord, which is an online chat space, for patrons to share their lingthusiasm with their fellow lingthusiasts.
Gretchen: We’ve heard from a lot of you that you got into linguistics because of Lingthusiasm or it reawakened your memories of how much you like linguistics because you did some courses on it way back when and now you wish you could talk about linguistics more. We’re giving you a space where you can talk about linguistics, share your interesting linguistics links that you come across, and talk about them in a space with other Lingthusiasm fans. We’re really excited to see what this community becomes. It’s a bit of an experiment, but we think it’ll be really fun to do. You can join the Patreon at the tier where you get bonus episodes as well, and you also have a space to talk about those bonus episodes and the regular Lingthusiasm episodes and any other linguistics things you wanna talk about.
Lauren: We want to see more Lingthusiasm not just online but also on all kinds of things, which is why we are also sending stickers over the next few months to patrons at the Ling-phabet tier. Patrons who are at that tier for three months or more will get stickers that say, “Lingthusiast” on them.
Gretchen: You can stick that to your laptop, your water bottle, your notebook, anything else in your life. Because the original trial run of stickers that we did with the special offer last year were really popular, we thought we’d provide a way for you to do that around the year. You can join that tier on Patreon as well.
Lauren: You can get other items at our lingthusiasm.com/merch page, but the stickers are an exclusive for our patrons.
Gretchen: Thanks to everybody who’s been a patron so far. We’re really excited to see you in the Discord. And we’re excited to get to try that out.
Lauren: Our last exciting announcement is that our patrons also helped us meet a new funding goal, which means that we now have some additional ling-ministration support.
Gretchen: Our fantastic producer Claire, who’s been with us since the very beginning, is also going to be taking on some more of the administration for the podcast, so you’ll see her around a bit on social media and on Patreon. You can listen to a bonus episode with Claire if you’d like to get to know her better as well.
Lauren: Our current bonus is on the future of English and what English might look like in a couple of centuries from now, inspired by Gretchen’s New York Times article.
Gretchen: You can get access to this episode and 34 other bonus episodes – that’s twice as much Lingthusiasm that you can listen to – at patreon.com/lingthusiasm.
[Music]
Gretchen: Hello, Janelle. Welcome to Lingthusiasm!
Janelle: Hi, it’s great to be here.
Lauren: Janelle, we are so excited to have you on the show today to talk about how we can make machines do language.
Gretchen: I think one of the things that we have in common, definitely one of the reasons I enjoy following your blog and Twitter feed and so on, is that both linguists and your approach to A.I. like poking at systems and seeing where they break.
Janelle: Yeah, for sure.
Gretchen: In case some people aren’t already following you on all of the internets, I wanna give people an idea of some of the stuff that you have tried to make break.
Lauren: Janelle, in your work, for people who haven’t seen it, you take large data sets of particular sets of terms or particular language genres, I guess, and then you feed them into an artificial intelligence, and we’ll talk about what that is later, and then it spits out these delightfully whimsical outputs. It takes inspiration from the data set that it’s given. I have a sustained history of laughing inappropriately loudly on public transport while reading your blog because the results are always so entertaining. Gretchen, do you have a favourite to share with us so I can chortle inappropriately?
Gretchen: Lauren, I think we should start with ice cream because I know you have a deep and abiding love of ice cream, and Janelle has come up with ice cream flavours.
Lauren: Yes! Yes, yes, yes. Janelle, where did the ice cream data come from? Did you have a list of ice cream flavours that someone gave you or…?
Janelle: Yeah. In this case, it was a group of middle-schoolers, actually. There’s a school in Austin, Texas, called Kealing Middle School where there is a group of students in the coding classes who decided that – they saw my blog. They wanted to do it too, and they wanted to generate ice cream flavours.
Lauren: Aww.
Gretchen: That’s so great!
Janelle: The thing is, I had looked at that, and I’m like, “Oh, this would be cool.” Then, I looked online and I say, “I need examples of existing ice cream flavours” because the A.I. has to have something to imitate. It doesn’t know about ice cream flavours unless I have some to tell it about. They’re scattered around. There wasn’t any big master list of them. So, I kinda said, “Oh, well. I guess that’s not gonna work.” Then, these middle-schoolers kicked my butt because they went and there was, I dunno, dozens of them – 50, 60 of them. Like, a lot of them. Each of them went and collected a few from this site or that site. Each one site would only have a few at a time. They had to manually copy and paste to this data set. They just, through the sheer numbers and having the time to do it, they put together this amazing data set of existing ice cream flavours. These middle-schoolers ended up getting about 1600 different ice cream flavours. Whereas, I only managed to get together 200. With the data set that much bigger, it made a huge difference. They started generating pretty amusing flavours.
Bonus 36: Generating a Lingthusiasm episode using a neural net
Lauren: Welcome to Lingthusiasm, a podcast that’s enthusiastic about linguistics! I’m Lauren Gawne.
Gretchen: And I’m Lauren Gawne.
Lauren: And I’m Gretchen McCulloch. So I mentioned some of my favourite Harry Potter books, and I found Harry Potter and the Cursed Child –
Gretchen: The Cruciatus Linguists?
This is what happens when the robots take over Lingthusiasm.
In this bonus episode of Lingthusiasm, Janelle Shane, our interview guest from episode 40, demonstrates to Gretchen and Lauren how to train an artificial intelligence called GPT-2 on our 70+ Lingthusiasm transcripts, of both main and bonus episodes. Together, we create on air Robo-Lauren and Robo-Gretchen, cohosts of Robo-Lingthusiasm, a podcast that is enthusiastic (but not very coherent) about linguistics.
We then perform some of our favourite Robo-Lingthusiasm snippets that the AI generated, including the part where we prompted the neural net on a mix of Lingthusiasm episodes and Harry Potter fanfiction and got some really, um, magical results. (We couldn’t decide whether to make a regular-length episode or keep in more of the funny examples, so we’ve split the difference by putting extra examples after the credits, so you can decide whether you want to listen to them or not.) Needless to say, none of the Robo-Lingthusiasm parts should be taken as accurate information about linguistics.
We also have an early patron-only announcement about a new public project in 2020.
Lingthusiasm Episode 40: Making machines learn language - Interview with Janelle Shane
If you feed a computer enough ice cream flavours or pictures annotated with whether they contain giraffes, the hope is that the computer may eventually learn how to do these things for itself: to generate new potential ice cream flavours or identify the giraffehood status of new photographs. But it’s not necessarily that easy, and the mistakes that machines make when doing relatively silly tasks like ice cream naming or giraffe identification can illuminate how artificial intelligence works when doing more serious tasks as well.
In this episode, your hosts Gretchen McCulloch and Lauren Gawne interview Dr Janelle Shane, author of You Look Like A Thing And I Love You and person who makes AI do delightfully weird experiments on her blog and Twitter feed. We talk about how AI “sees” language, what the process of creating AI humour is like (hint: it needs a lot of human help to curate the best examples), and ethical issues around trusting algorithms.
Click here for a link to this episode in your podcast player of choice or read the transcript here.
Announcements:
Janelle helped us turn one of the big neural nets on our own 70+ transcripts of Lingthusiasm episodes, to find out what Lingthusiasm would sound like if Lauren and Gretchen were replaced by robots! This part got so long and funny that we made it into a whole episode on its own, which is technically the February bonus episode, but we didn’t want to make you wait to hear it, so we’ve made it available right now! This bonus episode includes a more detailed walkthrough with Janelle of how she generated the Robo-Lingthusiasm transcripts, and live-action reading of some of our favourite Robo-Lauren and Robo-Gretchen moments.
Also for our patrons, we’ve made a Lingthusiasm Discord server – a private chatroom for Lingthusiasm patrons! Chat about the latest Lingthusiasm episode, share other interesting linguistics links, and geek out with other linguistics fans. (We even made a channel where you can practice typing in the International Phonetic Alphabet, if that appeals to you!)
Here are the links mentioned in this episode:
- Bonus robo-generated Lingthusiasm episode
- Lingthusiasm now has a Discord for patrons!
- Janelle Shane’s AI Weirdness blog
- Janelle Shane on Twitter (@JanelleCShane)
- Janelle Shane’s website
- You Look Like a Thing and I Love You (Janelle’s book)
- Janelle Shane’s TED talk about the weirdness of artificial intelligence
- AI Weirdness ice cream
- AI Weirdness recipes
- How many giraffes on the cover of Because Internet?
- AI Weirdness craft beer
- The Fine Stranger beer
- GPT-2
Lingthusiasm is on Twitter, Instagram, Facebook, and Tumblr. Email us at contact [at] lingthusiasm [dot] com
Gretchen is on Twitter as @GretchenAMcC and blogs at All Things Linguistic.
Lauren is on Twitter as @superlinguo and blogs at Superlinguo.
Lingthusiasm is created by Gretchen McCulloch and Lauren Gawne. Our senior producer is Claire Gawne, our production editor is Sarah Dopierala, our production manager is Liz McCullough, and our music is ‘Ancient City’ by The Triangles.
This episode of Lingthusiasm is made available under a Creative Commons Attribution Non-Commercial Share Alike license (CC 4.0 BY-NC-SA).
About Lingthusiasm
A podcast that's enthusiastic about linguistics by Gretchen McCulloch and Lauren Gawne.
Weird and deep conversations about the hidden language patterns that you didn't realize you were already making.
New episodes (free!) the third Thursday of the month.