Model moves computers closer to understanding human conversation – The Hub at Johns Hopkins

Wick Eisenberg


Dec 20, 2021

An engineer from the Johns Hopkins Center for Language and Speech Processing has developed a machine learning model that can distinguish the functions of speech in dialogue transcripts produced by language understanding, or LU, systems. The approach could eventually help computers “understand” spoken or written text in much the same way that humans do.

Developed by CLSP Assistant Research Scientist Piotr Zelasko, the new model identifies the intent behind words and sorts them into categories such as “Statement,” “Question,” or “Interruption” in the final transcript, a task called “dialog act recognition.” By giving other models a more organized, segmented version of the text to work with, Zelasko’s model could become a first step in making sense of a conversation, he said.

“This new method means that LU systems no longer have to deal with huge, unstructured chunks of text, which they struggle with when trying to classify things such as the topic, sentiment, or intent of the text. Instead, they can work with a series of expressions, which are saying very specific things, like a question or interruption. My model enables these systems to work where they might have otherwise failed,” said Zelasko, whose study appeared recently in Transactions of the Association for Computational Linguistics.

Image caption: Piotr Zelasko, assistant research scientist at the Johns Hopkins Center for Language and Speech Processing

Image credit: Courtesy of Piotr Zelasko

In that paper, Zelasko adapts some recently introduced language-understanding models with the goal of organizing and categorizing words and phrases, and investigates how different variables, such as punctuation, affect those models’ performance.

“We found that punctuation provides the models with very strong cues that do not seem to be otherwise present in the text, such as the content of a conversation,” Zelasko said.
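To make the task concrete, here is a toy sketch of what dialog act recognition produces, and of why punctuation is such a strong cue. It is not Zelasko’s model, which is a learned machine learning model; this is a hand-written rule-based stand-in, and the function name `tag_dialog_acts`, the three labels, and the use of `--` to mark a cut-off utterance are assumptions made purely for illustration.

```python
import re


def tag_dialog_acts(transcript):
    """Split a transcript into segments and give each a coarse dialog-act label.

    Toy rules (assumptions for illustration, not a trained model):
      "?"  ends a Question
      "--" ends an utterance that was cut off (Interruption)
      anything else is a Statement
    """
    # Split on the punctuation marks but keep them, so the result alternates:
    # text, mark, text, mark, ..., text
    parts = re.split(r"(--|[.?!])", transcript)
    texts = parts[0::2]
    marks = parts[1::2] + [""]  # trailing text may have no closing mark

    tagged = []
    for text, mark in zip(texts, marks):
        text = text.strip()
        if not text:
            continue
        if mark == "?":
            label = "Question"
        elif mark == "--":
            label = "Interruption"
        else:
            label = "Statement"
        tagged.append((text + mark, label))
    return tagged


if __name__ == "__main__":
    punctuated = "So the phone keeps restarting. Did you try-- Yes, twice. Is it under warranty?"
    for segment, label in tag_dialog_acts(punctuated):
        print(f"{label:12s} {segment}")

    # With the punctuation stripped, the same rules see one undivided chunk.
    unpunctuated = "so the phone keeps restarting did you try yes twice is it under warranty"
    print(tag_dialog_acts(unpunctuated))
```

With punctuation, the toy rules recover four separate units; without it, the whole exchange collapses into a single “Statement.” A learned model has to find those boundaries and labels from the words themselves, which is the harder problem the study addresses.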

During his time in industry working on human-to-human conversational analytics, Zelasko noticed that many natural language processing algorithms operate well only when the text has a clear structure, such as when a person speaks in complete sentences. However, in real life, people seldom speak so formally, making it difficult for systems to ascertain exactly where a sentence starts and ends. Zelasko wanted to make sure his system could understand ordinary conversation.

“This is where the ‘dialog act’ framework comes in,” Zelasko said. “With that, we can at least find ‘units’ of a conversation. This can possibly help with a large range of tasks such as summarization, intent recognition, and the detection of key phrases.”

Zelasko believes that his model could eventually help companies that use speech analytics, a process some businesses use to gain insights from interactions between customers and call center customer service representatives. Speech analytics usually involves automatic transcription of conversations and keyword searches, which Zelasko says provide limited opportunities for insight.

“With the old approach, you might be able to say that highlights of a conversation involve whatever type of phone the customer owns, ‘technical issues,’ and ‘refund,’ …….