Deep Learning for NLP and Speech Recognition

Jimmy Whitaker
3 min read · Jun 20, 2019

A comprehensive resource for deep learning in natural language processing and speech recognition.

Cover of Deep Learning for NLP and Speech Recognition

I am extremely excited to announce the availability of our textbook: Deep Learning for NLP and Speech Recognition!

Deep learning has quickly become a foundational technique in almost every AI and machine learning application. The ability to learn complex concepts directly from data and achieve much higher accuracy has made it one of the most popular research areas in technology. When my colleagues and I began putting this book together, we noticed that no single resource covered the modern deep learning approaches in NLP and speech recognition. Instead, the necessary material was scattered across academic papers, conference recordings, portions of textbooks, Medium blogs, or even derived from other areas of research (such as computer vision). In our book, we sought to provide a single resource that comprehensively covers the intersection of deep learning, NLP, and automatic speech recognition (ASR). In this post, I’d like to give an overview of the book and highlight some of the concepts we thought were most important.

Book Contents

Within this book, we provide a thorough survey and exploration of deep learning techniques that have led to state-of-the-art quality on a variety of natural language processing tasks. From text classification, to machine translation, to speech recognition, deep learning is playing a pivotal role. In general, the target audience is graduate students and NLP practitioners, due to the depth of some of the mathematical explanations. However, many of the core concepts and case studies can provide insight for those who are less math-savvy.

Speech Recognition

Recent years have brought more improvements in speech recognition than many could have imagined.

Fig. 8.8: Diagram of statistical speech recognition

We introduce the fundamental elements of speech, discussing how common features (spectrograms, MFCCs, etc.) are extracted and how traditional probabilistic approaches modeled speech. We then show how deep learning began to be incorporated into ASR with feature extraction and phoneme classification.
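To make the feature extraction step a little more concrete, here is a minimal sketch of a log-power spectrogram in plain Python. It is not taken from the book: the function name, the 25 ms frame and 10 ms hop, and the Hann window are illustrative assumptions, and the naive DFT is used only for readability (a real pipeline would use an FFT library).

```python
import cmath
import math


def log_power_spectrogram(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Return a list of frames, each a list of log-power values per frequency bin.

    Hypothetical helper for illustration; frame and hop sizes are assumptions.
    """
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    # Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    num_bins = frame_len // 2 + 1  # non-negative frequencies only

    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(num_bins):
            # Naive DFT coefficient for frequency bin k.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            # Small constant avoids log(0) on silent bins.
            row.append(math.log(abs(coeff) ** 2 + 1e-10))
        spectrogram.append(row)
    return spectrogram


# Synthetic input: 0.1 s of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
spec = log_power_spectrogram(tone, sample_rate)

# Average over time and find the strongest frequency bin.
avg = [sum(row[k] for row in spec) / len(spec) for k in range(len(spec[0]))]
peak_bin = max(range(len(avg)), key=avg.__getitem__)
bin_width_hz = sample_rate / (sample_rate * 25 // 1000)
peak_hz = peak_bin * bin_width_hz
```

With these settings each frame covers 200 samples, so each bin spans 40 Hz and the tone's energy lands in the bin centered on 440 Hz. MFCCs build on a spectrogram like this one by applying a mel filterbank and a discrete cosine transform, which this sketch leaves out.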

Fig. 12.9: End-to-end speech processing network from [KHW17]

In our advanced speech recognition chapter, we explore end-to-end deep learning architectures built on RNNs, CNNs, and attention-based networks. We also explore the application of language models to improve quality through various types of language model fusion.
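As a rough illustration of one common form of language model fusion, the sketch below shows shallow fusion, where a hypothesis is rescored by adding a weighted language model log-probability to the acoustic model's log-probability. The function name, the weight of 0.5, and every probability here are made up for illustration and do not come from the book or from any real system.

```python
import math


def shallow_fusion_score(asr_log_prob, lm_log_prob, lm_weight=0.5):
    """Combine ASR and language model scores in log space.

    Hypothetical helper; lm_weight is an assumed tuning parameter.
    """
    return asr_log_prob + lm_weight * lm_log_prob


# Made-up (ASR log-prob, LM log-prob) pairs for two competing hypotheses.
hypotheses = {
    "shee ran home": (math.log(0.40), math.log(0.001)),
    "she ran home": (math.log(0.35), math.log(0.05)),
}

# Best hypothesis using the ASR score alone.
best_asr = max(hypotheses, key=lambda h: hypotheses[h][0])

# Best hypothesis after fusing in the language model score.
best_fused = max(hypotheses,
                 key=lambda h: shallow_fusion_score(*hypotheses[h]))
```

In this toy setup the ASR score alone slightly prefers the phonetically plausible misspelling, while the fused score prefers the hypothesis the language model considers far more likely. Other fusion variants integrate the language model inside the network rather than at scoring time, and they are not shown here.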

Case Studies

Each chapter is accompanied by a case study, showing the application of the techniques introduced in the chapter. In general, the case studies rely on or extend open source libraries that leverage these techniques. Apart from a few exceptions, the majority of the case studies are given in Python, which has become the default language for interacting with machine learning libraries due to its presence in academic communities and general popularity. You can find the source code for these case studies in our GitHub repository.

In the ASR case studies, we focus on building speech recognition systems on the Common Voice dataset. Through case studies in two chapters, we explore four separate open source projects that are commonly used for speech recognition and compare the results.

Fig. 12.17: Output from the base Deep Speech 2 model. Note how many of the mistakes seem phonetic and create nonlogical words, such as shee and ashe.

Conclusion

The fields of NLP and Speech are currently growing at incredible speeds. We hope that this book serves as a useful resource to understand the foundational elements as well as some of the advanced concepts that have been introduced in recent years.

Purchase on Amazon

Purchase on Springer

Jimmy Whitaker

Applying AI the right way | Chief Scientist — AI & Strategy @HPE | Computer Science @UniOfOxford | Published @SpringerCompSci