
Transfer Learning for NLP

Updated: Feb 5, 2021



At some point in life, we have all wondered whether a day will come when machines are able to hold a conversation with humans entirely on their own. It has been portrayed in many movies, such as Iron Man and the Terminator series, where a machine can speak and understand any human language. Plenty of science-fiction books and cartoons fantasize about this idea too. Wouldn't it be nice if Chitti (movie: Robot, 2010), Jarvis (movie: Iron Man), Optimus Prime (movie: Transformers), Doraemon and the like were real? This notion has been around for quite some time now. Is it realizable? Today we will discuss the possibility of achieving such a feat.

Research into making machines capable of having conversations began in the 1950s, when the idea of artificial intelligence was first introduced. It was Alan Turing who explored the possibility of AI and suggested that, like humans, machines can use the available information and reason to solve problems and make decisions. In his paper "Computing Machinery and Intelligence", Turing laid down a framework for building machines and testing their intelligence.


Initially, scientists were able to make machines respond via text to certain predefined questions. Gradually, they were able to get responses in various other forms, such as voice and gesture, but the scope of the machine's responses was limited to a predefined set of questions (or rules). If a user asked a question that wasn't pre-programmed into the machine, it failed to provide a response.


We have come a long way since then, making tremendous progress in the field of Artificial Intelligence. Unfortunately, we still haven't produced a machine that can speak any language perfectly, and the main reason is the complexity of language itself. With the power of machine learning and Natural Language Processing (NLP), we may be able to teach human language to our machines.


Natural Language Processing (NLP)

Let us first briefly discuss what NLP is.

It is a branch of artificial intelligence that helps machines understand, interpret and manipulate human language. It breaks user input down into sentences and words, and processes the text through a series of techniques, for example converting it all to lowercase or correcting spelling mistakes, before determining whether a word is an adjective or a verb. Natural Language Processing (NLP) comprises the steps below (a minimal pipeline is sketched after the list):

  1. Tokenization – the text is split into a set of words (tokens).

  2. Sentiment Analysis – the machine interprets user responses to infer the emotions behind them.

  3. Normalization – typos and variant spellings that could alter the meaning of the user query are corrected.

  4. Entity Recognition – the machine looks for the different categories of information it requires.

  5. Dependency Parsing – the machine analyses the grammatical structure of the sentence to work out what the user wants to convey.
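
As an illustration, here is a minimal sketch of such a pipeline using the spaCy library (our choice for illustration; any NLP toolkit would do). It covers tokenization, normalization, entity recognition and dependency parsing; sentiment analysis is not part of spaCy's default pipeline and would need an extra component. It assumes the en_core_web_sm model has been downloaded.

```python
# A minimal NLP pipeline sketch with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# 1. Tokenization: the text is split into tokens.
print([token.text for token in doc])

# 3. Normalization: lowercased, lemmatized form of each token.
print([token.lemma_.lower() for token in doc])

# 4. Entity recognition: categories of information found in the text.
print([(ent.text, ent.label_) for ent in doc.ents])

# 5. Dependency parsing: grammatical relation of each token to its head.
print([(token.text, token.dep_, token.head.text) for token in doc])
```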


The main problem with machine learning models for NLP is that they remain domain-specific: a model fails when it encounters unseen conditions. In simple words, a model needs to be retrained every time it has to work with different data in order to perform accurately. Also, in real-world problems the data is not always available in sufficient quantity, or it may not be clean, which leads to a model that generalizes poorly (generalization meaning the ability to perform many tasks with one model). Training a model can take anywhere from hours to days depending on how large the data is, which is not cost-effective.


To solve this problem, we want the model to learn from the past in order to deal with the conditions of the present, just like us humans, so that we don't have to start from scratch each time. How can we do this? This is where transfer learning comes in.



Transfer Learning


It is the ability to transfer knowledge from a pre-trained machine learning model to a new setting. To elaborate further, it is a process in which a model is trained on a large dataset and then used to produce results for other related tasks.
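
As a concrete sketch (our own illustration, not a prescribed recipe), the snippet below reuses a pre-trained BERT encoder from the Hugging Face Transformers library as a fixed feature extractor for a new task; the checkpoint name and example sentences are assumptions made for demonstration.

```python
# Reusing a pre-trained encoder as a feature extractor (illustrative sketch).
# Assumes: pip install torch transformers
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")  # trained on large corpora

sentences = ["The movie was great.", "The service was terrible."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# Use the [CLS] token's hidden state as a fixed feature vector per sentence;
# a small classifier for the new task would be trained on these features.
features = outputs.last_hidden_state[:, 0, :]
print(features.shape)  # e.g. torch.Size([2, 768])
```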


Advantages of transfer learning:

  • It simplifies training requirements

  • It minimizes the use of memory space

  • It makes training considerably faster, i.e. adapting a pre-trained model can take seconds or minutes instead of hours or days.


Why transfer learning in NLP?


Many NLP tasks require common knowledge about a language, for example structural similarities and linguistic representations, and this knowledge can be shared through transfer learning. Also, knowledge about syntax, semantics etc. from one model can be used to inform other tasks. Moreover, transfer learning helps models generalize to various target tasks and is thus desirable in NLP.

Types of transfer learning in NLP:


There are different types of transfer learning common in NLP; these can be broadly classified based on three criteria:


a) Whether the source and target settings deal with the same task;

b) The nature of the source and target domains; and

c) The order in which the tasks are learned.


As per Pan and Yang [2010], transfer learning can be divided into two main categories: transductive and inductive. In NLP, transductive transfer covers domain adaptation and cross-lingual learning, while inductive transfer covers multi-task learning and sequential transfer learning.



Domain Adaptation:

This is the type most commonly used in industry, where we want to apply a model trained on a task in one domain to another domain. It can be done with little or no labelled data for the target domain.


Cross Lingual Learning:

It enables us to compare words across different languages, which is important for tasks like translation and cross-lingual retrieval. More importantly, cross-lingual embeddings can help us transfer knowledge from resource-rich to resource-poor languages by providing a common representation space.


Types of alignment used to learn cross-lingual word embeddings (a toy word-level example follows the list):

  • Word-level alignment: It uses dictionaries containing word pairs in different languages. It is the most common approach and can also make use of other modalities, like images.

  • Sentence-level alignment: It uses sentence pairs similar to those used for building machine translation systems, typically from the Europarl corpus, a sentence-aligned corpus of the proceedings of the European Parliament.

  • Document-level alignment: It requires parallel documents containing aligned translated sentences. As such documents are rare, comparable documents are used more often; this kind of data can be created by picking Wikipedia topics and gathering articles on them in different languages.
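
To make word-level alignment concrete, here is a toy sketch of the classic approach of learning a linear map from the source-language embedding space to the target-language space. The vectors are random stand-ins; real pre-trained embeddings and a real bilingual dictionary are assumed to be available.

```python
# Toy word-level alignment sketch: learn a linear map W so that
# source-language embeddings land near their dictionary translations
# in the target-language space. Vectors here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 50, 200  # embedding size, dictionary word pairs

X = rng.normal(size=(n_pairs, dim))  # source-language word vectors
Y = rng.normal(size=(n_pairs, dim))  # target-language word vectors

# Solve min_W ||X W - Y||^2 by least squares.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Map an unseen source-language vector into the target space; a
# nearest-neighbour search there would suggest a translation.
v = rng.normal(size=(1, dim))
mapped = v @ W
print(mapped.shape)  # (1, 50)
```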


Multi Task Learning (MTL):

In general, models are trained for one specific task only, which limits their ability to accomplish other related tasks. If a model is trained to deal with multiple tasks, it becomes more generalized by sharing representations across all related tasks. Training a model to deal with multiple tasks is called Multi-Task Learning (also known as joint learning).

The beauty of multi-task learning comes from using the same parameters for different tasks, as the sketch below illustrates.
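
Here is a minimal sketch of this idea in PyTorch; the architecture, sizes and task names are all illustrative assumptions. One shared encoder serves two task-specific heads.

```python
# Multi-task model sketch: shared encoder, task-specific heads.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        # Shared parameters, reused by every task.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Task-specific heads (label counts are illustrative).
        self.sentiment_head = nn.Linear(hidden_dim, 2)
        self.topic_head = nn.Linear(hidden_dim, 5)

    def forward(self, token_ids, task):
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.encoder(embedded)
        features = hidden[-1]  # final hidden state, shared across tasks
        if task == "sentiment":
            return self.sentiment_head(features)
        return self.topic_head(features)

model = MultiTaskModel()
batch = torch.randint(0, 10000, (4, 12))  # 4 dummy token sequences
print(model(batch, task="sentiment").shape)  # torch.Size([4, 2])
print(model(batch, task="topic").shape)      # torch.Size([4, 5])
```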


Sequential Transfer Learning (STL):

It involves transferring knowledge in a sequence of steps where the source and target tasks are not the same. Here, tasks are learned in two stages. The first stage consists of pre-training the model on the source data, and the second stage consists of training the resulting model on the target task (also known as adaptation).

The pre-training stage is usually costly but is only performed once. The adaptation stage is usually faster, as it acts like a fine-tuning step; a minimal sketch is shown below.
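
Here is a hedged sketch of the adaptation stage using the Hugging Face Transformers library. The checkpoint name, label count and the choice to freeze the encoder are illustrative assumptions, not the only way to fine-tune.

```python
# Adaptation (fine-tuning) stage of sequential transfer learning.
# Assumes: pip install transformers
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # new classification head for the target task
)

# One common adaptation choice: freeze the pre-trained encoder so only
# the new head is updated, which makes fine-tuning fast and cheap.
for param in model.bert.parameters():
    param.requires_grad = False

# A normal training loop over the (small) target-task dataset would follow.
```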


STL is useful in three cases:

  • Source and target task data is not available at the same time

  • Source task has more data than the target task

  • Adaptation to many target tasks is required

STL looks similar to MTL but is very different in the way knowledge transfer takes place: in MTL, the source and target tasks are trained together, while in STL the source task is trained first and the target task afterwards.


STL is the most popular technique at present.


Some of the models used in transfer learning for NLP are ELMo, BERT, ULMFiT and the OpenAI Transformer (GPT). You can look them up.


Let's summarize the methodology to achieve transfer learning:


First and foremost, the data on which the source model will be trained is acquired; usually this dataset is very large. The source model is then trained on this data, which is called pre-training. Finally, the pre-trained model is adapted to a target task (adaptation).


Since pre-training a model on large-scale data is computationally costly, it's best to use open-source models whenever possible. And if you do need to train your own model, you should share it with the community.


The main limitation in achieving a perfect model is the complexity of languages: a model trained on one language can't simply be used for another because of differences in grammatical structure. But with the advances in NLP and transfer learning, the situation seems hopeful. In the coming years, AI-driven robots that can speak any language with complete comprehension may become a reality.


To get assistance on any of the above-mentioned topics, feel free to email us at contact@codersarts.com