Traditional Approach to Natural Language Processing

By IT Patasala


NLP: Definition and Explanation

Natural Language Processing (NLP) is a field of computer science and linguistics that attempts to give machines the capability to comprehend, interpret, and produce human language. In NLP's initial stages, the standard approaches were rule-based and statistical. These approaches formed the basis of modern NLP before the field evolved toward more sophisticated neural network and deep learning models. Even so, traditional approaches remain worth considering, particularly where interpretability, resource efficiency, or specific language characteristics matter.

This article examines the main traditional NLP techniques, explaining the methods, their areas of application, and their benefits and weaknesses, with appropriate examples.

Traditional Approach to Natural Language Processing

1. Traditional NLP: Methods and Strategies in Use

Traditional NLP methods comprise rule-based and statistical approaches. In rule-based approaches, linguists and domain experts hand-craft a set of rules based on language and grammar. Statistical NLP, by contrast, employs probabilistic models derived from large text corpora. A third form of traditional NLP consists of early machine learning algorithms that do not use deep learning, for instance, Naïve Bayes and Support Vector Machines (SVMs).

2. Rule-Based Methods

Natural language processing using a rule-based approach depends on rules written by linguists that span the language's syntax, morphology, and semantics. Encoding these rules allows a program to interpret language constructs in an organized way.

Regular Expressions

Regular expressions (regex) are sequences of characters that define a search pattern to be matched against text. Extracting email addresses, phone numbers, and other textual patterns can be automated with regex. While regular expressions carry no deep linguistic knowledge, they are very effective for string pattern matching.

Example:

Suppose we want to retrieve all the email addresses present in a document. The following pattern can be used:

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

This pattern will match strings such as example@domain.com or info@company.co wherever they appear in the text.
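Below is a minimal Python sketch of this extraction, using only the standard-library re module on a made-up text string:

import re

text = "Contact us at example@domain.com or info@company.co for details."

# Find every substring that matches the email pattern above
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
emails = re.findall(pattern, text)
print(emails)  # ['example@domain.com', 'info@company.co']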

Grammar-Based Parsing

Grammar-based parsing that does not involve machine learning relies on a set of hand-written language rules called a grammar. Such a grammar describes how constituents like subjects, verbs, and objects combine. By systematically applying these rules, sentences can be deconstructed into their parts.

Example:

Parsing the sentence "The cat sat on the mat" would yield "Subject: the cat – Predicate: sat on the mat." The parse can also be represented as a tree, with "the cat" under the subject node and "sat on the mat" under the predicate node.

A recursive descent parser applies these grammar rules recursively to decompose the sentence into its constituents.
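A minimal sketch of this kind of parsing, assuming the NLTK library is available, with a toy grammar that covers only the example sentence:

import nltk

# A toy context-free grammar covering only "the cat sat on the mat"
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V PP
PP -> P NP
Det -> 'the'
N -> 'cat' | 'mat'
V -> 'sat'
P -> 'on'
""")

parser = nltk.RecursiveDescentParser(grammar)
for tree in parser.parse("the cat sat on the mat".split()):
    tree.pretty_print()  # prints the subject/predicate tree structure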

Rule-based systems have the following advantages:

Rule-based systems are interpretable and predictable.

They work well for structured tasks with limited variability.

However, there are some demerits of Rule-based systems:

Rules have to be devised and updated by experts, which is a laborious process.

Such systems struggle with the ambiguity and variability of natural language.

3. Statistical NLP Techniques

Statistical NLP techniques use probabilities and statistical models to process and understand language data. These models focus on word frequencies and word-sequence probabilities, supporting NLP tasks such as language modeling and text classification.

n-gram Models

n-gram models are probabilistic models that estimate the probability of a word based on the words that precede it. An n-gram of order n considers sequences of n words: n=1 (unigram) looks at single words, n=2 (bigram) considers pairs, n=3 (trigram) considers triples, and so on.

Example:

A bigram model approximates the probability of the sentence "I love NLP" as:

P("I love NLP") = P("I") × P("love" | "I") × P("NLP" | "love")

Applications: Used in predictive text, speech recognition, and earlier machine translation models.
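The sketch below builds an unsmoothed bigram model in Python from a made-up three-sentence corpus (real systems use smoothing and far larger corpora):

from collections import Counter

# A tiny made-up corpus; real models would be trained on much larger text
corpus = ["i love nlp", "i love python", "you love nlp"]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(prev, word):
    # P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("i", "love"))    # 1.0  (every "i" is followed by "love")
print(bigram_prob("love", "nlp"))  # 0.666... (2 of 3 "love" occurrences precede "nlp")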

Hidden Markov Models (HMMs)

Hidden Markov Models are statistical models of sequences of observations, commonly used in speech recognition and POS tagging. Their strength lies in modeling a hidden process from which the observed data stems: in NLP, the hidden states are labels such as part-of-speech tags, while the observations are the actual words.

Example:

Every word in a sentence has a hidden part-of-speech tag. For the sentence "the cat sat," the model finds the most likely sequence of three tags given the observed words, typically using the Viterbi algorithm.
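A minimal Viterbi-decoding sketch is shown below; the transition and emission probabilities are hand-picked for illustration, not learned from real data:

# Toy HMM tagger for "the cat sat"; all probabilities are illustrative
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {  # P(next tag | current tag)
    "DET":  {"DET": 0.1, "NOUN": 0.8, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.4, "NOUN": 0.4, "VERB": 0.2},
}
emit_p = {   # P(word | tag)
    "DET":  {"the": 0.9, "cat": 0.05, "sat": 0.05},
    "NOUN": {"the": 0.05, "cat": 0.8, "sat": 0.15},
    "VERB": {"the": 0.05, "cat": 0.15, "sat": 0.8},
}

def viterbi(words):
    # V[t][s] = (best probability of reaching tag s at position t, backpointer)
    V = [{s: (start_p[s] * emit_p[s][words[0]], None) for s in states}]
    for word in words[1:]:
        V.append({
            s: max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][word], prev)
                for prev in states
            )
            for s in states
        })
    # Trace back the most likely tag sequence
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for layer in reversed(V[1:]):
        path.append(layer[path[-1]][1])
    return list(reversed(path))

print(viterbi(["the", "cat", "sat"]))  # ['DET', 'NOUN', 'VERB']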

The Advantages of Statistical Models:

Statistical methods deal efficiently with much larger datasets.

They can capture variation better than strictly rule-based strategies.

The Limitations of Statistical Models:

They depend entirely on annotated datasets, which can be a problem.

Probability-based models often lack deeper semantic understanding.

4. Machine Learning Applied in NLP

Before deep learning stepped into the ring, NLP relied on traditional machine learning algorithms. These encompass techniques such as Naïve Bayes, Support Vector Machines, and Logistic Regression, which make predictions based on feature extraction and engineering.

Naïve Bayes

Naïve Bayes is a generative probabilistic classifier that assumes the input features are independent of each other, which simplifies the calculation. In NLP, Naïve Bayes can be used for text classification tasks such as filtering spam messages or analyzing sentiment.

Example:

In spam detection, the filter calculates the probability that an email is spam from the words it contains. For example, emails containing words such as "win" or "free" receive a higher spam probability and are therefore more likely to be marked as spam.
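A minimal sketch of such a filter using scikit-learn (assuming it is installed); the four training emails are made up for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set; a real filter would use thousands of labeled emails
emails = [
    "win a free prize now",
    "free money click here",
    "meeting agenda for monday",
    "project report attached",
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()              # bag-of-words features (word counts)
X = vectorizer.fit_transform(emails)
classifier = MultinomialNB().fit(X, labels)

test = vectorizer.transform(["win free money today"])
print(classifier.predict(test))             # ['spam']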

Support Vector Machines

SVMs are supervised learning models that separate labeled data points belonging to different classes. The main aim of an SVM is to find the hyperplane that optimally divides those classes. When kernels are used to introduce a more complex, non-linear decision boundary, the model becomes harder to interpret, which can be a drawback in text classification tasks where a high degree of interpretability is needed.

Example:

An SVM could categorize articles into sports news and political news. By mapping every news article to a point in a high-dimensional space, a hyperplane can be found that separates the two kinds of articles.
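A minimal linear-SVM sketch with scikit-learn; the headlines and labels below are invented stand-ins for real articles:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Made-up headlines standing in for sports vs. politics articles
articles = [
    "the team won the championship final",
    "striker scores twice in the match",
    "parliament passes the new budget bill",
    "the senate debates election reform",
]
topics = ["sports", "sports", "politics", "politics"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(articles)   # each article becomes a point in feature space
svm = LinearSVC().fit(X, topics)         # find the separating hyperplane

print(svm.predict(vectorizer.transform(["the match final score"])))  # ['sports']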

Logistic Regression

Logistic regression is a statistical model for predicting a binary outcome. It uses the logistic (sigmoid) function to map a weighted combination of input features to a probability. Given numerical features, logistic regression is suitable for predicting outcomes with two (or, with extensions, more) classes, such as spam classification and sentiment analysis.

Example:

In sentiment analysis, logistic regression can classify a review as positive or negative based on the presence of indicative words.
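The sketch below shows the core idea with hand-assigned word weights (in practice, the weights are learned from labeled reviews):

import math

# Illustrative, hand-assigned weights for a few sentiment-bearing words
weights = {"great": 1.5, "excellent": 2.0, "boring": -1.8, "terrible": -2.2}
bias = 0.1

def positive_probability(review):
    score = bias + sum(weights.get(word, 0.0) for word in review.lower().split())
    return 1 / (1 + math.exp(-score))    # logistic (sigmoid) function

print(positive_probability("an excellent and great film"))  # ~0.97 -> positive
print(positive_probability("a boring and terrible plot"))   # ~0.02 -> negative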

Advantages of Traditional Machine Learning Models:

The models are much easier to train and explain.

They do well with smaller datasets, especially when good features are available.

Limitations of Traditional Machine Learning Models:

They often need extensive feature engineering to obtain results.

They cannot capture complex linguistic nuances as well as deep learning models can.

5. Examples of Applications Using Traditional NLP

Spam Detection

Naïve Bayes classifiers can detect spam emails by analyzing word frequencies and applying prior knowledge about the properties of spam.

Sentiment Analysis

By scoring the number of words known to be positive or negative, logistic regression and Naïve Bayes classifiers can decide whether a text conveys positive or negative sentiment.

Machine Translation

The first translation systems used n-gram models and statistical relationships between languages to determine the most probable translation, forming the basis of statistical machine translation (SMT) systems.

Named Entity Recognition (NER)

Early NER relied primarily on HMMs and rule-based methods, which analyzed text to identify spans referring to names, places, dates, and so on.

Speech Recognition

HMMs were fundamental to early automatic speech recognition systems, which probabilistically mapped sequences of audio features to words.

6. Barriers and Shortcomings

There are serious shortcomings within the conventional natural language processing frameworks.

Data Dependency: It is evident that statistical and machine learning techniques require a significant amount of annotated data to achieve high accuracy.

Limited Scope: n-gram and Markov models have inherent limitations, since they cannot capture figurative language or long-range relationships.

Manual Feature Extraction: Though very useful, machine learning approaches rely on extracting and preparing specific features, which is tedious and highly domain-specific.

Scalability: Rule-based systems have also proven difficult to scale across different languages and domains because of linguistic variety.


7. Comparison to Contemporary NLP Methods

Although the older strategies laid solid foundations, present-day systems use deep learning architectures such as transformers, including BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) models, among many others. These sophisticated models perform better in areas such as machine translation and text summarization because they capture context, resolve ambiguity, and model long-range dependencies. Traditional methods remain relevant, especially in settings where resources are limited, interpretability matters, or less computational power is available.

Conclusion

The first stages of NLP development were built on these classical techniques, which made it possible to comprehend and manipulate human language computationally.

Rule-based methods, statistical models, and classical machine learning classifiers made natural language processing possible in its earlier days. Even in an era dominated by deep learning, these approaches remain an important part of the AI tradition and a foundation on which more dynamic deep learning techniques build. They still stand the test of time and, in the right contexts, can be practical and powerful tools for many NLP tasks.
