Traditional Approach to Natural Language Processing

NLP: Definition and Explanation

Natural Language Processing (NLP) is a field of computer science and linguistics that aims to give machines the ability to comprehend, interpret and produce human language. In NLP’s early stages, the common approaches were rule-based and statistical. These approaches formed the basis of the NLP systems in use today, which have since evolved into more sophisticated neural network and deep learning models. Yet even as NLP has advanced, traditional approaches remain worth considering, particularly when the focus is on interpretability, resource efficiency or specific language characteristics.

This document examines the traditional NLP techniques in detail, explaining the methods, their areas of application, their benefits and their weaknesses, with examples where appropriate.


1. Traditional NLP: Methods and Strategies in Use

Traditional NLP methods fall into two main approaches: rule-based and statistical. In rule-based approaches, linguists and domain experts develop sets of handcrafted rules based on the grammar of a language. Statistical NLP, by contrast, employs mathematical models that derive probabilities from large bodies of text. A third group of traditional methods consists of early machine learning algorithms that do not employ deep learning, for instance Naive Bayes and Support Vector Machines (SVM).

2. Rule-Based Methods

Natural language processing using a rule-based approach depends on rules written by linguists, spanning the syntax, morphology and semantics of the language. By devising these rules, computers can interpret language constructs in a more organized way.

Regular Expressions

Regular expressions (regex) are sequences of characters that define a pattern to be matched against text, primarily for searching. Extracting email addresses, phone numbers and other textual patterns can be automated with regex. While they have no deep linguistic understanding, regular expressions operate directly on surface patterns in text, which makes them excellent for string pattern matching.

Example:

Let’s say we want to retrieve all the email addresses present in a document. The following pattern matches standard email formats:

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

This pattern will match addresses such as example@domain.com or info@company.co anywhere in the text.
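Here is a minimal sketch in Python of how this pattern could be applied; the sample text and variable names are illustrative assumptions, not part of the original example.

import re

# Sample text containing two email addresses (illustrative assumption).
text = "Contact us at example@domain.com or info@company.co for details."

# The email pattern shown above.
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

# findall returns every non-overlapping match in the text.
emails = re.findall(pattern, text)
print(emails)  # ['example@domain.com', 'info@company.co']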

Grammar-Based Parsing

Language parsing that does not involve machine learning is based on formal sets of language rules called grammars. These grammars define constructs such as subject, verb, and object. By systematically applying these rules, sentences can be deconstructed into their component parts.

Example:

Parsing the sentence “The cat sat on the mat” would produce an output such as “Subject: the cat – Predicate: sat on the mat.” The sentence would also have a tree structure, with “the cat” designated as the subject and “sat on the mat” falling under the predicate.

A recursive descent parser would apply the grammar rules recursively to decompose the given sentence into its constituents.
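As a rough illustration, the sketch below implements a tiny recursive descent parser in Python for this one sentence; the grammar, lexicon and function names are simplifying assumptions rather than a general-purpose parser.

# Toy grammar:  S -> NP VP,  NP -> Det N,  VP -> V PP,  PP -> P NP
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "sat": "V", "on": "P"}

def parse(tokens):
    tree, pos = parse_s(tokens, 0)
    return tree if pos == len(tokens) else None

def parse_s(tokens, i):
    np, i = parse_np(tokens, i)        # S -> NP VP
    vp, i = parse_vp(tokens, i)
    return ("S", np, vp), i

def parse_np(tokens, i):
    det, n = tokens[i], tokens[i + 1]  # NP -> Det N
    assert LEXICON[det] == "Det" and LEXICON[n] == "N"
    return ("NP", det, n), i + 2

def parse_vp(tokens, i):
    v = tokens[i]                      # VP -> V PP
    assert LEXICON[v] == "V"
    pp, j = parse_pp(tokens, i + 1)
    return ("VP", v, pp), j

def parse_pp(tokens, i):
    p = tokens[i]                      # PP -> P NP
    assert LEXICON[p] == "P"
    np, j = parse_np(tokens, i + 1)
    return ("PP", p, np), j

print(parse("the cat sat on the mat".split()))
# ('S', ('NP', 'the', 'cat'), ('VP', 'sat', ('PP', 'on', ('NP', 'the', 'mat'))))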

Rule-based systems have the following advantages:

Rule-based systems are interpretable and predictable.

They work well on structured tasks with limited variation.

However, rule-based systems also have drawbacks:

Rules have to be devised and updated by experts, which is a laborious process.

Such systems struggle with the ambiguity and variability of natural language.

3. Statistical NLP Techniques

Statistical NLP techniques use probabilities and statistical models to manipulate and understand language data. These models focus on word frequencies and word sequence probabilities to perform NLP tasks such as language modeling and text classification.

n-gram Models

n-gram models are probabilistic models that estimate the probability of a word based on the words that precede it. An n-gram of order n considers sequences of n words: with n=1 (unigrams) the model looks at single words, with n=2 (bigrams) at pairs, with n=3 (trigrams) at triples, and so on.

Example:

A bigram model estimates the probability of the sentence “I love NLP” as:

P("I love NLP")=P("I")×P("love"∣"I")×P("NLP"∣"love")


Applications: Used in predictive text, speech recognition and earlier machine translation models.
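The sketch below shows one way such bigram probabilities can be estimated from raw counts in Python; the toy corpus and the <s> and </s> sentence markers are illustrative assumptions.

from collections import Counter

# Tiny corpus with <s> and </s> as sentence boundary markers (assumed for illustration).
corpus = ["<s> I love NLP </s>", "<s> I love coffee </s>", "<s> I study NLP </s>"]
tokens = [w for sent in corpus for w in sent.split()]

bigrams = Counter(zip(tokens, tokens[1:]))   # counts of adjacent word pairs
unigrams = Counter(tokens)                   # counts of single words

def p(word, prev):
    # P(word | prev) estimated by relative frequency.
    return bigrams[(prev, word)] / unigrams[prev]

# P("I love NLP") ~ P(I | <s>) * P(love | I) * P(NLP | love)
print(p("I", "<s>") * p("love", "I") * p("NLP", "love"))  # 1.0 * 2/3 * 1/2 ~ 0.33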

Hidden Markov Models (HMMs)

Commonly used in speech recognition and POS tagging, Hidden Markov Models are statistical models of sequences of observations. Their strength lies in modeling the hidden process from which a set of observed data stems. In NLP, the hidden states are typically linguistic labels such as part-of-speech tags, while the observations are the actual words.

Example:

In POS tagging, each word in a sentence has a hidden part-of-speech tag. For the sentence “the cat sat,” a model would determine the most likely sequence of three tags based on the training data.
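A minimal Viterbi-style sketch of this idea is shown below; the tag set and the transition and emission probabilities are made-up illustrative values, not learned from real data.

# Toy HMM: hidden states are POS tags, observations are words.
states = ["DET", "NOUN", "VERB"]
start = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans = {
    "DET":  {"DET": 0.05, "NOUN": 0.9,  "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.2,  "VERB": 0.7},
    "VERB": {"DET": 0.4,  "NOUN": 0.4,  "VERB": 0.2},
}
emit = {
    "DET":  {"the": 0.9,  "cat": 0.05, "sat": 0.05},
    "NOUN": {"the": 0.05, "cat": 0.8,  "sat": 0.15},
    "VERB": {"the": 0.05, "cat": 0.05, "sat": 0.9},
}

def viterbi(words):
    # best[i][s] = (probability of the best tag path ending in state s, backpointer)
    best = [{s: (start[s] * emit[s][words[0]], None) for s in states}]
    for w in words[1:]:
        best.append({
            s: max((best[-1][prev][0] * trans[prev][s] * emit[s][w], prev)
                   for prev in states)
            for s in states
        })
    # Backtrack from the most probable final state.
    tag = max(best[-1], key=lambda s: best[-1][s][0])
    path = [tag]
    for layer in reversed(best[1:]):
        tag = layer[tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["the", "cat", "sat"]))  # ['DET', 'NOUN', 'VERB']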

The Advantages of Statistical Models:

Statistical methods deal efficiently with much larger datasets.

They are able to capture variation better than strictly rule-based strategies.

The Limitations of Statistical Models:

They depend heavily on large annotated datasets.

Probability-based models capture little of the underlying semantics of language.

4. Machine Learning Applied in NLP

Before deep learning stepped into the ring, NLP relied on traditional machine learning algorithms. These include techniques such as Naive Bayes, Support Vector Machines and Logistic Regression, which make predictions based on feature extraction and engineering.

Naive Bayes

Naive Bayes is a generative probabilistic classifier that assumes the input features are independent of one another, which makes the calculation simpler. In NLP, Naive Bayes is used to classify text, for instance for filtering spam messages or analyzing sentiment.

Example:

In spam detection, the filter calculates the probability that an email is spam based on the words it contains. For example, words such as “win” or “free” occur far more often in spam, so emails containing them are more likely to be classified as spam.
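A minimal sketch of such a filter using scikit-learn is shown below; the four training emails and their labels are made-up illustrative data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now",        # spam
    "free money, click to win",    # spam
    "meeting agenda for Monday",   # not spam
    "lunch tomorrow at noon?",     # not spam
]
labels = ["spam", "spam", "ham", "ham"]

# Word counts as features; Naive Bayes treats them as conditionally independent.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free tickets, win now"]))   # likely ['spam']
print(model.predict(["agenda for the meeting"]))  # likely ['ham']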

Support Vector Machines

SVMs are a supervised learning method that separates labeled data points belonging to different classes. The main aim of an SVM is to find the optimal hyperplane for classifying those data points. Kernel functions can be used to model more complex, non-linear boundaries, although this comes at the cost of some of the interpretability that text classification tasks often call for.

Example:

An SVM could categorize articles as sports news or political news. By mapping every news article to a point in a multidimensional space, a hyperplane can be found that separates the two groups of articles.
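As a rough sketch, the scikit-learn pipeline below classifies toy headlines in this way; the headlines, labels and the use of TF-IDF features are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

headlines = [
    "local team wins championship final",
    "striker scores twice in cup match",
    "parliament passes new budget bill",
    "minister announces election reforms",
]
labels = ["sports", "sports", "politics", "politics"]

# TF-IDF maps each article to a point in a high-dimensional space;
# the linear SVM then finds a separating hyperplane between the classes.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(headlines, labels)

print(model.predict(["team scores in the cup final"]))        # likely ['sports']
print(model.predict(["new bill announced by the minister"]))  # likely ['politics']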

Logistic Regression

Logistic regression is a statistical model for predicting a binary outcome. It applies the logistic (sigmoid) function to a weighted combination of input features to estimate the probability that an example belongs to a class. This makes it suitable for tasks with two classes (or more, in its multinomial form), for example building a spam message classifier or performing sentiment analysis.

Example:

In sentiment analysis, logistic regression can classify a review as positive or negative based on the presence of indicative words.
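A minimal sketch of this with scikit-learn follows; the toy reviews and labels are illustrative assumptions.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "great movie, really enjoyed it",
    "excellent acting and a wonderful story",
    "terrible plot, a complete waste of time",
    "boring and badly acted",
]
labels = ["positive", "positive", "negative", "negative"]

# Word counts as features; the logistic function maps their weighted sum
# to a probability that the review is positive.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)

print(model.predict(["a wonderful and enjoyable story"]))  # likely ['positive']
print(model.predict(["boring, a waste of time"]))          # likely ['negative']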

Advantages of Traditional Machine Learning Models:

The models are much easier to train and explain.

They perform well on smaller datasets, especially with well-engineered features.

Limitations of Traditional Machine Learning Models:

They often need extensive feature engineering to obtain good results.

They cannot capture complex linguistic nuances as well as deep learning models can.

5. Examples of Applications Using Traditional NLP

Spam Detection

Spam emails can be detected by Naïve Bayes classifiers, which learn the typical properties of spam by analyzing word frequencies in emails.

Sentiment Analysis

By scoring the number of words known to be positive or negative, logistic regression and Naïve Bayes classifiers can decide whether a text conveys positive or negative sentiment.

Machine Translation

The first translation systems, known as statistical machine translation (SMT) systems, used n-gram models and statistical relationships between languages to determine the most probable translation.

Named Entity Recognition (NER)

Early NER systems relied mostly on HMMs and rule-based methods, which analyzed text to identify chunks referring to names, places, dates, and so on.

Speech Recognition

HMMs were fundamental to early automatic speech recognition systems, which probabilistically mapped sequences of audio features to words.

6. Barriers and Shortcomings

There are serious shortcomings within the conventional natural language processing frameworks.

Data Dependency: Statistical and machine learning techniques require a significant amount of annotated data to achieve high accuracy.

Limited Scope: n-gram and Markov models have inherent limitations, as they cannot capture figurative language or long-range relationships.

Manual Feature Extraction: Though very useful, machine learning approaches rely on manually extracted and prepared features, which is tedious and highly domain-specific.

Scalability: Rule-based systems are difficult to scale across different languages and domains because of linguistic variety.


7. Comparison to Contemporary NLP Methods

Although the older strategies laid solid foundations, present-day NLP relies on deep learning architectures such as transformers, including BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) models, among many others. These sophisticated models perform better in areas such as machine translation and text summarization because they can learn contextual, ambiguous and long-range relationships. Traditional methods remain relevant, especially in settings where resources are limited and where interpretability and lower computational requirements are needed.

Conclusion

The first stages of NLP development were based on these classical approaches, which made it possible to comprehend and manipulate human language computationally.

Relatively simple principles, such as handwritten rules, statistical models, and classical machine learning classifiers, enabled the development of natural language processing in its early days. Even in an era dominated by deep learning, this tradition survives as a major branch of AI and provides the foundation on which more dynamic deep learning techniques build. These methods still stand the test of time, and in the right context they can be effective and powerful tools for many NLP tasks.
