# Dimensionality Reduction in Machine Learning: Top 5 Techniques You Must Know

August 9, 2023 | by simpletechtales.com

**Introduction**

Dimensionality reduction in machine learning is a crucial process. As datasets grow in size and complexity, the number of features or dimensions can become overwhelming. Dimensionality reduction is like a magic trick that helps simplify data without losing important information. This blog explores what dimensionality reduction is, why it’s essential, and five super cool techniques with examples and Python code to simplify data.

**What is Dimensionality Reduction?**

Imagine you have a bunch of features or characteristics about something, like the color, size, and shape of a fruit. Dimensionality reduction is like putting on special glasses that help you see the most crucial stuff while ignoring the less important details. It’s like turning a ton of data into a more manageable and understandable version. By simplifying the data representation and removing unnecessary or redundant features, it seeks to improve the performance of machine learning models.

**Why Do We Need Dimensionality Reduction in Machine Learning?**

### 1. **Curse of Dimensionality**

The curse of dimensionality refers to the challenge we face when dealing with datasets that have a lot of features (dimensions). As the number of features increases, the amount of data required to adequately cover all those dimensions grows exponentially. Understanding and analyzing the data becomes challenging because we might not have enough data to fully explore all the features. The curse of dimensionality can lead to problems like overfitting in machine learning or difficulties in visualizing and comprehending the data. To handle this curse, we often use techniques for dimensionality reduction in machine learning to simplify the data, making it more manageable and easier to work with.
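One way to see the curse in action is to watch what happens to distances as dimensions grow. The sketch below (a minimal illustration using random points in a unit hypercube, not any particular dataset) shows that the gap between the nearest and farthest pair of points shrinks relative to the average distance, which is exactly why neighborhood-based reasoning breaks down in high dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(n_dims, n_samples=100):
    """Relative spread of pairwise distances among random points.

    Returns (max - min) / mean over all unique pairs; a small value
    means distances have "concentrated" and near vs. far is blurred.
    """
    points = rng.random((n_samples, n_dims))
    # All pairwise differences, then Euclidean distances
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists = dists[np.triu_indices(n_samples, k=1)]  # unique pairs only
    return (dists.max() - dists.min()) / dists.mean()

for d in (2, 10, 100, 500):
    print(f"{d:>4} dims: relative spread = {distance_spread(d):.2f}")
```

The spread drops steadily as dimensionality rises, even though the number of samples stays the same.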

### 2. **Computational Efficiency**

High-dimensional data processing requires greater computational power and processing time. We can considerably increase processing performance by lowering dimensionality.

### 3. **Visualization**

Data visualization is like using a treasure map to find what you’re looking for. Dimensionality reduction in machine learning helps create those maps, showing us patterns and relationships that are hard to spot otherwise.

### 4. **Noise Reduction**

Data with high dimensions may include noise or unimportant information. Dimensionality reduction aids in reducing the influence of noise and concentrating on essential patterns.

### 5. **Improved Model Performance**

Reducing dimensionality can lead to better model generalization and performance, especially when dealing with limited data or overfitting.

**Five Most Important Techniques for Dimensionality Reduction in Machine Learning**

### 1. **Principal Component Analysis (PCA)**

Consider a dataset containing a large number of features that describe various characteristics of an object, such as its size, color, weight, etc. Some of these features may be related to one another, while others might not contribute much to the total variability of the data. PCA works by finding a new coordinate system in which the first principal component explains the majority of the variance in the data. The second principal component captures the second-largest share of the variance, and so on.

**Steps of PCA**

- **Standardization**: Before applying PCA, standardize the data by subtracting the mean and dividing by the standard deviation for each feature. This step makes sure that all features are on the same scale.
- **Covariance Matrix**: PCA then computes the covariance matrix of the standardized data, which measures how each feature varies with the others.
- **Eigendecomposition**: PCA performs eigendecomposition on the covariance matrix. The eigenvectors represent the directions of the principal components, and the corresponding eigenvalues indicate the amount of variance captured in those directions.
- **Selecting Principal Components**: After sorting the eigenvalues in descending order, PCA keeps the top k eigenvectors (principal components) that account for the majority of the variance. The desired degree of dimensionality reduction determines the choice of k.
- **Transformation**: Finally, PCA projects the original data onto the selected principal components to obtain the lower-dimensional representation.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Step 1: Generate random data with three features (3D data)
np.random.seed(0)
num_samples = 100
mean = [0, 0, 0]
covariance_matrix = [[2, 0.5, 0.3], [0.5, 1, 0.2], [0.3, 0.2, 0.5]]
data = np.random.multivariate_normal(mean, covariance_matrix, num_samples)

# Step 2: Standardization
mean_centered_data = data - np.mean(data, axis=0)
std_data = mean_centered_data / np.std(mean_centered_data, axis=0)

# Step 3: Compute the covariance matrix
cov_matrix = np.cov(std_data.T)

# Step 4: Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort eigenvalues and eigenvectors in descending order
sorted_indices = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[sorted_indices]
sorted_eigenvectors = eigenvectors[:, sorted_indices]

# Step 5: Select the top principal components
num_components = 2
principal_components = sorted_eigenvectors[:, :num_components]

# Step 6: Transform the data to the lower-dimensional space
reduced_data = std_data.dot(principal_components)

# Plot the original data in 3D with the principal components drawn as arrows
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(121, projection='3d')
ax1.scatter(std_data[:, 0], std_data[:, 1], std_data[:, 2], label='Original Data')
for i, component in enumerate(principal_components.T):
    ax1.quiver(0, 0, 0, component[0], component[1], component[2],
               color='r', label=f'Principal Component {i + 1}')
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.set_zlabel('Feature 3')
ax1.legend()
ax1.set_title('Original Data with Principal Components')

# 2D subplot for the reduced data
ax2 = fig.add_subplot(122)
ax2.scatter(reduced_data[:, 0], reduced_data[:, 1], label='Reduced Data')
ax2.set_xlabel('Principal Component 1')
ax2.set_ylabel('Principal Component 2')
ax2.legend()
ax2.set_title('Reduced Data')

plt.tight_layout()
plt.show()
```
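In practice, these steps are rarely coded by hand: scikit-learn's `PCA` estimator bundles them, and `StandardScaler` reproduces the standardization step. A minimal sketch of the same reduction (the correlated 3-feature data mirrors the manual example):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Correlated 3-feature data, as in the manual example
data = rng.multivariate_normal(
    mean=[0, 0, 0],
    cov=[[2, 0.5, 0.3], [0.5, 1, 0.2], [0.3, 0.2, 0.5]],
    size=100,
)

# StandardScaler performs the mean-subtraction and scaling step
std_data = StandardScaler().fit_transform(data)

# fit_transform covers eigendecomposition, component selection,
# and projection in one call
pca = PCA(n_components=2)
reduced = pca.fit_transform(std_data)

print("Reduced shape:", reduced.shape)
print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())
```

The `explained_variance_ratio_` attribute is a convenient way to decide how many components to keep.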

### 2. **t-distributed Stochastic Neighbor Embedding (t-SNE)**

Imagine a dataset containing a large number of features, each of which represents a particular property of an entity or phenomenon. It becomes challenging to see and comprehend the connections between these qualities. The local structure of the data might not be preserved by conventional approaches like PCA, although they can help reduce dimensionality. This is where t-SNE can save the day.

**Understanding t-SNE**

- **Probability Distribution**: t-SNE first defines a probability distribution over pairs of points in both the original high-dimensional space and the lower-dimensional space. The objective is to find a mapping that preserves the similarities between data points.
- **Similarity Measurement**: t-SNE measures how similar data points are based on the distances between them. It uses a Gaussian distribution to model similarity in the original space and a Student’s t-distribution in the lower-dimensional space.
- **Mapping to Lower Dimension**: The procedure iteratively adjusts the positions of points in the lower-dimensional space to minimize the divergence between the two probability distributions. This pulls similar points together and preserves the local neighborhood structure.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

# Step 1: Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Step 2: Perform t-SNE with 2 components
tsne = TSNE(n_components=2, random_state=42)
reduced_data = tsne.fit_transform(X)

# Step 3: Plot the reduced data, one color per class
plt.figure(figsize=(8, 6))
for i in range(len(np.unique(y))):
    plt.scatter(reduced_data[y == i, 0], reduced_data[y == i, 1],
                label=data.target_names[i])
plt.xlabel('t-SNE Component 1')
plt.ylabel('t-SNE Component 2')
plt.title('t-SNE Visualization of Iris Dataset')
plt.legend()
plt.show()
```

### 3. **Linear Discriminant Analysis (LDA): Uncovering Class Separability for Better Classification**

Linear Discriminant Analysis (LDA) is a powerful dimensionality reduction technique that helps us select the most discriminative features for separating classes in a classification task. In contrast to unsupervised techniques, LDA is a supervised algorithm: it uses class labels throughout its training process.

Imagine you have a dataset with multiple classes, and each class is represented by a set of features. However, some features may be more informative in distinguishing between classes than others. LDA steps in to find the best combination of features that maximize class separability, leading to more accurate and robust classification models.

**Understanding LDA**

- **Within-Class and Between-Class Scatter**: LDA computes both the scatter of data within each class (within-class scatter) and the scatter between the class means (between-class scatter). It seeks to maximize between-class scatter while minimizing within-class scatter.
- **Transformation Matrix**: LDA computes a transformation matrix that projects the data onto a new subspace where class separation is optimized.
- **Eigenvalues and Eigenvectors**: LDA performs eigendecomposition on the scatter matrices and selects the eigenvectors corresponding to the largest eigenvalues. These eigenvectors become the axes of the new subspace.
- **Mapping to Lower Dimension**: By projecting the data onto this new subspace, LDA reduces the number of dimensions while retaining class separability.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Step 1: Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Step 2: Perform LDA with 2 components (supervised: note the y argument)
lda = LDA(n_components=2)
reduced_data = lda.fit_transform(X, y)

# Step 3: Plot the reduced data, one color per class
plt.figure(figsize=(8, 6))
for i in range(len(np.unique(y))):
    plt.scatter(reduced_data[y == i, 0], reduced_data[y == i, 1],
                label=data.target_names[i])
plt.xlabel('LDA Component 1')
plt.ylabel('LDA Component 2')
plt.title('LDA Visualization of Iris Dataset')
plt.legend()
plt.show()
```

### 4. Autoencoders

Autoencoders are a type of neural network architecture designed to learn efficient representations of data by compressing and then reconstructing it. Applications for these self-learning networks include feature extraction, noise reduction, dimensionality reduction, and even the creation of new data.

**Mechanics of Autoencoders**

Fundamentally, autoencoders learn to compress input data into a lower-dimensional representation known as the “latent space” or “code,” which preserves the essential components of the data. The process has two key stages: encoding and decoding.

- **Encoder**: The encoder network compresses the input data into a lower-dimensional representation. It consists of one or more fully connected hidden layers; the number of neurons in the bottleneck layer determines the dimensionality of the latent space.
- **Bottleneck Layer**: The bottleneck layer is the heart of the autoencoder: it holds the encoded version of the data in the latent space. Because it has fewer neurons than the input and output layers, the network is forced to capture only the most salient features.
- **Decoder**: The decoder then reconstructs the original data from the encoded representation. Like the encoder, it has hidden layers that gradually expand the dimensions back to match the original input.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from keras.layers import Input, Dense
from keras.models import Model

# Load the Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize each feature to [0, 1] using statistics from the training set only
feat_min = X_train.min(axis=0)
feat_max = X_train.max(axis=0)
X_train = (X_train - feat_min) / (feat_max - feat_min)
X_test = (X_test - feat_min) / (feat_max - feat_min)

# Build the autoencoder model
input_data = Input(shape=(X.shape[1],))
encoded = Dense(2, activation='relu')(input_data)  # Bottleneck: reduce to 2 dimensions
decoded = Dense(X.shape[1], activation='sigmoid')(encoded)
autoencoder = Model(input_data, decoded)

# Compile and train the autoencoder to reconstruct its own input
autoencoder.compile(optimizer='adam', loss='mean_squared_error')
autoencoder.fit(X_train, X_train, epochs=100, batch_size=16, shuffle=True,
                validation_data=(X_test, X_test))

# Build the encoder model for dimensionality reduction
encoder = Model(input_data, encoded)

# Encode the test data
encoded_data = encoder.predict(X_test)

# Plot the original data (first two features) and encoded data side by side
plt.figure(figsize=(15, 6))

plt.subplot(1, 2, 1)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='viridis')
plt.colorbar()
plt.title('Original Wine Data (first two features)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

plt.subplot(1, 2, 2)
plt.scatter(encoded_data[:, 0], encoded_data[:, 1], c=y_test, cmap='viridis')
plt.colorbar()
plt.title('Dimensionality-Reduced Wine Data using Autoencoder')
plt.xlabel('Encoded Feature 1')
plt.ylabel('Encoded Feature 2')

plt.tight_layout()
plt.show()
```

### 5. Random Projection

Random Projection is a dimensionality reduction technique that offers a uniquely simple approach to high-dimensional data. Instead of the complex computations that more elaborate algorithms require, it simply maps data through a random matrix into a lower-dimensional space. Despite this simplicity, it preserves relationships and structure within the data remarkably well, making it an effective tool for data preprocessing and analysis.

**Understanding Random Projection**

- **Random Matrix Creation**: A random matrix is built with dimensions matching the intended lower-dimensional space. Its entries are drawn at random, frequently from a Gaussian or uniform distribution.
- **Projecting Data**: Each data point is multiplied by the random matrix, projecting it into the lower-dimensional space.
- **Preserving Structure**: Despite its simplicity, Random Projection approximately preserves pairwise distances between data points, which is vital for preserving clusters and neighborhoods.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import PCA
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target
# Apply Random Projection
random_projection = GaussianRandomProjection(n_components=2)
X_random_projected = random_projection.fit_transform(X)
# Apply PCA for comparison
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Plot the results
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.scatter(X_random_projected[:, 0], X_random_projected[:, 1], c=y, cmap='viridis')
plt.title('Random Projection')
plt.xlabel('Random Projection Component 1')
plt.ylabel('Random Projection Component 2')
plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.title('PCA')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.tight_layout()
plt.show()
```
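The distance-preservation property is backed by the Johnson–Lindenstrauss lemma, and scikit-learn exposes a helper, `johnson_lindenstrauss_min_dim`, that computes how many random dimensions suffice to keep all pairwise distances within a chosen distortion. A quick sketch:

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# Minimum number of random dimensions needed so that all pairwise
# distances are preserved within a factor of (1 +/- eps), for a given
# number of samples -- notably independent of the original dimensionality.
for n_samples in (1_000, 100_000):
    for eps in (0.1, 0.5):
        k = johnson_lindenstrauss_min_dim(n_samples=n_samples, eps=eps)
        print(f"{n_samples:>7} samples, eps={eps}: need k >= {k}")
```

Note that projecting the tiny Iris example down to only 2 components is far below this bound, so distances there are only loosely preserved; the guarantee matters most on genuinely high-dimensional data.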

## Conclusion

Dimensionality reduction techniques are essential for handling high-dimensional data, lowering computational complexity, and enhancing the performance of machine learning models. By translating data into a lower-dimensional space, these methods let us gain insights, visualize complex relationships, and use computational resources more effectively. Whether it’s the well-known PCA or the more recent t-SNE, dimensionality reduction in machine learning is a crucial tool in the data scientist’s arsenal.

Though these methods have many advantages, keep in mind that the characteristics of your data and the goals of the task at hand should guide your choice of dimensionality reduction strategy. Happy exploring and simplifying your data with dimensionality reduction!

Please drop a comment or head to the *contact page* for any queries. Also, please let me know if you would like me to cover other topics on this blog.
