torchfry.transforms

Module contents

Implementation of Fastfood and Random Kitchen Sink custom PyTorch layers.

class torchfry.transforms.FastfoodLayer(input_dim, output_dim, scale=1, learn_S=False, learn_G=False, learn_B=False, device=None, nonlinearity=True, hadamard=None)[source]

Bases: Module

Implementation of Fastfood transformation layer for efficient random feature mapping.

This layer approximates a dense random projection using the Fastfood algorithm, which uses structured matrices (Hadamard, diagonal random, and permutation matrices) to reduce the time complexity of Random Kitchen Sinks from \(O(nd)\) to \(O(n \log d)\) and the storage from \(O(nd)\) to \(O(n)\), where \(d\) is the input_dim and \(n\) is the output_dim.
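
To make the storage claim concrete, here is a minimal sketch in plain PyTorch (illustrative only; it does not use the layer's actual internals): a dense Gaussian projection stores an \(n \times d\) weight matrix, while the structured Fastfood factors need only a few length-\(n\) vectors, and the Hadamard transform itself stores no weights.

import torch

d, n = 512, 4096  # input_dim and output_dim

# Dense random projection (Random Kitchen Sinks style): n * d stored weights.
dense_W = torch.randn(n, d)

# Fastfood factors: diagonal vectors plus a permutation, O(n) storage.
# (Hypothetical names for illustration; the real layer manages these itself.)
B = torch.randint(0, 2, (n,)) * 2 - 1   # diagonal binary matrix, entries in {-1, +1}
G = torch.randn(n)                       # diagonal Gaussian matrix
S = torch.randn(n)                       # diagonal scaling matrix
perm = torch.randperm(n)                 # permutation

print(dense_W.numel())                                     # n * d = 2,097,152 values
print(B.numel() + G.numel() + S.numel() + perm.numel())    # 4 * n  =    16,384 values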

Parameters:
  • input_dim (int) – The input data feature dimension. (\(d\))

  • output_dim (int) – The output dimension to be projected into. (\(n\))

  • scale (float) – Scalar factor for normalization. (\(\sigma\))

  • learn_S (bool) – If \(S\) matrix is to be learnable.

  • learn_G (bool) – If \(G\) matrix is to be learnable.

  • learn_B (bool) – If \(B\) matrix is to be learnable.

  • device (torch.device) – The device on which computations will be performed.

  • nonlinearity (bool) – If True, apply the nonlinearity \(\cos(Vx + u)\).

  • hadamard (str) – Which Hadamard transform implementation to use: the Dao CUDA kernel, explicit matrix multiplication, or the recursive FWHT (Dao, Matmul, PyTorch).

Notes

\[Vx = \frac{1}{\sigma \sqrt{d}} S H G \Pi H B x\]

\(S\): Diagonal scaling matrix that rescales the rows of \(V\) independently of one another. For Fastfood, this lets the row norms match the radial profile of an RBF kernel.

\(H\): The Hadamard matrix, a square symmetric matrix with entries \(1\) and \(-1\) whose columns are mutually orthogonal. torchfry provides three implementations of the Hadamard transform:

Matmul explicitly builds the \(d \times d\) Hadamard matrix and stores it in memory; the transform is then a matrix multiplication of the input against this matrix. This is extremely fast on strong GPUs but requires storing a large matrix.

Dao is adapted from https://github.com/Dao-AILab/fast-hadamard-transform, which is written in CUDA and provides a PyTorch interface to leverage GPU power even further.

PyTorch repeatedly splits the input tensor into pairs of halves, sums and subtracts them, and concatenates the results step by step, applying the Hadamard transform without ever materializing the matrix (see the sketch below).
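
A minimal sketch of that recursive splitting scheme (an illustration only; the library's own implementation may differ in details such as normalization and batching):

import torch

def fwht(x):
    # Unnormalized fast Walsh-Hadamard transform over the last dimension,
    # which must be a power of two. Each pass splits every block in half,
    # then writes back the sum and the difference of the two halves, so the
    # d x d Hadamard matrix is never materialized.
    d = x.shape[-1]
    assert d & (d - 1) == 0, "last dimension must be a power of two"
    out = x.clone()
    h = 1
    while h < d:
        out = out.view(-1, d // (2 * h), 2, h)
        a, b = out[:, :, 0, :], out[:, :, 1, :]
        out = torch.stack((a + b, a - b), dim=2).reshape(-1, d)
        h *= 2
    return out.view(x.shape)

# Quick check against the explicit-matrix ("Matmul") approach:
d = 8
H = torch.tensor([[1.0]])
while H.shape[0] < d:
    H = torch.cat([torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0)
x = torch.randn(3, d)
assert torch.allclose(fwht(x), x @ H)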

\(G\): Diagonal Gaussian matrix with entries sampled from a normal distribution whose variance is proportional to the input dimension.

\(\Pi\): Permutation matrix that randomizes the order of the rows; after the second Hadamard transform is applied, the rows are independent of one another.

\(B\): Diagonal binary matrix with entries drawn uniformly from \(\{-1,+1\}\); together with the first Hadamard transform it makes the input data dense.

When nonlinearity is used, the layer is computed as:

\[\cos(Vx + u)\]
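
Putting the factors together, here is a minimal sketch of a single \(d \times d\) Fastfood block in plain PyTorch (illustrative only: the layer itself stacks blocks to reach output_dim, may sample \(G\) and \(S\) differently, and supports the learnable variants listed above). It uses the explicit-matrix Hadamard for brevity.

import math
import torch

d = 8                                             # block size, a power of two
sigma = 1.0

# Explicit Hadamard matrix (the "Matmul" option), built by Sylvester's construction.
H = torch.tensor([[1.0]])
while H.shape[0] < d:
    H = torch.cat([torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0)

# Random factors for one block (illustrative sampling, not the layer's exact scheme).
B = (torch.randint(0, 2, (d,)) * 2 - 1).float()   # diagonal binary matrix
G = torch.randn(d)                                # diagonal Gaussian matrix
S = torch.randn(d).abs()                          # diagonal scaling matrix
perm = torch.randperm(d)                          # permutation Pi

def fastfood_block(x):
    # Vx = (1 / (sigma * sqrt(d))) * S H G Pi H B x, applied right to left.
    x = x * B                                     # B: random sign flips
    x = x @ H                                     # first Hadamard transform
    x = x[:, perm]                                # Pi: random permutation
    x = x * G                                     # G: Gaussian scaling
    x = x @ H                                     # second Hadamard transform
    x = x * S                                     # S: row rescaling
    return x / (sigma * math.sqrt(d))

x = torch.randn(4, d)
u = 2 * math.pi * torch.rand(d)
features = torch.cos(fastfood_block(x) + u)       # optional cos(Vx + u) nonlinearity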

References
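
Le, Q. V., Sarlós, T., & Smola, A. J. (2013). Fastfood: Approximating Kernel Expansions in Loglinear Time. In Proceedings of the 30th International Conference on Machine Learning (ICML).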

Examples

A simple example of the Fastfood layer on a linear regression dataset with noise.

>>> import torch
>>> import torch.nn as nn
>>> from torchfry.transforms import FastfoodLayer
>>>
>>> device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
>>>
>>> # Linear regression with noise
>>> x = torch.randn(128, 1, device=device)
>>> y = 2*x + 3 + 0.1*torch.randn(128, 1, device=device)
>>>
>>> model = nn.Sequential(
...     FastfoodLayer(1, 512, scale=1, learn_B=True, learn_G=True, learn_S=True, device=device, hadamard="Torch"),
...     FastfoodLayer(512, 512, scale=1, learn_B=True, learn_G=True, learn_S=True, device=device, hadamard="Torch"),
...     nn.Linear(512, 1)).to(device)
>>>
>>> criterion = nn.MSELoss()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
>>>
>>> # Training loop for 10 epochs
>>> epochs = 10
>>> for epoch in range(epochs):
...     optimizer.zero_grad()
...     y_pred = model(x)
...     loss = criterion(y_pred, y)
...     loss.backward()
...     optimizer.step()
...     print(f'Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}')
Epoch [1/10], Loss: 14.2901
Epoch [2/10], Loss: 14.2365
Epoch [3/10], Loss: 14.2638
Epoch [4/10], Loss: 14.2251
Epoch [5/10], Loss: 14.2316
Epoch [6/10], Loss: 14.0655
Epoch [7/10], Loss: 14.0691
Epoch [8/10], Loss: 14.0698
Epoch [9/10], Loss: 14.0007
Epoch [10/10], Loss: 13.9704
forward(x)[source]

Applies the Fastfood transform to the input tensor following the Fastfood formula from the cited paper.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).

Returns:

x – Transformed tensor of shape (batch_size, output_dim) after projection, optionally passed through a cosine-based nonlinearity if enabled.

Return type:

torch.Tensor
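
For example, the shape contract (a sketch assuming the same "Torch" Hadamard option used in the example above and that the layer defaults to CPU when no device is given):

>>> import torch
>>> from torchfry.transforms import FastfoodLayer
>>> layer = FastfoodLayer(64, 256, hadamard="Torch")
>>> x = torch.randn(32, 64)
>>> layer(x).shape
torch.Size([32, 256])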

new_feature_map(dtype)[source]

Sample new permutation and scaling matrices for the Fastfood feature map.

This function initializes the permutation matrix \(\Pi\), the binary scaling matrix \(B\), the Gaussian scaling matrix \(G\), and the scaling matrix \(S\) based on the learnable parameters.

Parameters:

dtype (torch.dtype) – Specifies the precision of the floats.
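
For example, reusing the layer from the forward() sketch above (resampling replaces the current random matrices, so the layer's outputs change for the same input unless those matrices are learnable and already trained):

>>> layer.new_feature_map(torch.float32)   # redraw Pi, B, G and S in float32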

phi(x)[source]

Apply random Fourier feature mapping using cosine transformation:

\[\cos(Vx + u)\]

This operation adds a random phase shift to the input tensor and applies a cosine nonlinearity, effectively projecting the data into a randomized feature space for kernel approximation.

Parameters:

x (torch.Tensor) – Input tensor that will be transformed.

Returns:

x – Output tensor of the same shape after the random phase shift and cosine nonlinearity are applied.

Return type:

torch.Tensor
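
As background on why this map approximates a kernel, here is a standalone random Fourier features sketch in plain PyTorch (not the layer's internal code; the sampling and scaling inside phi may differ): with Gaussian rows in \(V\) and uniform phases \(u\), the inner product of two feature vectors concentrates around an RBF kernel value.

import math
import torch

torch.manual_seed(0)
d, n, sigma = 4, 20000, 1.0

V = torch.randn(n, d) / sigma                      # random projection rows
u = 2 * math.pi * torch.rand(n)                    # random phase shifts

def rff(x):
    # Random Fourier feature map: sqrt(2 / n) * cos(Vx + u)
    return math.sqrt(2.0 / n) * torch.cos(x @ V.T + u)

x, y = torch.randn(d), torch.randn(d)
approx = (rff(x) @ rff(y)).item()                  # feature-space inner product
exact = math.exp(-(x - y).pow(2).sum().item() / (2 * sigma ** 2))   # RBF kernel value
print(f"approx={approx:.3f}  exact={exact:.3f}")   # the two should be close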

class torchfry.transforms.RKSLayer(input_dim, output_dim, scale, learn_G=False, device=None, nonlinearity=True)[source]

Bases: Module

Implementation of the Random Kitchen Sink layer for efficient random feature mapping.

This layer implements the Random Kitchen Sinks random feature mapping by explicitly building a dense random Gaussian matrix and projecting the input against it. If no nonlinearity is applied, it is simply a linear layer with random (optionally learnable) weights. The normalization by scale is matched to FastfoodLayer.
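
A minimal sketch of that equivalence in plain PyTorch (illustrative only, ignoring the \(\sigma\) normalization): with the nonlinearity disabled, the projection is just a matrix multiplication by a fixed Gaussian matrix, i.e. a linear layer with random weights.

import torch
import torch.nn as nn

d, n = 16, 64
G = torch.randn(n, d)                  # dense random Gaussian matrix

linear = nn.Linear(d, n, bias=False)   # the same map expressed as a linear layer
with torch.no_grad():
    linear.weight.copy_(G)

x = torch.randn(8, d)
assert torch.allclose(x @ G.T, linear(x))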

Parameters:
  • input_dim (int) – The input data feature dimension. (\(d\))

  • output_dim (int) – The output dimension to be projected into. (\(n\))

  • scale (float) – Scalar factor for normalization. (\(\sigma\))

  • learn_G (bool) – If True, allows the random Gaussian matrix \(G\) to be learnable.

  • nonlinearity (bool) – If True, apply the nonlinearity \(\cos(Vx + u)\).

  • device (torch.device) – The device on which computations will be performed.

References
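
Rahimi, A., & Recht, B. (2007). Random Features for Large-Scale Kernel Machines. In Advances in Neural Information Processing Systems (NIPS).

Rahimi, A., & Recht, B. (2008). Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning. In Advances in Neural Information Processing Systems (NIPS).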

Examples

A simple example of the Random Kitchen Sink layer on a linear regression dataset with noise.

>>> import torch
>>> import torch.nn as nn
>>> from torchfry.transforms import RKSLayer
>>>
>>> device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
>>>
>>> # Linear regression with noise
>>> x = torch.randn(128, 1, device=device)
>>> y = 2*x + 3 + 0.1*torch.randn(128, 1, device=device)
>>>
>>> model = nn.Sequential(
...     RKSLayer(1, 512, scale=1, learn_G=True, device=device),
...     RKSLayer(512, 512, scale=1, learn_G=True, device=device),
...     nn.Linear(512, 1)
... ).to(device)
>>>
>>> criterion = nn.MSELoss()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
>>>
>>> # Training loop for 10 epochs
>>> epochs = 10
>>> for epoch in range(epochs):
...     optimizer.zero_grad()
...     y_pred = model(x)
...     loss = criterion(y_pred, y)
...     loss.backward()
...     optimizer.step()
...     print(f'Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}')
Epoch [1/10], Loss: 13.3238
Epoch [2/10], Loss: 13.1642
Epoch [3/10], Loss: 13.3305
Epoch [4/10], Loss: 12.9485
Epoch [5/10], Loss: 13.1686
Epoch [6/10], Loss: 12.8200
Epoch [7/10], Loss: 13.2698
Epoch [8/10], Loss: 12.9570
Epoch [9/10], Loss: 13.0187
Epoch [10/10], Loss: 13.1325
forward(x)[source]

Applies the Random Kitchen Sink transform to the input tensor by performing matrix multiplication against the random Gaussian matrix, optionally followed by cosine nonlinearity.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).

Returns:

x – Transformed tensor of shape (batch_size, output_dim) after projection, optionally passed through a cosine-based nonlinearity if enabled.

Return type:

torch.Tensor

phi(x)[source]

Apply random Fourier feature mapping using cosine transformation:

\[\cos(Vx + u)\]

This operation adds a random phase shift to the input tensor and applies a cosine nonlinearity, effectively projecting the data into a randomized feature space for kernel approximation.

Parameters:

x (torch.Tensor) – Input tensor that will be transformed.

Returns:

x – Output tensor of the same shape after the random phase shift and cosine nonlinearity are applied.

Return type:

torch.Tensor