Implementation of the Fastfood transformation layer for efficient random feature mapping.
This layer approximates a dense random projection using the Fastfood algorithm,
which uses structured matrices (Hadamard, diagonal random, and permutation matrices)
to reduce the time complexity of a Random Kitchen Sinks projection from \(O(nd)\) to
\(O(n \log d)\) and the space complexity from \(O(nd)\) to \(O(n)\), where
\(d\) is the input_dim and \(n\) is the output_dim.
Parameters:
input_dim (int) – The input data feature dimension. (\(d\))
output_dim (int) – The output dimension to be projected into. (\(n\))
scale (float) – Scalar factor for normalization. (\(\sigma\))
learn_S (bool) – If True, the scaling matrix \(S\) is learnable.
learn_G (bool) – If True, the Gaussian matrix \(G\) is learnable.
learn_B (bool) – If True, the binary matrix \(B\) is learnable.
device (torch.device) – The device on which computations will be performed.
nonlinearity (bool) – If True, apply the nonlinearity \(\cos(Vx + u)\).
hadamard (str) – Which Hadamard transform implementation to use: "Dao" (Dao et al.'s kernel), "Matmul" (explicit matrix multiplication), or "PyTorch" (recursive FWHT).
Notes
\[Vx = \frac{1}{\sigma \sqrt{d}} S H G \Pi H B x\]
\(S\): Diagonal scaling matrix; it allows the rows of \(V\) to be scaled independently
of one another, which lets Fastfood match the radial shape of an RBF kernel.
\(H\): Hadamard matrix, a square symmetric matrix of \(1\)s and \(-1\)s
whose columns are mutually orthogonal. torchfry comes with three options for applying the transform:
Dao dispatches to the fast Walsh-Hadamard transform kernel of Dao et al.
Matmul explicitly builds a \(d \times d\) Hadamard matrix and stores it in
memory; the transform is then a matrix multiplication of the input against
this matrix. This implementation is extremely quick when leveraging strong
GPUs, but requires storing a large matrix.
PyTorch repeatedly splits the input tensor into pairs, sums and subtracts
them, and concatenates the results step by step, applying the Hadamard transform
without explicitly creating the matrix (see the sketch after these notes).
\(G\): Diagonal Gaussian matrix, with entries sampled from a normal distribution
whose variance is proportional to the dimension of the input data.
\(\Pi\): Permutation matrix that randomizes the order of the rows; after the
second Hadamard transform is applied, the rows are independent of one another.
\(B\): Diagonal binary matrix with entries drawn from \(\{-1,+1\}\); it helps make the input data dense.
When nonlinearity is used, the layer is computed as:
\[\cos(Vx + u)\]
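To make the composition concrete, here is a minimal sketch of the recursive ("PyTorch"-style) fast Walsh-Hadamard transform and of the product \(S H G \Pi H B x\) built from it. All names are illustrative rather than the torchfry source, and \(d\) is assumed to be a power of two:

    import math
    import torch

    def fwht(x: torch.Tensor) -> torch.Tensor:
        # Fast Walsh-Hadamard transform over the last dimension
        # (length must be a power of two); no matrix is materialized.
        d = x.shape[-1]
        out_shape = x.shape
        h = 1
        while h < d:
            y = x.reshape(-1, d // (2 * h), 2, h)
            a, b = y[:, :, 0, :], y[:, :, 1, :]
            x = torch.stack((a + b, a - b), dim=2).reshape(out_shape)
            h *= 2
        return x

    def fastfood_product(x, S, G, B, perm, sigma):
        # Vx = (1 / (sigma * sqrt(d))) * S H G Pi H B x, applied right to left
        # using only elementwise products, one index permutation, and two FWHTs.
        d = x.shape[-1]
        x = x * B                          # B: diagonal +/-1 matrix
        x = fwht(x)                        # first Hadamard transform H
        x = x[..., perm]                   # Pi: permutation of coordinates
        x = x * G                          # G: diagonal Gaussian matrix
        x = fwht(x)                        # second Hadamard transform H
        x = x * S                          # S: diagonal scaling matrix
        return x / (sigma * math.sqrt(d))

Each such block maps \(d\) inputs to \(d\) features; to reach an output_dim \(n > d\), the Fastfood construction stacks independent blocks.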
Examples
A simple example of the Fastfood layer on a linear regression dataset with noise.
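A minimal sketch of such an experiment, assuming FastfoodLayer is importable directly from torchfry and behaves as a standard torch.nn.Module (the ridge-regression fit on the features is illustrative):

    import torch
    from torchfry import FastfoodLayer  # import path assumed

    torch.manual_seed(0)

    # Noisy linear-regression data: y = x @ w + noise
    d, n_samples = 16, 512              # power-of-two input_dim suits the Hadamard transform
    x = torch.randn(n_samples, d)
    w = torch.randn(d, 1)
    y = x @ w + 0.1 * torch.randn(n_samples, 1)

    # Project the inputs into a 64-dimensional random feature space
    layer = FastfoodLayer(input_dim=d, output_dim=64, scale=1.0,
                          nonlinearity=True, hadamard="PyTorch",
                          device=torch.device("cpu"))
    with torch.no_grad():
        features = layer(x)

    # Closed-form ridge regression on the random features
    lam = 1e-3
    A = features.T @ features + lam * torch.eye(features.shape[1])
    beta = torch.linalg.solve(A, features.T @ y)
    mse = torch.mean((features @ beta - y) ** 2)
    print(f"train MSE: {mse.item():.4f}")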
Sample new permutation and scaling matrices for the Fastfood feature map.
This function initializes the permutation matrix \(\Pi\), the binary scaling
matrix \(B\), the Gaussian scaling matrix \(G\), and the scaling matrix
\(S\), honoring the learn_S, learn_G, and learn_B settings.
Parameters:
dtype (torch.dtype) – Specifies the precision of the floats.
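For illustration, one plausible way to draw these matrices for input_dim \(d\); the names and the exact recipe for \(S\) are assumptions, not the torchfry source:

    import torch

    d = 16                                             # input_dim
    dtype = torch.float32
    perm = torch.randperm(d)                           # Pi: random permutation of the rows
    B = torch.randint(0, 2, (d,)).to(dtype) * 2 - 1    # B: random +/-1 diagonal
    G = torch.randn(d, dtype=dtype)                    # G: i.i.d. Gaussian diagonal
    # S matches the radial (chi-distributed) row lengths of an RBF Gaussian
    # matrix; dividing by ||G|| cancels the scale G already contributes.
    s = torch.distributions.Chi2(df=float(d)).sample((d,)).sqrt()
    S = (s / G.norm()).to(dtype)
    # Matrices flagged learn_S / learn_G / learn_B would additionally be
    # wrapped in torch.nn.Parameter so that gradients flow into them.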
Apply the random Fourier feature mapping using a cosine transformation:
\[\cos(Vx + u)\]
This operation adds a random phase shift to the input tensor and applies
a cosine nonlinearity, effectively projecting the data into a randomized
feature space for kernel approximation.
Parameters:
x (torch.Tensor) – Input tensor that will be transformed.
Returns:
x – Output tensor of the same shape, after the phase shift, cosine, and normalization are applied.
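As a sketch, assuming the conventional random Fourier feature scaling \(\sqrt{2/n}\) for the normalization (the phase \(u\) is drawn once and then held fixed):

    import math
    import torch

    n = 64                                   # output_dim
    u = 2 * math.pi * torch.rand(n)          # random phase, Uniform(0, 2*pi)
    v = torch.randn(8, n)                    # stand-in for a projected batch Vx
    features = math.sqrt(2.0 / n) * torch.cos(v + u)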
Implementation of the Random Kitchen Sink layer for efficient random feature mapping.
This layer implements a dense random projection via the Random Kitchen Sinks
algorithm, which uses a random Gaussian matrix; the layer explicitly builds and
stores this dense matrix of Gaussian noise. If no nonlinearity is applied, this
is simply a linear layer. The scale is matched to FastfoodLayer, so the two
layers are directly comparable.
Parameters:
input_dim (int) – The input data feature dimension. (\(d\))
output_dim (int) – The output dimension to be projected into. (\(n\))
scale (float) – Scalar factor for normalization. (\(\sigma\))
learn_G (bool) – If True, allows the random Gaussian matrix \(G\) to be learnable.
nonlinearity (bool) – If True, apply the nonlinearity \(\cos(Vx + u)\).
device (torch.device) – The device on which computations will be performed.
Examples
A simple example of the Random Kitchen Sink layer on a linear regression dataset with noise.
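A minimal sketch along the same lines as the FastfoodLayer example; the class name RKSLayer and the import path are assumptions (substitute the actual export):

    import torch
    from torchfry import RKSLayer  # class name and import path assumed

    torch.manual_seed(0)

    # Noisy linear-regression data: y = x @ w + noise
    d, n_samples = 16, 512
    x = torch.randn(n_samples, d)
    w = torch.randn(d, 1)
    y = x @ w + 0.1 * torch.randn(n_samples, 1)

    # Dense random-feature projection with the cosine nonlinearity
    layer = RKSLayer(input_dim=d, output_dim=64, scale=1.0,
                     nonlinearity=True, device=torch.device("cpu"))
    with torch.no_grad():
        features = layer(x)                 # shape: (512, 64)

    # Closed-form ridge regression, as in the FastfoodLayer example
    lam = 1e-3
    A = features.T @ features + lam * torch.eye(features.shape[1])
    beta = torch.linalg.solve(A, features.T @ y)
    print(f"train MSE: {torch.mean((features @ beta - y) ** 2).item():.4f}")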
Applies the Random Kitchen Sink transform to the input tensor by multiplying
it with the random Gaussian matrix, optionally followed by the cosine
nonlinearity.
Parameters:
x (torch.Tensor) – Input tensor of shape (batch_size, input_dim).
Returns:
X – Transformed tensor of shape (batch_size, output_dim) after projection,
optionally passed through a cosine-based nonlinearity if enabled.
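A sketch of this forward pass; the function and argument names are illustrative, and the \(1/(\sigma\sqrt{d})\) normalization is assumed to mirror FastfoodLayer:

    import math
    import torch

    def rks_forward(x: torch.Tensor, G: torch.Tensor, u: torch.Tensor,
                    scale: float, nonlinearity: bool) -> torch.Tensor:
        # G has shape (output_dim, input_dim); u is the fixed random phase.
        d = x.shape[-1]
        v = x @ G.t() / (scale * math.sqrt(d))   # dense random projection
        if nonlinearity:
            v = torch.cos(v + u)                 # random phase shift + cosine
        return v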
Apply the random Fourier feature mapping using a cosine transformation:
\[\cos(Vx + u)\]
This operation adds a random phase shift to the input tensor and applies
a cosine nonlinearity, effectively projecting the data into a randomized
feature space for kernel approximation.
Parameters:
x (torch.Tensor) – Input tensor that will be transformed.
Returns:
x – Output tensor of the same shape, after the phase shift, cosine, and normalization are applied.