import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
# Select the compute device: GPU when available, otherwise CPU.
_backend = 'cuda' if torch.cuda.is_available() else 'cpu'
device = torch.device(_backend)
print(f"Device: {device}")
1. Implicit Neural Representations
Concept:
Represent a signal as a continuous function:
For an image: \(f_\theta(x, y) \rightarrow (r, g, b)\)
SIREN Activation:
Reference Materials:
cv_3d_foundation.pdf - CV 3D Foundation
cv_3d_research.pdf - CV 3D Research
class SineLayer(nn.Module):
    """Linear layer followed by a sine activation, with SIREN initialization.

    Args:
        in_features: Input dimension.
        out_features: Output dimension.
        omega_0: Frequency scale applied before the sine (SIREN's w0).
        is_first: First layers use the wider Uniform(-1/n, 1/n) scheme
            from the SIREN paper; later layers use Uniform(±sqrt(6/n)/w0).
    """
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        # Special initialization. The in-place uniform_() must run under
        # torch.no_grad(): mutating a leaf Parameter that requires grad
        # would otherwise raise a RuntimeError.
        with torch.no_grad():
            if is_first:
                self.linear.weight.uniform_(-1 / in_features, 1 / in_features)
            else:
                bound = np.sqrt(6 / in_features) / omega_0
                self.linear.weight.uniform_(-bound, bound)
    def forward(self, x):
        # sin(w0 * (Wx + b)): omega_0 sets the frequency content.
        return torch.sin(self.omega_0 * self.linear(x))
print("SineLayer defined")
SIREN NetworkΒΆ
SIREN (Sinusoidal Representation Networks) uses \(\sin\) activation functions instead of ReLU, which gives the network and all its derivatives continuous, well-behaved structure. The key insight is that sinusoidal activations are their own derivatives (up to scaling), so a SIREN network is equally expressive for representing a function and its spatial gradients, Laplacian, or any higher-order differential quantity. The initialization scheme is critical: weights are drawn from \(\mathcal{U}(-\sqrt{6/n}, \sqrt{6/n})\) (except the first layer which uses a frequency parameter \(\omega_0\) scaling) to preserve the distribution of activations through the network. SIRENs have achieved state-of-the-art results for tasks ranging from image fitting to solving partial differential equations.
class SIREN(nn.Module):
    """Sinusoidal Representation Network (Sitzmann et al., 2020).

    A stack of SineLayers followed by a plain linear output layer.

    Args:
        in_features: Coordinate dimension (e.g. 2 for images).
        hidden_features: Width of each hidden layer.
        hidden_layers: Number of sine layers (including the first).
        out_features: Output dimension (e.g. 3 for RGB).
        omega_0: SIREN frequency parameter w0.
    """
    def __init__(self, in_features, hidden_features, hidden_layers, out_features, omega_0=30.0):
        super().__init__()
        # First layer uses the special is_first initialization.
        layers = [SineLayer(in_features, hidden_features, omega_0, is_first=True)]
        # Hidden sine layers.
        for _ in range(hidden_layers - 1):
            layers.append(SineLayer(hidden_features, hidden_features, omega_0))
        # Final layer is linear (no sine), initialized like a hidden layer.
        final_linear = nn.Linear(hidden_features, out_features)
        # torch.no_grad() is required: in-place uniform_() on a Parameter
        # that requires grad raises a RuntimeError otherwise.
        with torch.no_grad():
            bound = np.sqrt(6 / hidden_features) / omega_0
            final_linear.weight.uniform_(-bound, bound)
        layers.append(final_linear)
        self.net = nn.Sequential(*layers)
    def forward(self, coords):
        """Evaluate the network at coords [batch, in_features]."""
        return self.net(coords)
print("SIREN defined")
Image Fitting
In the image fitting task, the SIREN learns to map pixel coordinates \((x, y)\) directly to RGB color values, representing an entire image as a continuous function. The training data is simply all (coordinate, color) pairs from the target image, and the loss is MSE between predicted and actual colors. Unlike a discrete pixel grid, the learned continuous representation can be evaluated at arbitrary coordinates — enabling resolution-independent rendering and smooth spatial interpolation. This is the simplest demonstration of implicit neural representations, and the same principle extends to 3D shapes, radiance fields, and video.
def get_coordinates(H, W):
    """Return a flattened (H*W, 2) grid of (x, y) coordinates in [-1, 1]."""
    ys = torch.linspace(-1, 1, H)
    xs = torch.linspace(-1, 1, W)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing='ij')
    # Stack as (x, y) pairs, then flatten the spatial dimensions.
    stacked = torch.stack((grid_x, grid_y), dim=-1)
    return stacked.reshape(-1, 2)
def create_test_image(size=64):
    """Build a size x size RGB test image: red disk (radius 0.5) on green."""
    axis = torch.linspace(-1, 1, size)
    yy, xx = torch.meshgrid(axis, axis, indexing='ij')
    # Binary mask of the centered disk.
    inside = (torch.sqrt(xx ** 2 + yy ** 2) < 0.5).float()
    img = torch.zeros(size, size, 3)
    img[..., 0] = inside       # red channel: inside the disk
    img[..., 1] = 1 - inside   # green channel: outside the disk
    return img
# Create the 64x64 synthetic target image (red disk on a green background).
img = create_test_image(64)
H, W, C = img.shape
# Build the INR training pairs: one (x, y) coordinate per pixel plus the
# matching RGB value; both are moved to the active device up front.
coords = get_coordinates(H, W).to(device)  # (H*W, 2) in [-1, 1]
pixels = img.view(-1, C).to(device)        # (H*W, 3)
print(f"Image: {H}x{W}, Coords: {coords.shape}, Pixels: {pixels.shape}")
Train SIRENΒΆ
Training proceeds by sampling batches of coordinate-value pairs from the target signal and minimizing the reconstruction loss (MSE). SIRENs converge remarkably quickly compared to ReLU networks for this task, often fitting high-frequency image details in just a few hundred iterations. The frequency parameter \(\omega_0\) in the first layer acts as a prior on the spectral content: higher values allow the network to represent higher-frequency details but may cause optimization difficulties if set too large.
# Instantiate the SIREN: 2-D coordinate input -> RGB output,
# 3 sine layers of width 256 with the default omega_0 = 30.
model = SIREN(
    in_features=2,
    hidden_features=256,
    hidden_layers=3,
    out_features=3,
    omega_0=30.0
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Full-batch training: every pixel's (coordinate, color) pair each step.
losses = []
for epoch in range(500):
    pred = model(coords)
    loss = F.mse_loss(pred, pixels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    # Periodic progress report.
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.6f}")
Visualize Results
Comparing the SIREN reconstruction with the original image at the training resolution and at higher resolutions reveals the quality of the learned implicit representation. At the training resolution, the reconstruction should be near-perfect. At higher resolutions (super-resolution), the continuous representation produces smooth, plausible interpolations rather than pixelated artifacts. Visualizing the network's gradient outputs (which are analytically available) shows edge detection and surface normal estimation for free — a unique benefit of differentiable implicit representations.
# Reconstruct the image by evaluating the trained SIREN at every training
# coordinate (no gradients needed at inference time).
model.eval()
with torch.no_grad():
    pred = model(coords)
    pred_img = pred.view(H, W, C).cpu().clamp(0, 1)  # image layout, valid RGB
# Plot: original vs. reconstruction vs. training curve.
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# Original target image.
axes[0].imshow(img.cpu())
axes[0].set_title('Original', fontsize=12)
axes[0].axis('off')
# SIREN reconstruction at the training resolution.
axes[1].imshow(pred_img)
axes[1].set_title('SIREN Reconstruction', fontsize=12)
axes[1].axis('off')
# Loss curve; log scale makes the convergence rate visible.
axes[2].plot(losses)
axes[2].set_xlabel('Iteration', fontsize=11)
axes[2].set_ylabel('MSE Loss', fontsize=11)
axes[2].set_title('Training Loss', fontsize=12)
axes[2].grid(True, alpha=0.3)
axes[2].set_yscale('log')
plt.tight_layout()
plt.show()
Super-Resolution
Because the SIREN represents the image as a continuous function, we can query it at coordinates denser than the original pixel grid to produce a super-resolved image. Unlike traditional super-resolution methods that require training on paired low/high-resolution images, the implicit representation approach achieves super-resolution as a natural byproduct of the continuous coordinate mapping. The quality depends on how well the network has learned the underlying signal structure rather than memorizing pixel values — networks with appropriate frequency capacity produce smooth, natural-looking upscaled results.
# Query the trained network on a denser 128x128 grid: the continuous
# representation can be sampled at any resolution, not just the one it
# was trained at.
coords_hr = get_coordinates(128, 128).to(device)
with torch.no_grad():
    pred_hr = model(coords_hr)
    pred_hr_img = pred_hr.view(128, 128, 3).cpu().clamp(0, 1)
# Side-by-side: training resolution vs. super-resolved output.
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
axes[0].imshow(pred_img)
axes[0].set_title('64x64 Reconstruction', fontsize=12)
axes[0].axis('off')
axes[1].imshow(pred_hr_img)
axes[1].set_title('128x128 Super-Resolution', fontsize=12)
axes[1].axis('off')
plt.tight_layout()
plt.show()
SummaryΒΆ
Implicit Neural Representations:ΒΆ
Key Ideas:
Continuous function parameterization
Coordinate-based input
Resolution-independent
Compact representation
SIREN:ΒΆ
Periodic sine activation
Special weight initialization
Natural for signals with derivatives
Better than ReLU for smooth functions
Advantages:ΒΆ
Memory efficient: Single network vs pixel array
Super-resolution: Query at any resolution
Derivatives: Analytical gradients
Compression: Compact signal storage
Applications:ΒΆ
3D shape representation (DeepSDF, NeRF)
Image compression
Video representation
PDEs solving
Novel view synthesis
Variants:ΒΆ
NeRF: 5D function (x,y,z,ΞΈ,Ο) β (rgb,Ο)
Fourier features: Positional encoding
BACON: Bias-free activation
WIRE: Random Fourier with nonlinearities
Advanced Implicit Neural Representations TheoryΒΆ
1. Introduction to Neural FieldsΒΆ
1.1 From Discrete to Continuous RepresentationsΒΆ
Traditional approach: Discrete grids (images, voxels, meshes)
Images: HΓWΓC array
3D shapes: NΓNΓN voxel grid
Videos: TΓHΓWΓC tensor
Limitations:
Fixed resolution (cannot zoom arbitrarily)
Memory grows cubically with resolution (O(NΒ³) for 3D)
Aliasing artifacts
Difficult to apply transformations
Implicit Neural Representations (INRs): Continuous functions via neural networks
f_θ: ℝᵈ → ℝᶜ
Coordinates → Signal values
Examples:
Images: f(x, y) β RGB
3D shapes: f(x, y, z) β occupancy or SDF
Videos: f(x, y, t) β RGB
Audio: f(t) β amplitude
1.2 Advantages of INRsΒΆ
Resolution independence: Query at any coordinate
Memory efficiency: Parameters independent of resolution
Smooth interpolation: Continuous by construction
Differentiable: Gradients available everywhere
Compact: Single network encodes entire signal
Applications:
Novel view synthesis (NeRF)
3D shape representation
Image/video compression
Solving PDEs
Generative modeling
2. Coordinate-Based MLPsΒΆ
2.1 Basic ArchitectureΒΆ
Standard MLP:
x β βα΅ β [Linear β ReLU]^L β Linear β y β βαΆ
Problem: Low-frequency bias (spectral bias)
MLPs learn low frequencies first
Struggles with high-frequency details (textures, edges)
Reason: ReLU/tanh have limited frequency content
2.2 Spectral Bias ProblemΒΆ
Observation (Rahaman et al., 2019): Neural networks are biased toward low frequencies
Experiment: Fit f(x) = sin(kx) for various k
Low k (slow oscillation): Converges quickly
High k (fast oscillation): Converges slowly or fails
Consequence: Images/shapes miss fine details
3. Positional Encoding (Fourier Features)ΒΆ
3.1 Random Fourier FeaturesΒΆ
Idea (Tancik et al., 2020): Map coordinates to higher-dimensional frequency space
Encoding:
γ(x) = [sin(2πb₁ᵀx), cos(2πb₁ᵀx), ..., sin(2πb_Mᵀx), cos(2πb_Mᵀx)]
where bᵢ ~ N(0, σ²I)
Network:
f_ΞΈ(x) = MLP(Ξ³(x))
Effect: Projects input to frequency spectrum
Enables learning high-frequency details
Ο controls frequency range (higher Ο β higher frequencies)
Relationship to kernel methods: Ξ³(x)α΅Ξ³(xβ) approximates kernel k(x, xβ)
3.2 Learned Positional EncodingΒΆ
Alternative: Learn encoding parameters
Ξ³(x) = [sin(Wβx + bβ), cos(Wβx + bβ), ..., sin(Wβx + bβ), cos(Wβx + bβ)]
where W, b are learned
Advantage: Adapts encoding to data
3.3 Frequency AnalysisΒΆ
NTK perspective (Neural Tangent Kernel):
Positional encoding changes kernelβs frequency response
Enables higher frequency eigenfunctions
Rule of thumb: Ο β max frequency in signal
4. Sinusoidal Representation Networks (SIREN)ΒΆ
4.1 ArchitectureΒΆ
Key innovation (Sitzmann et al., 2020): Use sine activations throughout
Network:
x β [Linear β sin]^L β Linear β y
Initialization: Critical for training stability
W ~ Uniform(-β(6/n_in), β(6/n_in)) (first layer: β(6/c))
where c is tuning parameter (typically c = 6)
4.2 PropertiesΒΆ
Derivatives: βsin(x)/βx = cos(x)
Periodic derivatives
Enables solving PDEs naturally
Higher-order derivatives: Available via automatic differentiation
βΒ²f_ΞΈ(x) = Laplacian (used in physics-informed learning)
Frequency content: Richer than ReLU
Captures both low and high frequencies
No need for positional encoding
4.3 Theoretical FoundationΒΆ
Observation: Sine activations create periodic basis functions
Fourier perspective: Network computes Fourier-like expansion
f(x) β Ξ£α΅’ aα΅’ sin(Οα΅’x + Οα΅’)
Advantage for PDEs: Derivatives remain periodic
5. Neural Radiance Fields (NeRF)ΒΆ
5.1 Problem: Novel View SynthesisΒΆ
Input: Images of scene from different viewpoints
Output: Render scene from new viewpoint
Traditional: Reconstruct 3D mesh β render
NeRF: Implicit volumetric representation
5.2 NeRF FormulationΒΆ
5D function:
F_θ: (x, y, z, θ, φ) → (r, g, b, σ)
(position, view direction) → (color, density)
Volume rendering equation:
C(r) = ∫ T(t) σ(r(t)) c(r(t), d) dt
where T(t) = exp(-∫₀ᵗ σ(r(s)) ds) (transmittance)
Discrete approximation (quadrature):
Ĉ(r) = Σᵢ Tᵢ (1 - exp(-σᵢ δᵢ)) cᵢ
where Tᵢ = exp(-Σⱼ₌₁^{i-1} σⱼ δⱼ)
5.3 NeRF ArchitectureΒΆ
Two MLPs:
Coarse network: Low-resolution sampling
Fine network: Importance sampling based on coarse
Input encoding:
Position (x,y,z): Positional encoding Ξ³(x) with 10 frequencies
Direction (ΞΈ,Ο): Positional encoding Ξ³(d) with 4 frequencies
Network structure:
Ξ³(x) β [256 β 256 β 256 β 256] β Ο
β (features)
[256 β 128] + Ξ³(d) β RGB
Hierarchical sampling:
Sample N_c points uniformly along ray
Evaluate coarse network β weights w_i
Importance sample N_f additional points based on w_i
Evaluate fine network β final color
5.4 Training NeRFΒΆ
Loss: Photometric reconstruction
L = Ξ£α΅£ ||C(r) - Δ(r)||Β²
Data: Multi-view images + camera poses
Optimization: Adam, ~300k iterations per scene
Challenges:
Slow rendering (many MLP queries per ray)
Per-scene optimization (no generalization)
Static scenes only
6. Improvements and VariantsΒΆ
6.1 Faster NeRF VariantsΒΆ
Instant NGP (MΓΌller et al., 2022):
Multi-resolution hash encoding
Speedup: 1000Γ faster training (5s vs. hours)
Real-time rendering
Plenoxels (Fridovich-Keil et al., 2022):
Explicit voxel grid + spherical harmonics
No neural network!
Faster optimization
TensoRF (Chen et al., 2022):
Tensor factorization for radiance field
Compact + fast
6.2 Generalizable NeRFΒΆ
pixelNeRF (Yu et al., 2021):
Encode image features via CNN
Query features + coordinates
Generalizes to new scenes
IBRNet (Wang et al., 2021):
Image-based rendering with transformers
Few-shot view synthesis
6.3 Dynamic NeRFΒΆ
D-NeRF (Pumarola et al., 2021):
Add time dimension: F(x, y, z, t, ΞΈ, Ο)
Deformation field + canonical space
HyperNeRF (Park et al., 2021):
Topological changes (e.g., cutting balloon)
K-Planes (Fridovich-Keil et al., 2023):
Factorized 4D space-time representation
6.4 NeRF for 360Β° ScenesΒΆ
Mip-NeRF 360 (Barron et al., 2022):
Unbounded scenes (not just objects)
Anti-aliasing via integrated positional encoding
Distortion loss for better geometry
7. Signed Distance Functions (SDFs)ΒΆ
7.1 SDF DefinitionΒΆ
SDF: Distance to nearest surface
SDF(x) = { d if x outside surface
{ -d if x inside surface
{ 0 if x on surface
Properties:
||βSDF(x)|| = 1 (eikonal equation)
Zero level set = surface
7.2 DeepSDFΒΆ
Idea (Park et al., 2019): Neural network as SDF
Network:
f_ΞΈ: βΒ³ β β
(x, y, z) β signed distance
Loss:
L = Ξ£α΅’ |f_ΞΈ(xα΅’) - SDFα΅’|
Shape representation: Implicit surface at f_ΞΈ(x) = 0
Advantages:
Watertight surfaces
Handles arbitrary topology
Compact encoding
7.3 Eikonal RegularizationΒΆ
Problem: Network may not satisfy ||βf|| = 1
Solution: Eikonal loss
L_eikonal = E_x[(||βf_ΞΈ(x)|| - 1)Β²]
Combined loss:
L = L_reconstruction + Ξ» L_eikonal
7.4 NeuS and VolSDFΒΆ
Challenge: SDF alone doesnβt give appearance
NeuS (Wang et al., 2021):
Combine SDF with color network
Volume rendering with SDF
Bias control parameter for surface sharpness
VolSDF (Yariv et al., 2021):
Similar approach with geometric initialization
Better surface reconstruction
8. Occupancy NetworksΒΆ
8.1 Occupancy RepresentationΒΆ
Occupancy function:
o: βΒ³ β [0, 1]
o(x) = probability that x is inside shape
Network:
f_ΞΈ(x, y, z) β Ο(logit) β [0, 1]
Loss (binary cross-entropy):
L = -Ξ£α΅’ [yα΅’ log o(xα΅’) + (1-yα΅’) log(1-o(xα΅’))]
8.2 Occupancy Networks (Mescheder et al., 2019)ΒΆ
Architecture:
Encoder: Point cloud/image β latent code z
Decoder: (x, z) β occupancy
Applications:
3D reconstruction from partial observations
Shape generation
Completion
9. Implicit Differentiation and Meta-LearningΒΆ
9.1 Hypernetworks for INRsΒΆ
Idea: Learn to generate network weights
Hypernetwork: Input data β ΞΈ (weights of INR)
INR: Coordinates β Signal values
Advantage: Amortize optimization across dataset
9.2 MAML for INRsΒΆ
Meta-learning: Learn initialization that adapts quickly
Procedure:
Initialize ΞΈ
For each task (scene/shape):
Fine-tune: ΞΈβ = ΞΈ - Ξ±βL_task(ΞΈ)
Accumulate meta-gradient
Update ΞΈ
Benefit: Fast adaptation to new data
10. Applications Beyond GraphicsΒΆ
10.1 Solving PDEsΒΆ
Physics-Informed Neural Networks (PINNs):
Use SIREN to represent solution u(x, t)
Loss = PDE residual + boundary conditions
Example (heat equation):
βu/βt = Ξ±βΒ²u
L = ||βu/βt - Ξ±βΒ²u||Β² + ||u(x, 0) - uβ||Β²
Advantage: Meshless, continuous solution
10.2 CompressionΒΆ
Image compression:
Fit INR to image
Store network weights (typically <1KB for small MLPs)
Decode at any resolution
Comparison:
JPEG: Fixed resolution, block artifacts
INR: Continuous, smooth, but slower decode
10.3 Inverse ProblemsΒΆ
Super-resolution: Low-res image β INR β High-res output
Inpainting: Masked image β INR (train on visible pixels) β Complete image
11. Computational ComplexityΒΆ
11.1 Training CostΒΆ
Per-iteration:
Forward: O(L Β· HΒ² Β· B) (L layers, H hidden dim, B batch)
Backward: Same as forward
Total: O(L Β· HΒ² Β· B Β· N_iter)
NeRF specific:
Rays per image: H_img Γ W_img
Samples per ray: N_c + N_f (typically 64 + 128)
Per scene: ~1M gradient steps
11.2 Inference CostΒΆ
Single query: O(L Β· HΒ²)
Full image:
Standard NeRF: O(H_img Β· W_img Β· N_samples Β· L Β· HΒ²)
Example: 800Γ800 pixels Γ 192 samples Γ 8 layers Γ 256Β² β 10ΒΉΒΉ ops
Slow! (~30s per image on GPU)
Fast variants:
Instant NGP: O(H_img Β· W_img Β· L) via hash table
Real-time (30 FPS)
12. Comparison of TechniquesΒΆ
12.1 Activation FunctionsΒΆ
Activation |
Frequency |
Derivatives |
Best For |
|---|---|---|---|
ReLU |
Low |
Discontinuous |
General tasks |
Tanh |
Low |
Smooth |
Smooth signals |
Sine |
High |
Periodic |
PDEs, high-freq |
GELU |
Medium |
Smooth |
Transformers |
12.2 Encoding StrategiesΒΆ
Method |
Frequencies |
Learnable |
Overhead |
|---|---|---|---|
None (Raw coords) |
Low |
No |
None |
Fourier Features |
Fixed |
No |
2M dim |
Learned Encoding |
Adaptive |
Yes |
2M dim |
SIREN |
Adaptive |
Implicit |
None |
Hash Encoding (NGP) |
Multi-scale |
Yes |
O(TΒ·F) |
13. Recent Advances (2020-2024)ΒΆ
13.1 3D Gaussian Splatting (2023)ΒΆ
Idea: Represent scene as 3D Gaussians (not neural field)
Each Gaussian: position, covariance, color, opacity
Differentiable rasterization
Much faster than NeRF (140 FPS vs. 0.03 FPS)
Advantage: Explicit representation + real-time rendering
13.2 Neural Light FieldsΒΆ
Plenoptic function: 5D light field L(x, y, z, ΞΈ, Ο)
Neural encoding: Replace volume rendering with learned interpolation
Speed: Faster inference than volumetric NeRF
13.3 Semantic NeRFΒΆ
Idea: Predict semantic labels along with color
F_ΞΈ: (x, y, z, ΞΈ, Ο) β (RGB, Ο, semantics)
Applications: Object removal, scene editing
13.4 NeRF for Generative ModelingΒΆ
Ο-GAN (Chan et al., 2021):
Generator: Latent z β NeRF parameters
Discriminator: Rendered images
3D-aware GAN
EG3D (Chan et al., 2022):
Efficient tri-plane representation
High-resolution 3D-aware generation
14. Limitations and ChallengesΒΆ
14.1 Current LimitationsΒΆ
Speed: Slow rendering (addressed by Instant NGP, 3DGS)
Generalization: Per-scene optimization (addressed by pixelNeRF)
Dynamic scenes: Complex deformations
Lighting: Baked lighting (canβt relight easily)
Reflections: Struggles with mirrors, specularities
Transparency: Semi-transparent objects difficult
14.2 Open Research ProblemsΒΆ
Editing: Interactive scene manipulation
Compositionality: Combine multiple objects
Few-shot learning: Reconstruct from 1-3 images
Physical consistency: Enforce physics constraints
Scalability: City-scale scenes
15. Software and ToolsΒΆ
15.1 LibrariesΒΆ
PyTorch: Most implementations use PyTorch
tiny-cuda-nn: Fast CUDA MLPs (used in Instant NGP)
nerfstudio: Unified framework for NeRF variants
threestudio: Text-to-3D with NeRF
15.2 DatasetsΒΆ
NeRF Synthetic: Blender-rendered objects
LLFF: Real forward-facing scenes
Tanks and Temples: Large-scale reconstruction
ShapeNet: 3D shape dataset for occupancy/SDF
16. Key TakeawaysΒΆ
INRs represent signals as continuous functions via neural networks
Spectral bias: MLPs struggle with high frequencies β positional encoding
SIREN: Sine activations enable derivatives for PDEs
NeRF: 5D radiance field for photorealistic novel view synthesis
SDFs: Implicit surfaces with geometric properties
Fast variants: Instant NGP, 3DGS achieve real-time rendering
Applications: Graphics, compression, PDEs, inverse problems
When to use INRs:
Need continuous representation (resolution independence)
Memory constraints (compact encoding)
Derivatives required (physics simulation)
Novel view synthesis (NeRF)
When NOT to use:
Speed critical (explicit representations faster)
Simple tasks (overkill for MNIST)
Limited data (overfitting risk)
17. Mathematical FoundationsΒΆ
17.1 Universal ApproximationΒΆ
Theorem: Neural networks can approximate any continuous function
For INRs: f_ΞΈ: [0,1]α΅ β βαΆ can represent any image/shape
17.2 Nyquist-Shannon SamplingΒΆ
Classic theorem: Sample rate β₯ 2Γ max frequency
For INRs: Positional encoding frequency Ο should match signal frequency
17.3 Volume Rendering EquationΒΆ
Continuous:
C(r) = β«β^β T(t) Ο(r(t)) c(r(t), d) dt
T(t) = exp(-β«βα΅ Ο(r(s)) ds)
Discrete (alpha compositing):
C = Ξ£α΅’ Ξ±α΅’ Tα΅’ cα΅’
where Ξ±α΅’ = 1 - exp(-Οα΅’Ξ΄α΅’), Tα΅’ = Ξ β±Όβββ±β»ΒΉ (1 - Ξ±β±Ό)
18. ReferencesΒΆ
Foundational:
Sitzmann et al. (2020): SIREN - Implicit Neural Representations with Periodic Activation Functions
Tancik et al. (2020): Fourier Features Let Networks Learn High Frequency Functions
Mildenhall et al. (2020): NeRF - Representing Scenes as Neural Radiance Fields
Fast NeRF:
MΓΌller et al. (2022): Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Kerbl et al. (2023): 3D Gaussian Splatting for Real-Time Radiance Field Rendering
Shape Representations:
Park et al. (2019): DeepSDF - Learning Continuous Signed Distance Functions
Mescheder et al. (2019): Occupancy Networks
Wang et al. (2021): NeuS - Learning Neural Implicit Surfaces
Analysis:
Rahaman et al. (2019): On the Spectral Bias of Neural Networks
Jacot et al. (2018): Neural Tangent Kernel (theoretical foundation)
Applications:
Raissi et al. (2019): Physics-Informed Neural Networks (PINNs)
Chan et al. (2022): EG3D - Efficient Geometry-aware 3D GANs
"""
Complete Implicit Neural Representations Implementations
=========================================================
Includes: SIREN, Fourier features, NeRF (simplified), DeepSDF, occupancy networks,
positional encoding, volume rendering.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# ============================================================================
# 1. Positional Encoding (Fourier Features)
# ============================================================================
class FourierFeatures(nn.Module):
    """Random Fourier feature positional encoding.

    gamma(x) = [sin(2*pi*Bx), cos(2*pi*Bx)] with B ~ N(0, scale^2 I).

    Args:
        input_dim: Coordinate dimensionality.
        num_frequencies: Number of random frequency rows M.
        scale: Std-dev of the frequency matrix (controls max frequency).
    """
    def __init__(self, input_dim, num_frequencies=256, scale=10.0):
        super(FourierFeatures, self).__init__()
        # Fixed (non-trainable) random projection; registered as a buffer
        # so it follows .to(device) and is saved in the state_dict.
        frequencies = torch.randn(num_frequencies, input_dim) * scale
        self.register_buffer('B', frequencies)
    def forward(self, x):
        """Encode coordinates x [batch, input_dim].

        Returns:
            [batch, 2*num_frequencies] concatenated sin/cos features.
        """
        projected = 2 * np.pi * (x @ self.B.T)
        return torch.cat((torch.sin(projected), torch.cos(projected)), dim=-1)
class LearnedFourierFeatures(nn.Module):
    """Fourier feature encoding with trainable frequencies and phases.

    Same form as FourierFeatures, but B (frequencies) and b (phases) are
    nn.Parameters, so the encoding adapts to the signal during training.
    """
    def __init__(self, input_dim, num_frequencies=256, scale=10.0):
        super(LearnedFourierFeatures, self).__init__()
        # Initialized like the fixed variant: Gaussian frequencies, zero phase.
        self.B = nn.Parameter(torch.randn(num_frequencies, input_dim) * scale)
        self.b = nn.Parameter(torch.zeros(num_frequencies))
    def forward(self, x):
        phase = x @ self.B.T + self.b
        angles = 2 * np.pi * phase
        return torch.cat((torch.sin(angles), torch.cos(angles)), dim=-1)
# ============================================================================
# 2. SIREN (Sinusoidal Representation Networks)
# ============================================================================
class SineLayer(nn.Module):
    """Linear transform followed by sin(omega_0 * .), with SIREN init.

    Args:
        in_features: Input dimension.
        out_features: Output dimension.
        bias: Whether the linear layer carries a bias term.
        is_first: First layers use the wider Uniform(-1/n, 1/n) init.
        omega_0: SIREN frequency parameter.
    """
    def __init__(self, in_features, out_features, bias=True,
                 is_first=False, omega_0=30.0):
        super(SineLayer, self).__init__()
        self.omega_0 = omega_0
        self.is_first = is_first
        self.linear = nn.Linear(in_features, out_features, bias=bias)
        self._init_weights()
    def _init_weights(self):
        """Apply the SIREN weight initialization in-place."""
        fan_in = self.linear.in_features
        # no_grad: in-place edits of a Parameter must bypass autograd.
        with torch.no_grad():
            if self.is_first:
                limit = 1 / fan_in
            else:
                limit = np.sqrt(6 / fan_in) / self.omega_0
            self.linear.weight.uniform_(-limit, limit)
    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))
class SIREN(nn.Module):
    """Sinusoidal Representation Network.

    A stack of SineLayers capped by a linear output layer:
    f(x) = W_n sin(w0 W_{n-1} sin(... sin(w0 W_1 x) ...))

    Args:
        input_dim: Coordinate dimension (2 for images, 3 for 3D).
        output_dim: Signal dimension (3 for RGB, 1 for SDF).
        hidden_dim: Hidden layer width.
        num_layers: Total layer count (sine layers + final linear).
        omega_0: SIREN frequency parameter.
    """
    def __init__(self, input_dim, output_dim, hidden_dim=256,
                 num_layers=5, omega_0=30.0):
        super(SIREN, self).__init__()
        modules = [SineLayer(input_dim, hidden_dim, is_first=True, omega_0=omega_0)]
        for _ in range(num_layers - 2):
            modules.append(SineLayer(hidden_dim, hidden_dim, is_first=False, omega_0=omega_0))
        # Output layer is purely linear; initialized with the hidden-layer bound.
        head = nn.Linear(hidden_dim, output_dim)
        with torch.no_grad():
            limit = np.sqrt(6 / hidden_dim) / omega_0
            head.weight.uniform_(-limit, limit)
        modules.append(head)
        self.network = nn.Sequential(*modules)
    def forward(self, x):
        """Evaluate the field at coordinates x [batch, input_dim]."""
        return self.network(x)
    def gradient(self, x):
        """Gradient of each output w.r.t. the input coordinates.

        Returns a [batch, output_dim, input_dim] tensor; useful for
        surface normals and physics-informed losses.
        """
        x = x.requires_grad_(True)
        y = self.forward(x)
        per_output = []
        for idx in range(y.shape[1]):
            g = torch.autograd.grad(
                y[:, idx].sum(), x, create_graph=True, retain_graph=True)[0]
            per_output.append(g)
        return torch.stack(per_output, dim=1)
# ============================================================================
# 3. Coordinate MLP with Fourier Features
# ============================================================================
class FourierMLP(nn.Module):
    """ReLU MLP operating on Fourier-encoded coordinates.

    Combines a fixed random Fourier positional encoding with a standard
    fully-connected network.
    """
    def __init__(self, input_dim, output_dim, hidden_dim=256, num_layers=4,
                 num_frequencies=256, freq_scale=10.0):
        super(FourierMLP, self).__init__()
        # Positional encoding front-end.
        self.fourier = FourierFeatures(input_dim, num_frequencies, freq_scale)
        blocks = []
        prev = 2 * num_frequencies  # the encoding emits sin + cos per frequency
        for _ in range(num_layers - 1):
            blocks.append(nn.Linear(prev, hidden_dim))
            blocks.append(nn.ReLU(inplace=True))
            prev = hidden_dim
        blocks.append(nn.Linear(hidden_dim, output_dim))
        self.network = nn.Sequential(*blocks)
    def forward(self, x):
        return self.network(self.fourier(x))
# ============================================================================
# 4. Neural Radiance Field (Simplified NeRF)
# ============================================================================
class NeRFEncoding(nn.Module):
    """NeRF-style positional encoding with octave-spaced frequencies.

    gamma(p) = [p, sin(2^0 pi p), cos(2^0 pi p), ...,
                sin(2^{L-1} pi p), cos(2^{L-1} pi p)]
    """
    def __init__(self, input_dim, num_freqs=10):
        super(NeRFEncoding, self).__init__()
        self.num_freqs = num_freqs
        # Octave frequencies 2^0 ... 2^{L-1}; a buffer so they follow the
        # module across devices.
        bands = 2.0 ** torch.linspace(0, num_freqs - 1, num_freqs)
        self.register_buffer('freq_bands', bands)
    def forward(self, x):
        """Encode x: [batch, d] -> [batch, d * (2*num_freqs + 1)]."""
        pieces = [x]  # raw coordinates are passed through unchanged
        for band in self.freq_bands:
            scaled = x * band * np.pi
            pieces.append(torch.sin(scaled))
            pieces.append(torch.cos(scaled))
        return torch.cat(pieces, dim=-1)
class SimpleNeRF(nn.Module):
    """Simplified Neural Radiance Field.

    F: (position, view direction) -> (RGB color, volume density).
    Density is a function of position only; color also depends on the
    viewing direction (view-dependent effects).

    Args:
        pos_freqs: Positional-encoding frequencies for the 3-D position.
        dir_freqs: Positional-encoding frequencies for the view direction.
        hidden_dim: Width of the trunk MLP.
    """
    def __init__(self, pos_freqs=10, dir_freqs=4, hidden_dim=256):
        super(SimpleNeRF, self).__init__()
        # Input encodings.
        self.pos_encoding = NeRFEncoding(3, pos_freqs)
        self.dir_encoding = NeRFEncoding(3, dir_freqs)
        # Encoded sizes: d * (2L + 1) with d = 3.
        pos_dim = 3 * (2 * pos_freqs + 1)
        dir_dim = 3 * (2 * dir_freqs + 1)
        # Trunk over the encoded position -> shared features.
        self.pos_net = nn.Sequential(
            nn.Linear(pos_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
        )
        # Density head; ReLU keeps sigma non-negative.
        self.density_head = nn.Sequential(
            nn.Linear(hidden_dim, 1),
            nn.ReLU(inplace=True)
        )
        # Color head conditions on the view direction; Sigmoid -> [0, 1].
        self.color_net = nn.Sequential(
            nn.Linear(hidden_dim + dir_dim, hidden_dim // 2),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim // 2, 3),
            nn.Sigmoid()
        )
    def forward(self, pos, direction):
        """Query the field.

        Args:
            pos: Positions (x, y, z) [batch, 3].
            direction: Normalized view directions [batch, 3].
        Returns:
            rgb: Colors [batch, 3].
            sigma: Densities [batch, 1].
        """
        encoded_pos = self.pos_encoding(pos)
        encoded_dir = self.dir_encoding(direction)
        features = self.pos_net(encoded_pos)
        sigma = self.density_head(features)  # view-independent
        rgb = self.color_net(torch.cat((features, encoded_dir), dim=-1))
        return rgb, sigma
def volume_rendering(rgb, sigma, z_vals):
    """Discrete volume rendering (quadrature of the NeRF rendering integral).

    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i

    Works with any number of leading batch/ray dimensions: the sample
    axis is always the second-to-last axis of `rgb` and the last axis of
    `z_vals` (the original hard-coded `sum(dim=1)` only supported a
    single leading batch dimension).

    Args:
        rgb: Colors [..., num_samples, 3].
        sigma: Densities [..., num_samples, 1].
        z_vals: Sample depths along each ray [..., num_samples].
    Returns:
        rgb_map: Composited color [..., 3].
        depth_map: Expected termination depth [..., 1].
    """
    # delta_i: spacing between consecutive samples; the final interval is
    # effectively infinite so any remaining density terminates the ray.
    dists = z_vals[..., 1:] - z_vals[..., :-1]
    dists = torch.cat([dists, torch.ones_like(dists[..., :1]) * 1e10], dim=-1)
    # Per-sample opacity: alpha_i = 1 - exp(-sigma_i * delta_i).
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * dists)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j); the prepended ones make
    # T_0 = 1 (nothing occludes the first sample). 1e-10 guards cumprod
    # against exact zeros.
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1
    )[..., :-1]
    # Compositing weights.
    weights = alpha * transmittance
    # Reduce over the sample axis (dim=-2 for rgb, dim=-1 for depth).
    rgb_map = (weights[..., None] * rgb).sum(dim=-2)
    depth_map = (weights * z_vals).sum(dim=-1, keepdim=True)
    return rgb_map, depth_map
# ============================================================================
# 5. DeepSDF (Signed Distance Functions)
# ============================================================================
class DeepSDF(nn.Module):
    """MLP mapping 3-D points to signed distance, with one skip connection.

    f: R^3 -> R. The raw input coordinates are re-injected (concatenated)
    midway through the network, as in DeepSDF (Park et al., 2019).

    Fix over the original: the skip connection's flat layer index was
    hard-coded to 8, which is only correct for num_layers=8 — other
    depths either silently lost the skip or crashed with a shape
    mismatch. The index is now tracked at construction time; behavior
    for the default num_layers=8 is unchanged.

    Args:
        hidden_dim: Hidden width.
        num_layers: Total linear layers (including the output layer).
    """
    # Hidden block (0-based) after which the input is re-injected; block 3
    # corresponds to the original flat index 8 for num_layers=8.
    _SKIP_BLOCK = 3
    def __init__(self, hidden_dim=256, num_layers=8):
        super(DeepSDF, self).__init__()
        layers = [nn.Linear(3, hidden_dim), nn.ReLU(inplace=True)]
        # Flat index (into self.layers) of the skip Linear, or None if the
        # network is too shallow to contain one.
        self.skip_index = None
        for i in range(num_layers - 2):
            if i == self._SKIP_BLOCK:
                self.skip_index = len(layers)
                # Skip layer consumes hidden features + raw coordinates.
                layers.extend([
                    nn.Linear(hidden_dim + 3, hidden_dim),
                    nn.ReLU(inplace=True)
                ])
            else:
                layers.extend([
                    nn.Linear(hidden_dim, hidden_dim),
                    nn.ReLU(inplace=True)
                ])
        # Output layer: scalar signed distance.
        layers.append(nn.Linear(hidden_dim, 1))
        self.layers = nn.ModuleList(layers)
    def forward(self, x):
        """Return signed distance [batch, 1] for points x [batch, 3]."""
        h = x
        for i, layer in enumerate(self.layers):
            if i == self.skip_index:
                # Skip connection: concatenate the raw input coordinates.
                h = torch.cat([h, x], dim=-1)
            h = layer(h)
        return h
    def eikonal_loss(self, x):
        """Eikonal regularizer E[(||grad f(x)|| - 1)^2].

        A true SDF has unit-norm gradient everywhere; penalizing the
        deviation pushes the network toward a valid distance field.
        """
        x = x.requires_grad_(True)
        sdf = self.forward(x)
        grad = torch.autograd.grad(sdf.sum(), x, create_graph=True)[0]
        return ((grad.norm(dim=-1) - 1) ** 2).mean()
# ============================================================================
# 6. Occupancy Network
# ============================================================================
class OccupancyNetwork(nn.Module):
    """
    Occupancy network: predicts the probability that a 3D point lies
    inside the shape.

    f: βΒ³ β [0, 1]
    """

    def __init__(self, hidden_dim=256, num_layers=5):
        super().__init__()
        # Input projection from xyz coordinates into feature space.
        modules = [nn.Linear(3, hidden_dim), nn.ReLU(inplace=True)]
        # num_layers counts every Linear: input + hidden + output.
        for _ in range(num_layers - 2):
            modules += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU(inplace=True)]
        # Sigmoid squashes the final logit into a valid probability.
        modules += [nn.Linear(hidden_dim, 1), nn.Sigmoid()]
        self.network = nn.Sequential(*modules)

    def forward(self, x):
        """
        Args:
            x: 3D coordinates [batch, 3]

        Returns:
            occupancy: Probability [batch, 1]
        """
        return self.network(x)
# ============================================================================
# 7. Demonstrations
# ============================================================================
def _fit(model, coords, target, epochs=500, lr=1e-3, squash=False):
    """Run a full-batch Adam/MSE fitting loop, printing loss every 100 epochs.

    Args:
        model: Network mapping coords -> prediction.
        coords: Input coordinates [N, 2].
        target: Regression target [N, 1].
        epochs: Number of gradient steps.
        lr: Adam learning rate.
        squash: Apply sigmoid to the raw model output (used for models
            that have no output nonlinearity of their own).
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        optimizer.zero_grad()
        pred = model(coords)
        if squash:
            pred = torch.sigmoid(pred)
        loss = F.mse_loss(pred, target)
        loss.backward()
        optimizer.step()
        if (epoch + 1) % 100 == 0:
            print(f" Epoch {epoch+1}: Loss = {loss.item():.6f}")


def demo_fourier_features():
    """Demonstrate Fourier features for image fitting.

    Fits the same high-frequency 2D binary pattern twice — once with a
    plain ReLU MLP on raw (x, y) coordinates and once with random Fourier
    features — to show the encoding overcoming the MLP's spectral bias.
    The two duplicated training loops of the original were factored into
    the _fit helper above.
    """
    print("="*70)
    print("Fourier Features Demo")
    print("="*70)
    # Dense grid of (x, y) coordinates over [-1, 1]Β².
    size = 32
    coords = torch.stack(torch.meshgrid(
        torch.linspace(-1, 1, size),
        torch.linspace(-1, 1, size),
        indexing='ij'
    ), dim=-1).reshape(-1, 2)
    # High-frequency binary target. NOTE(review): the AND of two stripe
    # patterns yields a grid of bright squares rather than a strict
    # alternating checkerboard; kept as-is to preserve behavior.
    target = ((coords[:, 0] * 8) % 2 < 1) & ((coords[:, 1] * 8) % 2 < 1)
    target = target.float().unsqueeze(-1)
    # Baseline: plain coordinate MLP with sigmoid output.
    print("Training without Fourier features...")
    model_no_enc = nn.Sequential(
        nn.Linear(2, 256),
        nn.ReLU(),
        nn.Linear(256, 256),
        nn.ReLU(),
        nn.Linear(256, 1),
        nn.Sigmoid()
    )
    _fit(model_no_enc, coords, target)
    print("\nTraining with Fourier features...")
    # FourierMLP (defined earlier in this file) emits raw logits, so the
    # sigmoid is applied inside the fitting loop instead.
    model_fourier = FourierMLP(2, 1, hidden_dim=256, num_frequencies=128, freq_scale=5.0)
    _fit(model_fourier, coords, target, squash=True)
    print("\nFourier features enable learning high-frequency patterns!")
    print()
def demo_siren():
    """Demonstrate SIREN for image representation."""
    print("="*70)
    print("SIREN Demo")
    print("="*70)
    # Dense 64x64 coordinate grid over [-1, 1]Β².
    resolution = 64
    coords = torch.stack(torch.meshgrid(
        torch.linspace(-1, 1, resolution),
        torch.linspace(-1, 1, resolution),
        indexing='ij'
    ), dim=-1).reshape(-1, 2)
    # Target signal: concentric rings sin(10 * radius).
    radius = torch.sqrt(coords[:, 0]**2 + coords[:, 1]**2)
    target = torch.sin(10 * radius).unsqueeze(-1)
    print("Training SIREN...")
    model = SIREN(input_dim=2, output_dim=1, hidden_dim=256, num_layers=5, omega_0=30.0)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(1000):
        optimizer.zero_grad()
        loss = F.mse_loss(model(coords), target)
        loss.backward()
        optimizer.step()
        if (epoch + 1) % 200 == 0:
            print(f" Epoch {epoch+1}: Loss = {loss.item():.6f}")
    # SIRENs are smooth everywhere — check that input gradients exist.
    probe = torch.randn(10, 2) * 0.5
    probe.requires_grad_(True)
    grad = torch.autograd.grad(model(probe).sum(), probe, create_graph=True)[0]
    print(f"\nGradient shape: {grad.shape}")
    print(f"Gradient norm (should be smooth): {grad.norm(dim=-1).mean():.4f}")
    print()
def demo_simple_nerf():
    """Demonstrate simplified NeRF."""
    print("="*70)
    print("Simplified NeRF Demo")
    print("="*70)
    nerf = SimpleNeRF(pos_freqs=6, dir_freqs=3, hidden_dim=128)
    # A single ray shooting along +z from (0, 0, -2).
    num_samples = 64
    origin = torch.tensor([[0.0, 0.0, -2.0]])
    direction = torch.tensor([[0.0, 0.0, 1.0]])
    # Uniformly spaced sample depths along the ray.
    z_vals = torch.linspace(0, 4, num_samples).unsqueeze(0)
    # x = o + t * d for every sampled depth t.
    positions = origin + z_vals.unsqueeze(-1) * direction
    # Each sample shares the ray's viewing direction.
    view_dirs = direction.expand(num_samples, -1)
    rgb, sigma = nerf(positions.squeeze(0), view_dirs)
    print(f"Ray samples: {num_samples}")
    print(f"RGB shape: {rgb.shape}")
    print(f"Sigma shape: {sigma.shape}")
    print(f"Sigma range: [{sigma.min().item():.4f}, {sigma.max().item():.4f}]")
    # Alpha-composite the samples into one pixel color and expected depth.
    rendered, depth = volume_rendering(
        rgb.unsqueeze(0), sigma.unsqueeze(0), z_vals)
    print(f"\nRendered RGB: {rendered.squeeze().tolist()}")
    print(f"Expected depth: {depth.item():.4f}")
    print()
def demo_deepsdf():
    """Demonstrate DeepSDF."""
    print("="*70)
    print("DeepSDF Demo")
    print("="*70)
    # Untrained network — outputs are arbitrary but shape-correct.
    sdf_net = DeepSDF(hidden_dim=128, num_layers=6)
    samples = torch.randn(100, 3)
    sdf_values = sdf_net(samples)
    print(f"Input points: {samples.shape}")
    print(f"SDF values shape: {sdf_values.shape}")
    print(f"SDF range: [{sdf_values.min().item():.4f}, {sdf_values.max().item():.4f}]")
    # Eikonal residual of the untrained field (large before training).
    eikonal = sdf_net.eikonal_loss(samples)
    print(f"\nEikonal loss (||βf|| - 1)Β²: {eikonal.item():.6f}")
    print("(Lower is better, should be close to 0 after training)")
    print()
def print_method_comparison():
    """Print a static side-by-side comparison table of INR methods.

    Purely informational: writes a pre-formatted reference table plus
    speed/memory notes and a decision guide to stdout. Takes no arguments
    and returns None.
    """
    print("="*70)
    print("Implicit Neural Representation Methods Comparison")
    print("="*70)
    print()
    # The table below is a hand-drawn box layout; keep column widths intact
    # if editing.
    comparison = """
ββββββββββββββββββββ¬βββββββββββββββββ¬ββββββββββββββββ¬βββββββββββββββ¬βββββββββββββββ
β Method β Activation β Encoding β Best For β Special β
ββββββββββββββββββββΌβββββββββββββββββΌββββββββββββββββΌβββββββββββββββΌβββββββββββββββ€
β Vanilla MLP β ReLU β None β Baseline β Spectral biasβ
ββββββββββββββββββββΌβββββββββββββββββΌββββββββββββββββΌβββββββββββββββΌβββββββββββββββ€
β Fourier MLP β ReLU β Random Fourierβ Images β High-freq OK β
β β β Features β β β
ββββββββββββββββββββΌβββββββββββββββββΌββββββββββββββββΌβββββββββββββββΌβββββββββββββββ€
β SIREN β Sine β None (implicitβ PDEs, images β Derivatives β
β β β in activation)β β β
ββββββββββββββββββββΌβββββββββββββββββΌββββββββββββββββΌβββββββββββββββΌβββββββββββββββ€
β NeRF β ReLU β Learned β Novel view β 5D function β
β β β positional β synthesis β β
ββββββββββββββββββββΌβββββββββββββββββΌββββββββββββββββΌβββββββββββββββΌβββββββββββββββ€
β Instant NGP β ReLU β Hash encoding β Fast NeRF β Real-time β
ββββββββββββββββββββΌβββββββββββββββββΌββββββββββββββββΌβββββββββββββββΌβββββββββββββββ€
β DeepSDF β ReLU + skip β None β 3D shapes β Eikonal loss β
ββββββββββββββββββββΌβββββββββββββββββΌββββββββββββββββΌβββββββββββββββΌβββββββββββββββ€
β Occupancy Net β ReLU β None β 3D shapes β Binary class β
ββββββββββββββββββββ΄βββββββββββββββββ΄ββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββ
**Training Speed:**
- SIREN: Fast (smooth gradients)
- Fourier MLP: Medium (high-dim features)
- NeRF: Slow (per-scene optimization, ~hours)
- Instant NGP: Fast (~seconds with hash encoding)
**Memory:**
- All INRs: O(parameters) independent of resolution
- Typical: 1-10MB for full scene (vs. GB for voxels)
**Applications:**
1. **Graphics**: NeRF, SDF, occupancy β 3D reconstruction
2. **Compression**: Store weights instead of pixels
3. **Physics**: SIREN + PDE residual β solve differential equations
4. **Editing**: Manipulate latent codes or network weights
**Decision Guide:**
- **Need high-frequency details?** β Fourier features or SIREN
- **Novel view synthesis?** β NeRF (or Instant NGP for speed)
- **3D shape representation?** β DeepSDF or Occupancy
- **Solve PDEs?** β SIREN (derivatives crucial)
- **Real-time rendering?** β Instant NGP or 3D Gaussian Splatting
"""
    print(comparison)
    print()
# ============================================================================
# Run Demonstrations
# ============================================================================
if __name__ == "__main__":
    # Fixed seeds so demo losses are reproducible run-to-run.
    torch.manual_seed(42)
    np.random.seed(42)
    # Run every demonstration in order.
    for demo in (demo_fourier_features, demo_siren, demo_simple_nerf,
                 demo_deepsdf, print_method_comparison):
        demo()
    banner = "=" * 70
    print(banner)
    print("Implicit Neural Representations Implementations Complete")
    print(banner)
    print()
    print("Summary:")
    for summary_line in (
        " β’ Fourier Features: Random encoding for high-frequency learning",
        " β’ SIREN: Sine activations enable derivatives for PDEs",
        " β’ NeRF: 5D radiance field for photorealistic view synthesis",
        " β’ DeepSDF: Signed distance functions with eikonal regularization",
        " β’ Occupancy: Binary classification for 3D shape occupancy",
    ):
        print(summary_line)
    print()
    print("Key insight: INRs represent signals as continuous functions")
    print("Trade-off: Flexibility vs. computational cost")
    print("Applications: Novel view synthesis, compression, physics, 3D shapes")
    print()