Part4: Building a Simple Regression Model and Preparing Training Tensors

With 2+ years of experience in web backend development, I now specialize in AI engineering, building intelligent systems and scalable solutions. Passionate about crafting innovative software, I love exploring new technologies, experimenting with AI models, and bringing ideas to life. Always learning, always building.

So far, we have implemented our dataset, layers, and an initializer. Now, it’s time to combine them and start shaping an actual neural network.

Before we get to the code, let’s first walk through the internal structure of the model we are about to build.

Model Structure

In Part 2, we generated synthetic data using this formula:

$$y = 0.01x^3 - 0.5x + 20 + \text{noise}$$

This is a non-linear function because of the x³ term.

To capture non-linearity, our neural network must have at least:

  • 1 input feature (x)

  • a hidden layer with activation

  • 1 output feature (y)
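To see why the hidden layer needs an activation, here is a minimal sketch using scalar "layers" with arbitrary, made-up weights (they are not values from this series). It suggests that stacking two linear layers without an activation collapses into a single linear layer, so the stack could never bend into a cubic curve.

```rust
/// One scalar "layer": y = w * x + b, with no activation.
fn linear(w: f64, b: f64, x: f64) -> f64 {
    w * x + b
}

/// Two stacked linear layers without an activation in between.
/// The weights are illustrative values only.
fn two_linear_layers(x: f64) -> f64 {
    let hidden = linear(2.0, 1.0, x); // 2x + 1
    linear(3.0, -4.0, hidden) // 3(2x + 1) - 4
}

/// The single linear layer that the stack above collapses to: 6x - 1.
fn collapsed(x: f64) -> f64 {
    linear(6.0, -1.0, x)
}
```

For any x, two_linear_layers(x) and collapsed(x) agree, which is why the non-linear activation between the layers matters.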

Input Layer (1 → H)

The input layer receives a single input value x, so its size is 1. It passes this value on to the hidden layer.

Hidden Layer (H)

The hidden layer is the core component that allows the neural network to learn the non-linear relationship between the input x and the output y.

Each neuron receives input from the previous layer (the input layer, which is just x), computes a weighted sum plus a bias, and applies an activation function. The activation introduces the non-linearity needed to model the curve defined by the x³ term. The hidden layer transforms the input x into a set of learned features or representations. These features are then passed to the output layer to make the final prediction. The size of the hidden layer (H) is a hyperparameter chosen by the user and determines the network's capacity to learn complex patterns.

Output Layer (H → 1)

The output layer takes the features from the hidden layer and produces a single predicted value y.
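The 1 → H → 1 structure can be sketched with plain vectors before any tensors are involved. This is only an illustration: TinyNet and its fields are hypothetical names, and ReLU is assumed as the activation purely for the example (this part of the series does not fix one).

```rust
/// Hypothetical stand-in for the model: plain vectors instead of Burn tensors.
struct TinyNet {
    w1: Vec<f64>, // input -> hidden weights, length H
    b1: Vec<f64>, // hidden biases, length H
    w2: Vec<f64>, // hidden -> output weights, length H
    b2: f64,      // output bias
}

/// ReLU, assumed here only for illustration.
fn relu(v: f64) -> f64 {
    v.max(0.0)
}

impl TinyNet {
    /// Maps one scalar x to one scalar prediction: 1 -> H -> 1.
    fn forward(&self, x: f64) -> f64 {
        let hidden_sum: f64 = self
            .w1
            .iter()
            .zip(&self.b1)
            .zip(&self.w2)
            // Each hidden neuron: activation(w1 * x + b1), then weighted by w2.
            .map(|((w1, b1), w2)| w2 * relu(w1 * x + b1))
            .sum();
        hidden_sum + self.b2
    }
}
```

With H = 2, weights w1 = [1, -1], zero hidden biases, w2 = [1, 1], and b2 = 0.5, this network computes |x| + 0.5, a shape that no purely linear model can produce.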

Implementing the Configuration Structure

For robust and reproducible machine learning experiments, it is essential to manage training hyperparameters external to the code. We will use a .env file to store these configuration values.

Dependency Setup

First, add the dotenv crate to Cargo.toml. This crate reads the .env file and loads its entries as environment variables.

[package]
name = "simple-regression"
version = "0.1.0"
edition = "2021"

[dependencies]
burn = { version = "0.17.1", features = ["ndarray"] }
rand = "0.9.1"
rand_distr = "0.5.1"
dotenv = "0.15.0"

Defining the Configuration Struct

Next, define the TrainConfig struct. Notice the use of #[derive(Debug, Clone)]. It will be required for later implementation.

/// Configuration for training a neural network model.
///
/// This structure holds the parameters required to configure the training process,
/// such as the size of the hidden layer, the number of epochs, and the batch size.
#[derive(Debug, Clone)]
pub struct TrainConfig {
    /// The size of the hidden layer in the neural network.
    hidden_size: usize,
    /// The size of each batch used during training.
    batch_size: usize,
}

Implementing the initializer

The new function handles reading the configuration from the environment, prioritizing values set in the .env file or the system environment. We implement default values to ensure the application runs even if the configuration variables are not explicitly set.

impl TrainConfig {
    fn new() -> Self {
        let hidden_size = std::env::var("HIDDEN_SIZE")
            .unwrap_or_else(|_| "64".to_string())
            .parse()
            .unwrap_or(64);

        let batch_size = std::env::var("BATCH_SIZE")
            .unwrap_or_else(|_| "10".to_string())
            .parse()
            .unwrap_or(10);

        TrainConfig {
            hidden_size,
            batch_size,
        }
    }
}
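The read-parse-fallback pattern repeats for every setting, so it could be factored into a small helper. The sketch below is one possible shape, not part of the original code: parse_or and env_or are hypothetical names, and unlike the version above it also trims surrounding whitespace before parsing.

```rust
use std::str::FromStr;

/// Parses `raw` into `T`, falling back to `default` when the value is
/// missing or cannot be parsed.
fn parse_or<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

/// Reads the environment variable `key` and parses it, using `default`
/// when the variable is unset or malformed.
fn env_or<T: FromStr>(key: &str, default: T) -> T {
    parse_or(std::env::var(key).ok().as_deref(), default)
}
```

With such a helper, a field could be read as `let hidden_size: usize = env_or("HIDDEN_SIZE", 64);`. Keeping the parsing in parse_or makes the fallback logic easy to check without touching the process environment.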

Defining the Simple Regression Model Structure

Now it’s time to implement the model struct. We will define the core neural network architecture using the configuration established earlier. This model encapsulates the entire network, including the layers and the hyperparameters.

The Model Structure

The SimpleRegressionModel struct uses generics to be compatible with various backends. It stores the essential information about the network's architecture.

pub struct SimpleRegressionModel<B: Backend> {
    // Stores the hyperparameters (hidden_size, batch_size, etc.)
    train_config: TrainConfig,
    // Dimensionality of the input (1 for our single feature x)
    d_input: usize,
    // Dimensionality of the output (1 for our single predicted value y)
    d_output: usize,
    // The Input -> Hidden layer connection (H neurons)
    input_layer: Layer<B>,
    // The Hidden -> Output layer connection (1 neuron)
    output_layer: Layer<B>,
    // The device (CPU or GPU) the model is initialized on
    device: B::Device,
}

The Initialization Function

The init function is the constructor for our model. It performs four critical steps: loads the configuration, defines the input/output dimensions, selects an initializer for the weights, and creates the two linear layers.

impl<B: Backend> SimpleRegressionModel<B> {
    /// Initializes the SimpleRegressionModel on the specified device.
    pub fn init(device: &B::Device) -> Self {
        // 1. Load Configuration
        let train_config = TrainConfig::new();
        
        // 2. Define Dimensions
        let d_input = 1; // x
        let d_output = 1; // y

        // 3. Choose Initializer (Weights and Biases)
        // Note: Initializer::Ones is typically used for debugging/simple cases. 
        let initializer = Initializer::Ones;

        // 4. Create Layers
        // Input Layer: 1 input feature -> H (hidden_size) neurons
        let input_layer = Layer::init_with(&initializer, d_input, train_config.hidden_size, device);
        
        // Output Layer: H (hidden_size) features -> 1 output feature
        let output_layer =
            Layer::init_with(&initializer, train_config.hidden_size, d_output, device);

        Self {
            train_config,
            d_input,
            d_output,
            input_layer,
            output_layer,
            device: device.clone(),
        }
    }
}

This initialization ensures the layers have the correct dimensions:

  • Input Layer Dimensions: 1 → Hidden Size

  • Output Layer Dimensions: Hidden Size → 1
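These dimensions also determine how many trainable values the model holds. The sketch below assumes each Layer stores one weight per input/output pair plus one bias per output, which is the usual dense-layer layout; the function names are hypothetical.

```rust
/// Number of trainable values in a dense layer, assuming one weight per
/// input/output pair plus one bias per output.
fn dense_params(d_in: usize, d_out: usize) -> usize {
    d_in * d_out + d_out
}

/// Total for the 1 -> hidden_size -> 1 network.
fn model_params(hidden_size: usize) -> usize {
    dense_params(1, hidden_size) + dense_params(hidden_size, 1)
}
```

Under that assumption, a hidden size of 64 gives (64 + 64) + (64 + 1) = 193 parameters.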

Implementing the prepare_tensors Function

Now, we need to feed our training data to the network. However, the raw data cannot be processed directly. It must first be converted into a structured format: the tensor. This is where prepare_tensors comes in. It takes the raw data, converts it into the appropriate tensor format, and divides it into mini-batches for efficient training.

Let's look at how we can implement this in Rust.

💡
As we start dealing with tensors, pay close attention to the shape of each tensor. It's easy to get confused when tensors transform throughout the pipeline. In our Rust implementation, we'll try to keep these transformations as clear as possible so we don't get lost in the dimensions.

Function Signature and Purpose

pub fn prepare_tensors(
    &self,
    range: std::ops::Range<usize>,
) -> Vec<(Tensor<B, 2>, Tensor<B, 1>)> {
    // ...
}

  • Input (range): Specifies the slice of the dataset (e.g., 0..1000) that should be processed.

  • Output: Returns a Vec of tuples, where each tuple represents a single mini-batch:

    • The first element (Tensor<B, 2>) is the batch of input features (x).

    • The second element (Tensor<B, 1>) is the batch of target values (y).

Step-by-Step Explanation

Data Loading and Initialization

let data = read_data_from_csv("data.csv").expect("should read data from csv");
let batch_size = self.train_config.batch_size;
let mut inputs: Vec<Tensor<B, 2>> = Vec::new();
let mut targets: Vec<Tensor<B, 1>> = Vec::new();

// Reject reversed or out-of-bounds ranges before slicing.
if range.start > range.end || range.end > data.len() {
    panic!("Range {:?} is invalid for dataset length {}", range, data.len());
}

let start = range.start;
let end = range.end;
  • The function begins by loading the synthetic data from "data.csv".

  • It retrieves the configured batch_size from self.train_config.

  • It initializes two vectors, inputs and targets, which will temporarily hold the data points as individual tensors before they are grouped into batches.
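The bounds check can also be written as a small function that returns a Result instead of panicking, which makes the two failure cases explicit. This is a sketch under that assumption; check_range is a hypothetical helper, not part of the original code.

```rust
use std::ops::Range;

/// Validates that `range` can be used to slice a dataset of `len` items.
fn check_range(range: &Range<usize>, len: usize) -> Result<(), String> {
    if range.start > range.end {
        return Err(format!(
            "range start {} is after range end {}",
            range.start, range.end
        ));
    }
    if range.end > len {
        return Err(format!(
            "range end {} exceeds dataset length {}",
            range.end, len
        ));
    }
    Ok(())
}
```

Comparing start and end directly avoids subtracting two usize values, which would overflow if the range were reversed.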

Individual Data Point Conversion and Unsqueezing

for (x, y) in data[start..end].iter() {
    let input_tensor = Tensor::<B, 1>::from_floats([*x], &self.device);
    let target_tensor = Tensor::<B, 1>::from_floats([*y], &self.device);
    inputs.push(input_tensor.unsqueeze()); // [1] → [1, 1]
    targets.push(target_tensor);
};
  • This loop iterates over the specified data slice.

  • For each (x, y) pair, it creates two Rank-1 tensors (Tensor<B, 1>): input_tensor (shape [1]) and target_tensor (shape [1]).

  • Crucial step: input_tensor.unsqueeze()

    • The neural network expects inputs to be at least Rank-2, with the dimensions [Batch Size, Input Features].

    • Since our input has 1 feature, an individual input tensor needs the shape [1, 1].

    • The unsqueeze() operation converts the shape from [1] to [1, 1]. In Burn, unsqueeze() prepends dimensions of size 1 until the tensor reaches the target rank; here the target rank of 2 is inferred from the element type of inputs (Vec<Tensor<B, 2>>). The targets remain Rank-1 tensors (shape [1]) since they will be compared element-wise in the loss function.

Don't worry if the exact reasons for the unsqueeze operation aren't obvious yet; the required tensor shapes will make sense once we define the forward pass logic.
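To make the shape change concrete, here is a small model of what unsqueezing does to a shape. It works on plain shape vectors rather than Burn tensors, and unsqueeze_shape is a hypothetical helper that mirrors the "add leading dimensions of size 1" behavior as I understand it from Burn's documentation.

```rust
/// Models the shape effect of unsqueezing to `target_rank`:
/// size-1 dimensions are prepended until the rank is reached.
fn unsqueeze_shape(shape: &[usize], target_rank: usize) -> Vec<usize> {
    assert!(
        target_rank >= shape.len(),
        "target rank must not be smaller than the current rank"
    );
    let mut out = vec![1; target_rank - shape.len()];
    out.extend_from_slice(shape);
    out
}
```

A single sample of shape [1] becomes [1, 1] when the target rank is 2, while a shape that already has the target rank, such as [10, 1], comes back unchanged.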

Batch Creation (Grouping and Concatenation)

This section groups the individual tensors into batches and concatenates them along the first dimension (the batch dimension).

let mut batched_inputs: Vec<Tensor<B, 2>> = Vec::new();
let mut batched_targets: Vec<Tensor<B, 1>> = Vec::new();

for i in (0..inputs.len()).step_by(batch_size) {
    let end = std::cmp::min(i + batch_size, inputs.len());
    let batch_inputs = &inputs[i..end];
    let batch_targets = &targets[i..end];
    
    let input_tensor = Tensor::cat(
        batch_inputs.iter().map(|t| t.clone().unsqueeze()).collect(),
            0,
        );
    let target_tensor = Tensor::cat(
        batch_targets
            .iter()
            .map(|t| t.clone().unsqueeze())
            .collect(),
            0,
    );

    batched_inputs.push(input_tensor);
    batched_targets.push(target_tensor);
}
  • The outer loop uses step_by(batch_size) to determine the starting index (i) of each batch.

  • Concatenation Logic:

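    • The Tensor::cat() function joins the individual tensors ([1, 1] for inputs, [1] for targets) into a single, larger tensor. The unsqueeze() calls inside map do not change any shapes at this point, because each tensor already has the rank of the batch it is collected into (2 for inputs, 1 for targets).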

    • The concatenation axis is 0 (the batch dimension).

| Tensor Type | Shape before Concatenation (per sample) | Final Batch Shape (for $N$ samples) |
|---|---|---|
| Input (Tensor<B, 2>) | [1, 1] | [N, 1] |
| Target (Tensor<B, 1>) | [1] | [N] |

If the batch_size is 10, the final input tensor for that batch will have the shape [10, 1] and the target tensor will have the shape [10].
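When the number of samples is not a multiple of batch_size, the last batch is smaller. The sketch below reproduces the index arithmetic of the loop above on plain numbers; batch_ranges and batch_shapes are hypothetical helpers introduced only to show the resulting shapes.

```rust
use std::ops::Range;

/// Splits `n` samples into consecutive index ranges of at most `batch_size`.
fn batch_ranges(n: usize, batch_size: usize) -> Vec<Range<usize>> {
    // step_by panics on a step of 0, so fail early with a clear message.
    assert!(batch_size > 0, "batch_size must be positive");
    (0..n)
        .step_by(batch_size)
        .map(|start| start..(start + batch_size).min(n))
        .collect()
}

/// Shapes of the (input, target) tensors for each batch.
fn batch_shapes(n: usize, batch_size: usize) -> Vec<([usize; 2], [usize; 1])> {
    batch_ranges(n, batch_size)
        .into_iter()
        .map(|r| ([r.len(), 1], [r.len()]))
        .collect()
}
```

For 25 samples and a batch size of 10, the ranges are 0..10, 10..20, and 20..25, so the last batch has input shape [5, 1] and target shape [5]. Note that a BATCH_SIZE of 0 would make step_by panic, so it may be worth validating that value when the configuration is loaded.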

Final Output

batched_inputs.into_iter().zip(batched_targets).collect()

The function zips the vector of input batches and the vector of target batches together, returning the final Vec<(Tensor<B, 2>, Tensor<B, 1>)> ready for the training loop.

Here is the full code.

    pub fn prepare_tensors(
        &self,
        range: std::ops::Range<usize>,
    ) -> Vec<(Tensor<B, 2>, Tensor<B, 1>)> {
        let data = read_data_from_csv("data.csv").expect("should read data from csv");
        let batch_size = self.train_config.batch_size;
        let mut inputs: Vec<Tensor<B, 2>> = Vec::new();
        let mut targets: Vec<Tensor<B, 1>> = Vec::new();

        // Reject reversed or out-of-bounds ranges before slicing.
        if range.start > range.end || range.end > data.len() {
            panic!("Range {:?} is invalid for dataset length {}", range, data.len());
        }

        let start = range.start;
        let end = range.end;

        for (x, y) in data[start..end].iter() {
            let input_tensor = Tensor::<B, 1>::from_floats([*x], &self.device);
            let target_tensor = Tensor::<B, 1>::from_floats([*y], &self.device);
            inputs.push(input_tensor.unsqueeze()); // [1] → [1, 1]
            targets.push(target_tensor);
        };

        let mut batched_inputs: Vec<Tensor<B, 2>> = Vec::new();
        let mut batched_targets: Vec<Tensor<B, 1>> = Vec::new();

        for i in (0..inputs.len()).step_by(batch_size) {
            let end = std::cmp::min(i + batch_size, inputs.len());
            let batch_inputs = &inputs[i..end];
            let batch_targets = &targets[i..end];

            let input_tensor = Tensor::cat(
                batch_inputs.iter().map(|t| t.clone().unsqueeze()).collect(),
                0,
            );
            let target_tensor = Tensor::cat(
                batch_targets
                    .iter()
                    .map(|t| t.clone().unsqueeze())
                    .collect(),
                0,
            );

            batched_inputs.push(input_tensor);
            batched_targets.push(target_tensor);
        }

        batched_inputs.into_iter().zip(batched_targets).collect()
    }
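For comparison, the same grouping logic can be sketched without tensors at all, which can help when checking the batching separately from Burn. This is an illustrative stand-in, not a replacement for prepare_tensors: batch_pairs is a hypothetical function, each input row is stored as [x] to mirror the [N, 1] shape, and targets are kept flat to mirror [N].

```rust
/// Groups (x, y) pairs into mini-batches using plain vectors.
fn batch_pairs(data: &[(f32, f32)], batch_size: usize) -> Vec<(Vec<[f32; 1]>, Vec<f32>)> {
    // chunks panics on a size of 0, so fail early with a clear message.
    assert!(batch_size > 0, "batch_size must be positive");
    data.chunks(batch_size)
        .map(|chunk| {
            // Split each chunk into its input rows and its target values.
            let (xs, ys): (Vec<[f32; 1]>, Vec<f32>) =
                chunk.iter().map(|&(x, y)| ([x], y)).unzip();
            (xs, ys)
        })
        .collect()
}
```

Here slice::chunks takes over the role of the step_by loop, and unzip plays the part of building the separate input and target batches.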

Running the code

Let's test this function.

.env file

Create a .env file in the project root directory and add the following values:

GENERATE_DATASET=false
HIDDEN_SIZE=64
BATCH_SIZE=10

Cargo.toml

We are going to use wgpu as the backend. To do so, we need to enable the wgpu feature of the burn crate.

[package]
name = "simple-regression"
version = "0.1.0"
edition = "2021"

[dependencies]
burn = { version = "0.17.1", features = ["ndarray", "wgpu"] }
rand = "0.9.1"
rand_distr = "0.5.1"
dotenv = "0.15.0"

main function

Now let's write the code to call the functions we implemented earlier.

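// Imports needed by this snippet (module declarations from earlier parts are omitted).
use burn::backend::wgpu::{Wgpu, WgpuDevice};
use dotenv::dotenv;

fn main() {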
    dotenv().ok();

    let make_dataset = std::env::var("GENERATE_DATASET").is_ok_and(|v| v == "true");

    if make_dataset {
        let num_dataset: usize = std::env::var("NUM_DATASET")
            .unwrap_or_else(|_| "100000".to_string())
            .parse()
            .unwrap_or(100000);
        data_generator::generate_and_save_data(num_dataset).expect("Failed to generate dataset");
    }

    let device = WgpuDevice::default();
    let model: model::SimpleRegressionModel<Wgpu> = model::SimpleRegressionModel::init(&device);
    let tensors = model.prepare_tensors(0..10);

    println!("tensors: {:?}", tensors);
}

When you run the code, you should see an output similar to this:

tensors: [(Tensor { primitive: Float({ id: TensorId { value: 40 }, shape: [10, 1], should_drop: true, device: DefaultDevice }) }, Tensor { primitive: Float({ id: TensorId { value: 51 }, shape: [10], should_drop: true, device: DefaultDevice }) })]
  • The first tensor (input) has a shape of [10, 1], representing 10 samples with 1 feature each.

  • The second tensor (target) has a shape of [10], representing the 10 corresponding ground-truth values.

Conclusion

Here is what we accomplished:

  • Defined the Model Architecture: We established that a non-linear problem requires a hidden layer.

  • Built a Flexible Configuration: We implemented a system to manage hyperparameters using environment variables.

  • Implemented the Data Pipeline: We created a robust prepare_tensors function to handle data loading, "unsqueezing" dimensions, and batching.

Now that our structure is ready, it's time to make the model actually do something. In the next article, we will implement the Forward Pass and the Loss Function to begin our journey into model training.
