Part 4: Building a Simple Regression Model and Preparing Training Tensors
With 2+ years of experience in web backend development, I now specialize in AI engineering, building intelligent systems and scalable solutions. Passionate about crafting innovative software, I love exploring new technologies, experimenting with AI models, and bringing ideas to life. Always learning, always building.
So far, we have implemented our dataset, layers, and an initializer. Now, it’s time to combine them and start shaping an actual neural network.
Before we get to the code, let’s first walk through the internal structure of the model we are about to build.
Model Structure
In Part 2, we generated synthetic data using this formula:
$$y = 0.01x^3 - 0.5x + 20 + \text{noise}$$
This is a non-linear function because of the x³ term.
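To see the non-linearity concretely, here is a small plain-Rust sketch that evaluates the curve without the noise term (dropped here only so the numbers are reproducible). At x = 5 the curve gives 18.75, while a straight line through the points at x = 0 and x = 10 would predict 22.5.

```rust
/// Noise-free version of the Part 2 target curve: y = 0.01x^3 - 0.5x + 20.
fn target(x: f64) -> f64 {
    0.01 * x.powi(3) - 0.5 * x + 20.0
}

fn main() {
    let (y0, y5, y10) = (target(0.0), target(5.0), target(10.0));
    // A straight line through (0, y0) and (10, y10) would predict their average at x = 5.
    let linear_guess = (y0 + y10) / 2.0;
    // prints: f(5) = 18.75, straight-line guess = 22.50
    println!("f(5) = {y5:.2}, straight-line guess = {linear_guess:.2}");
}
```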
To capture non-linearity, our neural network must have at least:
1 input feature (x)
a hidden layer with a non-linear activation function
1 output feature (y)
Input Layer (1 → H)
The input layer receives a single value x, so its size is 1. Its connection to the hidden layer maps this one value onto H hidden neurons (hence 1 → H).
Hidden Layer (H)
The hidden layer is the core component that allows the neural network to learn the non-linear relationship between the input x and the output y.
Each hidden neuron takes the input x, computes a weighted sum plus a bias, and passes the result through a non-linear activation function. This activation is what lets the network approximate the curve introduced by the x³ term; without it, stacking linear layers would still produce a linear function. The hidden layer thus transforms x into a set of learned features, which are passed to the output layer to make the final prediction. The hidden layer size (H) is a hyperparameter chosen by the user and determines the network's capacity to learn complex patterns.
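The article has not chosen an activation function yet, so the following is only a sketch: it assumes ReLU and uses hypothetical, hand-picked weights with plain `f64` slices instead of Burn tensors. It shows how a 1 → H → 1 network turns a single x into hidden features and then into one prediction.

```rust
/// ReLU activation: passes positive values through and clamps negatives to zero.
fn relu(z: f64) -> f64 {
    z.max(0.0)
}

/// Hypothetical forward pass of a 1 -> H -> 1 network.
/// Hidden neuron j computes relu(w1[j] * x + b1[j]); the output is a weighted
/// sum of those hidden features plus the output bias b2.
fn forward(x: f64, w1: &[f64], b1: &[f64], w2: &[f64], b2: f64) -> f64 {
    w1.iter()
        .zip(b1)
        .zip(w2)
        .map(|((w, b), v)| v * relu(w * x + b))
        .sum::<f64>()
        + b2
}

fn main() {
    // Two hidden neurons: one is active for x > 0, the other for x < 0.
    let (w1, b1, w2, b2) = ([1.0, -1.0], [0.0, 0.0], [2.0, 3.0], 1.0);
    for x in [-2.0, -1.0, 0.0, 1.0, 2.0] {
        println!("x = {x:>4}: y = {}", forward(x, &w1, &b1, &w2, b2));
    }
}
```

The output falls with slope -3 for negative x and rises with slope 2 for positive x. That bend is something no purely linear model can produce, and more hidden neurons allow more bends.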
Output Layer (H → 1)
The output layer takes the features from the hidden layer and produces a single predicted value y.
Implementing the Configuration Structure
For robust and reproducible machine learning experiments, it is good practice to keep training hyperparameters outside the code. We will use a .env file to store these configuration values.
Dependency Setup
First, add the dotenv crate to Cargo.toml. This crate reads the .env file and loads its entries as environment variables.
[package]
name = "simple-regression"
version = "0.1.0"
edition = "2021"
[dependencies]
burn = { version = "0.17.1", features = ["ndarray"] }
rand = "0.9.1"
rand_distr = "0.5.1"
dotenv = "0.15.0"
Defining the Configuration Struct
Next, define the TrainConfig struct. Notice the #[derive(Debug, Clone)] attribute; later parts of the implementation rely on it.
/// Configuration for training a neural network model.
///
/// This structure holds the parameters required to configure the training process,
/// such as the size of the hidden layer and the batch size.
#[derive(Debug, Clone)]
pub struct TrainConfig {
/// The size of the hidden layer in the neural network.
hidden_size: usize,
/// The size of each batch used during training.
batch_size: usize,
}
Implementing the constructor
The new function reads each setting from the environment. Calling dotenv() at startup loads the .env file into the environment without overriding variables that are already set, so a value exported in the shell takes precedence over the .env file. If a variable is missing or cannot be parsed, the function falls back to a default value so the application still runs.
impl TrainConfig {
fn new() -> Self {
let hidden_size = std::env::var("HIDDEN_SIZE")
.unwrap_or_else(|_| "64".to_string())
.parse()
.unwrap_or(64);
let batch_size = std::env::var("BATCH_SIZE")
.unwrap_or_else(|_| "10".to_string())
.parse()
.unwrap_or(10);
TrainConfig {
hidden_size,
batch_size,
}
}
}
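The new function repeats the same read-parse-default pattern for every field. If more settings are added later, a small generic helper could remove the duplication. The env_or and parse_or names below are hypothetical, and unlike the original code this version also trims surrounding whitespace before parsing.

```rust
use std::str::FromStr;

/// Parses `raw` into `T`, falling back to `default` when the value is missing or invalid.
fn parse_or<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

/// Reads an environment variable and parses it, using `default` as the fallback.
fn env_or<T: FromStr>(key: &str, default: T) -> T {
    parse_or(std::env::var(key).ok().as_deref(), default)
}

fn main() {
    let hidden_size: usize = env_or("HIDDEN_SIZE", 64);
    let batch_size: usize = env_or("BATCH_SIZE", 10);
    println!("hidden_size = {hidden_size}, batch_size = {batch_size}");
}
```

Separating the pure parsing step (parse_or) from the environment lookup makes the fallback logic easy to test without touching real environment variables.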
Defining the Simple Regression Model Structure
Now it’s time to implement the model struct. It defines the core neural network architecture using the configuration established earlier and encapsulates the entire network, including the layers and the hyperparameters.
The Model Structure
The SimpleRegressionModel struct uses generics to be compatible with various backends. It stores the essential information about the network's architecture.
pub struct SimpleRegressionModel<B: Backend> {
// Stores the hyperparameters (hidden_size, batch_size, etc.)
train_config: TrainConfig,
// Dimensionality of the input (1 for our single feature x)
d_input: usize,
// Dimensionality of the output (1 for our single predicted value y)
d_output: usize,
// The Input -> Hidden layer connection (H neurons)
input_layer: Layer<B>,
// The Hidden -> Output layer connection (1 neuron)
output_layer: Layer<B>,
// The device (CPU or GPU) the model is initialized on
device: B::Device,
}
The Initialization Function
The init function is the constructor for our model. It performs four critical steps: loads the configuration, defines the input/output dimensions, selects an initializer for the weights, and creates the two linear layers.
impl<B: Backend> SimpleRegressionModel<B> {
/// Initializes the SimpleRegressionModel on the specified device.
pub fn init(device: &B::Device) -> Self {
// 1. Load Configuration
let train_config = TrainConfig::new();
// 2. Define Dimensions
let d_input = 1; // x
let d_output = 1; // y
// 3. Choose Initializer (Weights and Biases)
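// Caution: Initializer::Ones gives every hidden neuron identical weights, so they all
// compute the same output and receive the same gradient (the symmetry problem).
// It is handy for debugging, but a random initializer is preferable for real training.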
let initializer = Initializer::Ones;
// 4. Create Layers
// Input Layer: 1 input feature -> H (hidden_size) neurons
let input_layer = Layer::init_with(&initializer, d_input, train_config.hidden_size, device);
// Output Layer: H (hidden_size) features -> 1 output feature
let output_layer =
Layer::init_with(&initializer, train_config.hidden_size, d_output, device);
Self {
train_config,
d_input,
d_output,
input_layer,
output_layer,
device: device.clone(),
}
}
}
This initialization ensures the layers have the correct dimensions:
Input Layer Dimensions: 1 → Hidden Size
Output Layer Dimensions: Hidden Size → 1
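To make the dimensions concrete, the following sketch counts the trainable values in both layers, assuming each Layer holds a weight matrix plus one bias per output (the init code mentions both weights and biases). With the default HIDDEN_SIZE of 64, the model has 193 parameters.

```rust
/// Number of trainable values in a dense layer: one weight per (input, output)
/// pair plus one bias per output.
fn dense_params(d_in: usize, d_out: usize) -> usize {
    d_in * d_out + d_out
}

fn main() {
    let hidden_size = 64;
    let input_layer = dense_params(1, hidden_size); // 64 weights + 64 biases
    let output_layer = dense_params(hidden_size, 1); // 64 weights + 1 bias
    println!("total = {}", input_layer + output_layer); // prints: total = 193
}
```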
Implementing the prepare_tensors Function
Now we need to feed our training data to the network. However, the raw (x, y) pairs cannot be used directly; they must first be converted into tensors. This is where prepare_tensors comes in: it takes the raw data, converts it into tensors of the appropriate shape, and divides them into mini-batches for efficient training.
Let's look at how we can implement this in Rust.
Function Signature and Purpose
pub fn prepare_tensors(
    &self,
    range: std::ops::Range<usize>,
) -> Vec<(Tensor<B, 2>, Tensor<B, 1>)> {
    // ...
}
Input (range): Specifies the slice of the dataset (e.g., 0..1000) that should be processed.
Output: Returns a Vec of tuples, where each tuple represents a single mini-batch:
The first element (Tensor<B, 2>) is the batch of input features (x).
The second element (Tensor<B, 1>) is the batch of target values (y).
Step-by-Step Explanation
Data Loading and Initialization
let data = read_data_from_csv("data.csv").expect("should read data from csv");
let batch_size = self.train_config.batch_size;
let mut inputs: Vec<Tensor<B, 2>> = Vec::new();
let mut targets: Vec<Tensor<B, 1>> = Vec::new();
// Reject reversed ranges and ranges that run past the end of the dataset.
if range.start > range.end || range.end > data.len() {
    panic!("Invalid range {:?} for dataset length {}", range, data.len());
}
let start = range.start;
let end = range.end;
The function begins by loading the synthetic data from "data.csv" and retrieving the configured batch_size from self.train_config. It then initializes two vectors, inputs and targets, which temporarily hold the data points as individual tensors before they are grouped into batches. Finally, it panics if the requested range is reversed or extends past the end of the dataset.
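If you would rather have a bad range reported to the caller instead of panicking, the check could be factored into a helper like the hypothetical check_range below. It returns a Result with a descriptive message and explicitly rejects a reversed range such as 5..3.

```rust
use std::ops::Range;

/// Validates that `range` is a usable slice of a dataset with `len` rows.
fn check_range(range: &Range<usize>, len: usize) -> Result<(), String> {
    if range.start > range.end {
        return Err(format!("range start {} is after end {}", range.start, range.end));
    }
    if range.end > len {
        return Err(format!("range end {} exceeds dataset length {}", range.end, len));
    }
    Ok(())
}

fn main() {
    let len = 1000;
    for r in [0..10, 990..1000, 995..1005] {
        println!("{:?}: {:?}", r, check_range(&r, len));
    }
}
```

With this helper, prepare_tensors could return a Result as well, letting main decide whether an invalid range is fatal.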
Individual Data Point Conversion and Unsqueezing
for (x, y) in data[start..end].iter() {
    let input_tensor = Tensor::<B, 1>::from_floats([*x], &self.device);
    let target_tensor = Tensor::<B, 1>::from_floats([*y], &self.device);
    inputs.push(input_tensor.unsqueeze()); // [1] → [1, 1]
    targets.push(target_tensor);
}
This loop iterates over the specified data slice.
For each (x, y) pair, it creates two Rank-1 tensors (Tensor<B, 1>): input_tensor (shape [1]) and target_tensor (shape [1]).
Crucial step: input_tensor.unsqueeze(). The neural network expects inputs to be at least Rank-2, with the dimensions [Batch Size, Input Features]. Since our input has 1 feature, an individual input tensor needs the shape [1, 1]. In Burn, unsqueeze() prepends leading dimensions of size 1 until the tensor reaches the target rank (here inferred as 2 from the element type of inputs), converting the shape from [1] to [1, 1]. The targets remain Rank-1 tensors (shape [1]) since they will be compared element-wise in the loss function.
Don't worry if the exact reasons for the unsqueeze operation aren't obvious yet; the required tensor shapes will make sense once we define the forward pass logic.
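As a shape-only illustration, the sketch below models what unsqueezing to a higher rank does to a shape, assuming Burn's behavior of prepending size-1 dimensions (worth confirming against the Burn documentation for your version). It works on plain Vec<usize> shapes; unsqueeze_shape is a hypothetical helper, not a Burn function.

```rust
/// Shape-only model of unsqueezing to rank `rank`: size-1 dimensions are
/// prepended until the shape has `rank` entries.
fn unsqueeze_shape(shape: &[usize], rank: usize) -> Vec<usize> {
    assert!(rank >= shape.len(), "cannot unsqueeze to a lower rank");
    let mut out = vec![1; rank - shape.len()];
    out.extend_from_slice(shape);
    out
}

fn main() {
    println!("{:?}", unsqueeze_shape(&[1], 2)); // [1, 1]
    println!("{:?}", unsqueeze_shape(&[3, 3], 4)); // [1, 1, 3, 3]
}
```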
Batch Creation (Grouping and Concatenation)
This section groups the individual tensors into batches and concatenates them along the first dimension (the batch dimension).
let mut batched_inputs: Vec<Tensor<B, 2>> = Vec::new();
let mut batched_targets: Vec<Tensor<B, 1>> = Vec::new();
for i in (0..inputs.len()).step_by(batch_size) {
    // Clamp the end index; the last batch may be smaller than batch_size.
    let end = std::cmp::min(i + batch_size, inputs.len());
    // Join the [1, 1] inputs into [n, 1] and the [1] targets into [n] along dim 0.
    // The per-sample tensors already have the right rank, so no unsqueeze is needed here.
    let input_tensor = Tensor::cat(inputs[i..end].to_vec(), 0);
    let target_tensor = Tensor::cat(targets[i..end].to_vec(), 0);
    batched_inputs.push(input_tensor);
    batched_targets.push(target_tensor);
}
The outer loop uses step_by(batch_size) to determine the starting index (i) of each batch, and std::cmp::min clamps the end index so that the final batch can be smaller than batch_size when the range length is not a multiple of it.
Concatenation logic: Tensor::cat() joins the individual tensors ([1, 1] for inputs, [1] for targets) into a single, larger tensor. The concatenation axis is 0 (the batch dimension).
| Tensor Type | Shape before Concatenation (per sample) | Final Batch Shape (for $N$ samples) |
|---|---|---|
| Input (Tensor<B, 2>) | [1, 1] | [N, 1] |
| Target (Tensor<B, 1>) | [1] | [N] |
If the batch_size is 10, the final input tensor for that batch will have the shape [10, 1] and the target tensor will have the shape [10].
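The batching loop can be mirrored with plain shapes to see what happens when the range length is not a multiple of batch_size. The hypothetical batch_shapes helper below reproduces the step_by + min logic and returns the (input, target) shape of each batch. Note that step_by panics if batch_size is 0.

```rust
/// Shapes of the mini-batches built from `n` samples, mirroring the
/// step_by + min loop (plain-Rust sketch, no tensors involved).
fn batch_shapes(n: usize, batch_size: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    (0..n)
        .step_by(batch_size)
        .map(|i| {
            let rows = (i + batch_size).min(n) - i;
            (vec![rows, 1], vec![rows]) // (input shape, target shape)
        })
        .collect()
}

fn main() {
    // 25 samples with batch_size 10 -> two full batches and one batch of 5.
    for (input, target) in batch_shapes(25, 10) {
        println!("input {input:?}, target {target:?}");
    }
}
```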
Final Output
batched_inputs.into_iter().zip(batched_targets).collect()
The function zips the vector of input batches and the vector of target batches together, returning the final Vec<(Tensor<B, 2>, Tensor<B, 1>)> ready for the training loop.
Here is the full code.
pub fn prepare_tensors(
    &self,
    range: std::ops::Range<usize>,
) -> Vec<(Tensor<B, 2>, Tensor<B, 1>)> {
    let data = read_data_from_csv("data.csv").expect("should read data from csv");
    let batch_size = self.train_config.batch_size;
    let mut inputs: Vec<Tensor<B, 2>> = Vec::new();
    let mut targets: Vec<Tensor<B, 1>> = Vec::new();
    // Reject reversed ranges and ranges that run past the end of the dataset.
    if range.start > range.end || range.end > data.len() {
        panic!("Invalid range {:?} for dataset length {}", range, data.len());
    }
    let start = range.start;
    let end = range.end;
    // Convert each (x, y) pair into a [1, 1] input tensor and a [1] target tensor.
    for (x, y) in data[start..end].iter() {
        let input_tensor = Tensor::<B, 1>::from_floats([*x], &self.device);
        let target_tensor = Tensor::<B, 1>::from_floats([*y], &self.device);
        inputs.push(input_tensor.unsqueeze()); // [1] → [1, 1]
        targets.push(target_tensor);
    }
    // Group consecutive samples into mini-batches; the last batch may be smaller.
    let mut batched_inputs: Vec<Tensor<B, 2>> = Vec::new();
    let mut batched_targets: Vec<Tensor<B, 1>> = Vec::new();
    for i in (0..inputs.len()).step_by(batch_size) {
        let end = std::cmp::min(i + batch_size, inputs.len());
        batched_inputs.push(Tensor::cat(inputs[i..end].to_vec(), 0)); // [n, 1]
        batched_targets.push(Tensor::cat(targets[i..end].to_vec(), 0)); // [n]
    }
    // Pair each input batch with its target batch.
    batched_inputs.into_iter().zip(batched_targets).collect()
}
Running the code
Let's test this function.
.env file
Create a .env file in the project's root directory with the following values:
GENERATE_DATASET=false
HIDDEN_SIZE=64
BATCH_SIZE=10
Cargo.toml
We will use wgpu as the backend. To enable it, add the wgpu feature to the burn dependency:
[package]
name = "simple-regression"
version = "0.1.0"
edition = "2021"
[dependencies]
burn = { version = "0.17.1", features = ["ndarray", "wgpu"] }
rand = "0.9.1"
rand_distr = "0.5.1"
dotenv = "0.15.0"
main function
Now let's write the code to call the functions we implemented earlier.
use burn::backend::{wgpu::WgpuDevice, Wgpu};
use dotenv::dotenv;
// The module declarations from earlier parts (model, data_generator, ...) also live here.

fn main() {
    // Load .env into the process environment (variables already set are kept).
    dotenv().ok();
    let make_dataset = std::env::var("GENERATE_DATASET").is_ok_and(|v| v == "true");
    if make_dataset {
        let num_dataset: usize = std::env::var("NUM_DATASET")
            .unwrap_or_else(|_| "100000".to_string())
            .parse()
            .unwrap_or(100000);
        data_generator::generate_and_save_data(num_dataset).expect("Failed to generate dataset");
    }
    let device = WgpuDevice::default();
    let model: model::SimpleRegressionModel<Wgpu> = model::SimpleRegressionModel::init(&device);
    let tensors = model.prepare_tensors(0..10);
    println!("tensors: {:?}", tensors);
}
When you run the code, you should see an output similar to this:
tensors: [(Tensor { primitive: Float({ id: TensorId { value: 40 }, shape: [10, 1], should_drop: true, device: DefaultDevice }) }, Tensor { primitive: Float({ id: TensorId { value: 51 }, shape: [10], should_drop: true, device: DefaultDevice }) })]
The first tensor (input) has a shape of [10, 1], representing 10 samples with 1 feature each. The second tensor (target) has a shape of [10], representing the 10 corresponding ground-truth values.
Conclusion
Here is what we accomplished:
Defined the Model Architecture: We established that a non-linear problem requires a hidden layer with a non-linear activation.
Built a Flexible Configuration: We implemented a system to manage hyperparameters using environment variables.
Implemented the Data Pipeline: We created a prepare_tensors function that handles data loading, unsqueezing dimensions, and batching.
Now that our structure is ready, it's time to make the model actually do something. In the next article, we will implement the Forward Pass and the Loss Function to begin our journey into model training.