Insight into PyTorch.nn: Parameter vs. Linear vs. Embedding

In a recent PyTorch project, I used the torch.nn.Parameter() class to create a module parameter. The parameter was initialized with tiny values such as 1.4013e-45, and these led to very strange outputs. When I later replaced torch.nn.Parameter() with torch.nn.Linear(), the initial values no longer looked odd and the outputs became reasonable, which surprised me.

I wanted to find an explanation for this. What exactly do nn.Parameter() and nn.Linear() do when they are called? How do they differ from each other, and from nn.Embedding(), another frequently used module for building parameters?


import torch
weight = torch.nn.Parameter(torch.FloatTensor(2, 2))

The code above shows how to use nn.Parameter() to create a module parameter. Here weight is created from a given tensor, so its initial values are exactly those of that tensor, torch.FloatTensor(2,2). So the tiny values I saw came from the given tensor! Let's move on: what values does torch.FloatTensor() actually give us?

a = torch.FloatTensor(2, 2)
print(a)
# tensor([[4.6837e-39, 9.9184e-39],
#         [9.0000e-39, 1.0561e-38]])

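Every element is extremely small, close to 0! These values are simply leftover bytes in memory, though. Another run may show different values, including very large numbers or NaN.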

The official documentation says that torch.Tensor is an alias for the default tensor type, torch.FloatTensor. In practice, torch.FloatTensor(2,2) behaves like numpy.empty(). It allocates memory for the requested shape but does not initialize it.
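The sketch below makes the comparison concrete. It assumes a PyTorch version that still accepts the legacy torch.FloatTensor(*sizes) constructor, and that the default dtype has not been changed with torch.set_default_dtype(). It shows that torch.FloatTensor(2, 2), torch.Tensor(2, 2) and torch.empty(2, 2) all give a 2x2 float32 tensor. Only the shape and dtype are reliable. The contents are arbitrary, just as with numpy.empty().

```python
import torch

legacy = torch.FloatTensor(2, 2)   # legacy constructor: allocates, does not fill
alias = torch.Tensor(2, 2)         # alias for the default tensor type (float32 by default)
modern = torch.empty(2, 2)         # the modern equivalent: also uninitialized

print(legacy.shape, legacy.dtype)  # torch.Size([2, 2]) torch.float32
print(alias.shape, alias.dtype)    # torch.Size([2, 2]) torch.float32
print(modern.shape, modern.dtype)  # torch.Size([2, 2]) torch.float32

# When you need defined values, ask for them explicitly.
zeros = torch.zeros(2, 2)
print(zeros.sum().item())          # 0.0
```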

Now it all makes sense. torch.FloatTensor(2,2) returns an uninitialized tensor, whose elements are whatever happened to be in memory. In my case they were tiny values near zero. That's why I got odd results with nn.Parameter(). The problem was not really nn.Parameter() itself but the tensor I passed into it.
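Two quick checks make this concrete. The sketch below uses arbitrary sizes, and Xavier-uniform is just one reasonable initializer chosen for illustration, not something from the original post. It first confirms that nn.Parameter() wraps the tensor it receives without touching its values. It then shows one way to avoid the problem: allocate with torch.empty() and fill the parameter in place with a torch.nn.init function.

```python
import torch
from torch import nn

# 1) nn.Parameter() passes the tensor through unchanged.
t = torch.zeros(2, 2)                      # deterministic values for the check
p = nn.Parameter(t)
print(torch.equal(p.detach(), t))          # True: same values
print(p.data_ptr() == t.data_ptr())        # True: same memory, no copy
print(p.requires_grad)                     # True: only the gradient flag is added

# 2) Allocate uninitialized memory, then initialize it explicitly (illustrative choice).
weight = nn.Parameter(torch.empty(2, 2))
nn.init.xavier_uniform_(weight)            # in place; bound b = sqrt(6 / (fan_in + fan_out))
bound = (6 / (2 + 2)) ** 0.5
print(bool(weight.abs().max() <= bound + 1e-6))  # True
```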


Creating parameters with nn.Linear() works a little differently. According to the official documentation, the class is instantiated as follows:

CLASS torch.nn.Linear(in_features, out_features, bias=True)

Setting bias to False turns nn.Linear() into a plain linear transformation, that is, a matrix multiplication with no bias term. In my case, I used

linear = torch.nn.Linear(2, 2, bias=False)  # the parameter itself is linear.weight
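With bias=False, the module computes y = x Wᵀ and nothing else. The weight is stored with shape (out_features, in_features). The quick check below uses arbitrary input values to confirm that calling the module matches multiplying by the transposed weight by hand.

```python
import torch

linear = torch.nn.Linear(2, 2, bias=False)
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

y = linear(x)
manual = x @ linear.weight.T        # the same computation written out explicitly

print(linear.bias)                  # None: no bias parameter is registered
print(torch.allclose(y, manual))    # True
```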

So what are the initial values? Let's look at the official source code of nn.Linear.

Inside its __init__(), nn.Linear() also creates its weight with nn.Parameter() from an uninitialized tensor. The only additional step is a call to self.reset_parameters(). That method fills the weight using init.kaiming_uniform_() (Kaiming/He uniform initialization) instead of leaving it uninitialized.
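The weight part of reset_parameters() can be reproduced by hand. This sketch mirrors what recent PyTorch versions do, namely init.kaiming_uniform_() with a=math.sqrt(5). Details have changed between releases, so check the source of your installed version. With that value of a, the uniform bound works out to 1/sqrt(in_features). The sizes below are arbitrary.

```python
import math
import torch
from torch import nn
from torch.nn import init

in_features, out_features = 4, 3

# Step 1: what nn.Parameter() alone gives you, an uninitialized weight.
weight = nn.Parameter(torch.empty(out_features, in_features))

# Step 2: the extra step nn.Linear() adds in reset_parameters().
init.kaiming_uniform_(weight, a=math.sqrt(5))

# gain = sqrt(2 / (1 + a^2)) = sqrt(1/3), bound = gain * sqrt(3 / fan_in) = 1 / sqrt(fan_in)
bound = 1 / math.sqrt(in_features)
print(weight.abs().max().item() <= bound + 1e-6)          # True

# nn.Linear's own weight respects the same bound.
linear = nn.Linear(in_features, out_features, bias=False)
print(linear.weight.abs().max().item() <= bound + 1e-6)   # True
```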


nn.Embedding() creates a simple lookup table that stores embeddings for a dictionary of fixed size. It is often used to store word embeddings and retrieve them by index. The input to the module is a tensor of integer indices, and the output is the corresponding embeddings.
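In other words, nn.Embedding() is a trainable matrix whose rows are looked up by integer index. The sketch below uses arbitrary sizes and indices to show that calling the module gives the same result as indexing its weight directly.

```python
import torch

emb = torch.nn.Embedding(num_embeddings=5, embedding_dim=3)
indices = torch.tensor([[0, 2], [4, 2]])   # indices must be an integer (long) tensor

out = emb(indices)
print(out.shape)                              # torch.Size([2, 2, 3])
print(torch.equal(out, emb.weight[indices]))  # True: a lookup is just row indexing
```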

A typical way to create an nn.Embedding() is

embedding = torch.nn.Embedding(num_embeddings, embedding_dim)  # the parameter is embedding.weight

Only num_embeddings and embedding_dim are required. So what does nn.Embedding() do during initialization? In the official source code, it also uses nn.Parameter() to create the weight. As in nn.Linear(), the weight is then reset: init.normal_() fills the weight tensor with values drawn from the standard normal distribution N(0, 1).
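Here is a quick sanity check of that initialization. It assumes the default constructor arguments, so no padding_idx and no pre-built weight. For a large table, the sample mean and standard deviation should come out close to 0 and 1. The tolerance is loose because the values are random. The same result can be built by hand from an uninitialized nn.Parameter() plus init.normal_().

```python
import torch
from torch import nn

emb = nn.Embedding(10_000, 16)
w = emb.weight.detach()
mean, std = w.mean().item(), w.std().item()
print(abs(mean) < 0.05 and abs(std - 1.0) < 0.05)   # True (with overwhelming probability)

# The same thing built by hand: an uninitialized Parameter filled by init.normal_().
manual = nn.Parameter(torch.empty(10_000, 16))
nn.init.normal_(manual)                              # defaults: mean=0.0, std=1.0
m = manual.detach()
print(abs(m.mean().item()) < 0.05 and abs(m.std().item() - 1.0) < 0.05)  # True
```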

In conclusion, nn.Parameter() takes the tensor passed into it as-is and applies no initialization of its own. If that tensor is uninitialized, as torch.FloatTensor(2,2) is, the parameter will be uninitialized too. nn.Linear() and nn.Embedding(), by contrast, initialize their weights with Kaiming-uniform and standard-normal initialization, respectively. With those two modules you won't get an uninitialized parameter, even if you only specify the shape.
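To put the three side by side, the sketch below shows where each one's parameter lives and what shape it has. The sizes are arbitrary. Note that nn.Linear stores its weight as (out_features, in_features), the reverse of the order of its constructor arguments.

```python
import torch
from torch import nn

# nn.Parameter: exactly the tensor you pass in (initialize it yourself).
p = nn.Parameter(torch.zeros(2, 3))

# nn.Linear(in_features=2, out_features=3): weight is (out, in), Kaiming-uniform init.
linear = nn.Linear(2, 3, bias=False)

# nn.Embedding(num_embeddings=4, embedding_dim=3): weight is (num, dim), N(0, 1) init.
emb = nn.Embedding(4, 3)

for name, param in [("Parameter", p), ("Linear", linear.weight), ("Embedding", emb.weight)]:
    print(name, tuple(param.shape), isinstance(param, nn.Parameter))
# Parameter (2, 3) True
# Linear (3, 2) True
# Embedding (4, 3) True
```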

I study knowledge graph representation learning and love to share :)