phast.embedding#

A Python module that endows graph neural networks with physical priors by including them in atom embeddings computed from atomic numbers.

This package contains the implementation of a set of classes that are used to create atomic embeddings from physical properties of periodic table elements.

The physical embeddings are learned or kept fixed depending on the specific use-case. The embeddings can also include information regarding the group and period of the elements.

In the context of the Open Catalyst datasets, tag embeddings can also be used.

This implementation relies on the Mendeleev package to access the physical properties of elements from the periodic table.

import torch
from phast.embedding import PhysEmbedding

z = torch.randint(1, 85, (3, 12)) # batch of 3 graphs with 12 atoms each
phys_embedding = PhysEmbedding(
    z_emb_size=32, # default
    period_emb_size=32, # default
    group_emb_size=32, # default
    properties_proj_size=32, # default is 0 -> no learned projection
    n_elements=85, # default
)
h = phys_embedding(z) # h.shape = (3, 12, 128)

tags = torch.randint(0, 3, (3, 12))
phys_embedding = PhysEmbedding(
    tag_emb_size=32, # default is 0, this is OC20-specific
    final_proj_size=64, # default is 0, no projection, just the concat. of embeds.
)

h = phys_embedding(z, tags) # h.shape = (3, 12, 64)

# Assuming torch_geometric is installed:

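# Note: recent PyTorch versions may need torch.load(..., weights_only=False) to unpickle this object.
data = torch.load("examples/data/is2re_bs3.pt")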
h = phys_embedding(data.atomic_numbers.long(), data.tags) # h.shape = (261, 64)

Classes#

PhysEmbedding

This module embeds inputs for use in a neural network, using both standard embeddings and physical properties.

PhysRef

This class implements an interface to access physical properties, period and group ids of elements from the periodic table.

PropertiesEmbedding

A class for retrieving physical properties from atomic numbers.

class phast.embedding.PhysEmbedding(z_emb_size=32, tag_emb_size=0, period_emb_size=32, group_emb_size=32, properties=PhysRef.default_properties, properties_grad=False, properties_proj_size=0, final_proj_size=0, n_elements=85)[source]#

Bases: torch.nn.Module

This module embeds inputs for use in a neural network, using both standard embeddings and physical properties. The input to the embedding module can be a set of compositions, atomic numbers and tags, in addition to any extra physical properties specified.

You can disable embeddings by setting their size to 0.

Parameters
  • z_emb_size (int) – Size of the embedding for atomic number.

  • tag_emb_size (int) – Size of the embedding for tags.

  • period_emb_size (int) – Size of the embedding for periods.

  • group_emb_size (int) – Size of the embedding for groups.

  • properties (list) – List of the physical properties to include in the embedding. Each property is specified as a string, and must name a column of the elements or fetch_ionization_energies Mendeleev tables.

  • properties_proj_size (int) – Projection size of the physical properties embedding.

  • properties_grad (bool) – Whether to set the physical properties to be trainable or not.

  • final_proj_size (int) – Projection size for the final embedding.

  • n_elements (int) – Number of elements in the periodic table.

Raises
  • ValueError – if self.properties_proj_size is greater than 0 and self.properties is empty

  • ValueError – if self.full_emb_size is 0, i.e. all sizes were set to 0.
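The size rules and the two error conditions above can be summarized in a short sketch. The helper `expected_output_size` and the value `n_properties=14` are hypothetical and not part of phast. The sketch also assumes that unprojected properties contribute one dimension each to the concatenation.

```python
def expected_output_size(
    z_emb_size=32,
    tag_emb_size=0,
    period_emb_size=32,
    group_emb_size=32,
    n_properties=0,
    properties_proj_size=0,
    final_proj_size=0,
):
    """Hypothetical helper mirroring the documented size rules."""
    if properties_proj_size > 0 and n_properties == 0:
        raise ValueError("properties_proj_size > 0 requires at least one property")
    # Assumption: unprojected properties contribute one dimension each.
    props_size = properties_proj_size if properties_proj_size > 0 else n_properties
    full_emb_size = (
        z_emb_size + tag_emb_size + period_emb_size + group_emb_size + props_size
    )
    if full_emb_size == 0:
        raise ValueError("all embedding sizes are 0")
    # The output size is final_proj_size when a projection is requested.
    return final_proj_size if final_proj_size > 0 else full_emb_size


print(expected_output_size(n_properties=14, properties_proj_size=32))  # 128
print(expected_output_size(tag_emb_size=32, n_properties=14, final_proj_size=64))  # 64
```

The first call reproduces the 128-dimensional output of the introductory example (four blocks of 32), and the second reproduces the 64-dimensional projected output.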

z_emb_size#

Size of the embedding for atomic number.

Type

int

tag_emb_size#

Size of the embedding for tags.

Type

int

period_emb_size#

Size of the embedding for periods.

Type

int

group_emb_size#

Size of the embedding for groups.

Type

int

properties#

List of the physical properties to include in the embedding. Each property must be a string as per the elements or fetch_ionization_energies Mendeleev tables.

Type

list

properties_grad#

Whether to set the physical properties to be trainable or not.

Type

bool

n_elements#

Number of elements in the periodic table to consider.

Type

int

phys_ref#

Reference physical information interface.

Type

PhysRef

full_emb_size#

Total size of the concatenated embeddings.

Type

int

final_emb_size#

Output size: either the final_proj_size or full_emb_size.

Type

int

embeddings#

Dictionary containing the different embeddings.

Type

nn.ModuleDict

phys_lin#

A linear layer to project the physical properties to the given size, if projection is requested.

Type

nn.Linear

final_proj#

A linear layer to project the final embedding to the requested size.

Type

nn.Linear

forward(z, tag=None)[source]#

Embeds the input(s) using the available embeddings. The final embedding size is the sum of the individual embedding sizes, unless final_proj_size is provided, in which case the final embedding is projected to the requested size with a bias-free linear layer.

Parameters
  • z (torch.Tensor) – Tensor of (long) atomic numbers.

  • tag (Optional[torch.Tensor]) – Open Catalyst Project-style tags. Defaults to None.

Returns

Embedded representation of the input(s).

Return type

torch.Tensor
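The concatenate-then-project behavior described for forward can be illustrated with plain PyTorch layers. This is a minimal sketch, not the phast implementation: the names `z_emb`, `tag_emb` and `final_proj` and all sizes are stand-ins chosen for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stand-in embedding tables (hypothetical sizes, not the phast internals).
z_emb = nn.Embedding(85, 32)   # indexed by atomic number
tag_emb = nn.Embedding(3, 32)  # indexed by tag (0, 1 or 2)
final_proj = nn.Linear(64, 16, bias=False)  # bias-free projection

z = torch.randint(1, 85, (3, 12))    # batch of 3 graphs with 12 atoms each
tags = torch.randint(0, 3, (3, 12))

# Concatenate the individual embeddings along the feature dimension.
h_full = torch.cat([z_emb(z), tag_emb(tags)], dim=-1)  # shape (3, 12, 64)
# Project the concatenation to the requested final size.
h = final_proj(h_full)  # shape (3, 12, 16)
```

Without the projection, the output would simply be `h_full`, whose last dimension is the sum of the individual embedding sizes.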

reset_parameters()[source]#

Resets the parameters of the linear layers, and the embeddings.

class phast.embedding.PhysRef(properties=[], period=True, group=True, short=False, n_elements=85)[source]#

Bases: torch.nn.Module

This class implements an interface to access physical properties, period and group ids of elements from the periodic table.

Parameters
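  • properties (list) – List of the physical properties to use.

  • period (bool) – Whether to use period embeddings.

  • group (bool) – Whether to use group embeddings.

  • short (bool) – Whether to keep only the columns that do not have NaN values.

  • n_elements (int) – Number of elements in the periodic table to consider.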

default_properties[source]#

The list of properties included in atom embeddings by default.

Type

list

properties_list#

A list of the properties that are actually used for creating the embeddings.

Type

list

n_groups#

The number of groups of the elements.

Type

int

n_periods#

The number of periods of the elements.

Type

int

n_properties#

The number of properties of the elements that are used to create the embeddings.

Type

int

properties#

Whether to create an embedding of physical properties.

Type

bool

properties_grad#

Whether the physical properties embedding should be learned or kept fixed.

Type

bool

period#

Whether to use period embeddings.

Type

bool

group#

Whether to use group embeddings.

Type

bool

short#

A boolean flag indicating whether to keep only the columns that do not have NaN values.

Type

bool
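The effect of the short flag, keeping only property columns without NaN values, can be sketched on a toy table. The values and the property names `prop_a`, `prop_b` and `prop_c` are made up for illustration, and this is not necessarily how phast filters its tables internally.

```python
import torch

nan = float("nan")
# Toy table: 3 elements (rows) x 3 properties (columns); values are made up.
table = torch.tensor(
    [
        [1.0, nan, 0.5],
        [2.0, 3.0, 0.7],
        [4.0, 5.0, 0.9],
    ]
)
names = ["prop_a", "prop_b", "prop_c"]

# True for every column that contains no NaN value.
keep = ~torch.isnan(table).any(dim=0)
short_table = table[:, keep]
short_names = [name for name, k in zip(names, keep.tolist()) if k]

print(short_names)  # ['prop_a', 'prop_c']
```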

group_mapping#

A tensor containing the mapping from the element atomic number to the corresponding group embedding.

Type

torch.Tensor

period_mapping#

A tensor containing the mapping from the element atomic number to the corresponding period embedding.

Type

torch.Tensor

properties_mapping#

A tensor containing the mapping from the element atomic number to the corresponding physical properties embedding.

Type

torch.Tensor
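The mapping tensors above act as lookup tables indexed by atomic number. The sketch below builds such tables by hand for hydrogen through neon, using standard periodic-table values. The use of index 0 as an unused placeholder is an assumption of this example, not a documented property of PhysRef.

```python
import torch

# Index = atomic number; index 0 is an unused placeholder (assumption).
# Values cover H..Ne only.
period_mapping = torch.tensor([0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2])
group_mapping = torch.tensor([0, 1, 18, 1, 2, 13, 14, 15, 16, 17, 18])

z = torch.tensor([1, 6, 8, 10])  # H, C, O, Ne
periods = period_mapping[z]  # one period id per atom
groups = group_mapping[z]    # one group id per atom

print(periods.tolist())  # [1, 2, 2, 2]
print(groups.tolist())   # [1, 14, 16, 18]
```

Because the lookup is plain tensor indexing, it works for any shape of z, including batched inputs.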

__init__()[source]#

Initializes the PhysRef class.

Parameters
  • properties (list) –

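  • period (bool) – Whether to use period embeddings.

  • group (bool) – Whether to use group embeddings.

  • short (bool) – Whether to keep only the columns that do not have NaN values.

  • n_elements (int) – Number of elements in the periodic table to consider.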

Return type

None

__repr__()[source]#

Returns a string representation of the class instance.

period_and_group(z)[source]#

Returns the period and group embeddings of the elements.

default_properties = ['atomic_radius', 'atomic_volume', 'density', 'dipole_polarizability', 'electron_affinity',...[source]#

class phast.embedding.PropertiesEmbedding(properties, grad=False)[source]#

Bases: torch.nn.Module

A class for retrieving physical properties from atomic numbers.

Parameters
  • properties (torch.Tensor) – A tensor containing the properties to be embedded.

  • grad (bool) – Whether to enable gradient computation or not.

properties#

A parameter or buffer storing the properties.

Type

nn.Parameter or nn.Buffer

forward(z)[source]#

Returns the embedded properties at the specified indices.

Parameters

z (torch.Tensor) –

reset_parameters()[source]#

Does nothing in this class.

forward(z)[source]#

Returns the properties of each atom in the batch according to (1-based) atomic numbers.

Parameters

z (torch.Tensor) – Tensor of atomic numbers as torch.Long.

Returns

The properties for each atom.
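The distinction between a learned and a fixed properties table can be sketched with a small module that stores its table either as a parameter or as a buffer. The class `TableLookup` is a hypothetical stand-in, not the phast class, and it assumes that row 0 is a placeholder so that 1-based atomic numbers index rows directly.

```python
import torch
import torch.nn as nn


class TableLookup(nn.Module):
    """Hypothetical stand-in for a properties lookup, not the phast class."""

    def __init__(self, table, grad=False):
        super().__init__()
        if grad:
            # Trainable: the optimizer can update the table.
            self.table = nn.Parameter(table.clone())
        else:
            # Fixed: saved with the module and moved across devices, never trained.
            self.register_buffer("table", table.clone())

    def forward(self, z):
        # Assumption: row 0 is a placeholder, so atomic numbers index rows directly.
        return self.table[z]


# Placeholder row plus 3 elements, 3 properties each (made-up values).
table = torch.arange(12, dtype=torch.float32).reshape(4, 3)
fixed = TableLookup(table, grad=False)
learned = TableLookup(table, grad=True)

z = torch.tensor([[1, 3], [2, 2]])
out = fixed(z)  # shape (2, 2, 3)
```

With `grad=False` the module exposes no parameters, so the table stays constant during training.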
