Authors: Sérgio Moisés Macarringue

Published on: June 9, 2023

Geometric deep learning is a field of study that focuses on the development of machine learning algorithms capable of processing data with geometric structures, such as graphs, point clouds, and meshes. The field combines concepts from deep learning, graph theory, and geometry to address the challenges associated with learning from data with complex geometric structures.

The key idea behind geometric deep learning is to develop neural network architectures that can operate directly on geometric data representations. Traditional deep learning methods operate on regular data structures such as images or sequences, which have a Euclidean (grid-like) structure. However, many real-world datasets have more complex geometric structures, such as social networks, protein molecules, materials structures, chemical data, or 3D point clouds, which require specialized methods to process.

Geometric deep learning models often leverage graph neural networks (GNNs) to process data with graph-like structures. GNNs operate on nodes and edges in the graph and update their hidden states based on the neighboring nodes and edges. This allows GNNs to learn complex patterns and relationships in the graph structure.

### Geometric Deep Learning Blueprint

A general blueprint of geometric deep learning can be recognized in the majority of popular deep neural architectures used for representation learning. A typical design consists of a sequence of equivariant layers (e.g. convolutional layers in CNNs), possibly followed by an invariant global pooling layer that aggregates everything into a single output. In some cases, it is also possible to create a hierarchy of domains through a coarsening procedure that takes the form of local pooling.

From the viewpoint of symmetry and invariance, geometric deep learning is an attempt to unify a large class of machine learning problems geometrically. The groundbreaking performance of convolutional neural networks and the recent success of graph neural networks both rest on these ideas. The same ideas also offer a principled way to build new problem-specific inductive biases.
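The two properties can be made concrete with the simplest symmetry group: permutations of a set of elements, which is the symmetry relevant to sets and graphs. In this minimal sketch, `equivariant_layer` is a hypothetical DeepSets-style update chosen for illustration. It is not a layer from any particular library. The checks confirm two things. Permuting the input permutes the layer's output in the same way. Sum pooling gives the same value for any ordering.

```python
# Toy check of the blueprint: an equivariant layer followed by invariant pooling.
# Assumption: elements are plain floats and the layer is a hypothetical update
# x_i' = a * x_i + b * sum(x), which treats every element identically.

def equivariant_layer(x, a=2.0, b=0.5):
    """Update each element from itself plus a symmetric summary of the whole set."""
    total = sum(x)
    return [a * xi + b * total for xi in x]

def invariant_pool(x):
    """Sum pooling: the result does not depend on the order of the elements."""
    return sum(x)

def permute(x, perm):
    """Reorder a list according to the index list `perm`."""
    return [x[p] for p in perm]

x = [1.0, 3.0, -2.0, 4.0]
perm = [2, 0, 3, 1]

# Equivariance: permuting the input permutes the output in the same way.
assert equivariant_layer(permute(x, perm)) == permute(equivariant_layer(x), perm)
# Invariance: after pooling, the ordering no longer matters.
assert invariant_pool(equivariant_layer(permute(x, perm))) == invariant_pool(equivariant_layer(x))
```

Integer-valued inputs keep the floating-point sums exact, so the equality checks are not affected by summation order.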

Geometric deep learning has found applications in various fields, including computer vision, 3D modeling, and robotics. For example, it can be used to analyze the connectivity of neurons in the brain or to perform object recognition in 3D scenes. As the field continues to evolve, it is expected to significantly impact many areas of research and industry.

##### Categories of Geometric Deep Learning

Geometric deep learning is classified into four fundamental categories, as illustrated in the diagram below:

Categories of geometric deep learning.

Grids, groups, graphs, geodesics, and gauges are the "5Gs" described by Bronstein et al. (2021), who expanded on the 4Gs previously identified by Max Welling. The last two (geodesics and gauges) are closely related, so we consider only four categories, i.e. 4Gs.

Grid

The grid category includes regularly sampled, or gridded, data such as 2D images. This is the kind of data that traditional deep learning typically handles. Many traditional deep learning models can also be viewed from a geometric angle, for example CNNs and their translation equivariance.
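Translation equivariance can be checked directly on a small example. This sketch assumes a 1D signal with periodic (circular) boundaries so that shifts are exact. `circular_conv1d` and `shift` are illustrative helpers, not a library API. Convolving a shifted signal gives the same result as shifting the convolved signal.

```python
def circular_conv1d(signal, kernel):
    """1D cross-correlation with periodic boundaries (what a CNN layer computes, minus bias)."""
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(k)) for i in range(n)]

def shift(signal, s):
    """Cyclically translate the signal s positions to the right."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

x = [0, 1, 2, 3, 0, 0]
kernel = [1, -1]  # a simple difference filter

# Equivariance: conv(shift(x)) == shift(conv(x))
print(circular_conv1d(shift(x, 2), kernel))   # [0, 0, -1, -1, -1, 3]
print(shift(circular_conv1d(x, kernel), 2))   # [0, 0, -1, -1, -1, 3]
```

The same property is what lets a CNN detect a pattern regardless of where it appears in an image.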

Group

The group category covers homogeneous spaces with global symmetries, and the sphere is the canonical example. Spherical data arise in two ways. Sometimes data are collected directly on a sphere, for example measurements over the Earth or 360-degree cameras that capture panoramic images and video. In other cases spherical symmetries must be taken into account, as in molecular chemistry or magnetic resonance imaging. The sphere is the most typical group setting, although other groups and their associated symmetries can also be considered.
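One simple way to build a group symmetry into a model is group averaging (symmetrization): average a function over every transformation in the group. This sketch uses the four 90-degree rotations of a square grid, the cyclic group C4, as a small discrete stand-in for the rotation group of the sphere. Spherical CNNs use considerably more sophisticated machinery. All names here are hypothetical.

```python
def rotate90(img):
    """Rotate a square image (list of rows) by 90 degrees counter-clockwise."""
    n = len(img)
    return [[img[j][n - 1 - i] for j in range(n)] for i in range(n)]

def c4_orbit(img):
    """All four rotations of the image: the orbit under the cyclic group C4."""
    orbit = [img]
    for _ in range(3):
        orbit.append(rotate90(orbit[-1]))
    return orbit

def make_c4_invariant(f):
    """Wrap f so that its output is averaged over all four rotations."""
    def f_invariant(img):
        return sum(f(g) for g in c4_orbit(img)) / 4
    return f_invariant

top_left = lambda img: img[0][0]  # deliberately NOT rotation invariant
invariant_top_left = make_c4_invariant(top_left)

img = [[1, 2], [3, 4]]
print(top_left(img), top_left(rotate90(img)))                      # 1 2
print(invariant_top_left(img), invariant_top_left(rotate90(img)))  # 2.5 2.5
```

Averaging works here because rotating the input only cycles through the same orbit. For large or continuous groups this brute-force average becomes expensive, which motivates dedicated equivariant layers.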

Graphs

The graph category includes data that can be represented as a graph of nodes and edges. Networks map naturally onto this representation, so graph deep learning has been widely used to study social networks. A graph can represent a wide range of data, which gives the graph approach to geometric deep learning tremendous versatility. However, this flexibility can come at the expense of specificity and the benefits that specificity provides. For instance, data in the group setting can often be treated with a graph approach, but doing so discards knowledge of the underlying group structure that could otherwise be exploited.

Geodesics & Gauges

The geodesics and gauges category involves deep learning on more complex shapes, such as general manifolds and 3D meshes. Such methods are very useful in computer vision and graphics, for instance, where models can learn directly from 3D shapes and their deformations.
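Geodesics are shortest paths along a surface. A common rough approximation on a triangle mesh runs Dijkstra's algorithm over the mesh edges, using edge lengths as weights. Because paths are restricted to edges, this generally overestimates true surface geodesics. The sketch assumes vertices are 3D coordinate tuples and faces are vertex-index triples. The function names are illustrative.

```python
import heapq
import math

def mesh_edges(faces):
    """Collect the undirected edges of a triangle mesh as (low, high) index pairs."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))
    return edges

def edge_geodesics(vertices, faces, source):
    """Approximate geodesic distance from `source` via Dijkstra along mesh edges."""
    adj = {i: [] for i in range(len(vertices))}
    for u, v in mesh_edges(faces):
        w = math.dist(vertices[u], vertices[v])  # Euclidean edge length
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {i: math.inf for i in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# A unit square split into two triangles along the diagonal 0-2.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
triangles = [(0, 1, 2), (0, 2, 3)]
print(edge_geodesics(square, triangles, source=0))
```

Exact geodesic algorithms and heat-based methods give better approximations. The edge-graph version is shown only because it is short.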

### How does geometric deep learning work?

Geometric deep learning algorithms are designed to process data with specific geometric structures such as graphs, point clouds, and meshes. In this respect they differ from traditional deep learning models, which work on regular Euclidean structures like images and audio.

Here is a general overview of how geometric deep learning works:

Data representation: Geometric deep learning algorithms take as input a geometric data structure such as a graph, point cloud, or mesh. These structures represent the underlying geometric relationships and patterns of the data.

Feature extraction: The next step is to extract features from the geometric data structure. These features can be derived from various sources such as node attributes, edge weights, or spatial coordinates.

Neural network architecture: Geometric deep learning models use a variety of neural network architectures such as graph neural networks (GNNs) or convolutional neural networks (CNNs) to process the features extracted from the geometric data structure. The network architecture can be customized to suit the specific geometric data structure being processed.

Message passing and aggregation: In GNNs, the network processes the graph structure through message passing and aggregation. At each node, the model aggregates information from its neighbours and uses it to update the node's hidden state. This process is repeated for a fixed number of iterations (typically one per layer) or until convergence.

Output prediction: After the network has processed the geometric data structure, it produces an output prediction. The output can be a classification, regression, or segmentation result, depending on the specific task.
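The steps above can be sketched end to end in plain Python. This is a toy under stated assumptions:

- the graph is given as an adjacency list;
- node features are short lists of floats;
- the update rule is hypothetical: a fixed 50/50 mix of a node's state and its neighbours' mean, followed by ReLU;
- the classifier weights are hand-picked rather than learned.

```python
# Hypothetical toy pipeline: graph -> message passing -> readout -> prediction.

def message_passing_step(features, neighbours, self_weight=0.5, neigh_weight=0.5):
    """One round: each node combines its own state with the mean of its neighbours' states."""
    new = {}
    for node, h in features.items():
        nbrs = neighbours[node]
        dim = len(h)
        if nbrs:
            agg = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        else:
            agg = [0.0] * dim
        # Weighted update followed by a ReLU non-linearity.
        new[node] = [max(0.0, self_weight * h[d] + neigh_weight * agg[d]) for d in range(dim)]
    return new

def readout(features):
    """Permutation-invariant sum over nodes: one vector for the whole graph."""
    return [sum(col) for col in zip(*features.values())]

def predict(features, neighbours, num_layers, weights, bias):
    """Run a fixed number of message-passing rounds, pool, and apply a linear classifier."""
    for _ in range(num_layers):
        features = message_passing_step(features, neighbours)
    graph_vec = readout(features)
    score = sum(w * x for w, x in zip(weights, graph_vec)) + bias
    return 1 if score > 0 else 0

# A path graph 0 - 1 - 2 with 2-dimensional node features.
neighbours = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
print(message_passing_step(features, neighbours))  # {0: [0.5, 0.5], 1: [0.5, 0.75], 2: [0.5, 1.0]}
print(predict(features, neighbours, num_layers=1, weights=[1.0, -1.0], bias=0.0))  # 0
```

Sum pooling in `readout` is what makes the graph-level prediction independent of how the nodes are numbered.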

Geometric deep learning algorithms have found applications in a wide range of fields, including computer vision, robotics, materials informatics and bioinformatics. As the field continues to grow, we can expect to see new and innovative methods for processing and analyzing complex geometric data structures.

### Building Blocks

There are several categories of geometric deep learning, as described above, and different types of geometric priors that can be exploited. Even so, all approaches essentially adopt different incarnations of a common set of building blocks tailored to geometric data structures. Here are some of the most common:

Graph Convolution: Graph convolution is a fundamental building block for graph neural networks (GNNs). It involves propagating information through the graph structure by aggregating information from neighbouring nodes. The aggregated information is then passed through a learnable function to produce a new representation of each node.

Pooling: Pooling is used to reduce the size of the input representation by aggregating information from several neighbouring nodes. In geometric deep learning, pooling is used to aggregate information from smaller subgraphs or patches of the input data structure.

Unpooling: Unpooling upsamples a coarsened representation, for example by copying each pooled feature back to the nodes it was aggregated from. In geometric deep learning, unpooling is used to recover the original resolution of the input data structure.

Edge Convolution: Edge convolution operates on the edges of a graph. It computes a feature for each edge by combining the features of the two nodes it connects. These edge features can then be aggregated back to update the nodes, as in the EdgeConv operator of DGCNN.

Point Convolution: Point convolution is a technique used to operate on point clouds. It involves computing new features at each point by combining the features of the neighboring points.

Mesh Convolution: Mesh convolution is a technique used to operate on meshes. It computes new features for mesh elements (vertices, edges, or faces) by aggregating information from adjacent elements.
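As an illustration of graph convolution, this sketch follows the propagation rule popularized by graph convolutional networks (GCNs): ReLU(D^-1/2 (A + I) D^-1/2 H W). Self-loops are added, and each neighbour's contribution is scaled by the degrees of both endpoints. The code uses dense Python lists for readability. A practical implementation would use sparse tensors, and the weight matrix here is supplied by hand rather than learned.

```python
import math

def gcn_layer(adj, h, weight):
    """One GCN-style layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) on dense list matrices.

    adj:    n x n adjacency matrix (0/1 entries, symmetric)
    h:      n x d node feature matrix
    weight: d x k weight matrix (hand-picked here; learnable in practice)
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own state.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Neighbourhood aggregation: norm @ h.
    d_in = len(h[0])
    agg = [[sum(norm[i][k] * h[k][d] for k in range(n)) for d in range(d_in)] for i in range(n)]
    # Linear transform followed by ReLU.
    d_out = len(weight[0])
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in))) for o in range(d_out)]
            for i in range(n)]

# Two connected nodes: after one layer both see the average of their features.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```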

These building blocks can be combined and customized to construct deep learning models that are tailored to handle geometric data structures. Geometric deep learning is a rapidly evolving field, and new building blocks and techniques are continually being developed to improve the performance of these models.

### Applications of Geometric Deep Learning

Geometric deep learning has found applications in various fields, including computer vision, robotics, computational biology, and social network analysis. Here are some of the most exciting applications of geometric deep learning:

3D Object Recognition and Segmentation: Geometric deep learning algorithms have been used to recognize and segment 3D objects in point clouds and meshes. This has applications in robotics, autonomous driving, and augmented reality.

Graph-based Semi-Supervised Learning: Geometric deep learning algorithms have been used to perform semi-supervised learning on graphs. Typically only a small fraction of nodes are labelled, and the goal is to infer labels for the rest. This has applications in social network analysis, for example classifying users from a few labelled examples.

Molecular Modelling: Geometric deep learning algorithms have been used to model molecular structures and predict chemical properties. This has applications in drug discovery and materials science.

Human Pose Estimation: Geometric deep learning algorithms have been used to estimate human poses from images or 3D data. This has applications in sports analysis, surveillance, and human-computer interaction.

Autonomous Driving: Geometric deep learning algorithms have been used to analyze 3D sensor data from self-driving cars. This has applications in navigation, object detection, and motion planning.

Computer-Aided Design: Geometric deep learning algorithms have been used to generate 3D models from 2D sketches or images. This has applications in architecture, product design, and digital art.

As the field of geometric deep learning continues to evolve, we can expect to see new and innovative applications in various fields.

##### 3D Object Recognition and Segmentation

3D object recognition and segmentation is one of the most exciting applications of geometric deep learning. Geometric deep learning algorithms can recognize and segment 3D objects in point clouds and meshes, which has applications in robotics, autonomous driving, and augmented reality.

Here are some of the key steps involved in 3D object recognition and segmentation using geometric deep learning:

Data Preparation: The first step is to prepare the input data, which can be a point cloud or a mesh. Point clouds are 3D representations of objects made up of a collection of points. Meshes are 3D representations made up of interconnected polygons, typically triangles.

Feature Extraction: The next step is to extract features from the input data. For point clouds, several methods are available:

- PointNet applies a shared network to each point independently and aggregates the results with a symmetric max-pooling operation.
- PointCNN applies convolutional filters directly to the point cloud.

For meshes, features can be extracted using methods such as MeshCNN, which applies convolution and pooling operations to the edges of the mesh.

Learning: The extracted features are then fed into a neural network to learn a representation of the input data. Convolutional neural networks (CNNs) are commonly used for this task, and can be designed specifically for processing point clouds or meshes.

Segmentation: Once the network has learned a representation of the input data, it can be used to segment the object into its constituent parts. This involves labeling each point or vertex of the input data with a corresponding object part label.

Recognition: The network can also recognize the object as a whole by assigning a single class label to it. This label is typically computed from a global feature that aggregates information from all points or vertices.
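The PointNet idea mentioned above can be sketched in a few lines. The same function is applied to every point, and the per-point features are then combined with a symmetric max-pooling, so the result does not depend on point order. The global feature would feed a classification head. Concatenating it with each point's local feature gives the input to a per-point segmentation head. This sketch uses a single hand-specified layer; the actual PointNet uses deeper learned MLPs and additional alignment networks.

```python
def shared_point_mlp(point, weights, biases):
    """ReLU(W p + b) applied to one point; the same W and b are used for every point."""
    return [max(0.0, sum(w * c for w, c in zip(row, point)) + b)
            for row, b in zip(weights, biases)]

def global_max_pool(features):
    """Element-wise max over points: a symmetric function, so point order is irrelevant."""
    return [max(col) for col in zip(*features)]

def pointnet_like(points, weights, biases):
    """Return (global feature for classification, per-point features for segmentation)."""
    local = [shared_point_mlp(p, weights, biases) for p in points]
    global_feat = global_max_pool(local)
    # Segmentation input: each point's local feature concatenated with the global feature.
    per_point = [f + global_feat for f in local]
    return global_feat, per_point

# Hypothetical weights: the identity map, so local features equal the coordinates.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cloud = [(0, 0, 1), (1, 0, 0), (0, 2, 0)]
g, seg_inputs = pointnet_like(cloud, identity, [0, 0, 0])
g_shuffled, _ = pointnet_like(list(reversed(cloud)), identity, [0, 0, 0])
print(g, g_shuffled)  # same global feature for both orderings
```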

3D object recognition and segmentation using geometric deep learning is a challenging task due to the complexity of 3D data and the large number of possible object configurations. However, with advances in neural network architectures and training techniques, we can expect to see continued progress in this area.

##### Graph-based Semi-Supervised Learning

Graph-based semi-supervised learning is another exciting application of geometric deep learning. It uses graph neural networks (GNNs) to learn from graphs in which only a small subset of nodes is labelled, with the goal of predicting labels for the remaining nodes. This has applications in social network analysis, recommendation systems, and fraud detection.

Here are some of the key steps involved in graph-based semi-supervised learning using geometric deep learning:

Graph Construction: The first step is to construct a graph from the input data. The nodes of the graph represent data points, and the edges represent relationships between the data points. The graph can be constructed using various methods, such as k-nearest neighbor graphs, epsilon graphs, or similarity graphs.

Feature Extraction: Once the graph is constructed, features are extracted from each node using methods such as node embedding, where each node is represented as a low-dimensional vector. This allows for efficient computation of the GNN.

Learning: The extracted features are then fed into a GNN, which is a neural network designed specifically for processing graph-structured data. GNNs propagate information through the graph by aggregating information from neighboring nodes. This lets them learn node representations that reflect both each node's own features and its neighbourhood.

Semi-Supervised Learning: The GNN is trained using only the labelled nodes, so the loss is computed on them alone. Message passing lets information from the labelled nodes propagate to their unlabelled neighbours. The trained network is then used to predict labels for the unlabelled nodes, such as the class each node is most likely to belong to.
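Two pieces of this pipeline are easy to isolate: building a k-nearest-neighbour graph from raw data points, and a loss that counts only labelled nodes. The sketch assumes points are coordinate tuples and that a GNN (not shown) has already produced one logit vector per node. `knn_graph` and `masked_cross_entropy` are hypothetical helper names.

```python
import math

def knn_graph(points, k):
    """Symmetric kNN graph: i and j are linked if either is among the other's k nearest."""
    n = len(points)
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs

def masked_cross_entropy(logits, labels, labelled):
    """Mean cross-entropy over labelled nodes only; unlabelled nodes add no loss."""
    total = 0.0
    for i in labelled:
        z = logits[i]
        m = max(z)  # subtract the max for numerical stability (log-sum-exp trick)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_sum_exp - z[labels[i]]
    return total / len(labelled)

# Two well-separated pairs of points form two components.
print(knn_graph([(0, 0), (1, 0), (10, 0), (11, 0)], k=1))  # {0: {1}, 1: {0}, 2: {3}, 3: {2}}

# Node 1 is unlabelled, so its (arbitrary) logits do not affect the loss.
logits = {0: [0.0, 0.0], 1: [100.0, -100.0]}
print(masked_cross_entropy(logits, labels={0: 1}, labelled=[0]))  # log(2) ≈ 0.693
```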

Graph-based semi-supervised learning using geometric deep learning is a powerful tool for analyzing complex data structures such as social networks or recommendation systems. By leveraging the graph structure, GNNs can learn powerful representations that capture the underlying relationships between the data points.

##### Molecular Modelling

Molecular modeling is another important application of geometric deep learning. It involves using geometric deep learning algorithms to model the behavior of molecules and predict their properties, such as their energy levels and reaction rates. This has applications in drug discovery, materials science, and chemical engineering.

Here are some of the key steps involved in molecular modeling using geometric deep learning:

Data Preparation: The first step is to prepare the input data, which can be a set of molecular structures given as atom types with their 3D coordinates, possibly together with bond information.

Feature Extraction: The next step is to extract features from the input data. This involves converting the 3D molecular structures into graph representations, where atoms are nodes and bonds are edges. Features can then be extracted using methods such as node embedding, where each atom is represented as a low-dimensional vector.

Learning: The extracted features are then fed into a GNN, which is a neural network designed specifically for processing graph-structured data. GNNs can learn representations of molecular structures that capture the underlying relationships between atoms and bonds, allowing for accurate predictions of molecular properties.

Property Prediction: Once the GNN has learned a representation of the molecular structure, it can be used to predict various molecular properties, such as energy levels or reaction rates. This involves training the network on a dataset of known molecular properties, and then using the learned representation to make predictions for new molecules.
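The sketch below covers the first three steps in rough form:

1. It turns 3D atomic coordinates into a graph using a single distance cutoff.
2. It runs a few rounds of sum-based message passing over one-hot atom types.
3. It sums over atoms to get a molecule-level vector.

The 1.2 Å cutoff is a crude assumption. It works for the O-H bonds in this water example but not in general, because real bond perception uses element-specific radii. No learned weights are involved, and all names are hypothetical.

```python
import math

# Hypothetical atom features: one-hot encoding over a tiny element vocabulary.
ELEMENTS = ["H", "C", "N", "O"]

def one_hot(symbol):
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]

def bonds_from_coordinates(coords, cutoff=1.2):
    """Treat atom pairs closer than `cutoff` (angstroms) as bonded; a crude heuristic."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(coords[i], coords[j]) < cutoff]

def molecule_embedding(symbols, coords, rounds=2):
    """Message passing over the bond graph followed by a sum readout."""
    h = [one_hot(s) for s in symbols]
    nbrs = {i: [] for i in range(len(symbols))}
    for i, j in bonds_from_coordinates(coords):
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(rounds):
        # Each atom adds its bonded neighbours' features to its own (built from the old h).
        h = [[h[i][d] + sum(h[j][d] for j in nbrs[i]) for d in range(len(ELEMENTS))]
             for i in range(len(h))]
    # Sum readout: a molecule-level vector that a regression head could map to a property.
    return [sum(col) for col in zip(*h)]

# Approximate water geometry (angstroms): O at the origin, two H atoms about 0.96 away.
water_symbols = ["O", "H", "H"]
water_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
print(bonds_from_coordinates(water_coords))              # [(0, 1), (0, 2)]
print(molecule_embedding(water_symbols, water_coords))   # [10.0, 0.0, 0.0, 7.0]
```

A property-prediction head, such as a linear regression trained on known molecular energies, would then map this vector to a number.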

Molecular modeling using geometric deep learning is a powerful tool for predicting the properties of molecules and accelerating drug discovery and materials science. By leveraging the graph structure of molecular structures, GNNs can learn representations that capture the underlying relationships between atoms and bonds, leading to more accurate predictions of molecular properties.

##### Human Pose Estimation

Human pose estimation is another important application of geometric deep learning. It involves using geometric deep learning algorithms to estimate the pose of a human body from an image or video. This has applications in robotics, virtual reality, and human-computer interaction.

Here are some of the key steps involved in human pose estimation using geometric deep learning:

Data Preparation: The first step is to prepare the input data, which can be images or videos of humans performing various actions.

Feature Extraction: The next step is to extract features from the input data. This involves using a convolutional neural network (CNN) to extract features from the image or video frames. These features can then be fed into a GNN for further processing.

Learning: The extracted features are then fed into a GNN, which is a neural network designed specifically for processing graph-structured data. Here the graph is typically the body skeleton, with joints as nodes and bones as edges. GNNs can learn representations of the human body that capture the underlying relationships between body parts, allowing for accurate predictions of the pose.

Pose Estimation: Once the GNN has learned a representation of the human body, it can be used to estimate the pose of the body. This involves predicting the position and orientation of various body parts, such as the arms, legs, and torso.
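The skeleton graph can be written down explicitly. The sketch uses a hypothetical, simplified upper-body skeleton. Keypoints are assumed to be 2D positions, for example peaks of CNN heatmaps. Bone lengths serve as simple edge features. One averaging step over each joint's neighbourhood stands in for a learned graph-convolution layer.

```python
import math

# Hypothetical simplified upper-body skeleton: joints are nodes, bones are edges.
JOINTS = ["head", "neck", "l_shoulder", "r_shoulder", "l_elbow", "r_elbow"]
BONES = [("head", "neck"), ("neck", "l_shoulder"), ("neck", "r_shoulder"),
         ("l_shoulder", "l_elbow"), ("r_shoulder", "r_elbow")]

def skeleton_neighbours():
    """Adjacency list of the skeleton graph."""
    nbrs = {j: [] for j in JOINTS}
    for a, b in BONES:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs

def bone_lengths(keypoints):
    """Edge features: the length of each bone computed from 2D keypoint estimates."""
    return {(a, b): math.dist(keypoints[a], keypoints[b]) for a, b in BONES}

def aggregate_over_skeleton(features):
    """One propagation step: each joint averages its feature vector with its neighbours'."""
    nbrs = skeleton_neighbours()
    out = {}
    for j in JOINTS:
        group = [features[j]] + [features[n] for n in nbrs[j]]
        out[j] = [sum(v[d] for v in group) / len(group) for d in range(len(features[j]))]
    return out

keypoints = {"head": (0, 3), "neck": (0, 2), "l_shoulder": (-1, 2),
             "r_shoulder": (1, 2), "l_elbow": (-1, 0), "r_elbow": (1, 0)}
print(bone_lengths(keypoints)[("l_shoulder", "l_elbow")])  # 2.0
print(aggregate_over_skeleton(keypoints)["neck"])           # [0.0, 2.25]
```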

Human pose estimation using geometric deep learning is a powerful tool for analyzing human motion and interaction. By leveraging the graph structure of the human body, GNNs can learn representations that capture the underlying relationships between body parts, leading to more accurate predictions of human pose.

##### Autonomous Driving

Autonomous driving is another important application of geometric deep learning. It involves using geometric deep learning algorithms to process sensor data and make decisions about vehicle control in real-time. This has applications in transportation, logistics, and public safety.

Here are some of the key steps involved in autonomous driving using geometric deep learning:

Data Preparation: The first step is to prepare the input data, which can be sensor data such as lidar, radar, and camera data. This data is often noisy and incomplete, and must be preprocessed to remove noise and fill in missing information.

Feature Extraction: The next step is to extract features from the preprocessed data. This involves using various computer vision techniques to extract relevant features from the sensor data, such as object detection and segmentation.

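Learning: The extracted features are then fed into a GNN, which is a neural network designed specifically for processing graph-structured data. The environment can be represented as a graph in which detected objects, such as vehicles and pedestrians, are nodes, and edges connect objects that are close enough to interact. GNNs can learn representations of this graph that capture the underlying relationships between objects, allowing for accurate predictions of the vehicle's state and the surrounding environment.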

Decision Making: Once the GNN has learned a representation of the environment, it can be used to make decisions about vehicle control in real-time. This involves predicting the trajectory of the vehicle and making decisions about acceleration, braking, and steering.
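Representing the scene as a graph is the step that lets a GNN reason about interactions. The sketch assumes each detected object has a 2D position in metres. Objects within a fixed radius are connected (20 m here, an arbitrary choice), and the relative position is stored as the edge feature, which is one simple option. The object dictionaries and function names are hypothetical.

```python
import math

def build_scene_graph(objects, radius=20.0):
    """Directed edges between objects within `radius` metres of each other.

    Edge (i, j) stores the position of object j relative to object i.
    """
    edges = {}
    for i, a in enumerate(objects):
        for j, b in enumerate(objects):
            if i != j and math.dist(a["pos"], b["pos"]) <= radius:
                edges[(i, j)] = (b["pos"][0] - a["pos"][0], b["pos"][1] - a["pos"][1])
    return edges

def neighbours(edges, i):
    """Objects whose states a GNN would aggregate when updating object i."""
    return sorted(j for (src, j) in edges if src == i)

scene = [
    {"cls": "ego", "pos": (0.0, 0.0)},
    {"cls": "car", "pos": (10.0, 0.0)},
    {"cls": "pedestrian", "pos": (5.0, 5.0)},
    {"cls": "truck", "pos": (100.0, 0.0)},  # too far away to be connected
]
edges = build_scene_graph(scene)
print(neighbours(edges, 0))  # [1, 2]
print(edges[(0, 2)])         # (5.0, 5.0)
```

Relative rather than absolute positions make the edge features independent of where the scene sits in the world frame, a common design choice for this kind of model.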

Autonomous driving using geometric deep learning is a powerful tool for improving transportation safety and efficiency. By leveraging the graph structure of the environment, GNNs can learn representations that capture the underlying relationships between objects, leading to more accurate predictions of the vehicle's state and the surrounding environment, and enabling safe and efficient autonomous driving.

##### Computer-Aided Design

Computer-Aided Design (CAD) is another important application of geometric deep learning. It involves using geometric deep learning algorithms to analyze and design 3D models, allowing engineers and designers to create complex and detailed designs more quickly and accurately.

Here are some of the key steps involved in CAD using geometric deep learning:

Data Preparation: The first step is to prepare the input data, which can be a set of 3D models represented as point clouds or meshes.

Feature Extraction: The next step is to extract features from the input data. This involves using techniques such as convolutional neural networks (CNNs) to extract features from the point clouds or meshes, such as shape descriptors, curvature, and texture information.

Learning: The extracted features are then fed into a GNN, which is a neural network designed specifically for processing graph-structured data. GNNs can learn representations of the 3D models that capture the underlying relationships between the vertices and edges of the mesh, allowing for accurate predictions of various properties, such as segmentation, classification, and shape completion.

Design: Once the GNN has learned a representation of the 3D model, it can be used to generate new designs or modify existing ones. This involves using the learned representation to predict the effects of various design changes, allowing designers to optimize the design for various criteria, such as strength, weight, or cost.
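A mesh becomes a graph by taking its vertices as nodes and the sides of its faces as edges. The sketch builds that graph from a face list and computes a simple geometric descriptor: the uniform Laplacian coordinate, which is each vertex minus the mean of its neighbours. This descriptor is zero on flat interior regions and roughly tracks curvature elsewhere. It is an illustration with hypothetical names, not a feature extractor from any particular CAD tool.

```python
def vertex_neighbours(num_vertices, faces):
    """Vertex adjacency derived from the sides of each polygonal face."""
    nbrs = [set() for _ in range(num_vertices)]
    for face in faces:
        for i in range(len(face)):
            u, v = face[i], face[(i + 1) % len(face)]
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs

def laplacian_coordinates(vertices, faces):
    """Uniform (graph) Laplacian: each vertex position minus the mean of its neighbours."""
    nbrs = vertex_neighbours(len(vertices), faces)
    deltas = []
    for i, v in enumerate(vertices):
        k = len(nbrs[i])
        if k == 0:
            deltas.append((0.0, 0.0, 0.0))  # isolated vertex: no neighbourhood
            continue
        mean = [sum(vertices[j][d] for j in nbrs[i]) / k for d in range(3)]
        deltas.append(tuple(v[d] - mean[d] for d in range(3)))
    return deltas

# A four-sided pyramid: apex 0 above a square ring of vertices 1-4.
ring = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
faces = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]
print(laplacian_coordinates([(0, 0, 1)] + ring, faces)[0])  # (0.0, 0.0, 1.0): a peak
print(laplacian_coordinates([(0, 0, 0)] + ring, faces)[0])  # (0.0, 0.0, 0.0): flat
```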

CAD using geometric deep learning is a powerful tool for improving the design process, allowing designers to create complex and detailed designs more quickly and accurately. By leveraging the graph structure of 3D models, GNNs can learn representations that capture the underlying relationships between the vertices and edges of the mesh, leading to more accurate predictions of various properties and enabling more efficient and effective design.

### Conclusion

Geometric deep learning is a subfield of deep learning that aims to develop machine learning models capable of handling and processing data with a specific geometric structure, such as graphs, point clouds, meshes, and manifolds.

In contrast to traditional deep learning methods, which mainly focus on data represented in Euclidean space, geometric deep learning algorithms can handle non-Euclidean data structures by leveraging graph-based neural networks and other advanced techniques.

Geometric deep learning has applications in various fields, including computer vision, 3D modeling, bioinformatics, social network analysis, and recommender systems.

Overall, the wide range of applications of geometric deep learning makes it an exciting area of research with the potential to revolutionize many fields, from medicine to transportation to entertainment.

### References

Bronstein, M. M., Bruna, J., Cohen, T., & Veličković, P. (2021). Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv:2104.13478.

Hoffman, Jon. (2019). Cramnet: Layer-wise Deep Neural Network Compression with Knowledge

Bronstein, M. M. Geometric foundations of Deep Learning. Towards Data Science. https://towardsdatascience.com/geometric-foundations-of-deep-learning-94cdd45b451d?source=social.tw

McEwen, Wallis, & Mavor-Parker (2022). Scattering Networks on the Sphere for Scalable and Rotationally Equivariant Spherical CNNs. ICLR. arXiv:2102.02828.

Cobb, Wallis, Mavor-Parker, Marignier, Price, d’Avezac, & McEwen (2021). Efficient Generalised Spherical CNNs. ICLR. arXiv:2010.11661.