datasets
GATNE dataset
-
class
cogdl.datasets.gatne.
GatneDataset
(root, name)[source]¶ Bases:
cogdl.data.dataset.Dataset
The network datasets “Amazon”, “Twitter” and “YouTube” from the “Representation Learning for Attributed Multiplex Heterogeneous Network” paper.
- Args:
root (string): Root directory where the dataset should be saved.
name (string): The name of the dataset ("Amazon", "Twitter", "YouTube").
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
url
= 'https://github.com/THUDM/GATNE/raw/master/data'¶
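The skip behaviour described for raw_file_names and processed_file_names can be pictured with a minimal sketch. This is not cogdl's implementation: files_exist, prepare, and demo are hypothetical helpers, and the file names used in demo are placeholders chosen only for illustration.

```python
import os
import tempfile


def files_exist(directory, file_names):
    """Return True only if every expected file is present in `directory`."""
    return all(os.path.isfile(os.path.join(directory, name)) for name in file_names)


def prepare(raw_dir, processed_dir, raw_file_names, processed_file_names,
            download, process):
    """Run download/process only when their expected output files are missing.

    Returns the list of steps that were actually executed.
    """
    steps = []
    if not files_exist(raw_dir, raw_file_names):
        download()
        steps.append("download")
    if not files_exist(processed_dir, processed_file_names):
        process()
        steps.append("process")
    return steps


def demo():
    """Call prepare twice; the second call should find the files and skip both steps."""
    with tempfile.TemporaryDirectory() as root:
        raw_dir = os.path.join(root, "raw")
        processed_dir = os.path.join(root, "processed")
        os.makedirs(raw_dir)
        os.makedirs(processed_dir)

        # Placeholder download/process steps that only create empty files.
        def download():
            open(os.path.join(raw_dir, "edges.txt"), "w").close()

        def process():
            open(os.path.join(processed_dir, "data.pt"), "w").close()

        first = prepare(raw_dir, processed_dir, ["edges.txt"], ["data.pt"],
                        download, process)
        second = prepare(raw_dir, processed_dir, ["edges.txt"], ["data.pt"],
                         download, process)
        return first, second
```

In this sketch the first call runs both steps and the second call runs neither, because the expected files are already in place.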
GCC dataset
-
class
cogdl.datasets.gcc_data.
Edgelist
(root, name)[source]¶ Bases:
cogdl.data.dataset.Dataset
-
num_classes
¶ The number of classes in the dataset.
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
url
= 'https://github.com/cenyk1230/gcc-data/raw/master'¶
-
-
class
cogdl.datasets.gcc_data.
GCCDataset
(root, name)[source]¶ Bases:
cogdl.data.dataset.Dataset
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
url
= 'https://github.com/cenyk1230/gcc-data/raw/master'¶
-
GTN dataset
-
class
cogdl.datasets.gtn_data.
GTNDataset
(root, name)[source]¶ Bases:
cogdl.data.dataset.Dataset
The network datasets “ACM”, “DBLP” and “IMDB” from the “Graph Transformer Networks” paper.
- Args:
root (string): Root directory where the dataset should be saved.
name (string): The name of the dataset ("gtn-acm", "gtn-dblp", "gtn-imdb").
-
num_classes
¶ The number of classes in the dataset.
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
HAN dataset
-
class
cogdl.datasets.han_data.
HANDataset
(root, name)[source]¶ Bases:
cogdl.data.dataset.Dataset
The network datasets “ACM”, “DBLP” and “IMDB” from the “Heterogeneous Graph Attention Network” paper.
- Args:
root (string): Root directory where the dataset should be saved.
name (string): The name of the dataset ("han-acm", "han-dblp", "han-imdb").
-
num_classes
¶ The number of classes in the dataset.
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
KG dataset
-
class
cogdl.datasets.kg_data.
BidirectionalOneShotIterator
(dataloader_head, dataloader_tail)[source]¶ Bases:
object
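The constructor takes one loader for head batches and one for tail batches. A plausible reading, sketched below, is an endless iterator that alternates between the two. This is an assumption based on the signature and class name only: AlternatingIterator and endless are hypothetical names, and the real class may order or restart its loaders differently.

```python
def endless(loader):
    """Yield batches forever, restarting the loader each time it is exhausted.

    Note: an empty loader would make this loop spin without yielding.
    """
    while True:
        for batch in loader:
            yield batch


class AlternatingIterator:
    """Alternate between two endless batch streams (hypothetical sketch)."""

    def __init__(self, loader_head, loader_tail):
        self.iterator_head = endless(loader_head)
        self.iterator_tail = endless(loader_tail)
        self.step = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Even steps draw from the head stream, odd steps from the tail stream.
        source = self.iterator_head if self.step % 2 == 0 else self.iterator_tail
        self.step += 1
        return next(source)
```

Plain lists stand in for data loaders here: `AlternatingIterator(["h1", "h2"], ["t1"])` yields h1, t1, h2, t1, h1, t1 and so on without ever raising StopIteration.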
-
class
cogdl.datasets.kg_data.
FB13SDatset
[source]¶ Bases:
cogdl.datasets.kg_data.KnowledgeGraphDataset
-
url
= 'https://raw.githubusercontent.com/cenyk1230/test-data/main'¶
-
-
class
cogdl.datasets.kg_data.
KnowledgeGraphDataset
(root, name)[source]¶ Bases:
cogdl.data.dataset.Dataset
-
num_entities
¶
-
num_relations
¶
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
test_start_idx
¶
-
train_start_idx
¶
-
url
= 'https://raw.githubusercontent.com/thunlp/OpenKE/OpenKE-PyTorch/benchmarks'¶
-
valid_start_idx
¶
-
-
class
cogdl.datasets.kg_data.
TestDataset
(triples, all_true_triples, nentity, nrelation, mode)[source]¶ Bases:
torch.utils.data.dataset.Dataset
-
class
cogdl.datasets.kg_data.
TrainDataset
(triples, nentity, nrelation, negative_sample_size, mode)[source]¶ Bases:
torch.utils.data.dataset.Dataset
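The negative_sample_size and mode parameters suggest that training examples are paired with corrupted triples. The sketch below shows one common way to do this and is not taken from the cogdl source: sample_negatives is a hypothetical helper, and the mode values "head" and "tail" are assumptions made for illustration.

```python
import random


def sample_negatives(triple, true_triples, nentity, negative_sample_size, mode, rng):
    """Sample entity ids that turn `triple` into a triple not found in `true_triples`.

    mode == "head" replaces the head entity, mode == "tail" replaces the tail.
    Sampling is with replacement. The loop assumes enough valid candidates exist;
    it would not terminate if every corruption were a known true triple.
    """
    if mode not in ("head", "tail"):
        raise ValueError("mode must be 'head' or 'tail'")
    head, relation, tail = triple
    negatives = []
    while len(negatives) < negative_sample_size:
        candidate = rng.randrange(nentity)
        if mode == "head":
            corrupted = (candidate, relation, tail)
        else:
            corrupted = (head, relation, candidate)
        # Keep only corruptions that are not known true triples.
        if corrupted not in true_triples:
            negatives.append(candidate)
    return negatives


# Usage: entities 0 and 2 are valid heads for (?, 0, 1), so they are never sampled.
true_triples = {(0, 0, 1), (2, 0, 1)}
negative_heads = sample_negatives((0, 0, 1), true_triples, nentity=5,
                                  negative_sample_size=4, mode="head",
                                  rng=random.Random(0))
```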
Matlab matrix dataset
-
class
cogdl.datasets.matlab_matrix.
DblpNEDataset
[source]¶ Bases:
cogdl.datasets.matlab_matrix.NetworkEmbeddingCMTYDataset
-
class
cogdl.datasets.matlab_matrix.
MatlabMatrix
(root, name, url)[source]¶ Bases:
cogdl.data.dataset.Dataset
Networks from http://leitang.net/code/social-dimension/data/ or http://snap.stanford.edu/node2vec/
- Args:
root (string): Root directory where the dataset should be saved.
name (string): The name of the dataset ("Blogcatalog").
-
num_classes
¶ The number of classes in the dataset.
-
num_nodes
¶
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
class
cogdl.datasets.matlab_matrix.
NetworkEmbeddingCMTYDataset
(root, name, url)[source]¶ Bases:
cogdl.data.dataset.Dataset
-
num_classes
¶ The number of classes in the dataset.
-
num_nodes
¶
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
-
class
cogdl.datasets.matlab_matrix.
YoutubeNEDataset
[source]¶ Bases:
cogdl.datasets.matlab_matrix.NetworkEmbeddingCMTYDataset
PyG OGB dataset
-
class
cogdl.datasets.ogb.
OGBGDataset
(root, name)[source]¶ Bases:
cogdl.data.dataset.Dataset
-
num_classes
¶ The number of classes in the dataset.
-
PyG strategies dataset
This file is borrowed from https://github.com/snap-stanford/pretrain-gnns/
-
class
cogdl.datasets.strategies_data.
BACEDataset
(transform=None, pre_transform=None, pre_filter=None, empty=False)[source]¶ Bases:
cogdl.data.dataset.MultiGraphDataset
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
-
class
cogdl.datasets.strategies_data.
BBBPDataset
(transform=None, pre_transform=None, pre_filter=None, empty=False)[source]¶ Bases:
cogdl.data.dataset.MultiGraphDataset
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
-
class
cogdl.datasets.strategies_data.
BatchAE
(batch=None, **kwargs)[source]¶ Bases:
cogdl.data.data.Data
-
cat_dim
(key)[source]¶ Returns the dimension in which the attribute
key
with content value
gets concatenated when creating batches.
Note: This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.
-
static
from_data_list
(data_list)[source]¶ Constructs a batch object from a python list holding
torch_geometric.data.Data
objects. The assignment vector batch
is created on the fly.
-
num_graphs
¶ Returns the number of graphs in the batch.
-
-
class
cogdl.datasets.strategies_data.
BatchFinetune
(batch=None, **kwargs)[source]¶ Bases:
cogdl.data.data.Data
-
static
from_data_list
(data_list)[source]¶ Constructs a batch object from a python list holding
torch_geometric.data.Data
objects. The assignment vector batch
is created on the fly.
-
num_graphs
¶ Returns the number of graphs in the batch.
-
-
class
cogdl.datasets.strategies_data.
BatchMasking
(batch=None, **kwargs)[source]¶ Bases:
cogdl.data.data.Data
-
cumsum
(key, item)[source]¶ If
True
, the attribute key
with content item
should be added up cumulatively before being concatenated together.
Note: This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.
-
static
from_data_list
(data_list)[source]¶ Constructs a batch object from a python list holding
torch_geometric.data.Data
objects. The assignment vector batch
is created on the fly.
-
num_graphs
¶ Returns the number of graphs in the batch.
-
-
class
cogdl.datasets.strategies_data.
BatchSubstructContext
(batch=None, **kwargs)[source]¶ Bases:
cogdl.data.data.Data
-
cat_dim
(key)[source]¶ Returns the dimension in which the attribute
key
with content value
gets concatenated when creating batches.
Note: This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.
-
cumsum
(key, item)[source]¶ If
True
, the attribute key
with content item
should be added up cumulatively before being concatenated together.
Note: This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.
-
static
from_data_list
(data_list)[source]¶ Constructs a batch object from a python list holding
torch_geometric.data.Data
objects. The assignment vector batch
is created on the fly.
-
num_graphs
¶ Returns the number of graphs in the batch.
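The roles of cumsum, from_data_list, and the batch assignment vector can be illustrated with a dependency-free sketch. It is not the code of these classes: collate_graphs is a hypothetical helper, graphs are plain dicts rather than Data objects, and only the edge index is treated as a cumulatively shifted attribute.

```python
def collate_graphs(graphs):
    """Merge small graphs into one large disconnected graph.

    Each graph is a dict with 'num_nodes' and 'edge_index', where 'edge_index'
    is a pair of lists (sources, targets) using node ids local to that graph.
    """
    batch = []           # batch[i] is the index of the graph that node i came from
    sources, targets = [], []
    offset = 0           # number of nodes contributed by the graphs seen so far
    for graph_id, graph in enumerate(graphs):
        src, dst = graph["edge_index"]
        # Node ids are shifted by the running node count (the "cumsum" idea),
        # so edges of different graphs never collide after concatenation.
        sources.extend(s + offset for s in src)
        targets.extend(d + offset for d in dst)
        batch.extend([graph_id] * graph["num_nodes"])
        offset += graph["num_nodes"]
    return {
        "edge_index": (sources, targets),
        "batch": batch,
        "num_graphs": len(graphs),
    }


# Usage: a 2-node graph and a 3-node graph become one 5-node graph.
g1 = {"num_nodes": 2, "edge_index": ([0, 1], [1, 0])}
g2 = {"num_nodes": 3, "edge_index": ([0, 1], [1, 2])}
merged = collate_graphs([g1, g2])
```

Here the second graph's node ids are shifted by 2, and the batch vector records which original graph each of the five nodes belongs to.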
-
-
class
cogdl.datasets.strategies_data.
BioDataset
(data_type='unsupervised', empty=False, transform=None, pre_transform=None, pre_filter=None)[source]¶ Bases:
cogdl.data.dataset.MultiGraphDataset
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
-
class
cogdl.datasets.strategies_data.
ChemExtractSubstructureContextPair
(k, l1, l2)[source]¶ Bases:
object
-
class
cogdl.datasets.strategies_data.
DataLoaderAE
(dataset, batch_size=1, shuffle=True, **kwargs)[source]¶ Bases:
torch.utils.data.dataloader.DataLoader
-
class
cogdl.datasets.strategies_data.
DataLoaderFinetune
(dataset, batch_size=1, shuffle=True, **kwargs)[source]¶ Bases:
torch.utils.data.dataloader.DataLoader
-
class
cogdl.datasets.strategies_data.
DataLoaderMasking
(dataset, batch_size=1, shuffle=True, **kwargs)[source]¶ Bases:
torch.utils.data.dataloader.DataLoader
-
class
cogdl.datasets.strategies_data.
DataLoaderSubstructContext
(dataset, batch_size=1, shuffle=True, **kwargs)[source]¶ Bases:
torch.utils.data.dataloader.DataLoader
-
class
cogdl.datasets.strategies_data.
ExtractSubstructureContextPair
(l1, center=True)[source]¶ Bases:
object
-
class
cogdl.datasets.strategies_data.
MaskAtom
(num_atom_type, num_edge_type, mask_rate, mask_edge=True)[source]¶ Bases:
object
Borrowed from https://github.com/snap-stanford/pretrain-gnns/
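A rough picture of what a mask_rate parameter usually controls is sketched below. This is an illustration under stated assumptions, not the borrowed implementation: mask_nodes is a hypothetical helper, node features are reduced to a single integer label, and the mask token is assumed to be an id outside the range of real labels.

```python
import random


def mask_nodes(node_labels, mask_rate, mask_token, rng):
    """Replace a random fraction of node labels with `mask_token`.

    Returns (masked_labels, masked_indices, original_labels_at_those_indices).
    At least one node is masked whenever the graph is non-empty.
    """
    num_nodes = len(node_labels)
    if num_nodes == 0:
        return [], [], []
    sample_size = max(1, int(num_nodes * mask_rate))
    masked_indices = sorted(rng.sample(range(num_nodes), sample_size))
    # Remember the true labels so a model can be trained to predict them.
    original = [node_labels[i] for i in masked_indices]
    masked = list(node_labels)
    for i in masked_indices:
        masked[i] = mask_token
    return masked, masked_indices, original


# Usage: mask 30% of ten nodes; 119 is an arbitrary placeholder mask id.
labels = [5, 6, 7, 8, 5, 6, 7, 8, 5, 6]
masked, indices, original = mask_nodes(labels, mask_rate=0.3, mask_token=119,
                                       rng=random.Random(0))
```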
-
class
cogdl.datasets.strategies_data.
MaskEdge
(mask_rate)[source]¶ Bases:
object
Borrowed from https://github.com/snap-stanford/pretrain-gnns/
-
class
cogdl.datasets.strategies_data.
MoleculeDataset
(data_type='unsupervised', transform=None, pre_transform=None, pre_filter=None, empty=False)[source]¶ Bases:
cogdl.data.dataset.MultiGraphDataset
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
-
class
cogdl.datasets.strategies_data.
NegativeEdge
[source]¶ Bases:
object
Borrowed from https://github.com/snap-stanford/pretrain-gnns/
-
class
cogdl.datasets.strategies_data.
TestBioDataset
(data_type='unsupervised', root='testbio', transform=None, pre_transform=None, pre_filter=None)[source]¶ Bases:
cogdl.data.dataset.MultiGraphDataset
-
class
cogdl.datasets.strategies_data.
TestChemDataset
(data_type='unsupervised', root='testchem', transform=None, pre_transform=None, pre_filter=None)[source]¶ Bases:
cogdl.data.dataset.MultiGraphDataset
-
cogdl.datasets.strategies_data.
graph_data_obj_to_nx_simple
(data)[source]¶ Converts a graph Data object, as required by the PyTorch Geometric package, to a NetworkX graph object.
NB: Uses simplified atom and bond features, represented as indices.
NB: There are possible issues with recapitulating relative stereochemistry, since the edges in the nx object are unordered.
:param data: PyTorch Geometric Data object
:return: NetworkX graph object
-
cogdl.datasets.strategies_data.
nx_to_graph_data_obj
(g, center_id, allowable_features_downstream=None, allowable_features_pretrain=None, node_id_to_go_labels=None)[source]¶
-
cogdl.datasets.strategies_data.
nx_to_graph_data_obj_simple
(G)[source]¶ Converts an nx graph to a PyTorch Geometric Data object. Assumes node indices are numbered from 0 to num_nodes - 1.
NB: Uses simplified atom and bond features, represented as indices.
NB: There are possible issues with recapitulating relative stereochemistry, since the edges in the nx object are unordered.
:param G: nx graph object
:return: PyTorch Geometric Data object
TU dataset
-
class
cogdl.datasets.tu_data.
TUDataset
(root, name)[source]¶ Bases:
cogdl.data.dataset.Dataset
-
num_classes
¶ The number of classes in the dataset.
-
num_edge_attributes
¶
-
num_edge_labels
¶
-
num_node_attributes
¶
-
num_node_labels
¶
-
processed_file_names
¶ The name of the files to find in the
self.processed_dir
folder in order to skip the processing.
-
raw_file_names
¶ The name of the files to find in the
self.raw_dir
folder in order to skip the download.
-
url
= 'https://www.chrsmrrs.com/graphkerneldatasets'¶
-
-
cogdl.datasets.tu_data.
parse_txt_array
(src, sep=None, start=0, end=None, dtype=None, device=None)[source]¶
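The signature suggests a helper that splits each text line on sep and keeps the columns in the range start to end. The sketch below follows that reading and is an assumption, not the function's actual body: parse_rows is a hypothetical name, it returns nested Python lists rather than a tensor, and it has no device argument.

```python
def parse_rows(lines, sep=None, start=0, end=None, dtype=float):
    """Split each line on `sep`, keep columns [start:end], and convert with `dtype`.

    With sep=None, str.split separates on any run of whitespace.
    """
    return [[dtype(token) for token in line.split(sep)[start:end]] for line in lines]


# Usage: comma-separated rows, skipping the first column.
rows = parse_rows(["1, 2, 3", "4, 5, 6"], sep=",", start=1, dtype=int)
```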
Module contents
-
cogdl.datasets.
register_dataset
(name)[source]¶ New dataset types can be added to cogdl with the
register_dataset()
function decorator. For example:
@register_dataset('my_dataset')
class MyDataset():
    (...)
- Args:
- name (str): the name of the dataset
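The decorator pattern used for registration can be sketched in a few lines. This is a generic registry written for illustration, not cogdl's source: SKETCH_REGISTRY, register_sketch, and build_sketch are hypothetical names, and the duplicate-name check is an assumed design choice.

```python
SKETCH_REGISTRY = {}  # maps a dataset name to the class registered under it


def register_sketch(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        if name in SKETCH_REGISTRY:
            raise ValueError("Cannot register duplicate dataset ({})".format(name))
        SKETCH_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


def build_sketch(name, *args, **kwargs):
    """Look up a registered class by name and instantiate it."""
    return SKETCH_REGISTRY[name](*args, **kwargs)


@register_sketch("my_dataset")
class MyDataset:
    def __init__(self, root="data"):
        self.root = root


# Usage: construct the dataset from its registered name.
dataset = build_sketch("my_dataset", root="/tmp/my_dataset")
```

A registry like this lets calling code select a dataset by a string, for example from a command-line argument, without importing each dataset class by hand.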