KDD 2020 will be held in San Diego, CA, USA from August 23 to 27, 2020. The Automatic Graph Representation Learning challenge (AutoGraph), the first AutoML challenge applied to graph-structured data, is the AutoML track challenge in KDD Cup 2020 provided by 4Paradigm, ChaLearn, Stanford and Google. The challenge website can be found here: https://www.automl.ai/competitions/3
Graph-structured data are ubiquitous in the real world: social networks, scholarly networks, knowledge graphs, and so on. Graph representation learning has become a very active topic. Its goal is to learn a low-dimensional representation of each node in the graph, which is then used for downstream tasks such as recommending friends in a social network or classifying academic papers into subjects in a citation network. Traditionally, heuristics are used to extract features for each node from the graph, e.g., degree statistics or random walk based similarities. In recent years, however, more sophisticated models such as graph neural networks (GNN) have been proposed for graph representation learning, and they have led to state-of-the-art results in many tasks, such as node classification and link prediction.
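As an illustration of the traditional heuristic approach mentioned above, the following minimal sketch computes a few degree statistics per node from a directed edge list. The function name degree_features and the choice of three features are assumptions made for this example; they are not part of the challenge API.

```python
from collections import defaultdict


def degree_features(num_nodes, edges):
    """Return [in_degree, out_degree, mean out-degree of successors] per node.

    `edges` is a list of (src, dst) pairs with node ids in range(num_nodes).
    Hypothetical helper for illustration only.
    """
    in_deg = [0] * num_nodes
    out_deg = [0] * num_nodes
    successors = defaultdict(list)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        successors[src].append(dst)

    feats = []
    for v in range(num_nodes):
        nbrs = successors[v]
        # Nodes without outgoing edges get 0.0 for the neighborhood statistic.
        mean_nbr = sum(out_deg[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        feats.append([in_deg[v], out_deg[v], mean_nbr])
    return feats


if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (1, 2)]
    for node, row in enumerate(degree_features(3, toy_edges)):
        print(node, row)
```

Features like these can be concatenated with the given node features before any model is trained; which statistics actually help depends on the dataset.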
Challenges in developing versatile models.
Nevertheless, whether one uses the traditional heuristic methods or the recent GNN based methods, substantial computational resources and expertise must be invested to achieve satisfying performance on a given task. For example, in DeepWalk and node2vec, two well-known random walk based methods, hyper-parameters such as the walk length, the number of walks per node, and the window size have to be fine-tuned to obtain good performance. Likewise, when using GNN models such as GraphSAGE or GAT, considerable time goes into choosing the aggregation function in GraphSAGE or the number of self-attention heads in GAT. This heavy reliance on human experts in the fine-tuning process limits the application of existing graph representation models.
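To make the hyper-parameters named above concrete, here is a minimal sketch of uniform random walks in the style of DeepWalk, followed by the extraction of (center, context) pairs within a window. It only shows where walk_length, num_walks and window_size enter; it does not implement node2vec's biased walks or the skip-gram training step, and the function names are assumptions made for this example.

```python
import random


def random_walks(adj, walk_length, num_walks, seed=0):
    """Generate `num_walks` uniform random walks of up to `walk_length` nodes
    starting from every node. `adj` maps node -> list of neighbors."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


def context_pairs(walks, window_size):
    """Return (center, context) pairs for nodes at most `window_size`
    positions apart within the same walk."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window_size)
            hi = min(len(walk), i + window_size + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


if __name__ == "__main__":
    path_graph = {0: [1], 1: [0, 2], 2: [1]}
    walks = random_walks(path_graph, walk_length=5, num_walks=2)
    print(len(walks), "walks;", len(context_pairs(walks, window_size=2)), "pairs")
```

Each of the three arguments changes the amount and locality of the training pairs, which is why they usually need tuning per dataset.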
AutoML/AutoDL (https://autodl.chalearn.org) is a promising approach to lowering the manpower costs of machine learning applications, and has achieved encouraging successes in hyper-parameter tuning, model selection, neural architecture search, and feature engineering. To enable more people and organizations to fully exploit their graph-structured data, we organize the AutoGraph challenge, which is dedicated to such data.
In this challenge, participants should design a computer program capable of providing solutions to graph representation learning problems autonomously (without any human intervention). Compared with the previous AutoML competitions we organized, the new focus is on graph-structured data, where nodes with features and edges (connections among nodes) are available.
To prevail in the proposed challenge, participants should propose automatic solutions that can effectively and efficiently learn a high-quality representation for each node based on the given features, the neighborhood, and the structural information underlying the graph. The solutions should be designed to automatically extract and utilize any useful signals in the graph, whether through heuristics or through systematic models.
Here, we list some specific questions that the participants should consider and answer:
How to automatically design heuristics to extract features for a node in a graph?
How to automatically exploit the neighborhood information in a graph?
How to automatically tune an optimal set of hyper-parameters for random walk based graph embedding methods?
How to automatically choose the aggregation function when using the GNN-based models?
How to automatically design an optimal GNN architecture given different datasets?
How to automatically and efficiently select appropriate hyper-parameters for different models?
How to make the solution more generic, i.e., how to make it applicable for unseen tasks?
How to keep the computational and memory cost acceptable?
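Several of the questions above (choosing the aggregation function, selecting hyper-parameters, keeping cost acceptable) can be approached with a search that respects a time limit. The following is a minimal sketch of random search under a time budget, not the organizers' method. The search space, the evaluate callback, and all names are hypothetical; in practice evaluate would train a model with the given configuration and return a validation score.

```python
import random
import time

# Hypothetical search space, for illustration only.
SEARCH_SPACE = {
    "aggregator": ["mean", "max", "sum"],
    "hidden_dim": [16, 32, 64, 128],
    "learning_rate": [1e-3, 5e-3, 1e-2],
}


def random_search(evaluate, space, time_budget_s, max_trials=50, seed=0):
    """Sample configurations uniformly at random until the time budget or the
    trial limit is reached; return the best (config, score) seen so far."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_cfg, best_score = None, float("-inf")
    for _ in range(max_trials):
        if time.monotonic() >= deadline:
            break  # out of time: keep whatever has been found
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        score = evaluate(cfg)  # higher is better
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    # Toy stand-in for "train and validate"; a real evaluate would fit a model.
    def toy_evaluate(cfg):
        return cfg["hidden_dim"] / 128 - abs(cfg["learning_rate"] - 5e-3)

    print(random_search(toy_evaluate, SEARCH_SPACE, time_budget_s=1.0))
```

Random search is used here only because it is simple to state. Note that the deadline is checked between trials, so a single slow evaluation can still overrun the budget.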