NIPS 2018 Challenge
The 3rd AutoML Challenge: AutoML for Lifelong Machine Learning
(Provided and Sponsored by 4Paradigm, ChaLearn, Microsoft and Acadia University)
The beginning date has been moved to 23rd July, 2018. Thanks for your attention.
We are updating this webpage. More detailed information will be released soon.
In many real-world machine learning applications, AutoML is strongly needed because developers have limited machine learning expertise. Moreover, batches of data in many real-world applications may arrive daily, weekly, monthly, or yearly, for instance, and the data distributions change relatively slowly over time. This presents a continuous learning, or Lifelong Machine Learning, challenge for an AutoML system. Typical learning problems of this kind include customer relationship management, on-line advertising, recommendation, sentiment analysis, fraud detection, spam filtering, transportation monitoring, econometrics, patient monitoring, climate monitoring, and manufacturing. In this competition, which we are calling AutoML for Lifelong Machine Learning, large-scale datasets collected from some of these real-world applications will be used. Compared with previous AutoML competitions(http://automl.
Although the scenario is fairly standard, this challenge introduces the following difficulties:
• Algorithm scalability. We will provide datasets that are 10-100 times larger than in previous challenges we organized.
• Varied feature types. Varied feature types will be included (continuous, binary, ordinal, categorical, multi-valued categorical, temporal). Categorical variables with a large number of values following a power law will be included.
• Concept drift. Instances in all datasets are chronologically ordered. The data distribution is slowly changing over time.
• Lifelong setting. The algorithms will be tested for their capability of adapting to changes in data distribution by exposing them to successive test sets chronologically ordered. After testing, the labels will be revealed to the learning machines and incorporated in the training data.
There are two phases of the competition:
The Feedback phase is a phase with code submission. Participants can practice on 5 datasets that are similar in nature to the datasets of the second phase. Participants can make a limited number of submissions and can download the labeled training data and the unlabeled test set, so they can prepare their code submission at home and submit it later. The LAST submission must be a CODE SUBMISSION, because it will be forwarded to the next phase for final testing.
The AutoML phase is the blind test phase. The last submission of the previous phase is blind tested on five new datasets. Participants’ code will be trained and tested automatically, without human intervention. The final score will be determined by the results of the blind testing.
5 public datasets to be released.
Private Dataset Schemas
5 private datasets to be released.
|AutoML||1st Place $10000 (+Travel Grant)||2nd Place $3000 (+Travel Grant)||3rd Place $2000 (+Travel Grant)|
* ChaLearn will provide travel grants to attend the conference and workshop.
The evaluation scheme is depicted in the image below. Recall that this is a competition receiving code submissions. Participants must prepare an AutoML program that will be uploaded to the challenge platform. The code will be executed autonomously on compute workers and allowed to run for a maximum amount of time. Code exceeding this time will be penalized by setting the AUC for that dataset to 0. Unlike previous challenges, this competition will evaluate the lifelong learning capabilities of AutoML solutions, hence an appropriate protocol has been designed.
The datasets will be split into different blocks, and each block will represent a stage of the lifelong evaluation scenario. Code submitted by participants will use training data to generate a model, which will then be used to predict labels for the first test set. The performance on this test set will be recorded. After this, the labels of the first test set will be made available to the computer program. The computer program may use these labels to improve its initial model and make predictions for the subsequent test set. The process will continue until all of the test sets have been evaluated.
Each dataset will be split into blocks and the data will be progressively presented to the participants’ AutoML programs:
|STEP#||TRAINING DATA||TEST DATA|
|1||LABELED BLOCK_0||UNLABELED BLOCK_1|
|2||LABELED (BLOCK_0 + BLOCK_1)||UNLABELED BLOCK_2|
|3||LABELED (BLOCK_0 + BLOCK_1 + BLOCK_2)||UNLABELED BLOCK_3|
|N||EVERYTHING LABELED UP TO BLOCK_(N-1)||UNLABELED BLOCK_N|
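The block-by-block protocol in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the organizers' evaluation code: the names `auc`, `lifelong_evaluate` and `FirstFeatureModel` are hypothetical, the toy learner stands in for a real AutoML program, and AUC is computed with the standard rank-sum formula.

```python
from typing import List, Sequence, Tuple

# Hypothetical block type: (feature rows, binary labels), in chronological order.
Block = Tuple[List[List[float]], List[int]]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Tied scores share the average of the ranks they span.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both classes in the block")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


class FirstFeatureModel:
    """Toy learner (an assumption for illustration): scores by the first feature."""

    def __init__(self) -> None:
        self.direction = 1.0

    def fit(self, X: List[List[float]], y: List[int]) -> None:
        pos = [x[0] for x, t in zip(X, y) if t == 1]
        neg = [x[0] for x, t in zip(X, y) if t == 0]
        if pos and neg:
            # Point the score towards the class with the larger mean feature value.
            higher = sum(pos) / len(pos) >= sum(neg) / len(neg)
            self.direction = 1.0 if higher else -1.0

    def predict(self, X: List[List[float]]) -> List[float]:
        return [self.direction * x[0] for x in X]


def lifelong_evaluate(blocks: List[Block], model) -> List[float]:
    """Train on BLOCK_0, then predict each later block before its labels are revealed."""
    train_X, train_y = list(blocks[0][0]), list(blocks[0][1])
    block_aucs = []
    for test_X, test_y in blocks[1:]:
        model.fit(train_X, train_y)        # everything labeled so far
        scores = model.predict(test_X)     # the model never sees test_y here
        block_aucs.append(auc(test_y, scores))
        train_X.extend(test_X)             # labels are revealed after scoring
        train_y.extend(test_y)
    return block_aucs


if __name__ == "__main__":
    demo_blocks = [
        ([[0.1], [0.9]], [0, 1]),
        ([[0.2], [0.8]], [0, 1]),
        ([[0.3], [0.7]], [0, 1]),
    ]
    print(lifelong_evaluate(demo_blocks, FirstFeatureModel()))
```

Refitting from scratch at every step is the simplest choice for a sketch; a real submission would more likely update its model incrementally to stay within the time budget.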
• For each block of each dataset, the evaluation will consist in computing the area under the ROC curve (AUC).
• For each dataset, we will average the AUC ranks over all the blocks of the dataset. A ranking will be performed according to this metric.
• For the final score, we will use the average rank over all datasets.
• There will be a time budget for each dataset. Code exceeding the maximum execution time will be aborted and assigned an AUC of 0.
• The competition will be run on the CodaLab competition platform.
• The competition is open for all interested researchers, specialists and students. Members of the Contest Organizing Committee cannot participate.
• Participants may submit solutions as teams made up of one or more persons.
• Each team needs to designate a leader responsible for communication with the Organizers.
• One person can only be part of one team.
• A winner of the competition is chosen on the basis of the final evaluation results. In the case of draws in the evaluation scores, time of the submission will be taken into account.
• Each team is obligated to provide a short report (fact sheet) describing their final solution.
• By enrolling in this competition, you grant the organizers rights to process your submissions for the purpose of evaluation and post-competition research.
• Top ranked participants will be invited to attend a workshop collocated with NIPS 2018 to describe their methods and findings. Winners of prizes are expected to attend.
• The challenge is part of the competition program of the NIPS 2018 conference. Organizers are making arrangements for the possible publication of a book chapter or article written jointly by organizers and the participants with the best solutions.
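As a rough illustration of the scoring bullets above (per-block AUC, AUC ranks averaged over the blocks of a dataset, then the average rank over all datasets), the sketch below aggregates a small table of made-up AUC values. Several details are assumptions rather than stated rules: the function names are hypothetical, ties share the average rank, teams are re-ranked within each dataset by their mean block rank before averaging over datasets, and a timed-out run is represented by an AUC of 0 on every block of that dataset.

```python
from typing import Dict, List


def average_ranks(values: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Rank 1 is best; tied entries share the average of the ranks they span (assumption)."""
    ordered = sorted(values, key=lambda name: values[name], reverse=higher_is_better)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def final_scores(aucs: Dict[str, Dict[str, List[float]]]) -> Dict[str, float]:
    """aucs[dataset][team] holds that team's per-block AUCs; lower final score is better."""
    teams = sorted(next(iter(aucs.values())))
    dataset_ranks: Dict[str, List[float]] = {team: [] for team in teams}
    for per_team in aucs.values():
        n_blocks = len(per_team[teams[0]])
        mean_block_rank = {team: 0.0 for team in teams}
        for b in range(n_blocks):
            # Rank teams on this block by AUC (higher AUC gets the better rank).
            block_rank = average_ranks(
                {team: per_team[team][b] for team in teams}, higher_is_better=True
            )
            for team in teams:
                mean_block_rank[team] += block_rank[team] / n_blocks
        # Re-rank teams on this dataset by mean block rank (lower is better).
        ranked = average_ranks(mean_block_rank, higher_is_better=False)
        for team in teams:
            dataset_ranks[team].append(ranked[team])
    return {team: sum(r) / len(r) for team, r in dataset_ranks.items()}


if __name__ == "__main__":
    # Made-up numbers for illustration; team3 timed out on dataset A (AUC 0).
    example = {
        "A": {"team1": [0.9, 0.8], "team2": [0.7, 0.85], "team3": [0.0, 0.0]},
        "B": {"team1": [0.6], "team2": [0.7], "team3": [0.65]},
    }
    for team, score in sorted(final_scores(example).items(), key=lambda kv: kv[1]):
        print(team, score)
```

Because the final score is built from ranks rather than raw AUC values, a timeout costs a team the worst rank on that dataset regardless of how well it performs elsewhere.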
2nd March, 2018: Preparation of the competition starts.
23rd July, 2018: Beginning of the competition, release of development data.
13th October, 2018: End of development (Feedback) phase.
27th October, 2018: End of the competition.
31st October, 2018: Deadline for submitting the fact sheets.
3rd November, 2018: Release of results, announcements of winners.
November 2018: Post challenge analyses.
December, 2018: NIPS2018. Autonomous Machine Learning for Lifelong Machine Learning Ceremony
In case of any questions please send an email to email@example.com.
UPSud/INRIA Univ. Paris-Saclay, France & ChaLearn, USA
(Coordinator, Platform Administrator, Advisor)
4Paradigm Inc., Beijing, China
(Coordinator, Baseline Provider, Data Provider)
Hugo Jair Escalante
INAOE (Mexico), ChaLearn (USA)
(Platform Administrator, Coordinator)
Daniel L. Silver
4Paradigm Inc., Beijing, China
(Sponsor, Data Provider)
4Paradigm Inc., Beijing, China
Quanming Yao, 4Paradigm Inc. Beijing, China, (Baseline Provider, Data Provider) firstname.lastname@example.org
Mengshuo Wang, 4Paradigm Inc. Beijing, China, (Baseline Provider, Data Provider) email@example.com
Yuanyu Wan, 4Paradigm Inc. Beijing, China, (Baseline Provider, Data Provider) firstname.lastname@example.org
Hai Wang, 4Paradigm Inc. Beijing, China, (Baseline Provider, Data Provider) email@example.com
AutoML workshops can be found here.
Microsoft research blog post on AutoML Challenge can be found here.
KDD Nuggets post on AutoML Challenge can be found here.
I. Guyon et al. A Brief Review of the ChaLearn AutoML Challenge: Any-time Any-dataset Learning Without Human Intervention. ICML W 2016. link
I. Guyon et al. Design of the 2015 ChaLearn AutoML challenge. IJCNN 2015. link
Springer Series on Challenges in Machine Learning. link
Founded in early 2015, 4Paradigm (https://www.4paradigm.com/) is one of the world’s leading AI technology and service providers for industrial applications. 4Paradigm’s flagship product, the AI Prophet, is an AI development platform that enables enterprises to effortlessly build their own AI applications and thereby significantly increase the efficiency of their operations. Using the AI Prophet, a company can develop a data-driven “AI Core System”, which can largely be regarded as a second core system next to the traditional transaction-oriented Core Banking System (IBM Mainframe) often found in banks. Beyond this, 4Paradigm has also successfully developed more than 100 AI solutions for use in various settings such as finance, telecommunication and Internet applications. These solutions include, but are not limited to, smart pricing, real-time anti-fraud systems, precision marketing and personalized recommendation. While it is clear that 4Paradigm can set up a completely new paradigm for how an organization uses its data, its scope of services does not stop there. 4Paradigm brings together a team of experts, ranging from scientists to architects, with state-of-the-art machine learning technologies and practical experience. This team has successfully built China’s largest machine learning system and the world’s first commercial deep learning system. With its core team pioneering research on “Transfer Learning”, 4Paradigm takes the lead in this area and, as a result, has drawn great attention from tech giants worldwide.
ChaLearn (http://chalearn.org) is a non-profit organization with vast experience in the organization of academic challenges. ChaLearn is interested in all aspects of challenge organization, including data gathering procedures, evaluation protocols, novel challenge scenarios (e.g., coopetitions), training for challenge organizers, challenge analytics, results dissemination and, ultimately, advancing the state of the art through challenges. ChaLearn is collaborating in the organization of the NIPS 2018 data competition (AutoML Challenge 2018).
The competition will be run on the CodaLab platform (https://competitions.codalab.org/). CodaLab is an open-source web-based platform that enables researchers, developers, and data scientists to collaborate, with the goal of advancing research fields where machine learning and advanced computation are used. CodaLab offers several features targeting reproducible research. In the context of the AutoML Challenge 2018, CodaLab is the platform that will allow the evaluation of participants’ solutions. CodaLab is administered by Université Paris-Saclay and maintained by CKcollab, LLC. This is made possible by funding from 4Paradigm and a Microsoft Azure for Research grant.
At Microsoft, we aim to empower every person and every organization on the planet to achieve more. We care deeply about having a global perspective and making a difference in lives and organizations in all corners of the planet. This involves playing a small part in the most fundamental of human activities: Creating tools that enable each of us along our journey to become something more. Our mission is grounded in both the world in which we live and the future we strive to create. Today, we live in a mobile-first, cloud-first world, and we aim to enable our customers to thrive in this world. https://www.microsoft.com/en-us/research/
The Lifelong Machine Learning and Reasoning Group (http://lmlr.acadiau.ca/) at Acadia University undertakes research into novel machine learning algorithms and their application to synthetic and real-world problems. The lab’s researchers and students specialize in developing machine learning algorithms in the areas of Lifelong Machine Learning, Transfer Learning, Knowledge Consolidation, Multi-modal Learning, and Learning to Reason. The group has particular expertise in artificial neural networks and deep learning used for supervised, unsupervised, semi-supervised and recurrent (time series) problems. We also apply standard machine learning methods as well as more advanced LMLR approaches to problems in the areas of data analytics, adaptive systems, intelligent agents and robotics. Most notably, our lab created context-sensitive Multiple Task Learning (csMTL) networks in 2005 and has advanced them for use in Lifelong Machine Learning research and application. To review some of our most recent work on Multi-modal Learning using deep learning architectures, please try the demo at https://ml3cpu.acadiau.ca/.