How to participate
To get started and submit your first solution, you will need to complete the following steps:
Create an account on the CrunchDAO platform;
Setup your workspace to get access to the data;
Test your solution locally;
Get the confirmation that your code is running.
The QuickStarter Notebook below is designed to get you started in just 3 minutes.
Creating an account on the CrunchDAO platform will allow you to get access to the competition dataset. Follow the link below to join the competition.
Two types of submissions are possible. Your setup and the method of accessing the data will differ slightly depending on whether you use a Python Notebook (.ipynb) or a Python Script (.py).
⚠ To get your personal token ⚠
New tokens are generated every minute, and each token can only be used once within a 3-minute timeframe.
Go to https://hub.crunchdao.com/competitions/venture-capital-portfolio-prediction/submit and click on the "reveal the command" button to access the commands that will set up your workspace. Execute the commands in a terminal, in a working directory of your choice.
Once you run the setup commands, the crunch package will download the data and create a folder named after your username on the platform. Here is a snapshot of your working directory folder.
If you need to save files to run your code on CrunchDAO's servers, such as your model's weights, a tree structure, etc., you must save them under the resources folder.
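For illustration, here is a minimal sketch of persisting model state under the resources folder. The use of joblib and the file name are assumptions; any serialization method you prefer works the same way.

```python
import os

import joblib  # assumption: any serialization library can be used here

MODEL_PATH = os.path.join("resources", "model.joblib")  # hypothetical file name

def save_model(model) -> None:
    # Anything your cloud run needs later (model weights, a fitted tree, ...)
    # must be written under the resources folder.
    os.makedirs("resources", exist_ok=True)
    joblib.dump(model, MODEL_PATH)

def load_model():
    # Reload the persisted state when your code runs on CrunchDAO's servers.
    return joblib.load(MODEL_PATH)
```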
The crunch test command allows you to perform a local test of your code. The associated test set is purposefully very small and should be used to check the functionality of your code only.
This command also conducts a series of tests to verify that your generated prediction file aligns with the expected format for the rally. The example_submission file in the data folder serves as a reference for the expected format.
⚠️ Failure to pass these tests will result in your prediction being rejected and not scored.
This function of the crunch package will run your code locally, simulating how it is called in the cloud.
In a notebook, force_first_train=True indicates that your model will be trained on the first date of the test set. Similarly, --no-force-first-train controls the same parameter for terminal calls to the function (note that in this case, the flag has the opposite effect of force_first_train=True in the notebook).
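As a reference, here is a minimal, hedged example of triggering a local test from a notebook, assuming the helper is loaded with crunch.load_notebook() as in the QuickStarter (your template may differ slightly):

```python
import crunch

# Load the notebook-side helper (as done in the QuickStarter notebook).
crunch = crunch.load_notebook()

# Run your code locally on the small test set;
# force_first_train=True trains your model on the first date of the test set.
crunch.test(force_first_train=True)

# Terminal equivalent: the `crunch test` command,
# where the --no-force-first-train flag inverts this behavior (see the note above).
```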
The key tests performed are:
Column Names: The columns in your file must precisely match those in example_submission.
Values Integrity: The prediction_column_name column should not contain any NaN (Not-a-Number) or infinite values.
Binary Values: Values in the prediction_column_name column must exclusively be 0 or 1.
Moon Verification: Values in the moon_column_name column must match those found in the X_test received by the infer function.
ID Verification: Values in the id_column_name column must match the corresponding ones in X_test for each moon.
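To make these checks concrete, the sketch below re-implements them with pandas. It is illustrative only, not the official validation code, and the column-name arguments stand in for the competition's actual prediction, moon, and id column names.

```python
import numpy as np
import pandas as pd

def check_prediction_format(prediction: pd.DataFrame, example_submission: pd.DataFrame,
                            x_test: pd.DataFrame, prediction_column: str,
                            moon_column: str, id_column: str) -> None:
    """Illustrative re-implementation of the checks described above (not the official code)."""
    # Column Names: must precisely match example_submission.
    assert list(prediction.columns) == list(example_submission.columns), "column mismatch"

    # Values Integrity: no NaN or infinite values in the prediction column.
    values = prediction[prediction_column]
    assert np.isfinite(values).all(), "NaN or infinite prediction values"

    # Binary Values: predictions must be exclusively 0 or 1.
    assert values.isin([0, 1]).all(), "non-binary prediction values"

    # Moon Verification: moons must match those found in X_test.
    assert set(prediction[moon_column]) == set(x_test[moon_column]), "moon mismatch"

    # ID Verification: ids must match those of X_test for each moon.
    for moon, group in prediction.groupby(moon_column):
        expected = x_test.loc[x_test[moon_column] == moon, id_column]
        assert set(group[id_column]) == set(expected), f"id mismatch for moon {moon}"
```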
The source code is public and can be accessed on the GitHub repository here.
Download your notebook in .ipynb format and upload it in the submit section of the CrunchDAO platform.
Since a Notebook submission does not include a requirements.txt, you can instead specify a package's version with a requirement specifier in a comment on the same line as the import.
Specifying a package's version multiple times will cause the submission to be rejected if the versions differ.
Specifying versions for standard-library modules has no effect (but the submission will still be rejected if the specified versions are inconsistent).
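For example, a version can be pinned directly on the import line; the package choices and version numbers below are purely illustrative:

```python
# Pin versions with requirement specifiers in a comment on the same line as the import.
import pandas as pd  # pandas==1.5.3
import numpy as np   # numpy>=1.24

# Standard-library imports need no specifier (adding one has no effect).
import json
```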
If the submission is complete, you will see it appear in your submissions section.
The backend parses your submission to retrieve the code of the interface's functions (i.e. train and infer) and the dependencies of your code. By clicking on the right-side arrow, you can access your submission's content.
Make sure that the system properly parsed your code and imports.
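As an illustration of what the parser looks for, here is a hedged skeleton of the two interface functions; the exact signatures come from your QuickStarter template and may include additional arguments:

```python
def train(X_train, y_train):
    # Fit your model and persist anything you need later under resources/.
    ...

def infer(X_test):
    # Return predictions for the given cross-section; called once per date.
    ...
```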
To get a score on the leaderboard, you need to run your code on the competition server. Your code will be fed with never-seen data, and your predictions will be scored on this private test set.
To run your submission in the cloud and get a score, click on a submission and then on the Run in the Cloud button.
Your code is called on each individual date. Code calls go through the dates sequentially, but are otherwise independent. Be reminded that the data contains, for each individual date, the cross-section of the investment vehicles of the universe at that time.
At each date, your code will access only the data available up to that point.
Here is a high-level overview of how your code will be called:
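The pseudocode below is a simplified, hypothetical sketch of that loop; the runner itself and the get_data_up_to helper are illustrative, not the platform's actual code:

```python
def run_evaluation(train, infer, moons, get_data_up_to, force_first_train=True):
    """Illustrative sketch: dates (moons) are processed sequentially, and each
    call only sees data available up to that date."""
    predictions = {}
    for index, moon in enumerate(moons):
        # Hypothetical loader restricted to data available up to `moon`.
        X_train, y_train, X_test = get_data_up_to(moon)

        # With force_first_train, the model is trained on the first date of the test set.
        if index == 0 and force_first_train:
            train(X_train, y_train)

        # infer receives the cross-section of the investment universe at this date.
        predictions[moon] = infer(X_test)

    return predictions
```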
Once you successfully launch your run on the cloud, you can monitor its proper execution with the run logs.
The logs for the execution of your code are only displayed for the first 5 dates of the test set, to avoid meta-labeling (a common cheating method in data-science tournaments).