How to participate
To get started and submit your first solution, you will need to go through the following steps:
Create an account on the CrunchDAO platform;
Set up your workspace to get access to the data;
Test your solution locally;
Get the confirmation that your code is running.
1. Create an account
Creating an account on the CrunchDAO platform will allow you to get access to the competition dataset. Follow the link below to join the competition.
2. Setup and data
Two types of submissions are possible. Your setup and the way you access the data differ slightly depending on whether you use a Python Notebook (.ipynb) or a Python Script (.py).
2.1 Notebook Participation Setup
# Get the crunch library in your workspace.
%pip install crunch-cli --upgrade
# To use the library, import the crunch package and instantiate it to be able to access its functionality.
# You can do that using the following lines:
import crunch
crunch = crunch.load_notebook(__name__)
# Authenticates your user, downloads your project workspace, and enables your access to the data
!crunch setup <competition> --token <token>
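Once the setup command has completed, the data can be loaded directly from the notebook. The exact call may vary with the crunch-cli version; a typical pattern, assuming the load_data helper is available, looks like this:
# Load the competition data as pandas DataFrames (assuming the crunch.load_data helper)
X_train, y_train, X_test = crunch.load_data()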
2.2 Script Participation Setup
Go to https://hub.crunchdao.com/competitions/venture-capital-portfolio-prediction/submit and click on the "reveal the command" button to access the commands that will set up your workspace. Execute the commands in a terminal, in a working directory of your choice.
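The revealed commands are personal because they contain your token, but they typically follow the same pattern as the notebook setup above. Shown here with placeholders only; copy the exact commands from the platform:
# Install or upgrade the CLI
pip install crunch-cli --upgrade
# Authenticate, create the project folder, and download the data (placeholders shown)
crunch setup <competition> --token <token>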

3. Your working directory
Once you run the setup commands, the crunch package will download the data and create a folder named after your username on the platform. Here is a snapshot of your working directory:
$ tree
.
├── data
│   ├── X_test.parquet
│   ├── X_train.parquet
│   └── y_train.parquet
├── main.py
├── requirements.txt
└── resources

3 directories, 5 files
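For script participation, main.py is expected to expose the train and infer functions that the platform calls (see the server loop in section 7). Below is a minimal, illustrative sketch only: the model choice, the target column, and the output column names (id, moon, prediction) are assumptions to adapt to the actual competition data and template.
import os
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

def train(X_train: pd.DataFrame, y_train: pd.DataFrame, model_directory_path: str) -> None:
    # Fit a simple baseline model (assuming the last column of y_train is the target)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train.select_dtypes("number"), y_train.iloc[:, -1])
    # Persist the model inside the resources directory so that infer can reload it
    joblib.dump(model, os.path.join(model_directory_path, "model.joblib"))

def infer(model_directory_path: str, X_test: pd.DataFrame) -> pd.DataFrame:
    # Reload the persisted model and predict 0/1 labels for the current date
    model = joblib.load(os.path.join(model_directory_path, "model.joblib"))
    labels = model.predict(X_test.select_dtypes("number"))
    # "id", "moon" and "prediction" are placeholder column names
    return X_test[["id", "moon"]].assign(prediction=labels)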
4. Testing your code locally
The crunch test command allows you to perform a local test of your code. The associated test set is purposefully very small and should only be used to check that your code runs.
This command also performs a series of checks to verify that your generated prediction file matches the format expected by the competition. The example_submission file in the data folder serves as a reference for the expected format.
⚠️ Failure to pass these tests will result in your prediction not being scored and subsequently rejected.
# Upgrade the crunch-cli library to make sure you have the latest version
pip install crunch-cli --upgrade
# Run a local test in a notebook
crunch.test(force_first_train=True)
# Run a local test in your terminal
crunch test --no-force-first-train
This function of the crunch package will run your code locally, simulating how it is called in the cloud.
In a notebook, force_first_train=True indicates that your model will be trained on the first date of the test set. Similarly, --no-force-first-train controls the same parameter for terminal calls (note that using this flag does the opposite of force_first_train=True in the notebook case).
Usage: crunch test [OPTIONS]

  Test your code locally.

Options:
  -m, --main-file TEXT       Entrypoint of your code.  [default: main.py]
  --model-directory TEXT     Directory where your model is stored.  [default: resources]
  --no-force-first-train     Do not force the train at the first loop.
  --train-frequency INTEGER  Train interval.  [default: 1]
  --help                     Show this message and exit.
The key tests performed are:
Column Names: the columns in your file must precisely match those in example_submission.
Values Integrity: the prediction_column_name column must not contain any NaN (Not-a-Number) or infinite values.
Binary Values: values in the prediction_column_name column must exclusively be 0 or 1.
Moon Verification: values in the moon_column_name column must match those found in the X_test received by the infer function.
ID Verification: values in the id_column_name column must match the corresponding ones in X_test for each moon.
The source code is public and can be accessed on the GitHub repository here.
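For intuition, here is a rough sketch of what these checks amount to. It is not the actual test implementation, and prediction, moon and id stand in for the real column names:
import numpy as np
import pandas as pd

def basic_checks(prediction: pd.DataFrame, example: pd.DataFrame, x_test: pd.DataFrame) -> None:
    # Column Names: must match the example submission exactly
    assert list(prediction.columns) == list(example.columns)
    # Values Integrity: no NaN or infinite values in the prediction column
    assert np.isfinite(prediction["prediction"]).all()
    # Binary Values: only 0 or 1 are allowed
    assert prediction["prediction"].isin([0, 1]).all()
    # Moon Verification: same moons as in the X_test passed to infer
    assert set(prediction["moon"]) == set(x_test["moon"])
    # ID Verification: same ids as in X_test for each moon
    for moon, group in prediction.groupby("moon"):
        expected = x_test.loc[x_test["moon"] == moon, "id"]
        assert set(group["id"]) == set(expected)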
5. Submit
Download your notebook in .ipynb format and upload it in the Submit section of the CrunchDAO platform.
Specifying package versions
Since submitting a Notebook does not include a requirements.txt, users can instead specify a package's version using requirement specifiers at the import level, in a comment on the same line.
# valid statement
import pandas # == 1.3
import sklearn # >= 1.2, < 2.0
import tqdm # [foo, bar]
import scikit # ~= 1.4.2
from requests import Session # == 1.5
Specifying a version for the same package multiple times will cause the submission to be rejected if the versions differ.
# inconsistent versions will be rejected
import pandas # == 1.3
import pandas # == 1.5
Specifying versions for standard library modules has no effect (but the submission will still be rejected if the versions are inconsistent).
# will be ignored
import os # == 1.3
import sys # == 1.5
5.1 Submit with Crunch CLI (optional)
Usage: crunch push [OPTIONS]

  Send the new submission of your code.

Options:
  -m, --message TEXT      Specify the change of your code. (like a commit message)
  -e, --main-file TEXT    Entrypoint of your code.  [default: main.py]
  --model-directory TEXT  Directory where your model is stored.  [default: resources]
  --help                  Show this message and exit.
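For example, from your project directory:
# Push the content of your working directory as a new submission, with a message
crunch push --message "first submission"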
6. Check your submission
If the submission is complete, you will see it appear in your submissions section.

The backend parses your submission to retrieve the code of the interface functions (i.e. train and infer) and the dependencies of your code. By clicking on the right-side arrow, you can access your submission's content.

7. Testing your code on the server
In order to run your submission on the cloud and get a score, you need to click on a submission and then on the Run in the Cloud button.

Your code is called on each individual date. The calls go through the dates sequentially but are otherwise independent. Keep in mind that, for each individual date, the data contains the cross-section of the investment vehicles in the universe at that time.
At each date, your code will access only the data available up to that point.
Here is a high-level overview of how your code will be called:
# This loops over the private test set dates to avoid leaking the X of future periods
for date in dates:
    # The wrapper blocks the logging of the user's code after the first 5 dates
    if date >= log_threshold:
        log = False

    # If the user asked for a retrain on the current date
    if retrain:
        # Cut the sample so that the user's code only accesses the data it is allowed to see
        X_train_cut = X_train[X_train.date < date - embargo]
        y_train_cut = y_train[y_train.date < date - embargo]

        # This is where your `train` code is called
        train(X_train_cut, y_train_cut, model_directory_path)

    # Keep only the current date
    X_test_date = X_test[X_test.date == date]

    # This is where your `infer` code is called
    prediction = infer(model_directory_path, X_test_date)

    if date > log_threshold:
        predictions.append(prediction)

# Concatenate all of the individual predictions
prediction = pandas.concat(predictions)

# Upload it to our servers
upload(prediction)

# Upload the model's files to our servers
for file_name in os.listdir(model_directory_path):
    upload(file_name)
8. Monitoring Your Code Runs
Once you successfully launch your run on the cloud, you can monitor its proper execution with the run logs.