1) Goal of the task: to develop and optimize methods of alterego detection
2) Submission modalities: a valid submission must be sent by e-mail, with the subject "KKEC Task 1 submission", to the address daniel at wizzion dot com before the deadline (28.10.2018 (AE48) 00:00 UTC)
3) Content of submission: a valid submission must contain
The file PARTICIPANT_ID_results.csv, where PARTICIPANT_ID is the participant's kyberia ID.
This is a standard CSV file of the form
ID,ALTEREGO_ID
where the first column (ID) contains the ID of the user and the second column contains the user ID of the alterego. Note that, to avoid confusing the two terms, ID < ALTEREGO_ID (i.e. the user with ID is always older than his or her alterego).
The file method.txt, containing a description of the methods used. Ideally, the description should be detailed enough for the analysis to be reproducible.
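A minimal Python sketch of how a participant might write and sanity-check the results file. It assumes integer user IDs and an optional header line `ID,ALTEREGO_ID` (the rules above only give the form of a row); the helper names `normalize_pair`, `write_results` and `validate_results` are hypothetical.

```python
import csv
import io


def normalize_pair(a, b):
    """Order two account IDs so that the older (smaller) ID comes first."""
    if a == b:
        raise ValueError("an account cannot be its own alterego")
    return (a, b) if a < b else (b, a)


def write_results(pairs):
    """Serialize detected pairs as ID,ALTEREGO_ID rows (header line assumed)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "ALTEREGO_ID"])
    # Normalizing before deduplication collapses (a, b) and (b, a) into one row.
    for a, b in sorted({normalize_pair(a, b) for a, b in pairs}):
        writer.writerow([a, b])
    return out.getvalue()


def validate_results(csv_text):
    """Parse a results file and check the ID < ALTEREGO_ID rule on every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows and rows[0] == ["ID", "ALTEREGO_ID"]:
        rows = rows[1:]  # the header line is treated as optional in this sketch
    pairs = []
    for i, row in enumerate(rows, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"row {i}: expected 2 columns, got {len(row)}")
        user_id, alterego_id = int(row[0]), int(row[1])
        if user_id >= alterego_id:
            raise ValueError(f"row {i}: ID must be smaller than ALTEREGO_ID")
        pairs.append((user_id, alterego_id))
    return pairs
```

For example, `write_results([(1520, 873), (42, 99)])` yields the rows `42,99` and `873,1520` after the header, so pairs detected in either order still satisfy ID < ALTEREGO_ID.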
shuffled_node_id - ID of the node X visited by the user (NOTE: these values were randomly shuffled and do not correspond to the actual node_id values stored in kyberia's database; there is, however, a one-to-one mapping between the distinct actual node_ids and the distinct shuffled IDs)
owner - ID of the user who owned node X at the time of the dump
visits - number of times the user visited node X
k - whether the user gave K to node X
bookmark - whether X is bookmarked by the user
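A possible first step is to aggregate the corpus rows into per-user profiles. The Python sketch below is written under loudly stated assumptions not given in the column list above: the corpus is read as CSV with a header row, the visiting user's ID sits in a column named `user_id` (hypothetical name), and the `k` / `bookmark` flags are encoded as `0`/`1`.

```python
import csv
import io
from collections import defaultdict


def load_profiles(csv_text):
    """Aggregate corpus rows into per-user profiles.

    ASSUMPTIONS (hypothetical, not stated in the task): CSV with a header row,
    the visiting user's ID in a column named "user_id", k/bookmark encoded as 0/1.
    """
    node_visits = defaultdict(dict)   # user -> {shuffled_node_id: visits}
    owner_visits = defaultdict(dict)  # user -> {owner: visits summed over that owner's nodes}
    ks = defaultdict(set)             # user -> nodes the user gave K to
    bookmarks = defaultdict(set)      # user -> nodes the user bookmarked
    for row in csv.DictReader(io.StringIO(csv_text)):
        user = int(row["user_id"])
        node = int(row["shuffled_node_id"])
        owner = int(row["owner"])
        n = int(row["visits"])
        node_visits[user][node] = node_visits[user].get(node, 0) + n
        owner_visits[user][owner] = owner_visits[user].get(owner, 0) + n
        if row["k"] == "1":
            ks[user].add(node)
        if row["bookmark"] == "1":
            bookmarks[user].add(node)
    return node_visits, owner_visits, ks, bookmarks
```

Since the shuffled IDs preserve identity (same real node, same shuffled ID), overlaps between users' profiles remain meaningful. For the full corpus it would be preferable to stream the gzipped file with `gzip.open(path, "rt", newline="")` and pass the file object straight to `csv.DictReader` instead of loading everything into one string.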
5) Results will be evaluated by a jury of at least 5 members and will be published no later than 23.12.2018 (AE48)
Some useful keywords to start with: normalization, chi-squared test, Shannon entropy, temporal sequences, stylometry
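As an illustration of the first three keywords, the sketch below normalizes a user's visit counts into a distribution, computes its Shannon entropy in bits, and computes a chi-squared statistic for the 2 x N contingency table formed by two users' visit counts. The `{shuffled_node_id: visits}` dictionary representation and the function names are assumptions for illustration; how such scores should be combined into an alterego decision is left open.

```python
import math


def normalize(counts):
    """Turn raw visit counts into a probability distribution over nodes."""
    total = sum(counts.values())
    return {node: c / total for node, c in counts.items()} if total else {}


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a user's visit distribution."""
    return -sum(p * math.log2(p) for p in normalize(counts).values() if p > 0)


def chi_squared(counts_a, counts_b):
    """Chi-squared statistic of the 2 x N table built from two users' visit counts.

    A small value means both users spread their visits over nodes in similar
    proportions; nodes visited by neither user do not enter the table.
    """
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    grand = total_a + total_b
    if grand == 0:
        return 0.0
    stat = 0.0
    for node in set(counts_a) | set(counts_b):
        a, b = counts_a.get(node, 0), counts_b.get(node, 0)
        col = a + b
        # Expected count under homogeneity: row total * column total / grand total.
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat
```

Two users with identical visit proportions give a statistic of 0, while users with disjoint node sets give a large one. If a p-value is wanted, SciPy's `scipy.stats.chi2_contingency` computes the same test from an explicit table.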
* The corpus is quite large: the gzipped version is approximately 430 MB. Given that it contains a complete overview of kyberia's K "blockchain", noting down its MD5 hash (b26a43cc7f8717945fa3ae0303a58f5a) may also turn out to be useful.
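A downloaded copy can be checked against this hash without loading the whole file into memory. The Python sketch below assumes the hash refers to the gzipped file as distributed (the note above does not say so explicitly); the path passed in and the helper names are hypothetical.

```python
import hashlib

# MD5 given in the task description; assumed here to be the hash of the gzipped file.
EXPECTED_MD5 = "b26a43cc7f8717945fa3ae0303a58f5a"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def corpus_is_intact(path):
    """True if the file at `path` matches the published MD5 hash."""
    return md5_of_file(path) == EXPECTED_MD5
```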