Core Basics 4: Train a Coclustering
The steps to train a coclustering model with Khiops are very similar to what we have already seen in the basic classifier tutorials.
Make sure you have installed Khiops and Khiops Covisualization.
We start by importing Khiops, checking its installation and defining some helper functions:
import os

from khiops import core as kh


# Define helper functions
def peek(file_path, n=10):
    """Shows the first n lines of a file"""
    with open(file_path, encoding="utf8", errors="replace") as file:
        for line in file.readlines()[:n]:
            print(line, end="")
    print("")


# If there are any issues, you may check the Khiops status with the following command
# kh.get_runner().print_status()
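The peek helper above calls readlines(), which loads the whole file into memory before keeping only the first n lines. A variant based on itertools.islice stops reading after n lines. The following is a minimal sketch with the same signature and output, offered as an optional alternative rather than the tutorial's own helper:

```python
from itertools import islice


def peek(file_path, n=10):
    """Shows the first n lines of a file, reading only those lines"""
    with open(file_path, encoding="utf8", errors="replace") as file:
        # islice stops after n lines instead of loading the whole file
        for line in islice(file, n):
            print(line, end="")
    print("")
```

The behavior is identical for small files. The difference only matters for large data tables, where reading a few lines is much cheaper than reading all of them.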
As stated before, an unsupervised analysis sometimes benefits from a more adapted visualization. We illustrate this point with the CountriesByOrganization dataset, which contains the country-organization membership relation for a large number of countries and organizations (it is a bit outdated, though).
countries_kdic = os.path.join(
    "data", "CountriesByOrganization", "CountriesByOrganization.kdic"
)
countries_data_file = os.path.join(
    "data", "CountriesByOrganization", "CountriesByOrganization.csv"
)
print(f"CountriesByOrganization dictionary file location: {countries_kdic}")
print("")
peek(countries_kdic)
print(f"CountriesByOrganization data table file location: {countries_data_file}")
print("")
peek(countries_data_file)
CountriesByOrganization dictionary file location: data/CountriesByOrganization/CountriesByOrganization.kdic
Dictionary CountriesByOrganization
{
Categorical Country;
Categorical Organization;
};
CountriesByOrganization data table file location: data/CountriesByOrganization/CountriesByOrganization.csv
Country;Organization
Afghanistan;AsDB
Afghanistan;COLOMBO
Afghanistan;ECO
Afghanistan;ICCROM
Afghanistan;NAM
Afghanistan;PIARC
Afghanistan;SAARC
Afghanistan;WHO
Afghanistan;UN
We now train a coclustering model on this dataset:
coclustering_report_file_path_CountriesByOrganization = os.path.join(
    "exercises", "CountriesByOrganization", "CoclusteringResults.khcj"
)
countries_cc_report = kh.train_coclustering(
    countries_kdic,
    dictionary_name="CountriesByOrganization",
    data_table_path=countries_data_file,
    coclustering_variables=["Country", "Organization"],
    coclustering_report_file_path=coclustering_report_file_path_CountriesByOrganization,
    field_separator=";",
)
We can now browse the results with the Khiops Covisualization app:
# To visualize uncomment the line below
# kh.visualize_report(countries_cc_report)
We can now dump the country clusters and their metrics to a file with the extract_clusters function:
country_clusters_file = os.path.join(
    "exercises", "CountriesByOrganization", "CountryClusters.txt"
)
kh.extract_clusters(
    countries_cc_report,
    cluster_variable="Country",
    clusters_file_path=country_clusters_file,
)
peek(country_clusters_file, n=100)
Cluster Value Frequency Typicality
{Germany, France, Denmark, ...} Germany 106 1
{Germany, France, Denmark, ...} France 125 0.968057
{Germany, France, Denmark, ...} Denmark 101 0.952673
{Germany, France, Denmark, ...} Netherlands 105 0.952506
{Germany, France, Denmark, ...} Sweden 102 0.943957
{Germany, France, Denmark, ...} Belgium 104 0.919928
{Germany, France, Denmark, ...} Finland 100 0.887537
{Germany, France, Denmark, ...} Norway 96 0.872681
{Germany, France, Denmark, ...} Italy 105 0.870872
{Germany, France, Denmark, ...} Spain 103 0.851888
{Germany, France, Denmark, ...} Austria 88 0.766636
{Germany, France, Denmark, ...} Portugal 94 0.761055
{Germany, France, Denmark, ...} United Kingdom 102 0.744776
{Germany, France, Denmark, ...} Luxembourg 81 0.73663
{Germany, France, Denmark, ...} Switzerland 90 0.73639
{Germany, France, Denmark, ...} Greece 87 0.692487
{Germany, France, Denmark, ...} Ireland 75 0.64078
{Germany, France, Denmark, ...} Iceland 55 0.429196
{United States of America, Canada, Japan, ...} United States of America 92 1
{United States of America, Canada, Japan, ...} Canada 85 0.809229
{United States of America, Canada, Japan, ...} Japan 81 0.748647
{United States of America, Canada, Japan, ...} Australia 75 0.742523
{United States of America, Canada, Japan, ...} New Zealand 60 0.53756
{United States of America, Canada, Japan, ...} South Korea 69 0.509906
{United States of America, Canada, Japan, ...} Taiwan 7 0.112925
{United States of America, Canada, Japan, ...} * 0 0
{Poland, Hungary, Turkey, ...} Poland 79 1
{Poland, Hungary, Turkey, ...} Hungary 72 0.897742
{Poland, Hungary, Turkey, ...} Turkey 78 0.887951
{Poland, Hungary, Turkey, ...} Czech Republic 64 0.86137
{Poland, Hungary, Turkey, ...} Russia 80 0.839033
{Poland, Hungary, Turkey, ...} Bulgaria 70 0.837675
{Poland, Hungary, Turkey, ...} Romania 69 0.833748
{Poland, Hungary, Turkey, ...} Slovakia 58 0.78181
{Poland, Hungary, Turkey, ...} Slovenia 56 0.70581
{Poland, Hungary, Turkey, ...} Ukraine 53 0.675801
{Poland, Hungary, Turkey, ...} Croatia 57 0.665962
{Poland, Hungary, Turkey, ...} Estonia 46 0.612454
{Poland, Hungary, Turkey, ...} Latvia 45 0.601949
{Poland, Hungary, Turkey, ...} Lithuania 43 0.54036
{Poland, Hungary, Turkey, ...} Albania 47 0.454971
{Poland, Hungary, Turkey, ...} Cyprus 62 0.448388
{Poland, Hungary, Turkey, ...} Macedonia 39 0.42804
{Poland, Hungary, Turkey, ...} Serbia 42 0.425104
{Poland, Hungary, Turkey, ...} Malta 52 0.418058
{Poland, Hungary, Turkey, ...} Israel 57 0.412066
{Poland, Hungary, Turkey, ...} Liechtenstein 20 0.330004
{Poland, Hungary, Turkey, ...} Monaco 32 0.307032
{Poland, Hungary, Turkey, ...} Bosnia and Herzegovina 33 0.277743
{Poland, Hungary, Turkey, ...} San Marino 17 0.153352
{Poland, Hungary, Turkey, ...} Andorra 13 0.148879
{Kazakhstan, Kyrgyzstan, Moldova, ...} Kazakhstan 47 1
{Kazakhstan, Kyrgyzstan, Moldova, ...} Kyrgyzstan 45 0.92458
{Kazakhstan, Kyrgyzstan, Moldova, ...} Moldova 47 0.892857
{Kazakhstan, Kyrgyzstan, Moldova, ...} Azerbaijan 41 0.885575
{Kazakhstan, Kyrgyzstan, Moldova, ...} Uzbekistan 41 0.877246
{Kazakhstan, Kyrgyzstan, Moldova, ...} Tajikistan 35 0.808354
{Kazakhstan, Kyrgyzstan, Moldova, ...} Turkmenistan 35 0.804177
{Kazakhstan, Kyrgyzstan, Moldova, ...} Georgia 42 0.763898
{Kazakhstan, Kyrgyzstan, Moldova, ...} Belarus 38 0.751741
{Kazakhstan, Kyrgyzstan, Moldova, ...} Armenia 37 0.722566
{Kazakhstan, Kyrgyzstan, Moldova, ...} Mongolia 36 0.357543
{Venezuela, Nicaragua, Ecuador, ...} Venezuela 87 1
{Venezuela, Nicaragua, Ecuador, ...} Nicaragua 73 0.966437
{Venezuela, Nicaragua, Ecuador, ...} Ecuador 79 0.947892
{Venezuela, Nicaragua, Ecuador, ...} Costa Rica 74 0.936956
{Venezuela, Nicaragua, Ecuador, ...} Colombia 83 0.936721
{Venezuela, Nicaragua, Ecuador, ...} Bolivia 72 0.909518
{Venezuela, Nicaragua, Ecuador, ...} Guatemala 71 0.909214
{Venezuela, Nicaragua, Ecuador, ...} Panama 72 0.903985
{Venezuela, Nicaragua, Ecuador, ...} Mexico 87 0.903091
{Venezuela, Nicaragua, Ecuador, ...} Peru 79 0.88863
{Venezuela, Nicaragua, Ecuador, ...} Brazil 86 0.872311
{Venezuela, Nicaragua, Ecuador, ...} Argentina 84 0.852174
{Venezuela, Nicaragua, Ecuador, ...} Honduras 67 0.850095
{Venezuela, Nicaragua, Ecuador, ...} El Salvador 64 0.841804
{Venezuela, Nicaragua, Ecuador, ...} Uruguay 72 0.825964
{Venezuela, Nicaragua, Ecuador, ...} Chile 80 0.825028
{Venezuela, Nicaragua, Ecuador, ...} Paraguay 65 0.798743
{Venezuela, Nicaragua, Ecuador, ...} Dominican Republic 67 0.735695
{Venezuela, Nicaragua, Ecuador, ...} Cuba 63 0.536944
{Venezuela, Nicaragua, Ecuador, ...} Haiti 62 0.502131
{Trinidad and Tobago, Barbados, Grenada, ...} Trinidad and Tobago 63 1
{Trinidad and Tobago, Barbados, Grenada, ...} Barbados 56 0.992078
{Trinidad and Tobago, Barbados, Grenada, ...} Grenada 50 0.920647
{Trinidad and Tobago, Barbados, Grenada, ...} Jamaica 63 0.906315
{Trinidad and Tobago, Barbados, Grenada, ...} Belize 51 0.835998
{Trinidad and Tobago, Barbados, Grenada, ...} Guyana 56 0.811344
{Trinidad and Tobago, Barbados, Grenada, ...} Dominica 47 0.808439
{Trinidad and Tobago, Barbados, Grenada, ...} Antigua and Barbuda 43 0.806975
{Trinidad and Tobago, Barbados, Grenada, ...} Saint Lucia 46 0.777826
{Trinidad and Tobago, Barbados, Grenada, ...} Saint Vincent and the Grenadines 41 0.771804
{Trinidad and Tobago, Barbados, Grenada, ...} The Bahamas 49 0.747432
{Trinidad and Tobago, Barbados, Grenada, ...} Suriname 48 0.694702
{Trinidad and Tobago, Barbados, Grenada, ...} Saint Kitts and Nevis 36 0.689177
{Niger, Ivory Coast, Benin, ...} Niger 66 1
{Niger, Ivory Coast, Benin, ...} Ivory Coast 83 0.991021
{Niger, Ivory Coast, Benin, ...} Benin 76 0.985146
{Niger, Ivory Coast, Benin, ...} Burkina Faso 75 0.98505
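To work with these clusters programmatically, the file can be grouped by its Cluster column. The following is a minimal sketch using only the Python standard library. It assumes the file is tab-separated with the Cluster, Value, Frequency and Typicality columns shown above. read_clusters is a hypothetical helper, not part of the Khiops API, and the sample rows are copied from the output above:

```python
import csv
import io
from collections import defaultdict


def read_clusters(text):
    """Group the rows of an extract_clusters output file by cluster label.

    Assumes tab-separated columns: Cluster, Value, Frequency, Typicality.
    Returns a dict mapping each cluster label to a list of
    (value, frequency, typicality) tuples, kept in file order.
    """
    clusters = defaultdict(list)
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        clusters[row["Cluster"]].append(
            (row["Value"], int(row["Frequency"]), float(row["Typicality"]))
        )
    return dict(clusters)


# A few rows copied from the output above, with tabs between the columns
sample = (
    "Cluster\tValue\tFrequency\tTypicality\n"
    "{Germany, France, Denmark, ...}\tGermany\t106\t1\n"
    "{Germany, France, Denmark, ...}\tFrance\t125\t0.968057\n"
    "{United States of America, Canada, Japan, ...}\tUnited States of America\t92\t1\n"
)
clusters = read_clusters(sample)
for label, members in clusters.items():
    # Print each cluster label with its number of countries in the sample
    print(label, len(members))
```

To apply it to the real file, pass open(country_clusters_file, encoding="utf8").read() as text. The encoding is an assumption that matches the one used by peek.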
Exercise
We’ll build a coclustering for the Tokyo2021 dataset, which contains a table called Athletes (from the Tokyo 2021 Kaggle dataset). Each athlete is described by three variables:
- Name: the name of the competing athlete
- Country: the country (or organization) of the athlete
- Discipline: the athlete’s discipline
The idea of this exercise is to build a coclustering of Country and Discipline and see which countries are most alike in terms of the athletes they send to the Olympics.
We start by saving the locations of the dataset dictionary file and data table into variables:
tokyo_kdic = os.path.join("data", "Tokyo2021", "Athletes.kdic")
tokyo_data_file = os.path.join("data", "Tokyo2021", "Athletes.csv")
coclustering_report_file_path_Tokyo2021 = os.path.join(
    "exercises", "Tokyo2021", "CoclusteringResults.khcj"
)
Peek at the contents of the dictionary and data files
print(f"Tokyo2021 dictionary file: {tokyo_kdic}")
print("")
peek(tokyo_kdic, n=15)
print(f"Tokyo data table file: {tokyo_data_file}")
print("")
peek(tokyo_data_file)
Tokyo2021 dictionary file: data/Tokyo2021/Athletes.kdic
Dictionary Athletes
{
Categorical Name;
Categorical Country;
Categorical Discipline;
};
Tokyo data table file: data/Tokyo2021/Athletes.csv
Name,Country,Discipline
AALERUD Katrine,Norway,Cycling Road
ABAD Nestor,Spain,Artistic Gymnastics
ABAGNALE Giovanni,Italy,Rowing
ABALDE Alberto,Spain,Basketball
ABALDE Tamara,Spain,Basketball
ABALO Luc,France,Handball
ABAROA Cesar,Chile,Rowing
ABASS Abobakr,Sudan,Swimming
ABBASALI Hamideh,Islamic Republic of Iran,Karate
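A coclustering of Country and Discipline is built from how often each country and each discipline occur together. As a minimal illustration using only the Python standard library (not Khiops), the code below counts these joint occurrences on the sample rows printed above. count_pairs is a hypothetical helper, not part of the Khiops API:

```python
import csv
import io
from collections import Counter


def count_pairs(csv_text, var1, var2, delimiter=","):
    """Count the occurrences of each (var1, var2) value pair in a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return Counter((record[var1], record[var2]) for record in reader)


# The first rows of Athletes.csv, as printed by peek above
sample = """Name,Country,Discipline
AALERUD Katrine,Norway,Cycling Road
ABAD Nestor,Spain,Artistic Gymnastics
ABAGNALE Giovanni,Italy,Rowing
ABALDE Alberto,Spain,Basketball
ABALDE Tamara,Spain,Basketball
ABALO Luc,France,Handball
ABAROA Cesar,Chile,Rowing
ABASS Abobakr,Sudan,Swimming
ABBASALI Hamideh,Islamic Republic of Iran,Karate
"""

counts = count_pairs(sample, "Country", "Discipline")
print(counts[("Spain", "Basketball")])  # 2: two Spanish basketball players
print(len(counts))  # 8 distinct (Country, Discipline) pairs in 9 rows
```

Unlike the CountriesByOrganization relation, where each country-organization pair appears at most once, a pair here can occur several times because a country sends several athletes in the same discipline. The same function also reads the CountriesByOrganization rows when called with delimiter=";".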
Train the coclustering for the variables Country and Discipline
Do not forget that the field separator is a comma (,).
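The two datasets in this tutorial use different field separators: ";" for CountriesByOrganization and "," for Tokyo2021. When unsure, the standard library's csv.Sniffer can guess the separator from the first lines of a file. The following is a minimal sketch. guess_separator is a hypothetical helper, and the list of candidate separators is an assumption:

```python
import csv


def guess_separator(sample_text, candidates=",;\t"):
    """Guess the field separator of a CSV sample among the candidate characters.

    Raises csv.Error when no candidate is used consistently across lines.
    """
    return csv.Sniffer().sniff(sample_text, delimiters=candidates).delimiter


# First lines of the two data tables, as printed by peek
tokyo_sample = "Name,Country,Discipline\nAALERUD Katrine,Norway,Cycling Road\n"
countries_sample = "Country;Organization\nAfghanistan;AsDB\n"
print(guess_separator(tokyo_sample))  # ,
print(guess_separator(countries_sample))  # ;
```

The guessed character can then be passed as field_separator to kh.train_coclustering. Checking the peek output by eye, as done here, remains the simplest option.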
tokyo_cc_report = kh.train_coclustering(
    tokyo_kdic,
    dictionary_name="Athletes",
    coclustering_variables=["Country", "Discipline"],
    data_table_path=tokyo_data_file,
    coclustering_report_file_path=coclustering_report_file_path_Tokyo2021,
    field_separator=",",
)
You can explore the coclustering with the Khiops Covisualization app:
# To visualize uncomment the line below
# kh.visualize_report(tokyo_cc_report)
Use extract_clusters to extract the country clusters and peek at the resulting file
tokyo_country_clusters_file = os.path.join(
    "exercises", "Tokyo2021", "CountryClusters.txt"
)
kh.extract_clusters(
    tokyo_cc_report,
    cluster_variable="Country",
    clusters_file_path=tokyo_country_clusters_file,
)
peek(tokyo_country_clusters_file, n=200)
Cluster Value Frequency Typicality
{Ghana, Cameroon, Republic of Moldova, ...} Ghana 14 1
{Ghana, Cameroon, Republic of Moldova, ...} Cameroon 11 0.920849
{Ghana, Cameroon, Republic of Moldova, ...} Republic of Moldova 19 0.903523
{Ghana, Cameroon, Republic of Moldova, ...} Kosovo 10 0.889051
{Ghana, Cameroon, Republic of Moldova, ...} Tajikistan 8 0.875786
{Ghana, Cameroon, Republic of Moldova, ...} Guatemala 22 0.849855
{Ghana, Cameroon, Republic of Moldova, ...} Turkmenistan 8 0.844016
{Ghana, Cameroon, Republic of Moldova, ...} Pakistan 10 0.832082
{Ghana, Cameroon, Republic of Moldova, ...} Niger 7 0.802549
{Ghana, Cameroon, Republic of Moldova, ...} Bosnia and Herzegovina 7 0.794854
{Ghana, Cameroon, Republic of Moldova, ...} Haiti 6 0.771924
{Ghana, Cameroon, Republic of Moldova, ...} Madagascar 6 0.771924
{Ghana, Cameroon, Republic of Moldova, ...} Jordan 11 0.764815
{Ghana, Cameroon, Republic of Moldova, ...} Lebanon 6 0.76423
{Ghana, Cameroon, Republic of Moldova, ...} Qatar 14 0.763605
{Ghana, Cameroon, Republic of Moldova, ...} Panama 9 0.743808
{Ghana, Cameroon, Republic of Moldova, ...} Albania 8 0.733188
{Ghana, Cameroon, Republic of Moldova, ...} Gabon 5 0.726799
{Ghana, Cameroon, Republic of Moldova, ...} Mauritius 7 0.724113
{Ghana, Cameroon, Republic of Moldova, ...} Burundi 6 0.717779
{Ghana, Cameroon, Republic of Moldova, ...} Mozambique 8 0.713343
{Ghana, Cameroon, Republic of Moldova, ...} Democratic Republic of the Congo 7 0.712369
{Ghana, Cameroon, Republic of Moldova, ...} Malawi 5 0.711301
{Ghana, Cameroon, Republic of Moldova, ...} Nepal 5 0.711301
{Ghana, Cameroon, Republic of Moldova, ...} Burkina Faso 7 0.709689
{Ghana, Cameroon, Republic of Moldova, ...} Papua New Guinea 7 0.708507
{Ghana, Cameroon, Republic of Moldova, ...} Guyana 7 0.707809
{Ghana, Cameroon, Republic of Moldova, ...} Cape Verde 6 0.706279
{Ghana, Cameroon, Republic of Moldova, ...} North Macedonia 8 0.706252
{Ghana, Cameroon, Republic of Moldova, ...} Tonga 5 0.704952
{Ghana, Cameroon, Republic of Moldova, ...} Benin 7 0.704835
{Ghana, Cameroon, Republic of Moldova, ...} Antigua and Barbuda 6 0.699148
{Ghana, Cameroon, Republic of Moldova, ...} Nicaragua 8 0.698111
{Ghana, Cameroon, Republic of Moldova, ...} Grenada 6 0.690036
{Ghana, Cameroon, Republic of Moldova, ...} Bangladesh 6 0.688843
{Ghana, Cameroon, Republic of Moldova, ...} Malta 6 0.686951
{Ghana, Cameroon, Republic of Moldova, ...} Kuwait 10 0.685139
{Ghana, Cameroon, Republic of Moldova, ...} Seychelles 5 0.682904
{Ghana, Cameroon, Republic of Moldova, ...} Lao People's Democratic Republic 4 0.674209
{Ghana, Cameroon, Republic of Moldova, ...} Sierra Leone 4 0.674209
{Ghana, Cameroon, Republic of Moldova, ...} El Salvador 5 0.66886
{Ghana, Cameroon, Republic of Moldova, ...} Eswatini 4 0.660165
{Ghana, Cameroon, Republic of Moldova, ...} United Arab Emirates 4 0.658142
{Ghana, Cameroon, Republic of Moldova, ...} Uruguay 11 0.655258
{Ghana, Cameroon, Republic of Moldova, ...} Guam 5 0.65335
{Ghana, Cameroon, Republic of Moldova, ...} Guinea 5 0.65335
{Ghana, Cameroon, Republic of Moldova, ...} Afghanistan 5 0.651207
{Ghana, Cameroon, Republic of Moldova, ...} Oman 5 0.651207
{Ghana, Cameroon, Republic of Moldova, ...} Palestine 4 0.6508
{Ghana, Cameroon, Republic of Moldova, ...} Sudan 5 0.646547
{Ghana, Cameroon, Republic of Moldova, ...} Iceland 4 0.644667
{Ghana, Cameroon, Republic of Moldova, ...} Virgin Islands, US 4 0.644667
{Ghana, Cameroon, Republic of Moldova, ...} Monaco 6 0.63624
{Ghana, Cameroon, Republic of Moldova, ...} Djibouti 4 0.628158
{Ghana, Cameroon, Republic of Moldova, ...} Mali 4 0.614115
{Ghana, Cameroon, Republic of Moldova, ...} Aruba 3 0.612258
{Ghana, Cameroon, Republic of Moldova, ...} Saint Lucia 5 0.610116
{Ghana, Cameroon, Republic of Moldova, ...} Cambodia 3 0.607913
{Ghana, Cameroon, Republic of Moldova, ...} Democratic Republic of Timor-Leste 3 0.607913
{Ghana, Cameroon, Republic of Moldova, ...} Federated States of Micronesia 3 0.607913
{Ghana, Cameroon, Republic of Moldova, ...} Palau 3 0.607913
{Ghana, Cameroon, Republic of Moldova, ...} Maldives 4 0.59693
{Ghana, Cameroon, Republic of Moldova, ...} Cyprus 14 0.591335
{Ghana, Cameroon, Republic of Moldova, ...} Rwanda 5 0.589451
{Ghana, Cameroon, Republic of Moldova, ...} American Samoa 5 0.586707
{Ghana, Cameroon, Republic of Moldova, ...} Solomon Islands 3 0.584505
{Ghana, Cameroon, Republic of Moldova, ...} San Marino 4 0.581695
{Ghana, Cameroon, Republic of Moldova, ...} Marshall Islands 2 0.575845
{Ghana, Cameroon, Republic of Moldova, ...} St Vincent and the Grenadines 2 0.575845
{Ghana, Cameroon, Republic of Moldova, ...} Libya 4 0.570547
{Ghana, Cameroon, Republic of Moldova, ...} Kiribati 3 0.569648
{Ghana, Cameroon, Republic of Moldova, ...} Yemen 3 0.569007
{Ghana, Cameroon, Republic of Moldova, ...} Bhutan 3 0.562485
{Ghana, Cameroon, Republic of Moldova, ...} Sri Lanka 9 0.56242
{Ghana, Cameroon, Republic of Moldova, ...} Congo 3 0.561863
{Ghana, Cameroon, Republic of Moldova, ...} Equatorial Guinea 3 0.561863
{Ghana, Cameroon, Republic of Moldova, ...} Virgin Islands, British 3 0.561863
{Ghana, Cameroon, Republic of Moldova, ...} Bolivia 5 0.560008
{Ghana, Cameroon, Republic of Moldova, ...} Cayman Islands 5 0.560008
{Ghana, Cameroon, Republic of Moldova, ...} Chad 3 0.55415
{Ghana, Cameroon, Republic of Moldova, ...} Comoros 3 0.547007
{Ghana, Cameroon, Republic of Moldova, ...} Gambia 3 0.547007
{Ghana, Cameroon, Republic of Moldova, ...} Samoa 8 0.540306
{Ghana, Cameroon, Republic of Moldova, ...} Brunei Darussalam 2 0.532593
{Ghana, Cameroon, Republic of Moldova, ...} Central African Republic 2 0.532593
{Ghana, Cameroon, Republic of Moldova, ...} Zimbabwe 5 0.531839
{Ghana, Cameroon, Republic of Moldova, ...} Cook Islands 6 0.52258
{Ghana, Cameroon, Republic of Moldova, ...} Liberia 3 0.507698
{Ghana, Cameroon, Republic of Moldova, ...} Nauru 2 0.503693
{Ghana, Cameroon, Republic of Moldova, ...} Somalia 2 0.503693
{Ghana, Cameroon, Republic of Moldova, ...} Togo 4 0.493268
{Ghana, Cameroon, Republic of Moldova, ...} Iraq 4 0.489464
{Ghana, Cameroon, Republic of Moldova, ...} Senegal 9 0.481748
{Ghana, Cameroon, Republic of Moldova, ...} Dominica 2 0.481052
{Ghana, Cameroon, Republic of Moldova, ...} Lesotho 2 0.481052
{Ghana, Cameroon, Republic of Moldova, ...} Mauritania 2 0.481052
{Ghana, Cameroon, Republic of Moldova, ...} Saint Kitts and Nevis 2 0.481052
{Ghana, Cameroon, Republic of Moldova, ...} South Sudan 2 0.481052
{Ghana, Cameroon, Republic of Moldova, ...} Tuvalu 2 0.481052
{Ghana, Cameroon, Republic of Moldova, ...} United Republic of Tanzania 2 0.481052
{Ghana, Cameroon, Republic of Moldova, ...} Belize 3 0.472751
{Ghana, Cameroon, Republic of Moldova, ...} Suriname 3 0.460859
{Ghana, Cameroon, Republic of Moldova, ...} Vanuatu 2 0.457327
{Ghana, Cameroon, Republic of Moldova, ...} Guinea-Bissau 4 0.457054
{Ghana, Cameroon, Republic of Moldova, ...} Myanmar 2 0.444802
{Ghana, Cameroon, Republic of Moldova, ...} Paraguay 8 0.42349
{Ghana, Cameroon, Republic of Moldova, ...} Sao Tome and Principe 3 0.413163
{Ghana, Cameroon, Republic of Moldova, ...} Andorra 2 0.40303
{Ghana, Cameroon, Republic of Moldova, ...} Syrian Arab Republic 6 0.364929
{Ghana, Cameroon, Republic of Moldova, ...} Liechtenstein 5 0.352692
{Ghana, Cameroon, Republic of Moldova, ...} Bermuda 2 0.296652
{Ghana, Cameroon, Republic of Moldova, ...} * 0 0
{Poland, Switzerland, Lithuania, ...} Poland 195 1
{Poland, Switzerland, Lithuania, ...} Switzerland 115 0.77853
{Poland, Switzerland, Lithuania, ...} Lithuania 37 0.329817
{Poland, Switzerland, Lithuania, ...} Finland 45 0.32761
{Poland, Switzerland, Lithuania, ...} Estonia 33 0.308739
{Poland, Switzerland, Lithuania, ...} Peru 33 0.241423
{Colombia, Morocco, Ecuador, ...} Colombia 64 1
{Colombia, Morocco, Ecuador, ...} Morocco 48 0.98391
{Colombia, Morocco, Ecuador, ...} Ecuador 46 0.859845
{Colombia, Morocco, Ecuador, ...} Latvia 29 0.422137
{Colombia, Morocco, Ecuador, ...} Philippines 18 0.398873
{Colombia, Morocco, Ecuador, ...} Namibia 11 0.280155
{Colombia, Morocco, Ecuador, ...} Costa Rica 13 0.254137
{Chinese Taipei, Thailand, Indonesia, ...} Chinese Taipei 67 1
{Chinese Taipei, Thailand, Indonesia, ...} Thailand 39 0.55234
{Chinese Taipei, Thailand, Indonesia, ...} Indonesia 26 0.541865
{Chinese Taipei, Thailand, Indonesia, ...} Slovakia 38 0.392312
{Chinese Taipei, Thailand, Indonesia, ...} Vietnam 17 0.319059
{Austria, 'Hong Kong, China', Malaysia, ...} Austria 72 1
{Austria, 'Hong Kong, China', Malaysia, ...} Hong Kong, China 40 0.973696
{Austria, 'Hong Kong, China', Malaysia, ...} Malaysia 29 0.805575
{Austria, 'Hong Kong, China', Malaysia, ...} Singapore 23 0.796014
{Austria, 'Hong Kong, China', Malaysia, ...} Luxembourg 11 0.289615
{Uzbekistan, Azerbaijan, Mongolia, ...} Uzbekistan 63 1
{Uzbekistan, Azerbaijan, Mongolia, ...} Azerbaijan 41 0.98929
{Uzbekistan, Azerbaijan, Mongolia, ...} Mongolia 43 0.89835
{Uzbekistan, Azerbaijan, Mongolia, ...} Georgia 35 0.870435
{Uzbekistan, Azerbaijan, Mongolia, ...} Bulgaria 41 0.807297
{Uzbekistan, Azerbaijan, Mongolia, ...} Kyrgyzstan 16 0.456466
{Uzbekistan, Azerbaijan, Mongolia, ...} Refugee Olympic Team 26 0.453728
{Uzbekistan, Azerbaijan, Mongolia, ...} Armenia 16 0.453643
{Turkey, Tunisia, Venezuela, ...} Turkey 102 1
{Turkey, Tunisia, Venezuela, ...} Tunisia 57 0.844811
{Turkey, Tunisia, Venezuela, ...} Venezuela 43 0.650009
{Turkey, Tunisia, Venezuela, ...} Algeria 41 0.459588
{Ukraine, Belarus, Cuba} Ukraine 152 1
{Ukraine, Belarus, Cuba} Belarus 104 0.788693
{Ukraine, Belarus, Cuba} Cuba 69 0.57311
{Kazakhstan, Croatia, Greece} Kazakhstan 92 1
{Kazakhstan, Croatia, Greece} Croatia 57 0.825466
{Kazakhstan, Croatia, Greece} Greece 75 0.588455
{ROC} ROC 318 1
{Hungary, Montenegro} Hungary 155 1
{Hungary, Montenegro} Montenegro 35 0.504421
{Serbia, Islamic Republic of Iran} Serbia 83 1
{Serbia, Islamic Republic of Iran} Islamic Republic of Iran 66 0.932964
{Nigeria, Slovenia, Puerto Rico} Nigeria 59 1
{Nigeria, Slovenia, Puerto Rico} Slovenia 51 0.759133
{Nigeria, Slovenia, Puerto Rico} Puerto Rico 35 0.668318
{United States of America} United States of America 615 1
{Italy} Italy 356 1
{Dominican Republic, Israel} Dominican Republic 61 1
{Dominican Republic, Israel} Israel 85 0.929927
{Mexico} Mexico 155 1
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Jamaica 60 1
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Ethiopia 42 0.81877
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Trinidad and Tobago 31 0.437854
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Uganda 24 0.42675
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Bahamas 16 0.354908
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Botswana 13 0.237047
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Eritrea 13 0.213476
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Barbados 7 0.173651
{Kenya, Fiji} Kenya 78 1
{Kenya, Fiji} Fiji 28 0.565214
{Norway, Denmark, Portugal, ...} Norway 92 1
{Norway, Denmark, Portugal, ...} Denmark 103 0.826706
{Norway, Denmark, Portugal, ...} Portugal 85 0.595021
{Norway, Denmark, Portugal, ...} Angola 20 0.461764
{Norway, Denmark, Portugal, ...} Bahrain 31 0.435065
{Brazil, Sweden} Brazil 291 1
{Brazil, Sweden} Sweden 129 0.577046
{France} France 377 1
{Great Britain, Ireland} Great Britain 366 1
{Great Britain, Ireland} Ireland 116 0.548124
{New Zealand} New Zealand 202 1
{Spain} Spain 324 1
{South Africa} South Africa 171 1
{Netherlands} Netherlands 274 1
{Germany} Germany 400 1
{Belgium, Czech Republic} Belgium 125 1
{Belgium, Czech Republic} Czech Republic 117 0.799098
{India} India 117 1
{Japan} Japan 586 1
{Argentina} Argentina 180 1
{Republic of Korea} Republic of Korea 223 1
{Egypt} Egypt 133 1
{Australia} Australia 470 1
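To answer the exercise question (which countries are most alike), it helps to look up the cluster of a given country and list the other countries grouped with it. The following is a minimal sketch. It assumes the clusters file is tab-separated with the Cluster, Value, Frequency and Typicality columns shown above. cluster_mates is a hypothetical helper, and the sample rows are copied from the output above:

```python
import csv
import io


def cluster_mates(text, value):
    """Return the other values grouped with `value` in an extract_clusters file.

    Assumes tab-separated columns: Cluster, Value, Frequency, Typicality.
    Returns None when `value` does not appear in the file.
    """
    rows = list(
        csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    )
    # Find the cluster label of the requested value, if any
    label = next((r["Cluster"] for r in rows if r["Value"] == value), None)
    if label is None:
        return None
    return [r["Value"] for r in rows if r["Cluster"] == label and r["Value"] != value]


# A few rows copied from the output above, with tabs between the columns
sample = (
    "Cluster\tValue\tFrequency\tTypicality\n"
    "{Ukraine, Belarus, Cuba}\tUkraine\t152\t1\n"
    "{Ukraine, Belarus, Cuba}\tBelarus\t104\t0.788693\n"
    "{Ukraine, Belarus, Cuba}\tCuba\t69\t0.57311\n"
    "{ROC}\tROC\t318\t1\n"
)
print(cluster_mates(sample, "Cuba"))  # ['Ukraine', 'Belarus']
print(cluster_mates(sample, "ROC"))  # []
```

An empty list means the value forms a cluster on its own, as ROC, United States of America or Japan do in the output above. On the real file, pass open(tokyo_country_clusters_file, encoding="utf8").read() as text.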
Use extract_clusters to extract the discipline clusters and peek at the resulting file
tokyo_discipline_clusters_file = os.path.join(
    "exercises", "Tokyo2021", "DisciplineClusters.txt"
)
kh.extract_clusters(
    tokyo_cc_report,
    cluster_variable="Discipline",
    clusters_file_path=tokyo_discipline_clusters_file,
)
peek(tokyo_discipline_clusters_file, n=200)
Cluster Value Frequency Typicality
{Handball} Handball 343 1
{Hockey} Hockey 406 1
{Football} Football 567 1
{Rugby Sevens} Rugby Sevens 283 1
{Athletics} Athletics 2068 1
{Boxing, Weightlifting, Taekwondo} Boxing 270 1
{Boxing, Weightlifting, Taekwondo} Weightlifting 187 0.721619
{Boxing, Weightlifting, Taekwondo} Taekwondo 123 0.536492
{Judo} Judo 373 1
{Swimming} Swimming 743 1
{Rowing, Cycling Track} Rowing 496 1
{Rowing, Cycling Track} Cycling Track 208 0.496133
{Equestrian, Triathlon, Cycling Mountain Bike, ...} Equestrian 237 1
{Equestrian, Triathlon, Cycling Mountain Bike, ...} Triathlon 106 0.559253
{Equestrian, Triathlon, Cycling Mountain Bike, ...} Cycling Mountain Bike 74 0.515497
{Equestrian, Triathlon, Cycling Mountain Bike, ...} Beach Volleyball 90 0.486439
{Equestrian, Triathlon, Cycling Mountain Bike, ...} Skateboarding 77 0.41841
{Equestrian, Triathlon, Cycling Mountain Bike, ...} Cycling BMX Racing 43 0.355495
{Equestrian, Triathlon, Cycling Mountain Bike, ...} Surfing 38 0.313644
{Cycling Road, Golf, Canoe Slalom, ...} Cycling Road 190 1
{Cycling Road, Golf, Canoe Slalom, ...} Golf 115 0.605512
{Cycling Road, Golf, Canoe Slalom, ...} Canoe Slalom 78 0.474946
{Cycling Road, Golf, Canoe Slalom, ...} Marathon Swimming 49 0.269378
{Sailing} Sailing 336 1
{Shooting, Archery} Shooting 342 1
{Shooting, Archery} Archery 122 0.409113
{Badminton, Table Tennis} Badminton 164 1
{Badminton, Table Tennis} Table Tennis 164 0.920471
{Tennis, Artistic Gymnastics, Modern Pentathlon, ...} Tennis 178 1
{Tennis, Artistic Gymnastics, Modern Pentathlon, ...} Artistic Gymnastics 187 0.939428
{Tennis, Artistic Gymnastics, Modern Pentathlon, ...} Modern Pentathlon 69 0.515132
{Tennis, Artistic Gymnastics, Modern Pentathlon, ...} Cycling BMX Freestyle 19 0.151931
{Tennis, Artistic Gymnastics, Modern Pentathlon, ...} * 0 0
{Diving, Artistic Swimming, Trampoline Gymnastics, ...} Diving 133 1
{Diving, Artistic Swimming, Trampoline Gymnastics, ...} Artistic Swimming 98 0.874855
{Diving, Artistic Swimming, Trampoline Gymnastics, ...} Trampoline Gymnastics 31 0.391158
{Diving, Artistic Swimming, Trampoline Gymnastics, ...} Sport Climbing 37 0.28726
{Canoe Sprint} Canoe Sprint 236 1
{Baseball/Softball} Baseball/Softball 220 1
{Water Polo} Water Polo 269 1
{Basketball} Basketball 280 1
{Wrestling, Rhythmic Gymnastics, Karate} Wrestling 279 1
{Wrestling, Rhythmic Gymnastics, Karate} Rhythmic Gymnastics 95 0.344437
{Wrestling, Rhythmic Gymnastics, Karate} Karate 77 0.24672
{Volleyball} Volleyball 274 1
{Fencing, 3x3 Basketball} Fencing 249 1
{Fencing, 3x3 Basketball} 3x3 Basketball 62 0.272875
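In the outputs above, the values of each cluster appear sorted by decreasing Typicality, and the first value of each cluster has Typicality 1. The following is a minimal sketch that keeps only the values at or above a chosen typicality threshold, for example to summarize each discipline cluster by its most typical members. typical_values and the 0.3 threshold are illustrative assumptions. The file is again assumed to be tab-separated:

```python
import csv
import io


def typical_values(text, min_typicality=0.5):
    """Map each cluster label to its values whose typicality is >= min_typicality.

    Assumes tab-separated columns: Cluster, Value, Frequency, Typicality.
    """
    result = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if float(row["Typicality"]) >= min_typicality:
            result.setdefault(row["Cluster"], []).append(row["Value"])
    return result


# A few rows copied from the output above, with tabs between the columns
sample = (
    "Cluster\tValue\tFrequency\tTypicality\n"
    "{Wrestling, Rhythmic Gymnastics, Karate}\tWrestling\t279\t1\n"
    "{Wrestling, Rhythmic Gymnastics, Karate}\tRhythmic Gymnastics\t95\t0.344437\n"
    "{Wrestling, Rhythmic Gymnastics, Karate}\tKarate\t77\t0.24672\n"
    "{Fencing, 3x3 Basketball}\tFencing\t249\t1\n"
    "{Fencing, 3x3 Basketball}\t3x3 Basketball\t62\t0.272875\n"
)
for label, values in typical_values(sample, min_typicality=0.3).items():
    print(label, "->", ", ".join(values))
```

Raising the threshold to 1 keeps only the most typical value of each cluster. Lowering it to 0 keeps every value.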