Install the dependencies required for the matplotlib GUI frontend, then install the remaining pip packages for this project
```bash
sudo apt install python3-tk
python3.9 -m pip install -r requirements.txt
```
A CLI for running the K-means clustering algorithm on a set of 2D data points.
Data can be passed on the command line, read from a file, or randomly generated for testing.
```bash
python3.9 k-means.py -h
usage: k-means.py [-h] [--data [X,Y ...]] [--seeds [X,Y ...]] [--silent] [--verbose] [--random] [--radius [RADIUS]]
                  [--lock-radius] [--file [FILE_PATH]]
                  [CLUSTER_COUNT] [CENTROID_SHIFT] [LOOP_COUNT]

K-means clustering program for clustering data read from a file, terminal, or randomly generated

positional arguments:
  CLUSTER_COUNT         Total number of desired clusters
                        (default: '2')
  CENTROID_SHIFT        Centroid shift threshold. If cluster centroids move less-than this value, clustering is finished
                        (default: '1.0')
  LOOP_COUNT            Maximum count of loops to perform clustering
                        (default: '3')

optional arguments:
  -h, --help            show this help message and exit
  --data [X,Y ...], -d [X,Y ...]
                        A list of data points separated by spaces as: x,y x,y x,y ...
                        (default: '[(1.0, 2.0), (2.0, 3.0), (2.0, 2.0), (5.0, 6.0), (6.0, 7.0), (6.0, 8.0), (7.0, 11.0), (1.0, 1.0)]')
  --seeds [X,Y ...], --seed [X,Y ...], -s [X,Y ...]
                        A list of seed points separated by spaces as: x,y x,y x,y ...
                        Number of seeds provided must match CLUSTER_COUNT, or else CLUSTER_COUNT will be overriden.
  --silent              When this flag is set, scatter plot visualizations will not be shown
                        (default: 'False')
  --verbose, -v         When this flag is set, cluster members will be shown in output
                        (default: 'False')
  --random, -r          When this flag is set, data will be randomly generated
                        (default: 'False')
  --radius [RADIUS]     Initial radius to use for clusters
                        (default: 'None')
  --lock-radius, -l     When this flag is set, centroid radius will not be recalculated
                        (default: 'False')
  --file [FILE_PATH], -f [FILE_PATH]
                        Optionally provide file for data to be read from. Each point must be on it's own line with format x,y
```
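The `--file` option expects one `x,y` point per line. As a minimal sketch of reading that format, assuming nothing about how `k-means.py` itself parses the file, a reader could look like the following. `parse_points` is a hypothetical helper and is not part of this project; skipping blank lines is also an assumption of the sketch.

```python
def parse_points(text):
    """Parse one `x,y` point per line into a list of (float, float) tuples."""
    points = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        parts = line.split(",")
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected 'x,y', got {line!r}")
        # float() tolerates surrounding whitespace, so "7 , 11" also parses
        points.append((float(parts[0]), float(parts[1])))
    return points


example = "1,2\n2,3\n5.5,6\n"
print(parse_points(example))
```

A file written this way can be passed to the program with `--file`, as the help text describes.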
Run the k-means clustering program on points read from a file, with scatter plots suppressed by `--silent`
```bash
python3.9 k-means.py --file ./input.txt --silent
Finding K-means clusters for given data [(1.0, 2.0), (2.0, 3.0), (2.0, 2.0), (5.0, 6.0), (6.0, 7.0), (6.0, 8.0), (7.0, 11.0), (1.0, 1.0), (5.0, 5.0), (10.0, 10.0), (15.0, 15.0), (25.0, 25.0), (20.0, 20.0), (21.0, 21.0), (22.0, 22.0)]
Using 2 clusters, 1.0 max centroid shift, and 3 iterations
Clustering iteration 0
Updating cluster membership using cluster seeds, radius:
((5.0000, 5.0000), 10.6066)
((20.0000, 20.0000), 10.6066)
Outliers present: set()
Updated clusters ([(5.0, 5.0), (20.0, 20.0)]) with new centroids [(4.5, 5.5), (20.6, 20.6)]
New centroids [(4.5, 5.5), (20.6, 20.6)] shifted [0.7071, 0.8485] respectively
Showing final cluster result...
Initial cluster at (5.0000, 5.0000) moved to (4.5000, 5.5000)
Total shift: 0.7071
Final radius: 11.0365
Initial radius: 10.6066
Initial cluster at (20.0000, 20.0000) moved to (20.6000, 20.6000)
Total shift: 0.8485
Final radius: 11.0365
Initial radius: 10.6066
Stopping...
Cluster centroids have not shifted at least 1.0, clusters are stable
```
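The centroids and shifts in the log above can be checked by hand. The following is a minimal sketch of one standard K-means iteration (assign each point to its nearest seed, then average each cluster). It is not the code from `k-means.py`: the function names are hypothetical, the seeds are copied from the log, and empty clusters are not handled.

```python
import math

# Data and seeds copied from the log output above
data = [(1.0, 2.0), (2.0, 3.0), (2.0, 2.0), (5.0, 6.0), (6.0, 7.0),
        (6.0, 8.0), (7.0, 11.0), (1.0, 1.0), (5.0, 5.0), (10.0, 10.0),
        (15.0, 15.0), (25.0, 25.0), (20.0, 20.0), (21.0, 21.0), (22.0, 22.0)]
seeds = [(5.0, 5.0), (20.0, 20.0)]
threshold = 1.0  # the CENTROID_SHIFT default


def assign(points, centroids):
    """Group points by the index of their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def mean(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


clusters = assign(data, seeds)
centroids = [mean(c) for c in clusters]
shifts = [math.dist(s, c) for s, c in zip(seeds, centroids)]
# Clustering is considered finished when every centroid moved less than the threshold
stable = all(shift < threshold for shift in shifts)

print(centroids)
print([round(s, 4) for s in shifts])
print(stable)
```

Under these assumptions the sketch gives centroids `(4.5, 5.5)` and `(20.6, 20.6)` with shifts `0.7071` and `0.8485`, matching the log. Both shifts are below the `1.0` threshold, which is why the run stops after a single iteration. The radii in the log are also consistent with half the distance between the two centroids (half of 21.2132 is 10.6066), but that is an observation about this output rather than documented behavior.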
Running the k-means clustering program on randomly generated data displays a scatter plot like the following
```bash
python3.9 k-means.py --random
# Output removed for GUI example
```
![](screenshot.png)