About Distance Measurement and Hierarchical Cluster Analysis
This article examines distance measurements and hierarchical cluster analysis.
Hello!
Today, we will learn about distance measurements and hierarchical cluster analysis.
Distance Measure
A distance measure is a key ingredient of cluster analysis: it quantifies how similar or dissimilar two data points are.
Clustering algorithms calculate the distances between data points and use them to decide which points, and which clusters, are similar enough to be grouped together.
Typical measures include Euclidean distance, Manhattan distance, and cosine similarity.
Euclidean distance
A method of calculating the straight-line distance between data points, suitable for continuous variables.
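As a minimal sketch of the straight-line calculation, the function below takes the square root of the summed squared differences along each axis. The name euclidean_distance and the sample points are chosen here only for illustration.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square the difference along each axis, sum them, and take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

The sample points form a 3-4-5 right triangle, so the straight-line distance is the hypotenuse.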
Manhattan distance
A method that sums the absolute differences along each axis (the horizontal and vertical distances in two dimensions), suitable for continuous variables.
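A small sketch of the same idea in code: instead of a straight line, the distance is the total of the moves along each axis, as if walking a city grid. The name manhattan_distance and the sample points are illustrative assumptions.

```python
def manhattan_distance(p, q):
    """Sum of the absolute differences along each axis."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan_distance((0, 0), (3, 4)))  # 7
```

For the same pair of points the straight-line distance is 5, so the two measures can disagree about how far apart the data is.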
Cosine similarity
It is a method of measuring similarity using the angle between vectors, which is suitable for text data or sparse data.
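The angle-based idea can be sketched as the dot product of two vectors divided by the product of their lengths. The function name and the tiny word-count vectors below are made up for illustration; real text data would have far more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical word counts for three short documents.
doc_a = [2, 1, 0, 0]
doc_b = [4, 2, 0, 0]  # same proportions as doc_a, twice as long
doc_c = [0, 0, 3, 1]  # shares no words with doc_a

print(cosine_similarity(doc_a, doc_b))  # very close to 1
print(cosine_similarity(doc_a, doc_c))  # 0.0
```

Because only the direction matters, a long document and a short one with the same word proportions count as nearly identical, which is one reason the measure fits text and sparse data.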
Conclusion
It is important to select a distance measure that fits the characteristics and purpose of the data, as the shape and outcome of the cluster can vary depending on the distance measure selected.
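To make this concrete, here is a small sketch in which the same query point gets a different nearest neighbor under each measure. The points and helper names are invented for the example, and cosine_distance is taken here as 1 minus the cosine similarity.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine_distance(p, q):
    # Assumption for this sketch: distance = 1 - cosine similarity.
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norms

query = (1, 1)
candidates = {"A": (5, 3), "B": (1, 6), "C": (9, 9)}

def nearest(measure):
    """Name of the candidate closest to the query under the given measure."""
    return min(candidates, key=lambda name: measure(query, candidates[name]))

for measure in (euclidean, manhattan, cosine_distance):
    print(measure.__name__, "->", nearest(measure))
```

With these points, Euclidean distance picks A, Manhattan distance picks B, and cosine distance picks C, so a clustering algorithm built on each measure would group the data differently.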
Hierarchical clustering
Hierarchical cluster analysis organizes data into a hierarchy of nested clusters by merging or splitting groups step by step.
There are two approaches. The agglomerative (bottom-up) approach starts with every data point as its own cluster and repeatedly merges the two closest clusters according to the distance measure. The divisive (top-down) approach starts with all the data in one cluster and repeatedly splits it into smaller clusters.
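Grouping data sequentially by distance, often called agglomerative clustering, can be sketched in a few lines. This is a simplified illustration, not an efficient implementation: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest points), neither of which is prescribed above, and the function names and sample points are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(cluster_a, cluster_b, points):
    """Cluster distance = distance between the closest pair of member points."""
    return min(euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b)

def agglomerative(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(points))]  # every point starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = single_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merged = tuple(sorted(clusters[x] + clusters[y]))
        history.append((clusters[x], clusters[y], d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return history

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for left, right, dist in agglomerative(points):
    print(left, "+", right, "at distance", round(dist, 3))
```

Each entry in the history records which two clusters were merged (as tuples of point indices) and at what distance. For the sample points, the two tight pairs merge first, then those pairs merge with each other, and the isolated point joins last. These merge distances are the heights at which a dendrogram would draw its branches.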
A dendrogram is a visual representation of the results of hierarchical cluster analysis and is used to identify the similarity of data and the structure between clusters.
Hierarchical cluster analysis makes the similarities between clusters visible and is useful for understanding the hierarchical structure of the data.
In closing
Distance measures and hierarchical cluster analysis are both central to measuring similarity between data and forming clusters from it, so it is important to choose them to suit the characteristics and purpose of the data.
Thank you!