

cluster analysis

  • The task of assigning objects to recognizable groups called clusters, according to various measurements. These clusters commonly show correlation between different attributes. The notion of a cluster cannot be defined precisely, and many different algorithms are therefore used in cluster analysis.

  • A method for identifying data items that closely resemble one another, assembling them into clusters. A number of characteristics are measured for each of several items (which might be, for example, people, plants, machines, etc.). The process of formation of the clusters is often represented using a dendrogram. The most commonly used methods are the agglomerative clustering methods.

  • Any statistical technique for grouping a set of units into clusters of similar units on the basis of observed qualitative and/or quantitative measurements, usually on several variables. Cluster analysis aims to fulfil simultaneously the conditions that units in the same cluster should be similar, and that units in different clusters should be dissimilar. It is not usually possible to satisfy both conditions fully, and no single method can be recommended as best for all sets of data. Another desirable property of clusters is that some variables should be constant for all units within a cluster, which makes it possible to provide a simple scheme for identifying units in terms of clusters.

    Most cluster analysis methods require a similarity or distance measure to be defined between each pair of units, so that the units similar to a given unit may be identified. Similarity measures have been proposed for both quantitative (continuous) variables and qualitative (discrete) variables, using a weighted mean of similarity scores over all variables considered. The term distance comes from a geometric representation of data as points in multidimensional space: small distances correspond to large similarities.
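    The two notions in the paragraph above can be sketched in a few lines. This is an illustrative sketch only: the scoring rule (one minus the range-scaled difference for quantitative variables, exact match for qualitative ones) follows the style of Gower's general similarity coefficient, and the example units, ranges, and weights are invented for the illustration.

```python
import math

def similarity(a, b, kinds, ranges, weights):
    """Weighted mean of per-variable similarity scores between units a and b.

    kinds[i] is "quant" or "qual"; ranges[i] is the observed range of a
    quantitative variable (ignored for qualitative ones). These parameter
    names are assumptions made for this sketch.
    """
    total = 0.0
    for x, y, kind, rng, w in zip(a, b, kinds, ranges, weights):
        if kind == "quant":
            score = 1.0 - abs(x - y) / rng   # 1 when equal, 0 at opposite ends of the range
        else:
            score = 1.0 if x == y else 0.0   # simple matching for discrete values
        total += w * score
    return total / sum(weights)

def distance(a, b):
    """Euclidean distance between two points in multidimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical units described by (height in cm, colour).
kinds = ["quant", "qual"]
ranges = [20.0, None]        # heights in this toy data span 10..30
weights = [1.0, 1.0]
a, b, c = (10.0, "red"), (20.0, "red"), (30.0, "blue")

sim_ab = similarity(a, b, kinds, ranges, weights)
sim_ac = similarity(a, c, kinds, ranges, weights)
```

    Here `a` and `b` share a colour and differ by half the height range, so they score as more similar than `a` and `c`, which differ on both variables.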

    Hierarchical cluster analysis methods form clusters in sequence, either by amalgamation of units into clusters and clusters into larger clusters, or by subdivision of clusters into smaller clusters and single units. Whichever direction is chosen, the results can be represented by a dendrogram or family tree in which the clusters at one level are nested within clusters at all higher levels.
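    The amalgamation direction can be illustrated with a minimal sketch. It assumes single linkage (the distance between two clusters is the smallest distance between any pair of their units), which is only one of several possible rules; the function name and the toy points are hypothetical. The list of merges it returns is the information a dendrogram displays.

```python
import math

def agglomerate(points):
    """Single-linkage agglomerative clustering (illustrative sketch).

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]   # start with one unit per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under single linkage.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Toy data: two tight pairs that are far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
merges = agglomerate(points)
```

    With these points, units 0 and 1 merge first, then units 2 and 3, and finally the two pairs are joined, so each cluster is nested inside the one formed at the next level.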

    Nonhierarchical cluster analysis methods allocate units to a fixed number of clusters so as to optimize some criterion representing a desired property of clusters. Such methods may be iterative, involving transfer of units between clusters until no further improvement can be achieved. The solution for a given number of clusters need bear little relation to the solution for a larger or smaller number.
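    One widely used iterative method of this kind is k-means, sketched below. The criterion it reduces is the within-cluster sum of squared distances to the cluster centre. Seeding the clusters with the first k units is an assumption made to keep the sketch deterministic; the function name and data are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Basic k-means: reassign units to the nearest centre until nothing changes."""
    centres = [points[i] for i in range(k)]   # assumption: first k units seed the clusters
    labels = None
    for _ in range(max_iter):
        # Allocate every unit to its nearest cluster centre.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:              # no unit transferred: stop
            break
        labels = new_labels
        # Move each centre to the mean of the units now allocated to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres

points = [(0.0, 0.0), (5.0, 0.0), (0.0, 1.0), (5.0, 2.0)]
labels, centres = kmeans(points, k=2)
```

    The result depends on the starting centres and on k: such a procedure stops at the first allocation it cannot improve, which need not be the best possible one, and rerunning it with a different k starts afresh rather than refining the previous solution.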

    Cluster analysis is often used in conjunction with other methods of multivariate analysis to describe the structure of a complex set of data.

Electronics and Electrical Engineering
  • Techniques for grouping a collection of entities into clusters, so that entities in the same cluster are more similar to each other on some dimension of interest than they are to those in different clusters. Cluster analysis is often used for initial exploration of large datasets in order to determine the most appropriate notion of similarity or closeness for use in further analysis.

Geology and Earth Sciences
  • In statistics, the classification of observations into subsets based on a criterion of similarity.

  • The assignment of a set of objects into groups so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters. Cluster analysis is used when the researcher does not know the number of groups in advance but wishes to establish groups and then analyse group membership. For example, if the term ‘geomorphological processes’ is entered into a search engine, there will be nearly 400,000 results. Careful study reveals that these could be clustered into (at least): fluvial, aeolian, hillslope, glacial, tectonic, igneous, and biological processes. Kent (2006, PPG 30, 3) reviews recent changes in the use of cluster analysis, and Mohseni Saravi et al. (2010, PPG 34, 2) use cluster analysis to delimit homogeneous hydrological regions. For similarity analysis/minimum variance, see Ward (1963) Am. Stat. Ass. J. 5; for unweighted pair-group average, see Williams et al. (1966) J. Ecol. 54. For ordination techniques/detrended correspondence analysis, see M. O. Hill (1979); for non-metric multidimensional scaling, see T. Cox and M. Cox (2000). The CANOCO software package performs most of these multivariate techniques.

  • The general name for a number of different methods for grouping objects that have similar characteristics into sets or ‘clusters’. Cluster analysis is used to explore data by sorting different objects into sets so that the degree of association between two objects is maximal if they belong to the same set. It can be used to discover structures in data but provides no explanation for the structure.




