About the Author
Alboukadel Kassambara holds a PhD in Bioinformatics and Cancer Biology. He has worked for many years on genomic data analysis and visualization. He created GenomicScape (www.genomicscape.com), an easy-to-use web tool for gene expression data analysis and visualization. He also developed the website STHDA (Statistical Tools for High-throughput Data Analysis, www.sthda.com/english), which contains many tutorials on data analysis and visualization using R and its packages. He is the author of the R packages survminer (for analyzing and drawing survival curves), ggcorrplot (for drawing correlation matrices using ggplot2) and factoextra (for easily extracting and visualizing the results of multivariate analyses such as PCA, CA, MCA and clustering). You can learn more about these packages at: http://www.sthda.com/english/wiki/r-packages. Recently, he published two books on data visualization: 1) Guide to Create Beautiful Graphics in R (at: https://goo.gl/vJ0OYb); 2) Complete Guide to 3D Plots in R (at: https://goo.gl/v5gwl0).
Published on 2024-12-23
Practical Guide to Cluster Analysis in R: Unsupervised Machine Learning (Multivariate Analysis) (Vol 2024 pdf epub mobi e-book
Book tags: data analysis, statistics, R, machine learning, data mining, mathematics
Although there are several good books on unsupervised machine learning, we felt that many of them are too theoretical. This book provides a practical guide to cluster analysis, elegant visualization and interpretation. It contains 5 parts. Part I provides a quick introduction to R and presents the required R packages, as well as data formats and dissimilarity measures for cluster analysis and visualization. Part II covers partitioning clustering methods, which subdivide a data set into k groups, where k is the number of groups specified in advance by the analyst. Partitioning clustering approaches include the K-means, K-Medoids (PAM) and CLARA algorithms. In Part III, we consider hierarchical clustering, which is an alternative approach to partitioning clustering. The result of hierarchical clustering is a tree-based representation of the objects called a dendrogram. In this part, we describe how to compute, visualize, interpret and compare dendrograms. Part IV describes clustering validation and evaluation strategies, which consist of measuring the goodness of clustering results. The chapters in this part cover assessing clustering tendency, determining the optimal number of clusters, cluster validation statistics, choosing the best clustering algorithm, and computing p-values for hierarchical clustering. Part V presents advanced clustering methods, including hierarchical k-means clustering, fuzzy clustering, model-based clustering and density-based clustering.
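As a rough illustration of the two main families described above (partitioning into a pre-specified k groups, and hierarchical clustering summarized as a dendrogram), here is a minimal sketch using only base R functions. It is not taken from the book; the built-in USArrests data set, the choice of k = 4, Ward linkage and the seed are all assumptions made for the example.

```r
# Illustrative sketch only; data set, k, and linkage method are assumptions.
data("USArrests")
df <- scale(USArrests)   # standardize so no single variable dominates distances

# Partitioning clustering: k is fixed by the analyst before running the algorithm
set.seed(123)            # k-means starts from random centers
km <- kmeans(df, centers = 4, nstart = 25)
table(km$cluster)        # number of observations per cluster

# Hierarchical clustering: build a tree from pairwise dissimilarities
d  <- dist(df, method = "euclidean")
hc <- hclust(d, method = "ward.D2")
plot(hc, cex = 0.6)      # draw the dendrogram

# Cut the tree into the same number of groups to compare the two results
groups <- cutree(hc, k = 4)
table(kmeans = km$cluster, hclust = groups)
```

The final cross-tabulation shows how far the two approaches agree on this data; cluster labels are arbitrary, so agreement appears as one dominant cell per row rather than as a diagonal.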
Practical. Clear. The explanations are not very detailed, but they are enough to get started.
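This book is really excellent. It concisely covers the commonly used clustering methods, along with how to evaluate them, their strengths and weaknesses, and the scenarios they suit. It also introduces some interesting packages. Once again, praise for ggplot2, and for packages like factoextra that directly produce ggplot2 objects; when I saw +geom_violin(), I could not help marveling at how great the R community really is!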
https://github.com/kassambara/factoextra