fviz_pca: Quick Principal Component Analysis data visualization - R software and data mining - Easy Guides - Wiki (2024)

    • Description
    • Install and load factoextra
    • Usage
    • Arguments
    • Value
    • Examples
      • Principal component analysis
      • fviz_pca_ind(): Graph of individuals
      • fviz_pca_var(): Graph of variables
      • fviz_pca_biplot(): Biplot of individuals and variables
    • Infos

    Draw the graph of individuals/variables from the output of Principal Component Analysis (PCA).

    The following functions from the factoextra package are used:

    • fviz_pca_ind(): Graph of individuals
    • fviz_pca_var(): Graph of variables
    • fviz_pca_biplot() (or fviz_pca()): Biplot of individuals and variables

    The devtools package is required for the installation because factoextra is hosted on GitHub.

    # install.packages("devtools")
    library("devtools")
    install_github("kassambara/factoextra")

    Load factoextra:

    library("factoextra")

    # Graph of individuals
    fviz_pca_ind(X, axes = c(1, 2), geom = c("point", "text"),
      label = "all", invisible = "none", labelsize = 4, pointsize = 2,
      habillage = "none", addEllipses = FALSE, ellipse.level = 0.95,
      col.ind = "black", col.ind.sup = "blue", alpha.ind = 1,
      select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
      jitter = list(what = "label", width = NULL, height = NULL), ...)

    # Graph of variables
    fviz_pca_var(X, axes = c(1, 2), geom = c("arrow", "text"),
      label = "all", invisible = "none", labelsize = 4,
      col.var = "black", alpha.var = 1, col.quanti.sup = "blue",
      col.circle = "grey70",
      select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
      jitter = list(what = "label", width = NULL, height = NULL))

    # Biplot of individuals and variables
    fviz_pca_biplot(X, axes = c(1, 2), geom = c("point", "text"),
      label = "all", invisible = "none", labelsize = 4, pointsize = 2,
      habillage = "none", addEllipses = FALSE, ellipse.level = 0.95,
      col.ind = "black", col.ind.sup = "blue", alpha.ind = 1,
      col.var = "steelblue", alpha.var = 1, col.quanti.sup = "blue",
      col.circle = "grey70",
      select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
      select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
      jitter = list(what = "label", width = NULL, height = NULL), ...)

    # An alias of fviz_pca_biplot()
    fviz_pca(X, ...)

    Argument: Description

    X: an object of class PCA [FactoMineR]; prcomp and princomp [stats]; dudi and pca [ade4].

    axes: a numeric vector of length 2 specifying the dimensions to be plotted.

    geom: a text specifying the geometry to be used for the graph. Allowed values are combinations of c("point", "arrow", "text"). Use "point" to show only points; "text" to show only labels; c("point", "text") or c("arrow", "text") to show both types.

    label: a text specifying the elements to be labelled. Default value is "all". Allowed values are "none" or a combination of c("ind", "ind.sup", "quali", "var", "quanti.sup"). "ind" labels only active individuals. "ind.sup" is for supplementary individuals. "quali" is for supplementary qualitative variables. "var" is for active variables. "quanti.sup" is for quantitative supplementary variables.

    invisible: a text specifying the elements to be hidden on the plot. Default value is "none". Allowed values are combinations of c("ind", "ind.sup", "quali", "var", "quanti.sup").

    labelsize: font size for the labels.

    pointsize: the size of points.

    habillage: an optional factor variable for coloring the observations by groups. Default value is "none". If X is a PCA object from the FactoMineR package, habillage can also specify the supplementary qualitative variable (by its index or name) to be used for coloring individuals by groups (see ?PCA in FactoMineR).

    addEllipses: logical value. If TRUE, draws ellipses around the individuals when habillage != "none".

    ellipse.level: the size of the concentration ellipse in normal probability.

    col.ind, col.var: colors for individuals and variables, respectively. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the colors for individuals/variables are automatically controlled by their qualities of representation ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use automatic coloring (by cos2, contrib, etc.), make sure that habillage = "none".

    col.ind.sup: color for supplementary individuals.

    alpha.ind, alpha.var: control the transparency of individual and variable colors, respectively. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the transparency of the individual/variable colors is automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use this, make sure that habillage = "none".

    select.ind, select.var: a selection of individuals/variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:

    • name: a character vector containing the individuals/variables to be drawn.
    • cos2: if cos2 is in [0, 1], e.g. 0.6, then individuals/variables with a cos2 > 0.6 are drawn. If cos2 > 1, e.g. 5, then the top 5 individuals/variables with the highest cos2 are drawn.
    • contrib: if contrib > 1, e.g. 5, then the top 5 individuals/variables with the highest contributions are drawn.

    jitter: a parameter used to jitter the points in order to reduce overplotting. It is a list containing the objects what, width and height (e.g. jitter = list(what, width, height)). what: the element to be jittered; possible values are "point" or "p"; "label" or "l"; "both" or "b". width: degree of jitter in the x direction (e.g. 0.2). height: degree of jitter in the y direction (e.g. 0.2).

    col.quanti.sup: a color for the quantitative supplementary variables.

    col.circle: a color for the correlation circle.

    ...: arguments to be passed to the function fviz_pca_biplot().

    These functions return a ggplot2 plot.
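Because the result is a ggplot2 object, it can be stored in a variable, extended with further ggplot2 layers and written to disk. The following is a minimal sketch, assuming factoextra and ggplot2 are installed; the object name `p`, the plot title and the output path are illustrative choices, not part of the original page.

```r
library(factoextra)
library(ggplot2)

# PCA on the iris measurements (Species column dropped)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# The returned object is a ggplot, so it can be stored and extended
p <- fviz_pca_ind(res.pca, geom = "point") +
  labs(title = "Iris PCA: individuals") +
  theme_minimal()

# Illustrative output path (an assumption; adjust as needed)
out_file <- file.path(tempdir(), "iris_pca_ind.png")
ggsave(out_file, plot = p, width = 6, height = 5, dpi = 150)
```

Storing the plot first keeps the PCA call and the styling separate, which is the same pattern the examples use when they add ellipses and then swap color palettes.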

    Principal component analysis

    A principal component analysis (PCA) is performed using the built-in R function prcomp() and iris data:

    data(iris)
    head(iris)

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa

    # The variable Species (index = 5) is removed
    # before the PCA analysis
    res.pca <- prcomp(iris[, -5], scale = TRUE)
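The plotting functions use axes = c(1, 2) by default, so it can help to check how much of the total variance the first two components carry before reading the graphs. The sketch below uses only base R; the variable names `eig` and `var_explained` are illustrative.

```r
# Self-contained: repeat the PCA from the example
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Eigenvalues are the squared standard deviations of the components
eig <- res.pca$sdev^2

# Percentage of the total variance carried by each component
var_explained <- 100 * eig / sum(eig)
names(var_explained) <- colnames(res.pca$x)

print(round(var_explained, 1))          # per component
print(round(cumsum(var_explained), 1))  # cumulative
```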

    fviz_pca_ind(): Graph of individuals

    # Default plot
    fviz_pca_ind(res.pca)

    # Change title and axis labels
    fviz_pca_ind(res.pca) +
      labs(title = "PCA", x = "PC1", y = "PC2")

    # Change axis limits by specifying the min and max
    fviz_pca_ind(res.pca) +
      xlim(-4, 4) + ylim(-4, 4)

    # Use text only
    fviz_pca_ind(res.pca, geom = "text")

    # Use points only
    fviz_pca_ind(res.pca, geom = "point")

    # Change the size of points
    fviz_pca_ind(res.pca, geom = "point", pointsize = 4)

    # Change point color and theme
    fviz_pca_ind(res.pca, col.ind = "blue") +
      theme_minimal()

    # Control automatically the color of individuals
    # using the cos2 or the contributions
    # cos2 = the quality of the individuals on the factor map
    fviz_pca_ind(res.pca, col.ind = "cos2")

    # Gradient color
    fviz_pca_ind(res.pca, col.ind = "cos2") +
      scale_color_gradient2(low = "white", mid = "blue",
                            high = "red", midpoint = 0.6)

    # Change the theme and use only points
    fviz_pca_ind(res.pca, col.ind = "cos2", geom = "point") +
      scale_color_gradient2(low = "white", mid = "blue",
                            high = "red", midpoint = 0.6) +
      theme_minimal()

    # Color by the contributions
    fviz_pca_ind(res.pca, col.ind = "contrib") +
      scale_color_gradient2(low = "white", mid = "blue",
                            high = "red", midpoint = 4)

    # Control the transparency of the color by the
    # contributions
    fviz_pca_ind(res.pca, alpha.ind = "contrib") +
      theme_minimal()
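To make the "cos2" and "contrib" color scales easier to interpret, the quantities can be reproduced by hand for a prcomp result. This is a base R sketch of the standard definitions (squared coordinate over squared distance to the origin for cos2, share of the axis inertia for the contribution); it is not factoextra's source code, and the way factoextra combines the two plotted axes, especially for contributions, may differ between versions.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Coordinates of the individuals on all components
coord <- res.pca$x

# Squared distance of each individual to the origin (all components)
d2 <- rowSums(coord^2)

# cos2 on each axis, then on the plane spanned by PC1 and PC2
cos2 <- coord^2 / d2
cos2_plane <- rowSums(cos2[, 1:2])

# Contribution (in %) of each individual to each axis
contrib <- 100 * sweep(coord^2, 2, colSums(coord^2), "/")

head(round(cbind(cos2_PC1_PC2 = cos2_plane, contrib[, 1:2]), 3))
```

A cos2 close to 1 means the individual is well represented on the plotted plane; contributions on a given axis sum to 100 across individuals.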

    # Color individuals by groups
    fviz_pca_ind(res.pca, label = "none", habillage = iris$Species)

    # Add ellipses
    p <- fviz_pca_ind(res.pca, label = "none", habillage = iris$Species,
                      addEllipses = TRUE, ellipse.level = 0.95)
    print(p)

    # Change group colors using RColorBrewer color palettes
    p + scale_color_brewer(palette = "Dark2") + theme_minimal()

    p + scale_color_brewer(palette = "Paired") + theme_minimal()

    p + scale_color_brewer(palette = "Set1") + theme_minimal()

    # Change color manually
    p + scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9"))

    # Select and visualize individuals with cos2 > 0.96
    fviz_pca_ind(res.pca, select.ind = list(cos2 = 0.96))

    # Select the top 20 according to the cos2
    fviz_pca_ind(res.pca, select.ind = list(cos2 = 20))

    # Select the top 20 contributing individuals
    fviz_pca_ind(res.pca, select.ind = list(contrib = 20))

    # Select by names
    fviz_pca_ind(res.pca,
                 select.ind = list(name = c("23", "42", "119")))
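The select.ind rule for cos2 (a value in [0, 1] acts as a threshold, a value above 1 keeps the top n) can be previewed without drawing anything. The helper `select_by_cos2()` below is hypothetical and written only for illustration; it is not a factoextra function, and its treatment of values exactly equal to the threshold (kept here) is an assumption.

```r
# Hypothetical helper mirroring the documented select.ind rule for cos2
select_by_cos2 <- function(cos2, value) {
  if (value >= 0 && value <= 1) {
    # Threshold mode: keep names whose cos2 reaches the value
    names(cos2)[cos2 >= value]
  } else {
    # Top-n mode: keep the n names with the highest cos2
    names(head(sort(cos2, decreasing = TRUE), round(value)))
  }
}

data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)
coord <- res.pca$x

# cos2 on the PC1-PC2 plane, named by row label
cos2_plane <- rowSums(coord[, 1:2]^2) / rowSums(coord^2)
names(cos2_plane) <- rownames(iris)

top20 <- select_by_cos2(cos2_plane, 20)    # like select.ind = list(cos2 = 20)
above <- select_by_cos2(cos2_plane, 0.96)  # like select.ind = list(cos2 = 0.96)
print(top20)
```

The returned names are the row labels of iris, the same kind of labels passed to select.ind = list(name = ...).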

    fviz_pca_var(): Graph of variables

    # Default plot
    fviz_pca_var(res.pca)

    # Use points and text
    fviz_pca_var(res.pca, geom = c("point", "text"))

    # Change color and theme
    fviz_pca_var(res.pca, col.var = "steelblue") +
      theme_minimal()

    # Control variable colors using their contributions
    fviz_pca_var(res.pca, col.var = "contrib") +
      scale_color_gradient2(low = "white", mid = "blue",
                            high = "red", midpoint = 96) +
      theme_minimal()

    # Control the transparency of variables using their contributions
    fviz_pca_var(res.pca, alpha.var = "contrib") +
      theme_minimal()

    # Select and visualize variables with cos2 >= 0.96
    fviz_pca_var(res.pca, select.var = list(cos2 = 0.96))

    # Select the top 3 contributing variables
    fviz_pca_var(res.pca, select.var = list(contrib = 3))

    # Select by names
    fviz_pca_var(res.pca,
                 select.var = list(name = c("Sepal.Width", "Petal.Length")))
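The arrows in the graph of variables can be read as correlations between each variable and the components when the PCA is run on scaled data. The base R sketch below illustrates that relationship for a prcomp result; the names `var_coord`, `var_cos2` and `var_contrib` are illustrative, and the per-axis contributions shown here follow the standard definition, which may not match how factoextra aggregates them over two axes.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Variable coordinates: loadings scaled by the component standard deviations
var_coord <- sweep(res.pca$rotation, 2, res.pca$sdev, "*")

# With scaled data, these equal the variable-component correlations
var_cor <- cor(iris[, -5], res.pca$x)

# Quality of representation and contribution (in %) per axis
var_cos2 <- var_coord^2
var_contrib <- 100 * sweep(var_cos2, 2, colSums(var_cos2), "/")

print(round(var_coord[, 1:2], 3))
print(round(var_contrib[, 1:2], 1))
```

Since each coordinate is a correlation, every arrow lies inside the unit circle, which is why the plot draws a correlation circle (col.circle).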

    fviz_pca_biplot(): Biplot of individuals and variables

    # Default plot
    fviz_pca_biplot(res.pca)

    # Keep only the labels for variables
    fviz_pca_biplot(res.pca, label = "var")

    # Keep only labels for individuals
    fviz_pca_biplot(res.pca, label = "ind")

    # Hide variables
    fviz_pca_biplot(res.pca, invisible = "var")

    # Hide individuals
    fviz_pca_biplot(res.pca, invisible = "ind")

    # Control automatically the color of individuals using the cos2
    fviz_pca_biplot(res.pca, label = "var", col.ind = "cos2") +
      theme_minimal()

    # Change the color by groups, add ellipses
    fviz_pca_biplot(res.pca, label = "var", habillage = iris$Species,
                    addEllipses = TRUE, ellipse.level = 0.95)

    # Select the top 30 contributing individuals
    fviz_pca_biplot(res.pca, label = "var",
                    select.ind = list(contrib = 30))

    This analysis has been performed using R software (ver. 3.2.1) and factoextra (ver. 1.0.3)



    Author: Allyn Kozey
