Correlation and Regression
Prof. Dr. Ricardo Primi
Correlation and Regression
• Measures the relationship between variables
  • Correlation = co-relation = (standardized) co-variance = r
• Usually examines pairs of variables (bivariate data)
• Differences in means between groups can also be expressed as a correlation
• In general, the measures are associated through linear relationships
• Techniques also exist for nonlinear correlation and regression
• Correlation ≠ causation
• r takes values between -1.0 and +1.0
  • The sign shows the direction of the relationship
  • The absolute value shows the magnitude of the relationship
  • 0.0 = no (linear) relationship
  • -1.0 or +1.0 = perfect relationship
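As a small illustration of these properties, the sketch below computes r for a perfectly positive and a perfectly negative linear relationship, and for a group difference expressed as a correlation. The data, the 0/1 group coding, and the helper name pearson_r are made up for this example; they are not taken from the course files.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])   # Y rises with X: r close to +1
r_neg = pearson_r(x, [10 - 3 * v for v in x])  # Y falls as X rises: r close to -1

# Hypothetical scores for two groups, dummy-coded 0 and 1.
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [4, 5, 6, 5, 7, 8, 9, 8]
r_group = pearson_r(group, score)  # the mean difference expressed as a correlation

print(round(r_pos, 3), round(r_neg, 3), round(r_group, 3))
```

The last value is a point-biserial correlation: the larger the difference between the two group means relative to the spread of the scores, the closer it gets to 1 in absolute value.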
[Figure: Correlation plot of the 24 items T5_01Ac1 to T5_24Em1 (T5_01 to T5_08: Ac1; T5_09 to T5_16: Sc1; T5_17 to T5_24: Em1); scale from −1 to 1]
[Figure: Correlation plot of Agree1, Consc1, Extra1, Neuro1, Open1, Agree2, Consc2, Extra2, Neuro2, Open2, CndProb, EmoSym, HypAc, PeerProb, ProSoc, Locus, SlfAcd, SlfEmo, SlfSoc, Grit, and SE; scale from −1 to 1]
[Figure: two scatterplots. Panel 1: RMG (vertical axis, 0 to 10) against GF_GC (horizontal axis, 200 to 800). Panel 2: RG (vertical axis, 0 to 20) against EPN_EG (horizontal axis, 40 to 160)]
Correlations

                                   RA_measure  RV_measure  RN_measure  RP_measure  Idade       Escolaridade
RA_measure    Pearson Correlation  1           .575**      .581**      .474**      .367**      .410**
              Sig. (2-tailed)                  .000        .000        .000        .000        .000
              N                    289         289         288         287         289         289
RV_measure    Pearson Correlation  .575**      1           .473**      .475**      .269**      .323**
              Sig. (2-tailed)      .000                    .000        .000        .000        .000
              N                    289         289         288         287         289         289
RN_measure    Pearson Correlation  .581**      .473**      1           .507**      .376**      .427**
              Sig. (2-tailed)      .000        .000                    .000        .000        .000
              N                    288         288         288         286         288         288
RP_measure    Pearson Correlation  .474**      .475**      .507**      1           .094        .120*
              Sig. (2-tailed)      .000        .000        .000                    .113        .042
              N                    287         287         286         287         287         287
Idade         Pearson Correlation  .367**      .269**      .376**      .094        1           .936**
              Sig. (2-tailed)      .000        .000        .000        .113                    .000
              N                    289         289         288         287         289         289
Escolaridade  Pearson Correlation  .410**      .323**      .427**      .120*       .936**      1
              Sig. (2-tailed)      .000        .000        .000        .042        .000
              N                    289         289         288         287         289         289

** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
Note: Idade = age; Escolaridade = education level.
Correlations

                               c_br      epn_eg    epn_re    av_des
c_br      Pearson Correlation  1         .145      .214**    .045
          Sig. (2-tailed)                .076      .008      .621
          N                    157       151       152       124
epn_eg    Pearson Correlation  .145      1         .799**    .315**
          Sig. (2-tailed)      .076                .000      .000
          N                    151       151       151       122
epn_re    Pearson Correlation  .214**    .799**    1         .325**
          Sig. (2-tailed)      .008      .000                .000
          N                    152       151       152       123
av_des    Pearson Correlation  .045      .315**    .325**    1
          Sig. (2-tailed)      .621      .000      .000
          N                    124       122       123       124

** Correlation is significant at the 0.01 level (2-tailed).
4.5 Other estimates of association
$R_x = I_{sd}\, x x' \, I_{sd}$ (4.24)
is the matrix of correlations between the columns of x. There are a number of alternative measures of association, some of which appear very different but are merely the PPMCC for special cases, while there are other measures for cases where the data are clearly neither continuous nor at the interval level of measurement. Even more coefficients of association are used as estimates of effect sizes.
4.5.1 Pearson correlation equivalents
Using Spearman’s formula for the correlation (Equation 4.22) allows a simple categorization of a variety of correlation coefficients that at first appear different but are functionally equivalent (Table 4.8).

Table 4.8 A number of correlations are Pearson r in different forms, or with particular assumptions. If $r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$, then depending upon the type of data being analyzed, a variety of correlations are found.

Coefficient      Symbol    X            Y            Assumptions
Pearson          r         continuous   continuous
Spearman         rho (ρ)   ranks        ranks
Point bi-serial  r_pb      dichotomous  continuous
Phi              φ         dichotomous  dichotomous
Bi-serial        r_bis     dichotomous  continuous   normality of latent X
Tetrachoric      r_tet     dichotomous  dichotomous  bivariate normality of latent X, Y
Polychoric       r_pc      categorical  categorical  bivariate normality of latent X, Y
Polyserial       r_ps      categorical  continuous   bivariate normality of latent X, Y
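To make the "Pearson r in a different form" idea concrete, here is a minimal sketch for the phi coefficient. It assumes a made-up 2x2 table of counts and compares the usual cell-count formula for phi with a plain Pearson correlation computed on the same data coded 0/1. The counts and the helper name pearson_r are illustrative only.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical 2x2 counts: a = (X=1, Y=1), b = (X=1, Y=0), c = (X=0, Y=1), d = (X=0, Y=0)
a, b, c, d = 30, 10, 15, 45

# Expand the table into one 0/1 score per person.
x = [1] * (a + b) + [0] * (c + d)
y = [1] * a + [0] * b + [1] * c + [0] * d

phi_from_counts = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
phi_from_pearson = pearson_r(x, y)

print(round(phi_from_counts, 4), round(phi_from_pearson, 4))
```

Both routes give the same number, which is the sense in which phi is a Pearson r for two dichotomous variables.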
4.5.1.1 Spearman ρ: a Pearson correlation of ranks
In the first of two major papers published in the American Journal of Psychology in 1904, Spearman (1904b) reviewed for psychologists the efforts made to define the correlation coefficient by Galton (1888) and Pearson (1895). Not only did he consider the application of the Pearson correlation to ranked data, but he also developed corrections for attenuation and the partial correlation, two subjects that will be addressed later. The advantage of using ranked data rather than the raw data is that it is more robust to variations in the extreme scores. For whether a person has an 8,000 or a 6,000 on an exam, that he or she has the highest score makes no difference to the ranks. Consider Y as ten numbers sampled from 1 to 20 and then find the Pearson correlation with $Y^2$ and $e^Y$. Do the same things for the ranks of these numbers. That is, find the Spearman correlations. As is clear from Figure 4.5, the Spearman correlation is not affected by the large non-linear transformation applied to the data (Spearman, 1907).
It should be observed, that in many cases the non-linear form is more apparent than real.
Generally speaking, a mere tendency of two characteristics to vary concurrently must be taken,
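The sampling exercise described in the text (ten numbers from 1 to 20, correlated with their squares and exponentials) can be sketched as follows. This is not the code behind Figure 4.5: the seed is arbitrary, the ten numbers are drawn without replacement so that there are no ties to handle, and the helper names are invented for the example.

```python
import random
from math import exp, sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


random.seed(42)                      # arbitrary seed, for reproducibility only
y = random.sample(range(1, 21), 10)  # ten distinct numbers from 1 to 20
y_sq = [v ** 2 for v in y]
y_exp = [exp(v) for v in y]

pearson_sq, pearson_exp = pearson_r(y, y_sq), pearson_r(y, y_exp)
spearman_sq, spearman_exp = spearman_rho(y, y_sq), spearman_rho(y, y_exp)

print("Pearson :", round(pearson_sq, 3), round(pearson_exp, 3))
print("Spearman:", round(spearman_sq, 3), round(spearman_exp, 3))
```

Because squaring and exponentiating positive numbers keep their order, the ranks are unchanged and both Spearman correlations stay at 1, while the Pearson correlations fall below 1.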
120 4 Covariance, Regression, and Correlation
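Table 4.22 Alternative estimates of effect size. Using the correlation as a scale-free estimate of effect size allows for combining experimental and correlational data in a metric that is directly interpretable: a one standardized unit change in x leads to a change of r in standardized y.

Statistic      Estimate                                  r equivalent                                                   As a function of r
Regression     $b_{y.x} = C_{xy}/\sigma_x^2$             $r = b_{y.x}\,\sigma_x/\sigma_y$                               $b_{y.x} = r\,\sigma_y/\sigma_x$
Pearson r      $r_{xy} = C_{xy}/(\sigma_x \sigma_y)$     $r_{xy}$
Cohen's d      $d = (\bar{X}_1 - \bar{X}_2)/\sigma_x$    $r = d/\sqrt{d^2 + 4}$                                         $d = 2r/\sqrt{1 - r^2}$
Hedges' g      $g = (\bar{X}_1 - \bar{X}_2)/s_x$         $r = g/\sqrt{g^2 + 4(df/N)}$                                   $g = 2r\sqrt{df/N}/\sqrt{1 - r^2}$
t-test         $t = d\sqrt{df}/2$                        $r = \sqrt{t^2/(t^2 + df)}$                                    $t = \sqrt{r^2\,df/(1 - r^2)}$
F-test         $F = d^2\,df/4$                           $r = \sqrt{F/(F + df)}$                                        $F = r^2\,df/(1 - r^2)$
Chi Square                                               $r = \sqrt{\chi^2/n}$                                          $\chi^2 = r^2 n$
Odds ratio     $d = \ln(OR)/1.81$                        $r = \ln(OR)/\left(1.81\sqrt{(\ln(OR)/1.81)^2 + 4}\right)$     $\ln(OR) = 3.62\,r/\sqrt{1 - r^2}$
r_equivalent   r with probability p                      $r = r_{equivalent}$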
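A few of the conversions in Table 4.22 can be tried out with the short sketch below. The function names are invented for this example, and the d-to-r formulas assume two groups of equal size; the input values (d = 0.5, t = 2.5 with 98 degrees of freedom) are arbitrary.

```python
from math import sqrt


def d_to_r(d):
    """Cohen's d to r, assuming two groups of equal size."""
    return d / sqrt(d ** 2 + 4)


def r_to_d(r):
    """r back to Cohen's d (inverse of d_to_r)."""
    return 2 * r / sqrt(1 - r ** 2)


def t_to_r(t, df):
    """t statistic with df degrees of freedom to r."""
    return sqrt(t ** 2 / (t ** 2 + df))


def chi2_to_r(chi2, n):
    """Chi square (1 degree of freedom) with sample size n to r."""
    return sqrt(chi2 / n)


r_from_d = d_to_r(0.5)       # a "medium" d expressed as r
d_back = r_to_d(r_from_d)    # converting back recovers d
r_from_t = t_to_r(2.5, 98)   # arbitrary t value and degrees of freedom

print(round(r_from_d, 3), round(d_back, 3), round(r_from_t, 3))
```

The round trip from d to r and back returns the starting value, which is a quick check that the two columns of the table are inverses of each other.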
important than the squared correlation and it is more appropriate to consider the slope of the mean differences between groups (Lubinski and Humphreys, 1996). Finally, correlations can be seriously attenuated by differences in skew between different sets of variables.
4.8.1 Restriction of range
The correlation is a ratio of covariance to the square root of the product of two variances (Equation 4.8). As such, if the variance of the predictor is artificially constrained, the correlation will be reduced, even though the slope of the regression remains the same. Consider an example of 1,000 simulated students with GREV and GREQ scores with a population correlation of .6. If the sample is restricted in its variance (say only students with GREV > 600 are allowed to apply), the correlation drops by almost 1/2, from .61 to .34 (Table 4.23, Figure 4.9).
An even more serious problem occurs if the range is restricted based upon the sum of the two variables. This might be the case if an admissions committee based their decisions upon total GRE scores and then examined the correlation between their predictors. Consider the correlation within those applicants who had total scores of more than 1400. In this case, the correlation for those 11 hypothetical subjects has become -.34 even though the underlying correlation was .61! Similar problems will occur when choosing a high group based upon several measures of a related concept. Some researchers examine the relationship among measures of negative affect within a group chosen to be extreme on the trait. That is, what is the correlation between measures of neuroticism, anxiety, and depression within a selected set of patients rather than the general population? Consider the data set epi.bfi, which includes measures of Neuroticism using the Eysenck Personality Inventory (Eysenck and Eysenck, 1964), of Depression using the Beck Depression Inventory (Beck et al., 1961), and of Trait Anxiety using the State Trait Anxiety Inventory (Spielberger et al., 1970) for 231 undergraduates. For the total sample, these three measures have correlations of .53, .73 and .65, but if a broad trait of negative affectivity is defined as the sum of the three standardized scales, and an “at risk” group is defined as those scoring more than 1 s.d. above the mean on this composite, the correlations become -.08, -.11, and .17.
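The first restriction-of-range example can be imitated with a small simulation. This is only a sketch, not the simulation behind Table 4.23: it assumes scores with mean 500 and standard deviation 100, an arbitrary random seed, and invented helper names, so the exact values will differ from those in the text.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def slope(x, y):
    """Least squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(1)       # arbitrary seed, for reproducibility only
RHO, N = 0.6, 1000   # population correlation and sample size from the text

grev, greq = [], []
for _ in range(N):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    grev.append(500 + 100 * z1)  # assumed scale: mean 500, SD 100
    greq.append(500 + 100 * (RHO * z1 + sqrt(1 - RHO ** 2) * z2))

# Keep only the simulated students with GREV above 600.
selected = [(v, q) for v, q in zip(grev, greq) if v > 600]
grev_sel = [v for v, _ in selected]
greq_sel = [q for _, q in selected]

r_full = pearson_r(grev, greq)
r_restricted = pearson_r(grev_sel, greq_sel)
b_full = slope(grev, greq)
b_restricted = slope(grev_sel, greq_sel)

print("n:", N, len(selected))
print("r:", round(r_full, 2), round(r_restricted, 2))
print("b:", round(b_full, 2), round(b_restricted, 2))
```

In runs of this kind the correlation in the selected group is clearly smaller than in the full sample, while the regression slope is expected to stay roughly the same apart from sampling error.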
Correlation and regression
• A simple distinction:
  • We use correlational analysis when we want to investigate whether relationships exist between variables, and regression analysis when we want to predict one variable from another or from a sum of others
Exercise 1
• Install JASP: https://jasp-stats.org
• Explore the file ex1_ie_bpr_16pf_avdes.sav
• Use visualization
• Understand the descriptive statistics
Models
http://r4ds.had.co.nz/program-intro.html
Visualizing patterns -> models
Modeling
• http://r4ds.had.co.nz/model-basics.html#introduction-15
• “Patterns provide one of the most useful tools for data scientists because they reveal covariation. If you think of variation as a phenomenon that creates uncertainty, covariation is a phenomenon that reduces it. If two variables covary, you can use the values of one variable to make better predictions about the values of the second. If the covariation is due to a causal relationship (a special case), then you can use the value of one variable to control the value of the second... Models are a tool for extracting patterns out of data.” (p. 106)
http://rpsychologist.com/d3/NHST/
http://rpsychologist.com/d3/CI/
http://rpsychologist.com/d3/cohend/
Correlation and Regression
http://rpsychologist.com/d3/correlation/
Correlation formula
$$r = \frac{\sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{N - 1}$$
$$r = \frac{\sum_{i=1}^{N} z_{x_i} z_{y_i}}{N - 1}$$
https://rpsychologist.com/d3/correlation/
Product-moment!
• The mean of the product of two moments, indicating co-relation
• Product: multiplication of two variables (X, Y)
• Moment: a function applied to the mean of deviations
  • Moments: 1st = mean, 2nd = variance, 3rd = skewness, 4th = kurtosis (the mean is a moment about zero; the others are based on deviations from the mean)
• z scores are moments
  • Deviations from the mean in standard deviation units
• Co-relation: simultaneous occurrence
  • z for X paired with z for Y
• So the Pearson product-moment correlation (r) is the average magnitude with which pairs of scores (X, Y) co-relate by deviating simultaneously from their respective means

$$z = \frac{X - \bar{X}}{s}$$
$$r = \frac{\sum z_X z_Y}{N - 1}$$
(with s computed using N − 1; if s is computed using N, divide the sum by N instead)
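The "mean product of z scores" reading of r can be checked numerically. The sketch below uses made-up data and invented function names; it computes r once from the z scores (dividing by N − 1, matching the sample standard deviation) and once as covariance divided by the product of the standard deviations.

```python
from statistics import mean, stdev


def z_scores(values):
    """Deviations from the mean in (sample) standard deviation units."""
    m, s = mean(values), stdev(values)  # stdev divides by N - 1
    return [(v - m) / s for v in values]


def pearson_from_z(x, y):
    """r as the sum of paired z-score products divided by N - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def pearson_from_covariance(x, y):
    """r as the covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


# Hypothetical paired scores.
x = [2, 4, 5, 7, 9]
y = [1, 3, 4, 8, 9]

r_z = pearson_from_z(x, y)
r_cov = pearson_from_covariance(x, y)
print(round(r_z, 4), round(r_cov, 4))
```

Pairs in which both z scores have the same sign add to the sum and pairs with opposite signs subtract from it, which is why r tracks how consistently X and Y deviate together.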
Regression line
• Best prediction of Y from the values of X
• Prediction equation: Ŷ = b0 + b1X
  where:
  X = value of the predictor (predictor variable or IV)
  Ŷ = predicted value of Y (response variable, DV, or criterion), i.e., the value of Y on the line, given X
  b1 = slope of the line: the change in Ŷ for a 1-unit change in X
    b1 = rXY(SY/SX)
  b0 = constant (intercept): Ŷ when X = 0.0
    b0 = MY − b1MX
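The two formulas for b1 and b0 can be applied directly. The following sketch uses made-up data and invented names; it builds the line from r, the standard deviations, and the means, and compares the slope with the one obtained straight from the least squares definition.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical predictor (X) and response (Y) scores.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 4, 6, 7]

r = pearson_r(x, y)
b1 = r * stdev(y) / stdev(x)   # slope: b1 = rXY(SY/SX)
b0 = mean(y) - b1 * mean(x)    # intercept: b0 = MY - b1*MX

# The same slope from the least squares definition, for comparison.
mx, my = mean(x), mean(y)
b1_ls = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def predict(value):
    """Predicted Y for a given X on the regression line."""
    return b0 + b1 * value


print(round(b0, 3), round(b1, 3), round(predict(4), 3))
```

Note that predict(mean(x)) returns the mean of Y: the regression line always passes through the point (MX, MY).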
Regression Sums of Squares
[Figure: three scatterplot panels of Y (0 to 8) against X (0 to 8)]
SSTotal = ∑(Y − MY)²
SSModel = ∑(Ŷ − MY)²
SSResidual = ∑(Y − Ŷ)²
http://setosa.io/ev/ordinary-least-squares-regression/
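The three sums of squares can be computed by hand for a small data set. The sketch below uses made-up data; the variable names are invented, and the F ratio shown is the usual one for simple regression with 1 and N − 2 degrees of freedom.

```python
from statistics import mean

# Hypothetical predictor (X) and response (Y) scores.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 2, 5, 4, 6, 7]

mx, my = mean(x), mean(y)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
y_hat = [b0 + b1 * v for v in x]  # predicted values on the regression line

ss_total = sum((v - my) ** 2 for v in y)                   # SSTotal
ss_model = sum((h - my) ** 2 for h in y_hat)               # SSModel
ss_residual = sum((v - h) ** 2 for v, h in zip(y, y_hat))  # SSResidual

r_squared = ss_model / ss_total  # proportion of variance explained
df_model, df_residual = 1, len(x) - 2
f_stat = (ss_model / df_model) / (ss_residual / df_residual)

print(round(ss_total, 3), round(ss_model, 3), round(ss_residual, 3))
print(round(r_squared, 3), round(f_stat, 2))
```

SSModel and SSResidual add up to SSTotal, and their ratio to SSTotal gives the explained and unexplained proportions of variance.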
Concepts so far ...
• z score
• Variance/covariance
• Correlation
• Equation of the line: intercept and slope
• DV and IV
• Total variance, residual variance, regression variance
• Proportion of variance explained
• Degrees of freedom
• F
https://gallery.shinyapps.io/anova_shiny_rstudio/
https://gallery.shinyapps.io/simple_regression/
https://gallery.shinyapps.io/multi_regression/
JASP manual: https://jasp-stats.org/jasp-materials/
https://rpsychologist.com/d3/CI/
https://rpsychologist.com/d3/NHST/
https://rpsychologist.com/d3/cohend/
https://www.jamovi.org
Exercise 1
• Install JASP: https://jasp-stats.org
• Explore the file ex1_ie_bpr_16pf_avdes.sav
• Use visualization
• Try to interpret the descriptive statistics
Exercise 2
• Open the file ex1_ie_bpr_16pf_avdes.sav
• Choose two variables and run a simple regression
Exercise 3
• ex1_ie_bpr_16pf_avdes.sav
• ANOVA
• RMANOVA
• TWO-WAY INDEPENDENT ANOVA
• MIXED FACTOR ANOVA
personality measurement. Instead of being developed to measure preconceived dimensions of interest to a particular author, the instrument was developed from the unique perspective of a scientific quest to try to discover the basic structural elements of personality.
Raymond Cattell’s personality research was based on his strong background in the physical sciences; born in 1905, he witnessed the first-hand awe-inspiring results of science, from electricity and telephones to automobiles, airplanes, and medicine. He wanted to apply these scientific methods to the uncharted domain of human personality with the goal of discovering the basic elements of personality (much as the basic elements of the physical world were discovered and organized into the periodic table). He believed that human characteristics such as creativity, authoritarianism, altruism, or leadership skills could be predicted from these fundamental personality traits (much as water was a weighted combination of the elements of hydrogen and oxygen). For psychology to advance as a science, he felt it also needed basic measurement techniques for personality. Thus, through factor analysis – the powerful new tool for identifying underlying dimensions behind complex phenomena – Cattell believed the basic dimensions of personality could be discovered and then measured.
Over several decades, Cattell and his colleagues carried out a program of comprehensive, international research seeking a thorough, research-based map of normal personality. They systematically measured the widest possible range of personality dimensions, believing that ‘all aspects of human personality which are or have been of importance, interest, or utility have already become recorded in the substance of language’ (Cattell, R.B., 1943: 483). They studied these traits in diverse populations, using three different methodologies (Cattell, R.B., 1973): observation of natural, in-situ life behavior or L-data (e.g. academic grades, number of traffic accidents, or social contacts); questionnaire
136 THE SAGE HANDBOOK OF PERSONALITY THEORY AND ASSESSMENT
Table 7.1 16PF Scale Names and Descriptors

Descriptors of Low Range                    Primary Scales            Descriptors of High Range
Reserved, Impersonal, Distant               Warmth (A)                Warm-hearted, Caring, Attentive To Others
Concrete, Lower Mental Capacity             Reasoning (B)             Abstract, Bright, Fast-Learner
Reactive, Affected By Feelings              Emotional Stability (C)   Emotionally Stable, Adaptive, Mature
Deferential, Cooperative, Avoids Conflict   Dominance (E)             Dominant, Forceful, Assertive
Serious, Restrained, Careful                Liveliness (F)            Enthusiastic, Animated, Spontaneous
Expedient, Nonconforming                    Rule-Consciousness (G)    Rule-Conscious, Dutiful
Shy, Timid, Threat-Sensitive                Social Boldness (H)       Socially Bold, Venturesome, Thick-Skinned
Tough, Objective, Unsentimental             Sensitivity (I)           Sensitive, Aesthetic, Tender-Minded
Trusting, Unsuspecting, Accepting           Vigilance (L)             Vigilant, Suspicious, Skeptical, Wary
Practical, Grounded, Down-To-Earth          Abstractedness (M)        Abstracted, Imaginative, Idea-Oriented
Forthright, Genuine, Artless                Privateness (N)           Private, Discreet, Non-Disclosing
Self-Assured, Unworried, Complacent         Apprehension (O)          Apprehensive, Self-Doubting, Worried
Traditional, Attached To Familiar           Openness to Change (Q1)   Open To Change, Experimenting
Group-Orientated, Affiliative               Self-Reliance (Q2)        Self-Reliant, Solitary, Individualistic
Tolerates Disorder, Unexacting, Flexible    Perfectionism (Q3)        Perfectionistic, Organized, Self-Disciplined
Relaxed, Placid, Patient                    Tension (Q4)              Tense, High Energy, Driven

Descriptors of Low Range                    Global Scales             Descriptors of High Range
Introverted, Socially Inhibited             Extraversion              Extraverted, Socially Participating
Low Anxiety, Unperturbable                  Anxiety Neuroticism       High Anxiety, Perturbable
Receptive, Open-Minded, Intuitive           Tough-Mindedness          Tough-Minded, Resolute, Unempathic
Accommodating, Agreeable, Selfless          Independence              Independent, Persuasive, Willful
Unrestrained, Follows Urges                 Self-Control              Self-Controlled, Inhibits Urges

Adapted with permission from S.R. Conn and M.L. Rieke (1994). 16PF Fifth Edition Technical Manual. Champaign, IL: Institute for Personality and Ability Testing, Inc.
Table 7.2 16PF global factors and the primary trait make-up

Global Factors                    Primary Factors
Extraversion/Introversion         (A) Warm–Reserved; (F) Lively–Serious; (H) Bold–Shy; (N) Private–Forthright; (Q2) Self-Reliant–Group-oriented
High Anxiety/Low Anxiety          (C) Emotionally Stable–Reactive; (L) Vigilant–Trusting; (O) Apprehensive–Self-assured; (Q4) Tense–Relaxed
Tough-Mindedness/Receptivity      (A) Warm–Reserved; (I) Sensitive–Unsentimental; (M) Abstracted–Practical; (Q1) Open-to-Change/Traditional
Independence/Accommodation        (E) Dominant–Deferential; (H) Bold–Shy; (L) Vigilant–Trusting; (Q1) Open-to-Change/Traditional
Self-Control/Lack of Restraint    (F) Lively–Serious; (G) Rule-conscious/Expedient; (M) Abstracted–Practical; (Q3) Perfectionistic–Tolerates disorder
Comparison of the 16PF global scales with other five-factor models
For over 50 years, the 16PF has included the broad, second-order dimensions currently called ‘the Big Five’ (Cattell, R.B., 1946; Krug and Johns, 1986). In fact, Cattell located three of these five factors in his earliest studies of temperament (1933) – which Digman (1996) called ‘the first glimpse of the Big Five’. Four of the five current traits were already described in Cattell’s 1957 book. All five traits have been clearly identified and scorable from the questionnaire since the release of the fourth edition around 1970. Although Cattell continued to believe that there were more than five factors, so have many other prominent psychologists (Block, 1995; Fiske, 1994; Hogan et al., 1996; Jackson et al., 2000; Lee et al., 2005; Ostendorf, 1990; Saucier 2001).
The 16PF scales and items also played an important role in the development of the other Big Five factor models (e.g. Costa and McCrae, 1976, 1985; Norman, 1963; McKenzie et al., 1997; Tupes and Christal, 1961). For example, the first NEO manual (Costa and McCrae, 1985: 26) describes the development of the questionnaire as beginning with cluster analyses of 16PF scales, which these researchers had been using for over 20 years in their own research. However, this origin, or even acknowledgement of the existence of the five 16PF global factors, does not appear in any current accounts of the development of the Big Five (Costa and McCrae, 1992a; Digman, 1990; Goldberg, 1990).
Furthermore, when the 16PF correlation matrix, which was used in the original development of the Big Five, is re-analyzed using more modern, rigorous factor-analytic methods, Costa and McCrae’s results do not replicate (McKenzie, 1998). Instead, appropriate factoring (see R.B. Cattell, 1978; Gorsuch, 1983) of the original matrix produces the five 16PF global factors, rather than the three orthogonal NEO factors that Costa and McCrae chose to use.
A range of studies comparing the five 16PF global factors and the set of NEO Big Five factors show a striking resemblance between the two (Carnivez and Allen, 2005; H.E.P. Cattell, 1996; Conn and Rieke, 1994; Gerbing and Tuley, 1991; Schneewind and Graf, 1998). These studies show strong correlational and factor-analytic alignment between the two models: Between the two extraversion factors, between anxiety and neuroticism, between self-control and conscientiousness, between tough-mindedness/receptivity and openness-to-experience, and between independence and dis-agreeableness. In fact, the average correlation between the 16PF global factors and their respective NEO five factors are just as high as those between the NEO five factors and the Big Five markers which the NEO was developed to measure (H.E.P. Cattell, 1996; Goldberg, 1992). The alignments among the Big Five models are summarized in Table 7.4.
However, there are important differences between the two models. Although proponents of the other five-factor models have done much in the last decade to try to bring about a consensus in psychology about the existence of five global factors, their particular set of traits have been found to be problematic. In the development process, the NEO Big Five factors were forced to be statistically uncorrelated or orthogonal for reasons of theoretical and statistical simplicity. However, few have found this as a satisfactory approach for defining the basic dimensions
THE SIXTEEN PERSONALITY FACTOR QUESTIONNAIRE (16PF) 141
Table 7.4 Alignments among the three main five-factor models

16PF (Cattell)                    NEO-PI-R (Costa and McCrae)    Big Five (Goldberg)
Extraversion/Introversion         Extraversion                   Surgency
Low Anxiety/High Anxiety          Neuroticism                    Emotional stability
Tough-Mindedness/Receptivity      Openness                       Intellect or culture
Independence/Accommodation        Agreeableness                  Agreeableness
Self-Control/Lack of Restraint    Conscientiousness              Conscientiousness or dependability
Cautions about correlations: Anscombe data set
[Figure: Anscombe's 4 Regression data sets; four scatterplot panels (y1 against x1, y2 against x2, y3 against x3, y4 against x4)]
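The caution can be reproduced numerically. The sketch below types in Anscombe's (1973) four data sets as they are commonly published (the values are not taken from the slide itself) and computes the Pearson correlation for each; the helper name pearson_r is invented for the example.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Anscombe's quartet; x1, x2, and x3 share the same values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

correlations = [
    pearson_r(x123, y1),
    pearson_r(x123, y2),
    pearson_r(x123, y3),
    pearson_r(x4, y4),
]
print([round(r, 3) for r in correlations])
```

All four correlations are practically identical even though the plots look completely different (a linear cloud, a curve, a line with one outlier, and a single influential point), which is why the data should always be plotted before r is interpreted.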