Social Media IR
8/9/2019 Social Media IR
-
Social Media?
-
Properties of Social Media: Scale
• Twitter (Chirp 2010)
– More than 100M user accounts
– More than 600M search queries a day
– 55M tweets a day
• Facebook
– More than 400M active users
– More than 25 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each month
-
Properties of Social Media: Immediacy
• Need to share breaking news
• Search: Content vs. Peer recommendation
-
Properties of Social Media: Duplication
• Duplication of content
– Blogs: Copy-paste
– Twitter: Re-tweet
– Groups: Cross-posting
– Email: Signature lines, Inline replies
-
Properties of Social Media: Semi-structuredness
• Informal but structured
– Informal does not mean low quality (e.g., Wikipedia)
• Structure
– Metadata
– Connectivity
-
Suggested Reading
Towards a PeopleWeb, Raghu Ramakrishnan & Andrew Tomkins, IEEE Computer, Aug. 2007
Important properties of users and objects will move from being tied to individual Web sites to being globally available. The conjunction of a global object model with portable user conte
-
Properties of Social Media
• Scale
• Immediacy
• Heterogeneity
• Duplication
• Semi-structuredness
-
Properties of Social Media Graphs
• Small-world property – Si
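The small-world property is commonly measured by the average shortest-path length between pairs of nodes. A minimal sketch, assuming an unweighted, undirected graph stored as a Python dict of neighbor sets (the helper names `bfs_distances` and `average_path_length` are hypothetical), runs one breadth-first search per node:

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(graph):
    """Mean shortest-path length over ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# A 6-node ring: every node is at most 3 hops from any other.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(average_path_length(ring))  # 1.8
```

Each BFS costs O(V + E), so the all-pairs average is O(V·(V + E)); for graphs at social-media scale one would sample source nodes instead of visiting all of them.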
-
Network Evolution and Path Size
-
Probabilistic Modeling of Networks
• Erdős-Rényi Model
– Choose a pair of nodes uniformly at random and add an edge.
– G(n, p)
– Not scale free (the degree distribution is binomial, not power-law); small average path length but low clustering coefficient
– Scale-free networks don't evolve by chance
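A minimal sketch of the G(n, p) variant follows. Note that it includes each of the n(n-1)/2 possible edges independently with probability p, which differs slightly from the "pick a random pair, add an edge" description (that is the closely related G(n, M) model). The function names are hypothetical; the clustering helper illustrates the "low clustering coefficient" point.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """G(n, p): keep each possible undirected edge independently with probability p."""
    rng = random.Random(seed)
    graph = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among v's neighbors (each unordered pair checked once).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


g = erdos_renyi(1000, 0.01, seed=42)
# In G(n, p) the expected clustering coefficient is roughly p (about 0.01 here),
# far below what is typically observed in real social graphs.
print(round(average_clustering(g), 3))
```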
-
Probabilistic Modeling of Networks
• Preferential Attachment (Barabási and Albert, 1999)
– Rich become richer
– Stochastic process: Using Pólya's urn
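The "rich get richer" process can be sketched with an urn-style list in which each node appears once per incident edge, so a uniform draw picks a node with probability proportional to its degree. This is a simplified sketch in the spirit of the Barabási-Albert model; the seed clique of m + 1 nodes and the function name are assumptions, and the published model differs in details.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct existing nodes
    chosen with probability proportional to their current degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed starting point: a clique of m + 1 nodes, so every node has degree > 0.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # The "urn": a node appears once per incident edge.
    urn = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # m distinct targets, degree-proportional
            targets.add(rng.choice(urn))
        for t in targets:
            edges.append((new, t))
            urn.extend((new, t))         # both endpoints gain one "ball"
    return edges


edges = preferential_attachment(1000, 2, seed=7)
degree = Counter(x for e in edges for x in e)
# A few early nodes become hubs while most nodes keep a degree close to m.
print(max(degree.values()), min(degree.values()))
```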
-
Why model?
• Study network evolution and degeneration
– Develop algorithms
• Detect communities
• Who are the movers and shakers?
• Detect diffusion of ideas across networks
• Detect anomalies
-
Crawling Social Networks
• HTML (Slashdot)
• RSS/Atom feeds (blogs)
• API driven (Twitter, Facebook, …)
– Data liberation
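For feed-based crawling, a minimal sketch of the parsing step using only the standard library is shown below. It assumes the feed has already been fetched and handles Atom only (namespace `http://www.w3.org/2005/Atom`); RSS 2.0 uses different element names. The sample feed and the `example.org` URL are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom_entries(feed_xml):
    """Return (title, link, updated) for each <entry> in an Atom feed document."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        # Entries may carry several <link> elements; this sketch keeps the first.
        link_el = entry.find(f"{ATOM}link")
        link = link_el.get("href") if link_el is not None else None
        updated = entry.findtext(f"{ATOM}updated", default="")
        entries.append((title, link, updated))
    return entries


sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>First post</title>
    <link href="http://example.org/2010/first"/>
    <updated>2010-04-14T12:00:00Z</updated>
  </entry>
</feed>"""

print(parse_atom_entries(sample))
```

The `updated` timestamp is what makes feeds attractive for incremental crawling: a crawler can skip entries it has already seen.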
-
Storage and Index
-
Data is becoming more and more connected
(Eifrem, OSCON 2009)
-
Social Media Graphs
(Eifrem, OSCON 2009)
-
Social Media Graphs: Representation
• Nodes
• Relationships between nodes
• Properties on both
• Storing in flat files vs. graph databases
• Neo4j: disk-based solution, works well for sizes up to a few billion (single VM)
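To make "nodes, relationships, properties on both" concrete, here is a toy in-memory property graph. It is only an illustration of the data model under assumed class names (`Node`, `Relationship`, `PropertyGraph` are hypothetical); it is not Neo4j's API, and a graph database would additionally persist and index this structure on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: both nodes and relationships carry key/value properties."""

    def __init__(self):
        self.nodes = {}
        self.out_rels = {}  # node_id -> list of outgoing Relationship objects

    def add_node(self, node_id, **props):
        self.nodes[node_id] = Node(node_id, props)
        self.out_rels.setdefault(node_id, [])
        return self.nodes[node_id]

    def add_relationship(self, start, end, rel_type, **props):
        rel = Relationship(start, end, rel_type, props)
        self.out_rels[start].append(rel)
        return rel

    def neighbors(self, node_id, rel_type=None):
        """Ids reachable over one outgoing relationship, optionally filtered by type."""
        return [r.end for r in self.out_rels.get(node_id, [])
                if rel_type is None or r.rel_type == rel_type]


g = PropertyGraph()
g.add_node(1, name="alice")
g.add_node(2, name="bob")
g.add_node(3, name="post-42", kind="blog post")
g.add_relationship(1, 2, "FOLLOWS", since=2009)
g.add_relationship(1, 3, "LIKES")
print(g.neighbors(1, "FOLLOWS"))  # [2]
```

Keeping adjacency on the node (rather than in a separate flat edge file) is what lets traversals follow relationships without a global join.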
-
Parallelism via MapReduce
• A paradigm that views input as (key, value) pairs; algorithms process these pairs in one of two stages
– Map: Perform operations on individual pairs
– Reduce: Combine all pairs with the same key
– Functional programming origins
– Abstracts away system-specific issues
– Manipulate large quantities of data
-
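Input: Collection of documents
Output: For each word, find all documents containing the word

    def mapper(filename, content):
        # Emit one (word, filename) pair per word in the document
        for word in content.split():
            yield (word, filename)

    def reducer(key, values):
        # All filenames for one word arrive together; keep each only once
        yield (key, sorted(set(values)))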
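The mapper and reducer above only make sense together with the framework's shuffle step, which groups all mapper outputs by key before reduction. A sequential stand-in for that pipeline (the `run_mapreduce` driver is hypothetical, not Hadoop's API) shows the whole inverted-index job end to end:

```python
from collections import defaultdict


def mapper(filename, content):
    # Map: emit one (word, filename) pair per word occurrence.
    for word in content.split():
        yield (word, filename)


def reducer(key, values):
    # Reduce: deduplicate the filenames collected for one word.
    yield (key, sorted(set(values)))


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Sequential stand-in for a MapReduce job: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            results[out_key] = out_value
    return results


docs = [("a.txt", "social media search"), ("b.txt", "social graph search search")]
index = run_mapreduce(docs, mapper, reducer)
print(index["search"])  # ['a.txt', 'b.txt']
```

In a real cluster the map calls run in parallel on different machines and the shuffle moves data over the network; the program logic stays the same.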
-
MapReducing Graph Data
• Note: By design, the mappers cannot communicate with each other.
• The graph representation should be such that all information (e.g., the neighborhood) needed for processing a node is locally available.
• The adjacency list representation is perfectly suited.
• Key: vertex in the graph
• Value: neighbors of the vertex and their associated values
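A minimal sketch of turning a directed edge list into such adjacency-list records follows. The record layout, key = vertex and value = (current rank, out-neighbors), is an assumption chosen to suit PageRank; other algorithms would store different per-neighbor values (e.g., edge weights).

```python
from collections import defaultdict


def to_adjacency_records(edges, initial_rank):
    """Directed edge list -> {vertex: (rank, sorted out-neighbors)} records."""
    out_neighbors = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out_neighbors[src].append(dst)
        vertices.update((src, dst))  # sinks must get a record too
    return {v: (initial_rank, sorted(out_neighbors[v])) for v in sorted(vertices)}


records = to_adjacency_records([("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")], 1 / 3)
print(records["a"])
```

Because each record carries its own neighbor list, a mapper can process a vertex without contacting any other mapper.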
-
Computing PageRank (MapReduce)
• PageRank update with a dampening parameter, where P is the transition probability matrix
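One standard form of this update, assuming a dampening parameter d (often set to 0.85), N nodes, and a row-stochastic P with P_{ij} the probability of moving from node i to node j (for an unweighted graph, 1/outdeg(i) when i links to j), is:

```latex
r^{(t+1)} = \frac{1-d}{N}\,\mathbf{1} + d\,P^{\top} r^{(t)},
\qquad\text{i.e.}\qquad
r^{(t+1)}_j = \frac{1-d}{N} + d \sum_{i \,:\, i \to j} \frac{r^{(t)}_i}{\operatorname{outdeg}(i)}
```

Nodes with no out-links make P sub-stochastic; they need extra handling (for example, spreading their rank uniformly) for the ranks to keep summing to 1.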
-
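MapReduce: PageRank Iteration
Map(Key k, Value v)
{
    r = (1 - dampeningfactor) / N;          // N = number of vertices
    foreach node n in v.getNeighbors()      // in-neighbors of k, stored with their ranks
        r += dampeningfactor * p(n, k) * n.rank;   // p(n, k): probability of moving from n to k
    v.rank = r;
    Emit(k, v);
}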
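A common alternative formulation avoids storing neighbor ranks in each record: the mapper pushes rank / out-degree along each out-edge and also passes the adjacency list through, and the reducer sums the incoming shares. A minimal sketch with a sequential shuffle, assuming the (rank, out-neighbors) record layout and hypothetical function names:

```python
from collections import defaultdict


def pagerank_map(vertex, value):
    """Emit the vertex's own structure, plus a rank share for each out-neighbor."""
    rank, out_neighbors = value
    yield vertex, ("graph", out_neighbors)  # pass the adjacency list through
    for n in out_neighbors:                 # dangling vertices emit no shares
        yield n, ("share", rank / len(out_neighbors))


def pagerank_reduce(vertex, values, d, num_vertices):
    """Sum incoming shares and apply the dampening term."""
    out_neighbors, incoming = [], 0.0
    for tag, payload in values:
        if tag == "graph":
            out_neighbors = payload
        else:
            incoming += payload
    return vertex, ((1 - d) / num_vertices + d * incoming, out_neighbors)


def pagerank_iteration(records, d=0.85):
    """One MapReduce round over {vertex: (rank, out_neighbors)} records."""
    groups = defaultdict(list)
    for vertex, value in records.items():
        for key, emitted in pagerank_map(vertex, value):
            groups[key].append(emitted)     # shuffle: group by destination vertex
    n = len(records)
    return dict(pagerank_reduce(v, vals, d, n) for v, vals in groups.items())


records = {"a": (1 / 3, ["b", "c"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
for _ in range(50):                         # each iteration = one full MapReduce job
    records = pagerank_iteration(records)
print({v: round(r, 3) for v, (r, _) in records.items()})
```

Note that the graph structure is re-emitted and re-shuffled in every iteration, which is exactly the cost that motivates the BSP model below.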
-
Processing Large Scale Graph Data
• MapReduce is not the best model for large-scale graph processing
– Simple graph concepts (PageRank, BFS, …) are not easy to program
– MapReduce does not preserve data locality across consecutive operations
-
A New Paradigm to Process Large Scale Graph Data
• Bulk Synchronous Parallel
• Developed in the 80s by Leslie Valiant
• Introduced by Google for graph computation
– Pregel: a system for large-scale graph processing (Malewicz et al., PODC 2009)
-
Bulk Synchronous Parallel
• Sequence of steps: SuperSteps
• Each SuperStep S – E
-
Why SuperStep?
• Internally consists of three stages
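In Valiant's formulation the three stages of a superstep are local computation, communication, and barrier synchronization. The sequential simulation below makes them explicit on a simple vertex-centric task, propagating the maximum value through a graph. It is a sketch under stated assumptions (single process, no real parallelism, hypothetical function name); the barrier is modeled by buffering all messages and delivering them only after every vertex has computed.

```python
def bsp_max_value(graph, values, max_supersteps=100):
    """Propagate the maximum value through an undirected graph in BSP supersteps.

    graph: {vertex: list of neighbors}; values: {vertex: initial value}.
    Returns (final values, number of supersteps executed, including superstep 0).
    """
    values = dict(values)
    # Superstep 0: every vertex announces its value to its neighbors.
    inbox = {v: [] for v in graph}
    for v in graph:
        for n in graph[v]:
            inbox[n].append(values[v])
    steps = 1
    while steps < max_supersteps:
        outbox = {v: [] for v in graph}
        active = False
        for v in graph:                            # (1) local computation
            best = max(inbox[v], default=values[v])
            if best > values[v]:
                values[v] = best
                active = True
                for n in graph[v]:                 # (2) communication, buffered
                    outbox[n].append(best)
        inbox = outbox                             # (3) barrier: deliver all at once
        steps += 1
        if not active:                             # no vertex changed: all halt
            break
    return values, steps


chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bsp_max_value(chain, {0: 3, 1: 6, 2: 2, 3: 1}))
```

Because messages sent in superstep S are only visible in superstep S + 1, no vertex ever reads a half-updated neighbor, which is what makes the model easy to reason about.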
-
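Computing PageRank (BSP version)
Compute()
{
    rold = r;
    sum = 0;
    for each incoming message m             // m.value = sender's rank / its out-degree
        sum += m.value;
    r = (1 - dampeningfactor) / N + dampeningfactor * sum;
    if (abs(r - rold) < epsilon) done();    // vote to halt
    else send r / outdegree to each out-neighbor;
}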
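A minimal sequential sketch of this vertex-centric PageRank, assuming every vertex appears as a key of `out_edges` and a hypothetical function name. As a simplification of Pregel's semantics, the run stops globally once every vertex would vote to halt in the same superstep, rather than tracking per-vertex halting and reactivation by messages.

```python
def bsp_pagerank(out_edges, d=0.85, epsilon=1e-6, max_supersteps=100):
    """Vertex-centric PageRank simulated in BSP supersteps.

    out_edges: {vertex: list of out-neighbors}. A vertex votes to halt when its
    rank changes by less than epsilon; the run ends when all vertices halt.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(max_supersteps):
        # Communication: each vertex sends rank / out-degree along every out-edge.
        inbox = {v: [] for v in out_edges}
        for v, nbrs in out_edges.items():
            for w in nbrs:
                inbox[w].append(rank[v] / len(nbrs))
        # Barrier, then local computation on the delivered messages.
        halted = 0
        new_rank = {}
        for v in out_edges:
            new_rank[v] = (1 - d) / n + d * sum(inbox[v])
            if abs(new_rank[v] - rank[v]) < epsilon:
                halted += 1
        rank = new_rank
        if halted == n:
            break
    return rank


print(bsp_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

Unlike the MapReduce version, the graph structure stays with each vertex across supersteps; only the rank messages move.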
-
Suggested Reading
• Truly Madly Deeply Parallel, Robert Matthews, New Scientist, Feb. 1996
• Pregel: a system for large-scale graph processing (Malewicz et al., PODC 2009)
-
Social Network Analysis
• Retrieving information from structure
• E
-
Book
Networks, Crowds, and Markets: Reasoning About a Highly Connected World
By David Easley and Jon Kleinberg
http://www.cs.cornell.edu/home/kleinber/networks-book/
-
Recap …
• Properties of social media
– Scale
– Immediacy
– Heterogeneity
– Duplication
– Semi-structuredness
• Properties of social media graphs
– Small-worldness
– Scale-free property
– Evolution models
• Crawling
– API driven
• Index
-
Social Media Related Projects
• Twitter