Social Media IR


  • 8/9/2019 Social Media IR


    Social Media?


    Properties of Social Media: Scale

    • Twitter (Chirp 2010)
     – More than 100M user accounts
     – More than 600M search queries a day
     – 55M tweets a day

    • Facebook
     – More than 400M active users
     – More than 25 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each month


    Properties of Social Media: Immediacy

    • Need to share breaking news
    • Search: Content vs. Peer recommendation


    Properties of Social Media: Duplication

    • Duplication of content
     – Blogs: Copy-Paste
     – Twitter: Re-tweet
     – Groups: Cross-posting
     – Email: Signature lines, Inline Replies


    Properties of Social Media: Semi-structuredness

    • Informal but structured
     – Informal ≠ low quality (e.g. Wikipedia)

    • Structure
     – Metadata
     – Connectivity


    Suggested Reading

    Towards a PeopleWeb, Raghu Ramakrishnan & Andrew Tomkins, IEEE Computer, Aug 2007

    Important properties of users and objects will move from being tied to individual Web sites to being globally available. The conjunction of a global object model with portable user conte


    Properties of Social Media

    • Scale
    • Immediacy
    • Heterogeneity
    • Duplication
    • Semi-structuredness


    Properties of Social Media Graphs

    • Small-world property – Si


    Network Evolution and Path Size


    Probabilistic Modeling of Networks

    • Erdos-Renyi Model
     – Choose a pair of nodes uniformly at random and add an edge.
     – G(n, p)
     – Not Scale Free (small avg. path but low clustering coefficient)
     – Scale Free networks don't evolve by chance
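The G(n, p) model named above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the function names `gnp_graph` and `average_clustering` are hypothetical, and the sketch uses the common G(n, p) reading in which each possible edge is included independently with probability p.

```python
import random
from itertools import combinations

def gnp_graph(n, p, seed=None):
    """Return an adjacency dict for a G(n, p) random graph (assumed variant)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Each of the n*(n-1)/2 possible edges is kept with probability p.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among the neighbors of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

graph = gnp_graph(200, 0.05, seed=42)
print(average_clustering(graph))
```

In such a graph the clustering coefficient stays close to p, which is one way to see the "low clustering coefficient" point on the slide.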


    Probabilistic Modeling of Networks

    • Preferential Attachment (Barabasi and Albert 99)
     – Rich become richer
     – Stochastic process: Using Polya's urn
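A rough sketch of the "rich become richer" process follows. It is an assumption-laden illustration rather than the slides' own code: the function name `preferential_attachment`, the starting clique, and the parameter `m` (links per new node) are choices made here. The list `urn` plays the role of the urn: a node appears in it once per edge end, so a uniform draw is a degree-proportional draw.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a clique on m + 1 nodes so every node has nonzero degree.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    urn = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(urn))   # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            urn.extend((new, t))           # both endpoints gain one degree
    return edges

edges = preferential_attachment(1000, 2, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()), min(degree.values()))
```

Early nodes tend to accumulate far more links than late ones, which produces the heavy-tailed degree distribution associated with scale-free networks.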


    Why model?

    • Study network evolution, degeneration
     – Develop algorithms
    • Detect communities
    • Who are the movers and shakers?
    • Detect diffusion of ideas across networks
    • Detect anomalies


    Crawling Social Networks

    • HTML (Slashdot)
    • RSS/Atom feeds (blogs)
    • API driven (Twitter, Facebook, …)
     – Data liberation


    Storage and Indexing


    Data is becoming more and more connected

    (Eifrem, OSCON 2009)


    Social Media Graphs

    (Eifrem, OSCON 2009)


    Social Media Graphs: Representation

    • Nodes
    • Relationship between nodes
    • Properties on both
    • Storing in Flat Files vs. Graph Databases
    • Neo4j: disk-based solution – works well for sizes up to a few billion (Single VM)
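The representation listed above (nodes, relationships, properties on both) can be illustrated with a small in-memory structure. This is only a sketch of the data model; the class and method names (`PropertyGraph`, `add_node`, `add_relationship`, `neighbors`) are hypothetical and are not the API of Neo4j or any other graph database.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    source: int
    target: int
    kind: str
    properties: dict = field(default_factory=dict)

class PropertyGraph:
    """Toy property graph: nodes and relationships both carry key/value properties."""

    def __init__(self):
        self.nodes = {}      # node id -> Node
        self.outgoing = {}   # node id -> list of outgoing Relationship objects

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        self.outgoing.setdefault(node_id, [])

    def add_relationship(self, source, target, kind, **properties):
        rel = Relationship(source, target, kind, properties)
        self.outgoing[source].append(rel)
        return rel

    def neighbors(self, node_id, kind=None):
        # Follow outgoing relationships, optionally filtered by type.
        return [r.target for r in self.outgoing[node_id]
                if kind is None or r.kind == kind]

g = PropertyGraph()
g.add_node(1, name="alice")
g.add_node(2, name="bob")
g.add_relationship(1, 2, "FOLLOWS", since=2010)
print(g.neighbors(1, "FOLLOWS"))  # [2]
```

Keeping the outgoing relationships next to each node is what makes neighbor lookups cheap compared with scanning a flat file of edges.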


    Parallelism via Map-Reduce

    • A paradigm to view input as (key, value) pairs; algorithms process these pairs in one of two stages
     – Map: Perform operations on individual pairs
     – Reduce: Combine all pairs with the same key
     – Functional programming origins
     – Abstracts away system-specific issues
     – Manipulate large quantities of data


    Input: Collection of documents
    Output: For each word, find all documents with the word

    def mapper(filename, content):
        # Emit one (word, filename) pair per word occurrence.
        for word in content.split():
            output(word, filename)

    def reducer(key, values):
        # Collapse repeated filenames for the same word.
        output(key, unique(values))
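To try the inverted-index job without a cluster, the map, shuffle, and reduce stages can be simulated in one process. The driver `run_mapreduce` and the sample documents are assumptions introduced for illustration; a real framework would distribute the same two functions across machines.

```python
from collections import defaultdict

def mapper(filename, content):
    # Emit (word, filename) for every word in the document.
    for word in content.split():
        yield word, filename

def reducer(key, values):
    # Deduplicate and sort the filenames collected for one word.
    return key, sorted(set(values))

def run_mapreduce(documents):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for filename, content in documents.items():
        for key, value in mapper(filename, content):
            groups[key].append(value)          # the "shuffle" step
    return dict(reducer(k, v) for k, v in groups.items())

docs = {"a.txt": "social media graph", "b.txt": "graph data"}
index = run_mapreduce(docs)
print(index["graph"])  # ['a.txt', 'b.txt']
```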


    Map-Reducing graph data

    • Note: By design, the mappers cannot communicate with each other.
    • The graph representation should be such that all information (e.g. neighborhood) needed for processing a node is locally available.
    • The adjacency list representation is perfectly suited.
    • Key: vertex in the graph
    • Value: neighbors of the vertex and their associated values
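As a small illustration of the adjacency-list layout, the sketch below computes in-degrees: each mapper call sees only one (vertex, neighbors) record, and the grouping by key does all the communication. The sample graph and the helper `run` are hypothetical, and the framework is again simulated in a single process.

```python
from collections import defaultdict

# Key: vertex, Value: list of out-neighbors (adjacency list representation).
adjacency = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

def mapper(vertex, neighbors):
    yield vertex, 0            # make sure every vertex appears in the output
    for n in neighbors:
        yield n, 1             # one incoming edge for each out-neighbor

def reducer(vertex, counts):
    return vertex, sum(counts)

def run(adj):
    groups = defaultdict(list)
    for vertex, neighbors in adj.items():
        for key, value in mapper(vertex, neighbors):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

in_degree = run(adjacency)
print(in_degree)  # {'a': 1, 'b': 1, 'c': 2}
```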


    Computing PageRank (MapReduce)

    • PageRank update with dampening parameter R

    where P is the transition probability matrix
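The update equation itself did not survive on this slide. For orientation only, one common textbook form of the PageRank update is given below; the symbols are assumptions chosen here, with d for the dampening parameter, n for the number of nodes, r^{(t)} for the rank vector at iteration t, and P_{ij} for the probability of stepping from node i to node j. The slide may have used a different but equivalent notation.

```latex
\[
  r^{(t+1)} \;=\; d\, P^{\top} r^{(t)} \;+\; \frac{1-d}{n}\,\mathbf{1}
\]
```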


    MapReduce: PageRank Iteration

    Map(Key k, Value v) {
      rold = k.rank;
      r = 0;
      foreach node n in v.getNeighbors() {
        r += p(n, k) * rold + dampeningfactor;
      }
      v.rank = r;
      Emit(k, v);
    }
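For comparison, here is a runnable sketch of one PageRank iteration in the map/reduce style. It follows the widely used formulation in which the mapper sends each neighbor an equal share of the node's rank and the reducer sums the shares; it is not a transcription of the slide's pseudocode. The names `pagerank_iteration`, the `("graph", ...)` and `("rank", ...)` tags, the damping value 0.85, and the sample graph are all assumptions, and the sketch assumes every node has at least one out-link.

```python
from collections import defaultdict

def mapper(node, value):
    rank, neighbors = value
    yield node, ("graph", neighbors)            # carry the structure forward
    for n in neighbors:
        yield n, ("rank", rank / len(neighbors))

def reducer(node, values, damping, n_nodes):
    neighbors, total = [], 0.0
    for tag, payload in values:
        if tag == "graph":
            neighbors = payload
        else:
            total += payload                    # rank mass received from in-neighbors
    new_rank = (1.0 - damping) / n_nodes + damping * total
    return node, (new_rank, neighbors)

def pagerank_iteration(graph, damping=0.85):
    """One map + shuffle + reduce pass, simulated in a single process."""
    groups = defaultdict(list)
    for node, value in graph.items():
        for key, out in mapper(node, value):
            groups[key].append(out)
    return dict(reducer(k, v, damping, len(graph)) for k, v in groups.items())

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
graph = {v: (1.0 / len(links), nbrs) for v, nbrs in links.items()}
for _ in range(100):
    graph = pagerank_iteration(graph)
ranks = {v: rank for v, (rank, _) in graph.items()}
print(ranks)
```

Every iteration is a full job that rereads and rewrites the whole graph, which is the data-locality cost that motivates the alternatives to MapReduce for graph processing.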


    Processing Large Scale Graph Data

    • MapReduce is not the best model for large scale graph processing
     – Simple graph concepts (PageRank, BFS, …) are not easy to program
     – MapReduce does not preserve data locality in consecutive operations


    A New Paradigm to Process Large Scale Graph Data

    • Bulk Synchronous Parallel
    • Developed in the 80s by Leslie Valiant
    • Introduced by Google for Graph computation

    Pregel: a system for large-scale graph processing (Malewicz et al., PODC 2009)


    Bulk Synchronous Parallel

    • Sequence of steps – SuperSteps
    • Each SuperStep S – E


    Why SuperStep?

    • Internally consists of three stages


    Computing PageRank (BSP version)

    Compute() {
      rold = r;
      r = 0;
      for each incoming message m {
        r += m.p * rold + dampeningfactor;
      }
      if (r - rold < epsilon) done()
    }
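A single-process sketch of the vertex-centric idea is shown below. It is an illustration under stated assumptions, not Pregel's API: the function `bsp_pagerank`, the inbox/outbox dictionaries, and the global convergence test are choices made here (a real system would let each vertex vote to halt), and the rank update uses the common formulation with damping 0.85. Each loop pass is one superstep: local computation per vertex, messages queued for neighbors, then a barrier where the queued messages become the next inbox.

```python
def bsp_pagerank(adjacency, damping=0.85, epsilon=1e-9, max_supersteps=200):
    n = len(adjacency)
    rank = {v: 1.0 / n for v in adjacency}
    # Superstep 0: every vertex sends its initial rank share to its neighbors.
    inbox = {v: [] for v in adjacency}
    for v, nbrs in adjacency.items():
        for u in nbrs:
            inbox[u].append(rank[v] / len(nbrs))
    for superstep in range(1, max_supersteps + 1):
        outbox = {v: [] for v in adjacency}
        max_change = 0.0
        for v, nbrs in adjacency.items():
            # Local computation: uses only messages received by this vertex.
            new_rank = (1.0 - damping) / n + damping * sum(inbox[v])
            max_change = max(max_change, abs(new_rank - rank[v]))
            rank[v] = new_rank
            # Communication: messages are delivered in the next superstep.
            for u in nbrs:
                outbox[u].append(new_rank / len(nbrs))
        inbox = outbox              # barrier: all vertices advance together
        if max_change < epsilon:
            break
    return rank, superstep

ranks, steps = bsp_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(ranks, steps)
```

Unlike the MapReduce variant, the graph structure stays in place between supersteps and only the rank messages move.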


    Suggested Reading

    • Truly, Madly, Deeply Parallel, Robert Matthews, New Scientist, Feb 1996
    • Pregel: a system for large-scale graph processing (Malewicz et al., PODC 2009)


    Social Network Analysis

    • Retrieving information from structure
    • E


    Book

    Networks, Crowds, and Markets: Reasoning About a Highly Connected World

    By David Easley and Jon Kleinberg
    http://www.cs.cornell.edu/home/kleinber/networks-book/


    Recap …

    • Properties of social media
     – Scale
     – Immediacy
     – Heterogeneity
     – Duplication
     – Semi-structuredness

    • Properties of social media graphs
     – Small-worldness
     – Scale free property
     – Evolution models

    • Crawling
     – API driven

    • Indexing


    Social Media Related Projects

    • Twitter