Post on 08-Apr-2018
8/7/2019 Foursquare -CEPSR Presentation
Foursquare
Adventures in Machine Learning, Statistics, and Big Data
What is Foursquare?
Location-based startup; an application that helps you explore your city
Visit places, check in, earn rewards, stay connected with your friends
Game elements: single-player, multi-player
What is Foursquare? (cont.)
4M+ users, 10M+ venues, 200M+ check-ins
Large reach (most major countries, North Pole, Space)
Native app for almost every smartphone; also available on SMS, web, mobile-web
Data Model
Users, Venues
Check-ins
Tips / To-dos
Shouts
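
One way to picture the data model is as a handful of record types. The sketch below is only an illustration: every field name and ID type is a hypothetical assumption, not Foursquare's actual schema, and treating a shout as a free-standing message is also an assumption.

```scala
// Hypothetical record types for the entities named on the slide.
// All field names are illustrative assumptions.
object DataModelSketch {
  final case class User(id: Long, name: String, friendIds: Set[Long])
  final case class Venue(id: Long, name: String, lat: Double, lng: Double)

  final case class CheckIn(userId: Long, venueId: Long, epochSeconds: Long)
  final case class Tip(userId: Long, venueId: Long, text: String)
  final case class ToDo(userId: Long, venueId: Long, text: String)
  // Assumption: a shout is modeled as a message that is not tied to a venue.
  final case class Shout(userId: Long, text: String, epochSeconds: Long)

  // User history: the venues one user checked in to, oldest check-in first.
  def userHistory(userId: Long, checkIns: Seq[CheckIn]): Seq[Long] =
    checkIns.filter(_.userId == userId).sortBy(_.epochSeconds).map(_.venueId)
}
```

Venue history would be the same query keyed on `venueId` instead of `userId`.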
Big Data
Some problems are no longer solvable by simple/naive algorithms or simple crowd-sourcing
Interesting problems can now be solved using statistical methods: prediction, classification, optimization
Example Problems
Prediction: Recommending places to people, people to places, people to people, places from places, tips, events, checking-in, interestingness
Optimization: Search, ranking, user experience
Classification: Categorizing venues, removing junk, spam, duplicates
The list goes on
Zooming in: Recommending Places to People
What do we have available?
Check-ins (user history, venue history)
Venue meta-data, User meta-data
Friend graph
What do we want to do?
Social, fun (interesting >? optimal)
Be smart, provide serendipity, hit the tail
Initial Thoughts
Want a hybrid model, many features
Need to be scalable, fast (web-scale; offline computations are OK as long as they scale linearly)
Start with dumb, get smarter where possible. Iterate. Something is better than nothing. Data is key.
Start with Simple
Popularity: user independent, works for cold-start, can be extended:
Decay popularity: recently popular, new, long-term
Break down by time of day, day of week
Unique users vs. hits per user (must see vs. hidden gem vs. generally popular)
Bubble up interesting things: specials, to-dos, similar tips
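
The popularity ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production approach: the `CheckIn` record, the exponential half-life decay, and the UTC hour bucketing are all hypothetical choices made for the example.

```scala
object PopularitySketch {
  // Hypothetical minimal check-in record.
  final case class CheckIn(userId: Long, venueId: Long, epochSeconds: Long)

  // Decayed popularity per venue: a check-in that is one half-life old counts 0.5.
  // A short half-life favors "recently popular"; a long one approaches raw counts.
  def decayedPopularity(checkIns: Seq[CheckIn],
                        nowSeconds: Long,
                        halfLifeSeconds: Double): Map[Long, Double] =
    checkIns.groupBy(_.venueId).map { case (venueId, cs) =>
      val score = cs.map { c =>
        val age = math.max(0L, nowSeconds - c.epochSeconds).toDouble
        math.pow(0.5, age / halfLifeSeconds)
      }.sum
      (venueId, score)
    }

  // Per venue: (number of unique users, average check-ins per user).
  // Many users with few hits each suggests "must see"; few users with
  // many hits each suggests "hidden gem".
  def uniqueUsersAndHitsPerUser(checkIns: Seq[CheckIn]): Map[Long, (Int, Double)] =
    checkIns.groupBy(_.venueId).map { case (venueId, cs) =>
      val unique = cs.map(_.userId).distinct.size
      (venueId, (unique, cs.size.toDouble / unique))
    }

  // Check-in counts bucketed by hour of day (UTC here; a real system
  // would presumably use the venue's local time).
  def checkInsByUtcHour(checkIns: Seq[CheckIn]): Map[Int, Int] =
    checkIns
      .groupBy(c => ((c.epochSeconds / 3600) % 24).toInt)
      .map { case (hour, cs) => (hour, cs.size) }
}
```

All three functions are single passes over the check-in log, which fits the requirement that offline computations scale linearly.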
http://bitsybot.com/foursquare/trends.html
Breakfast vs. Brunch
Unstructured vs. Commuting
More Complex
Add some social elements. Where do your friends go? Can we rate your friends?
Good for users with small check-in history, large # of friends
Can we determine friend quality from check-ins?
Even if weak mathematically, social can triumph: "Jane, John, and 17 other friends went here"
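
A minimal sketch of the social signal, assuming check-ins are available as hypothetical `(userId, venueId)` pairs: count how many distinct friends visited each venue, then render the kind of social-proof line quoted above. The function names and the choice to name two friends are illustrative assumptions.

```scala
object SocialSketch {
  // Number of distinct friends who checked in at each venue.
  // checkIns holds hypothetical (userId, venueId) pairs.
  def friendVisitCounts(friendIds: Set[Long],
                        checkIns: Seq[(Long, Long)]): Map[Long, Int] =
    checkIns
      .filter { case (userId, _) => friendIds.contains(userId) }
      .distinct
      .groupBy { case (_, venueId) => venueId }
      .map { case (venueId, visits) => (venueId, visits.size) }

  // Builds a line such as "Jane, John, and 17 other friends went here".
  // Returns None when no friend has visited.
  def socialProof(friendNames: Seq[String], maxNamed: Int = 2): Option[String] = {
    require(maxNamed >= 1, "at least one friend must be named")
    val friends = friendNames.distinct
    val (named, rest) = friends.splitAt(maxNamed)
    if (friends.isEmpty) None
    else {
      val subject =
        if (rest.isEmpty) {
          if (named.size == 1) named.head
          else named.init.mkString(", ") + " and " + named.last
        } else {
          val noun = if (rest.size == 1) "friend" else "friends"
          named.mkString(", ") + s", and ${rest.size} other $noun"
        }
      Some(subject + " went here")
    }
  }
}
```

Ranking venues by `friendVisitCounts` needs no check-in history from the user at all, which is why this signal suits users with few check-ins but many friends.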
Hard: Check-in History
Can we accurately predict where you want to go based on where you went? Seems good, but how to do it? How to scale it?
Lots of research in this area lately, mostly because of Netflix, and Amazon before them
Collaborative filtering (venue-to-venue, user-to-user)
Factorization, dimensionality reduction
Clustering
SVM
Linear models
Context Filtering/Search
The branching factor for choosing a method increases dramatically at this stage. Although it provides the most value, it is the most difficult to do right.
Choosing a Venue Similarity Metric
Correlation: $\dfrac{\mathrm{cov}(A, B)}{\mathrm{std}(A) \cdot \mathrm{std}(B)}$
Cosine similarity: $\dfrac{A \cdot B}{\|A\| \cdot \|B\|}$
How do we adjust for scale? How much?
How to remove neighborhood effect?
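
Both metrics are short to write down. The sketch below assumes each venue is represented as a vector of per-user check-in counts (an assumption for illustration; the slide does not say how A and B are built). It uses the standard identity that Pearson correlation is the cosine similarity of the mean-centered vectors.

```scala
object SimilaritySketch {
  private def dot(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  private def norm(a: Seq[Double]): Double = math.sqrt(dot(a, a))

  // Cosine similarity: A . B / (||A|| * ||B||).
  // Yields NaN if either vector is all zeros.
  def cosine(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.size == b.size, "vectors must have the same length")
    dot(a, b) / (norm(a) * norm(b))
  }

  // Pearson correlation: cov(A, B) / (std(A) * std(B)),
  // computed as the cosine of the mean-centered vectors.
  // Yields NaN if either vector is constant.
  def correlation(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.size == b.size && a.nonEmpty, "vectors must be non-empty and equal length")
    val meanA = a.sum / a.size
    val meanB = b.sum / b.size
    cosine(a.map(_ - meanA), b.map(_ - meanB))
  }
}
```

The difference bears on the scale question: cosine ignores uniform rescaling of a vector but not a constant offset, while correlation ignores both, so a venue where everyone checks in a few extra times looks the same under correlation but different under cosine.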
kNN Collaborative Filtering
val result =
  for {
    historyVenue <- ... > 0)
    modScore = vPair.r /
      (1 + math.exp(vPair.distance / -1000.0 + 1.75))
  } yield PairScore(venue, modScore, vPair.numCheckins)
val kNN = result.sortBy(_.modScore).reverse.take(K)
kNN.map(n => n.modScore * math.log(n.numCheckins + 1)).sum / ...
WTF?
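
The slide's code is incomplete as extracted, so here is a self-contained sketch of the same idea under stated assumptions. The `VenuePair` record, the `similar` lookup, and the `score` signature are hypothetical. The distance term and the log(check-ins) weighting come from the slide; the final divisor is cut off there, so dividing by the sum of the log weights (a weighted average) is an assumption, not the original formula.

```scala
object KnnSketch {
  // Hypothetical precomputed similarity between a history venue and another venue.
  final case class VenuePair(other: Long, r: Double, distance: Double, numCheckins: Int)
  final case class PairScore(venue: Long, modScore: Double, numCheckins: Int)

  // Logistic distance weight with the constants from the slide:
  // about 0.15 at 0 m, exactly 0.5 at 1750 m, approaching 1 for far-apart venues.
  // Nearby pairs are discounted, which dampens the neighborhood effect.
  def distanceWeight(distanceMeters: Double): Double =
    1.0 / (1.0 + math.exp(distanceMeters / -1000.0 + 1.75))

  // Score a candidate venue against a user's check-in history.
  def score(history: Seq[Long],
            similar: Long => Seq[VenuePair],
            candidate: Long,
            k: Int): Double = {
    val result = for {
      historyVenue <- history
      vPair <- similar(historyVenue)
      if vPair.other == candidate && vPair.r > 0
    } yield PairScore(candidate, vPair.r * distanceWeight(vPair.distance), vPair.numCheckins)

    // Keep the k highest-scoring neighbors.
    val kNN = result.sortBy(-_.modScore).take(k)

    // Assumption: normalize by the total log weight (the slide's divisor is truncated).
    val weights = kNN.map(n => math.log(n.numCheckins + 1.0))
    val total = weights.sum
    if (total == 0.0) 0.0
    else kNN.zip(weights).map { case (n, w) => n.modScore * w }.sum / total
  }
}
```

Weighting by log(numCheckins + 1) lets well-observed pairs count for more without letting a handful of very busy venues dominate the score.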
Similarity Data: Correlation?
[Figure: venue-by-venue similarity matrix (symmetric?). Labels on the figure: Too close (dense, lazy) neighborhood; Sweet spot; sparse; No Data; Best source of data!; Interesting; Lazy?]
What's the Difference?
Relatively close (300m), high correlation (.26)
Relatively far (7.6km), low correlation (.04)
We're Hiring!
Looking for developers in the field of ML and/or Statistics; also hiring across the board
We use cool tech: Scala, Lift, MongoDB
Small company (35 people), flexible work environment, lots of big projects to work on
Seriously, Come Work for Us
Lots of ex-finance employees (almost half our engineers!), lots of ex-*soft employees (more than half our engineers!); note the ex-
Fast growing company, lots of innovation
Many very smart people with common goals