IOT meetup presentation

Handling IOT Data with a Modern Data Architecture, by Cliff Gilmore - Data Practice Director @ 1904labs

Transcript of IOT meetup presentation

Page 1: IOT meetup presentation

Handling IOT Data with a Modern Data Architecture

Cliff Gilmore - Data Practice Director @ 1904labs

Page 2: IOT meetup presentation

Capture, Process and Serve (All the Things)

Page 3: IOT meetup presentation

Challenges of IOT Data


Scale
❏ Frequency of events
❏ Size of Data
❏ Number of Devices
❏ Number of Users
❏ Latency Demands
❏ Geo Distribution

Processing
❏ Batch Analytics
❏ Realtime Analytics
❏ Aggregations
❏ Machine Learning
❏ Reporting

Applications
❏ Realtime Access
❏ Report Visualization
❏ Production Analytics
❏ ML Driven Decisions
❏ Microservices

Page 4: IOT meetup presentation

IOT Data Architecture: Leading to Lambda and Kappa

Page 5: IOT meetup presentation

Architectural Requirements

❏ Must scale out linearly on ingestion, processing, storage, and access.

❏ Need to be able to store huge amounts of data organized for different access patterns.

❏ Must be able to process data in flight for real-time decision making, alerting, and pattern matching

❏ Need to serve the data to the rest of the organization through a common API/Service

❏ The architecture must be agile to accommodate new changes to business logic and processing algorithms

Page 6: IOT meetup presentation

Stack Components

❏ Distributed Log / Queue
    ❏ A pub/sub partitioned queue
    ❏ Kafka is the de facto choice due to its wide use in production

❏ Stream Processing
    ❏ Ability to process events as they arrive
    ❏ Event at a Time
        ❏ Samza, Flink, Storm
    ❏ Micro Batch
        ❏ Spark Streaming

❏ Batch Processing
    ❏ Process event history in bulk
    ❏ Spark, MapReduce on top of HDFS or Wide Column Stores

❏ Serving
    ❏ Expose data to the rest of the organization and serve application requests
    ❏ Wide Column Store
        ❏ Cassandra, HBase, BigTable
    ❏ Can also be RDBMS for some data sets (Reports, BI Rollups, etc.)
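As a rough illustration of the first two components, the sketch below models a partitioned log in memory and contrasts event-at-a-time with micro-batch consumption. It is a toy, not Kafka's or any stream processor's API: the class `PartitionedLog` and both `process_*` helpers are hypothetical names made up for this example.

```python
import zlib


class PartitionedLog:
    """Toy in-memory stand-in for a partitioned pub/sub log (not Kafka's API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, event):
        # Hash the key so every event for one device lands in the same
        # partition, which keeps per-device ordering intact.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, event))
        return index

    def read(self, partition, offset, max_events=None):
        # Consumers track their own offset and can re-read from any point.
        end = None if max_events is None else offset + max_events
        return self.partitions[partition][offset:end]


def process_event_at_a_time(events, handle):
    # Event-at-a-time style: the handler runs once per event.
    for key, event in events:
        handle(key, event)


def process_micro_batches(events, batch_size, handle_batch):
    # Micro-batch style: the handler receives small groups of events.
    for start in range(0, len(events), batch_size):
        handle_batch(events[start:start + batch_size])
```

Because the partition is chosen from the key alone, adding partitions (and consumers) is what lets ingestion and processing scale out, at the cost of ordering guarantees only within a single key's partition.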

Page 7: IOT meetup presentation

Lambda Architecture

[Diagram: Events flow into a Distributed Log, which feeds both a Batch Layer and a Speed Layer; both feed the Serving Layer. Labels shown: "Raw Data, Pattern Matching and Aggregates" and "Patterns, Rollups, Recommendations".]
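The Lambda idea can be sketched in a few lines: a batch layer periodically recomputes a view from the full event history, a speed layer keeps an incremental view of whatever arrived since the last batch run, and the serving layer merges the two at query time. The event shape (`device_id`, `ts`) and all function names here are assumptions for illustration only.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute per-device event counts from history before cutoff."""
    return Counter(e["device_id"] for e in events if e["ts"] < cutoff)


class SpeedView:
    """Speed layer: incrementally count events at or after the batch cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.counts = Counter()

    def on_event(self, event):
        if event["ts"] >= self.cutoff:
            self.counts[event["device_id"]] += 1


def serve(device_id, batch, speed):
    """Serving layer: merge the batch view with the speed view."""
    return batch[device_id] + speed.counts[device_id]
```

The cost this sketch makes visible is that the counting logic exists twice, once in `batch_view` and once in `SpeedView`, and both copies have to stay in agreement.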

Page 8: IOT meetup presentation

Kappa Architecture

[Diagram: Events -> Distributed Log -> Streaming -> Serving. Labels shown: Batch, Stream Results, Stream V1, Stream V2, Table V1, Table V2, Raw Data.]
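One way to read the Kappa pattern: there is a single streaming code path, and changing the logic means replaying the retained log through a new version of the job into a new table, then switching reads over. The sketch below assumes a tiny in-memory log; the event fields and the two transform functions are hypothetical.

```python
def run_stream_job(log, transform):
    """Replay the whole log through one version of the stream logic and
    build a fresh serving table keyed by device."""
    table = {}
    for event in log:
        key, value = transform(event)
        table.setdefault(key, []).append(value)
    return table


def transform_v1(event):
    # Version 1 of the (hypothetical) business logic: store raw readings.
    return event["device_id"], event["temp_c"]


def transform_v2(event):
    # Version 2: store readings rounded to whole degrees.
    return event["device_id"], round(event["temp_c"])


log = [
    {"device_id": "station-1", "temp_c": 15.9},
    {"device_id": "station-1", "temp_c": 16.1},
    {"device_id": "station-2", "temp_c": 22.1},
]

tables = {"v1": run_stream_job(log, transform_v1)}
serving = "v1"

# Deploying V2: replay from the start of the log into a new table,
# switch serving to it, then retire the old table.
tables["v2"] = run_stream_job(log, transform_v2)
serving = "v2"
del tables["v1"]
```

This only works if the log retains enough history to rebuild the table, which is the main operational assumption behind the pattern.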

Page 9: IOT meetup presentation

Cassandra to Serve IOT Data

The Art of Time Series

Page 10: IOT meetup presentation

Why Cassandra?

❏ Proven linear scale up to 1000s of nodes in a single cluster

❏ Geo redundancy to collect data where it is created and replicate across the globe

❏ High capacity to ingest parallel individual writes

❏ Low latency and high throughput reads

❏ Wide-column store data model allows for data to be structured around query patterns

❏ Continuous availability, suited to and used for the most mission-critical systems

❏ An AP system in CAP theorem terms: consistency is tunable per operation, so it can be traded against availability
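The tunable-consistency trade-off comes down to simple arithmetic on the replication factor N, the number of replicas that must acknowledge a write (W), and the number that must answer a read (R). The helper names below are made up for this sketch; the rule R + W > N is the standard overlap condition for quorum-style replication.

```python
def quorum(replication_factor):
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_acks, read_acks):
    """True when every read replica set must overlap every write replica set."""
    return read_acks + write_acks > replication_factor


def tolerated_failures(replication_factor, required_acks):
    """How many replicas can be down while the operation still succeeds."""
    return replication_factor - required_acks
```

With N = 3, writing and reading at quorum (W = R = 2) gives overlapping replica sets and tolerates one replica down per operation; writing and reading at a single replica (W = R = 1) tolerates two replicas down but allows stale reads.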

Page 11: IOT meetup presentation

Cassandra 101

[Diagram: a Cassandra cluster spanning two data centers, DC1 and DC2]

Page 12: IOT meetup presentation

Physical Data Model

Partition Key
    Clustering Key Val1:  Col1:Val  Col2:Val  Col3:Val  Col4:Val  Col5:Val  Col6:Val  ...
    Clustering Key Val2:  Col1:Val  Col6:Val  ...
    Clustering Key Val3:  Col6:Val  ...
    Clustering Key Val4:  Col1:Val  Col2:Val  Col3:Val  Col4:Val  Col5:Val  Col6:Val  ...
    Clustering Key Val5:  Col1:Val  Col2:Val  ...
    ...

Page 13: IOT meetup presentation

CQL - Cassandra Query Language

❏ Simple to use language that looks like SQL

❏ No joins, group by, etc.

❏ Example Queries

    ❏ SELECT * FROM readings WHERE device_id = ? AND event_time > ? AND event_time <= ?;

    ❏ INSERT INTO readings (device_id, event_time, temperature) VALUES (?, ?, ?);

I’ve got this!

Page 14: IOT meetup presentation

TimeSeries Table Example

CREATE TABLE readings (
    sensor_id text,
    event_time TimeUUID,
    temperature decimal,
    PRIMARY KEY (sensor_id, event_time)
);

Page 15: IOT meetup presentation

TimeSeries Table Example

CREATE TABLE readings (
    sensor_id text,
    event_time TimeUUID,    -- Time Ordered Sortable UUID
    temperature decimal,
    PRIMARY KEY (sensor_id, event_time)
);
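A TimeUUID is a version 1 UUID, which embeds a 60-bit timestamp counted in 100-nanosecond ticks since 1582-10-15 UTC. The sketch below, using only Python's standard `uuid` module, shows how a timestamp goes into and comes back out of such a value. The helper names, and the choice of zero for the clock sequence and node, are assumptions for this example; a real client driver would normally generate these values for you.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond ticks from this instant.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_from_datetime(moment, clock_seq=0, node=0):
    """Build a version 1 UUID for a timezone-aware datetime."""
    delta = moment - GREGORIAN_EPOCH
    ticks = (delta.days * 86400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    return uuid.UUID(
        fields=(
            ticks & 0xFFFFFFFF,        # time_low
            (ticks >> 32) & 0xFFFF,    # time_mid
            (ticks >> 48) & 0x0FFF,    # time_hi (version bits are set by version=1)
            (clock_seq >> 8) & 0x3F,   # clock_seq_hi (variant bits are set by version=1)
            clock_seq & 0xFF,          # clock_seq_low
            node,                      # 48-bit node id
        ),
        version=1,
    )


def datetime_from_timeuuid(value):
    """Recover the embedded timestamp, truncated to whole microseconds."""
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)
```

Note that time order comes from the embedded timestamp (`value.time` here), not from comparing the UUIDs' raw bytes, because the low-order timestamp bits are stored first.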

Page 16: IOT meetup presentation

TimeSeries Table Example

CREATE TABLE readings (
    sensor_id text,         -- Partition Key
    event_time TimeUUID,
    temperature decimal,
    PRIMARY KEY (sensor_id, event_time)
);

Page 17: IOT meetup presentation

TimeSeries Table Example

CREATE TABLE readings (
    sensor_id text,
    event_time TimeUUID,    -- Clustering Key
    temperature decimal,
    PRIMARY KEY (sensor_id, event_time)
);

Page 18: IOT meetup presentation

Physical Data Model

Station #1
    12:05.15:  15.9 C
    12:05.16:  15.9 C
    12:05.17:  16.0 C
    12:05.18:  16.1 C
    12:05.19:  16.0 C
    ...

Station #2
    12:05.15:  22.0 C
    12:05.20:  22.1 C
    12:05.25:  27.9 C
    12:05.30:  27.7 C
    12:05.35:  30.2 C
    ...
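To make the layout concrete, here is a small in-memory model of it: each partition key maps to rows kept sorted by clustering key, so a time-range read is a contiguous slice within one partition. This is only an illustration of the idea, not how Cassandra stores data on disk; the `Partition` class is hypothetical and the timestamps are the slide's values treated as plain strings.

```python
import bisect
from collections import defaultdict


class Partition:
    """One partition: rows kept sorted by clustering key."""

    def __init__(self):
        self.keys = []   # sorted clustering keys
        self.rows = {}   # clustering key -> column values

    def insert(self, clustering_key, columns):
        if clustering_key not in self.rows:
            bisect.insort(self.keys, clustering_key)
        # Writing to an existing key merges columns (an upsert).
        self.rows.setdefault(clustering_key, {}).update(columns)

    def slice(self, start, end):
        """Rows with start < key <= end, in clustering order."""
        lo = bisect.bisect_right(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]


table = defaultdict(Partition)
for time, temp in [("12:05.17", 16.0), ("12:05.15", 15.9), ("12:05.16", 15.9)]:
    table["Station #1"].insert(time, {"temperature": temp})
table["Station #2"].insert("12:05.15", {"temperature": 22.0})
```

A call such as `table["Station #1"].slice("12:05.15", "12:05.17")` touches one partition only and returns rows already in time order, which is the access pattern the range query on `event_time` relies on.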

Page 19: IOT meetup presentation

Advanced Data Model Topics

❏ Consider bucketing time in the Partition Key if the sample rate is high
    ❏ Primary Key ((device_id, year, week), event_time)

❏ If per-event granularity is not needed, events can be batched or rolled up
    ❏ Primary Key (device_id, event_minute)

❏ If we batch events
    ❏ JSON blob of sensor readings within the minute
    ❏ Can’t update sensor readings without read-before-write
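Both ideas can be sketched on the client side. The first helper derives a bucketed partition key of the form (device_id, year, week); using the ISO calendar for year and week is an assumption of this example, since the slide does not say which convention is meant. The second groups readings into one row per (device_id, event_minute) with a JSON blob as the value. Function and field names are hypothetical.

```python
import json
from datetime import datetime


def bucketed_partition_key(device_id, event_time):
    """Partition key (device_id, year, week), using ISO year and ISO week."""
    iso_year, iso_week, _ = event_time.isocalendar()
    return (device_id, iso_year, iso_week)


def rollup_by_minute(device_id, readings):
    """Group (event_time, temperature) pairs into one JSON blob per minute.

    Returns a dict keyed by (device_id, event_minute).
    """
    rows = {}
    for event_time, temperature in readings:
        minute = event_time.replace(second=0, microsecond=0)
        rows.setdefault((device_id, minute), []).append(
            {"t": event_time.isoformat(), "temperature": temperature}
        )
    return {key: json.dumps(samples) for key, samples in rows.items()}
```

The blob is opaque to the database, so changing one reading inside it means reading the row, decoding and editing the JSON, and writing the whole blob back, which is the read-before-write cost noted above. The ISO convention also has an edge worth knowing: 1 January 2021 falls in ISO week 53 of 2020, so the bucket's year can differ from the calendar year.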