MCTI-DSD Scalable Systems in Distributed Environments (v4d)
Diseño de Sistemas Distribuidos (Distributed Systems Design)
Máster en Ciencia y Tecnología Informática (Master's in Computer Science and Technology)
Academic year 2016-2017
Alejandro Calderón Mateos & Óscar Pérez Alonso
Scalable systems in distributed environments
2
Contents
– Evolution
http://image.slidesharecdn.com/baronbigdatadeveloper-120418090010-phpapp02/95/the-big-data-developer-pavlobaron-1-728.jpg?cb=1338966838
Big
3
Distributed systems: the beginnings…
http://www.nethistory.info/History%20of%20the%20Internet/origins.html#apps
4
Distributed systems: the beginnings…
http://www.nethistory.info/History%20of%20the%20Internet/origins.html#apps
• Centralized system
• Better maintenance
• Resource sharing
• Cost sharing
5
Distributed systems: the beginnings…
http://www.nethistory.info/History%20of%20the%20Internet/origins.html#apps
6
Distributed systems: the beginnings…
http://www.nethistory.info/History%20of%20the%20Internet/origins.html#apps
7
Distributed systems: the beginnings…
http://microchip.wdfiles.com/local--files/tcpip:tcp-ip-five-layer-model/TCPIP_5_layer_overview.JPG
8
Distributed systems: the beginnings…
http://www.nethistory.info/History%20of%20the%20Internet/origins.html#apps
9
Distributed systems: the beginnings…
http://edc.tversu.ru/elib/inf/0091/tcpip/figs/tcp2_0303.gif
10
Distributed systems: the beginnings…
https://en.wikipedia.org/wiki/IBM_Personal_Computer
11
Distributed systems: the beginnings…
http://www.thefoa.org/tech/ref/appln/OLAN.html
12
Distributed systems: the beginnings…
http://cdn.arstechnica.net/2011/09/23/hdd-capacity-scale-4e7ce6c-intro.png
13
Distributed systems: the beginnings…
https://bitcointalk.org/index.php?topic=430357.0
14
Distributed systems: the beginnings…
http://img.frbiz.com/news/145317_s/Mobile_communication_base_station_radio_equipment_greet_with_the_explosive_growth_mobile_communication_base_station_radio_equipment_3G_communication_industry.jpg
15
Distributed systems: the beginnings…
https://www.dcsorg.com/images/image_centralized_management.jpg
16
Contents
– Evolution
– Learning
http://image.slidesharecdn.com/baronbigdatadeveloper-120418090010-phpapp02/95/the-big-data-developer-pavlobaron-1-728.jpg?cb=1338966838
Big
17
Transitions are not easy…
https://www.dcsorg.com/images/image_centralized_management.jpg
18
Transitions are not easy… Expectations
19
Transitions are not easy… Realities
20
Transitions are not easy… Challenges…
http://cdn.comsol.com/wordpress/2014/02/Speeding-up-communications-distributed-memory-computing-copy.jpg
The network is homogeneous
21
Transitions are not easy… Challenges…
http://thenewstack.io/helix-a-linkedin-framework-for-distributed-systems-development/
22
Transitions are not easy… Areas of work…
http://cdn.comsol.com/wordpress/2014/02/Speeding-up-communications-distributed-memory-computing-copy.jpg
23
Transitions are not easy… Areas of work…
IS473 at http://www.xpowerpoint.com/ppt/system-model-distributed-systems.html
24
Transitions are not easy… Lessons learned…
http://www.pixempire.com/images/preview/orchestra-director-with-stick-icon.jpg
Extra software/hardware in distributed systems: making the existence of multiple elements transparent
25
Transitions are not easy… Lessons learned…
http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/
26
Transitions are not easy… Lessons learned: typical elements…
http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/
27
Transitions are not easy… Lessons learned: typical architecture…
http://books.cs.luc.edu/distributedsystems/issues.html
28
Contents
– Evolution
– Learning
– (re)Trend
http://image.slidesharecdn.com/baronbigdatadeveloper-120418090010-phpapp02/95/the-big-data-developer-pavlobaron-1-728.jpg?cb=1338966838
Big
29
Distributed systems: larger scale… Every home connected
https://bitcointalk.org/index.php?topic=430357.0
30
Distributed systems: larger scale… Every person connected
http://www.videcom.com/Portals/0/iphone1.png & http://images.techtimes.com/data/images/full/127370/events-for-gmail.jpg?w=600
a computer in your hands
31
Distributed systems: larger scale…
https://www.dcsorg.com/images/image_centralized_management.jpg http://www.hsi.es/images/cc.png
Cloud
Cloud computing: ubiquitous, on-demand access to a shared pool of computing resources
32
Larger distributed systems… Every thing connected
http://www.mercurynews.com/business/ci_24836116/internet-things-seen-bonanza-bay-area-businesses
33
Larger distributed systems… Internet of Things (IoT)
http://tarrysingh.com/2014/07/fog-computing-happens-when-big-data-analytics-marries-internet-of-things/
34
IoT example: PI0 + HTML5 Server-Sent Events
https://www.raspberrypi.org/magpi/wp-content/uploads/2015/11/Pi_Zero-Pics-Spread.jpg
https://redbear.cc/content/blog/pi-zero-iot-hat/
35
IoT example: PI0 + HTML5 Server-Sent Events
https://developer.mozilla.org/es/docs/Server-sent_events/utilizando_server_sent_events_sse
Client
• The web page gets updates from a server
  – without asking for them: the updates arrive automatically
• Examples: Twitter updates, stock prices, news feeds, etc.
Server
data: some text

data: another message
data: with two lines
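The server-side wire format above can be made concrete with a small parser. This is a minimal Python sketch, not part of the original slides: `parse_sse` is a hypothetical helper name, and it assumes only `data:` fields matter (real streams may also carry `event:`, `id:` and `retry:` fields, which are ignored here).

```python
def parse_sse(stream_text):
    """Split a text/event-stream body into a list of message payloads.

    Consecutive `data:` lines belong to one event and are joined with a
    newline; a blank line terminates the event.
    """
    messages = []
    data_lines = []
    for line in stream_text.splitlines():
        if line == "":
            # Blank line: dispatch the event collected so far, if any.
            if data_lines:
                messages.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        # Other fields (event:, id:, retry:) and comments are ignored here.
    return messages


body = "data: some text\n\ndata: another message\ndata: with two lines\n\n"
print(parse_sse(body))
```

The two `data:` lines of the second event come back as a single two-line message, which is what a browser's `EventSource` delivers in `e.data`.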
36
IoT example: PI0 + HTML5 Server-Sent Events
http://www.w3schools.com/html/html5_serversentevents.asp
Client (demo.html)
• The web page gets updates from a server
  – without asking for them: the updates arrive automatically
• Examples: Twitter updates, stock prices, news feeds, etc.
<span id="result"></span>
<script>
  // Subscribe to the event stream served by demo.php.
  var source = new EventSource("demo.php");
  source.onmessage = function (e) {
    var ref = document.getElementById("result");
    ref.innerHTML += e.data + "<br>";
  };
  source.onerror = function (e) { alert("Problems..."); };
</script>
Server (demo.php)
<?php
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
while (1) {
  // Double quotes are needed so that "\n\n" ends the event with real newlines.
  echo 'data: {"time": "' . date('r') . '"}' . "\n\n";
  flush();
  sleep(1);
}
?>
37
IoT example: PI0 + HTML5 Server-Sent Events
https://github.com/acaldero/moon
Client (demo.html)
<!DOCTYPE html>
<html>
<head> <meta charset="utf-8" /> </head>
<body>
<script>
  var s = new EventSource('http://<ip>:8080');
  s.onmessage = function(e) {
    document.body.innerHTML += e.data + '<br>';
  };
</script>
</body>
</html>
Server (demo.sh)
#!/bin/sh
# Minimal HTTP response headers, followed by an endless event stream.
echo "HTTP/1.1 200 OK"
echo "Access-Control-Allow-Origin: *"
echo "Content-Type: text/event-stream"
echo "Cache-Control: no-cache"
echo ""
while true; do
  T=$(date +%H:%M:%S)
  # printf handles "\n" portably; plain echo does not in every shell.
  printf 'data: {"timestamp": "%s"}\n\n' "$T"
  sleep 1
done
(demo-nc.sh)
# Traditional netcat syntax; OpenBSD netcat uses: nc -l 8080
./demo.sh | nc -l -p 8080
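For comparison, the same clock stream can be sketched in Python using only the standard library. This is an illustrative analogue of the shell demo, not the authors' code: `format_event`, `ClockHandler` and `serve` are hypothetical names, and port 8080 is simply carried over from the slide.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def format_event(payload):
    """Encode a JSON-serializable payload as one SSE `data:` event."""
    return f"data: {json.dumps(payload)}\n\n".encode("utf-8")


class ClockHandler(BaseHTTPRequestHandler):
    """Answers any GET with an endless stream of timestamp events."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        try:
            while True:
                now = time.strftime("%H:%M:%S")
                self.wfile.write(format_event({"timestamp": now}))
                self.wfile.flush()
                time.sleep(1)
        except (BrokenPipeError, ConnectionResetError):
            pass  # the browser closed the connection


def serve(port=8080):
    """Block forever, serving one event per second to each client."""
    ThreadingHTTPServer(("", port), ClockHandler).serve_forever()
```

Calling `serve(8080)` and pointing the `EventSource` in demo.html at that port should behave like the `nc` pipeline, with the difference that several browsers can connect at once.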
38
Larger distributed systems… Internet of Things (IoT)
http://knowledgeblob.com/technology/a-brief-about-internet-of-things-iot/
39
Contents
– Evolution
– Learning
– (re)Trend
– (re)Challenges
http://image.slidesharecdn.com/baronbigdatadeveloper-120418090010-phpapp02/95/the-big-data-developer-pavlobaron-1-728.jpg?cb=1338966838
Big
40
Ever-growing dependence…
https://www.dcsorg.com/images/image_centralized_management.jpg http://www.hsi.es/images/cc.png
Cloud
Cloud computing: ubiquitous, on-demand access to a shared pool of computing resources
41
Ever-growing dependence… criticality that is hard to reverse
http://kburnett.net/business-case/technology/mobility-2/
42
New challenges…
http://tarrysingh.com/2014/07/fog-computing-happens-when-big-data-analytics-marries-internet-of-things/
43
Big data is coming…
http://online.wsj.com/news/articles/SB10001424127887324178904578340071261396666
44
Big data is coming…
http://online.wsj.com/news/articles/SB10001424127887324178904578340071261396666
45
Big Data is frequently used for…
http://www.ndm.net/emcstore/storage/greenplum
http://fcw.com/~/media/GIG/FCWNow/Topics/Big%20Data/Big_data.png
• Text mining
• Index building
• Graph creation and analysis
• Pattern recognition
• Predictive models
• …
46
Big Data & Big Processing…
https://www.linkedin.com/pulse/open-source-network-has-organized-workshop-big-data-unlock-sharmin
47
Big Distributed Systems are coming…
IS473 at http://www.xpowerpoint.com/ppt/system-model-distributed-systems.html
48
Contents
– Evolution
– Learning
– (re)Trend
– (re)Challenges
http://image.slidesharecdn.com/baronbigdatadeveloper-120418090010-phpapp02/95/the-big-data-developer-pavlobaron-1-728.jpg?cb=1338966838 http://gruposda.es/wp-content/uploads/sertecni.jpg
Big
– Motivation
– Introduction
– Hands-on
49
Review of options… (1/3)
http://www.scsi4me.com/images/1296909213_60.jpg
DAS: Direct Attached Storage
[Diagram: Computer running App → File System → Disk, Disk]
• HW:
  – Computer with storage peripherals.
• SW:
  – Local file system or local database manager.
• Pros/cons:
  – Speed, simplicity
  – No sharing, limited growth
50
Review of options… (2/3)
http://www.jollynas.com/nas/img/wJollyNAS.jpg
NAS: Network Attached Storage
[Diagram: Client running App, connected over NFS/CIFS to a NAS with File System → Disk, Disk]
• HW:
  – Computer with communication peripherals.
  – NAS: computer with communication peripherals and storage peripherals.
• SW:
  – Remote file system or remote database manager.
• Pros/cons:
  – Sharing (RO), growth
  – The network limits speed and growth
51
Review of options… (3/3)
http://andysworld.org.uk/blog/wp-content/uploads/2010/05/ra4100.jpg
SAN: Storage Area Network
[Diagram: App. server running App over a Shared F.S., connected over iSCSI to RAID/JBOD with Disk, Disk]
• HW:
  – Computer with communication peripherals.
  – SAN: storage peripherals with a communication device.
• SW:
  – Shared database manager or shared file system.
• Pros/cons:
  – Better growth and speed
  – Complexity, cost
52
Combining options…
[Diagram: DAS (Direct Attached Storage) + NAS (Network Attached Storage, over NFS/CIFS) + SAN (Storage Area Network, over iSCSI)]
53
Combining options…
[Diagram: NAS + SAN. App → Shared F.S. (GPFS, Lustre, …) → iSCSI → Disks, SSD, AFA]
54
Combining options…
[Diagram: App → Shared F.S. (GPFS, Lustre, …) exposing Objs., Files and Blocks → iSCSI → Disks, SSD, AFA]
55
Combining options…
[Diagram: App → Shared F.S. (GPFS, Lustre, …) → iSCSI, with three access interfaces]
• Objs.: S3, IL7, …
• Files: NFS, CEPH, …
• Blocks: iSCSI, FCoE, …
56
Combining options…
[Diagram: App accessing Objs. (S3, IL7, …), Files (NFS, CEPH, …) and Blocks (iSCSI, FCoE, …) on Disks, SSD, AFA]
Large storage with capacity for “growth”
57
Problems…
[Diagram: App → Shared F.S. (GPFS, Lustre, …) → iSCSI → Disks, SSD, AFA]
• Large amount of traffic due to data movement.
58
Storage options…
[Diagram: App → Shared F.S. (GPFS, Lustre, …) → iSCSI → Disks, SSD, AFA, holding blocks A0 to A5]
• Large amount of traffic due to data movement.
• It can be reduced by moving part of the computation closer to the storage.
59
Storage options…
[Diagram: App → Shared F.S. (GPFS, Lustre, …) → iSCSI → Disks, SSD, AFA, holding blocks A0 to A5]
• Large amount of traffic due to data movement.
• It can be reduced by moving part of the computation closer to the storage.
60
Storage options…
[Diagram: App → Shared F.S. (GPFS, Lustre, …) → iSCSI → storage holding blocks A0 to A5]
• Large amount of traffic due to data movement.
• It can be reduced by moving part of the computation closer to the storage.
http://rsoltd.com/wp-content/uploads/2014/04/expensive.jpg
61
Find/build the right tool…
http://storageio.com/images/SIO_ToolBox.png
62
The Google File System
• Google presents MapReduce and the Google File System (GFS)
– MapReduce is a proposal for applying the same function to partitions of the data (map) and then obtaining the final result by processing the partial results (reduce)
– GFS is a proposal for storing petabytes of data on many commodity machines, dealing with failures, distribution, etc.
http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf
[Diagram: Disks, SSD, AFA]
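The map/reduce idea described above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, under the assumption that all data fits in memory; `map_reduce`, `count_words` and `add_up` are hypothetical names, and a real framework would run each partition on a different machine.

```python
from collections import defaultdict


def map_reduce(partitions, map_fn, reduce_fn):
    """Run map_fn on every partition, group by key, then reduce_fn per key."""
    groups = defaultdict(list)
    for partition in partitions:           # on a cluster: one task per partition
        for key, value in map_fn(partition):
            groups[key].append(value)      # the "shuffle": group values by key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_words(partition):
    """Map step: emit (word, 1) for every word in the partition."""
    for word in partition.split():
        yield word.lower(), 1


def add_up(_key, values):
    """Reduce step: combine the partial results for one key."""
    return sum(values)


partitions = ["En un lugar", "de la Mancha", "un lugar de cuyo nombre"]
print(map_reduce(partitions, count_words, add_up))
```

The same `map_reduce` driver works for any pair of functions with these shapes, which is the point of the model: the application supplies map and reduce, and the framework handles partitioning and grouping.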
63
From this option…
[Diagram: App → Shared F.S. (GPFS, Lustre, …) → iSCSI → shared storage holding blocks A0 to A5]
64
…to this option
[Diagram: App plus eight nodes on a Network, each with GFS, a local Disk and one block (A0, A1, A3, A4, A5, A6, A7, A8)]
65
…to this option
[Diagram: App plus eight nodes on a Network, each with GFS, a local Disk and one block (A0, A1, A3, A4, A5, A6, A7, A8)]
https://upload.wikimedia.org/wikipedia/commons/thumb/7/7c/Yin_and_Yang.svg/1024px-Yin_and_Yang.svg.png
Data
Computation
66
Hadoop: a brief history
• Doug Cutting, working at Yahoo! and inspired by these technologies, starts developing Hadoop.
– The project uses Java and implements the ideas behind GFS and MapReduce.
– The logo is based on the yellow elephant that was his son's favorite toy.
– Doug Cutting later moved to Cloudera.
http://www.enriquedans.com/2011/11/hadoop-el-elefante-omnipresente.html
67
Hadoop: a brief history
• It is currently an open-source project under the Apache license.
– Its license has made it easier for a significant number of companies to adopt it.
• Support from IBM, Oracle, EMC, etc. has accelerated its adoption and its performance improvements.
– Widely used in big data projects.
– JPMorgan Chase: “We’re hiring, and we’re paying 10% more than the other guys.”
http://www.informationweek.com/software/information-management/its-next-hot-job-hadoop-guru/d/d-id/1101209?
68
http://hadoop.apache.org
• Apache Hadoop is an open-source software project for reliable, scalable distributed computing.
• It offers a framework for the distributed processing of large data sets across clusters of computers using simple programming models.
– Designed to scale from a single server to thousands of machines (scale-up), each offering both computation and storage.
– The framework is designed to detect and handle failures at the application level. This makes it possible to offer highly available services on top of a cluster of computers (even when individual machines fail).
http://hadoop.apache.org/
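One way to see why replicated storage tolerates machine failures is a toy placement model. The Python sketch below is an assumption-laden illustration, not HDFS's actual placement policy: `place_replicas` and `readable_blocks` are hypothetical helpers, and replicas are simply assigned round-robin.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    return [block for block, holders in placement.items()
            if any(node not in failed_nodes for node in holders)]


nodes = ["n0", "n1", "n2", "n3"]
blocks = ["A0", "A1", "A2", "A3", "A4", "A5"]
placement = place_replicas(blocks, nodes, replication=2)

# One failed node: every block still has a live replica.
print(readable_blocks(placement, failed_nodes={"n1"}))
# Two failed nodes: blocks stored only on those two nodes become unreadable.
print(readable_blocks(placement, failed_nodes={"n1", "n2"}))
```

With two replicas per block, any single node can fail without data loss; raising the replication factor trades disk space for tolerance of more simultaneous failures.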
69
B.D.A. Workflow example…
http://www.infoivy.com/2013/12/5-steps-for-big-data-application.html
70
Big Opportunities…
http://online.wsj.com/news/articles/SB10001424127887324178904578340071261396666
71
Big Opportunities…
http://www.indeed.com/jobtrends?q=Big-data&relative=1
72
Contents
– (re)Evolution
– (re)Learning
http://www.siliconweek.es/wp-content/uploads/2013/08/BigData-datos-guardar-almacenamiento-fichero-archivo.jpg
http://datameer2.datameer.com/blog/wp-content/uploads/2012/06/Hadoop-Ecosystem-Infographic-21.png
Big
– Motivation
– Introduction
– Hands-on
73
Architecture
http://www.monitis.com/blog/2013/12/19/big-data-and-hadoop-whats-it-all-about/
74
Architecture
http://www.sachinpbuzz.com/2014/01/big-data-overview-of-apache-hadoop.html
75
Deployment
http://blog.csdn.net/suifeng3051/article/details/17288047
76
Deployment
http://blog.csdn.net/suifeng3051/article/details/17288047
77
Deployment
http://blog.csdn.net/suifeng3051/article/details/17288047
hdfs-site.xml: dfs.replication
[Figure labels: ports :9000 and :50010]
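The slide only names the `dfs.replication` property of `hdfs-site.xml`. As a hedged illustration of what such a file contains, the Python sketch below generates the `<configuration>`/`<property>` structure that Hadoop configuration files use; `hdfs_site` is a hypothetical helper, and the value 1 is an assumption suited to a single-node setup (Hadoop's default replication factor is 3).

```python
import xml.etree.ElementTree as ET


def hdfs_site(properties):
    """Build a Hadoop-style <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumption: a single-node cluster, so one copy of each block is enough.
print(hdfs_site({"dfs.replication": 1}))
```

The printed string can be saved as `hdfs-site.xml` in the Hadoop configuration directory; the exact location depends on the installation.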
78
Deployment
http://blog.csdn.net/suifeng3051/article/details/17288047
79
Deployment
http://blog.csdn.net/suifeng3051/article/details/17288047
80
Deployment
http://blog.csdn.net/suifeng3051/article/details/17288047
81
Deployment
http://blog.csdn.net/suifeng3051/article/details/17288047
82
Contents
– (re)Evolution
– (re)Learning
http://www.siliconweek.es/wp-content/uploads/2013/08/BigData-datos-guardar-almacenamiento-fichero-archivo.jpg
http://datameer2.datameer.com/blog/wp-content/uploads/2012/06/Hadoop-Ecosystem-Infographic-21.png
Big
– Motivation
– Introduction
– Hands-on
83
Hadoop: single node (Prerequisites | Installation | Basic usage)
[State diagram: initial → (hdfs namenode -format) → inactive → (start-all.sh) → active → (stop-all.sh) → inactive; operations while active: <hdfs>, <mapReduce>, <monitor>]
84
Hadoop: single node (Prerequisites | Installation | Basic usage)
[State diagram: initial → (hdfs namenode -format) → inactive → (start-all.sh) → active → (stop-all.sh) → inactive; operations while active: <hdfs>, <mapReduce>, <monitor>]
85
Hadoop: single node (Prerequisites | Installation | Basic usage)
hduser@h1:~$ hdfs namenode -format
14/09/25 23:02:59 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = h1/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.5.2
…
14/09/27 23:07:07 INFO blockmanagement.BlockManager: encryptDataTransfer = false
14/09/27 23:07:07 INFO namenode.FSNamesystem: fsOwner = hduser (auth:SIMPLE)
…
14/09/25 23:03:04 INFO util.ExitUtil: Exiting with status 0
14/09/25 23:03:04 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at h1/127.0.1.1
************************************************************/
86
Hadoop: single node (Prerequisites | Installation | Basic usage)
[State diagram: initial → (hdfs namenode -format) → inactive → (start-all.sh) → active → (stop-all.sh) → inactive; operations while active: <hdfs>, <mapReduce>, <monitor>]
87
Hadoop: single node (Prerequisites | Installation | Basic usage)
hduser@h1:~$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
14/09/28 13:31:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-h1.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-h1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-h1.out
14/09/28 13:32:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-h1.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-h1.out
88
Hadoop: single node (Prerequisites | Installation | Basic usage)
hduser@h1:~$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
14/09/28 13:33:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
14/09/28 13:33:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
89
Hadoop: single node (Prerequisites | Installation | Basic usage)
[State diagram: initial → (hdfs namenode -format) → inactive → (start-all.sh) → active → (stop-all.sh) → inactive; operations while active: <hdfs>, <mapReduce>, <monitor>]
• NameNode: http://localhost:50070/
Hadoop: single node (Prerequisites | Installation | Basic usage)
90
• SecondaryNameNode: http://localhost:50090/
Hadoop: single node (Prerequisites | Installation | Basic usage)
91
• DataNode: http://localhost:50075/
Hadoop: single node (Prerequisites | Installation | Basic usage)
93
Hadoop: single node (Prerequisites | Installation | Basic usage)
[State diagram: initial → (hdfs namenode -format) → inactive → (start-all.sh) → active → (stop-all.sh) → inactive; operations while active: <hdfs>, <mapReduce>, <monitor>]
94
Hadoop: single node (Prerequisites | Installation | Basic usage)
: create a directory
hduser@h1:~$ hadoop fs -mkdir -p /user/hduser
: copy a file from the local file system to Hadoop
hduser@h1:~$ echo "hdfs test" > hdfsTest.txt
hduser@h1:~$ hadoop fs -copyFromLocal hdfsTest.txt hdfsTest.txt
: list the contents of a directory
hduser@h1:~$ hadoop fs -ls
: show the contents of a file
hduser@h1:~$ hadoop fs -cat /user/hduser/hdfsTest.txt
: copy a file from Hadoop to the local file system
hduser@h1:~$ hadoop fs -copyToLocal /user/hduser/hdfsTest.txt hdfsTest2.txt
: delete a file
hduser@h1:~$ hadoop fs -rm hdfsTest.txt
95
Hadoop: single node (Prerequisites | Installation | Basic usage)
hduser@h1:~$ wget http://www.gutenberg.org/files/2000/old/2donq10.txt
…
2014-10-04 12:53:30 (1,10 MB/s) - ‘2donq10.txt’ saved [2143292/2143292]
hduser@h1:~$ dos2unix -n 2donq10.txt dq.txt
dos2unix: converting file 2donq10.txt to file dq.txt in Unix format ...
hduser@h1:~$ hadoop fs -copyFromLocal -f dq.txt /user/hduser/dq.txt
hduser@h1:~$ hadoop fs -ls /user/hduser
…
Found 1 items
-rw-r--r-- 3 hduser supergroup 2143292 2014-10-04 13:09 /user/hduser/dq.txt
96
Hadoop: single node (Prerequisites | Installation | Basic usage)
[State diagram: initial → (hdfs namenode -format) → inactive → (start-all.sh) → active → (stop-all.sh) → inactive; operations while active: <hdfs>, <mapReduce>, <monitor>]
97
Hadoop: single node (Prerequisites | Installation | Basic usage)
Native: Java
Wrapped: Perl, Python, …
98
Hadoop: single node (Prerequisites | Installation | Basic usage)
hduser@h1:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 2 5
Number of Maps = 2
Samples per Map = 5
…
Job Finished in 11.536 seconds
Estimated value of Pi is 3.60000000000000000000
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Running_MapReduce_Job.php
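The coarse result above (3.6 from 2 maps with 5 samples each) is what a sampling-based estimate looks like with only 10 points. The Python sketch below illustrates the idea with plain pseudo-random sampling; it is not the code of the bundled Hadoop example, which may use a different sampling scheme, and `estimate_pi` and its `seed` parameter are hypothetical.

```python
import random


def estimate_pi(num_maps, samples_per_map, seed=0):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_maps):              # each "map" task draws its own samples
        for _ in range(samples_per_map):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:       # the point falls inside the quarter circle
                inside += 1
    total = num_maps * samples_per_map     # the "reduce" step adds up the counts
    return 4.0 * inside / total


print(estimate_pi(2, 5))          # 10 samples: the result can only be a multiple of 0.4
print(estimate_pi(10, 100_000))   # many more samples: much closer to pi
```

With 10 samples the estimate is 4·k/10 for some integer k, so 3.6 corresponds to 9 of the 10 points landing inside; increasing the two arguments of the `pi` job is the way to get a better value.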
99
Hadoop: single node (Prerequisites | Installation | Basic usage)
package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
http://wiki.apache.org/hadoop/WordCount
1
100
Hadoop: single node (Prerequisites | Installation | Basic usage)
public class WordCount {

  // Mapper: emits (word, 1) for every token of every input line.
  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
      }
    }
  }
http://wiki.apache.org/hadoop/WordCount
2
101
Hadoop: single node (Prerequisites | Installation | Basic usage)
  // Reducer: adds up all the counts emitted for the same word.
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
http://wiki.apache.org/hadoop/WordCount
3
102
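Hadoop: single node (Prerequisites | Installation | Basic usage)
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated `new Job(conf, name)` constructor.
    Job job = Job.getInstance(conf, "wordcount");
    // Lets Hadoop find the jar that contains these classes when run on a cluster.
    job.setJarByClass(WordCount.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    // args[0]: input path, args[1]: output path (must not exist yet).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Exit with a non-zero status if the job fails.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
} // class WordCount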
http://wiki.apache.org/hadoop/WordCount
4
103
Hadoop: single node (Prerequisites | Installation | Basic usage)
hduser@h1:/usr/local/hadoop$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/hduser/dq.txt /user/hduser/counterj
14/10/04 16:33:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/10/04 16:33:37 INFO input.FileInputFormat: Total input paths to process : 1
14/10/04 16:33:37 INFO mapreduce.JobSubmitter: number of splits:1
14/10/04 16:33:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local835374884_0001
…
File Input Format Counters
Bytes Read=2106143
File Output Format Counters
Bytes Written=454722
hduser@h1:/usr/local/hadoop$ hadoop fs -cat /user/hduser/counterj/* | sort -n -k 2 -r|head -5
…
que 19429
de 17986
y 15887
la 10199
a 9502
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
104
Hadoop: single node (Prerequisites | Installation | Basic usage)
Native: Java
Wrapped: Perl, Python, …
105
Hadoop Streaming API
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
mapper.sh
awk '{i=1; while (i<=NF) {gsub(/[\.,;]/,"",$i); print tolower($i)" "1; i++;}}'
STDIN:
En un lugar…
STDOUT:
en 1
un 1
lugar 1
…
106
Hadoop Streaming API
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
reducer.sh
sed 's/ 1$//g' |uniq -c| awk '{print $2" "$1}'|sed 's/^$//g'
STDIN:
en 1
un 1
lugar 1
…
STDOUT:
en 10
un 20
lugar 3
…
107
Hadoop: single node (Prerequisites | Installation | Basic usage)
hduser@h1:~$ echo "uno uno dos dos tres" | ./mapper.sh | more
…
hduser@h1:~$ echo "uno uno dos dos tres" | ./mapper.sh | sort | more
…
hduser@h1:~$ echo "uno uno dos dos tres" | ./mapper.sh | sort | ./reducer.sh | more
…
Hadoop: single node
Prerequisites | Installation | Basic usage
hduser@h1:/usr/local/hadoop$ hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
    -file ./mapper.sh -mapper ./mapper.sh \
    -file ./reducer.sh -reducer ./reducer.sh \
    -input /user/hduser/ -output /user/hduser/counter
packageJobJar: [./mapper.sh, ./reducer.sh] [] /tmp/streamjob724842872862965882.jar tmpDir=null
14/10/04 15:48:02 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
…
File Input Format Counters
Bytes Read=2106143
File Output Format Counters
Bytes Written=320124
14/10/04 15:48:46 INFO streaming.StreamJob: Output directory: /user/hduser/counter
hduser@h1:/usr/local/hadoop$ hadoop fs -cat /user/hduser/counter/part-00000|sort -n -k 2 -r|head -5
…
que 20545
de 18154
y 18053
la 10338
a 9779
http://hadoop.apache.org/docs/r1.1.2/streaming.html#Hadoop+Streaming
Contents
– (r)Evolution
– (re)Learning
– Motivation
– Introduction
– Hands-on
– Alternatives
– Ecosystem
http://www.siliconweek.es/wp-content/uploads/2013/08/BigData-datos-guardar-almacenamiento-fichero-archivo.jpg
http://datameer2.datameer.com/blog/wp-content/uploads/2012/06/Hadoop-Ecosystem-Infographic-21.png
Alternatives
https://www.simple-talk.com/cloud/data-science/analyze-big-data-with-apache-hadoop-on-windows-azure-preview-service-update-3/
Google Dataflow
• Data pipelines that help ingest, transform, and analyze data.
http://techcrunch.com/2014/06/25/google-launches-cloud-dataflow-a-managed-data-processing-service/
Google Dataflow
https://4.bp.blogspot.com/-RlLeDymI_mU/Vp-1cb3AxNI/AAAAAAAACSQ/5TphliHJA4w/s1600/dataflow%2BASF.png
Google Dataflow
http://www.slideshare.net/GoogleCloudPlatformJP/google-cloud-dataflow-bqsushi
Google Dataflow
https://cloud.google.com/solutions/processing-logs-at-scale-using-dataflow
Apache Spark
• Speed
  – Avoids writing intermediate results to disk whenever possible
• Ease of use
  – Transformations and operations
• Versatile
  – Scala, Python, Java
  – Combines SQL, streaming, analytics…
  – Works with HDFS, HBase, S3, …
https://spark.apache.org/
http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/
Hadoop vs. Spark
Aspect | Hadoop | Spark
Data | Intermediate results on disk | Keep as much as possible in memory
Data fault tolerance | HDFS (2+1) | RDD
Processing | Map/reduce jobs | DAG
Programming | Java + others through pipes | Scala, Python, Java, etc.
Scenario | Slow batch jobs | Batch + real-time + iterative + interactive
• Hadoop: (more) disk and network I/O than Spark
• Spark: (more) RAM than Hadoop
• General-purpose framework
• An alternative to Hadoop rather than a replacement for it
https://geekytheory.com/apache-spark-que-es-y-como-funciona/
http://hadoopgeek.com/wp-content/uploads/2013/12/hdfswrite.png
http://www.slideshare.net/cfregly/spark-streaming-40659876
RDD (Resilient Distributed Dataset) vs. HDFS (HDFS + 3 copies)
RDD:
• Data is read as RDDs
  – Immutable, recomputable, in memory
• DataFrame = RDD[row] + schema
HDFS:
• Data is read in blocks
  – 64 MB by default (128 MB from Hadoop 2.x onward)
• Used like files
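The "immutable, recomputable, in memory" properties can be illustrated with a toy class. This is only a sketch under strong assumptions: ToyRdd, of, map, evict and collect are hypothetical names chosen for the example and are not Spark's API, and a single in-memory list stands in for distributed partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical toy class; it only imitates the idea behind an RDD.
final class ToyRdd<T> {
    private final Supplier<List<T>> lineage; // recipe to rebuild the data
    private List<T> cached;                  // in-memory copy, may be lost

    private ToyRdd(Supplier<List<T>> lineage) {
        this.lineage = lineage;
    }

    // Wraps a source collection (standing in for data read from storage).
    static <T> ToyRdd<T> of(List<T> source) {
        List<T> copy = Collections.unmodifiableList(new ArrayList<>(source));
        return new ToyRdd<T>(() -> copy);
    }

    // Returns a NEW dataset; the receiver is never modified (immutability).
    <R> ToyRdd<R> map(Function<T, R> f) {
        return new ToyRdd<R>(() -> {
            List<R> out = new ArrayList<>();
            for (T item : collect()) {
                out.add(f.apply(item));
            }
            return Collections.unmodifiableList(out);
        });
    }

    // Simulates losing the in-memory copy, for example after a node failure.
    void evict() {
        cached = null;
    }

    // Returns the data, rebuilding it from the lineage when it is not cached.
    List<T> collect() {
        if (cached == null) {
            cached = lineage.get();
        }
        return cached;
    }

    public static void main(String[] args) {
        ToyRdd<String> words = ToyRdd.of(Arrays.asList("en", "un", "lugar"));
        ToyRdd<Integer> lengths = words.map(String::length);
        lengths.evict();                       // nothing is lost for good
        System.out.println(lengths.collect()); // recomputed from the lineage
    }
}
```

The design point is that fault tolerance comes from remembering how the data was derived rather than from keeping replicated copies, which is the contrast the slide draws with HDFS.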
https://geekytheory.com/apache-spark-que-es-y-como-funciona/
DAG (Directed Acyclic Graph) vs. M-R (Map-Reduce)
DAG:
• DAG with N stages
• Intermediate data kept in memory
M-R:
• DAG with 2 stages
  – Map + Reduce
• Intermediate data written to disk
[Figure: M-R data flow, annotated ~MPI_Scatter, ~MPI_all-to-all, ~MPI_Gather, next to a DAG with numbered nodes]
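A DAG of N stages with intermediate data kept in memory can be sketched as a tiny scheduler that runs each stage after its parents and memoizes the results. The class ToyDag, its methods and the four example stages are all hypothetical, the stages exchange plain integers for brevity, and the sketch assumes the graph really is acyclic (there is no cycle detection).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy scheduler; it assumes the stage graph has no cycles.
final class ToyDag {
    private static final class Stage {
        final List<String> parents;
        final Function<List<Integer>, Integer> body;

        Stage(List<String> parents, Function<List<Integer>, Integer> body) {
            this.parents = parents;
            this.body = body;
        }
    }

    private final Map<String, Stage> stages = new LinkedHashMap<>();
    // Intermediate results stay in memory, keyed by stage name.
    private final Map<String, Integer> results = new LinkedHashMap<>();

    void addStage(String name, List<String> parents,
                  Function<List<Integer>, Integer> body) {
        stages.put(name, new Stage(parents, body));
    }

    // Runs a stage after its parents, reusing results already computed.
    int run(String name) {
        Integer known = results.get(name);
        if (known != null) {
            return known;
        }
        Stage stage = stages.get(name);
        if (stage == null) {
            throw new IllegalArgumentException("unknown stage: " + name);
        }
        List<Integer> inputs = new ArrayList<>();
        for (String parent : stage.parents) {
            inputs.add(run(parent));
        }
        int value = stage.body.apply(inputs);
        results.put(name, value);
        return value;
    }

    // Stage names in the order in which they were actually executed.
    List<String> executionOrder() {
        return new ArrayList<>(results.keySet());
    }

    // Example graph: load -> {double, square} -> sum.
    static ToyDag example() {
        ToyDag dag = new ToyDag();
        dag.addStage("load", Collections.<String>emptyList(), in -> 10);
        dag.addStage("double", Arrays.asList("load"), in -> in.get(0) * 2);
        dag.addStage("square", Arrays.asList("load"), in -> in.get(0) * in.get(0));
        dag.addStage("sum", Arrays.asList("double", "square"), in -> in.get(0) + in.get(1));
        return dag;
    }

    public static void main(String[] args) {
        ToyDag dag = example();
        System.out.println(dag.run("sum") + " via " + dag.executionOrder());
    }
}
```

In this picture a Map-Reduce job corresponds to a graph with exactly two stages whose intermediate result is written to disk instead of being held in the results map.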
https://geekytheory.com/apache-spark-que-es-y-como-funciona/
RDD + DAG vs. HDFS + M-R
RDD + DAG:
• Two kinds of operations:
  – Transformations (RDD[] -> RDD'[])
  – Actions (RDD[] -> value)
HDFS + M-R:
• Operations on blocks:
  – Read in M (map)
  – Written in R (reduce)
[Figure: M-R data flow over blocks (B1, B2, C1, C2), annotated ~MPI_Scatter, ~MPI_all-to-all, ~MPI_Gather, next to a DAG of transformations (T) and actions (A)]
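The split between transformations (which only describe new data) and actions (which produce a value and trigger the work) has a rough analogue in Java streams, where intermediate operations are lazy and terminal operations run the pipeline. The sketch below is an analogy only, not Spark code: LazyPipelineDemo, lengths and the calls counter are hypothetical names, and a Java stream, unlike an RDD, can be consumed only once.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Analogy only: Java streams stand in for Spark's lazy transformations.
class LazyPipelineDemo {
    // Counts how many times the mapping function has actually run.
    static final AtomicInteger calls = new AtomicInteger();

    // "Transformation": describes the new data, but computes nothing yet.
    static Stream<Integer> lengths(List<String> words) {
        return words.stream().map(w -> {
            calls.incrementAndGet();
            return w.length();
        });
    }

    public static void main(String[] args) {
        Stream<Integer> pending = lengths(Arrays.asList("en", "un", "lugar"));
        System.out.println("calls before the action: " + calls.get());

        // "Action": asking for a value forces the pipeline to execute.
        int total = pending.mapToInt(Integer::intValue).sum();
        System.out.println("total = " + total + ", calls after the action: " + calls.get());
    }
}
```

Deferring the work until an action is requested is what lets an engine see the whole chain of transformations before deciding how to execute it.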
http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/
https://www.youtube.com/watch?v=x8xXXqvhZq8
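import org.apache.spark.{SparkConf, SparkContext}

// Local mode with 2 threads; Spark also requires an application name
// (the name "WordCount" is an assumption, any name works).
val conf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
val sc = new SparkContext(conf)
val lines = sc.textFile(path, 2)            // path: input file, defined elsewhere; 2 = minimum partitions
val words = lines.flatMap(_.split(" "))     // transformation: lines -> words
val pairs = words.map(word => (word, 1))    // transformation: word -> (word, 1)
val wordCounts = pairs.reduceByKey(_ + _)   // transformation: sum the 1s per word
val localValues = wordCounts.take(100)      // action: bring up to 100 pairs to the driver
localValues.foreach(r => println(r))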
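What the flatMap, map and reduceByKey steps compute can be checked locally without a cluster. The following plain-Java sketch reproduces the same result on an in-memory list; LocalWordCount and count are hypothetical names, it uses java.util.stream rather than Spark, and the sorted TreeMap is chosen only to make the printed order stable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Local, single-process imitation of the Spark word count; not Spark code.
class LocalWordCount {
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // like lines.flatMap(_.split(" ")): one stream of words
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toMap(
                        word -> word,     // key of the (word, 1) pair
                        word -> 1,        // value of the (word, 1) pair
                        Integer::sum,     // like reduceByKey(_ + _)
                        TreeMap::new));   // sorted keys, for stable printing
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("uno uno dos", "dos tres");
        count(lines).forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

The merge function passed to toMap plays the role of the associative function given to reduceByKey: it is applied whenever two values share a key.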
Ecosystem
http://ambuj4bigdata.blogspot.com.es/2014_05_01_archive.html
Storage
http://www.nextree.co.kr/p2865/
Processing
http://www.nextree.co.kr/p2865/
CDH (Cloudera Hadoop Distribution)
http://www.theregister.co.uk/2012/06/05/cloudera_cdh4_hadoop_stack/
AEMR (Amazon Elastic MapReduce)
http://aws.amazon.com/es/elasticmapreduce/
Ecosystem
https://practicalanalytics.files.wordpress.com/2011/11/hadoopevolution.png
Ecosystem (functionality)
http://mattturck.com/wp-content/uploads/2016/03/Big-Data-Landscape-2016-v18-FINAL.png + http://dfkoz.com/big-data-landscape/
Ecosystem (functionality)
https://s-media-cache-ak0.pinimg.com/originals/0d/4f/0d/0d4f0d8aad9d144c52c696c603b8a27c.jpg
Ecosystem (structure)
http://www.nextree.co.kr/p2865/
Storage
http://www.nextree.co.kr/p2865/
Processing
http://www.nextree.co.kr/p2865/
Pig Latin:
purchases = LOAD '/data/purchases' AS (id, client_id, product_id, price);
bigpurchases = FILTER purchases BY price > 1000;
…
Hive:
SELECT * FROM purchases WHERE price > 1000 ORDER BY client_id;
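Both snippets express the same query declaratively: keep the purchases whose price exceeds 1000 and order them by client. For comparison, a minimal in-memory Java sketch of that query is shown below. PurchaseQuery, Purchase and bigPurchases are hypothetical names, the field types are assumptions, and the sample data in main is made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory imitation of the query; neither Pig nor Hive is involved.
class PurchaseQuery {
    // Assumed record layout: (id, client_id, product_id, price).
    static final class Purchase {
        final int id;
        final int clientId;
        final int productId;
        final double price;

        Purchase(int id, int clientId, int productId, double price) {
            this.id = id;
            this.clientId = clientId;
            this.productId = productId;
            this.price = price;
        }
    }

    // WHERE price > 1000 ORDER BY client_id
    static List<Purchase> bigPurchases(List<Purchase> purchases) {
        return purchases.stream()
                .filter(p -> p.price > 1000)
                .sorted(Comparator.comparingInt((Purchase p) -> p.clientId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Purchase> sample = Arrays.asList(   // made-up sample rows
                new Purchase(1, 7, 100, 1500.0),
                new Purchase(2, 3, 101, 999.0),
                new Purchase(3, 2, 102, 2000.0));
        for (Purchase p : bigPurchases(sample)) {
            System.out.println(p.id + " " + p.clientId + " " + p.price);
        }
    }
}
```

The appeal of Pig and Hive is that the same few lines are compiled into distributed jobs, so the query does not have to fit in the memory of one process as it does here.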
Storage: integration
http://www.nextree.co.kr/p2865/
Processing: orchestration
http://www.nextree.co.kr/p2865/
E.g.: Flume-Hive-Oozie
http://www.datadansandler.com/2013/03/big-data-and-hadoop-assets-available-on.html
Bibliography: tutorials
• Official website:
  – http://hadoop.apache.org/
• Introduction to how Hadoop works:
  – http://blog.csdn.net/suifeng3051/article/details/17288047
• Tutorial on installing and using Hadoop:
  – http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
  – http://www.bogotobogo.com/Hadoop/BigData_hadoop_Running_MapReduce_Job.php
Bibliography: book
• Hadoop: The Definitive Guide, 3rd Edition:
  – http://shop.oreilly.com/product/0636920021773.do
  – https://github.com/tomwhite/hadoop-book/
Bibliography: bachelor's theses (TFG)
• Extracción de información social desde Twitter y análisis mediante Hadoop.
  – Author: Cristian Caballero Montiel
  – Advisors: Daniel Higuero Alonso-Mardones and Juan Manuel Tirado Martín
  – http://e-archivo.uc3m.es/handle/10016/16784
• Adaptation, Deployment and Evaluation of a Railway Simulator in Cloud Environments
  – Author: Silvina Caíno Lores
  – Advisor: Alberto García Fernández
Acknowledgements
• Last but not least, we thank the staff of the Computer Science Department Laboratory for all their comments and suggestions on this presentation.
Distributed Systems Design
Master in Computer Science and Technology
Academic year 2016-2017
Alejandro Calderón Mateos & Óscar Pérez Alonso
[email protected] [email protected]
Scalable systems in distributed environments