Lists 2 and Corpora
Day 12 - 2/6/15
SPAN 4350
Cultura computacional en español (Computational Culture in Spanish)
Harry Howard
Tulane University
Course organization
6-feb-2015 CultCompES, Prof. Howard, Tulane University
http://www.tulane.edu/~howard/Span4350/
http://www.tulane.edu/~howard/CompCultES/
1. cultcomp
2. python
3. cadenas (strings)
4. unicode
5. exreg (regular expressions)
6. listas (lists)
7. nltk_archives
Review
A list is a sequence of objects, separated by commas and enclosed in square brackets.
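A minimal illustration of the definition above (the variable names `name` and `mixed` are hypothetical):

```python
# A list literal: comma-separated objects between square brackets
name = ['Miguel', 'de', 'Cervantes']
print(type(name))    # <class 'list'>
print(len(name))     # 3
print(name[0])       # Miguel

# The objects need not share a type; a list can even contain another list
mixed = ['Quijote', 1605, ['Sancho', 'Panza']]
print(mixed[2][0])   # Sancho
```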
6-feb-2015
3
CultCompES, Prof. Howard, Tulane University
Most of the operations used with strings also work with lists: len(), indexing, slicing, in, + and *, count(), and index()
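A short side-by-side sketch of the shared operations, using a hypothetical string `s` and list `L`. It also shows that string-only methods such as `upper()` are not defined for lists:

```python
s = 'Cervantes'
L = ['Miguel', 'de', 'Cervantes']

print(len(s), len(L))                      # 9 3
print(s[0], L[0])                          # C Miguel
print(s[-3:], L[-2:])                      # tes ['de', 'Cervantes']
print('v' in s, 'de' in L)                 # True True
print(s + '!', L + ['Saavedra'])           # concatenation with +
print(s * 2, L * 2)                        # repetition with *
print(s.count('e'), L.count('de'))         # 2 1
print(s.index('v'), L.index('Cervantes'))  # 3 2

# String-only methods are not available on lists
print(hasattr(L, 'upper'))                 # False
```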
6-feb-2015CultCompES, Prof. Howard, Tulane University
4
split() vs. join()
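A minimal sketch of the contrast: `split()` is called on a string and returns a list, while `join()` is called on the separator string and takes a list of strings. The sentence is only an example:

```python
sentence = 'En un lugar de la Mancha'

words = sentence.split()      # string -> list, splitting on whitespace
print(words)                  # ['En', 'un', 'lugar', 'de', 'la', 'Mancha']

rejoined = ' '.join(words)    # list -> string, with ' ' as the separator
print(rejoined == sentence)   # True

print('-'.join(words))        # En-un-lugar-de-la-Mancha
```

Note that `join()` only accepts strings: `' '.join([1, 2])` raises a `TypeError`.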
6-feb-2015CultCompES, Prof. Howard, Tulane University
5
§6. Lists
6-feb-2015
6
CultCompES, Prof. Howard, Tulane University
6.2.3. Which methods work with a list but not with a string?
>>> L1 = ['Miguel', 'Cervantes']
>>> L1.append('de Saavedra')
>>> del L1[2]
>>> L1.insert(1, 'de Saavedra')
>>> L1.remove('de Saavedra')
>>> L1[0] = 'Miguelito'
>>> L1.append('de Saavedra')
>>> L1.pop(2)
>>> L1.reverse()
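The same nine steps, annotated with the state of `L1` after each one, followed by a check that a string rejects the equivalent changes. The variable names `removed` and `s` are hypothetical additions:

```python
L1 = ['Miguel', 'Cervantes']
L1.append('de Saavedra')     # ['Miguel', 'Cervantes', 'de Saavedra']
del L1[2]                    # ['Miguel', 'Cervantes']
L1.insert(1, 'de Saavedra')  # ['Miguel', 'de Saavedra', 'Cervantes']
L1.remove('de Saavedra')     # ['Miguel', 'Cervantes']
L1[0] = 'Miguelito'          # ['Miguelito', 'Cervantes']
L1.append('de Saavedra')     # ['Miguelito', 'Cervantes', 'de Saavedra']
removed = L1.pop(2)          # returns 'de Saavedra'; L1 is ['Miguelito', 'Cervantes']
L1.reverse()                 # ['Cervantes', 'Miguelito']
print(L1, removed)           # ['Cervantes', 'Miguelito'] de Saavedra

# Strings are immutable: item assignment fails and the mutating methods do not exist
s = 'Miguel'
try:
    s[0] = 'm'
except TypeError as err:
    print('TypeError:', err)
print([m for m in ('append', 'insert', 'remove', 'pop', 'reverse') if hasattr(str, m)])  # []
```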
6-feb-2015CultCompES, Prof. Howard, Tulane University
7
http://www.tulane.edu/~howard/CompCultES/nltk_archives.html
7. NLTK and Internet corpora
6-feb-2015
8
CultCompES, Prof. Howard, Tulane University
Set up the global working directory
Create a folder named "pyScripts" in your Documents folder.
In Spyder > Preferences > Global Working directory:
"At start-up, the global working directory is: … the following directory" (navigate to "pyScripts" and click it)
"Files are opened from: … the global working directory"
"Files are created in: … the global working directory"
6-feb-2015CultCompES, Prof. Howard, Tulane University
9
7.1.1. How to navigate folders with os
>>> import os
>>> os.getcwd()
'/Users/harryhow/Documents/pyScripts'
# if the path is not to your pyScripts folder, then change it:
>>> os.chdir('/Users/{your_user_name}/Documents/pyScripts/')
>>> os.getcwd()
'/Users/{your_user_name}/Documents/pyScripts'
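A small sketch that wraps the check-and-change steps above in a helper. The function name `go_to_folder` is hypothetical; it uses `os.path.expanduser` so that `~` stands for your home folder instead of typing the full user path:

```python
import os

def go_to_folder(path):
    """Change the working directory to `path` if it exists and return the new cwd."""
    path = os.path.expanduser(path)   # '~' becomes your home folder
    if not os.path.isdir(path):
        raise FileNotFoundError('No such folder: ' + path)
    os.chdir(path)
    return os.getcwd()

# Hypothetical usage, assuming pyScripts was created in your Documents folder:
# go_to_folder('~/Documents/pyScripts')
```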
13-Oct-2014 NLP, Prof. Howard, Tulane University
7.1.2. Project Gutenberg
http://www.gutenberg.org/ebooks/28554
13-Oct-2014NLP, Prof. Howard, Tulane University
11
7.1.3. How to download a file with urllib and convert it to a string with read() and decode()
>>> from urllib.request import urlopen # Python 3; in Python 2 this was: from urllib import urlopen
>>> url = 'http://www.gutenberg.org/cache/epub/28554/pg28554.txt'
>>> download = urlopen(url)
>>> downloadString = download.read().decode('utf8') # read() returns bytes; decode() makes a str
>>> type(downloadString)
>>> len(downloadString) # 35739?
>>> downloadString[:50]
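Why the `decode()` step matters: before decoding, `len()` counts bytes; after decoding, it counts characters. For Spanish text the two numbers differ, because characters such as ñ take two bytes in UTF-8. A minimal sketch with a made-up string standing in for the downloaded data:

```python
raw = 'España, año'.encode('utf8')   # bytes, like what download.read() returns
text = raw.decode('utf8')            # str of characters

print(type(raw), len(raw))     # <class 'bytes'> 13
print(type(text), len(text))   # <class 'str'> 11
```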
13-Oct-2014NLP, Prof. Howard, Tulane University
12
7.1.4. How to save a file to your drive with open(), write(), and close()
# it is assumed that Python is looking at your pyScripts folder
>>> tempFile = open('Cervantes.txt', 'w', encoding='utf8')
>>> tempFile.write(downloadString)
>>> tempFile.close()
# import os if you haven't already done so
>>> os.listdir('.')
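An alternative sketch of the same open/write/close sequence using a `with` block, which closes the file automatically even if an error occurs. The helper name `save_text` is hypothetical:

```python
def save_text(path, text, encoding='utf8'):
    """Write the string `text` to `path`; the with-block closes the file afterwards."""
    with open(path, 'w', encoding=encoding) as f:
        f.write(text)

# Hypothetical usage, assuming downloadString from 7.1.3:
# save_text('Cervantes.txt', downloadString)
```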
13-Oct-2014NLP, Prof. Howard, Tulane University
13
7.1.5. How to look at a file with open() and read()
>>> tempFile = open('Cervantes.txt', 'r', encoding='utf8')
>>> text = tempFile.read()
>>> tempFile.close()
>>> type(text)
>>> len(text)
>>> text[:50]
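A sketch of reading a file back with an explicit encoding. Without `encoding=`, `open()` uses a platform-dependent default (for example cp1252 on many Windows systems), which can garble characters such as ¿ and é. The helper `load_text` and the file name `prueba.txt` are hypothetical:

```python
def load_text(path, encoding='utf8'):
    """Return the contents of `path` as a str; the file is closed when the with-block ends."""
    with open(path, encoding=encoding) as f:
        return f.read()

# Round trip: write a short Spanish string and read it back unchanged
with open('prueba.txt', 'w', encoding='utf8') as f:
    f.write('¿Qué tal, Sancho?')
print(load_text('prueba.txt'))   # ¿Qué tal, Sancho?
```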
13-Oct-2014NLP, Prof. Howard, Tulane University
14
7.1.6. How to slice away what you don’t need
>>> text.index('*** START OF THIS PROJECT GUTENBERG EBOOK')
499
>>> lineIndex = text.index('*** START OF THIS PROJECT GUTENBERG EBOOK')
>>> startIndex = text.index('\n', lineIndex) # the first newline after the marker ends the header line
>>> text[:startIndex] # the header to be discarded
>>> text.index('*** END OF THIS PROJECT GUTENBERG EBOOK')
>>> endIndex = text.index('*** END OF THIS PROJECT GUTENBERG EBOOK')
>>> story = text[startIndex:endIndex] # everything between the two markers
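The same slicing can be done with a regular expression. This is a sketch under one assumption: that some Gutenberg files word the markers as "THE PROJECT" rather than "THIS PROJECT", so the patterns accept either. Unlike the steps above, the slice here starts just after the newline, so the story does not begin with `'\n'`. The names `START`, `END`, and `strip_gutenberg` are hypothetical:

```python
import re

# Assumed marker wording: "THIS PROJECT" or "THE PROJECT"
START = re.compile(r'\*\*\* START OF (THIS|THE) PROJECT GUTENBERG EBOOK[^\n]*\n')
END = re.compile(r'\*\*\* END OF (THIS|THE) PROJECT GUTENBERG EBOOK')

def strip_gutenberg(text):
    """Return the text between the end of the START line and the END marker."""
    start = START.search(text)
    end = END.search(text)
    if start is None or end is None:
        raise ValueError('Gutenberg markers not found')
    return text[start.end():end.start()]

sample = ('Header\n'
          '*** START OF THE PROJECT GUTENBERG EBOOK X ***\n'
          'En un lugar de la Mancha\n'
          '*** END OF THE PROJECT GUTENBERG EBOOK X ***\n'
          'License')
print(repr(strip_gutenberg(sample)))   # 'En un lugar de la Mancha\n'
```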
13-Oct-2014NLP, Prof. Howard, Tulane University
15
Now save the story as “Cervantes.txt” (this overwrites the untrimmed download)
# it is assumed that Python is looking at your pyScripts folder
>>> tempFile = open('Cervantes.txt', 'w', encoding='utf8')
>>> tempFile.write(story)
>>> tempFile.close()
13-Oct-2014NLP, Prof. Howard, Tulane University
16
Next time
P3 on unicode and lists
§7. Corpora
6-feb-2015CultCompES, Prof. Howard, Tulane University
17