python script to extract all email addresses from bulk text

Revision: 20902

at December 1, 2009 11:16 by backlashblues

Updated Code

# This script opens a file containing email addresses, extracts
# those addresses, and writes them to a new file.

import os
import re

# vars for filenames
filename = 'emaillist.txt'
newfilename = 'emaillist-rev.txt'

# read file
if os.path.exists(filename):
	data = open(filename,'r')
	bulkemails = data.read()
	data.close()  # release the file handle once the text is in memory
else:
	print "File not found."
	raise SystemExit

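# regex for email-like strings: local part, a single "@", then a dotted domain
# (the dot is escaped so it cannot match arbitrary characters)
r = re.compile(r'([\w.+-]+@[\w-]+(?:\.[\w-]+)+)')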
results = r.findall(bulkemails)    

# one address per line
emails = "".join(str(x) + "\n" for x in results)

# function to write file
def writefile():
	f = open(newfilename, 'w')
	f.write(emails)
	f.close()
	print "File written."

# function to handle overwrite question
def overwrite_ok():
	response = raw_input("Are you sure you want to overwrite "+str(newfilename)+"? Yes or No\n")
	if response == "Yes":
		writefile()
	elif response == "No":
		print "Aborted."
	else:
		print "Please enter Yes or No."
		overwrite_ok()

# write/overwrite
if os.path.exists(newfilename):
	overwrite_ok()		
else: 
	writefile()
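
The extraction step can also be tried on its own, without any file handling. The following is a minimal sketch, not the author's code: it is written for Python 3 (the script above is Python 2), and the names EMAIL_RE, extract_emails and sample are hypothetical. The pattern is an assumption as well: it accepts exactly one "@" and a domain made of dot-separated labels, which is simpler than what real-world address rules allow.

```python
import re

# Assumed pattern: local part, one "@", then at least two dot-separated labels.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def extract_emails(text):
    """Return every email-like substring of text, in order of appearance."""
    return EMAIL_RE.findall(text)


# Hypothetical sample text for illustration only.
sample = ("Contact alice@example.com or bob.smith@mail.example.org. "
          "Not an address: foo@bar")
found = extract_emails(sample)
print("\n".join(found))
```

Because the pattern has no capture group, findall returns the full matched text for each hit. A string such as foo@bar is skipped because the domain has no dot.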

Revision: 20901

at November 30, 2009 17:10 by backlashblues

Initial Code

>>> regex = re.compile("<([A-Za-z0-9-]+@+[A-Za-z0-9-]+.+[A-Za-z0-9-])>")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x3e85f3ec409b8988>
>>> regex.match(string)
None

# List the groups found
>>> r.groups()
(u'[email protected]',)

# List the named dictionary objects found
>>> r.groupdict()
{}

# Run findall
>>> regex.findall(string)

Initial URL

Initial Description

testing tool: http://www.pythonregex.com/

Initial Title

python script to extract all email addresses from bulk text

Initial Tags

regex, email, python, script

Initial Language

Python
