One of the most important tools you'll need when writing tests with twill is a parser to scrape the contents of the pages you'll be introspecting in your tests. To verify functionality from the end user's perspective, you'll likely need to assert that specific pieces of content are being displayed. One way to do this is to scan the page content for your acceptance criteria using regular expressions. A better way is to break the page apart into its components and look at the contents of a single cell or div. This is where BeautifulSoup fits the bill.
BeautifulSoup is an HTML/XML parser that turns your page content into an objectified hierarchy. Trust us, you don't want to end up maintaining a nasty collection of regular expressions to get to the content you need. Instead, access content by its location and identity. If you're not maintaining the templates in the application, this is a great time to collaborate with those who are so that your job is even easier.
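To see why the regex approach ages badly, here is a minimal sketch (standard library only, with made-up markup) of the kind of pattern you'd end up maintaining: it matches today's template exactly, and a harmless template change breaks it.

```python
import re

html = '<p id="two">This is paragraph <b>two</b>.</p>'

# A regex that happens to match today's markup...
pattern = re.compile(r'<p id="two">This is paragraph <b>(\w+)</b>\.</p>')
match = pattern.search(html)
print(match.group(1))  # matches: 'two'

# ...but an added attribute in the template silently breaks it.
changed = '<p id="two" class="intro">This is paragraph <b>two</b>.</p>'
print(pattern.search(changed))  # no match: None
```

A parser that locates the element by name and attribute keeps working through that kind of change, which is exactly the point of the sections that follow.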
To start using BeautifulSoup, you can use easy_install to grab it from PyPI (Mac users: don't forget to use sudo):
kevins-macbook:~ kevin$ sudo easy_install BeautifulSoup
Searching for BeautifulSoup
Reading http://pypi.python.org/simple/BeautifulSoup/
Reading http://www.crummy.com/software/BeautifulSoup/
Reading http://www.crummy.com/software/BeautifulSoup/download/
Best match: BeautifulSoup 3.1.0.1
Downloading http://www.crummy.com/software/BeautifulSoup/download/BeautifulSoup-3.1.0.1.tar.gz
Processing BeautifulSoup-3.1.0.1.tar.gz
Running BeautifulSoup-3.1.0.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-JYLoUB/BeautifulSoup-3.1.0.1/egg-dist-tmp-mbTUsh
Adding BeautifulSoup 3.1.0.1 to easy-install.pth file
Installing testall.sh script to /usr/local/bin
Installing to3.sh script to /usr/local/bin
Installed /Library/Python/2.5/site-packages/BeautifulSoup-3.1.0.1-py2.5.egg
Processing dependencies for BeautifulSoup
Finished processing dependencies for BeautifulSoup
kevins-macbook:~ kevin$
This will install it in your Python distribution's site-packages directory.
Consider the following HTML:
<html>
<head>
<title>My Awesome Page</title>
</head>
<body>
<div>
<div class="b">
<p>This is paragraph <b>one</b>.</p>
<p id="two" align="meh">This is paragraph <b>two</b>.</p>
</div>
</div>
</body>
</html>
From the BeautifulSoup docs: "A Beautiful Soup constructor takes an HTML (or XML) document in the form of a string (or an open file-like object). It parses the document and creates a corresponding data structure in memory." Let's go ahead and give that a shot. Assuming that we've created a string object named html with the contents of our page:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)

# The 'contents' attr is a list of elements within the page
print soup.contents[0].name
Out[1]: html

# Since we're at the document level, we only see the html element.
# Everything else is contained within:
doc = soup.contents[0]
head = doc.contents[0]
body = doc.contents[1]
Now that we have an idea of how this object is structured, let's walk the parse tree!
print body.contents
Out[2]: [<div><div class="b"><p>This is paragraph <b>one</b>.</p><p id="two" align="meh">This is paragraph <b>two</b>.</p></div></div>]

print body.findChildren()   # This will find all child elements, even nested ones!
Out[3]: [<div><div class="b"><p>This is paragraph <b>one</b>.</p><p id="two" align="meh">This is paragraph <b>two</b>.</p></div></div>, <div class="b"><p>This is paragraph <b>one</b>.</p><p id="two" align="meh">This is paragraph <b>two</b>.</p></div>, <p>This is paragraph <b>one</b>.</p>, <b>one</b>, <p id="two" align="meh">This is paragraph <b>two</b>.</p>, <b>two</b>]

print soup.body.div.div.p.b   # Walk the nested tree by element name
Out[4]: <b>one</b>

print soup.body.div.div.p.b.string   # The string attr on an element will give you the unicode representation
Out[5]: u'one'
Good stuff, but should we be expected to modify our tests every time a trigger-happy production person throws in an extra div? Of course not! That's why the search mechanism is so important. Witness the sheer power of the findAll method:
soup.findAll('p')   # Pass in an element name in a string
Out[6]: [<p>This is paragraph <b>one</b>.</p>, <p id="two" align="meh">This is paragraph <b>two</b>.</p>]

soup.findAll(['p', 'b'])   # Or, pass in a list of element names to find
Out[7]: [<p>This is paragraph <b>one</b>.</p>, <b>one</b>, <p id="two" align="meh">This is paragraph <b>two</b>.</p>, <b>two</b>]

soup.findAll(align="meh")   # Want to search by attribute?
Out[8]: [<p id="two" align="meh">This is paragraph <b>two</b>.</p>]

soup.find("div", { "class" : "b" })   # For attribute searches that are reserved words
Out[9]: <div class="b"><p>This is paragraph <b>one</b>.</p><p id="two" align="meh">This is paragraph <b>two</b>.</p></div>

soup.find("p", { "id" : "two" })   # Or just because...
Out[10]: <p id="two" align="meh">This is paragraph <b>two</b>.</p>
Now you can see how important it is to communicate with the production team: together you can add identifying attributes (ids and classes) to elements to make them easier to locate. Also, combining a parser like this with a base Page class makes it that much more powerful!
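As a sketch of that combination (the class and method names here are invented for illustration, not part of BeautifulSoup), a base Page class can hold the parsed soup while subclasses expose content by meaning rather than by markup path:

```python
class Page(object):
    """Base page object: wraps a parsed BeautifulSoup tree so tests ask
    for content by meaning instead of hard-coding markup paths."""

    def __init__(self, soup):
        self.soup = soup

    def text_of(self, name, attrs=None):
        # find() returns the first matching element, or None if absent;
        # .string only resolves for elements with a single text child.
        element = self.soup.find(name, attrs or {})
        return element.string if element is not None else None


class HomePage(Page):
    """One subclass per page keeps each test readable."""

    def headline(self):
        # Hypothetical: assumes the template marks the headline with id="headline"
        return self.text_of("h1", {"id": "headline"})
```

A test then reads `HomePage(BeautifulSoup(html)).headline()` instead of repeating the same find() call everywhere, so a template change only touches one method.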