We have cut up a picture into narrow vertical strips. These image strips are located somewhere on the Internet, each with its own URL (Uniform Resource Locator). Your mission (should you decide to accept it) is to find the URLs and download all image strips to recreate the original image.

The URLs of the strips are hidden inside Apache log files (the open-source Apache web server is the most widely used web server on the Internet). Here are a few lines from such a log file, one request per line (this really is what Apache log files look like):

212.77.55.128 - - [06/Feb/2010:00:24:35 -0700] "GET spoj/problems/puzzle/eukuk-ruwpbl.jpg HTTP/1.1" 404 493104 "-" "googlebot-mscrawl-moma (enterprise; bar-XYZ; foo123@google.com,foo123@google.com,foo123@google.com,foo123@google.com)"
128.34.153.184 - - [06/Feb/2010:00:26:12 -0700] "GET uypys/bclkvy/tgtquo-rds-zmcps HTTP/1.0" 301 858367 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6; Google-TR-5.1.706.29690-en) Gecko/20070725 Firefox/2.0.0.6"
163.196.162.170 - - [06/Feb/2010:00:33:25 -0700] "GET spoj/problems/puzzle/jwbsb-zcckx.jpg HTTP/1.0" 200 453671 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1"
174.76.115.140 - - [06/Feb/2010:00:36:17 -0700] "GET spoj/problems/puzzle/omkzx-idk.jpg HTTP/1.1" 200 855063 "-" "googlebot-mscrawl-moma (enterprise; bar-XYZ; foo123@google.com,foo123@google.com,foo123@google.com,foo123@google.com)"
5.254.189.77 - - [06/Feb/2010:00:46:03 -0700] "GET bgcic/cdx/nqegmo/pzdvszjanh-jhqbj-fxqbfw-wzb HTTP/1.0" 200 320132 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.1; Google-TR-3) Gecko/20060111 Firefox/1.5.0.1"

The four numbers at the start of each line form the IP address of the requesting browser. The most interesting part is the quoted "GET path HTTP" request, which shows the path of a web request received by the server. The path itself never contains spaces and is separated from GET and HTTP by spaces.
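Because the path never contains spaces, a regular expression that captures the run of non-space characters between `GET ` and ` HTTP` is enough to pull it out of a line. The following is a minimal sketch, assuming every relevant request uses the GET method as in the sample lines; the helper name `extract_path` is hypothetical.

```python
import re

# Capture the non-space run between "GET " and " HTTP".
# Assumption: the relevant requests all use the GET method, as in the sample above.
REQUEST = re.compile(r'"GET (\S+) HTTP')

def extract_path(line):
    """Hypothetical helper: return the requested path of one log line, or None."""
    match = REQUEST.search(line)
    return match.group(1) if match else None

line = ('163.196.162.170 - - [06/Feb/2010:00:33:25 -0700] '
        '"GET spoj/problems/puzzle/jwbsb-zcckx.jpg HTTP/1.0" 200 453671 "-" '
        '"Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.1) '
        'Gecko/20060111 Firefox/1.5.0.1"')
print(extract_path(line))  # spoj/problems/puzzle/jwbsb-zcckx.jpg
```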

In the log file, the image strips have a path that contains the term puzzle. URLs that occur multiple times must be deduplicated. The file name of each image strip contains one or more hyphens (-). The image strips must be sorted alphabetically by the part of the file name after the last hyphen, without distinguishing between uppercase and lowercase letters. For example, the sort key of the URL on the first line of the sample log above is ruwpbl.jpg.
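The deduplication and sort key described above fit in a few lines. This is a sketch; the names `sort_key` and `order_strips` are illustrative. It uses `dict.fromkeys` rather than `set` to drop duplicates, because a dict keeps first-occurrence order, so strips that happen to share a sort key end up in a deterministic order.

```python
def sort_key(url):
    """Part of the URL after the last hyphen, lower-cased so that case is ignored."""
    return url.rsplit('-', 1)[-1].lower()

def order_strips(urls):
    """Drop duplicate URLs (keeping first occurrences), then sort by sort_key."""
    return sorted(dict.fromkeys(urls), key=sort_key)

print(sort_key('spoj/problems/puzzle/eukuk-ruwpbl.jpg'))  # ruwpbl.jpg
print(order_strips([
    'spoj/problems/puzzle/jwbsb-zcckx.jpg',
    'spoj/problems/puzzle/omkzx-idk.jpg',
    'spoj/problems/puzzle/jwbsb-zcckx.jpg',
]))
# ['spoj/problems/puzzle/omkzx-idk.jpg', 'spoj/problems/puzzle/jwbsb-zcckx.jpg']
```

Splitting the whole path on its last hyphen is safe here because the statement guarantees that the file name itself contains a hyphen, so the last hyphen always lies inside the file name.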

Assignment

Write a function puzzlepieces that takes the location of a log file as its argument. The image strips are hidden in that log. The function must return a list with the URLs of the image strips, deduplicated and sorted by the key described above.
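One possible shape for puzzlepieces is sketched below; it is not the reference solution. The line-processing logic sits in a hypothetical helper `strip_paths` that accepts any iterable of lines, so it can be tried without a log file; puzzlepieces only opens the file and passes it on. It follows the statement literally: a path qualifies if the term puzzle appears anywhere in it, and no filtering on the HTTP status code is done.

```python
import re

# Capture the path between "GET " and " HTTP"; paths never contain spaces.
REQUEST = re.compile(r'"GET (\S+) HTTP')

def strip_paths(lines):
    """Hypothetical helper: unique strip paths from log lines, sorted by key."""
    paths = []
    for line in lines:
        match = REQUEST.search(line)
        if match and 'puzzle' in match.group(1):
            paths.append(match.group(1))
    # dict.fromkeys removes duplicates while keeping first-occurrence order.
    unique = list(dict.fromkeys(paths))
    # Sort case-insensitively on the part after the last hyphen.
    return sorted(unique, key=lambda path: path.rsplit('-', 1)[-1].lower())

def puzzlepieces(filename):
    """Return the deduplicated, sorted URLs of the image strips in the log file."""
    with open(filename) as log:
        # A file object yields its lines one at a time.
        return strip_paths(log)
```

Applied to the five sample lines shown earlier, only the three paths containing puzzle survive, ordered by their keys idk.jpg, ruwpbl.jpg and zcckx.jpg, which matches their relative order in the example session.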

Note: As a check, the feedback also shows a graphical representation of the image strips your function extracted from the log.

Example

In the following interactive session we assume that the file angkorwat.log is in the current directory.

>>> puzzlepieces('angkorwat.log')
['spoj/problems/puzzle/iif-fpuo.jpg', 'spoj/problems/puzzle/xgfl-ftdzjc.jpg', 'spoj/problems/puzzle/ewktni-hhrir.jpg', 'spoj/problems/puzzle/omkzx-idk.jpg', 'spoj/problems/puzzle/lwpmuz-lmwp.jpg', 'spoj/problems/puzzle/srktu-nygk.jpg', 'spoj/problems/puzzle/xpyxjs-oocpc.jpg', 'spoj/problems/puzzle/eukuk-ruwpbl.jpg', 'spoj/problems/puzzle/mwyz-zalyj.jpg', 'spoj/problems/puzzle/jwbsb-zcckx.jpg']

If we download these images from the web server http://users.ugent.be/~pdawyndt and place them side by side in this order, the strips form the image below:
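To fetch the strips, each relative path has to be turned into an absolute URL. The sketch below assumes the paths are relative to http://users.ugent.be/~pdawyndt/ (an assumption based on the sentence above; whether these URLs still resolve is not guaranteed), and the names `strip_urls` and `download` are hypothetical. The trailing slash on the base matters: `urllib.parse.urljoin` replaces the last segment of a base path that lacks one, so `~pdawyndt` would be dropped.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: strip paths are relative to this base. Keep the trailing slash,
# otherwise urljoin would replace "~pdawyndt" instead of appending to it.
BASE = 'http://users.ugent.be/~pdawyndt/'

def strip_urls(paths, base=BASE):
    """Turn relative strip paths into absolute URLs, preserving their order."""
    return [urljoin(base, path) for path in paths]

def download(urls):
    """Fetch each strip and return the raw image bytes in order (needs network access)."""
    images = []
    for url in urls:
        with urlopen(url) as response:
            images.append(response.read())
    return images

print(strip_urls(['spoj/problems/puzzle/iif-fpuo.jpg']))
# ['http://users.ugent.be/~pdawyndt/spoj/problems/puzzle/iif-fpuo.jpg']
```

Actually pasting the downloaded strips next to each other requires an imaging library such as Pillow, which is outside the Python standard library and is left out of this sketch.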