Thursday, March 22, 2012

Crawling the Web - Made Easy with Python

Earlier today, I began development on Code Crawler. I was torn between Java and Python. Both have great networking libraries that are extremely easy to use. Java, being my first language, is easier for me, but that's actually what drew me to Python for this project - the learning potential.

Anyway, after only about an hour of development, I have to say that I am extremely impressed with Python. In about 30 lines of code, I had a crawler that reads a list of seed URLs from a file, pulls the content from each URL, and checks each resource's content type to make sure it is allowed (to avoid downloading images). Not too bad at all.
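For anyone curious what that looks like, here is a rough sketch of the same three steps. This is not the actual Code Crawler source; it is a minimal illustration that assumes Python 3 and the standard library's urllib.request. The function names, the one-URL-per-line seed file format, and the ALLOWED_TYPES set are all assumptions made for the example.

```python
import urllib.request

# Assumption: only these media types are worth crawling; adjust as needed.
ALLOWED_TYPES = {"text/html", "text/plain"}


def read_seeds(path):
    """Return the seed URLs in a file, one per line, skipping blanks and # comments."""
    with open(path, encoding="utf-8") as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith("#")]


def is_allowed(content_type):
    """Check a Content-Type header value, ignoring parameters such as charset."""
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_TYPES


def fetch(url, timeout=10):
    """Return the body of url as bytes, or None if its content type is not allowed."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        if not is_allowed(response.headers.get("Content-Type")):
            return None
        return response.read()


def crawl(seed_path):
    """Yield (url, body) for every seed URL whose content type is allowed."""
    for url in read_seeds(seed_path):
        try:
            body = fetch(url)
        except (OSError, ValueError):
            continue  # network or HTTP error, or a malformed URL
        if body is not None:
            yield url, body
```

Looping over crawl("seeds.txt") (a hypothetical file name) then gives each allowed page along with its raw bytes. Checking the content type before calling read() is what keeps images and other binary resources from being downloaded in full.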

You can check up on the status of this project here.
