Parsed Participle

The personal weblog of Faiz Kazi: Mostly oddities in programming, life in Japan, occasionally music.

[ Home | RSS 2.0 | ATOM 1.0 ]

Mon, 19 Nov 2007

Screen Scraping in this Day And Age

... of RSS, Web 2.0 and whatnot. As much as I loved doing it back in the old days, screen-scraping (parsing HTML out of web pages with a lot of guesswork) was (and is) yucky. I suppose I enjoyed it back then because I ended up learning a bit of Awk, and later Perl.

But since I've found no good way to avoid missing concerts, other than hoping that there's an RSS feed with ticket/date information for the bands and artists I don't want to miss, I have to resort to such nonsense now and then:

use strict;
use warnings;
use LWP::Simple 'get';
use HTML::TableExtract;
use Data::Dumper;

use constant STATUS => 5;  # Column index 5 (0-based) happens to be 'ticket status'

my $te = HTML::TableExtract->new;
# slurp!
my $html = get('https://tickets.thepolice.com/')
    or die "Whoa! couldn't fetch the ticket page??";
$te->parse($html);

my ($table) = $te->tables;             # The first and only table in the
                                       # page is a list of all gigs by city,
                                       # date, and ticket availability
die "Whoa! no table on the page??" unless $table;

my @tokyo_gigs = grep {
                     # Rows with dates in Tokyo (skip empty/undef cells)
                     grep { defined($_) && /Tokyo/ } @$_
                 } $table->rows;

# Look out for any changes; at this time, there are only 2 shows in Tokyo
die "Whoa! no gigs in Tokyo??"         unless @tokyo_gigs;
die "Whoa! *more* gigs in Tokyo??"     if @tokyo_gigs > 2;
die "Whoa! only *one* gig in Tokyo??"  if @tokyo_gigs == 1;

# ... and if their status is anything other than 'Coming Soon',
# then either ticket sales have begun, or... they're already sold out!
print "Whoa! something's up!\n", Dumper(\@tokyo_gigs)
    if grep { !defined($_) || !/coming soon/i }
       map  { $_->[STATUS] } @tokyo_gigs;

This is just so I don't miss The Police live at the Tokyo Dome, scheduled for February 2008.

So having to screen-scrape may suck, but at least there's Perl.
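If you want to check the row filtering and the status test without hitting the live site (or waiting for cron), here's a minimal sketch that pulls the same grep/map logic out into two hypothetical subs, gigs_in and status_changed, and runs them over made-up rows standing in for $table->rows. The six-column layout, with the status at index 5, only mirrors STATUS in the script above; it is not the real page's layout.

```perl
use strict;
use warnings;

# Same index as STATUS in the script above (0-based); the layout is assumed.
use constant STATUS => 5;

# Return the rows (array refs) that mention $city in any defined cell.
sub gigs_in {
    my ($city, @rows) = @_;
    return grep {
        grep { defined($_) && /\Q$city\E/ } @$_
    } @rows;
}

# Return the number of rows whose status cell is missing or isn't
# 'Coming Soon' (so it is true when something has changed).
sub status_changed {
    my @gigs = @_;
    my $count = grep { !defined($_) || !/coming soon/i }
                map  { $_->[STATUS] } @gigs;
    return $count;
}

# Hypothetical placeholder rows; none of this is real data from the site.
my @rows = (
    [ 'date-1', 'Tokyo',     'venue-1', 'x',   'x', 'Coming Soon' ],
    [ 'date-2', 'Tokyo',     'venue-1', 'x',   'x', 'Coming Soon' ],
    [ 'date-3', 'Elsewhere', 'venue-2', undef, 'x', 'Buy Tickets' ],
);

my @tokyo = gigs_in('Tokyo', @rows);
printf "%d Tokyo gig(s); status changed: %s\n",
    scalar @tokyo, status_changed(@tokyo) ? 'yes' : 'no';
```

Run as-is, it should print "2 Tokyo gig(s); status changed: no"; flip one of the Tokyo statuses to 'Buy Tickets' and it reports "yes".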
UPDATE: Nov 21, 21:00 JST: Looks like it worked! Well, sort of. I put the script in my crontab, and this morning it sent me a mail with "Whoa! no gigs in Tokyo??" in the body; sure enough, it seems that the presale status of the Tokyo tickets had changed, and a "Buy Tickets" link is now in its place.
(Of course, it's a totally different issue that the site in question does not seem to let one purchase tickets for the Tokyo venues - how lame! Well, at least I am early enough to buy the 'general public' tickets on time.)
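The hard-coded count checks did their job here, but only indirectly: what fired was the "no gigs" die, not the status check. A more general way to "look out for any changes" might be to save the table rows on each run and complain whenever they differ from the previous run. Here's a minimal sketch of that idea; snapshot and changed_since_last_run are hypothetical helpers, and the snapshot file is whatever path you pass in.

```perl
use strict;
use warnings;

# Flatten rows (array refs) into one string: cells joined by tabs,
# rows joined by newlines; undefined cells become empty strings.
sub snapshot {
    my @rows = @_;
    return join "\n",
        map { join "\t", map { defined($_) ? $_ : '' } @$_ } @rows;
}

# Compare the rows against the snapshot saved by the previous run.
# Returns 1 (and saves the new snapshot) if anything differs, else 0.
# A missing file counts as an empty previous snapshot.
sub changed_since_last_run {
    my ($file, @rows) = @_;
    my $now    = snapshot(@rows);
    my $before = '';
    if (open my $in, '<', $file) {
        local $/;                     # slurp the whole file at once
        $before = <$in>;
        $before = '' unless defined $before;
        close $in;
    }
    return 0 if $now eq $before;
    open my $out, '>', $file or die "Can't write $file: $!";
    print {$out} $now;
    close $out;
    return 1;
}
```

In the script, printing Dumper(\@tokyo_gigs) only when changed_since_last_run returns true would keep cron quiet until the table actually changes, whatever the change turns out to be.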
posted: 07:58 | path: /programming