Portuguese data scraping experiment: parliament activities and YQL

It looks simple and effective, and something like this should have been made before by a newspaper or other news outlet. But, once again, it’s the geek community that steps forward and builds a useful tool using public data (although I heard recently that a political editor working at an important media institution said that the deputies’ attendance was a “state secret”. Yeah, right…). And why? Just for the fun of it!

List of deputies by electoral circles

The mind behind this experiment is Luis Confraria, who works as a front-end developer at Outbox Ativism, a company that specializes in digital and web products for companies and institutions. So he’s not a journalist, but he decided to experiment with some tools to create what can be considered a journalistic product. I asked him why: “Well, my main motivation was... fun! Also, I liked the idea of building upon some government website that indeed contained all the data, but not in the most useful way. Besides, I really wanted to learn and test some of the stuff I used in it.”

The project is based on the Portuguese Parliament website, and shows all the current deputies by party or electoral circle, along with the profile and participation of each one of them. The original information is not organized in an easy-to-use way, so Luis resorted to some data scraping techniques. “At first I started scraping the data with a little Python script, but then I went with YQL. Made a few open tables, and pushed them to GitHub. On the client side I just used simple HTML / CSS / JavaScript with jQuery and sammy.js.”
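To make the scraping step more concrete, here is a minimal sketch in Python of the general idea: pull rows out of an HTML table and group deputies by party or electoral circle. It is not Luis’s script. The markup, the deputy names, and the helper names (`DeputyTableParser`, `parse_deputies`, `group_by`) are all made up for illustration, and the real Parliament pages are structured differently.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical markup and fictional names, used only for illustration.
SAMPLE_HTML = """
<table id="deputies">
  <tr><td>Ana Silva</td><td>Party A</td><td>Lisboa</td></tr>
  <tr><td>Rui Costa</td><td>Party B</td><td>Porto</td></tr>
  <tr><td>Marta Reis</td><td>Party A</td><td>Porto</td></tr>
</table>
"""


class DeputyTableParser(HTMLParser):
    """Collect the text of every <td>, one list of cells per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def parse_deputies(html):
    """Turn the sample table into a list of dicts (assumes 3 cells per row)."""
    parser = DeputyTableParser()
    parser.feed(html)
    return [{"name": n, "party": p, "circle": c} for n, p, c in parser.rows]


def group_by(deputies, key):
    """Group deputy names by the given field, e.g. 'party' or 'circle'."""
    groups = defaultdict(list)
    for deputy in deputies:
        groups[deputy[key]].append(deputy["name"])
    return dict(groups)


deputies = parse_deputies(SAMPLE_HTML)
by_party = group_by(deputies, "party")
by_circle = group_by(deputies, "circle")
```

A real scraper would fetch the pages over HTTP and cope with messier markup. That is the part YQL open tables took care of, by exposing a page as something you could query.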

There aren’t many open-data or data scraping projects in Portugal, and most of the ones I know of were created by non-journalists; Luis shared the example of transparencia-pt.org. Is there a lack of programmers in Portuguese journalism? I think so, but Luis goes even further: “There is probably a lack of programmers in society at large :) not just in journalism.” Maybe journalists are just not having fun with their work.

“I’m optimistic about this. We will see more and more of these projects because there are better and faster tools each day and because people do really care.” To which he adds: “The most important thing is to have public data in the simplest format possible. The rest will come naturally.”

I’m not as optimistic as he is, but I agree: there is a need, and things will find their own course. But it was hard to find space for videographers in newsrooms, so I’m not confident about the future of programmers in the new newsrooms. At least in most of them. Now that he has done this, what comes next?

“Probably I will tweak and fix it some more. (It still looks like crap on IE.) I hope someone builds something else with the YQL tables. I have some ideas popping up now and then, but nothing mature enough.”

We’ll be looking out for them.

Luis shared the YQL tables, so have fun:

and here is the code for the tables:
Detailed information for each deputy: attendance, participation and profile
