Tag Archives: data

Data+Design – Free Ebook

Download this free ebook on how to use Data+Design to tell your stories:

Whether you’re writing an article for your newspaper, showing the results of a campaign, introducing your academic research, illustrating your team’s performance metrics, or shedding light on civic issues, you need to know how to present your data so that other people can understand it.

Regardless of what tools you use to collect data and build visualizations, as an author you need to make decisions around your subjects and datasets in order to tell a good story. And for that, you need to understand key topics in collecting, cleaning, and visualizing data.

This free, Creative Commons-licensed e-book explains important data concepts in simple language. Think of it as an in-depth data FAQ for graphic designers, content producers, and less-technical folks who want some extra help knowing where to begin, and what to watch out for when visualizing information.

Download the ebook

Book “Scraping for Journalists”: how to collect and analyze data

If someone had told me a few years ago that being a modern journalist would mean knowing how to work with Excel and databases, I would probably have chosen another field.

Knowing how to work with data is one of the most important skills for digital journalists. Just think of the amount of statistical information at our disposal, in quantities so overwhelming that it is hard to see where the story is and what other stories might be found.

To help us find these needles in the haystack of digital information, Paul Bradshaw wrote a very practical book on how to query sources (usually databases) and analyze the results effectively. Although it is geared toward an Anglo-Saxon reality, where the availability and organization of data are far ahead of Portugal's, it is an excellent way into this field.

It has some programming in it, so it will not be for everyone; in the digital world we live in, you need to know more than writing and editing. You can see a PDF excerpt here.

“Scraping for Journalists introduces you to a range of scraping techniques – from very simple scraping techniques which are no more complicated than a spreadsheet formula, to more complex challenges such as scraping databases or hundreds of documents. At every stage you’ll see results – but you’ll also be building towards more ambitious and powerful tools.

You’ll be scraping within 5 minutes of reading the first chapter – but more importantly you’ll be learning key principles and techniques for dealing with scraping problems.”

Scraping for Journalists

Paul Bradshaw was my lecturer on the MA in Online Journalism in Birmingham.



Breadth Portfolio: Part 3 – Data Visualization

This is the last part of the series of excerpts from the report I wrote for the Multimedia Journalism module, this time on data visualization.

Experiments in Data Visualization

One of the fields that interests me most is data journalism and the visual representation of information. It takes two seemingly opposite mindsets to work with data: that of a statistician and that of a designer, the analytical and the creative side by side. But I found that for data to be interesting to the audience, besides being accurate and relevant, it has to be visually compelling, interactive, or both. Once again, technology comes to the rescue, and at the same time it can lead us to disaster. The huge number of tools available to organize and present data rely on different technologies, mostly JavaScript and Flash, and if we are to use live data we must know how to use APIs to feed content into our application. And even if we find software that does it all for us, we still need to know which story we’re telling.

I did some research on newspaper brand values and the online traffic of Portuguese news websites, and I came across the monthly traffic report for all of them. My goal was to understand the relative position of each one in terms of visits and page views, and how those two values relate to each other. The data I got also covers portals, specialized websites, and entertainment magazines, so it spans a broad range of themes (all charts are available live here: http://is.gd/aZLXs).

First of all, I wanted a general overview of how much space each one takes up within the Portuguese online universe, so I went for a tree map. The view wasn’t clear enough, so I tried a different approach: a bubble chart highlighting just the focus of my research, which was newspapers specifically. It was a better option, and I could have done even more interesting things had I added print circulation data. One conclusion I would have reached is that best-selling newspapers don’t necessarily do equally well online.

But the real risk when connecting data is drawing the wrong conclusions from the facts. So, looking at these data sets, these are the most successful websites, right? Well, maybe not. What if we try to understand which ones are better at engaging the audience and get more page views per visit?
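To make the page-views-per-visit question concrete, here is a minimal Python sketch. The site names and figures are placeholders invented for illustration, not the numbers from the traffic report, and `SiteTraffic` and `rank_by_engagement` are hypothetical names I made up for this example. It only shows how ranking by raw visits and ranking by pages per visit can disagree.

```python
from dataclasses import dataclass


@dataclass
class SiteTraffic:
    """Monthly traffic for one website (placeholder values only)."""
    name: str
    visits: int
    page_views: int

    @property
    def pages_per_visit(self) -> float:
        # Engagement ratio: how many pages a visitor opens on average.
        # Guard against division by zero for sites with no visits.
        return self.page_views / self.visits if self.visits else 0.0


def rank_by_engagement(sites):
    """Return the sites sorted from most to fewest pages per visit."""
    return sorted(sites, key=lambda s: s.pages_per_visit, reverse=True)


# Hypothetical figures, NOT taken from the real traffic report.
sites = [
    SiteTraffic("Site A", visits=1_000_000, page_views=3_000_000),
    SiteTraffic("Site B", visits=200_000, page_views=1_400_000),
    SiteTraffic("Site C", visits=500_000, page_views=2_000_000),
]

# Ranking by raw audience size versus ranking by engagement.
by_visits = sorted(sites, key=lambda s: s.visits, reverse=True)
by_engagement = rank_by_engagement(sites)

for site in by_engagement:
    print(f"{site.name}: {site.pages_per_visit:.1f} page views per visit")
```

With these made-up numbers, the site with the most visits ends up last once we sort by pages per visit, which is the kind of reversal that makes it worth looking at the ratio and not just the totals.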

The idea I came away with is that we can use data visualization to find a perspective on the subject, one that leads us to a better understanding of what were just numbers on paper.

Visual representations of online traffic for Portuguese news websites

I’ve been working on my assignment for the Online Journalism module, where we are supposed to experiment with different ways of presenting information. Since I was already looking into this data, I decided to try ManyEyes to produce three different charts of online traffic for Portuguese news websites.

They are not visually outstanding, but I think they convey a pretty good notion of how the online universe is organized in Portugal. The three values I used were Visits, Pageviews, and the relationship between those two. This first tree map shows how much space each website takes, and you can select different sets of values in the drop-down menu below.


But my goal was to highlight just the most important newspapers, so I singled them out in a bubble chart:


But what I really wanted to know was how to measure success. Do more visits and pageviews mean the product is more compelling? I tried something else:


We can see that the bulk of the audience engages more with specialized, leisure publications, and that the outliers in terms of visits and pageviews are not “page turners”, so to speak.

What conclusions do you draw from these representations (besides that I’m not that good at it)?

PorData: Portugal’s Database

Have Data Will Mashup

Pordata.pt is a new website supported by the Francisco Manuel dos Santos Foundation and “aims to make statistical data available in three main phases: for Portugal (1st phase), for Portugal and the countries of the EU 27 (2nd phase) and the Portuguese regions and municipalities (3rd phase). The vector common to all the information presented is time. Published in chronological series, the information is related to a long period, which begins, wherever possible, in 1960 and continues to the present day.”

Having all these statistics available sent me into a mashup frenzy. My question is: now that they’re available, will anyone do anything good with them?
