Monday, January 4, 2010

Data, links and hyperlinks ...

We are in 2010. Great! Happy New Year to all 30 people reading this blog!
I'm not famous, but I have a personal grand challenge I would like somebody to solve.

When will it be possible, from a PDF or HTML document with no hyperlinks, to deduce:
  • all the articles and authors linked to it?
  • the buzz around the thesis developed in the document?
  • how the document and its thesis relate to other theories (as a list or in chronological order)?
  • how far the document is from others, and whether any copies exist?
  • whether any patents are related to it?
Of course, this grand challenge should cover different languages and different representations (DNA can be described in several ways). Of course, we can say that we need more semantics to help and that the technology is still not widely developed and deployed. But that is no longer a good excuse.
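
To make the "how far is this document from others" question a bit more concrete, here is a minimal sketch of one well-known way to measure it: compare the sets of overlapping word windows ("shingles") of two texts with the Jaccard distance. This is only an illustration, not a solution to the challenge. The function names `shingles` and `jaccard_distance` and the window size `k=3` are my own assumptions, and the approach says nothing about different languages or representations.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_distance(a, b, k=3):
    """Distance in [0, 1]: 0 means identical shingle sets, 1 means no overlap."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        # Two empty texts are considered identical.
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)
```

A distance close to 0 hints at a copy or near-copy, while a distance close to 1 means the two texts share almost no wording. It measures shared wording only, so two documents defending the same thesis in different words would still look far apart.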

If we could use all the computing power we have to try to deduce possible connections between subjects, theories, information and data, it could be very fruitful. For example, why was it impossible to see the systemic crisis coming from the USA in 2009? I cannot imagine that people are so focused on their own small world that they cannot do better. Maybe it was because of the speed, but even so, it took some time for the crisis to appear!

The more power we offer to people, the more data they can leverage and the less they dig ... Seems strange to me. Google is very good at crawling millions of websites and at offering good technologies, but who is offering companies all the power, data and tools they need to explore the data ocean in search of new continents? That is where we need to focus. An instantaneous answer to a simple question is great, but what about finding the right question among millions of answers?