BBC: Using Semantic Technology as a base for our World Cup site is a Revolution; Doors open for Linked Data

    (1888PressRelease) August 18, 2010 - As Ontotext announced earlier, the company's own semantic repository BigOWLIM was successfully integrated into the high performance Semantic Web publishing stack powering the BBC's 2010 World Cup website. BigOWLIM is used there as a triple-store performing OWL reasoning on continuously changing data and handling millions of page requests per day.

    Two recent blog posts from the technical team at the BBC give insight into the business case for deploying semantic technologies in the World Cup website, the technical architecture of the publishing stack, the strategic importance of the project's success, and the plans for using semantic technology and linked data within the BBC.

    In "The World Cup and a call to action around Linked Data", John O'Donovan, Chief Technical Architect, Journalism and Knowledge, BBC Future Media & Technology, discusses the business benefits of the implemented semantic solution:

    "The World Cup site is our first major statement on how we think this (the Semantic Web) can work for mass market media and a showcase for the benefits it brings. … Though we have been using RDF and linked data on some other sites (…) we believe this is the first large scale, mass media site to be using concept extraction, RDF and a Triple store to deliver content."

    "…we are not publishing pages, but publishing content as assets which are then organised by the metadata dynamically into pages, but could be re-organised into any format we want much more easily than we could before. …There is also a change in editorial workflow for creating content and managing the site. This changes from publishing stories and index pages, to one where you publish content and check the suggested tags are correct. The index pages are published automatically. This process is what assures us of the highest quality output, but still saves large amounts of time in managing the site and makes it possible for us to efficiently run so many pages for the World Cup."

    "As more content has Linked Data principles applied to it … the vision of a Semantic Web moves closer. Importantly, what we have been able to show with the World Cup, is that the technology behind this is ready to deliver large scale products."

    "This is more than just a technical exercise - we have delivered real benefits back to the business as well as establishing a future model for more dynamic publishing which we think will allow us to make best use of our content and also use Linked Data to more accurately share this content and link out to other sites and content, a key goal for the BBC. We look forward to seeing the use of Linked Data grow as we move towards a more Semantic Web."

    In a follow-up post, "BBC World Cup 2010 dynamic semantic publishing", Jem Rayfield, Senior Technical Architect, BBC News and Knowledge, provides more information on the technical architecture of the high-performance publishing stack and the related data flows and data modelling:

    "The World Cup 2010 website is a significant step change in the way that content is published. … As you navigate through the site it becomes apparent that this is a far deeper and richer use of content than can be achieved through traditional CMS-driven publishing solutions."

    "The site features 700-plus team, group and player pages, which are powered by a high-performance dynamic semantic publishing framework. This framework facilitates the publication of automated metadata-driven web pages that are light-touch, requiring minimal journalistic management, as they automatically aggregate and render links to relevant stories."

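    The metadata-driven aggregation described above can be pictured with a minimal sketch. It is an illustration only, not the BBC's framework: the asset records, URLs and tag names below are hypothetical, and the function simply groups tagged assets into one index page per tag.

```python
from collections import defaultdict

# Hypothetical journalist-authored assets; the URLs and tags are invented
# for illustration and do not come from the BBC site.
ASSETS = [
    {"url": "/story/lampard-fit", "tags": ["Frank Lampard", "England Squad"]},
    {"url": "/story/group-c-preview", "tags": ["Group C", "England Squad"]},
]

def build_index_pages(assets):
    """Group asset URLs by tag, giving one automatically built index page per tag."""
    pages = defaultdict(list)
    for asset in assets:
        for tag in asset["tags"]:
            pages[tag].append(asset["url"])
    # Sort the links so each page is rendered in a stable order.
    return {tag: sorted(urls) for tag, urls in pages.items()}

PAGES = build_index_pages(ASSETS)
```

    In this sketch a journalist only tags a story; the "England Squad" page picks up both stories without anyone editing that page by hand.
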
    "The foundation of these dynamic aggregations is a rich ontological domain model. The ontology describes entity existence, groups and relationships between the things/concepts that describe the World Cup. For example, "Frank Lampard" is part of the "England Squad" and the "England Squad" competes in "Group C" of the "FIFA World Cup 2010". The ontology also describes journalist-authored assets (stories, blogs, profiles, images, video and statistics) and enables them to be associated to concepts within the domain model…."

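    The Frank Lampard example can be written down as subject-predicate-object triples. The sketch below uses plain Python tuples rather than a real RDF library, and the predicate names ("partOf", "competesIn", "about") and the asset "Story 1" are assumptions made for illustration; the BBC's actual ontology terms are not given in the posts.

```python
# Hypothetical triples modelled on the example in the quote above.
TRIPLES = {
    ("Frank Lampard", "partOf", "England Squad"),
    ("England Squad", "competesIn", "Group C"),
    ("Group C", "partOf", "FIFA World Cup 2010"),
    ("Story 1", "about", "Frank Lampard"),  # a journalist-authored asset
}

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

    Calling match(TRIPLES, p="partOf") returns every "partOf" statement, which is roughly what a single SPARQL triple pattern does against a triple store.
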
    "A RDF triplestore (ref. BigOWLIM) and SPARQL approach was chosen over and above traditional relational database technologies due to the requirements for interpretation of metadata with respect to an ontological domain model. The high level goal is that the domain ontology allows for intelligent mapping of journalist assets to concepts and queries. The chosen triple store provides reasoning following the forward-chaining model and thus implied inferred statements are automatically derived from the explicitly applied journalist metadata concepts."

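    Forward chaining means that implied statements are computed and stored when data is loaded, rather than at query time. The sketch below shows the idea with a single toy rule; the rule and the predicate names are assumptions for illustration, whereas BigOWLIM applies OWL rule sets that are not reproduced here.

```python
# Toy rule (an assumption, not the BBC's rule set):
#   (asset, "about", x) and (x, "partOf" | "competesIn", y)  =>  (asset, "about", y)
LINKS = {"partOf", "competesIn"}

def forward_chain(explicit):
    """Materialise inferred triples until no new ones appear (a fixpoint)."""
    known = set(explicit)
    while True:
        derived = {
            (asset, "about", y)
            for (asset, p1, x) in known if p1 == "about"
            for (x2, p2, y) in known if x2 == x and p2 in LINKS
        }
        if derived <= known:
            return known
        known |= derived

# Explicit statements: the journalist applies only one tag to the story.
EXPLICIT = {
    ("Story 1", "about", "Frank Lampard"),
    ("Frank Lampard", "partOf", "England Squad"),
    ("England Squad", "competesIn", "Group C"),
}

CLOSURE = forward_chain(EXPLICIT)
```

    After materialisation the store also holds "Story 1 about England Squad" and "Story 1 about Group C", so a query for Group C stories needs no joins through the squad hierarchy.
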
    "This inference capability makes both the journalist tagging and the triple store powered SPARQL queries simpler and indeed quicker than a traditional SQL approach. Dynamic aggregations based on inferred statements increase the quality and breadth of content across the site. The RDF triple approach also facilitates agile modeling, whereas traditional relational schema modeling is less flexible and also increases query complexity."

    "Our triple store is deployed multi-data center in a resilient, clustered, performant and horizontally scalable fashion, allowing future expansion for additional ontologies and indeed linked open data (LOD) sets. … The triple store is abstracted via a JAVA/Spring/CXF JSR 311 compliant REST service. ... The API is designed as a generic façade onto the triple store allowing RDF data to be re-purposed and re-used pan BBC. This service orchestrates SPARQL queries and ensures that results are dynamically cached with a low 'time-to-live' (TTL) (1 minute) expiry cross data center using memcached."

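    The caching behaviour described above (query results kept for one minute) can be sketched as follows. This is a simplified stand-in: an in-process dictionary replaces memcached, and the class and method names are hypothetical rather than part of the BBC's REST service.

```python
import time

class TTLCache:
    """Cache query results for a fixed time-to-live (60 seconds by default)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._store = {}        # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, or call compute() if missing or expired."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value
```

    A facade built this way would use the SPARQL query text as the key and pass a function that runs the query against the triple store, so repeated page requests within the same minute never reach the store.
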
    "This dynamic semantic publishing architecture has been serving millions of page requests a day throughout the World Cup with continually changing OWL reasoned semantic RDF data. The platform currently serves an average of a million SPARQL queries a day with a peak RDF transaction rate of 100s of player statistics per minute. …."

    "The development of this new high-performance dynamic semantic publishing stack is a great innovation for the BBC as we are the first to use this technology on such a high-profile site. It also puts us at the cutting edge of development for the next phase of the Internet, Web 3.0."

    The BBC's blog posts set off a wave of enthusiasm in the community. Some of the reactions are quoted below.

    "BBC World Cup Website Showcases Semantic Technologies", a post by Richard MacManus, the founder of ReadWriteWeb: "…if there was a World Cup for the Semantic Web, then the BBC may have lifted the trophy for its country".

    "First BBC microsite powered by a triple-store", a post by Yves Raimond at DBTune: "All this is very exciting, the World Cup Website proved that triple store technologies can be used to drive a production website with significant traffic. I am expecting lots more parts of the BBC web infrastructure to evolve in the same way :-)"

    These posts were accompanied by an avalanche of tweets and blog comments, dominated by the word "impressive". A few of the comments on the ReadWriteWeb post were particularly enthusiastic: "excellent technology on both software and hardware", "It Begins ...", "It's really fantastic to see organizations like the BBC building really exciting semantic applications which demonstrate quantifiable business value. Great stuff!..."

    ###