
How to use the Sesame Java API to power a Web or Client-Server Application

Posted by kiwitobes on Oct 27 2009 09:27 PM

A number of good graph-store solutions have emerged since developers first started building semantic web technologies. Sesame is one of the leading graph stores, with a reputation for excellent performance.

Sesame is an open source Java framework for querying and storing RDF data; it was originally developed by the Dutch company Aduna as a research prototype for the European Union research project On-To-Knowledge. It is currently developed as a community project and hosted at http://openrdf.org. Sesame ships with an excellent administration interface and is easy to install.

You'll need a Java Runtime Environment (JRE) that works with at least Java 5 to use Sesame. Mac OS X and many Linux distributions come with Java pre-installed. If you don't already have it, you can download the latest version from http://java.sun.com/javase/downloads/.

Like RDFLib, Sesame can be embedded in applications, but unlike RDFLib, it can also be used in a standalone server mode, much like a traditional database with multiple applications connecting to it. We will briefly look at how to use Sesame through its native API in order to learn about some of its features.

Sesame uses a modular architecture that allows capabilities to be snapped together as needed. The native Java API gives you fine-grained control over using as many or as few of these capabilities as you need. To get started, we will create a class that wraps a Sesame repository and encapsulates the most frequently used graph operations, allowing us to use Sesame in much the same way we have been using RDFLib.

You can build this class as we walk through the example, or you can download the completed class from http://semprog.com/p...impleGraph.java. Although the SimpleGraph class is useful on its own, it's really just a starting point from which to explore various capabilities of Sesame for your own needs.

We will start by creating the class and adding two constructors. The first constructor instantiates a bare-bones in-memory graph repository; the second will take a boolean, which, when true, creates an in-memory store that supports a form of forward-chaining inference directly in the data store. Like our previous encounters with forward-chaining inference, this one incorporates a set of rules that automatically expands certain RDFS information in your ontologies into additional assertions. So, for example, an instance of a class declared rdfs:subClassOf a parent class will have rdf:type assertions in the graph for both its own class and the parent class:

import org.openrdf.query.*;
import org.openrdf.model.vocabulary.*;
import org.openrdf.repository.*;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.inferencer.fc.ForwardChainingRDFSInferencer;
import org.openrdf.sail.memory.MemoryStore;
import org.openrdf.rio.*;
import org.openrdf.model.*;

import java.net.URL;
import java.net.URLConnection;
import java.util.*;
import java.io.*;

public class SimpleGraph {

    Repository therepository = null; 
    
    // useful local constants
    static RDFFormat NTRIPLES = RDFFormat.NTRIPLES;
    static RDFFormat N3 = RDFFormat.N3;
    static RDFFormat RDFXML = RDFFormat.RDFXML;
    static String RDFTYPE =  RDF.TYPE.toString();
    
    /**
     *  In memory Sesame repository without inferencing
     */
    public SimpleGraph(){
        this(false);
    }

    /**
     * In memory Sesame repository with optional inferencing
     * @param inferencing
     */
    public SimpleGraph(boolean inferencing){
        try {
            if (inferencing){
                therepository = new SailRepository(
                    new ForwardChainingRDFSInferencer(new MemoryStore()));
            } else {
                therepository = new SailRepository(new MemoryStore());
            }
            therepository.initialize();
        } catch (RepositoryException e) {
            e.printStackTrace();
        }
    }
}


The SimpleGraph class provides access to a select set of Sesame’s high-level (or repository-layer) APIs. The repository layer is itself a high-level abstraction, shielding us from the details of Sesame’s lower-level Storage And Inference Layer (Sail). The low-level Sail components manage the actual persistence and manipulation of data and are configurable, allowing us to mix and match various storage options through a set of "stackable" components. When we pass true to the SimpleGraph constructor, it stacks the memory-based graph store under the RDFS inferencing component and returns this configuration wrapped in an easy-to-use Sail repository interface.
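
To make the stacking concrete, here is a minimal sketch, assuming the Sesame 2.x class names used throughout this chapter; DirectTypeHierarchyInferencer is another stackable inferencer shipped with Sesame, and this particular combination is purely illustrative:

import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.inferencer.fc.DirectTypeHierarchyInferencer;
import org.openrdf.sail.inferencer.fc.ForwardChainingRDFSInferencer;
import org.openrdf.sail.memory.MemoryStore;

public class StackingSketch {
    public static void main(String[] args) throws Exception {
        // each Sail component wraps the one below it; the repository
        // interface hides the whole stack from the application
        Repository stacked = new SailRepository(
            new DirectTypeHierarchyInferencer(
                new ForwardChainingRDFSInferencer(
                    new MemoryStore())));
        stacked.initialize();
    }
}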

We could further configure the MemoryStore to save its state to disk on a periodic basis, allowing us to recover the state of the graph on the next instantiation of the class. To do this, we need only pass a data directory to the MemoryStore constructor. Alternatively, we could replace the MemoryStore altogether with an interface-compatible NativeStore, which would allow us to efficiently expand the size of the graph beyond the limitations of main memory. To make this change, replace the call to the MemoryStore constructor with a call to the NativeStore constructor (org.openrdf.sail.nativerdf.NativeStore), along with a java.io.File object that will be used to store the disk image of the graph, e.g., new NativeStore(new File("/file/path")).
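
Here is a minimal sketch of both configurations, assuming Sesame 2.x APIs; the directory paths are invented for illustration:

import java.io.File;

import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;
import org.openrdf.sail.nativerdf.NativeStore;

public class StoreConfigSketch {
    public static void main(String[] args) throws Exception {
        // in-memory store that syncs its contents to a data directory,
        // so the graph survives restarts (paths are hypothetical)
        MemoryStore ms = new MemoryStore(new File("/data/memstore"));
        ms.setPersist(true);
        Repository memRepo = new SailRepository(ms);
        memRepo.initialize();

        // disk-based store for graphs larger than main memory
        Repository nativeRepo =
            new SailRepository(new NativeStore(new File("/data/nativestore")));
        nativeRepo.initialize();
    }
}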

Next, we will add some methods for creating objects like URIrefs, BNodes, and literals. As you'll see from the rest of the methods in our class, the repository's getConnection() method provides access to the repository for high-level data manipulation. Once a repository connection has been obtained, it should be released whether or not the operation succeeded; hence, we wrap all of the connections in try/finally blocks. Obviously, a good implementation of SimpleGraph would do something more useful with exceptions than simply printing a stack trace:

    /**
     *  Literal factory
     * 
     * @param s the literal value
     * @param typeuri uri representing the type (generally xsd)
     * @return
     */
    public org.openrdf.model.Literal Literal(String s, URI typeuri) {
        try {
            RepositoryConnection con = therepository.getConnection();
            try {
                ValueFactory vf = con.getValueFactory();
                if (typeuri == null) {
                    return vf.createLiteral(s);
                } else {
                    return vf.createLiteral(s, typeuri);
                }
            } finally {
                con.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }
    
    /**
     * Untyped Literal factory
     * 
     * @param s the literal
     * @return
     */
    public org.openrdf.model.Literal Literal(String s) {
        return Literal(s, null);
    }
    
    /**
     *  URIref factory
     * 
     * @param uri
     * @return
     */
    public URI URIref(String uri) {
        try {
            RepositoryConnection con = therepository.getConnection();
            try {
                ValueFactory vf = con.getValueFactory();
                return vf.createURI(uri);
            } finally {
                con.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }
    
    /**
     *  BNode factory
     * 
     * @return
     */
    public BNode bnode() {
        try{
            RepositoryConnection con = therepository.getConnection();
            try {
                ValueFactory vf = con.getValueFactory();
                return vf.createBNode();
            } finally {
                con.close();
            }
        }catch(Exception e){
            e.printStackTrace();
            return null;
        }
    }


With the factories in place to create the raw pieces of an RDF statement, let's add methods to populate the graph. The first method allows us to assert raw triples, composed of the objects obtained from the factory methods we just added, into the graph.

The addString, addFile, and addURI methods take serialized RDF and add it to the repository. The format parameter is used to select the proper RDF parser for the serialization being loaded (and in the case of addURI, to set the proper HTTP ACCEPT header for content negotiation). These parsers are a part of Sesame’s modular architecture and are managed through the RIO (RDF I/O) package. RIO components can be used outside of Sesame for handling RDF serialization, and they can be augmented as new standards emerge without affecting the core Sesame systems:

    /**
     *  Insert Triple/Statement into graph 
     * 
     * @param s subject uriref
     * @param p predicate uriref
     * @param o value object (URIref or Literal)
     */
    public void add(URI s, URI p, Value o) {
        try {
            RepositoryConnection con = therepository.getConnection();
            try {
                ValueFactory myFactory = con.getValueFactory();
                Statement st = myFactory.createStatement(s, p, o);
                con.add(st);
            } finally {
                con.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     *  Import RDF data from a string
     * 
     * @param rdfstring string with RDF data
     * @param format RDF format of the string (used to select parser)
     */
    public void addString(String rdfstring,  RDFFormat format) {
        try {
            RepositoryConnection con = therepository.getConnection();
            try {
                StringReader sr = new StringReader(rdfstring);
                con.add(sr, "", format);
            } finally {
                con.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    /**
     *  Import RDF data from a file
     * 
     * @param filepath path of the file (/path/file) with RDF data
     * @param format RDF format of the string (used to select parser)
     */
    public void addFile(String filepath,  RDFFormat format) {
        try {
            RepositoryConnection con = therepository.getConnection();
            try {
                con.add(new File(filepath), "", format);
            } finally {
                con.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }    

    /**
     *  Import data from URI source
     *  Request is made with proper HTTP ACCEPT header
     *  and will follow redirects for proper LOD source negotiation
     * 
     * @param urlstring absolute URI of the data source
     * @param format RDF format to request/parse from data source
     */
    public void addURI(String urlstring, RDFFormat format) {
        try {
            RepositoryConnection con = therepository.getConnection();
            try {
                URL url = new URL(urlstring);
                URLConnection uricon = url.openConnection();
                uricon.addRequestProperty("accept", format.getDefaultMIMEType());
                InputStream instream = uricon.getInputStream();
                con.add(instream, urlstring, format);
            } finally {
                con.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
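
Since the RIO components can stand alone, here is a minimal sketch, assuming Sesame 2.x RIO classes and a hypothetical input file name, that parses RDF/XML into a statement collection and re-serializes it as N-Triples:

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Collection;

import org.openrdf.model.Statement;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.RDFParser;
import org.openrdf.rio.RDFWriter;
import org.openrdf.rio.Rio;
import org.openrdf.rio.helpers.StatementCollector;

public class RioSketch {
    public static void main(String[] args) throws Exception {
        // parse an RDF/XML file into a simple statement collection
        Collection<Statement> statements = new ArrayList<Statement>();
        RDFParser parser = Rio.createParser(RDFFormat.RDFXML);
        parser.setRDFHandler(new StatementCollector(statements));
        parser.parse(new FileInputStream("input.rdf"), "http://example.com/base");

        // re-serialize the collected statements as N-Triples
        RDFWriter writer = Rio.createWriter(RDFFormat.NTRIPLES, System.out);
        writer.startRDF();
        for (Statement st : statements) {
            writer.handleStatement(st);
        }
        writer.endRDF();
    }
}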


Now that we can get things into the graph, let’s add methods for getting information out of the repository. The first method, dumpRDF, is simply an RDF serialization of everything that’s in the repository. The tuplePattern method, like RDFLib's Graph.triples() method, allows us to search for specific patterns of triples in the graph, using null to specify wildcards. Finally, we'll add two methods for running SPARQL queries on the graph. The first runSPARQL method can be used for SPARQL queries of the CONSTRUCT or DESCRIBE form, which return a new graph construction where the format parameter tells the system how the new graph should be returned. The other runSPARQL method can be used for queries of the SELECT form, which return a Java List of solutions and bindings:

    /**
     *  dump RDF graph
     * 
     * @param out output stream for the serialization
     * @param outform the RDF serialization format for the dump
     */
    public void dumpRDF(OutputStream out, RDFFormat outform) {
        try {
            RepositoryConnection con = therepository.getConnection();
            try {
                RDFWriter w = Rio.createWriter(outform, out);
                con.export(w);
            } finally {
                con.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    
    /**
     *  Convenience URI import for RDF/XML sources
     * 
     * @param urlstring absolute URI of the data source
     */
    public void addURI(String urlstring) {
        addURI(urlstring, RDFFormat.RDFXML);
    }
    

    /**
     *  Tuple pattern query - find all statements with the pattern, where null 
     *  is a wildcard 
     * 
     * @param s subject (null for wildcard)
     * @param p predicate (null for wildcard)
     * @param o object (null for wildcard)
     * @return list of statements matching the pattern
     */
    public List tuplePattern(URI s, URI p, Value o) {
        try{
            RepositoryConnection con = therepository.getConnection();
            try {
                RepositoryResult repres = con.getStatements(s, p, o, true);
                ArrayList reslist = new ArrayList();
                while (repres.hasNext()) {
                    reslist.add(repres.next());
                }
                return reslist;
            } finally {
                con.close();
            }
        }catch(Exception e){
            e.printStackTrace();
        }
        return null;
    }
    
    /**
     *  Execute a CONSTRUCT/DESCRIBE SPARQL query against the graph 
     * 
     * @param qs CONSTRUCT or DESCRIBE SPARQL query
     * @param format the serialization format for the returned graph
     * @return serialized graph of results
     */
    public String runSPARQL(String qs, RDFFormat format) {
        try{
            RepositoryConnection con = therepository.getConnection();
            try {
                GraphQuery query = 
                    con.prepareGraphQuery(
                    org.openrdf.query.QueryLanguage.SPARQL, qs);
                StringWriter stringout = new StringWriter();
                RDFWriter w = Rio.createWriter(format, stringout);
                query.evaluate(w);
                return stringout.toString();
            } finally {
                con.close();
            }
        }catch(Exception e){
            e.printStackTrace();
        }
        return null;
    }
    
    /**
     *  Execute a SELECT SPARQL query against the graph 
     * 
     * @param qs SELECT SPARQL query
     * @return list of solutions, each containing a hashmap of bindings
     */
    public List runSPARQL(String qs) {
        try{
            RepositoryConnection con = therepository.getConnection();
            try {
                TupleQuery query = 
                    con.prepareTupleQuery(
                    org.openrdf.query.QueryLanguage.SPARQL, qs);
                TupleQueryResult qres = query.evaluate();
                ArrayList reslist = new ArrayList();
                while (qres.hasNext()) {
                    BindingSet b = qres.next();
                    Set names = b.getBindingNames();
                    HashMap hm = new HashMap();
                    for (Object n : names) {
                        hm.put((String) n, b.getValue((String) n));
                    }
                    reslist.add(hm);
                }
                return reslist;
            } finally {
                con.close();
            }
        }catch(Exception e){
            e.printStackTrace();
        }
        return null;
    }
}


Now we will build a simple test class to exercise the various methods in our SimpleGraph wrapper class. We will load RDF into the repository from a URI, as raw triples, and from a string; then we'll ask for the data using a tuple pattern, two SPARQL queries, and finally a dump of the whole repository.

Create the following class and save it as SimpleTest.java in the same directory as SimpleGraph.java (or download it from http://semprog.com/p...SimpleTest.java):

import java.util.List;

import org.openrdf.model.URI;
import org.openrdf.model.Value;

public class SimpleTest {

    public static void main(String[] args) {
        // a test of graph operations
        SimpleGraph g = new SimpleGraph();
        
        // get LOD from a URI -  Jamie's FOAF profile from Hi5
        g.addURI("http://api.hi5.com/rest/profile/foaf/241087912");
        
        // manually add a triple/statement with a URIref object
        URI s1 = g.URIref("http://semprog.com/people/toby");
        URI p1 = g.URIref(SimpleGraph.RDFTYPE);
        URI o1 = g.URIref("http://xmlns.com/foaf/0.1/Person");
        g.add(s1, p1, o1);
        
        // manually add with an object literal
        URI s2 = g.URIref("http://semprog.com/people/toby");
        URI p2 = g.URIref("http://xmlns.com/foaf/0.1/nick");
        Value o2 = g.Literal("kiwitobes");
        g.add(s2, p2, o2);
        
        // parse a string of RDF and add to the graph
        String rdfstring = "<http://semprog.com/people/jamie> " +
            "<http://xmlns.com/foaf/0.1/nick> \"jt\" .";
        g.addString(rdfstring, SimpleGraph.NTRIPLES);
        
        System.out.println("\n==TUPLE QUERY==\n");
        List rlist = g.tuplePattern(null, 
            g.URIref("http://xmlns.com/foaf/0.1/nick"), null);
        System.out.print(rlist.toString());
        
        // run a SPARQL query - get back solution bindings
        System.out.println("\n==SPARQL SELECT==\n");
        List solutions = g.runSPARQL("SELECT ?who ?nick " +
                "WHERE { " +
                    "?x <http://xmlns.com/foaf/0.1/knows> ?y . " +
                    "?x <http://xmlns.com/foaf/0.1/nick> ?who ." +
                    "?y <http://xmlns.com/foaf/0.1/nick> ?nick ."   +
                "}");
        System.out.println("SPARQL solutions: " + solutions.toString());
        
        // run a CONSTRUCT SPARQL query 
        System.out.println("\n==SPARQL CONSTRUCT==\n");
        String newgraphxml = g.runSPARQL("CONSTRUCT { ?x " +
            "<http://semprog.com/simple#friend> ?nick . } " +
                "WHERE { " +
                    "?x <http://xmlns.com/foaf/0.1/knows> ?y . " +
                    "?x <http://xmlns.com/foaf/0.1/nick> ?who ." +
                    "?y <http://xmlns.com/foaf/0.1/nick> ?nick ."   +
                "}", SimpleGraph.RDFXML);
        System.out.println("SPARQL solutions: \n" + newgraphxml);

        // dump the graph in the specified format
        System.out.println("\n==GRAPH DUMP==\n");
        g.dumpRDF(System.out, SimpleGraph.NTRIPLES);        
    }
}


Sesame is conveniently packaged in a number of different forms for different types of deployments. For our SimpleGraph class, we will use the One-JAR library distribution (openrdf-sesame-2.2.4-onejar.jar at the time of this writing) available from http://www.openrdf.org/download.jsp. Download this JAR and place it in the same directory as the SimpleGraph and SimpleTest classes.

To compile these classes from the command line, type:

$ javac -cp openrdf-sesame-2.2.4-onejar.jar SimpleGraph.java SimpleTest.java


To use the Sesame libraries, you will also need to configure a Java logger. Sesame uses the Simple Logging Facade for Java (SLF4J), which allows you to connect Sesame to your favorite Java logger. The easiest way to get started is to download the latest SLF4J distribution from http://www.slf4j.org/download.html and unpack it. In the distribution you will find the two files slf4j-simple-1.5.6.jar and slf4j-api-1.5.6.jar (version numbers may differ). Copy these JARs to the same directory as the openrdf-sesame-2.2.4-onejar.jar file and the SimpleGraph and SimpleTest classes.

To run the test class from the command line, type:

$ java -cp 
    openrdf-sesame-2.2.4-onejar.jar:slf4j-api-1.5.6.jar:slf4j-simple-1.5.6.jar:. 
    SimpleTest


Note that you may need to adjust the classpath (-cp) file separators as appropriate for your system. On Windows, this would be a semicolon (;) instead of a colon (:).

You can infer the type of a resource from the domain and range of the properties that reference it. While your application could take on this and other inferencing responsibilities itself, you can delegate much of this work to a semantic platform.
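
As a hedged illustration of property-based typing, the following sketch uses the inferencing SimpleGraph from earlier; all the example.com URIs are invented. Declaring a property's rdfs:domain lets the store conclude the type of any subject that uses the property:

public class DomainInferenceSketch {
    public static void main(String[] args) {
        // graph with forward-chaining RDFS inference enabled
        SimpleGraph g = new SimpleGraph(true);
        String triples =
            "<http://example.com/directed> " +
                "<http://www.w3.org/2000/01/rdf-schema#domain> " +
                "<http://example.com/Person> .\n" +
            "<http://example.com/ridley> " +
                "<http://example.com/directed> " +
                "<http://example.com/bladerunner> .";
        g.addString(triples, SimpleGraph.NTRIPLES);
        // the inferencer should now have asserted that ridley is a Person
        System.out.println(g.tuplePattern(
            g.URIref("http://example.com/ridley"),
            g.URIref(SimpleGraph.RDFTYPE),
            g.URIref("http://example.com/Person")));
    }
}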

Sesame can conduct a wide range of inferencing about RDFS type relations. In this example we will see how Sesame can infer the type of an object based on the rdfs:subClassOf relation. This will allow us to avoid writing code where we would have to walk the type hierarchy to see if one class was the parent of another.

Let's start by creating another simple test class called TypeTest. This class will make use of a film ontology built with Protégé. The ontology is available at http://semprog.com/p...m-ontology.owl. The main method will load the ontology, which declares that both the Actor and Director classes are rdfs:subClassOf the Person class.

The ontology file also provides some sample instance data that declares that Harrison_Ford is of rdf:type Actor, whereas Ridley_Scott is of rdf:type Director, but in neither case does it say explicitly that either one is of rdf:type Person. The RDFS inferencer operates on the instance data as it is being loaded into the repository and generates new assertions of rdf:type for each instance's parent type.

Save this class as TypeTest.java (or download it from http://semprog.com/p...8/TypeTest.java):

import java.util.List;

import org.openrdf.model.URI;
import org.openrdf.model.Value;

public class TypeTest {
    public static void main(String[] args) {
        
        // create a graph with type inferencing
        SimpleGraph g = new SimpleGraph(true); 
        
        // load the film schema and the example data
        g.addFile("film-ontology.owl", SimpleGraph.RDFXML);
        
        List solutions = g.runSPARQL("SELECT ?who WHERE { " +
            "?who <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> " +
                "<http://semprog.com/film#Person> ." +
            "}");
        System.out.println("SPARQL solutions: " + solutions.toString());
    }
}


Compile the TypeTest class and run it:

$ javac -cp openrdf-sesame-2.2.4-onejar.jar SimpleGraph.java TypeTest.java
$ java -cp 
    openrdf-sesame-2.2.4-onejar.jar:slf4j-api-1.5.6.jar:slf4j-simple-1.5.6.jar:. 
    TypeTest


If everything goes as planned, we should learn that both Harrison Ford and Ridley Scott are people.

Behavior-Oriented Programming with Elmo

We have said that semantic programming is really about producing actions consistent with a model given some data. Elmo is an extension of Sesame that allows you to focus on this goal without being distracted by the tasks of managing RDF data. By encapsulating the behavior of ontologies, Elmo allows you to write programs at the modeling level, rather than at the RDF triple level.

Elmo uses Java annotations to facilitate the use of patterns such as composition, separation of concerns, and aspect-oriented approaches for mapping behaviors onto models. The Elmo distribution also contains a code generator for turning RDFS and OWL ontology files into Java classes that can be used to drive the Elmo applications. (In the next chapter we'll develop a similar but less complete system in Python that does not use code generation.)

In this example we make use of the ontology class files that come in the Elmo distribution for well-known ontologies such as FOAF. The example creates an Elmo manager that handles connections to the RDF, fetches Tim Berners-Lee's FOAF information, and then iterates through all resources in the file that are of rdf:type foaf:Person. It then uses the Java abstraction of the FOAF ontology to access and print the foaf:names of the people in his file:

import java.net.URL;
import org.openrdf.concepts.foaf.Person;
import org.openrdf.elmo.*;
import org.openrdf.elmo.sesame.*;
import org.openrdf.rio.RDFFormat;

public class ElmoDemo {

    public static void main(String[] args) {
        ElmoModule module = new ElmoModule();
        SesameManagerFactory factory = new SesameManagerFactory(module);
        SesameManager manager = factory.createElmoManager();

        try {
            URL url = new URL("http://www.w3.org/People/Berners-Lee/card.rdf");
            manager.getConnection().add(url, null, RDFFormat.RDFXML);
        } catch (Exception e) {
            e.printStackTrace();
        }
        for (Person person : manager.findAll(Person.class)) {
            System.out.print("Name: ");
            System.out.println(person.getFoafNames());
        }
    }
}


To compile this example, you will need the base elmo, elmo-codegen, and elmo-foaf JARs from the Elmo distribution, the javaassist and persistence-api JARs from the lib directory that comes with that distribution, and the sesame onejar that we used previously. (The command lines have been broken up across multiple lines for readability.)

javac -cp elmo-1.4.jar:elmo-codegen-1.4.jar:elmo-foaf-1.4.jar:javaassist-3.7.ga.jar:
    persistence-api-1.0.jar:openrdf-sesame-2.2.4-onejar.jar:. ElmoDemo.java


Running this will list the foaf:name(s) of all the people of rdf:type foaf:Person (based on the Person.class):

java -cp elmo-1.4.jar:elmo-codegen-1.4.jar:elmo-foaf-1.4.jar:javaassist-3.7.ga.jar:
    persistence-api-1.0.jar:openrdf-sesame-2.2.4-onejar.jar:. ElmoDemo


A Servlet Container for the Sesame Server

Now that we have seen how Sesame can be embedded in an application, let's set up Sesame as a standalone server. This will give us additional flexibility as we embark on more sophisticated projects, and will allow us to access Sesame using other programming languages.

You can skip this section if you already have a Java Servlet container like Tomcat or Jetty that supports the Java Servlet API 2.4 and JSP 2.0 or newer. If you don't have one yet, we recommend using Jetty, as it has a very simple installation and will get you up and running quickly.

The official site for Jetty is http://jetty.mortbay.org/. You can download the latest distribution at http://dist.codehaus.org/jetty/. At the time of writing, the latest version is 6.1.14, but you're probably safe just downloading the latest release (not prerelease) version. The download will be a ZIP file, which you should unzip wherever you want to run Jetty—there is no installation procedure.

Now, go to the directory where you unzipped Jetty and type java -jar start.jar. You should see something like this:

$ java -jar start.jar 
2009-01-05 21:21:55.346::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2009-01-05 21:21:55.488::INFO:  jetty-6.1.14
...
2009-01-05 21:21:59.636::INFO:  Opened 
    /Users/bbts/writing/jetty-install/jetty-6.1.14/logs/2009_01_06.request.log
2009-01-05 21:21:59.651::INFO:  Started SelectChannelConnector@0.0.0.0:8080


Visit http://localhost:8080 in your web browser and you should see a welcome page confirming that you successfully installed Jetty. Go back to your prompt and hit Ctrl-C to stop it.

Installing the Sesame Web Application

The Sesame servlets are also very simple to install. The download page for Sesame is http://www.openrdf.org/download.jsp. Download the archive of the latest version (we're using version 2.2.3, openrdf-sesame-2.2.3-sdk.zip) and extract it.

There should be a directory within the archive called war. This directory contains two files, openrdf-sesame.war and openrdf-workbench.war, which are web archives for your Java server. Simply copy these files into the webapps directory of your Jetty (or Tomcat) installation. When you restart Jetty, you should see a message telling you that these two WAR files were detected and installed as new applications.

The Workbench

As mentioned earlier, one of the coolest things about Sesame is that it comes with a very functional administration interface. You can access this interface by visiting http://localhost:808...enrdf-workbench in your web browser, which should look something like the image below, "The OpenRDF Sesame workbench".

The OpenRDF Sesame workbench


The menu items on the left side allow you to access administration pages to create new data repositories (graphs), add RDF data, then explore and query the data. This section will walk you through setting up a repository and filling it with movie data. To start, click "New Repository" and you'll see a form like the image below, "Creating a movie repository".

Creating a movie repository


There are three fields to fill in here: the ID, which is just a short name for the repository and the one you'll refer to later when accessing the repository through the API; the title, which is a longer description; and the type, which is the storage mechanism for this repository. There are nine options for the type:

  • In-Memory Store
  • In-Memory Store RDF Schema
  • In-Memory Store RDF Schema and Direct Type Hierarchy
  • Native Java Store
  • Native Java Store RDF Schema
  • Native Java Store RDF Schema and Direct Type Hierarchy
  • MySQL RDF Store
  • PostgreSQL RDF Store
  • Remote RDF Store


Options 1–3 are different kinds of In-Memory Store. This is the fastest type, since the entire graph is kept in memory. Persistence (keeping the graph between restarts of the server) is optional but works well—the graph is saved to disk frequently and loaded completely into memory when the Sesame server is started. The main drawback is that you need to guarantee that the graph will never grow larger than the memory available to the server. This may seem like a severe restriction, but remember that graphs of hundreds of thousands of triples can easily fit in memory on modern machines, so this is feasible for many applications.

The difference between options 1, 2, and 3 is whether they support the various types of ontology inferencing. Since more sophisticated inferencing uses more resources, you're given the choice of how much you need. Option 2 provides for RDF Schema inferencing, which can infer object types that aren't directly specified using properties, and option 3 adds type hierarchy inferencing on top of that, which can infer even more about types by using a specified class hierarchy.

Options 4–6 are similar to 1–3, but they store all the data on disk in a Sesame-specific format. This obviously reduces performance, but it removes the restriction that the entire graph must fit in memory. In benchmarks, this type usually performs better than the MySQL and PostgreSQL options (7 and 8). Again, you have the choice to include inferencing at the level you need it.

Using options 7 and 8, you can store your graph in a currently running MySQL or PostgreSQL server. Although these are usually slightly slower than the native Java store, they have the advantage of using your existing relational databases, which is great if you work at an organization that already has people who back up and maintain these databases. Since the native Sesame store isn't something that most people are familiar with, it's probably lower-risk to go with a relational database if you work with people who already deal with them.

Finally, option 9 lets you point to a repository that is kept on a different Sesame server, and expose it through this server. This is useful if you have one or more instances of your repository on machines that you don't want people connecting to directly, either for security purposes or because you want to be able to change the location of the backend repository without reconfiguring the client applications.
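
Whichever type you choose, client applications can reach the repository through the same Repository interface we used in the embedded examples. Here is a minimal sketch, assuming Sesame 2.x's HTTPRepository and the server URL and "Movies" repository ID used in this section:

import org.openrdf.repository.Repository;
import org.openrdf.repository.http.HTTPRepository;

public class RemoteRepositorySketch {
    public static void main(String[] args) throws Exception {
        // points at the "Movies" repository on the local Sesame server
        Repository remote =
            new HTTPRepository("http://localhost:8080/openrdf-sesame", "Movies");
        remote.initialize();
        // getConnection() now behaves just as it did for the embedded
        // SailRepository inside SimpleGraph
    }
}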

For now, create a native Java store repository with the ID "Movies", as shown earlier in the image above, "Creating a movie repository".

Adding Data

Once you've created the repository, the first thing you'll need to do is get some data into it. Sesame lets you do this by uploading a file, pointing to a URL, or directly typing/pasting RDF statements. Clicking on "Add" in the menu on the left side of the admin interface will reveal an interface similar to the one shown in the image below, "Adding data to the movie repository".

Adding data to the movie repository


We're going to add the movie data. To get the data, you can download http://semprog.com/p.../iva_movies.xml, or you can generate the data file from iva_movies.py like this:

$ python iva_movies.py > iva_movies.xml


Make sure the data format is set to "RDF/XML" and click "Browse..." to locate iva_movies.xml. After you've selected it, you'll notice that the Base URI and Context fields automatically get filled in. Base URI is the namespace that will be used for nodes in your file that don't have a namespace specified (generally, and in this case, there aren't any nodes without specified namespaces).

Context is something we haven't come across before—it is stored along with all the triples in the file so that you know where they come from. If the same triple is added with a different context (perhaps you upload a different file), there will be two copies with separate contexts. This allows you to remove all the triples from one file if, for example, you determine that the data in it is bad, without removing identical triples that came from sources that are still considered good. This is another clear win over traditional data modeling, which usually doesn't record the origin of links between tables, much less allow multiple links with different origins.
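
Contexts are also available when you drive Sesame programmatically. Here is a hypothetical sketch against the Sesame 2.x repository connection API; the context URI is invented, and the file name matches the upload example above:

import java.io.File;

import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.rio.RDFFormat;

public class ContextSketch {
    static void loadAndClear(RepositoryConnection con) throws Exception {
        ValueFactory vf = con.getValueFactory();
        URI ctx = vf.createURI("http://semprog.com/contexts/iva");
        // tag every triple parsed from the file with the context
        con.add(new File("iva_movies.xml"), "", RDFFormat.RDFXML, ctx);
        // ...and later remove everything that came from that source
        con.clear(ctx);
    }
}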

Click on "Upload" to add your data to the repository.

SPARQL Queries

The Sesame workbench also allows you to query the repository using a web form. This is excellent for checking the data, answering ad hoc questions, and testing queries for use in your applications. Clicking on "Query" in the left menu bar will bring up a page like the one shown in the image below, "Querying the movie store".

Querying the movie store


When you uploaded the RDF file, it included some namespace prefixes. These have now become some of the repository's default prefixes. When the query form appears, the prefixes are automatically included in the main text box, which means you can type your SPARQL query without having to define all the namespaces every time. In the image above, "Querying the movie store", there's a query for all of the films that have performances by an actor named "John Malkovich". If you enter this query and click "Execute", you'll be taken to a new page with the results, shown in the image below, "Results of the query". SPARQL SELECT query results are formatted into very clean tables with all the requested fields as headers. The fields themselves are all links that you can click on to see everything in the repository around a specific node.

Results of the query


If you like, you can try the following more complex query, which shows all the people who have costarred in a film with John Malkovich:

PREFIX fb:<http://rdf.freebase.com/ns/>
PREFIX dc:<http://purl.org/dc/elements/1.1/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?costar ?fn WHERE {?film fb:film.film.performances ?p1 .
                   ?film dc:title ?fn .
                   ?p1 fb:film.performance.actor ?a1 . 
                   ?a1 dc:title "John Malkovich".
                   ?film fb:film.film.performances ?p2 .
                   ?p2 fb:film.performance.actor ?a2 . 
                   ?a2 dc:title ?costar .}


We recommend that you become familiar with the other features of the workbench. You can explore your data by node type, namespace, context, or through queries. You can add data by typing N3 notation directly, and you can remove triples by specifying any combination of subject, predicate, object, or context. For example, you could decide that IVA messed up its John Malkovich data and remove anything that came from the IVA context about John Malkovich.

REST API

To actually use Sesame in an application, you'll want to be able to access it from other applications. This is usually done through the Sesame REST API, which is documented at http://www.openrdf.o...ystem/ch08.html. Like many other REST APIs, calls are made by passing parameters in a URL through a regular HTTP request, and the server returns the result in a machine-readable format. Sesame returns its results in JSON, which is great because that's very easy to parse in Python.

For example, you could turn the previous query into a REST request:

http://<server>/openrdf-sesame/repositories/Movies?
    query=select+?fn+where+%7B?film+fb:film.film.performances+?p1+.+?film ...etc...


and get a set of JSON with variable bindings:

{"headers": ["costar", "fn"],
 "data": [{"costar": {"type": "literal", "value": "Angelina Jolie"}, 
           "fn": {"type": "literal", "value": "CHANGELING"}, 
          {"costar": {"type": "literal", "value": "Gillian Jacobs"}, 
           "fn": {"type": "literal", "value": "GARDENS OF THE NIGHT"},
...


To make this easy to use from Python, we've created a simple module called pysesame.py that wraps the REST API. You can download this module from http://semprog.com/p...er8/pysesame.py. The code should be easy enough to translate into other languages if you're not planning your project in Python. Here's what it looks like:

from urllib import urlopen,quote_plus
from simplejson import loads

class connection:
    def __init__(self,url):
        self.baseurl = url
        self.sparql_prefix = ""
    
    def addnamespace(self, id, ns):
        self.sparql_prefix += 'PREFIX %s:<%s>\n' % (id,ns)
    
    def __getsparql__(self, method):
        data = urlopen(self.baseurl + method).read()
        try:
            result = loads(data)['results']['bindings']
            return result
        except:
            return [{'error': data}]
    
    def repositories(self):
        return self.__getsparql__('repositories')
        
    def use_repository(self, r):
        self.repository = r
    
    def query(self, q):
        q = ('repositories/' + self.repository + '?query=' +
             quote_plus(self.sparql_prefix + q))
        return self.__getsparql__(q)


This module defines a class called connection, which represents a connection to a Sesame store. In reality, however, there is no persistent connection—it's just a wrapper to store settings for making the REST requests. The connection class is initialized with the name of the server. After initialization, you can call repositories to see a list of repositories on that server, use_repository to choose one, addnamespace to define namespaces, and finally query to query the database using SPARQL.

The module also includes an example in its main method so you can see it in action:

if __name__ == '__main__':
    c = connection('http://localhost:8080/openrdf-sesame/')
    c.use_repository('Movies')
    c.addnamespace('fb','http://rdf.freebase.com/ns/')
    c.addnamespace('dc','http://purl.org/dc/elements/1.1/')
    res = c.query("""SELECT ?costar ?fn WHERE {?film fb:film.film.performances ?p1 .
                     ?film dc:title ?fn .
                     ?p1 fb:film.performance.actor ?a1 . 
                     ?a1 dc:title "John Malkovich".
                     ?film fb:film.film.performances ?p2 .
                     ?p2 fb:film.performance.actor ?a2 . 
                     ?a2 dc:title ?costar .}""")
    for r in res: print r


Running pysesame.py from the command line will show you the results of this sample query:

$ python pysesame.py
{u'costar': {u'type': u'literal', u'value': u'Gillian Jacobs'}, 
 u'fn': {u'type': u'literal', u'value': u'GARDENS OF THE NIGHT'}}
{u'costar': {u'type': u'literal', u'value': u'Ryan Simpkins'}, 
 u'fn': {u'type': u'literal', u'value': u'GARDENS OF THE NIGHT'}}
{u'costar': {u'type': u'literal', u'value': u'John Malkovich'}, 
 u'fn': {u'type': u'literal', u'value': u'GARDENS OF THE NIGHT'}}
 


If you want to try some other queries, you can either modify the main method in pysesame or, more practically, create a new Python file and just import the connection class at the beginning, like this:

from pysesame import connection


We'll be using pysesame a lot more in Chapter 10, Tying It All Together, of Programming the Semantic Web, when we build a real application. For now, just make sure you're familiar with the workbench, and try importing some other RDF files and querying them through the REST API.

Programming the Semantic Web

Learn more about this topic from Programming the Semantic Web.

With this book, the promise of the semantic web -- in which machines can find, share, and combine data on the Web -- is not just a technical possibility, but a practical reality. Programming the Semantic Web demonstrates several ways to implement semantic web applications, using existing standards and patterns as well as technologies recently introduced. Each chapter walks you through a single piece of semantic technology, and explains how to use it to solve real problems.


