
SAP HANA Cloud Knowledge Graph Engine

SAP HANA Cloud Knowledge Graph is a fully integrated knowledge graph solution within the SAP HANA Cloud database.

This example demonstrates how to build a QA (Question-Answering) chain that queries Resource Description Framework (RDF) data stored in an SAP HANA Cloud instance using the SPARQL query language, and returns a human-readable response.

SPARQL is the standard query language for querying RDF graphs.

Setup & Installation

Prerequisites:
You must have an SAP HANA Cloud instance with the triple store feature enabled.
For detailed instructions, refer to: Enable Triple Store
Load the kgdocu_movies example data. See Knowledge Graph Example.

To use SAP HANA Knowledge Graph Engine and/or Vector Store Engine with LangChain, install the langchain-hana package:

pip install -U langchain-hana

First, create a connection to your SAP HANA Cloud instance.

import os

from dotenv import load_dotenv
from hdbcli import dbapi

# Load environment variables if needed
load_dotenv()

# Establish connection to SAP HANA Cloud
connection = dbapi.connect(
    address=os.environ.get("HANA_DB_ADDRESS"),
    port=os.environ.get("HANA_DB_PORT"),
    user=os.environ.get("HANA_DB_USER"),
    password=os.environ.get("HANA_DB_PASSWORD"),
    autocommit=True,
    sslValidateCertificate=False,  # skips TLS certificate validation; enable it outside of demos
)
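If one of these variables is unset, os.environ.get returns None, and the problem only appears later as a connection error. The following minimal sketch fails early instead. It assumes the same four variable names. hana_connection_kwargs is a hypothetical helper and is not part of hdbcli or langchain-hana. Converting the port to an integer is a defensive choice, not a documented requirement.

```python
import os
from typing import Mapping, Optional

# Same variable names as the connection example above.
REQUIRED_VARS = ("HANA_DB_ADDRESS", "HANA_DB_PORT", "HANA_DB_USER", "HANA_DB_PASSWORD")


def hana_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for dbapi.connect, failing early on missing settings.

    Hypothetical helper; reads os.environ unless another mapping is passed.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "address": env["HANA_DB_ADDRESS"],
        "port": int(env["HANA_DB_PORT"]),  # environment values are always strings
        "user": env["HANA_DB_USER"],
        "password": env["HANA_DB_PASSWORD"],
        "autocommit": True,
    }
```

Usage: connection = dbapi.connect(**hana_connection_kwargs(), sslValidateCertificate=False)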

Example: Question Answering over a “Movies” Knowledge Graph

Below we’ll:

  1. Instantiate the HanaRdfGraph pointing at our “movies” data graph
  2. Wrap it in a HanaSparqlQAChain powered by an LLM
  3. Ask natural-language questions and print out the chain’s responses

This demonstrates how the LLM generates SPARQL under the hood, executes it against SAP HANA, and returns a human-readable answer.

This example uses the sap-ai-sdk-gen package, which is currently still installed via:

pip install "generative-ai-hub-sdk[all]"

Please check sap-ai-sdk-gen for future releases.

from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from langchain_hana import HanaRdfGraph, HanaSparqlQAChain

# from langchain_openai import ChatOpenAI # or your chosen LLM
# Set up the Knowledge Graph
graph_uri = "kgdocu_movies"

graph = HanaRdfGraph(
    connection=connection, graph_uri=graph_uri, auto_extract_ontology=True
)
# A basic graph schema is extracted from the data graph.
# This schema guides the LLM to generate a proper SPARQL query.
print(graph.get_schema)
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://kg.demo.sap.com/acted_in> a owl:ObjectProperty ;
    rdfs:label "acted_in" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range <http://kg.demo.sap.com/Film> .

<http://kg.demo.sap.com/dateOfBirth> a owl:DatatypeProperty ;
    rdfs:label "dateOfBirth" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range xsd:dateTime .

<http://kg.demo.sap.com/directed> a owl:ObjectProperty ;
    rdfs:label "directed" ;
    rdfs:domain <http://kg.demo.sap.com/Director> ;
    rdfs:range <http://kg.demo.sap.com/Film> .

<http://kg.demo.sap.com/genre> a owl:ObjectProperty ;
    rdfs:label "genre" ;
    rdfs:domain <http://kg.demo.sap.com/Film> ;
    rdfs:range <http://kg.demo.sap.com/Genre> .

<http://kg.demo.sap.com/placeOfBirth> a owl:ObjectProperty ;
    rdfs:label "placeOfBirth" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range <http://kg.demo.sap.com/Place> .

<http://kg.demo.sap.com/title> a owl:DatatypeProperty ;
    rdfs:label "title" ;
    rdfs:domain <http://kg.demo.sap.com/Film> ;
    rdfs:range xsd:string .

rdfs:label a owl:DatatypeProperty ;
    rdfs:label "label" ;
    rdfs:domain <http://kg.demo.sap.com/Actor>,
        <http://kg.demo.sap.com/Director>,
        <http://kg.demo.sap.com/Genre>,
        <http://kg.demo.sap.com/Place> ;
    rdfs:range xsd:string .

<http://kg.demo.sap.com/Director> a owl:Class ;
    rdfs:label "Director" .

<http://kg.demo.sap.com/Genre> a owl:Class ;
    rdfs:label "Genre" .

<http://kg.demo.sap.com/Place> a owl:Class ;
    rdfs:label "Place" .

<http://kg.demo.sap.com/Actor> a owl:Class ;
    rdfs:label "Actor" .

<http://kg.demo.sap.com/Film> a owl:Class ;
    rdfs:label "Film" .

# Initialize the LLM
llm = ChatOpenAI(proxy_model_name="gpt-4o", temperature=0)

# Create a SPARQL QA Chain
chain = HanaSparqlQAChain.from_llm(
    llm=llm,
    verbose=True,
    allow_dangerous_requests=True,
    graph=graph,
)
# Other example questions:
# output = chain.invoke("Which movies are in the data?")
# output = chain.invoke("In which movies did Keanu Reeves and Carrie-Anne Moss play together?")
# output = chain.invoke("Which movie genres are in the data?")
# output = chain.invoke("Which are the two most assigned movie genres?")
# output = chain.invoke('Where were the actors of "Blade Runner" born?')
# output = chain.invoke("Which actors acted together in a movie and were born in the same city?")
output = chain.invoke("which actors acted in Blade Runner?")

print(output["result"])


> Entering new HanaSparqlQAChain chain...
Generated SPARQL:
```
PREFIX kg: <http://kg.demo.sap.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?actor ?actorLabel
WHERE {
?movie rdf:type kg:Film .
?movie kg:title ?movieTitle .
?actor kg:acted_in ?movie .
?actor rdfs:label ?actorLabel .
FILTER(?movieTitle = "Blade Runner")
}
```
Final SPARQL:

PREFIX kg: <http://kg.demo.sap.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?actor ?actorLabel

FROM <kgdocu_movies>
WHERE {
?movie rdf:type kg:Film .
?movie kg:title ?movieTitle .
?actor kg:acted_in ?movie .
?actor rdfs:label ?actorLabel .
FILTER(?movieTitle = "Blade Runner")
}

Full Context:
actor,actorLabel
http://www.wikidata.org/entity/Q81328,Q81328
http://www.wikidata.org/entity/Q723780,Brion James
http://www.wikidata.org/entity/Q1372770,William Sanderson
http://www.wikidata.org/entity/Q498420,M. Emmet Walsh
http://www.wikidata.org/entity/Q358990,James Hong
http://www.wikidata.org/entity/Q211415,Edward James Olmos
http://www.wikidata.org/entity/Q1691628,Joe Turkel
http://www.wikidata.org/entity/Q3143555,Hy Pyke
http://www.wikidata.org/entity/Q207596,Daryl Hannah
http://www.wikidata.org/entity/Q213574,Rutger Hauer
http://www.wikidata.org/entity/Q236702,Joanna Cassidy
http://www.wikidata.org/entity/Q1353691,Morgan Paull
http://www.wikidata.org/entity/Q230736,Sean Young


> Finished chain.
The actors who acted in Blade Runner are Brion James, William Sanderson, M. Emmet Walsh, James Hong, Edward James Olmos, Joe Turkel, Hy Pyke, Daryl Hannah, Rutger Hauer, Joanna Cassidy, Morgan Paull, and Sean Young.

What’s happening under the hood?

  1. SPARQL Generation
    The chain invokes the LLM with your Turtle-formatted ontology (graph.get_schema) and the user’s question using the SPARQL_GENERATION_SELECT_PROMPT. The LLM is then expected to emit a SELECT query tailored to your schema.

  2. Pre-processing & Execution

    • Extract & clean: Pull the raw SPARQL text out of the LLM’s response.
    • Inject graph context: Add FROM <graph_uri> if it’s missing and ensure common prefixes (rdf:, rdfs:, owl:, xsd:) are declared.
    • Run on HANA: Execute the finalized query via HanaRdfGraph.query() over your named graph.
  3. Answer Formulation
    The returned CSV (or Turtle) results are passed to the LLM again, this time with the SPARQL_QA_PROMPT. The LLM is instructed to produce a concise, human-readable answer based only on the retrieved data. This reduces the risk of hallucination but does not eliminate it.
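The following sketch illustrates the pre-processing step. It is not the langchain-hana implementation, and the helper names are hypothetical. The regular expressions are a simplification that only handles plain queries with an explicit WHERE keyword.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep the listing readable

COMMON_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def extract_sparql(llm_output: str) -> str:
    """Return the text inside the first Markdown code fence, or the whole output."""
    pattern = FENCE + r"(?:sparql)?\s*(.*?)" + FENCE
    match = re.search(pattern, llm_output, re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else llm_output).strip()


def add_from_clause(query: str, graph_uri: str) -> str:
    """Insert FROM <graph_uri> before the first WHERE unless a FROM clause exists."""
    if re.search(r"\bFROM\s*<", query, re.IGNORECASE):
        return query
    return re.sub(r"\bWHERE\b", f"FROM <{graph_uri}>\nWHERE", query, count=1, flags=re.IGNORECASE)


def add_missing_prefixes(query: str) -> str:
    """Prepend PREFIX lines for common prefixes that are used but not declared."""
    declared = {name.lower() for name in re.findall(r"PREFIX\s+(\w+)\s*:", query, re.IGNORECASE)}
    missing = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in COMMON_PREFIXES.items()
        if name not in declared and re.search(rf"\b{name}:", query)
    ]
    return "\n".join(missing + [query])


raw = (
    "Here is the query:\n" + FENCE + "\n"
    "PREFIX kg: <http://kg.demo.sap.com/>\n"
    "SELECT ?t WHERE { ?m rdf:type kg:Film . ?m kg:title ?t . }\n" + FENCE
)
final = add_missing_prefixes(add_from_clause(extract_sparql(raw), "kgdocu_movies"))
print(final)
```

In this example, rdf: is used but not declared, so only the rdf prefix is added. FROM <kgdocu_movies> is placed before WHERE, which matches the shape of the "Final SPARQL" log shown earlier.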

Initialize the HanaRdfGraph

To power the QA chain, you first need a HanaRdfGraph instance that:

  1. Loads your ontology schema (in Turtle)
  2. Executes SPARQL queries against your SAP HANA Cloud data graph

The constructor requires:

  • connection: an active hdbcli.dbapi.connect(...) instance
  • graph_uri: the named graph (or "DEFAULT") where your RDF data lives
  • One of:
    1. ontology_query: a SPARQL CONSTRUCT query to extract schema triples
    2. ontology_uri: a hosted ontology graph URI
    3. ontology_local_file + ontology_local_file_format: a local Turtle/RDF file
    4. auto_extract_ontology=True (not recommended for production; see the note below)
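The "one of" rule can be expressed as a small check. This is a hypothetical sketch, not the actual validation in the HanaRdfGraph constructor. It assumes that exactly one source must be given. Following the list above, it also assumes that ontology_local_file must be paired with ontology_local_file_format.

```python
from typing import Optional


def choose_ontology_source(
    ontology_query: Optional[str] = None,
    ontology_uri: Optional[str] = None,
    ontology_local_file: Optional[str] = None,
    ontology_local_file_format: Optional[str] = None,
    auto_extract_ontology: bool = False,
) -> str:
    """Return the name of the single ontology source that was supplied.

    Hypothetical check mirroring the "one of" rule; not HanaRdfGraph's own code.
    """
    candidates = {
        "ontology_query": ontology_query,
        "ontology_uri": ontology_uri,
        "ontology_local_file": ontology_local_file,
        "auto_extract_ontology": auto_extract_ontology,
    }
    provided = [name for name, value in candidates.items() if value]
    if len(provided) != 1:
        raise ValueError(f"Expected exactly one ontology source, got {provided or 'none'}")
    if provided[0] == "ontology_local_file" and not ontology_local_file_format:
        # Assumption: a local file must come with its RDF serialization format.
        raise ValueError("ontology_local_file requires ontology_local_file_format")
    return provided[0]


print(choose_ontology_source(ontology_uri="<your-rdf-ontology-uri>"))  # ontology_uri
```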

graph_uri vs. Ontology

  • graph_uri:
    The named graph in your SAP HANA Cloud instance that contains your instance data (sometimes 100k+ triples). If None or "DEFAULT" is provided, the default graph is used.
    ➔ More details: Default Graph and Named Graphs
  • Ontology: a lean schema (typically ~50-100 triples) describing classes, properties, domains, ranges, labels, comments, and subclass relationships. The ontology guides SPARQL generation and result interpretation.
  1. Custom SPARQL CONSTRUCT query (ontology_query): Use a custom CONSTRUCT query to selectively extract schema triples.
ontology_query = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
    ?cls rdf:type owl:Class .
    ?cls rdfs:label ?clsLabel .
    ?rel rdf:type ?propertyType .
    ?rel rdfs:label ?relLabel .
    ?rel rdfs:domain ?domain .
    ?rel rdfs:range ?range .
}
FROM <kgdocu_movies>
WHERE {
    {  # get properties
        SELECT DISTINCT ?domain ?rel ?relLabel ?propertyType ?range
        WHERE {
            ?subj ?rel ?obj .
            ?subj a ?domain .
            OPTIONAL { ?obj a ?rangeClass . }
            FILTER(?rel != rdf:type)
            BIND(IF(isIRI(?obj) = true, owl:ObjectProperty, owl:DatatypeProperty) AS ?propertyType)
            BIND(COALESCE(?rangeClass, DATATYPE(?obj)) AS ?range)
            BIND(STR(?rel) AS ?uriStr)  # convert the IRI to a string
            BIND(REPLACE(?uriStr, "^.*[/#]", "") AS ?relLabel)
        }
    }
    UNION
    {  # get classes
        SELECT DISTINCT ?cls ?clsLabel
        WHERE {
            ?instance a/rdfs:subClassOf* ?cls .
            FILTER (isIRI(?cls)) .
            BIND(STR(?cls) AS ?uriStr)  # convert the IRI to a string
            BIND(REPLACE(?uriStr, "^.*[/#]", "") AS ?clsLabel)
        }
    }
}
"""
graph = HanaRdfGraph(
    connection=connection, graph_uri=graph_uri, ontology_query=ontology_query
)
  2. Remote ontology URI (ontology_uri): Load the schema directly from a hosted graph URI.
graph = HanaRdfGraph(
    connection=connection,
    graph_uri=graph_uri,
    ontology_uri="<your-rdf-ontology-uri>",
)
  3. Local RDF file (ontology_local_file + ontology_local_file_format): Load the schema from a local RDF ontology file. Supported RDF formats are Turtle, RDF/XML, JSON-LD, N-Triples, Notation3, TriG, TriX, and N-Quads.
graph = HanaRdfGraph(
    connection=connection,
    graph_uri=graph_uri,
    ontology_local_file="ontology_file.ttl",
    ontology_local_file_format="turtle",
)
  4. Auto-extract ontology (auto_extract_ontology=True): Infer schema information directly from your instance data.
graph_with_auto_extracted_ontology = HanaRdfGraph(
    connection=connection, graph_uri=graph_uri, auto_extract_ontology=True
)

Note: Auto-extraction is not recommended for production, because it generally omits important triples such as rdfs:label, rdfs:comment, and rdfs:subClassOf.
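The labels in the schema printed earlier (for example "acted_in") and those produced by the custom CONSTRUCT query are IRI local names, not curated labels. The sketch below re-creates two of the query's BIND expressions in plain Python to show what they yield. The function names are hypothetical.

```python
import re
from typing import Tuple


def local_name(iri: str) -> str:
    """Python equivalent of REPLACE(STR(?iri), "^.*[/#]", ""): text after the last / or #."""
    return re.sub(r"^.*[/#]", "", iri)


def describe_property(predicate_iri: str, object_is_iri: bool) -> Tuple[str, str]:
    """Label and property type, following the BIND(IF(isIRI(?obj) ...)) rule."""
    prop_type = "owl:ObjectProperty" if object_is_iri else "owl:DatatypeProperty"
    return local_name(predicate_iri), prop_type


print(describe_property("http://kg.demo.sap.com/acted_in", True))
# ('acted_in', 'owl:ObjectProperty')
print(describe_property("http://www.w3.org/2000/01/rdf-schema#label", False))
# ('label', 'owl:DatatypeProperty')
```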

Executing SPARQL Queries

You can use the query() method to execute arbitrary SPARQL queries (SELECT, ASK, CONSTRUCT, etc.) on the data graph.

The following query retrieves the top 10 movies with the highest number of actors:

query = """
PREFIX kg: <http://kg.demo.sap.com/>
SELECT ?movieTitle (COUNT(?actor) AS ?actorCount)
FROM <kgdocu_movies>
WHERE {
    ?actor kg:acted_in ?movie .
    ?movie kg:title ?movieTitle .
}
GROUP BY ?movieTitle
ORDER BY DESC(?actorCount)
LIMIT 10
"""
top10 = graph.query(query)
print(top10)
movieTitle,actorCount
The Matrix Reloaded,39
The Matrix,19
Blade Runner,13
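The printed result is CSV text with one column per SELECT variable. Assuming graph.query returns that text as a string (as the print output suggests), Python's csv module can turn it into rows. The literal below is a copy of the output above.

```python
import csv
import io

# Copy of the printed output above; in the notebook you would pass top10 directly.
top10_csv = """movieTitle,actorCount
The Matrix Reloaded,39
The Matrix,19
Blade Runner,13
"""


def parse_select_result(csv_text: str) -> list:
    """Turn CSV SELECT results into one dict per row, keyed by variable name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = parse_select_result(top10_csv)
# CSV values are strings, so convert the counts explicitly.
actor_counts = {row["movieTitle"]: int(row["actorCount"]) for row in rows}
print(actor_counts)
# {'The Matrix Reloaded': 39, 'The Matrix': 19, 'Blade Runner': 13}
```

The csv module also handles quoted values, such as titles that contain commas, which naive string splitting would break.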

Question Answering with HanaSparqlQAChain

HanaSparqlQAChain ties together:

  1. Schema-aware SPARQL generation
  2. Query execution against SAP HANA
  3. Natural-language answer formatting

Initialization

You need:

  • An LLM to generate and interpret queries
  • A HanaRdfGraph (with connection, graph_uri, and ontology)
qa_chain = HanaSparqlQAChain.from_llm(
    llm=llm, graph=graph, allow_dangerous_requests=True, verbose=True
)

Pipeline Overview

  1. SPARQL Generation
    • Uses SPARQL_GENERATION_SELECT_PROMPT
    • Inputs:
      • schema (Turtle from graph.get_schema)
      • prompt (user’s question)
  2. Query Post-processing
    • Extract the SPARQL code from the LLM output
    • Inject FROM <graph_uri> if missing
    • Ensure required common prefixes are declared (rdf:, rdfs:, owl:, xsd:)
  3. Execution
    • Calls graph.query(generated_sparql)
  4. Answer Formulation
    • Uses SPARQL_QA_PROMPT
    • Inputs:
      • context (raw query results)
      • prompt (original question)
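The four stages can be traced with stand-in functions. In this sketch, the prompt templates, the fake LLM, and run_pipeline itself are hypothetical. They show only which inputs (schema, prompt, context) feed which call, not how HanaSparqlQAChain is implemented.

```python
from typing import Callable

# Hypothetical stand-ins for SPARQL_GENERATION_SELECT_PROMPT and SPARQL_QA_PROMPT.
GENERATION_TEMPLATE = "Schema:\n{schema}\n\nWrite a SPARQL SELECT query for: {prompt}"
QA_TEMPLATE = "Query results:\n{context}\n\nAnswer using only these results: {prompt}"


def run_pipeline(
    question: str,
    schema: str,
    llm: Callable[[str], str],
    postprocess: Callable[[str], str],
    run_query: Callable[[str], str],
) -> str:
    """Trace the four stages: generate, post-process, execute, answer."""
    generated = llm(GENERATION_TEMPLATE.format(schema=schema, prompt=question))  # 1. generation
    final_sparql = postprocess(generated)  # 2. extract SPARQL, add FROM and prefixes
    context = run_query(final_sparql)  # 3. stands in for graph.query(...)
    return llm(QA_TEMPLATE.format(context=context, prompt=question))  # 4. answer


# Stub LLM: returns a query for the generation prompt and an answer otherwise.
def fake_llm(text: str) -> str:
    if text.startswith("Schema:"):
        return "SELECT ?title WHERE { ?film <http://kg.demo.sap.com/title> ?title }"
    return "The data contains: " + text.splitlines()[2]


answer = run_pipeline(
    "Which movies are in the data?",
    schema="(Turtle schema from graph.get_schema)",
    llm=fake_llm,
    postprocess=str.strip,
    run_query=lambda sparql: "title\nBlade Runner\n",
)
print(answer)  # The data contains: Blade Runner
```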

Prompt Templates

"SPARQL Generation" prompt​

The sparql_generation_prompt is used to guide the LLM in generating a SPARQL query from the user question and the provided schema.

Answering prompt

The qa_prompt instructs the LLM to create a natural language answer based solely on the database results.

The default prompts are defined in prompts.py.

Customizing Prompts

You can override the defaults at initialization:

qa_chain = HanaSparqlQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    verbose=True,
    sparql_generation_prompt=YOUR_SPARQL_PROMPT,
    qa_prompt=YOUR_QA_PROMPT,
)

Or swap them afterward:

qa_chain.sparql_generation_chain.prompt = YOUR_SPARQL_PROMPT
qa_chain.qa_chain.prompt = YOUR_QA_PROMPT
  • sparql_generation_prompt must have the input variables: ["schema", "prompt"]
  • qa_prompt must have the input variables: ["context", "prompt"]
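One way to confirm that a custom template exposes exactly these variables is to list its placeholders with Python's string.Formatter. The template texts below are illustrative only. You could then wrap these strings with PromptTemplate.from_template from langchain_core.prompts, which infers input variables from the same placeholders.

```python
import string


def template_variables(template: str) -> set:
    """Collect the {placeholder} names of a str.format-style template."""
    return {
        field_name
        for _literal, field_name, _spec, _conversion in string.Formatter().parse(template)
        if field_name
    }


# Illustrative custom templates; literal SPARQL braces are doubled.
YOUR_SPARQL_TEMPLATE = (
    "Write a SPARQL SELECT query using only this schema:\n{schema}\n"
    "Wrap triple patterns in WHERE {{ ... }}.\n"
    "Question: {prompt}\nSPARQL:"
)
YOUR_QA_TEMPLATE = (
    "Answer the question using only these query results:\n{context}\n"
    "Question: {prompt}\nAnswer:"
)

print(sorted(template_variables(YOUR_SPARQL_TEMPLATE)))  # ['prompt', 'schema']
print(sorted(template_variables(YOUR_QA_TEMPLATE)))  # ['context', 'prompt']
```

Single braces mark variables in these f-string-style templates. Literal braces, such as those in an example SPARQL WHERE block, must therefore be doubled ({{ and }}), or they will be read as extra input variables.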
