SAP HANA Cloud Knowledge Graph Engine

SAP HANA Cloud Knowledge Graph is a fully integrated knowledge graph solution within the SAP HANA Cloud database.

This example demonstrates how to build a QA (Question-Answering) chain that queries Resource Description Framework (RDF) data stored in an SAP HANA Cloud instance using the SPARQL query language, and returns a human-readable response.

SPARQL is the standard query language for RDF graphs.

Setup & Installation

Prerequisite:
You must have an SAP HANA Cloud instance with the triple store feature enabled.
For detailed instructions, refer to: Enable Triple Store
Load the kgdocu_movies example data. See Knowledge Graph Example.

To use SAP HANA Knowledge Graph Engine and/or Vector Store Engine with LangChain, install the langchain-hana package:

pip install -U langchain-hana

First, create a connection to your SAP HANA Cloud instance.

import os

from dotenv import load_dotenv
from hdbcli import dbapi

# Load environment variables if needed
load_dotenv()

# Establish connection to SAP HANA Cloud
connection = dbapi.connect(
    address=os.environ.get("HANA_DB_ADDRESS"),
    port=os.environ.get("HANA_DB_PORT"),
    user=os.environ.get("HANA_DB_USER"),
    password=os.environ.get("HANA_DB_PASSWORD"),
    autocommit=True,
    sslValidateCertificate=False,
)
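If any of the environment variables above is unset, os.environ.get returns None and the connection attempt fails with a less obvious error. The following is a minimal sketch, not part of langchain-hana or hdbcli, that checks the four variables up front and converts the port to an integer. The helper name load_hana_settings is hypothetical.

```python
import os


def load_hana_settings(env=os.environ):
    """Collect the HANA connection settings and fail early if any are missing."""
    required = ["HANA_DB_ADDRESS", "HANA_DB_PORT", "HANA_DB_USER", "HANA_DB_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "address": env["HANA_DB_ADDRESS"],
        "port": int(env["HANA_DB_PORT"]),  # environment values are always strings
        "user": env["HANA_DB_USER"],
        "password": env["HANA_DB_PASSWORD"],
    }
```

The returned dictionary can then be unpacked into the call shown above, for example dbapi.connect(**load_hana_settings(), autocommit=True, sslValidateCertificate=False).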

Example: Question Answering over a “Movies” Knowledge Graph

Below we’ll:

Instantiate the HanaRdfGraph pointing at our “movies” data graph
Wrap it in a HanaSparqlQAChain powered by an LLM
Ask natural-language questions and print out the chain’s responses

This demonstrates how the LLM generates SPARQL under the hood, executes it against SAP HANA, and returns a human-readable answer.

We'll use the sap-ai-sdk-gen package, which is currently still installed via:

pip install "generative-ai-hub-sdk[all]"

Please check sap-ai-sdk-gen for future releases.

from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from langchain_hana import HanaRdfGraph, HanaSparqlQAChain

# from langchain_openai import ChatOpenAI  # or your chosen LLM


# Set up the Knowledge Graph
graph_uri = "kgdocu_movies"

graph = HanaRdfGraph(
    connection=connection, graph_uri=graph_uri, auto_extract_ontology=True
)

# A basic graph schema is extracted from the data graph. This schema guides the LLM in generating a proper SPARQL query.
print(graph.get_schema)

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://kg.demo.sap.com/acted_in> a owl:ObjectProperty ;
    rdfs:label "acted_in" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range <http://kg.demo.sap.com/Film> .

<http://kg.demo.sap.com/dateOfBirth> a owl:DatatypeProperty ;
    rdfs:label "dateOfBirth" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range xsd:dateTime .

<http://kg.demo.sap.com/directed> a owl:ObjectProperty ;
    rdfs:label "directed" ;
    rdfs:domain <http://kg.demo.sap.com/Director> ;
    rdfs:range <http://kg.demo.sap.com/Film> .

<http://kg.demo.sap.com/genre> a owl:ObjectProperty ;
    rdfs:label "genre" ;
    rdfs:domain <http://kg.demo.sap.com/Film> ;
    rdfs:range <http://kg.demo.sap.com/Genre> .

<http://kg.demo.sap.com/placeOfBirth> a owl:ObjectProperty ;
    rdfs:label "placeOfBirth" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range <http://kg.demo.sap.com/Place> .

<http://kg.demo.sap.com/title> a owl:DatatypeProperty ;
    rdfs:label "title" ;
    rdfs:domain <http://kg.demo.sap.com/Film> ;
    rdfs:range xsd:string .

rdfs:label a owl:DatatypeProperty ;
    rdfs:label "label" ;
    rdfs:domain <http://kg.demo.sap.com/Actor>,
        <http://kg.demo.sap.com/Director>,
        <http://kg.demo.sap.com/Genre>,
        <http://kg.demo.sap.com/Place> ;
    rdfs:range xsd:string .

<http://kg.demo.sap.com/Director> a owl:Class ;
    rdfs:label "Director" .

<http://kg.demo.sap.com/Genre> a owl:Class ;
    rdfs:label "Genre" .

<http://kg.demo.sap.com/Place> a owl:Class ;
    rdfs:label "Place" .

<http://kg.demo.sap.com/Actor> a owl:Class ;
    rdfs:label "Actor" .

<http://kg.demo.sap.com/Film> a owl:Class ;
    rdfs:label "Film" .

# Initialize the LLM
llm = ChatOpenAI(proxy_model_name="gpt-4o", temperature=0)

# Create a SPARQL QA Chain
chain = HanaSparqlQAChain.from_llm(
    llm=llm,
    verbose=True,
    allow_dangerous_requests=True,
    graph=graph,
)

# output = chain.invoke("Which movies are in the data?")
# output = chain.invoke("In which movies did Keanu Reeves and Carrie-Anne Moss play together?")
# output = chain.invoke("which movie genres are in the data?")
# output = chain.invoke("which are the two most assigned movie genres?")
# output = chain.invoke('where were the actors of "Blade Runner" born?')
# output = chain.invoke("which actors acted together in a movie and were born in the same city?")
output = chain.invoke("which actors acted in Blade Runner?")

print(output["result"])

> Entering new HanaSparqlQAChain chain...
Generated SPARQL:
[32;1m[1;3m\`\`\`
PREFIX kg: <http://kg.demo.sap.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?actor ?actorLabel
WHERE {
    ?movie rdf:type kg:Film .
    ?movie kg:title ?movieTitle .
    ?actor kg:acted_in ?movie .
    ?actor rdfs:label ?actorLabel .
    FILTER(?movieTitle = "Blade Runner")
}
\`\`\`[0m
Final SPARQL:
[33;1m[1;3m
PREFIX kg: <http://kg.demo.sap.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?actor ?actorLabel

FROM <kgdocu_movies>
WHERE {
    ?movie rdf:type kg:Film .
    ?movie kg:title ?movieTitle .
    ?actor kg:acted_in ?movie .
    ?actor rdfs:label ?actorLabel .
    FILTER(?movieTitle = "Blade Runner")
}
Full Context:
[32;1m[1;3mactor,actorLabel
http://www.wikidata.org/entity/Q81328,Q81328
http://www.wikidata.org/entity/Q723780,Brion James
http://www.wikidata.org/entity/Q1372770,William Sanderson
http://www.wikidata.org/entity/Q498420,M. Emmet Walsh
http://www.wikidata.org/entity/Q358990,James Hong
http://www.wikidata.org/entity/Q211415,Edward James Olmos
http://www.wikidata.org/entity/Q1691628,Joe Turkel
http://www.wikidata.org/entity/Q3143555,Hy Pyke
http://www.wikidata.org/entity/Q207596,Daryl Hannah
http://www.wikidata.org/entity/Q213574,Rutger Hauer
http://www.wikidata.org/entity/Q236702,Joanna Cassidy
http://www.wikidata.org/entity/Q1353691,Morgan Paull
http://www.wikidata.org/entity/Q230736,Sean Young

> Finished chain.
The actors who acted in Blade Runner are Brion James, William Sanderson, M. Emmet Walsh, James Hong, Edward James Olmos, Joe Turkel, Hy Pyke, Daryl Hannah, Rutger Hauer, Joanna Cassidy, Morgan Paull, and Sean Young.

What’s happening under the hood?

SPARQL Generation
The chain invokes the LLM with your Turtle-formatted ontology (graph.get_schema) and the user’s question using the SPARQL_GENERATION_SELECT_PROMPT. The LLM is expected to return a SELECT query tailored to your schema.
Pre-processing & Execution
- Extract & clean: Pull the raw SPARQL text out of the LLM’s response.
- Inject graph context: Add FROM <graph_uri> if it’s missing and ensure common prefixes (rdf:, rdfs:, owl:, xsd:) are declared.
- Run on HANA: Execute the finalized query via HanaRdfGraph.query() over your named graph.
Answer Formulation
The returned CSV (or Turtle) results feed into the LLM again, this time with the SPARQL_QA_PROMPT. The prompt instructs the LLM to produce a concise, human-readable answer based strictly on the retrieved data, which reduces the risk of hallucination.
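The pre-processing steps described above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the langchain-hana implementation: the names FENCE, COMMON_PREFIXES, extract_sparql, inject_from, and ensure_prefixes are hypothetical, and the regular expressions ignore edge cases such as keywords inside string literals.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, as seen in the LLM output above

# Assumed prefix IRIs for the four common prefixes named in the text.
COMMON_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def extract_sparql(llm_text):
    """Return the query inside a code fence, or the stripped text if unfenced."""
    pattern = FENCE + r"(?:sparql)?\s*(.*?)" + FENCE
    match = re.search(pattern, llm_text, re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else llm_text).strip()


def inject_from(query, graph_uri):
    """Insert FROM <graph_uri> before the first WHERE if no FROM clause exists."""
    if re.search(r"\bFROM\s*<", query, re.IGNORECASE):
        return query
    return re.sub(
        r"\bWHERE\b",
        lambda m: f"FROM <{graph_uri}>\n{m.group(0)}",
        query,
        count=1,
        flags=re.IGNORECASE,
    )


def ensure_prefixes(query):
    """Prepend declarations for common prefixes that are not declared yet."""
    missing = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in COMMON_PREFIXES.items()
        if not re.search(rf"PREFIX\s+{name}:", query, re.IGNORECASE)
    ]
    return "\n".join(missing + [query])
```

Applied in sequence, ensure_prefixes(inject_from(extract_sparql(text), "kgdocu_movies")) turns a fenced LLM response like the "Generated SPARQL" output into a query similar to the "Final SPARQL" shown in the trace.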

Initialize the `HanaRdfGraph`

To power the QA chain, you first need a HanaRdfGraph instance that:

Loads your ontology schema (in Turtle)
Executes SPARQL queries against your SAP HANA Cloud data graph

The constructor requires:

connection: an active hdbcli.dbapi.connect(...) instance
graph_uri: the named graph (or "DEFAULT") where your RDF data lives
One of:
1. ontology_query: a SPARQL CONSTRUCT to extract schema triples
2. ontology_uri: a hosted ontology graph URI
3. ontology_local_file + ontology_local_file_format: a local Turtle/RDF file
4. auto_extract_ontology=True (not recommended for production; see the note below)

graph_uri vs. Ontology

graph_uri:
The named graph in your SAP HANA Cloud instance that contains your instance data (sometimes 100k+ triples). If None or "DEFAULT" is provided, the default graph is used.
➔ More details: Default Graph and Named Graphs
Ontology: a lean schema (typically ~50-100 triples) describing classes, properties, domains, ranges, labels, comments, and subclass relationships. The ontology guides SPARQL generation and result interpretation.

Custom SPARQL CONSTRUCT query (ontology_query): Use a custom CONSTRUCT query to selectively extract schema triples.

ontology_query = """
		PREFIX owl: <http://www.w3.org/2002/07/owl#>
		PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
		PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
		PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
		CONSTRUCT {?cls rdf:type owl:Class . ?cls rdfs:label ?clsLabel . ?rel rdf:type ?propertyType . ?rel rdfs:label ?relLabel . ?rel rdfs:domain ?domain . ?rel rdfs:range ?range .}
		FROM <kgdocu_movies>
		WHERE { # get properties
			{SELECT DISTINCT ?domain ?rel ?relLabel ?propertyType ?range
			WHERE {
				?subj ?rel ?obj .
				?subj a ?domain .
				OPTIONAL{?obj a ?rangeClass .}
				FILTER(?rel != rdf:type) 
				BIND(IF(isIRI(?obj) = true, owl:ObjectProperty, owl:DatatypeProperty) AS ?propertyType)
				BIND(COALESCE(?rangeClass, DATATYPE(?obj)) AS ?range)
				BIND(STR(?rel) AS ?uriStr)       # Convert URI to string
  				BIND(REPLACE(?uriStr, "^.*[/#]", "") AS ?relLabel)
			}}
			UNION { # get classes
				SELECT DISTINCT ?cls ?clsLabel
				WHERE {
					?instance a/rdfs:subClassOf* ?cls .
					FILTER (isIRI(?cls)) .
					BIND(STR(?cls) AS ?uriStr)       # Convert URI to string
  					BIND(REPLACE(?uriStr, "^.*[/#]", "") AS ?clsLabel)
				}
			}
		}
"""
graph = HanaRdfGraph(
    connection=connection, graph_uri=graph_uri, ontology_query=ontology_query
)
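The query above derives each label by stripping everything up to the last "/" or "#" from the URI via REPLACE(?uriStr, "^.*[/#]", ""). As a quick way to see what that pattern produces, here is the same substitution in Python. The function name local_name is only illustrative.

```python
import re


def local_name(uri):
    """Drop everything up to and including the last '/' or '#' in the URI."""
    # The greedy ".*" makes the match extend to the final separator.
    return re.sub(r"^.*[/#]", "", uri)


print(local_name("http://kg.demo.sap.com/acted_in"))             # acted_in
print(local_name("http://www.w3.org/2000/01/rdf-schema#label"))  # label
```

This is why the extracted schema shows labels such as "acted_in" and "Film" even when the data graph does not contain explicit rdfs:label triples for them.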

Remote ontology URI (ontology_uri): Load the schema directly from a hosted graph URI.

graph = HanaRdfGraph(
    connection=connection,
    graph_uri=graph_uri,
    ontology_uri="<your-rdf-ontology-uri>",
)

Local RDF file (ontology_local_file + ontology_local_file_format): Load the schema from a local RDF ontology file. Supported RDF formats are Turtle, RDF/XML, JSON-LD, N-Triples, Notation-3, Trig, Trix, N-Quads.

graph = HanaRdfGraph(
    connection=connection,
    graph_uri=graph_uri,
    ontology_local_file="ontology_file.ttl",
    ontology_local_file_format="turtle",
)

Auto-extract ontology (auto_extract_ontology=True): Infer schema information directly from your instance data.

graph_with_auto_extracted_ontology = HanaRdfGraph(
    connection=connection, graph_uri=graph_uri, auto_extract_ontology=True
)

Note: Auto-extraction is not recommended for production because it generally omits important triples such as rdfs:label, rdfs:comment, and rdfs:subClassOf.

Executing SPARQL Queries

You can use the query() method to execute arbitrary SPARQL queries (SELECT, ASK, CONSTRUCT, etc.) on the data graph.

The following query retrieves up to 10 movies with the highest number of actors:

query = """
PREFIX kg: <http://kg.demo.sap.com/>
SELECT ?movieTitle (COUNT(?actor) AS ?actorCount)

FROM <kgdocu_movies>
WHERE {
    ?actor kg:acted_in ?movie .
    ?movie kg:title ?movieTitle .
}
GROUP BY ?movieTitle
ORDER BY DESC(?actorCount)
LIMIT 10
"""
top10 = graph.query(query)
print(top10)

movieTitle,actorCount
The Matrix Reloaded,39
The Matrix,19
Blade Runner,13
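The printed result is CSV text, so it can be post-processed with the standard library. The sketch below assumes that query() returned a string in the form shown above. The helper parse_csv_result is hypothetical, and the sample string simply repeats the printed output.

```python
import csv
import io


def parse_csv_result(result):
    """Turn a CSV result string into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(result)))


# Sample text copied from the output above; in practice, pass the query result.
sample = "movieTitle,actorCount\nThe Matrix Reloaded,39\nThe Matrix,19\nBlade Runner,13\n"
rows = parse_csv_result(sample)

# CSV values are strings, so convert the counts explicitly.
counts = {row["movieTitle"]: int(row["actorCount"]) for row in rows}
print(counts["Blade Runner"])  # prints 13
```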

Question Answering with `HanaSparqlQAChain`

HanaSparqlQAChain ties together:

Schema-aware SPARQL generation
Query execution against SAP HANA
Natural-language answer formatting

Initialization

You need:

An LLM to generate and interpret queries
A HanaRdfGraph (with connection, graph_uri, and ontology)

qa_chain = HanaSparqlQAChain.from_llm(
    llm=llm, graph=graph, allow_dangerous_requests=True, verbose=True
)

Pipeline Overview

SPARQL Generation
- Uses SPARQL_GENERATION_SELECT_PROMPT
- Inputs:
  - schema (Turtle from graph.get_schema)
  - prompt (user’s question)
Query Post-processing
- Extracts the SPARQL code from the LLM output
- Injects FROM <graph_uri> if missing
- Ensures required common prefixes are declared (rdf:, rdfs:, owl:, xsd:)
Execution
- Calls graph.query(generated_sparql)
Answer Formulation
- Uses SPARQL_QA_PROMPT
- Inputs:
  - context (raw query results)
  - prompt (original question)
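To make the data flow between the stages concrete, here is a small sketch of the pipeline with the LLM and the database replaced by stand-in functions. It is not the HanaSparqlQAChain source: answer_question and the fake_* helpers are hypothetical, and the post-processing stage is left out for brevity.

```python
def answer_question(question, schema, generate_sparql, run_query, write_answer):
    """Run the stages in order and return the final answer text."""
    # Generation: the prompt receives the schema and the user's question.
    sparql = generate_sparql(schema=schema, prompt=question)
    # Execution: the (post-processed) query runs against the graph.
    context = run_query(sparql)
    # Answer formulation: the prompt receives the raw results and the question.
    return write_answer(context=context, prompt=question)


# Hypothetical stand-ins so the sketch runs without an LLM or a database.
def fake_generate(schema, prompt):
    return "SELECT ?s WHERE { ?s ?p ?o }"


def fake_query(sparql):
    return "s\nhttp://example.org/a\n"


def fake_answer(context, prompt):
    rows = context.strip().splitlines()[1:]  # skip the CSV header row
    return f"Found {len(rows)} result(s)."


print(answer_question("What is in the graph?", "<schema>", fake_generate, fake_query, fake_answer))
```

The keyword names schema, prompt, and context mirror the prompt inputs listed above.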

Prompt Templates

"SPARQL Generation" prompt

The sparql_generation_prompt is used to guide the LLM in generating a SPARQL query from the user question and the provided schema.

Answering prompt

The qa_prompt instructs the LLM to create a natural language answer based solely on the database results.

The default prompts can be found here: prompts.py

Customizing Prompts

You can override the defaults at initialization:

qa_chain = HanaSparqlQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    verbose=True,
    sparql_generation_prompt=YOUR_SPARQL_PROMPT,
    qa_prompt=YOUR_QA_PROMPT
)

Or swap them afterward:

qa_chain.sparql_generation_chain.prompt = YOUR_SPARQL_PROMPT
qa_chain.qa_chain.prompt = YOUR_QA_PROMPT

sparql_generation_prompt must have the input variables: ["schema", "prompt"]

qa_prompt must have the input variables: ["context", "prompt"]
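Before wiring custom prompts into the chain, it can help to confirm that the template text uses exactly the required variables. The following is a minimal sketch using only the standard library. The template strings and the helpers placeholders and check_prompt are illustrative and are not the defaults from prompts.py.

```python
from string import Formatter


def placeholders(template):
    """Return the set of {name} placeholders used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def check_prompt(template, required):
    """Raise ValueError unless the template uses exactly the required variables."""
    found = placeholders(template)
    if found != set(required):
        raise ValueError(f"expected {sorted(required)}, found {sorted(found)}")
    return template


# Illustrative templates only.
MY_SPARQL_TEMPLATE = "Schema:\n{schema}\n\nWrite a SPARQL SELECT query for: {prompt}\n"
MY_QA_TEMPLATE = "Results:\n{context}\n\nAnswer the question: {prompt}\n"

check_prompt(MY_SPARQL_TEMPLATE, ["schema", "prompt"])
check_prompt(MY_QA_TEMPLATE, ["context", "prompt"])
```

Validated strings like these could then be wrapped with PromptTemplate.from_template from langchain_core.prompts and passed as sparql_generation_prompt and qa_prompt, assuming the chain accepts standard LangChain prompt templates.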
