Traversal steps

The most important basic traversal steps will be the ones generated for your domain as highlighted in glimpse-of-a-simple-use-case.

In addition to the generated domain-specific steps based on your schema, there’s some basic traversal steps that you can use generically across domains, e.g. to traverse from a node (or an Iterator[Node]) to their neighbors, lookup their properties etc. There are also more advanced steps like repeat and advanced features like path tracking which will be described further below.

Tip

flatgraph traversals are based on Scala’s Iterator, so you can also use all regular collection methods. If you want to begin a traversal from a given node, the .start method will wrap that node in a traversal making the traversal steps available.

Step Types

The various traversal queries can be divided into a number of types: Filter, Map, Side Effect, and Terminal.

Filter Steps

Filter Steps are atomic traversals that filter nodes according to given criteria. The most common filter step is aptly-named filter, which continues the traversal in the step it suffixes for all nodes which pass its criterion. Its criterion is represented by a lambda function which has access to the node of the previous step and returns a boolean. Continuing with the previous example, let us execute a query which returns all METHOD nodes of the Code Property Graph for X42, but only if their IS_EXTERNAL property is set to false:

joern> cpg.method.filter(_.isExternal == false).name.toList 
res11: List[String] = List("main")

Tip

A note on Scala lambda functions: In the example above, we used the lambda function _.isExternal == false as the predicate for the filter. The _ is simply syntactic sugar referring to the parameter of the function, so this could be rewritten as method => method.isExternal == false.

Dissecting this query, we have cpg as the root object, a node-type step method which returns all nodes of type METHOD, a filter step where(_.isExternal == false) which continues the traversal only for nodes which have their IS_EXTERNAL property set to false (with _ referencing the individual nodes, and isExternal a property directive which accesses their IS_EXTERNAL property), followed by a property directive name which returns the values of the NAME property of the nodes that passed the Filter Step, and finally an Execution Directive toList which executes the traversal and returns the results in a list.

A shorter version of a query which returns the same results as the one above can be written using a Property-Filter Step. Property-filter steps are Filter Steps which continue the traversal only for nodes which have a specific value in the property the Property Filter Step refers to:

joern> cpg.method.isExternal(false).name.toList 
res11: List[String] = List("main")

Dissecting the query again, cpg is the root object, method is a node-type step, isExternal(false) is a property-filter step that filters for nodes which have false as the value of their IS_EXTERNAL property, name is a property directive, and toList is the execution directive you are already familiar with.

Tip

Be careful not to mix up property directives with property-filter steps, they look awfully similar. Consider that:

a) cpg.method.isExternal(true).name.toList returns all METHOD nodes which have the IS_EXTERNAL property set to true (in this case, 10 results)

b) cpg.method.isExternal.toList returns the value of the IS_EXTERNAL property for all METHOD nodes in the graph (12 results)

c) cpg.method.isExternal.name.toList is an invalid query which will not execute

A final Filter Step we will look at is named where. Unlike filter, this doesn’t take a simple predicate A => Boolean, but instead takes a Traversal[A] => Traversal[_]. I.e. you supply a traversal which will be executed at the current position. The resulting Traversal will preserves elements if the provided traversal has at least one result. The previous query that used a Property Filter Step can be re-written using where like so:

joern> cpg.method.where(_.isExternal(false)).name.toList 
res24: List[String] = List("main")

Maybe not particularly useful-seeming given this specific example, but keep it in the back of your head, because filter is a handy tool to have in the toolbox. Next up, Map Steps.

Map Steps

Map Steps are traversals that map a set of nodes into a different form given a function. Map Steps are a powerful mechanism when you need to transform results to fit your specifics. For example, say you’d like to return both the IS_EXTERNAL and the NAME properties of all METHOD nodes in X42’s Code Property Graph. You can achieve that with the following query:

joern> cpg.method.map(node => (node.isExternal, node.name)).toList
res6: List[(Boolean, String)] = List(
  (false, "main"),
  (true, "fprintf"),
  (true, "exit"),
  (true, "<operator>.logicalAnd"),
  (true, "<operator>.equals"),
  (true, "<operator>.greaterThan"),
  (true, "strcmp"),
  (true, "<operator>.indirectIndexAccess"),
  (true, "printf")
)

Don’t be intimidated by the syntax used in the map Step above. If you examine map(node => (node.isExternal, node.name)) for a bit, you might be able to infer that the first node simply defines the variable that represents the node which preceeds the map Step, that the ASCII arrow => is just syntax that preceeds the body of a lambda function, and that (node.isExternal, node.name) means that the return value of the lambda is a list which contains the value of the isExternal and name Property Directives for each of the nodes matched in the previous step and also passed into the lambda. In most cases in which you need map, you can simply follow the pattern above. But should you ever feel constrained by the common pattern shown, remember that the function for the map step is written in the Scala programming language, a fact which opens up a wide range of possibilities if you invest a little time learning the language.

Side Effect Steps

Side Effect Steps are traversal steps that perform an action or modify the state of the traversal without altering the path of the traversal itself. They do not directly contribute to the results that are returned, but they might be used to store information, log data, or manipulate variables during traversal. These steps can be thought of as adding “side effects” to the traversal that can be useful for various purposes like counting, aggregating, or modifying data.

Terminal Steps

Terminal Steps are steps that end the traversal and return the final result. Once a terminal step is reached, the traversal is considered complete, and it provides the output in some form (e.g., a list, a set, or a single element). Unlike intermediate steps that continue building the traversal, terminal steps execute the traversal and stop further processing. After a terminal step, the traversal cannot be continued or extended; it’s finished.

Traversal Steps

The steps described below are available when called on an Iterator. For these to be available, the following packaged must be imported, i.e., import flatgraph.traversal.language.*.

Basic steps

Assuming you have an Iterator[X], where X is typically a domain specific type, but could also be flatgraph’s root type for nodes GNode, here’s a (non-exhaustive) list of basic traversal steps.

Name	Type	Notes
and	Filter	Only preserves elements for which all of the given traversals have at least one result.
cast[B]	Map	Casts all elements to given type `B`.
choose	Filter	Allows to implement conditional semantics: if, if/else, if/elseif, if/elseif/else, …
coalesce	Filter	Evaluates the provided traversals in order and returns the first traversal that emits at least one element.
collectAll[B]	Filter	Collects all elements of the provided class `B` (beware of type-erasure).
dedup	Filter	Deduplicate elements of this traversal - a.k.a. distinct, unique.
dedupBy	Filter	Deduplicate elements of this traversal by a given function.
discardPathTracking	Side Effect	Disables path tracking, and any tracked paths so far.
enablePathTracking	Side Effect	Enable path tracking - prerequisite for path/simplePath steps.
filter	Filter	Filters in everything that evaluates to true by the given transformation function.
filterNot	Filter	Filters in everything that evaluates to false by the given transformation function.
groupCount	Map	Group elements and count how often they appear.
groupCount[B]	Map	Group elements by a given transformation function and count how often the results appear.
head	Terminal	The first element of the traversal.
is	Filter	Filters in everything that is the given value.
or	Filter	Only preserves elements for which at least one of the given traversals has at least one result.
l/toSet/toSeq	Terminal	Execute the traversal and returns the result as a list, set, or indexed sequence respectively.
last	Terminal	The last element of the traversal.
not	Filter	Filters out everything that is not the given value. Alias for `whereNot`.
path	Terminal	Retrieve entire paths that have been traversed thus far.
repeat	Map	Repeat the given traversal.
sideEffect	Side Effect	Perform side effect without changing the contents of the traversal.
simplePath	Filter	Ensure the traversal does not include any paths that visit the same node more than once.
size	Terminal	Total size of elements in the traversal.
sorted	Map	Sort elements by their natural order.
sortBy	Map	Sort elements by the value of the given transformation function.
union	Filter	Union/sum/aggregate/join given traversals from the current point.
within	Filter	Filters out all elements that are not in the provided set.
without	Filter	Filters out all elements that are in the provided set.
where	Filter	Only preserves elements if the provided traversal has at least one result.
whereNot	Filter	Only preserves elements if the provided traversal does not have any results.

Node Steps

When starting the traversal from an Iterator of nodes GNode.

Name	Type	Notes
both	Map/Filter	Follow both in and out-neighbours for a given node. Can be restricted by edge type.
bothE	Map/Filter	Follow both in and out-edges for a given node. Can be restricted by edge type.
hasLabel	Filter	Filters in nodes that match the given labels. Alias for `label`
id	Map/Filter	Return a unique identifier(s) for the node(s) in the traversal. Can filter by given IDs.
in	Map/Filter	In-neighbours for a given node. Can be restricted by edge type.
inE	Map/Filter	In-edges for a given node. Can be restricted by edge type.
out	Map/Filter	Out-neighbours for a given node. Can be restricted by edge type.
outE	Map/Filter	Out-edges for a given node. Can be restricted by edge type.
property	Map	Retrieve the value for a single property for the defined property name.
propertiesMap	Map	Retrieves all entity properties as a map.
label	Map/Filter	Node label. Can filter by given labels.
labelNot	Filter	Inverse of `label`.

Edge Steps

When starting the traversal from an Iterator of nodes Edge.

Name	Type	Notes
src	Map	Traverse to the source node (out-going node).
dst	Map	Traverse to the destination node (incoming node).

Property Directives

The steps described below are available when called on the entity/object directly. These are available as methods or properties on the objects so no import is necessary.

Graph Steps

Steps available from an instance of Graph.

Name	Type	Notes
allNodes	Map	The nodes of the graph.
allEdges	Map	The edges of the graph.
edgeCount	Terminal	The total edges in the graph. Can be restricted by a given label.
nodes	Filter	Create a traversal from the nodes of the graph that match the given IDs or labels.
nodeCount	Terminal	Total nodes in the graph. Can be restricted by a given label.

Edge Steps

Steps available from an instance of Edge.

Name	Type	Notes
label	Map	The edge label.
propertyName	Map	The property value of the edge, if one exists.

Node Steps

Steps available from an instance of GNode.

Name	Type	Notes
graph	Map	The graph this node belongs to.
id	Map	The node identifier.
label	Map	The node label.