Traversal steps

The most important basic traversal steps will be the ones generated for your domain as highlighted in glimpse-of-a-simple-use-case.

In addition to the generated domain-specific steps based on your schema, there’s some basic traversal steps that you can use generically across domains, e.g. to traverse from a node (or an Iterator[Node]) to their neighbors, lookup their properties etc. There are also more advanced steps like repeat and advanced features like path tracking which will be described further below.

Tip

flatgraph traversals are based on Scala’s Iterator, so you can also use all regular collection methods. If you want to begin a traversal from a given node, the .start method will wrap that node in a traversal making the traversal steps available.

Step Types

The various traversal queries can be divided into a number of types: Filter, Map, Side Effect, and Terminal.

Filter Steps

Filter Steps are atomic traversals that filter nodes according to given criteria. The most common filter step is aptly-named filter, which continues the traversal in the step it suffixes for all nodes which pass its criterion. Its criterion is represented by a lambda function which has access to the node of the previous step and returns a boolean. Continuing with the previous example, let us execute a query which returns all METHOD nodes of the Code Property Graph for X42, but only if their IS_EXTERNAL property is set to false:

joern> cpg.method.filter(_.isExternal == false).name.toList 
res11: List[String] = List("main")
Tip

A note on Scala lambda functions: In the example above, we used the lambda function _.isExternal == false as the predicate for the filter. The _ is simply syntactic sugar referring to the parameter of the function, so this could be rewritten as method => method.isExternal == false.

Dissecting this query, we have cpg as the root object, a node-type step method which returns all nodes of type METHOD, a filter step where(_.isExternal == false) which continues the traversal only for nodes which have their IS_EXTERNAL property set to false (with _ referencing the individual nodes, and isExternal a property directive which accesses their IS_EXTERNAL property), followed by a property directive name which returns the values of the NAME property of the nodes that passed the Filter Step, and finally an Execution Directive toList which executes the traversal and returns the results in a list.

A shorter version of a query which returns the same results as the one above can be written using a Property-Filter Step. Property-filter steps are Filter Steps which continue the traversal only for nodes which have a specific value in the property the Property Filter Step refers to:

joern> cpg.method.isExternal(false).name.toList 
res11: List[String] = List("main")

Dissecting the query again, cpg is the root object, method is a node-type step, isExternal(false) is a property-filter step that filters for nodes which have false as the value of their IS_EXTERNAL property, name is a property directive, and toList is the execution directive you are already familiar with.

Tip

Be careful not to mix up property directives with property-filter steps, they look awfully similar. Consider that:

a) cpg.method.isExternal(true).name.toList returns all METHOD nodes which have the IS_EXTERNAL property set to true (in this case, 10 results)

b) cpg.method.isExternal.toList returns the value of the IS_EXTERNAL property for all METHOD nodes in the graph (12 results)

c) cpg.method.isExternal.name.toList is an invalid query which will not execute

A final Filter Step we will look at is named where. Unlike filter, this doesn’t take a simple predicate A => Boolean, but instead takes a Traversal[A] => Traversal[_]. I.e. you supply a traversal which will be executed at the current position. The resulting Traversal will preserves elements if the provided traversal has at least one result. The previous query that used a Property Filter Step can be re-written using where like so:

joern> cpg.method.where(_.isExternal(false)).name.toList 
res24: List[String] = List("main")

Maybe not particularly useful-seeming given this specific example, but keep it in the back of your head, because filter is a handy tool to have in the toolbox. Next up, Map Steps.

Map Steps

Map Steps are traversals that map a set of nodes into a different form given a function. Map Steps are a powerful mechanism when you need to transform results to fit your specifics. For example, say you’d like to return both the IS_EXTERNAL and the NAME properties of all METHOD nodes in X42’s Code Property Graph. You can achieve that with the following query:

joern> cpg.method.map(node => (node.isExternal, node.name)).toList
res6: List[(Boolean, String)] = List(
  (false, "main"),
  (true, "fprintf"),
  (true, "exit"),
  (true, "<operator>.logicalAnd"),
  (true, "<operator>.equals"),
  (true, "<operator>.greaterThan"),
  (true, "strcmp"),
  (true, "<operator>.indirectIndexAccess"),
  (true, "printf")
)

Don’t be intimidated by the syntax used in the map Step above. If you examine map(node => (node.isExternal, node.name)) for a bit, you might be able to infer that the first node simply defines the variable that represents the node which preceeds the map Step, that the ASCII arrow => is just syntax that preceeds the body of a lambda function, and that (node.isExternal, node.name) means that the return value of the lambda is a list which contains the value of the isExternal and name Property Directives for each of the nodes matched in the previous step and also passed into the lambda. In most cases in which you need map, you can simply follow the pattern above. But should you ever feel constrained by the common pattern shown, remember that the function for the map step is written in the Scala programming language, a fact which opens up a wide range of possibilities if you invest a little time learning the language.

Side Effect Steps

Side Effect Steps are traversal steps that perform an action or modify the state of the traversal without altering the path of the traversal itself. They do not directly contribute to the results that are returned, but they might be used to store information, log data, or manipulate variables during traversal. These steps can be thought of as adding “side effects” to the traversal that can be useful for various purposes like counting, aggregating, or modifying data.

Terminal Steps

Terminal Steps are steps that end the traversal and return the final result. Once a terminal step is reached, the traversal is considered complete, and it provides the output in some form (e.g., a list, a set, or a single element). Unlike intermediate steps that continue building the traversal, terminal steps execute the traversal and stop further processing. After a terminal step, the traversal cannot be continued or extended; it’s finished.

Traversal Steps

The steps described below are available when called on an Iterator. For these to be available, the following packaged must be imported, i.e., import flatgraph.traversal.language.*.

Basic steps

Assuming you have an Iterator[X], where X is typically a domain specific type, but could also be flatgraph’s root type for nodes GNode, here’s a (non-exhaustive) list of basic traversal steps.

NameTypeNotes
andFilterOnly preserves elements for which all of the given traversals have at least one result.
cast[B]MapCasts all elements to given type B.
chooseFilterAllows to implement conditional semantics: if, if/else, if/elseif, if/elseif/else, …
coalesceFilterEvaluates the provided traversals in order and returns the first traversal that emits at least one element.
collectAll[B]FilterCollects all elements of the provided class B (beware of type-erasure).
dedupFilterDeduplicate elements of this traversal - a.k.a. distinct, unique.
dedupByFilterDeduplicate elements of this traversal by a given function.
discardPathTrackingSide EffectDisables path tracking, and any tracked paths so far.
enablePathTrackingSide EffectEnable path tracking - prerequisite for path/simplePath steps.
filterFilterFilters in everything that evaluates to true by the given transformation function.
filterNotFilterFilters in everything that evaluates to false by the given transformation function.
groupCountMapGroup elements and count how often they appear.
groupCount[B]MapGroup elements by a given transformation function and count how often the results appear.
headTerminalThe first element of the traversal.
isFilterFilters in everything that is the given value.
orFilterOnly preserves elements for which at least one of the given traversals has at least one result.
l/toSet/toSeqTerminalExecute the traversal and returns the result as a list, set, or indexed sequence respectively.
lastTerminalThe last element of the traversal.
notFilterFilters out everything that is not the given value. Alias for whereNot.
pathTerminalRetrieve entire paths that have been traversed thus far.
repeatMapRepeat the given traversal.
sideEffectSide EffectPerform side effect without changing the contents of the traversal.
simplePathFilterEnsure the traversal does not include any paths that visit the same node more than once.
sizeTerminalTotal size of elements in the traversal.
sortedMapSort elements by their natural order.
sortByMapSort elements by the value of the given transformation function.
unionFilterUnion/sum/aggregate/join given traversals from the current point.
withinFilterFilters out all elements that are not in the provided set.
withoutFilterFilters out all elements that are in the provided set.
whereFilterOnly preserves elements if the provided traversal has at least one result.
whereNotFilterOnly preserves elements if the provided traversal does not have any results.

Node Steps

When starting the traversal from an Iterator of nodes GNode.

NameTypeNotes
bothMap/FilterFollow both in and out-neighbours for a given node. Can be restricted by edge type.
bothEMap/FilterFollow both in and out-edges for a given node. Can be restricted by edge type.
hasLabelFilterFilters in nodes that match the given labels. Alias for label
idMap/FilterReturn a unique identifier(s) for the node(s) in the traversal. Can filter by given IDs.
inMap/FilterIn-neighbours for a given node. Can be restricted by edge type.
inEMap/FilterIn-edges for a given node. Can be restricted by edge type.
outMap/FilterOut-neighbours for a given node. Can be restricted by edge type.
outEMap/FilterOut-edges for a given node. Can be restricted by edge type.
propertyMapRetrieve the value for a single property for the defined property name.
propertiesMapMapRetrieves all entity properties as a map.
labelMap/FilterNode label. Can filter by given labels.
labelNotFilterInverse of label.

Edge Steps

When starting the traversal from an Iterator of nodes Edge.

NameTypeNotes
srcMapTraverse to the source node (out-going node).
dstMapTraverse to the destination node (incoming node).

Property Directives

The steps described below are available when called on the entity/object directly. These are available as methods or properties on the objects so no import is necessary.

Graph Steps

Steps available from an instance of Graph.

NameTypeNotes
allNodesMapThe nodes of the graph.
allEdgesMapThe edges of the graph.
edgeCountTerminalThe total edges in the graph. Can be restricted by a given label.
nodesFilterCreate a traversal from the nodes of the graph that match the given IDs or labels.
nodeCountTerminalTotal nodes in the graph. Can be restricted by a given label.

Edge Steps

Steps available from an instance of Edge.

NameTypeNotes
labelMapThe edge label.
propertyNameMapThe property value of the edge, if one exists.

Node Steps

Steps available from an instance of GNode.

NameTypeNotes
graphMapThe graph this node belongs to.
idMapThe node identifier.
labelMapThe node label.