Much of software development is searching for code, either to find out how something works or to find something you can reuse. There are several broad classes of tool that can be used for searching code. In this article we’ll look at a few and evaluate their pros and cons.
In this category we have tools like the classic grep, but also faster replacements like ripgrep (rg). Though the many tools in this category vary slightly based on features, the general approach is the same. If you can represent your search in terms of a simple text match, or a regular expression, these tools can run the search over all the files in the project and surface results.
These tools can be very fast, even on the largest repositories. However, they are not always the most precise. Let’s illustrate this with a test against the Linux 6.6 codebase, which at the time of writing is 2.1 GB.
Let’s try to find the definition of “struct device”, which is the key interface for defining devices in Linux. We’ll use ripgrep, which tends to be the fastest of the regex-based tools.
linux $ time rg "struct device"
...
real 0m1.659s
user 0m1.483s
sys 0m1.423s
linux $ rg "struct device" | wc -l
63959
Very fast, yet most of the results are uses of this struct rather than its definition. Though regex-based approaches can be very fast, they share a common deficit: not all languages actually use a regular grammar, so it is not always possible to construct a regex that matches exactly what you need.
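To make the limitation concrete, here is a minimal sketch in Python. The C snippet and the pattern names are invented for illustration and are not taken from the kernel. It compares the naive search with a tighter pattern that expects the opening brace on the same line as the struct name.

```python
import re

# Hypothetical C snippet for illustration; not taken from the Linux source.
SAMPLE = """\
struct device;
struct device *get_device(struct device *dev);
struct device {
    struct device *parent;
};
struct driver
{
    const char *name;
};
"""

# Naive pattern: matches every mention of the struct, uses and definition alike.
NAIVE = re.compile(r"struct device")

# Tighter pattern: "struct NAME" at the start of a line, followed by an
# opening brace on the same line.
DEFINITION = re.compile(r"^struct (\w+)[ \t]*\{", re.MULTILINE)

naive_hits = [m.start() for m in NAIVE.finditer(SAMPLE)]
defined = DEFINITION.findall(SAMPLE)

print(len(naive_hits))  # number of mentions of "struct device"
print(defined)          # struct names the tighter pattern treats as definitions
```

On this sample the naive pattern reports five matches, while the tighter pattern reports only `device`. It also silently misses `driver`, whose brace sits on the next line. Each formatting variant needs another tweak to the pattern, which is the kind of imprecision described above.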
Sometimes, integrated development environments (IDEs) include some understanding of the code you are writing. This is more common with editors that are specific to particular programming languages. If you work primarily with a single programming language, such tools can be quite effective. However, if your language does not have such a specialized IDE, or if you often use multiple languages, then this approach may not work for you.
And the old joke: these IDEs are powerful operating systems but lack a good editor.
The other way to implement language-specific search is via a tool designed just for that purpose. Rather than perform a search over all the files for each query, these tools typically precompute some information for the codebase. This analysis is usually more sophisticated than a regex, and can incorporate a better understanding of the language’s grammar. This makes it possible to provide “jump to definition” style functionality more precisely.
OpenGrok is an open source code search engine, written in Java. It is self-hostable, so you can deploy an instance for yourself or your organization to search code.
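The precompute-then-query idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file paths and contents are invented, the function names `build_index` and `jump_to_definition` are hypothetical, and a small regex stands in for the real parser that an actual tool would use.

```python
import re
from collections import defaultdict

# Toy stand-in for a real parser: recognizes "struct NAME {" at the start of a line.
STRUCT_DEF = re.compile(r"^struct\s+(\w+)\s*\{", re.MULTILINE)


def build_index(files):
    """Scan every file once and map each struct name to its (path, line) definitions."""
    index = defaultdict(list)
    for path, text in files.items():
        for match in STRUCT_DEF.finditer(text):
            # Line number = newlines before the match, plus one.
            line = text.count("\n", 0, match.start()) + 1
            index[match.group(1)].append((path, line))
    return index


# Hypothetical in-memory "repository"; paths and contents are invented.
FILES = {
    "include/device.h": "struct device;\n\nstruct device {\n    int id;\n};\n",
    "drivers/core.c": "struct device *lookup(int id);\n",
}

INDEX = build_index(FILES)  # Built once, ahead of any query.


def jump_to_definition(name):
    """Answer a query from the index without rescanning any file."""
    return INDEX.get(name, [])


print(jump_to_definition("device"))
```

The expensive work happens once in `build_index`. Each later query is a dictionary lookup, so its cost does not grow with the size of the codebase the way a full scan does.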
SourceGraph is a hosted service that provides code search. It is used by several notable large organizations. If you are interested in a code search offering for your organization’s internal code, you may want to look at SourceGraph.
SourceProbe is our offering of code search and navigation tools for individual developers. If you are primarily interested in searching open source code (which includes libraries for most languages), and don’t want to wait for your organization to adopt SourceGraph, then SourceProbe could be a good fit for you.
Like SourceGraph, we provide this as a hosted service. So unlike OpenGrok, you can use SourceProbe without needing to maintain more infrastructure.