
Onboarding to a project

This is part of our use case guide series, which explores how Swimm documents can help engineering teams thrive.

When diving into a large codebase, it is common to feel overwhelmed because there is so much to learn. This happens whenever a developer joins a new project, whether they are starting a new job, switching teams, or simply contributing to a project for the first time.

There are many ways to start swimming in the unknown waters of a new codebase (pun intended), yet many developers fall into one of two common pitfalls:

  1. Start without guidance, and find themselves lacking a lot of basic information. Even for an experienced engineer, it may be hard to understand all the underlying assumptions and high-level architecture by simply looking at the code. When "jumping into the water" without aid, engineers may spend a lot of time trying to figure out the big picture from the details.
  2. Start with guidance, but get overwhelmed by details. It is hard to separate the wheat from the chaff, especially for someone new to the project. Engineers may consume every resource they can access: documents, videos, and more. Often, these resources are outdated and, as a result, more confusing than helpful. In other cases, they are simply too specific. Someone learning a new codebase initially wants to understand the big picture and how things are generally done.

Onboarding efficiently with Swimm documentation

The most effective way to be productive when onboarding to a new project is to follow a two-step methodology:

Step 1 - Start with a high-level overview of the codebase. This step should provide the developer with everything they need to know to get started, but not everything they may need to know to solve any possible task at hand.

Swimm tip: We highly recommend using a Playlist - an ordered sequence of documents and other resources.

Step 2 - Learn as you go, based on specific needs. Once you have the big picture, you should find the resources that help with the specific task at hand. For example, if you need to add a new configuration value, check the part of the documentation that explains which files to update, what to consider, and why.

Swimm tip: With Swimm's unique IDE integrations, documentation finds developers when they need it, while they browse the relevant code.

Below, we highlight how to approach each step to set up your team for success.

Step 1 - Overview Playlist

After you've created the relevant documents (see below), you should compile them in a Swimm Playlist so that the developers can read through them in a structured manner. But what should this Playlist include?

As a rule of thumb, Playlists should include everything the developer needs to know to get started, but really nothing more than that.

In our experience, deciding what counts as "required knowledge" is not an exact science. When in doubt, it is usually better to include information than to omit it.

Note that when dealing with a microservice architecture, it might make sense to create this overview for a domain of related services rather than for each one separately.

Overview Playlists should include the following documents:

  • High-level overview
  • Folder structure
  • Dev environment
  • Main flows
  • Testing principles

High-level overview document

This document is the developer's entry point: a high-level overview of a specific part of the codebase. Its purpose is to help developers understand the context of the code and how it fits into the whole puzzle.

A high-level overview document should include the following:

  • Goals for the repository. What is this repo trying to accomplish?
  • Leading technologies used. Introduce the major frameworks this codebase relies on. For example, ReactJS, Django, Kafka, or MongoDB. No need to mention every third-party library in the codebase, but rather include frameworks that are heavily integrated with the structure of the code.
  • Major components/models. What are the main parts of this codebase? For example, the codebase may consist of a frontend library and a backend library, in addition to a CLI. Each of these is a separate major component.
  • Major conventions. Discuss, for example, file naming conventions that will help the reader navigate through files in the repo in general.

Two bonus items for high-level overview documents:

  • Diagrams. Consider including a diagram explaining the various components of this repository and the interactions between them.
  • How this repository fits in the bigger picture. For example, if the repository you are documenting is a microservice, explain how it interacts with other services.

Folder structure

Directories provide much-needed structure: a hierarchy that helps you find your way through many files and lines of code. Yet not all directories are equally important. Some are used constantly and contain a lot of code, while others hold static assets and are rarely touched.

Just like a map helps you navigate unknown territory, a well-written, well-structured folder structure document gives others a high-level overview that helps them navigate the codebase.

To create this doc, we recommend starting from the repository's root folder. Scan all folders and files, and mark those that are important to mention.

  • For every folder you marked in the previous step, do the same process recursively.
  • Do not go too deep: if you are more than three levels deep, there is probably a folder that deserves a doc of its own. In fact, it is common for a large codebase to have multiple structure documents, one for every major component.

The result should be a list of the most important files and folders, each with a short description. Structure your document logically: either purely hierarchically (that is, list all meaningful sub-folders under folder F1, then all important sub-folders under folder F2, and so on), or in another order that makes sense, for example, F1 followed by T1, the folder containing the tests for F1.

You should not attempt to mention all files and folders. Mention only those that deserve the reader's attention when they first go through this codebase or module.
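If you want a starting point for this scan, a short script can produce a skeleton outline that you then prune and annotate by hand. The following is a minimal Python sketch, not a Swimm feature: the `outline` function, the `SKIP` set of ignored folders, and the three-level default are assumptions chosen to match the advice above. Folders at the depth limit that still contain entries are flagged as candidates for their own structure doc.

```python
from pathlib import Path

# Hypothetical list of folders that rarely deserve a mention; adjust it for your repo.
SKIP = {"node_modules", "__pycache__", "dist", "build", "venv"}


def outline(root: Path, max_depth: int = 3) -> list[str]:
    """Return an indented skeleton of files and folders under `root`, at most `max_depth` levels deep."""
    lines: list[str] = []

    def walk(folder: Path, depth: int) -> None:
        if depth > max_depth:
            return  # deeper content belongs in a separate structure doc
        for child in sorted(folder.iterdir(), key=lambda p: p.name):
            # Skip generated/vendored folders and hidden entries such as .git.
            if child.name in SKIP or child.name.startswith("."):
                continue
            indent = "  " * (depth - 1)
            if child.is_dir():
                note = ""
                if depth == max_depth and any(child.iterdir()):
                    note = " (deeper content: consider a separate structure doc)"
                lines.append(f"{indent}- {child.name}/: TODO{note}")
                walk(child, depth + 1)
            else:
                lines.append(f"{indent}- {child.name}: TODO")

    walk(root, 1)
    return lines
```

For example, running `print("\n".join(outline(Path("."))))` from the repository root prints the skeleton. Delete the entries that do not deserve the reader's attention, then replace each `TODO` with a short description.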

Dev environment

Help developers get the project running locally by explaining the required setup. Do they need to install anything special? What commands should they run to install dependencies? How do they debug the project?

Main flows

Modern systems often manipulate data, transferring it from one component to another. For example, data created by a user action may be sent from a frontend component to the store, which may then trigger a backend function by sending a request; the response is then processed by yet another function.

Some of these flows are specific to a single user story. Others follow patterns that recur throughout the repo, and some appear only once but run frequently. Flows in these last two groups are especially important to understand.

When traveling to a new city, we use maps to find our way around, but it is also important to know the city's transit system. In London, you should know the Underground. In Berlin, it may be the U-Bahn, S-Bahn, bus, and tram systems. If the folder structure document is your reader's map of the codebase, this document is their guide to its transit system: it explains how data gets from one part of the system to another.

Code that manipulates data can be hard to understand, since it spans several areas of the code that interact in ways that are not necessarily obvious. Even in a simple web app, fully understanding a flow may require switching back and forth between the frontend and the backend. Especially for someone new to the project, a document explaining how different areas of the code interact can drastically ease onboarding and clarify the components and interactions involved.

Data is often manipulated at every step of the process, and developers need to know these steps and what happens in each one to understand the state of the data and what each step expects to receive.

First, identify the major flows in your system. Again, you should explain the Tube network, not every station on every line, so look for recurring patterns. Do you use a store? Is data first sent for preprocessing? Are multiple services involved in the process?

To write this doc:

  1. Go through your code, and trace the data as it flows back and forth between the different components.
  2. If there are flows that are recurring patterns, look for one that is simple yet covers the whole story. For example, if you explain the flow of an API request going from the frontend to the backend, find one example that shows this interaction clearly without too many specific details.
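To make this concrete, here is a deliberately tiny, hypothetical flow of the kind a Main flows doc might walk through: a user action in the frontend, an update to a store, a request to the backend, and processing of the response. Every name here (`Store`, `on_add_clicked`, `backend_add_item`, `handle_response`) is invented for illustration, and the backend is called in-process where a real app would send an HTTP request. The comments record the shape of the data at each step, which is the information a reader of the flow doc needs.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical frontend store holding cart items and request status."""
    items: list[dict] = field(default_factory=list)
    status: str = "idle"


def on_add_clicked(store: Store, sku: str, qty: str) -> None:
    # Step 1 (frontend): raw form input arrives as strings, e.g. sku=" ab-1 ", qty="2".
    request = {"sku": sku.strip().upper(), "qty": int(qty)}
    # Step 2 (store): mark the request as in flight before calling the backend.
    store.status = "pending"
    # Step 3 (backend): a real app would send `request` over HTTP here.
    response = backend_add_item(request)
    # Step 4 (frontend): process the response and update the store.
    handle_response(store, response)


def backend_add_item(request: dict) -> dict:
    # Expects {"sku": str, "qty": int}; returns {"ok": bool, ...}.
    if request["qty"] <= 0:
        return {"ok": False, "error": "qty must be positive"}
    # Hypothetical flat price of 499 cents per unit.
    return {"ok": True, "item": {**request, "price_cents": 499 * request["qty"]}}


def handle_response(store: Store, response: dict) -> None:
    # Success adds the item; failure records the error message in the store.
    if response["ok"]:
        store.items.append(response["item"])
        store.status = "idle"
    else:
        store.status = "error: " + response["error"]
```

Calling `on_add_clicked(store, " ab-1 ", "2")` leaves one item, `{"sku": "AB-1", "qty": 2, "price_cents": 998}`, in `store.items`. In the actual doc, you would trace the real equivalents of these four steps across files rather than inline them in one place.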

Testing principles

Understanding the testing principles of a codebase is helpful early on, for two main reasons:

  1. Reading tests is a great way to understand a codebase, as they usually cover complete flows and provide usage examples for various functionalities.
  2. Writing tests is a common early task when you start working on a new codebase, and a great way to interact with it.

This document should describe the main types of tests and some of their basic principles. It should also link to other more detailed resources about testing such as a dedicated testing Playlist.
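As an illustration of tests doubling as usage documentation, consider this minimal sketch built on Python's standard `unittest` module. The `parse_timeout` helper and the formats it accepts are hypothetical; the point is that each test name states one behavior, so reading the test class tells a newcomer how the function is meant to be called and what it rejects.

```python
import unittest


def parse_timeout(value: str) -> float:
    """Hypothetical helper: parse a timeout such as '500ms' or '2s' into seconds."""
    value = value.strip().lower()
    if value.endswith("ms"):  # check "ms" before "s", since "500ms" also ends with "s"
        return float(value[:-2]) / 1000
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unsupported timeout format: {value!r}")


class ParseTimeoutTest(unittest.TestCase):
    # Each test name states one behavior; together they read as usage documentation.
    def test_milliseconds_are_converted_to_seconds(self):
        self.assertEqual(parse_timeout("500ms"), 0.5)

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_timeout(" 2s "), 2.0)

    def test_missing_unit_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_timeout("10")
```

You can run it with `python -m unittest`, pointing it at the file. A testing principles doc can link to a few such tests as representative examples of the project's conventions.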

Step 2 - Learn as you go

After a developer gets the big picture, they should ideally get all the information they need to tackle the specific task at hand.

With classic knowledge management solutions, this ideal is largely theoretical. If a relevant resource exists, it may be outdated and irrelevant (or even misleading). Even if an up-to-date document exists, the developer may not know about it and not think to look for it, or, worse, may spend time searching various resources without finding what they need.

With Swimm, the situation is fundamentally different: Swimm keeps your documents up to date thanks to our patented Auto-sync algorithm.

Moreover, Swimm's unique IDE integrations ensure that the relevant documentation finds the developer when it is needed.

For example, say a developer needs to debug a certain component that was documented at some point. While they browse the component's code in their IDE, Swimm's plugin informs them that a relevant doc exists, with no search required, and lets them read it right in the IDE.

Summary

When diving into a large codebase, there is much to learn.

The most effective way to be productive when onboarding to a new project is to follow a two-step methodology:

Step 1 - get a high-level overview of the codebase. This step should provide the developer with everything they need to know to get started, but not everything they may need to know in order to solve any possible task at hand. Swimm Playlists are a great way to achieve this.

Step 2 - Learn as you go, based on specific needs. Swimm's IDE integrations provide a unique way to get developers the information they need at the right time.


This document is automatically kept up to date using Swimm.