8 roles on a cross-functional DevOps team
Defining roles is the first step to implementing a squad model approach to DevOps. Here's an example of how to do it.
If you’re just getting started with a squad model, you may not be sure what roles you’ll need for your team to function smoothly. Our squad model in the IBM Digital Business Group is based on the Spotify Squad framework. At a high level, a squad is a small, cross-functional team that has the autonomy to deliver on its squad mission. The squad missions and cross-squad priorities are set at an organizational level. Then each squad decides "what to build, how to build it, and how to work together while building it."
We tweaked the Spotify squad model a bit to fit our own style of working. One key difference for us is that our squads are more long-lived than those at Spotify. Some squads in our org will last for a few months, and others will last for a couple of years. The squads that build and operate new services tend to be long-lived, while the mission-oriented squads that use existing services to build something new tend to be short-lived.
The key takeaway here is not that our way of assigning responsibilities to roles is the best or only way to do it. In fact, we have some variation between the squads in our own organization. The most important thing is that it’s crystal clear to everyone involved who is responsible for what. Perhaps you’ll see something on this list of responsibilities that you were missing before. So, let’s dive in with the first role.
HR manager/DevOps coach
First-line managers represent the business. They also act as mentors and coaches for their teams. Many of our managers were once tech leads, project managers, or offering managers who have chosen to take on additional people management responsibility. They handle the following tasks:
- HR issues
- Money (procurement, reimbursement, etc.)
- Development discipline (code review guidelines, test coverage, etc.)
- Release management discipline (support plans, disaster recovery plans, etc.)
- Compliance (business conduct, auditability, security, etc.)
- Corporate initiatives
- Red tape and paperwork
- Removing impediments (equipment needs, technology needs, unmet dependencies, etc.)
- Career planning & growth (areas for improvement, stretch assignments, reassignments)
- Agile process coaching
Squad leads are the primary stakeholders for each squad. They own what the squad works on.
Squad leads start with loosely defined epics and long-term business plans. From there, they define and prioritize the milestones, hills, and story backlog to direct the work of the team.
- Define and prioritize the story backlog:
  - Stories are in this format: "As a [who], I want to [what], for [reason]".
  - Stories must include a description, business value, and acceptance criteria.
  - Think about the wow/delight factor
  - We can include technical foundation stories as well
- Work with other squad leads to manage other squads' priorities
- Prioritize and plan activities needed for upcoming stories:
  - Design work
  - A/B testing
  - User research
  - Other customer feedback
- Schedule and run weekly story scrubs with the tech lead and scrum master. At these sessions, they ensure that stories have:
  - Well-defined acceptance criteria and assumptions
  - Well-defined dependencies and prerequisites
  - Designs and A/B tests as needed
- Determine which stories are ready to size and plan for a sprint
- Schedule and run a weekly story sizing with the developers. The stories should already be clear and ready to size thanks to the earlier scrub, so at the sizing meeting the team can discuss and quickly size the stories.
- As stories are completed, review them to make sure they meet the acceptance criteria
- Attend end-of-sprint playbacks
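The story rules above (the "As a [who], I want to [what], for [reason]" format plus a required description, business value, and acceptance criteria) can be sketched as a small data structure. This is only an illustration: the `Story` class and its field and method names are hypothetical and do not come from any tool our squads use.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    # All field names are hypothetical; they mirror the story rules listed above.
    who: str
    what: str
    reason: str
    description: str = ""
    business_value: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the story in the "As a [who], I want to [what], for [reason]" format."""
        return f"As a {self.who}, I want to {self.what}, for {self.reason}"

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        missing = []
        if not self.description.strip():
            missing.append("description")
        if not self.business_value.strip():
            missing.append("business_value")
        if not self.acceptance_criteria:
            missing.append("acceptance_criteria")
        return missing
```

A story scrub could then treat any story with a non-empty `missing_fields()` result as not yet ready to size.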
Technical leads decide how the squad's stories are implemented. First and foremost, they are contributing members of the squad, and we do expect them to write code, configure systems, and so on. They lead by example, and they have the final call on technical decisions within the squad.
Some of their other responsibilities include:
- Defining component boundaries and interfaces
- Helping the squad lead and scrum master define and refine stories
- Choosing the stories for the current sprint in consultation with the scrum master:
  - From the top of the backlog
  - Have no unsatisfied dependencies or blockers
- Adding tasks to the Kanban board/sprint plan
- Responsible for delivery of stories; stories must meet the acceptance criteria on time
- Responsible for technical debt and calling out technical foundation work that must be done
- Schedule and run story breakdowns with the developers. At the story breakdown meeting, the team breaks the stories down into tasks and puts the tasks into a logical, prioritized order.
- Schedule and run weekly learning/team-building exercises
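The rule for choosing sprint stories (take them from the top of the backlog, and only if they have no unsatisfied dependencies or blockers) can be sketched roughly as follows. The `BacklogStory` shape, the `pick_sprint_stories` function, and the idea of a story-point `capacity` are assumptions made for this example, not part of our process definition.

```python
from dataclasses import dataclass, field


@dataclass
class BacklogStory:
    # Hypothetical shape of a backlog entry; the article does not define one.
    name: str
    points: int
    depends_on: set[str] = field(default_factory=set)
    blocked: bool = False


def pick_sprint_stories(backlog, done, capacity):
    """Walk the backlog from the top and pick stories that are ready to start.

    backlog  -- stories in priority order, highest priority first
    done     -- set of names of stories that are already completed
    capacity -- story points the squad expects to finish (an assumption)
    """
    picked = []
    remaining = capacity
    for story in backlog:
        if story.blocked or not (story.depends_on <= done):
            continue  # skip stories with blockers or unsatisfied dependencies
        if story.points > remaining:
            continue  # does not fit in what is left of the sprint
        picked.append(story)
        remaining -= story.points
    return picked
```

In this sketch a story that depends on another story picked for the same sprint is skipped; a squad that allows such chains within a sprint would need to relax that check.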
Project managers help the HR manager and squad lead with coordination tasks. Not all squads have a dedicated project manager; when a squad does not have one, the work is usually divided between the HR manager and squad lead.
Project managers perform the following tasks:
- Create status reports
- Open requirements on other squads
- Get status updates from other squads
- Plan non-development activities such as translation and globalization
- Complete release management activities such as security reviews and open source reviews
- Provide financial approvals
- Manage equipment orders and purchase orders
- Track machines and office space usage
- Manage maintenance requests
- Any other activities from the project management office
Scrum masters keep the Scrum/Kanban/Scrum-ban process moving smoothly. We give the individual squads leeway to decide which process they use to manage their work within the team, as long as they use Epics and Stories to communicate dependencies across teams. Squads with dedicated project managers often give some or all of these responsibilities to the project manager; other squads assign this work to the technical lead. Many squads have rotational scrum masters, so everyone learns more about agile software development methodologies. Often, the developer on call will be the scrum master for the week.
Scrum masters do the following tasks:
- Run all daily standups
- Keep them short and focused on "what I did yesterday, what I'll do today, any blockers"
- Move discussions to the post-scrum parking lot as needed
- Triage defects to determine which ones need to be worked on now vs. later
- Add important/urgent defects to the Kanban board or sprint plan to be fixed
- Investigate deployment errors and get them fixed
- Present during the sprint playbacks
- Record and handle impediments
- Track and follow up on external defects
- Follow up on blocked defects
- Schedule and run weekly team-building exercises
- Run weekly retrospectives and record retrospective results. The retrospective asks team members to write down and discuss what went well or helped them get work done, what went poorly or slowed things down, and what they can do better in the future.
All squad members follow some general guidelines to manage their work:
- Implement the tasks on the Kanban board/sprint plan.
- Defects have higher priority than new tasks
- Pick up a new top-priority task when you’re ready. You're ready when:
  - You've reviewed your own code
  - Unit tests and functional tests are in place
  - Your code is up for review, in a pull request
  - You've asked the squad to review your changes
  - You’ve reviewed all open pull requests for your squad
  - You've addressed outstanding review comments on your changes
  - You have no more than one other task in the review process
- Prefer tasks you don’t know how to implement, so you’re always learning
- Tasks can be implemented alone, in pairs, or even in a mob
- Educate yourself on technology and frameworks needed to implement the task
- Figure out what you can on your own, but don't be afraid to ask for help
- For teams that use Scrum: You may work on what you like, once the stories for the sprint are completed
- For teams that use Kanban: You may allocate 20% of your time to work on side projects, as long as they benefit our company or our clients in some way
- Experiment with new technologies and share what you learn with the team
- Some squads also run their own operational infrastructure. Those squads will have an on-call rotation. Developers join the on-call rotation once the technical lead feels that they are ready and able to handle an outage. We like to mention this in the sprint playback and cheer for the developers who have graduated to this level.
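The "you're ready when" conditions in the guidelines above amount to a checklist, which can be sketched as a simple function. The `WorkState` fields and the `unmet_conditions` function are hypothetical names chosen for this example; in practice the inputs would come from whatever tracker and code review tool a squad uses.

```python
from dataclasses import dataclass


@dataclass
class WorkState:
    # Hypothetical snapshot of one developer's current task and review queue.
    self_reviewed: bool
    tests_in_place: bool
    pull_request_open: bool
    review_requested: bool
    unreviewed_squad_prs: int
    unaddressed_comments: int
    other_tasks_in_review: int


def unmet_conditions(state: WorkState) -> list[str]:
    """List the readiness conditions that are not yet satisfied."""
    checks = [
        (state.self_reviewed, "review your own code"),
        (state.tests_in_place, "add unit and functional tests"),
        (state.pull_request_open, "put your code up in a pull request"),
        (state.review_requested, "ask the squad to review your changes"),
        (state.unreviewed_squad_prs == 0, "review open squad pull requests"),
        (state.unaddressed_comments == 0, "address outstanding review comments"),
        (state.other_tasks_in_review <= 1, "finish tasks already in review"),
    ]
    return [todo for ok, todo in checks if not ok]
```

A developer is ready to pick up the next top-priority task when `unmet_conditions` returns an empty list.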
The GitHub committers or GitLab +2’s are guardians of quality. They are the people who have the right to merge code into the master branch. This is important because in a continuous delivery system, merging code into the master branch triggers a deployment to production. Therefore, this is a position of considerable responsibility.
Committer rights are earned by being an excellent code reviewer, and it’s up to the current maintainers/committers of each repository to nominate new committers. We have a strict code review process, based on the OpenStack and GitHub contribution models. All code changes, whether for defect fixes or new features, must be reviewed and approved by at least two people other than the originator, and one of those must be a committer.
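The approval rule described above (at least two approvers other than the originator, at least one of them a committer) can be expressed as a short check. This is a minimal sketch with a hypothetical `can_merge` function; it is not the configuration of any particular code hosting tool.

```python
def can_merge(author, approvers, committers):
    """Return True when a change has enough independent approvals to merge.

    author     -- the person who originated the change
    approvers  -- people who approved the change
    committers -- people with merge rights on the repository
    """
    # Ignore self-approval: reviewers must be people other than the originator.
    independent = set(approvers) - {author}
    has_two_reviewers = len(independent) >= 2
    has_committer = bool(independent & set(committers))
    return has_two_reviewers and has_committer
```

Note that under this reading a committer's own change still needs approval from a different committer, because the author's approval is not counted.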
Committer rights can be lost if a developer habitually acts in an irresponsible manner, or, more often, if a developer has stopped working in a code base long enough to forget how to maintain it.
Usually, our committers are the same people who are on-call to support the application 24/7. That said, all squad members help fix critical problems in production, not just the committers.
Test and operations squad
Why, you might ask, would an organization that expects squads to run their own operations have a test and operations squad? The short answer is that it’s still helpful to have some true experts in test and operations who educate new teams, evaluate new tools, and share best practices.
Individual squads own their own unit tests and functional tests. Code coverage reports are required, and we aim for 100%. Low coverage numbers are a risk to system and integration testing. If a squad is reporting low test coverage, we simply ask them to plan to increase their test coverage every sprint until they have full test coverage.
The testers in our test and operations squad write automated tests for things like:
- End-to-end, cross-component flows
- Multi-browser support
- Globalization and localization across components
- Performance, scalability, security, etc.
- Automated tests can fail the build/prevent a deploy
- Our testers also do some creative manual and free-form testing
These experts help other teams to learn how to use new testing tools. They do this through documentation, training sessions, and pair programming.
The operations experts in our Test and Operations Squad teach others how to set up and tune:
- Production monitors
- On-call rotations
- Log collection and analysis
- Network administration
- Content delivery networks
This team also maintains operational dashboards that help us visualize the health of the system across components. Finally, this team sets up frameworks for running experiments (A/B tests and so on).
There you have it—the IBM Digital Business Group’s squad model in a nutshell. We’ve been doing this with about 80-90 squads for three years now. Hopefully, you’ve picked up a few best practices that have helped us along the way.
I would like to thank Richard Gebhardt for his contributions to our IBM squad model roles and responsibilities.