TTC Streetcar Testers: A Lesson In Pair Programming

Unproductive Software Engineering Coops?

This past spring, I took a class called Software Quality Assurance. My professor, Mr. Laboon, had one primary focus throughout the semester: encourage us to create the highest quality software possible. One of the techniques that I was introduced to was the idea of “pair programming,” or in other words, having two programmers sit down in front of one machine and program together to solve a problem. I was intrigued by this idea, and throughout the class, I had a handful of opportunities to take that approach to complete the assignments.

After spring classes ended, I quickly transitioned into my second co-op rotation with Bridge Fusion Systems. Along with myself as a returning co-op, two new co-ops joined the Bridge Fusion Systems team: Erin Welling, a second rotation Electrical Engineering Co-op, and Nick Wilke, a first rotation Computer Science Co-op.

Bridge Fusion Systems specializes in embedded systems, which requires a combination of knowledge in both hardware (e.g. electrical circuits) and software (e.g. programming in C/C++ and the paradigms involved with these kinds of applications). Having gone through my first co-op rotation here at Bridge Fusion Systems, I had first-hand experience with the questions and struggles of embedded systems: How do I flip a GPIO pin to high or low? What does pull-up / pull-down mean? Why do embedded programs run in a main while(1) loop? Why are there watchdogs within the system? Why are there so many state machines?
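To give a feel for three of those questions (the main loop, the watchdog, and the state machine), here is a minimal host-side sketch. It is not code from any Bridge Fusion Systems project: every name in it (blinker_t, watchdog_t, superloop_iteration, the tick constants) is hypothetical, and the watchdog is simulated with a counter, whereas a real one is a hardware timer that resets the processor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-state machine that toggles an LED every few ticks. */
typedef enum { LED_OFF, LED_ON } led_state_t;

typedef struct {
    led_state_t state;
    uint32_t ticks_in_state;
} blinker_t;

#define BLINK_PERIOD_TICKS     3u  /* assumed value, for illustration */
#define WATCHDOG_TIMEOUT_TICKS 5u  /* assumed value, for illustration */

/* Simulated watchdog: counts ticks since the last kick.
 * Real hardware would reset the processor instead of setting a flag. */
typedef struct {
    uint32_t ticks_since_kick;
    bool expired;
} watchdog_t;

static void watchdog_kick(watchdog_t *wd)
{
    wd->ticks_since_kick = 0;
}

static void watchdog_tick(watchdog_t *wd)
{
    wd->ticks_since_kick++;
    if (wd->ticks_since_kick >= WATCHDOG_TIMEOUT_TICKS) {
        wd->expired = true;
    }
}

/* Advance the state machine by one tick without ever blocking. */
static void blinker_step(blinker_t *b)
{
    b->ticks_in_state++;
    if (b->ticks_in_state >= BLINK_PERIOD_TICKS) {
        b->state = (b->state == LED_OFF) ? LED_ON : LED_OFF;
        b->ticks_in_state = 0;
    }
}

/* One pass through the body of the main while(1) loop: every task gets
 * a short turn, then the watchdog is kicked to show the loop is alive. */
static void superloop_iteration(blinker_t *b, watchdog_t *wd)
{
    blinker_step(b);
    watchdog_kick(wd);
}
```

On a real target, main() would call superloop_iteration() inside while (1). If any task hangs, the kick never happens and the watchdog resets the system, which is one reason tasks are written as state machines that return quickly instead of waiting in place.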

Putting myself back into the mindset of embedded after a long semester of Ruby programming, I was reminded of these struggles. Additionally, with Nick being a first-rotation co-op with no prior embedded programming experience, this sounded like the perfect testing ground for a pair programming experience. I brought this idea up to Andy during a one-on-one, and he was thrilled with the idea. Within a few days, Nick and I became a programming pair.

The project that we were assigned to work on was the Toronto Transit Commission (TTC) Tester units. In Toronto, there is a streetcar system with a legacy switch control system fully implemented. Each streetcar emits a special signal that is picked up by loops in the track. Unfortunately, the devices that they use to ensure the streetcar is emitting a correct signal are no longer functional. Bridge Fusion Systems has been working for the past few months to build replacement hardware that is compatible with the legacy system already in place.

The Initial Benefits of Pair Programming

The first aspect of the testers that we worked on was the logging system. Since the TTC Testers are based on existing software and hardware, they use the same logging modules as the RTP-110. The RTP-110 writes log information to an SPI flash chip that supports page-granularity erases, but these chips have become obsolete. The replacement flash chips used in the TTC Testers only support sector-granularity erases. This was a known problem months before pair programming began, and for the most part, the underlying drivers had already been modified during my first co-op rotation. However, the algorithm in place did not support partial log dumps, a feature that the customer had previously requested. Because the drivers underneath the logging module had changed, the logging functionality of the entire system was broken when Nick joined the project.
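To make the page-versus-sector problem concrete, here is a minimal sketch of one common workaround: emulating a page erase on a chip that can only erase whole sectors. This is not the driver from the TTC Testers. The flash is a RAM array standing in for the chip, the sizes (256-byte pages, 4 KB sectors) are assumptions, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   256u                 /* assumed page size   */
#define SECTOR_SIZE 4096u                /* assumed sector size */
#define FLASH_SIZE  (4u * SECTOR_SIZE)

/* RAM stand-in for the SPI flash chip. */
static uint8_t flash_mem[FLASH_SIZE];

/* Erase the whole sector containing addr: every byte becomes 0xFF. */
static void flash_erase_sector(uint32_t addr)
{
    uint32_t base = addr - (addr % SECTOR_SIZE);
    memset(&flash_mem[base], 0xFF, SECTOR_SIZE);
}

/* Programming flash can only clear bits (1 -> 0), never set them,
 * which is why old data has to be erased before it can be replaced. */
static void flash_program(uint32_t addr, const uint8_t *data, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        flash_mem[addr + i] &= data[i];
    }
}

/* Emulated page erase: copy the sector to RAM, blank one page in the
 * copy, erase the real sector, then write the other pages back. */
static void flash_erase_page_emulated(uint32_t addr)
{
    static uint8_t sector_copy[SECTOR_SIZE];
    uint32_t sector_base = addr - (addr % SECTOR_SIZE);
    uint32_t page_offset = (addr % SECTOR_SIZE) - (addr % PAGE_SIZE);

    memcpy(sector_copy, &flash_mem[sector_base], SECTOR_SIZE);
    memset(&sector_copy[page_offset], 0xFF, PAGE_SIZE);
    flash_erase_sector(sector_base);
    flash_program(sector_base, sector_copy, SECTOR_SIZE);
}
```

The sketch also shows why a driver swap alone is not enough: the emulation costs a full sector of RAM, and a power loss between the erase and the write-back would lose every page in that sector. Trade-offs like these are why the logging algorithm on top of the driver may need to change as well.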

Nick and I, using existing code written by Andy and Elliott, led development of the algorithm itself. My skills came into use when we wanted to take this algorithm and split it up to fit an embedded paradigm that would work in the existing architecture of the TTC Tester codebase.

What was immediately refreshing to me was having another set of eyes looking at my code, and a quick pair at that, since Nick is able to read code fairly quickly (at least faster than I am!). Once we both jumped into working on the logging system, we realized how confusing this aspect of the program was. Originally there were two modules, named FormatDataLogging.c and DataLogging.c, both of which had functions that produced similar or identical outcomes. Sometimes, a function in DataLogging.c would call a FormatDataLogging.c function and vice versa. There was no coherent organization to these modules, and this made the code more difficult to understand. With my newly found love for refactoring, Nick and I reorganized the code into three distinct logging modules: BufferDataLogging.c, responsible for log entries coming into the battery-backed buffers of the system; FormatDataLogging.c, which previously held the responsibility of BufferDataLogging.c but now strictly deals with the formatting of log entries, down to the byte granularity; and DataLogging.c, which actually deals with permanently writing the log entry bytes to NVData / SPIFlash.
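The three-way split can be sketched as follows. Only the module names come from the project. The function names, the LogEntry layout, the buffer sizes, and the little-endian byte order are all hypothetical, and both the battery-backed buffer and the flash are plain arrays here so the sketch runs on a desktop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ---- BufferDataLogging: queue entries as they arrive ---- */
typedef struct {
    uint32_t timestamp;
    uint16_t event_id;
} LogEntry;

#define BUFFER_CAPACITY 8u
static LogEntry buffer_entries[BUFFER_CAPACITY]; /* stand-in for battery-backed RAM */
static size_t buffer_head;
static size_t buffer_count;

static int BufferDataLogging_Push(LogEntry entry)
{
    if (buffer_count == BUFFER_CAPACITY) {
        return -1; /* buffer full */
    }
    buffer_entries[(buffer_head + buffer_count) % BUFFER_CAPACITY] = entry;
    buffer_count++;
    return 0;
}

static int BufferDataLogging_Pop(LogEntry *out)
{
    if (buffer_count == 0) {
        return -1; /* nothing queued */
    }
    *out = buffer_entries[buffer_head];
    buffer_head = (buffer_head + 1) % BUFFER_CAPACITY;
    buffer_count--;
    return 0;
}

/* ---- FormatDataLogging: turn one entry into bytes, nothing else ---- */
#define FORMAT_ENTRY_BYTES 6u

static void FormatDataLogging_Serialize(const LogEntry *entry,
                                        uint8_t out[FORMAT_ENTRY_BYTES])
{
    out[0] = (uint8_t)(entry->timestamp);
    out[1] = (uint8_t)(entry->timestamp >> 8);
    out[2] = (uint8_t)(entry->timestamp >> 16);
    out[3] = (uint8_t)(entry->timestamp >> 24);
    out[4] = (uint8_t)(entry->event_id);
    out[5] = (uint8_t)(entry->event_id >> 8);
}

/* ---- DataLogging: the only module that touches permanent storage ---- */
static uint8_t storage[64]; /* stand-in for NVData / SPI flash */
static size_t storage_used;

static int DataLogging_Write(const uint8_t *bytes, size_t len)
{
    if (storage_used + len > sizeof storage) {
        return -1; /* out of space */
    }
    memcpy(&storage[storage_used], bytes, len);
    storage_used += len;
    return 0;
}

/* Drain the buffer through the formatter into storage.
 * Returns the number of entries written. */
static size_t DataLogging_FlushBuffer(void)
{
    LogEntry entry;
    uint8_t bytes[FORMAT_ENTRY_BYTES];
    size_t written = 0;

    /* Check for room before popping so no entry is lost when storage is full. */
    while (storage_used + FORMAT_ENTRY_BYTES <= sizeof storage &&
           BufferDataLogging_Pop(&entry) == 0) {
        FormatDataLogging_Serialize(&entry, bytes);
        (void)DataLogging_Write(bytes, sizeof bytes);
        written++;
    }
    return written;
}
```

Calls flow in one direction only (buffer, then format, then storage), and each function name carries its module as a prefix, so it is clear at the call site where a function lives.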

For Nick, the restructuring of the logging modules served as a good introduction to the architecture of the TTC Tester code. One of the first questions I recall from this experience was how we actually test and see whether or not our code is doing what we think it is. In embedded, many processor components interact in real time, in such a way that pausing the debugger and viewing the current state of the code doesn’t reflect what may actually be happening. DMA, for example, will continue to run even if the processor halts.
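One common way to observe a running system without halting it is to record events into a small RAM buffer and read the buffer afterwards. The sketch below illustrates that idea only. It is not taken from the TTC Tester code, and TraceRecord, Trace_Record, Trace_Count, and the buffer size are hypothetical.

```c
#include <stdint.h>

#define TRACE_CAPACITY 16u /* assumed size; oldest records are overwritten */

typedef struct {
    uint32_t tick;   /* when the event happened   */
    uint16_t event;  /* application-defined code  */
} TraceRecord;

static TraceRecord trace_log[TRACE_CAPACITY];
static uint32_t trace_next; /* total number of records ever written */

/* Cheap enough to call from time-critical code: one store, one increment. */
static void Trace_Record(uint32_t tick, uint16_t event)
{
    TraceRecord record = { tick, event };
    trace_log[trace_next % TRACE_CAPACITY] = record;
    trace_next++;
}

/* Number of valid records currently held in the buffer. */
static uint32_t Trace_Count(void)
{
    return (trace_next < TRACE_CAPACITY) ? trace_next : TRACE_CAPACITY;
}
```

The buffer can be dumped over a serial menu or inspected in the debugger after the interesting moment has passed, so peripherals such as DMA are never left running against a halted processor while you look.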

The takeaway: Once you’re comfortable in a project, it becomes a lot harder to see the flaws in that project’s organization. Bringing an outsider into the project, especially as a pair programmer, helps reveal things that a person familiar with the project would overlook.

How A Person Codes

One of the things that I noticed from the beginning was a fundamental difference in priority when it came to how Nick and I write code: Nick is very good at whipping code up quickly to get something done. He’s able to grasp the elements of the code quickly and figure out how everything works together immediately, albeit with some early struggles with embedded program structure.

However, I focus on the “fit” of the code that I’m writing, along with the “fit” of the code that’s already there. My philosophy when writing code is simple: code shouldn’t be a struggle to read. I should be able to look at it and understand what is going on, at least at a basic level. The original RTP-110 code follows a coherent code standard, but there were elements of the code where the complexity of the problem obscured the function of the code. When I’m working in an embedded systems codebase, I tend to refactor as I’m going along, just to make the naming of variables and functions clearer. For example, I’ll name functions such that they include the module name so I know where within the project they come from.

After the first few weeks, Nick became the “what” guy: Here’s what we have to do, and here’s how we could do it. I was the “why” guy: thinking more about the consequences of code changes, how our code changes can affect anything else in the program, and generally cleaning up or refactoring the code to make it easier to read and modify. When it came to making the adjustments for the logging system as a whole, this dynamic worked well.

The takeaway: Pairing programmers with different styles and ideologies seems to work well. Knowing how to write the code and understanding the consequences of that code are both important.

Knowing When To Pair

The logging system in the TTC Testers was the primary reason that pair programming began within this project. However, once that component of the project was complete, there were other changes that needed to be made, and some of them arguably didn’t need the same amount of attention or time. For these tasks, traditional pair programming wasn’t the best approach. Instead, Nick and I would split up the work, finish our parts of the task on our own hardware, and then pair program the merged changes on one machine, just to make sure none of our changes would break the program.

Generally speaking, this was a good approach when it was needed, but it requires people who can recognize when pair programming isn’t the best fit for a particular scenario. For example, the power manager and serial menu code were two parts of the project where we could easily split up tasks without stomping on each other. When we wanted to make sure that our changes would merge together without conflict, we went back to pair programming. Perhaps this is also where my fatal flaw came into play: I have a strong desire to make code clean and readable, which means a lot of voluntary refactoring on my part (I swear, Nick thinks refactoring is my new favorite word).

The other good thing about having the pair programming option hovering above our heads was that it kept us on track, even when Nick and I were having rough days focusing. There were days when one or both of us couldn’t quite focus on the work in front of us and might have spent a little too much time goofing off, which Nick refers to as “bringing the ruckus.” But at the same time, as a pair we were also able to police each other and keep each other on track. In a way, whether or not we were directly pair programming, just having a second person to keep me accountable for my work helped make me a bit more efficient in the tasks at hand.

The takeaway: Not every situation is a pair programming situation, but having the option and knowing when to use it is a valuable tool. Simply being in a pair programming environment also helps keep both people on track for the tasks at hand.

Someone’s causing ruckus (Hint: it’s me!). Would this be considered a probe attack?

The Hardware Could Be Wrong!

Most software engineering co-ops don’t really like touching hardware; I’m a weird exception to that rule myself. In software land, we assume that the hardware is there and functional for us, and we need to do everything we can from the software side to make things work. We rarely like to blame hardware. But in reality, hardware isn’t always right, and after a few days of struggling with power manager code, we learned this the hard way.

Before we received actual tester board hardware, we were running our firmware on older RTP-110 control boards. Since these boards didn’t have power manager hardware built into them, we used an STMicroelectronics Nucleo evaluation board to emulate the power manager. Additionally, since we were updating code on both the power manager and the control board, we didn’t want our OpenOCD debugger to confuse which device we actually wanted to debug, so we opted to power the Nucleo board from a USB wall outlet power brick. Except there was a weird problem: the Nucleo board would only run code on the STM32 processor if it was plugged into a USB port on a computer.

Given Nick’s and my lack of hardware knowledge, we deferred the problem to Andy, who is a bit more knowledgeable in hardware land. Andy came to the initial conclusion that we “probably” had a hardware problem, and thus, we should focus our programming efforts on trying to figure out why this was acting wonky. Additionally, it was the last functioning Nucleo board that we had in the office at the time, so there was no other hardware that we could test or compare against.

The jumper above the USB port was missing!

Nick and I tried adjusting the configurations for the board, but to no avail. After spending a day or two attempting to debug this behavior, Sean eventually made his way around and was able to bring in another Nucleo board to try out. On his Nucleo board, however, the code would run just fine when the board was powered through a USB wall outlet power brick.

As it turns out, there was a very minor hardware difference between the two boards, in the form of a missing jumper.

The takeaway: Pair programming can’t solve hardware problems.

Hardware that *isn’t* safe for Software Engineers!

This was probably one of the points where Nick struggled the most. Unfortunately, Nick didn’t have the best track record for keeping the micros alive. So much so that, in the office, we created a counter for the days since our last dead micro.

You’ll notice that in this photo, the counter is in the double digits, only because we finally finished the project.

There were various reasons why we fried micros, mostly our own carelessness (e.g. not being careful with the mess of wires we had on our desks and accidentally shorting 12V or -18V to 3.3V). Nonetheless, the careless mistakes that we made cost us, especially when it came down to production time, because at points, Nick and I were relying on each other’s hardware.

In one particular scenario, blowing up our own hardware actually helped us find a relatively serious bug that could potentially cause an end user to blow up a micro. In other words, blowing up hardware on our own, from what Nick and I believed to be our own mistakes, led us to an actual hardware problem that could have sporadically and catastrophically destroyed hardware.

Nick seemed to be blowing up one particular board repeatedly. Andy originally blamed this on something that Nick must have been unaware of doing; that was, until Sean was using one of the production units for testing transmitter wands and had the same failure. In this case, Nick was nowhere near the hardware when it happened. More investigation by Sean uncovered the scenario that happened occasionally and would have happened to the end user. A fix was implemented in hardware to prevent damage to the unit, and a secondary fix was additionally added in software to help make the problem even less likely. As it turns out, that particular failure wasn’t really Nick’s fault!

Likewise, having a pair that really didn’t understand hardware made it even more worthwhile to focus on making our hardware “safe” for software engineers, as best as we could. In other words, don’t create a setup that could cause a disaster: keep track of your stray wires, and make sure they aren’t in a position to short higher voltages into 3.3V. Setups like these kept us from destroying hardware left and right.

The takeaway: Take some time to sit down and know what the heck you’re doing, hardware-wise, so that you don’t blow up hardware. If it means adjusting the hardware so that it’s harder to blow up, do it.

Putting Trust in the Overhead of Pair Programming

In our situation, Nick and I were in the perfect position to pair program. Nick previously had experience building software, and I had prior experience in embedded from my last co-op rotation. Compared to the rest of our team at Bridge Fusion Systems, Nick and I were really the closest in terms of time and experience within the field of Computer Science generally. I think that helped me anticipate the things that Nick wouldn’t have known walking into the world of embedded. After all, I was in Nick’s shoes less than a year ago, not really knowing my own footing in embedded land.

I’m still shocked that we spent company resources on printing out an XKCD meme, going to the effort to tape it to the wall, then grabbing the Testers and posing in front of the meme in the same pose as those in the picture, and taking a photo. Actually, it’s glorious and is a perfect example of the Bridge Fusion Systems culture… Eh, we have fun!

I think one reason many companies tend to steer away from pair programming is the idea of losing efficiency in the amount of code that can be produced, or of adding unnecessary overhead to the development process. Through Nick’s and my experience, I’ve developed a bit of a counterargument to this thought. There were many times when Nick and I got into discussions, or even arguments, over what the best approach was for implementing a feature or segment of code, and from the perspective of an outsider, that looks like wasted time. But in reality, it’s not: even if our discussions about the best implementation seem pointless in the moment, learning the different methodologies and their consequences helps us as embedded programmers in the long run. I constantly have to remind myself when working in embedded that the entirety of the program is tied together so closely that a change I make in one portion of the code can have drastic effects on any other part of the code. How code is implemented really matters in this kind of programming environment, and sometimes, the “overhead cost” of pair programming can really be worth it in the long run. I’m glad I had the opportunity to try this programming approach in an embedded environment at Bridge Fusion.

The takeaway: Pair program!