The case of mysterious BGP session resets caused by a malformed OTC attribute
We recently ran into an unusual issue: BGP sessions were resetting with no obvious explanation shortly after routers received new routes. To understand what was happening, we collected and analyzed the BGP UPDATE messages that consistently triggered the errors.
It quickly became clear that all problematic routes had one thing in common: they carried a malformed OTC attribute. The real root cause, however, was not just the presence of a bad attribute, but the way different routers reacted to it. Before looking at that behavior, it helps to recap what OTC is and how it is supposed to work.
What OTC is and how it works
OTC (BGP Only to Customer) is one of the two core components of the BGP roles framework defined in RFC9234. The framework, co-developed by Qrator engineers, is designed to detect and prevent route leaks.
OTC is an optional transitive path attribute of the BGP UPDATE message (type code 35) with a standard-defined length of 32 bits (4 octets). Its value is an ASN that is designated as the route origin for the downstream portion of the route. OTC works together with the BGP Role mechanism, which defines the relationship between two autonomous systems when they establish a BGP session (Provider–Customer, Peer–Peer, Customer–Provider).
For inbound routes, the receiving AS checks the OTC attribute against the relationship established via the BGP Role mechanism. A route that carries OTC and arrives from a customer is treated as a leak and rejected. A route that carries OTC and arrives from a peer is rejected unless the OTC value matches that peer's ASN. Together, these checks prevent leaked routes from being accepted.
For outbound routes, the logic is similar. If an AS is about to advertise a route to its peers or providers (that is, to non-customers) and the route already carries an OTC attribute set by another AS, the route is not advertised, which avoids creating a route leak.
In short, BGP Role and OTC let participating networks verify that routes propagate only in the intended direction, and detect cases where they do not — that is, route leaks.
RFC9234 also allows OTC to be added to otherwise legitimate routes where it is missing. This can be done both on egress by a transit AS that forwards the route, which writes its own ASN into OTC, and on ingress, by the receiving AS, which records the ASN of the neighbor that advertised the route. As a result, route leaks can be prevented even when the original route originators do not support RFC9234, effectively creating a form of collective immunity.
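The ingress and egress rules above can be sketched in a few lines of Python. This is a simplified illustration as we read the RFC9234 procedures, not any router's actual implementation: the `Route` type, the role constants, and both function names are hypothetical, and route servers are left out. Note that for a customer neighbor, any OTC at all marks the route as a leak.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Role of the *neighbor* as seen from the local AS (hypothetical constants;
# route server roles are omitted for brevity).
PROVIDER, CUSTOMER, PEER = "provider", "customer", "peer"


@dataclass(frozen=True)
class Route:
    prefix: str
    otc: Optional[int] = None  # ASN carried in the OTC attribute, if present


def process_ingress(route: Route, neighbor_role: str, neighbor_asn: int) -> Optional[Route]:
    """Return the route to accept (OTC possibly added), or None if it is a leak."""
    if route.otc is not None:
        if neighbor_role == CUSTOMER:
            return None  # a customer must never send a route marked with OTC
        if neighbor_role == PEER and route.otc != neighbor_asn:
            return None  # from a peer, OTC must equal the peer's own ASN
        return route
    if neighbor_role in (PROVIDER, PEER):
        # OTC is missing: record the ASN of the neighbor that advertised the route.
        return replace(route, otc=neighbor_asn)
    return route


def process_egress(route: Route, neighbor_role: str, local_asn: int) -> Optional[Route]:
    """Return the route to advertise, or None if it must not be sent."""
    if route.otc is not None:
        if neighbor_role in (PROVIDER, PEER):
            return None  # already marked "only to customer": do not send upward or sideways
        return route
    if neighbor_role in (CUSTOMER, PEER):
        # OTC is missing: write the local ASN before advertising.
        return replace(route, otc=local_asn)
    return route


# Example with documentation-range values (assumed, not from the incident).
leaked = Route("192.0.2.0/24", otc=65002)
print(process_ingress(leaked, PEER, neighbor_asn=65001))      # rejected: None
print(process_egress(Route("192.0.2.0/24"), CUSTOMER, 64500))  # OTC set to 64500
```

The second call shows the "collective immunity" effect: a route that arrived without OTC still leaves the AS marked, so downstream networks can detect a later leak.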
What the error messages looked like
Returning to our case, the immediate trigger for the BGP session resets was that some autonomous systems were sending BGP UPDATE messages in which the Extended Length flag of the OTC attribute was set incorrectly.
To understand the root cause of the failure, let’s examine a hex dump from one of the UPDATE messages that triggered a session reset. In this case, the fragment containing attribute type 35 (OTC) was: “f0fa 0423 0000 4cfe”
Because OTC is relatively new, many routers may not recognize it, but they still have to parse it. Let's walk through that parsing to determine how much space the attribute is supposed to occupy and then extract its content. The dump shows two-byte words with the bytes swapped, so within each word we read the right byte first and then the left:
- 0xfa — part of the previous attribute (Community), which we skip.
- 0xf0 — 0b11110000, attribute flags: Optional=1, Transitive=1, Partial=1, Extended Length=1.
- 0x23 — 35, attribute type, which corresponds to OTC.
The attribute length is determined by two elements:
- The Extended Length (EL) flag. If EL=0, the length field is 1 byte; if EL=1, it is 2 bytes.
- The length value itself.
When EL=1, the next two bytes are interpreted as the length. In this case, the next byte is:
- 0x04 — 4, would normally indicate the expected attribute length. However, because the Extended Length flag was set incorrectly, the parser also treated the following byte (0x00) as part of the length field. As a result, the length was read as 0x0400, that is, 1024 bytes, which triggered an error.
- 0x00004cfe — 0xfe4c or 65100, the attribute value (a private ASN in this example).
In other words, the issue appears to come from an implementation error in relatively new functionality: the EL flag and the actual encoding of the length field did not match.
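To make the walkthrough reproducible, here is a minimal Python sketch that decodes the same fragment. The helper names (`dump_to_bytes`, `decode_flags`, `parse_attribute`) are our own, and the `honor_el` switch exists only to contrast what a standard parser reads with what the sender apparently intended; it is not a feature of any real BGP implementation.

```python
import struct

# Fragment from the UPDATE message, printed as byte-swapped 16-bit words.
DUMP = "f0fa 0423 0000 4cfe"

# Attribute flag bits as defined in the base BGP specification.
OPTIONAL, TRANSITIVE, PARTIAL, EXTENDED_LENGTH = 0x80, 0x40, 0x20, 0x10


def dump_to_bytes(dump: str) -> bytes:
    """Undo the per-word byte swap: the word 'f0fa' becomes bytes fa f0."""
    out = bytearray()
    for word in dump.split():
        out += bytes.fromhex(word)[::-1]
    return bytes(out)


def decode_flags(flags: int) -> dict:
    """Split the flags octet into its four named bits."""
    return {
        "optional": bool(flags & OPTIONAL),
        "transitive": bool(flags & TRANSITIVE),
        "partial": bool(flags & PARTIAL),
        "extended_length": bool(flags & EXTENDED_LENGTH),
    }


def parse_attribute(data: bytes, honor_el: bool = True):
    """Parse one path attribute; return (flags, type_code, length, value)."""
    flags, type_code = data[0], data[1]
    if honor_el and flags & EXTENDED_LENGTH:
        # EL=1: the length field is two octets, network byte order.
        (length,) = struct.unpack_from("!H", data, 2)
        value = data[4:4 + length]
    else:
        # EL=0 (or EL ignored): the length field is a single octet.
        length = data[2]
        value = data[3:3 + length]
    return flags, type_code, length, value


wire = dump_to_bytes(DUMP)
attr = wire[1:]  # skip 0xfa, the tail of the previous attribute

flags, type_code, length, value = parse_attribute(attr)
print(decode_flags(flags))
print(type_code, length)  # 35 1024: what a parser that trusts EL=1 sees

_, _, intended_len, intended_val = parse_attribute(attr, honor_el=False)
print(intended_len, int.from_bytes(intended_val, "big"))  # 4 65100
```

With the flag honored, the declared length (1024) far exceeds the three bytes that actually remain in the fragment, which is what makes the attribute malformed.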
Why a malformed OTC attribute caused BGP session resets
Once these malformed UPDATE messages began circulating between routers and autonomous systems, different implementations handled them in different ways:
- Some routers probably dropped the invalid route and continued operating normally. We say “probably” because such behavior is fundamentally unobservable: once the route is discarded, it is neither propagated nor visible downstream.
- Other routers attempted to correct the issue by rewriting the attribute’s EL flag to the expected value.
- Others ignored the inconsistency and forwarded the route with the malformed attribute intact.
- Finally, some routers reset the BGP session upon receiving the malformed attribute.
This last behavior is exactly what we observed in our case. But why were these routers reacting this way? RFC9234 explicitly defines how a malformed OTC attribute must be handled:
“The OTC Attribute is considered malformed if the length value is not 4. An UPDATE message with a malformed OTC Attribute SHALL be handled using the approach of ‘treat-as-withdraw’ [RFC7606].”
To put it simply, the standard requires that routes carrying a malformed OTC attribute be rejected. Unfortunately, as we observed, not all routers followed this requirement: instead of discarding the route, some forwarded the malformed UPDATE further or even reset the BGP session.
The latter behavior was clearly the result of routers acting not under the treat-as-withdraw procedure defined in RFC7606, but under the older rule from the base BGP specification, which RFC7606 itself summarizes as follows:
“According to the base BGP specification [RFC4271], a BGP speaker that receives an UPDATE message containing a malformed attribute is required to reset the session over which the offending attribute was received.”
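The difference between the two behaviors can be illustrated with a short Python sketch. It models only the OTC length check quoted above; the `Action` enum, the function name, and the `supports_rfc7606` switch are hypothetical, and real error handling in RFC7606 covers many more cases than this.

```python
from enum import Enum

OTC_TYPE_CODE = 35
OTC_EXPECTED_LENGTH = 4  # RFC9234: any other length means the attribute is malformed


class Action(Enum):
    ACCEPT = "accept"
    TREAT_AS_WITHDRAW = "treat-as-withdraw"  # drop the routes, keep the session up
    SESSION_RESET = "session-reset"          # legacy behavior from the base specification


def handle_otc_length(length: int, supports_rfc7606: bool = True) -> Action:
    """Decide what to do with an OTC attribute based on its declared length."""
    if length == OTC_EXPECTED_LENGTH:
        return Action.ACCEPT
    if supports_rfc7606:
        # What RFC9234 requires: treat the affected routes as withdrawn.
        return Action.TREAT_AS_WITHDRAW
    # A speaker that falls back to the older rule tears down the session.
    return Action.SESSION_RESET


# The malformed attribute from our dump declared a length of 1024.
print(handle_otc_length(1024))                          # Action.TREAT_AS_WITHDRAW
print(handle_otc_length(1024, supports_rfc7606=False))  # Action.SESSION_RESET
```

The same input leads to two very different operational outcomes: one speaker silently drops a route, the other loses every route learned over that session.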
Conclusions
Bugs and edge cases are inevitable when new standards are introduced, but they must be identified and corrected in production software. This OTC incident shows that even when an RFC provides explicit guidance, real-world implementations can diverge significantly. In practice, this leads to unpredictable outcomes, from the silent propagation of malformed routes to seemingly inexplicable BGP session resets.
The correct approach in such cases is consistent adherence to standards, specifically the RFC9234 requirement to apply the RFC7606 treat-as-withdraw procedure to messages carrying malformed OTC attributes. The faster vendors and operators implement the specification correctly, the more resilient the global routing infrastructure becomes.
Support for the BGP roles framework and the OTC attribute has been available for some time in the FRR (FRRouting) and BIRD routing daemons. In September 2025, Juniper also added support for RFC9234, starting with Junos OS Release 25.2R1.