Google’s scientists lift lid on Effingo at Sigcomm 2024
Effingo is different in that it “has requirements and features uncommon in reported large-scale data transfer systems.” Rather than optimizing for transfer time, it optimizes for smooth bandwidth usage while controlling network costs by, for example, optimizing the copy tree to minimize the use of expensive links such as subsea cables.
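To make the copy-tree idea concrete, here is a minimal sketch of how a tree that avoids expensive links might be chosen. It is not Effingo's actual algorithm, which the paper describes in its own terms. It simply applies Prim's minimum spanning tree algorithm to a hypothetical cost table. The cluster names, the cost values, and the `min_cost_copy_tree` function are all assumptions made up for illustration.

```python
import heapq

def min_cost_copy_tree(source, link_costs):
    """Build a copy tree rooted at `source` using Prim's algorithm.

    `link_costs` maps an undirected (cluster_a, cluster_b) pair to a
    hypothetical relative cost. Returns (parent, total_cost), where
    `parent[c]` is the cluster that `c` receives its copy from.
    """
    # Build an undirected adjacency map from the cost table.
    neighbors = {}
    for (a, b), cost in link_costs.items():
        neighbors.setdefault(a, []).append((cost, b))
        neighbors.setdefault(b, []).append((cost, a))

    parent = {source: None}
    total = 0
    # Min-heap of candidate links leaving the tree: (cost, from, to).
    frontier = [(cost, source, dst) for cost, dst in neighbors.get(source, [])]
    heapq.heapify(frontier)

    while frontier:
        cost, src, dst = heapq.heappop(frontier)
        if dst in parent:
            continue  # already reached through a cheaper link
        parent[dst] = src
        total += cost
        for next_cost, nxt in neighbors.get(dst, []):
            if nxt not in parent:
                heapq.heappush(frontier, (next_cost, dst, nxt))
    return parent, total

# Hypothetical topology: costs of 10 or more stand in for subsea cables.
links = {
    ("us-east", "us-west"): 1,
    ("us-east", "eu-west"): 10,
    ("us-west", "eu-west"): 12,
    ("us-east", "eu-north"): 11,
    ("eu-west", "eu-north"): 1,
}
tree, cost = min_cost_copy_tree("us-east", links)
print(tree, cost)
```

In this toy topology the resulting tree crosses the ocean only once: eu-north receives its copy from eu-west over a cheap regional link rather than over a second subsea link from us-east.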
Its other design requirements included client isolation, which prevents transfers by one client affecting those of other clients; isolated failure domains restricting copies between two clusters from depending on a third cluster; data residency constraints that prohibit copies being made to any location not explicitly specified by the client; and data integrity checks to prevent data loss or corruption. The system also had to continue operating even when its dependencies were slow or temporarily unavailable.
The paper provides details of how Google achieved each of these goals, with a section on lessons learned chronicling Effingo’s evolution. It emphasizes, however, that Effingo is still a work in progress. The authors said that Google plans to improve CPU usage during cross-data center transfers, improve integration with resource management systems, and enhance the control loop to let it scale out transfers faster.
Nabeel Sherif, principal advisory director at Info-Tech Research Group, sees great value in the service. He said today, “while there might be considerations around cost and sustainability for such a resource- and network-intensive use case, the ability for organizations to greatly increase the scale and distance of their georedundancy means being able to achieve better user experiences as well as removing some of the limitations of making data accessible to applications that don’t sit very close by.”
This, he said, “can be a game changer in both the areas of business continuity, global reach for web applications, and many other types of collaborations.”