Meetups/Infra/2026-03-16
(Preamble:
= Meetup - Infra = https://www.noisebridge.net/wiki/Meetups/Infra https://www.noisebridge.net/wiki/Meetups/Infra/2026-..-.. )
2026-03-16m Meetups/Infra
(TODO summary)
Introductions
- [name] - [background]. [goals for meetup, or interests to explore]
- Loren - background: data & sre, exploring tail at scale, statistics
- Ellie - background, self hosting
- Jason - software engineer, SRE, devops.
- Ciara - self hosting.
- Amber - new, from Chicago, works with web3, blockchain, has a startup. Background in art direction
- Zacchae - interested in democratizing usage of computers to share resources, reduce the barrier to entry, know and trust someone who manages their own computer
- Chris - building a party calendar
- Derek - applying for jobs and procrastinating by writing an application that tracks my applying to jobs
- Dave - between jobs, infra dev
- Victor - nix sycophant :D
- Elan - intro to self hosting, added nix package manager
- David - CS student, infrastructure is new, inventory management, network infrastructure, installation, running cable. Starting AWS Terraform on open avenues, take, hardware refurb
- Tre - Working on games, pixel artist, solo dev. Infrastructure is cool, much gratitude.
- Heather - back to the bay area.
- Jet - optimizing a minecraft server, nix sycophant, nixophant
- Robbie - like the random things we talk about here
- Max - travelling around Southeast Asia, kind of looking for a job.
- Erik - likes infrastructure, where are things going to land since AI advent. Creating a giant Terraform project, then destroying it.
- Rachel - just using 3d printer
- Robert - interested in OSes, in customizing linux, into robotics + hw soon too
Lesson
Papers
Tail at Scale
https://research.google/pubs/the-tail-at-scale/
https://www.barroso.org/publications/TheTailAtScale.pdf
StrangeLoop talk on this work (Kathryn McKinley @ Google): https://www.thestrangeloop.com/2017/measuring-and-optimizing-tail-latency.html
- Papers similar to tail at scale?
- Setup and motivation
  - Written at Google by Jeff Dean and Luiz André Barroso. It is about the tail end of the distribution of response times.
  - Many parallel connections are required for the different components of one page (shopping, AI, graph).
  - What fraction of page views is affected? Latency before users bounce: 100-200 ms. Everyone is annoyed by latency.
  - If there are thousands of requests for a single page load then, counterintuitively, many users will experience painful latency whenever a slow request is on the critical path.
  - Redundant requests, plus other techniques and refinements for structuring requests, minimize latency.
- Reads vs. writes
  - There are approaches that handle the write pattern; most applications are not super sensitive to having the most up-to-date data.
  - Last one: most of the techniques only apply to reads; writes can lean on eventual consistency.
- MediaWiki example
  - Per the MediaWiki documentation, they have multiple writes, read replicas, and lots of caching. Editors freak out if they can't see their edits.
  - Proxy nodes include the revision ID; if the user just made an edit, the CDN knows the user needs to see the latest revision.
  - Write-ahead log? (Your changes haven't made it into the central database yet?) Disk flushing, caching for a while; if an editor is editing a page, changes are stored locally.
  - Create a mapping (master/slave, reader/follower, not necessarily trailing) using this header that the edge CDN receives.
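The fan-out point above can be checked with a little arithmetic. This is a rough sketch that assumes every backend request is independently slow with the same probability, which real systems only approximate; the function and parameter names are illustrative.

```python
def p_page_hits_tail(n_requests, p_slow_single):
    """Probability that at least one of n parallel requests is slow.

    Assumes requests are independent and identically distributed,
    which is a simplification.
    """
    return 1 - (1 - p_slow_single) ** n_requests


if __name__ == "__main__":
    # One request in a hundred is slow (99th percentile), page needs 100 requests.
    for n in (1, 10, 100):
        print(n, round(p_page_hits_tail(n, 0.01), 3))
```

With a 1% chance of a slow response per request, a page that waits on 100 requests hits the slow tail on roughly 63% of loads, which is the example the paper itself uses.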
TAG TEAM!
The approaches aim to tackle arbitrary causes of latency: network issues, sunlight through a window causing thermal throttling, whatever the cause.
Concept you should have in your head: within-request short-term adaptations.
Sending redundant requests can reduce latency across different latency distributions. Send one request; after a short delay, send a second. When one completes, send a cancel request for the other. This adds acceptable overhead even with compound requests.
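The redundant-request idea above (often called a hedged request) might look roughly like this with asyncio. It is a minimal sketch, not the paper's implementation: `make_request` is a hypothetical coroutine function standing in for a call to one replica, and `demo` fakes replica latencies with `asyncio.sleep`.

```python
import asyncio


async def hedged(make_request, hedge_delay):
    """Return the first result from a primary request or a delayed backup.

    make_request: zero-argument coroutine function (hypothetical stand-in
    for a call to one replica). hedge_delay: seconds to wait before hedging.
    """
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        # Fast path: the primary answered before the hedge delay expired.
        return primary.result()

    # Slow path: send a second copy and take whichever finishes first.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the "cancel request" for the loser
    return next(iter(done)).result()


async def demo(latencies, hedge_delay=0.05):
    """Simulate replicas whose response times come from `latencies`."""
    remaining = iter(latencies)
    sent = []

    async def fake_request():
        delay = next(remaining)
        sent.append(delay)
        await asyncio.sleep(delay)
        return delay

    result = await hedged(fake_request, hedge_delay)
    return result, sent


if __name__ == "__main__":
    print(asyncio.run(demo([0.5, 0.01])))
```

The hedge delay is the knob: set it near a high percentile of normal latency and only a small fraction of requests ever send the second copy, which is why the extra load stays acceptable.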
Fun paper: the problem of metastability, how things break and then stay broken under moderate to high load: https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s11-bronson.pdf
Metastability
The paper is a summary. It covers the problem that sometimes, in a complex system under a fair amount of load, a service goes down and, even after rolling back, the system remains broken: the load effect becomes self-sustaining. Weird dynamic effects, requiring aggressive intervention. With a huge pipeline, all the requests build up. The only recovery solution is to clear the pipeline.
A collection of gnarly stories. Not a poison pill.
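A toy simulation can show the self-sustaining effect described above. Everything here is an assumption made for illustration, not taken from the paper: one FIFO queue, a fixed capacity per tick, clients that time out and retry a bounded number of times, a server that still spends work on requests whose client already gave up, and a flush that drops queued requests without triggering further retries.

```python
from collections import deque


def simulate(ticks=200, capacity=100, base_load=60, timeout=5,
             max_retries=3, spike=(10, 20, 200), flush_at=None):
    """Return goodput per tick (requests answered before the client timed out).

    spike = (start_tick, end_tick, extra_requests_per_tick) is the trigger.
    flush_at, if set, empties the queue at that tick (the recovery action).
    """
    queue = deque()  # entries [arrival_tick, attempt, count], oldest first
    goodput = []
    for t in range(ticks):
        if flush_at is not None and t == flush_at:
            queue.clear()

        # Clients whose request has waited `timeout` ticks give up and retry.
        # The stale request stays queued: the server cannot tell it is dead.
        retries = {}
        for arrival, attempt, count in queue:
            if arrival == t - timeout and attempt < max_retries:
                retries[attempt + 1] = retries.get(attempt + 1, 0) + count
        for attempt, count in sorted(retries.items()):
            queue.append([t, attempt, count])

        start, end, extra = spike
        queue.append([t, 0, base_load + (extra if start <= t < end else 0)])

        # Serve up to `capacity` requests; work on stale requests is wasted.
        budget, good = capacity, 0
        while budget and queue:
            entry = queue[0]
            served = min(budget, entry[2])
            if t - entry[0] < timeout:
                good += served
            entry[2] -= served
            budget -= served
            if entry[2] == 0:
                queue.popleft()
        goodput.append(good)
    return goodput


if __name__ == "__main__":
    print("spike, retries on :", simulate()[-1])
    print("spike, no retries :", simulate(max_retries=0)[-1])
    print("spike, then flush :", simulate(flush_at=100)[-1])
```

With these made-up numbers the base load (60) is well under capacity (100), yet once the spike pushes queueing delay past the timeout, retries multiply the offered load to about 240 per tick and goodput stays at zero long after the spike ends. Without retries the backlog drains on its own, and with retries only clearing the queue brings the system back, which mirrors the "clear the pipeline" recovery in the notes.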
What's a poison pill? What's a load spike?
Cloudflare outages
Process for recovery: slow bring-up. Failures are vexing and frustrating for the team, the ops team. Hard to predict with load tests. Complexity makes the transition point to failure unpredictable.
https://engineering.fb.com/2014/11/14/production-engineering/solving-the-mystery-of-link-imbalance-a-metastable-failure-state-at-scale/
https://muratbuffalo.blogspot.com/2023/09/metastable-failures-in-wild.html
Interconnected services: if your service is over its resource allocation, it's fine up to a point. It can go N times over while you get warning after warning. While dealing with an outage you might hit a spike; if something is unexpected, scale up quickly as opposed to killing a service and getting stuck in a failure loop.
What is the difference between scaling up and scaling out?
What is hotswapping? Hot fix, seamlessly switching over a service.
Classifying services as essential/non-essential is a big, difficult task. If you can, drop the overloaded non-essential services. Identifying them is difficult.
At the end of the blog post there is a summary of a paper that draws on 600 postmortems, organized by vendor and date, as understood from the incident reports.
Token lottery, minefield development
Terraform / CloudFormation / NB infra?
Terraform course. Worth talking about target end state.
- Ansible pets
  - Ansible notebooks, declarative coding with imperative playbooks; the framework handles SSH to different machines to configure long-lived hosts. Other ways may be to deliver container images and Kubernetes.
  - Container deployment does require some config
  - Kubernetes / Terraform define hardware as code
Cattle vs. Pets?
- cattle are easily replicated
- pets have names, child computer with specific settings
- key distinction is that pets have a lot of time invested
- pro cattle! no special singletons
- Noisebridge has GitOps now!
  - the way to deploy? Run Ansible playbooks
  - Deploying was happening from a hacker in Berlin
  - No CI server
  - noise garden is well documented for destroying and resurrecting
  - deployed everything
  - wiki broke
  - the wiki is down? Error message: Can't find the database!
  - What if Ansible playbooks nuke the backups?
  - deployment drift
  - need to implement continuous deployment
  - https://noisebridge.zulipchat.com/#narrow/channel/558694-rack/topic/OMG.3A.20noisebridge.2Enet.20ansible.20woes.20.F0.9F.96.A5.EF.B8.8F.F0.9F.94.A5/with/578724560
  - blameself post-mortems :D
  - GitOps: maybe a good topic for the Friday meeting
  - separating data management, stick to containers, set up a read replica and testing.
  - infrastructure roundtable, rack meetup.
  - should have a replicated MediaWiki server?
  - what would be the solution to the post-mortem?
  - smoke tests of services
  - open source PagerDuty? Uptime Kuma
  - fun to run Noisebridge as a startup?
  - testing and prod
  - Bystander effect for pager duty; solve with a round-robin scheduler.
- Work will get done outside of meetup
  - later this week: Wikipedians meetup
  - MediaWiki game jam this weekend
  - how to do cool things with MediaWiki and data
  - rack is the infra guild, rack is called rack guild: nb.wtf/resources/rack
  - if you don't write about them people won't bug you about it?
  - network storage?
  - collosus, there is a proxmox server. einhorn? unicorn? Where is it? No one knows!
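The round-robin idea for beating the on-call bystander effect, raised in the post-mortem notes above, can be sketched in a few lines. This is only an illustration: the roster names, the weekly shift length, and the start date are made up, and nothing here reflects an actual Noisebridge rotation.

```python
from datetime import date


def on_call(roster, rotation_start, day, shift_days=7):
    """Return who is on call on `day`, rotating through `roster` in order.

    Exactly one named person owns each shift, so an alert never lands on
    "everyone" (which in practice means no one).
    """
    if not roster:
        raise ValueError("roster must not be empty")
    shifts_elapsed = (day - rotation_start).days // shift_days
    return roster[shifts_elapsed % len(roster)]


if __name__ == "__main__":
    roster = ["alice", "bob", "carol"]  # hypothetical volunteers
    start = date(2026, 3, 16)           # hypothetical rotation start
    for d in (date(2026, 3, 16), date(2026, 3, 23), date(2026, 3, 30)):
        print(d, on_call(roster, start, d))
```

Because the schedule is a pure function of the date, anyone can compute it without a shared calendar, and a monitoring tool could call it to decide whom to notify.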
[aka "terrorform"]
- performance
- Ciara: want to scale up Matrix, element etc
https://noisebridge.zulipchat.com/
(mediawiki next week) https://meta.wikimedia.org/wiki/BAWUG
https://meta.wikimedia.org/wiki/Bay_Area_Wikipedians_User_Group/Events https://en.wikipedia.org/wiki/Event:Bay_Area_Meetup_March_2026
Outros
- Loren - wiki technical, wiki games, adapt to our data (reading changelogs together)
- Ellie - the tail at scale, gonna read
- Jason - metastable paper looks super interesting, containerizing more infrastructure
- Ciara - read the tail at scale, love infra stuff / papers. Reminded me of discord sharding at scale, 12 trillion connections at scale
- Robbie - learned about cattle vs pets distinction, going to turn my pets into cattle (hard)
- Perry - Looking forward to reading the notes
- Zacchae - enjoyed discussion, split brain ideal git repo vs. noisebridge m3. Configuration drift
- Chris - doing own thing
- Jet - want to think more about CI/CD make sure things are communicated well and more resili(a|e)nt (loren: it's a culture thing, make sure it's robust in 6-12 mo, next people, different environment)
- Derek - Was surprised how intricate request/response analysis is
- Dave - tail at scale is cool (also tailscale)
- Victor - liked the metastable
- Elan - would like to implement some of the suggestions, look into that later tonight
- Max - like to be a fly on the wall for whatever discussion happens around solving the nb infra issues.
- David - would like to be fly on the wall, real world problems, enjoyed cattle vs. pets.
- Erik - like the idea of doing a periodic database dump saved to a bucket: save everything, so we don't have to roll back to three years ago. If that's not happening, it's a missing chunk. Stick it in a bucket! Done!
- Tre - developing
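Erik's periodic-dump-to-a-bucket idea above could start from something like the sketch below. It rests on assumptions: that the wiki database is MySQL/MariaDB (so `mysqldump` applies), that the database is called `wiki`, and that the key layout and retention count are free choices. The upload step is left out on purpose because it depends on which bucket provider is used.

```python
from datetime import datetime, timezone


def dump_command(db_name):
    """Command line for a consistent dump (assumes MySQL/MariaDB with InnoDB)."""
    return ["mysqldump", "--single-transaction", db_name]


def backup_key(db_name, when):
    """Object key for one dump; timestamps sort chronologically as strings."""
    return (f"backups/{db_name}/{when:%Y/%m/%d}/"
            f"{db_name}-{when:%Y%m%dT%H%M%SZ}.sql.gz")


def keys_to_prune(keys, keep_latest):
    """Keys to delete so that only the newest `keep_latest` dumps remain."""
    ordered = sorted(keys)
    return ordered[:-keep_latest] if keep_latest > 0 else ordered


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(" ".join(dump_command("wiki")), "| gzip ->", backup_key("wiki", now))
```

Keeping the dump in a bucket outside the hosts that Ansible manages also addresses the earlier worry about a playbook nuking the backups.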
github discussion, opened a discussion topic around wiki, what material to cover, more meta discussion about topics, volunteer labor, "what if we do X", otherwise be sensitive to people's time by not volunteering.
Questions, Discussion, or Coworking
- [Issue]
For next time
Questions
Readings & Exercises
- Readings
- Exercises
Join online
- Try it yourself!
- Join libera.chat #nb-meetup-infra