• db0@lemmy.dbzer0.com
    link
    fedilink
    English
    arrow-up
    259
    ·
    4 days ago

    It’s wild that these cloud providers were seen as a one-stop shop for reliability, only to become a universal single point of failure.

    • Nighed@feddit.uk
      link
      fedilink
      English
      arrow-up
      137
      arrow-down
      1
      ·
      4 days ago

      But if everyone else is down too, you don’t look so bad 🧠

        • cdzero@lemmy.ml
          link
          fedilink
          English
          arrow-up
          17
          ·
          4 days ago

          I wouldn’t be so sure about that. The state government of Queensland, Australia just lifted a 12 year ban on IBM getting government contracts after a colossal fuck up.

          • queerlilhayseed@piefed.blahaj.zone
            link
            fedilink
            English
            arrow-up
            58
            ·
            edit-2
            4 days ago

            It’s an old joke from back when IBM was the dominant player in IT infrastructure. The idea was that IBM was such a known quantity that even non-technical executives knew what it was and knew that other companies also used IBM equipment. If you decide to buy from a lesser known vendor and something breaks, you might be blamed for going off the beaten track and fired (regardless of where the fault actually lay), whereas if you bought IBM gear and it broke, it was simply considered the cost of doing business, so buying IBM became a CYA tactic for sysadmins even if it went against their better technical judgement. AWS is the modern IBM.

            • NotMyOldRedditName@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              edit-2
              3 days ago

              if you bought IBM gear and it broke, it was simply considered the cost of doing business,

              The IBM-produced Canadian Phoenix Pay system has entered the chat with a record 0 firings.

          • ByteJunk@lemmy.world
            link
            fedilink
            English
            arrow-up
            4
            ·
            4 days ago

            Such a monstrous clusterfuck, and you’ll be hard pressed to find anyone having been sacked, let alone facing actual charges over the whole debacle.

            If anything, I’d say that’s the single best case for buying IBM - if you’re incompetent and/or corrupt, just go with them and even if shit hits the fan, you’ll be OK.

      • clif@lemmy.world
        link
        fedilink
        English
        arrow-up
        13
        ·
        edit-2
        4 days ago

        One of our client support people told an angry client to open a Jira with urgent priority and we’d get right on it.

        … the client support person knew full well that Jira was down too : D

        At least, I think they knew. Either way, not shit we could do about it for that particular region until AWS fixed things.

    • GissaMittJobb@lemmy.ml
      link
      fedilink
      English
      arrow-up
      61
      ·
      4 days ago

      It’s mostly a skill issue for services that go down when USE-1 has issues in AWS - if you actually know your shit, then you don’t get these kinds of issues.

      Case in point: Netflix runs on AWS and experienced no issues during this thing.

      And yes, it’s scary that so many high-profile companies are this bad at the thing they spend all day doing

      • village604@adultswim.fan
        link
        fedilink
        English
        arrow-up
        20
        ·
        4 days ago

        Yeah, if you’re a major business and don’t have geographic redundancy for your service, you need to rework your BCDR plan.

        • sugar_in_your_tea@sh.itjust.works
          link
          fedilink
          English
          arrow-up
          5
          ·
          edit-2
          4 days ago

          Absolutely this. We are based out of one region, but also have a second region as a quick disaster recovery option, and we have people 24/7 who can manage the DR process. We’re not big enough to have live redundancy, but big enough that an hour of downtime would be a big deal.
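A minimal sketch of the "second region as a quick DR option" idea described above, assuming a hypothetical controller that is fed the result of periodic probes against the primary region. The region names, the threshold, and the class itself are illustrative assumptions, not anything from the commenter's actual setup.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Tracks which region should serve traffic (all names are hypothetical)."""

    primary: str = "region-a"
    standby: str = "region-b"
    failure_threshold: int = 3  # consecutive failed probes before failing over
    active: str = ""
    _failures: int = 0

    def __post_init__(self) -> None:
        # Start out serving from the primary region.
        self.active = self.primary

    def record_probe(self, primary_healthy: bool) -> str:
        """Record one probe of the primary and return the region to use."""
        if primary_healthy:
            self._failures = 0
        else:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self.active = self.standby
        # Note: there is no automatic fail-back; returning to the primary
        # is left as a manual decision for whoever runs the DR process.
        return self.active
```

Leaving fail-back manual is a design choice in this sketch: with people on call to run the DR process, an operator can confirm the primary region is actually stable before traffic moves back.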

      • B0rax@feddit.org
        link
        fedilink
        English
        arrow-up
        4
        ·
        edit-2
        3 days ago

        Case in point: Netflix runs on AWS and experienced no issues during this thing.

        But Netflix did encounter issues. For example the account cancel page did not work.

      • tourist@lemmy.world
        link
        fedilink
        English
        arrow-up
        4
        ·
        4 days ago

        What’s the general plan of action when a company’s base region shits the bed?

        Keep dormant mirrored resources in other regions?

        I presumed the draw of us-east-1 was its lower cost, so if any solutions involve spending slightly more money, I’m not surprised high profile companies put all their eggs in one basket.

        • corsicanguppy@lemmy.ca
          link
          fedilink
          English
          arrow-up
          4
          ·
          4 days ago

          I presumed the draw of us-east-1 was its lower cost

          At no time is pub-cloud cheaper than priv-cloud.

          The draw is versatility, as change didn’t require spinning up hardware. No one knew how much the data costs would kill the budget, but now they do.

    • tburkhol@lemmy.world
      link
      fedilink
      English
      arrow-up
      33
      arrow-down
      1
      ·
      4 days ago

      It is still a logical argument, especially for smaller shops. I mean, you can (as self-hosters know) set up automatic backups, failover systems, and all that, but it takes significant time & resources. Redundant internet connectivity? Redundant power delivery? Spare capacity to handle a 10x demand spike? Those are big expenses for small, even mid-sized businesses. No one really cares if your dentist’s office is offline for a day, even if they have to cancel appointments because they can’t process payments or records.

      Meanwhile, theoretically, reliability is such a core function of cloud providers that they should pay for experts’ experts and platinum standard infrastructure. It makes any problem they do have newsworthy.

      I mean, it seems silly for orgs as big and internet-centric as Fortnite, Zoom, or a Fortune 500 bank to outsource their internet, and maybe this will be a lesson for them.

        • killabeezio@lemmy.zip
          link
          fedilink
          English
          arrow-up
          3
          ·
          4 days ago

          No it’s not. It’s very expensive to run and there are a lot of edge cases. It’s much easier to have regional redundancy for a fraction of the cost.

          • village604@adultswim.fan
            link
            fedilink
            English
            arrow-up
            4
            ·
            edit-2
            3 days ago

            The organizations they were talking about and I was referring to have a global presence

            Plus, it’s not significantly more expensive to have a cold standby in a different geographic location in AWS.

    • corsicanguppy@lemmy.ca
      link
      fedilink
      English
      arrow-up
      10
      ·
      4 days ago

      universal single point of failure.

      If it’s not a region failure, it’s someone pushing untested slop into the devops pipeline and vaping a network config. So very fired.

    • joel_feila@lemmy.world
      link
      fedilink
      English
      arrow-up
      3
      ·
      3 days ago

      Well, companies use it not for reliability but to outsource responsibility. Even a medium-sized company treated Windows like a subscription for many, many years. People have been emailing files to themselves since the start of email.

      For companies, moving everything to msa or AWS was just the next step and didn’t change day-to-day operations.

      • NotMyOldRedditName@lemmy.world
        link
        fedilink
        English
        arrow-up
        4
        arrow-down
        1
        ·
        3 days ago

        People also tend to forget all the compliance issues that can come around hosting content, and using someone with expertise in that can reduce a very large burden. It’s not something that would hit every industry, but it does hit many.

  • Sips'@slrpnk.net
    link
    fedilink
    English
    arrow-up
    83
    ·
    4 days ago

    I hate how Signal went down because of this… Wish it wasn’t so centralised.

    • /home/pineapplelover@lemmy.dbzer0.com
      link
      fedilink
      English
      arrow-up
      40
      ·
      edit-2
      4 days ago

      My friend messaged me on Signal asking if Instructure (runs on AWS) was down. I got the message. That being said, it’s scary that Signal’s backbone depends on AWS

      • retro@infosec.pub
        link
        fedilink
        English
        arrow-up
        8
        ·
        3 days ago

        Why is this scary? That’s what e2ee is for, so that no one besides your recipient can view the contents of a message. It does not matter which server is used. If anything for a service like Signal, you want a server with high availability like AWS, Azure, Google Cloud or Cloudflare.

        • ReducedArc@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          ·
          edit-2
          3 days ago

          Willing to bet a lot of companies will be considering that now lol. Will it actually happen though? ¯\_(ツ)_/¯

      • Sips'@slrpnk.net
        link
        fedilink
        English
        arrow-up
        2
        ·
        3 days ago

        For me it was not possible to send or receive messages for a couple of hours.

    • MrMcGasion@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      ·
      3 days ago

      Started moving to Element/Matrix this weekend when I attended a protest and wanted to have some kind of communication, but also wanted to leave my primary phone at home. I was using a de-googled android fork and an e-sim, but being a data-only e-sim, I couldn’t use Signal due to the phone number requirement.

      Annoying to have to try to get contacts to get another app, but at least it’s decentralized and comes with the option of being self-hosted once I’m ready to tackle that.

      • pedroapero@lemmy.ml
        link
        fedilink
        English
        arrow-up
        7
        ·
        3 days ago

        Hey, note that you can use mautrix-signal to access your Signal account within Element on this phone.

        • poVoq@slrpnk.net
          link
          fedilink
          English
          arrow-up
          4
          ·
          3 days ago

          @Sunny@slrpnk.net already has an XMPP account, as that is included in every slrpnk.net account automatically. It is very easy to set that up for most Fediverse software, and the user id is identical between Fediverse and XMPP.

          • Sips'@slrpnk.net
            link
            fedilink
            English
            arrow-up
            3
            ·
            3 days ago

            Oh damn, I did not even know about this! I will defo have a play around with this tomorrow, how very neat!

            However, it isn’t me I’m really worried about in the grand picture, it’s family and friends. It was already difficult enough to convert them to using Signal.

  • jali67@lemmy.zip
    link
    fedilink
    English
    arrow-up
    57
    arrow-down
    6
    ·
    4 days ago

    Why do we place so much reliance on one mega company? This level of importance. It should be seized by the government.

    • Alaknár@sopuli.xyz
      link
      fedilink
      English
      arrow-up
      15
      arrow-down
      4
      ·
      3 days ago

      Why do we place so much reliance on one mega company? This level of importance.

      Because it’s cheaper and (in broad terms) more reliable than everybody having a data centre.

      It should be seized by the government.

      Oh yeah, what could possibly go wrong if the US government owned Amazon!

      • Andres@social.ridetrans.it
        link
        fedilink
        arrow-up
        6
        ·
        3 days ago

        @Alaknar @jali67 It is absolutely not cheaper. Monopolists have a tendency to raise prices once they corner the market. I took over maintenance of a journalism site and cut hosting costs roughly in half while increasing performance by switching from AWS to DigitalOcean.

        • Alaknár@sopuli.xyz
          link
          fedilink
          English
          arrow-up
          2
          ·
          3 days ago

          So, you changed one cloud provider to another…

          But let me rephrase: cloud can be significantly cheaper - if you know what you’re doing and what you’re putting on the cloud.

          I’ve been to data centres that cost as much as a decade of cloud hosting the service they were supporting (and that’s without operational costs).

          Cloud is especially great for small businesses where you have two alternative options: either build your own data centre which you absolutely cannot afford (or risk making it barely operational and unreliable) or host your company at someone else’s DC - which is what cloud is, but worse (because nobody can set up so much resiliency and have so many DC techs/admins as Microsoft or Amazon).

          There absolutely are situations where self-hosting is preferable, and even cheaper, but wondering “why do we place so much reliance” on cloud service providers just shows that people have no clue what cloud actually offers.

      • atmorous@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        2 days ago

        The best alternative is making Amazon something owned by the people and not any corporation/government, but who knows if that would ever happen.

      • 1984@lemmy.today
        link
        fedilink
        English
        arrow-up
        2
        arrow-down
        1
        ·
        edit-2
        3 days ago

        Let’s give it to Trump and Elon Musk, they will take good care of it… Lol.

        Trump will isolate AWS to America only, claiming other countries are ripping him off.

        AWS becomes American Web Services.

    • Noxy@pawb.social
      link
      fedilink
      English
      arrow-up
      15
      ·
      edit-2
      3 days ago

      AWS aggressively pursues high priced and years-long spending commitments with large customers, and they incentivize it with huge discounts for doing so.

      And when AWS does this they intentionally incentivize these large customers to migrate existing workloads away from other cloud service providers as well, going so far as to offer assistance in doing so.

        • modus@lemmy.world
          link
          fedilink
          English
          arrow-up
          9
          ·
          edit-2
          3 days ago

          At that point you’re completely invested in their ecosystem and it’ll cost you triple to get out.

        • Noxy@pawb.social
          link
          fedilink
          English
          arrow-up
          1
          ·
          3 days ago

          Way above my pay grade! I would never suggest or support making such agreements, but I also don’t want to be in a position where I’d even be asked, so I’d just sit back with a bowl of popcorn

      • jali67@lemmy.zip
        link
        fedilink
        English
        arrow-up
        34
        arrow-down
        4
        ·
        edit-2
        4 days ago

        Large corporations and oligarchs are better? I’ll take the government. At least we can vote on them.

        • bss03@infosec.pub
          link
          fedilink
          English
          arrow-up
          13
          ·
          3 days ago

          I think co-ops are the way to go, but I can understand that someone “just” wanting to purchase the good/service might not see the difference between a co-op and corporation like Amazon.

          I don’t think it’s a size issue really, but co-ops generally stay smaller in part due to how they are internally organized compared to a “median” corporation.

          I also think that the government actually does a pretty good job at managing things; it’s just their failures are public. Private boondoggles might drive many people into bankruptcy, but they aren’t publicized any more than absolutely necessary.

          • atmorous@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            2 days ago

            We could really use an Open Source app that partners with and displays all kinds of websites/stores to buy stuff and pay from the app

            • bss03@infosec.pub
              link
              fedilink
              English
              arrow-up
              2
              ·
              2 days ago

              Oh, don’t worry, open source (or, worse, Free Software) apps won’t be allowed on Android or Apple devices, soon. /s

        • Limonene@lemmy.world
          link
          fedilink
          English
          arrow-up
          6
          ·
          3 days ago

          It would be a more meaningful discussion if the government wasn’t controlled so much by large corporations and oligarchs.

        • Rivalarrival@lemmy.today
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          4
          ·
          3 days ago

          Government is also the entity that will be prosecuting/persecuting you when they don’t like what you have to say.

        • erock@lemmy.ml
          link
          fedilink
          English
          arrow-up
          4
          arrow-down
          12
          ·
          3 days ago

          Sorry but this is a ridiculous argument. What entity has dropped nukes on an entire population? Who is the current president of the US? Insane take.

          • jali67@lemmy.zip
            link
            fedilink
            English
            arrow-up
            10
            arrow-down
            1
            ·
            edit-2
            3 days ago

            Do you literally hear yourself? You think large corporations and oligarchs running insurance, tech, etc., is a better route than a public option? 💀 Jeff Bezos, Musk, Thiel, and Ellison for everything?

          • bridgeenjoyer@sh.itjust.works
            link
            fedilink
            English
            arrow-up
            4
            ·
            3 days ago

            Oil companies are private. Wars are started for oil.

            Also, government distrust and a heavy focus on its inefficiencies is a tried-and-true right wing/authoritarian tactic. The public gobbles it up because they don’t take 6 seconds to actually think.

          • explodicle@sh.itjust.works
            link
            fedilink
            English
            arrow-up
            2
            ·
            3 days ago

            What are you actually arguing with the president thing? I literally don’t understand how that’s supposed to support your point.

      • mic_check_one_two@lemmy.dbzer0.com
        link
        fedilink
        English
        arrow-up
        14
        arrow-down
        1
        ·
        3 days ago

        That’s largely because one half of the elected officials are dedicated to defunding and deconstructing government organizations, so they can then point at those same organizations and go “look, the government doesn’t work! We should stop funding it!” The government is actually great at organizing a lot of things. But they’re all so engrained in society that you don’t even think about them as being organized by the government. Systems that just work, reliably, all the time.

        The government’s job is stability and reliability, not being as efficient as possible. Where a corporation may only have one person doing a job, the government will have four or five. Those people aren’t bloat; They’re on the payroll because the government is expected to keep functioning during emergencies. People would lose their minds if the streets department (responsible for clearing downed trees out of public roads) shut down after a bad storm rolled through, just because a few government employees had a tree branch fall on their house. What if firefighters stopped working because a local wildfire burnt a few firefighters’ houses? What if the city water department shut down because three or four city employees’ water supply was affected? What if the health department shut down during a pandemic?

        The people who work in government also live in the same areas they serve. Which means that they are affected by the same emergencies. The government needs enough redundancy to be able to continue functioning, even after those employees are affected by the same emergencies as the general public. If some emergency affects 75% of the public in a given area, then 75% of the local government employees are likely going to be affected. So if the government doesn’t have enough redundancy to be able to redistribute the work, people will see their government shutting down in the wake of the emergency. And to make matters even worse, during (and in the wake of) those emergencies, people look to the government for help. Which means that’s the most critical time for the government to continue functioning.

        I say all of this because the same is true for the infrastructure that runs critical government systems. The government expands and implements things slowly by design, because everything critical has to go through multiple levels of design approval, and have multiple redundancies built in. If the government has updated a critical system, I can guarantee that new system has been in the works for the past two years at least. That process is designed to ensure everything works as intended. I wouldn’t want my city traffic lights managed by a private company, because they’d try to cut costs and avoid building in redundant systems.

        • Norah (pup/it/she)@lemmy.blahaj.zone
          link
          fedilink
          English
          arrow-up
          4
          ·
          3 days ago

          I wouldn’t want my city traffic lights managed by a private company, because they’d try to cut costs and avoid building in redundant systems.

          While they aren’t run by private companies, the traffic lights at the entrances to most housing estates are procured and installed by the developer, at least in Australia. Without fail, about 12-24 months later, the red and green LED lights will have half a dozen or more dead pixels on them. Meanwhile, newer LED lights installed by the roads department are still going strong years later.

      • ayyy@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        5
        arrow-down
        2
        ·
        4 days ago

        When was the last time you heard about a large government computer outage? (I don’t count the VA because that’s broken on purpose.)

        • goferking (he/him)@lemmy.sdf.org
          link
          fedilink
          English
          arrow-up
          5
          ·
          3 days ago

          Launch of ACA markets? But that seemed more like the company paid to make it undersized it or just wrote shit code.

          Which goes back to: some things shouldn’t be done for profit.

          • Norah (pup/it/she)@lemmy.blahaj.zone
            link
            fedilink
            English
            arrow-up
            3
            ·
            3 days ago

            Governments also shouldn’t be incentivised to always go with the cheapest option during procurements and tenders. Price is not the only factor in a value calculation and it is insane that we just ignore that fact.

  • regedit@lemmy.zip
    link
    fedilink
    English
    arrow-up
    10
    ·
    3 days ago

    This kind of shit will only increase as more of these companies believe they can vibe-code their way out of paying software devs what they are worth.

  • AllHailTheSheep@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    23
    ·
    edit-2
    4 days ago

    according to that page the issue stemmed from an underlying system responsible for health checks in load balancing servers.

    how the hell do you fuck up a health check config that bad? that’s like messing up smartd.conf and taking your system offline somehow

    • ayyy@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      19
      ·
      4 days ago

      Well, you see, the mistake you are making is believing a single thing the stupid AWS status board says. It is always fucking lying, sometimes in new and creative ways.

    • tatterdemalion@programming.dev
      link
      fedilink
      English
      arrow-up
      3
      ·
      3 days ago

      If your health check is broken, then you might not notice that a service is down and you’ll fail to deploy a replacement. Or the opposite, and you end up constantly replacing it, creating a “flapping” service.
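To make the "flapping" point concrete, here is a minimal sketch of one common mitigation: requiring several consecutive contradicting probes before changing a service's state. The class name and thresholds are assumptions for illustration; this is not a description of how AWS's health checks work.

```python
class HealthTracker:
    """Debounces probe results so one bad (or good) probe does not flip state."""

    def __init__(self, unhealthy_after: int = 3, healthy_after: int = 2) -> None:
        self.unhealthy_after = unhealthy_after  # failed probes needed to mark down
        self.healthy_after = healthy_after      # passed probes needed to mark up
        self.healthy = True
        self._streak = 0  # consecutive probes that contradict the current state

    def observe(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the (possibly updated) state."""
        if probe_ok == self.healthy:
            # Probe agrees with the current state: reset the streak.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_after if self.healthy else self.healthy_after
        if self._streak >= needed:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy
```

With these thresholds a single failed probe is ignored, so a replacement is only deployed after three failures in a row, and a recovering service must pass twice before it is trusted again.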

    • flux@lemmy.ml
      link
      fedilink
      English
      arrow-up
      4
      ·
      3 days ago

      I mean, if your OS was “smart” enough not to send IO to devices that indicate critical failure (e.g. by marking them read-only in the array?), and then thought all devices had failed critically, wouldn’t this happen in that kind of system as well…
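The failure mode described above, where a checker decides that everything is dead, is often guarded against with a "fail open" rule: if every target reports unhealthy, assume the checker is the broken part and keep routing to all of them. A minimal sketch under that assumption, with a hypothetical function name and plain dict input:

```python
def routable_backends(health: dict[str, bool]) -> list[str]:
    """Return the backends that should receive traffic.

    `health` maps a backend name to its latest health-check result.
    """
    healthy = [name for name, ok in health.items() if ok]
    if healthy:
        return healthy
    # Every backend looks unhealthy. Treat that as a sign that the health
    # checker itself may be at fault, and fail open instead of sending
    # traffic nowhere.
    return list(health)
```

Whether failing open is the right call depends on the system: it keeps a broken checker from causing a total outage, but it also sends traffic to backends that may really be down.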

      • magguzu@lemmy.ml
        link
        fedilink
        English
        arrow-up
        10
        ·
        4 days ago

        I know this is selfhosted so most people here are hobbyists, but it’s a ton of work to selfhost in an enterprise setting. I’d wager 90%+ of people using image registries are using Docker Hub, GHCR, or AWS ECR.

        • HelloRoot@lemy.lol
          link
          fedilink
          English
          arrow-up
          3
          ·
          edit-2
          4 days ago

          For your personal use, you don’t need an enterprise setting. It’s just a simple compose file that you run.

          You can host a registry in pull through mode, so you still have all the images you use locally, but if it’s not in your registry yet, it pulls it from docker hub or whatever.

          The only pain point is that a single registry can’t do both. So if you want to push your own docker images AND have a “cache” of stuff from docker hub, you need to run two registries in two different modes. And then juggle the URLs.
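One way to take the sting out of juggling two URLs is a small helper that decides which registry an image reference belongs to. This is only a sketch: both hostnames and the `myteam` namespace are made-up placeholders, and the only outside fact it relies on is that Docker Hub's official images live under the `library/` namespace.

```python
# Hypothetical endpoints: one registry in pull-through (cache) mode,
# one regular registry for images you build and push yourself.
CACHE_REGISTRY = "cache.registry.example:5000"
PRIVATE_REGISTRY = "push.registry.example:5001"
OWN_NAMESPACES = {"myteam"}  # hypothetical namespace for self-built images


def resolve_image(ref: str) -> str:
    """Map a short image reference to the registry that should serve it."""
    # "nginx:1.27" has no namespace; Docker Hub files it under "library".
    namespace = ref.split("/", 1)[0] if "/" in ref else "library"
    if namespace in OWN_NAMESPACES:
        return f"{PRIVATE_REGISTRY}/{ref}"
    if "/" not in ref:
        ref = f"library/{ref}"
    return f"{CACHE_REGISTRY}/{ref}"
```

For example, `resolve_image("nginx:1.27")` points at the cache registry, while `resolve_image("myteam/api:latest")` points at the registry you push to.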

          • arcayne@lemmy.today
            link
            fedilink
            English
            arrow-up
            5
            ·
            4 days ago

            Pretty sure you could run Pulp in pull-through mode and add your local Forgejo/whatever registry as a remote, which would at least give you a unified “pull” URL. Then just use Forgejo actions to handle the actual build/publish for your local images whenever you push to main (or tag a release, or whatever).

            Pulp might actually be able to handle both on its own, I haven’t ever tried though.

        • HelloRoot@lemy.lol
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          4 days ago

          I have just this (which ironically won’t work now cause docker hub is down)

          services:
            registry:
              restart: always
              image: registry:2
              ports:
                - 5000:5000
              dns:
                - 9.9.9.9
                - 1.1.1.1
              volumes:
                - ../files/auth/registry.password:/auth/registry.password
                - registry-data:/var/lib/registry
              environment:
                # Quoted so the YAML parser keeps these as strings, not booleans
                REGISTRY_STORAGE_DELETE_ENABLED: "true"
                REGISTRY_HEALTH_STORAGEDRIVER_ENABLED: "false"
                REGISTRY_HTTP_SECRET: ${REGISTRY_HTTP_SECRET}
                REGISTRY_AUTH: htpasswd
                REGISTRY_AUTH_HTPASSWD_REALM: Registry Realm
                REGISTRY_AUTH_HTPASSWD_PATH: /auth/registry.password
                # REGISTRY_PROXY_REMOTEURL: "https://registry-1.docker.io/"
          
          volumes:
            registry-data:
          

          I don’t even remember how and when I set it up. I think it might be this: https://github.com/distribution/distribution/releases/tag/v2.0.0

          Recently somebody has created a frontend, which I bookmarked but didn’t bother to set up: https://github.com/Joxit/docker-registry-ui

    • krimson@lemmy.world
      link
      fedilink
      English
      arrow-up
      4
      ·
      4 days ago

      Yeah I ran into this as well. Wondered why it needs a call to auth for public container images in the first place.

  • -RJ-@lemmy.world
    link
    fedilink
    English
    arrow-up
    31
    arrow-down
    2
    ·
    4 days ago

    Who wants to bet Amazon gave AI full access to their prod config and it screwed it up.

    • Otter@lemmy.ca
      link
      fedilink
      English
      arrow-up
      15
      ·
      4 days ago

      Is there no way to check the doorbell video locally?

      An Amazon employee misconfigures something and now your doorbell doesn’t work

      • SayCyberOnceMore@feddit.uk
        link
        fedilink
        English
        arrow-up
        6
        ·
        4 days ago

        I don’t have one (because of that point), so I don’t know…

        Presumably the app and doorbell are hardcoded to go to an AWS URL (so it’s “easier” for consumers), but in theory the data’s all on your wifi.

  • Domi@lemmy.secnd.me
    link
    fedilink
    English
    arrow-up
    20
    ·
    4 days ago

    That explains why my Matrix <-> Signal bridge was complaining about being disconnected.

  • Tuxxer@lemmy.world
    link
    fedilink
    English
    arrow-up
    9
    arrow-down
    1
    ·
    edit-2
    4 days ago

    For some reason I hear Gilfoyle pontificating about what he does