• 0 Posts
  • 99 Comments
Joined 1 year ago
cake
Cake day: July 30th, 2023

help-circle



    • I never said anything about EFI not supporting multi boot. I said that the had to be kept in lockstep during updates. I recognize the term “manual” might have been a bit of a misnomer there, since I included systems where the admin has to take action to enable replication. ESXi (my main hardware OS for now) doesn’t even have software RAID for single-server datastores (only vSAN). Windows and Linux both can do it, but its a non-default manual process of splicing the drives together with no apparent automatic replacement mechanism - full manual admin intervention. With a hardware RAID, you just have to plop the new disk in and it splices the drive back into the array automatically (if the drive matches)
    • Dell and HPe both have had RAM caching for reads and writes since at least 2011. That’s why the controllers have batteries :)
      • also, I said it only had to handle the boot disk. Plus you’re ignoring the fact that all modern filesystems will do page caching in the background regardless of the presence of hardware cache. That’s not unique to ZFS, Windows and Linux both do it.
    • mdadm and hardware RAID offer the same level of block consistency validation to my current understanding- you’d need filesystem-level checksumming no matter what, and as both mdadm and hardware RAID are both filesystem agnostic, they will almost equally support the same filesystem-level features (Synology implements BTRFS on top of mdadm - I saw a small note somewhere that they had their implementation request block rebuild from mdadm if btrfs detected issues, but I have been unable to verify this claim so I do not consider it (yet) as part of my hardware vs md comparison)

    Hardware RAID just works, and for many, that’s good enough. In more advanced systems, all its got to handle is a boot partition, and if you’re doing your job as a sysadmin there’s zero important data in there that can’t be easily rebuilt or restored.


  • I never said I didn’t use software RAID, I just wanted to add information about hardware RAID controllers. Maybe I’m blind, but I’ve never seen a good implementation of software RAID for the EFI partition or boot sector. During boot, most systems I’ve seen will try to always access one partition directly and a second in order, which is bypassing the concept of a RAID, so the two would need to be kept manually in sync during updates.

    Because of that, there’s one notable place where I won’t - I always use hardware RAID for at minimum the boot disk because Dell firmware natively understands everything about it from a detect/boot/replace perspective. Or doesn’t see anything at all in a good way. All four of my primary servers have a boot disk on either a Startech RAID card similar to a Dell BOSS or have an array to boot off of directly on the PERC. It’s only enough space to store the core OS.

    Other than that, at home all my other physical devices are hypervisors (VMware ESXi for now until I can plot a migration), dedicated appliance devices (Synology DSM uses mdadm), or don’t have a redundant disks (my firewall - backed up to git, and my NUC Proxmox box, both firewalls and the PVE are all running ZFS for features).

    Three of my four ESXi servers run vSAN, which is like Ceph and replaces RAID. Like Ceph and ZFS, it requires using an HBA or passthrough disks for full performance. The last one is my standalone server. Notably, ESXi does not support any software RAID natively that isn’t vSAN, so both of the standalone server’s arrays are hardware RAID.

    When it comes time to replace that Synology it’s going to be on TrueNAS


  • For recovering hardware RAID: most guaranteed success is going to be a compatible controller with a similar enough firmware version. You might be able to find software that can stitch images back together, but that’s a long shot and requires a ton of disk space (which you might not have if it’s your biggest server)

    I’ve used dozens of LSI-based RAID controllers in Dell servers (of both PERC and LSI name brand) for both work and homelab, and they usually recover the old array to the new controller pretty well, and also generally have a much lower failure rate than the drives themselves (I find myself replacing the cache battery more often than the controller itself)

    Only twice out of the handful of times I went to a RAID controller from a different generation

    • first time from a mobi failed R815 (PERC H700) physically moving the disks to an R820 (PERC H710, might’ve been an H710P) and they were able to foreign import easily
    • Second time on homelab I went from an H710 mini mono to an H730P full size in the same chassis (don’t do that, it was a bad idea), but aside from iDRAC being very pissed off, the card ran for years with the same RAID-1 array imported.

    As others have pointed out, this is where backups come into play. If you have to replace the server with one from a different generation, you run the risk that the drives won’t import. At that point, you’d have to sanitize the super block of the array and re-initialize it as a new array, then restore from backup. Now, the array might be just fine and you never notice a difference (like my users that had to replace a failed R815 with an 820), but the result pattern is really to the extremes of work or fault with no in between.

    Standalone RAID controllers are usually pretty resilient and fail less often than disks, but they are very much NOT infallible as you are correct to assess. The advantage to software systems like mdadm, ZFS, and Ceph is that it removed the precise hardware compatibility requirements, but by no means does it remove the software compatible requirements - you’ll still have to do your research and make sure the new version is compatible with the old format, or make sure it’s the same version.

    All that’s said, I don’t trust embedded motherboard RAIDs to the same degree that I trust standalone controllers. A friend of mine about 8-10 years ago ran a RAID-0 on a laptop that got it’s super block borked when we tried to firmware update the SSDs - stopped detecting the array at all. We did manage to recover data, but it needed multiple times the raw amount of storage to do so.

    • we made byte images of both disks in ddrescue to a server that had enough spare disk space
    • found a software package that could stitch together images with broken super blocks if we knew the order the disks were in (we did), which wrote a new byte images back to the server
    • copied the result again and turned it into a KVM VM to network attach and copy the data off (we could have loop mounted the disk to an SMB share and been done, but it was more fun and rewarding to boot the recovered OS afterwards as kind of a TAKE THAT LENOVO…we were younger)
    • took in total a bit over 3TB to recover the 2x500GB disks to a usable state - and took about a week of combined machine and human time to engineer and cook, during which my friend opted to rebuild his laptop clean after we had images captured - to one disk windows, one disk Linux, not RAID-0 this time :P



  • Yup. My background is computer science transitioned to IT Infra.

    My sister sent me a screenshot of a Spotify one-liner error, white text on black background, captioned “they wrote a lazy error”. I immediately recognized that the actual problem was the load balancer in the front end trying and failing to connect to the backend/middleware in the first error, then in the second it recognized a failed health check and reporting that no back ends were available. Root cause is probably networking issue or actual server crash.

    I also have a bonus that in high school I had watched a ton of videos on VFX/SFX and knew a rough way around After Effects and compositing (before I jumped into CS I had considered this as a career path), so now when I watch TV and movies I can also see some of the “layers” they use to compile the on screen effect.






  • A to B made more sense in a world where devices cannot serve as both roles via negotiation. My android phone when I got it utilized a data transfer method of plugging my iPhone charge port into my Android charge port, then the Android initiated the connection as a host device.

    The true crime is not that the cable is bidirectional, the true crime is that there is little to no proper distinction and error checking between USB, Thunderbolt, and DisplayPort modes and are simply carried on the same connector. I have no issues with the port supporting tunneled connections - that is in fact how docking stations work - just the minimal labeling we get in modern devices.

    I’d be fine with a type-A to type-A cable if both devices had a reasonable chance at operating as both the initiator and target - but that type of behavior starts with USB-OTG and continues in type-C.


  • Others have some good information here - all I’d like to add to the root is that Windows and Mac have a built-in DNS cache and it’s pretty straightforward to add a DNS cache to systemd distros (if it’s not already installed or in use) using systemd-resolved or dnsmasq if you really dislike systemd. Some distros enable this from install time.

    Systems that utilize a DNS cache will keep copies of DNS query results for a period of time, making the application-level name lookup speed essentially 0ms for a cached result. Cold results obviously incur the latency of the DNS server itself.






  • TLDR: probably a lot of people continue using the thing that they know if it just works as long as it works well enough not to be a bother.

    Many many years ago when I learned, I think the only ones I found were Apache and IIS. I had a Mac at the time which came pre installed with Apache2, so I learned Apache2 and got okay at it. While by release dates Nginx and HAProxy most definitely existed, I don’t think I came across either in my research. I don’t have any notes from the time because I didn’t take any because I was in high school.

    When I started Linux things, I kept using Apache for a while because I knew it. Found Nginx, learned it in a snap because the config is more natural language and hierarchical than Apache’s XMLish monstrosity. Then for the next decade I kept using Nginx whenever I needed a webserver fast because I knew it would work with minimal tinkering.

    Now, as of a few years ago, I knew that haproxy, caddy, and traefik all existed. I even tried out Caddy on my homelab reverse proxy server (which has about a dozen applications routed through it), and the first few sites were easy - just let the auto-LetsEncrypt do its job - but once I got to the sites that needed manual TLS (I have both an internal CA and utilize Cloudflare’ origin HTTPS cert), and other special config, Caddy started becoming as cumbersome as my Nginx conf.d directory. At the time, I also didn’t have a way to get software updates easily on my then-CentOS 7 server, so Caddy was okay-enough, but it was back to Nginx with me because it was comparatively easier to manage.

    HAProxy is something I’ve added to my repertoire more recently. It took me quite a while and lots of trial and error to figure out the config syntax which is quite different from anything I’d used before (except maybe kinda like Squid, which I had learned not a year prior…), but once it clicked, it clicked. Now I have an internal high availability (+keepalived) load balancer than can handle so many backend servers and do wildcard TLS termination and validate backend TLS certs. I even got LDAP and LDAPS load balancing to AD working on that for services like Gitea that don’t behave well when there’s more than one LDAPS backend server.

    So, at some point I’ll get around to converting that everything reverse proxy to HAProxy. But I’ll probably need to deploy another VM or two because the existing one also has a static web server and I’ve been meaning to break up that server’s roles anyways (long ago, it was my everything server before I used VMs).