Paul's Notes
https://notes.pault.ag/
Recent content on Paul's Notes (Hugo, gohugo.io, en-us)
Mon, 27 Oct 2025 13:15:00 -0400

It's NOT always DNS.
https://notes.pault.ag/its-not-always-dns/
Mon, 27 Oct 2025 13:15:00 -0400

<p>I’ve written down a new rule (no name, sorry) that I’ll be repeating to myself
and those around me. <strong>“If you can replace ‘DNS’ with ‘key value store mapping
a name to an ip’ and it still makes sense, it was not, in fact, DNS.”</strong> Feel
free to repeat it along with me.</p>
<p>Sure, the “It’s always DNS” meme is funny the first few hundred times you see
it – but what’s less funny is when critical thinking ends because a DNS query
is involved. DNS failures are often the first observable problem <em>because</em>
it’s one of the first things that needs to be done. DNS is fairly complicated,
implementation-dependent, and at times – frustrating to debug – but it is not
the operational hazard it’s made out to be. It’s at best a shallow take, and at
worst actively holding teams back from understanding their true operational
risks.</p>
<p>IP connectivity failures between a host and the rest of the network are <em>not</em> a
reason to blame DNS. This would happen no matter how you distribute the updated
name to IP mappings. Wiping out
<a href="https://aws.amazon.com/message/101925/">all the records during the course of operations due to an automation bug</a>
is <em>not</em> a reason to blame DNS. This, too, would happen no matter how you
distribute the name to IP mappings. Something made the choice to delete all the
mappings, and <a href="https://web.archive.org/web/20251005205731/https://www.team.net/mjb/hawg.html">it did what you asked it to do</a>.</p>
<p>There’s plenty of annoying DNS specific sharp edges to blame when things <em>do</em>
go wrong (like <code>8.8.8.8</code> and <code>1.1.1.1</code> disagreeing about resolving a domain
because of DNSSEC, or since we’re on the topic, a
<a href="https://slack.engineering/what-happened-during-slacks-dnssec-rollout/">DNSSEC rollout bricking prod for hours</a>)
for us to be cracking jokes anytime a program makes a DNS request.</p>
<p>We can do better.</p>

The Promised LAN
https://notes.pault.ag/tpl/
Mon, 16 Jun 2025 11:58:00 -0400

<p>The Internet has changed a lot in the last 40+ years. Fads have come and gone.
Network protocols have been designed, deployed, adopted, and abandoned.
Industries have come and gone. The types of people on the internet have changed
a lot. The number of people on the internet has changed a lot, creating an
information medium unlike anything ever seen before in human history. There’s a
lot of good things about the Internet as of 2025, <strong>but there’s also an
inescapable hole in what it used to be, for me</strong>.</p>
<p>I miss being able to throw a site up to send around to friends to play with
without worrying about hordes of AI-feeding HTML combine harvesters DoS-ing my
website, costing me thousands in network transfer for the privilege. I miss
being able to put a lightly authenticated game server up and not worry too much
at night – wondering if that process is now mining bitcoin. I miss being able
to run a server in my home closet. Decades of cat and mouse games have rendered
running a mail server nearly impossible. Those who are “brave” enough to try
are met with weeks-long stretches of delivery failures and countless hours
yelling ineffectually into a pipe that leads from the cheerful lobby of some
disinterested corporation directly into a void somewhere 4 layers below ground
level.</p>
<p>I miss the spirit of curiosity, exploration, and trying new things. I miss
building things for fun without having to worry about being too successful,
after which “security” offices start demanding my supplier paperwork in
triplicate as heartfelt thanks from their engineering teams. I miss communities
that are run because it is important to them, not for ad revenue. I miss
community operated spaces and having more than four websites that are all full
of nothing except screenshots of each other.</p>
<p>Every other page I find myself on now has an AI generated click-bait title,
shared for rage-clicks all brought-to-you-by-our-sponsors–completely covered
wall-to-wall with popup modals, telling me how much they respect my privacy,
with the real content hidden at the bottom bracketed by deceptive ads served by
companies that definitely know which new coffee shop I went to last month.</p>
<p>This is wrong, and those who have seen what was know it.</p>
<p><strong>I can’t keep doing it. I’m not doing it any more. I reject the notion that
this is as it needs to be. It is wrong. The hole left in what the Internet used
to be must be filled. I will fill it.</strong></p>
<h2 id="what-comes-before-part-b">What comes before part b?</h2>
<p>Throughout the 2000s, some of my favorite memories were from LAN parties at my
friends’ places. Dragging your setup somewhere, long nights playing games,
goofing off, even building software all night to get something working—being
able to do something fiercely technical in the context of a uniquely social
activity. It wasn’t really much about the games or the projects—it was an
excuse to spend time together, just hanging out. A huge reason I learned so
much in college was that campus was a non-stop LAN party – we could freely
stand up servers, talk between dorms on the LAN, and hit my dorm room computer
from the lab. Things could go from individual to social in the matter of
seconds. The Internet used to work this way—my dorm had public IPs handed out
by DHCP, and my workstation could serve traffic from anywhere on the internet.
I haven’t been back to campus in a few years, but I’d be surprised if this were
still the case.</p>
<p>In December of 2021, three of us got together and connected our houses together
in what we now call The Promised LAN. The idea is simple—fill the hole we feel
is gone from our lives. Build our own always-on 24/7 nonstop LAN party. Build a
space that is intrinsically social, even though we’re doing technical things.
We can freely host insecure game servers or one-off side projects without
worrying about what someone will do with it.</p>
<p>Over the years, it’s evolved very slowly—we haven’t pulled any all-nighters.
Our mantra has become “old growth”, building each layer carefully. As of May
2025, the LAN is now 19 friends running around 25 network segments. Those 25
networks are connected to 3 backbone nodes, exchanging routes and IP traffic
for the LAN. We refer to the set of backbone operators as “The Bureau of LAN
Management”. Combined decades of operating critical infrastructure have
driven The Bureau to make a set of well-understood, boring, predictable,
interoperable and easily debuggable decisions to make this all happen.
<a href="https://tpl.house/">Nothing here is exotic or even technically interesting</a>.</p>
<h2 id="applications-of-trusting-trust">Applications of trusting trust</h2>
<p>The hardest part, however, is rejecting the idea that anything outside our own
LAN is untrustworthy—nearly irreversible damage inflicted on us by the
Internet. We have solved this by not solving it. We strictly control
membership—the absolute hard minimum for joining the LAN requires 10 years of
friendship with at least one member of the Bureau, with another 10 years of
friendship planned. Members of the LAN can veto new members even if all other
criteria are met. Even with those strict rules, there’s no shortage of friends
that meet the qualifications—but we are not equipped to take that many folks
on. It’s hard to join—both socially and technically. Doing something malicious
on the LAN requires a lot of highly technical effort upfront, and it would
endanger a decade of friendship. We have relied on those human, social,
interpersonal bonds to bring us all together. It’s worked for the last 4 years,
and it should continue working until we think of something better.</p>
<p>We assume roommates, partners, kids, and visitors all have access to The
Promised LAN. If they’re let into our friends’ network, there is a level of
trust that works transitively for us—I trust them to be on mine. This LAN is
not for “security”, rather, the network border is a social one. Benign
“hacking”—in the original sense of misusing systems to do fun and interesting
things—is encouraged. Robust ACLs and firewalls on the LAN are, by definition,
an interpersonal—not technical—failure. We all trust every other network
operator to run their segment in a way that aligns with our collective values
and norms.</p>
<p>Over the last 4 years, we’ve grown our own culture and fads—around half of the
people on the LAN have thermal receipt printers with open access, for printing
out quips or jokes on each other’s counters. It’s incredible how much network
transport and a trusting culture gets you—there’s a 3-node IRC network, exotic
hardware to gawk at, radios galore, a NAS storage swap, LAN only email, and
even a SIP phone network of “redphones”.</p>
<h2 id="diy">DIY</h2>
<p>We do not wish to, nor will we, rebuild the internet. We do not wish to, nor
will we, scale this. We will never be friends with enough people, as hard as we
may try. Participation hinges on us all having fun. As a result, membership
will never be open, and we will never have enough connected LANs to deal with
the technical and social problems that start to happen with scale. This is a
feature, not a bug.</p>
<p>This is a call for you to do the same. Build your own LAN. Connect it with
friends’ homes. Remember what is missing from your life, and fill it in. Use
software you know how to operate and get it running. Build slowly. Build your
community. Do it with joy. Remember how we got here. Rebuild a community space
that doesn’t need to be mediated by faceless corporations and ad revenue. Build
something sustainable that brings you joy. Rebuild something you use daily.</p>
<p>Bring back what we’re missing.</p>

boot2kier
https://notes.pault.ag/boot2kier/
Thu, 20 Feb 2025 09:40:00 -0500

<p>I can’t remember exactly the joke I was making at the time in my
<a href="https://zoo.dev">work’s</a> slack instance (I’m sure it wasn’t particularly
funny, though; and not even worth re-reading the thread to work out), but it
wound up with me writing a UEFI binary for the punchline. Not to spoil the
ending but it worked - no pesky kernel, no messing around with “userland”. I
guess the only part of this you really need to know for the setup here is that
it was a <a href="https://en.wikipedia.org/wiki/Severance_(TV_series)">Severance</a> joke,
which is some fantastic TV. If you haven’t seen it, this post will seem perhaps
weirder than it actually is. I promise I haven’t joined any new cults. For
those who have seen it, the payoff to my joke is that I wanted my machine to
boot directly to an image of
<a href="https://severance-tv.fandom.com/wiki/Kier_Eagan">Kier Eagan</a>.</p>
<p>As for how to do it – I figured I’d give the <a href="https://docs.rs/uefi/latest/uefi/">uefi
crate</a> a shot, and see how it is to use,
since this is a low stakes way of trying it out. In general, this isn’t the
sort of thing I’d usually post about – except this wound up being easier and
way cleaner than I thought it would be. That alone is worth sharing, in the
hopes someone comes across this in the future and feels like they, too, can
write something fun targeting the UEFI.</p>
<p>First things first – gotta create a rust project (I’ll leave that part to you
depending on your life choices), and to add the <code>uefi</code> crate to your
<code>Cargo.toml</code>. You can either use <code>cargo add</code> or add a line like this by hand:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-toml" data-lang="toml"><span style="display:flex;"><span><span style="color:#a6e22e">uefi</span> = { <span style="color:#a6e22e">version</span> = <span style="color:#e6db74">"0.33"</span>, <span style="color:#a6e22e">features</span> = [<span style="color:#e6db74">"panic_handler"</span>, <span style="color:#e6db74">"alloc"</span>, <span style="color:#e6db74">"global_allocator"</span>] }
</span></span></code></pre></div><p>We also need to teach cargo about how to go about building for the UEFI target,
so we need to create a <code>rust-toolchain.toml</code> with one (or both) of the UEFI
targets we’re interested in:</p>
<aside class="left">
I think there's a UEFI for riscv64 too, but I haven't found notes about it
in Rust-land.
</aside>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-toml" data-lang="toml"><span style="display:flex;"><span>[<span style="color:#a6e22e">toolchain</span>]
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">targets</span> = [<span style="color:#e6db74">"aarch64-unknown-uefi"</span>, <span style="color:#e6db74">"x86_64-unknown-uefi"</span>]
</span></span></code></pre></div><p>Unfortunately, I wasn’t able to use the
<a href="https://docs.rs/image/latest/image/">image</a> crate,
since it won’t build against the <code>uefi</code> target. This looks like it’s
because rustc had no way to compile the floating point operations the
<code>image</code> crate requires without hardware floating point instructions.
Rust tends to punt a lot of that to <code>libm</code>, so this isn’t entirely
shocking given we’re <code>no_std</code> on a non-hardfloat target.</p>
<aside class="right">
I didn't file any bugs or even track them down between the image crate
and rustc, since I figured this isn't actionable for anyone involved aside
from "implement soft floats in the compiler to backfill this target".
</aside>
<p>So-called “softening” requires a software floating point implementation that
the compiler can use to “polyfill” (feels weird to use the term polyfill here,
but I guess it’s spiritually right?) the lack of hardware floating point
operations, which rust hasn’t implemented for this target yet. As a result, I
changed tactics, and figured I’d use <code>ImageMagick</code> to pre-compute the pixels
from a <code>jpg</code>, rather than doing it at runtime. A bit of a bummer, since I need
to do more out of band pre-processing and hardcoding, and updating the image
kinda sucks as a result – but it’s entirely manageable.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ convert -resize 1280x900 kier.jpg kier.full.jpg
</span></span><span style="display:flex;"><span>$ convert -depth <span style="color:#ae81ff">8</span> kier.full.jpg rgba:kier.bin
</span></span></code></pre></div><p>This will take our input file (<code>kier.jpg</code>), resize it to get as close to the
desired resolution as possible while maintaining aspect ratio, then convert it
from a <code>jpg</code> to a flat array of 4 byte <code>RGBA</code> pixels. Critically, it’s also
important to remember that the dimensions of the <code>kier.full.jpg</code> file may
not actually be the requested ones – the resize will not change the aspect
ratio – so be sure to make a careful note of the resulting dimensions of the
<code>kier.full.jpg</code> file.</p>
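<p>For the curious, the “fit inside the box, keep the aspect ratio” arithmetic that <code>-resize</code> performs can be sketched as below. This is a general sketch, not ImageMagick’s exact rounding rules, and the source dimensions in the example are hypothetical.</p>

```rust
/// Scale `src` to fit inside `max` while preserving aspect ratio,
/// roughly what `convert -resize WxH` does (modulo exact rounding).
fn fit_within(src: (u32, u32), max: (u32, u32)) -> (u32, u32) {
    let (sw, sh) = (src.0 as f64, src.1 as f64);
    // Use the smaller of the two scale factors so both dimensions fit.
    let scale = (max.0 as f64 / sw).min(max.1 as f64 / sh);
    ((sw * scale).round() as u32, (sh * scale).round() as u32)
}

fn main() {
    // A hypothetical 1600x900 source into a 1280x900 box scales by 0.8:
    // we land at 1280x720, not the full 1280x900 we asked for.
    assert_eq!(fit_within((1600, 900), (1280, 900)), (1280, 720));
}
```

<p>In practice, ImageMagick’s <code>identify kier.full.jpg</code> is the quick way to read off the real resulting dimensions.</p>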
<p>Last step with the image is to compile it into our Rust binary, since we
don’t want to struggle with trying to read this off disk, which is thankfully
real easy to do.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-rust" data-lang="rust"><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#66d9ef">KIER</span>: <span style="color:#66d9ef">&</span>[<span style="color:#66d9ef">u8</span>] <span style="color:#f92672">=</span> <span style="color:#a6e22e">include_bytes!</span>(<span style="color:#e6db74">"../kier.bin"</span>);
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#66d9ef">KIER_WIDTH</span>: <span style="color:#66d9ef">usize</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">1280</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#66d9ef">KIER_HEIGHT</span>: <span style="color:#66d9ef">usize</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">641</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#66d9ef">KIER_PIXEL_SIZE</span>: <span style="color:#66d9ef">usize</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">4</span>;
</span></span></code></pre></div><p>Remember to use the width and height from the final <code>kier.full.jpg</code> file as the
values for <code>KIER_WIDTH</code> and <code>KIER_HEIGHT</code>. <code>KIER_PIXEL_SIZE</code> is 4, since we
have 4 byte wide values for each pixel as a result of our conversion step into
RGBA. We’ll only use RGB, and if we ever drop the alpha channel, we can drop
that down to 3. I don’t entirely know why I kept alpha around, but I figured it
was fine. My <code>kier.full.jpg</code> image winds up shorter than the requested height
(which is also qemu’s default resolution for me) – which means we’ll get a
semi-annoying black band under the image when we go to run it – but it’ll
work.</p>
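<p>Since <code>kier.bin</code> is a flat, row-major run of 4-byte pixels, pixel <code>(x, y)</code> starts at byte <code>((y * width) + x) * 4</code> – the same arithmetic the rendering loop later does with <code>KIER_WIDTH</code> and <code>KIER_PIXEL_SIZE</code>. As a tiny standalone sketch (the function name is ours, not from the post’s code):</p>

```rust
/// Byte offset of pixel (x, y) in a flat, row-major RGBA buffer
/// that is `width` pixels wide, at 4 bytes per pixel.
fn rgba_offset(x: usize, y: usize, width: usize) -> usize {
    ((y * width) + x) * 4
}

fn main() {
    // Pixel (0, 0) occupies the first 4 bytes; pixel (2, 1) in a
    // 1280-pixel-wide image starts at ((1 * 1280) + 2) * 4 = 5128.
    assert_eq!(rgba_offset(0, 0, 1280), 0);
    assert_eq!(rgba_offset(2, 1, 1280), 5128);
}
```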
<p>Anyway, now that we have our image as bytes, we can get down to work, and
write the rest of the code to handle moving bytes around from in-memory
as a flat block of pixels, and request that they be displayed using the
<a href="https://wiki.osdev.org/GOP">UEFI GOP</a>. We’ll just need to hack up a container
for the image pixels and teach it how to blit to the display.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-rust" data-lang="rust"><span style="display:flex;"><span><span style="color:#e6db74">/// RGB Image to move around. This isn't the same as an
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">/// `image::RgbImage`, but we can associate the size of
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">/// the image along with the flat buffer of pixels.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span><span style="color:#66d9ef">struct</span> <span style="color:#a6e22e">RgbImage</span> {
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Size of the image as a tuple, as the
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// (width, height)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> size: (<span style="color:#66d9ef">usize</span>, <span style="color:#66d9ef">usize</span>),
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// raw pixels we'll send to the display.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> inner: Vec<span style="color:#f92672"><</span>BltPixel<span style="color:#f92672">></span>,
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">impl</span> RgbImage {
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Create a new `RgbImage`.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">new</span>(width: <span style="color:#66d9ef">usize</span>, height: <span style="color:#66d9ef">usize</span>) -> <span style="color:#a6e22e">Self</span> {
</span></span><span style="display:flex;"><span> RgbImage {
</span></span><span style="display:flex;"><span> size: (width, height),
</span></span><span style="display:flex;"><span> inner: <span style="color:#a6e22e">vec</span><span style="color:#f92672">!</span>[BltPixel::new(<span style="color:#ae81ff">0</span>, <span style="color:#ae81ff">0</span>, <span style="color:#ae81ff">0</span>); width <span style="color:#f92672">*</span> height],
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Take our pixels and request that the UEFI GOP
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// display them for us.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">write</span>(<span style="color:#f92672">&</span>self, gop: <span style="color:#66d9ef">&</span><span style="color:#a6e22e">mut</span> GraphicsOutput) -> Result {
</span></span><span style="display:flex;"><span> gop.blt(BltOp::BufferToVideo {
</span></span><span style="display:flex;"><span> buffer: <span style="color:#66d9ef">&</span><span style="color:#a6e22e">self</span>.inner,
</span></span><span style="display:flex;"><span> src: <span style="color:#a6e22e">BltRegion</span>::Full,
</span></span><span style="display:flex;"><span> dest: (<span style="color:#ae81ff">0</span>, <span style="color:#ae81ff">0</span>),
</span></span><span style="display:flex;"><span> dims: <span style="color:#a6e22e">self</span>.size,
</span></span><span style="display:flex;"><span> })
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">impl</span> Index<span style="color:#f92672"><</span>(<span style="color:#66d9ef">usize</span>, <span style="color:#66d9ef">usize</span>)<span style="color:#f92672">></span> <span style="color:#66d9ef">for</span> RgbImage {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">type</span> <span style="color:#a6e22e">Output</span> <span style="color:#f92672">=</span> BltPixel;
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">index</span>(<span style="color:#f92672">&</span>self, idx: (<span style="color:#66d9ef">usize</span>, <span style="color:#66d9ef">usize</span>)) -> <span style="color:#66d9ef">&</span><span style="color:#a6e22e">BltPixel</span> {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">let</span> (x, y) <span style="color:#f92672">=</span> idx;
</span></span><span style="display:flex;"><span> <span style="color:#f92672">&</span>self.inner[y <span style="color:#f92672">*</span> self.size.<span style="color:#ae81ff">0</span> <span style="color:#f92672">+</span> x]
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">impl</span> IndexMut<span style="color:#f92672"><</span>(<span style="color:#66d9ef">usize</span>, <span style="color:#66d9ef">usize</span>)<span style="color:#f92672">></span> <span style="color:#66d9ef">for</span> RgbImage {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">index_mut</span>(<span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> self, idx: (<span style="color:#66d9ef">usize</span>, <span style="color:#66d9ef">usize</span>)) -> <span style="color:#66d9ef">&</span><span style="color:#a6e22e">mut</span> BltPixel {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">let</span> (x, y) <span style="color:#f92672">=</span> idx;
</span></span><span style="display:flex;"><span> <span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> self.inner[y <span style="color:#f92672">*</span> self.size.<span style="color:#ae81ff">0</span> <span style="color:#f92672">+</span> x]
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>We also need to do some basic setup to get a handle to the UEFI
GOP via the UEFI crate (using
<a href="https://docs.rs/uefi/latest/uefi/boot/fn.get_handle_for_protocol.html">uefi::boot::get_handle_for_protocol</a>
and
<a href="https://docs.rs/uefi/latest/uefi/boot/fn.open_protocol_exclusive.html">uefi::boot::open_protocol_exclusive</a>
for the <a href="https://docs.rs/uefi/latest/uefi/proto/console/gop/struct.GraphicsOutput.html">GraphicsOutput</a>
protocol), so that we have the object we need to pass to <code>RgbImage</code> in order
for it to write the pixels to the display. The only trick here is that the
display on the booted system can really be any resolution – so we need to do
some capping to ensure that we don’t write more pixels than the display can
handle. Writing fewer than the display’s maximum seems fine, though.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-rust" data-lang="rust"><span style="display:flex;"><span><span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">praise</span>() -> Result {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">let</span> gop_handle <span style="color:#f92672">=</span> boot::get_handle_for_protocol::<span style="color:#f92672"><</span>GraphicsOutput<span style="color:#f92672">></span>()<span style="color:#f92672">?</span>;
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">let</span> <span style="color:#66d9ef">mut</span> gop <span style="color:#f92672">=</span> boot::open_protocol_exclusive::<span style="color:#f92672"><</span>GraphicsOutput<span style="color:#f92672">></span>(gop_handle)<span style="color:#f92672">?</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#75715e">// Get the (width, height) that is the minimum of
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// our image and the display we're using.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#66d9ef">let</span> (width, height) <span style="color:#f92672">=</span> gop.current_mode_info().resolution();
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">let</span> (width, height) <span style="color:#f92672">=</span> (width.min(<span style="color:#66d9ef">KIER_WIDTH</span>), height.min(<span style="color:#66d9ef">KIER_HEIGHT</span>));
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">let</span> <span style="color:#66d9ef">mut</span> buffer <span style="color:#f92672">=</span> RgbImage::new(width, height);
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> y <span style="color:#66d9ef">in</span> <span style="color:#ae81ff">0</span><span style="color:#f92672">..</span>height {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> x <span style="color:#66d9ef">in</span> <span style="color:#ae81ff">0</span><span style="color:#f92672">..</span>width {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">let</span> idx_r <span style="color:#f92672">=</span> ((y <span style="color:#f92672">*</span> <span style="color:#66d9ef">KIER_WIDTH</span>) <span style="color:#f92672">+</span> x) <span style="color:#f92672">*</span> <span style="color:#66d9ef">KIER_PIXEL_SIZE</span>;
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">let</span> pixel <span style="color:#f92672">=</span> <span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> buffer[(x, y)];
</span></span><span style="display:flex;"><span> pixel.red <span style="color:#f92672">=</span> <span style="color:#66d9ef">KIER</span>[idx_r];
</span></span><span style="display:flex;"><span> pixel.green <span style="color:#f92672">=</span> <span style="color:#66d9ef">KIER</span>[idx_r <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span>];
</span></span><span style="display:flex;"><span> pixel.blue <span style="color:#f92672">=</span> <span style="color:#66d9ef">KIER</span>[idx_r <span style="color:#f92672">+</span> <span style="color:#ae81ff">2</span>];
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> buffer.write(<span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> gop)<span style="color:#f92672">?</span>;
</span></span><span style="display:flex;"><span> Ok(())
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Not so bad! A bit tedious – we could solve some of this by turning
<code>KIER</code> into an <code>RgbImage</code> at compile-time using some clever <code>Cow</code> and
<code>const</code> tricks and implement blitting a sub-image of the image – but this
will do for now. This is a joke, after all, let’s not go nuts. All that’s
left with our code is for us to write our <code>main</code> function and try and boot
the thing!</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-rust" data-lang="rust"><span style="display:flex;"><span><span style="color:#75715e">#[entry]</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">main</span>() -> <span style="color:#a6e22e">Status</span> {
</span></span><span style="display:flex;"><span> uefi::helpers::init().unwrap();
</span></span><span style="display:flex;"><span> praise().unwrap();
</span></span><span style="display:flex;"><span> boot::stall(<span style="color:#ae81ff">100_000_000</span>);
</span></span><span style="display:flex;"><span> Status::<span style="color:#66d9ef">SUCCESS</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>If you’re following along at home and so interested, the final source is over at
<a href="https://gist.github.com/paultag/60334e9f6c06388cc4b1c2cf12d85085">gist.github.com</a>.
We can go ahead and build it using <code>cargo</code> (as is our tradition) by targeting
the UEFI platform.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ cargo build --release --target x86_64-unknown-uefi
</span></span></code></pre></div><h1 id="testing-the-uefi-blob">Testing the UEFI Blob</h1>
<p>While I can definitely get my machine to boot these blobs to test, I figured
I’d save myself some time by using QEMU to test without a full boot.
If you’ve not done this sort of thing before, we’ll need two packages,
<code>qemu</code> and <code>ovmf</code>. It’s a bit different than most invocations of qemu you
may see out there – so I figured it’d be worth writing this down, too.</p>
<aside class="left">
It's perhaps likely that you aren't using <code>doas</code> with Debian.
Replace <code>doas</code> with <code>sudo</code> if that's your thing.
</aside>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ doas apt install qemu-system-x86 ovmf
</span></span></code></pre></div><p><code>qemu</code> has a nice feature where it’ll create us an EFI partition as a drive and
attach it to the VM off a local directory – so let’s construct an EFI
partition file structure, and drop our binary into the conventional location.
If you haven’t done this before, and are only interested in running this in a
VM, don’t worry too much about it, a lot of it is convention and this layout
should work for you.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ mkdir -p esp/efi/boot
</span></span><span style="display:flex;"><span>$ cp target/x86_64-unknown-uefi/release/*.efi <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> esp/efi/boot/bootx64.efi
</span></span></code></pre></div><p>With all this in place, we can kick off <code>qemu</code>, booting it in UEFI mode using
the <code>ovmf</code> firmware, attaching our EFI partition directory as a drive to
our VM to boot off of.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ qemu-system-x86_64 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -enable-kvm <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -m <span style="color:#ae81ff">2048</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -smbios type<span style="color:#f92672">=</span>0,uefi<span style="color:#f92672">=</span>on <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -bios /usr/share/ovmf/OVMF.fd <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -drive format<span style="color:#f92672">=</span>raw,file<span style="color:#f92672">=</span>fat:rw:esp
</span></span></code></pre></div><p>If all goes well, soon you’ll be met with the all-knowing gaze of
Chosen One, Kier Eagan. The thing that really impressed me about all
this is that the program worked on the first try – it all went so boringly
normal. Truly, kudos to the <code>uefi</code> crate maintainers, it’s incredibly
well done.</p>
<div>
<img src="https://notes.pault.ag/boot2kier/boot2kier.png" />
</div>
<h1 id="booting-a-live-system">Booting a live system</h1>
<p>Sure, we <em>could</em> stop here, but anyone can open up an app window and see a
picture of Kier Eagan, so I knew I needed to finish the job and boot a real
machine up with this. In order to do that, we need to format a USB stick.
<strong>BE SURE /dev/sda IS CORRECT IF YOU’RE COPY AND PASTING</strong>. All my drives
are NVMe, so <strong>BE CAREFUL</strong> – if you use SATA, it may very well be your
hard drive! Please do not destroy your computer over this.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-txt" data-lang="txt"><span style="display:flex;"><span>$ doas fdisk /dev/sda
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Welcome to fdisk (util-linux 2.40.4).
</span></span><span style="display:flex;"><span>Changes will remain in memory only, until you decide to write them.
</span></span><span style="display:flex;"><span>Be careful before using the write command.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Command (m for help): n
</span></span><span style="display:flex;"><span>Partition type
</span></span><span style="display:flex;"><span> p primary (0 primary, 0 extended, 4 free)
</span></span><span style="display:flex;"><span> e extended (container for logical partitions)
</span></span><span style="display:flex;"><span>Select (default p): p
</span></span><span style="display:flex;"><span>Partition number (1-4, default 1):
</span></span><span style="display:flex;"><span>First sector (2048-4014079, default 2048):
</span></span><span style="display:flex;"><span>Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-4014079, default 4014079):
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Created a new partition 1 of type 'Linux' and of size 1.9 GiB.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Command (m for help): t
</span></span><span style="display:flex;"><span>Selected partition 1
</span></span><span style="display:flex;"><span>Hex code or alias (type L to list all): ef
</span></span><span style="display:flex;"><span>Changed type of partition 'Linux' to 'EFI (FAT-12/16/32)'.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Command (m for help): w
</span></span><span style="display:flex;"><span>The partition table has been altered.
</span></span><span style="display:flex;"><span>Calling ioctl() to re-read partition table.
</span></span><span style="display:flex;"><span>Syncing disks.
</span></span></code></pre></div><p>Once that looks good (depending on your flavor of <code>udev</code> you may or
may not need to unplug and replug your USB stick), we can go ahead
and format our new EFI partition (<strong>BE CAREFUL THAT /dev/sda IS YOUR
USB STICK</strong>) and write our EFI directory to it.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-txt" data-lang="txt"><span style="display:flex;"><span>$ doas mkfs.fat /dev/sda1
</span></span><span style="display:flex;"><span>$ doas mount /dev/sda1 /mnt
</span></span><span style="display:flex;"><span>$ cp -r esp/efi /mnt
</span></span><span style="display:flex;"><span>$ find /mnt
</span></span><span style="display:flex;"><span>/mnt
</span></span><span style="display:flex;"><span>/mnt/efi
</span></span><span style="display:flex;"><span>/mnt/efi/boot
</span></span><span style="display:flex;"><span>/mnt/efi/boot/bootx64.efi
</span></span></code></pre></div><p>Of course, naturally, devotion to Kier shouldn’t mean backdooring your system.
Disabling Secure Boot runs counter to the Core Principles, such as Probity, and
skipping this step would surely run counter to Verve, Wit and Vision. This bit does
require that you’ve taken the step to enroll a
<a href="https://wiki.debian.org/SecureBoot#MOK_-_Machine_Owner_Key">MOK</a> and know how
to use it. Right about now is when we can use <code>sbsign</code> to sign the UEFI binary
we want to boot from, so we can continue enforcing Secure Boot. The details of how
this command should be run specifically are likely something you’ll need to work
out depending on how you’ve decided to manage your MOK.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ doas sbsign <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --cert /path/to/mok.crt <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --key /path/to/mok.key <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> target/x86_64-unknown-uefi/release/*.efi <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --output esp/efi/boot/bootx64.efi
</span></span></code></pre></div><p>I figured I’d leave a signed copy of <code>boot2kier</code> at
<code>/boot/efi/EFI/BOOT/KIER.efi</code> on my Dell XPS 13, with Secure Boot enabled
and enforcing. It just took a matter of going into my BIOS to add the right
boot option, which was no sweat. I’m sure there is a way to do it using
<code>efibootmgr</code>, but I wasn’t smart enough to do that quickly. I let ’er rip,
and it booted up and worked great!</p>
<p>It was a bit hard to get a video of my laptop, though – but lucky for me, I
have a Minisforum Z83-F sitting around (which, until a few weeks ago was running
the annual http server to control my <a href="https://k3xec.com/christmas/">christmas tree</a>
) – so I grabbed it out of the christmas bin, wired it up to a video capture
card I have sitting around, and figured I’d grab a video of me booting a
physical device off the boot2kier USB stick.</p>
<div>
<img class="note-pad" src="https://notes.pault.ag/boot2kier/z83-boot2kier.gif" />
</div>
<p>Attentive readers will notice the image of Kier is smaller than the qemu-booted
system – which just means our real machine has a larger GOP display
resolution than qemu, which makes sense! We could write some fancy resize code
(sounds annoying), center the image (can’t be assed but should be the easy way
out here) or resize the original image (pretty hardware specific workaround).
Additionally, you can make out the image being written to the display before us
(the Minisforum logo) behind Kier, which is really cool stuff. If we were real
fancy we could write blank pixels to the display before blitting Kier, but,
again, I don’t think I care to do <em>that</em> much work.</p>
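<p>For what it’s worth, the “easy way out” really is just arithmetic on the two
resolutions. A minimal sketch of the centering math (the function name and the
saturating clamp are my own choices here, not anything from the post’s code):</p>

```rust
/// Compute the top-left coordinate at which to blit an image of
/// `(img_w, img_h)` pixels so it lands centered on a display of
/// `(disp_w, disp_h)` pixels. Saturates to (0, 0) rather than going
/// negative when the image is larger than the display.
fn center_offset(disp_w: usize, disp_h: usize, img_w: usize, img_h: usize) -> (usize, usize) {
    (
        disp_w.saturating_sub(img_w) / 2,
        disp_h.saturating_sub(img_h) / 2,
    )
}

fn main() {
    // An 800x600 display with a 640x480 image: blit at (80, 60).
    assert_eq!(center_offset(800, 600, 640, 480), (80, 60));
    // Image bigger than the display: clamp to the origin.
    assert_eq!(center_offset(640, 480, 800, 600), (0, 0));
}
```

<p>You’d feed the offsets into the same GOP blit call used for the top-left blit;
only the destination coordinates change.</p>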
<h1 id="but-now-i-must-away">But now I must away</h1>
<p>If I wanted to keep this joke going, I’d likely try and find a copy of the
original
<a href="https://www.youtube.com/watch?v=U6EUG22elbs">video when Helly 100%s her file</a>
and boot into that – or maybe play a terrible midi PC speaker rendition of
<a href="https://www.youtube.com/watch?v=OsbxAsdR0QI">Kier, Chosen One, Kier</a> after
rendering the image. I, unfortunately, don’t have any friends involved with
production (yet?), so I reckon all that’s out for now. I’ll likely stop playing
with this – the joke was done and I’m only writing this post because of how
great everything was along the way.</p>
<p>All in all, this reminds me so much of building a homebrew kernel to boot a
system into – but like, <em>good</em>, though, and it’s a nice reminder of both how
fun this stuff can be, and how far we’ve come. UEFI protocols are light-years
better than how we did it in the dark ages, and the tooling for this is <em>SO</em>
much more mature. Booting a custom UEFI binary is <em>miles</em> ahead of trying to
boot your own kernel, and I can’t believe how good the <code>uefi</code> crate is
specifically.</p>
<p>Praise Kier! Kudos, to everyone involved in making this so delightful ❤️.</p> Complex for Whom? https://notes.pault.ag/complex-for-whom/Tue, 12 Nov 2024 15:21:00 -0500 https://notes.pault.ag/complex-for-whom/ <p>In basically every engineering organization I’ve ever regarded as particularly
high functioning, I’ve sat through one specific recurring conversation –
a conversation about “complexity”. Things are good or bad because they
are or aren’t complex, architectures need to be redone because it’s too
complex – some refactor of whatever it is won’t work because it’s too complex.
You may have even been a part of some of these conversations – or even been
the one advocating for simple light-weight solutions. I’ve done it. Many times.</p>
<aside class="right">
When I was writing this, I had a flashback to an over-10-year-old
post by <code>mjg59</code> about
<a href="https://mjg59.dreamwidth.org/2414.html">LightDM</a>.
It would be a mistake not to link it here.
</aside>
<p>Rarely, if ever, do we talk about complexity within its rightful context –
complexity <strong>for whom</strong>. <u>Is a solution complex because it’s complex for the
end user? Is it complex if it’s complex for an API consumer? Is it complex if
it’s complex for the person maintaining the API service? Is it complex if it’s
complex for someone outside the team maintaining it to understand?</u>
Complexity within a problem domain, I’ve come to believe, is fairly zero-sum –
there’s a fixed amount of complexity in the problem to be solved, and you can
choose to either solve it, or leave it for those downstream of you
to solve on their own.</p>
<aside class="left">
Although I believe there is a fixed amount of complexity in the lower bound
of a problem, you always have the option to change the problem you're
solving!
</aside>
<p>That being said, while I believe there is a <em>lower</em> bound in complexity to
contend with for a problem, I do not believe there is an <em>upper</em> bound to the
complexity of solutions possible. It is always possible, and in fact, very
likely that teams create problems for themselves while trying to solve a
problem. The rest of this post speaks to that lower bound. When getting
feedback on an early draft of this blog post, I was informed that Fred
Brooks coined a term for what I call “lower bound complexity” – “Essential
Complexity”, in the paper
“<a href="https://www.cs.unc.edu/techreports/86-020.pdf">No Silver Bullet—Essence and Accident in Software Engineering</a>”,
which is a better term and can be used interchangeably.</p>
<h1 id="complexity-culture">Complexity Culture</h1>
<p>In a large enough organization, where the team is high functioning enough to
have and maintain trust amongst peers, members of the team will specialize.
People will begin to engage with subsets of the work to be done, and begin to
have their efficacy measured against that part of the organization’s problems.
Incentives shift, and over time it becomes increasingly likely that two
engineers may have two very different priorities when working on the same
system together. Someone accountable for uptime and tasked with responding to
outages will begin to resist changes. Someone accountable for rapidly
delivering features will resist gates between them and their users. Companies
(either wittingly or unwittingly) will deal with this by tasking engineers with
both production (feature development) and operational tasks (maintenance), so
the difference in incentives isn’t usually as bad as it <em>could</em> be.</p>
<aside class="left">
The events depicted in this movie are fictitious. Any similarity to any
person living or dead is merely coincidental.
</aside>
<p>When we get a bunch of folks from far-flung corners of an organization in a
room, fire up a slide deck and throw up some aspirational to-be architecture
diagram in order to get a sign-off to solve some problem (be it someone needs a
credible promotion packet, new feature needs to get delivered, or the system
has begun to fail and needs fixing), the initial reaction will, more often than
I’d like, start to devolve into a discussion of how this is going to introduce
a bunch of complexity, going to be hard to maintain, why can’t you make it
<em>less complex</em>?</p>
<aside class="right">
In a high functioning environment, this is a mostly healthy impulse,
coming from a good place, and genuinely intended to prevent problems
for the whole organization by reducing non-essential complexity.
That is good. What I'm talking about is when the conversation turns to
removing lower-limit complexity.
</aside>
<p>Right around here is when I start to try and contextualize the conversation
happening around me – understand what complexity it is that’s being discussed, and
understand who is taking on that burden. Think about who <em>should</em> be owning
that problem, and work through the tradeoffs involved. Is it best solved here,
or left to consumers (be they other systems, developers, or users)? Should
something become an API call’s optional param, taking on all the edge-cases and
so on, or should users have to implement the logic using the data you
return (leaving everyone else to take on all the edge-cases and maintenance)?
Should you process the data, or require the user to preprocess it for you?</p>
<aside class="left">
<a href="https://layeraleph.com/">Carla Geisser</a> described this as being
reminiscent of the technique outlined in
"<a href="https://web.mit.edu/saltzer/www/publications/endtoend/endtoend.pdf">end to end arguments in system design</a>",
which she uses to think about where complexity winds up in a system. It's
an extremely good parallel.
</aside>
<p>Frequently it’s right to make an active and explicit decision to simplify and
leave problems to be solved downstream, since they may not actually need to be
solved – or perhaps you expect consumers will want to own the specifics of
<em>how</em> the problem is solved, in which case you leave lots of documentation and
examples. Many other times, especially when it’s something downstream consumers
are likely to hit, it’s best solved internal to the system, since the only
thing that can come of leaving it unsolved are bugs, frustration and
half-correct solutions. This is a grey-space of tradeoffs, not a clear decision
tree. No one wants the software manifestation of a katamari ball or a junk
drawer, nor does anyone want a half-baked service unable to handle the simplest
use-case.</p>
<h1 id="head-in-sand-as-a-service">Head-in-sand as a Service</h1>
<p>Popoffs about how complex something is, are, to a first approximation, best
understood as meaning “complicated for the person making comments”. A lot of
the <code>#thoughtleadership</code> believe that an AWS hosted EKS <code>k8s</code> cluster running
images built by CI talking to an AWS hosted PostgreSQL RDS is not complex.
They’re right. Mostly right. This is less complex – less complex <em>for them</em>.
It’s not, however, without complexity and its own tradeoffs – it’s just
complexity that <strong>they do not have to deal with</strong>. Now they don’t have to
maintain machines that have pesky operating systems or hard drive failures.
They don’t have to deal with updating the version of <code>k8s</code>, nor ensuring the
backups work. No one has to push some artifact to prod manually. Deployments
happen unattended. You click a button and get a cluster.</p>
<p>On the other hand, developers outside the ops function need to deal with
troubleshooting CI, debugging access control rules encoded in turing complete
YAML, permissions issues inside the cluster due to whatever the fuck a service
mesh is, everyone needs to learn how to use some <code>k8s</code> tools they only actually
use during a bad day, likely while doing some <code>x.509</code> troubleshooting to
connect to the cluster (an internal only endpoint; just port forward it) – not
to mention all sorts of rules to route packets to their project (a single
repo’s binary being run in 3 containers on a single vm host).</p>
<aside class="right">
Truly I'm not picking on k8s here; I do genuinely believe it when I say
EKS is less complex for me to operate well; that's kinda the whole point.
</aside>
<p>Beyond that, there’s the invisible complexity – complexity on the interior of
a service you depend on. I think about the dozens of teams maintaining the EKS
service (which is either run on EC2 instances, or alternately, EC2 instances in
a trench coat, moustache and even more shell scripts), the RDS service (also
EC2 and shell scripts, but this time accounting for redundancy, backups,
availability zones), scores of hypervisors pulled off the shelf (<code>xen</code>, <code>kvm</code>)
smashed together with the ones built in-house (<code>firecracker</code>, <code>nitro</code>, etc)
running on hardware that has to be refreshed and maintained continuously. Every
request processed by network ACL rules, AWS IAM rules, security group rules,
using IP space announced to the internet wired through IXPs directly into ISPs.
I don’t even want to begin to think about the complexity inherent in how those
switches are designed. <em>Shitloads</em> of complexity to solve problems you may or
may not have, or even know you had.</p>
<aside class="left">
Do I care about invisible complexity? Generally, no. I don't. It's not my
problem and they don't show up to my meetings.
</aside>
<p><strong>What’s more complex? An app running in an in-house 4u server racked in the
office’s telco closet in the back running off the office Verizon line, or an
app running four hypervisors deep in an AWS datacenter? Which is more complex
<em>to you</em>? What about <em>to your organization</em>? <em>In total</em>? Which is more prone to
failure? Which is more secure? Is the complexity good or bad? What type of
complexity can you manage effectively? Which threatens the system? Which
threatens your users?</strong></p>
<h1 id="complexivibes">COMPLEXIVIBES</h1>
<p>This extends beyond Engineering. Decisions regarding “what tools are we able to
use” – be them existing contracts with cloud providers, CIO mandated SaaS
products, a list of the only permissible open source projects – will incur
costs in terms of expressed “complexity”. Pinning open source projects to a
fixed set makes SBOM production “less complex”. Using only one SaaS provider’s
product suite (even if it’s terrible, because it has all the types of tools you
need) makes accreditation “less complex”. If all you have is a contract with
<em>Pauly T’s lowest price technically acceptable artisanal cloudary and
haberdashery</em>, the way you pay for your compute is “less complex” for the CIO
shop, though you will find yourself building your own hosted database template,
mechanism to spin up a k8s cluster, and all the operational and technical
burden that comes with it. Or you won’t and make it everyone else’s problem in
the organization. Nothing you can do will solve for the fact that you <em>must</em>
now deal with this problem <em>somewhere</em> because it was less complicated for the
business to put the workloads on the existing contract with a cut-rate vendor.</p>
<p>Suddenly, the decision to “reduce complexity” because of an existing contract
vehicle has resulted in a huge amount of technical risk and maintenance burden
being onboarded. Complexity you would otherwise externalize has now been taken
on internally. With large enough organizations (specifically, in this case,
I’m talking about you, bureaucracies), this is largely ignored or accepted as
normal since the personnel cost is understood to be free to everyone involved.
Doing it this way is more expensive, more work, less reliable and less
maintainable, and yet, somehow, is, in a lot of ways, “less complex” to the
organization. It’s particularly bad with bureaucracies, since screwing up a
contract will get you into much more trouble than delivering a broken product,
leaving basically no reason for anyone to care to fix this.</p>
<p>I can’t shake the feeling that for every story of <a href="https://mjw.wtf/weaver-a-tale-of-technical-policy.html">technical mandates gone
awry</a>, somewhere just
out of sight there’s a decisionmaker optimizing for what they believe to be the
least amount of complexity – least hassle, fewest unique cases, most
consistency – as they can. They freely offload complexity from their
accreditation and risk acceptance functions through mandates. They will never
have to deal with it. That does not change the fact that <em>someone does</em>.</p>
<h1 id="tcdr-too-complex-didnt-review">TC;DR (TOO COMPLEX; DIDN’T REVIEW)</h1>
<p>We wish to rid ourselves of systemic Complexity – after all, complexity is
bad, simplicity is good. Removing upper-bound own-goal complexity (“accidental
complexity” in Brooks’s terms) is important, but once you hit the lower bound
complexity, the tradeoffs become zero-sum. Removing complexity from one part of
the system means that somewhere else - maybe outside your organization or in a
non-engineering function - must grow it back. Sometimes, the opposite is the
case, such as when a previously manual business process is automated. Maybe that’s a
good idea. Maybe it’s not. All I know is that what doesn’t help the situation
is conflating complexity with everything we don’t like – legacy code,
maintenance burden or toil, cost, delivery velocity.</p>
<ul>
<li><strong>Complexity is not the same as proclivity to failure.</strong> The most reliable
systems I’ve interacted with are unimaginably complex, with layers of internal
protection to prevent complete failure. This has its own set of costs which
other people <a href="https://how.complexsystems.fail/">have written about extensively</a>.</li>
<li><strong>Complexity is not cost.</strong> Sometimes the cost of taking all the complexity
in-house is less, for whatever value of cost you choose to use.</li>
<li><strong>Complexity is not absolute.</strong> Something simple from one perspective may
be wildly complex from another. The impulse to burn down complex sections of
code is helpful to have generally, but
<a href="https://en.wiktionary.org/wiki/Chesterton%27s_fence">sometimes things are complicated for a reason</a>,
even if that reason exists outside your codebase or organization.</li>
<li><strong>Complexity is not something you can remove without introducing complexity
elsewhere.</strong> Just as not making a decision is a decision itself; choosing to
require someone else to deal with a problem rather than dealing with it
internally is a choice that needs to be considered in its full context.</li>
</ul>
<aside class="left">
After reviewing an early draft of this post,
<a href="https://layeraleph.com/">Mikey Dickerson</a> described what
I was trying to say here back to me as "if you squeeze one part of
the water balloon it goes somewhere else", which is a metaphor I've
become attached to.
</aside>
<aside class="right">
Mikey also described these asides as being a Dr. Bronner's label,
which I'll own.
</aside>
<p>Next time you’re sitting through a discussion and someone starts to talk about
all the complexity about to be introduced, I want to pop up in the back of your
head, politely asking <em>what does complex mean in this context</em>? Is it lower
bound complexity? Is this complexity desirable? Does what they’re saying mean
something along the lines of I don’t understand the problems being solved, or
does it mean something along the lines of this problem <em>should</em> be solved
elsewhere? Do they believe this will result in more work for them in a way that
you don’t see? Should this not be solved at all, by changing the bounds of what
we accept or redefining the understood limits of this system? Is the perceived
complexity a result of a decision elsewhere? Who’s taking this complexity on,
or more to the point, is failing to address complexity required by the problem
leaving it to others? Does it impact others? How specifically? What are you
not seeing?</p>
<p>What <em>can</em> change?</p>
<p><em>What should change</em>?</p> Domo Arigato, Mr. debugfs https://notes.pault.ag/debugfs/Sat, 13 Apr 2024 09:27:00 -0400 https://notes.pault.ag/debugfs/ <p>Years ago, at what I think I remember was DebConf 15, I hacked for a while
on debhelper to
<a href="https://github.com/Debian/debhelper/commit/5549f841fd7cba07e21df8e4f70b21c31cfb3da6">write build-ids to debian binary control files</a>,
so that the <code>build-id</code> (more specifically, the ELF note
<code>.note.gnu.build-id</code>) wound up in the Debian apt archive metadata.
I’ve always thought this was super cool, and seeing as how Michael Stapelberg
<a href="https://michael.stapelberg.ch/posts/2019-02-15-debian-debugging-devex/">blogged</a>
some great pointers around the ecosystem, including the fancy new <code>debuginfod</code>
service, and the
<a href="https://manpages.debian.org/testing/debian-goodies/find-dbgsym-packages.1.en.html">find-dbgsym-packages</a>
helper, which uses these same headers, I don’t think I’m the only one.</p>
<p>At work I’ve been using a lot of <a href="https://www.rust-lang.org/">rust</a>,
specifically, async rust using <a href="https://tokio.rs/">tokio</a>. To try and work on
my style, and to dig deeper into the how and why of the decisions made in these
frameworks, I’ve decided to hack up a project that I’ve wanted to do ever
since 2015 – write a debug filesystem. Let’s get to it.</p>
<h1 id="back-to-the-future">Back to the Future</h1>
<aside class="left">
It shouldn't shock anyone to learn I'm a huge fan of Go, right?
</aside>
<p>Time to admit something. I really love <a href="https://9front.org/">Plan 9</a>. It’s
just so good. So many ideas from Plan 9 are just so prescient, and everything
just feels <em>right</em>. Not just right like, feels good – like, <em>correct</em>. The
bit that I’ve always liked the most is <code>9p</code>, the network protocol for serving
a filesystem over a network. This leads to all sorts of fun programs, like the
Plan 9 <code>ftp</code> client being a 9p server – you mount the ftp server and access
files like any other files. It’s kinda like if fuse were more fully a part
of how the operating system worked, but fuse is all running client-side. With
9p there’s a single client, and different <em>servers</em> that you can connect to,
which may be backed by a hard drive, by remote resources over something like SFTP, FTP or HTTP, or which may even be purely synthetic.
<aside class="right">
I even triggered a weird bug in
<a href="https://github.com/vim/vim/commit/14759ded57447345ba11c11a99fd84344797862c">vim</a>
when writing a 9p filesystem that wound up impacting
<a href="https://github.com/microsoft/WSL/issues/11256">WSL</a>
-- although it seems like maybe not due to 9p (rather, SMB)
</aside>
<p>The interesting (maybe sad?) part here is that 9p wound up outliving Plan 9
in terms of adoption – <code>9p</code> is in all sorts of places folks don’t usually expect.
For instance, the Windows Subsystem for Linux uses the 9p protocol to share
files between Windows and Linux. ChromeOS uses it to share files with Crostini,
and qemu uses 9p (<code>virtio-p9</code>) to share files between guest and host. If you’re
noticing a pattern here, you’d be right; for some reason 9p is the go-to protocol
to exchange files between hypervisor and guest. Why? I have no idea, except maybe
that it’s well designed and simple to implement, which makes it a lot easier to
validate the data being shared and enforce security boundaries. Simplicity has its value.</p>
<p>As a result, there’s a <em>lot</em> of lingering 9p support kicking around. Turns out
Linux can even handle mounting 9p filesystems out of the box. This means that I
can deploy a filesystem to my LAN or my <code>localhost</code> by running a process on top
of a computer that needs nothing special, and mount it over the network on an
unmodified machine – unlike <code>fuse</code>, where you’d need client-specific software
to run in order to mount the directory. For instance, let’s mount a 9p
filesystem running on my localhost machine, serving requests on <code>127.0.0.1:564</code>
(tcp) that goes by the name “<code>mountpointname</code>” to <code>/mnt</code>.</p>
<aside class="left">
Unfortunately, this requires root to mount and feels very un-plan9,
but it does work and the protocol is good.
</aside>
<pre>
$ mount -t 9p \
-o trans=tcp,port=564,version=9p2000.u,aname=mountpointname \
127.0.0.1 \
/mnt
</pre>
<p>Linux will mount away, and attach to the filesystem as the root user, and by default,
attach to that mountpoint again for each local user that attempts to use
it. Nifty, right? I think so. The server is able
to keep track of per-user access and authorization
in step with the host OS.</p>
<h1 id="wherein-i-styx-with-it">WHEREIN I STYX WITH IT</h1>
<aside class="right">
"Simple" here is intended as my highest form of praise. Writing complex
things is easy. Taking your work and simplifying it down to the core
is the most difficult part of our work.
</aside>
<p>Since I wanted to push myself a bit more with <code>rust</code> and <code>tokio</code> specifically,
I opted to implement the whole stack myself, without third party libraries on
the critical path where I could avoid it. The 9p protocol (sometimes called
<code>Styx</code>, the original name for it) is incredibly simple. It’s a series of client
to server requests, which receive a server to client response. These are,
respectively, “<code>T</code>” messages, which <code>t</code>ransmit a request to the server, which
trigger an “<code>R</code>” message in response (<code>R</code>eply messages). These messages are
<a href="https://en.wikipedia.org/wiki/Type%E2%80%93length%E2%80%93value">TLV</a> payloads
with a very straightforward structure – so straightforward, in fact, that I
was able to implement a working server off nothing more than a handful of <a href="https://9fans.github.io/plan9port/man/man9/">man
pages</a>.</p>
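<p>To make that concrete, here’s a sketch of how a <code>T</code> message is laid out on the wire. This is hand-rolled for illustration, not <code>arigato</code>’s API: a little-endian <code>u32</code> size that counts itself, a one-byte message type, a two-byte tag, then the body, with strings carried as a <code>u16</code> length followed by the bytes.</p>

```rust
// Illustrative framing of a 9p Tversion message (not arigato's API).
// 9p messages are length-prefixed: size[4] type[1] tag[2] body, with all
// integers little-endian; strings are a u16 length followed by raw bytes.
fn tversion(msize: u32, version: &str) -> Vec<u8> {
    const TVERSION: u8 = 100; // message type for version negotiation
    const NOTAG: u16 = 0xFFFF; // Tversion is sent with the special NOTAG tag

    let mut body = Vec::new();
    body.extend_from_slice(&msize.to_le_bytes());
    body.extend_from_slice(&(version.len() as u16).to_le_bytes());
    body.extend_from_slice(version.as_bytes());

    // The size field counts the whole message, including itself.
    let size = (4 + 1 + 2 + body.len()) as u32;
    let mut msg = Vec::new();
    msg.extend_from_slice(&size.to_le_bytes());
    msg.push(TVERSION);
    msg.extend_from_slice(&NOTAG.to_le_bytes());
    msg.extend_from_slice(&body);
    msg
}

fn main() {
    let msg = tversion(8192, "9P2000.u");
    // 4 (size) + 1 (type) + 2 (tag) + 4 (msize) + 2 + 8 ("9P2000.u") = 21
    assert_eq!(msg.len(), 21);
    assert_eq!(&msg[0..4], &21u32.to_le_bytes());
    println!("{:02x?}", msg);
}
```

<p>The matching <code>Rversion</code> reply comes back in the same shape, which is what makes the protocol so pleasant to implement from man pages alone.</p>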
<aside class="left">
There's also a <code>9P2000.L</code> 9p variant which has more
Linux-specific extensions. There's a good chance I'll port this
forward when I get the chance.
</aside>
<p>Later on, after the basics worked, I found a more complete
<a href="https://ericvh.github.io/9p-rfc/rfc9p2000.html">spec page</a>
that contains more information about the
<a href="https://ericvh.github.io/9p-rfc/rfc9p2000.u.html">unix-specific variant</a>
that I opted to use (<code>9P2000.u</code> rather than <code>9P2000</code>), due to
Linux supporting <code>9P2000.u</code> better than plain <code>9P2000</code>.</p>
<h1 id="mr-roboto">MR ROBOTO</h1>
<aside class="right">
It really bothers me that Rust libraries that deal with I/O need to support
std::io, but that to add support for async runtimes, you need to implement
support for tokio::io and every other runtime; but them's the breaks, I
guess. I really miss Go's built-in async support and io module.
</aside>
<p>The backend stack over at <a href="https://zoo.dev">zoo</a> is <code>rust</code> and <code>tokio</code>
running i/o for an <code>HTTP</code> and <code>WebRTC</code> server. I figured I’d pick something
fairly similar to write my filesystem with, since <code>9P</code> can be implemented
on basically anything with I/O. That means <code>tokio</code> tcp server bits, which
construct and use a <code>9p</code> server, which has an idiomatic Rusty API that
partially abstracts the raw <code>R</code> and <code>T</code> messages, but not so much
that it hides implementation possibilities. At each abstraction
level, there’s an escape hatch – allowing someone to implement any of
the layers if required. I called this framework
<a href="https://github.com/paultag/arigato">arigato</a> which can be found over on
<a href="https://docs.rs/arigato">docs.rs</a> and
<a href="https://crates.io/crates/arigato">crates.io</a>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-rust" data-lang="rust"><span style="display:flex;"><span><span style="color:#e6db74">/// Simplified version of the arigato File trait; this isn't actually
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">/// the same trait; there's some small cosmetic differences. The
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">/// actual trait can be found at:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">///
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">/// https://docs.rs/arigato/latest/arigato/server/trait.File.html
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span><span style="color:#66d9ef">trait</span> File {
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// OpenFile is the type returned by this File via an Open call.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">type</span> <span style="color:#a6e22e">OpenFile</span>: <span style="color:#a6e22e">OpenFile</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Return the 9p Qid for this file. A file is the same if the Qid is
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// the same. A Qid contains information about the mode of the file,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// version of the file, and a unique 64 bit identifier.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">qid</span>(<span style="color:#f92672">&</span>self) -> <span style="color:#a6e22e">Qid</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Construct the 9p Stat struct with metadata about a file.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">stat</span>(<span style="color:#f92672">&</span>self) -> <span style="color:#a6e22e">FileResult</span><span style="color:#f92672"><</span>Stat<span style="color:#f92672">></span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Attempt to update the file metadata.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">wstat</span>(<span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> self, s: <span style="color:#66d9ef">&</span><span style="color:#a6e22e">Stat</span>) -> <span style="color:#a6e22e">FileResult</span><span style="color:#f92672"><</span>()<span style="color:#f92672">></span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Traverse the filesystem tree.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">walk</span>(<span style="color:#f92672">&</span>self, path: <span style="color:#66d9ef">&</span>[<span style="color:#f92672">&</span><span style="color:#66d9ef">str</span>]) -> <span style="color:#a6e22e">FileResult</span><span style="color:#f92672"><</span>(Option<span style="color:#f92672"><</span>Self<span style="color:#f92672">></span>, Vec<span style="color:#f92672"><</span>Self<span style="color:#f92672">></span>)<span style="color:#f92672">></span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Request that a file's reference be removed from the file tree.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">unlink</span>(<span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> self) -> <span style="color:#a6e22e">FileResult</span><span style="color:#f92672"><</span>()<span style="color:#f92672">></span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Create a file at a specific location in the file tree.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">create</span>(
</span></span><span style="display:flex;"><span> <span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> self,
</span></span><span style="display:flex;"><span> name: <span style="color:#66d9ef">&</span><span style="color:#66d9ef">str</span>,
</span></span><span style="display:flex;"><span> perm: <span style="color:#66d9ef">u16</span>,
</span></span><span style="display:flex;"><span> ty: <span style="color:#a6e22e">FileType</span>,
</span></span><span style="display:flex;"><span> mode: <span style="color:#a6e22e">OpenMode</span>,
</span></span><span style="display:flex;"><span> extension: <span style="color:#66d9ef">&</span><span style="color:#66d9ef">str</span>,
</span></span><span style="display:flex;"><span> ) -> <span style="color:#a6e22e">FileResult</span><span style="color:#f92672"><</span>Self<span style="color:#f92672">></span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Open the File, returning a handle to the open file, which handles
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// file i/o. This is split into a second type since it is genuinely
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// unrelated -- and the fact that a file is Open or Closed can be
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// handled by the `arigato` server for us.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">open</span>(<span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> self, mode: <span style="color:#a6e22e">OpenMode</span>) -> <span style="color:#a6e22e">FileResult</span><span style="color:#f92672"><</span>Self::OpenFile<span style="color:#f92672">></span>;
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">/// Simplified version of the arigato OpenFile trait; this isn't actually
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">/// the same trait; there's some small cosmetic differences. The
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">/// actual trait can be found at:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">///
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">/// https://docs.rs/arigato/latest/arigato/server/trait.OpenFile.html
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span><span style="color:#66d9ef">trait</span> OpenFile {
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// iounit to report for this file. The iounit reported is used for Read
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// or Write operations to signal, if non-zero, the maximum size that is
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// guaranteed to be transferred atomically.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">iounit</span>(<span style="color:#f92672">&</span>self) -> <span style="color:#66d9ef">u32</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Read some number of bytes up to `buf.len()` from the provided
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// `offset` of the underlying file. The number of bytes read is
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// returned.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">read_at</span>(
</span></span><span style="display:flex;"><span> <span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> self,
</span></span><span style="display:flex;"><span> buf: <span style="color:#66d9ef">&</span><span style="color:#a6e22e">mut</span> [<span style="color:#66d9ef">u8</span>],
</span></span><span style="display:flex;"><span> offset: <span style="color:#66d9ef">u64</span>,
</span></span><span style="display:flex;"><span> ) -> <span style="color:#a6e22e">FileResult</span><span style="color:#f92672"><</span><span style="color:#66d9ef">u32</span><span style="color:#f92672">></span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">/// Write some number of bytes up to `buf.len()` from the provided
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// `offset` of the underlying file. The number of bytes written
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#e6db74">/// is returned.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span> <span style="color:#66d9ef">fn</span> <span style="color:#a6e22e">write_at</span>(
</span></span><span style="display:flex;"><span> <span style="color:#f92672">&</span><span style="color:#66d9ef">mut</span> self,
</span></span><span style="display:flex;"><span> buf: <span style="color:#66d9ef">&</span><span style="color:#a6e22e">mut</span> [<span style="color:#66d9ef">u8</span>],
</span></span><span style="display:flex;"><span> offset: <span style="color:#66d9ef">u64</span>,
</span></span><span style="display:flex;"><span> ) -> <span style="color:#a6e22e">FileResult</span><span style="color:#f92672"><</span><span style="color:#66d9ef">u32</span><span style="color:#f92672">></span>;
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h1 id="thanks-decade-ago-paultag">Thanks, decade ago paultag!</h1>
<aside class="left">
If this isn't my record for longest idea-to-wip-project time, it's close.
</aside>
<p>Let’s do it! Let’s use <code>arigato</code> to implement a <code>9p</code> filesystem we’ll call
<a href="https://github.com/paultag/debugfs">debugfs</a> that will serve all the debug
files shipped according to the <code>Packages</code> metadata from the <code>apt</code> archive. We’ll
fetch the <code>Packages</code> file and construct a filesystem based on the reported
<code>Build-Id</code> entries. For those who don’t know much about how an <code>apt</code> repo
works, here’s the 2-second crash course on what we’re doing. The first step is to
fetch the <code>Packages</code> file, which is specific to a binary architecture (such as
<code>amd64</code>, <code>arm64</code> or <code>riscv64</code>). That <code>architecture</code> is specific to a
<code>component</code> (such as <code>main</code>, <code>contrib</code> or <code>non-free</code>). That <code>component</code> is
specific to a <code>suite</code>, such as <code>stable</code>, <code>unstable</code> or any of its aliases
(<code>bullseye</code>, <code>bookworm</code>, etc). Let’s take a look at the <code>Packages.xz</code> file for
the <code>unstable-debug</code> <code>suite</code>, <code>main</code> <code>component</code>, for all <code>amd64</code> binaries.</p>
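<p>Written as code, the hierarchy composes into a predictable path. The helper below is hypothetical (it’s not part of <code>apt</code>, <code>debugfs</code> or any tool mentioned here), but it shows how suite, component and architecture map onto the URL we’re about to fetch:</p>

```rust
// Hypothetical helper: build the Packages.xz URL for a given suite,
// component and architecture. The apt repo layout on a mirror is
// dists/<suite>/<component>/binary-<arch>/Packages.xz.
fn packages_url(mirror: &str, suite: &str, component: &str, arch: &str) -> String {
    format!("{mirror}/dists/{suite}/{component}/binary-{arch}/Packages.xz")
}

fn main() {
    let url = packages_url(
        "https://deb.debian.org/debian-debug",
        "unstable-debug",
        "main",
        "amd64",
    );
    // This is the exact URL fetched by hand below.
    assert_eq!(
        url,
        "https://deb.debian.org/debian-debug/dists/unstable-debug/main/binary-amd64/Packages.xz"
    );
    println!("{url}");
}
```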
<pre tabindex="0"><code>$ curl \
https://deb.debian.org/debian-debug/dists/unstable-debug/main/binary-amd64/Packages.xz \
| unxz
</code></pre><p>This will return the Debian-style
<a href="https://man7.org/linux/man-pages/man5/deb822.5.html">rfc2822-like</a> headers,
which is an export of the metadata contained inside each <code>.deb</code> file which
<code>apt</code> (or other tools that can use the <code>apt</code> repo format) use to fetch
information about debs. Let’s take a look at the debug headers for the
<code>netlabel-tools</code> package in <code>unstable</code> – which is a package named
<code>netlabel-tools-dbgsym</code> in <code>unstable-debug</code>.</p>
<pre tabindex="0"><code>Package: netlabel-tools-dbgsym
Source: netlabel-tools (0.30.0-1)
Version: 0.30.0-1+b1
Installed-Size: 79
Maintainer: Paul Tagliamonte <paultag@debian.org>
Architecture: amd64
Depends: netlabel-tools (= 0.30.0-1+b1)
Description: debug symbols for netlabel-tools
Auto-Built-Package: debug-symbols
Build-Ids: e59f81f6573dadd5d95a6e4474d9388ab2777e2a
Description-md5: a0e587a0cf730c88a4010f78562e6db7
Section: debug
Priority: optional
Filename: pool/main/n/netlabel-tools/netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
Size: 62776
SHA256: 0e9bdb087617f0350995a84fb9aa84541bc4df45c6cd717f2157aa83711d0c60
</code></pre><p>So here, we can parse the package headers in the <code>Packages.xz</code> file, and store,
for each <code>Build-Id</code>, the <code>Filename</code> we can fetch the <code>.deb</code> from. Each
<code>.deb</code> contains a number of files – but we’re only really interested in the
files inside the <code>.deb</code> located at or under <code>/usr/lib/debug/.build-id/</code>,
which you can find in <code>debugfs</code> under
<a href="https://github.com/paultag/debugfs/blob/main/src/deb822.rs">deb822.rs</a>. It’s
crude, and very single-purpose, but I’m feeling a bit lazy.</p>
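<p>A crude sketch of that idea – hypothetical code, not the actual <code>deb822</code> parser; real <code>Packages</code> parsing also has to handle continuation lines and many more fields – looks something like this:</p>

```rust
// Hypothetical sketch of the Build-Id -> Filename index. Packages files
// are RFC 2822-like stanzas separated by blank lines; we only care about
// the Build-Ids and Filename fields of each stanza.
use std::collections::HashMap;

fn build_id_index(packages: &str) -> HashMap<String, String> {
    let mut index = HashMap::new();
    for stanza in packages.split("\n\n") {
        let mut build_ids = Vec::new();
        let mut filename = None;
        for line in stanza.lines() {
            if let Some(v) = line.strip_prefix("Build-Ids:") {
                // A dbgsym package can carry several Build-Ids.
                build_ids.extend(v.split_whitespace().map(String::from));
            } else if let Some(v) = line.strip_prefix("Filename:") {
                filename = Some(v.trim().to_string());
            }
        }
        if let Some(f) = filename {
            for id in build_ids {
                index.insert(id, f.clone());
            }
        }
    }
    index
}

fn main() {
    let stanza = "Package: netlabel-tools-dbgsym\n\
                  Build-Ids: e59f81f6573dadd5d95a6e4474d9388ab2777e2a\n\
                  Filename: pool/main/n/netlabel-tools/netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb\n";
    let index = build_id_index(stanza);
    assert_eq!(
        index["e59f81f6573dadd5d95a6e4474d9388ab2777e2a"],
        "pool/main/n/netlabel-tools/netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb"
    );
}
```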
<h1 id="who-needs-dpkg">Who needs dpkg?!</h1>
<aside class="right">
Hilariously, the fourth? fifth? non-serious time (second serious time)
I've had to do this for a new language.
</aside>
<p>For folks who haven’t seen it yet, a <code>.deb</code> file is a special type of
<a href="https://en.wikipedia.org/wiki/Ar_(Unix)">.ar</a> file, that contains (usually)
three files inside – <code>debian-binary</code>, <code>control.tar.xz</code> and <code>data.tar.xz</code>.
The core of an <code>.ar</code> file is a fixed-size (<code>60 byte</code>) entry header,
followed by the number of data bytes specified in that header’s <code>size</code> field.</p>
<pre tabindex="0"><code>[8 byte .ar file magic]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
...
</code></pre><aside class="left">
I can't believe it's already been over a decade since my NM process,
and nearly 16 years since I became an Ubuntu member.
</aside>
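<p>Given that layout, parsing a single entry header needs nothing but fixed offsets. Here’s an illustrative sketch (this is not the actual <code>ar.rs</code>; real <code>ar</code> variants also have quirks like name terminators that it glosses over):</p>

```rust
// Sketch of parsing one 60-byte ar entry header. Fields are fixed-width,
// space-padded ASCII: name (0..16), mtime (16..28), uid (28..34),
// gid (34..40), mode (40..48), size in decimal (48..58), then the
// two-byte terminator "`\n" (58..60).
fn ar_entry(header: &[u8; 60]) -> Option<(String, u64)> {
    // Every entry header must end with the magic terminator.
    if header[58] != b'`' || header[59] != b'\n' {
        return None;
    }
    let name = std::str::from_utf8(&header[0..16]).ok()?.trim_end().to_string();
    let size: u64 = std::str::from_utf8(&header[48..58])
        .ok()?
        .trim_end()
        .parse()
        .ok()?;
    Some((name, size))
}

fn main() {
    // A hand-built header for a 4-byte "debian-binary" member.
    let mut h = [b' '; 60];
    h[0..13].copy_from_slice(b"debian-binary");
    h[48..49].copy_from_slice(b"4");
    h[58] = b'`';
    h[59] = b'\n';
    assert_eq!(ar_entry(&h), Some(("debian-binary".to_string(), 4)));
}
```

<p>Knowing the <code>size</code> of each member is what lets us hop from header to header without reading the data in between, which matters in a moment.</p>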
<p>First up was to implement a basic <code>ar</code> parser in
<a href="https://github.com/paultag/debugfs/blob/main/src/ar.rs">ar.rs</a>. Before we get
into using it to parse a deb, as a quick diversion, let’s break apart a <code>.deb</code>
file by hand – something that is a bit of a rite of passage (or at least it
used to be? I’m getting old) during the Debian nm (new member) process, to take
a look at where exactly the <code>.debug</code> file lives inside the <code>.deb</code> file.</p>
<pre tabindex="0"><code>$ ar x netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ ls
control.tar.xz debian-binary
data.tar.xz netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ tar --list -f data.tar.xz | grep '.debug$'
./usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
</code></pre><p>Since we know quite a bit about the structure of a <code>.deb</code> file, and I had to
implement support from scratch anyway, I opted to implement a (very!) basic
debfile parser using HTTP Range requests. HTTP Range requests, if supported by
the server (denoted by an <code>accept-ranges: bytes</code> HTTP header in response to an
HTTP <code>HEAD</code> request to that file), mean that we can add a header such as
<code>range: bytes=8-68</code> to specifically request that the returned <code>GET</code> body be the
byte range provided (in the above case, the bytes from byte offset <code>8</code>
through byte offset <code>68</code>, inclusive). This means we can fetch just the ar entry headers from
the <code>.deb</code> file until we get to the file inside the <code>.deb</code> we are interested in
(in our case, the <code>data.tar.xz</code> file) – at which point we can request the body
of that file with a final <code>range</code> request. I wound up writing a struct to
handle a <code>read_at</code>-style API surface in
<a href="https://github.com/paultag/debugfs/blob/main/src/hrange.rs">hrange.rs</a>, which
we can pair with <code>ar.rs</code> above and start to find our data in the <code>.deb</code> remotely
without downloading and unpacking the <code>.deb</code> at all.</p>
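<p>To sketch how those ranges line up with the <code>ar</code> layout – with hypothetical helpers, not the actual <code>hrange.rs</code> – note that a <code>Range</code> end offset is inclusive, so a 60-byte header spans 60 offsets:</p>

```rust
// Hypothetical helpers: build Range header values to fetch first an ar
// entry header, and then the member data that header describes.
fn header_range(entry_offset: u64) -> String {
    // Range is inclusive on both ends: a 60-byte header covers
    // entry_offset ..= entry_offset + 59.
    format!("bytes={}-{}", entry_offset, entry_offset + 59)
}

fn data_range(entry_offset: u64, size: u64) -> String {
    // Member data starts immediately after the 60-byte header.
    format!("bytes={}-{}", entry_offset + 60, entry_offset + 60 + size - 1)
}

fn main() {
    // The first entry header sits right after the 8-byte "!<arch>\n" magic.
    assert_eq!(header_range(8), "bytes=8-67");
    assert_eq!(data_range(8, 4), "bytes=68-71");
}
```

<p>Parse each fetched header, skip <code>size</code> bytes forward, repeat until we hit <code>data.tar.xz</code>, then issue one last range request for its body.</p>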
<aside class="right">
I really like
<a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests">HTTP Range</a>
requests a lot.
</aside>
<aside class="left">
I did some stats to figure out what compression dbgsym packages use these
days; my LAN debug mirror contains 113459 xz compressed tarfiles, and 9
gzip compressed tarfiles at the time of writing.
</aside>
<p>After we have the body of the <code>data.tar.xz</code> coming back through the HTTP
response, we get to pipe it through an <code>xz</code> decompressor (this kinda sucked in
Rust, since a <code>tokio</code> <code>AsyncRead</code> is not the same as an <code>http</code> Body response is
not the same as <code>std::io::Read</code>, is not the same as an async (or sync)
<code>Iterator</code> is not the same as what the <code>xz2</code> crate expects; leading me to read
blocks of data to a buffer and stuff them through the decoder by looping over
the buffer for each <code>lzma2</code> packet in a loop), and <code>tar</code>file parser (similarly
troublesome). From there we get to iterate over all entries in the tarfile,
stopping when we reach our file of interest. Since we can’t seek, but <code>gdb</code>
needs to, we’ll pull it out of the stream into an in-memory <code>Cursor<Vec<u8>></code>
and pass a handle to it back to the user.</p>
<p>From here on out it’s a matter of
<a href="https://github.com/paultag/debugfs/blob/main/src/debugfs.rs">gluing together a File traited struct</a>
in <code>debugfs</code>, and serving the filesystem over TCP using <code>arigato</code>. Done
deal!</p>
<h1 id="a-quick-diversion-about-compression">A quick diversion about compression</h1>
<p>I was originally hoping to avoid transferring the whole tar file over the
network (and therefore also reading the whole debug file into ram, which
objectively sucks), but quickly hit issues with figuring out a way around
seeking around an <code>xz</code> file. What’s interesting is <code>xz</code> has a great primitive
to solve this specific problem (namely, using a block size that allows you
to seek to the block just before your desired seek position,
discarding at most <code>block size - 1</code> bytes), but <code>data.tar.xz</code> files
generated by <code>dpkg</code> appear to have a single mega-huge block for the whole file.
I don’t know why I would have expected any different, in retrospect. That means
that this now devolves into the base case of “How do I seek around an <code>lzma2</code>
compressed data stream”; which is a lot more complex of a question.</p>
<aside class="left">
After going through a lot of this, I realized just how complex
the xz format is -- it's a lot more than just lzma2!
</aside>
<p>Thankfully, notoriously brilliant <a href="https://github.com/tianon">tianon</a> was
nice enough to introduce me to <a href="https://github.com/jonjohnsonjr">Jon Johnson</a>
who did something super similar – he adapted a technique for seeking inside a
compressed <code>gzip</code> file, which lets his service
<a href="https://oci.dag.dev/?image=debian%3Aunstable">oci.dag.dev</a>
seek through Docker container images super fast based on some prior work
such as <code>soci-snapshotter</code>, <code>gztool</code>, and
<a href="https://github.com/madler/zlib/blob/0f51fb4933fc9ce18199cb2554dacea8033e7fd3/examples/zran.c">zran.c</a>.
He also pulled this party trick off for apk based distros
over at <a href="https://apk.dag.dev/">apk.dag.dev</a>, which seems apropos.
Jon was nice enough to publish a lot of his work on this specifically in a
central place under the name “<a href="https://github.com/jonjohnsonjr/targz">targz</a>”
on his GitHub, which has been a ton of fun to read through.</p>
<p>The gist is that, by dumping the decompressor’s state (window of previous
bytes, in-memory data derived from the last <code>N-1 bytes</code>) at specific
“checkpoints” along with the compressed data stream offset in bytes and
decompressed offset in bytes, one can seek to that checkpoint in the compressed
stream and pick up where you left off – creating a similar “block” mechanism
against the wishes of gzip. It means you’d need to do an <code>O(n)</code> run over the
file, but every request after that will be sped up according to the number
of checkpoints you’ve taken.</p>
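<p>The checkpoint <em>lookup</em> is the easy half of the trick; a toy sketch of just that half is below (the hard half – serializing and restoring real inflate state – is what <code>zran.c</code> and friends handle):</p>

```rust
// Toy sketch of the checkpoint index lookup only. Checkpoints pair a
// compressed offset with the decompressed offset it corresponds to; to
// serve a read we resume from the last checkpoint at or before the
// target, then decompress and discard bytes until we reach it.
struct Checkpoint {
    compressed_off: u64,
    decompressed_off: u64,
}

fn resume_point(index: &[Checkpoint], target: u64) -> Option<&Checkpoint> {
    // index is sorted by decompressed_off; take the last one <= target.
    index
        .iter()
        .take_while(|c| c.decompressed_off <= target)
        .last()
}

fn main() {
    let index = vec![
        Checkpoint { compressed_off: 0, decompressed_off: 0 },
        Checkpoint { compressed_off: 40_000, decompressed_off: 1 << 20 },
        Checkpoint { compressed_off: 81_000, decompressed_off: 2 << 20 },
    ];
    // A read at decompressed offset 1.5 MiB resumes from the 1 MiB
    // checkpoint, discarding ~0.5 MiB instead of decompressing from zero.
    let c = resume_point(&index, 1_500_000).unwrap();
    assert_eq!(c.compressed_off, 40_000);
}
```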
<p>Given the complexity of <code>xz</code> and <code>lzma2</code>, I don’t think this is possible
for me at the moment – especially given most of the files I’ll be requesting
will not be loaded from again – especially when I can “just” cache the debug
header by <code>Build-Id</code>. I want to implement this (because I’m generally curious
and Jon has a way of getting someone excited about compression schemes, which
is not a sentence I thought I’d ever say out loud), but for now I’m going to
move on without this optimization. Such a shame, since it kills a lot of the
work that went into seeking around the <code>.deb</code> file in the first place, given
the <code>debian-binary</code> and <code>control.tar.gz</code> members are so small.</p>
<h1 id="the-good">The Good</h1>
<p>First, the good news right? It works! That’s pretty cool. I’m positive
my younger self would be amused and happy to see this working; as is
current day paultag. Let’s take <code>debugfs</code> out for a spin! First, we need
to mount the filesystem. It even works on an entirely unmodified, stock
Debian box on my LAN, which is <em>huge</em>. Let’s take it for a spin:</p>
<pre tabindex="0"><code>$ mount \
-t 9p \
-o trans=tcp,version=9p2000.u,aname=unstable-debug \
192.168.0.2 \
/usr/lib/debug/.build-id/
</code></pre><p>And, let’s prove to ourselves that this actually mounted before we go
trying to use it:</p>
<pre tabindex="0"><code>$ mount | grep build-id
192.168.0.2 on /usr/lib/debug/.build-id type 9p (rw,relatime,aname=unstable-debug,access=user,trans=tcp,version=9p2000.u,port=564)
</code></pre><p>Slick. We’ve got an open connection to the server, where our host
will keep a connection alive as root, attached to the filesystem provided
in <code>aname</code>. Let’s take a look at it.</p>
<pre tabindex="0"><code>$ ls /usr/lib/debug/.build-id/
00 0d 1a 27 34 41 4e 5b 68 75 82 8E 9b a8 b5 c2 CE db e7 f3
01 0e 1b 28 35 42 4f 5c 69 76 83 8f 9c a9 b6 c3 cf dc E7 f4
02 0f 1c 29 36 43 50 5d 6a 77 84 90 9d aa b7 c4 d0 dd e8 f5
03 10 1d 2a 37 44 51 5e 6b 78 85 91 9e ab b8 c5 d1 de e9 f6
04 11 1e 2b 38 45 52 5f 6c 79 86 92 9f ac b9 c6 d2 df ea f7
05 12 1f 2c 39 46 53 60 6d 7a 87 93 a0 ad ba c7 d3 e0 eb f8
06 13 20 2d 3a 47 54 61 6e 7b 88 94 a1 ae bb c8 d4 e1 ec f9
07 14 21 2e 3b 48 55 62 6f 7c 89 95 a2 af bc c9 d5 e2 ed fa
08 15 22 2f 3c 49 56 63 70 7d 8a 96 a3 b0 bd ca d6 e3 ee fb
09 16 23 30 3d 4a 57 64 71 7e 8b 97 a4 b1 be cb d7 e4 ef fc
0a 17 24 31 3e 4b 58 65 72 7f 8c 98 a5 b2 bf cc d8 E4 f0 fd
0b 18 25 32 3f 4c 59 66 73 80 8d 99 a6 b3 c0 cd d9 e5 f1 fe
0c 19 26 33 40 4d 5a 67 74 81 8e 9a a7 b4 c1 ce da e6 f2 ff
</code></pre><p>Outstanding. Let’s try using <code>gdb</code> to debug a binary that was provided by
the <code>Debian</code> archive, and see if it’ll load the ELF by <code>build-id</code> from the
right <code>.deb</code> in the <code>unstable-debug</code> suite:</p>
<pre tabindex="0"><code>$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
</code></pre><p>Yes! Yes it will!</p>
<pre tabindex="0"><code>$ file /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
/usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter *empty*, BuildID[sha1]=e59f81f6573dadd5d95a6e4474d9388ab2777e2a, for GNU/Linux 3.2.0, with debug_info, not stripped
</code></pre><h1 id="the-bad">The Bad</h1>
<p>Linux’s support for <code>9p</code> is mainline, which is great, but it’s not robust.
Network issues or server restarts will wedge the mountpoint (Linux can’t
reconnect when the tcp connection breaks), and things that work fine on local
filesystems get translated in a way that causes a lot of network chatter – for
instance, just due to the way the syscalls are translated, doing an <code>ls</code> will
result in a <code>stat</code> call for each file in the directory, even though Linux had
just gotten a <code>stat</code> entry for every file while it was resolving directory names.
On top of that, Linux will serialize all I/O with the server, so there are no
concurrent requests for file information, writes, or reads pending at the same
time to the server; and <code>read</code> and <code>write</code> throughput will degrade as latency
increases due to increasing round-trip time, even though there are offsets
included in the <code>read</code> and <code>write</code> calls. It works well enough, but is
frustrating to run up against, since there’s not a lot you can do server-side
to help with this beyond implementing the <code>9P2000.L</code> variant (which, maybe is
worth it).</p>
<h1 id="the-ugly">The Ugly</h1>
<p>Unfortunately, we don’t know the file size(s) until we’ve actually opened the
underlying <code>tar</code> file and found the correct member, so for most files, we don’t
know the real size to report when getting a <code>stat</code>. We can’t parse the tarfiles
for every <code>stat</code> call, since that’d make <code>ls</code> even slower (bummer). The only
hiccup is that when I report a filesize of zero, <code>gdb</code> throws a bit of a
fit; let’s try with a size of <code>0</code> to start:</p>
<pre tabindex="0"><code>$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 0 Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
warning: Discarding section .note.gnu.build-id which has a section size (24) larger than the file size [in module /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug]
[...]
</code></pre><p>This obviously won’t work since <code>gdb</code> will throw away all our hard work because
of <code>stat</code>’s output, and neither will loading the real size of the underlying
file. That only leaves us with hardcoding a file size and hoping nothing else
breaks significantly as a result. Let’s try it again:</p>
<pre tabindex="0"><code>$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 954M Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
</code></pre><p>Much better. I mean, terrible but better. Better for now, anyway.</p>
<h1 id="kilroy-was-here">Kilroy was here</h1>
<p>Do I think this is a particularly good idea? I mean, kinda. I’m probably going
to make some fun <code>9p</code> <code>arigato</code>-based filesystems for use around my LAN, but I
don’t think I’ll be moving to use <code>debugfs</code> until I can make the connection
more resilient to changing networks and server restarts, and fix the i/o
performance. I think it was a useful exercise and is a pretty
great hack, but I don’t think this’ll be shipping anywhere anytime soon.</p>
<p>Along with publishing this post, I’ve pushed up all my repos, so you
should be able to play along at home! There’s a lot more work to be done
on <code>arigato</code>, but it does handshake and successfully export a working
<code>9P2000.u</code> filesystem. Check it out on my GitHub at
<a href="https://github.com/paultag/arigato">arigato</a>,
<a href="https://github.com/paultag/debugfs">debugfs</a>
and also on <a href="https://crates.io/crates/arigato">crates.io</a>
and <a href="https://docs.rs/arigato">docs.rs</a>.</p>
<p>At least I can say I was here and I got it working after all these years.</p> Using PKCS#11 on GNU/Linux https://notes.pault.ag/pkcs11/Sun, 07 Aug 2016 20:17:00 -0500 https://notes.pault.ag/pkcs11/ <p>PKCS#11 is a standard API to interface with HSMs, Smart Cards, or other types
of random hardware backed crypto. On my travel laptop, I use a few Yubikeys in
PKCS#11 mode using OpenSC to handle system login. <code>libpam-pkcs11</code> is a pretty
easy-to-use module that will let you log into your system locally using a
PKCS#11 token.</p>
<p>One of the least documented things, though, was how to use an OpenSC PKCS#11
token in Chrome. First, close all web browsers you have open.</p>
<pre>
sudo apt-get install libnss3-tools
certutil -U -d sql:$HOME/.pki/nssdb
modutil -add "OpenSC" -libfile /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so -dbdir sql:$HOME/.pki/nssdb
modutil -list "OpenSC" -dbdir sql:$HOME/.pki/nssdb
modutil -enable "OpenSC" -dbdir sql:$HOME/.pki/nssdb
</pre>
<p>Now, we’ll have the PKCS#11 module ready for <code>nss</code> to use, so let’s double
check that the tokens are registered:</p>
<pre>
certutil -U -d sql:$HOME/.pki/nssdb
certutil -L -h "OpenSC" -d sql:$HOME/.pki/nssdb
</pre>
<p>If this winds up causing issues, you can remove it using the following
command:</p>
<pre>
modutil -delete "OpenSC" -dbdir sql:$HOME/.pki/nssdb
</pre> Hacking a Projector in Hy https://notes.pault.ag/hacking-a-projector-in-hy/Sun, 31 Jul 2016 12:02:00 -0500 https://notes.pault.ag/hacking-a-projector-in-hy/ <p>About a year ago, I bought a Projector after I finally admitted that I could
actually use a TV in my apartment. I settled on buying a
<a href="https://ap.viewsonic.com/il/products/projectors/PJD5132.php">ViewSonic PJD5132</a>.
It was a really great value, and has been nothing short of a delight to own.</p>
<p>I was always a bit curious about the DB9 connector on the back of the unit,
so I dug into the user manual, and found some hex code strings in there. So,
last year, between my last gig at the
<a href="https://sunlightfoundation.com/">Sunlight Foundation</a> and
<a href="https://www.usds.gov/">USDS</a>, I spent some time wandering around the US,
hitting up <a href="https://debconf15.debconf.org/">DebConf</a>, and exploring Washington
DC. Between trips, I set out to figure out exactly what was going on with my
Projector, and see if I could make it do anything fun.</p>
<p>So, I started off with basics, and tried to work out how these command codes
were structured. I had a few working codes, but to write clean code, I’d be
better off understanding why the codes looked like they do. Let’s look at
the “Power On” code.</p>
<pre>
0x06 0x14 0x00 0x04 0x00 0x34 0x11 0x00 0x00 0x5D
</pre>
<p>Some were 10 bytes, others were 11, and most started with similar-looking
prefixes. The first byte was usually a <code>0x06</code> or <code>0x07</code>, followed by the two
bytes <code>0x14 0x00</code>, and either a <code>0x04</code> or <code>0x05</code>. Since the first few bytes
were similarly structured, and the first 4 octets always seemed to be present,
I assumed the first octet (either <code>0x06</code> or <code>0x07</code>) was actually a length.</p>
<p>So, my best guess is that we have a Length byte at index 0, followed by
two bytes for the Protocol, a flag for whether you’re Reading or Writing (best
guess on that one), and opaque data after that. Sometimes the data is a const
of sorts, and sometimes an octet (either little or big endian, confusingly).</p>
<aside class="left">
These are all just wild guesses, but thinking of it like this has actually
helped a bit, so I'm just going to use this as my working understanding
and adjust as needed.
</aside>
<pre>
Length
 |        Read / Write
 |              |
 |   Protocol   |               Data
 |   |-------|  |   |---------------------------|
0x06 0x14 0x00 0x04 0x00 0x34 0x11 0x00 0x00 0x5D
</pre>
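<p>To sanity-check this working model, here’s a quick sketch in Python (rather than Hy, purely for illustration). It bakes in one extra guess that isn’t in the diagram above: the final byte appears to be the mod-256 sum of every byte after the length octet, which holds for every code shown in this post, so I’m treating it as a checksum.</p>

```python
# Parse a frame using the working model above. Treating the final byte as
# a mod-256 sum of everything after the length octet is my own guess; it
# happens to hold for all of the codes I pulled out of the manual.
POWER_ON = [0x06, 0x14, 0x00, 0x04, 0x00, 0x34, 0x11, 0x00, 0x00, 0x5D]

def parse_frame(frame):
    length, proto, rw = frame[0], frame[1:3], frame[3]
    payload, checksum = frame[4:-1], frame[-1]
    # guess: the length octet counts every byte after the 4-byte header
    assert length == len(frame) - 4, "length byte doesn't match frame size"
    # guess: the trailing byte is a simple additive checksum
    assert checksum == sum(frame[1:-1]) & 0xFF, "checksum guess doesn't hold"
    return proto, rw, payload

print(parse_frame(POWER_ON))  # -> ([20, 0], 4, [0, 52, 17, 0, 0])
```

<p>Running that over the "Power On", "Power Off", "Power Status", and "Reset" codes above, both guesses hold, which is at least encouraging for the working model.</p>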
<p>Right. OK. So, let’s get to work. In the spirit of code is data, data is code,
I hacked up some of the projector codes into a s-expression we can use later.
The structure of this is boring, but it’ll let us both store the command
code to issue, as well as define the handler to read the data back.</p>
<pre>
(setv *commands*
  ; function       type   family  control
  '((power-on      nil    nil     (0x06 0x14 0x00 0x04 0x00 0x34 0x11 0x00 0x00 0x5D))
    (power-off     nil    nil     (0x06 0x14 0x00 0x04 0x00 0x34 0x11 0x01 0x00 0x5E))
    (power-status  const  power   (0x07 0x14 0x00 0x05 0x00 0x34 0x00 0x00 0x11 0x00 0x5E))
    (reset         nil    nil     (0x06 0x14 0x00 0x04 0x00 0x34 0x11 0x02 0x00 0x5F))
    ...
</pre>
<p>I also defined some of the const responses that come back from the
Projector itself. These are pretty boring, but it’s helpful to put a
name to the int that falls out.</p>
<pre>
(setv *consts*
  '((power  ((on  (0x00 0x00 0x01))
             (off (0x00 0x00 0x00))))
    (freeze ((on  (0x00 0x00 0x01))
             (off (0x00 0x00 0x00))))
    ...
</pre>
<p>After defining a few simple functions to write byte arrays to the serial
port, as well as to read and interpret responses from the projector, I could
start elaborating some higher-order functions that can talk projector. The
first thing I wrote was a function that converts a command entry into a
native Hy function.</p>
<pre>
(defn make-api-function [function type family data]
  `(defn ~function [serial]
     (import [PJD5132.dsl [interpret-response]]
             [PJD5132.serial [read-response/raw]])
     (serial.write (bytearray [~@data]))
     (interpret-response ~(str type) ~(str family) (read-response/raw serial))))
</pre>
<p>Fun. Fun! Now, we can invoke it to create a Hy & Python importable API wrapper
in a few lines!</p>
<pre>
(import [PJD5132.commands [*commands*]]
        [PJD5132.dsl [make-api-function]])

(list (map (fn [(, function type family command)]
             (make-api-function function type family command))
           *commands*))
</pre>
<p>Cool. So, now we can import things like <code>power-on</code> from <code>*commands*</code>; each
generated function takes a single argument (<code>serial</code>) for the serial port,
sends its command, and returns the response. The best part about all this is
that you only have to define the data once in a list, and the rest comes for
free.</p>
<p>Finally, I do want to be able to turn my projector on and off over the
network, so I went ahead and made a Flask “API” on top of this. First, let’s
define a macro to define Flask routes:</p>
<pre>
(defmacro defroute [name root &rest methods]
  (import os.path)
  (defn generate-method [path method status]
    `(with-decorator (app.route ~path)
       (fn []
         (import [PJD5132.api [~method ~(if status status method)]])
         (try
           (do (setv ret (~method serial-line))
               ~(if status `(setv ret (~status serial-line)))
               (json.dumps ret))
           (except [e ValueError]
             (setv response (make-response (.format "Fatal Error: ValueError: {}" (str e))))
             (setv response.status-code 500)
             response)))))
  (setv path (.format "/projector/{}" name))
  (setv actions (dict methods))
  `(do ~(generate-method path root nil)
       ~@(list-comp (generate-method (os.path.join path method-path) method root)
                    [(, method-path method) methods])))
</pre>
<p>Now, we can define how we want our API to look, so let’s define the <code>power</code>
route, which will expand out into the Flask route code above.</p>
<pre>
(defroute power
power-status
("on" power-on)
("off" power-off))
</pre>
<p>And now, let’s play with it!</p>
<pre>
$ curl https://192.168.1.50/projector/power
"off"
$ curl https://192.168.1.50/projector/power/on
"on"
$ curl https://192.168.1.50/projector/power
"on"
</pre>
<p>Or, the volume!</p>
<pre>
$ curl 192.168.1.50/projector/volume
10
$ curl 192.168.1.50/projector/volume/decrease
9
$ curl 192.168.1.50/projector/volume/decrease
8
$ curl 192.168.1.50/projector/volume/decrease
7
$ curl 192.168.1.50/projector/volume/increase
8
$ curl 192.168.1.50/projector/volume/increase
9
$ curl 192.168.1.50/projector/volume/increase
10
</pre>
<p>Check out the full source over at <a href="https://github.com/paultag/PJD5132/">github.com/paultag/PJD5132</a>!</p> The Open Source License API https://notes.pault.ag/osi-license-api/Sat, 16 Jul 2016 15:30:00 -0500 https://notes.pault.ag/osi-license-api/ <p>Around a year ago, I started hacking together a machine readable version
of the OSI-approved licenses list, casually picking away at it until it
was ready to launch. A few weeks ago, we officially announced
the <a href="https://opensource.org/node/822">osi license api</a>, which is now
live at <a href="https://api.opensource.org/">api.opensource.org</a>.</p>
<p>I also took a whack at writing a few API bindings, in
<a href="https://github.com/opensourceorg/python-opensource">Python</a>,
<a href="https://github.com/opensourceorg/ruby-opensourceapi">Ruby</a>,
and using the models from the API implementation itself in
<a href="https://github.com/OpenSourceOrg/api/tree/master/client">Go</a>. In the following
few weeks, <a href="https://github.com/clinty">Clint</a> wrote one in <a href="https://github.com/OpenSourceOrg/haskell-opensource">Haskell</a>,
<a href="https://mornie.org/">Eriol</a> wrote one in <a href="https://github.com/opensourceorg/rust-opensource">Rust</a>,
and <a href="https://ironholds.org/">Oliver</a> wrote one in <a href="https://cran.r-project.org/web/packages/osi/">R</a>.</p>
<p>The data is sourced from a <a href="https://github.com/opensourceorg/licenses">repo on GitHub</a>,
the <code>licenses</code> repo under <code>OpenSourceOrg</code>. Pull Requests against that repo are
wildly encouraged! Additional data ideas, cleanup or more hand collected data
would be wonderful!</p>
<p>In the meantime, use cases for this API range from language package
managers programmatically checking a license’s OSI approval, to taking a
license identifier as defined in one dataset (SPDX, for example) and finding
the identifier as it exists in another system (DEP5, Wikipedia,
TL;DR Legal).</p>
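<p>As a concrete sketch of the identifier-mapping use case, here are a few lines of Python. The <code>/license/{id}</code> route and the <code>identifiers</code> field are my assumptions about the deployed API’s shape (based on the wrappers above), so double-check them against the live service before relying on them.</p>

```python
import json
from urllib.request import urlopen

BASE = "https://api.opensource.org"

def license_url(license_id):
    # Assumption: /license/{id} returns a single license record; verify
    # against the live API before depending on the exact route.
    return "{}/license/{}".format(BASE, license_id)

def identifiers_for(record, scheme):
    # Map a license record to its identifiers in another scheme
    # (e.g. "SPDX" or "DEP5"), assuming records carry an "identifiers"
    # list of {"identifier": ..., "scheme": ...} objects.
    return [i["identifier"] for i in record.get("identifiers", [])
            if i.get("scheme") == scheme]

# Usage (network required):
#   record = json.load(urlopen(license_url("MIT")))
#   identifiers_for(record, "DEP5")
```

<p>This is the kind of glue that lets a package manager take an SPDX identifier and resolve the same license in another naming system with a single request.</p>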
<p>Patches are hugely welcome, as are bug reports or ideas! I’d also love more
API wrappers for other languages!</p> Hello, InfluxDB https://notes.pault.ag/hello-influxdb/Sat, 02 Jul 2016 13:13:00 -0500 https://notes.pault.ag/hello-influxdb/ <p>Last week, I posted about <a href="https://notes.pault.ag/hello-sense/">python-sense</a>,
an API wrapper for the internal Sense API. I wrote this so that I could
pull data about myself into my own databases, allowing me to use that
information for myself.</p>
<p>One way I’m doing this is by pulling my room data into an
<a href="https://influxdata.com/">InfluxDB</a> database, letting me run time series
queries against my environmental data.</p>
<pre>
#!/usr/bin/env python

import datetime as dt

from influxdb import InfluxDBClient
from sense.service import Sense

api = Sense()
data = api.room_sensors(quantity=20)

def items(data):
    for flavor, series in data.items():
        for datum in reversed(series):
            value = datum['value']
            if value == -1:
                continue
            timezone = dt.timezone(dt.timedelta(
                seconds=datum['offset_millis'] / 1000,
            ))
            when = dt.datetime.fromtimestamp(
                datum['datetime'] / 1000,
            ).replace(tzinfo=timezone)
            yield flavor, when, value

client = InfluxDBClient(
    'url.to.host.here',
    443,
    'username',
    'password',
    'sense',
    ssl=True,
)

def series(data):
    for flavor, when, value in items(data):
        yield {
            "measurement": "{}".format(flavor),
            "tags": {
                "user": "paultag",
            },
            "time": when.isoformat(),
            "fields": {
                "value": value,
            },
        }

client.write_points(list(series(data)))
</pre>
<p>I’m able to run this on a cron, automatically loading data from the Sense
API into my Influx database. I can then use that with something like
<a href="https://grafana.org/">Grafana</a>, to check out what my room looks like over
time.</p>
<p><img src="https://notes.pault.ag/static/posts/hello-influx/sense-influx-light.png" alt=""></p>
<p><img src="https://notes.pault.ag/static/posts/hello-influx/sense-influx-temp.png" alt=""></p> Hello, Sense! https://notes.pault.ag/hello-sense/Sun, 26 Jun 2016 21:42:00 -0500 https://notes.pault.ag/hello-sense/ <p>A while back, I saw a <a href="https://www.kickstarter.com/projects/hello/sense-know-more-sleep-better">Kickstarter</a>
for one of the most well designed and pretty sleep trackers on the market. I
fell in love with it, and it has stuck with me since.</p>
<p>A few months ago, I finally got my hands on one and started to track my data.
Naturally, I now want to store this new data with the rest of the data I have
on myself in my own databases.</p>
<p>I went in search of an API, but I found that the Sense API hasn’t been published
yet, and is being worked on by the team. Here’s hoping it’ll land soon!</p>
<p>After some subdomain guessing, I hit on <a href="https://api.hello.is">api.hello.is</a>.
So, naturally, I went to take a quick look at their Android app and its network
traffic, and lo and behold, there was a pretty nicely designed API.</p>
<p>This API is clearly an internal API, and as such, it’s something that
<strong>should not</strong> be considered stable. However, I’m OK with a fragile API,
so <a href="https://github.com/paultag/python-sense">I’ve published a quick and dirty API wrapper for the Sense API
to my GitHub</a>.</p>
<p>I’ve published it because I’ve found it useful, but I can’t promise the world,
(since I’m not a member of the Sense team at Hello!), so here are a few ground
rules of this wrapper:</p>
<ul>
<li>I make no claims to its stability or completeness.</li>
<li>I have no documentation or assurances.</li>
<li>I will not provide the client secret and ID. You’ll have to find them on
your own.</li>
<li>This may stop working without any notice, and there may even be really nasty
bugs that result in your alarm going off at 4 AM.</li>
<li>Send PRs! This is a side-project for me.</li>
</ul>
<p>This module is currently Python 3 only. If someone really needs Python 2
support, I’m open to minimally invasive patches to the codebase using
<code>six</code> to support Python 2.7.</p>
<h2 id="working-with-the-api">Working with the API:</h2>
<p>First, let’s go ahead and log in using <code>python -m sense</code>.</p>
<pre>
$ python -m sense
Sense OAuth Client ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Sense OAuth Client Secret: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Sense email: paultag@gmail.com
Sense password:
Attempting to log into Sense's API
Success!
Attempting to query the Sense API
The humidity is **just right**.
The air quality is **just right**.
The light level is **just right**.
It's **pretty hot** in here.
The noise level is **just right**.
Success!
</pre>
<p>Now, let’s see if we can pull up information on my Sense:</p>
<pre>
>>> from sense import Sense
>>> sense = Sense()
>>> sense.devices()
{'senses': [{'id': 'xxxxxxxxxxxxxxxx', 'firmware_version': '11a1', 'last_updated': 1466991060000, 'state': 'NORMAL', 'wifi_info': {'rssi': 0, 'ssid': 'Pretty Fly for a WiFi (2.4 GhZ)', 'condition': 'GOOD', 'last_updated': 1462927722000}, 'color': 'BLACK'}], 'pills': [{'id': 'xxxxxxxxxxxxxxxx', 'firmware_version': '2', 'last_updated': 1466990339000, 'battery_level': 87, 'color': 'BLUE', 'state': 'NORMAL'}]}
</pre>
<p>Neat! Pretty cool. Look, you can even see my WiFi AP! Let’s try some more
and pull some trends out.</p>
<pre>
>>> values = [x.get("value") for x in sense.room_sensors()["humidity"]][:10]
>>> min(values)
45.73904
>>> max(values)
45.985928
>>>
</pre>
<p>I plan to keep maintaining it as long as it’s needed, so I welcome
co-maintainers, and I’d love to see what people build with it! So far, I’m
using it to dump my room data into InfluxDB, pulling information on my room
into Grafana. Hopefully more to come!</p>
<p>Happy hacking!</p> Go Debian! https://notes.pault.ag/go-debian/Sun, 19 Jun 2016 12:30:00 -0500 https://notes.pault.ag/go-debian/ <p>As some of the world knows full well by now, I’ve been noodling with Go
for a few years, working through its pros, its cons, and thinking a lot
about how humans use code to express thoughts and ideas. Go’s got a lot of
neat use cases, suited to particular problems, and when used in the right place,
you can see some clear, massive wins.</p>
<aside class="left">
Some of the things Go is great at: Writing a server. Dealing with asynchronous
communication. Backend and front-end in the same binary. Fast and memory safe.
</aside>
<aside class="right">
Things Go is bad at: Having to rebuild everything for a CVE. Having if
`err != nil` everywhere. "Better than C" being the excuse for bad semantics.
No generics, cgo (enough said)
</aside>
<p>I’ve started writing Debian tooling in Go, because it’s a pretty natural fit.
Go’s fairly tight, and overhead shouldn’t be taken up by your operating system.
After a while, I wound up hitting the usual blockers, and started to build up
abstractions. They became pretty darn useful, so this blog post is announcing
a (still incomplete, year-old, and perhaps API-changing) Debian package for Go.
The Go importable name is <code>pault.ag/go/debian</code>. This contains a lot of utilities
for dealing with Debian packages, and will become an edited down “toolbelt”
for working with or on Debian packages.</p>
<h1 id="module-overview">Module Overview</h1>
<p>Currently, the package contains five major sub-packages: a <code>changelog</code>
parser, a <code>control</code> file parser, a <code>deb</code> file format parser, a <code>dependency</code>
parser, and a <code>version</code> parser. Together, these are a set of powerful
building blocks which can be used to create higher-order systems with reliable
understandings of the world.</p>
<h2 id="changelog">changelog</h2>
<p>The first (and perhaps most incomplete and least tested) is a <a href="https://godoc.org/pault.ag/go/debian/changelog">changelog file
parser</a>. This provides the
programmer with the ability to pull out the suite being targeted in the
changelog, when each upload happened, and the version of each. For example,
let’s look at when all the uploads of Docker to sid took place:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">main</span>() {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">resp</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">http</span>.<span style="color:#a6e22e">Get</span>(<span style="color:#e6db74">"https://metadata.ftp-master.debian.org/changelogs/main/d/docker.io/unstable_changelog"</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">allEntries</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">changelog</span>.<span style="color:#a6e22e">Parse</span>(<span style="color:#a6e22e">resp</span>.<span style="color:#a6e22e">Body</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> <span style="color:#a6e22e">_</span>, <span style="color:#a6e22e">entry</span> <span style="color:#f92672">:=</span> <span style="color:#66d9ef">range</span> <span style="color:#a6e22e">allEntries</span> {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">fmt</span>.<span style="color:#a6e22e">Printf</span>(<span style="color:#e6db74">"Version %s was uploaded on %s\n"</span>, <span style="color:#a6e22e">entry</span>.<span style="color:#a6e22e">Version</span>, <span style="color:#a6e22e">entry</span>.<span style="color:#a6e22e">When</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The output of which looks like:</p>
<pre tabindex="0"><code>Version 1.8.3~ds1-2 was uploaded on 2015-11-04 00:09:02 -0800 -0800
Version 1.8.3~ds1-1 was uploaded on 2015-10-29 19:40:51 -0700 -0700
Version 1.8.2~ds1-2 was uploaded on 2015-10-29 07:23:10 -0700 -0700
Version 1.8.2~ds1-1 was uploaded on 2015-10-28 14:21:00 -0700 -0700
Version 1.7.1~dfsg1-1 was uploaded on 2015-08-26 10:13:48 -0700 -0700
Version 1.6.2~dfsg1-2 was uploaded on 2015-07-01 07:45:19 -0600 -0600
Version 1.6.2~dfsg1-1 was uploaded on 2015-05-21 00:47:43 -0600 -0600
Version 1.6.1+dfsg1-2 was uploaded on 2015-05-10 13:02:54 -0400 EDT
Version 1.6.1+dfsg1-1 was uploaded on 2015-05-08 17:57:10 -0600 -0600
Version 1.6.0+dfsg1-1 was uploaded on 2015-05-05 15:10:49 -0600 -0600
Version 1.6.0+dfsg1-1~exp1 was uploaded on 2015-04-16 18:00:21 -0600 -0600
Version 1.6.0~rc7~dfsg1-1~exp1 was uploaded on 2015-04-15 19:35:46 -0600 -0600
Version 1.6.0~rc4~dfsg1-1 was uploaded on 2015-04-06 17:11:33 -0600 -0600
Version 1.5.0~dfsg1-1 was uploaded on 2015-03-10 22:58:49 -0600 -0600
Version 1.3.3~dfsg1-2 was uploaded on 2015-01-03 00:11:47 -0700 -0700
Version 1.3.3~dfsg1-1 was uploaded on 2014-12-18 21:54:12 -0700 -0700
Version 1.3.2~dfsg1-1 was uploaded on 2014-11-24 19:14:28 -0500 EST
Version 1.3.1~dfsg1-2 was uploaded on 2014-11-07 13:11:34 -0700 -0700
Version 1.3.1~dfsg1-1 was uploaded on 2014-11-03 08:26:29 -0700 -0700
Version 1.3.0~dfsg1-1 was uploaded on 2014-10-17 00:56:07 -0600 -0600
Version 1.2.0~dfsg1-2 was uploaded on 2014-10-09 00:08:11 +0000 +0000
Version 1.2.0~dfsg1-1 was uploaded on 2014-09-13 11:43:17 -0600 -0600
Version 1.0.0~dfsg1-1 was uploaded on 2014-06-13 21:04:53 -0400 EDT
Version 0.11.1~dfsg1-1 was uploaded on 2014-05-09 17:30:45 -0400 EDT
Version 0.9.1~dfsg1-2 was uploaded on 2014-04-08 23:19:08 -0400 EDT
Version 0.9.1~dfsg1-1 was uploaded on 2014-04-03 21:38:30 -0400 EDT
Version 0.9.0+dfsg1-1 was uploaded on 2014-03-11 22:24:31 -0400 EDT
Version 0.8.1+dfsg1-1 was uploaded on 2014-02-25 20:56:31 -0500 EST
Version 0.8.0+dfsg1-2 was uploaded on 2014-02-15 17:51:58 -0500 EST
Version 0.8.0+dfsg1-1 was uploaded on 2014-02-10 20:41:10 -0500 EST
Version 0.7.6+dfsg1-1 was uploaded on 2014-01-22 22:50:47 -0500 EST
Version 0.7.1+dfsg1-1 was uploaded on 2014-01-15 20:22:34 -0500 EST
Version 0.6.7+dfsg1-3 was uploaded on 2014-01-09 20:10:20 -0500 EST
Version 0.6.7+dfsg1-2 was uploaded on 2014-01-08 19:14:02 -0500 EST
Version 0.6.7+dfsg1-1 was uploaded on 2014-01-07 21:06:10 -0500 EST
</code></pre><h2 id="control">control</h2>
<p>Next is one of the most complex and oldest parts of <code>go-debian</code>:
the <a href="https://godoc.org/pault.ag/go/debian/control">control file parser</a>
(sometimes known as <code>deb822</code>). This module was inspired by the way
that the <code>json</code> module works in Go, allowing for files to be defined in code
with a <code>struct</code>. This tends to be a bit more declarative, but it also winds up
putting logic into struct tags, which can be a nasty anti-pattern if used too
much.</p>
<p>The first primitive in this module is the concept of a <code>Paragraph</code>, a struct
containing two values: the order in which keys were seen, and a map of <code>string</code>
to <code>string</code>. All higher-order functions dealing with control files go through
this type, which is a helpful interchange format to be aware of. All parsing of
meaning from the control file happens when the <code>Paragraph</code> is unpacked into
a struct using reflection.</p>
<p>The idea behind this strategy is that you define your struct and let the Control
parser handle unpacking the data from the IO into your container, letting you
maintain type safety: you never have to read and cast, since the conversion
handles this for you and returns an unmarshaling error in the event of failure.</p>
<aside class="right">
I'm starting to think parsing and defining the control structs are two different
tasks and should be split apart -- or the common structs ought to be removed
entirely. More on this later.
</aside>
<p>Additionally, Structs that define an anonymous member of <code>control.Paragraph</code>
will have the raw <code>Paragraph</code> struct of the underlying file, allowing the
programmer to handle dynamic tags (such as <code>X-Foo</code>), or at least, letting
them survive the round-trip through go.</p>
<p>The default <a href="https://godoc.org/pault.ag/go/debian/control#NewDecoder">decoder</a>
takes an argument: an OpenPGP keyring used to verify the input control file,
with the resulting signer exposed to the programmer through the
<code>(*Decoder).Signer()</code> function. If the passed argument is nil, it will not
check the input file signature (at all!), and if a keyring has been passed, any
signed data must be found or an <code>error</code> will fall out of the <code>NewDecoder</code> call.
On the way out, the opposite happens: the struct is introspected,
turned into a <code>control.Paragraph</code>, and then written out to the <code>io.Writer</code>.</p>
<p>Here’s a quick (and VERY dirty) example showing the basics of reading and
writing Debian Control files with <code>go-debian</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#f92672">package</span> <span style="color:#a6e22e">main</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> (
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"fmt"</span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"io"</span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"net/http"</span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"strings"</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"pault.ag/go/debian/control"</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">type</span> <span style="color:#a6e22e">AllowedPackage</span> <span style="color:#66d9ef">struct</span> {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">Package</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">Fingerprint</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">func</span> (<span style="color:#a6e22e">a</span> <span style="color:#f92672">*</span><span style="color:#a6e22e">AllowedPackage</span>) <span style="color:#a6e22e">UnmarshalControl</span>(<span style="color:#a6e22e">in</span> <span style="color:#66d9ef">string</span>) <span style="color:#66d9ef">error</span> {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">in</span> = <span style="color:#a6e22e">strings</span>.<span style="color:#a6e22e">TrimSpace</span>(<span style="color:#a6e22e">in</span>)
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">chunks</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">strings</span>.<span style="color:#a6e22e">SplitN</span>(<span style="color:#a6e22e">in</span>, <span style="color:#e6db74">" "</span>, <span style="color:#ae81ff">2</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> len(<span style="color:#a6e22e">chunks</span>) <span style="color:#f92672">!=</span> <span style="color:#ae81ff">2</span> {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">fmt</span>.<span style="color:#a6e22e">Errorf</span>(<span style="color:#e6db74">"Syntax sucks: '%s'"</span>, <span style="color:#a6e22e">in</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">a</span>.<span style="color:#a6e22e">Package</span> = <span style="color:#a6e22e">chunks</span>[<span style="color:#ae81ff">0</span>]
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">a</span>.<span style="color:#a6e22e">Fingerprint</span> = <span style="color:#a6e22e">chunks</span>[<span style="color:#ae81ff">1</span>][<span style="color:#ae81ff">1</span> : len(<span style="color:#a6e22e">chunks</span>[<span style="color:#ae81ff">1</span>])<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">nil</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">type</span> <span style="color:#a6e22e">DMUA</span> <span style="color:#66d9ef">struct</span> {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">Fingerprint</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">Uid</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">AllowedPackages</span> []<span style="color:#a6e22e">AllowedPackage</span> <span style="color:#e6db74">`control:"Allow" delim:","`</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">main</span>() {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">resp</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">http</span>.<span style="color:#a6e22e">Get</span>(<span style="color:#e6db74">"https://metadata.ftp-master.debian.org/dm.txt"</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">decoder</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">control</span>.<span style="color:#a6e22e">NewDecoder</span>(<span style="color:#a6e22e">resp</span>.<span style="color:#a6e22e">Body</span>, <span style="color:#66d9ef">nil</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">dmua</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">DMUA</span>{}
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">decoder</span>.<span style="color:#a6e22e">Decode</span>(<span style="color:#f92672">&</span><span style="color:#a6e22e">dmua</span>); <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">==</span> <span style="color:#a6e22e">io</span>.<span style="color:#a6e22e">EOF</span> {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">break</span>
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">fmt</span>.<span style="color:#a6e22e">Printf</span>(<span style="color:#e6db74">"The DM %s is allowed to upload:\n"</span>, <span style="color:#a6e22e">dmua</span>.<span style="color:#a6e22e">Uid</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> <span style="color:#a6e22e">_</span>, <span style="color:#a6e22e">allowedPackage</span> <span style="color:#f92672">:=</span> <span style="color:#66d9ef">range</span> <span style="color:#a6e22e">dmua</span>.<span style="color:#a6e22e">AllowedPackages</span> {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">fmt</span>.<span style="color:#a6e22e">Printf</span>(<span style="color:#e6db74">" %s [granted by %s]\n"</span>, <span style="color:#a6e22e">allowedPackage</span>.<span style="color:#a6e22e">Package</span>, <span style="color:#a6e22e">allowedPackage</span>.<span style="color:#a6e22e">Fingerprint</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Output (truncated!) looks a bit like:</p>
<pre tabindex="0"><code>...
The DM Allison Randal <allison@lohutok.net> is allowed to upload:
parrot [granted by A4F455C3414B10563FCC9244AFA51BD6CDE573CB]
...
The DM Benjamin Barenblat <bbaren@mit.edu> is allowed to upload:
boogie [granted by 3224C4469D7DF8F3D6F41A02BBC756DDBE595F6B]
dafny [granted by 3224C4469D7DF8F3D6F41A02BBC756DDBE595F6B]
transmission-remote-gtk [granted by 3224C4469D7DF8F3D6F41A02BBC756DDBE595F6B]
urweb [granted by 3224C4469D7DF8F3D6F41A02BBC756DDBE595F6B]
...
The DM أحمد المحمودي <aelmahmoudy@sabily.org> is allowed to upload:
covered [granted by 41352A3B4726ACC590940097F0A98A4C4CD6E3D2]
dico [granted by 6ADD5093AC6D1072C9129000B1CCD97290267086]
drawtiming [granted by 41352A3B4726ACC590940097F0A98A4C4CD6E3D2]
fonts-hosny-amiri [granted by BD838A2BAAF9E3408BD9646833BE1A0A8C2ED8FF]
...
...
</code></pre><h2 id="deb">deb</h2>
<p>Next up, we’ve got the <code>deb</code> module. This contains code to handle reading
Debian 2.0 <code>.deb</code> files. It contains a wrapper that will parse the control
member, and provide the data member through the
<a href="https://godoc.org/archive/tar">archive/tar</a> interface.</p>
<p>Here’s an example of how to read a <code>.deb</code> file, access some
metadata, iterate over the <code>tar</code> archive, and print the filenames
of each of the entries.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">main</span>() {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">path</span> <span style="color:#f92672">:=</span> <span style="color:#e6db74">"/tmp/fluxbox_1.3.5-2+b1_amd64.deb"</span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">fd</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">os</span>.<span style="color:#a6e22e">Open</span>(<span style="color:#a6e22e">path</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">defer</span> <span style="color:#a6e22e">fd</span>.<span style="color:#a6e22e">Close</span>()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">debFile</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">deb</span>.<span style="color:#a6e22e">Load</span>(<span style="color:#a6e22e">fd</span>, <span style="color:#a6e22e">path</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">version</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">debFile</span>.<span style="color:#a6e22e">Control</span>.<span style="color:#a6e22e">Version</span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">fmt</span>.<span style="color:#a6e22e">Printf</span>(
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"Epoch: %d, Version: %s, Revision: %s\n"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">version</span>.<span style="color:#a6e22e">Epoch</span>, <span style="color:#a6e22e">version</span>.<span style="color:#a6e22e">Version</span>, <span style="color:#a6e22e">version</span>.<span style="color:#a6e22e">Revision</span>,
</span></span><span style="display:flex;"><span> )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">hdr</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">debFile</span>.<span style="color:#a6e22e">Data</span>.<span style="color:#a6e22e">Next</span>()
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">==</span> <span style="color:#a6e22e">io</span>.<span style="color:#a6e22e">EOF</span> {
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">break</span>
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">fmt</span>.<span style="color:#a6e22e">Printf</span>(<span style="color:#e6db74">" -> %s\n"</span>, <span style="color:#a6e22e">hdr</span>.<span style="color:#a6e22e">Name</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Boringly, the output looks like:</p>
<pre tabindex="0"><code>Epoch: 0, Version: 1.3.5, Revision: 2+b1
-> ./
-> ./etc/
-> ./etc/menu-methods/
-> ./etc/menu-methods/fluxbox
-> ./etc/X11/
-> ./etc/X11/fluxbox/
-> ./etc/X11/fluxbox/window.menu
-> ./etc/X11/fluxbox/fluxbox.menu-user
-> ./etc/X11/fluxbox/keys
-> ./etc/X11/fluxbox/init
-> ./etc/X11/fluxbox/system.fluxbox-menu
-> ./etc/X11/fluxbox/overlay
-> ./etc/X11/fluxbox/apps
-> ./usr/
-> ./usr/share/
-> ./usr/share/man/
-> ./usr/share/man/man5/
-> ./usr/share/man/man5/fluxbox-style.5.gz
-> ./usr/share/man/man5/fluxbox-menu.5.gz
-> ./usr/share/man/man5/fluxbox-apps.5.gz
-> ./usr/share/man/man5/fluxbox-keys.5.gz
-> ./usr/share/man/man1/
-> ./usr/share/man/man1/startfluxbox.1.gz
...
</code></pre><h2 id="dependency">dependency</h2>
<p>The <code>dependency</code> package provides an interface to parse and compute
dependencies. This package is a bit odd in that, well, there’s no other
library that does this. The issue is that there are actually two different
parsers that compute our Dependency lines, one in Perl (as part of <code>dpkg-dev</code>)
and another in C (in <code>dpkg</code>).</p>
<aside class="left">
I have yet to track it down, but it's shockingly likely that `apt` has another
in `C++`, and maybe another in `aptitude`. I don't know this for a fact, so
I'll assume nothing.
</aside>
<p>To date, this has resulted in me filing
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=816473">three</a>
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=784808">different</a>
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=784806">bugs</a>.
I also found a broken package in the
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=816741">archive</a>,
which actually resulted in another bug being (totally accidentally)
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815478">already fixed</a>.
I plan to keep running the archive through my parser in hopes of finding
more bugs! This package is a bit complex, but it basically just returns what
amounts to an <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">AST</a>
for our Dependency lines. I’m positive there are bugs, so file them!</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">main</span>() {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">dep</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">dependency</span>.<span style="color:#a6e22e">Parse</span>(<span style="color:#e6db74">"foo | bar, baz, foobar [amd64] | bazfoo [!sparc], fnord:armhf [gnu-linux-sparc]"</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">anySparc</span>, <span style="color:#a6e22e">err</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">dependency</span>.<span style="color:#a6e22e">ParseArch</span>(<span style="color:#e6db74">"sparc"</span>)
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#a6e22e">err</span> <span style="color:#f92672">!=</span> <span style="color:#66d9ef">nil</span> {
</span></span><span style="display:flex;"><span> panic(<span style="color:#a6e22e">err</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> <span style="color:#a6e22e">_</span>, <span style="color:#a6e22e">possi</span> <span style="color:#f92672">:=</span> <span style="color:#66d9ef">range</span> <span style="color:#a6e22e">dep</span>.<span style="color:#a6e22e">GetPossibilities</span>(<span style="color:#f92672">*</span><span style="color:#a6e22e">anySparc</span>) {
</span></span><span style="display:flex;"><span> <span style="color:#a6e22e">fmt</span>.<span style="color:#a6e22e">Printf</span>(<span style="color:#e6db74">"%s (%s)\n"</span>, <span style="color:#a6e22e">possi</span>.<span style="color:#a6e22e">Name</span>, <span style="color:#a6e22e">possi</span>.<span style="color:#a6e22e">Arch</span>)
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Gives the output:</p>
<pre tabindex="0"><code>foo (<nil>)
baz (<nil>)
fnord (armhf)
</code></pre><h2 id="version">version</h2>
<p>Right off the bat, I’d like to thank
<a href="https://twitter.com/zekjur">Michael Stapelberg</a> for letting me graft this
out of <a href="https://github.com/debian/dcs">dcs</a> and into the <code>go-debian</code> package.
This was nearly entirely his work (with a one or two line function I added
later), and was amazingly helpful to have. Thank you!</p>
<p>This module implements Debian version comparisons and parsing, allowing for
sorting in lists, checking to see if a version is native or not, and letting
the programmer implement smart(er!) logic based on upstream (or Debian)
version numbers.</p>
<p>This module is extremely easy to use and very straightforward, and not worth
writing an example for.</p>
<h1 id="final-thoughts">Final thoughts</h1>
<p>This is more of a “Yeah, OK, this has been useful enough to me at this point
that I’m going to support this” rather than a “It’s stable!” or even
“It’s alive!” post. Hopefully folks can report bugs and help iterate on
this module until we have some really clean building blocks to build
solid higher level systems on top of. Being able to have multiple libraries
interoperate by relying on <code>go-debian</code> will make life massively easier.
I’m in need of more documentation, and to finalize some parts of the older
sub package APIs, but I’m hoping to be at a “1.0” real soon now.</p> It's all relative https://notes.pault.ag/its-all-relative/Fri, 10 Jun 2016 23:45:00 -0500 https://notes.pault.ag/its-all-relative/ <p>As nearly anyone who’s worked with me will attest to, I’ve long since
touted <a href="https://nedbatchelder.com">nedbat’s</a> talk
<a href="https://nedbatchelder.com/text/unipain.html">Pragmatic Unicode, or, How do I stop the pain?</a>
as one of the most foundational talks, and required watching for all programmers.</p>
<p>The reason is that nedbat hits on something bigger – something more
fundamental than how to handle Unicode – it’s how to handle data which is
relative.</p>
<p>For those who want the TL;DR, the argument is as follows:</p>
<p>Facts of Life:</p>
<ol>
<li>Computers work with Bytes. Bytes go in, Bytes go out.</li>
<li>The world needs more than 256 symbols.</li>
<li>You need both Bytes and Unicode.</li>
<li>You cannot infer the encoding of bytes.</li>
<li>Declared encodings can be Wrong.</li>
</ol>
<p>Now, to fix it, the following protips:</p>
<ol>
<li><a href="https://nedbatchelder.com/text/unipain/unipain.html#35">Unicode sandwich</a></li>
<li>Know what you have</li>
<li>TEST</li>
</ol>
<h2 id="relative-data">Relative Data</h2>
<p>I’ve started to think more about why we do the things we do when we write
code, and one thing that continues to be a source of morbid schadenfreude
is watching code break by failing to handle Unicode right. It’s hard! However,
watching <em>what</em> breaks lets you gain a bit of insight into how the author
thinks, and what assumptions they make.</p>
<p>When you send someone Unicode, there are a lot of assumptions that have to be
made. Your computer has to trust what you (yes, you!) entered into your web
browser, your web browser has to pass that on over the network (most of the
time without encoding information), to a server which reads that bytestream,
and makes a wild guess at what it should be. That server might save it to a
database, and interpolate it into an HTML template in a different encoding
(called <a href="https://simple.wikipedia.org/wiki/Mojibake">Mojibake</a>), resulting
in a bad time for everyone involved.</p>
<p>Everything’s awful, and the fact our computers can continue to display
text to us is a goddamn miracle. Never forget that.</p>
<p>When it comes down to it, when I see a byte sitting on a page, I don’t know
(and can’t know!) if it’s <code>Windows-1252</code>, <code>UTF-8</code>, <code>Latin-1</code>, or <code>EBCDIC</code>. What’s
a poem to me is terminal garbage to you.</p>
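<p>A few lines of Go make the point concrete. The same five bytes are a
perfectly valid string in more than one encoding – nothing in the bytes
themselves tells you which one was meant:</p>

```go
package main

import "fmt"

func main() {
	// "café" encoded as UTF-8 is five bytes: 0x63 0x61 0x66 0xc3 0xa9.
	raw := []byte("café")

	// Read the very same bytes as Latin-1, where every byte is one rune.
	latin1 := make([]rune, len(raw))
	for i, b := range raw {
		latin1[i] = rune(b)
	}

	fmt.Println(string(raw))    // café   (if you assume UTF-8)
	fmt.Println(string(latin1)) // cafÃ©  (if you assume Latin-1)
}
```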
<p>Over the years, hacks have evolved. We have
<a href="https://en.wikipedia.org/wiki/Magic_number_(programming)">magic numbers</a>,
and plain ole’ hacks to just guess based on the content. Of course, like
all good computer programs, this has lead to its fair share of hilarious
<a href="https://bugs.launchpad.net/ubuntu/+source/cupsys/+bug/255161/comments/28">bugs</a>,
and there’s nothing stopping files from (validly!) being multiple things at the
same time.</p>
<p><em>Like many things, it’s all in the eye of the beholder</em>.</p>
<h2 id="timezones">Timezones</h2>
<p>Just like Unicode, this is a word that can put your friendly neighborhood
programmer into a series of profanity-laden tirades. Go find one in the wild,
and ask them what they think about the timezone handling bugs they’ve seen.
I’ll wait. Go ahead.</p>
<p>Rants are funny things. They’re fun to watch. Hilarious to give. Sometimes
just getting it all out can help. They can tell you a lot about the true
nature of problems.</p>
<p>It’s funny to consider the isomorphic nature of Unicode rants and Timezone
rants.</p>
<p><em>I don’t think this is an accident.</em></p>
<h2 id="unicode-timezone-sandwich">U̶n̶i̶c̶o̶d̶e̶ timezone Sandwich</h2>
<p>Ned’s Unicode Sandwich applies – As early as we can, in the lowest level
we can (reading from the database, filesystem, wherever!), all datetimes
must be timezone qualified with their correct timezone. Always. If you mean
UTC, say it’s in UTC.</p>
<p>Treat any unqualified datetimes as “bytes”. They’re not to be trusted.
<a href="https://youtu.be/W7wpzKvNhfA?t=3m18s">Never, never, never trust ’em</a>. Don’t
process any datetimes until you’re sure they’re in the right timezone.</p>
<p>This lets the delicious inside of your datetime sandwich handle timezones
with grace, and finally, as late as you can, turn it back into bytes
(if at all!). Treat locations as <code>tzdb</code> entries, and qualify datetime
objects into their absolute timezone (<code>EST</code>, <code>EDT</code>, <code>PST</code>, <code>PDT</code>).</p>
<p>It’s not until you want to show the datetime to the user again that you
should consider how to re-encode your datetime to bytes. You should think about
what flavor of bytes, what encoding – what timezone – should I be
encoding into?</p>
<h2 id="test">TEST</h2>
<p>Just like Unicode, testing that your code works with datetimes is important.
Every time I think about how to go about doing this, I think about that
one time that <a href="https://mjg59.dreamwidth.org/">mjg59</a> couldn’t book a flight
starting Tuesday from AKL, landing in HNL on Monday night, because
United couldn’t book the last leg to SFO. Do you ever assume dates only go
forward as time goes on? Remember timezones.</p>
<p>Construct test data, make sure someone in New Zealand’s
<a href="https://en.wikipedia.org/wiki/UTC%2B13:45">+13:45</a> can correctly talk with
their friends in
Baker Island’s <a href="https://en.wikipedia.org/wiki/UTC%E2%88%9212:00">-12:00</a>,
and that the events sort right.</p>
<p>Just because it’s Noon on New Years Eve in England doesn’t mean it’s not
1 AM the next year in New Zealand. Places a few miles apart may switch to
Daylight Saving Time on different days. Indian Standard Time is not even aligned on the hour
to GMT (<code>+05:30</code>)!</p>
<p>Test early, and test often. Memorize a few timezones, and challenge
your assumptions when writing code that has to do with time. Don’t use
wall clocks to mean monotonic time. Remember there’s a whole world out there,
and we only deal with part of it.</p>
<p>It’s also worth remembering, as <a href="https://twitter.com/andrewindc">Andrew Pendleton</a>
pointed out to me, that it’s possible that a datetime isn’t even <em>unique</em> for a
place, since you can never know if <code>2016-11-06 01:00:00</code> in <code>America/New_York</code>
(in the <code>tzdb</code>) is the first one, or second one. Storing <code>EST</code> or <code>EDT</code> along
with your datetime may help, though!</p>
<h2 id="pitfalls">Pitfalls</h2>
<p>Improper handling of timezones can lead to some interesting things, and failing
to be explicit (or at least, very rigid) in what you expect will lead to an
unholy class of bugs we’ve all come to hate. At best, you have confused
users doing math; at worst, someone misses a critical event, or our
security code fails.</p>
<p>I recently found what I regard to be a pretty bad
<a href="https://bugs.debian.org/819697">bug in apt</a> (which David has prepared a
<a href="https://anonscm.debian.org/cgit/apt/apt.git/diff/?id=9febc2b">fix</a>
for and is pending upload, yay! Thank you!), which boiled down to documentation
and code expecting datetimes in a timezone, but <em>accepting any timezone</em>, and
<em>silently</em> treating it as <code>UTC</code>.</p>
<p>The solution is to hard-fail, which is an interesting choice to me (as a vocal
fan of timezone aware code), but at the least it won’t fail by
misunderstanding what the server is trying to communicate, and I do understand
and empathize with the situation the <code>apt</code> maintainers are in.</p>
<h2 id="final-thoughts">Final Thoughts</h2>
<p>Overall, my main point is although most modern developers know how to deal
with Unicode pain, I think there is a more general lesson to learn – namely,
you should always know what data you have, and always remember what it is.
Understand assumptions as early as you can, and always store them with the data.</p> Docker PostgreSQL Foreign Data Wrapper https://notes.pault.ag/dockerfdw/Thu, 18 Sep 2014 21:49:00 -0500 https://notes.pault.ag/dockerfdw/ <p>For the tl;dr: <a href="https://github.com/paultag/dockerfdw">Docker FDW</a> is a thing.
Star it, hack it, try it out. File bugs, be happy. If you want to see what it’s
like to read, there’s some example SQL down below.</p>
<aside class="left">
This post was edited on Sep 21st to add information about the
<code>DELETE</code> and <code>INSERT</code> operators
</aside>
<p>First things first: what the heck is a PostgreSQL Foreign Data Wrapper?
PostgreSQL Foreign Data Wrappers are plugins that allow C libraries
to provide an adaptor for PostgreSQL to talk to an external database.</p>
<p>Some folks have used this to wrap stuff like
<a href="https://github.com/citusdata/mongo_fdw">MongoDB</a>, which I always found
to be hilarious (and an epic hack).</p>
<h1 id="enter-multicorn">Enter Multicorn</h1>
<p>During my time at <a href="https://pygotham.org/">PyGotham</a>, I saw a talk from
<a href="https://twitter.com/weschow">Wes Chow</a> about something called
<a href="https://multicorn.org/">Multicorn</a>. He was showing off some really neat
plugins, such as the git revision history of CPython, and parsed logfiles
from some stuff over at Chartbeat. This basically blew my mind.</p>
<aside class="right">
If you're interested in some of these, there are a bunch in the
Multicorn VCS repo, such as the
<a href="https://github.com/Kozea/Multicorn/blob/master/python/multicorn/gitfdw.py">gitfdw</a>
example.
</aside>
<p>All throughout the talk I was coming up with all sorts of things that I wanted
to do – this whole library is basically exactly what I’ve been dreaming
about for years. I’ve always wanted to provide a SQL-like interface
into querying API data, joining data cross-API using common crosswalks,
such as using <a href="https://capitolwords.org/">Capitol Words</a> to query for
Legislators, and use the
<a href="https://bioguide.congress.gov/biosearch/biosearch.asp">bioguide ids</a>
to <code>JOIN</code> against the <a href="https://sunlightlabs.github.io/congress/">congress api</a>
to get their Twitter account names.</p>
<p>My first shot was to Multicorn the new
<a href="https://opencivicdata.org/">Open Civic Data</a> API I was working on;
I chuckled and put it aside as a really awesome hack.</p>
<h1 id="enter-docker">Enter Docker</h1>
<p>It wasn’t until <a href="https://github.com/tianon">tianon</a> connected the dots for me
and suggested a <a href="https://docker.io/">Docker</a> FDW that I got really excited.
Cue a few hours of hacking, and I’m proud to say – here’s
<a href="https://github.com/paultag/dockerfdw">Docker FDW</a>.</p>
<p>This lets us ask all sorts of really interesting questions out of the API,
and might even help folks writing webapps avoid adding too much Docker-aware
logic. Abstractions can be fun!</p>
<h1 id="setting-it-up">Setting it up</h1>
<aside class="left">
The only stumbling block you might find (at least on Debian and Ubuntu) is
that you'll need a Multicorn `.deb`. It's currently undergoing an
official Debianization from the Postgres team, but in the meantime I put
the source and binary up on my
<a href="https://people.debian.org/~paultag/tmp/">people.debian.org</a>.
Feel free to use that while the Debian PostgreSQL team prepares the upload
to unstable.
</aside>
<p>I’m going to assume you have a working Multicorn, PostgreSQL and Docker setup
(including adding the <code>postgres</code> user to the <code>docker</code> group).</p>
<p>So, now let’s pop open a <code>psql</code> session. Create a database (I called mine
<code>dockerfdw</code>, but it can be anything), and let’s create some tables.</p>
<p>Before we create the tables, we need to let PostgreSQL know where our
objects are. This takes a name for the <code>server</code>, and the <code>Python</code> importable
path to our FDW.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">CREATE</span> SERVER docker_containers <span style="color:#66d9ef">FOREIGN</span> <span style="color:#66d9ef">DATA</span> WRAPPER multicorn <span style="color:#66d9ef">options</span> (
</span></span><span style="display:flex;"><span> wrapper <span style="color:#e6db74">'dockerfdw.wrappers.containers.ContainerFdw'</span>);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">CREATE</span> SERVER docker_image <span style="color:#66d9ef">FOREIGN</span> <span style="color:#66d9ef">DATA</span> WRAPPER multicorn <span style="color:#66d9ef">options</span> (
</span></span><span style="display:flex;"><span> wrapper <span style="color:#e6db74">'dockerfdw.wrappers.images.ImageFdw'</span>);
</span></span></code></pre></div><p>Now that we have the server in place, we can tell PostgreSQL to create a table
backed by the FDW by creating a foreign table. I won’t go too much into the
syntax here, but you might also note that we pass in some options - these are
passed to the constructor of the FDW, letting us set stuff like the Docker
host.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">CREATE</span> <span style="color:#66d9ef">foreign</span> <span style="color:#66d9ef">table</span> docker_containers (
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"id"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"image"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"name"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"names"</span> TEXT[],
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"privileged"</span> BOOLEAN,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"ip"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"bridge"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"running"</span> BOOLEAN,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"pid"</span> INT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"exit_code"</span> INT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"command"</span> TEXT[]
</span></span><span style="display:flex;"><span>) server docker_containers <span style="color:#66d9ef">options</span> (
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">host</span> <span style="color:#e6db74">'unix:///run/docker.sock'</span>
</span></span><span style="display:flex;"><span>);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">CREATE</span> <span style="color:#66d9ef">foreign</span> <span style="color:#66d9ef">table</span> docker_images (
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"id"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"architecture"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"author"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"comment"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"parent"</span> TEXT,
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">"tags"</span> TEXT[]
</span></span><span style="display:flex;"><span>) server docker_image <span style="color:#66d9ef">options</span> (
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">host</span> <span style="color:#e6db74">'unix:///run/docker.sock'</span>
</span></span><span style="display:flex;"><span>);
</span></span></code></pre></div><p>And, now that we have tables in place, we can try to learn something about the
Docker containers. Let’s start with something fun - a join from containers
to images, showing all image tag names, the container names, and the IP of the
container (if it has one!).</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">SELECT</span> docker_containers.ip, docker_containers.<span style="color:#66d9ef">names</span>, docker_images.tags
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">FROM</span> docker_containers
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">RIGHT</span> <span style="color:#66d9ef">JOIN</span> docker_images
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">ON</span> docker_containers.image<span style="color:#f92672">=</span>docker_images.id;
</span></span></code></pre></div><pre tabindex="0"><code> ip | names | tags
-------------+-----------------------------+-----------------------------------------
| | {ruby:latest}
| | {paultag/vcs-mirror:latest}
| {/de-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
| {/ny-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
| {/ar-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
172.17.0.47 | {/ms-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
172.17.0.46 | {/nc-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
| {/ia-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
| {/az-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
| {/oh-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
| {/va-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
172.17.0.41 | {/wa-openstates-to-ocd} | {sunlightlabs/scrapers-us-state:latest}
| {/jovial_poincare} | {<none>:<none>}
| {/jolly_goldstine} | {<none>:<none>}
| {/cranky_torvalds} | {<none>:<none>}
| {/backstabbing_wilson} | {<none>:<none>}
| {/desperate_hoover} | {<none>:<none>}
| {/backstabbing_ardinghelli} | {<none>:<none>}
| {/cocky_feynman} | {<none>:<none>}
| | {paultag/postgres:latest}
| | {debian:testing}
| | {paultag/crank:latest}
| | {<none>:<none>}
| | {<none>:<none>}
| {/stupefied_fermat} | {hackerschool/doorbot:latest}
| {/focused_euclid} | {debian:unstable}
| {/focused_babbage} | {debian:unstable}
| {/clever_torvalds} | {debian:unstable}
| {/stoic_tesla} | {debian:unstable}
| {/evil_torvalds} | {debian:unstable}
| {/foo} | {debian:unstable}
(31 rows)
</code></pre><p>OK, let’s see if we can bring this to the next level now. I finally got around
to implementing <code>INSERT</code> and <code>DELETE</code> operations, which turned out to be
pretty simple to do. Check this out:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">DELETE</span> <span style="color:#66d9ef">FROM</span> docker_containers;
</span></span></code></pre></div><pre tabindex="0"><code>DELETE 1
</code></pre><p>This will do a <code>stop</code> + <code>kill</code> after a 10-second grace period behind the scenes. It’s
actually a lot of fun to spawn up a container and terminate it from
<code>PostgreSQL</code>.</p>
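<p>For a rough sense of what that <code>DELETE</code> maps to under the hood, here’s a hedged sketch; the client object and its <code>stop</code> method are illustrative stand-ins, not dockerfdw’s actual code:</p>

```python
# Hedged sketch: how a DELETE on the foreign table could translate into a
# stop-then-kill against the Docker daemon. The client and its `stop` method
# are illustrative stand-ins, not dockerfdw's real implementation.
class FakeDockerClient:
    """Records calls so the behavior can be shown without a running daemon."""
    def __init__(self):
        self.calls = []

    def stop(self, container_id, timeout):
        # The real daemon sends SIGTERM, waits `timeout` seconds,
        # then SIGKILLs if the process is still running.
        self.calls.append(("stop", container_id, timeout))


def delete_container(client, container_id, grace_seconds=10):
    """Mirror the foreign table's DELETE: stop, with a 10-second grace period."""
    client.stop(container_id, timeout=grace_seconds)


client = FakeDockerClient()
delete_container(client, "0a903dcf5ae1")
print(client.calls)  # [('stop', '0a903dcf5ae1', 10)]
```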
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">INSERT</span> <span style="color:#66d9ef">INTO</span> docker_containers (name, image) <span style="color:#66d9ef">VALUES</span> (<span style="color:#e6db74">'hello'</span>, <span style="color:#e6db74">'debian:unstable'</span>) RETURNING id;
</span></span></code></pre></div><pre tabindex="0"><code> id
------------------------------------------------------------------
0a903dcf5ae10ee1923064e25ab0f46e0debd513f54860beb44b2a187643ff05
INSERT 0 1
(1 row)
</code></pre><p>Spawning containers works too - this is still very immature and not super
practical, but I figure while I’m showing off, I might as well go all the way.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">SELECT</span> ip <span style="color:#66d9ef">FROM</span> docker_containers <span style="color:#66d9ef">WHERE</span> id<span style="color:#f92672">=</span><span style="color:#e6db74">'0a903dcf5ae10ee1923064e25ab0f46e0debd513f54860beb44b2a187643ff05'</span>;
</span></span></code></pre></div><pre tabindex="0"><code> ip
-------------
172.17.0.12
(1 row)
</code></pre><p>Success! This is just a taste of what’s to come, so please feel free to hack on
<a href="https://github.com/paultag/dockerfdw">Docker FDW</a>,
tweet me <a href="https://twitter.com/paultag">@paultag</a>, or file bugs / feature requests.
It’s currently a bit of a hack, but it’s something that I think has
long-term potential once some work goes into making sure that this is a rock
solid interface to the Docker API.</p> Linode pv-grub chaining https://notes.pault.ag/linode-pv-grub-chainning/Sat, 14 Jun 2014 21:40:00 -0500 https://notes.pault.ag/linode-pv-grub-chainning/ <p>I’ve been using <a href="https://linode.com">Linode</a> since 2010, and many of
my friends have heard me talk about how big a fan I am. I’ve
used Debian unstable on all my Linodes, since I often use them as remote
shells for general-purpose Debian development; I’ve found them
indispensable.</p>
<h1 id="the-problem">The Problem</h1>
<p>Recently, because of my work on <a href="https://docker.io/">Docker</a>, I was forced
to stop using the Linode kernel in favor of the stock Debian kernel, since
the stock Linode kernel has no aufs support, and the default LVM-based
devicemapper backend can be quite a pain.</p>
<aside class="left">
The btrfs errors are ones I fully expect to be gone soon; I can't wait
to switch back to using it.
</aside>
<p>I tried loading in <a href="https://en.wikipedia.org/wiki/Btrfs">btrfs</a> support, and
using that to host the Docker instance backed with btrfs, but it was throwing
errors as well. Stuck with unstable backends, I wanted to use the
<a href="https://en.wikipedia.org/wiki/Aufs">aufs</a> backend, which, despite problems in
aufs internally, is quite stable with Docker (and in general).</p>
<p>I started to run through the <a href="https://library.linode.com/custom-instances/pv-grub-howto">Linode Library’s guide on PV-Grub</a>,
but that resulted in a cryptic error with xen not understanding the compression
of the kernel. I checked for recent changes to the compression, and lo, the
Debian kernel has been switched to use xz compression in sid. Awesome news,
really. XZ compression is awesome, and I’ve been super impressed with how
universally we’ve adopted it in Debian. Keep it up! However, it appears that
only a newer pv-grub than the one installed on the Linode hosts will fix this.</p>
<p>After contacting the (ever friendly) Linode support, they were unable to give
me a timeline on adding xz support, which would entail upgrading pv-grub. It
was quite disappointing news, to be honest. Workarounds were suggested,
but I’m not quite happy with them as proper solutions.</p>
<p>After asking in <code>#debian-kernel</code>, <a href="https://bblank.thinkmo.de/blog">waldi</a> was
able to give me a few pointers, and the following is very much inspired by him;
the only changes of note were config tweaks, which were easy enough.
Thanks, Bastian!</p>
<h1 id="the-constraints">The Constraints</h1>
<p>I wanted to maintain a 100% stock configuration from the kernel up.
When I upgraded my kernel, I wanted it to just work. I didn’t want to
unpack and repack the kernel, and I didn’t want to install software
outside main on my system. It had to be 100% Debian and unmodified.</p>
<h1 id="the-solution">The Solution</h1>
<aside class="right">
It's pretty fun to attach to the lish console and watch bootup pass
through GRUB 0.9, to GRUB 2.x, to Linux. Free Software, Fuck Yeah.
</aside>
<p>Left unable to run my own kernel directly from the Linode interface, the tack
here was to use Linode’s old pv-grub to chain-load grub-xen, which then loads
a modern kernel. Turns out this works great.</p>
<p>Let’s start by creating a config for Linode’s pv-grub to read
and use.</p>
<pre>
sudo mkdir -p /boot/grub/
</pre>
<p>Now, since pv-grub is legacy grub, we can write out the following
config to chain-load <code>grub-xen</code> (which is just Grub 2.0, as far as I can
tell) to <code>/boot/grub/menu.lst</code>. And to think, I almost forgot all about
<code>menu.lst</code>. Almost.</p>
<pre>
default 1
timeout 3
title grub-xen shim
root (hd0)
kernel /boot/xen-shim
boot
</pre>
<p>Just like riding a bike! Now, let’s install and set up grub-xen to work for us.</p>
<pre>
sudo apt-get install grub-xen
sudo update-grub
</pre>
<p>And, let’s set the config for the GRUB image we’ll create in the next step
in the <code>/boot/load.cf</code> file:</p>
<pre>
configfile (xen/xvda)/boot/grub/grub.cfg
</pre>
<p>Now, lastly, let’s generate the <code>/boot/xen-shim</code> file that we need
to boot to:</p>
<pre>
grub-mkimage --prefix '(xen/xvda)/boot/grub' -c /boot/load.cf -O x86_64-xen /usr/lib/grub/x86_64-xen/*.mod > /boot/xen-shim
</pre>
<p>Next, change your boot configuration to use <code>pv-grub</code>, and give the machine
a kick. Should work great! If you run into issues, use the lish shell to
debug it, and let me know what else I should include in this post!</p>
<p>Hack on!</p> Hy at PyCon 2014 https://notes.pault.ag/hy-pycon-2014/Fri, 18 Apr 2014 20:13:00 -0500 https://notes.pault.ag/hy-pycon-2014/ <p>I gave a talk this year at <a href="https://us.pycon.org/2014/">PyCon 2014</a>, about one
of my favorite subjects: <a href="https://hylang.org/">Hy</a>. Many of my regular readers
will have no doubt explored Hy’s thriving
<a href="https://github.com/hylang">GitHub org</a>, played with
<a href="https://try-hy.appspot.com/">try-hy</a>, or even installed it locally by
<a href="https://pypi.python.org/pypi/hy">pip installing it</a>. I was lucky enough to
be able to attend PyCon on behalf of <a href="https://sunlightfoundation.com/">Sunlight</a>,
with a solid contingent of my colleagues. We put together a writeup on the
<a href="https://sunlightfoundation.com/blog/2014/04/18/sunlight-at-pycon-2014/">Sunlight blog</a>
if anyone is interested in our favorite talks.</p>
<div class="video" style="width: 500px; height: 340px; margin: 0 auto;">
<iframe width="500" height="315" src="//www.youtube.com/embed/AmMaN1AokTI" frameborder="0" allowfullscreen></iframe>
</div>
<p>Tons of really amazing questions, and such an amazingly warm reception from
so many of my peers throughout this year’s PyCon. Thank you so much to
everyone that attended the talk. As always, you should
<a href="https://github.com/hylang/hy">Fork Hy on GitHub</a>,
follow <a href="https://twitter.com/hylang">@hylang</a> on the twitters, and
send in any bugs you find!</p>
<p>Hopefully I’ll be able to put my talk up in blog-post form soon, but until then
feel free to look over the <a href="https://slides.pault.ag/hy.html">slides</a> or just
<a href="https://www.youtube.com/watch?v=AmMaN1AokTI">watch the talk</a>.</p>
<p>An extra shout-out to <a href="https://twitter.com/akaptur">@akaptur</a> for hacking on
Hy during the sprints, and giving the exception system
<a href="https://github.com/hylang/hy/pull/556">quite the workthrough</a>.
Thanks, Allison!</p> Musings about Debian and Python https://notes.pault.ag/debian-python/Sat, 21 Sep 2013 22:49:00 -0500 https://notes.pault.ag/debian-python/ <p>On a regular basis, I find myself the odd-man-out when it comes to talking
about how to work with Python on Debian systems. I’m going to write this and
post it so that I might be able to point people at my thoughts without having
to write the same email in response to each thread that pops up.</p>
<p>Turns out I don’t fit in with the Debian hardliners (which is to say, the
mindset that <code>pip</code> sucks and shouldn’t exist), nor do I fit in with the Python
hardliners (which is to say <code>apt</code> and <code>dpkg</code> are out of date, and neither have
a place on a Development machine).</p>
<p>I think our discourse on this topic has become <em>petty</em> and <em>stupid</em> in general.
Let’s all try to step back and drop a bit of the attitude.</p>
<h1 id="pip-doesnt-suck-and-neither-does-apt"><code>pip</code> doesn’t suck, and neither does <code>apt</code>.</h1>
<p>The truth is, both sides are wrong. As with any subject, the real
answer here is much more nuanced than either side presents it. I’m going to
try and present my opinion on this, in the way that both my Pythonista self
and my Debianite self see the issue. Hopefully I can keep this short, to
the point, and caked with logic.</p>
<h2 id="the-case-for-dpkg-the-debianite-in-me">The case for <code>dpkg</code> (the Debianite in me)</h2>
<p>In defense of <code>dpkg</code> and <code>apt</code>, imagine having to install <code>python-gnome2</code>
on all your systems every time you install one. It’d be hell on earth.
Imagine having a <strong>user</strong> try to do this. It’s insane to assume that
end-users will be using <code>pip</code> for this purpose.</p>
<p><code>pip</code> is fun and all, but it’s also installing 100% untrusted code to your
system (perhaps as root, if you’re using <code>pip</code> with <code>sudo</code> for some reason),
and hasn’t been reviewed for software freeness, which is something Debian
(and Debian users) take seriously. This isn’t even to mention the hell that
<code>pip</code> wreaks on <code>dpkg</code> controlled files / packages.</p>
<aside class="left">
Remember, Debian spends a lot of time and effort into ensuring software
is <a href="https://www.debian.org/social_contract#guidelines" >DFSG</a>
free, and safe.
</aside>
<p>Try to remember how much of your system (yes, right now) is running
because of Python or Python modules. Try to imagine how much of a pain in
the ass it’d be if you couldn’t boot into <code>GNOME</code> to use <code>nm-applet</code> to connect
to wifi to <code>pip</code> install something. I’m sure even the most extreme pip’er
understands the need for Operating System level package management.</p>
<p>Debian also has a bigger problem scope - we’re not maintaining a library
in Debian for kicks, we’re maintaining it so that <em>end user applications</em> may
use the library. When we update something like <code>Django</code>, we have to make sure
that we don’t break anything using it (although, to be honest, the fact that we
package webapps is an entire rant for later) before we get to update it to the
newest release.</p>
<p>Hell, with a few coffees, I could automate the process of releasing a <code>.deb</code>
with a new upstream release, 100% unattended. I won’t, however, since this is
an insane idea. Let’s go over a brief list of things I do before uploading a
new package:</p>
<ol>
<li>Review the <em>entire</em> codebase for simple mistakes.</li>
<li>Review the <em>entire</em> codebase for license issues.</li>
<li>Review the <em>entire</em> codebase for files without source, and track down
(and include source for) any sourceless files (such as <code>pickle</code>
files, etc).</li>
<li>Get to know the upstream, get to know open bugs, write something using
the lib, in case I need to debug later.</li>
<li>Install the package.</li>
<li>Test the package.</li>
<li>Work out any Debian package issues (this is easy).</li>
</ol>
<p>Now, a brief list of things I do before I update a package:</p>
<aside class="right">
Some non-Debian people may call this anal. I disagree, since this is
important to ensure we have <i>source</i> for all files. In addition,
it's trivial to take the next step and ensure things are <i>roughly</i>
safe.
</aside>
<ol>
<li>Review the changes between the last uploaded version (in diff format, if
it’s sane, otherwise get the VCS and review), ensure all the above are still
OK.</li>
<li>Review for Debian-local issues (such as how it will upgrade, using
<code>piuparts</code>, and <code>adequate</code>, etc).</li>
<li>Check to make sure it won’t break any reverse dependencies.</li>
<li>Review for bugfixes that I might need to bring back to the <code>stable</code> release.</li>
<li>Figure out if I should (or even can) backport the package, if API is
stable.</li>
<li>Review for bugs (upstream or in Debian) that I need to mention in the
debian/changelog.</li>
</ol>
<p>Clearly, this isn’t a quick-and-dirty task. It’s not a matter of getting a
package updated (technically), it’s a much more detailed process than that.
This is also why Debian is so highly regarded for its technical virtuosity,
and why the
<a href="https://training.linuxfoundation.org/why-our-linux-training/training-reviews/linux-foundation-training-prepares-the-international-space-station-for-linux-migration">ISS decided to deploy Debian in space</a>,
over commercial distros such as <code>Red Hat</code> or <code>Ubuntu</code>, and
community distros such as <code>Fedora</code> or <code>Arch</code>.</p>
<aside class="left">
Cheap shot, I know.
</aside>
<p>It’s also not Debian’s job to package the world in the archive. This is an
insane task, and it’s not Debian’s place to do it. We introduce libraries
as things need them, not because we wrote some new library that someone
may find slightly useful at some point in the future. Maybe.</p>
<p>Upstream developers and language communities (not only Python here) tend to
lose sight of why we’re doing this in the first place, which
is our users. This isn’t some sort of technical pissing contest to see who can
distribute the software in the best way. Debian-folk always keep end users
as our highest priority.</p>
<aside class="right">
I'm sorry to any
<a href="https://lists.debian.org/20100106100055.GV3438@radis.liafa.jussieu.fr" >kittens that may have been harmed by this statement</a>.
</aside>
<p>I quote the
<a href="https://www.debian.org/social_contract">Debian Social Contract</a>, when I say
that <em>Our priorities are our users and free software</em>. No one’s trying to
get <em>developers</em> to use <code>dpkg</code> to create software. In fact, as you’ll see
below, I actively <em>discourage</em> using system modules for development.</p>
<h2 id="the-case-for-pip-the-pythonista-in-me">The case for <code>pip</code> (the Pythonista in me)</h2>
<p>In defense of <code>pip</code>, the idea that Debian will keep the latest versions of
packages is insane. The idea that we can keep pace with upstream releases is
nuts, and the idea that every upstream release on <code>pypi</code> is ready to ship is
bananas. <a href="https://youtu.be/gZHjRQjbHrE?t=2m30s">b-a-n-a-n-a-s</a>.
As a developer, I don’t want to support every release, and I surely don’t want
other people depending on some random snapshot.</p>
<aside class="right">
In fact, I have a very hard time saying anything but <i>"try upgrading
first"</i> when I get a bug report on a side-project.
It's tough to remember some edge-case from 2 years ago if this code is
tightly coupled with another codebase.
</aside>
<p>Oftentimes, I’ll put stuff up on <code>pypi</code> as a preview, or to release often, and
solicit feedback without having to give out instructions on using a <code>git</code>
checkout (it’s also easier to have them try a version from <code>pypi</code> so I can
cross-reference the git tag to reproduce issues when they file them).</p>
<aside class="left">
Even Debian tools I write, like
<a href="https://pypi.python.org/pypi/schroot">python-schroot</a>
are released to <code>pypi</code> first, and I treat that as the
upstream location when packaging it in Debian.
</aside>
<p><code>pypi</code> is easy, ubiquitous and works regardless of the platform, which means
less of my development time is spent packaging stuff up for platforms I don’t
really care about (<code>Arch</code>, <code>Fedora</code>, <code>OSX</code>, <code>Windows</code>), even though I value
feedback from users on those systems. The effort it takes to release something
is limited to <code>python setup.py sdist upload</code>, and it’s in a place (and in a
shape) that anyone can use it without having 10 sets of platform-local
instructions.</p>
<p>Even ignoring all the above, when <em>I’m</em> writing a new app or bit of code,
I want to be sure I’m targeting the latest version of the code I depend on,
so that future changes to API won’t hit me as hard. By following
along with my dependencies’ development, I can be sure that my code breaks
early, and breaks in development, not production. Upstreams also tend to not
like bug reports against old branches, so ensuring I have the latest code from
<code>pypi</code> means I can properly file bugs.</p>
<p>Lastly, I prefer <code>virtualenv</code> based setups for development, since I’m usually
working on many things at once. This often means version mismatches in
libraries, which brings in API changes (another whole rant here as well).
I <em>don’t</em> want to keep installing and uninstalling packages to switch between
projects, and using a <code>chroot(8)</code> means a lot of overhead while being
disconnected from my development environment / filesystem, so I resort to
<code>virtualenv</code> to isolate my development environments.</p>
<h1 id="final-notes">Final notes</h1>
<aside class="right">
I love apt, I love pip, why can't you?
</aside>
<p>I don’t want to keep arguing about this. Just accept that the world’s a big
place and that there exist use-cases that both <code>apt</code> and <code>pip</code> need to exist
and work in the way they’re working now. At the very least, try and understand
there exist smart people on both sides, and no one is trying to screw anyone
over or keep their own little private club to themselves. Hopefully, going
forward, we can make sure that the integration between these two tools gets
<em>better</em>, not worse.</p>
<p>Help make this dream a reality. Contribute to a productive tone, not a
destructive one. In short:</p>
<ul>
<li>Use <code>pip</code> without <code>sudo</code> always. Don’t tell people to use <code>sudo</code>.</li>
<li>Use <code>apt</code> or <code>dpkg</code> when deploying system-wide.</li>
<li>Understand people are going to package, and they will be more concerned
about the software using your library than keeping your library up to date.</li>
<li>Understand Debian Developers and package maintainers have to do a lot of
work when updating or sponsoring an upload.</li>
<li>Understand upstream developers can’t be bothered to fix every issue
with every release (release early, release often), let alone with some snapshot
you introduced into unstable.</li>
<li>Use <code>pip</code> and <code>virtualenv</code> in development setups, so we can upgrade your
app when we upgrade the lib.</li>
</ul> Hy: The survival guide https://notes.pault.ag/hy-survival-guide/Fri, 02 Aug 2013 23:19:00 -0500 https://notes.pault.ag/hy-survival-guide/ <p>One of my new favorite languages is a peppy little
<a href="https://en.wikipedia.org/wiki/Lisp">lisp</a> called
<a href="https://hylang.org">hy</a>. I like it a lot since it’s a result of a hilarious
idea I had while talking with some coworkers over Mexican food. Since I’m
the most experienced <a href="https://github.com/hylang?tab=members">Hypster</a> on the
planet, I figured I should write a survival guide. This will go a lot easier
if you already know Lisp, but you can get away with quite a bit of Python.</p>
<h1 id="the-tao-of-hy">The Tao of Hy</h1>
<p>We don’t have many rules (yet), but we do have quite a bit of philosophy.
The collective Hyve Mind has spent quite a bit of time working out Hy’s
internals, and we do spend a bit of time looking at how the language “feels”.
The following is a brief list of some of the design decisions we’ve
picked out.</p>
<ol>
<li>Look like a lisp, <code>DTRT</code> with it (e.g. dashes turn to underscores,
earmuffs turn to all-caps.)</li>
<li>We’re still Python. Most of the internals translate 1:1 to Python
internals.</li>
<li>Use unicode <em>everywhere</em>.</li>
<li>Tests or it doesn’t exist.</li>
<li>Fix the bad decisions in Python 2 when we can (see <code>true_division</code>)</li>
<li>When in doubt, defer to Python.</li>
<li>If you’re still unsure, defer to Clojure</li>
<li>If you’re even more unsure, defer to Common Lisp</li>
<li>Keep in mind we’re <em>not</em> Clojure. We’re <em>not</em> Common Lisp. We’re Homoiconic
Python, with extra bits that make sense.</li>
</ol>
<p>Naturally, this doesn’t cover everything, but if you can drop into that mindset,
things start to make quite a bit of sense.</p>
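<p>To make the first rule concrete, here’s a toy sketch of those two name transformations, in the Python we compile down to. To be clear: this is an illustration of the stated rules only, not Hy’s actual mangling code.</p>

```python
# Toy illustration of the "DTRT" naming rules above: dashes become
# underscores, and earmuffs (*name*) become ALL-CAPS. This is NOT the
# real Hy mangler, just the two rules from the list, spelled out.
def mangle(name):
    if len(name) > 2 and name.startswith("*") and name.endswith("*"):
        name = name[1:-1].upper()  # earmuffs -> ALL-CAPS
    return name.replace("-", "_")  # dashes -> underscores

print(mangle("my-fn"))     # my_fn
print(mangle("*my-var*"))  # MY_VAR
```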
<h1 id="the-style-of-hy">The Style of Hy</h1>
<p>Although I am perhaps the least qualified person to do so (I still don’t write
idiomatic Lisp all the time), I’m going to set up a few ground-rules when it
comes to idiomatic Hy code. We borrow quite a bit of syntax from Common Lisp
and Clojure, so again, feel free to defer to either if you’re not working
on Hy internals. I prefer the
<a href="https://github.com/bbatsov/clojure-style-guide">Clojure Style Guidelines</a>
myself. As such, these are what we will defer to in the case that the Hy
style is undefined.</p>
<h2 id="clojure-isms">Clojure-isms</h2>
<p>Hy has quite a few Clojure-isms that I rather prefer, such as the threading
macro, and dot-notation (for accessing methods on an Object), which I would
rather see used throughout the hylands.</p>
<pre><code>:::clojure
;; good:
(with [fd (open "/etc/passwd")]
(print (.readlines fd)))
;; bad:
(with [fd (open "/etc/passwd")]
(print (fd.readlines)))
</code></pre>
<p>Some <a href="https://dustycloud.org/">other Hy devs</a> very much disagree; there’s
nothing syntactically invalid about the latter, and it will continue to be
supported (in fact, it makes some things easier!), but it will not be
used in Hy internal code.</p>
<p>We also very much encourage use of the <code>threading macro</code> throughout code
where it makes sense.</p>
<pre><code>:::clojure
;; good:
(import [sh [cat grep]])
(-> (cat "/usr/share/dict/words") (grep "-E" "tag$"))
;; bad:
(import [sh [cat grep]])
(grep (cat "/usr/share/dict/words") "-E" "tag$")
</code></pre>
<p>However, only use it when it aids clarity; like all things, there are
cases where it makes a mess out of something that ought not be futzed with.</p>
<h2 id="python-isms">Python-isms</h2>
<p>In addition to stealing quite a bit of syntax from Clojure, I’m going to
take a few Python rules from PEP8 that apply to Hy as well. These are taken
because PEP8 is a really great set of rules, and Hy code ends up pretty,
well, Pythonic. The following are a collection of Pythonic rules that
explicitly apply to Hy code.</p>
<p>Trailing whitespace is a huge one. Never ever ever shall it be OK to have
trailing whitespace in internal Hy code. For it sucks.</p>
<p>As with Python, module-level definitions shall always be separated by two
blank lines.</p>
<p>All public functions must always contain docstrings.</p>
<p>Inline comments shall be <em>two</em> spaces from the end of the code. They
must always have a space between the comment character and the start of the
comment.</p>
<h2 id="hy-isms">Hy-isms</h2>
<p>Indentation shall be two spaces, except where matching the indentation
of the previous line.</p>
<pre>
;; good (and preferred):
(defn fib [n]
(if (<= n 2)
n
(+ (fib (- n 1)) (fib (- n 2)))))
;; still OK:
(defn fib [n]
(if (<= n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
;; still OK:
(defn fib [n]
(if (<= n 2)
n
(+ (fib (- n 1)) (fib (- n 2)))))
;; Stupid as hell
(defn fib [n]
(if (<= n 2)
n
(+ (fib (- n 1)) (fib (- n 2)))))
</pre>
<p>Parens must <em>never</em> be alone, sad, all by their lonesome on their own line.</p>
<pre>
;; good (and preferred):
(defn fib [n]
(if (<= n 2)
n
(+ (fib (- n 1)) (fib (- n 2)))))
;; Stupid as hell
(defn fib [n]
(if (<= n 2)
n
(+ (fib (- n 1)) (fib (- n 2)))
)
) ; GAH, BURN IT WITH FIRE
</pre>
<p>Don’t use S-Expression syntax where vector syntax is really required. For
instance, the fact that:</p>
<pre>
;; bad (and evil)
(defn foo (x) (print x))
(foo 1)
</pre>
<p>works is just because the compiler isn’t overly strict. In reality, the
correct syntax in places such as this is:</p>
<pre>
;; good (and preferred):
(defn foo [x] (print x))
(foo 1)
</pre>
<h1 id="notice">Notice</h1>
<p>This guide is, above all, a <em>guide</em>. This is also only truly binding
for working on Hy code internally. This post is also super subject to change
in the future, whenever I can be bothered to ensure that we have more of the
de facto rules written down.</p> Automatically lint your packages with debuild.me https://notes.pault.ag/debuild-me/Sun, 09 Jun 2013 17:43:00 -0500 https://notes.pault.ag/debuild-me/ <p>Over my time working with Debian packages, I’ve always been concerned that
I have been missing catchable mistakes by not running all the static checking
tools I could run. As a result, I’ve been interested in writing some code that
automates this process, a place where I can push a package and come back a few
hours later to check on the results. This is great, since it provides a slightly
less scary interface for new packagers, and helps them avoid feeling they’ve
just been “told off” by a Developer.</p>
<p>I’ve spent the time to actually write this code, and I’ve called it
<a href="https://debuild.me">debuild.me</a>. The code itself is in its fourth
iteration, and is built up from a few core components. The client / server code
(<a href="https://github.com/paultag/lucy">lucy</a> and
<a href="https://github.com/paultag/ethel">ethel</a>) are quite interconnected, but
<a href="https://github.com/fedora-static-analysis/firehose">firehose</a> works great
on its own, and is a single, unified (and sane!) spec that is easy
to hack with (or even on!). Hopefully, this means that our wrappers will be
usable outside of debuild.me, which is a win for everyone.</p>
<h1 id="backend-design">Backend Design</h1>
<p>The backend (<a href="https://github.com/paultag/lucy">lucy</a>) was the first part
I wanted to design. I made the decision (very early on) that everything was
going to be 100% Python 3.3+. This lets me use some of the (frankly, sweet)
tools in the stdlib. Since I’ve written this type of thing before
(I’ve tried to write this tool <a href="https://github.com/paultag/monomoy-old">many</a>,
<a href="https://github.com/paultag/monomoy">many</a>,
<a href="https://github.com/paultag/chatham-old">many</a>,
<a href="https://github.com/paultag/chatham">many</a> times before), I had a rough
sense of how I wanted to design the backend. Past iterations had suffered from
an overly complex server half, so I decided to go ultra minimal with the
design of debuild.me.</p>
<aside class='left'>
You can find the code for the server (lucy) on
<a href="https://github.com/paultag/lucy">my GitHub</a>
</aside>
<p>The backend watches a directory (using a simple <code>inotify</code> script) and processes
<code>.changes</code> files as they come in. If the package is a source package, a set of
jobs are triggered (such as <code>lintian</code>, <code>build</code> and <code>desktop-file-validate</code>),
as well as a different set for binary packages (such as <code>lintian</code>, <code>piuparts</code>
and <code>adequate</code>). Only people may upload source packages (without any debs) and
only builders can upload binary packages (without source).</p>
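<p>That routing boils down to something like the following sketch (an illustration of the rules above, not lucy’s actual code):</p>

```python
# Hedged sketch of lucy's .changes routing as described above (illustration
# only): source uploads get linting + build jobs, binary uploads get the
# install-time checkers.
SOURCE_JOBS = ["lintian", "build", "desktop-file-validate"]
BINARY_JOBS = ["lintian", "piuparts", "adequate"]


def jobs_for_upload(file_names):
    """Pick a job set from the file list in a .changes upload."""
    # A source-only upload carries a .dsc and no .debs; a builder
    # upload carries .debs and no source.
    has_source = any(name.endswith(".dsc") for name in file_names)
    return SOURCE_JOBS if has_source else BINARY_JOBS


print(jobs_for_upload(["foo_1.0-1.dsc", "foo_1.0.orig.tar.gz"]))
# ['lintian', 'build', 'desktop-file-validate']
```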
<p>The client and server talk using
<a href="https://docs.python.org/3/library/xmlrpc.server.html">XML-RPC</a> with BASIC HTTP
auth. I’m going to (eventually) SSL secure the transport layer, but for now,
this will work as a proof of concept.</p>
<p>Since I tend to like to keep my codebase simple and straightforward, I’ve used
<a href="https://www.mongodb.org/">MongoDB</a> as Lucy’s DB. This lets me move between
documents in Mongo and Python objects without any trouble. In addition, I
evaluated some of the queue code out there (ZMQ, etc), and they all seemed
like overkill for my problem, and had a hard time keeping track of jobs that
(must never!) get lost. As a result, I wrote my own (very simple) job queue
in Mongo, which has no sense of scheduling (at all), but can do its job (and
do it well).</p>
<p>Jobs describe what’s to be built with a link to the <code>package</code> document
that the job relates to, and its <code>arch</code> and <code>suite</code> (don’t worry about the
rest just yet). Jobs get assigned via natural sort on their <code>UUID</code>-based <code>_id</code>,
and assigned to the first builder that can process its <code>arch</code> / <code>suite</code>.
Source packages are considered <code>arch:all</code> / <code>suite:unstable</code> (so they always
get the most up-to-date linters on any arch that comes along).</p>
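An in-memory sketch of that assignment rule (the real thing is a Mongo query; the field names here are illustrative, not lucy's actual schema):

```python
import uuid

def assign_next_job(jobs, builder):
    # Walk jobs by natural sort on their UUID-based _id, and hand the
    # builder the first unassigned one matching its arches/suites.
    for job in sorted(jobs, key=lambda j: j["_id"]):
        if (job["assigned_to"] is None
                and job["arch"] in builder["arches"]
                and job["suite"] in builder["suites"]):
            job["assigned_to"] = builder["name"]
            return job
    return None

jobs = [
    {"_id": str(uuid.uuid4()), "arch": "amd64", "suite": "unstable",
     "assigned_to": None},
    {"_id": str(uuid.uuid4()), "arch": "armhf", "suite": "unstable",
     "assigned_to": None},
]
builder = {"name": "ethel-1", "arches": ["amd64", "all"],
           "suites": ["unstable"]}
job = assign_next_job(jobs, builder)
```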
<p>Lucy also allows for uploads to be given an <code>X-Lucy-Group</code> tag to manage which
set of packages they’re a part of. This comes in handy for doing partial
archive rebuilds, or eventually using it to manage what jobs should be run
on which uploads. This will allow me to run much more time-consuming tools
for packages I want to review, versus rebuilding just to ensure packages don’t
FTBFS and pass <code>adequate</code>.</p>
<h1 id="client-design">Client Design</h1>
<p>The buildd client (<a href="https://github.com/paultag/ethel">ethel</a>) talks with <code>lucy</code>
via <code>XML-RPC</code> to get assigned new jobs, release old jobs, close finished jobs,
and upload package report data. When the <code>etheld</code> requests a new job, it also
passes along what <code>suites</code> it knows of, which <code>arches</code> it can build, as well
as what <code>types</code> it can run (stuff like <code>lintian</code>, <code>build</code> or <code>cppcheck</code>). Lucy
then assigns the builder to that job (so that we don’t allocate the same job
twice), and what time it was assigned at.</p>
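Python’s stdlib has both halves of this conversation. Here is a toy version of the exchange (the method name, job shape, and lack of auth are all made up for illustration, not lucy’s actual API):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def get_job(suites, arches, types):
    # Toy lucy endpoint: the builder says what it can do, and the
    # server hands back a matching job (or an empty struct).
    job = {"package": "hello", "suite": "unstable",
           "arch": "amd64", "type": "build"}
    if (job["suite"] in suites and job["arch"] in arches
            and job["type"] in types):
        return job
    return {}

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_job)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The builder side: advertise capabilities, receive a job.
proxy = ServerProxy("http://127.0.0.1:%d/" % port)
job = proxy.get_job(["unstable"], ["amd64"], ["build", "lintian"])
server.shutdown()
```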
<aside class='right'>
You can find the code for the client (ethel) on
<a href="https://github.com/paultag/ethel">my GitHub</a>
</aside>
<p>Ethel then takes the result of the job (in the form of a <code>firehose.model</code> tree)
and transmits it over the line back to the Lucy server as a <code>report</code> (which also
contains information on if the build failed or not), at which point
lucy hands back a location (on the lucy host) that the daemon can write the log
to.</p>
<p>If the job was a binary build, the <code>etheld</code> process will <code>dput</code> the package to
the server, with a special <code>X-Lucy-Job</code> tag to signal which job that build
relates to, so that future lint runs can fetch the <code>deb</code> files that the build
produced.</p>
<h1 id="tooling">Tooling</h1>
<p>Ethel runs a set of static checkers on the source code, which are basically
fancy wrappers around the tools we all know and love (like
<a href="https://lintian.debian.org/">lintian</a>,
<a href="https://freedesktop.org/wiki/Software/desktop-file-utils/">desktop-file-validate</a>,
or <a href="https://piuparts.debian.org/">piuparts</a>) which output Firehose in place of
home-grown stdout. This allows us to programmatically deal with the output
of these tools in a normal and consistent way.</p>
<aside class='left'>
You can read more about Firehose over in the Firehose
<a href="https://github.com/fedora-static-analysis/firehose/blob/master/README.rst">README.rst</a>
</aside>
<p>Some of the more complex runners are made of 3 parts - a <code>runner</code>, <code>wrapper</code>
and <code>command</code>. The server invokes the <code>command</code> routine, which invokes the
<code>runner</code> (the command just provides a unified interface to all the runners),
whose output gets parsed by the <code>wrapper</code> to turn it into a Firehose model
tree.</p>
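A minimal shape of that three-part split, with <code>echo</code> standing in for the wrapped tool and plain dicts standing in for a <code>firehose.model</code> tree (all names here are illustrative):

```python
import subprocess

def runner(path):
    # Invoke the underlying tool; `echo` stands in for lintian etc.,
    # emitting one "severity: path: tag" line.
    return subprocess.run(["echo", "W: %s: some-tag" % path],
                          capture_output=True, text=True).stdout

def wrapper(output):
    # Parse the tool's stdout into structured issues -- the real
    # wrappers build a firehose.model tree instead of dicts.
    issues = []
    for line in output.splitlines():
        severity, path, tag = line.split(": ", 2)
        issues.append({"severity": severity, "path": path, "tag": tag})
    return issues

def command(path):
    # The unified entry point: run the tool, parse its output.
    return wrapper(runner(path))
```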
<p>The goal here is that tons of very quick-running tools get run over a
distributed network, and machine-readable reports get filed in a central
location to aid in reviewing packages.</p>
<h1 id="ricky">Ricky</h1>
<p>In addition to the actual code to run builds, I’ve worked on a few tools to
aid with using debuild.me for my DD-related life. I have some uncommon
use-cases that are nice to support. One such use-case is the ability to rebuild
packages from the archive (unmodified) to check that they rebuild OK against
the target. This is handy for things like <code>arch:all</code> packages that get
uploaded (since they never get rebuilt on the buildd machines, and FTBFSs are
sadly common) or packages that have had a <code>Build-Dependency</code> change on them.</p>
<p>Ricky is able to create a <code>.dsc</code> url to your friendly local mirror, and fetch
that exact version of the package. Ricky can then also use the <code>.dsc</code> (in a
monumental hack) to forge a <code>package_version_source.changes</code> file, and sign
it with an autobuild key and upload it to the debuild.me instance. Since it
can also modify the <code>.changes</code>’s target distribution, you can also use this to
test if a package will build on <code>stable</code> or <code>testing</code>, unmodified.</p>
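Constructing that <code>.dsc</code> URL follows the standard Debian pool layout. A sketch (not ricky’s actual code) that handles the <code>lib*</code> prefix directories and strips any epoch from the filename:

```python
def dsc_url(mirror, component, source, version):
    # Pool layout: lib* sources get a four-character prefix
    # directory; everything else uses the first letter. Epochs
    # ("1:") appear in the version but never in the filename.
    prefix = source[:4] if source.startswith("lib") else source[0]
    upstream = version.split(":", 1)[-1]
    return "%s/pool/%s/%s/%s/%s_%s.dsc" % (
        mirror.rstrip("/"), component, prefix, source, source, upstream)
```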
<h1 id="fred">Fred</h1>
<p>Fred is a wrapper around Ricky that helps with fetching packages that may not
exist yet. Fred contains an email scraper that reads lists such as
<a href="https://lists.debian.org/debian-devel-changes">debian-devel-changes</a>,
adds an entry to fetch each upload when it becomes available on the local
mirror, and passes it to <code>ricky</code>, allowing debuild.me to rebuild new
packages that match a set of criteria.</p>
<p>I’m currently playing around with the idea of rebuilding all incoming
Python packages to ensure they don’t FTBFS in a clean chroot.</p>
<h1 id="loofah">Loofah</h1>
<p>Loofah is another wrapper around Ricky, but for manual use. Loofah
is able to sync down the apt <code>Sources</code> list, and place it in Mongo for fast
queries. This then allows me to manually run rebuilds on any Source package
that fits a set of criteria (written in the form of a Mongo query), which get
pulled and uploaded by <code>Ricky</code>.</p>
<p>An example script to rebuild any packages that <code>Build-Depend</code> on
<code>python3-all-dev</code> in Debian <code>unstable</code> / <code>main</code> would look like:</p>
<aside class='right'>
You can find more queries in the Loofah
<a href = 'https://github.com/paultag/loofah/tree/master/eg' >examples</a>
</aside>
<pre>
[
{ "version": "unstable", "suite": "main" },
{ "Build-Depends": "python3-all-dev" }
]
</pre>
<p>Or, a script to rebuild any package that depends on CDBS:</p>
<pre>
[
{},
{"$or": [{"Build-Depends": "cdbs"},
{"Build-Depends-Indep": "cdbs"}]}
]
</pre>
<p>You can use anything that exists in the <code>Sources.gz</code> file to query
off of (including <code>Maintainer</code>!)</p>
<h1 id="future-work">Future Work</h1>
<p>The future work on debuild.me will be centered around making it easier for
buildd nodes to be added to the network, with more and more automation in that
process (likely in the form of debs). I also want to add better control over
the jobs, so that packages I upload only go to my personal servers.</p>
<p>I’d also very much like to get better EC2 / Virtualization support integrated
into the network, so that the buildd count grows with the queue size. This
is a slightly hard problem that I’m keen to fix.</p>
<p>I’m also considering moving the log parsing code <em>out</em> of the workers, so that
the parsing code can be fixed without upgrading all the workers. This would also
drop the <code>Firehose</code> dep on the client code, which would be nice.</p>
<p>Migration from a debuild.me build into a local <code>reprepro</code> repo is something
that would be fairly easy to do as well, likely to be done remotely via
the <code>XML-RPC</code> interface, which calls a couple of <code>reprepro</code> commands (such as
<code>includedsc</code> and <code>includedeb</code>) and publishes it to the user’s repo. This is
a nice use of the debs that get built, and could also allow debuild.me to be
used like a PPA system, while still letting the user <em>not</em> migrate packages
that may contain <code>piuparts</code> issues.</p> A primer on apt's mirror:// protocol https://notes.pault.ag/apt-mirror/ Sat, 23 Feb 2013 21:04:00 -0500 <p>It’s sometimes helpful to keep your machines using a list of apt archives
rather than a single mirror, because redundancy is good. Rather than
using (the great) services like <code>http.debian.net</code> or <code>ftp.us.debian.org</code>,
you can set your own mirror lists using apt’s <code>mirror://</code> protocol.</p>
<aside class="right">
While initially hacking this through, Micah ended up
filing a bug on <code>mirror://</code>, more information in
<a href="https://bugs.debian.org/699310">the bts</a>. I've since been
able to get it to work for me, but beware!
</aside>
<p>All of this is ultra unstable, so be a bit careful when using this. I’ve been
using <code>mirror://</code> for a few months now, and it seems fine (even have my servers
using it), but it was a bit of a pain to set up. It gets slightly confused if
you point it at something bad, and it’s a mild pain to debug. Hopefully
more people will see the value in <code>mirror://</code>, and contribute code to its
development.</p>
<h1 id="why-bother">Why bother?</h1>
<p>If you’re the sort to keep an archive mirror on the LAN, it’s helpful to
have your machines default to that local mirror, and fall back to your
nearest friendly public mirror otherwise. In addition,
this lets you hand-define where apt searches for mirrors, which is great, since
you can control the subset of servers you ping a bit more closely.</p>
<h1 id="practical-bits--quickstart">Practical Bits / quickstart</h1>
<p>The following block covers the quick and dirty details on how to set up
<code>mirror://</code> for use on your machine (today!). This is very basic, and
details are very sparse, but hopefully there’s enough here to help folks
use this on their local system. Basically, you’ve got three core things to do:</p>
<ol>
<li>Pick your mirrors (this one’s a bit of a duh)</li>
<li>Put them in a public place you can always get to, regardless of
where you are in cyberspace (I use
<a href="https://static.pault.ag/debian/mirrors.txt">static.pault.ag</a>) - remember,
this is the one thing all your machines need to always get to, no matter
where they are.</li>
<li>Configure your <code>sources.list</code> to use the mirror.txt file by pointing
to the text file with the <code>mirror://</code> protocol.</li>
</ol>
<p>Turns out <code>mirror://</code>’s protocol handler will segfault if you give it
something bad, so don’t be afraid if you see <code>apt-get update</code> segfault - it
just means you’ve likely not pointed it at a valid text file. The file itself
should be a simple list of mirrors to try, one per line, in
order of priority. Mine looks a bit like:
<pre>
https://127.0.0.1:3142/debian.lcs.mit.edu/debian/
https://debian.lcs.mit.edu/debian/
# https://http.debian.net/debian/
</pre>
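Assuming the format is exactly what that example shows (one URL per line, <code>#</code> comments and blank lines skipped, order is priority), parsing it is trivial; a small checker like this is handy for sanity-checking the file before pointing apt at it. This is my guess at the accepted format, not apt's actual parser:

```python
def parse_mirrors(text):
    # One mirror URL per line, in priority order; blank lines and
    # '#' comment lines are skipped.
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors
```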
<p>Finally, your <code>sources.list</code> entry should look a bit like:</p>
<pre>
deb mirror://static.pault.ag/debian/mirrors.txt unstable main
deb mirror://static.pault.ag/debian/mirrors.txt experimental main
deb-src mirror://static.pault.ag/debian/mirrors.txt unstable main
deb-src mirror://static.pault.ag/debian/mirrors.txt experimental main
</pre>
<h1 id="problems">Problems</h1>
<p>With the good comes the bad. Not everything fully supports this, and most
tools that parse <code>sources.list</code> break in a really silly way.</p>
<h2 id="command-not-found">command-not-found</h2>
<p><code>update-command-not-found</code> will blow up like:</p>
<pre>
W: Don't know how to handle mirror
W: Don't know how to handle mirror
W: Don't know how to handle mirror
W: Don't know how to handle mirror
W: Don't know how to handle mirror
W: Don't know how to handle mirror
</pre> Using env(1) in the shebang https://notes.pault.ag/env-in-shebang/ Tue, 15 Jan 2013 20:02:00 -0500 <p>Some of you out there may have tried to pass flags to a script that was being
invoked via <code>/usr/bin/env</code> in the shebang (<code>#!</code>), such as <code>python</code>. You might
recall an error such as:</p>
<pre>
/usr/bin/env: python -d: No such file or directory
</pre>
<p>This error is super annoying, so I went about trying to figure out how
I can pass arguments to <code>python</code> (or even things like <code>ipython</code> or <code>bpython</code>).</p>
<p>The idea is we can abuse the concept of a
<a href="https://en.wikipedia.org/wiki/Polyglot_(computing)">polygot</a> to shim in some
things we care about.</p>
<h1 id="implementation">Implementation</h1>
<p>Let’s take a look at a quick script I hacked up to use bpython with a pre-made
script that drops into interactive work.</p>
<pre>
#!/bin/sh
"""":
exec /usr/bin/env bpython -i $0 $@
"""
import hy
print "Hython is now importable!"
</pre>
<p>Let’s step through this slowly. First, the bits the shell (<code>/bin/sh</code>) sees:</p>
<pre>
#!/bin/sh
"""":
exec /usr/bin/env bpython -i $0 $@
</pre>
<p>This causes <code>bpython</code> to reload the file, which looks like the following
to Python:</p>
<pre>
#!/bin/sh
"""":
exec /usr/bin/env bpython -i $0 $@
"""
import hy
print "Hython is now importable!"
</pre>
<p>Where Python can now ignore the docstring. Magic!</p>