Archive: Linux Server
August 26, 2008
Dealing with large numbers of files in Unix
Most of the time, you can move a bunch of files from one folder to another by running a simple mv command like "mv sourcedir/* destdir/". The problem is that when that asterisk gets expanded, each file in the directory is added as a command line parameter to the mv command. If sourcedir contains a lot of files, this can blow past the kernel's limit on argument length, resulting in a mysterious "Argument list too long" error.
I ran into this problem recently while trying to manage a directory that had over a million files in it. It's not every day you run across a directory that contains a metric crap-ton of files, but when the problem arises, there's an easy way to deal with it. The trick is to use the handy xargs program, which is designed to take a big list as stdin and separate it as arguments to another command:
find sourcedir -type f -print | xargs -l1 -i mv {} destdir/
The -l1 tells xargs to pass only one argument at a time to mv, and the -i parameter tells it to substitute {} with that argument, so mv gets executed once for each file in the directory. Ideally, you would optimize this and specify something like -l50, sending mv 50 files at a time. This is how I remember xargs working on other Unix systems, but the GNU xargs on my Linux box forces the number of arguments to 1 whenever -i is used. Either way, it gets the job done.
Without the -i, the -l parameter will work in Linux, but you can no longer use the {} substitution and all parameters are placed as the final arguments in the command. This is useless when you want to add a final parameter such as the destination directory for the mv command. On the other hand, it's helpful for commands that end with your file parameters, such as when you are batch removing files with rm.
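For that rm case, a slightly safer batching variant looks like this (a sketch assuming GNU find and xargs, which support null-delimited filenames; adjust the batch size to taste):

find sourcedir -type f -print0 | xargs -0 -n 50 rm

The -print0 and -0 flags delimit filenames with a null byte, so names containing spaces or newlines don't get split, and -n 50 hands rm up to 50 files per invocation.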
Oddly enough, in OS X the parameters for xargs are a bit wonky and capitalized. The good news is that you can invoke the parameter substitution with multiple arguments at a time. To move a bunch of files in OS X, 50 files at a time, try the following:
find sourcedir -type f -print | xargs -L50 -I{} mv {} destdir/
That's about all there is to it. This is just a basic example, but once you get used to using xargs and find together, it's pretty easy to tweak the find parameters and move files around based on their date, permissions or file extension.
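As a hedged example of that kind of tweaking (the extension and age filters here are just illustrative), this moves only the .log files that haven't been touched in 30 days:

find sourcedir -type f -name '*.log' -mtime +30 -print0 | xargs -0 -I{} mv {} destdir/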
Posted by Jason Striegel | Aug 26, 2008 07:22 PM | Linux, Linux Server, Mac
August 6, 2008
Memcached and high performance MySQL
Memcached is a distributed object caching system that was originally developed to improve the performance of LiveJournal and has subsequently been used as a scaling strategy for a number of high-load sites. It serves as a large, extremely fast hash table that can be spread across many servers and accessed simultaneously from multiple processes. It's designed to be used for almost any back-end caching need, and for high performance web applications, it's a great complement to a database like MySQL.
In a typical environment, a web developer might employ a combination of process-level caching and the built-in MySQL query caching to eke out that extra bit of performance from an application. The problem is that in-process caching is limited to the web process running on a single server. In a load-balanced configuration, each server maintains its own cache, limiting the efficiency and available size of the cache. Similarly, MySQL's query cache is limited to the server that the MySQL process is running on, and it can only cache row results. With memcached, you can set up a number of cache servers which can store any type of serialized object, and this data can be shared by all of the load-balanced web servers. Cool, no?
To set up a memcached server, you simply download the daemon and run it with a few parameters. From the memcached web site:
First, you start up the memcached daemon on as many spare machines as you have. The daemon has no configuration file, just a few command line options, only 3 or 4 of which you'll likely use:
# ./memcached -d -m 2048 -l 10.0.0.40 -p 11211
This starts memcached up as a daemon, using 2GB of memory, and listening on IP 10.0.0.40, port 11211. Because a 32-bit process can only address 4GB of virtual memory (usually significantly less, depending on your operating system), if you have a 32-bit server with 4-64GB of memory using PAE you can just run multiple processes on the machine, each using 2 or 3GB of memory.
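As a hedged sketch of that multiple-process trick (the IP, ports, and memory sizes are just placeholders borrowed from the example above), you could launch one 2GB instance per port on a single box:

# start three independent memcached daemons on one machine
for port in 11211 11212 11213; do
    ./memcached -d -m 2048 -l 10.0.0.40 -p $port
done

Each host:port pair then gets added to the client's server list, just as if they were separate machines.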
It's about as simple as it gets. There's no real configuration. No authentication. It's just a gigantor hash table. Obviously, you'd set this up on a private, non-addressable network. From there, the work of querying and updating the cache is completely up to the application designer. You are afforded the basic functions of set, get, and delete. Here's a simple example in PHP:
$memcache = new Memcache;
$memcache->addServer('10.0.0.40', 11211);
$memcache->addServer('10.0.0.41', 11211);

$value = "Data to cache";
$memcache->set('thekey', $value, 60);
echo "Caching for 60 seconds: $value <br>\n";

$retrieved = $memcache->get('thekey');
echo "Retrieved: $retrieved <br>\n";
The PHP library takes care of the dirty work of serializing any value you pass to the cache, so you can send and retrieve arrays or even complete data objects.
In your application's data layer, instead of immediately hitting the database, you can now query memcached first. If the item is found, there's no need to hit the database and assemble the data object. If the key is not found, you select the relevant data from the database and store the derived object in the cache. Similarly, you update the cache whenever your data object is altered and updated in the database. Assuming your API is structured well, only a few edits need to be made to dramatically alter the scalability and performance of your application.
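To make that read-through flow concrete, here's a rough cache-aside sketch using memcached's plain text protocol over netcat (in a real application this logic lives in your data layer, as in the PHP example above; fetch_user_from_db is a hypothetical placeholder for your actual MySQL query):

key="user:42"
# try the cache first; on a hit, the second response line holds the data
cached=$(printf 'get %s\r\nquit\r\n' "$key" | nc 10.0.0.40 11211 | sed -n '2p' | tr -d '\r')
if [ -z "$cached" ]; then
    value=$(fetch_user_from_db 42)    # cache miss: query MySQL instead
    # store the result for 60 seconds so the next request skips the database
    printf 'set %s 0 60 %s\r\n%s\r\nquit\r\n' "$key" "${#value}" "$value" | nc 10.0.0.40 11211
fi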
I've linked to a few resources below where you can find more information on using memcached in your application. In addition to the documentation on the memcached web site, Todd Hoff has compiled a list of articles on memcached and summarized several memcached performance techniques. It's a pretty versatile tool. For those of you who've used memcached, give us a holler in the comments and share your tips and tricks.
Memcached
Strategies for Using Memcached and MySQL Better Together
Memcached and MySQL tutorial (PDF)
Posted by Jason Striegel | Aug 6, 2008 10:37 PM | Data, Linux, Linux Server, MySQL, Software Engineering
July 13, 2008
Find and Grep 101
Find and grep are perhaps the most used command line tools for the Linux user or administrator. Terse but powerful, these two commands will let you search through files and their contents by almost any imaginable attribute or filter criteria: file name, date modified, occurrence of some specific word in a file, etc. Combined with a couple of other standard unix utilities, you can automate modifications across all of the files that match your search.
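As a quick taste of combining the two (a sketch; the path and search string are just placeholders), this lists every .conf file under /etc modified in the last week that mentions ListenAddress:

find /etc -name '*.conf' -mtime -7 -print0 | xargs -0 grep -l 'ListenAddress'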
Here are two blog posts by Eric Wendelin which nicely illustrate the basics of these two commands:
Find is a Beautiful Tool
Grep is a Beautiful Tool
There are a number of other great unix utilities for file search, but knowing how to use find and grep is fundamental, as these two utilities can be found on the most basic build of every unix-like machine you come across.
Got a favorite command line hack that uses find or grep? Drop it on us in the comments.
Posted by Jason Striegel | Jul 13, 2008 09:19 PM | Linux, Linux Server, Ubuntu
June 4, 2008
Use video RAM as swap in Linux
If you are into the headless or console experience, there are a couple of ways to put your machine's graphics card to good use. Most new boxes come with a GPU that has a substantial amount of RAM that is normally used for direct rendering. Using the Memory Technology Device (MTD) support in the Linux kernel, you can actually map the video card RAM to a block device and format it for swap use or as a ramdisk.
The Gentoo wiki has detailed instructions for doing this. The only tricky part is determining the video memory address, but after that it's a simple modprobe to load the MTD driver and you can run mkswap/swapon on the device just as if you were creating a normal swap disk. Considering many machines have 512MB of video RAM and it's waaaaay faster than disk, this could give you a pretty huge performance boost.
You can still use your graphics card in X, but you'll need to reserve a small chunk of that RAM for normal graphics use, use the VESA driver, and inform the driver that it should only use that teensy portion of memory. "VideoRam 4096" in the XF86Config, for instance, will let you use your card in X and only eat the first 4MB of RAM. Everything after that 4MB is fair game for swap. Michal Schulz wrote a bit about calculating the memory address offsets to make this all work. It's the second link below, for those of you who aren't hardcore enough to deal with only the command line.
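Condensed into a hedged sketch, the recipe looks roughly like this (the start address and size are placeholders; pull the real base address for your card from lspci -v, and offset past any chunk you've reserved for X, as described in the links below):

modprobe slram map=VRAM,0xE0000000,+0x8000000   # map 128MB of video RAM as an MTD device
modprobe mtdblock                               # expose it as /dev/mtdblock0
mkswap /dev/mtdblock0
swapon /dev/mtdblock0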
Use Memory On Video Card As Swap
Configuring X11 With A VRAM Storage Device
Posted by Jason Striegel | Jun 4, 2008 09:10 PM | Linux, Linux Server
May 23, 2008
Helmer render cluster: 186 Gflops in an IKEA cabinet
I usually get all excited about tiny, noiseless, low-power PC hardware, but I have to admit that this 24 core, 186 Gflop render cluster built into an IKEA Helmer cabinet is pretty inspiring. Most cool is that when it's not overburdened and jumping to swap, it's still a reasonably efficient setup for its performance specs:
The most amazing thing is that this machine cost about as much as a better standard PC, but has 24 cores that each run at 2.4 GHz, a total of 48GB of RAM, and needs just 400W of power!! This means that it hardly gets warm, and makes less noise than my desktop PC. Render jobs that took all night now get done in 10-12 min.
Janne opted for modifying the Helmer cabinet instead of using standard PC cases because the 6 cases would have cost about as much as the motherboards and CPUs. Most of the modification involved cutting holes for airflow, power supplies, and cabling, but it looks like the Helmer's drawer dimensions accommodate the ATX motherboards almost perfectly.
I'm not all that familiar with the software behind 3D rendering (anyone care to point us to some howtos?), but Janne is using a batch management system called DrQueue that looks quite useful for a lot of distributed applications. It takes care of distributing jobs between the cluster's nodes, allowing you to manage and monitor each of the nodes remotely from a central interface. Pretty cool stuff.
Posted by Jason Striegel | May 23, 2008 07:48 PM | Hardware, Linux Multimedia, Linux Server
May 14, 2008
Debian/Ubuntu users: update your SSL keys and certs
It was announced yesterday that sometime back in September 2006, a line of code was removed from the Debian-distributed OpenSSL package. That one line of code was responsible for causing an uninitialized data warning in Valgrind. It also seeded the random number generator used by OpenSSL. Without it, the warning went away, but the pool of possible keys generated on affected systems collapsed to roughly 2^15. Oh noes!
A large majority of Debian and Ubuntu systems are affected. To correct the problem, you'll need to not only update OpenSSL, but also revoke and replace any cryptographic keys and certificates that were generated on the affected systems. From the Debian security advisory:
Affected keys include SSH keys, OpenVPN keys, DNSSEC keys, and key material for use in X.509 certificates and session keys used in SSL/TLS connections. Keys generated with GnuPG or GNUTLS are not affected, though.
For most people, this boils down to your ssh server's host key and any public key pairs used for remote ssh authentication. Any keys or certificates generated on the affected machines for SSL/https use also need to be revoked and regenerated. It's pretty ugly, really.
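On an affected box, the SSH side of the cleanup looks roughly like this (a sketch for Debian/Ubuntu, run as root; any SSL certificates still have to be revoked and reissued separately):

apt-get update && apt-get upgrade     # pull in the fixed openssl and openssh packages
rm /etc/ssh/ssh_host_*                # discard the weak host keys
dpkg-reconfigure openssh-server       # generates fresh host keys
ssh-vulnkey -a                        # audit all users' keys against the known-weak blacklist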
As far as teachable moments go, there's probably a lot to think about here. Software developers have this weird natural tendency to want to fix and reengineer things that aren't even broken. I'd go so far as to say that the desire to reengineer is inversely proportional to a programmer's familiarity and understanding of the code. I think it comes from our intense desire to make sense of things. It's the guru who's able to channel that hacker urge into solving new problems instead of creating new bugs out of old solutions.
DSA-1571-1 openssl -- predictable random number generator
OpenSSL PRNG Debian Toys (more discussion of the problem here)
Posted by Jason Striegel | May 14, 2008 07:57 PM | Cryptography, Linux, Linux Desktop, Linux Server, Ubuntu
October 18, 2007
Remote snapshot backups with rsync and Samba
Thanassis Tsiodras writes:
What would you do if you had to automatically backup a remote Linux box (e.g. your web server), and all you had locally was Windows machines? How about this:
- automatically expanding local storage space
- transmissions of differences only
- automatic scheduling
- local storage of differences only
- secure and compressed transfer of remote data and
- instant filesystem navigation inside daily snapshot images
I covered all these requirements using open source tools, and I now locally backup our 3GB remote server in less than 2min!
We've all used Samba and rsync before, but Thanassis has really put all the pieces together into a complete backup system that's superior to a lot of commercial products I've seen.
The really impressive bit is how he's easily doing snapshot images using filesystem hardlinks. You can save several days' worth of snapshots at very little cost because additional space is only taken up by files that have changed. Using hardlinks, identical files from different snapshots all point to the same inode.
root# mount /dev/loop0 /mnt/backup
root# cd /mnt/backup
root# rm -rf OneBeforeLast
root# cp -al LastBackup OneBeforeLast
root# cd LastBackup
root# rsync -avz --delete root@hosting.machine.in.US:/ ./

The "cp -al" creates a zero-cost copy of the data (using hardlinks, the only price paid is the one of the directory entries, and ReiserFS is well known for its ability to store these extremely efficiently). Then, rsync is executed with the --delete option: meaning that it must remove from our local mirror all the files that were removed on the server - and thus creating an accurate image of the current state.
And here's the icing on the cake: The data inside these files are not lost! They are still accessible from the OneBeforeLast/ directory, since hard links (the old directory entries) are pointing to them!
In plain terms, simple navigation inside OneBeforeLast can be used to examine the exact contents of the server as they were BEFORE the last mirroring.
Just imagine the data recovery headaches you could solve by adapting that to a cron job that shuffles a month's worth of nightly backups.
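A rough sketch of that idea (the paths, remote host, and 31-day depth are placeholders, and the rotation lives in a small script that cron calls nightly):

# /etc/cron.d/snapshot - run the rotation script every night at 3:15
15 3 * * *  root  /usr/local/sbin/snapshot-backup.sh

# snapshot-backup.sh: age out the oldest copy, hardlink-clone yesterday, resync
rm -rf /mnt/backup/day.31
for i in $(seq 30 -1 0); do
    [ -d /mnt/backup/day.$i ] && mv /mnt/backup/day.$i /mnt/backup/day.$((i+1))
done
cp -al /mnt/backup/day.1 /mnt/backup/day.0
rsync -avz --delete root@hosting.machine.in.US:/ /mnt/backup/day.0/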
Optimal remote Linux backups with rsync over Samba - Link
Posted by Jason Striegel | Oct 18, 2007 10:17 PM | Linux, Linux Server, Windows, Windows Server