Archive: Linux Server
August 26, 2008
Dealing with large numbers of files in Unix
Most of the time, you can move a bunch of files from one folder to another by running a simple mv command like "mv sourcedir/* destdir/". The problem is that when that asterisk gets expanded, each file in the directory is added as a command line parameter to the mv command. If sourcedir contains a lot of files, this can blow past the kernel's limit on argument length, resulting in a mysterious "Argument list too long" error.
I ran into this problem recently while trying to manage a directory that had over a million files in it. It's not every day you run across a directory that contains a metric crap-ton of files, but when the problem arises, there's an easy way to deal with it. The trick is to use the handy xargs program, which is designed to take a big list as stdin and separate it as arguments to another command:
find sourcedir -type f -print | xargs -l1 -i mv {} destdir/
The -l1 tells xargs to pass only one argument at a time to mv, and the -i parameter tells it to substitute {} with that argument, so mv gets executed once for each file in the directory. Ideally, you would optimize this and specify something like -l50, sending mv 50 files at a time. This is how I remember xargs working on other Unix systems, but the GNU xargs on my Linux box forces the number of arguments to 1 whenever -i is used. Either way, it gets the job done.
Without the -i, the -l parameter will work in Linux, but you can no longer use the {} substitution and all parameters are placed as the final arguments in the command. This is useless when you want to add a final parameter such as the destination directory for the mv command. On the other hand, it's helpful for commands that end with your file parameters, such as when you are batch removing files with rm.
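For that rm case, a slightly safer batching variant looks like this (a sketch assuming GNU find and xargs, which support null-delimited filenames; adjust the batch size to taste):

find sourcedir -type f -print0 | xargs -0 -n 50 rm

The -print0 and -0 flags delimit filenames with a null byte, so names containing spaces or newlines don't get split, and -n 50 hands rm up to 50 files per invocation.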
Oddly enough, in OS X the parameters for xargs are a bit wonky and capitalized. The good news is that you can invoke the parameter substitution with multiple arguments at a time. To move a bunch of files in OS X, 50 files at a time, try the following:
find sourcedir -type f -print | xargs -L50 -I{} mv {} destdir/
That's about all there is to it. This is just a basic example, but once you get used to using xargs and find together, it's pretty easy to tweak the find parameters and move files around based on their date, permissions or file extension.
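As a hedged example of that kind of tweaking (the extension and age filters here are just illustrative), this moves only the .log files that haven't been touched in 30 days:

find sourcedir -type f -name '*.log' -mtime +30 -print0 | xargs -0 -I{} mv {} destdir/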
Posted by Jason Striegel | Aug 26, 2008 07:22 PM | Linux, Linux Server, Mac
August 6, 2008
Memcached and high performance MySQL
Memcached is a distributed object caching system that was originally developed to improve the performance of LiveJournal and has subsequently been used as a scaling strategy for a number of high-load sites. It serves as a large, extremely fast hash table that can be spread across many servers and accessed simultaneously from multiple processes. It's designed to be used for almost any back-end caching need, and for high performance web applications, it's a great complement to a database like MySQL.
In a typical environment, a web developer might employ a combination of process-level caching and the built-in MySQL query caching to eke out that extra bit of performance from an application. The problem is that in-process caching is limited to the web process running on a single server. In a load-balanced configuration, each server maintains its own cache, limiting the efficiency and available size of the cache. Similarly, MySQL's query cache is limited to the server that the MySQL process is running on, and it can only cache row results. With memcached, you can set up a number of cache servers which can store any type of serialized object, and this data can be shared by all of the load-balanced web servers. Cool, no?
To set up a memcached server, you simply download the daemon and run it with a few parameters. From the memcached web site:
First, you start up the memcached daemon on as many spare machines as you have. The daemon has no configuration file, just a few command line options, only 3 or 4 of which you'll likely use:
# ./memcached -d -m 2048 -l 10.0.0.40 -p 11211
This starts memcached up as a daemon, using 2GB of memory, and listening on IP 10.0.0.40, port 11211. Because a 32-bit process can only address 4GB of virtual memory (usually significantly less, depending on your operating system), if you have a 32-bit server with 4-64GB of memory using PAE you can just run multiple processes on the machine, each using 2 or 3GB of memory.
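As a hedged sketch of that multiple-process trick (the IP, ports, and memory sizes are just placeholders borrowed from the example above), you could launch one 2GB instance per port on a single box:

# start three independent memcached daemons on one machine
for port in 11211 11212 11213; do
    ./memcached -d -m 2048 -l 10.0.0.40 -p $port
done

Each host:port pair then gets added to the client's server list, just as if they were separate machines.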
It's about as simple as it gets. There's no real configuration. No authentication. It's just a gigantor hash table. Obviously, you'd set this up on a private, non-addressable network. From there, the work of querying and updating the cache is completely up to the application designer. You are afforded the basic functions of set, get, and delete. Here's a simple example in PHP:
$memcache = new Memcache;
$memcache->addServer('10.0.0.40', 11211);
$memcache->addServer('10.0.0.41', 11211);

$value = "Data to cache";
$memcache->set('thekey', $value, 60);
echo "Caching for 60 seconds: $value <br>\n";

$retrieved = $memcache->get('thekey');
echo "Retrieved: $retrieved <br>\n";
The PHP library takes care of the dirty work of serializing any value you pass to the cache, so you can send and retrieve arrays or even complete data objects.
In your application's data layer, instead of immediately hitting the database, you can now query memcached first. If the item is found, there's no need to hit the database and assemble the data object. If the key is not found, you select the relevant data from the database and store the derived object in the cache. Similarly, you update the cache whenever your data object is altered and updated in the database. Assuming your API is structured well, only a few edits need to be made to dramatically alter the scalability and performance of your application.
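To make that read-through flow concrete, here's a rough cache-aside sketch using memcached's plain text protocol over netcat (in a real application this logic lives in your data layer, as in the PHP example above; fetch_user_from_db is a hypothetical placeholder for your actual MySQL query):

key="user:42"
# try the cache first; on a hit, the second response line holds the data
cached=$(printf 'get %s\r\nquit\r\n' "$key" | nc 10.0.0.40 11211 | sed -n '2p' | tr -d '\r')
if [ -z "$cached" ]; then
    value=$(fetch_user_from_db 42)    # cache miss: query MySQL instead
    # store the result for 60 seconds so the next request skips the database
    printf 'set %s 0 60 %s\r\n%s\r\nquit\r\n' "$key" "${#value}" "$value" | nc 10.0.0.40 11211
fi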
I've linked to a few resources below where you can find more information on using memcached in your application. In addition to the documentation on the memcached web site, Todd Hoff has compiled a list of articles on memcached and summarized several memcached performance techniques. It's a pretty versatile tool. For those of you who've used memcached, give us a holler in the comments and share your tips and tricks.
Memcached
Strategies for Using Memcached and MySQL Better Together
Memcached and MySQL tutorial (PDF)
Posted by Jason Striegel | Aug 6, 2008 10:37 PM | Data, Linux, Linux Server, MySQL, Software Engineering
July 13, 2008
Find and Grep 101
Find and grep are perhaps the most used command line tools for the Linux user or administrator. Terse but powerful, these two commands will let you search through files and their contents by almost any imaginable attribute or filter criteria: file name, date modified, occurrence of some specific word in a file, etc. Combined with a couple of other standard unix utilities, you can automate modifications across all of the files that match your search.
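As a quick taste of combining the two (a sketch; the path and search string are just placeholders), this lists every .conf file under /etc modified in the last week that mentions ListenAddress:

find /etc -name '*.conf' -mtime -7 -print0 | xargs -0 grep -l 'ListenAddress'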
Here are two blog posts by Eric Wendelin which nicely illustrate the basics of these two commands:
Find is a Beautiful Tool
Grep is a Beautiful Tool
There are a number of other great unix utilities for file search, but knowing how to use find and grep is fundamental, as these two utilities can be found on the most basic build of every unix-like machine you come across.
Got a favorite command line hack that uses find or grep? Drop it on us in the comments.
Posted by Jason Striegel | Jul 13, 2008 09:19 PM | Linux, Linux Server, Ubuntu
June 4, 2008
Use video RAM as swap in Linux
If you are into the headless or console experience, there are a couple of ways to put your machine's graphics card to good use. Most new boxes come with a GPU that has a substantial amount of RAM that is normally used for direct rendering. Using the Memory Technology Device (MTD) support in the Linux kernel, you can actually map the video card RAM to a block device and format it for swap use or as a ramdisk.
The Gentoo wiki has detailed instructions for doing this. The only tricky part is determining the video memory address, but after that it's a simple modprobe to load the MTD driver and you can run mkswap/swapon on the device just as if you were creating a normal swap disk. Considering many machines have 512MB of video RAM and it's waaaaay faster than disk, this could give you a pretty huge performance boost.
You can still use your graphics card in X, but you'll need to reserve a small chunk of that RAM for normal graphics use, use the VESA driver, and inform the driver that it should only use that teensy portion of memory. "VideoRam 4096" in the XF86Config, for instance, will let you use your card in X and only eat the first 4MB of RAM. Everything after that 4MB is fair game for swap. Michal Schulz wrote a bit about calculating the memory address offsets to make this all work. It's the second link below, for those of you who aren't hardcore enough to deal with only the command line.
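Condensed into a hedged sketch, the recipe looks roughly like this (the start address and size are placeholders; pull the real base address for your card from lspci -v, and offset past any chunk you've reserved for X, as described in the links below):

modprobe slram map=VRAM,0xE0000000,+0x8000000   # map 128MB of video RAM as an MTD device
modprobe mtdblock                               # expose it as /dev/mtdblock0
mkswap /dev/mtdblock0
swapon /dev/mtdblock0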
Use Memory On Video Card As Swap
Configuring X11 With A VRAM Storage Device
Posted by Jason Striegel | Jun 4, 2008 09:10 PM | Linux, Linux Server
May 23, 2008
Helmer render cluster: 186 Gflops in an IKEA cabinet
I usually get all excited about tiny, noiseless, low-power PC hardware, but I have to admit that this 24 core, 186 Gflop render cluster built into an IKEA Helmer cabinet is pretty inspiring. Most cool is that when it's not overburdened and jumping to swap, it's still a reasonably efficient setup for its performance specs:
The most amazing thing is that this machine cost about as much as a better standard PC, but has 24 cores that each run at 2.4 GHz, a total of 48GB of RAM, and needs just 400W of power!! This means that it hardly gets warm, and makes less noise than my desktop PC. Render jobs that took all night now get done in 10-12 min.
Janne opted for modifying the Helmer cabinet instead of using standard PC cases because the 6 cases would have cost about as much as the motherboards and CPUs. Most of the modification involved cutting holes for airflow, power supplies, and cabling, but it looks like the Helmer's drawer dimensions accommodate the ATX motherboards almost perfectly.
I'm not all that familiar with the software behind 3D rendering (anyone care to point us to some howtos?), but Janne is using a batch management system called DrQueue that looks quite useful for a lot of distributed applications. It takes care of distributing jobs between the cluster's nodes, allowing you to manage and monitor each of the nodes remotely from a central interface. Pretty cool stuff.
Posted by Jason Striegel | May 23, 2008 07:48 PM | Hardware, Linux Multimedia, Linux Server
May 14, 2008
Debian/Ubuntu users: update your SSL keys and certs
It was announced yesterday that sometime back in September 2006, a line of code was removed from the Debian-distributed OpenSSL package. That one line of code was responsible for causing an uninitialized data warning in Valgrind. It also seeded the random number generator used by OpenSSL. Without it, the warning went away, but the pool of possible keys generated on affected systems collapsed to roughly 2^15. Oh noes!
A large majority of Debian and Ubuntu systems are affected. To correct the problem, you'll need to not only update OpenSSL, but also revoke and replace any cryptographic keys and certificates that were generated on the affected systems. From the Debian security advisory:
Affected keys include SSH keys, OpenVPN keys, DNSSEC keys, and key material for use in X.509 certificates and session keys used in SSL/TLS connections. Keys generated with GnuPG or GNUTLS are not affected, though.
For most people, this boils down to your ssh server's host key and any public key pairs used for remote ssh authentication. Any keys or certificates generated on the affected machines for SSL/https use also need to be revoked and regenerated. It's pretty ugly, really.
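On an affected box, the SSH side of the cleanup looks roughly like this (a sketch for Debian/Ubuntu, run as root; any SSL certificates still have to be revoked and reissued separately):

apt-get update && apt-get upgrade     # pull in the fixed openssl and openssh packages
rm /etc/ssh/ssh_host_*                # discard the weak host keys
dpkg-reconfigure openssh-server       # generates fresh host keys
ssh-vulnkey -a                        # audit all users' keys against the known-weak blacklist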
As far as teachable moments go, there's probably a lot to think about here. Software developers have this weird natural tendency to want to fix and reengineer things that aren't even broken. I'd go so far as to say that the desire to reengineer is inversely proportional to a programmer's familiarity and understanding of the code. I think it comes from our intense desire to make sense of things. It's the guru who's able to channel that hacker urge into solving new problems instead of creating new bugs out of old solutions.
DSA-1571-1 openssl -- predictable random number generator
OpenSSL PRNG Debian Toys (more discussion of the problem here)
Posted by Jason Striegel | May 14, 2008 07:57 PM | Cryptography, Linux, Linux Desktop, Linux Server, Ubuntu
October 18, 2007
Remote snapshot backups with rsync and Samba
Thanassis Tsiodras writes:
What would you do if you had to automatically backup a remote Linux box (e.g. your web server), and all you had locally was Windows machines? How about this:
- automatically expanding local storage space
- transmissions of differences only
- automatic scheduling
- local storage of differences only
- secure and compressed transfer of remote data and
- instant filesystem navigation inside daily snapshot images
I covered all these requirements using open source tools, and I now locally backup our 3GB remote server in less than 2min!
We've all used Samba and rsync before, but Thanassis has really put all the pieces together into a complete backup system that's superior to a lot of commercial products I've seen.
The really impressive bit is how he's easily doing snapshot images using filesystem hardlinks. You can save several days' worth of snapshots at very little cost because additional space is only taken up by files that have changed. Using hardlinks, identical files from different snapshots all point to the same inode.
root# mount /dev/loop0 /mnt/backup
root# cd /mnt/backup
root# rm -rf OneBeforeLast
root# cp -al LastBackup OneBeforeLast
root# cd LastBackup
root# rsync -avz --delete root@hosting.machine.in.US:/ ./

The "cp -al" creates a zero-cost copy of the data (using hardlinks, the only price paid is the one of the directory entries, and ReiserFS is well known for its ability to store these extremely efficiently). Then, rsync is executed with the --delete option: meaning that it must remove from our local mirror all the files that were removed on the server - and thus creating an accurate image of the current state.
And here's the icing on the cake: The data inside these files are not lost! They are still accessible from the OneBeforeLast/ directory, since hard links (the old directory entries) are pointing to them!
In plain terms, simple navigation inside OneBeforeLast can be used to examine the exact contents of the server as they were BEFORE the last mirroring.
Just imagine the data recovery headaches you could solve by adapting that to a cron job that shuffles a month's worth of nightly backups.
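A rough sketch of that idea (the paths, remote host, and 31-day depth are placeholders, and the rotation lives in a small script that cron calls nightly):

# /etc/cron.d/snapshot - run the rotation script every night at 3:15
15 3 * * *  root  /usr/local/sbin/snapshot-backup.sh

# snapshot-backup.sh: age out the oldest copy, hardlink-clone yesterday, resync
rm -rf /mnt/backup/day.31
for i in $(seq 30 -1 0); do
    [ -d /mnt/backup/day.$i ] && mv /mnt/backup/day.$i /mnt/backup/day.$((i+1))
done
cp -al /mnt/backup/day.1 /mnt/backup/day.0
rsync -avz --delete root@hosting.machine.in.US:/ /mnt/backup/day.0/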
Optimal remote Linux backups with rsync over Samba - Link
Posted by Jason Striegel | Oct 18, 2007 10:17 PM | Linux, Linux Server, Windows, Windows Server