Archive: Linux Server
November 14, 2008
Linux Tip: super-fast network file copy
If you've ever had to move a huge directory containing many files from one server to another, you may have encountered a situation where the copy rate was significantly less than what you'd expect your network could support. Rsync does a fantastic job of quickly syncing two relatively similar directory structures, but the initial clone can take quite a while, especially as the file count increases.
The problem is that there is a certain amount of per-file overhead when using scp or rsync to copy files from one machine to the other. This is not a problem under most circumstances, but if you are attempting to duplicate tens of thousands of files (think server or database backup), this per-file overhead can really add up. The solution is to copy the files over in a single stream, which normally means tarring them up on one server, copying the tarball, then untarring on the destination. Unless the source server is under 50% disk utilization, though, staging a tarball like this could cause you to run out of space.
Brett Jones has an alternative solution, which uses the handy netcat utility:
After clearing up 10 GBs of log files, we were left with hundreds of thousands of small files that were going to slow us down. We couldn't tarball the files because of a lack of space on the source server. I started searching around and found this nifty tip that takes out the encryption and streams all the files as one large file:
This requires netcat on both servers.
Destination box: nc -l -p 2342 | tar -C /target/dir -xzf -
Source box: tar -cz /source/dir | nc Target_Box 2342
This causes the source machine to tar the files up and send them over the netcat pipe, where they are extracted on the destination machine, all with no per-file negotiation or unnecessary disk space used. It's also faster than the usual scp or rsync-over-ssh because there is no encryption overhead. If you are on a local, protected network, this will perform much better, even for large single-file copies.
If you are on an unprotected network, however, you may still want your data encrypted in transit. You can accomplish roughly the same thing over ssh:
Run this on the destination machine:
cd /path/to/extract/to/
ssh user@source.server 'tar -czf - -C /source/path/ .' | tar -xzvf -
This command will issue the tar command across the network on the source machine, causing tar's stdout to be sent back over the network. This is then piped to stdin on the destination machine and the files magically appear in the directory you are currently in.
The ssh route is a little slower than using netcat, due to the encryption overhead, but it's still way faster than scping the files individually. It also has the added advantage of potentially being compatible with Windows servers, provided you have a few of the Unix tools like ssh and tar installed on your Windows server (using the Cygwin-linked binaries that are available).
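If you want to keep an eye on throughput while one of these single-stream copies runs, the pv utility can sit in the middle of either pipeline. This is just a sketch, assuming pv is installed on the box where you run it; the host name and port are the same placeholders used in the examples above:
Destination box: nc -l -p 2342 | pv | tar -C /target/dir -xzf -
Source box: tar -cz /source/dir | pv | nc Target_Box 2342
pv simply passes the stream through while printing the byte count and transfer rate, so you can see whether the copy is actually saturating the link.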
Posted by Jason Striegel | Nov 14, 2008 08:40 PM | Linux, Linux Server
November 9, 2008
SlugPower - Linux controlled power switch
Phil Endecott has done a bit of hacking with the Linksys NSLU2 "Slug", the low-power network storage device which runs Linux under the hood. His SlugPower project is a switched outlet that can be controlled from the Slug. This enables his print server to power up the printer when it needs to be printing, and automatically cut power to the device when it's not in use.
This page describes the hardware and software design of a printer power switch controlled over USB from my Linksys NSLU2, aka Slug. The unit can, however, be controlled from any Linux box, and can switch anything, not just printers.
My NSLU2 acts mostly as a file and print server. I can go for weeks without printing anything, so I want to keep the printer switched off when I'm not using it (it takes about 4W while idle, which must be more than 99% of its total energy consumption). But it's upstairs, and I don't want to have to go up and down stairs once to switch it on and again to collect my printing. So I decided to get a power switch.
Remote power switches are pretty common in server rooms, but they are costly. This is a pretty affordable way to control the power to any device from anywhere in the world.
SlugPower - A Slug-Controlled Power Switch
Phil Endecott's Slug Projects
NSLU2-Linux
Posted by Jason Striegel | Nov 9, 2008 10:13 PM | Electronics, Energy, Linux, Linux Server, Smart Home
October 18, 2008
Easy OS drive cloning for Blades with Compact Flash
If you've ever been tasked with setting up a server room full of machines, you can sympathize with the challenge of doing this with 90 boxes that use slow Compact Flash storage. Hackszine reader Left-O-Matic sent in the following story, in which he describes a pretty efficient way to clone a ton of Windows-based blade servers using Linux, a stack of USB CF adapters, and GParted, the Swiss Army knife of filesystem tools that lets you grow, shrink, and duplicate most Unix and Windows partition formats.
Working in a high-stress R&D environment, I find myself crunched for time, fighting new requests that almost always end with "...we needed this yesterday."
In the past 2 years I've worked out a system using the onboard RAID controllers in Dell 1950s and SunFire X4600s to effectively clone entire node setups for deployment around the world, using RAID 10 for the Dells and RAID 1 for the Suns (Server 2003 with appropriate licenses for all machines).
That all came to a crashing halt when someone higher up the food chain decided to consolidate this mess using Sun 6000 Series blades. The X6250 blades did not pose a problem, as we ordered them with RAID Expansion Modules (REMs) to control the hard drives.
The $h!t hit the fan when the X6450 blades rolled through the door, loaded with 4 quad-core Xeons and no room for 2.5" SAS drives... which leads to the problem: loading Server 2003 onto 90+ blades equipped with 48GB CF cards using a USB CD-ROM.
Now let me remind you that this needed to be done "yesterday".
With the time required to load Server 2003, set the system parameters, and harden the security for each blade, I'm looking at around 5 hours apiece. Let's do the math: 5x90=450 hours... YIKES! And let's just say I can do 2 or 3 at a time... that's still forever.
On top of that, almost no Windows-based program will handle this correctly.
Solution:
Enter the greatest FREE Linux-based solution: GParted!
I found a crappy PC lying around the office that has a bunch of USB Ports on it. I then downloaded the LiveCD version, booted up the PC from CD, plugged in 10 CF card readers, and loaded them all with brand new CF cards fresh out of the blades along with one master CF card.
GParted allowed me to first create an NTFS partition on each card (leaving 8MB of slack space) and attach a boot flag to it.
Next I selected the data from the master CF card, clicked copy, then selected the destination partition and clicked paste.
The beauty of this program is that instead of doing each step one at a time and waiting the 2 hours for each copy, it let me line up 10 jobs to copy the data from the master CF card to each destination card, basically cloning 10 machines in 16 hours (Compact Flash transfer speeds are really, really, really slow). After I transferred the master copy to an internal HDD, it cut the time by... well, a lot, eliminating the delay of reading from a CF.
So, in conclusion: before I leave for the evening, I set up a batch of clone jobs and they're ready for me in the morning.
Quick and easy, nice and cheezy.
Thanks for the tip Left-O-Matic! I'm sure there are more than a few of you IT folks out there who could save some time this way. I have to admit I haven't done anything like this in many years. What's your favorite way to clone a room full of boxes? Ghost? GParted? Send us your cloning tips in the comments.
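For readers without GParted handy, roughly the same batch job can be scripted with dd from any Linux box. This is only a sketch, and the device names and image path below are made-up examples; check dmesg or /proc/partitions to see where your card readers actually land before writing to anything:
# image the master CF card once to a fast internal disk
dd if=/dev/sdb of=/srv/master.img bs=4M
# then write the image out to ten cards in parallel
for dev in /dev/sd[c-l]; do
    dd if=/srv/master.img of=$dev bs=4M &
done
wait
Reading from the master image on the internal disk rather than from a CF card gives the same speedup Left-O-Matic describes, since only the slow writes remain.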
Posted by Jason Striegel | Oct 18, 2008 08:24 PM | Data, Linux Server, Windows Server
August 26, 2008
Dealing with large numbers of files in Unix
Most of the time, you can move a bunch of files from one folder to another by running a simple mv command like "mv sourcedir/* destdir/". The problem is that when that asterisk gets expanded, each file in the directory is added as a command line parameter to the mv command. If sourcedir contains a lot of files, this can overflow the command line buffer, resulting in a mysterious "Argument list too long" error.
I ran into this problem recently while trying to manage a directory that had over a million files in it. It's not every day you run across a directory that contains a metric crap-ton of files, but when the problem arises, there's an easy way to deal with it. The trick is to use the handy xargs program, which is designed to take a big list as stdin and separate it as arguments to another command:
find sourcedir -type f -print | xargs -l1 -i mv {} destdir/
The -l1 tells xargs to only use one argument at a time to pass to mv. The -i parameter tells xargs to replace the {} with the argument. This command will execute mv for each file in the directory. Ideally, you would optimize this and specify something like -l50, sending mv 50 files at a time to move. This is how I remember xargs working on other Unix systems, but the GNU xargs that I have on my Linux box forces the number of arguments to 1 any time the -i is invoked. Either way, it gets the job done.
Without the -i, the -l parameter will work in Linux, but you can no longer use the {} substitution and all parameters are placed as the final arguments in the command. That's no help when you want to add a final parameter such as the destination directory for the mv command. On the other hand, it's handy for commands that end with your file parameters, such as when you are batch removing files with rm.
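For example, a batch delete needs no substitution at all, since xargs will append the file names to the end of the rm command on its own. The '*.tmp' pattern here is just an illustration, and the -print0/-0 pair is worth using in case any file names contain spaces:
find sourcedir -type f -name '*.tmp' -print0 | xargs -0 rm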
Oddly enough, in OS X the parameters for xargs are a bit wonky and capitalized. The good news is that you can invoke the parameter substitution with multiple arguments at a time. To move a bunch of files in OS X, 50 files at a time, try the following:
find sourcedir -type f -print | xargs -L50 -I{} mv {} destdir/
That's about all there is to it. This is just a basic example, but once you get used to using xargs and find together, it's pretty easy to tweak the find parameters and move files around based on their date, permissions or file extension.
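As a quick illustration of that kind of tweak, here's the same move restricted to files with a particular extension that haven't been touched in a month; the '*.log' pattern and archivedir are placeholders, and the xargs flags match the GNU example above:
find sourcedir -type f -name '*.log' -mtime +30 -print | xargs -l1 -i mv {} archivedir/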
Posted by Jason Striegel | Aug 26, 2008 07:22 PM | Linux, Linux Server, Mac
August 6, 2008
Memcached and high performance MySQL
Memcached is a distributed object caching system that was originally developed to improve the performance of LiveJournal and has subsequently been used as a scaling strategy for a number of high-load sites. It serves as a large, extremely fast hash table that can be spread across many servers and accessed simultaneously from multiple processes. It's designed to be used for almost any back-end caching need, and for high performance web applications, it's a great complement to a database like MySQL.
In a typical environment, a web developer might employ a combination of process-level caching and the built-in MySQL query caching to eke out that extra bit of performance from an application. The problem is that in-process caching is limited to the web process running on a single server. In a load-balanced configuration, each server is maintaining its own cache, limiting the efficiency and available size of the cache. Similarly, MySQL's query cache is limited to the server that the MySQL process is running on. The query cache is also limited in that it can only cache row results. With memcached you can set up a number of cache servers which can store any type of serialized object, and this data can be shared by all of the load-balanced web servers. Cool, no?
To set up a memcached server, you simply download the daemon and run it with a few parameters. From the memcached web site:
First, you start up the memcached daemon on as many spare machines as you have. The daemon has no configuration file, just a few command line options, only 3 or 4 of which you'll likely use:
# ./memcached -d -m 2048 -l 10.0.0.40 -p 11211
This starts memcached up as a daemon, using 2GB of memory, and listening on IP 10.0.0.40, port 11211. Because a 32-bit process can only address 4GB of virtual memory (usually significantly less, depending on your operating system), if you have a 32-bit server with 4-64GB of memory using PAE you can just run multiple processes on the machine, each using 2 or 3GB of memory.
It's about as simple as it gets. There's no real configuration. No authentication. It's just a gigantor hash table. Obviously, you'd set this up on a private, non-addressable network. From there, the work of querying and updating the cache is completely up to the application designer. You are afforded the basic functions of set, get, and delete. Here's a simple example in PHP:
$memcache = new Memcache;
$memcache->addServer('10.0.0.40', 11211);
$memcache->addServer('10.0.0.41', 11211);

$value = "Data to cache";
// Memcache::set takes (key, value, flags, expire); the expiration is the fourth argument
$memcache->set('thekey', $value, 0, 60);
echo "Caching for 60 seconds: $value <br>\n";

$retrieved = $memcache->get('thekey');
echo "Retrieved: $retrieved <br>\n";
The PHP library takes care of the dirty work of serializing any value you pass to the cache, so you can send and retrieve arrays or even complete data objects.
In your application's data layer, instead of immediately hitting the database, you can now query memcached first. If the item is found, there's no need to hit the database and assemble the data object. If the key is not found, you select the relevant data from the database and store the derived object in the cache. Similarly, you update the cache whenever your data object is altered and updated in the database. Assuming your API is structured well, only a few edits need to be made to dramatically alter the scalability and performance of your application.
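To make that concrete, here's a minimal sketch of that read-through pattern in PHP. The getUserById() and loadUserFromDb() functions and the 'user:' key prefix are made-up names for illustration, not part of any API:
function getUserById($memcache, $id) {
    $key = 'user:' . $id;                 // hypothetical key naming scheme
    $user = $memcache->get($key);
    if ($user !== false) {
        return $user;                     // cache hit: skip the database entirely
    }
    $user = loadUserFromDb($id);          // placeholder for your existing data-layer call
    $memcache->set($key, $user, 0, 300);  // cache the assembled object for 5 minutes
    return $user;
}
The matching write path would update or delete the 'user:' key whenever the row changes, so readers never see stale data for longer than the expiration window.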
I've linked to a few resources below where you can find more information on using memcached in your application. In addition to the documentation on the memcached web site, Todd Hoff has compiled a list of articles on memcached and summarized several memcached performance techniques. It's a pretty versatile tool. For those of you who've used memcached, give us a holler in the comments and share your tips and tricks.
Memcached
Strategies for Using Memcached and MySQL Better Together
Memcached and MySQL tutorial (PDF)
Posted by Jason Striegel | Aug 6, 2008 10:37 PM | Data, Linux, Linux Server, MySQL, Software Engineering
July 13, 2008
Find and Grep 101
Find and Grep are perhaps the most used command line tools for the Linux user or administrator. Terse but powerful, these two commands will allow you to search through files and their contents by almost any imaginable attribute or filter criteria: file name, date modified, occurrence of some specific word in a file, etc. Combined with a couple of other standard Unix utilities, you can automate modifications across all of the files that match your search.
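As a rough example of that kind of combination (the paths and patterns here are invented for illustration), grep can pick out the matching files and hand them to xargs and sed for editing, while find narrows things down by age or name first:
grep -rl 'oldhost' /etc/apache2 | xargs sed -i 's/oldhost/newhost/g'
find /etc -name '*.conf' -mtime -7 -print0 | xargs -0 grep -n 'Listen'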
Here are two blog posts by Eric Wendelin which nicely illustrate the basics of these two commands:
Find is a Beautiful Tool
Grep is a Beautiful Tool
There are a number of other great unix utilities for file search, but knowing how to use find and grep is fundamental, as these two utilities can be found on the most basic build of every unix-like machine you come across.
Got a favorite command line hack that uses find or grep? Drop it on us in the comments.
Posted by Jason Striegel | Jul 13, 2008 09:19 PM | Linux, Linux Server, Ubuntu
June 4, 2008
Use video RAM as swap in Linux
If you are into the headless or console experience, there are a couple of ways to put your machine's graphics card to good use. Most new boxes come with a GPU that has a substantial amount of RAM that is normally used for direct rendering. Using the Memory Technology Device (MTD) support in the Linux kernel, you can actually map the video card RAM to a block device and format it for swap use or as a ramdisk.
The Gentoo wiki has detailed instructions for doing this. The only tricky part is determining the video memory address, but after that it's a simple modprobe to load the MTD driver and you can run mkswap/swapon on the device just as if you were creating a normal swap disk. Considering many machines have 512MB of video RAM and it's waaaaay faster than disk, this could give you a pretty huge performance boost.
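The rough shape of the procedure looks like the sketch below. The 0xd8000000 address and the 120MB size are purely placeholder values, and the slram module parameters should be double-checked against the Gentoo wiki article before you point them at real memory:
lspci -v | grep -i -A8 vga                       # note the card's prefetchable memory range
modprobe slram map=VRAM,0xd8000000,+0x7800000    # map ~120MB of that range as an MTD device
modprobe mtdblock                                # expose it as /dev/mtdblock0
mkswap /dev/mtdblock0
swapon -p 10 /dev/mtdblock0                      # give it higher priority than the on-disk swap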
You can still use your graphics card in X, but you'll need to reserve a small chunk of that RAM for normal graphics use, use the VESA driver, and inform the driver that it should only use that teensy portion of memory. "VideoRam 4096" in the XF86Config, for instance, will let you use your card in X and only eat the first 4MB of RAM. Everything after that 4MB is fair game for swap. Michal Schulz wrote a bit about calculating the memory address offsets to make this all work. It's the second link below, for those of you who aren't hardcore enough to deal with only the command line.
Use Memory On Video Card As Swap
Configuring X11 With A VRAM Storage Device
Posted by Jason Striegel | Jun 4, 2008 09:10 PM | Linux, Linux Server
May 23, 2008
Helmer render cluster: 186 Gflops in an IKEA cabinet
I usually get all excited about tiny, noiseless, low-power PC hardware, but I have to admit that this 24 core, 186 Gflop render cluster built into an IKEA Helmer cabinet is pretty inspiring. Most cool is that when it's not overburdened and jumping to swap, it's still a reasonably efficient setup for its performance specs:
The most amazing thing is that this machine costs about the same as a better standard PC, but has 24 cores that each run at 2.4 GHz, a total of 48GB of RAM, and needs just 400W of power!! This means that it hardly gets warm and makes less noise than my desktop PC. Render jobs that took all night now get done in 10-12 min.
Janne opted for modifying the Helmer cabinet instead of using standard PC cases because the 6 cases would have cost about as much as the motherboards and CPUs. Most of the modification involved cutting holes for airflow, power supplies, and cabling, but it looks like the Helmer's drawer dimensions accommodate the ATX motherboards almost perfectly.
I'm not all that familiar with the software behind 3D rendering (anyone care to point us to some howtos?), but Janne is using a batch management system called DrQueue that looks quite useful for a lot of distributed applications. It takes care of distributing jobs between the cluster's nodes, allowing you to manage and monitor each of the nodes remotely from a central interface. Pretty cool stuff.
Posted by Jason Striegel | May 23, 2008 07:48 PM | Hardware, Linux Multimedia, Linux Server
May 14, 2008
Debian/Ubuntu users: update your SSL keys and certs
It was announced yesterday that sometime back in September 2006, a line of code was removed from the Debian-distributed OpenSSL package. That one line of code was responsible for causing an uninitialized-data warning in Valgrind. It also seeded the random number generator used by OpenSSL. Without it, the warning went away, but the effective keyspace on affected systems collapsed to about 2^15 possibilities, since the process ID was the only entropy left. Oh noes!
A large majority of Debian and Ubuntu systems are affected. To correct the problem, you'll need to not only update OpenSSL, but also revoke and replace any cryptographic keys and certificates that were generated on the affected systems. From the Debian security advisory:
Affected keys include SSH keys, OpenVPN keys, DNSSEC keys, and key material for use in X.509 certificates and session keys used in SSL/TLS connections. Keys generated with GnuPG or GNUTLS are not affected, though.
For most people, this boils down to your ssh server's host key and any public key pairs used for remote ssh authentication. Any keys or certificates generated on the affected machines for SSL/https use also need to be revoked and regenerated. It's pretty ugly, really.
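On an affected box, the cleanup looks roughly like this. It's a sketch, assuming a stock Debian or Ubuntu install; the ssh-vulnkey audit tool arrives with the updated openssh packages, so run the upgrade first:
apt-get update && apt-get upgrade      # pull in the fixed openssl/openssh packages
rm /etc/ssh/ssh_host_*key*             # discard the weak host keys
dpkg-reconfigure openssh-server        # regenerate host keys with the fixed RNG
ssh-vulnkey -a                         # audit host and user keys for known-weak material
Any SSL certificates signed from keys generated on the box still need to be revoked and reissued by hand.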
As far as teachable moments go, there's probably a lot to think about here. Software developers have this weird natural tendency to want to fix and reengineer things that aren't even broken. I'd go so far as to say that the desire to reengineer is inversely proportional to a programmer's familiarity and understanding of the code. I think it comes from our intense desire to make sense of things. It's the guru who's able to channel that hacker urge into solving new problems instead of creating new bugs out of old solutions.
DSA-1571-1 openssl -- predictable random number generator
OpenSSL PRNG Debian Toys (more discussion of the problem here)
Posted by Jason Striegel | May 14, 2008 07:57 PM | Cryptography, Linux, Linux Desktop, Linux Server, Ubuntu
October 18, 2007
Remote snapshot backups with rsync and Samba
Thanassis Tsiodras writes:
What would you do if you had to automatically backup a remote Linux box (e.g. your web server), and all you had locally was Windows machines? How about this:
- automatically expanding local storage space
- transmissions of differences only
- automatic scheduling
- local storage of differences only
- secure and compressed transfer of remote data and
- instant filesystem navigation inside daily snapshot images
I covered all these requirements using open source tools, and I now locally back up our 3GB remote server in less than 2 minutes!
We've all used Samba and rsync before, but Thanassis has really put all the pieces together into a complete backup system that's superior to a lot of commercial products I've seen.
The really impressive bit is how he's easily doing snapshot images using filesystem hardlinks. You can save several days worth of snapshots at very little cost because additional space is only taken up by files that have changed. Using hardlinks, identical files from different snapshots all point to the same inode.
root# mount /dev/loop0 /mnt/backup
root# cd /mnt/backup
root# rm -rf OneBeforeLast
root# cp -al LastBackup OneBeforeLast
root# cd LastBackup
root# rsync -avz --delete root@hosting.machine.in.US:/ ./
The "cp -al" creates a zero-cost copy of the data (using hardlinks, the only price paid is the one of the directory entries, and ReiserFS is well known for its ability to store these extremely efficiently). Then, rsync is executed with the --delete option, meaning that it must remove from our local mirror all the files that were removed on the server, thus creating an accurate image of the current state.
And here's the icing on the cake: The data inside these files are not lost! They are still accessible from the OneBeforeLast/ directory, since hard links (the old directory entries) are pointing to them!
In plain terms, simple navigation inside OneBeforeLast can be used to examine the exact contents of the server as they were BEFORE the last mirroring.
Just imagine the data recovery headaches you could solve by adapting that to a cron job that shuffles a month's worth of nightly backups.
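As a hedged sketch of what that might look like (the paths, snapshot names, and schedule are all invented for the example), a small rotation script plus one crontab line would do it:
#!/bin/sh
# rotate a month of hardlinked snapshots, then refresh the live mirror
cd /mnt/backup || exit 1
rm -rf snapshot.30
i=29
while [ $i -ge 1 ]; do
    [ -d snapshot.$i ] && mv snapshot.$i snapshot.$((i+1))
    i=$((i-1))
done
cp -al LastBackup snapshot.1
cd LastBackup && rsync -avz --delete root@hosting.machine.in.US:/ ./

# crontab entry to run it nightly at 02:30:
# 30 2 * * * /usr/local/bin/rotate-snapshots.sh
Because each snapshot is just a tree of hardlinks, keeping thirty of them costs little more than the churn between nights.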
Optimal remote Linux backups with rsync over Samba - Link
Posted by Jason Striegel | Oct 18, 2007 10:17 PM | Linux, Linux Server, Windows, Windows Server