Tuesday, March 30, 2010

Using egrep and sed to catch lines on configuration files

Unix/Linux systems has a wonderful way of extracting files from configs that you normally access and modify. One such example would be to use the egrep tool to perform immediate look-up of configs that you check. The goal here is simple to present to you lines which can be viewed readily.

Older unix guys will use sed to perform the job like this:

$sed -e '/^[[:space:]]*#/d;/^[[:space:]]*$/d' file.conf

For egrep the command would be:

$egrep -v '^#' file.conf

Sunday, March 28, 2010

Horizontal Scaling -- To scale or Not?

NoSQL is a fast, portable, relational database management system without arbitrary limits, (other than memory and processor speed) that runs under, and interacts with, the UNIX¹ Operating System. It uses the "Operator-Stream Paradigm" described in "Unix Review", March, 1991, page 24, entitled "A 4GL Language". There are a number of "operators" that each perform a unique function on the data. The "stream" is supplied by the UNIX Input/Output redirection mechanism. Therefore each operator processes some data and then passes it along to the next operator via the UNIX pipe function. This is very efficient as UNIX pipes are implemented in memory. NoSQL is compliant with the "Relational Model". -- http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page

Most web architects and developers who manage high volume website experience deadlock when the site can no longer scale up the demand of connections it can accommodate. For systems administrators the task are even more daunting. There are a few items, I will list here that can help you better understand horizontal scaling strategies at least on the systems administrators point of view.

I. Web Servers

Understanding the core components that run your site is crucial in understanding how you can scale-up. Apache web servers run most of the most busy website you will find around, but the increasing number of apache run web-site are slowly but surely being challenged by an even more light-weight and faster web server! Nginx -- to the rescue. Nginx was taunted as the next generation of web servers that will run the most busiest sites on the web, right now nginx powers a handful of sites found on the net. With a bit of tinkering, you can marry nginx technology with apache a sort of a high-breed architecture that marries the best of both world. A good example of this implementation can be seen with most e-commerce websites where it has successfully married both technology to scale-up connections to the web-server wisely.

II. Databases

Regardless of what database system you use it is important that the technology can keep up with your sites requirements. Most websites found on the internet today uses MySQL to power there sites, but in certain instances mysql faces a greater challenge when demands grow to levels that mysql was originally intended to. The option was to move to real enterprise class databases systems like Oracle. Planning is crucial in understanding how you can architect your system to perform beyond what is expected in cases where the database can no longer accommodate such hight demands the problem may lie on the a key design constraint that wasn't there when you designed your database. Moving the read and write function to separate containers (dbs) will enhance speed. I know of some implementations where database administrators implemented sphinx technology to optimize indexing of data entries there making key searches available immediately when needed. This is transparent to the systems since the application will be able to talk this out with the sphinx engine which greatly increases the seek time by several hundredths of second. To learn of these technology marriages you will need to look for items that preclude more that just your technical expertise a good read would be to start on the sphinx website itself and see if the technology can help you out with your requirement ..... to be continued...

Thursday, March 18, 2010

XYMON Custom Graphs for your Hobbit Server

If you happen to wonder how you can make xymon graphs more customized and you have used hobbit in the past. Then this short write-up is for you. The material here again was taken from part of the gazillions of documents that has been posted on the net and I would like to emphasize on that availability since such wonderful how-tos needs a new home and reference. Read on.

How to setup custom graphs

This document walks you through the setup of custom graphs in your Xymon installation. Although Xymon comes with pre-defined setups for a lot of common types of graphs, it is also extensible allowing you to add your own tests. For many kinds of tests, it is nice to view them over a period of time in a graph - this document tells you how to do that.

Make a script to collect the data

First create your test data. Typically, this is an extension script that sends in some data to Xymon, using a status or data command. If you use status, it will show up as a separate column on the display, with a green/yellow/red color that can trigger alerts. If you use data, Xymon just collects the data into a graph - you must go to the trends column to see the graph. For this example, we'll use status.
So we create an extension script. Here is an example script; it picks two numbers out of the Linux kernel's memory statistics, and reports these to hobbit.


 #!/bin/sh

 cat /proc/slabinfo | \
    egrep "^dentry_cache|^inode_cache" | \
       awk '{print $1 " : " $3*$4}' >/tmp/slab.txt

 $BB $BBDISP "status $MACHINE.slab green `date`

 `cat /tmp/slab.txt`
 "

 exit 0

Get hobbitlaunch to run the script

Save this script in ~hobbit/client/ext/slab, and add a section to the ~hobbit/client/etc/clientlaunch.cfg to run it every 5 minutes:


 [slabinfo]
         ENVFILE /usr/lib/hobbit/client/etc/hobbitclient.cfg
         CMD /usr/lib/hobbit/client/ext/slab
  INTERVAL 5m

(On the Xymon server itself, you must add this to the file ~hobbit/server/etc/hobbitlaunch.cfg)

Check that the script data arrives in Xymon

After a few minutes, a slab column should appear on your Xymon view of this host, with the data it reports. The output looks like this:


 Sun Nov 20 09:03:44 CET 2005

 inode_cache : 330624
 dentry_cache : 40891068

Arrange for the data to be collected into an RRD file

This is obviously a name-colon-value formatted report, so we'll use the NCV module in Xymon to handle it. Xymon will find two datasets here: The first will be called inodecache, and the second dentrycache (note that Xymon strips off any part of the name that is not a letter or a number; Xymon also limits the length of the dataset name to 19 letters max. since RRD will not handle longer names). To enable this, on the Xymon server edit the ~hobbit/server/etc/hobbitserver.cfg file. The TEST2RRD setting defines how Xymon tests (status columns) map to RRD datafiles. So you add the new test to this setting, by adding slab=ncv at the end:


TEST2RRD="cpu=la,disk,<...lots more stuff...>,hobbitd,mysql=ncv,slab=ncv"

slab is the status column name, and =ncv is a token that tells Xymon to send these data through the built-in NCV module.
By default, the Xymon NCV module expects data to be some sort of counter, e.g. number of bytes sent over a network - it uses the RRD DERIVE datatype by default, which is for data that is continuously increasing in value. Some data are not like that - the data in our test script is not - and for those data you'll have to make an extra setting to tell Xymon what RRD data type to use. The RRDtool rrdcreate(1) man-page has a detailed description of the various RRD datatypes. It is available online at http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/doc/rrdcreate.en.html
Our test script provides data that goes up and down in value (it is the number of bytes of memory used for a Linux kernel bufffer), and for that kind of data we'll use the RRD GAUGE datatype. So we add an extra setting to hobbitserver.cfg:


 NCV_slab="inodecache:GAUGE,dentrycache:GAUGE"

This tells the hobbitd_rrd module that it should create an RRD file with two datasets of type GAUGE instead of the default (DERIVE). The setting must be named NCV_.
The hobbitserver.cfg file is not reloaded automatically, so you must restart Xymon after making these changes. Or at least, kill the hobbitd_rrd processes (there are usually two) - hobbitlaunch will automatically restart them, and they will then pick up the new settings.

Check that the RRD collects data

The next time the slab status is updated, Xymon will begin to collect the data. You can check this by looking for the slab.rrd file in the ~hobbit/data/rrd/HOSTNAME/ directory. If you want to check the data it collects, the rrdtool dump ~hobbit/data/rrd/HOSTNAME/slab.rrd will tell you what it got:


 
 
   0001 
   300  
   1132474725  

  
    inodecache 
RRD datatype------>  GAUGE 
    600 
    0.0000000000e+00 
    NaN 

   
current value----->  330624 
    0.0000000000e+00 
    0

If you go and look at the status page for the slab column, you should not see any graph yet, but a link to hobbit graph ncv:slab. One final step is missing.

Setup a graph definition

The final step is to tell Xymon how to create a graph from the data in the RRD file. This is done in the ~hobbit/server/etc/hobbitgraph.cfg file.


 [slab]
  TITLE Slab info
  YAXIS Bytes
  DEF:inode=slab.rrd:inodecache:AVERAGE
  DEF:dentry=slab.rrd:dentrycache:AVERAGE
  LINE2:inode#00CCCC:Inode cache
  LINE2:dentry#FF0000:Dentry cache
  COMMENT:\n
  GPRINT:inode:LAST:Inode cache \: %5.1lf%s (cur)
  GPRINT:inode:MAX: \: %5.1lf%s (max)
  GPRINT:inode:MIN: \: %5.1lf%s (min)
  GPRINT:inode:AVERAGE: \: %5.1lf%s (avg)\n
  GPRINT:dentry:LAST:Dentry cache\: %5.1lf%s (cur)
  GPRINT:dentry:MAX: \: %5.1lf%s (max)
  GPRINT:dentry:MIN: \: %5.1lf%s (min)
  GPRINT:dentry:AVERAGE: \: %5.1lf%s (avg)\n

[slab] is the name of this graph, and it must match the name of your status column if you want the graph to appear together with the status. The TITLE and YAXIS settings define the graph title and the legend on the Y-axis. The rest are definitions for the rrdgraph(1) tool - you should read the RRDtool docs if you want to know in detail how it works. For now, all you need to know is that you must pick out the data you want from the RRD file with a DEF line, like


  DEF:inode=slab.rrd:inodecache:AVERAGE

which gives you an "inode" definition that has the value from the inodecache dataset in the slab.rrd file. This is then used to draw a line on the graph:


  LINE2:inode#00CCCC:Inode cache

The line gets the color #00CCCC (red-green-blue), which is a light greenish-blue color. Note that you can have several lines in one graph, if it makes sense to compare them. You can also use other types of visual effects, e.g. stack values on top of each other (like the vmstat graphs do) - this is described in the rrdgraph man-page. An online version is at http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/doc/rrdgraph.en.html. The GPRINT lines at the end of the graph definition also uses the inode value to print a summary line showing the current, maximum, minimum and average values from the data that has been collected.
Once you have added this section to hobbitgraph.cfg, refresh the status page in your browser, and the graph should show up.

Add the graph to the collection of graphs on the trends column

If you want the graph included with the other graphs on the trends column, you must add it to the GRAPHS setting in the ~hobbit/server/etc/hobbitserver.cfg file.


 GRAPHS="la,disk,<... lots more ...>,bbproxy,hobbitd,slab"

Save the file, and when you click on the trends column you should see the slab graph at the bottom of the page.

Common problems and pitfalls

If your graph nearly always shows 0

You probably used the wrong RRD datatype for your data - see step 4. By default, the RRD file expects data that is increasing constantly; if you are tracking some data that just varies up and down, you must use the RRD GAUGE datatype. Note that when you change the RRD datatype, you must delete any existing RRD files - the RRD datatype is defined when the RRD file is created, and cannot be changed on the fly.

No graph on the status page, but OK on the trends page

Make sure you have ncv listed in the GRAPHS setting in hobbitserver.cfg. (Don't ask why - just take my word that it must be there).

reference: http://www.hswn.dk/hobbit/help/howtograph.html

Tuesday, March 16, 2010

SVN Migration -- V2P

What if... you have a virtual server that is no longer scaling to your current demands? Meaning the host server has reach its maximum and you need more power to the server that you are running on that machine. The next logical thing to do will be to move out from a virtual environment to a physical one.

In this section I will give you insights on how you will be able to move your current server configs without the to much hasstle in searching the net. The technology involved here are three:

1. Linux/Unix -- system to migrate
2. VMware Server centric guest -- the server to migrate
3. SVN -- service that needs to be migrated

Preparing the new box

Regardless of which distribution you use the steps listed here will definitely work well with you current need. Assuming that you don't have any other bizzare configuration in place aside from those I mentioned earlier.

Step 1 Install the OS in my case I used CentOS 5.4 release

Install necessary packages if you used the base installation

$ sudo yum install mod_dav_svn subversion subversion-tools

This will create the necessary files needed to configure subversion on this new server.

Now you need to create the new svn repository

$ sudo mkdir //svn
$ sudo svnadmin create //svn/repos

Create a dump of your current svn repo from your existing virtual server (old one)

$ sudo svnadmin dump   //repos > dumpfile
This will effectively dump your current svn repos to a single file, depending on the size it will take awhile before you can move this dump to the new server.

Once you are done with the dump, scp this file to the new server

$ scp   dumpfile somename@someip:/somehome/

Step 2 Migrate the configs

$ mkdir svn_configs
$ cp /etc/svn-acl-conf svn_configs/
$ cp /etc/svn-auth-conf   svn_configs/
$ cp /etc/httpd/conf.d/subversion.conf svn_configs/
$ tar cvjf svn_configs.tar.gz ///
$ scp svn_configs.tar.gz someuser@some_ip:/somehome/

From your new box untar the scp'd file and copy them to the desired locations

$ tar xvf svn.tar.gz
$ cd svn_configs
$ sudo cp svn-acl-conf /etc
$ sudo cp svn-auth-conf /etc
$ sudo cp subversion.conf /etc/httpd/conf.d/ (replace the one created by svn installation)

Now you need to load your old repos to the new one.

$sudo svnadmin load   //repos < dumpfile
$sudo svnadmin verify /somelocation/repos
$sudo svnadmin recover /somelocation/repos

Step 3 Test your configuration

The easiest way test if you have successfully dumped your repo will be to check this with your browser.

https://some_ip_address/repos

You should see a prompt asking for your username and password. If you see the repos directory listing your done!

Next would be to test if you could export and commit to this repository, something that is out of scope for this write-up.

Cheers!!!

Saturday, March 13, 2010

FIREWALL: The APF Guide

As much as I would like to say that NIX systems are way too far secured than most systems. But apparently they're not! Crackers love the challenge of breaking into a system and they favor NIXES because of its well known notoriety for security. In this article, again I will be sharing with you a snippet of one excellent post from the net which gives credible insight on how to set-up an intelligent firewall for you box.

What is APF (Advanced Policy Firewall)? APF FirewallAPF is a policy based iptables firewall system designed for ease of use and configuration. It employs a subset of features to satisfy the veteran Linux user and the novice alike. Packaged in tar.gz format and RPM formats, make APF ideal for deployment in many server environments based on Linux. APF is developed and maintained by R-fx Networks: http://www.rfxnetworks.com/apf.php

This guide will show you how to install and configure APF firewall, one of the better known Linux firewalls available.10

Limit SSH connections to one IP with APF in this advanced tutorial
Requirements:
- Root SSH access to your server

Lets begin!Login to your server through SSH and su to the root user.

1. cd /root/downloads or another temporary folder where you store your files.

2. wget http://www.rfxnetworks.com/downloads/apf-current.tar.gz

3. tar -xvzf apf-current.tar.gz

4. cd apf-0.9.5-1/ or whatever the latest version is.

5. Run the install file: ./install.shYou will receive a message saying it has been installed

Installing APF 0.9.5-1: Completed.
Installation Details:
Install path:         /etc/apf/
Config path:          /etc/apf/conf.apf
Executable path:      /usr/local/sbin/apf
AntiDos install path: /etc/apf/ad/
AntiDos config path: /etc/apf/ad/conf.antidos
DShield Client Parser: /etc/apf/extras/dshield/
Other Details:
Listening TCP ports: 1,21,22,25,53,80,110,111,143,443,465,993,995,2082,2083,2086,2087,2095,2096,3306
Listening UDP ports: 53,55880
Note: These ports are not auto-configured; they are simply presented for information purposes. You must manually configure all port options.

6. Lets configure the firewall: pico /etc/apf/conf.apfWe will go over the general configuration to get your firewall running. This isn't a complete detailed guide of every feature the firewall has. Look through the README and the configuration for an explanation of each feature.
We like to use DShield.org's "block" list of top networks that have exhibited
suspicious activity.
FIND: USE_DS="0"CHANGE TO: USE_DS="1"

7. Configuring Firewall Ports:

Cpanel ServersWe like to use the following on our Cpanel Servers

Common ingress (inbound) ports# Common ingress (inbound) TCP ports -3000_3500 = passive port range for Pure FTPD
IG_TCP_CPORTS="21,22,25,53,80,110,143,443,2082,2083, 2086,2087, 2095, 2096,3000_3500"
#
# Common ingress (inbound) UDP ports
IG_UDP_CPORTS="53"

Common egress (outbound) ports
# Egress filtering [0 = Disabled / 1 = Enabled]
EGF="1"

# Common egress (outbound) TCP ports
EG_TCP_CPORTS="21,25,80,443,43,2089"
#
# Common egress (outbound) UDP ports
EG_UDP_CPORTS="20,21,53"

Ensim Servers
We have found the following can be used on Ensim Servers - although we have not tried these ourselves as I don't run Ensim boxes.

Common ingress (inbound) ports# Common ingress (inbound) TCP ports
IG_TCP_CPORTS="21,22,25,53,80,110,143,443,19638"
#
# Common ingress (inbound) UDP ports
IG_UDP_CPORTS="53"

Common egress (outbound) ports
# Egress filtering [0 = Disabled / 1 = Enabled]
EGF="1"

# Common egress (outbound) TCP ports
EG_TCP_CPORTS="21,25,80,443,43"
#
# Common egress (outbound) UDP ports
EG_UDP_CPORTS="20,21,53"

Save the changes: Ctrl+X then Y

8. Starting the firewall
/usr/local/sbin/apf -s

Other commands:
usage ./apf [OPTION]
-s|--start ......................... load firewall policies
-r|--restart ....................... flush & load firewall
-f|--flush|--stop .................. flush firewall
-l|--list .......................... list chain rules
-st|--status ....................... firewall status
-a HOST CMT|--allow HOST COMMENT ... add host (IP/FQDN) to allow_hosts.rules and
                                     immediately load new rule into firewall
-d HOST CMT|--deny HOST COMMENT .... add host (IP/FQDN) to deny_hosts.rules and
                                     immediately load new rule into firewall

9. After everything is fine, change the DEV option
Stop the firewall from automatically clearing itself every 5 minutes from cron.
We recommend changing this back to "0" after you've had a chance to ensure everything is working well and tested the server out.

pico /etc/apf/conf.apf
FIND: DEVM="1"
CHANGE TO: DEVM="0"
10. Configure AntiDOS for APFRelatively new to APF is the new AntiDOS feature which can be found in: /etc/apf/ad
The log file will be located at /var/log/apfados_log so you might want to make note of it and watch it!

pico /etc/apf/ad/conf.antidos

There are various things you might want to fiddle with but I'll get the ones that will alert you by email.
# [E-Mail Alerts]
Under this heading we have the following:

# Organization name to display on outgoing alert emails
CONAME="Your Company"
Enter your company information name or server name..

# Send out user defined attack alerts [0=off,1=on]
USR_ALERT="0"
Change this to 1 to get email alerts

# User for alerts to be mailed to
USR="your@email.com"
Enter your email address to receive the alerts

Save your changes! Ctrl+X then press Y
Restart the firewall: /usr/local/sbin/apf -r

11. Checking the APF Log

Will show any changes to allow and deny hosts among other things.
tail -f /var/log/apf_log

Example output:
Aug 23 01:25:55 ocean apf(31448): (insert) deny all to/from 185.14.157.123
Aug 23 01:39:43 ocean apf(32172): (insert) allow all to/from 185.14.157.123

12. New - Make APF Start automatically at boot timeTo autostart apf on reboot, run this:
chkconfig --level 2345 apf on
To remove it from autostart, run this:
chkconfig --del apf

13. Denying IPs with APF Firewall (Blocking)
Now that you have your shiny new firewall you probably want to block a host right, of course you do! With this new version APF now supports comments as well. There are a few ways you can block an IP, I'll show you 2 of the easier methods.
A) /etc/apf/apf -d IPHERE COMMENTHERENOSPACES
> The -d flag means DENY the IP address
> IPHERE is the IP address you wish to block
> COMMENTSHERENOSPACES is obvious, add comments to why the IP is being blocked
These rules are loaded right away into the firewall, so they're instantly active.
Example:

./apf -d 185.14.157.123 TESTING

pico /etc/apf/deny_hosts.rules

Shows the following:

# added 185.14.157.123 on 08/23/05 01:25:55
# TESTING
185.14.157.123
B) pico /etc/apf/deny_hosts.rules
You can then just add a new line and enter the IP you wish to block. Before this becomes active though you'll need to reload the APF ruleset.

/etc/apf/apf -r

14. Allowing IPs with APF Firewall (Unblocking)

I know I know, you added an IP now you need it removed right away! You need to manually remove IPs that are blocked from deny_hosts.rules.
A)
pico /etc/apf/deny_hosts.rules

Find where the IP is listed and remove the line that has the IP.
After this is done save the file and reload apf to make the new changes active.

/etc/apf/apf -r

B) If the IP isn't already listed in deny_hosts.rules and you wish to allow it, this method adds the entry to allow_hosts.rules

/etc/apf/apf -a IPHERE COMMENTHERENOSPACES
> The -a flag means ALLOW the IP address
> IPHERE is the IP address you wish to allow
> COMMENTSHERENOSPACES is obvious, add comments to why the IP is being removed These rules are loaded right away into the firewall, so they're instantly active.
Example:

./apf -a 185.14.157.123 UNBLOCKING

pico /etc/apf/allow_hosts.rules

# added 185.14.157.123 on 08/23/05 01:39:43
# UNBLOCKING
185.14.157.123

reference: http://www.webhostgear.com/61_print.html

SVN Hoopla: UNIX / LINUX

There are alot of excellent materials out there that does an wonderful way of explaining things about SVN. However, I came across one site that explains in granularity the core administration hooplas in SVN. This post hopes to re-enforce that write-up. Enough of the subtle introduction lets get on with it.

Repository Maintenance

Maintaining a Subversion repository can be a daunting task, mostly due to the complexities inherent in systems which have a database backend. Doing the task well is all about knowing the tools—what they are, when to use them, and how to use them. This section will introduce you to the repository administration tools provided by Subversion, and how to wield them to accomplish tasks such as repository migrations, upgrades, backups and cleanups.

An Administrator's Toolkit

Subversion provides a handful of utilities useful for creating, inspecting, modifying and repairing your repository. Let's look more closely at each of those tools. Afterward, we'll briefly examine some of the utilities included in the Berkeley DB distribution that provide functionality specific to your repository's database backend not otherwise provided by Subversion's own tools.

svnlook

svnlook is a tool provided by Subversion for examining the various revisions and transactions in a repository. No part of this program attempts to change the repository—it's a “read-only” tool. svnlook is typically used by the repository hooks for reporting the changes that are about to be committed (in the case of the pre-commit hook) or that were just committed (in the case of the post-commit hook) to the repository. A repository administrator may use this tool for diagnostic purposes.
svnlook has a straightforward syntax:

$ svnlook help
general usage: svnlook SUBCOMMAND REPOS_PATH [ARGS & OPTIONS ...]
Note: any subcommand which takes the '--revision' and '--transaction'
      options will, if invoked without one of those options, act on
      the repository's youngest revision.
Type "svnlook help " for help on a specific subcommand.
…

Nearly every one of svnlook's subcommands can operate on either a revision or a transaction tree, printing information about the tree itself, or how it differs from the previous revision of the repository. You use the --revision and --transaction options to specify which revision or transaction, respectively, to examine. Note that while revision numbers appear as natural numbers, transaction names are alphanumeric strings. Keep in mind that the filesystem only allows browsing of uncommitted transactions (transactions that have not resulted in a new revision). Most repositories will have no such transactions, because transactions are usually either committed (which disqualifies them from viewing) or aborted and removed.
In the absence of both the --revision and --transaction options, svnlook will examine the youngest (or “HEAD”) revision in the repository. So the following two commands do exactly the same thing when 19 is the youngest revision in the repository located at /path/to/repos:

$ svnlook info /path/to/repos
$ svnlook info /path/to/repos --revision 19

The only exception to these rules about subcommands is the svnlook youngest subcommand, which takes no options, and simply prints out the HEAD revision number.

$ svnlook youngest /path/to/repos
19

Output from svnlook is designed to be both human- and machine-parsable. Take as an example the output of the info subcommand:

$ svnlook info /path/to/repos
sally
2002-11-04 09:29:13 -0600 (Mon, 04 Nov 2002)
27
Added the usual
Greek tree.

The output of the info subcommand is defined as:

The author, followed by a newline.
The date, followed by a newline.
The number of characters in the log message, followed by a newline.
The log message itself, followed by a newline.

This output is human-readable, meaning items like the datestamp are displayed using a textual representation instead of something more obscure (such as the number of nanoseconds since the Tasty Freeze guy drove by). But this output is also machine-parsable—because the log message can contain multiple lines and be unbounded in length, svnlook provides the length of that message before the message itself. This allows scripts and other wrappers around this command to make intelligent decisions about the log message, such as how much memory to allocate for the message, or at least how many bytes to skip in the event that this output is not the last bit of data in the stream.
Another common use of svnlook is to actually view the contents of a revision or transaction tree. The svnlook tree command displays the directories and files in the requested tree. If you supply the --show-ids option, it will also show the filesystem node revision IDs for each of those paths (which is generally of more use to developers than to users).

$ svnlook tree /path/to/repos --show-ids
/ <0.0.1>
 A/ <2.0.1>
  B/ <4.0.1>
   lambda <5.0.1>
   E/ <6.0.1>
    alpha <7.0.1>
    beta <8.0.1>
   F/ <9.0.1>
  mu <3.0.1>
  C/ 
  D/ 
   gamma 
   G/ 
    pi 
    rho 
    tau 
   H/ 
    chi 
    omega 
    psi 
 iota <1.0.1>

Once you've seen the layout of directories and files in your tree, you can use commands like svnlook cat, svnlook propget, and svnlook proplist to dig into the details of those files and directories.
svnlook can perform a variety of other queries, displaying subsets of bits of information we've mentioned previously, reporting which paths were modified in a given revision or transaction, showing textual and property differences made to files and directories, and so on. The following is a brief description of the current list of subcommands accepted by svnlook, and the output of those subcommands:

author: Print the tree's author.
cat: Print the contents of a file in the tree.
changed: List all files and directories that changed in the tree.
date: Print the tree's datestamp.
diff: Print unified diffs of changed files.
dirs-changed: List the directories in the tree that were themselves changed, or whose file children were changed.
history: Display interesting points in the history of a versioned path (places where modifications or copies occurred).
info: Print the tree's author, datestamp, log message character count, and log message.
log: Print the tree's log message.
propget: Print the value of a property on a path in the tree.
proplist: Print the names and values of properties set on paths in the tree.
tree: Print the tree listing, optionally revealing the filesystem node revision IDs associated with each path.
uuid: Print the tree's unique user ID (UUID).
youngest: Print the youngest revision number.

svnadmin

The svnadmin program is the repository administrator's best friend. Besides providing the ability to create Subversion repositories, this program allows you to perform several maintenance operations on those repositories. The syntax of svnadmin is similar to that of svnlook:

$ svnadmin help
general usage: svnadmin SUBCOMMAND REPOS_PATH  [ARGS & OPTIONS ...]
Type "svnadmin help " for help on a specific subcommand.

Available subcommands:
   create
   deltify
   dump
   help (?, h)
…

We've already mentioned svnadmin's create subcommand (see the section called “Repository Creation and Configuration”). Most of the others we will cover in more detail later in this chapter. For now, let's just take a quick glance at what each of the available subcommands offers.

create: Create a new Subversion repository.
deltify: Run over a specified revision range, performing predecessor deltification on the paths changed in those revisions. If no revisions are specified, this command will simply deltify the HEAD revision.
dump: Dump the contents of the repository, bounded by a given set of revisions, using a portable dump format.
hotcopy: Make a hot copy of a repository. You can run this command at any time and make a safe copy of the repository, regardless if other processes are using the repository.
list-dblogs: List the paths of Berkeley DB log files associated with the repository. This list includes all log files—those still in use by Subversion, as well as those no longer in use.
list-unused-dblogs: List the paths of Berkeley DB log files associated with, but no longer used by, the repository. You may safely remove these log files from the repository layout, possibly archiving them for use in the event that you ever need to perform a catastrophic recovery of the repository.
load: Load a set of revisions into a repository from a stream of data that uses the same portable dump format generated by the dump subcommand.
lstxns: List the names of uncommitted Subversion transactions that currently exist in the repository.
recover: Perform recovery steps on a repository that is in need of such, generally after a fatal error has occurred that prevented a process from cleanly shutting down its communication with the repository.
rmtxns: Cleanly remove Subversion transactions from the repository (conveniently fed by output from the lstxns subcommand).
setlog: Replace the current value of the svn:log (commit log message) property on a given revision in the repository with a new value.
verify: Verify the contents of the repository. This includes, among other things, checksum comparisons of the versioned data stored in the repository.

svndumpfilter

Since Subversion stores everything in an opaque database system, attempting manual tweaks is unwise, if not quite difficult. And once data has been stored in your repository, Subversion generally doesn't provide an easy way to remove that data. ^[13] But inevitably, there will be times when you would like to manipulate the history of your repository. You might need to strip out all instances of a file that was accidentally added to the repository (and shouldn't be there for whatever reason). Or, perhaps you have multiple projects sharing a single repository, and you decide to split them up into their own repositories. To accomplish tasks like this, administrators need a more manageable and malleable representation of the data in their repositories—the Subversion repository dump format.
The Subversion repository dump format is a human-readable representation of the changes that you've made to your versioned data over time. You use the svnadmin dump command to generate the dump data, and svnadmin load to populate a new repository with it (see the section called “Migrating a Repository”). The great thing about the human-readability aspect of the dump format is that, if you aren't careless about it, you can manually inspect and modify it. Of course, the downside is that if you have two years' worth of repository activity encapsulated in what is likely to be a very large dumpfile, it could take you a long, long time to manually inspect and modify it.
While it won't be the most commonly used tool at the administrator's disposal, svndumpfilter provides a very particular brand of useful functionality—the ability to quickly and easily modify that dumpfile data by acting as a path-based filter. Simply give it either a list of paths you wish to keep, or a list of paths you wish to not keep, then pipe your repository dump data through this filter. The result will be a modified stream of dump data that contains only the versioned paths you (explicitly or implicitly) requested.
The syntax of svndumpfilter is as follows:

$ svndumpfilter help
general usage: svndumpfilter SUBCOMMAND [ARGS & OPTIONS ...]
Type "svndumpfilter help " for help on a specific subcommand.

Available subcommands:
   exclude
   include
   help (?, h)

There are only two interesting subcommands. They allow you to make the choice between explicit or implicit inclusion of paths in the stream:

exclude: Filter out a set of paths from the dump data stream.
include: Allow only the requested set of paths to pass through the dump data stream.

Let's look a realistic example of how you might use this program. We discuss elsewhere (see the section called “Choosing a Repository Layout”) the process of deciding how to choose a layout for the data in your repositories—using one repository per project or combining them, arranging stuff within your repository, and so on. But sometimes after new revisions start flying in, you rethink your layout and would like to make some changes. A common change is the decision to move multiple projects which are sharing a single repository into separate repositories for each project.
Our imaginary repository contains three projects: calc, calendar, and spreadsheet. They have been living side-by-side in a layout like this:

/
   calc/
      trunk/
      branches/
      tags/
   calendar/
      trunk/
      branches/
      tags/
   spreadsheet/
      trunk/
      branches/
      tags/

To get these three projects into their own repositories, we first make a dumpfile of the whole repository:

$ svnadmin dump /path/to/repos > repos-dumpfile
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
* Dumped revision 3.
…
$

Next, run that dumpfile through the filter, each time including only one of our top-level directories, and resulting in three new dumpfiles:

$ cat repos-dumpfile | svndumpfilter include calc > calc-dumpfile
…
$ cat repos-dumpfile | svndumpfilter include calendar > cal-dumpfile
…
$ cat repos-dumpfile | svndumpfilter include spreadsheet > ss-dumpfile
…
$

At this point, you have to make a decision. Each of your dumpfiles will create a valid repository, but will preserve the paths exactly as they were in the original repository. This means that even though you would have a repository solely for your calc project, that repository would still have a top-level directory named calc. If you want your trunk, tags, and branches directories to live in the root of your repository, you might wish to edit your dumpfiles, tweaking the Node-path and Copyfrom-path headers to no longer have that first calc/ path component. Also, you'll want to remove the section of dump data that creates the calc directory. It will look something like:

Node-path: calc
Node-action: add
Node-kind: dir
Content-length: 0

All that remains now is to create your three new repositories, and load each dumpfile into the right repository:

$ svnadmin create calc; svnadmin load calc < calc-dumpfile
<<< Started new transaction, based on original revision 1
     * adding path : Makefile ... done.
     * adding path : button.c ... done.
…
$ svnadmin create calendar; svnadmin load calendar < cal-dumpfile
<<< Started new transaction, based on original revision 1
     * adding path : Makefile ... done.
     * adding path : cal.c ... done.
…
$ svnadmin create spreadsheet; svnadmin load spreadsheet < ss-dumpfile
<<< Started new transaction, based on original revision 1
     * adding path : Makefile ... done.
     * adding path : ss.c ... done.
…
$

Both of svndumpfilter's subcommands accept options for deciding how to deal with “empty” revisions. If a given revision contained only changes to paths that were filtered out, that now-empty revision could be considered uninteresting or even unwanted. So to give the user control over what to do with those revisions, svndumpfilter provides the following command-line options:

--drop-empty-revs: Do not generate empty revisions at all—just omit them.
--renumber-revs: If empty revisions are dropped (using the --drop-empty-revs option), change the revision numbers of the remaining revisions so that there are no gaps in the numeric sequence.
--preserve-revprops: If empty revisions are not dropped, preserve the revision properties (log message, author, date, custom properties, etc.) for those empty revisions. Otherwise, empty revisions will only contain the original datestamp, and a generated log message that indicates that this revision was emptied by svndumpfilter.

While svndumpfilter can be very useful, and a huge timesaver, there are unfortunately a couple of gotchas. First, this utility is overly sensitive to path semantics. Pay attention to whether paths in your dumpfile are specified with or without leading slashes. You'll want to look at the Node-path and Copyfrom-path headers.

…
Node-path: spreadsheet/Makefile
…

If the paths have leading slashes, you should include leading slashes in the paths you pass to svndumpfilter include and svndumpfilter exclude (and if they don't, you shouldn't). Further, if your dumpfile has an inconsistent usage of leading slashes for some reason, ^[14] you should probably normalize those paths so they all have, or lack, leading slashes.
Also, copied paths can give you some trouble. Subversion supports copy operations in the repository, where a new path is created by copying some already existing path. It is possible that at some point in the lifetime of your repository, you might have copied a file or directory from some location that svndumpfilter is excluding, to a location that it is including. In order to make the dump data self-sufficient, svndumpfilter needs to still show the addition of the new path—including the contents of any files created by the copy—and not represent that addition as a copy from a source that won't exist in your filtered dump data stream. But because the Subversion repository dump format only shows what was changed in each revision, the contents of the copy source might not be readily available. If you suspect that you have any copies of this sort in your repository, you might want to rethink your set of included/excluded paths.

svnshell.py

The Subversion source tree also comes with a shell-like interface to the repository. The svnshell.py Python script (located in tools/examples/ in the source tree) uses Subversion's language bindings (so you must have those properly compiled and installed in order for this script to work) to connect to the repository and filesystem libraries.
Once started, the program behaves similarly to a shell program, allowing you to browse the various directories in your repository. Initially, you are “positioned” in the root directory of the HEAD revision of the repository, and presented with a command prompt. You can use the help command at any time to display a list of available commands and what they do.

$ svnshell.py /path/to/repos
$  help
Available commands:
  cat FILE     : dump the contents of FILE
  cd DIR       : change the current working directory to DIR
  exit         : exit the shell
  ls [PATH]    : list the contents of the current directory
  lstxns       : list the transactions available for browsing
  setrev REV   : set the current revision to browse
  settxn TXN   : set the current transaction to browse
  youngest     : list the youngest browsable revision number
$

Navigating the directory structure of your repository is done in the same way you would navigate a regular Unix or Windows shell—using the cd command. At all times, the command prompt will show you what revision (prefixed by rev:) or transaction (prefixed by txn:) you are currently examining, and at what path location in that revision or transaction. You can change your current revision or transaction with the setrev and settxn commands, respectively. As in a Unix shell, you can use the ls command to display the contents of the current directory, and you can use the cat command to display the contents of a file.

Example 5.1. Using svnshell to Navigate the Repository

$ ls
   REV   AUTHOR  NODE-REV-ID     SIZE         DATE NAME
----------------------------------------------------------------------------
     1    sally <     2.0.1>          Nov 15 11:50 A/
     2    harry <     1.0.2>       56 Nov 19 08:19 iota
$ cd A
$ ls
   REV   AUTHOR  NODE-REV-ID     SIZE         DATE NAME
----------------------------------------------------------------------------
     1    sally <     4.0.1>          Nov 15 11:50 B/
     1    sally <     a.0.1>          Nov 15 11:50 C/
     1    sally <     b.0.1>          Nov 15 11:50 D/
     1    sally <     3.0.1>       23 Nov 15 11:50 mu
$ cd D/G 
$ ls
   REV   AUTHOR  NODE-REV-ID     SIZE         DATE NAME
----------------------------------------------------------------------------
     1    sally <     e.0.1>       23 Nov 15 11:50 pi
     1    sally <     f.0.1>       24 Nov 15 11:50 rho
     1    sally <     g.0.1>       24 Nov 15 11:50 tau
$ cd ../..
$ cat iota
This is the file 'iota'.
Added this text in revision 2.

$ setrev 1; cat iota
This is the file 'iota'.

$ exit
$

As you can see in the previous example, multiple commands may be specified at a single command prompt, separated by a semicolon. Also, the shell understands the notions of relative and absolute paths, and will properly handle the . and .. special path components.
The youngest command displays the youngest revision. This is useful for determining the range of valid revisions you can use as arguments to the setrev command—you are allowed to browse all the revisions (recalling that they are named with integers) between 0 and the youngest, inclusively. Determining the valid browsable transactions isn't quite as pretty. Use the lstxns command to list the transactions that you are able to browse. The list of browsable transactions is the same list that svnadmin lstxns returns, and the same list that is valid for use with svnlook's --transaction option.
Once you've finished using the shell, you can exit cleanly by using the exit command. Alternatively, you can supply an end-of-file character—Control-D (though some Win32 Python distributions use the Windows Control-Z convention instead).

Berkeley DB Utilities

All of your versioned filesystem's structure and data live in a set of Berkeley DB database tables within the db subdirectory of your repository. This subdirectory is a regular Berkeley DB environment directory, and can therefore be used in conjunction with any of the Berkeley database tools (you can see the documentation for these tools at SleepyCat's website, http://www.sleepycat.com/).
For day-to-day Subversion use, these tools are unnecessary. Most of the functionality typically needed for Subversion repositories has been duplicated in the svnadmin tool. For example, svnadmin list-unused-dblogs and svnadmin list-dblogs perform a subset of what is provided by the Berkeley db_archive command, and svnadmin recover reflects the common use-cases of the db_recover utility.
There are still a few Berkeley DB utilities that you might find useful. The db_dump and db_load programs write and read, respectively, a custom file format which describes the keys and values in a Berkeley DB database. Since Berkeley databases are not portable across machine architectures, this format is a useful way to transfer those databases from machine to machine, irrespective of architecture or operating system. Also, the db_stat utility can provide useful information about the status of your Berkeley DB environment, including detailed statistics about the locking and storage subsystems.

Repository Cleanup

Your Subversion repository will generally require very little attention once it is configured to your liking. However, there are times when some manual assistance from an administrator might be in order. The svnadmin utility provides some helpful functionality to assist you in performing such tasks as

modifying commit log messages,
removing dead transactions,
recovering “wedged” repositories, and
migrating repository contents to a different repository.

Perhaps the most commonly used of svnadmin's subcommands is setlog. When a transaction is committed to the repository and promoted to a revision, the descriptive log message associated with that new revision (and provided by the user) is stored as an unversioned property attached to the revision itself. In other words, the repository remembers only the latest value of the property, and discards previous ones.
Sometimes a user will have an error in her log message (a misspelling or some misinformation, perhaps). If the repository is configured (using the pre-revprop-change and post-revprop-change hooks; see the section called “Hook Scripts”) to accept changes to this log message after the commit is finished, then the user can “fix” her log message remotely using the svn program's propset command (see Chapter 9, Subversion Complete Reference). However, because of the potential to lose information forever, Subversion repositories are not, by default, configured to allow changes to unversioned properties—except by an administrator.
If a log message needs to be changed by an administrator, this can be done using svnadmin setlog. This command changes the log message (the svn:log property) on a given revision of a repository, reading the new value from a provided file.

$ echo "Here is the new, correct log message" > newlog.txt
$ svnadmin setlog myrepos newlog.txt -r 388

The svnadmin setlog command alone is still bound by the same protections against modifying unversioned properties as a remote client is—the pre- and post-revprop-change hooks are still triggered, and therefore must be setup to accept changes of this nature. But an administrator can get around these protections by passing the --bypass-hooks option to the svnadmin setlog command.

Warning

Remember, though, that by bypassing the hooks, you are likely avoiding such things as email notifications of property changes, backup systems which track unversioned property changes, and so on. In other words, be very careful about what you are changing, and how you change it.

Another common use of svnadmin is to query the repository for outstanding—possibly dead—Subversion transactions. In the event that a commit should fail, the transaction is usually cleaned up. That is, the transaction itself is removed from the repository, and any data associated with (and only with) that transaction is removed as well. Occasionally, though, a failure occurs in such a way that the cleanup of the transaction never happens. This could happen for several reasons: perhaps the client operation was inelegantly terminated by the user, or a network failure might have occurred in the middle of an operation, etc. Regardless of the reason, these dead transactions serve only to clutter the repository and consume resources.
You can use svnadmin's lstxns command to list the names of the currently outstanding transactions.

$ svnadmin lstxns myrepos
19
3a1
a45
$

Each item in the resultant output can then be used with svnlook (and its --transaction option) to determine who created the transaction, when it was created, what types of changes were made in the transaction—in other words, whether or not the transaction is a safe candidate for removal! If so, the transaction's name can be passed to svnadmin rmtxns, which will perform the cleanup of the transaction. In fact, the rmtxns subcommand can take its input directly from the output of lstxns!

$ svnadmin rmtxns myrepos `svnadmin lstxns myrepos`
$

If you use these two subcommands like this, you should consider making your repository temporarily inaccessible to clients. That way, no one can begin a legitimate transaction before you start your cleanup. The following is a little bit of shell-scripting that can quickly generate information about each outstanding transaction in your repository:

Example 5.2. txn-info.sh (Reporting Outstanding Transactions)

#!/bin/sh

### Generate informational output for all outstanding transactions in
### a Subversion repository.

SVNADMIN=/usr/local/bin/svnadmin
SVNLOOK=/usr/local/bin/svnlook

REPOS="${1}"
if [ "x$REPOS" = x ] ; then
  echo "usage: $0 REPOS_PATH"
  exit
fi

for TXN in `${SVNADMIN} lstxns ${REPOS}`; do 
  echo "---[ Transaction ${TXN} ]-------------------------------------------"
  ${SVNLOOK} info "${REPOS}" --transaction "${TXN}"
done

You can run the previous script using /path/to/txn-info.sh /path/to/repos. The output is basically a concatenation of several chunks of svnlook info output (see the section called “svnlook”), and will look something like:

$ txn-info.sh myrepos
---[ Transaction 19 ]-------------------------------------------
sally
2001-09-04 11:57:19 -0500 (Tue, 04 Sep 2001)
0
---[ Transaction 3a1 ]-------------------------------------------
harry
2001-09-10 16:50:30 -0500 (Mon, 10 Sep 2001)
39
Trying to commit over a faulty network.
---[ Transaction a45 ]-------------------------------------------
sally
2001-09-12 11:09:28 -0500 (Wed, 12 Sep 2001)
0
$

Usually, if you see a dead transaction that has no log message attached to it, this is the result of a failed update (or update-like) operation. These operations use Subversion transactions under the hood to mimic working copy state. Since they are never intended to be committed, Subversion doesn't require a log message for those transactions. Transactions that do have log messages attached are almost certainly failed commits of some sort. Also, a transaction's datestamp can provide interesting information—for example, how likely is it that an operation begun nine months ago is still active?
In short, transaction cleanup decisions need not be made unwisely. Various sources of information—including Apache's error and access logs, the logs of successful Subversion commits, and so on—can be employed in the decision-making process. Finally, an administrator can often simply communicate with a seemingly dead transaction's owner (via email, for example) to verify that the transaction is, in fact, in a zombie state.

Managing Disk Space

While the cost of storage has dropped incredibly in the past few years, disk usage is still a valid concern for administrators seeking to version large amounts of data. Every additional byte consumed by the live repository is a byte that needs to be backed up offsite, perhaps multiple times as part of rotating backup schedules. Since the primary storage mechanism of a Subversion repository is a complex database system, it is useful to know what pieces of data need to remain on the live site, which need to be backed up, and which can be safely removed.
Until recently, the largest offender of disk space usage with respect to Subversion repositories was the logfiles to which Berkeley DB performs its pre-writes before modifying the actual database files. These files capture all the actions taken along the route of changing the database from one state to another—while the database files reflect at any given time some state, the logfiles contain all the many changes along the way between states. As such, they can start to accumulate quite rapidly.
Fortunately, beginning with the 4.2 release of Berkeley DB, the database environment has the ability to remove its own unused logfiles without any external procedures. Any repositories created using an svnadmin which is compiled against Berkeley DB version 4.2 or greater will be configured for this automatic log file removal. If you don't want this feature enabled, simply pass the --bdb-log-keep option to the svnadmin create command. If you forget to do this, or change your mind at a later time, simple edit the DB_CONFIG file found in your repository's db directory, comment out the line which contains the set_flags DB_LOG_AUTOREMOVE directive, and then run svnadmin recover on your repository to force the configuration changes to take effect. See the section called “Berkeley DB Configuration” for more information about database configuration.
Without some sort of automatic log file removal in place, log files will accumulate as you use your repository. This is actually somewhat of a feature of the database system—you should be able to recreate your entire database using nothing but the log files, so these files can be useful for catastrophic database recovery. But typically, you'll want to archive the log files that are no longer in use by Berkeley DB, and then remove them from disk to conserve space. Use the svnadmin list-unused-dblogs command to list the unused logfiles:

$ svnadmin list-unused-dblogs /path/to/repos
/path/to/repos/log.0000000031
/path/to/repos/log.0000000032
/path/to/repos/log.0000000033

$ svnadmin list-unused-dblogs /path/to/repos | xargs rm
## disk space reclaimed!

To keep the size of the repository as small as possible, Subversion uses deltification (or, “deltified storage”) within the repository itself. Deltification involves encoding the representation of a chunk of data as a collection of differences against some other chunk of data. If the two pieces of data are very similar, this deltification results in storage savings for the deltified chunk—rather than taking up space equal to the size of the original data, it only takes up enough space to say, “I look just like this other piece of data over here, except for the following couple of changes.” Specifically, each time a new version of a file is committed to the repository, Subversion encodes the previous version (actually, several previous versions) as a delta against the new version. The result is that most of the repository data that tends to be sizable—namely, the contents of versioned files—is stored at a much smaller size than the original “fulltext” representation of that data.

Note

Because all of the Subversion repository data that is subject to deltification is stored in a single Berkeley DB database file, reducing the size of the stored values will not necessarily reduce the size of the database file itself. Berkeley DB will, however, keep internal records of unused areas of the database file, and use those areas first before growing the size of the database file. So while deltification doesn't produce immediate space savings, it can drastically slow future growth of the database.

Repository Recovery

In order to protect the data in your repository, the database back-end uses a locking mechanism. This mechanism ensures that portions of the database are not simultaneously modified by multiple database accessors, and that each process sees the data in the correct state when that data is being read from the database. When a process needs to change something in the database, it first checks for the existence of a lock on the target data. If the data is not locked, the process locks the data, makes the change it wants to make, and then unlocks the data. Other processes are forced to wait until that lock is removed before they are permitted to continue accessing that section of the database.
In the course of using your Subversion repository, fatal errors (such as running out of disk space or available memory) or interruptions can prevent a process from having the chance to remove the locks it has placed in the database. The result is that the back-end database system gets “wedged”. When this happens, any attempts to access the repository hang indefinitely (since each new accessor is waiting for a lock to go away—which isn't going to happen).
First, if this happens to your repository, don't panic. Subversion's filesystem takes advantage of database transactions and checkpoints and pre-write journaling to ensure that only the most catastrophic of events ^[15] can permanently destroy a database environment. A sufficiently paranoid repository administrator will be making off-site backups of the repository data in some fashion, but don't call your system administrator to restore a backup tape just yet.
Secondly, use the following recipe to attempt to “unwedge” your repository:

Make sure that there are no processes accessing (or attempting to access) the repository. For networked repositories, this means shutting down the Apache HTTP Server, too.
Become the user who owns and manages the repository. This is important, as recovering a repository while running as the wrong user can tweak the permissions of the repository's files in such a way that your repository will still be inaccessible even after it is “unwedged”.
Run the command svnadmin recover /path/to/repos. You should see output like this:
```
Please wait; recovering the repository may take some time...

Recovery completed.
The latest repos revision is 19.
```
This command may take many minutes to complete.
Restart the Subversion server.

This procedure fixes almost every case of repository lock-up. Make sure that you run this command as the user that owns and manages the database, not just as root. Part of the recovery process might involve recreating from scratch various database files (shared memory regions, for example). Recovering as root will create those files such that they are owned by root, which means that even after you restore connectivity to your repository, regular users will be unable to access it.
If the previous procedure, for some reason, does not successfully unwedge your repository, you should do two things. First, move your broken repository out of the way and restore your latest backup of it. Then, send an email to the Subversion user list (at <users@subversion.tigris.org>) describing your problem in detail. Data integrity is an extremely high priority to the Subversion developers.

Migrating a Repository

A Subversion filesystem has its data spread throughout various database tables in a fashion generally understood by (and of interest to) only the Subversion developers themselves. However, circumstances may arise that call for all, or some subset, of that data to be collected into a single, portable, flat file format. Subversion provides such a mechanism, implemented in a pair of svnadmin subcommands: dump and load.
The most common reason to dump and load a Subversion repository is due to changes in Subversion itself. As Subversion matures, there are times when certain changes made to the back-end database schema cause Subversion to be incompatible with previous versions of the repository. The recommended course of action when you are upgrading across one of those compatibility boundaries is a relatively simple process:

Using your current version of svnadmin, dump your repositories to dump files.
Upgrade to the new version of Subversion.
Move your old repositories out of the way, and create new empty ones in their place using your new svnadmin.
Again using your new svnadmin, load your dump files into their respective, just-created repositories.
Be sure to copy any customizations from your old repositories to the new ones, including DB_CONFIG files and hook scripts. You'll want to pay attention to the release notes for the new release of Subversion to see if any changes since your last upgrade affect those hooks or configuration options.
If the migration process made your repository accessible at a different URL (e.g. moved to a different computer, or is being accessed via a different schema), then you'll probably want to tell your users to run svn switch --relocate on their existing working copies. See svn switch.

svnadmin dump will output a range of repository revisions that are formatted using Subversion's custom filesystem dump format. The dump format is printed to the standard output stream, while informative messages are printed to the standard error stream. This allows you to redirect the output stream to a file while watching the status output in your terminal window. For example:

$ svnlook youngest myrepos
26
$ svnadmin dump myrepos > dumpfile
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
…
* Dumped revision 25.
* Dumped revision 26.

At the end of the process, you will have a single file (dumpfile in the previous example) that contains all the data stored in your repository in the requested range of revisions. Note that svnadmin dump is reading revision trees from the repository just like any other “reader” process would (svn checkout, for example.) So it's safe to run this command at any time.
The other subcommand in the pair, svnadmin load, parses the standard input stream as a Subversion repository dump file, and effectively replays those dumped revisions into the target repository for that operation. It also gives informative feedback, this time using the standard output stream:

$ svnadmin load newrepos < dumpfile
<<< Started new txn, based on original revision 1
     * adding path : A ... done.
     * adding path : A/B ... done.
     …
------- Committed new rev 1 (loaded from original rev 1) >>>

<<< Started new txn, based on original revision 2
     * editing path : A/mu ... done.
     * editing path : A/D/G/rho ... done.

------- Committed new rev 2 (loaded from original rev 2) >>>

…

<<< Started new txn, based on original revision 25
     * editing path : A/D/gamma ... done.

------- Committed new rev 25 (loaded from original rev 25) >>>

<<< Started new txn, based on original revision 26
     * adding path : A/Z/zeta ... done.
     * editing path : A/mu ... done.

------- Committed new rev 26 (loaded from original rev 26) >>>

Note that because svnadmin uses standard input and output streams for the repository dump and load process, people who are feeling especially saucy can try things like this (perhaps even using different versions of svnadmin on each side of the pipe):

$ svnadmin create newrepos
$ svnadmin dump myrepos | svnadmin load newrepos

We mentioned previously that svnadmin dump outputs a range of revisions. Use the --revision option to specify a single revision to dump, or a range of revisions. If you omit this option, all the existing repository revisions will be dumped.

$ svnadmin dump myrepos --revision 23 > rev-23.dumpfile
$ svnadmin dump myrepos --revision 100:200 > revs-100-200.dumpfile

As Subversion dumps each new revision, it outputs only enough information to allow a future loader to re-create that revision based on the previous one. In other words, for any given revision in the dump file, only the items that were changed in that revision will appear in the dump. The only exception to this rule is the first revision that is dumped with the current svnadmin dump command.
By default, Subversion will not express the first dumped revision as merely differences to be applied to the previous revision. For one thing, there is no previous revision in the dump file! And secondly, Subversion cannot know the state of the repository into which the dump data will be loaded (if it ever, in fact, occurs). To ensure that the output of each execution of svnadmin dump is self-sufficient, the first dumped revision is by default a full representation of every directory, file, and property in that revision of the repository.
However, you can change this default behavior. If you add the --incremental option when you dump your repository, svnadmin will compare the first dumped revision against the previous revision in the repository, the same way it treats every other revision that gets dumped. It will then output the first revision exactly as it does the rest of the revisions in the dump range—mentioning only the changes that occurred in that revision. The benefit of this is that you can create several small dump files that can be loaded in succession, instead of one large one, like so:

$ svnadmin dump myrepos --revision 0:1000 > dumpfile1
$ svnadmin dump myrepos --revision 1001:2000 --incremental > dumpfile2
$ svnadmin dump myrepos --revision 2001:3000 --incremental > dumpfile3

These dump files could be loaded into a new repository with the following command sequence:

$ svnadmin load newrepos < dumpfile1
$ svnadmin load newrepos < dumpfile2
$ svnadmin load newrepos < dumpfile3

Another neat trick you can perform with this --incremental option involves appending to an existing dump file a new range of dumped revisions. For example, you might have a post-commit hook that simply appends the repository dump of the single revision that triggered the hook. Or you might have a script that runs nightly to append dump file data for all the revisions that were added to the repository since the last time the script ran. Used like this, svnadmin's dump and load commands can be a valuable means by which to backup changes to your repository over time in case of a system crash or some other catastrophic event.
The dump format can also be used to merge the contents of several different repositories into a single repository. By using the --parent-dir option of svnadmin load, you can specify a new virtual root directory for the load process. That means if you have dumpfiles for three repositories, say calc-dumpfile, cal-dumpfile, and ss-dumpfile, you can first create a new repository to hold them all:

$ svnadmin create /path/to/projects
$

Then, make new directories in the repository which will encapsulate the contents of each of the three previous repositories:

$ svn mkdir -m "Initial project roots" \
      file:///path/to/projects/calc \
      file:///path/to/projects/calendar \
      file:///path/to/projects/spreadsheet
Committed revision 1.
$

Lastly, load the individual dumpfiles into their respective locations in the new repository:

$ svnadmin load /path/to/projects --parent-dir calc < calc-dumpfile
…
$ svnadmin load /path/to/projects --parent-dir calendar < cal-dumpfile
…
$ svnadmin load /path/to/projects --parent-dir spreadsheet < ss-dumpfile
…
$

We'll mention one final way to use the Subversion repository dump format—conversion from a different storage mechanism or version control system altogether. Because the dump file format is, for the most part, human-readable, ^[16] it should be relatively easy to describe generic sets of changes—each of which should be treated as a new revision—using this file format. In fact, the cvs2svn.py utility (see the section called “Converting a Repository from CVS to Subversion”) uses the dump format to represent the contents of a CVS repository so that those contents can be moved in a Subversion repository.

Repository Backup

Despite numerous advances in technology since the birth of the modern computer, one thing unfortunately rings true with crystalline clarity—sometimes, things go very, very awry. Power outages, network connectivity dropouts, corrupt RAM and crashed hard drives are but a taste of the evil that Fate is poised to unleash on even the most conscientious administrator. And so we arrive at a very important topic—how to make backup copies of your repository data.
There are generally two types of backup methods available for Subversion repository administrators—incremental and full. We discussed in an earlier section of this chapter how to use svnadmin dump --incremental to perform an incremental backup (see the section called “Migrating a Repository”). Essentially, the idea is to only backup at a given time the changes to the repository since the last time you made a backup.
A full backup of the repository is quite literally a duplication of the entire repository directory (which includes the Berkeley database environment). Now, unless you temporarily disable all other access to your repository, simply doing a recursive directory copy runs the risk of generating a faulty backup, since someone might be currently writing to the database.
Fortunately, Sleepycat's Berkeley DB documents describe a certain order in which database files can be copied that will guarantee a valid backup copy. And better still, you don't have to implement that algorithm yourself, because the Subversion development team has already done so. The hot-backup.py script is found in the tools/backup/ directory of the Subversion source distribution. Given a repository path and a backup location, hot-backup.py—which is really just a more intelligent wrapper around the svnadmin hotcopy command—will perform the necessary steps for backing up your live repository—without requiring that you bar public repository access at all—and then will clean out the dead Berkeley log files from your live repository.
Even if you also have an incremental backup, you might want to run this program on a regular basis. For example, you might consider adding hot-backup.py to a program scheduler (such as cron on Unix systems). Or, if you prefer fine-grained backup solutions, you could have your post-commit hook script call hot-backup.py (see the section called “Hook Scripts”), which will then cause a new backup of your repository to occur with every new revision created. Simply add the following to the hooks/post-commit script in your live repository directory:

(cd /path/to/hook/scripts; ./hot-backup.py ${REPOS} /path/to/backups &)

The resulting backup is a fully functional Subversion repository, able to be dropped in as a replacement for your live repository should something go horribly wrong.
There are benefits to both types of backup methods. The easiest is by far the full backup, which will always result in a perfect working replica of your repository. This again means that should something bad happen to your live repository, you can restore from the backup with a simple recursive directory copy. Unfortunately, if you are maintaining multiple backups of your repository, these full copies will each eat up just as much disk space as your live repository.
Incremental backups using the repository dump format are excellent to have on hand if the database schema changes between successive versions of Subversion itself. Since a complete repository dump and load are generally required to upgrade your repository to the new schema, it's very convenient to already have half of that process (the dump part) finished. Unfortunately, the creation of—and restoration from—incremental backups takes longer, as each commit is effectively replayed into either the dumpfile or the repository.
In either backup scenario, repository administrators need to be aware of how modifications to unversioned revision properties affect their backups. Since these changes do not themselves generate new revisions, they will not trigger post-commit hooks, and may not even trigger the pre-revprop-change and post-revprop-change hooks. ^[17] And since you can change revision properties without respect to chronological order—you can change any revision's properties at any time—an incremental backup of the latest few revisions might not catch a property modification to a revision that was included as part of a previous backup.
Generally speaking, only the truly paranoid would need to backup their entire repository, say, every time a commit occurred. However, assuming that a given repository has some other redundancy mechanism in place with relatively fine granularity (like per-commit emails), a hot backup of the database might be something that a repository administrator would want to include as part of a system-wide nightly backup. For most repositories, archived commit emails alone provide sufficient redundancy as restoration sources, at least for the most recent few commits. But it's your data—protect it as much as you'd like.
Often, the best approach to repository backups is a diversified one. You can leverage combinations of full and incremental backups, plus archives of commit emails. The Subversion developers, for example, back up the Subversion source code repository after every new revision is created, and keep an archive of all the commit and property change notification emails. Your solution might be similar, but should be catered to your needs and that delicate balance of convenience with paranoia. And while all of this might not save your hardware from the iron fist of Fate, ^[18] it should certainly help you recover from those trying times.

reference: http://svnbook.red-bean.com/en/1.0/ch05s03.html

Known Issues with SELINUX

SELINUX can be a real pain when implementing SVN working with apache. For example on CentOS and Redhat Installations SELINUX is enabled by default, which of-course can really complicate things. Balancing between security focus in mind and free form functionality can be comparing apples with oranges. If you encounter problems relating to these types of problems which are too many to mention here, I suggest that you disable SELINUX this should fix problems you are encountering. Once you are able to work undauntedly with your SVN installation, find a suitable replacement for firewalling your system.

references: http://ubuntuforums.org/showthread.php?t=350358
http://stackoverflow.com/questions/912277/svn-propfind-request-failed-on

Unix - Linux - Open Source - SysOps - Culture