PHP Mail and CRLF Terminated Header Lines

When using PHP’s mail function there is a common question: “Do I need to use CRLF (\r\n) or just LF (\n) at the end of each line of the additional_headers parameter?” It turns out that if PHP is using Postfix’s sendmail program to send email, you only need LF, because Postfix’s sendmail program will add the missing CR.

Thus, if you do use CRLF while PHP is set up as described above, it will produce an email with headers ending in CRCRLF (\r\r\n), which can cause all sorts of issues. One issue we encountered is that some mail content filtering software will quarantine the message as a potential threat. This appears to be due to the software protecting against the Outlook CR Vulnerability.
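
The effect can be illustrated with a small sketch. It is written in Python rather than PHP, and it assumes a deliberately simplified model in which the sendmail wrapper turns every LF into CRLF and leaves existing CRs alone. That model is only an approximation of the behavior described above, not Postfix's actual code, and the names build_headers and naive_lf_to_crlf are hypothetical.

```python
def build_headers(headers, eol="\n"):
    """Join (name, value) pairs into one header string using the given line ending."""
    return eol.join(f"{name}: {value}" for name, value in headers) + eol


def naive_lf_to_crlf(data):
    """Simplified model (an assumption): every LF becomes CRLF, existing CRs are kept."""
    return data.replace("\n", "\r\n")


HEADERS = [("From", "sender@example.com"), ("X-Mailer", "example")]

# Headers built with LF come out as clean CRLF-terminated lines.
lf_result = naive_lf_to_crlf(build_headers(HEADERS, eol="\n"))

# Headers built with CRLF pick up a second CR on every line (CRCRLF).
crlf_result = naive_lf_to_crlf(build_headers(HEADERS, eol="\r\n"))
```

Under this model, only the LF-terminated input produces well-formed header lines.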

Exchange WebDAV Returning 401

During development we found that when we accessed Exchange via WebDAV we kept getting 401 HTTP status codes for a few mailboxes, which prevented us from performing any operation against them. When we started troubleshooting, we found that the problem only showed up when the mailbox had been created but never logged into from OWA. After we logged in as the user from a web browser the problem went away and we were able to access the mailbox from WebDAV normally.

So we brought up our network sniffing tools to see just what was different between the PHP-generated WebDAV request and the browser’s HTTP request. After comparing the request headers, we narrowed down the difference and found that we were missing the Accept-Language and Accept-Charset headers that Exchange apparently requires.

Below is an example of the headers that we needed to add to the WebDAV request:

Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
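
As a rough illustration, here is how those two headers could be attached to a WebDAV request. The sketch uses Python's standard urllib rather than PHP. The mailbox URL, the choice of PROPFIND as the method, and the helper name build_webdav_request are assumptions made for the example, and authentication is left out entirely. The request is only constructed, never sent.

```python
import urllib.request

# The two headers from the post; the values are copied from the example above.
EXCHANGE_HEADERS = {
    "Accept-Language": "en-us,en;q=0.5",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
}


def build_webdav_request(url, method="PROPFIND"):
    """Build (but do not send) a WebDAV request that carries the extra headers."""
    return urllib.request.Request(url, headers=dict(EXCHANGE_HEADERS), method=method)


# Hypothetical mailbox URL, used only to show the call.
request = build_webdav_request("https://mail.example.com/exchange/user/Inbox/")
```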

Backup Data Transfer Management Part 2: Our Solution

Where we left off last time, we were accepting offsite backup traffic from the customer at night and transferring it to a second location during the day. This solution was not viable, however, because it could not make efficient use of the bandwidth at each location, causing the queues to grow faster than they could be drained at times.

Ideally we wanted to be able to transfer backup data between multiple BitLeap locations with the bandwidth that is left over from receiving the backup data from the customer. This means that our customers can transfer offsite backup data to us as fast as they can send it, but we can still control our overall bandwidth usage by rate limiting how fast the data is transferred between multiple locations.

The challenge stems from the fact that rate limiting needs to be applied in an outbound direction, but we really wanted to control the rate of inbound bandwidth for each location. In our case, this was possible because we control the data that is both sent and received on both sides. As a result, to limit the inbound bandwidth in location A, you apply an outbound rate limit to each location sending traffic to location A while taking into account location A’s current inbound bandwidth utilization.

We chose to apply this rate limiting methodology in three distinct steps. First, each location reports its inbound and outbound traffic utilization to a centralized database at a regular interval. Then, based on these numbers and how the locations connect to one another, another process calculates the outbound rate limits needed to achieve the desired overall inbound rate for each location. Lastly, another process runs at a regular interval at each location and applies the calculated outbound rate limits to individual interfaces.
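
The calculation in the second step might look something like the sketch below. It is not the actual BitLeap algorithm. It assumes the simplest possible policy: whatever inbound capacity is left after customer traffic gets split evenly among the locations sending to that destination. The function name, the data shapes, and the Mbps units are all hypothetical.

```python
def compute_outbound_limits(inbound_targets, customer_inbound, senders):
    """Return {(source, destination): outbound limit} for inter-location transfers.

    inbound_targets:  desired total inbound rate per destination (e.g. Mbps)
    customer_inbound: inbound rate currently used by customer backups
    senders:          destination -> list of locations that send to it
    """
    limits = {}
    for destination, target in inbound_targets.items():
        sources = senders.get(destination, [])
        if not sources:
            continue
        # Bandwidth left over once customer traffic is accounted for.
        leftover = max(0.0, target - customer_inbound.get(destination, 0.0))
        # Assumed policy: split the leftover evenly between the senders.
        share = leftover / len(sources)
        for source in sources:
            limits[(source, destination)] = share
    return limits


limits = compute_outbound_limits(
    inbound_targets={"A": 100.0},
    customer_inbound={"A": 40.0},
    senders={"A": ["B", "C"]},
)
```

In this example, locations B and C would each be limited to 30.0 when sending to A, so A's total inbound traffic stays at its target of 100.0.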

Like any work in progress, this methodology for bandwidth control is by no means perfect, but it does achieve the goal of keeping our customers’ data at two locations and keeping our bandwidth usage in check at the same time. It will no doubt have to be extended further as our network topology continues to grow in complexity.

Like they say, why build one when you can build two for twice the price!

Goodnight NFS

At BitLeap, we have a big need for network-based file systems. Although many businesses have a need for a basic network file system, only a few push them to their limits. It just so happens that both at BitLeap and at the last job I held, we pushed network-based file systems to the edge.

Now normally, I would not think a network file system would be something that you would expect to break down with increased load. And they have been around so long that by this time, most projects would have been very stable. I don’t intend to start a flame war. I’m sure these projects are very stable, but in the last 6 years, my mileage has varied.

At both companies we flipped back and forth between NFS and CIFS/SMBFS, continuing to look for greener grass on the other side. “It’s been a while since we looked at the other file system, maybe it got better, let’s try it again.” Around and around we would go. Here is what I found over the years:

CIFS/SMBFS:

Ups: It’s very easy to set up and the config files are clean. The ports are easy to control and the daemons are limited. It’s easy to fix stuck mounts, and something breaking usually does not require a reboot of the OS (Linux 2.6.x).

Downs: Regardless of the optimizations, we could not get it to match the performance of NFS. We also had problems with mounts dying from time to time. At the worst of it at my last company, at least one mount on every server would die each day. At BitLeap, we got to the point where a mount wouldn’t last more than 10 minutes under load.

NFS:

Ups: The performance was great and even under heavy load (10-20Mbps and 5-20 files/sec per file server) the stability was overall good.

Downs: On occasion something would go awry with NFS or hardware on a server. When a mount died, even if it wasn’t NFS’s fault, it was NFS that ended up giving the headache. We would end up with stuck mounts, un-killable processes, and generally gummed-up servers which required reboots. If a file server went down or had trouble, it could gum up the processes on the client, which in turn connected to many NFS servers, bringing the whole system to a standstill.

Both of these projects are great; I think it’s just possible to outgrow them. Even if stability were perfect, we are getting to the point where managing the mounts is becoming unwieldy.

Introducing ‘FSD’ (File Server Daemon)

The functionality we need out of a network file system is rather simple. We put files, get files, check files, stat files, and maybe list them. The file servers are many and communication with them is sometimes infrequent. Other than that, we do not have a need for hard or soft links or any other operation a standard file system might provide.

As we grow, the number of file servers we need to communicate with from our processing servers grows. Managing the many server mounts on all of the processing servers became a burden. Even if we automated writing out /etc/fstab and automated the mounting of file servers, things felt unclean. Thus, we wrote FSD. From code, dynamic connections are made to file servers based on entries in the databases tracking them. If a server goes down, another server can be selected out of the list. We no longer need to manage mounts, and adding or removing a file server is merely a database entry. When a server does go down, the system does not hang; data continues to move through the system freely.
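
To make the failover idea concrete, here is a minimal sketch of picking the first reachable server from a list. It is not FSD's code. The function connect_to_any, the shape of the server list, and the injected connect callable are all hypothetical, and in a real system the list would come from the tracking database.

```python
def connect_to_any(servers, connect):
    """Try each file server in order and return (address, connection) for the first success.

    servers: iterable of server addresses, e.g. rows read from the tracking database
    connect: callable that opens a connection or raises OSError on failure
    """
    errors = []
    for address in servers:
        try:
            return address, connect(address)
        except OSError as error:
            # A dead server is skipped instead of hanging the caller.
            errors.append((address, error))
    raise ConnectionError(f"no file server reachable: {errors}")
```

For real sockets, something like `lambda addr: socket.create_connection(addr, timeout=5)` could be passed as connect. The timeout is what keeps a dead server from stalling the caller the way a stuck mount would.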

We also added transaction support. Thus you can do the following operation:

  1. start transaction
  2. put file1
  3. put file2
  4. commit — or — rollback

This fits nicely with the transactions we do on the database. It prevents the situation where we write data down to disk and then, near the end of a work unit, decide we need to roll back, leaving data on disk that is no longer tracked and needs to be cleaned up.
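
One way such a transaction could work is to stage each file under a temporary name and only move it into place on commit. The sketch below assumes that approach, which is not necessarily how FSD does it, and the FileTransaction class is hypothetical. Note that the commit here is not atomic across several files. It only guarantees that a rollback leaves nothing behind.

```python
import os
import tempfile


class FileTransaction:
    """Stage files under temporary names; publish them on commit, delete them on rollback."""

    def __init__(self, root):
        self.root = root
        self.staged = []  # list of (temp_path, final_path) pairs

    def put(self, name, data):
        final_path = os.path.join(self.root, name)
        # Stage in the same directory so the later rename stays on one file system.
        fd, temp_path = tempfile.mkstemp(dir=self.root, prefix=".txn-")
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        self.staged.append((temp_path, final_path))

    def commit(self):
        for temp_path, final_path in self.staged:
            os.replace(temp_path, final_path)
        self.staged.clear()

    def rollback(self):
        for temp_path, _ in self.staged:
            os.remove(temp_path)
        self.staged.clear()
```

A caller would create the transaction, call put for file1 and file2, and then call commit or rollback depending on how the surrounding database transaction ends.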

In summary, FSD provides the following:

  1. Local data integrity checks on disk rather than pulling data back over a mount, saving bandwidth
  2. Removes the need for /etc/fstab entries and all management time involved in keeping servers mounted
  3. Basic file system transaction support
  4. Low overhead resulting in high throughput
  5. Can easily include encryption/compression libraries as needed
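
The first point can be sketched as follows, assuming the integrity check is a checksum comparison (the post does not say which check FSD uses). The file server hashes the file where it lives and returns only the digest, so the file contents never cross the network. The function name local_checksum and the choice of SHA-256 are assumptions.

```python
import hashlib


def local_checksum(path, chunk_size=1024 * 1024):
    """Hash a file on the server where it is stored; only the hex digest is sent back."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large backup files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```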

Although the code for FSD is tightly integrated into our current code base, I would like to make it more generic and post it on the open source page.

Credit where credit is due

  • I’ve heard NFS is rock solid on BSD. I would also imagine the same is true for Solaris. My experience has been using NFSv3 and NFSv4 on late Linux 2.4.x kernels and recently NFSv4 on Linux 2.6.x.
  • We have at one time or another submitted a bug or two to the Samba team. They were excellent in responding and working to resolve any issue we might have found. It was really our fault for dropping the ball and not setting up labs to help them look into the issues we found.
  • I could never complain about open source software; the open source community is great. Rather than writing this blog, I should be out there writing patches. I really wrote this blog to talk about FSD and why we decided it was time to create such a project.

Backup Data Transfer Management Part 1: The Challenge

Backing up all of our data to two different locations was a requirement from the beginning at BitLeap, even before there was much of a product to speak of. In an effort to save bandwidth at the customer’s location, the data is transferred from the LeapServ to a single BitLeap location which is then responsible for transferring it to a second location.

Our first approach to handling the two locations problem was to receive the data from the customer and immediately transfer it to the second location before letting the customer continue. This method worked well for a while but eventually started to show its shortfalls. So many systems were involved in the acknowledgment of a single piece of data that one small isolated problem was able to affect the entire system. This also led to performance issues, as the overhead of receiving a single piece of data was so high.

The first logical step to solving this problem that we devised was to break apart the processes that receive the data from the customer and send the data to the second location. As we suspected, breaking apart these systems solved the performance issues receiving the data from the customer, but almost a little too well as we quickly realized. Data was now able to flow into our servers from customers at a rate much quicker than before. This combined with the newly separated process of transferring all of that data to the second location caused the processor and bandwidth usage on our servers to jump significantly.

One thing to realize about offsite backup traffic is that most of it comes in during the night when our customers are not at work. This means that during the day the processor and bandwidth utilization of our servers is relatively low. Because of this, we decided to accept all of the backup data from customers during the night and only transfer it to the second location during the day to even out the load somewhat. After making this change it seemed as if we had hit a sweet spot between quickly accepting backup data from customers and sending it to the second offsite location.

After running with the new approach for a while, we found that the queues used to transfer the data between different offsite locations were prone to getting backed up if there was any interruption in the transfer process during the day. While it seemed like the solution was to add more hours to the day, we couldn’t help but notice that there was still quite a bit of unused inbound bandwidth scattered throughout the night at each location that could be utilized for transfer. This is where the fun of traffic shaping comes in.

Stay tuned next time for the shocking conclusion!



 

BitLeap Devblog

Welcome to the BitLeap developer blog! Some posts are longer than others, but they all seem to make use of links and code and stuff. Feel free to read, or not to read, as you so desire and prefer.