org.archive.io
Class MappedByteBufferInputStream

java.lang.Object
  extended by java.io.InputStream
      extended by org.archive.io.MappedByteBufferInputStream
All Implemented Interfaces:
it.unimi.dsi.mg4j.io.RepositionableStream, java.io.Closeable

public class MappedByteBufferInputStream
extends java.io.InputStream
implements it.unimi.dsi.mg4j.io.RepositionableStream

An input stream perspective on a MappedByteBuffer. This class is effectively a random-access input stream. Use position() to get the current location, then mark and reset to move about in the stream.
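The idea of a random-access stream over a buffer can be sketched as follows. This is a minimal illustration, not the source of this class: `BufferBackedInputStream` and its `demo()` method are hypothetical names, and the sketch wraps any `java.nio.ByteBuffer` (a `MappedByteBuffer` is one) so that it runs without an ARC file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not the org.archive.io implementation.
final class BufferBackedInputStream extends InputStream {
    private final ByteBuffer buffer;

    BufferBackedInputStream(ByteBuffer buffer) {
        // duplicate() shares the content but gives this stream its own position and mark.
        this.buffer = buffer.duplicate();
    }

    @Override
    public int read() {
        // Mask to 0..255 so a byte such as 0xFF is not mistaken for end of stream (-1).
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!buffer.hasRemaining()) {
            return -1;
        }
        int n = Math.min(len, buffer.remaining());
        buffer.get(b, off, n);
        return n;
    }

    @Override
    public int available() {
        return buffer.remaining();
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public void mark(int readLimit) {
        // The read limit is ignored: the whole buffer stays addressable.
        buffer.mark();
    }

    @Override
    public void reset() throws IOException {
        try {
            buffer.reset();
        } catch (InvalidMarkException e) {
            throw new IOException("mark not set");
        }
    }

    long position() {
        return buffer.position();
    }

    void position(long newPosition) {
        // A ByteBuffer is indexed by int, so reject anything larger.
        buffer.position(Math.toIntExact(newPosition));
    }

    // Jumps to an offset, marks it, reads ahead, then resets back to the mark.
    static String demo() {
        byte[] data = "ARC record".getBytes(StandardCharsets.US_ASCII);
        BufferBackedInputStream in = new BufferBackedInputStream(ByteBuffer.wrap(data));
        try {
            in.position(4);
            in.mark(0);
            byte[] word = new byte[6];
            in.read(word, 0, word.length);
            in.reset();
            return new String(word, StandardCharsets.US_ASCII) + "@" + in.position();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch `demo()` returns `record@4`: the read advances the position to the end of the buffer and `reset()` brings it back to the marked offset.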

This class is no longer used, but it's kept around because it documents experience using nio for ARCReader. In summary: a minor performance improvement when iterating over ARC records. It was replaced by a RandomAccess io implementation because each instance, when mapping a whole ARC file, took up too much system memory, which prevented us from opening tens of instances concurrently. Maybe JVM 1.5 will make big improvements in nio and we'll then use this class again.

This class was made because I wanted to use java.nio memory-mapped files rather than old-school java.io for reading ARCs because: "Accessing a file through the memory-mapping mechanism can be far more efficient than reading or writing data by conventional means, even when using channels. No explicit system calls need to be made, which can be time-consuming. More importantly, the virtual memory system of the operating system automatically caches memory pages. These pages will be cached using system memory and will not consume space from the JVM's memory heap. Once a memory page has been made valid (brought in from disk), it can be accessed again at full hardware speed without the need to make another system call to get the data. Large, structured files that contain indexes or other sections that are referenced or updated frequently can benefit tremendously from memory mapping....", from the O'Reilly book Java NIO by Ron Hitchens.

Using a ByteBuffer that holds the whole ARC file certainly makes the code simpler, and the nice thing about using memory-mapped buffers for reading is that the memory used is allocated in the OS, not in the JVM. I played around with this on a machine with 512M of physical memory and 1G of swap (/sbin/swapon -s). I wrote a simple program that uses file channel memory-mapped buffers to read a file. I was able to read a file of 1.5G using the default JVM heap (64M on Linux, IIRC); i.e. I was able to allocate a buffer of 1.5G inside my small-heap program. Anything bigger and I got complaints about being unable to allocate the memory. So a channel-based reader would be limited only by the memory characteristics of the machine it's running on (swap and physical memory, not JVM heap size), except for the mapping-size limit described below. Note that a spin on the 'unable to allocate the memory' problem was that I was unable to keep tens of ARC instances open concurrently because each was using 100M plus of RAM.
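A simple program of the kind described above might look like the following sketch. It is not the original test program: `MappedReadSketch`, `sumBytes` and `demo` are hypothetical names, and it assumes a modern JDK (Java 8 or later) rather than the 1.4-era APIs available when these notes were written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: maps a whole file read-only and walks it byte by byte.
final class MappedReadSketch {
    /** Maps the whole file read-only and sums its bytes as unsigned values. */
    static long sumBytes(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapped pages live outside the JVM heap; map() rejects
            // sizes greater than Integer.MAX_VALUE.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF;
            }
            return sum;
        }
    }

    /** Writes a four-byte temporary file and reads it back through a mapping. */
    static long demo() {
        try {
            Path file = Files.createTempFile("mapped-read", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[] {1, 2, 3, (byte) 250});
            return sumBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The mapping remains valid after the channel is closed, which is why the sketch can close the channel as soon as the read loop finishes.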

Really big files generated a complaint from FileChannel.map saying the size parameter was > Integer.MAX_VALUE, which is odd considering the parameter's type is long. At the time this looked like an nio bug, though FileChannel.map is documented to reject sizes greater than Integer.MAX_VALUE. It means there is an upper bound of Integer.MAX_VALUE (about 2.1G) on a single mapping. This is unfortunate, particularly as the C-code tools for ARC manipulation (see alexa/common/a_arcio.c) support > 2.1G, but it's good enough for now (ARC files are usually 100M).
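One common way around the per-mapping limit, which this class does not implement, is to map a large file as several windows of at most Integer.MAX_VALUE bytes each. The sketch below is an assumption-laden illustration: `ChunkedMappingSketch`, `windows` and `mapAll` are hypothetical names.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of windowed mapping; not part of org.archive.io.
final class ChunkedMappingSketch {
    /** Splits [0, fileSize) into {offset, length} windows of at most maxWindow bytes. */
    static List<long[]> windows(long fileSize, long maxWindow) {
        if (fileSize < 0 || maxWindow <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and maxWindow > 0");
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += maxWindow) {
            result.add(new long[] {offset, Math.min(maxWindow, fileSize - offset)});
        }
        return result;
    }

    /** Maps every window read-only; each mapping stays within the int-sized limit. */
    static List<MappedByteBuffer> mapAll(FileChannel channel) throws IOException {
        List<MappedByteBuffer> buffers = new ArrayList<>();
        for (long[] w : windows(channel.size(), Integer.MAX_VALUE)) {
            buffers.add(channel.map(FileChannel.MapMode.READ_ONLY, w[0], w[1]));
        }
        return buffers;
    }
}
```

A reader built on this would still need to handle records that straddle a window boundary, and each window consumes address space just as a single large mapping does, so it does not help with the memory pressure noted earlier.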

The jury still seems to be out regarding general nio performance. See "NIO ByteBuffer slower than BufferedInputStream". It can be 4 times slower than java.io or 40% faster. For sure it's 3x to 4x slower than reading from a buffer: http://jroller.com/page/cpurdy/20040405#raw_nio_performance. Tests done reading ARCs show the difference to be small in the scheme of things.

Author:
stack

Constructor Summary
MappedByteBufferInputStream(java.nio.MappedByteBuffer mbb)
          Constructor.
 
Method Summary
 int available()
           
protected  void checkClosed()
           
 void close()
           
 void mark(int markAmount)
           
 boolean markSupported()
           
 long position()
           
 void position(long position)
           
 int read()
           
 int read(byte[] b, int off, int len)
           
 void reset()
           
 
Methods inherited from class java.io.InputStream
read, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MappedByteBufferInputStream

public MappedByteBufferInputStream(java.nio.MappedByteBuffer mbb)
Constructor.

Parameters:
mbb - MappedByteBuffer to use.
Method Detail

read

public int read()
         throws java.io.IOException
Specified by:
read in class java.io.InputStream
Throws:
java.io.IOException

read

public int read(byte[] b,
                int off,
                int len)
         throws java.io.IOException
Overrides:
read in class java.io.InputStream
Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Specified by:
close in interface java.io.Closeable
Overrides:
close in class java.io.InputStream
Throws:
java.io.IOException

checkClosed

protected void checkClosed()
                    throws java.io.IOException
Throws:
java.io.IOException

markSupported

public boolean markSupported()
Overrides:
markSupported in class java.io.InputStream

mark

public void mark(int markAmount)
Overrides:
mark in class java.io.InputStream

reset

public void reset()
           throws java.io.IOException
Overrides:
reset in class java.io.InputStream
Throws:
java.io.IOException

available

public int available()
              throws java.io.IOException
Overrides:
available in class java.io.InputStream
Throws:
java.io.IOException

position

public long position()
Specified by:
position in interface it.unimi.dsi.mg4j.io.RepositionableStream

position

public void position(long position)
Specified by:
position in interface it.unimi.dsi.mg4j.io.RepositionableStream


Copyright © 2003-2005 Internet Archive. All Rights Reserved.