As part of my work here at Admios, I have large files that need to pass through an API. To do this, I divide the files into smaller sections. In this post, I'm going to share my process with you.
The first thing I do is calculate the number of parts or pieces that I need to create, based on the maximum allowed file size.
So, I have two parameters: the original file and the maximum size of each piece. It's quite simple: divide the original file size by the maximum allowed size. Pretty straightforward, right?
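As a quick sketch of that arithmetic (the class and method names here are mine, just for illustration): integer division gives the number of full pieces, and any remainder needs one extra piece.

```java
public class PartCount {
    // Number of pieces needed: integer division, plus one extra piece
    // when a remainder does not fill a whole chunk.
    static long countParts(long fileSize, long maxPartSize) {
        long full = fileSize / maxPartSize;
        return (fileSize % maxPartSize == 0) ? full : full + 1;
    }

    public static void main(String[] args) {
        // e.g. a 10 MB file with 4 MB pieces needs 3 parts (4 + 4 + 2)
        System.out.println(countParts(10L * 1024 * 1024, 4L * 1024 * 1024)); // prints 3
    }
}
```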
But it's actually a little more difficult. I have to load the file content into a byte array and then track an index pointing into the array being processed, so that in the first run I copy the first bytes of data to a new file, then move that index to the next byte after the last one I copied, copy more chunks, and repeat until I have read the entire file and divided it into smaller sections.
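The index bookkeeping described above can be sketched on a plain byte array (writing to files is left out so the chunking logic stands alone; the names are mine, not from the post):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSketch {
    // Walk an index across the array, copying bytesPerSplit bytes at a time;
    // the final chunk may be shorter if the data does not divide evenly.
    static List<byte[]> chunk(byte[] data, int bytesPerSplit) {
        List<byte[]> parts = new ArrayList<>();
        int index = 0;
        while (index < data.length) {
            int size = Math.min(bytesPerSplit, data.length - index);
            byte[] part = new byte[size];
            System.arraycopy(data, index, part, 0, size);
            parts.add(part);
            index += size; // move to the next byte after the last one copied
        }
        return parts;
    }

    public static void main(String[] args) {
        List<byte[]> parts = chunk(new byte[10], 4);
        System.out.println(parts.size()); // prints 3 (chunks of 4 + 4 + 2 bytes)
    }
}
```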
So, after checking some examples, I ended up with something like this:
```java
// Version 1.0
private static final String dir = "/tmp/";
private static final String suffix = ".splitPart";

private static byte[] convertFileToBytes(String location) throws IOException {
    RandomAccessFile f = new RandomAccessFile(location, "r");
    byte[] b = new byte[(int) f.length()];
    f.readFully(b);
    f.close();
    return b;
}

private static void writeBufferToFiles(byte[] buffer, String fileName) throws IOException {
    BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream(fileName));
    bw.write(buffer);
    bw.close();
}

private static void copyBytesToPartFile(byte[] originalBytes, List partFiles, int partNum,
        int bytesPerSplit, int bufferSize) throws IOException {
    String partFileName = dir + "part" + partNum + suffix;
    byte[] b = new byte[bufferSize];
    System.arraycopy(originalBytes, (partNum * bytesPerSplit), b, 0, bufferSize);
    writeBufferToFiles(b, partFileName);
    partFiles.add(partFileName);
}

/**
 * @param fileName name of the file to be split.
 * @param mBperSplit number of MB per file.
 * @return a list of part files.
 * @throws IOException
 */
public static List splitFile(String fileName, int mBperSplit) throws IOException {
    if (mBperSplit <= 0) {
        throw new IllegalArgumentException("mBperSplit must be more than zero");
    }
    List partFiles = new ArrayList();
    final long sourceSize = new File(fileName).length();
    int bytesPerSplit = 1024 * 1024 * mBperSplit;
    long numSplits = sourceSize / bytesPerSplit;
    int remainingBytes = (int) (sourceSize % bytesPerSplit);
    // Copy arrays
    byte[] originalBytes = convertFileToBytes(fileName);
    int partNum = 0;
    while (partNum < numSplits) {
        // write bytes to a part file.
        copyBytesToPartFile(originalBytes, partFiles, partNum, bytesPerSplit, bytesPerSplit);
        ++partNum;
    }
    if (remainingBytes > 0) {
        copyBytesToPartFile(originalBytes, partFiles, partNum, bytesPerSplit, remainingBytes);
    }
    return partFiles;
}
```
The problem with this approach is that we end up loading the entire file into the JVM. If I attempt to split a 2GB file, it will be loaded into the JVM completely. This is OK for small files; in fact, it will work in all Java versions, JDK 1.0 and above. This is my first attempt, and it compiles with Java 1.0. It does not use any dependencies or third-party libraries.
But memory is a big concern, since the only reason to split a file is its size. Looking for another approach, I found many examples, some using BufferedOutputStream and FileOutputStream, and some versions that only work on Java 5, others on Java 7.
To show how Java has evolved from Java 1.2 to 1.7, I will make a second version that is compatible with Java 1.4 (note that Version 1.0 of the splitter above is compatible with Java 1.2).
Both of these examples (Versions 1 and 2) were built using Java 1.4.
For this second version, I don't load the entire file: a small part is saved in a buffer, written to a new file, and then I move on to the next file. The key is that 'maxReadBufferSize' should be small. It will take more time, but the application will use less memory.
```java
int maxReadBufferSize = 8 * 1024; // 8KB
```

This is my second version. It's more memory efficient and runs on Java 1.4:

```java
// Version 2.0
private static final String dir = "/tmp/";
private static final String suffix = ".splitPart";

/**
 * @param fileName name of the file to be split.
 * @param mBperSplit number of MB per file.
 * @return a list of part files.
 * @throws IOException
 */
public static List splitFile(String fileName, int mBperSplit) throws IOException {
    if (mBperSplit <= 0) {
        throw new IllegalArgumentException("mBperSplit must be more than zero");
    }
    List partFiles = new ArrayList();
    final long sourceSize = new File(fileName).length();
    final long bytesPerSplit = 1024L * 1024L * mBperSplit;
    final long numSplits = sourceSize / bytesPerSplit;
    long remainingBytes = sourceSize % bytesPerSplit;
    RandomAccessFile raf = new RandomAccessFile(fileName, "r");
    int maxReadBufferSize = 8 * 1024; // 8KB
    int partNum = 0;
    for (; partNum < numSplits; partNum++) {
        BufferedOutputStream bw = newWriteBuffer(partNum, partFiles);
        if (bytesPerSplit > maxReadBufferSize) {
            long numReads = bytesPerSplit / maxReadBufferSize;
            long numRemainingRead = bytesPerSplit % maxReadBufferSize;
            for (int i = 0; i < numReads; i++) {
                readWrite(raf, bw, maxReadBufferSize);
            }
            if (numRemainingRead > 0) {
                readWrite(raf, bw, numRemainingRead);
            }
        } else {
            readWrite(raf, bw, bytesPerSplit);
        }
        bw.close();
    }
    if (remainingBytes > 0) {
        BufferedOutputStream bw = newWriteBuffer(partNum, partFiles);
        readWrite(raf, bw, remainingBytes);
        bw.close();
    }
    raf.close();
    return partFiles;
}

private static BufferedOutputStream newWriteBuffer(int partNum, List partFiles) throws IOException {
    String partFileName = dir + "part" + partNum + suffix;
    partFiles.add(partFileName);
    return new BufferedOutputStream(new FileOutputStream(partFileName));
}

private static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
    byte[] buf = new byte[(int) numBytes];
    int val = raf.read(buf);
    if (val != -1) {
        bw.write(buf);
    }
}
```
Java 7 comes with a cool package, java.nio.file.*, that offers a better, simpler way to read and write files. I also used channels to replace BufferedOutputStream. All of this comes with more efficient use of memory, too.
Channels
Instead of using InputStreams and OutputStreams, we will use channels. Imagine a channel as a file pointer. We set the pointer to the beginning of where we want to read or write; in this case, we start reading from the original file. Channels are not new in the Java SDK. They have been there since Java 1.4, but they have seen more use since Java 7.
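Here is a minimal standalone illustration of that "file pointer" idea (the temp-file setup and the helper name are mine, just to make the snippet self-contained):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChannelPointer {
    // Read `len` bytes starting at `pos`, using the channel position as a file pointer.
    static String readFrom(Path file, long pos, int len) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            channel.position(pos);             // set the "pointer" where we want to start
            ByteBuffer buf = ByteBuffer.allocate(len);
            channel.read(buf);                 // reads from the current position
            return new String(buf.array(), "UTF-8");
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel-demo", ".txt");
        Files.write(tmp, "hello world".getBytes("UTF-8"));
        System.out.println(readFrom(tmp, 6, 5)); // prints "world"
        Files.deleteIfExists(tmp);
    }
}
```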
Check out the official Java documentation on Channels.
Using the same RandomAccessFile, we get the file channel and set the starting position to zero in the first run.
I am using the same logic to calculate the size of the parts as shown in Version 1. The difference is that now we don't load any of the file into the JVM; we just use the underlying host system.
This just says: copy from the channel, starting at 'position', with a size of 'count'.
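To make the transferFrom(src, position, count) call concrete, here is a small standalone sketch that copies one slice of a source file into a new file (the temp files, sizes, and helper name are mine for illustration):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class TransferDemo {
    // Copy `count` bytes from `source`, starting at `sourcePos`, into a new temp file.
    static Path copySlice(Path source, long sourcePos, long count) throws IOException {
        Path target = Files.createTempFile("slice", ".part");
        try (RandomAccessFile from = new RandomAccessFile(source.toFile(), "r");
             FileChannel fromChannel = from.getChannel();
             RandomAccessFile to = new RandomAccessFile(target.toFile(), "rw");
             FileChannel toChannel = to.getChannel()) {
            fromChannel.position(sourcePos);               // where to start reading
            toChannel.transferFrom(fromChannel, 0, count); // write `count` bytes at offset 0
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source", ".txt");
        Files.write(source, "abcdefghij".getBytes("UTF-8"));
        Path slice = copySlice(source, 2, 5);
        System.out.println(new String(Files.readAllBytes(slice), "UTF-8")); // prints "cdefg"
        Files.deleteIfExists(slice);
        Files.deleteIfExists(source);
    }
}
```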
And that's it. Files are created.
```java
// Version 3.0
private static final String dir = "/tmp/";
private static final String suffix = ".splitPart";

/**
 * Split a file into multiple files.
 *
 * @param fileName name of the file to be split.
 * @param mBperSplit maximum number of MB per file.
 * @return a list of part files.
 * @throws IOException
 */
public static List<Path> splitFile(final String fileName, final int mBperSplit) throws IOException {
    if (mBperSplit <= 0) {
        throw new IllegalArgumentException("mBperSplit must be more than zero");
    }
    List<Path> partFiles = new ArrayList<>();
    final long sourceSize = Files.size(Paths.get(fileName));
    final long bytesPerSplit = 1024L * 1024L * mBperSplit;
    final long numSplits = sourceSize / bytesPerSplit;
    final long remainingBytes = sourceSize % bytesPerSplit;
    int position = 0;
    try (RandomAccessFile sourceFile = new RandomAccessFile(fileName, "r");
         FileChannel sourceChannel = sourceFile.getChannel()) {
        for (; position < numSplits; position++) {
            // write multipart files.
            writePartToFile(bytesPerSplit, position * bytesPerSplit, sourceChannel, partFiles);
        }
        if (remainingBytes > 0) {
            writePartToFile(remainingBytes, position * bytesPerSplit, sourceChannel, partFiles);
        }
    }
    return partFiles;
}

private static void writePartToFile(long byteSize, long position, FileChannel sourceChannel,
        List<Path> partFiles) throws IOException {
    Path fileName = Paths.get(dir + UUID.randomUUID() + suffix);
    try (RandomAccessFile toFile = new RandomAccessFile(fileName.toFile(), "rw");
         FileChannel toChannel = toFile.getChannel()) {
        sourceChannel.position(position);
        toChannel.transferFrom(sourceChannel, 0, byteSize);
    }
    partFiles.add(fileName);
}
```
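One way to sanity-check a split like this is to concatenate the parts again and compare the result with the original. This helper is my own addition, not from the post, and works against any ordered list of part files:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class JoinCheck {
    // Concatenate the part files back into one temp file, in list order.
    static Path join(List<Path> parts) throws IOException {
        Path joined = Files.createTempFile("joined", ".bin");
        try (OutputStream out = Files.newOutputStream(joined)) {
            for (Path part : parts) {
                Files.copy(part, out); // appends each part's bytes to the stream
            }
        }
        return joined;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("part0", ".splitPart");
        Path b = Files.createTempFile("part1", ".splitPart");
        Files.write(a, "hello ".getBytes("UTF-8"));
        Files.write(b, "world".getBytes("UTF-8"));
        Path joined = join(Arrays.asList(a, b));
        System.out.println(new String(Files.readAllBytes(joined), "UTF-8")); // prints "hello world"
        Files.deleteIfExists(a);
        Files.deleteIfExists(b);
        Files.deleteIfExists(joined);
    }
}
```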
Java 7 provides some useful methods in java.nio.file.Files and java.nio.file.Paths, like Files.size() and Paths.get(fileName), making the entire code more readable.
Benchmarks
For the test, we used two files: a small 5,376,008-byte (5.4 MB) file and a 1,077,622,793-byte (1.09 GB) file. We ran each version twice with the files. The following charts are presented in logarithmic scale (time is displayed in seconds and memory in megabytes).
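The post does not show the benchmark harness itself, so here is a hedged sketch of how numbers like these could be collected; the class name, the heap-based memory reading, and the dummy workload are all my own assumptions, and JVM memory figures of this kind are rough at best:

```java
public class Measure {
    // Time a task and report used heap afterwards: {elapsedMillis, usedBytes}.
    // Heap readings are JVM-dependent and only indicative.
    static long[] run(Runnable task) {
        Runtime rt = Runtime.getRuntime();
        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        return new long[]{elapsedMillis, usedBytes};
    }

    public static void main(String[] args) {
        long[] result = run(() -> {
            byte[] scratch = new byte[5 * 1024 * 1024]; // stand-in for splitting a 5 MB file
            scratch[0] = 1;
        });
        System.out.println("millis=" + result[0] + " usedBytes=" + result[1]);
    }
}
```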
Conclusions
As we can see, the memory consumption is greatly reduced. As for time, it is almost the same for every version. For small files, the time increases, but this should not concern us, because it is very unlikely that we would need to split a small 10 or 20 MB file.
In case we need to do maintenance or implement this in an old system, Java 1.4 is a good option, since channels have been available since then (only java.nio.file.Files and java.nio.file.Paths are from Java 7, and you can probably implement a similar solution for an old legacy system).
I hope these examples help you split your own files, or that you can simply copy and use them.