Tuesday 20 November 2012

Base64 encoding of UTF8 String in Java

Base64-encoding a UTF-8 string in Java is simple. Just follow these steps:

1. Take the 8-bit binary value of each byte of the string's UTF-8 encoding. (For plain ASCII text that is exactly one byte per character.)
2. Join all those 8-bit binary numbers into one single binary number.
3. Starting from the left, cut that single binary number into parts of 6 bits each. (If the last part has fewer than 6 bits, append zeroes on its right until it has 6.)
4. Convert each of those 6-bit parts to its decimal equivalent.
5. Now if the Nth 6-bit part has decimal value X, then the Nth character of our encoded string is character number X (counting from 0) of this string: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".
6. Recall what you did in step 3. Did you append zeroes to the last part? If you appended 4 zeroes, then append 2 equals signs ('==') to the encoded string. If you appended 2 zeroes, then append 1 equals sign ('='). If you appended no zeroes, then append nothing.

Here are the above steps explained with an example string "Baker":

Each character of the string:

    UTF-8 encoding of B is: 66, i.e. 01000010 in 8bit-binary
    UTF-8 encoding of a is: 97, i.e. 01100001 in 8bit-binary
    UTF-8 encoding of k is: 107, i.e. 01101011 in 8bit-binary
    UTF-8 encoding of e is: 101, i.e. 01100101 in 8bit-binary
    UTF-8 encoding of r is: 114, i.e. 01110010 in 8bit-binary
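
The table above can be reproduced with a few lines of Java. This is only a minimal sketch; the class name Utf8Bits and the method toBits are made up for this illustration. It also prints a non-ASCII character, to show that one character can take more than one byte in UTF-8, which is why the steps work on bytes and not on characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Bits {

    // Returns the 8-bit binary form of every UTF-8 byte, separated by spaces.
    static String toBits(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // b & 0xFF turns Java's signed byte into an unsigned value 0..255
            String bits = Integer.toBinaryString(b & 0xFF);
            if (out.length() > 0) out.append(' ');
            // left-pad with zeroes up to 8 digits
            for (int i = bits.length(); i < 8; i++) out.append('0');
            out.append(bits);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints 01000010 01100001 01101011 01100101 01110010
        System.out.println(toBits("Baker"));
        // U+00E9 (e with acute accent) takes two bytes: prints 11000011 10101001
        System.out.println(toBits("\u00e9"));
    }
}
```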


All 8bit-binaries joined as one:
    0100001001100001011010110110010101110010

Dividing the above binary-string into parts of 6bits each:
    010000 _ 100110 _ 000101 _ 101011 _ 011001 _ 010111 _ 0010

Appending zeroes to last binary part:
    010000 _ 100110 _ 000101 _ 101011 _ 011001 _ 010111 _ 001000

Decimal value of each 6-bit binary number:

    Decimal value of binary number 010000 is 16
    Decimal value of binary number 100110 is 38
    Decimal value of binary number 000101 is 5
    Decimal value of binary number 101011 is 43
    Decimal value of binary number 011001 is 25
    Decimal value of binary number 010111 is 23
    Decimal value of binary number 001000 is 8
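
The cutting, zero-padding and decimal conversion shown so far can be sketched as follows. The class name SixBitGroups and the method indices are assumptions made for this example only; the input is the joined binary string of "Baker" from above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class SixBitGroups {

    // Cuts a string of '0'/'1' digits into 6-bit parts (zero-padding the
    // last part on the right) and returns the decimal value of each part.
    static List<Integer> indices(String bits) {
        StringBuilder padded = new StringBuilder(bits);
        while (padded.length() % 6 != 0) padded.append('0');
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < padded.length(); i += 6)
            result.add(Integer.parseInt(padded.substring(i, i + 6), 2));
        return result;
    }

    public static void main(String[] args) {
        // prints [16, 38, 5, 43, 25, 23, 8]
        System.out.println(indices("0100001001100001011010110110010101110010"));
    }
}
```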

In the String "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    Character number 0 is A
Hence
    Character number 16 is Q
    Character number 38 is m
    Character number 5 is F
    Character number 43 is r
    Character number 25 is Z
    Character number 23 is X
    Character number 8 is I

So far, our Base64 String is: QmFrZXI
Look at the above steps, and recall that we appended 2 zeroes to 0010 and made it 001000.
    Since we appended 2 zeroes, we will append 1 equals sign ('=') to our Base64 string QmFrZXI and make it QmFrZXI=
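
The padding rule depends only on how many bytes the input has, so it can be worked out without building the binary string at all. A small sketch, with a hypothetical class Base64Padding that is not part of the original code:

```java
// Hypothetical helper class, for illustration only.
class Base64Padding {

    // Number of zero bits appended in step 3 for an input of byteCount bytes.
    static int zeroBits(int byteCount) {
        int totalBits = byteCount * 8;
        return (6 - totalBits % 6) % 6; // always 0, 2 or 4
    }

    // One '=' for every 2 appended zero bits.
    static String padding(int byteCount) {
        return "==".substring(0, zeroBits(byteCount) / 2);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++)
            System.out.println(n + " byte(s): " + zeroBits(n)
                    + " zero bits, padding \"" + padding(n) + "\"");
    }
}
```

For the 5 bytes of "Baker" this gives 2 zero bits and a single '=', and the pattern repeats every 3 bytes.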


Decoding is just the reverse of these steps. That is it!
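
To make "the reverse" a little more concrete, here is one possible sketch of decoding. It looks up each character's 6-bit value, collects the bits in an int, and writes out a byte whenever 8 bits are available. The class name ReverseSteps is hypothetical, and the sketch assumes that the decoded bytes are UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class ReverseSteps {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          + "abcdefghijklmnopqrstuvwxyz"
          + "0123456789+/";

    static String decode(String encoded) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int buffer = 0;   // bits collected but not yet written out
        int bitCount = 0; // how many bits in buffer are valid
        for (char c : encoded.toCharArray()) {
            if (c == '=') break; // padding marks the end of the data
            int value = ALPHABET.indexOf(c);
            if (value < 0)
                throw new IllegalArgumentException("Not a Base64 character: " + c);
            buffer = (buffer << 6) | value;
            bitCount += 6;
            if (bitCount >= 8) {
                bitCount -= 8;
                bytes.write((buffer >> bitCount) & 0xFF);
                buffer &= (1 << bitCount) - 1; // keep only the leftover bits
            }
        }
        // Any leftover bits are the zeroes appended in step 3, so they are dropped.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("QmFrZXI=")); // prints Baker
    }
}
```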

A very simple Java program for the encoding steps above is as follows:


public class SampleBase64 {

    public static void main(String[] args) {
        String sample = "Baker";
        System.out.println(encode(sample));
    }
    
    private static String base64 =
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                      + "abcdefghijklmnopqrstuvwxyz"
                      + "0123456789+/";

    public static String encode(String sample){
        String bin="";
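        // Use the UTF-8 bytes, not chars, so that non-ASCII text encodes correctly
        for(byte b:sample.getBytes(java.nio.charset.StandardCharsets.UTF_8))
            bin+=leftPadding(Integer.toBinaryString(b&0xFF),"0",8);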
        String bin_padded=rightPadding(bin,"0",bin.length()+((6-bin.length()%6)%6));
        sample="";
        for(int i=0;i<bin_padded.length();i+=6)
            sample+=base64.charAt(Integer.parseInt(bin_padded.substring(i,i+6), 2));
        return rightPadding(sample, "=", sample.length()+((bin_padded.length()-bin.length())/2));
    }

    private static String leftPadding(String string,String pad,int to_length){
        to_length-=string.length();
        while(--to_length>-1)
            string=pad+string;
        return string;
    }

    private static String rightPadding(String string,String pad,int to_length) {
        to_length-=string.length();
        while(--to_length>-1)
            string+=pad;
        return string;
    }

}



Here is the not-so-optimized code in Java for the Base64 encoding and decoding:

    /**
     * This class has been programmed to educate other programmers
     * about steps involved in the Base64 algorithm.
     * For industrial purpose, use a standard fast implementation,
     * from a trusted vendor.
     * @author Abhishek Oza
     */
    public class SlowBase64 {

        /**
         * The first 64 digits in a number system with base=64digits,
         * for Base64 encoding
         */
        private static String radixBase64=
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                + "abcdefghijklmnopqrstuvwxyz"
                + "0123456789"
                + "+/";

        /**
         * Encodes the UTF-8 bytes of a string to their Base64 value.
         * Output is similar to output of this command on Linux
         * (in a UTF-8 terminal):
         * printf "<string>" | base64
         * @param string
         * @return Base64 Encoding of parameter string
         */
        public static String encode(String string){
            String whole_binary = "";
            // Work on UTF-8 bytes, not chars, so that non-ASCII text encodes correctly
            for(byte b:string.getBytes(java.nio.charset.StandardCharsets.UTF_8)){
                String byte_to_binary = Integer.toBinaryString(b&0xFF);
                while(byte_to_binary.length()<8)
                    byte_to_binary="0"+byte_to_binary;
                whole_binary+=byte_to_binary;
            }
            string="";
            String suffix="";
            for(int i=0;i<whole_binary.length();i+=6){
                String six_binary_digits = whole_binary.substring(i, Math.min(i+6, whole_binary.length()));
                // The last part may be short: pad with zeroes, one '=' per 2 zeroes
                while(six_binary_digits.length()<6){
                    six_binary_digits+="00";
                    suffix+="=";
                }
                string+=radixBase64.charAt(Integer.parseInt(six_binary_digits,2));
            }
            return string+suffix;
        }

        /**
         * Decodes a Base64-encoded string back to the original UTF-8 string.
         * Gives anomalous result if the input is not valid Base64.
         * Output is similar to output of this command on Linux:
         * printf "<string>" | base64 -d
         * @param string
         * @return Base64 Decoding of parameter string
         */
        public static String decode(String string){
            String binary_string="";
            for(char c:string.toCharArray()){
                if(c=='=')
                    break;
                String char_to_binary = Integer.toBinaryString(radixBase64.indexOf(c));
                while(char_to_binary.length()<6)
                    char_to_binary="0"+char_to_binary;
                binary_string+=char_to_binary;
            }
            if(string.endsWith("=="))
                binary_string=binary_string.substring(0, binary_string.length()-4);
            else if(string.endsWith("="))
                binary_string=binary_string.substring(0, binary_string.length()-2);
            // Rebuild the original bytes, then read them as UTF-8 text
            byte[] bytes = new byte[binary_string.length()/8];
            for(int i=0;i<bytes.length;i++){
                String eight_binary_digits = binary_string.substring(i*8, i*8+8);
                bytes[i]=(byte)Integer.parseInt(eight_binary_digits,2);
            }
            return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
        }

        /**
         * Java-Main function for testing the functions encode(string), and decode(string).
         * Should be used like this on command line:
         * java SlowBase64 [<argument>]...
         * where each argument is a string to be tested.
         * @param args 
         */
        public static void main(String[] args) {
            for(String arg:args){
                System.out.println("Input   String: "+arg);
                System.out.println("Encoded String: "+(arg=encode(arg)));
                System.out.println("Decoded String: "+(arg=decode(arg)));
                System.out.println("----------------------------------");
            }
        }

    }


This is how the code was executed on command line:
 javac SlowBase64.java
 java SlowBase64 Baker : "Hi Joe!" Luke is right.
Output:
Input   String: Baker
Encoded String: QmFrZXI=
Decoded String: Baker
----------------------------------
Input   String: :
Encoded String: Og==
Decoded String: :
----------------------------------
Input   String: Hi Joe!
Encoded String: SGkgSm9lIQ==
Decoded String: Hi Joe!
----------------------------------
Input   String: Luke
Encoded String: THVrZQ==
Decoded String: Luke
----------------------------------
Input   String: is
Encoded String: aXM=
Decoded String: is
----------------------------------
Input   String: right.
Encoded String: cmlnaHQu
Decoded String: right.
----------------------------------
As you can see from the name of the class, this is a slow version of the Base64 algorithm. A faster version would be a better option for commercial use. I will try to write one too, and put it up on this weblog. Keep watching.
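
One common way to make an encoder faster is to skip the strings of '0' and '1' altogether and work on the bits directly with shifts and masks, three input bytes (24 bits, four output characters) at a time. The following is only a minimal sketch under that assumption, with a hypothetical class name ShiftBase64; it is not a tuned implementation. On Java 8 and later the JDK also ships java.util.Base64, which is used here only to cross-check the result.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical class, for illustration only.
class ShiftBase64 {

    private static final char[] ALPHABET =
            ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           + "abcdefghijklmnopqrstuvwxyz"
           + "0123456789+/").toCharArray();

    static String encode(String text) {
        byte[] in = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder((in.length + 2) / 3 * 4);
        for (int i = 0; i < in.length; i += 3) {
            int remaining = Math.min(3, in.length - i);
            // Pack up to three bytes into one 24-bit block; missing bytes stay zero.
            int block = (in[i] & 0xFF) << 16;
            if (remaining > 1) block |= (in[i + 1] & 0xFF) << 8;
            if (remaining > 2) block |= (in[i + 2] & 0xFF);
            // Cut the block into four 6-bit values; '=' replaces the unused ones.
            out.append(ALPHABET[(block >> 18) & 0x3F]);
            out.append(ALPHABET[(block >> 12) & 0x3F]);
            out.append(remaining > 1 ? ALPHABET[(block >> 6) & 0x3F] : '=');
            out.append(remaining > 2 ? ALPHABET[block & 0x3F] : '=');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "Baker";
        System.out.println(encode(sample)); // prints QmFrZXI=
        // Cross-check against the JDK's own encoder (Java 8 and later)
        System.out.println(Base64.getEncoder()
                .encodeToString(sample.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Both lines should print the same text for any input string.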