How To Write Efficient Programs in Java
Introduction
This article is devoted to the performance of Java applications. It addresses
several types of questions: first, identifying the constraints that must be
taken into account besides performance; then, the principles that guide the
efficient distribution of processing; and finally, the optimization of that
processing in the case of a database and its interaction with the rest of the
application.
From the outset, we must recognize that the performance of a program depends
on several factors, listed here in no particular order: 1) quality of the
hardware and the network, 2) operations and requests from users, 3) quality of
the design of the database and its tables, 4) DBMS performance, 5) performance
of the application server, 6) performance of the database access drivers and
O/R mapping tools, 7) performance of the libraries used to develop the client
programs, 8) software architecture and distribution of processing,
9) algorithms used in the processing, 10) choices made during programming.
It is important to note that in programs, performance is not an end in
itself; it is only a means. Today, workstations are very powerful. Most often,
a program running with a single logged-in user will therefore offer sufficient
speed even with algorithms that are poor in terms of performance.
Unfortunately, there are cases where we need efficient algorithms. This may
be because we are writing a server program (servlet, EJB, etc.) that must
handle the load of many clients. It may also be due to the need to handle huge
databases, or to perform complex calculations requiring the consideration of a
large number of cases. Such cases are very common in optimization problems.
We leave it to the reader to discover, in algorithms courses, how to write
good algorithms and how to calculate their complexity. We will certainly review
a few algorithmic principles, but we place ourselves especially in the context
of "how to write efficient programs in Java."
As I said above, performance is not an end in itself. It is therefore
generally advisable for programmers not to try, at first, to write efficient
programs, but programs that work. I actually think that there are two extremes
to avoid. The first is wanting from the outset to make high-performance
programs. The second is wanting simply to make a program that works.
The Main Errors
Wanting From the Outset to Make High-Performance Programs
This is a mistake that raises at least three problems. The first is that you
lose a lot of time trying to find the best algorithms, and time is precisely
what we need most. Furthermore, the best algorithms are often very complex and
subtle, and therefore more difficult to debug. For example, any young
programmer can easily understand selection sort and debug it when it does not
work; but for quicksort, it is a different story. The gain provided by a
better algorithm does not always justify the effort needed to get there.
The second problem is that the fastest algorithms for a given problem are
those that exploit all the specifics of that problem. They therefore have a
rather frozen form and cannot be used as such for other problems. It is thus
difficult to reuse the code of the best algorithms. However, in many cases, it
will be more important to reuse code than to achieve certain levels of
performance.
The third problem is that in order to truly optimize, you should know where
the time is lost. This is information you can sometimes get by analyzing the
algorithm, but most often you get it by running the algorithm. For example, by
analyzing an algorithm, you may have an idea that would allow a portion of it
to run 100 times faster, and you then make big changes to get there. But if
that portion was executed in 1 ms before, by going 100 times faster you have
gained less than 1 ms. If it is called only once in your program, we can
consider that in fact you have gained nothing.
Does this mean that we should not worry about performance at the beginning?
No, it does not.
Wanting Simply to Make a Program That Works
Personally, I have several times had to rewrite programs because, from the
beginning, I just wanted something that works. I actually got programs that
worked, but were so slow that they were ultimately unusable. When rewriting, I
put in simple optimizations that I could have thought of from the beginning
and that did not really complicate the final program. You will often have to
rewrite if, at the outset, you do not try to do it well. Therein lies the
distinction: it is not about making the best from the start, but about doing
it well; writing efficient programs without them necessarily being the best
programs. We are not looking for perfection, but efficiency.
Writing Efficient Programs: Algorithmic Techniques
Search the Literature
The first technique is naturally to look whether the literature, or existing
programs, already contain a simple and efficient solution to your problem. You
may then find algorithms that you will program, or that are even already
programmed.
Select Appropriate Data Structures
If you must write an algorithm, select the appropriate data structures for
manipulating collections. The Java collections API provides many collections
that you can use in your programs, so you should learn to use them. Here are
just a few general considerations for choosing among them:
Your Problem | Suitable Structure
Lookup by a key (name, address, number, etc.); no need to manage the order of
items in the structure | HashMap, Hashtable
Lookup by a key; need to know the insertion order of elements, but no
reorganization of the structure | HashMap + ArrayList (or Vector)
Lookup by a key; need access to the least recently accessed (or inserted)
element | LinkedHashMap (or Hashtable + doubly linked list, or an LRU cache)
Searches and many insertions, with a need to maintain order | TreeSet
Access by position, insertion at the end of the structure | ArrayList, Vector
Insertions anywhere, no searching | Linked lists (LinkedList)
Access by position, no insertions needed; size of the structure fixed at
creation | Arrays
Table representing boolean options (true or false) | BitSet
Use BitSet to reduce memory usage when possible. As an example, we will
quickly see how to use a BitSet in the simulation of a problem called the
"family problem." The statement of this problem is as follows: a family
consists of n people who sit once a day around a table of n chairs, each time
selecting the chairs perfectly at random. On average, after how many days will
each member have sat on all the chairs? To solve the problem by simulation, we
could associate to each member a vector of n integers: position i of the
vector has the value 1 if the member has already sat on chair i, and 0
otherwise. The simulation ends when the vector of each member is set to 1 in
all positions.
It is clear that using a BitSet instead of a vector saves a lot of space,
since the space used will in the end correspond to n² bits rather than n²
integers.
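The simulation described above can be sketched as follows. This is a minimal
illustration, not the article's own code: the class name FamilyProblem, the
helper methods, and the use of java.util.Random are all assumptions made for
the example.

```java
import java.util.BitSet;
import java.util.Random;

public class FamilyProblem {
    // Simulates the "family problem": n people sit once a day on n chairs,
    // chosen at random; returns the number of days until every member has
    // sat on every chair. Each member gets one BitSet of n bits (one bit
    // per chair) instead of a vector of n integers.
    static int simulate(int n, Random rnd) {
        BitSet[] seen = new BitSet[n];
        for (int i = 0; i < n; i++) seen[i] = new BitSet(n);
        int[] chairs = new int[n];
        for (int i = 0; i < n; i++) chairs[i] = i;
        int days = 0;
        int complete = 0;                 // members whose BitSet is full
        while (complete < n) {
            shuffle(chairs, rnd);         // random seating for the day
            days++;
            for (int member = 0; member < n; member++) {
                BitSet s = seen[member];
                if (s.cardinality() < n) {
                    s.set(chairs[member]);
                    if (s.cardinality() == n) complete++;
                }
            }
        }
        return days;
    }

    // Fisher–Yates shuffle: a uniformly random permutation of the chairs.
    static void shuffle(int[] a, Random rnd) {
        for (int i = a.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        System.out.println("n=6: " + simulate(6, new Random(1)) + " days");
    }
}
```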
A naive simulation of this problem leads to a solution that runs at least in
O(n³); however, it is possible to do much better. Note here that it is
possible to combine several data structures as needed. This implies that each
manipulated object is inserted into each of the data structures. One might
fear a duplication of data, but there is nothing of the sort: Java handles
object references, so when you place an object in a vector and in a hash
table, only references to that object are stored; the object itself is not
duplicated. Therefore, any change made to the object through one structure is
immediately visible to whoever accesses the object through another structure,
because all the structures refer to the same object.
Select the Appropriate Sorting Algorithms
In many problems, it is necessary to sort data. The standard Java API
provides a stable and efficient merge sort, running in O(n log n). This sort
can be used in most cases. However, in some problems where sorts are
recurring, it is sometimes possible to use a more efficient sort. We can then
consider counting sort, bucket sort, or radix sort, which are linear sorts.
For example, suppose that we have to sort the individuals of a population of
students according to their age. It is reasonable to assume that the age of a
student is between 15 and 100 years. In this case, counting sort will be
particularly effective.
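A minimal sketch of counting sort for this student-age example follows; the
class and method names are illustrative, and the age bounds [15, 100] are
taken from the text above.

```java
public class CountingSortByAge {
    // Sorts ages in the range [15, 100] in O(n + k) time, where
    // k = 100 - 15 + 1 is the number of possible age values.
    static int[] sortAges(int[] ages) {
        final int MIN = 15, MAX = 100;
        int[] count = new int[MAX - MIN + 1];
        for (int age : ages) count[age - MIN]++;   // tally each age
        int[] sorted = new int[ages.length];
        int pos = 0;
        for (int a = 0; a < count.length; a++)     // emit ages in order
            for (int c = 0; c < count[a]; c++)
                sorted[pos++] = a + MIN;
        return sorted;
    }
}
```

Note that no comparisons between elements are performed, which is why the
running time can beat the O(n log n) bound of comparison sorts.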
We may also have to write a program on a graph with a large number of nodes,
each with a number of neighbors not exceeding a relatively small integer
(e.g. 40), where we apply some processing to the nodes having the smallest
number of neighbors. In some cases, it may be useful to sort the nodes
according to their number of neighbors. The use of a linear sort here will be
very beneficial if the sorting is done often.
Reduce the Creation of Objects
In Java, the creation of an object is a time-consuming operation. So, as far
as possible, it is recommended to create as few objects as possible; every
effort should be made to reuse objects. Note that this does not mean creating
as few classes as possible.
An object is created by calling a constructor, or a method that calls one.
Declaring an object variable or writing a new class does not create objects.
The goal is essentially to minimize the number of calls to the new operator
in the program.
Some Cases: Exceptions
Rather than propagating exceptions in your code by writing throw new
Exception(msg) every time, you can write code similar to this:

/* declaration, somewhere in the class, of a variable of the type of the
   exception to propagate on error */
BadValueException badValue = new BadValueException();

/** This method is intended to propagate a BadValueException */
public void testValue() throws BadValueException {
    // here an error occurred and we propagate the exception with the message msg
    badValue.setMessage(msg); /* if this method does not exist, write it in the
                                 BadValueException class */
    throw badValue; /* magic! we do not create a new object, even if this error
                       occurs 1000 times. Of course, if you must handle
                       concurrency, it may be necessary to use caches to avoid
                       mixing messages for different users (see caches below) */
}
Events
The creation of events in the Java model can be very time-consuming. So,
when you want to do mass processing on objects that have event listeners
attached, and the processing does not require the listeners to run, it is
better to disable them first in order to avoid creating a large number of
event objects, which would slow down the program and saturate the memory.
Deactivation can be done through a method provided for this purpose, or by
removing the listener before the processing and restoring it afterwards.
For example, if you use JBuilder and you want to browse a dataset that
contains many records (simply reading these records, or performing processing
that should not involve the listeners attached to the dataset), you should
proceed as follows:

// our dataset is called ds
ds.enableDataSetEvents(false); // cancels the creation and propagation of events
try {
    makeAllProcesses(); // all the processing you have to do (loops, etc.)
} catch (Exception exp) {
    // handle the exception; if necessary, propagate it with throw exp;
} finally {
    ds.enableDataSetEvents(true); // return the dataset to its normal state
}
Connections
See the use of caches later in this article.
The StringBuffer and StringBuilder
Imagine an object with a toString method:

public String toString() {
    StringBuffer bf = new StringBuffer();
    // processing that fills the buffer
    // ...
    // return the result
    return bf.toString();
}

If this method will be called many times on the same object, it can be
improved as follows:

StringBuffer bf = new StringBuffer();

public String toString() {
    bf.setLength(0); // empty the buffer
    // processing that fills the buffer
    // ...
    // return the result
    return bf.toString();
}

The difference here is that the StringBuffer object is created once, and
then we simply reuse it.
Allow Setting Attributes Outside Constructors
Ideally, each of your classes should have a no-argument constructor (there
may be others that accept arguments, of course). Users of these classes
should be able to set the properties through setXXX and getXXX methods. With
this, an object can be created once and reused by changing its properties. In
some cases, an init method may be provided to reset the object. Not allowing
this forces the users of your classes to create many objects.
The list is not exhaustive and these examples should just inspire you.
Use Object Pooling
A pool is a collection in which you store the objects you work with. When
you need an object, you ask the pool for one. If there is an unused object,
the pool passes it to you; otherwise, it creates a new object and passes it
to you. When you have finished, you return the object to the pool so that it
can hand it to others, or you simply destroy it. In fact, you should limit
the number of objects that the pool may contain, otherwise it could saturate
the memory with stored, unnecessary items: there may be a period of heavy
demand, which would lead the pool to create a large number of objects; if the
demand does not stay at the same level, several objects will remain in the
pool and unnecessarily occupy memory.
The MDAL library provides a Pool class that allows you to manage object
pooling.
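The idea can be sketched as follows. This is a single-threaded illustration,
not the MDAL Pool class; the names SimplePool, acquire, and release are
invented for the example, and it uses Java 8 constructs (Supplier, lambdas)
that postdate the JDK versions discussed in this article.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// A minimal, single-threaded object pool. The maximum idle size bounds how
// many unused objects are kept, so the pool cannot saturate the memory.
public class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    public SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    // Hand out an unused object if one is available, otherwise create one.
    public T acquire() {
        return idle.isEmpty() ? factory.get() : idle.pop();
    }

    // Return an object to the pool; discard it if the pool is already full.
    public void release(T obj) {
        if (idle.size() < maxIdle) idle.push(obj);
        // otherwise let the garbage collector reclaim it
    }
}
```

A typical use: SimplePool<StringBuilder> pool = new
SimplePool<>(StringBuilder::new, 10); objects taken with acquire() are given
back with release() instead of being recreated.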
Collections
Several collections have methods that empty them (clear, etc.). When you
need to work on several collection objects, it is better, where possible, to
use a single object and empty it in order to reuse it (as was done above with
the StringBuffer), rather than creating several.
The StringTokenizer
Using indexOf and substring is faster than using a StringTokenizer, where
applicable.
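As a sketch, a splitting routine written with indexOf and substring might
look like this; the class and method names are invented for the example, and
note that, unlike StringTokenizer, this version keeps empty tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class FastSplit {
    // Splits s on a separator character using indexOf/substring, avoiding
    // the StringTokenizer object and its per-token overhead.
    static List<String> split(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0, pos;
        while ((pos = s.indexOf(sep, start)) >= 0) {
            parts.add(s.substring(start, pos)); // token before the separator
            start = pos + 1;                    // continue after the separator
        }
        parts.add(s.substring(start));          // last token
        return parts;
    }
}
```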
Use Caches
We could define a cache as an object in which we store the result of a
time-consuming operation, so that we do not have to repeat that operation if
the result is needed again. In general, an object in a pool is supposed to be
used exclusively by the client who takes it from the pool, while the objects
placed in a cache must be able to be accessed simultaneously by as many
clients as needed.
Since in many problems there are always operations that recur, caches are
one of the most effective ways to improve the performance of computer
systems.
Some Examples of Using Caches
Internet Access Cache
Used by a browser, it is an object (or space) in which recently requested
Web files (pages, images, etc.) are stored. When one of these files is
requested again, it is recovered directly from the cache, eliminating the
time needed to download it from the Internet. Such a cache often has two
levels: a space in memory and a space on disk. Memory access being faster,
this saves time.
Memory Cache of the Computer
It allows faster access than main memory to recent data, and therefore
saves time.
Connection Pools
The creation of objects is a time-consuming operation, all the more so when
it comes to connections (to a database, for example). So, to save time, one
strategy is to keep the connections that have already been created, so that
they can be reused for other requests.
Caching Search Results
Imagine a program that performs combinatorial searches for solutions to a
problem. One could cite games, but for a more serious example let us mention
the glass-cutting problem: we look for how to cut the glass plates in stock
to produce an item. In such a problem, we can imagine cutting three plates in
a first pass, then computing all the possible cuts of the second plate. If we
do not find a satisfactory solution to the problem, we may consider instead
the possibility of cutting four plates in the first pass. We do not wish to
re-calculate all the possible cuts of the first plate; so it would be good to
store that result as soon as it is obtained for the first time.
Generally, as soon as the result of a long process may need to be reused,
one must seriously consider the possibility of storing it in a cache.
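The idea of storing a result the first time it is computed can be sketched
with a simple map-based cache. Everything here is illustrative: CutCache,
cutsOf, and expensiveCuts are invented names, and the factorial computation
merely stands in for a real expensive step such as enumerating the cuts of a
plate.

```java
import java.util.HashMap;
import java.util.Map;

public class CutCache {
    private final Map<Integer, Long> cache = new HashMap<>();
    int computeCalls = 0; // counts how often the expensive step really runs

    // Returns the (placeholder) result for a plate, computing it at most once.
    long cutsOf(int plate) {
        Long cached = cache.get(plate);
        if (cached != null) return cached;   // cache hit: no recomputation
        long result = expensiveCuts(plate);  // cache miss: compute and store
        cache.put(plate, result);
        return result;
    }

    // Stand-in for a long computation (here: a factorial).
    private long expensiveCuts(int plate) {
        computeCalls++;
        long r = 1;
        for (int i = 2; i <= plate; i++) r *= i;
        return r;
    }
}
```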
Cache Problems
The Problem of Cache Efficiency
A cache uses memory to save time. However, memory is not infinite. It
becomes necessary to decide what can be put in the cache, or what must be
evicted when the cache is full and there is a new object to insert.
Several strategies exist for managing a cache:
LRU: least recently used
LFU: least frequently used
FIFO: first in, first out
LIFO: last in, first out
RANDOM: random management
Custom: a particular strategy obeying criteria other than those above.
The problem of cache management is to find the cache strategy that best
improves the effectiveness of the program. The difficulty is to avoid
evicting from the cache time-costly results that will be requested again
immediately afterwards (we would have to redo the operation that produced
them), or putting into the cache an object that will rarely be requested.
Naturally, the problem can be represented mathematically; I leave that to
the mathematicians.
Just note that if you have to manage caches, it has been observed that, of
the strategies mentioned above, LRU and LFU generally give the better
results.
LRU is very easy to implement in an effective manner (yes, cache management
can itself become a source of reduced performance), so you can use it. Since
JDK 1.4, the LinkedHashMap class allows us to do it.
The Problem of Cache Relevance
For some problems, it may happen that the cached data no longer reflects
reality. For example, a cached web page may differ from the current version
of the page; the cached results of a query may no longer be accurate if the
database has changed. If this can happen, it will be necessary to develop a
strategy for automatically detecting changes in the original in order to
update the cache (for example, asking for the last-modified date of a file
and comparing it to what is in the cache), or any other strategy that keeps
the cached data fresh.
Proper management of caches is certainly one of the best sources of program
performance.
Writing Efficient Programs: Programming Techniques
Use the Latest Version of the JRE
Use Profilers
A profiler is a tool that monitors the execution of a program and thus
determines how much time the program spends in each method. With a profiler,
you quickly determine the 20% of the code in which the program spends 80% of
its time, and you can put in place an effective optimization strategy based
solely on that part of the code.
Java provides an integrated profiler. To learn how to use it, use the
command:

java -Xrunhprof:help

An example of using this command is:

java -Xrunhprof:cpu=samples,depth=5 com.developp.demo1.TestHprof

where depth=5 requests a stack of depth 5. By default, the output of the
profiler goes into the java.hprof.txt file. There you will find a table
showing the percentage of time spent by the program in each part. You can use
the file more easily by importing it into Excel. There are also many tools on
the market specializing in profiling (profilers). They will help you identify
both the parts that take the most time and those that occupy the most memory.
Failing a real profiler, the System.currentTimeMillis() method is useful to
know the current time; the difference between the values of two calls of this
method gives the time elapsed between the two calls.
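The currentTimeMillis technique can be sketched as follows; the class name
Timing is invented, and the summing loop merely stands in for the real work
being measured.

```java
public class Timing {
    // Measures elapsed wall-clock time around a piece of code using
    // System.currentTimeMillis(); returns the elapsed time in milliseconds.
    static long timeWork(int n) {
        long start = System.currentTimeMillis();
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;  // the code being measured
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("loop of " + n + " iterations took "
                + elapsed + " ms (sum=" + sum + ")");
        return elapsed;
    }
}
```

Keep in mind that millisecond resolution is coarse; very short pieces of
code may need to be run many times in a loop to be measured this way.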
Initialize Collections
Java's dynamic collections (Vector, ArrayList, StringBuffer, etc.) very
often use arrays internally, which are resized whenever the collection is
full. When you use the no-argument constructor to create one of these
collections, the initial array is created with a size that depends on the
collection (see the Java API to find out). Whenever the collection is full, a
new array is created with twice the size of the old one, and the data is
copied from the old array to the new one. This process can be time-consuming;
if you have an idea of the maximum size that your collection will reach, you
should avoid it by directly creating a collection of that size (especially if
you are sure that in almost all cases your collection will eventually reach
that size). Otherwise, use the average maximum size that your collection
reaches.
// So if, for example, you want to create a vector that will contain 10,000
// items, do not create it with:
Vector v = new Vector(); /* this may cause tens of recopies and slow down your
                            whole program */
// Create it instead with:
Vector v = new Vector(10000); // no recopies will be made

/* If in fact the vector only very rarely reaches 10,000 items, but its
   average size is around 4,000 elements, use: */
Vector v = new Vector(4000); /* some recopies, but this avoids needlessly
   wasting space too often. If the most critical factor is time, it is not
   certain that Vector(10000) is faster, because the time needed to create
   the larger vector must also be taken into account. */
Properly Manage Synchronization
The synchronized keyword can slow down threads by forcing some to wait while
the others are not performing operations that interfere with them.
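One common remedy is to synchronize only the code that really touches shared
state, rather than a whole method. The sketch below illustrates this; the
Counter class and the computeWeight placeholder are invented for the example.

```java
public class Counter {
    private int count = 0;
    private final Object lock = new Object();

    public void recordEvent() {
        // Work that does not touch shared state stays outside the lock,
        // so other threads are not forced to wait for it.
        int weight = computeWeight();
        synchronized (lock) {      // lock only the shared-state update
            count += weight;
        }
    }

    public int getCount() {
        synchronized (lock) { return count; }
    }

    private int computeWeight() { return 1; } // placeholder for real work
}
```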
Using Threads
Even on a single processor, the use of threads can improve the overall
performance of a program if there are a large number of accesses to the hard
disk, the network, or the screen. When a thread is performing one of these
operations, it frees up the CPU, which can then be used by another thread.
Copy Arrays Effectively
Suppose we have arrays temp and index, and we want to copy nb elements of
temp, taken from position a, into the beginning of index. This could be done
with the following loop:

for (int j = 0; j < nb; j++) index[j] = temp[a + j];

It is faster to write:

System.arraycopy(temp, a, index, 0, nb);

Generally, using System.arraycopy to copy one array into another allows you
to go faster. You can also turn your collections into arrays of the type
they contain with the toArray method.
Use Buffers
To concatenate strings, use the StringBuffer class.

// Rather than, for example:
String s = "a";
int n = 100;
for (int i = 0; i < n; i++) s += "a";

// Do:
StringBuffer bf = new StringBuffer(101); /* initialize collections at the
                                            proper size to avoid recopies */
bf.append("a");
for (int i = 0; i < n; i++) bf.append("a");
The use of StringBuffer is recommended when the object can be manipulated by
several threads. When the object is handled by a single thread (which is
often the case), it is better to use the StringBuilder class (since
JDK 1.5): unlike StringBuffer, this class is not synchronized and is
therefore faster.
For stream input/output, use BufferedReader, BufferedOutputStream, and the
other classes that use buffers. Remember that the initial size of your buffer
matters (preferably a power of 2 for reading input/output streams).
For file handling, the java.nio API achieves better performance than java.io
in quite a few cases.
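A sketch of buffered file I/O follows; the class name BufferedCopy is
invented, the file names are supplied by the caller, and the explicit buffer
size parameter illustrates the sizing advice above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedCopy {
    // Copies a text file line by line through buffered streams; the buffer
    // size (preferably a power of 2) is passed explicitly.
    static void copy(String from, String to, int bufSize) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(from), bufSize);
             BufferedWriter out = new BufferedWriter(new FileWriter(to), bufSize)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }
}
```

Without the buffering wrappers, each read and write would go to the
operating system individually, which is exactly the kind of repeated costly
operation this article recommends avoiding.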
Move Declarations Out of Loops

// Rather than:
for (int i = 0; i < n; i++) {
    double y = 2 * i + 4;
    // ...
}

// Make:
double y;
for (int i = 0; i < n; i++) {
    y = 2 * i + 4;
    // ...
}
Accelerate the Process of Freeing Memory
When you no longer need an object, immediately set its reference to null;
the garbage collector will then quickly release the memory occupied by the
object.
You can also trigger the garbage collector to free the memory of unused
objects with the following code:

Runtime r = Runtime.getRuntime();
r.gc();

written in a special thread that you create for the occasion (and not in
your main program). This prevents the memory from being saturated with
unnecessary objects. But use it sparingly, because the garbage collector (GC)
is itself a process, and its execution also requires the resources of the
machine. Normally you do not need to call the GC explicitly in your program,
because it knows how to do its job. However, in some cases of very fast
object creation, we have observed that the GC does not always eliminate in
time the objects set to null, which leads to the saturation of the memory.
Presumably this is due to the very low priority of the GC compared to other
processes. In such cases, it may be necessary not only to force the call to
the GC, but also to mark a short pause in the object-creation process, giving
the GC time to reclaim enough memory so that it is not saturated by the fast
creation of objects. For such programs, it may be necessary to monitor the
amount of available memory by calling the freeMemory() method of the Runtime
class (Runtime.getRuntime().freeMemory()), which returns the amount of memory
available to the JVM, and to call the GC when it falls below a critical
threshold.
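The freeMemory-plus-threshold idea can be sketched as follows. The class
name MemoryGuard, the threshold parameter, and the 50 ms pause are all
illustrative choices; note also that Runtime.gc() is only a hint to the JVM,
not a guaranteed collection.

```java
public class MemoryGuard {
    // If free memory falls below the threshold, suggest a GC run and pause
    // briefly so the collector has time to make progress.
    static void checkMemory(long thresholdBytes) {
        Runtime rt = Runtime.getRuntime();
        if (rt.freeMemory() < thresholdBytes) {
            rt.gc();                       // request a collection (a hint only)
            try {
                Thread.sleep(50);          // short pause for the GC
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Such a check would be called periodically from the loop that creates
objects quickly, as described above.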
To increase the amount of memory available to your Java program, use at
launch the command:

java -Xms1024m

for example, for an initial heap of 1 GB (1024 MB); -Xmx sets the maximum
heap size in the same way. In the case of a launch by JNLP, use the
initial-heap-size and max-heap-size attributes to define, respectively, the
initial and maximum size of the memory used by the program. We should have
something like this:

<j2se version="1.5+" href="http://java.sun.com/products/autodl/j2se"
      initial-heap-size="256m" max-heap-size="512m"/>