This MedLibrary.org supplementary page on Algorithmic efficiency is provided directly from the open source Wikipedia as a service to our readers. Please see the note below on authorship of this content, as well as the Wikipedia usage guidelines. To search for other content from our encyclopedia supplement, please use the form below:
Related Sponsors
| This article does not cite any references or sources. Please help improve this article by adding citations to reliable sources. Unverifiable material may be challenged and removed. (October 2008) |
In computer science, efficiency is used to describe properties of an algorithm relating to how much of various types of resources it consumes. The two most frequently encountered are
- speed or running time - the time it takes for an algorithm to complete, and
- space - the memory or 'non-volatile storage' used by the algorithm during its operation.
but also might apply to
- transmission size or external memory such as required bandwidth or disk space
The process of making code as efficient as possible is known as Optimization and in the case of automatic optimization (i.e. compiler optimization) - performed by compilers (on request or by default) - usually focus on space at the cost of speed, or vice versa. There are also quite simple programming techniques and 'avoidance strategies' that can actually improve both at the same time, usually irrespective of hardware, software or language. Even the re-ordering of nested conditional statements to put the least frequently occurring condition first (example: test patients for blood type ='AB-', before testing age > 18, since this type of blood occurs in only about 1 in 100 of the population - thereby eliminating the second test at runtime in 99% of instances), can reduce actual instruction path length, something an optimizing compiler would almost certainly not be aware of but which a programmer can research relatively easily even without specialist medical knowledge.
Speed
The absolute speed of an algorithm for a given input can simply be measured as the duration of execution (or clock time) and the results can be averaged over several executions to eliminate possible random effects. Most modern processors operate in a multi-processing & multi-programming environment so consideration must be made for parallel processes occurring on the same physical machine, eliminating these as far as possible. A relative measure of an algorithms performance can sometimes be gained from the total instruction path length which can be determined by a run time Instruction Set Simulator (where available).
An estimate of the speed of an algorithm can be determined in various ways. The most common method uses time complexity to determine the Big-O of an algorithm. See Run-time analysis for estimating how fast a particular algorithm may be according to its type (example: lookup unsorted list, lookup sorted list etc) and in terms of scalability - its dependence on 'size of input', processor power and other factors.
Memory
Often, it's possible to make an algorithm faster at the expense of memory. This might be the case whenever the result of an 'expensive' calculation is cached rather than recalculating it afresh each time. The additional memory requirement would, in this case, be considered additional overhead although, in many situations, the stored result occupies very little extra space and can often be held in pre-compiled static storage, reducing not just processing time but also allocation & deallocation of working memory. This is a very common method of improving speed, so much so that some programming languages often add special features to support it, such as C++'s 'mutable' keyword.
The memory requirement of an algorithm is actually two separate but related things:-
- The memory taken up by the compiled executable code (the object code or binary file) itself (on disk or equivalent, depending on the hardware and language). This can often be reduced by preferring run-time decision making mechanisms (such as virtual functions and run-time type information) over certain compile-time decision making mechanisms (such as macro substitution and templates). This, however, comes at the cost of speed.
- Amount of temporary "dynamic memory" allocated during processing. For example, dynamically pre-caching results, as mentioned earlier, improves speed at the cost of this attribute. Even the depth of sub-routine calls can impact heavily on this cost and increase path length too, especially if there are 'heavy' dynamic memory requirements for the particular functions invoked. The use of copied function parameters (rather than simply using pointers to earlier, already defined, and sometimes static values) actually doubles the memory requirement for this particular memory metric (as well as carrying its own processing overhead for the copying itself. This can be particularly relevant for quite 'lengthy' parameters such as html script, javascript source programs or extensive freeform text such as letters or emails.
(See also sections Choice of instruction or data type and Avoiding costs concerning deliberate 'stretching' of data structures to force them to have a fixed length which then becomes a multiple of 2,4, 8 etc.)
Transmission size
Data compression algorithms can be useful because they help reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. This however also comes at a cost - which is additional processing time to compress and subsequently decompress. Depending upon the speed of the data transfer, compression may reduce overall response times which, ultimately, equates to speed - even though processing within the computer itself takes longer. Transmission sizes of high resolution images for web pages can sometimes be acceptable at a much lower resolutions and this technique is one of several methods used by such commercial products as ONSPEED. For audio , MP3 is a compression method used extensively in portable sound systems. The efficiency of a data compression algorithm relates to the compression factor and speed of achieving both compression and decompression. For the purpose of archiving an extensive database, it might be considered worthwhile to achieve a very high compression ratio, since decompression is less likely to occur on the majority of the data.
Rematerialization
It has been argued that Rematerialization (re-calculating) may occasionally be more efficient than holding results in cache. This is the somewhat non-intuitive belief that it can be faster to re-calculate from the input - even if the answer is already known - when it can be shown, in some special cases, to decrease "register pressure". Some optimizing compilers have the ability to decide when this is considered worthwhile based on a number of criteria such as complexity and no side effects, and works by keeping track of the expression used to compute each variable, using the concept of available expressions.
Optimization techniques
Environment specific
Optimization of algorithms frequently depends on the properties of the machine the algorithm will be executed on as well as the language the algorithm is written in and chosen data types. For example, a programmer might optimize code for time efficiency in an application for home computers (with sizable amounts of memory), but for code destined to be embedded in small, "memory-tight" devices, the programmer may have to accept that it will run more slowly, simply because of the restricted memory available for any potential software optimization.
For a discussion of hardware performance, see article on Computer performance which covers such things as CPU clock speed, cycles per instruction and other relevant metrics. For a discussion on how the choice of particular instructions available on a specific machine effect efficiency, see later section 'Choice of instruction and data type'.
General techniques
- Table look-up's in particular can be very expensive in terms of execution time but can be reduced significantly through use of efficient techniques such as indexed arrays and binary searches. Using a look-up on 1st occurrence and indexed thereafter is an obvious compromise.
- Use of indexed program branching, utilizing branch tables to control program flow, (rather than using multiple conditional IF statements or unoptimized CASE/SWITCH) can drastically reduce instruction path length, simultaneously reduce program size and even also make a program easier to read and more easily maintainable (in effect it becomes a 'decision table' rather than repetitive spaghetti code).
Dependency trees and Spreadsheets
Spreadsheets are a 'special case' of algorithm that self optimize by virtue of their dependency trees that are inherent in the design of spreadsheets in order to reduce re-calculations when a cell changes. The results of earlier calculations are effectively cached within the workbook and only updated if a another cells changed value effects it directly.
Searching strings
Searching for particular text strings in long sequences of characters potentially generates lengthy instruction paths. This includes searching for delimiters in comma separated files or similar processing which can be very simply and effectively eliminated (using declarative notation for instance).
Several methods of reducing the cost for general searching have been examined and the "Boyer–Moore string search algorithm" (or Boyer–Moore–Horspool algorithm, a similar but modified version) is one solution that has been proven to give superior results to repetitive comparisons of the entire search string along the sequence [1].
Hot spot analyzers
Special system software products known as "performance analyzers" are often available from suppliers to help diagnose "hot spots" - during actual execution of computer programs - using real or test data - they perform a Performance analysis under generally repeatable conditions [2]. They can pinpoint sections of the program that might benefit from specifically targeted programmer optimization without necessarily spending time optimizing the rest of the code. Using program re-runs, a measure of relative improvement can then be determined to decide if the optimization was successful and by what amount. Instruction Set Simulators can be used as an alternative to measure the instruction path length at the machine code level between selected execution paths or on the entire execution.
Benchmarking & competitive algorithms
For new versions of software or to provide comparisons with competitive systems, benchmarks are sometimes used which assist with gauging an algorithms relative performance. If a new sort algorithm is produced for example it can be compared with its predecessors to ensure that at least it is efficient as before with known data - taking into consideration any functional improvements. Benchmarks can be used by customers when comparing various products from alternative suppliers to estimate which product will best suit their specific requirements in terms of functionality and performance. For example in the mainframe world certain proprietary sort products from independent software companies such as Syncsort compete with products from the major suppliers such as IBM for speed. Some benchmarks provide opportunities for producing an analysis comparing the relative speed of various compiled and interpreted languages for example [3] [4] and The Computer Language Benchmarks Game1 compares the performance of implementations of typical programming problems in several programming languages.
Compiled versus Interpreted languages
A compiled algorithm will, in general, execute faster than the equivalent interpreted algorithm simply because some processing is required even at run time to 'understand' (i.e. interpret) the instructions to effect an execution. A compiled program will normally output an object or machine code equivalent of the algorithm that has already been processed by the compiler into a form more readily executed by microcode or the hardware directly. The popular Perl language is an example of an interpreted language and benchmarks indicate that it executes approximately 24 times more slowly than compiled C [5].
Just-in-time compilers
'On-the-fly' processors known today as just-in-time or 'JIT' compilers combine features of interpreted languages with compiled languages and may also incorporate elements of optimization to a greater or lesser extent. Essentially the JIT compiler can compile small sections of source code statements (or bytecode) as they are newly encountered and (usually) retain the result for the next time the same source is processed. In addition, pre-compiled segments of code can be in-lined or called as dynamic functions that themselves perform equally fast as the equivalent 'custom' compiled function. Because the JIT processor also has access to run-time information (that a normal compiler can't have) it is also possible for it to optimize further executions depending upon the input and also perform other run-time introspective optimization as execution proceeds. A JIT processor may, or may not, incorporate self modifying code or its equivalent by creating 'fast path' routes through an algorithm. It may also use such techniques as dynamic Fractional cascading or any other similar runtime device based on collected actual runtime metrics. It is therefore entirely possible that a JIT compiler might (counter intuitively) execute even faster than an optimally 'optimized' compiled program.
Genetic algorithm
In the world of performance related algorithms it is worth mentioning the role of Genetic algorithms which compete using similar methods to the natural world in eliminating inferior algorithms in favour of more efficient versions.
Object code optimizers
Some proprietary program optimizers such as the "COBOL Optimizer" developed by Capex Corporation in the mid 1970's for COBOL, actually took the unusual step of optimizing the Object code (or binary file) after normal compilation. This method depended upon knowledge of 'weaknesses' in the standard IBM COBOL compiler and actually replaced (or patched) sections of the object code with more efficient code. The replacement code might replace a linear table lookup with a binary search for example or sometimes simply replace a relatively 'slow' instruction with a known faster one that was otherwise functionally equivalent within its context. For example on the IBM/360 hardware the CLI instruction was, depending on the particular model, between twice and 5 times as fast as a CLC instruction for single byte comparisons [6] [7].
Choice of instruction or data type
Particularly in an Assembler language (although also applicable to HLL statements), the choice of a particular 'instruction' or data type, can have a large impact on execution efficiency. In general, instructions that process variables such as signed or unsigned 16 bit or 32 bit integers are faster than those that process floating point or packed decimal. Modern processors are even capable of executing multiple 'fixed point' instructions in parallel with the simultaneous execution of a floating point instruction. If the largest integer to be encountered can be accommodated by the 'faster' data type, defining the variables as that type will result in faster execution - since even a non-optimizing compiler will, in-effect, be 'forced' to choose appropriate instructions that will execute faster than would have been the case with data types associated with 'slower' instructions. Assembler programmers (and optimizing compiler writers) can then also benefit from the ability to perform certain common types of arithmatic for instance - division by 2, 4, 8 etc by performing the very much faster binary shift right operations (in this case by 1,2 or 3 bits).
If the choice of input data type is not under the control of the programmer, although prior conversion (outside of a loop for instance) to a faster data type carries some overhead, it can often be worthwhile if the variable is then to be used as a loop counter, especially if the count could be quite a high value or there are many input values to process. As mentioned above, choice of individual assembler instructions (or even sometimes just their order of execution) on particular machines can effect the efficiency of an algorithm. See Assembly Optimization Tips for one quite numerous arcane list of various technical (and sometimes non-intuitive) considerations for choice of assembly instructions on different processors that also discusses the merits of each case. Sometimes microcode or hardware quirks can result in unexpected performance differences between processors that assembler programmers can actively code for - or else specifically avoid if penalties result - something even the best optimizing compiler may not be designed to handle.
Software validation versus hardware validation
An optimization technique that was frequently taken advantage of on 'legacy' platforms was that of allowing the hardware (or microcode) to perform validation on numeric data fields such as those coded in (or converted to) packed decimal (or packed BCD). The choice was to either spend processing time checking each field for a valid numeric content in the particular internal representation chosen or simply assume the data was correct and let the hardware detect the error upon execution. The choice was highly significant because in order to to check for validity on multiple fields (for sometimes many millions of input records), it could occupy valuable computer resources. Since input data fields were in any case frequently built from the output of earlier computer processing, the actual probability of a field containing invalid data was exceedingly low and usually the result of some 'corruption'. The solution was to incorporate an 'event handler' for the hardware detected condition ('data exception)' that would intercept the occasional errant data field and either 'report, correct and continue' or, more usually, abort the run with a core dump to try to determine the reason for the bad data. Similar event handlers are frequently utilized in today's web based applications to handle other exceptional conditions but repeatedly parsing data input,in order to ensure its validity before execution, has nevertheless become much more commonplace - partly because processors have become faster (and the perceived need for efficiency in this area less significant) but, predominantly - because data structures have become less 'formalized' (eg. .csv and .tsv files) or uniquely identifiable (eg. packed decimal). The potential savings using this type of technique may have therefore fallen into general dis-use as a consequence and therefore repeated data validations and repeated data conversions have become an accepted overhead. Ironically, one consequence of this move to less formalized data structures is that a corruption of say, a numeric binary integer value, will not be detected at all by the hardware upon execution (for instance: is an ASCII hexadecimal value '20202020' a valid signed or unsigned binary value - or simply a string of blanks that has corrupted it?)
Avoiding costs
- Defining variables as integers for indexed arrays instead of floating point will result in faster execution (see above).
- Defining structures whose structure length is a multiple of a power of 2 (2,4,8,16 etc), will allow the compiler to calculate array indexes by shifting a binary index by 1, 2 or more bits to the left, instead of using a multiply instruction will result in faster execution. Adding an otherwise redundant short filler variable to 'pad out' the length of a structure element to say 8 bytes when otherwise it would have been 6 or 7 bytes may reduce overall processing time by a worthwhile amount for very large arrays. See [8] for generated code differences for C as for example.
- Storage defined in terms of bits, when bytes would suffice, may inadvertently involve extremely long path lengths involving bitwise operations instead of more efficient single instruction 'multiple byte' copy instructions. (This does not apply to 'genuine' intentional bitwise operations - used for example instead of multiplication or division by powers of 2 or for TRUE/FALSE flags).
- Unnecessary use of allocated dynamic storage when static storage would suffice, can increase the processing overhead substantially - both increasing memory requirements and the associated allocation/deallocation path length overheads for each function call.
- Excessive use of function calls for very simple functions, rather than in-line statements, can also add substantially to instruction path lengths and stack/unstack overheads. For particularly time critical systems that are not also code size sensitive, automatic or manual inline expansion can reduce path length by eliminating all the instructions that call the function and return from it or .A conceptually similar method, loop unwinding, eliminates the instructions required to set up and terminate a loop by, instead; repeating the instructions inside the loop multiple times. This of course eliminates the branch back instruction but also increases the size of the binary file or, in the case of JIT built code, dynamic memory. Care must be taken with this method however that re-calculating addresses for each statement for an unwound indexed loop is not more expensive than incrementing pointers within the former loop.
Readability, trade offs and trends
One must be careful, in the pursuit of good coding style, not to over-emphasize efficiency. Frequently, a clean, readable and 'usable' design is much more important than a fast, efficient design that is hard to understand. There are exceptions to this 'rule' (such as embedded systems, where space is tight, and processing power minimal) but these are rarer than one might expect.
However, increasingly, for many 'time critical' applications such as air line reservation systems, point-of-sale applications, ATM's (cash-point machines), Airline Guidance systems, Collision avoidance systems and numerous modern web based applications - operating in a real-time environment where speed of response is fundamental - there is little alternative.
Determining if optimization is worthwhile
The essential criteria for using optimized code is of course dependant upon the expected use of the algorithm. If it is a new algorithm and is going to be in use for many years and speed is relevant, it is worth spending some time designing the code to be as efficient as possible from the outset. If an existing algorithm is proving to be too slow or memory becoming an issue, clearly something must be done to improve it.
For the average application, or for one-off applications, avoiding inefficient coding techniques and encouraging the compiler to optimize where possible may be sufficient.
One simple way (at least for mathematicians!) to determine whether an optimization is worthwhile is as follows: Let the original time and space requirements (generally in Big-O notation) of the algorithm be O1 and O2. Let the new code require N1 and N2 time and space respectively. If N1N2 < O1O2, the optimization should be carried out. However, as mentioned above, this may not always be true.
Implications for algorithmic efficiency
A recent report, published in December 2007, from Global Action Plan [9], a UK-based environmental organisation found that computer servers are "at least as great a threat to the climate as SUVs or the global aviation industry" drawing attention to the carbon footprint of the IT industry in the UK. [10] [11]. According to an Environmental Research Letters report published in September 2008, "Total power used by information technology equipment in data centers represented about 0.5% of world electricity consumption in 2005.When cooling and auxiliary infrastructure are included, that figure is about 1%. The total data center power demand in 2005 is equivalent (in capacity terms) to about seventeen 1000 MW power plants for the world."[12]
Computers having become increasingly more powerful over the past few decades, emphasis was on a 'brute force' mentality. This may have to be reconsidered in the light of these reports and more effort placed in future on reducing carbon footprints through optimization. It is a timely reminder that algorithmic efficiency is just another aspect of the more general thermodynamic efficiency. The genuine economic benefits of an optimized algorithm are, in any case, that more processing can be done for the same cost or that useful results can be shown in a more timely manner and ultimately, acted upon sooner.
See also
- Animation of the Boyer-Moore algorithm
- Binary search algorithm - a simple and efficient technique for searching sorted arrays
- Benchmark - a method for measuring comparative execution times in defined cases
- Best, worst and average case - considerations for estimating execution times in three scenarios
- Branch table - a technique for reducing instruction path-length, size of machine code, (and often also memory}
- Compiler optimization - compiler derived optimization
- also Optimizing compiler - a specifically designed compiler for optimization at compile time
- Computational complexity theory
- Computer performance, - computer hardware metrics
- Data compression, - reducing transmission bandwidth and disk storage
- Index (information technology) - a technique for fast lookup using indexes
- Locality of reference - for avoidance of caching delays caused by non-local memory access
- Loop optimization
- Optimization (computer science)
- Performance analysis, methods of measuring actual performance of an algorithm at run-time
- Real-time computing, for further examples of time critical applications
- Run-time analysis, estimation of expected run-times and an algorithms scalability
- Virtual method table branch table with dynamically assigned pointers for dispatching
References
Wikipedia content modification information:
- This page was last modified on 14 November 2008, at 07:01.
Wikipedia Authorship and Review
Wikipedia content provided here is not reviewed directly by MedLibrary.org. Wikipedia content is authored by an open community of volunteers and is not produced by or in any way affiliated with MedLibrary.org.
Wikipedia Usage Guidelines
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article on "Algorithmic efficiency".
The URL for this specific entry is:
All Wikipedia text is available under the terms of the GNU Free Documentation License. (See Copyrights for details). Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc.
