Papers by hariprakash govindarajalu
High performance of Java applications running on server and desktop machines requires fast execution of Java bytecodes. Such performance can be achieved by Just-In-Time (JIT) compilers, which translate the stack-based bytecodes into register-based machine code on demand. One crucial problem in Java JIT compilation, however, is the compilation time, which adds to the total execution time of an application, so it is necessary to reduce the JIT compilation time as much as possible. In this paper we propose a front-end hardware compilation pipeline that compiles bytecodes into native machine code on-the-fly in hardware and passes the compiled code to a back-end native processor for execution. The bytecodes are translated into three-address intermediate representation form, by mimicking the stack operations, before a series of optimizations is performed in hardware. The optimized three-address codes are used for code generation and architectural register allocation and then pla...
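The translation step described in the abstract above (stack bytecodes to three-address form by mimicking the stack) can be illustrated in software. The following is a minimal sketch only, not the paper's hardware pipeline: the tuple encoding of instructions, the `to_three_address` function, the `BINOPS` table, and the `v`/`t` naming of variables and temporaries are all assumptions made for illustration, and only a handful of integer opcodes are handled.

```python
# Hypothetical mapping from a few JVM integer opcodes to operators.
BINOPS = {"iadd": "+", "isub": "-", "imul": "*"}


def to_three_address(bytecodes):
    """Translate stack bytecodes into (dest, op, left, right) tuples.

    The operand stack is simulated symbolically: instead of values it
    holds the names of the variables, constants, or temporaries that
    would be on the stack at run time.
    Limitation of this sketch: a local that is stored to while its old
    value is still on the symbolic stack is not handled.
    """
    stack = []      # symbolic operand stack
    code = []       # emitted three-address instructions
    next_temp = 0   # counter for fresh temporaries t0, t1, ...
    for op, *args in bytecodes:
        if op == "iload":            # push local variable
            stack.append(f"v{args[0]}")
        elif op == "iconst":         # push integer constant
            stack.append(str(args[0]))
        elif op in BINOPS:           # pop two operands, push a temporary
            right = stack.pop()
            left = stack.pop()
            temp = f"t{next_temp}"
            next_temp += 1
            code.append((temp, BINOPS[op], left, right))
            stack.append(temp)
        elif op == "istore":         # pop into local variable
            code.append((f"v{args[0]}", "=", stack.pop(), None))
        else:
            raise ValueError(f"unsupported opcode in this sketch: {op}")
    return code


# Bytecode for v2 = v0 + v1 * 2
example = [("iload", 0), ("iload", 1), ("iconst", 2),
           ("imul",), ("iadd",), ("istore", 2)]
for instruction in to_three_address(example):
    print(instruction)
```

Each arithmetic bytecode becomes one instruction with explicit source and destination names, which is the form on which register allocation and further optimization can then operate.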
A number of processors have been introduced since the days of the first simple processors, each newer one being faster and better than its predecessor. Chip manufacturers scrutinize the fidelity of processors by running a multitude of tests at the foundry, and processors are tested again during each boot-up of the system in the form of the Power-On Self-Test (POST). This paper focuses on diagnosing Floating-Point Unit (FPU) faults by calculating the value of π to a large number of digits using the quadratically convergent iterative Borwein-Borwein algorithm (BB algorithm). The calculation is known to be very computationally intensive. By analyzing the CPU's behavior closely during this calculation, we wish to bring out the various floating-point arithmetic circuits that are stressed, and the degree to which they are stressed, by examining the effective time between two successive computations in a unit. Hardware failures cannot be attributed to defects in manufacture but rather to wear after prolonged us...
Medical records of mental health patients are highly unstructured, and hence mining the information to discover patterns for prevention, diagnosis and treatment requires new data mining techniques. In this paper we present a novel Iterative Selection Algorithm (IDSA) for data selection, cleaning and formatting of psychiatry Electronic Medical Records (EMRs). A patient's record consists of demographic information, illness history, personal history, family history, mental state examination (MSE), physical state examination (PSE), medical history, psycho-social stressor information, scales, prescriptions, treatment and diagnosis information. The IDSA is applied repeatedly to retrieve the EMRs that match the selection criteria, narrowing down the selection at each step. The selected records may be further analyzed using automated mining tools or analyzed manually to study the behavior of the mental illness.
Calculating π to arbitrary precision stresses the fixed-point, floating-point, logic, shift, branch prediction and pipelining circuits of a CPU. To ensure the integrity of the CPU, we propose testing it with two algorithmically different programs that generate π and verifying the result by comparison. This paper studies the rapidly convergent, compute-intensive Borwein-Borwein (BB) algorithm as the CPU stressor and the Bailey-Borwein-Plouffe (BBP) digit-extraction spigot algorithm as the result checker. A faulty CPU is identified if the results produced by the two algorithms differ. This paper describes a technique to verify the integrity of the CPU by selecting the first algorithm from the class of iterative convergence-based algorithms to generate the first n digits and the second algorithm to extract the nth digit to verify the result of the first algorithm. This technique is superior to the existing method of verifying the result wit...
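The cross-check described in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's test program: it assumes the 1984 quadratically convergent Borwein iteration as the BB variant (the paper may use a different one), it compares hexadecimal digits because BBP extracts hex digits, and the function names `borwein_pi`, `hex_digits`, `bbp_hex_digit` and `cpu_check` are hypothetical. The BBP part uses double-precision floats and is only meant for small digit positions.

```python
from decimal import Decimal, localcontext


def borwein_pi(iterations=6, precision=60):
    """Approximate pi with a quadratically convergent Borwein iteration."""
    with localcontext() as ctx:
        ctx.prec = precision
        a = Decimal(2).sqrt()
        b = Decimal(0)
        p = 2 + a
        for _ in range(iterations):
            root = a.sqrt()
            a_next = (root + 1 / root) / 2
            b = (1 + b) * root / (a + b)       # uses the old a and old b
            p = (1 + a_next) * p * b / (1 + b)  # uses the new a and new b
            a = a_next
        return p


def hex_digits(value, count, precision=60):
    """Return the first `count` fractional hex digits of a Decimal."""
    digits = []
    with localcontext() as ctx:
        ctx.prec = precision
        frac = value - int(value)
        for _ in range(count):
            frac *= 16
            digit = int(frac)
            digits.append(digit)
            frac -= digit
    return digits


def bbp_hex_digit(n):
    """Extract the n-th fractional hex digit of pi (n >= 1) via BBP."""
    d = n - 1

    def series(j):
        total = 0.0
        for k in range(d + 1):           # terms with non-negative exponent
            total = (total + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
        k = d + 1
        while True:                      # rapidly vanishing tail terms
            term = 16.0 ** (d - k) / (8 * k + j)
            if term < 1e-17:
                break
            total += term
            k += 1
        return total % 1.0

    x = (4 * series(1) - 2 * series(4) - series(5) - series(6)) % 1.0
    return int(x * 16)


def cpu_check(count=12):
    """Return True if both algorithms agree on the first `count` digits."""
    generated = hex_digits(borwein_pi(), count)
    extracted = [bbp_hex_digit(n) for n in range(1, count + 1)]
    return generated == extracted


print(cpu_check())
```

Because the two routines share almost no arithmetic (one relies on high-precision square roots and divisions, the other on modular exponentiation and short floating-point sums), a disagreement points to a computation fault rather than to a common algorithmic error.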
Proceedings International Parallel and Distributed Processing Symposium
The available Instruction Level Parallelism in Java bytecode (Java-ILP) is not readily exploitable using traditional in-order or out-of-order issue mechanisms due to dependencies involving stack operands. The sequentialization due to stack dependency can be overcome by identifying bytecode traces, which are sequences of bytecode instructions that, when executed, leave the operand stack in the same state as it was at the beginning of the sequence. Instructions from different bytecode traces have no stack-operand dependency and hence can be executed in parallel on multiple operand stacks. In this paper, we propose a simultaneous multi-trace instruction issue (SMTI) architecture for a processor that can issue instructions from multiple bytecode traces to exploit Java-ILP. The proposed architecture can easily take advantage of nested folding to further increase the ILP. Extraction of bytecode traces and nested bytecode folding are done in software, during the method verification stage, and add little run-time overhead. We carried out our experiments in our SMTI simulation environment with the SPECjvm98 and SciMark benchmarks and the Linpack workload. Simultaneous multi-trace issue combined with nested folding resulted in an average ILP speedup gain of 54% over the base in-order single-issue Java processor.
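The notion of a bytecode trace defined in the abstract above can be illustrated with a small stack-depth scan. This is a simplified sketch, not the paper's extraction procedure: it handles straight-line code only, ignores control flow and local-variable dependencies, and the `STACK_EFFECT` table and `split_traces` function are hypothetical names covering just a few integer opcodes.

```python
# Hypothetical (pops, pushes) table for a few JVM integer opcodes.
STACK_EFFECT = {
    "iload": (0, 1),
    "iconst": (0, 1),
    "iadd": (2, 1),
    "imul": (2, 1),
    "istore": (1, 0),
}


def split_traces(bytecodes):
    """Split straight-line bytecode into traces.

    A trace is closed whenever the simulated operand-stack depth returns
    to the depth it had when the trace started (zero in this sketch).
    """
    traces = []
    current = []
    depth = 0
    for instruction in bytecodes:
        pops, pushes = STACK_EFFECT[instruction[0]]
        if depth < pops:
            raise ValueError(f"stack underflow at {instruction}")
        depth += pushes - pops
        current.append(instruction)
        if depth == 0:               # stack is back to its starting state
            traces.append(current)
            current = []
    if current:
        raise ValueError("bytecode ends with operands left on the stack")
    return traces


# v0 = v1 + v2; v3 = v4 * v5
program = [("iload", 1), ("iload", 2), ("iadd",), ("istore", 0),
           ("iload", 4), ("iload", 5), ("imul",), ("istore", 3)]
for trace in split_traces(program):
    print(trace)
```

In this example the two statements form two traces that share no stack operands, so in principle each could run on its own operand stack; whether they can actually issue in parallel also depends on local-variable dependencies, which this sketch does not check.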
High Performance Computing, 1996
A homogeneous system of PCs, workstations, minicomputers etc., connected together via a local area network or wide area network represents a large pool of computational power. However, in a network of PCs and workstations, transparency is not provided and hence, users are aware of other machines. PARDISC is a parallel programming environment, which provides the needed transparency as a scalable
Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001, 2001