Source Code
Henry's GitHub page @ The UTEP Department of Computer Science
Name | Source Code | Description
LAPACK and BLAS in C | Lapack and Blas Exponential matrix C code; LAPACK Presentation | The programs compute the eigenvalues and the left and right eigenvectors of a symmetric matrix and of a general square matrix A by solving an eigenvalue problem of the form A v = λ v.
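
Below is a minimal sketch (not the linked program itself) of how such an eigenvalue problem can be solved from C through LAPACK's LAPACKE interface; the 3x3 matrix is illustrative data only.

```c
/* Minimal sketch: eigenvalues and left/right eigenvectors of a small general
 * matrix A via LAPACKE_dgeev.  Link with -llapacke -llapack -lblas. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    lapack_int n = 3;
    /* Row-major 3x3 test matrix (hypothetical data, overwritten by dgeev). */
    double A[9]  = { 4.0, 1.0, 0.0,
                     1.0, 3.0, 1.0,
                     0.0, 1.0, 2.0 };
    double wr[3], wi[3];     /* real and imaginary parts of the eigenvalues */
    double vl[9], vr[9];     /* left and right eigenvectors, one column each */

    lapack_int info = LAPACKE_dgeev(LAPACK_ROW_MAJOR, 'V', 'V', n,
                                    A, n, wr, wi, vl, n, vr, n);
    if (info != 0) {
        fprintf(stderr, "dgeev failed: info = %d\n", (int)info);
        return 1;
    }
    for (lapack_int i = 0; i < n; i++)
        printf("lambda_%d = %g + %gi\n", (int)i, wr[i], wi[i]);
    return 0;
}
```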
CSPARSE (a C library for sparse linear systems) | SVL_FFTW_CSPARSE | FFTW-CSPARSE computes the SVL of a 2D unit-cell device. The SVL code can be executed on a personal computer if CSPARSE and FFTW are properly installed.
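
A minimal CSparse sketch follows, assuming the usual triplet-then-compress workflow; it illustrates the library calls only and is not the SVL_FFTW_CSPARSE code itself.

```c
/* Assemble a small sparse system in triplet form, compress it to CSC, and
 * solve A x = b with an LU factorization.  Uses Tim Davis's CSparse (cs.h). */
#include <stdio.h>
#include "cs.h"

int main(void) {
    cs *T = cs_spalloc(3, 3, 9, 1, 1);   /* 3x3 triplet matrix, values stored */
    cs_entry(T, 0, 0, 4.0); cs_entry(T, 0, 1, 1.0);
    cs_entry(T, 1, 0, 1.0); cs_entry(T, 1, 1, 3.0); cs_entry(T, 1, 2, 1.0);
    cs_entry(T, 2, 1, 1.0); cs_entry(T, 2, 2, 2.0);
    cs *A = cs_compress(T);              /* convert to compressed-column form */
    cs_spfree(T);

    double b[3] = { 1.0, 2.0, 3.0 };     /* right-hand side (hypothetical data) */
    if (!cs_lusol(1, A, b, 1e-14))       /* b is overwritten with the solution x */
        fprintf(stderr, "LU solve failed\n");
    else
        printf("x = [%g, %g, %g]\n", b[0], b[1], b[2]);

    cs_spfree(A);
    return 0;
}
```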
The Fastest Fourier Transform in the West (FFTW) | Desktop version: 2D FFTW-PETSc SVL Code; 3D FFTW-PETSc SVL Code | FFTW-PETSc computes the Fourier transform of a 2D and a 3D unit-cell device. The SVL code can be executed on a personal computer if PETSc and FFTW are properly installed. For more detail on the implementation, please review the corresponding chapters of my MS and Ph.D. theses at the following links: MS-CPS Thesis (see Chapter 3) and PhD-CPS Thesis (see Chapter 2).
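
A minimal serial FFTW sketch of the plan/execute/destroy workflow for a 2D transform; the grid size and input pattern are placeholders, not the SVL program's actual data.

```c
/* Forward 2D complex DFT of an Nx-by-Ny array.  Link with -lfftw3 -lm. */
#include <stdio.h>
#include <fftw3.h>

int main(void) {
    const int Nx = 64, Ny = 64;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * Nx * Ny);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * Nx * Ny);

    /* Fill the input with a placeholder unit-cell pattern (hypothetical data). */
    for (int i = 0; i < Nx * Ny; i++) { in[i][0] = (i % 7 == 0); in[i][1] = 0.0; }

    fftw_plan plan = fftw_plan_dft_2d(Nx, Ny, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);                       /* out now holds the 2D spectrum */
    printf("DC term = %g\n", out[0][0]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```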
Spatially Variant Lattice Algorithm (SVL) with FFTW and PETSc | Desktop version: DESKTOP_SVL_3D_PETSC | We wrote a portable computer program for parallel architectures in a general-purpose programming language that supports structured programming. The parallel code uses FFTW (Fastest Fourier Transform in the West) to handle the Fourier transform of the unit-cell device and PETSc (Portable, Extensible Toolkit for Scientific Computation) to handle the numerical linear algebra operations. Using the Message Passing Interface (MPI) for distributed memory improves the performance of the code that generates 2D and 3D SVLs when it is executed on a parallel system. The SVL code was executed on the Stampede2 supercomputer at the Texas Advanced Computing Center (TACC), The University of Texas at Austin; we also show the code we executed on Stampede2 for the KNL and SKX architectures. The SVL code can be executed on a personal computer if PETSc and FFTW are properly installed. For more detail on the implementation, please review the corresponding chapters of my MS and Ph.D. theses at the following links: MS-CPS Thesis (see Chapter 3) and PhD-CPS Thesis (see Chapter 2).
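
A hedged skeleton of how such a distributed code can be structured with PETSc and FFTW's MPI interface; the grid size is hypothetical and the outline is an assumption, not the DESKTOP_SVL_3D_PETSC source.

```c
/* Initialize PETSc/MPI and FFTW's MPI interface, then create a distributed
 * 2D DFT plan.  Build with the PETSc and FFTW MPI libraries, e.g.
 * mpicc ... -lfftw3_mpi -lfftw3 -lpetsc */
#include <petscsys.h>
#include <fftw3-mpi.h>

int main(int argc, char **argv) {
    PetscInitialize(&argc, &argv, NULL, NULL);   /* also initializes MPI */
    fftw_mpi_init();

    const ptrdiff_t Nx = 256, Ny = 256;          /* hypothetical unit-cell grid */
    ptrdiff_t local_n0, local_0_start;

    /* How many rows of the global array this rank owns. */
    ptrdiff_t alloc = fftw_mpi_local_size_2d(Nx, Ny, PETSC_COMM_WORLD,
                                             &local_n0, &local_0_start);
    fftw_complex *data = fftw_alloc_complex(alloc);

    fftw_plan plan = fftw_mpi_plan_dft_2d(Nx, Ny, data, data, PETSC_COMM_WORLD,
                                          FFTW_FORWARD, FFTW_ESTIMATE);
    /* ... fill the local slab, fftw_execute(plan), then PETSc linear algebra ... */

    fftw_destroy_plan(plan);
    fftw_free(data);
    fftw_mpi_cleanup();
    PetscFinalize();
    return 0;
}
```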
Bioinformatics using Python | Makefile and Python code: Trimommatic_code | Trimmomatic is a lightweight, multithreaded command-line Java tool that can be used to trim and crop Illumina FASTQ data and to remove Illumina adapter sequences and low-quality reads. It performs a variety of useful trimming tasks for Illumina data in paired-end and single-end mode. It works with FASTQ files (using phred+33 or phred+64 quality scores, depending on the Illumina pipeline used). Files compressed with gzip or bzip2 are supported and are identified by the .gz or .bz2 file extensions.
Bioinformatics using HTCondor | Python code: HTCondor_code | HTCondor is an open-source job scheduler developed at the University of Wisconsin-Madison that specializes in workload management for compute-intensive jobs and supports High-Throughput Computing (HTC). HTCondor provides a set of submit-file commands that allow jobs to be submitted from various operating systems such as Linux, Unix, macOS, and Windows, and it is freely available for download. The following is a bioinformatics project to speed up BLAST performance using the HTCondor scheduler; the project uses several server computers to set up the available compute pool.
Introduction to Parallel Programming with the Message Passing Interface (MPI) | C Code: MPI Basic code in C | MPI establishes a portable, efficient, and flexible standard for message passing, used for writing message-passing programs and for communication among processes that have separate address spaces. A process is a program counter and an address space. Each MPI process has its own global variables and environment and does not need to be thread-safe. A process is a copy of the program executed on every computer (node) of the network (cluster) where it is launched.
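
A minimal MPI example in the spirit of the basic code (a generic illustration, not necessarily the repository's code): every process reports its rank and the communicator size.

```c
/* Build and run with: mpicc hello_mpi.c -o hello_mpi && mpirun -np 4 ./hello_mpi */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```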
Introduction to Parallel Programming with Shared-Memory Open Multi-Processing (OpenMP) | C Code: OPENMP_Basic in C | OpenMP is a parallel programming model for shared-memory and distributed shared-memory multiprocessors. OpenMP uses the fork-join model of parallel execution and provides three kinds of directives: parallelism/work sharing, data environment, and synchronization.
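
A minimal OpenMP example of the fork-join model with a work-sharing loop and a reduction; it is a generic illustration, not the OPENMP_Basic repository itself.

```c
/* Build with: gcc -fopenmp sum_omp.c -o sum_omp */
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000000;
    static double a[1000000];
    double sum = 0.0;

    /* Work-sharing directive: iterations are split among the team of threads. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        a[i] = 1.0 / (i + 1);
        sum += a[i];
    }

    printf("threads available: %d, harmonic sum: %f\n",
           omp_get_max_threads(), sum);
    return 0;
}
```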
Introduction to Parallel Programming with the Message Passing Interface (MPI) and Shared-Memory Open Multi-Processing (OpenMP) | C code: Hybrid MPI+OpenMP (C++ and Fortran are not included yet) | MPI is a standardized library (not a language) for a collection of processes communicating via message passing. OpenMP is an API for shared-memory programming that includes compiler directives, library routines, and environment variables.
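
A generic hybrid sketch, not the repository code: MPI for the distributed processes and OpenMP threads inside each process; the FUNNELED thread level is an assumption made for illustration.

```c
/* Build with: mpicc -fopenmp hybrid.c -o hybrid && mpirun -np 2 ./hybrid */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* Ask for FUNNELED support: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("MPI rank %d, OpenMP thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```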
SQL with C/C++ and Python | C/C++ and Python code | SQL is a standard language for communicating with data in databases. SQL statements are used to perform tasks such as storing, accessing, manipulating, and updating data in a database. Here you will find the code I use to learn SQLite3.
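
A minimal SQLite3 example in C showing the open/exec/close pattern; the table and rows are illustrative only, not the repository's schema.

```c
/* Build with: gcc sql_demo.c -o sql_demo -lsqlite3 */
#include <stdio.h>
#include <sqlite3.h>

/* Callback invoked once per result row: print column=value pairs. */
static int print_row(void *unused, int ncols, char **vals, char **names) {
    (void)unused;
    for (int i = 0; i < ncols; i++)
        printf("%s = %s\n", names[i], vals[i] ? vals[i] : "NULL");
    return 0;
}

int main(void) {
    sqlite3 *db;
    char *err = NULL;
    if (sqlite3_open(":memory:", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }
    const char *sql =
        "CREATE TABLE student(id INTEGER PRIMARY KEY, name TEXT);"
        "INSERT INTO student(name) VALUES ('Ada'), ('Alan');"
        "SELECT * FROM student;";
    if (sqlite3_exec(db, sql, print_row, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "SQL error: %s\n", err);
        sqlite3_free(err);
    }
    sqlite3_close(db);
    return 0;
}
```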
MPAS Performance | Python: MPAS_performance_code | This is a version-control repository for development and sharing (not yet publicly available). The Python code searches, reads, and extracts information from the MPAS test-case outputs. MPAS was executed on different HPC clusters to collect data. The output files contain information about the layer subroutine performance.
Roofline model | Python & C++ code | The Roofline Model is a visual performance model that provides the peak performance we might expect from a given compute kernel or application running on multi-core CPU or GPU accelerator architectures. It quickly identifies whether the computation is compute-bound or memory-bound, showing the inherent hardware limitations and the potential benefit and priority of optimizations. The Roofline Model is a two-dimensional plot with attainable performance (FLOPs/cycle or FLOPs/sec) on the Y-axis and arithmetic (numeric) intensity (FLOPs/byte) on the X-axis. The work W is expressed in FLOPs, and the memory traffic Q denotes the number of bytes of memory transfers incurred during the execution of the kernel or application. The arithmetic intensity I = W/Q is the ratio (FLOPs/byte) of the work W to the total data movement Q in bytes.
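
A small C sketch of the roofline formula, attainable performance = min(peak FLOP rate, memory bandwidth x intensity); the peak and bandwidth numbers are hypothetical machine parameters, not measured values.

```c
#include <stdio.h>

/* Attainable GFLOP/s for a kernel of arithmetic intensity I (FLOPs/byte). */
static double roofline(double peak_gflops, double bw_gbytes, double intensity) {
    double memory_bound = bw_gbytes * intensity;
    return (memory_bound < peak_gflops) ? memory_bound : peak_gflops;
}

int main(void) {
    /* Hypothetical machine: 3000 GFLOP/s peak, 200 GB/s memory bandwidth. */
    const double peak = 3000.0, bw = 200.0;
    const double ridge = peak / bw;   /* intensity where the two roofs meet */

    printf("ridge point: %.2f FLOPs/byte\n", ridge);
    for (double I = 0.25; I <= 64.0; I *= 2.0)
        printf("I = %6.2f FLOPs/byte -> attainable %.1f GFLOP/s (%s)\n",
               I, roofline(peak, bw, I),
               I < ridge ? "memory-bound" : "compute-bound");
    return 0;
}
```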