Source Code
Henry's GitHub page @ The UTEP Department of Computer Science
Name | Source Code | Description
LAPACK and BLAS in C | Lapack and Blas Exponential matrix C code; LAPACK Presentation | The programs compute the eigenvalues and the left and right eigenvectors of a symmetric matrix and of a general square matrix A by solving an eigenvalue problem of the form A v = λ v.
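
Below is a minimal sketch (not the linked program itself) of how such an eigenvalue problem can be solved from C through LAPACK's LAPACKE interface; the 3x3 matrix is illustrative data only.

```c
/* Minimal sketch: eigenvalues and left/right eigenvectors of a small general
 * matrix A via LAPACKE_dgeev.  Link with -llapacke -llapack -lblas. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    lapack_int n = 3;
    /* Row-major 3x3 test matrix (hypothetical data, overwritten by dgeev). */
    double A[9]  = { 4.0, 1.0, 0.0,
                     1.0, 3.0, 1.0,
                     0.0, 1.0, 2.0 };
    double wr[3], wi[3];     /* real and imaginary parts of the eigenvalues */
    double vl[9], vr[9];     /* left and right eigenvectors, one column each */

    lapack_int info = LAPACKE_dgeev(LAPACK_ROW_MAJOR, 'V', 'V', n,
                                    A, n, wr, wi, vl, n, vr, n);
    if (info != 0) {
        fprintf(stderr, "dgeev failed: info = %d\n", (int)info);
        return 1;
    }
    for (lapack_int i = 0; i < n; i++)
        printf("lambda_%d = %g + %gi\n", (int)i, wr[i], wi[i]);
    return 0;
}
```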
CSPARSE (a C library for sparse linear systems) | SVL_FFTW_CSPARSE | FFTW-CSPARSE computes the SVL of a 2D unit-cell device. The SVL code can be executed on a personal computer if CSPARSE and FFTW are properly installed.
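
A minimal CSparse sketch follows, assuming the usual triplet-then-compress workflow; it illustrates the library calls only and is not the SVL_FFTW_CSPARSE code itself.

```c
/* Assemble a small sparse system in triplet form, compress it to CSC, and
 * solve A x = b with an LU factorization.  Uses Tim Davis's CSparse (cs.h). */
#include <stdio.h>
#include "cs.h"

int main(void) {
    cs *T = cs_spalloc(3, 3, 9, 1, 1);   /* 3x3 triplet matrix, values stored */
    cs_entry(T, 0, 0, 4.0); cs_entry(T, 0, 1, 1.0);
    cs_entry(T, 1, 0, 1.0); cs_entry(T, 1, 1, 3.0); cs_entry(T, 1, 2, 1.0);
    cs_entry(T, 2, 1, 1.0); cs_entry(T, 2, 2, 2.0);
    cs *A = cs_compress(T);              /* convert to compressed-column form */
    cs_spfree(T);

    double b[3] = { 1.0, 2.0, 3.0 };     /* right-hand side (hypothetical data) */
    if (!cs_lusol(1, A, b, 1e-14))       /* b is overwritten with the solution x */
        fprintf(stderr, "LU solve failed\n");
    else
        printf("x = [%g, %g, %g]\n", b[0], b[1], b[2]);

    cs_spfree(A);
    return 0;
}
```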
The Fastest Fourier Transform in the West (FFTW) | Desktop version: 2D FFTW-PETSc SVL Code; 3D FFTW-PETSc SVL Code | FFTW-PETSc computes the Fourier transform of a 2D and a 3D unit-cell device. The SVL code can be executed on a personal computer if PETSc and FFTW are properly installed. For more detail on the implementation, please review the corresponding chapters of my MS and Ph.D. theses at the following links: MS-CPS Thesis (see Chapter 3) and PhD-CPS Thesis (see Chapter 2).
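
A minimal serial FFTW sketch of the plan/execute/destroy workflow for a 2D transform; the grid size and input pattern are placeholders, not the SVL program's actual data.

```c
/* Forward 2D complex DFT of an Nx-by-Ny array.  Link with -lfftw3 -lm. */
#include <stdio.h>
#include <fftw3.h>

int main(void) {
    const int Nx = 64, Ny = 64;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * Nx * Ny);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * Nx * Ny);

    /* Fill the input with a placeholder unit-cell pattern (hypothetical data). */
    for (int i = 0; i < Nx * Ny; i++) { in[i][0] = (i % 7 == 0); in[i][1] = 0.0; }

    fftw_plan plan = fftw_plan_dft_2d(Nx, Ny, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);                       /* out now holds the 2D spectrum */
    printf("DC term = %g\n", out[0][0]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```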
Spatially Variant Lattice Algorithm (SVL) with FFTW and PETSc | Desktop version: DESKTOP_SVL_3D_PETSC | We wrote a portable computer program for parallel architectures in a general-purpose programming language that supports structured programming. The parallel code uses FFTW (Fastest Fourier Transform in the West) to handle the Fourier transform of the unit-cell device and PETSc (Portable, Extensible Toolkit for Scientific Computation) to handle the numerical linear algebra operations. Using the Message Passing Interface (MPI) for distributed memory improves the performance of the code that generates 2D and 3D SVLs when it is executed on a parallel system. The SVL code was executed on the Stampede2 supercomputer at the Texas Advanced Computing Center (TACC), The University of Texas at Austin; we also show the code we executed on Stampede2 for the KNL and SKX architectures. The SVL code can be executed on a personal computer if PETSc and FFTW are properly installed. For more detail on the implementation, please review the corresponding chapters of my MS and Ph.D. theses at the following links: MS-CPS Thesis (see Chapter 3) and PhD-CPS Thesis (see Chapter 2).
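
A hedged skeleton of how such a distributed code can be structured with PETSc and FFTW's MPI interface; the grid size is hypothetical and the outline is an assumption, not the DESKTOP_SVL_3D_PETSC source.

```c
/* Initialize PETSc/MPI and FFTW's MPI interface, then create a distributed
 * 2D DFT plan.  Build with the PETSc and FFTW MPI libraries, e.g.
 * mpicc ... -lfftw3_mpi -lfftw3 -lpetsc */
#include <petscsys.h>
#include <fftw3-mpi.h>

int main(int argc, char **argv) {
    PetscInitialize(&argc, &argv, NULL, NULL);   /* also initializes MPI */
    fftw_mpi_init();

    const ptrdiff_t Nx = 256, Ny = 256;          /* hypothetical unit-cell grid */
    ptrdiff_t local_n0, local_0_start;

    /* How many rows of the global array this rank owns. */
    ptrdiff_t alloc = fftw_mpi_local_size_2d(Nx, Ny, PETSC_COMM_WORLD,
                                             &local_n0, &local_0_start);
    fftw_complex *data = fftw_alloc_complex(alloc);

    fftw_plan plan = fftw_mpi_plan_dft_2d(Nx, Ny, data, data, PETSC_COMM_WORLD,
                                          FFTW_FORWARD, FFTW_ESTIMATE);
    /* ... fill the local slab, fftw_execute(plan), then PETSc linear algebra ... */

    fftw_destroy_plan(plan);
    fftw_free(data);
    fftw_mpi_cleanup();
    PetscFinalize();
    return 0;
}
```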
Bioinformatics using Python | Makefile and Python code: Trimommatic_code | Trimmomatic is a lightweight, multithreaded command-line Java tool that can be used to trim and crop Illumina FASTQ data and to remove Illumina adapter sequences and low-quality reads. It performs a variety of useful trimming tasks for Illumina data in paired-end and single-end mode. It works with FASTQ files (using phred+33 or phred+64 quality scores, depending on the Illumina pipeline used). Files compressed with gzip or bzip2 are supported and are identified by the .gz or .bz2 file extensions.
Bioinformatics using HTCondor | Python code: HTCondor_code | HTCondor is an open-source job scheduler developed at the University of Wisconsin-Madison that specializes in workload management for compute-intensive jobs and supports High-Throughput Computing (HTC). HTCondor provides a set of submit-file commands that allow jobs to be submitted from various operating systems such as Linux, Unix, macOS, and Windows, and it is freely available for download. The following is a bioinformatics project to speed up BLAST performance using the HTCondor scheduler; the project uses several server computers to set up the available compute pool.
Introduction to Parallel Programming with the Message Passing Interface (MPI) | C Code: MPI Basic code in C | MPI establishes a portable, efficient, and flexible standard for message passing, used for writing message-passing programs and for communication among processes that have separate address spaces. A process is a program counter and an address space. Each MPI process has its own global variables and environment and does not need to be thread-safe. A process is a copy of the program executed on every computer (node) of the network (cluster) where it is launched.
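
A minimal MPI example in the spirit of the basic code (a generic illustration, not necessarily the repository's code): every process reports its rank and the communicator size.

```c
/* Build and run with: mpicc hello_mpi.c -o hello_mpi && mpirun -np 4 ./hello_mpi */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```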
Introduction to Parallel Programming with Shared-Memory Open Multi-Processing (OpenMP) | C Code: OPENMP_Basic in C | OpenMP is a parallel programming model for shared-memory and distributed shared-memory multiprocessors. OpenMP uses the fork-join model of parallel execution and provides three kinds of directives: parallelism/work sharing, data environment, and synchronization.
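
A minimal OpenMP example of the fork-join model with a work-sharing loop and a reduction; it is a generic illustration, not the OPENMP_Basic repository itself.

```c
/* Build with: gcc -fopenmp sum_omp.c -o sum_omp */
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000000;
    static double a[1000000];
    double sum = 0.0;

    /* Work-sharing directive: iterations are split among the team of threads. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        a[i] = 1.0 / (i + 1);
        sum += a[i];
    }

    printf("threads available: %d, harmonic sum: %f\n",
           omp_get_max_threads(), sum);
    return 0;
}
```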
Introduction to Parallel Programming with the Message Passing Interface (MPI) and Shared-Memory Open Multi-Processing (OpenMP) | C code: Hybrid MPI+OpenMP (C++ and Fortran are not included yet) | MPI is a standardized library (not a language) for a collection of processes communicating via message passing. OpenMP is an API for shared-memory programming that includes compiler directives, library routines, and environment variables.
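
A generic hybrid sketch, not the repository code: MPI for the distributed processes and OpenMP threads inside each process; the FUNNELED thread level is an assumption made for illustration.

```c
/* Build with: mpicc -fopenmp hybrid.c -o hybrid && mpirun -np 2 ./hybrid */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* Ask for FUNNELED support: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("MPI rank %d, OpenMP thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```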
SQL with C/C++ and Python | C/C++ and Python code | SQL is a standard language for communicating with data in databases. SQL statements are used to perform tasks such as storing, accessing, manipulating, and updating data in a database. Here you will find the code I use to learn SQLite3.
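
A minimal SQLite3 example in C showing the open/exec/close pattern; the table and rows are illustrative only, not the repository's schema.

```c
/* Build with: gcc sql_demo.c -o sql_demo -lsqlite3 */
#include <stdio.h>
#include <sqlite3.h>

/* Callback invoked once per result row: print column=value pairs. */
static int print_row(void *unused, int ncols, char **vals, char **names) {
    (void)unused;
    for (int i = 0; i < ncols; i++)
        printf("%s = %s\n", names[i], vals[i] ? vals[i] : "NULL");
    return 0;
}

int main(void) {
    sqlite3 *db;
    char *err = NULL;
    if (sqlite3_open(":memory:", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }
    const char *sql =
        "CREATE TABLE student(id INTEGER PRIMARY KEY, name TEXT);"
        "INSERT INTO student(name) VALUES ('Ada'), ('Alan');"
        "SELECT * FROM student;";
    if (sqlite3_exec(db, sql, print_row, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "SQL error: %s\n", err);
        sqlite3_free(err);
    }
    sqlite3_close(db);
    return 0;
}
```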
MPAS Performance | Python: MPAS_performance_code | This is a version-control repository for development and sharing (not yet publicly available). The Python code searches, reads, and extracts information from the MPAS test-case outputs. MPAS was executed on different HPC clusters to collect data. The output files contain information about the layer subroutine performance.
Roofline model | Python & C++ code | The Roofline Model is a visual performance model that provides the peak performance we might expect from a given compute kernel or application running on multi-core CPU or GPU accelerator architectures. It quickly identifies whether the computation is compute-bound or memory-bound, showing the inherent hardware limitations and the potential benefit and priority of optimizations. The Roofline Model is a two-dimensional plot with attainable performance (FLOPs/cycle or FLOPs/sec) on the Y-axis and arithmetic (numeric) intensity (FLOPs/byte) on the X-axis. The work W is expressed in FLOPs, and the memory traffic Q denotes the number of bytes of memory transfers incurred during the execution of the kernel or application. The arithmetic intensity I = W/Q is the ratio (FLOPs/byte) of the work W to the total data movement Q in bytes.
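
A small C sketch of the roofline formula, attainable performance = min(peak FLOP rate, memory bandwidth x intensity); the peak and bandwidth numbers are hypothetical machine parameters, not measured values.

```c
#include <stdio.h>

/* Attainable GFLOP/s for a kernel of arithmetic intensity I (FLOPs/byte). */
static double roofline(double peak_gflops, double bw_gbytes, double intensity) {
    double memory_bound = bw_gbytes * intensity;
    return (memory_bound < peak_gflops) ? memory_bound : peak_gflops;
}

int main(void) {
    /* Hypothetical machine: 3000 GFLOP/s peak, 200 GB/s memory bandwidth. */
    const double peak = 3000.0, bw = 200.0;
    const double ridge = peak / bw;   /* intensity where the two roofs meet */

    printf("ridge point: %.2f FLOPs/byte\n", ridge);
    for (double I = 0.25; I <= 64.0; I *= 2.0)
        printf("I = %6.2f FLOPs/byte -> attainable %.1f GFLOP/s (%s)\n",
               I, roofline(peak, bw, I),
               I < ridge ? "memory-bound" : "compute-bound");
    return 0;
}
```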