<link>https://alejandroarmas.github.io/</link><description>Recent content on</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 08 Jan 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://alejandroarmas.github.io/index.xml" rel="self" type="application/rss+xml"/><item><title>What are Transformers - Understanding the Architecture End-to-End
https://alejandroarmas.github.io/post/transformer/
Mon, 08 Jan 2024 00:00:00 +0000
Alejandro Armas and Sachin Loechler have been hard at work on a project that involves developing streaming workloads.
The project's goal is to process real-time data and support real-time traffic prediction! To make sense of the enormous quantity of unstructured video data, we employed foundation models that perform video tracking, bounding-box detection, depth estimation, and segmentation. Many of these foundation models rely on an artificial neural network architecture called the Transformer.

Authenticating Data for Experimentation Environment
https://alejandroarmas.github.io/post/iam_management/
Mon, 30 Oct 2023 00:00:00 +0000
Identity and Access Management (IAM)
In this post, we will explore how to leverage Terraform, a popular Infrastructure as Code (IaC) tool, to automate the setup and management of AWS IAM. We will walk through creating IAM roles, policies, and users, and demonstrate how to attach policies to these entities.
Teammate Usage
Figure 2: IAM User Interaction.
There are two AWS accounts. One is for an admin.
You have been assigned an IAM user account.

Driving a Data Product - Uncovering Insights and Laying out Assumptions with Exploratory Data Analysis
https://alejandroarmas.github.io/post/driving_data_product/
Mon, 30 Oct 2023 00:00:00 +0000
Alejandro Armas and Sachin Loechler have been hard at work on a project that involves developing streaming workloads.
The project's goal is to process real-time data and support real-time traffic prediction! However, before I could begin, I had to demonstrate the viability of this initiative. It was critical to communicate my understanding of the data to the team and reach consensus on it. In addition to learning about the data, I tested my hypotheses and laid out my assumptions.

Enabling a Reproducible Data Experimentation Environment
https://alejandroarmas.github.io/post/reproducible_experiment_environments/
Mon, 30 Oct 2023 00:00:00 +0000
This post details how I enabled reproducible environments.
Reproducible Docker Builds
Optimizing Data Transfers
I enabled reproducible data analysis by leveraging DVC to capture data lineage, provisioning both object storage and IAM policies with Terraform for secure access, optimizing network transfers by 35x, and packaging notebooks with Docker.
In this figure, the developer uses four main command-line tools: DVC, Git, Poetry, and Docker. DVC is configured to push dataset artifacts to, and pull them from, the DVC remote storage.

Getting Started with PyFlink: My Local Development Experience
https://alejandroarmas.github.io/post/pyflink_debugging/
Mon, 30 Oct 2023 00:00:00 +0000
Background
A hobby project I am working on involves developing streaming workloads. We want to process real-time data and support traffic prediction! Often at the start of adopting a tool, especially when working in a multi-tool ecosystem, I found myself at a familiar roadblock:
As the engineer responsible for creating the streaming workloads, I was having a hard time weighing the tradeoffs of which language to use for our data pipeline’s tooling.

Winning 3rd place at MLOPS LLM Hackathon: Question & Answer for MLOps System
https://alejandroarmas.github.io/post/sf-llm-stack-hackathon/
Wed, 07 Jun 2023 00:00:00 +0000
This post describes the experience of team RedisCovering LLMs as we developed a Question & Answer system specialized in MLOps Community Slack discussions, using GPT-3.5 to provide precise answers with verifiable references to Slack threads, guarding against misinformation.
1. Introduction
Last weekend, I had the opportunity to participate in a 12-hour hackathon organized by the San Francisco Bay Area MLOps Community. It was my third hackathon, and the first one I attended through the MLOps Community.

Unveiling Dimensionality Reduction - A Beginner's Guide to Principal Component Analysis
https://alejandroarmas.github.io/post/2023-05-18-pca/
Thu, 18 May 2023 00:00:00 +0000
Introduction
Imagine for a second you were transplanted onto Olvera Street in LA. It’s a Tuesday, but today is a little different. There’s a spark in the air. You’re not quite sure what to make of it, but you know that today, something great is going to happen. You walk around aimlessly for a while, until your mind is distracted by a huge sense of hunger. “Dang – if only I could have some tacos,” you think to yourself.

What is the Difference Between Covariance and Correlation?
https://alejandroarmas.github.io/post/correlation/
Fri, 05 May 2023 00:00:00 +0000
Working with data almost always begins with a data exploration phase. We listen to its heartbeat and ask lots of questions. As we begin this phase, one might ask: ‘what tools can we leverage?’ How do we define a linear measure of the relationship between two random variables? In other words, how do we measure the amount of ‘increasing X increases Y’-ness, or ‘increasing X decreases Y’-ness, and vice versa, in a joint probability distribution?

Unlocking the Power of Joint Distributions - How to Analyze Multiple Random Variables
https://alejandroarmas.github.io/post/joint_distributions/
Wed, 26 Apr 2023 00:00:00 +0000
The concept of a joint distribution is useful when studying the outcomes and effects of multiple random variables in statistics. Joint distributions generalize probability theory to the multivariate case.
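To make the multivariate idea concrete, here is a minimal Python sketch built on an assumed, made-up joint probability mass function for two binary random variables X and Y. It recovers each marginal by summing the joint table over the other variable, then computes covariance and correlation (the ‘increasing X increases Y’-ness from the covariance summary) as expectations over the joint pmf. The table values and the helper names `marginal` and `expect` are hypothetical illustrations, not code from the posts.

```python
import math

# Hypothetical joint pmf p(x, y) for two discrete random variables X and Y.
# Keys are (x, y) pairs; the probabilities sum to 1.
joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

def marginal(joint, axis):
    """Sum the joint pmf over the other variable (axis 0 gives X, axis 1 gives Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def expect(joint, f):
    """E[f(X, Y)], computed directly from the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
mu_x = expect(joint, lambda x, y: x)
mu_y = expect(joint, lambda x, y: y)

# Covariance: E[(X - mu_x)(Y - mu_y)]; correlation rescales it into [-1, 1].
cov = expect(joint, lambda x, y: (x - mu_x) * (y - mu_y))
var_x = expect(joint, lambda x, y: (x - mu_x) ** 2)
var_y = expect(joint, lambda x, y: (y - mu_y) ** 2)
corr = cov / math.sqrt(var_x * var_y)

print(p_x, p_y)
print(round(cov, 4), round(corr, 4))
```

For this made-up table both marginals are uniform on {0, 1}, the covariance works out to 0.15, and the correlation to 0.6: X and Y tend to move together, and dividing by the standard deviations turns the unit-dependent covariance into a unitless score.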
Let me paint a story for you.
Joint Distributions
Today, the weather is nice. It’s a fresh summer morning. You’re out at a restaurant having breakfast with your in-laws, and you want to impress. You’re such a nice person, you think to yourself.

Breaking Down Virtual Memory: The Role of Paging in Modern Operating Systems
https://alejandroarmas.github.io/post/virtual_memory/
Fri, 21 May 2021 00:00:00 +0000
Introduction
Have you ever heard 32-bit and 64-bit thrown around without knowing what they meant? So had I. The simple answer is that these terms refer to the amount of memory addressable by a program or, more accurately, to the computer architecture’s bit width with respect to registers and address buses.
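The arithmetic in the next paragraph follows from one rule: assuming byte-addressable memory, an n-bit address can name \(2^n\) distinct bytes. A minimal Python sketch of that rule (the helper `addressable_bytes` and the `GIB`/`EIB` constants are illustrative names, not from the post):

```python
GIB = 2 ** 30  # 1 gibibyte
EIB = 2 ** 60  # 1 exbibyte

def addressable_bytes(address_bits):
    """Distinct byte addresses reachable with an address of this width,
    assuming byte-addressable memory (one address per byte)."""
    return 2 ** address_bits

n32 = addressable_bytes(32)
n64 = addressable_bytes(64)
print(f"32-bit: {n32:,} bytes = {n32 // GIB} GiB")
print(f"64-bit: {n64:,} bytes = {n64 // EIB} EiB")
```

The 64-bit figure is the theoretical span of the address space; current x86-64 processors, for example, implement only 48 or 57 bits of virtual address.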
Now let’s see how much this amounts to: \(2^{32} = 4,294,967,296\) bytes, or more succinctly 4 GiB. A 64-bit address space spans \(2^{64} = 18,446,744,073,709,551,616\) bytes or 16.

Mastering Concurrency: A Comprehensive C++ Guide to Processes and Threads
https://alejandroarmas.github.io/post/concurrency/
Thu, 29 Apr 2021 00:00:00 +0000
1. Introduction
First, let’s begin by defining a piece of system software called the Operating System (OS), which is responsible for orchestrating the sophisticated resource management of a given machine’s hardware, as well as providing an abstract interface on which other software is built.
At the time of writing this article, I have a web browser open, my Spotify playlist on, and my VS Code editor and a terminal open.

Quick Primer on Metric Spaces
https://alejandroarmas.github.io/post/metric_space/
Thu, 18 Jun 2020 00:00:00 +0000
Vector Space
A vector space is a set of mathematical objects that can be added together and multiplied by scalars to produce objects of the same kind. The notion of a vector space proves to be a very useful framework for extending methods and structures to very different types of problems. A few special types of vector spaces you may already be familiar with:
Function Spaces
We can add functions together and scale them as well.

Primer on Matrix Multiplication
https://alejandroarmas.github.io/post/matrix_multiplication/
Wed, 27 May 2020 00:00:00 +0000
Introduction
Remember the good ol’ days when 6 x 5 easily made sense as adding five 6s together, and voilà, you ended up with 30? Now you’re in college and things are hard. Hopefully running through an example can give you a glimpse of how and why we do matrix multiplication.
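As a preview of the row-by-column rule, here is a minimal pure-Python sketch, assuming matrices are stored as lists of rows. An m × n matrix times an n × p matrix yields an m × p matrix whose entry (i, j) is the dot product of row i of the first matrix with column j of the second. The function name `matmul` and the example matrices are illustrative choices, not taken from the post.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows).

    Entry (i, j) of the product is the dot product of row i of A
    with column j of B, so the inner dimensions must match.
    """
    m, n = len(A), len(A[0])
    n_b, p = len(B), len(B[0])
    if n != n_b:
        raise ValueError(f"inner dimensions differ: {m}x{n} times {n_b}x{p}")
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2
print(matmul(A, B))      # [[58, 64], [139, 154]]  (a 2 x 2 result)
```

For instance, the top-left entry is 1·7 + 2·9 + 3·11 = 58. Calling `matmul(A, A)` raises `ValueError`, because a 2 × 3 matrix cannot be multiplied by another 2 × 3 matrix.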
How to Compute Matrix Multiplication
Consider two matrices A and B. We denote their dimensions by rows and columns, in that order.

Combinations
https://alejandroarmas.github.io/post/combinations/
Sun, 03 May 2020 00:00:00 +0000
Since the concept of “n choose k” seems to appear a lot in my life, I decided to make a quick post explaining the intuition behind it. Let’s start with a simple example.
Say we had a set of three Greek letters representing the names of three friends, \( F = \{ \alpha, \beta, \gamma \} \), and we want to know how many unique table-tennis matches could be played between two of the friends.

Mathematics Meets Signal Processing: Exploring the Convolution Integral
https://alejandroarmas.github.io/post/convolution/
Wed, 22 Apr 2020 00:00:00 +0000
Introduction
Since signals are sets of data or information and systems process said data, we are interested in the analysis of systems. When we deal with a special type of system that has the properties of linearity and time invariance, we can construct methods of analysis that are extremely useful for linear time-invariant (LTI) systems. Fourier analysis, which will be covered in a separate blog post, and the convolution integral are examples of exploiting these system properties to decompose inputs into basic signals that are easy to work with analytically.

Random Variables and Distributions
https://alejandroarmas.github.io/post/distributions/
Tue, 21 Apr 2020 00:00:00 +0000
I hope this article serves as a basic introduction to the terminology of probability theory!
Random Variables
Considering that an experiment is a procedure that produces well-defined outcomes, like taking a course and finishing with a certain letter grade, we see that a random variable is a function that maps the random outcomes of an experiment to numerical values, \(X : \Omega \to \mathbb{R}\). The set of all numerical values the random variable can attain is called its support.
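A minimal Python sketch of the definition, using the course-grade experiment from the text: the outcomes in \(\Omega\) are letter grades, and X is an ordinary function mapping each outcome to a number. The grade-point values and outcome probabilities below are hypothetical, chosen only for illustration. The support is computed as the set of values X takes with positive probability, which is the usual definition for a discrete random variable.

```python
from fractions import Fraction

# Sample space Omega: possible outcomes of the experiment "take a course".
# These outcome probabilities are made-up numbers for illustration.
outcome_prob = {
    "A": Fraction(3, 10),
    "B": Fraction(4, 10),
    "C": Fraction(2, 10),
    "D": Fraction(1, 20),
    "F": Fraction(1, 20),
}

# A random variable is just a function from outcomes to real numbers.
# Here X maps a letter grade to hypothetical grade points.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def X(outcome):
    return GRADE_POINTS[outcome]

def pmf(rv, outcome_prob):
    """P(rv = x): add up the probabilities of every outcome that rv maps to x."""
    dist = {}
    for omega, p in outcome_prob.items():
        x = rv(omega)
        dist[x] = dist.get(x, Fraction(0)) + p
    return dist

dist = pmf(X, outcome_prob)
# The support: the values X takes with positive probability.
support = sorted(x for x, p in dist.items() if p > 0)
print(support)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Using `Fraction` keeps the probabilities exact, so the pmf sums to exactly 1 instead of a floating-point approximation.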