Monday, November 10, 2025

Bootstrap Module 9

Module 9: Rows and Columns Explained

1. What are rows and columns?

Row: a horizontal sequence of items. In a table, one row usually represents one record, observation, or vector.

Column: a vertical sequence of items. A column usually represents one feature, variable, or attribute of each record.

Key conceptual differences:



Rows ≈ instances/records (e.g., one user, one sample).

Columns ≈ features/attributes (e.g., age, name, temperature).

Contexts where rows/columns appear:

Matrix / Linear algebra — rows and columns are numeric arrays; linear operations often treat rows as vectors or columns as vectors.

Spreadsheet (Excel/Google Sheets) — grid of cells; rows often indexed by numbers, columns by letters.

Relational databases (SQL) — tables; rows are tuples, columns are attributes.

DataFrames (Pandas, R) — hybrid: labelled rows and columns with built-in operations.

HTML — <tr> is a table row, <td>/<th> are cells (columns).


2. Fundamental operations & methods (concepts)

2.1 In matrices (math)

Row vector: 1 × n matrix.

Column vector: n × 1 matrix.

Transpose Aᵀ: flips rows and columns.

Row operations: swap rows, multiply row by scalar, add scalar multiple of one row to another (used in Gaussian elimination).

Column operations: analogous to row operations but on columns.

Row space / Column space: linear span of row vectors / column vectors.

Rank: dimension of row space = dimension of column space.

Why it matters: linear independence, solving systems Ax = b (use rows/columns view to understand elimination vs decomposition).
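A short NumPy sketch (assuming NumPy is available) makes the rank equality concrete; the matrix below is constructed so its third row is the sum of the first two:

```python
import numpy as np

# Rows are linearly dependent: row 2 = row 0 + row 1
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [5, 7, 9]])

rank = np.linalg.matrix_rank(A)      # dimension of the row space
rank_T = np.linalg.matrix_rank(A.T)  # dimension of the column space

# Row rank always equals column rank: both are 2 here
print(rank, rank_T)
```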

2.2 In dataframes / spreadsheets

Select a column — get a vector of values (feature extraction).

Select a row — get a record (instance view).

Filtering (row-wise) — select rows matching a condition.

Aggregation (column-wise) — compute mean, sum, count over a column.

Add/remove columns — create derived features (e.g., bmi = weight / height^2).

Pivot / melt (reshape) — convert rows ↔ columns (long vs wide formats).

Group-by (rows grouped by column values) — compute aggregated statistics per group.

Why it matters: data cleaning, feature engineering, analysis.
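The pivot/melt reshape above can be sketched in pandas; the store and quarter names are illustrative, not from the source:

```python
import pandas as pd

# Wide format: one row per store, one column per quarter
wide = pd.DataFrame({
    'store': ['A', 'B'],
    'Q1': [100, 80],
    'Q2': [120, 95],
})

# melt: columns become rows (wide -> long)
long = wide.melt(id_vars='store', var_name='quarter', value_name='sales')

# pivot: rows become columns again (long -> wide)
back = long.pivot(index='store', columns='quarter', values='sales').reset_index()
```

After `melt`, each (store, quarter) pair is its own row, which is the "tidy" shape most plotting and statistics libraries prefer.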

2.3 In SQL

SELECT columns FROM table — column projection.

WHERE (row filter) — choose rows satisfying predicates.

GROUP BY column(s) — treat rows grouped by values in specified columns for aggregation.

JOINs — combine rows from multiple tables based on column equality (foreign keys).


Indexes — speed row lookup by values of a column.

Why it matters: efficient querying; data integrity and normalization.
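These operations can be tried end-to-end with Python's built-in sqlite3 module; the table and data below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, "
            "age INTEGER, score INTEGER)")
cur.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", [
    (1, 'Alice', 22, 88), (2, 'Bob', 24, 92), (3, 'Cara', 22, 75),
])

# Projection: choose columns
names = cur.execute("SELECT name FROM students").fetchall()

# Row filter: WHERE keeps only matching rows
high_scores = cur.execute("SELECT * FROM students WHERE score >= 85").fetchall()

# GROUP BY: collapse rows that share a column value, aggregate over another
avg = cur.execute("SELECT age, AVG(score) FROM students "
                  "GROUP BY age ORDER BY age").fetchall()
```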

2.4 In HTML/CSS

<table>, <tr>, <td>, <th> define rows and cells (columns visually).

Use CSS to style columns (e.g., nth-child selectors) or use <colgroup> for column-level attributes.


3. Concrete code examples

All examples use small, runnable snippets. Read each snippet top-to-bottom.

3.1 Python — numeric matrices with numpy

import numpy as np

# Create a 3x4 matrix (float dtype so scaling a row by 0.5 keeps fractions)
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]], dtype=float)   # A.shape -> (3, 4)

# Select second row (index 1)
row2 = A[1, :]        # -> array([5., 6., 7., 8.])

# Select third column (index 2)
col3 = A[:, 2]        # -> array([3., 7., 11.])

# Transpose
AT = A.T              # shape (4, 3)

# Row operation: swap row 0 and row 2 (in place)
A[[0, 2], :] = A[[2, 0], :]

# Row operation: multiply row 0 by the scalar 0.5
A[0, :] = 0.5 * A[0, :]

Explanation: A[i, :] picks row i; A[:, j] picks column j. Transpose swaps rows and columns; many linear algebra algorithms pivot on row operations.

3.2 Python — tabular with pandas

import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'age': [34, 23, 45, 31],
    'score': [88, 92, 75, 85],
})

# Select column(s)
ages = df['age']            # Series (one column)
sub = df[['id', 'score']]   # DataFrame with 2 columns

# Select a row by position
row0 = df.iloc[0]           # first row

# Filter rows
high = df[df['score'] >= 85]

# Add a derived column
df['age_group'] = pd.cut(df['age'], bins=[0, 30, 50], labels=['young', 'mid'])

# Group rows by a column's values, then aggregate another column
agg = df.groupby('age_group')['score'].mean().reset_index()

DataFrames treat columns as named Series (vectors). Row selection can be by position (iloc) or label (loc) if you set an index.

3.3 SQL — table operations (pseudo-SQL)

-- Table: students(id INTEGER PRIMARY KEY, name TEXT, age INTEGER, score INTEGER)

-- Project columns
SELECT name, score FROM students;

-- Filter rows
SELECT * FROM students WHERE age BETWEEN 18 AND 25;

-- Group rows by age, then aggregate column-wise
SELECT age, AVG(score) AS avg_score
FROM students
GROUP BY age
ORDER BY age;

-- Join: combine rows from multiple tables by column equality
SELECT s.name, c.class_name
FROM students s
JOIN enrollments e ON s.id = e.student_id
JOIN classes c ON e.class_id = c.id;

 SELECT chooses columns; WHERE filters rows; GROUP BY compresses many rows into aggregated rows using column values.

3.4 HTML simple table

<table>
  <thead>
    <tr><th>ID</th><th>Name</th><th>Score</th></tr>
  </thead>
  <tbody>
    <tr><td>101</td><td>Alice</td><td>88</td></tr>
    <tr><td>102</td><td>Bob</td><td>92</td></tr>
  </tbody>
</table>

Each <tr> is a row; the table's columns are implied by <td> order.


4. Exercises


Exercise 1 — Matrix row/column basics

Given matrix

M = [[2, 4, 6],
     [1, 3, 5],
     [0, 7, 8]]



a) Extract the second row and third column.

b) Compute Mᵀ and show the element at position (1,2) in the transpose.

Answer

a) second row = [1,3,5], third column = [6,5,8].

b) transpose:

Mᵀ = [[2, 1, 0],
      [4, 3, 7],
      [6, 5, 8]]



Element at (1,2) (0-based indexing) = Mᵀ[1,2] = 7.

Why: Transpose swaps indices: Mᵀ[i,j] = M[j,i].
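The answer can be checked with a few lines of NumPy:

```python
import numpy as np

M = np.array([[2, 4, 6],
              [1, 3, 5],
              [0, 7, 8]])

MT = M.T
# Transpose swaps indices: MT[i, j] == M[j, i]
print(M[1, :])    # second row: [1 3 5]
print(M[:, 2])    # third column: [6 5 8]
print(MT[1, 2])   # 7, because MT[1, 2] == M[2, 1]
```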


Exercise 2 — DataFrame filtering & aggregation




Using the df from 3.2, compute:

average score for each age_group

list of ids for students with score > 80

Solution (pandas)

df.groupby('age_group')['score'].mean()
df.loc[df['score'] > 80, 'id'].tolist()

groupby collapses row sets into aggregated rows keyed by the column; loc filters rows then projects id column.


Exercise 3 — SQL row vs column (conceptual)

Which is typically faster: selecting a single column (projection) or full row scan when using a columnar-store database? Why?

In a columnar store, selecting a single column is faster because the engine reads only the column’s storage. In a row-store, reading many columns from many rows is efficient when you need full rows; reading a single column still requires scanning row structures unless there's a projection/columnar layout or index.


5. Hands-on labs

5.1 Row operations and solving linear systems (Gaussian elimination)

Solve Ax = b by row-reduction and verify with numpy.linalg.solve.

Materials: Python, numpy.

Procedure

Create a random invertible 3×3 matrix A and vector b.

Use Gaussian elimination (implement partial pivoting) to convert A|b to row-echelon form and back-substitute to get x.

Compare with numpy.linalg.solve(A,b).

Expected result: The two x vectors should be numerically close (within floating-point tolerance).

 Gaussian elimination relies on row operations (swap, scale, add) to eliminate variables. Partial pivoting reduces numerical error.
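A minimal sketch of this lab, assuming NumPy; `gauss_solve` is a name chosen here, not a library function:

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n):
        # Partial pivoting: bring the largest remaining pivot into row k
        p = k + np.argmax(np.abs(A[k:, k]))
        A[[k, p]] = A[[p, k]]
        b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]   # row operation: add scalar multiple of row k
            b[i] -= m * b[k]
    # Back-substitution on the row-echelon system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

rng = np.random.default_rng(0)
A = rng.random((3, 3)) + 3 * np.eye(3)   # diagonally dominant -> invertible
b = rng.random(3)
x = gauss_solve(A, b)
assert np.allclose(x, np.linalg.solve(A, b))
```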


5.2 Column operations and PCA (feature axes)

Use columns as features to compute PCA (principal components) and show how columns are rotated (linear combination of original columns).

Materials: Python, numpy / scikit-learn (or implement SVD).

Procedure



Load or simulate a dataset with 4 columns (features).

Standardize each column (column-wise mean subtraction, divide by std).

Compute covariance matrix (columns determine covariance matrix).

Compute eigenvectors (principal axes) — these are linear combos of original columns.

Project rows (observations) onto the first two principal components and plot.

Expected result: The PCA axes align with directions of greatest column variance; projecting rows shows clusters if present.

PCA treats columns as coordinate axes; operations are performed on column covariance.
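The procedure above can be sketched with NumPy alone (SVD of the standardized data gives the principal axes); the simulated dataset is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))            # 200 rows (observations), 4 columns (features)
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]    # correlate two columns so one axis dominates

# 1) Standardize each column (column-wise mean/std)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# 2) Covariance matrix of the columns
C = np.cov(Xs, rowvar=False)             # shape (4, 4)

# 3) Principal axes via SVD: rows of Vt are linear combos of original columns
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)

# 4) Project rows (observations) onto the first two components
scores = Xs @ Vt[:2].T                   # shape (200, 2)
explained = S**2 / np.sum(S**2)          # variance fraction per component
```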


5.3 Data query performance (SQL index)

See how adding an index on a column speeds up row lookups.

Materials: SQLite or PostgreSQL, table with ~100k rows.

Procedure

Create table users(id serial primary key, email text, age integer).

Insert 100k synthetic rows.

Run timed query: SELECT * FROM users WHERE email = 'some@example.com' — record time.

Create index on email: CREATE INDEX idx_email ON users(email);

Repeat query and compare times.

Expected result: Query time after index will be much smaller for selective lookups.

 Indexes allow locating rows by column value without scanning the entire table.
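A self-contained version of this experiment, using an in-memory SQLite database instead of PostgreSQL (synthetic emails are made up here):

```python
import sqlite3
import time

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
cur.executemany("INSERT INTO users (email, age) VALUES (?, ?)",
                [(f"user{i}@example.com", i % 80) for i in range(100_000)])

def timed_lookup():
    t0 = time.perf_counter()
    row = cur.execute("SELECT * FROM users WHERE email = ?",
                      ("user99999@example.com",)).fetchone()
    return row, time.perf_counter() - t0

row_before, t_before = timed_lookup()            # full table scan
cur.execute("CREATE INDEX idx_email ON users(email)")
row_after, t_after = timed_lookup()              # B-tree index lookup

print(f"scan: {t_before:.5f}s, indexed: {t_after:.5f}s")
```

The same row comes back both times; only the access path changes, and the indexed lookup avoids scanning all 100k rows.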


6. Principles and pitfalls

6.1 Row-major vs column-major memory layout

Row-major (C, numpy default): consecutive elements of a row are contiguous in memory.

Column-major (Fortran, R by default): consecutive elements of a column are contiguous.

Implication: access patterns that match the memory layout are faster because of cache locality. For example, scanning down a column of a row-major array strides across rows in memory and is slower than scanning along a row.
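This can be observed directly on a C-ordered (row-major) NumPy array, a small timing sketch:

```python
import time
import numpy as np

A = np.random.rand(2000, 2000)   # C order: each row is contiguous in memory

t0 = time.perf_counter()
row_sums = [A[i, :].sum() for i in range(A.shape[0])]   # contiguous slices
t_row = time.perf_counter() - t0

t0 = time.perf_counter()
col_sums = [A[:, j].sum() for j in range(A.shape[1])]   # strided slices
t_col = time.perf_counter() - t0

# Column slices stride through memory, so t_col is typically larger than t_row
print(f"rows: {t_row:.4f}s, cols: {t_col:.4f}s")
```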

6.2 Mutability & views vs copies

In libraries like NumPy, selecting a column may return a view (shared memory) or a copy depending on operation; be aware when mutating.

Pandas Series are views sometimes but often copies — use .loc carefully for assignment to avoid SettingWithCopyWarning.
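A small sketch of the view/copy distinction in NumPy, plus the `.loc` assignment pattern in pandas:

```python
import numpy as np
import pandas as pd

A = np.arange(12).reshape(3, 4)
col = A[:, 1]          # basic slicing returns a VIEW sharing A's memory
col[0] = 99
assert A[0, 1] == 99   # mutating the view is visible through the parent

mask = A[:, 0] > 0
picked = A[mask]       # boolean (fancy) indexing returns a COPY
picked[0, 0] = -1
assert A[1, 0] != -1   # parent array is unchanged

# In pandas, assign through .loc in one step to avoid chained-assignment
# ambiguity (and the SettingWithCopyWarning)
df = pd.DataFrame({'x': [1, 2, 3]})
df.loc[df['x'] > 1, 'x'] = 0
```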

6.3 Wide vs long (rows vs columns) data shapes

Wide: many columns (features). Pros: fast to index by row, easier to compute per-row operations. Cons: can be sparse, memory heavy.

Long: fewer columns, more rows (observations). Pros: easier to group, plot, and perform statistical operations. Many libraries prefer a tidy (long) format.

6.4 Database normalization (columns design)

Columns should be atomic and represent single attributes (1NF). Avoid repeating groups of columns (e.g., phone1, phone2) — use child tables (rows in another table).

Normalization reduces redundancy but can increase join complexity.


7. Exercises 

Matrix rank problem: given a 3×3 matrix, determine whether its rows are linearly independent using row reduction.

Pandas reshape: Convert a wide table of quarterly sales columns Q1, Q2, Q3, Q4 into long format with columns quarter and sales. (Implement with melt.)



SQL aggregation: write a query that returns the top 3 categories by average rating from a products table.

Performance profiling: Measure time to compute column means of a 2D NumPy array using loops vs vectorized operations. Report results and differences.

Each solution should include code and a short explanation of why the results make sense.


8. Research

Relational model — foundational concept of rows as tuples and columns as attributes (see original works on relational algebra / Codd for deep theoretical background).

Linear Algebra — for row-space/column-space/rank, use a standard linear algebra textbook (e.g., Gilbert Strang’s introductions are approachable).

Database internals — chapters on indexing, row/column stores, and storage engines give insight into performance tradeoffs.

DataFrame internals — read pandas documentation on indexing, view vs copy semantics, and chunked memory layout.

Numerical computing — papers/articles on cache locality, row-major vs column-major, and vectorization for performance.


9. Checklist

Prefer vectorized operations (NumPy/Pandas) over explicit Python loops; vectorized code runs in optimized native loops and is usually much faster.

Mind memory layout: when doing heavy numeric work, choose array layout to match your access pattern (row-major vs column-major).

Indexing for DBs: create indexes on columns used in WHERE/JOIN/GROUP BY frequently; avoid excessive indexing.

When to use row vs column stores:

OLTP (transactional): row-store is typical (read/write whole records).

OLAP (analytical): column-store excels (aggregations over columns).

Be explicit about dtypes in pandas and SQL to save memory (e.g., use category dtype for low-cardinality columns).

Avoid nullable or mixed-type columns when possible — they reduce performance and complicate downstream operations.


10. Integrative project


Build a small ETL pipeline that loads CSV sensor data, cleans it, reshapes wide→long, computes per-sensor statistics (column-wise), and stores normalized results in a SQL table.

Steps:

Read the CSV into a DataFrame.

Inspect columns and rows (df.shape, df.head()).

Clean: drop rows with too many NaNs (row-level filtering); fill missing values per-column with column-wise strategy.

Reshape with melt if the sensor readings are stored as wide columns.

Compute per-sensor aggregates (mean, std, min, max) — column-wise.

Create SQL tables and insert aggregated rows (each row = sensor stats).

Create an index on sensor_id.

This project exercises row filtering, column aggregation, reshaping (rows ↔ columns), and storing aggregated rows in a database.
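The steps above can be sketched end-to-end; the sensor names, column layout, and `sensor_stats` table are assumptions standing in for real CSV data:

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical wide data: one column per sensor, one row per timestamp
# (in practice this would come from pd.read_csv)
wide = pd.DataFrame({
    'timestamp': pd.date_range('2025-01-01', periods=4, freq='h'),
    's1': [20.1, np.nan, 20.5, 20.3],
    's2': [18.0, 18.2, np.nan, 18.1],
})

# Clean: drop rows that are mostly NaN, then fill per column with the column mean
wide = wide.dropna(thresh=2)
for c in ['s1', 's2']:
    wide[c] = wide[c].fillna(wide[c].mean())

# Reshape wide -> long: one row per (timestamp, sensor) reading
long = wide.melt(id_vars='timestamp', var_name='sensor_id', value_name='value')

# Column-wise aggregates per sensor
stats = long.groupby('sensor_id')['value'].agg(['mean', 'std', 'min', 'max']).reset_index()

# Store aggregated rows in SQL and index the lookup column
conn = sqlite3.connect(':memory:')
stats.to_sql('sensor_stats', conn, index=False)
conn.execute("CREATE INDEX idx_sensor ON sensor_stats(sensor_id)")
```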


11. Quick reference 




Access row i in NumPy: A[i, :]. Column j: A[:, j].

Access row/col in pandas: df.iloc[i] (row), df['col'] (column).

SQL: SELECT col FROM table WHERE condition (columns = projection, WHERE filters rows).

Transpose swaps rows & columns: A.T.

