<h1>Backtesting trailing stop-loss strategies with Python and market data</h1>
<p>By Otto Kekäläinen, Optimized by Otto, Fri, 19 Dec 2025</p>
<img src="https://optimizedbyotto.com/post/backtest-stop-loss-strategy-python/featured-image.png" alt="Featured image of post Backtesting trailing stop-loss strategies with Python and market data" /><p>In <a class="link" href="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/">January 2024 I wrote</a> about the insanity of the <em>Magnificent Seven</em> dominating the MSCI World Index, and wondered how long the numbers could keep going up. They have continued to surge at an accelerating pace, which makes me worry that a crash is drawing closer. As a software professional, I decided to analyze <strong>whether using stop-loss orders could reliably automate avoiding deep drawdowns</strong>.</p>
<p>As everyone with savings in the stock market (hopefully) knows, markets eventually crash. It is just a matter of <em>when</em> and <em>how deep</em>. Staying on the sidelines for years is not a good investment strategy either, as inflation erodes the value of your savings. Assuming the current true inflation rate is around 7%, a restaurant dinner that costs 20 euros today will cost about 24.50 euros in three years. Savings of 1000 euros would thus drop in purchasing power from 50 dinners to only 40 dinners in three years.</p>
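<p>The arithmetic behind that claim is plain compounding. A quick sketch in Python, using the 7% rate assumed above:</p>

```python
# Purchasing-power erosion at an assumed 7% annual inflation rate
dinner_today = 20.0
inflation = 0.07
years = 3

dinner_later = dinner_today * (1 + inflation) ** years
print(round(dinner_later, 2))      # price of the same dinner in 3 years
print(int(1000 // dinner_today))   # whole dinners 1000 euros buys today
print(int(1000 // dinner_later))   # whole dinners 1000 euros buys in 3 years
```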
<p>Hence, if you intend to retain the value of your hard-earned savings, they need to be invested in something that grows in value. Most people try to beat inflation by buying shares in stable companies, directly or via broad market ETFs. These historically <strong>grow faster than inflation</strong> during normal years, <strong>but likely drop in value during recessions</strong>.</p>
<h2 id="what-is-a-trailing-stop-loss-order"><a href="#what-is-a-trailing-stop-loss-order" class="header-anchor"></a>What is a trailing stop-loss order?
</h2><p>What if you could buy stocks to benefit from rising prices without having to worry about a potential crash? All modern online stock brokers offer a feature called stop-loss: you enter a price, and if the stock drops to that price, your shares are automatically sold. A trailing stop-loss order is similar, but instead of a fixed price you enter a margin (e.g. 10%), and as the stock price rises, the stop-loss price trails upwards behind it by that margin.</p>
<p>For example, say you buy a share at 100 euros and it rises to 110 euros. A 10% trailing stop-loss order will automatically sell it if the price drops 10% from the peak of 110 euros, i.e. at 99 euros. Thus, no matter what happens, you lose at most 1 euro relative to your purchase price. If the stock price instead continues to rise to 150 euros, the trailing stop-loss readjusts to 150 euros minus 10%, which is 135 euros (150-15=135). If the price then dropped to 135 euros, you would lock in a gain of 35 euros: not the peak of 150 euros, but still better than riding the price all the way down in a large crash.</p>
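<p>The mechanics above can be sketched in a few lines of Python. This is a toy illustration of the trailing-stop arithmetic, not actual broker order logic:</p>

```python
def trailing_stop_exit(prices, margin=0.10):
    """Return the stop level at which a trailing stop-loss triggers, or None."""
    peak = prices[0]
    for price in prices:
        peak = max(peak, price)        # the stop level trails the highest price seen
        stop = peak * (1 - margin)
        if price <= stop:
            return stop                # order triggers at this level
    return None                        # never triggered

# Buy at 100, price peaks at 150, then crashes: the stop trails up to 135
print(trailing_stop_exit([100, 110, 150, 140, 130, 90]))  # → 135.0
```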
<p>In the simple case above, it obviously makes sense in <em>theory</em>, but it might not make sense in <em>practice</em>. Prices constantly oscillate, so you don’t want a margin that is too small, otherwise you exit too early. Conversely, having a large margin may result in too large a drawdown before exiting. If markets crash rapidly, it might be that nobody buys your stocks at the stop-loss price, and shares have to be sold at an even lower price. Also, what will you do once the position is sold? The reason you invested in the stock market was to avoid holding cash, so would you buy the same stock back when the crash bottoms? But how will you know when the bottom has been reached?</p>
<h2 id="backtesting-stock-market-strategies-with-python-yfinance-pandas-and-lightweight-charts"><a href="#backtesting-stock-market-strategies-with-python-yfinance-pandas-and-lightweight-charts" class="header-anchor"></a>Backtesting stock market strategies with Python, YFinance, Pandas and Lightweight Charts
</h2><p>I am not a professional investor, and nobody should take investment advice from me. However, I know what <a class="link" href="https://en.wikipedia.org/wiki/Backtesting" target="_blank" rel="noopener">backtesting</a> is and how to leverage open source software. So, I wrote a Python script to test whether the <a class="link" href="https://en.wikipedia.org/wiki/Trading_strategy" target="_blank" rel="noopener">trading strategy</a> of using trailing stop-loss orders with specific margin values would have worked for a particular stock.</p>
<p><strong>First you need to have data.</strong> <a class="link" href="https://github.com/ranaroussi/yfinance" target="_blank" rel="noopener">YFinance</a> is a handy Python library that can download the historical price data for any stock ticker on <a class="link" href="https://Yahoo.com" target="_blank" rel="noopener">Yahoo.com</a>. <strong>Then you need to manipulate the data.</strong> <a class="link" href="https://github.com/pandas-dev/pandas" target="_blank" rel="noopener">Pandas</a> is <em>the</em> Python data analysis library, with advanced data structures for working with relational or labeled data. <strong>Finally, to visualize the results</strong>, I used <a class="link" href="https://github.com/tradingview/lightweight-charts" target="_blank" rel="noopener">Lightweight Charts</a>, a fast, interactive library for rendering financial charts that lets you plot the stock price, the trailing stop-loss line, and the points where trades would have occurred. I really like how zooming is implemented in Lightweight Charts; it makes drilling into the data points feel effortless.</p>
<p>The full solution is not polished enough to be published for others to use, but you can piece together your own by reusing some of the key snippets. To avoid re-downloading the same data repeatedly, I implemented a small caching wrapper that saves the data locally (as <a class="link" href="https://en.wikipedia.org/wiki/Apache_Parquet" target="_blank" rel="noopener"
>Parquet</a> files):</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">python</span>
</header>
<pre><code class="language-python">import pandas
import yfinance
from datetime import datetime
from pathlib import Path

# Configuration constants defined elsewhere in the full script (values illustrative)
CACHE_DIR = Path("cache")
TICKER = "BNP.PA"
START_DATE = "2014-01-01"

CACHE_DIR.mkdir(parents=True, exist_ok=True)
end_date = datetime.today().strftime("%Y-%m-%d")
cache_file = CACHE_DIR / f"{TICKER}-{START_DATE}--{end_date}.parquet"

if cache_file.is_file():
    dataframe = pandas.read_parquet(cache_file)
    print(f"Loaded price data from cache: {cache_file}")
else:
    dataframe = yfinance.download(
        TICKER,
        start=START_DATE,
        end=end_date,
        progress=False,
        auto_adjust=False,
    )
    dataframe.to_parquet(cache_file)
    print(f"Fetched new price data from Yahoo Finance and cached to: {cache_file}")</code></pre>
</div>
<p>The <strong>dataframe</strong> is a Pandas object with a <a class="link" href="https://pandas.pydata.org/docs/reference" target="_blank" rel="noopener"
>powerful API</a>. For example, to print a snippet from the beginning and the end of the dataframe to see what the data looks like, you can use:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">python</span>
</header>
<pre><code class="language-python">print("First 5 rows of the raw data:")
print(dataframe.head())
print("Last 5 rows of the raw data:")
print(dataframe.tail())</code></pre>
</div>
<p>Example output:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
</header>
<pre><code>First 5 rows of the raw data:
Price       Adj Close      Close       High        Low       Open  Volume
Ticker         BNP.PA     BNP.PA     BNP.PA     BNP.PA     BNP.PA  BNP.PA
Date
2014-01-02  29.956285  55.540001  56.910000  55.349998  56.700001  316552
2014-01-03  30.031801  55.680000  55.990002  55.290001  55.580002  210044
2014-01-06  30.080338  55.770000  56.230000  55.529999  55.560001  185142
2014-01-07  30.943321  57.369999  57.619999  55.790001  55.880001  370397
2014-01-08  31.385597  58.189999  59.209999  57.750000  57.790001  489940
Last 5 rows of the raw data:
Price       Adj Close      Close       High        Low       Open  Volume
Ticker         BNP.PA     BNP.PA     BNP.PA     BNP.PA     BNP.PA  BNP.PA
Date
2025-12-11  78.669998  78.669998  78.919998  76.900002  76.919998  357918
2025-12-12  78.089996  78.089996  80.269997  78.089996  79.470001  280477
2025-12-15  79.080002  79.080002  79.449997  78.559998  78.559998  233852
2025-12-16  78.860001  78.860001  79.980003  78.809998  79.430000  283057
2025-12-17  80.080002  80.080002  80.150002  79.080002  79.199997  262818</code></pre>
</div>
<p>Adding new columns to the dataframe is easy. For example, I used a custom function to calculate the Relative Strength Index (RSI). Adding a new column “RSI” with a value for every row, computed from the price series, takes only one line of code, with no custom loops:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">python</span>
</header>
<pre><code class="language-python">dataframe["RSI"] = compute_rsi(dataframe["price"], period=14)</code></pre>
</div>
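<p>The <code>compute_rsi</code> helper is not shown here. A minimal version of the standard 14-period RSI, using a simple moving average of gains and losses (a sketch under that assumption, not necessarily the exact implementation), could look like this:</p>

```python
import pandas

def compute_rsi(prices: pandas.Series, period: int = 14) -> pandas.Series:
    """Relative Strength Index from a price series, using simple rolling means."""
    delta = prices.diff()
    gains = delta.clip(lower=0).rolling(period).mean()
    losses = (-delta.clip(upper=0)).rolling(period).mean()
    relative_strength = gains / losses
    return 100 - 100 / (1 + relative_strength)

# A steadily rising price has no losses, so the RSI saturates at 100
prices = pandas.Series(range(1, 31), dtype=float)
print(compute_rsi(prices).iloc[-1])  # → 100.0
```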
<p>After manipulating the data, each series can be converted into a list of data points, serialized as JSON, and injected into a placeholder in an HTML template:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">python</span>
</header>
<pre><code class="language-python">import json
import jinja2

baseline_series = [
    {"time": ts, "value": val}
    for ts, val in df_plot[["timestamp", BASELINE_LABEL]].itertuples(index=False)
]
baseline_json = json.dumps(baseline_series)

# jinja2.Template() takes a template string, so read the file contents first
with open("template.html", encoding="utf-8") as f:
    template = jinja2.Template(f.read())

rendered_html = template.render(
    title=title,
    heading=heading,
    description=description_html,
    ...
    baseline_json=baseline_json,
    ...
)

with open("report.html", "w", encoding="utf-8") as f:
    f.write(rendered_html)
print("Report generated!")</code></pre>
</div>
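<p>Lightweight Charts expects each point as an object with a <code>time</code> key (e.g. a “YYYY-MM-DD” string) and a <code>value</code> key. A self-contained sketch of building such a series; the toy frame and column names here are illustrative, not the original script&rsquo;s:</p>

```python
import json
import pandas

# Toy stand-in for df_plot; "Buy and hold" plays the role of BASELINE_LABEL
df_plot = pandas.DataFrame({
    "Date": pandas.to_datetime(["2024-01-02", "2024-01-03"]),
    "Buy and hold": [55.54, 55.68],
})
# Lightweight Charts accepts dates as "YYYY-MM-DD" strings
df_plot["timestamp"] = df_plot["Date"].dt.strftime("%Y-%m-%d")

baseline_series = [
    {"time": ts, "value": val}
    for ts, val in df_plot[["timestamp", "Buy and hold"]].itertuples(index=False)
]
print(json.dumps(baseline_series))
```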
<p>In the HTML template, the marker <code>{{ variable }}</code> in Jinja syntax gets replaced with the actual JSON:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">html</span>
</header>
<pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
&lt;head&gt;
  &lt;meta charset="UTF-8"&gt;
  &lt;title&gt;{{ title }}&lt;/title&gt;
  ...
&lt;/head&gt;
&lt;body&gt;
  &lt;h1&gt;{{ heading }}&lt;/h1&gt;
  &lt;div id="chart"&gt;&lt;/div&gt;
  &lt;script&gt;
    // Ensure the DOM is ready before we initialise the chart
    document.addEventListener('DOMContentLoaded', () =&gt; {
      // Parse the JSON data passed from Python
      const baselineData = {{ baseline_json | safe }};
      const strategyData = {{ strategy_json | safe }};
      const markersData = {{ markers_json | safe }};

      // Create the chart
      const chart = LightweightCharts.createChart(document.getElementById('chart'), {
        width: document.getElementById('chart').clientWidth,
        height: 500,
        layout: {
          background: { color: "#222" },
          textColor: "#ccc"
        },
        grid: {
          vertLines: { color: "#555" },
          horzLines: { color: "#555" }
        }
      });

      // Add baseline series
      const baselineSeries = chart.addLineSeries({
        title: '{{ baseline_label }}',
        lastValueVisible: false,
        priceLineVisible: false,
        priceLineWidth: 1
      });
      baselineSeries.setData(baselineData);
      baselineSeries.priceScale().applyOptions({
        entireTextOnly: true
      });

      // Add strategy series
      const strategySeries = chart.addLineSeries({
        title: '{{ strategy_label }}',
        lastValueVisible: false,
        priceLineVisible: false,
        color: '#FF6D00'
      });
      strategySeries.setData(strategyData);

      // Add buy/sell markers to the strategy series
      strategySeries.setMarkers(markersData);

      // Fit the chart to show the full data range (full zoom)
      chart.timeScale().fitContent();
    });
  &lt;/script&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre>
</div>
<p>There are also Python libraries built specifically for backtesting investment strategies, such as <a class="link" href="https://github.com/mementum/backtrader" target="_blank" rel="noopener">Backtrader</a> and <a class="link" href="https://github.com/quantopian/zipline" target="_blank" rel="noopener">Zipline</a>, but they do not appear to be actively maintained, and they carry far more features and complexity than this simple test required.</p>
<p>The screenshot below shows an example of backtesting a strategy on the Waste Management Inc stock from January 2015 to December 2025. The baseline “Buy and hold” scenario is the blue line, which simply tracks the stock price, while the orange line shows how the strategy would have performed, with markers for the sells and buys along the way.</p>
<p><img src="https://optimizedbyotto.com/post/backtest-stop-loss-strategy-python/backtest-waste-management.png"
width="1657"
height="1243"
srcset="https://optimizedbyotto.com/post/backtest-stop-loss-strategy-python/backtest-waste-management_hu3564648353518254418.png 480w, https://optimizedbyotto.com/post/backtest-stop-loss-strategy-python/backtest-waste-management_hu14136795148462251827.png 1024w, https://optimizedbyotto.com/post/backtest-stop-loss-strategy-python/backtest-waste-management.png 1657w"
loading="lazy"
alt="Backtest run example"
class="gallery-image"
data-flex-grow="133"
data-flex-basis="319px"
>
</p>
<h2 id="results"><a href="#results" class="header-anchor"></a>Results
</h2><p>I experimented with multiple strategies and tested them with various parameters, but I don’t think I found a strategy that was consistently and clearly better than just buy-and-hold.</p>
<p>It basically boils down to the fact that I was <strong>not able to find any way to calculate when the crash has bottomed</strong> based on historical data. You can only know in hindsight that the price has stopped dropping and is on a steady path to recovery, but at that point it is already too late to buy in. In my testing, <strong>most strategies underperformed buy-and-hold</strong> because they sold when the crash started, but bought back after it recovered at a slightly higher price.</p>
<p>In particular, when using narrow margins and selling on a 3–6% drawdown, the strategy performed very badly, as those small dips tend to recover in a few days. Essentially, the strategy kept repeating the pattern of selling 100 shares at a 6% discount, being able to buy back only 94 shares the next day, then again selling those 94 shares at a 6% discount and buying back maybe 90 shares after the recovery, and so forth, never catching up to buy-and-hold.</p>
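<p>The share-count erosion is easy to verify with quick arithmetic (illustrative numbers only, with the price assumed to fully recover between dips):</p>

```python
# Illustrative whipsaw arithmetic: each round trip sells at a 6% drawdown
# and buys back at roughly the pre-dip price, shrinking the position by ~6%.
shares = 100.0
price = 100.0
for _ in range(3):                     # three small dips that quickly recover
    proceeds = shares * price * 0.94   # forced sale 6% below the peak
    shares = proceeds / price          # buy back at the recovered price
print(round(shares, 1))                # ~83.1 shares left after three round trips
```

Each whipsaw multiplies the position by 0.94, so the shortfall compounds geometrically rather than adding up linearly.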
<p>The <strong>strategy worked better in large market crashes</strong> as they tended to last longer, giving higher chances of buying back the shares while the price was still low. For example, in the 2020 crash, selling at a 20% drawdown was a good strategy, as the stock I tested dropped nearly 50% and remained low for several weeks; thus, the strategy bought back the shares while the price was still low and had not yet started to climb significantly. But that was just a lucky coincidence, as the delta between the trailing stop-loss margin of 20% and the total crash of 50% was large enough. If the crash had been only 25%, the strategy would have missed the rebound and ended up buying back the shares at a slightly higher price.</p>
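<p>A rough back-of-the-envelope check (hypothetical round-trip prices, ignoring the exact path) shows why the 20% margin wins in a 50% crash but loses in a 25% one:</p>

```python
# Hypothetical single round trip of a 20% trailing stop-loss:
# sell at 20% below the peak, buy back at 20% above the trough.
def round_trip(peak, bottom, margin=0.20):
    sell_price = peak * (1 - margin)     # stop-loss fires on the way down
    rebuy_price = bottom * (1 + margin)  # re-entry fires on the way up
    return sell_price / rebuy_price      # share-count multiplier (>1 is a win)

deep = round_trip(peak=100, bottom=50)     # 50% crash: 80/60, ~33% more shares
shallow = round_trip(peak=100, bottom=75)  # 25% crash: 80/90, ~11% fewer shares
```

The strategy only gains shares when the crash is substantially deeper than the trailing margin; otherwise the re-entry price sits above the exit price.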
<p>Also, note that the simulation assumes that the trades themselves are too small to affect price formation. We should keep in mind that in reality, if many people had stop-loss orders in place, a large price drop would trigger all of them, creating a flood of sell orders, which in turn would drive the price lower even faster and deeper. Luckily, it seems that stop-loss orders are generally not a good strategy, and we don’t need to fear that too many people will be using them.</p>
<h2 id="conclusion"><a href="#conclusion" class="header-anchor"></a>Conclusion
</h2><p>Even though using a trailing stop-loss strategy does not seem to help in getting consistently higher returns based on my backtesting, I would still say it is <strong>useful in protecting from the downside</strong> of stock investing. It can act as a kind of <em>“insurance policy”</em> to considerably decrease the chances of losing <em>big</em> while increasing the chances of losing <em>a little bit</em>. If you are risk-averse, which I think I probably am, this tradeoff can make sense. I’d rather miss out on an initial 50% loss <em>and</em> an overall 3% gain on recovery than have to sit through weeks or months with a 50% loss before the price recovers to prior levels.</p>
<p>Most notably, the <strong>trailing stop-loss strategy works best if used only once</strong>. If it is repeated multiple times, the small shortfalls on each round trip compound into a big overall loss.</p>
<p>Thus, I think I might actually put this automation in place at least on the stocks in my portfolio that have had the highest gains. If they keep going up, I will ride along, but once the crash happens, I will be out of those particular stocks permanently.</p>
<p>Do you have a favorite open source investment tool or are you aware of any strategy that actually works? Comment below!</p> DEP-18: A proposal for Git-based collaboration in Debian https://optimizedbyotto.com/post/debian-collaboration-on-git/Sun, 30 Nov 2025 00:00:00 +0000 https://optimizedbyotto.com/post/debian-collaboration-on-git/ <img src="https://optimizedbyotto.com/post/debian-collaboration-on-git/debian-git-collaboration.jpg" alt="Featured image of post DEP-18: A proposal for Git-based collaboration in Debian" /><p>I am a huge fan of Git, as I have witnessed how it has made software development so much more productive compared to the pre-2010s era. I wish all Debian source code were in Git to reap the full benefits.</p>
<p>Git is not perfect, as it requires significant effort to learn properly, and the ecosystem is complex with even more things to learn ranging from cryptographic signatures and commit hooks to Git-assisted code review best practices, ‘forge’ websites, and CI systems.</p>
<p>Sure, there is still room to optimize its use, but Git certainly has proven itself and is now the industry standard. <strong>Thus, some readers might be surprised to learn that Debian development in 2025 is not actually based on Git.</strong> In Debian, the version control is done by the Debian archive itself. Each ‘commit’ is a new upload to the archive, and the ‘commit message’ is the <code>debian/changelog</code> entry. The ‘commit log’ is available at <a class="link" href="https://snapshot.debian.org/" target="_blank" rel="noopener"
>snapshots.debian.org</a>.</p>
<p>In practice, most Debian Developers (people who have the credentials to upload to the Debian archive) do use Git and host their packaging source code on <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a> – the GitLab instance of Debian. This is, however, based on each DD’s personal preferences. <strong>The Debian project does not have any policy requiring that packages be hosted on salsa.debian.org or be in version control at all.</strong></p>
<h2 id="is-collaborative-software-development-possible-without-git-and-version-control-software"><a href="#is-collaborative-software-development-possible-without-git-and-version-control-software" class="header-anchor"></a>Is collaborative software development possible without git and version control software?
</h2><p>Debian, however, has some peculiarities that may be surprising to people who have grown accustomed to GitHub, GitLab or various company-internal code review systems.</p>
<p>In Debian:</p>
<ul>
<li>The source code of the next upload is not public but resides only on the developer’s laptop.</li>
<li>Code contributions are plain patch files, based on the latest revision released in the Debian archive (where the <code>unstable</code> area is equivalent to the main development branch).</li>
<li>These patches are submitted by email to a bug tracker that does no validation or testing whatsoever.</li>
<li>Developers applying these patches typically have elaborate Mutt or Emacs setups to facilitate fetching patches from email.</li>
<li>There is no public staging area, no concept of rebasing patches or withdrawing a patch and replacing it with a better version.</li>
<li>The submitter won’t see any progress information until a notification email arrives after a new version has been uploaded to the Debian archive.</li>
</ul>
<p>This system has served Debian for three decades. It is not broken, but using the package archive just feels… well, <em>archaic</em>.</p>
<p>There is a more efficient way, and indeed the majority of Debian packages have a metadata field <code>Vcs-Git</code> that advertises which version control repository the maintainer uses. However, newcomers to Debian are surprised to notice that not all packages are hosted on <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a> but at various random places with their own account and code submission systems, and there is nothing enforcing or even warning if the code there is <strong>out of sync with what was uploaded to Debian</strong>. Any Debian Developer can at any time upload a new package with whatever changes, bypassing the Git repository, even when the package advertised a Git repository. All PGP signed commits, Git tags and other information in the Git repository are <em>just extras</em> currently, as the Debian archive does not enforce or validate anything about them.</p>
<p>This also makes contributing to multiple packages in parallel hard. One can’t just go on <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a> and fork a bunch of repositories and submit Merge Requests. Currently, the <strong>only reliable way is to download source packages from Debian unstable</strong>, develop patches on top of them, and send the final version as a plain <strong>patch file by email to the Debian bug tracker</strong>. To my knowledge, no system exists to facilitate working with the patches in the bug tracker, such as rebasing patches 6 months later to detect if they or equivalent changes were applied or if sending refreshed versions is needed.</p>
<p>To newcomers in Debian, it is even more surprising that there are packages that <em>are</em> on <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a> but have the Merge Requests feature disabled. This is often because the maintainer does not want to receive notification emails about new Merge Requests, but rather just emails from <a class="link" href="https://bugs.debian.org/" target="_blank" rel="noopener"
>bugs.debian.org</a>. This may sound arrogant, but keep in mind that these developers put in the effort to set up their Mutt/Emacs workflow for the existing Debian process, and extending it to work with GitLab notifications is not trivial. There are also purists who want to do everything via the command-line (without having to open a browser, run JavaScript and maintain a live Internet connection), and tools like <a class="link" href="https://manpages.debian.org/unstable/glab/glab.1.en.html" target="_blank" rel="noopener"
>glab</a> are not convenient enough for the full workflow.</p>
<h2 id="inefficient-ways-of-working-prevent-debian-from-flourishing"><a href="#inefficient-ways-of-working-prevent-debian-from-flourishing" class="header-anchor"></a>Inefficient ways of working prevent Debian from flourishing
</h2><p>I would claim, based on my personal experiences from the past 10+ years as a Debian Developer, that <strong>the lack of high-quality and productive tooling is seriously harming Debian</strong>. The current methods of collaboration are cumbersome for aspiring contributors to learn and suboptimal to use for both new and seasoned contributors.</p>
<p>There are no exit interviews for contributors who left Debian, no comprehensive data on reasons to contribute or stop contributing, nor are there any metrics tracking how many people tried but failed to contribute to Debian. Some data points to support my concerns do exist:</p>
<ul>
<li>The contributor database shows that the <a class="link" href="https://salsa.debian.org/rafael/debian-contrib-years" target="_blank" rel="noopener"
>number of contributors is growing slower</a> than Debian’s popularity.</li>
<li>Most packages are maintained by one person working alone (just pick any package at random and look at the upload history).</li>
</ul>
<h2 id="debian-should-embrace-git-but-decision-making-is-slow"><a href="#debian-should-embrace-git-but-decision-making-is-slow" class="header-anchor"></a>Debian should embrace git, but decision-making is slow
</h2><p>Debian is all about community and collaboration. One would assume that Debian would prioritize, above all, making collaboration tools and processes simpler, faster and less error-prone, as that would help both current and future package maintainers. Yet it isn’t so, for reasons unique to Debian.</p>
<p>There is no single company or entity running Debian, and it has managed to operate as a pure <strong>meritocracy and do-cracy for over 30 years</strong>. This is impressive and admirable. Unfortunately, some of the infrastructure and technical processes are also nearly 30 years old and very difficult to change for the same reason: the nature of Debian’s distributed decision-making process.</p>
<p>As a software developer and manager with 25+ years of experience, I strongly feel that developing software collaboratively using Git is a major step forward that Debian needs to take, in one form or another, and I <strong>hope to see other DDs voice their support</strong> if they agree.</p>
<h2 id="debian-enhancement-proposal-18"><a href="#debian-enhancement-proposal-18" class="header-anchor"></a>Debian Enhancement Proposal 18
</h2><p>Following how consensus is achieved in Debian, I started drafting <a class="link" href="https://dep-team.pages.debian.net/deps/dep18/" target="_blank" rel="noopener"
>DEP-18</a> in 2024, and it is currently awaiting enough <em>thumbs up</em> at <a class="link" href="https://salsa.debian.org/dep-team/deps/-/merge_requests/21" target="_blank" rel="noopener"
>https://salsa.debian.org/dep-team/deps/-/merge_requests/21</a> to get into <em>CANDIDATE</em> status next.</p>
<p>In summary, the DEP-18 proposes that everyone keen on collaborating should:</p>
<ol>
<li>Maintain Debian packaging sources in Git on Salsa.</li>
<li>Use Merge Requests to show your work and to get reviews.</li>
<li>Run Salsa CI before upload.</li>
</ol>
<p>The principles above are not novel. According to stats at e.g. <a class="link" href="https://trends.debian.net/#vcs-hosting" target="_blank" rel="noopener"
>trends.debian.net</a>, and <a class="link" href="https://udd.debian.org/cgi-bin/dep14stats.cgi" target="_blank" rel="noopener"
>UDD</a>, ~93% of all Debian source packages are already hosted on <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a>. As of June 1st, 2025, only 1640 source packages remained outside Salsa. The purpose of DEP-18 is to state in writing what Debian is already doing for most packages, and thus to express what newcomers, among others, should learn and do, so that basic collaboration is smooth and free from structural obstacles.</p>
<p>Most packages are also already allowing Merge Requests and using Salsa CI, but there hasn’t been any written recommendation anywhere in Debian to do so. The <a class="link" href="https://www.debian.org/doc/debian-policy/" target="_blank" rel="noopener"
>Debian Policy (v.4.7.2)</a> does not even mention the word “Salsa” a single time. The current <a class="link" href="https://www.debian.org/doc/manuals/developers-reference/" target="_blank" rel="noopener"
>process documentation</a> on how to do non-maintainer uploads or salvage packages is based entirely on uploading packages to the archive, without any consideration of git-based collaboration such as posting a Merge Request first. Personally, I feel <strong>posting a Merge Request would be a better approach</strong>, as it would invite collaborators to discuss and provide code reviews. If there are no responses, the submitter can proceed to merge, but compared to direct uploads to the Debian archive, the Merge Request practice at least offers a time and place for discussions and reviews to happen.</p>
<p>It could very well be that in the future somebody comes up with a new packaging format that makes upstream source package management easier, or a monorepo with all packages, or some other future structures or processes. Having a DEP to state how to do things <em>now</em> does not prevent people from experimenting and innovating if they intentionally want to do that. The DEP is merely an expression of the minimal common denominators in the packaging workflow that maintainers and contributors should follow, <em>unless they know better</em>.</p>
<h2 id="transparency-and-collaboration"><a href="#transparency-and-collaboration" class="header-anchor"></a>Transparency and collaboration
</h2><p>Among the <a class="link" href="https://dep-team.pages.debian.net/deps/dep18/" target="_blank" rel="noopener"
>DEP-18</a> recommendations is:</p>
<blockquote>
<p>The recommended first step in contributing to a package is to use the built-in “Fork” feature on Salsa. This serves two purposes. Primarily, it allows any contributor to publish their Git branches and submit them as Merge Requests. Additionally, the mere existence of a list of “Forks” enables contributors to discover each other, and in rare cases when the original package is not accepting improvements, collaboration could arise among the contributors and potentially lead to permanent forks in the general meaning. Forking is a fundamental part of the dynamics in open source that helps drive quality and agreement. The ability to fork ultimately serves as the last line of defense of users’ rights. Git supports this by making both temporary and permanent forks easy to create and maintain.</p>
</blockquote>
<p>Further, it states:</p>
<blockquote>
<p>Debian packaging work should be reasonably transparent and public to allow contributors to participate. A maintainer should push their pending changes to Salsa at regular intervals, so that a potential contributor can discover if a particular change has already been made or a bug has been fixed in version control, and thus avoid duplicate work.</p>
<p>Debian maintainers should make reasonable efforts to publish planned changes as Merge Requests on Salsa and solicit feedback and reviews. While pushing changes directly on the main Git branch is the fastest workflow, second only to uploading all changes directly to Debian repositories, it is not an inclusive way to develop software. Even packages that are maintained by a single maintainer should at least occasionally publish Merge Requests to allow new contributors to step up and participate.</p>
</blockquote>
<p><strong>I think these are key aspects leading to transparency and true open source collaboration.</strong> Even though this talks about <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>Salsa</a> — which is based on <a class="link" href="https://gitlab.com/" target="_blank" rel="noopener"
>GitLab</a> — the concepts are universal and will also work on other forges, like <a class="link" href="https://forgejo.org/" target="_blank" rel="noopener"
>Forgejo</a> or <a class="link" href="https://github.com/" target="_blank" rel="noopener"
>GitHub</a>. <strong>The point is that sharing work-in-progress on a real-time platform</strong>, with CI and other supporting features, <strong>empowers and motivates people</strong> to iterate on code collaboratively. As an example of an anti-pattern, Oracle MySQL publishes the source code for all their releases and is license-compliant, but as they don’t publish their Git commits in real-time, it does not feel like a real open source project. Non-Oracle employees are not motivated to participate as second-class developers who are kept in the dark. Debian should embrace git and sharing work in real-time, embodying a true open source spirit.</p>
<h2 id="recommend-not-force"><a href="#recommend-not-force" class="header-anchor"></a>Recommend, not force
</h2><p>Note that the Debian Enhancement Proposals are not binding. Only the Debian Policy and Technical Committee decisions carry that weight. The nature of collaboration is voluntary anyway, so the DEP does not need to force anything on people who don’t want to use <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a>.</p>
<p>The DEP-18 is also not a guide for package maintainers. I have my own views and have written detailed guides in blog articles if you want to read more on, for example, how to do <a class="link" href="https://optimizedbyotto.com/post/how-to-code-review/" >code reviews</a> efficiently.</p>
<p>Within DEP-18, there is plenty of room to work in many different ways, and it does not try to force one single workflow. <strong>The goal here is to simply have agreed-upon minimal common denominators among those who are keen to collaborate using salsa.debian.org,</strong> not to dictate a complete code submission workflow.</p>
<p>Once we reach this, there will hopefully be less friction in the most basic and recurring collaboration tasks, giving DDs more energy to improve other processes or just invest in having more and newer packages for Debian users to enjoy.</p>
<h2 id="next-steps"><a href="#next-steps" class="header-anchor"></a>Next steps
</h2><p>In addition to lengthy online discussions on mailing lists and DEP reviews, I also <a class="link" href="https://debconf25.debconf.org/talks/135-merge-request-based-collaboration-for-debian-packages/" target="_blank" rel="noopener"
>presented on this topic at DebConf 2025</a> in Brest, France. Unfortunately the recording is not yet up on <a class="link" href="https://peertube.debian.social/" target="_blank" rel="noopener"
>Peertube</a>.</p>
<p>The feedback has been overwhelmingly positive. However, there are a few loud and very negative voices that cannot be ignored. Maintaining a Linux distribution at the scale and complexity of Debian requires extraordinary talent and dedication, and people doing this kind of work often have strong views, most of the time for good reasons. We do not want to alienate existing key contributors with new processes, so maximum consensus is desirable.</p>
<p>We also need more data on what the 1000+ current Debian Developers view as a good process to avoid being skewed by a loud minority. <strong>If you are a current or aspiring Debian Developer, <a class="link" href="https://salsa.debian.org/dep-team/deps/-/merge_requests/21" target="_blank" rel="noopener"
>please add a thumbs up</a> if you think I should continue with this effort (or a thumbs down if not) on the Merge Request that would make DEP-18 have <em>candidate</em> status.</strong></p>
<p>There is also technical work to do. Increased Git use will obviously lead to growing adoption of the new <a class="link" href="https://manpages.debian.org/unstable/git-debpush/tag2upload.5.en.html" target="_blank" rel="noopener"
>tag2upload</a> feature, which will need to get full <code>git-buildpackage</code> support so it can integrate into <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a> without <a class="link" href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1106071" target="_blank" rel="noopener"
>turning off</a> Debian packaging security features. The <code>git-buildpackage</code> tool itself also needs various improvements, such as making it less error-prone to contribute to many different packages whose <code>debian/gbp.conf</code> files are maintained with varying levels of diligence.</p>
<p>Eventually, if it starts looking like all Debian packages might get hosted on <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a>, I would also start building a <em>review.debian.org</em> website to facilitate code review aspects that are unique to Debian, such as tracking Merge Requests across GitLab projects in ways GitLab can’t do, highlighting which submissions need review most urgently, feeding code reviews and approvals into the <a class="link" href="https://contributors.debian.org/" target="_blank" rel="noopener"
>contributors.debian.org</a> database for better attribution, and so forth.</p>
<p>Details on this vision will be in a later blog post, so subscribe to updates!</p> Could the XZ backdoor have been detected with better Git and Debian packaging practices? https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/Sun, 19 Oct 2025 00:00:00 +0000 https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/ <img src="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/debian-git-magnifier.jpg" alt="Featured image of post Could the XZ backdoor have been detected with better Git and Debian packaging practices?" /><p>The discovery of a backdoor in XZ Utils in the spring of 2024 shocked the open source community, raising critical questions about software supply chain security. This post explores whether better Debian packaging practices could have detected this threat, offering a guide to auditing packages and suggesting future improvements.</p>
<p>The XZ backdoor in versions 5.6.0/5.6.1 made its way briefly into many major Linux distributions such as Debian and Fedora, but luckily didn’t reach that many actual users, as the backdoored releases were quickly removed thanks to the heroic diligence of <a class="link" href="https://mastodon.social/@AndresFreundTec" target="_blank" rel="noopener"
>Andres Freund</a>. We are all extremely lucky that he detected a half-second performance regression in SSH, cared enough to trace it down, discovered malicious code in the XZ library loaded by SSH, and promptly reported it to various security teams for quick, coordinated action.</p>
<p>This episode makes software engineers ponder the following questions:</p>
<ul>
<li>Why didn’t any Linux distro packagers notice anything odd when importing the new XZ version 5.6.0/5.6.1 from upstream?</li>
<li>Is the current software supply-chain in the most popular Linux distros easy to audit?</li>
<li>Could we have similar backdoors lurking that haven’t been detected yet?</li>
</ul>
<p>As a Debian Developer, I decided to audit the xz package in Debian, share my methodology and findings in this post, and suggest some improvements to how software supply-chain security could be tightened in Debian specifically.</p>
<p><strong>Note that the scope here is only to inspect how Debian imports software from its upstreams, and how it is distributed to Debian’s users.</strong> This excludes the whole question of whether an upstream project follows software development security best practices. This post also doesn’t discuss how to operate an individual computer running Debian to ensure it remains untampered with, as there are plenty of guides on that already.</p>
<h2 id="downloading-debian-and-upstream-source-packages"><a href="#downloading-debian-and-upstream-source-packages" class="header-anchor"></a>Downloading Debian and upstream source packages
</h2><p>Let’s start by working backwards from what the Debian package repositories offer for download. Auditing binaries is extremely complicated, so we skip that and assume the Debian build hosts are trustworthy and reliably build binaries from the source packages; the <strong>focus should be on auditing the source packages</strong>.</p>
<p>As with everything in Debian, there are multiple tools and ways to do the same thing, but in this post only one (and hopefully the best) way to do something is presented for brevity.</p>
<p>The first step is to download the latest version and some past versions of the package from the Debian archive, which is most easily done with <a class="link" href="https://manpages.debian.org/unstable/devscripts/debsnap.1.en.html" target="_blank" rel="noopener"
>debsnap</a>. The following command will download all Debian source packages of <a class="link" href="https://tracker.debian.org/pkg/xz-utils" target="_blank" rel="noopener"
>xz-utils</a> from Debian package version 5.2.4-1 onwards:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">$ debsnap --verbose --first 5.2.4-1 xz-utils
Getting json https://snapshot.debian.org/mr/package/xz-utils/
...
Getting dsc file xz-utils_5.2.4-1.dsc: https://snapshot.debian.org/file/a98271e4291bed8df795ce04d9dc8e4ce959462d
Getting file xz-utils_5.2.4.orig.tar.xz.asc: https://snapshot.debian.org/file/59ccbfb2405abe510999afef4b374cad30c09275
Getting file xz-utils_5.2.4-1.debian.tar.xz: https://snapshot.debian.org/file/667c14fd9409ca54c397b07d2d70140d6297393f
source-xz-utils/xz-utils_5.2.4-1.dsc:
Good signature found
validating xz-utils_5.2.4.orig.tar.xz
validating xz-utils_5.2.4.orig.tar.xz.asc
validating xz-utils_5.2.4-1.debian.tar.xz
All files validated successfully.</code><pre><code>$ debsnap --verbose --first 5.2.4-1 xz-utils
Getting json https://snapshot.debian.org/mr/package/xz-utils/
...
Getting dsc file xz-utils_5.2.4-1.dsc: https://snapshot.debian.org/file/a98271e4291bed8df795ce04d9dc8e4ce959462d
Getting file xz-utils_5.2.4.orig.tar.xz.asc: https://snapshot.debian.org/file/59ccbfb2405abe510999afef4b374cad30c09275
Getting file xz-utils_5.2.4-1.debian.tar.xz: https://snapshot.debian.org/file/667c14fd9409ca54c397b07d2d70140d6297393f
source-xz-utils/xz-utils_5.2.4-1.dsc:
Good signature found
validating xz-utils_5.2.4.orig.tar.xz
validating xz-utils_5.2.4.orig.tar.xz.asc
validating xz-utils_5.2.4-1.debian.tar.xz
All files validated successfully.</code></pre></div>
<p>Once debsnap completes, there will be a subfolder <code>source-<package name></code> with the following types of files:</p>
<ul>
<li><code>*.orig.tar.xz</code>: source code from upstream</li>
<li><code>*.orig.tar.xz.asc</code>: detached signature (if upstream signs their releases)</li>
<li><code>*.debian.tar.xz</code>: Debian packaging source, i.e. the <code>debian/</code> subdirectory contents</li>
<li><code>*.dsc</code>: Debian source control file, including signature by Debian Developer/Maintainer</li>
</ul>
<p>Example:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">$ ls -1 source-xz-utils/
...
xz-utils_5.6.4.orig.tar.xz
xz-utils_5.6.4.orig.tar.xz.asc
xz-utils_5.6.4-1.debian.tar.xz
xz-utils_5.6.4-1.dsc
xz-utils_5.8.0.orig.tar.xz
xz-utils_5.8.0.orig.tar.xz.asc
xz-utils_5.8.0-1.debian.tar.xz
xz-utils_5.8.0-1.dsc
xz-utils_5.8.1.orig.tar.xz
xz-utils_5.8.1.orig.tar.xz.asc
xz-utils_5.8.1-1.1.debian.tar.xz
xz-utils_5.8.1-1.1.dsc
xz-utils_5.8.1-1.debian.tar.xz
xz-utils_5.8.1-1.dsc
xz-utils_5.8.1-2.debian.tar.xz
xz-utils_5.8.1-2.dsc</code><pre><code>$ ls -1 source-xz-utils/
...
xz-utils_5.6.4.orig.tar.xz
xz-utils_5.6.4.orig.tar.xz.asc
xz-utils_5.6.4-1.debian.tar.xz
xz-utils_5.6.4-1.dsc
xz-utils_5.8.0.orig.tar.xz
xz-utils_5.8.0.orig.tar.xz.asc
xz-utils_5.8.0-1.debian.tar.xz
xz-utils_5.8.0-1.dsc
xz-utils_5.8.1.orig.tar.xz
xz-utils_5.8.1.orig.tar.xz.asc
xz-utils_5.8.1-1.1.debian.tar.xz
xz-utils_5.8.1-1.1.dsc
xz-utils_5.8.1-1.debian.tar.xz
xz-utils_5.8.1-1.dsc
xz-utils_5.8.1-2.debian.tar.xz
xz-utils_5.8.1-2.dsc</code></pre></div>
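<p>Each <code>*.dsc</code> file also lists SHA-256 checksums of the tarballs it describes, tying the upstream and Debian sources together. The devscripts tool <code>dscverify</code> checks these (along with the signature) for you; purely as an illustration of what it verifies, a minimal hand-rolled Python sketch could look like this (parsing only the <code>Checksums-Sha256</code> field and assuming the tarballs sit next to the <code>.dsc</code> file):</p>

```python
# Illustrative sketch: check the SHA-256 checksums listed in a .dsc against
# the files on disk. The real tool for this is `dscverify` from devscripts,
# which also validates the OpenPGP signature; this only shows the checksum
# part. Entries in the Checksums-Sha256 field are continuation lines of the
# form " <sha256> <size> <filename>".
import hashlib
from pathlib import Path

def check_dsc_checksums(dsc_path):
    dsc = Path(dsc_path)
    results = {}
    in_section = False
    for line in dsc.read_text().splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_section = True
            continue
        if in_section:
            if not line.startswith(" "):   # next field ends the section
                break
            digest, _size, name = line.split()
            actual = hashlib.sha256((dsc.parent / name).read_bytes()).hexdigest()
            results[name] = (actual == digest)
    return results  # filename -> True if the checksum matches
```

This only demonstrates the mechanics; for actual auditing, stick with <code>dscverify</code> so the checksums are anchored to a verified signature.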
<h2 id="verifying-authenticity-of-upstream-and-debian-sources-using-openpgp-signatures"><a href="#verifying-authenticity-of-upstream-and-debian-sources-using-openpgp-signatures" class="header-anchor"></a>Verifying authenticity of upstream and Debian sources using OpenPGP signatures
</h2><p>As seen in the output of <code>debsnap</code>, it already automatically verifies that the downloaded files match the <a class="link" href="https://www.openpgp.org/" target="_blank" rel="noopener"
>OpenPGP</a> signatures. To have full clarity on which files were authenticated with which keys, we can verify the Debian packager's signature with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">$ gpg --verify --auto-key-retrieve --keyserver hkps://keyring.debian.org xz-utils_5.8.1-2.dsc
gpg: Signature made Fri Oct 3 22:04:44 2025 UTC
gpg: using RSA key 57892E705233051337F6FDD105641F175712FA5B
gpg: requesting key 05641F175712FA5B from hkps://keyring.debian.org
gpg: key 7B96E8162A8CF5D1: public key "Sebastian Andrzej Siewior" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: Good signature from "Sebastian Andrzej Siewior" [unknown]
gpg: aka "Sebastian Andrzej Siewior <bigeasy@linutronix.de>" [unknown]
gpg: aka "Sebastian Andrzej Siewior <sebastian@breakpoint.cc>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 6425 4695 FFF0 AA44 66CC 19E6 7B96 E816 2A8C F5D1
Subkey fingerprint: 5789 2E70 5233 0513 37F6 FDD1 0564 1F17 5712 FA5B</code><pre><code>$ gpg --verify --auto-key-retrieve --keyserver hkps://keyring.debian.org xz-utils_5.8.1-2.dsc
gpg: Signature made Fri Oct 3 22:04:44 2025 UTC
gpg: using RSA key 57892E705233051337F6FDD105641F175712FA5B
gpg: requesting key 05641F175712FA5B from hkps://keyring.debian.org
gpg: key 7B96E8162A8CF5D1: public key "Sebastian Andrzej Siewior" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: Good signature from "Sebastian Andrzej Siewior" [unknown]
gpg: aka "Sebastian Andrzej Siewior <bigeasy@linutronix.de>" [unknown]
gpg: aka "Sebastian Andrzej Siewior <sebastian@breakpoint.cc>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 6425 4695 FFF0 AA44 66CC 19E6 7B96 E816 2A8C F5D1
Subkey fingerprint: 5789 2E70 5233 0513 37F6 FDD1 0564 1F17 5712 FA5B</code></pre></div>
<p>The upstream tarball signature (if available) can be verified with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">$ gpg --verify --auto-key-retrieve xz-utils_5.8.1.orig.tar.xz.asc
gpg: assuming signed data in 'xz-utils_5.8.1.orig.tar.xz'
gpg: Signature made Thu Apr 3 11:38:23 2025 UTC
gpg: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpg: key 38EE757D69184620: public key "Lasse Collin <lasse.collin@tukaani.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: Good signature from "Lasse Collin <lasse.collin@tukaani.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 3690 C240 CE51 B467 0D30 AD1C 38EE 757D 6918 4620</code><pre><code>$ gpg --verify --auto-key-retrieve xz-utils_5.8.1.orig.tar.xz.asc
gpg: assuming signed data in 'xz-utils_5.8.1.orig.tar.xz'
gpg: Signature made Thu Apr 3 11:38:23 2025 UTC
gpg: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpg: key 38EE757D69184620: public key "Lasse Collin <lasse.collin@tukaani.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: Good signature from "Lasse Collin <lasse.collin@tukaani.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 3690 C240 CE51 B467 0D30 AD1C 38EE 757D 6918 4620</code></pre></div>
<p>Note that this only proves that there is <em>a key</em> that created a valid signature for this content. <strong>The authenticity of the keys themselves needs to be validated separately</strong> before trusting that they are in fact the keys of these people. That can be done by checking e.g. the upstream website for the key fingerprints it published, or the <a class="link" href="https://keyring.debian.org/" target="_blank" rel="noopener"
>Debian keyring</a> for Debian Developers and Maintainers, or by relying on the OpenPGP “web-of-trust”.</p>
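<p>When comparing fingerprints against a published value, note that gpg prints them in groups of four hex digits. As a trivial illustration (the helper below is hypothetical, not part of gpg or any Debian tool), comparison is safest after stripping the display grouping:</p>

```python
def normalize_fingerprint(fpr: str) -> str:
    """Strip whitespace grouping so fingerprints compare byte-for-byte."""
    return "".join(fpr.split()).upper()

# Fingerprint as printed by gpg in the output above
printed = "3690 C240 CE51 B467 0D30 AD1C 38EE 757D 6918 4620"
# The same fingerprint as it might appear on the upstream website
published = "3690c240ce51b4670d30ad1c38ee757d69184620"

assert normalize_fingerprint(printed) == normalize_fingerprint(published)
```

<p>A mismatch here would mean the signing key is not the one the project published, regardless of gpg reporting a “Good signature”.</p>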
<h2 id="verifying-authenticity-of-upstream-sources-by-comparing-checksums"><a href="#verifying-authenticity-of-upstream-sources-by-comparing-checksums" class="header-anchor"></a>Verifying authenticity of upstream sources by comparing checksums
</h2><p>In case the upstream in question does not publish release signatures, the second-best way to verify the authenticity of the sources used in Debian is to download the sources directly from upstream and check that the <a class="link" href="https://en.wikipedia.org/wiki/SHA-2" target="_blank" rel="noopener"
>sha256 checksums</a> match.</p>
<p>This should be done using the <code>debian/watch</code> file inside the Debian packaging, which defines where the upstream source is downloaded from. Continuing the example above, we can unpack the latest Debian sources, enter the directory, and then run <a class="link" href="https://manpages.debian.org/unstable/devscripts/uscan.1.en.html" target="_blank" rel="noopener"
>uscan</a> to download:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">$ tar xvf xz-utils_5.8.1-2.debian.tar.xz
...
debian/rules
debian/source/format
debian/source.lintian-overrides
debian/symbols
debian/tests/control
debian/tests/testsuite
debian/upstream/signing-key.asc
debian/watch
...
$ uscan --download-current-version --destdir /tmp
Newest version of xz-utils on remote site is 5.8.1, specified download version is 5.8.1
gpgv: Signature made Thu Apr 3 11:38:23 2025 UTC
gpgv: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpgv: Good signature from "Lasse Collin <lasse.collin@tukaani.org>"
Successfully symlinked /tmp/xz-5.8.1.tar.xz to /tmp/xz-utils_5.8.1.orig.tar.xz.</code><pre><code>$ tar xvf xz-utils_5.8.1-2.debian.tar.xz
...
debian/rules
debian/source/format
debian/source.lintian-overrides
debian/symbols
debian/tests/control
debian/tests/testsuite
debian/upstream/signing-key.asc
debian/watch
...
$ uscan --download-current-version --destdir /tmp
Newest version of xz-utils on remote site is 5.8.1, specified download version is 5.8.1
gpgv: Signature made Thu Apr 3 11:38:23 2025 UTC
gpgv: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpgv: Good signature from "Lasse Collin <lasse.collin@tukaani.org>"
Successfully symlinked /tmp/xz-5.8.1.tar.xz to /tmp/xz-utils_5.8.1.orig.tar.xz.</code></pre></div>
<p>The original files downloaded from upstream are now in <code>/tmp</code>, along with copies renamed to follow Debian conventions. Using everything downloaded so far, the sha256 checksums can be compared across the files and against what the <code>.dsc</code> file advertises:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">$ ls -1 /tmp/
xz-5.8.1.tar.xz
xz-5.8.1.tar.xz.sig
xz-utils_5.8.1.orig.tar.xz
xz-utils_5.8.1.orig.tar.xz.asc
$ sha256sum xz-utils_5.8.1.orig.tar.xz /tmp/xz-5.8.1.tar.xz
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e xz-utils_5.8.1.orig.tar.xz
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e /tmp/xz-5.8.1.tar.xz
$ grep -A 3 Sha256 xz-utils_5.8.1-2.dsc
Checksums-Sha256:
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e 1461872 xz-utils_5.8.1.orig.tar.xz
4138f4ceca1aa7fd2085fb15a23f6d495d27bca6d3c49c429a8520ea622c27ae 833 xz-utils_5.8.1.orig.tar.xz.asc
3ed458da17e4023ec45b2c398480ed4fe6a7bfc1d108675ec837b5ca9a4b5ccb 24648 xz-utils_5.8.1-2.debian.tar.xz</code><pre><code>$ ls -1 /tmp/
xz-5.8.1.tar.xz
xz-5.8.1.tar.xz.sig
xz-utils_5.8.1.orig.tar.xz
xz-utils_5.8.1.orig.tar.xz.asc
$ sha256sum xz-utils_5.8.1.orig.tar.xz /tmp/xz-5.8.1.tar.xz
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e xz-utils_5.8.1.orig.tar.xz
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e /tmp/xz-5.8.1.tar.xz
$ grep -A 3 Sha256 xz-utils_5.8.1-2.dsc
Checksums-Sha256:
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e 1461872 xz-utils_5.8.1.orig.tar.xz
4138f4ceca1aa7fd2085fb15a23f6d495d27bca6d3c49c429a8520ea622c27ae 833 xz-utils_5.8.1.orig.tar.xz.asc
3ed458da17e4023ec45b2c398480ed4fe6a7bfc1d108675ec837b5ca9a4b5ccb 24648 xz-utils_5.8.1-2.debian.tar.xz</code></pre></div>
<p>In the example above the checksum <code>0b54f79df85...</code> is identical for the file downloaded from upstream, the file from the Debian archive, and the entry in the <code>.dsc</code> file, so the sources match.</p>
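<p>The comparison above can also be scripted. The sketch below is an illustrative minimal parser only (real tooling such as <code>dscverify</code> from devscripts does this more robustly, and also checks the signature); it reads the <code>Checksums-Sha256</code> stanza of a <code>.dsc</code> and checks a file on disk against it:</p>

```python
import hashlib

def parse_checksums_sha256(dsc_text: str) -> dict:
    """Return {filename: sha256} from the Checksums-Sha256 stanza of a .dsc."""
    checksums = {}
    in_stanza = False
    for line in dsc_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_stanza = True
        elif in_stanza and line.startswith(" "):
            digest, _size, name = line.split()
            checksums[name] = digest
        elif in_stanza:
            break  # next field starts, the stanza is over
    return checksums

def file_matches_dsc(path: str, name: str, dsc_text: str) -> bool:
    """Check that the file at 'path' has the digest the .dsc lists for 'name'."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return parse_checksums_sha256(dsc_text).get(name) == digest
```

<p>For example, <code>file_matches_dsc("/tmp/xz-5.8.1.tar.xz", "xz-utils_5.8.1.orig.tar.xz", open("xz-utils_5.8.1-2.dsc").read())</code> would confirm the upstream download matches what the Debian source package declares.</p>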
<h3 id="repackaged-upstream-sources-cant-be-verified-as-easily"><a href="#repackaged-upstream-sources-cant-be-verified-as-easily" class="header-anchor"></a>Repackaged upstream sources can’t be verified as easily
</h3><p>Note that <code>uscan</code> may in rare cases repackage some upstream sources, for example to exclude files that don’t adhere to Debian’s copyright and licensing requirements. Those files and paths would be listed under the <code>Files-Excluded</code> field in the <code>debian/copyright</code> file. There are also other situations where the file that represents the upstream sources in Debian isn’t bit-by-bit the same as what upstream published. If checksums don’t match, an experienced Debian Developer should review all package settings (e.g. <code>debian/source/options</code>) to see if there was a valid and intentional reason for divergence.</p>
<h2 id="reviewing-changes-between-two-source-packages-using-diffoscope"><a href="#reviewing-changes-between-two-source-packages-using-diffoscope" class="header-anchor"></a>Reviewing changes between two source packages using diffoscope
</h2><p><a class="link" href="https://manpages.debian.org/unstable/diffoscope-minimal/diffoscope.1.en.html" target="_blank" rel="noopener"
>Diffoscope</a> is an incredibly capable and handy tool to compare arbitrary files. For example, to view a report in HTML format of the differences between two XZ releases, run:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-6"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-6" style="display:none;">diffoscope --html-dir xz-utils-5.6.4_vs_5.8.0 xz-utils_5.6.4.orig.tar.xz xz-utils_5.8.0.orig.tar.xz
browse xz-utils-5.6.4_vs_5.8.0/index.html</code><pre><code>diffoscope --html-dir xz-utils-5.6.4_vs_5.8.0 xz-utils_5.6.4.orig.tar.xz xz-utils_5.8.0.orig.tar.xz
browse xz-utils-5.6.4_vs_5.8.0/index.html</code></pre></div>
<p><img src="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-diffoscope.png"
width="1251"
height="645"
srcset="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-diffoscope_hu11879943104882691299.png 480w, https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-diffoscope_hu12009539730501090515.png 1024w, https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-diffoscope.png 1251w"
loading="lazy"
alt="Inspecting diffoscope output of differences between two XZ Utils releases"
class="gallery-image"
data-flex-grow="193"
data-flex-basis="465px"
>
</p>
<p>If the changes are extensive and you want to use an LLM to help spot potential security issues, generate reports of both the upstream and Debian packaging differences in Markdown with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-7"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-7" style="display:none;">diffoscope --markdown diffoscope-debian.md xz-utils_5.6.4-1.debian.tar.xz xz-utils_5.8.1-2.debian.tar.xz
diffoscope --markdown diffoscope.md xz-utils_5.6.4.orig.tar.xz xz-utils_5.8.0.orig.tar.xz</code><pre><code>diffoscope --markdown diffoscope-debian.md xz-utils_5.6.4-1.debian.tar.xz xz-utils_5.8.1-2.debian.tar.xz
diffoscope --markdown diffoscope.md xz-utils_5.6.4.orig.tar.xz xz-utils_5.8.0.orig.tar.xz</code></pre></div>
<p>The Markdown files created above can then be passed to your favorite LLM, along with a prompt such as:</p>
<blockquote>
<p>Based on the attached diffoscope output for a new Debian package version compared with the previous one, list all suspicious changes that might have introduced a backdoor, followed by other potential security issues. If there are none, list a short summary of changes as the conclusion.</p>
</blockquote>
<h2 id="reviewing-debian-source-packages-in-version-control"><a href="#reviewing-debian-source-packages-in-version-control" class="header-anchor"></a>Reviewing Debian source packages in version control
</h2><p>As of today <a class="link" href="https://udd.debian.org/cgi-bin/dep14stats.cgi" target="_blank" rel="noopener"
>only 93%</a> of all Debian source packages are tracked in git on Debian’s GitLab instance at <a class="link" href="https://salsa.debian.org" target="_blank" rel="noopener"
>salsa.debian.org</a>. Some key packages such as <a class="link" href="https://tracker.debian.org/pkg/coreutils" target="_blank" rel="noopener"
>Coreutils</a> and <a class="link" href="https://tracker.debian.org/pkg/bash" target="_blank" rel="noopener"
>Bash</a> are not using version control at all, as their maintainers apparently don’t see value in using git for Debian packaging, and the <a class="link" href="https://www.debian.org/doc/debian-policy/" target="_blank" rel="noopener"
>Debian Policy</a> does not require it. <strong>Thus, the only reliable and consistent way to audit changes in Debian packages is to compare the full versions from the archive as shown above.</strong></p>
<p>However, for packages that <em>are hosted on Salsa</em>, one can view the <strong>git history to gain additional insight</strong> into what exactly changed, when, and why. For packages that use version control, the repository location can be found in the <code>Vcs-Git</code> field in the <code>debian/control</code> file. For <a class="link" href="https://tracker.debian.org/pkg/xz-utils" target="_blank" rel="noopener"
>xz-utils</a> the location is <a class="link" href="https://salsa.debian.org/debian/xz-utils" target="_blank" rel="noopener"
>salsa.debian.org/debian/xz-utils</a>.</p>
<p>Note that the Debian policy does not state anything about <em>how</em> Salsa should be used, or what git repository layout or development practices to follow. In practice most packages follow the <a class="link" href="https://dep-team.pages.debian.net/deps/dep14/" target="_blank" rel="noopener"
>DEP-14 proposal</a>, and use <a class="link" href="https://gbp.sigxcpu.org/manual/" target="_blank" rel="noopener"
>git-buildpackage</a> as the tool for managing changes and pushing and pulling them between upstream and <a class="link" href="https://salsa.debian.org" target="_blank" rel="noopener"
>salsa.debian.org</a>.</p>
<p>To get the XZ Utils source, run:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-8"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-8" style="display:none;">$ gbp clone https://salsa.debian.org/debian/xz-utils.git
gbp:info: Cloning from 'https://salsa.debian.org/debian/xz-utils.git'</code><pre><code>$ gbp clone https://salsa.debian.org/debian/xz-utils.git
gbp:info: Cloning from 'https://salsa.debian.org/debian/xz-utils.git'</code></pre></div>
<p>At the time of writing this post the git history shows:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-9"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-9" style="display:none;">$ git log --graph --oneline
* bb787585 (HEAD -> debian/unstable, origin/debian/unstable, origin/HEAD) Prepare 5.8.1-2
* 4b769547 d: Remove the symlinks from -dev package.
* a39f3428 Correct the nocheck build profile
* 1b806b8d Import Debian changes 5.8.1-1.1
* b1cad34b Prepare 5.8.1-1
* a8646015 Import 5.8.1
* 2808ec2d Update upstream source from tag 'upstream/5.8.1'
|\
| * fa1e8796 (origin/upstream/v5.8, upstream/v5.8) New upstream version 5.8.1
| * a522a226 Bump version and soname for 5.8.1
| * 1c462c2a Add NEWS for 5.8.1
| * 513cabcf Tests: Call lzma_code() in smaller chunks in fuzz_common.h
| * 48440e24 Tests: Add a fuzzing target for the multithreaded .xz decoder
| * 0c80045a liblzma: mt dec: Fix lack of parallelization in single-shot decoding
| * 81880488 liblzma: mt dec: Don't modify thr->in_size in the worker thread
| * d5a2ffe4 liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115)
| * c0c83596 liblzma: mt dec: Simplify by removing the THR_STOP state
| * 831b55b9 liblzma: mt dec: Fix a comment
| * b9d168ee liblzma: Add assertions to lzma_bufcpy()
| * c8e0a489 DOS: Update Makefile to fix the build
| * 307c02ed sysdefs.h: Avoid <stdalign.h> even with C11 compilers
| * 7ce38b31 Update THANKS
| * 688e51bd Translations: Update the Croatian translation
* | a6b54dde Prepare 5.8.0-1.
* | 77d9470f Add 5.8 symbols.
* | 9268eb66 Import 5.8.0
* | 6f85ef4f Update upstream source from tag 'upstream/5.8.0'
|\ \
| * | afba662b New upstream version 5.8.0
| |/
| * 173fb5c6 doc/SHA256SUMS: Add 5.8.0
| * db9258e8 Bump version and soname for 5.8.0
| * bfb752a3 Add NEWS for 5.8.0
| * 6ccbb904 Translations: Run "make -C po update-po"
| * 891a5f05 Translations: Run po4a/update-po
| * 4f52e738 Translations: Partially fix overtranslation in Serbian man pages
| * ff5d9447 liblzma: Count the extra bytes in LZMA/LZMA2 decoder memory usage
| * 943b012d liblzma: Use SSE2 intrinsics instead of memcpy() in dict_repeat()</code><pre><code>$ git log --graph --oneline
* bb787585 (HEAD -> debian/unstable, origin/debian/unstable, origin/HEAD) Prepare 5.8.1-2
* 4b769547 d: Remove the symlinks from -dev package.
* a39f3428 Correct the nocheck build profile
* 1b806b8d Import Debian changes 5.8.1-1.1
* b1cad34b Prepare 5.8.1-1
* a8646015 Import 5.8.1
* 2808ec2d Update upstream source from tag 'upstream/5.8.1'
|\
| * fa1e8796 (origin/upstream/v5.8, upstream/v5.8) New upstream version 5.8.1
| * a522a226 Bump version and soname for 5.8.1
| * 1c462c2a Add NEWS for 5.8.1
| * 513cabcf Tests: Call lzma_code() in smaller chunks in fuzz_common.h
| * 48440e24 Tests: Add a fuzzing target for the multithreaded .xz decoder
| * 0c80045a liblzma: mt dec: Fix lack of parallelization in single-shot decoding
| * 81880488 liblzma: mt dec: Don't modify thr->in_size in the worker thread
| * d5a2ffe4 liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115)
| * c0c83596 liblzma: mt dec: Simplify by removing the THR_STOP state
| * 831b55b9 liblzma: mt dec: Fix a comment
| * b9d168ee liblzma: Add assertions to lzma_bufcpy()
| * c8e0a489 DOS: Update Makefile to fix the build
| * 307c02ed sysdefs.h: Avoid <stdalign.h> even with C11 compilers
| * 7ce38b31 Update THANKS
| * 688e51bd Translations: Update the Croatian translation
* | a6b54dde Prepare 5.8.0-1.
* | 77d9470f Add 5.8 symbols.
* | 9268eb66 Import 5.8.0
* | 6f85ef4f Update upstream source from tag 'upstream/5.8.0'
|\ \
| * | afba662b New upstream version 5.8.0
| |/
| * 173fb5c6 doc/SHA256SUMS: Add 5.8.0
| * db9258e8 Bump version and soname for 5.8.0
| * bfb752a3 Add NEWS for 5.8.0
| * 6ccbb904 Translations: Run "make -C po update-po"
| * 891a5f05 Translations: Run po4a/update-po
| * 4f52e738 Translations: Partially fix overtranslation in Serbian man pages
| * ff5d9447 liblzma: Count the extra bytes in LZMA/LZMA2 decoder memory usage
| * 943b012d liblzma: Use SSE2 intrinsics instead of memcpy() in dict_repeat()</code></pre></div>
<p>This shows both the changes on the <code>debian/unstable</code> branch as well as the intermediate upstream import branch, and the actual real upstream development branch. See my <a class="link" href="https://optimizedbyotto.com/post/debian-source-package-git/" >Debian source packages in git explainer</a> for details of what these branches are used for.</p>
<p>To view only the changes on the Debian branch, run <code>git log --graph --oneline --first-parent</code> or <code>git log --graph --oneline -- debian</code>.</p>
<p>The Debian branch should only have changes inside the <code>debian/</code> subdirectory, which is easy to check with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-10"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-10" style="display:none;">$ git diff --stat upstream/v5.8
debian/README.source | 16 +++
debian/autogen.sh | 32 +++++
debian/changelog | 949 ++++++++++++++++++++++++++
...
debian/upstream/signing-key.asc | 52 +++++++++
debian/watch | 4 +
debian/xz-utils.README.Debian | 47 ++++++++
debian/xz-utils.docs | 6 +
debian/xz-utils.install | 28 +++++
debian/xz-utils.postinst | 19 +++
debian/xz-utils.prerm | 10 ++
debian/xzdec.docs | 6 +
debian/xzdec.install | 4 +
33 files changed, 2014 insertions(+)</code><pre><code>$ git diff --stat upstream/v5.8
debian/README.source | 16 +++
debian/autogen.sh | 32 +++++
debian/changelog | 949 ++++++++++++++++++++++++++
...
debian/upstream/signing-key.asc | 52 +++++++++
debian/watch | 4 +
debian/xz-utils.README.Debian | 47 ++++++++
debian/xz-utils.docs | 6 +
debian/xz-utils.install | 28 +++++
debian/xz-utils.postinst | 19 +++
debian/xz-utils.prerm | 10 ++
debian/xzdec.docs | 6 +
debian/xzdec.install | 4 +
33 files changed, 2014 insertions(+)</code></pre></div>
<p>All the files outside the <code>debian/</code> directory originate from upstream, and for example running <code>git blame</code> on them should show only upstream commits:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-11"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-11" style="display:none;">$ git blame CMakeLists.txt
22af94128 (Lasse Collin 2024-02-12 17:09:10 +0200 1) # SPDX-License-Identifier: 0BSD
22af94128 (Lasse Collin 2024-02-12 17:09:10 +0200 2)
7e3493d40 (Lasse Collin 2020-02-24 23:38:16 +0200 3) ###############
7e3493d40 (Lasse Collin 2020-02-24 23:38:16 +0200 4) #
426bdc709 (Lasse Collin 2024-02-17 21:45:07 +0200 5) # CMake support for building XZ Utils</code><pre><code>$ git blame CMakeLists.txt
22af94128 (Lasse Collin 2024-02-12 17:09:10 +0200 1) # SPDX-License-Identifier: 0BSD
22af94128 (Lasse Collin 2024-02-12 17:09:10 +0200 2)
7e3493d40 (Lasse Collin 2020-02-24 23:38:16 +0200 3) ###############
7e3493d40 (Lasse Collin 2020-02-24 23:38:16 +0200 4) #
426bdc709 (Lasse Collin 2024-02-17 21:45:07 +0200 5) # CMake support for building XZ Utils</code></pre></div>
<p>If the upstream in question signs commits or tags, they can be verified with e.g.:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-12"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-12" style="display:none;">$ git verify-tag v5.6.2
gpg: Signature made Wed 29 May 2024 09:39:42 AM PDT
gpg: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpg: issuer "lasse.collin@tukaani.org"
gpg: Good signature from "Lasse Collin <lasse.collin@tukaani.org>" [expired]
gpg: Note: This key has expired!</code><pre><code>$ git verify-tag v5.6.2
gpg: Signature made Wed 29 May 2024 09:39:42 AM PDT
gpg: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpg: issuer "lasse.collin@tukaani.org"
gpg: Good signature from "Lasse Collin <lasse.collin@tukaani.org>" [expired]
gpg: Note: This key has expired!</code></pre></div>
<p>The main benefit of reviewing changes in git is the ability to see detailed information about each individual change, instead of just staring at a massive list of changes without any explanations. In this example, to view all the upstream commits since the previous import to Debian, one would view the commit range from <em>afba662b New upstream version 5.8.0</em> to <em>fa1e8796 New upstream version 5.8.1</em> with <code>git log --reverse -p afba662b...fa1e8796</code>. However, a far superior way to review changes would be to browse this range using a visual git history viewer, such as <a class="link" href="https://git-scm.com/book/en/v2/Appendix-A:-Git-in-Other-Environments-Graphical-Interfaces" target="_blank" rel="noopener"
>gitk</a>. Either way, looking at one code change at a time and reading the git commit message makes the review much easier.</p>
<p><img src="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-gitk.png"
width="1272"
height="796"
srcset="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-gitk_hu5329716550067711993.png 480w, https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-gitk_hu14496436468498130410.png 1024w, https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-gitk.png 1272w"
loading="lazy"
alt="Browsing git history in gitk --all"
class="gallery-image"
data-flex-grow="159"
data-flex-basis="383px"
>
</p>
<h2 id="comparing-debian-source-packages-to-git-contents"><a href="#comparing-debian-source-packages-to-git-contents" class="header-anchor"></a>Comparing Debian source packages to git contents
</h2><p>As stated at the beginning of the previous section, and worth repeating, <strong>there is no guarantee that the contents of the Debian packaging git repository match what was actually uploaded to Debian</strong>. While the <a class="link" href="https://manpages.debian.org/unstable/git-debpush/tag2upload.5.en.html" target="_blank" rel="noopener"
>tag2upload</a> project in Debian is getting more and more popular, Debian is still far from having any system to enforce that the git repository stays in sync with the Debian archive contents.</p>
<p>To detect such differences we can run <a class="link" href="https://manpages.debian.org/unstable/diffutils/diff.1.en.html" target="_blank" rel="noopener"
>diff</a> across the Debian source packages downloaded with debsnap earlier (path <code>source-xz-utils/xz-utils_5.8.1-2.debian</code>) and the git repository cloned in the previous section (path <code>xz-utils</code>):</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">diff</span>
<button
class="codeblock-copy"
data-id="codeblock-id-13"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-13" style="display:none;">$ diff -u source-xz-utils/xz-utils_5.8.1-2.debian/ xz-utils/debian/
diff -u source-xz-utils/xz-utils_5.8.1-2.debian/changelog xz-utils/debian/changelog
--- debsnap/source-xz-utils/xz-utils_5.8.1-2.debian/changelog 2025-10-03 09:32:16.000000000 -0700
+++ xz-utils/debian/changelog 2025-10-12 12:18:04.623054758 -0700
@@ -5,7 +5,7 @@
* Remove the symlinks from -dev, pointing to the lib package.
(Closes: #1109354)
- -- Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fri, 03 Oct 2025 18:32:16 +0200
+ -- Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fri, 03 Oct 2025 18:36:59 +0200</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-diff" data-lang="diff"><span style="display:flex;"><span>$ diff -u source-xz-utils/xz-utils_5.8.1-2.debian/ xz-utils/debian/
</span></span><span style="display:flex;"><span>diff -u source-xz-utils/xz-utils_5.8.1-2.debian/changelog xz-utils/debian/changelog
</span></span><span style="display:flex;"><span><span style="color:#f92672">--- debsnap/source-xz-utils/xz-utils_5.8.1-2.debian/changelog 2025-10-03 09:32:16.000000000 -0700
</span></span></span><span style="display:flex;"><span><span style="color:#f92672"></span><span style="color:#a6e22e">+++ xz-utils/debian/changelog 2025-10-12 12:18:04.623054758 -0700
</span></span></span><span style="display:flex;"><span><span style="color:#a6e22e"></span><span style="color:#75715e">@@ -5,7 +5,7 @@
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> * Remove the symlinks from -dev, pointing to the lib package.
</span></span><span style="display:flex;"><span> (Closes: #1109354)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">- -- Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fri, 03 Oct 2025 18:32:16 +0200
</span></span></span><span style="display:flex;"><span><span style="color:#f92672"></span><span style="color:#a6e22e">+ -- Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fri, 03 Oct 2025 18:36:59 +0200
</span></span></span></code></pre></div></div></div>
<p>In the case above <code>diff</code> revealed that the timestamp in the changelog of the version uploaded to Debian differs from what was committed to git. This is not malicious, just a mistake by the maintainer, who probably didn’t run <code>gbp tag</code> immediately after upload but instead ran some <code>dch</code> command and ended up with a different timestamp in git compared to what was actually uploaded to Debian.</p>
<h2 id="creating-synthetic-debian-packaging-git-repositories"><a href="#creating-synthetic-debian-packaging-git-repositories" class="header-anchor"></a>Creating synthetic Debian packaging git repositories
</h2><p>If no Debian packaging git repository exists, or if it is lagging behind what was uploaded to Debian’s archive, one can use <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-import-dscs.1.en.html" target="_blank" rel="noopener"
>git-buildpackage’s import-dscs feature</a> to create synthetic git commits based on the files downloaded by debsnap, ensuring the git contents fully match what was uploaded to the archive. To import a single version there is <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-import-dsc.1.en.html" target="_blank" rel="noopener"
>gbp import-dsc</a> (no ’s’ at the end), of which an example invocation would be:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-14"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-14" style="display:none;">$ gbp import-dsc --verbose ../source-xz-utils/xz-utils_5.8.1-2.dsc
Version '5.8.1-2' imported under '/home/otto/debian/xz-utils-2025-09-29'</code><pre><code>$ gbp import-dsc --verbose ../source-xz-utils/xz-utils_5.8.1-2.dsc
Version '5.8.1-2' imported under '/home/otto/debian/xz-utils-2025-09-29'</code></pre></div>
<p>Example commit history from a repository with commits added with <code>gbp import-dsc</code>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-15"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-15" style="display:none;">$ git log --graph --oneline
* 86aed07b (HEAD -> debian/unstable, tag: debian/5.8.1-2, origin/debian/unstable) Import Debian changes 5.8.1-2
* f111d93b (tag: debian/5.8.1-1.1) Import Debian changes 5.8.1-1.1
* 1106e19b (tag: debian/5.8.1-1) Import Debian changes 5.8.1-1
|\
| * 08edbe38 (tag: upstream/5.8.1, origin/upstream/v5.8, upstream/v5.8) Import Upstream version 5.8.1
| |\
| | * a522a226 (tag: v5.8.1) Bump version and soname for 5.8.1
| | * 1c462c2a Add NEWS for 5.8.1
| | * 513cabcf Tests: Call lzma_code() in smaller chunks in fuzz_common.h</code><pre><code>$ git log --graph --oneline
* 86aed07b (HEAD -> debian/unstable, tag: debian/5.8.1-2, origin/debian/unstable) Import Debian changes 5.8.1-2
* f111d93b (tag: debian/5.8.1-1.1) Import Debian changes 5.8.1-1.1
* 1106e19b (tag: debian/5.8.1-1) Import Debian changes 5.8.1-1
|\
| * 08edbe38 (tag: upstream/5.8.1, origin/upstream/v5.8, upstream/v5.8) Import Upstream version 5.8.1
| |\
| | * a522a226 (tag: v5.8.1) Bump version and soname for 5.8.1
| | * 1c462c2a Add NEWS for 5.8.1
| | * 513cabcf Tests: Call lzma_code() in smaller chunks in fuzz_common.h</code></pre></div>
<p>An online example repository with only a few missing uploads added using <code>gbp import-dsc</code> can be viewed at <a class="link" href="https://salsa.debian.org/otto/xz-utils-2025-09-29/-/network/debian%2Funstable" target="_blank" rel="noopener"
>salsa.debian.org/otto/xz-utils-2025-09-29/-/network/debian%2Funstable</a>.</p>
<p>An example repository that was <strong>fully crafted</strong> using <code>gbp import-dscs</code> can be viewed at <a class="link" href="https://salsa.debian.org/otto/xz-utils-gbp-import-dscs-debsnap-generated/-/network/debian%2Flatest" target="_blank" rel="noopener"
>salsa.debian.org/otto/xz-utils-gbp-import-dscs-debsnap-generated/-/network/debian%2Flatest</a>.</p>
<p>There is also <a class="link" href="https://manpages.debian.org/unstable/dgit/dgit.1.en.html" target="_blank" rel="noopener"
>dgit</a>, which in a similar way creates a synthetic git history to allow viewing the Debian archive contents via git tools. However, its focus is on producing new package versions, so fetching a package whose history was not previously recorded in dgit will only show the latest version:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-16"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-16" style="display:none;">$ dgit clone xz-utils
canonical suite name for unstable is sid
starting new git history
last upload to archive: NO git hash
downloading https://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1.orig.tar.xz...
downloading https://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1.orig.tar.xz.asc...
downloading https://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1-2.debian.tar.xz...
dpkg-source: info: extracting xz-utils in unpacked
dpkg-source: info: unpacking xz-utils_5.8.1.orig.tar.xz
dpkg-source: info: unpacking xz-utils_5.8.1-2.debian.tar.xz
synthesised git commit from .dsc 5.8.1-2
HEAD is now at f9bcaf7 xz-utils (5.8.1-2) unstable; urgency=medium
dgit ok: ready for work in xz-utils
$ dgit/sid ± git log --graph --oneline
* f9bcaf7 xz-utils (5.8.1-2) unstable; urgency=medium 9 days ago (HEAD -> dgit/sid, dgit/dgit/sid)
|\
| * 11d3a62 Import xz-utils_5.8.1-2.debian.tar.xz 9 days ago
* 15dcd95 Import xz-utils_5.8.1.orig.tar.xz 6 months ago</code><pre><code>$ dgit clone xz-utils
canonical suite name for unstable is sid
starting new git history
last upload to archive: NO git hash
downloading https://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1.orig.tar.xz...
downloading https://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1.orig.tar.xz.asc...
downloading https://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1-2.debian.tar.xz...
dpkg-source: info: extracting xz-utils in unpacked
dpkg-source: info: unpacking xz-utils_5.8.1.orig.tar.xz
dpkg-source: info: unpacking xz-utils_5.8.1-2.debian.tar.xz
synthesised git commit from .dsc 5.8.1-2
HEAD is now at f9bcaf7 xz-utils (5.8.1-2) unstable; urgency=medium
dgit ok: ready for work in xz-utils
$ dgit/sid ± git log --graph --oneline
* f9bcaf7 xz-utils (5.8.1-2) unstable; urgency=medium 9 days ago (HEAD -> dgit/sid, dgit/dgit/sid)
|\
| * 11d3a62 Import xz-utils_5.8.1-2.debian.tar.xz 9 days ago
* 15dcd95 Import xz-utils_5.8.1.orig.tar.xz 6 months ago</code></pre></div>
<p>Unlike git-buildpackage-managed repositories, dgit-managed repositories cannot incorporate the upstream git history and are thus less useful for auditing the full software supply-chain in git.</p>
<h2 id="comparing-upstream-source-packages-to-git-contents"><a href="#comparing-upstream-source-packages-to-git-contents" class="header-anchor"></a>Comparing upstream source packages to git contents
</h2><p>Equally important as the note at the beginning of the previous section, one must keep in mind that the <strong>upstream</strong> release source packages, often called <strong>release tarballs, are not guaranteed to have the exact same contents as the upstream git repository</strong>. Projects might strip out test data or extra development files from their release tarballs to avoid shipping unnecessary files to users, or they might add documentation files or versioning information into the tarball that isn’t stored in git. While a small minority, there are also upstreams that don’t use git at all, so the plain files in a <strong>release tarball are still the lowest common denominator</strong> for all open source software projects, and exporting and importing source code needs to interface with them.</p>
<p>In the case of XZ, the release tarball has additional version info and also a sizeable number of pregenerated build system files. Detecting and comparing differences between git contents and tarballs can of course be done manually by running diff across an unpacked tarball and a checked-out git repository. If using git-buildpackage, the difference between the git contents and the tarball contents can be made visible directly in the import commit.</p>
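<p>To illustrate the manual diff approach, here is a self-contained toy sketch (all file and directory names are made up for the demonstration): it builds a stand-in for a git checkout and a “release tarball” that ships one extra pregenerated file, then diffs the two:</p>

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d) && cd "$workdir"

# Stand-in for a checked-out git repository with one tracked source file
mkdir checkout
echo 'int main(void) { return 0; }' > checkout/main.c

# A "release tarball" that additionally ships a pregenerated configure script
mkdir project-1.0
cp checkout/main.c project-1.0/
echo '# generated at release time, never tracked in git' > project-1.0/configure
tar -cf project-1.0.tar project-1.0

# Unpack the tarball elsewhere and diff it against the checkout;
# every extra or differing file is a candidate for closer inspection
mkdir unpacked && tar -xf project-1.0.tar -C unpacked
diff -qr checkout unpacked/project-1.0 | tee tarball-vs-git.txt
```

<p>With a real package one would unpack the actual orig tarball and diff it against the matching upstream git tag, for example exported with <code>git archive</code>.</p>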
<p>In this XZ example, consider this git history:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-17"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-17" style="display:none;">* b1cad34b Prepare 5.8.1-1
* a8646015 Import 5.8.1
* 2808ec2d Update upstream source from tag 'upstream/5.8.1'
|\
| * fa1e8796 (debian/upstream/v5.8, upstream/v5.8) New upstream version 5.8.1
| * a522a226 (tag: v5.8.1) Bump version and soname for 5.8.1
| * 1c462c2a Add NEWS for 5.8.1</code><pre><code>* b1cad34b Prepare 5.8.1-1
* a8646015 Import 5.8.1
* 2808ec2d Update upstream source from tag 'upstream/5.8.1'
|\
| * fa1e8796 (debian/upstream/v5.8, upstream/v5.8) New upstream version 5.8.1
| * a522a226 (tag: v5.8.1) Bump version and soname for 5.8.1
| * 1c462c2a Add NEWS for 5.8.1</code></pre></div>
<p>The commit <em>a522a226</em> was the upstream release commit, which upstream also tagged <em>v5.8.1</em>. The merge commit <em>2808ec2d</em> applied the new upstream import branch contents on the Debian branch. Between these is the special commit <em>fa1e8796 New upstream version 5.8.1</em> tagged <em>upstream/v5.8</em>. <strong>This commit and tag exist only in the Debian packaging repository</strong>, and they show exactly what content was imported into Debian. This commit is <strong>generated automatically by git-buildpackage</strong> when running <code>gbp import-orig --uscan</code> for Debian packages with the <a class="link" href="https://salsa.debian.org/debian/dh-make/-/blob/master/lib/debian/gbp.conf.ex" target="_blank" rel="noopener"
>correct settings</a> in <code>debian/gbp.conf</code>. By viewing this commit one can see exactly how the upstream release tarball differs from the upstream git contents (if at all).</p>
<p>In the case of XZ, the difference is substantial, and shown below in full as it is very interesting:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-18"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-18" style="display:none;">$ git show --stat fa1e8796
commit fa1e8796dabd91a0f667b9e90f9841825225413a
(debian/upstream/v5.8, upstream/v5.8)
Author: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Thu Apr 3 22:58:39 2025 +0200
New upstream version 5.8.1
.codespellrc | 30 -
.gitattributes | 8 -
.github/workflows/ci.yml | 163 -
.github/workflows/freebsd.yml | 32 -
.github/workflows/netbsd.yml | 32 -
.github/workflows/openbsd.yml | 35 -
.github/workflows/solaris.yml | 32 -
.github/workflows/windows-ci.yml | 124 -
.gitignore | 113 -
ABOUT-NLS | 1 +
ChangeLog | 17392 +++++++++++++++++++++
Makefile.in | 1097 +++++++
aclocal.m4 | 1353 ++++++++
build-aux/ci_build.bash | 286 --
build-aux/compile | 351 ++
build-aux/config.guess | 1815 ++++++++++
build-aux/config.rpath | 751 +++++
build-aux/config.sub | 2354 +++++++++++++
build-aux/depcomp | 792 +++++
build-aux/install-sh | 541 +++
build-aux/ltmain.sh | 11524 ++++++++++++++++++++++
build-aux/missing | 236 ++
build-aux/test-driver | 160 +
config.h.in | 634 ++++
configure | 26434 ++++++++++++++++++++++
debug/Makefile.in | 756 +++++
doc/SHA256SUMS | 236 --
doc/man/txt/lzmainfo.txt | 36 +
doc/man/txt/xz.txt | 1708 ++++++++++
doc/man/txt/xzdec.txt | 76 +
doc/man/txt/xzdiff.txt | 39 +
doc/man/txt/xzgrep.txt | 70 +
doc/man/txt/xzless.txt | 36 +
doc/man/txt/xzmore.txt | 31 +
lib/Makefile.in | 623 ++++
m4/.gitignore | 40 -
m4/build-to-host.m4 | 274 ++
m4/gettext.m4 | 392 +++
m4/host-cpu-c-abi.m4 | 529 +++
m4/iconv.m4 | 324 ++
m4/intlmacosx.m4 | 71 +
m4/lib-ld.m4 | 170 +
m4/lib-link.m4 | 815 +++++
m4/lib-prefix.m4 | 334 ++
m4/libtool.m4 | 8488 +++++++++++++++++++++
m4/ltoptions.m4 | 467 +++
m4/ltsugar.m4 | 124 +
m4/ltversion.m4 | 24 +
m4/lt~obsolete.m4 | 99 +
m4/nls.m4 | 33 +
m4/po.m4 | 456 +++
m4/progtest.m4 | 92 +
po/.gitignore | 31 -
po/Makefile.in.in | 517 +++
po/Rules-quot | 66 +
po/boldquot.sed | 21 +
po/ca.gmo | Bin 0 -> 15587 bytes
po/cs.gmo | Bin 0 -> 7983 bytes
po/da.gmo | Bin 0 -> 9040 bytes
po/de.gmo | Bin 0 -> 29882 bytes
po/en@boldquot.header | 35 +
po/en@quot.header | 32 +
po/eo.gmo | Bin 0 -> 15060 bytes
po/es.gmo | Bin 0 -> 29228 bytes
po/fi.gmo | Bin 0 -> 28225 bytes
po/fr.gmo | Bin 0 -> 10232 bytes</code><pre><code>$ git show --stat fa1e8796
commit fa1e8796dabd91a0f667b9e90f9841825225413a
(debian/upstream/v5.8, upstream/v5.8)
Author: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Thu Apr 3 22:58:39 2025 +0200
New upstream version 5.8.1
.codespellrc | 30 -
.gitattributes | 8 -
.github/workflows/ci.yml | 163 -
.github/workflows/freebsd.yml | 32 -
.github/workflows/netbsd.yml | 32 -
.github/workflows/openbsd.yml | 35 -
.github/workflows/solaris.yml | 32 -
.github/workflows/windows-ci.yml | 124 -
.gitignore | 113 -
ABOUT-NLS | 1 +
ChangeLog | 17392 +++++++++++++++++++++
Makefile.in | 1097 +++++++
aclocal.m4 | 1353 ++++++++
build-aux/ci_build.bash | 286 --
build-aux/compile | 351 ++
build-aux/config.guess | 1815 ++++++++++
build-aux/config.rpath | 751 +++++
build-aux/config.sub | 2354 +++++++++++++
build-aux/depcomp | 792 +++++
build-aux/install-sh | 541 +++
build-aux/ltmain.sh | 11524 ++++++++++++++++++++++
build-aux/missing | 236 ++
build-aux/test-driver | 160 +
config.h.in | 634 ++++
configure | 26434 ++++++++++++++++++++++
debug/Makefile.in | 756 +++++
doc/SHA256SUMS | 236 --
doc/man/txt/lzmainfo.txt | 36 +
doc/man/txt/xz.txt | 1708 ++++++++++
doc/man/txt/xzdec.txt | 76 +
doc/man/txt/xzdiff.txt | 39 +
doc/man/txt/xzgrep.txt | 70 +
doc/man/txt/xzless.txt | 36 +
doc/man/txt/xzmore.txt | 31 +
lib/Makefile.in | 623 ++++
m4/.gitignore | 40 -
m4/build-to-host.m4 | 274 ++
m4/gettext.m4 | 392 +++
m4/host-cpu-c-abi.m4 | 529 +++
m4/iconv.m4 | 324 ++
m4/intlmacosx.m4 | 71 +
m4/lib-ld.m4 | 170 +
m4/lib-link.m4 | 815 +++++
m4/lib-prefix.m4 | 334 ++
m4/libtool.m4 | 8488 +++++++++++++++++++++
m4/ltoptions.m4 | 467 +++
m4/ltsugar.m4 | 124 +
m4/ltversion.m4 | 24 +
m4/lt~obsolete.m4 | 99 +
m4/nls.m4 | 33 +
m4/po.m4 | 456 +++
m4/progtest.m4 | 92 +
po/.gitignore | 31 -
po/Makefile.in.in | 517 +++
po/Rules-quot | 66 +
po/boldquot.sed | 21 +
po/ca.gmo | Bin 0 -> 15587 bytes
po/cs.gmo | Bin 0 -> 7983 bytes
po/da.gmo | Bin 0 -> 9040 bytes
po/de.gmo | Bin 0 -> 29882 bytes
po/en@boldquot.header | 35 +
po/en@quot.header | 32 +
po/eo.gmo | Bin 0 -> 15060 bytes
po/es.gmo | Bin 0 -> 29228 bytes
po/fi.gmo | Bin 0 -> 28225 bytes
po/fr.gmo | Bin 0 -> 10232 bytes</code></pre></div>
<p>To be able to easily inspect exactly what changed in the release tarball compared to git release tag contents, the best tool for the job is <a class="link" href="https://meldmerge.org/" target="_blank" rel="noopener"
>Meld</a>, invoked via <code>git difftool --dir-diff fa1e8796^..fa1e8796</code>.</p>
<p><img src="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-5.8.0-git-vs-tarball.gif"
width="1250"
height="776"
loading="lazy"
alt="Meld invoked by git difftool --dir-diff afba662b..fa1e8796 to show differences between git release tag and release tarball contents"
class="gallery-image"
data-flex-grow="161"
data-flex-basis="386px"
>
</p>
<p>To compare changes across the <strong>new and old upstream tarball</strong>, one would need to compare commits <em>afba662b New upstream version 5.8.0</em> and <em>fa1e8796 New upstream version 5.8.1</em> by running <code>git difftool --dir-diff afba662b..fa1e8796</code>.</p>
<p><img src="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-5.6.4-tar-vs-5.8.0-tar.gif"
width="1250"
height="776"
loading="lazy"
alt="Meld invoked by git difftool --dir-diff afba662b..fa1e8796 to show differences between two upstream release tarball contents"
class="gallery-image"
data-flex-grow="161"
data-flex-basis="386px"
>
</p>
<p>With all the above tips you can now go and try to audit your own favorite package in Debian and see if it is identical with upstream, and if not, how it differs.</p>
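<p>Putting the above together, an audit session for a git-buildpackage-maintained package could look roughly like the following sketch (the repository URL, branch and tag names vary per package, so check its <code>debian/gbp.conf</code> first):</p>

```
$ gbp clone https://salsa.debian.org/debian/xz-utils.git
$ cd xz-utils
$ git log --graph --oneline        # inspect the packaging and upstream history
$ git show --stat upstream/5.8.1   # how the imported tarball differs from git
$ git difftool --dir-diff upstream/5.8.1^..upstream/5.8.1   # inspect in Meld
```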
<h2 id="should-the-xz-backdoor-have-been-detected-using-these-tools"><a href="#should-the-xz-backdoor-have-been-detected-using-these-tools" class="header-anchor"></a>Should the XZ backdoor have been detected using these tools?
</h2><p>The famous XZ Utils backdoor (<a class="link" href="https://tukaani.org/xz-backdoor/" target="_blank" rel="noopener"
>CVE-2024-3094</a>) consisted of two parts: the actual backdoor inside two binary blobs masqueraded as test files (<code>tests/files/bad-3-corrupt_lzma2.xz</code>, <code>tests/files/good-large_compressed.lzma</code>), and a small modification in the build scripts (<code>m4/build-to-host.m4</code>) to extract the backdoor and plant it into the built binary. The build script was not tracked in version control, but generated with <a class="link" href="https://en.wikipedia.org/wiki/GNU_Autotools" target="_blank" rel="noopener"
>GNU Autotools</a> at release time and only shipped as additional files in the release tarball.</p>
<p>The entire reason for me to write this post was to ponder whether a diligent engineer using git-buildpackage best practices could have reasonably spotted this while importing the new upstream release into Debian. <strong>The short answer is “no”.</strong> The malicious actor here clearly anticipated all the typical ways anyone might inspect both git commits and release tarball contents, and masqueraded the changes very well over a long timespan.</p>
<p><strong>First of all, XZ has, for legitimate reasons,</strong> several carefully crafted <code>.xz</code> files as <strong>test data</strong> to help catch regressions in the decompression code path. The test files are shipped in the release so users can run the test suite and validate that the binary is built correctly and <a class="link" href="https://manpages.debian.org/unstable/xz-utils/xz.1.en.html" target="_blank" rel="noopener"
>xz</a> works properly. Debian famously runs massive amounts of testing in its <a class="link" href="https://ci.debian.net/" target="_blank" rel="noopener"
>CI and autopkgtest system</a> across tens of thousands of packages to uphold high quality despite frequent upgrades of the build toolchain and while supporting more CPU architectures than any other distro. Test data is useful and should stay.</p>
<p>When git-buildpackage is used correctly, the upstream commits are visible in the Debian packaging for easy review, but the <a class="link" href="https://salsa.debian.org/debian/xz-utils/-/commit/cf44e4b" target="_blank" rel="noopener"
>commit cf44e4b</a> that introduced the test files does not deviate enough from regular sloppy coding practices to really stand out. It is <a class="link" href="https://optimizedbyotto.com/post/git-commit-message-examples/" >unfortunately very common for git commits to lack a message body</a> explaining why the change was made, for commits not to be properly atomic (with test code and test data together in the same commit), and for commits to be pushed directly to mainline without code review (the commit was not part of any PR in this case). <em>Only another <strong>upstream</strong> developer</em> could have spotted that this change is not on par with what the project expects, and that only test data was added while the test code using it never was, and thus that this commit was not just a sloppy one but potentially malicious.</p>
<p><strong>Secondly, the fact that a new Autotools file appeared</strong> (<code>m4/build-to-host.m4</code>) in XZ Utils 5.6.0 is not suspicious. <strong>This is perfectly normal for Autotools.</strong> In fact, starting from version 5.8.1, XZ Utils ships an <code>m4/build-to-host.m4</code> file that it actually uses.</p>
<p>Spotting that there is anything fishy is practically impossible by simply reading the code, as Autotools files are full of custom <a class="link" href="https://en.wikipedia.org/wiki/M4_%28computer_language%29" target="_blank" rel="noopener"
>m4 syntax</a> interwoven with shell script, and there are plenty of backticks (<code>`</code>) that spawn subshells and <code>eval</code>s that execute variable contents further, which is <em>just normal for Autotools</em>. <a class="link" href="https://research.swtch.com/xz-script" target="_blank" rel="noopener"
>Russ Cox’s XZ post explains</a> how exactly the Autotools code fetched the actual backdoor from the test files and injected it into the build.</p>
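<p>A tiny contrived shell example (not the actual backdoor code) shows why such constructs are hard to review: the command that eventually runs never appears literally in the script, it is assembled in variables and executed via <code>eval</code> inside a backtick substitution:</p>

```shell
#!/bin/sh
# Contrived toy: the executed command is assembled at runtime, so grepping
# the script for the payload string finds nothing suspicious.
fragment_a='echo smug'
fragment_b='gled-payload'
cmd="$fragment_a$fragment_b"
result=`eval "$cmd"`   # a reviewer only sees an innocuous eval of "$cmd"
echo "executed: $result"
```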
<p><img src="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-meld.png"
width="1646"
height="881"
srcset="https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-meld_hu7357668744803736045.png 480w, https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-meld_hu14124088801267651905.png 1024w, https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/xz-utils-meld.png 1646w"
loading="lazy"
alt="Inspecting the m4/build-to-host.m4 changes in Meld launched via git difftool"
class="gallery-image"
data-flex-grow="186"
data-flex-basis="448px"
>
</p>
<p><strong>There is only one tiny thing that maybe a very experienced Autotools user could potentially have noticed:</strong> the <code>serial 30</code> in the version header is way too high. In theory one could also have noticed that this Autotools file deviates from what other packages in Debian ship with the same filename, such as the serial <a class="link" href="https://sources.debian.org/sha256/?checksum=331a4432631bec49fb8a1a65b08d5ec469573fe6bf8dc0d6dfed57f1e374f085&page=1" target="_blank" rel="noopener"
>3</a>, serial <a class="link" href="https://sources.debian.org/sha256/?checksum=8307c1f05ec9d5b8da3e3e5af369ed91b9209e517c632d8a95c7c2aa32650ec5&page=1" target="_blank" rel="noopener"
>5a</a> or <a class="link" href="https://sources.debian.org/sha256/?checksum=b2261ee50f116d42c796b79593480a4368327d912bc63ef6cba145229358abff&page=1" target="_blank" rel="noopener"
>5b</a> versions. That would however require an insane amount of extra checking work, and is not something we should plan to start doing. A much simpler solution would be to strongly recommend that all open source projects stop using Autotools, so that it is eventually phased out entirely.</p>
<h3 id="not-detectable-with-reasonable-effort"><a href="#not-detectable-with-reasonable-effort" class="header-anchor"></a>Not detectable with reasonable effort
</h3><p>While planting backdoors is evil, it is hard not to feel some <em>respect for the level of skill and dedication of the people behind this</em>. I’ve been involved in a bunch of security breach investigations during my IT career, and never have I seen anything this well executed.</p>
<p>If it hadn’t slowed down SSH by ~500 milliseconds and been discovered due to that, it would most likely have stayed undetected for months or years. Hiding backdoors in closed source software is relatively trivial, but hiding backdoors in plain sight in a popular open source project requires an unusual amount of expertise and creativity, as shown above.</p>
<h2 id="is-the-software-supply-chain-in-debian-easy-to-audit"><a href="#is-the-software-supply-chain-in-debian-easy-to-audit" class="header-anchor"></a>Is the software supply-chain in Debian easy to audit?
</h2><p>While maintaining a Debian package source using git-buildpackage can make the package history a lot easier to inspect, most packages have incomplete configurations in their <code>debian/gbp.conf</code>, and thus their package development histories are not always correctly constructed, uniform, or easy to compare. The Debian Policy does not mandate git usage, and many important packages do not use git at all. Additionally, the Debian Policy allows non-maintainers to upload new versions to Debian without committing anything in git, even for packages where the original maintainer wanted to use git. Uploads that “bypass git” unfortunately happen surprisingly often.</p>
<p>Because of the situation, I am afraid that we could have multiple similar backdoors lurking that simply haven’t been detected yet. More audits, that hopefully also get published openly, would be welcome! More people auditing the contents of the Debian archives would probably also help surface what tools and policies Debian might be missing to make the work easier, and thus help improve the security of Debian’s users, and improve trust in Debian.</p>
<h2 id="is-debian-currently-missing-some-software-that-could-help-detect-similar-things"><a href="#is-debian-currently-missing-some-software-that-could-help-detect-similar-things" class="header-anchor"></a>Is Debian currently missing some software that could help detect similar things?
</h2><p>To my knowledge there is currently no system in place as part of Debian’s QA or security infrastructure to verify that the upstream source packages in Debian actually come from upstream. I’ve come across a lot of packages where the <code>debian/watch</code> or other configs are incorrect, and even cases where maintainers have manually created upstream tarballs because it was easier than getting the automation to work. It is obvious that for those packages the source tarball now in Debian is not at all the same as upstream. I am not aware of any malicious cases though (if I were, I would of course report them).</p>
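<p>The core of such a verification would be simple: the orig tarball in the Debian archive should be byte-for-byte identical to the upstream release tarball, which a checksum comparison reveals. Below is a self-contained toy sketch of the check itself (in a real audit the two tarballs would be downloaded from the upstream release page and from the Debian archive, respectively):</p>

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d) && cd "$workdir"

# Stand-ins for the upstream release tarball and the Debian orig tarball
mkdir src && echo 'source file' > src/file.c
tar -cf upstream-1.0.tar src
cp upstream-1.0.tar package_1.0.orig.tar    # identical in the good case

up=$(sha256sum upstream-1.0.tar | cut -d' ' -f1)
deb=$(sha256sum package_1.0.orig.tar | cut -d' ' -f1)
if [ "$up" = "$deb" ]; then
    echo "OK: orig tarball is identical to the upstream release"
else
    echo "MISMATCH: investigate how the Debian tarball was produced"
fi
```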
<p>I am also aware of packages in the Debian repository that are misconfigured as <code>1.0 (native)</code> packages, mixing the upstream files and <code>debian/</code> contents and having patches applied, while they should actually be configured as <code>3.0 (quilt)</code> so as not to hide what the true upstream sources are. Debian should extend the QA tools to scan for such things. If I find a sponsor, I might build this myself as my next major contribution to Debian.</p>
<p>In addition to better tooling for finding mismatches in the source code, Debian could also have better tooling for tracking in built binaries what their source files were, but solutions like <a class="link" href="https://github.com/Fraunhofer-AISEC/supply-graph" target="_blank" rel="noopener"
>Fraunhofer-AISEC’s supply-graph</a> or <a class="link" href="https://github.com/sony/esstra" target="_blank" rel="noopener"
>Sony’s ESSTRA</a> are not practical yet. <a class="link" href="https://luj.fr/blog/how-nixos-could-have-detected-xz.html" target="_blank" rel="noopener"
>Julien Malka’s post</a> about NixOS discusses the role of reproducible builds, which may help in some cases across all distros.</p>
<h2 id="or-is-debian-missing-some-policies-or-practices-to-mitigate-this"><a href="#or-is-debian-missing-some-policies-or-practices-to-mitigate-this" class="header-anchor"></a>Or, is Debian missing some policies or practices to mitigate this?
</h2><p>Perhaps more important than additional security scanning, the Debian Developer community should shift its general mindset from <em>“anyone is free to do anything”</em> to valuing <strong>more shared workflows</strong>. The ability to audit anything is severely hampered by the fact that there are so many ways to do the same thing, and distinguishing a “normal” deviation from a malicious one is too hard when “normal” can be almost anything.</p>
<p>Also, as there is no documented and recommended “default” workflow, both newcomers and veterans of Debian packaging might never learn any one optimal workflow, and may end up doing many steps in the packaging process in a way that kind of works but is actually wrong or unnecessary. This causes process deviations that look malicious, but turn out to be just the result of not fully understanding what the right way to do something would have been.</p>
<p>In the long run, once individual developers’ workflows are more aligned, doing code reviews will become a lot easier and smoother as the excess noise of workflow differences diminishes and reviews will feel much more productive to all participants. Debian fostering a culture of code reviews would allow us to slowly move from the current practice of mainly solo packaging work towards true collaboration forming around those code reviews.</p>
<p>I have been promoting increased use of Merge Requests in Debian already for some time, for example by proposing <a class="link" href="https://dep-team.pages.debian.net/deps/dep18/" target="_blank" rel="noopener"
>DEP-18: Encourage Continuous Integration and Merge Request based Collaboration for Debian packages</a>. If you are involved in Debian development, please give a thumbs up in <a class="link" href="https://salsa.debian.org/dep-team/deps/-/merge_requests/21" target="_blank" rel="noopener"
>dep-team/deps!21</a> if you want me to continue promoting it.</p>
<h2 id="can-we-trust-open-source-software"><a href="#can-we-trust-open-source-software" class="header-anchor"></a>Can we trust open source software?
</h2><p><strong>Yes — and I would argue that we can <em>only</em> trust open source software.</strong> There is no way to audit closed source software, and anyone using e.g. Windows or macOS just has to trust the vendor’s word that there are no intentional or accidental backdoors in the software. Or, when the news gets out that the systems of a closed source vendor were compromised, <a class="link" href="https://cyberpress.org/crowdstrike-npm-packages-compromised/" target="_blank" rel="noopener"
>like CrowdStrike some weeks ago</a>, we can’t audit anything, and time after time we simply need to take their word for it when they say they have properly cleaned up their code base.</p>
<p>In theory, a vendor could give some kind of contractual or financial guarantee to its customers that there are no preventable security issues, but in practice that never happens. I am not aware of a single case where e.g. Microsoft or Oracle paid damages to their customers after a security flaw was found in their software. In theory you could also pay a vendor more to have them focus more effort on security, but since there is no way to verify what they did, or to get compensation when they didn’t, any increased fees are likely just pocketed as increased profit.</p>
<p>Open source is clearly better overall. If you are an individual with the time and skills, you can audit every step in the supply-chain, or, as an organization, you could invest in open source security improvements and actually verify what changes were made and how security improved.</p>
<p>If your organisation is using Debian (or derivatives, such as Ubuntu) and you are interested in sponsoring my work to improve Debian, please reach out.</p> Zero-configuration TLS and password management best practices in MariaDB 11.8 https://optimizedbyotto.com/post/zero-configuration-tls-mariadb-11.8/Sun, 14 Sep 2025 00:00:00 +0000 https://optimizedbyotto.com/post/zero-configuration-tls-mariadb-11.8/ <img src="https://optimizedbyotto.com/post/zero-configuration-tls-mariadb-11.8/featured-image.jpg" alt="Featured image of post Zero-configuration TLS and password management best practices in MariaDB 11.8" /><p>Locking down database access is probably the single most important thing for a system administrator or software developer to prevent their application from leaking its data. As MariaDB 11.8 is the first long-term supported version with a few new key security features, let’s recap the most important things every DBA should know about MariaDB in 2025.</p>
<p>Back in the old days, MySQL administrators had a habit of running the clumsy <code>mysql_secure_installation</code> script, but it has long been obsolete. A modern MariaDB database server is already secure by default and locked down out of the box, and no such extra scripts are needed. On the contrary, the database administrator is expected to open up access to MariaDB according to the specific needs of each server. Therefore, it is important that the DBA can <em>understand</em> and <em>correctly configure</em> three things:</p>
<ol>
<li>Separate application-specific users with granular permissions allowing only necessary access and no more.</li>
<li>Distributing and storing passwords and credentials securely.</li>
<li>Ensuring all remote connections are properly encrypted.</li>
</ol>
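<p>As a sketch of the first point, an application-specific user could be created roughly as follows (user name, host pattern and database name are examples to adapt):</p>

```
$ mariadb
MariaDB [(none)]> CREATE USER 'appuser'@'10.0.0.%' IDENTIFIED BY 'use-a-long-generated-password';
MariaDB [(none)]> GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'appuser'@'10.0.0.%';
MariaDB [(none)]> SHOW GRANTS FOR 'appuser'@'10.0.0.%';
```

<p>Note the absence of <code>ALL PRIVILEGES</code> and of administrative grants such as <code>SUPER</code>: the application can read and write its own schema and nothing more.</p>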
<p>For holistic security, one should also consider proper auditing, logging, backups, regular security updates and more, but in this post we will focus only on the above aspects related to securing database access.</p>
<h2 id="how-encrypting-database-connections-with-tls-differs-from-web-server-https"><a href="#how-encrypting-database-connections-with-tls-differs-from-web-server-https" class="header-anchor"></a>How encrypting database connections with TLS differs from web server HTTP(S)
</h2><p>Even though MariaDB (and other databases) use the same SSL/TLS protocol for encrypting remote connections as web servers and HTTPS, the way it is implemented is <strong>significantly different</strong>, and the different security assumptions are important for a database administrator to grasp.</p>
<p>Firstly, most HTTP requests to a web server are unauthenticated, meaning the web server serves public web pages and does not require users to log in. Traditionally, when a user logged in over an HTTP connection, the username and password were transmitted in plaintext as an HTTP POST request. Modern TLS, which was previously called SSL, does not change how HTTP works but simply encapsulates it. When using HTTPS, a web browser and a web server will start an encrypted TLS connection as the very first thing, and only once it is established do they send HTTP requests and responses inside it. There are no passwords or other shared secrets needed to form the TLS connection. Instead, the web server relies on a trusted third party, a Certificate Authority (CA), to vet that the TLS certificate offered by the web server can be trusted by the web browser.</p>
<p>For a database server like MariaDB, the situation is quite different. All users need to <strong>first authenticate</strong> and log in to the server before getting being allowed to run any SQL and getting any data out of the server. The database server and client programs have built-in authentication methods, and passwords are not, and have never been, sent in plaintext. Over the years, MySQL and its successor, MariaDB, have had multiple password authentication methods: the original SHA-1-based hashing, later double SHA-1-based <em>mysql_native_password</em>, followed by <em>sha256_password</em> and <em>caching_sha256_password</em> in MySQL and <em>ed25519</em> in MariaDB. The <a class="link" href="https://mariadb.org/history-of-mysql-mariadb-authentication-protocols/" target="_blank" rel="noopener"
>MariaDB.org blog post by Sergei Golubchik</a> recaps the history of these well.</p>
<p>Even though most modern MariaDB installations should be using TLS to encrypt all remote connections in 2025, having the authentication method be as secure as possible still matters, because <strong>authentication is done before the TLS connection is fully established</strong>.</p>
<p>To further harden the authentication against man-in-the-middle attacks, a new password authentication method <a class="link" href="https://mariadb.com/docs/server/reference/plugins/authentication-plugins/authentication-plugin-parsec" target="_blank" rel="noopener"
>PARSEC was introduced in MariaDB 11.8</a>, which builds upon the previous ed25519 public-key-based verification (similar to how modern SSH works), and adds key derivation using PBKDF2 with hash functions (SHA-512, SHA-256) and a high iteration count.</p>
<p>At first it may seem like a disadvantage not to wrap all connections in a TLS tunnel the way HTTPS does. However, having authentication that is MitM-resistant regardless of the connection encryption status enables a clever extra capability that is now available in MariaDB: as the <strong>database server and client already have a shared secret</strong> that the server uses to authenticate the user, the client can also <strong>use it to validate the server’s TLS certificate</strong>, so no third parties like CAs or root certificates are needed. MariaDB 11.8 was the first LTS version to ship with this capability for <a class="link" href="https://mariadb.org/mission-impossible-zero-configuration-ssl/" target="_blank" rel="noopener"
>zero-configuration TLS</a>.</p>
<p>Note that zero-configuration TLS also works with older password authentication methods and does not require users to have PARSEC enabled. As PARSEC is not yet the default authentication method in MariaDB, it is recommended to enable it in installations that use zero-configuration TLS encryption to maximize the security of the TLS certificate validation.</p>
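<p>To see which authentication method each account in an existing installation currently uses, you can for example query the <code>mysql.user</code> view (a quick sketch; the exact columns may vary slightly between MariaDB versions):</p>
<pre><code>SELECT User, Host, plugin FROM mysql.user;</code></pre>
<p>Accounts still listed with <code>mysql_native_password</code> are natural candidates for upgrading to <code>ed25519</code> or <code>parsec</code>.</p>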
<h2 id="why-the-root-user-in-mariadb-has-no-password-and-how-it-makes-the-database-more-secure"><a href="#why-the-root-user-in-mariadb-has-no-password-and-how-it-makes-the-database-more-secure" class="header-anchor"></a>Why the ‘root’ user in MariaDB has no password and how it makes the database more secure
</h2><p>Relying on passwords for security is problematic, as there is always a risk that they could leak, and <strong>a malicious user could access the system using the leaked password</strong>. It is unfortunately far too common for database passwords to be stored in plaintext in configuration files that are accidentally committed into version control and published on GitHub and similar platforms. Every application or administrative password that exists should be tracked to ensure only the people who need it know it, and rotated at regular intervals so that former employees and the like can’t use old passwords. This password management is complex and error-prone.</p>
<p>Replacing passwords with other authentication methods is always advisable when possible. On a database server, whoever installed the database by running e.g. <code>apt install mariadb-server</code> and configured it with e.g. <code>nano /etc/mysql/mariadb.cnf</code> already has full root access to the operating system, and asking them for a password to access the MariaDB database shell is moot, since they could circumvent any checks by directly accessing the files on the system anyway. Therefore, MariaDB <a class="link" href="https://mariadb.org/authentication-in-mariadb-10-4/" target="_blank" rel="noopener"
>since version 10.4</a> stopped requiring the root user to enter a password when connecting locally, and instead uses socket authentication to check whether the user is the operating-system root user or equivalent (e.g. running <code>sudo</code>). This is an elegant way to get rid of a password that was unnecessary to begin with. As there is no root password anymore, the <strong>risk of an external user accessing the database as root with a leaked password is fully eliminated</strong>.</p>
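<p>You can confirm that <code>root</code> indeed uses socket authentication with the query below. If you want an additional local administrative account that uses the same mechanism, the following statements sketch one (the <code>admin</code> user name here is just an illustration):</p>
<pre><code>SELECT User, Host, plugin FROM mysql.user WHERE User = 'root';
CREATE USER 'admin'@'localhost' IDENTIFIED VIA unix_socket;
GRANT ALL PRIVILEGES ON *.* TO 'admin'@'localhost' WITH GRANT OPTION;</code></pre>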
<p>Note that socket authentication only works for local connections on the same server. If you want to access a MariaDB server remotely as the <code>root</code> user, you would need to configure a password for it first. This is not generally recommended, as explained in the next section.</p>
<h2 id="create-separate-database-users-for-normal-use-and-keep-root-for-administrative-use-only"><a href="#create-separate-database-users-for-normal-use-and-keep-root-for-administrative-use-only" class="header-anchor"></a>Create separate database users for normal use and keep ‘root’ for administrative use only
</h2><p>Out of the box a MariaDB installation is already secure by default, and only the local <code>root</code> user can connect to it. This account is intended for administrative use only, and for regular daily use you should create separate database users with access limited to the databases they need and the permissions required.</p>
<p>The most typical commands needed to create a new database for an app and a user the app can use to connect to the database would be the following:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">sql</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">CREATE DATABASE app_db;
CREATE USER 'app_user'@'%' IDENTIFIED BY 'your_secure_password';
GRANT ALL PRIVILEGES ON app_db.* TO 'app_user'@'%';
FLUSH PRIVILEGES;</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">CREATE</span> <span style="color:#66d9ef">DATABASE</span> app_db;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">CREATE</span> <span style="color:#66d9ef">USER</span> <span style="color:#e6db74">'app_user'</span><span style="color:#f92672">@</span><span style="color:#e6db74">'%'</span> IDENTIFIED <span style="color:#66d9ef">BY</span> <span style="color:#e6db74">'your_secure_password'</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">GRANT</span> <span style="color:#66d9ef">ALL</span> <span style="color:#66d9ef">PRIVILEGES</span> <span style="color:#66d9ef">ON</span> app_db.<span style="color:#f92672">*</span> <span style="color:#66d9ef">TO</span> <span style="color:#e6db74">'app_user'</span><span style="color:#f92672">@</span><span style="color:#e6db74">'%'</span>;
</span></span><span style="display:flex;"><span>FLUSH <span style="color:#66d9ef">PRIVILEGES</span>;</span></span></code></pre></div></div></div>
<p>Alternatively, if you want to use the parsec authentication method, run this to create the user:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">sql</span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">CREATE OR REPLACE USER 'app_user'@'%'
IDENTIFIED VIA parsec
USING PASSWORD('your_secure_password');</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">CREATE</span> <span style="color:#66d9ef">OR</span> <span style="color:#66d9ef">REPLACE</span> <span style="color:#66d9ef">USER</span> <span style="color:#e6db74">'app_user'</span><span style="color:#f92672">@</span><span style="color:#e6db74">'%'</span>
</span></span><span style="display:flex;"><span> IDENTIFIED VIA parsec
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">USING</span> PASSWORD(<span style="color:#e6db74">'your_secure_password'</span>);</span></span></code></pre></div></div></div>
<p>Note that the <code>auth_parsec</code> plugin is not enabled by default. If you see the error message <code>ERROR 1524 (HY000): Plugin 'parsec' is not loaded</code>, fix this by running <code>INSTALL SONAME 'auth_parsec';</code>.</p>
<p>In the <code>CREATE USER</code> statements, the <code>@'%'</code> means that the user is allowed to connect from any host. This needs to be defined, as MariaDB always checks permissions based on both the username and the remote IP address or hostname of the user, combined with the authentication method. Note that it is possible to have multiple <code>user@remote</code> combinations, and they can have different authentication methods. A user could, for example, be allowed to log in locally using the socket authentication and over the network using a password.</p>
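<p>As a sketch, the following statements would let the same hypothetical <code>app_user</code> log in locally using socket authentication while requiring a password from an internal subnet (the host patterns here are illustrative):</p>
<pre><code>CREATE USER 'app_user'@'localhost' IDENTIFIED VIA unix_socket;
CREATE USER 'app_user'@'10.0.0.%' IDENTIFIED BY 'your_secure_password';</code></pre>
<p>MariaDB matches the connecting user against the most specific <code>user@host</code> entry, so each entry can use its own authentication method.</p>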
<p>If you are running a custom application and you know exactly what permissions are sufficient for the database users, replace the <code>ALL PRIVILEGES</code> with a subset of <a class="link" href="https://mariadb.com/docs/server/reference/sql-statements/account-management-sql-statements/grant#privilege-levels" target="_blank" rel="noopener"
>privileges listed in the MariaDB documentation</a>.</p>
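<p>For example, a typical web application that only reads and writes rows could get by with the data-manipulation privileges instead of <code>ALL PRIVILEGES</code> (the exact subset depends on what your app actually does):</p>
<pre><code>GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'app_user'@'%';
SHOW GRANTS FOR 'app_user'@'%';</code></pre>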
<p>Permissions set with <code>GRANT</code> take effect immediately for new connections. Running <code>FLUSH PRIVILEGES</code> (or restarting the database) is only strictly needed if the privilege tables were modified directly, e.g. with <code>UPDATE</code> statements.</p>
<h2 id="allow-mariadb-to-accept-remote-connections-and-enforce-tls"><a href="#allow-mariadb-to-accept-remote-connections-and-enforce-tls" class="header-anchor"></a>Allow MariaDB to accept remote connections and enforce TLS
</h2><p>Using the above <code>'app_user'@'%'</code> is not enough on its own to allow remote connections to MariaDB. The MariaDB server also <strong>needs to be configured to listen on a network interface</strong> to accept remote connections. As MariaDB is secure by default, it only accepts connections from <code>localhost</code> until the administrator updates its configuration. On a typical Debian/Ubuntu system, the recommended way is to drop a new custom config in e.g. <code>/etc/mysql/mariadb.conf.d/99-server-customizations.cnf</code>, with the contents:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">[mariadbd]
# Listen for connections from anywhere
bind-address = 0.0.0.0
# Only allow TLS encrypted connections
require-secure-transport = on</code><pre><code>[mariadbd]
# Listen for connections from anywhere
bind-address = 0.0.0.0
# Only allow TLS encrypted connections
require-secure-transport = on</code></pre></div>
<p>For settings to take effect, restart the server with <code>systemctl restart mariadb</code>. After this, the server will accept connections on any network interface. If the system is using a firewall, the port 3306 would additionally need to be allow-listed.</p>
<p>To confirm that the settings took effect, run e.g. <code>mariadb -e "SHOW VARIABLES LIKE 'bind_address';"</code> , which should now show <code>0.0.0.0</code>.</p>
<p>When allowing remote connections, it is important to also always define <code>require-secure-transport = on</code> to enforce that only TLS-encrypted connections are allowed. If the server is running MariaDB 11.8 and the clients are also MariaDB 11.8 or newer, <strong>no additional configuration is needed</strong> thanks to MariaDB automatically providing TLS certificates and appropriate certificate validation in recent versions.</p>
<p>On older long-term-support versions of the MariaDB server, one had to manually create the certificates, configure the <code>ssl_key</code>, <code>ssl_cert</code> and <code>ssl_ca</code> values on the server, and distribute the certificate to the clients as well, which was cumbersome, so it is good that this is no longer required. In MariaDB 11.8 the only additional related setting that might still be worth defining is <code>tls_version = TLSv1.3</code> to ensure only the latest TLS protocol version is used.</p>
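<p>After restarting the server, a quick sanity check from any connected client can confirm that the intended TLS settings are in effect (these variable and status names are as found in recent MariaDB versions):</p>
<pre><code>SHOW GLOBAL VARIABLES LIKE 'tls_version';
SHOW SESSION STATUS LIKE 'Ssl_version';
SHOW SESSION STATUS LIKE 'Ssl_cipher';</code></pre>
<p>An encrypted session should report <code>TLSv1.3</code> and a non-empty cipher.</p>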
<p>Finally, test connections to ensure they work and to confirm that TLS is used by running e.g.:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">mariadb --user=app_user --password=your_secure_password \
--host=192.168.1.66 -e '\s'</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mariadb --user<span style="color:#f92672">=</span>app_user --password<span style="color:#f92672">=</span>your_secure_password <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --host<span style="color:#f92672">=</span>192.168.1.66 -e <span style="color:#e6db74">'\s'</span></span></span></code></pre></div></div></div>
<p>The result should show something along:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">--------------
mariadb from 11.8.3-MariaDB, client 15.2 for debian-linux-gnu (x86_64)
...
Current user: app_user@192.168.1.66
SSL: Cipher in use is TLS_AES_256_GCM_SHA384, cert is OK
...</code><pre><code>--------------
mariadb from 11.8.3-MariaDB, client 15.2 for debian-linux-gnu (x86_64)
...
Current user: app_user@192.168.1.66
SSL: Cipher in use is TLS_AES_256_GCM_SHA384, cert is OK
...</code></pre></div>
<p>If running a Debian/Ubuntu system, see the bundled README with <code>zcat /usr/share/doc/mariadb-server/README.Debian.gz</code> to read more configuration tips.</p>
<h2 id="should-tls-encryption-be-used-also-on-internal-networks"><a href="#should-tls-encryption-be-used-also-on-internal-networks" class="header-anchor"></a>Should TLS encryption be used also on internal networks?
</h2><p>If a database server and app are running on the same private network, the chances that the connection gets eavesdropped on or man-in-the-middle attacked by a malicious user are low. The risk is, however, not zero, and if an attack happens, it can be difficult to detect, or to prove afterwards that it didn’t happen. The benefit of using end-to-end encryption is that both the database server and the client can validate the certificates and keys used, log it, and later have the logs audited to prove that connections were indeed encrypted and show how they were encrypted.</p>
<p>If all the computers on an internal network already have centralized user account management and centralized log collection that includes all database sessions, reusing existing SSH connections, SOCKS proxies, dedicated HTTPS tunnels, point-to-point VPNs, or similar solutions might also be a practical option. Note that the zero-configuration TLS only works with password validation methods. This means that systems configured to use PAM or Kerberos/GSSAPI can’t use it, but again those systems are typically part of a centrally configured network anyway and are likely to have certificate authorities and key distribution or network encryption facilities already set up.</p>
<p>In a typical software app stack, however, the simplest solution is often the best, and I recommend DBAs use the end-to-end TLS encryption in MariaDB 11.8 in most cases.</p>
<p>Hopefully with these tips you can enjoy having your MariaDB deployments both simpler and more secure than before!</p> Managing procrastination and distractions https://optimizedbyotto.com/post/procrastination-and-distractions/Sun, 31 Aug 2025 00:00:00 +0000 https://optimizedbyotto.com/post/procrastination-and-distractions/ <img src="https://optimizedbyotto.com/post/procrastination-and-distractions/featured-image.jpg" alt="Featured image of post Managing procrastination and distractions" /><p>I’ve noticed that procrastination and inability to be consistently productive at work have become quite common in recent years. This is clearly visible in younger people who have grown up with an endless stream of entertainment literally at their fingertips, on their mobile phone. It is, however, a trap one can escape from with a little bit of help.</p>
<p><a class="link" href="https://en.wikipedia.org/wiki/Procrastination" target="_blank" rel="noopener"
>Procrastination</a> is natural — they say humans are lazy by nature after all. Probably all of us have had moments when we choose to postpone a task we know we should be working on, and instead spend our time doing secondary tasks (valorisation). A classic example is cleaning your apartment when you should be preparing for an exam. Some may procrastinate by not doing any work at all, and just watching YouTube videos or the like. To some people, typically those who are in their 20s and early in their career, procrastination can be a big challenge and finding the discipline to stick to planned work may need intentional extra effort, and perhaps even external help.</p>
<p>During my 20+ year career in software development I’ve been blessed to work with engineers of various backgrounds and each with their unique set of strengths. I have also helped many grow in various areas and overcome challenges, such as lack of intrinsic motivation and managing procrastination, and some might be able to get it in check with some simple advice.</p>
<h2 id="distance-yourself-from-the-digital-distractions"><a href="#distance-yourself-from-the-digital-distractions" class="header-anchor"></a>Distance yourself from the digital distractions
</h2><p>The key to avoiding distractions and procrastination is to <strong>make it inconvenient enough that you rarely do it</strong>. If continuing to do work is easier than switching to procrastination, work is more likely to continue.</p>
<p>Tips to minimize digital distractions, listed in order of importance:</p>
<ol>
<li><strong>Put your phone away.</strong> Just like when you go to a movie and turn off your phone for two hours, you can put the phone away completely when starting to work. Put the phone in a different room to ensure there is enough physical distance between you and the distraction, so it is impossible for you to just take a “quick peek”.</li>
<li><strong>Turn off notifications from apps.</strong> Don’t let the apps call you like sirens luring <a class="link" href="https://en.wikipedia.org/wiki/Odysseus" target="_blank" rel="noopener"
>Odysseus</a>. You don’t need to have all the notifications. You will see what the apps have when you eventually open them at a time you choose to use them.</li>
<li><strong>Remove or disable social media apps,</strong> games and the like from your phone and your computer. You can install them back when you have vacation. You can probably live without them for some time. If you can’t remove them, explore your phone’s screen time restriction features to limit your own access to apps that most often waste your time. These features are sometimes listed in the phone settings under “digital health”.</li>
<li><strong>Have a separate work computer and work phone.</strong> Having dedicated ones just for work that are void of all unnecessary temptations helps keep distance from the devices that could derail your focus.</li>
<li><strong>Listen to music.</strong> If you feel your brain needs a dose of dopamine to get you going, listening to music helps satisfy your brain’s cravings while still being able to simultaneously keep working.</li>
</ol>
<p>Doing a full digital detox is probably not practical, or at least not sustainable for an extended time. One needs apps to stay in touch with friends and family, and staying current in software development probably requires spending some time reading news online and such. However, the tips above can help contain the distractions and minimize the spontaneous attention they get.</p>
<p>Some of the distractions may ironically come from work itself, for example Slack or new-email notifications. <strong>I recommend turning them off for a couple of hours every day to have some distraction-free time.</strong> It should be enough to check work email a couple of times a day. Checking it every hour probably does not add much overall value for the company unless you work in sales or support, where responding to messages is the main task itself.</p>
<h2 id="distraction-free-work-environment"><a href="#distraction-free-work-environment" class="header-anchor"></a>Distraction free work environment
</h2><p>Following the same principle of distancing yourself from distractions, try to use a <strong>dedicated physical space for working</strong>. If you don’t have a spare room to dedicate to work, use a neighborhood café, sign up for a local co-working space, or start commuting to the company office to find a space where you can focus on work.</p>
<h2 id="break-down-tasks-into-smaller-steps"><a href="#break-down-tasks-into-smaller-steps" class="header-anchor"></a>Break down tasks into smaller steps
</h2><p>Sometimes people postpone tasks because they feel intimidated by the size or complexity of a task. In particular, in software engineering, problems may be vague and appear large until one reaches the breakthrough that brings the vision of how to tackle them. Breaking down problems into smaller, more manageable pieces has many advantages in software engineering. Not only can it help with task avoidance, but it can also make the problem easier to analyze, suggest and test solutions, and build a solid foundation that can later be expanded to solve the entire larger problem.</p>
<p>Working on big problems as a chain of smaller tasks may also offer more opportunities to celebrate completing each subtask, and help establish a suitable cadence of solving one thing, taking a break, and then tackling the next issue.</p>
<p>Breaking down a task into concrete steps may also help with getting more realistic time estimations. <strong>Sometimes procrastination isn’t real — someone could just be overly ambitious and feel bad about themselves for not doing an unrealistic amount of work.</strong></p>
<h2 id="intrinsic-motivation"><a href="#intrinsic-motivation" class="header-anchor"></a>Intrinsic motivation
</h2><p>Of course, you should follow your passion when possible. <strong>Strive to pick a career that you enjoy</strong>, and thus maximize the intrinsic motivation you experience. However, even a dream job is still a job. Nobody is ever paid to do whatever they want. Any work will include at least some tasks that feel like a chore or otherwise like something you would not do unless paid to.</p>
<p>Some would say that the definition of work itself is having to do things one would otherwise not do. You can only fully do whatever you want while on vacation or when you choose to not have a job at all. But if you have a job, you simply need to find the intrinsic motivation to do it.</p>
<p>Simply put, some tasks are just unpleasant or boring. Our natural inclination is to avoid them in favor of more enjoyable activities. For these situations we just <strong>have to</strong> find the discipline to force ourselves to do the tasks and figuratively speaking whip ourselves into being motivated to complete them.</p>
<h2 id="extrinsic-motivation"><a href="#extrinsic-motivation" class="header-anchor"></a>Extrinsic motivation
</h2><p>As the name implies, this is something people external to you need to provide, such as your employer or manager. If you have challenges in managing yourself and delivering results on a regular basis, somebody else needs to <strong>set goals and deadlines and keep you accountable</strong> for them. At the end of the day this means that eventually you will stop receiving salary or other payments unless you did your job.</p>
<p>Forcing people to do something isn’t nice, but eventually it needs to be done. It would not be fair for an employer to pay those who did their work the same salary as those who procrastinated and fell short on their tasks.</p>
<p>If you work solo, you can also simulate the extrinsic motivation by publicly announcing milestones and deadlines to build up pressure for yourself to meet them and avoid public humiliation. It is a well-studied and scientifically proven phenomenon that most university students procrastinate at the start of assignments, and truly start working on them only once the deadline is imminent.</p>
<h2 id="external-help-for-addictions"><a href="#external-help-for-addictions" class="header-anchor"></a>External help for addictions
</h2><p>If procrastination is mainly due to a <em>single distraction</em> that is always on your mind, it may be a sign of an addiction. For example, constantly thinking about a computer game or staying up late playing a computer game, to the extent that it seriously affects your ability to work, may be a symptom of an addiction, and getting out of it may be easier with external help.</p>
<h2 id="discipline-and-structure"><a href="#discipline-and-structure" class="header-anchor"></a>Discipline and structure
</h2><p>Most of the time procrastination is not due to an addiction, but simply due to a lack of self-discipline and structure. The good thing is that those things <strong>can be learned</strong>. It is mostly a matter of forming new habits, which most young software engineers pick up more or less automatically while working alongside more senior ones.</p>
<p>Hopefully these tips can help you stay on track and ensure you do everything you are expected to do with clear focus, and on time!</p> Best Practices for Submitting and Reviewing Merge Requests in Debian https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/Mon, 18 Aug 2025 00:00:00 +0000 https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/ <img src="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/featured-image.jpg" alt="Featured image of post Best Practices for Submitting and Reviewing Merge Requests in Debian" /><p>Historically the primary way to contribute to Debian has been to email the Debian bug tracker with a code patch. Now that <a class="link" href="https://udd.debian.org/cgi-bin/dep14stats.cgi" target="_blank" rel="noopener"
>92% of all Debian source packages</a> are hosted at <a class="link" href="https://salsa.debian.org" target="_blank" rel="noopener"
>salsa.debian.org</a> — the GitLab instance of Debian — more and more developers are using Merge Requests, but not necessarily in the optimal way. In this post, I share what I’ve found the best practice to be, presented in the natural workflow from forking to merging.</p>
<h2 id="why-use-merge-requests"><a href="#why-use-merge-requests" class="header-anchor"></a>Why use Merge Requests?
</h2><p>Compared to sending patches back and forth in email, using a git forge to review code contributions brings several benefits:</p>
<ul>
<li>Contributors can see the latest version of the code immediately when the maintainer pushes it to git, without having to wait for an upload to Debian archives.</li>
<li>Contributors can fork the development version and easily base their patches on the correct version and help test that the software continues to function correctly at that specific version.</li>
<li>Both maintainer and other contributors can easily see what was already submitted and avoid doing duplicate work.</li>
<li>It is easy for anyone to comment on a Merge Request and participate in the review.</li>
<li>Integrating CI testing is easy in Merge Requests by activating Salsa CI.</li>
<li>Tracking the state of a Merge Request is much easier than browsing Debian bug reports tagged ‘patch’, and the cycle of submit → review → re-submit → re-review is much easier to manage in the dedicated Merge Request view compared to participants setting up their own email plugins for code reviews.</li>
<li>Merge Requests can have extra metadata, such as ‘Approved’, and the metadata often updates automatically, such as a Merge Request being closed automatically when the Git commit ID from it is pushed to the target branch.</li>
</ul>
<p>Keeping these benefits in mind will help ensure that the best practices make sense and are aligned with maximizing these benefits.</p>
<h2 id="finding-the-debian-packaging-source-repository-and-preparing-to-make-a-contribution"><a href="#finding-the-debian-packaging-source-repository-and-preparing-to-make-a-contribution" class="header-anchor"></a>Finding the Debian packaging source repository and preparing to make a contribution
</h2><p>Before sinking any effort into a package, start by checking its overall status at the excellent <a class="link" href="https://tracker.debian.org/" target="_blank" rel="noopener"
>Debian Package Tracker</a>. This provides a clear overview of the package’s general health in Debian, when it was last uploaded and by whom, and if there is anything special affecting the package right now. This page also has quick links to the Debian bug tracker of the package, the build status overview and more. Most importantly, in the <strong>General section, the VCS row</strong> links to the version control repository the package advertises. Before opening that page, note the version most recently uploaded to Debian. This is relevant because nothing in Debian currently enforces that the package in version control is actually the same as the latest uploaded to Debian.</p>
<p><img src="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-tracker-page-example.png"
width="954"
height="462"
srcset="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-tracker-page-example_hu14946622339313742689.png 480w, https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-tracker-page-example.png 954w"
loading="lazy"
alt="Packaging source code repository links at tracker.debian.org"
class="gallery-image"
data-flex-grow="206"
data-flex-basis="495px"
>
</p>
<p>Following the Browse link opens the Debian package source repository, which is usually a project page on Salsa. To contribute, <strong>start by clicking the Fork button</strong>, select your own personal namespace and, under <em>Branches</em> to include, pick <em>Only the default branch</em> to avoid including unnecessary temporary development branches.</p>
<p><img src="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-salsa-fork-project-example.png"
width="953"
height="918"
srcset="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-salsa-fork-project-example_hu10835376501784676101.png 480w, https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-salsa-fork-project-example.png 953w"
loading="lazy"
alt="View after pressing Fork"
class="gallery-image"
data-flex-grow="103"
data-flex-basis="249px"
>
</p>
<p>Once forking is complete, <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-clone.1.en.html" target="_blank" rel="noopener"
>clone it with git-buildpackage</a>. For this example repository, the exact command would be <code>gbp clone --verbose git@salsa.debian.org:otto/glow.git</code>.</p>
<p>Next, add the original repository as a new remote and pull from it to make sure you have all relevant branches. Using the same fork as an example, the commands would be:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">git remote add go-team https://salsa.debian.org/go-team/packages/glow.git
gbp pull --verbose --track-missing go-team</code><pre><code>git remote add go-team https://salsa.debian.org/go-team/packages/glow.git
gbp pull --verbose --track-missing go-team</code></pre></div>
<p>The <code>gbp pull</code> command can be repeated whenever you want to make sure the main branches are in sync with the original repository. Finally, run <code>gitk --all &</code> to visually browse the Git history and note the various branches and their states in the two remotes. Note the style in comments and repository structure the project has and make sure your contributions follow the same conventions to maximize the chances of the maintainer accepting your contribution.</p>
<p><img src="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/gitk-in-debian-glow-project.png"
width="972"
height="773"
srcset="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/gitk-in-debian-glow-project_hu13373430507333023928.png 480w, https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/gitk-in-debian-glow-project.png 972w"
loading="lazy"
class="gallery-image"
data-flex-grow="125"
data-flex-basis="301px"
>
</p>
<p>It may also be good to build the source package to establish a baseline of the current state and what kind of binaries and <code>.deb</code> packages it produces. If using <a class="link" href="https://manpages.debian.org/unstable/debcraft/debcraft.1.en.html" target="_blank" rel="noopener"
>Debcraft</a>, one can simply run <code>debcraft build</code> in the Git repository.</p>
<h2 id="submitting-a-merge-request-for-a-debian-packaging-improvement"><a href="#submitting-a-merge-request-for-a-debian-packaging-improvement" class="header-anchor"></a>Submitting a Merge Request for a Debian packaging improvement
</h2><p>Always start by making a development branch by running <code>git checkout -b <branch name></code> to clearly separate your work from the main branch.</p>
<p>When making changes, remember to follow the conventions you already see in the package. It is also important to be aware of <a class="link" href="https://optimizedbyotto.com/post/good-git-commit/" >general guidelines on how to make good Git commits</a>.</p>
<p>If you are not able to immediately finish coding, it may be useful to <strong>publish the Merge Request as a draft</strong> so that the maintainer and others can see that you started working on something and what general direction your change is heading in.</p>
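<p>On GitLab instances such as Salsa, the draft Merge Request can even be opened straight from the command line using server-side push options. A minimal sketch, with an invented branch name and assuming <code>origin</code> points to your personal fork:</p>

```shell
# Create a topic branch and publish it as a draft Merge Request in one push.
# The branch name is illustrative; 'origin' is assumed to be your fork on Salsa.
git checkout -b fix/update-homepage-field
# ... edit files and commit as usual ...
git commit -a -m "Update Homepage field to the new upstream URL"
git push -o merge_request.create -o merge_request.draft origin fix/update-homepage-field
```

<p>Adding <code>-o merge_request.target=<branch></code> in the same push also lets you pick a non-default target branch.</p>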
<p>If you don’t finish the Merge Request in one sitting and return to it another day, you should remember to pull the Debian branch from the original Debian repository in case it has received new commits. This can be done easily with these commands (assuming the same remote and branch names as in the example above):</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">git fetch go-team
git rebase -i go-team/debian/latest</code><pre><code>git fetch go-team
git rebase -i go-team/debian/latest</code></pre></div>
<p><strong>Frequent rebasing is a great habit</strong> to help keep the Git history linear, and restructuring and rewording your commits will make the Git history easier to follow and understand <strong>why</strong> the changes were made.</p>
<p>When pushing improved versions of your branch, use <code>git push --force</code>. While GitLab does allow squashing, I recommend against it. It is better that the submitter makes sure the final version is a neat and clean set of commits that the receiver can easily merge without having to do any rebasing or squashing themselves.</p>
<p>When ready, remove the <em>draft</em> status of the Merge Request and wait patiently for review. If the maintainer does not respond in several days, try sending an email to <code><source package name>@packages.debian.org</code>, which is the official way to contact maintainers. You could also post a comment on the MR and tag the last few committers in the same repository so that a notification email is triggered. As a last resort, submit a bug report to the Debian bug tracker to announce that a Merge Request is pending review. This leaves a permanent record for posterity (or the Debian QA team) of your contribution. However, most of the time simply posting the Merge Request in Salsa is enough; excessive communication might be perceived as spammy, and someone needs to remember to check that the bug report is closed.</p>
<h3 id="respect-the-review-feedback-respond-quickly-and-avoid-merge-requests-getting-stale"><a href="#respect-the-review-feedback-respond-quickly-and-avoid-merge-requests-getting-stale" class="header-anchor"></a>Respect the review feedback, respond quickly and avoid Merge Requests getting stale
</h3><p>Once you get feedback, try to respond as quickly as possible. When people participating have everything fresh in their minds, it is much easier for the submitter to rework it and for the reviewer to re-review. If the Merge Request becomes stale, it can be challenging to revive it. Also, if it looks like the MR is only waiting for re-review but nothing happens, re-read the previous feedback and make sure you actually address everything. After that, post a friendly comment where you explicitly say you have addressed all feedback and are only waiting for re-review.</p>
<h2 id="reviewing-merge-requests"><a href="#reviewing-merge-requests" class="header-anchor"></a>Reviewing Merge Requests
</h2><p>This section about reviewing is not exclusive to Debian package maintainers — anyone can contribute to Debian by reviewing open Merge Requests. Typically, the larger an open source project gets, the more help is needed in reviewing and testing changes to avoid regressions, and all diligently done work is welcome. As Linus’s Law, coined by Eric S. Raymond, goes: “given enough eyeballs, all bugs are shallow”.</p>
<p>On <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a>, you can browse open Merge Requests per project or for a whole group, just like on any GitLab instance.</p>
<p>Reviewing Merge Requests is, however, most fun when they are fresh and the submitter is active. Thus, the best strategy is to ensure you have subscribed to email notifications in the repositories you care about so you get an email for any new Merge Request (or Issue) immediately when posted.</p>
<p><img src="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-salsa-notifications-watch.png"
width="953"
height="446"
srcset="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-salsa-notifications-watch_hu3637407015989308234.png 480w, https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-salsa-notifications-watch.png 953w"
loading="lazy"
alt="Change notification settings from Global to Watch to get an email on new Merge Requests"
class="gallery-image"
data-flex-grow="213"
data-flex-basis="512px"
>
</p>
<p>When you see a new Merge Request, try to review it within a couple of days. If you cannot review in a reasonable time, posting a small note that you intend to review it later will feel better to the submitter compared to not getting any response.</p>
<p>Personally, I have a habit of assigning myself as a reviewer so that I can keep track of my whole review queue at <a class="link" href="https://salsa.debian.org/dashboard/merge_requests?reviewer_username=otto" target="_blank" rel="noopener"
>https://salsa.debian.org/dashboard/merge_requests?reviewer_username=otto</a>, and I recommend the same to others. Seeing the review assignment happen is also a good way to signal to the submitter that their submission was noted.</p>
<h3 id="reviewing-commit-by-commit-in-the-web-interface"><a href="#reviewing-commit-by-commit-in-the-web-interface" class="header-anchor"></a>Reviewing commit-by-commit in the web interface
</h3><p>Reviewing using the web interface works well in general, but I find that the way GitLab designed it is not ideal. In my ideal review workflow, I first read the Git commit message to understand what the submitter tried to do and why; only then do I look at the code changes in the commit. In GitLab, to do this one must first open the <em>Commits</em> tab and then click on the <strong>last commit</strong> in the list, as it is sorted in reverse chronological order with the first commit at the bottom. Only after that do I see the commit message and contents. Getting to the next commit is easy by simply clicking <em>Next</em>.</p>
<p><img src="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/debian-salsa-review-example.gif"
width="953"
height="749"
loading="lazy"
alt="Example review to demonstrate location of buttons and functionality"
class="gallery-image"
data-flex-grow="127"
data-flex-basis="305px"
>
</p>
<p>When adding the first comment, I choose <em>Start review</em> and for the following remarks <em>Add to review</em>. Finally, I click <em>Finish review</em> and <em>Submit review</em>, which will trigger <strong>one single email to the submitter with all my feedback</strong>. I try to avoid using the <em>Add comment now</em> option, as each such comment triggers a separate notification email to the submitter.</p>
<h3 id="reviewing-and-testing-on-your-own-computer-locally"><a href="#reviewing-and-testing-on-your-own-computer-locally" class="header-anchor"></a>Reviewing and testing on your own computer locally
</h3><p>For the most thorough review, I pull the code to my laptop with <code>git pull <remote url> <branch name></code>. There is no need to run <code>git remote add</code>, as pulling directly from a URL works too and saves you from having to clean up old remotes later.</p>
<p>Pulling the Merge Request contents locally allows me to build, run and inspect the code deeply and review the commits with full metadata in <a class="link" href="https://manpages.debian.org/unstable/gitk/gitk.1.en.html" target="_blank" rel="noopener"
>gitk</a> or equivalent.</p>
<h3 id="investing-enough-time-in-writing-feedback-but-not-too-much"><a href="#investing-enough-time-in-writing-feedback-but-not-too-much" class="header-anchor"></a>Investing enough time in writing feedback, but not too much
</h3><p>See my other post for more in-depth advice on <a class="link" href="https://optimizedbyotto.com/post/how-to-code-review/" >how to structure your code review feedback</a>.</p>
<p>In Debian, I would emphasize <strong>patience, to allow the submitter time to rework their submission.</strong> Debian packaging is notoriously complex, and even experienced developers often need more feedback and time to get everything right. Avoid the temptation to rush the fix in yourself. <strong>In open source, Git credits are often the only salary the submitter gets.</strong> If you take the idea from the submission and implement it yourself, you rob the submitter of the opportunity to get feedback, try to improve and finally feel accomplished. Sure, it takes extra effort to give feedback, but the contributor is likely to feel ownership of their work and later return to further improve it.</p>
<p>If a submission looks hopelessly low quality and you feel that giving feedback is a waste of time, you can simply respond with something along the lines of: “Thanks for your contribution and interest in helping Debian. Unfortunately, looking at the commits, I see several shortcomings, and it is unlikely a normal review process is enough to help you finalize this. Please reach out to Debian Mentors to get a mentor who can give you more personalized feedback.”</p>
<p>There might also be contributors who just “dump the code”, ignore your feedback and never return to finalize their submission. If a contributor does not return to finalize their submission within 3-6 months, in my own projects I simply finalize it myself and thank the contributor in the commit message (but do not mark them as the author).</p>
<p>Despite best practices, you will occasionally still end up doing some things in vain, but that is how volunteer collaboration works. We all just need to accept that some communication will inevitably feel like wasted effort, but it should be viewed as a necessary investment to get the benefits from the times when the communication led to real and valuable collaboration. Please do not treat all contributors as if they are unlikely to ever contribute again; otherwise, your behavior will cause them not to contribute again. If you want to grow a tree, you need to plant several seeds.</p>
<h3 id="approving-and-merging"><a href="#approving-and-merging" class="header-anchor"></a>Approving and merging
</h3><p>Assuming review goes well and you are ready to approve, and if you are the only maintainer, you can proceed to merge right away. If there are multiple maintainers, or if you otherwise think that someone else might want to chime in before it is merged, use the “Approve” button to show that you approve the change but leave it unmerged.</p>
<p>The person who approved does not necessarily have to be the person who merges. The point of the Merge Request review is not separation of duties in committing and merging — <strong>the main purpose of a code review is to have a different set of eyeballs looking at the change before it is committed</strong> into the main development branch for all eternity. In some packages, the submitter might actually merge themselves once they see another developer has approved. In some rare Debian projects, there might even be separate people taking the roles of submitting, approving and merging, but most of the time these three roles are filled by two people either as submitter and approver+merger or submitter+merger and approver.</p>
<p>If you are not a maintainer at all and do not have permissions to click <em>Approve</em>, simply post a comment summarizing your review and that you approve it and support merging it. This can help the maintainers review and merge faster.</p>
<h2 id="making-a-merge-request-for-a-new-upstream-version-import"><a href="#making-a-merge-request-for-a-new-upstream-version-import" class="header-anchor"></a>Making a Merge Request for a new upstream version import
</h2><p>Unlike many other Linux distributions, in Debian each source package has its own version control repository. The Debian sources consist of the upstream sources with an additional <code>debian/</code> subdirectory that contains the actual Debian packaging. For the same reason, a typical Debian packaging Git repository has a <code>debian/latest</code> branch that has changes only in the <code>debian/</code> subdirectory while the surrounding upstream files are the actual upstream files and have the actual upstream Git history. For details, see my post <a class="link" href="https://optimizedbyotto.com/post/debian-source-package-git/" >explaining Debian source packages in Git</a>.</p>
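<p>This branch layout is typically declared in <code>debian/gbp.conf</code> so that the git-buildpackage tools know which branches to operate on. A minimal illustrative fragment:</p>

```ini
# Illustrative debian/gbp.conf matching the branch layout described above
[DEFAULT]
debian-branch = debian/latest
upstream-branch = upstream/latest
pristine-tar = True
```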
<p>Because of this Git branch structure, importing a new upstream version will typically modify three branches: <code>debian/latest</code>, <code>upstream/latest</code> and <code>pristine-tar</code>. When doing a Merge Request for a new upstream import, <strong>only submit one Merge Request for one branch</strong>: the one merging your new changes into the <code>debian/latest</code> branch.</p>
<p>There is no need to submit the <code>upstream/latest</code> branch or the <code>pristine-tar</code> branch. Their contents are fixed and mechanically imported into Debian. There are no changes that the reviewer in Debian can request the submitter to do on these branches, so asking for feedback and comments on them is useless. All review, comments and re-reviews concern the content of the <code>debian/latest</code> branch only.</p>
<p>It is not even necessary to use the <code>debian/latest</code> branch for a new upstream version. Personally, I always execute the new version import (with <code>gbp import-orig --verbose --uscan</code>) and prepare and test everything on <code>debian/latest</code>, but when it is time to submit it for review, I run <code>git checkout -b import/$(dpkg-parsechangelog -SVersion)</code> to get a branch named e.g. <code>import/1.0.1</code> and then push that for review.</p>
<h2 id="reviewing-a-merge-request-for-a-new-upstream-version-import"><a href="#reviewing-a-merge-request-for-a-new-upstream-version-import" class="header-anchor"></a>Reviewing a Merge Request for a new upstream version import
</h2><p>Reviewing and testing a new upstream version import is a bit tricky currently, but possible. The key is to use <code>gbp pull</code> to automate fetching all branches from the submitter’s fork. Assume you are reviewing a submission targeting the <a class="link" href="https://salsa.debian.org/go-team/packages/glow" target="_blank" rel="noopener"
>Glow package repository</a> and there is a Merge Request from user otto’s fork. As the maintainer, you would run the commands:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">git remote add otto https://salsa.debian.org/otto/glow.git
gbp pull --verbose otto</code><pre><code>git remote add otto https://salsa.debian.org/otto/glow.git
gbp pull --verbose otto</code></pre></div>
<p>If there was feedback in the first round and you later need to pull a new version for re-review, running <code>gbp pull --force</code> will not suffice, and this trick of manually fetching each branch and resetting them to the submitter’s version is needed:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">for BRANCH in pristine-tar upstream debian/latest
do
git checkout $BRANCH
git reset --hard origin/$BRANCH
git pull --force https://salsa.debian.org/otto/glow.git $BRANCH
done</code><pre><code>for BRANCH in pristine-tar upstream debian/latest
do
git checkout $BRANCH
git reset --hard origin/$BRANCH
git pull --force https://salsa.debian.org/otto/glow.git $BRANCH
done</code></pre></div>
<p>Once review is done, either click <em>Approve</em> and let the submitter push everything, or push all the branches you pulled locally yourself. In GitLab and other forges, the Merge Request will automatically be marked as <em>Merged</em> once the commit ID that was the head of the Merge Request is pushed to the target branch.</p>
<p><img src="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/merged-and-approved.png"
width="422"
height="55"
srcset="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/merged-and-approved.png 422w"
loading="lazy"
class="gallery-image"
data-flex-grow="767"
data-flex-basis="1841px"
>
</p>
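<p>If you push the branches yourself, all three (plus any new upstream tags) can be sent with plain <code>git push</code>, using the same branch names as in the loop above:</p>

```shell
# Push the three reviewed branches and any new upstream tags to the
# packaging repository ('origin' is assumed to be the Salsa project).
git push origin debian/latest upstream pristine-tar
git push origin --tags
```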
<h2 id="please-allow-enough-time-for-everyone-to-participate"><a href="#please-allow-enough-time-for-everyone-to-participate" class="header-anchor"></a>Please allow enough time for everyone to participate
</h2><p>When working on Debian, keep in mind that it is a community of volunteers. It is common for people to do Debian stuff only on weekends, so you should patiently wait for at least a week so that enough workdays and weekend days have passed for the people you interact with to have had time to respond in their own Debian time.</p>
<p>Having to wait may feel annoying and disruptive, but try to look at the upside: waiting does not require any extra work from you. In some cases, the waiting can even be useful thanks to the “sleep on it” phenomenon: when you look at your own submission some days later with fresh eyes, you might notice something you overlooked earlier and improve your code change even without other people’s feedback!</p>
<h2 id="contribute-reviews"><a href="#contribute-reviews" class="header-anchor"></a>Contribute reviews!
</h2><p>The last but not least suggestion is to make a habit of contributing reviews to packages you <strong>do not</strong> maintain. As we already see in large open source projects such as the Linux kernel, there are far more code submissions than the maintainers can handle. The bottleneck for progress and maintaining quality becomes the reviews themselves.</p>
<p>For Debian, as an organization and as a community, to be able to renew and grow new contributors, we need more of the senior contributors to shift focus from merely maintaining their packages and writing code to also intentionally interact with new contributors and guide them through the process of creating great open source software. Reviewing code is an effective way to both get tangible progress on individual development items and to transfer culture to a new generation of developers.</p>
<h2 id="why-arent-100-of-all-debian-source-packages-hosted-on-salsa"><a href="#why-arent-100-of-all-debian-source-packages-hosted-on-salsa" class="header-anchor"></a>Why aren’t 100% of all Debian source packages hosted on Salsa?
</h2><p>As seen at <a class="link" href="https://trends.debian.net/#vcs-hosting" target="_blank" rel="noopener"
>trends.debian.net</a>, more and more packages are using Salsa. Debian does not, however, have any policy about it. In fact, the <a class="link" href="https://www.debian.org/doc/debian-policy/search.html?q=salsa&check_keywords=yes&area=default" target="_blank" rel="noopener"
>Debian Policy Manual does not even mention the word “Salsa” anywhere</a>. Adoption of Salsa has so far been purely organic, as in Debian each package maintainer has full freedom to choose whatever preferences they have regarding version control.</p>
<p>I hope the trend to use Salsa will continue and more shared workflows emerge so that collaboration gets easier. To drive the culture of using Merge Requests and more, I drafted the Debian proposal <a class="link" href="https://dep-team.pages.debian.net/deps/dep18/" target="_blank" rel="noopener"
>DEP-18: Encourage Continuous Integration and Merge Request based Collaboration for Debian packages</a>. If you are active in Debian and you think DEP-18 is beneficial for Debian, please give a thumbs up at <a class="link" href="https://salsa.debian.org/dep-team/deps/-/merge_requests/21" target="_blank" rel="noopener"
>dep-team/deps!21</a>.</p> Debcraft – Easiest way to modify and build Debian packages https://optimizedbyotto.com/post/debcraft-easy-debian-packaging/Thu, 17 Jul 2025 00:00:00 +0000 https://optimizedbyotto.com/post/debcraft-easy-debian-packaging/ <img src="https://optimizedbyotto.com/post/debcraft-easy-debian-packaging/debcraft-image.jpg" alt="Featured image of post Debcraft – Easiest way to modify and build Debian packages" /><p>Debian packaging is notoriously hard. Far too many new contributors give up while trying, and many long-time contributors leave due to burnout from having to do too many thankless maintenance tasks. Some just skip testing their changes properly because it feels like too much toil.</p>
<p><strong><a class="link" href="https://salsa.debian.org/debian/debcraft" target="_blank" rel="noopener"
>Debcraft</a> is my attempt to solve this by automating all the boring stuff, making it easier to learn the correct practices, and helping new and old packagers better track changes in both source code and build artifacts.</strong></p>
<h2 id="the-challenge-of-declarative-packaging-code"><a href="#the-challenge-of-declarative-packaging-code" class="header-anchor"></a>The challenge of declarative packaging code
</h2><p>Unlike how <a class="link" href="https://en.wikipedia.org/wiki/RPM_Package_Manager" target="_blank" rel="noopener"
>rpm</a> or <a class="link" href="https://en.wikipedia.org/wiki/Alpine_Linux" target="_blank" rel="noopener"
>apk</a> packages are done, the <a class="link" href="https://en.wikipedia.org/wiki/Deb_%28file_format%29" target="_blank" rel="noopener"
>deb package</a> sources by design avoid having one massive procedural packaging recipe. Instead, the packaging is defined in multiple declarative files in the <code>debian/</code> subdirectory. For example, instead of a script running <code>install -m 755 bin/btop /usr/bin/btop</code> there is a file <code>debian/btop.install</code> containing the line <code>usr/bin/btop</code>.</p>
<p>This makes the overall system more robust and reliable, and allows, for example, extensive static analysis to find problems without having to build the package. The notable exception is the <code>debian/rules</code> file, which contains procedural code that can modify any aspect of the package build. Almost all other files are declarative.</p>
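<p>Even the procedural <code>debian/rules</code> is, in a typical modern package, just a thin wrapper that delegates everything to the debhelper sequencer:</p>

```makefile
#!/usr/bin/make -f
# Canonical minimal debian/rules: every target is handed to the dh sequencer,
# which in turn reads the declarative files under debian/.
%:
	dh $@
```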
<p>Benefits include, among others, that the effect of a Debian-wide policy change can be relatively easily predicted by scanning what attributes and configurations all packages have declared.</p>
<p>The drawback is that to understand the syntax and meaning of each file, one must understand which build tools read which files and traverse potentially multiple layers of abstraction. In my view, this is the root cause of most of the perceived complexity.</p>
<h2 id="common-complaints-about-deb-packaging"><a href="#common-complaints-about-deb-packaging" class="header-anchor"></a>Common complaints about .deb packaging
</h2><p>Related to the above, people learning Debian packaging frequently voice the following complaints:</p>
<ul>
<li>Debian has too many tools to learn, often with overlapping or duplicate functionality.</li>
<li>Too much outdated and inconsistent documentation that makes learning the numerous tools needlessly hard.</li>
<li>Lack of documentation of the generally agreed best practices, mainly due to Debian’s reluctance as a project to pick one tool and deprecate the alternatives.</li>
<li>Multiple layers of abstraction and lack of clarity on what any single change in the <code>debian/</code> subdirectory leads to in the final package.</li>
<li>The requirement that Debian packages be developed on a Debian system.</li>
</ul>
<h2 id="how-debcraft-solves-some-of-this"><a href="#how-debcraft-solves-some-of-this" class="header-anchor"></a>How Debcraft solves (some of) this
</h2><p><a class="link" href="https://salsa.debian.org/debian/debcraft" target="_blank" rel="noopener"
>Debcraft</a> is intentionally opinionated for the sake of simplicity, and makes heavy use of git, git-buildpackage, and most importantly Linux containers, supporting both Docker and Podman.</p>
<p>By using containers, Debcraft frees the user from the requirement of having to run Debian. This makes .deb packaging more accessible to developers running some other Linux distro or even Mac or Windows (with WSL). Of course, we want developers to run Debian (or a derivative like Ubuntu), but we want them even more to build, test, and ship their software as .deb. Even for Debian/Ubuntu users, having everything done inside clean hermetic containers of the latest target distribution version will yield more robust, secure, and reproducible builds and tests. All containers are built automatically on-the-fly using best practices for layer caching, making everything easy and fast.</p>
<p><strong>Debcraft has simple commands to make it easy to build, rebuild, test, and update packages.</strong> The most fundamental command is <code>debcraft build</code>, which will not only build the package but also fetch the sources if not already present, and with flags such as <code>--distribution</code> or <code>--source-only</code> build for any requested Debian or Ubuntu release, or generate source packages only for Debian or PPA upload purposes.</p>
<p>For ease of use, the output is colored and includes helpful explanations on what is being done, and suggests relevant Debian documentation for more information.</p>
<p>Most importantly, the build artifacts, along with various logs, are stored in separate directories, making it easy to compare before and after to see what changed as a result of the code or dependency updates (utilizing <a class="link" href="https://manpages.debian.org/unstable/diffoscope-minimal/diffoscope.1.en.html" target="_blank" rel="noopener"
>diffoscope</a>, among others).</p>
<p>While the above helps to debug successful builds, there is also the <code>debcraft shell</code> command to make debugging failed builds significantly easier by dropping into a shell where one can run various <code>dh</code> commands one-by-one.</p>
<p>Once the build works, running the autopkgtests is as easy as <code>debcraft test</code>. As with all other commands, Debcraft is smart enough to read information like the target distribution from the <code>debian/changelog</code> entry.</p>
<p>When the package is ready to be released, there is the <code>debcraft release</code> command that will create the Debian source package in the correct format and facilitate uploading it either to your Personal Package Archive (PPA) or, if you are a Debian Developer, to the official Debian archive.</p>
<h2 id="automatically-improve-and-update-packages"><a href="#automatically-improve-and-update-packages" class="header-anchor"></a>Automatically improve and update packages
</h2><p>Additionally, the command <code>debcraft improve</code> will try to fix all issues that are possible to address automatically. It utilizes, among others, <a class="link" href="https://manpages.debian.org/unstable/lintian-brush/lintian-brush.1.en.html" target="_blank" rel="noopener"
>lintian-brush</a>, <a class="link" href="https://manpages.debian.org/unstable/codespell/codespell.1.en.html" target="_blank" rel="noopener"
>codespell</a>, and <a class="link" href="https://manpages.debian.org/unstable/dh-debputy/debputy.1.en.html" target="_blank" rel="noopener"
>debputy</a>. This makes repetitive Debian maintenance tasks easier, such as updating the package to follow the latest Debian policies.</p>
<p>To update the package to the latest upstream version, there is also <code>debcraft update</code>. It will read the package configuration files such as <code>debian/gbp.conf</code> and <code>debian/watch</code> and attempt to import the latest upstream version, refresh patches, build, and run autopkgtests. If everything passes, the new version is committed. This helps automate the process of updating to new upstream versions.</p>
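<p>Both files are short. For example, a <code>debian/watch</code> for an upstream that publishes release tags on GitHub could look roughly like this (URL and pattern shown for illustration only):</p>

```
# Illustrative debian/watch checking GitHub release tags
version=4
https://github.com/charmbracelet/glow/tags .*/v?(\d\S+)\.tar\.gz
```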
<h2 id="try-out-debcraft-now"><a href="#try-out-debcraft-now" class="header-anchor"></a>Try out Debcraft now!
</h2><p>On recent versions of Debian and Ubuntu, Debcraft can be installed simply by running <code>apt install debcraft</code>. To use Debcraft on some other distribution, or to get the latest features available in the development version, install it using:</p>
<pre><code>git clone https://salsa.debian.org/debian/debcraft.git
cd debcraft
make install-local
</code></pre>
<p>To see exact usage instructions, run <code>debcraft --help</code>.</p>
<h2 id="contributions-welcome"><a href="#contributions-welcome" class="header-anchor"></a>Contributions welcome
</h2><p>Current Debcraft version 0.5 still has some rough edges and missing features, but I have personally been using it for over a year to maintain all my packages in Debian. If you come across some issue, feel free to file a report at <a class="link" href="https://salsa.debian.org/debian/debcraft/-/issues" target="_blank" rel="noopener"
>https://salsa.debian.org/debian/debcraft/-/issues</a> or submit an improvement at <a class="link" href="https://salsa.debian.org/debian/debcraft/-/merge_requests" target="_blank" rel="noopener"
>https://salsa.debian.org/debian/debcraft/-/merge_requests</a>. The code is intentionally written entirely in shell script to keep the barrier to code contribution as low as possible.</p>
<blockquote>
<p><strong>By the way,</strong> if you aspire to become a Debian Developer, and want to follow my examples in using state-of-the-art tooling and collaborate using salsa.debian.org, feel free to reach out for mentorship. I am glad to see more people contribute to Debian!</p>
</blockquote> Corporate best practices for upstream open source contributions https://optimizedbyotto.com/post/best-practices-corporate-open-source-contributions/Mon, 30 Jun 2025 00:00:00 +0000 https://optimizedbyotto.com/post/best-practices-corporate-open-source-contributions/ <img src="https://optimizedbyotto.com/post/best-practices-corporate-open-source-contributions/featured-image.jpg" alt="Featured image of post Corporate best practices for upstream open source contributions" /><blockquote>
<p>This post is based on a presentation given at the <a class="link" href="https://www.validos.org/" target="_blank" rel="noopener"
>Validos</a> annual members’ meeting on June 25th, 2025.</p>
</blockquote>
<p>When I started getting into Linux and open source over 25 years ago, the majority of the software development in this area was done by academics and hobbyists. The number of companies participating in open source has since exploded in parallel with the growth of mobile and cloud software, the majority of which is built on top of open source. For example, Android powers most mobile phones today and is based on Linux. Almost all software used to operate large cloud provider data centers, such as AWS or Google, is either open source or made in-house by the cloud provider.</p>
<p>Pretty much all companies, regardless of the industry, have been using open source software at least to some extent for years. However, the degree to which they collaborate with the upstream origins of the software varies. <strong>I encourage all companies in a technical industry to start contributing upstream.</strong> There are many benefits to having a good relationship with your upstream open source software vendors, both for the short term and especially for the long term. Moreover, with the rollout of <a class="link" href="https://en.wikipedia.org/wiki/Cyber_Resilience_Act" target="_blank" rel="noopener"
>the CRA in the EU in 2025-2027</a>, the law will require software companies to contribute security fixes upstream to the open source projects their products use.</p>
<p>To ensure the process is well managed, business-aligned and legally compliant, there are a few <em>do’s</em> and <em>don’ts</em> that are important to be aware of.</p>
<h2 id="maintain-your-sboms"><a href="#maintain-your-sboms" class="header-anchor"></a>Maintain your SBOMs
</h2><p>For every piece of software, regardless of whether the code was done in-house, from an open source project, or a combination of these, every company needs to produce a <a class="link" href="https://en.wikipedia.org/wiki/Software_supply_chain" target="_blank" rel="noopener"
>Software Bill of Materials (SBOM)</a>. The SBOMs provide a standardized and interoperable way to track what software and which versions are used where, what software licenses apply, who holds the copyright of which component, which security fixes have been applied and so forth.</p>
<p>A catalog of SBOMs, or equivalent, forms the backbone of software supply-chain management in corporations.</p>
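For illustration only, a heavily abbreviated SBOM document for a single component could look like the following. The field names follow the SPDX 2.3 JSON format, but the component, version and copyright text shown here are placeholders, not taken from any real product:

```json
{
  "spdxVersion": "SPDX-2.3",
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "example-product-sbom",
  "packages": [
    {
      "name": "zlib",
      "SPDXID": "SPDXRef-Package-zlib",
      "versionInfo": "1.3.1",
      "licenseConcluded": "Zlib",
      "copyrightText": "Copyright (C) 1995-2024 Jean-loup Gailly and Mark Adler"
    }
  ]
}
```

A real SBOM would list every component with this level of detail, which is what makes license compliance and security-fix tracking tractable at scale.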
<h2 id="identify-your-strategic-upstream-vendors"><a href="#identify-your-strategic-upstream-vendors" class="header-anchor"></a>Identify your strategic upstream vendors
</h2><p>The SBOMs are likely to reveal that for any piece of non-trivial software, there are hundreds or thousands of upstream open source projects in use. Few organizations have the resources to contribute to all of their upstreams.</p>
<p>If your organization is just starting to organize upstream contribution activities, identify the key projects that have the largest impact on your business and prioritize forming a relationship with them first. Organizations with a mature contribution process will be collaborating with tens or hundreds of upstreams.</p>
<h2 id="create-a-written-policy-with-input-from-business-owners-legal-and-marketing"><a href="#create-a-written-policy-with-input-from-business-owners-legal-and-marketing" class="header-anchor"></a>Create a written policy with input from business owners, legal and marketing
</h2><p>An upstream contribution policy typically covers things such as who decides from a business point of view what can be contributed upstream, which licenses are allowed and which should be avoided, how to document copyright, how to deal with projects that require signing copyright assignments (e.g. contributor license agreements), and other potential legal guidelines to follow. Additionally, the technical steps on how to prepare a contribution should be outlined, including how to internally review and re-review it, who the technical approvers are to ensure high quality and a good reputation, and so on.</p>
<p><strong>The policy does not have to be static</strong> or difficult to produce. Start with a small policy and a few trusted senior developers following it, and update its contents as you run into new situations that need internal company alignment. For example, don’t require staff to create new GitHub accounts merely for the purpose of doing one open source contribution. Initially, do things with minimal overhead and add requirements to the policy only if they have clear and strong benefits. <strong>The purpose of a policy should be to make it obvious and easy for employees to do the right thing</strong>, not to add obstacles and stop progress or encourage people to break the policy.</p>
<h2 id="appoint-an-internal-coordinator-and-champions"><a href="#appoint-an-internal-coordinator-and-champions" class="header-anchor"></a>Appoint an internal coordinator and champions
</h2><p>Having a written policy on how to contribute upstream will help ensure a consistent process and avoid common pitfalls. However, a written policy alone does not automatically translate into a well-running process. It is highly recommended to appoint at least one internal coordinator who is knowledgeable about how open source communities work, how software licensing and patents work, and is senior enough to have a good sense of what business priorities to optimize for. <strong>In small organizations it can be a single person, while larger organizations typically have a full Open Source Programs Office</strong>.</p>
<p>This coordinator should oversee the contribution process, track all contributions made across the organization, and further optimize the process by working with stakeholders across the business, including legal experts, business owners and CTOs. The marketing and recruiting teams should also be involved, as <strong>upstream contributions will have a reputation-building aspect as well</strong>, which can be enhanced with systematic tracking and publishing of activities.</p>
<p>Additionally, at least in the beginning, the organization should also appoint key staff members as open source champions. Implementing a new process always includes some obstacles and occasional setbacks, which may discourage employees from putting in the extra effort to reap the full long-term benefits for the company. <strong>Having named champions will empower them to make the first few contributions themselves, setting a good example and encouraging and mentoring others to contribute upstream as well.</strong></p>
<h2 id="avoid-excessive-approvals"><a href="#avoid-excessive-approvals" class="header-anchor"></a>Avoid excessive approvals
</h2><p>To maintain a high quality bar, it is always good to have all outgoing submissions reviewed by at least one or two people. <strong>Two or three pairs of eyeballs are significantly more likely to catch issues that might slip by someone working alone</strong>. The review also slows down the process by a day or two, giving the author time to “sleep on it”, which usually helps ensure the final submission is well-thought-out.</p>
<p><strong>Do not require more than one or two reviewers.</strong> The marginal utility goes quickly to zero beyond a few reviewers, and at around four or five people the effect becomes negative, as the weight of each approval decreases and the reviewers begin to take less personal responsibility. Having too many people in the loop also makes each feedback round slow and expensive, to the extent that the author will hesitate to make updates and ask for re-reviews due to the costs involved.</p>
<p>If the organization experiences setbacks due to mistakes slipping through the review process, do not respond by adding more reviewers, as it will just grind the contribution process to a halt. <strong>If there are quality concerns, invest in training for engineers, CI systems and perhaps an internal certification program for those making public upstream code submissions.</strong> A typical software engineer is more likely to seriously try to become proficient at their job and put effort into a one-off certification exam and then make multiple high-quality contributions, than for a low-skilled engineer to improve and even want to continue doing more upstream contributions if they are burdened by heavy review processes every time they try to submit an upstream contribution.</p>
<h2 id="dont-expect-upstream-to-accept-all-code-contributions"><a href="#dont-expect-upstream-to-accept-all-code-contributions" class="header-anchor"></a>Don’t expect upstream to accept all code contributions
</h2><p>Sure, identifying the root cause of and fixing a tricky bug or writing a new feature requires significant effort. While an open source project will certainly appreciate the effort invested, it doesn’t mean it will always welcome all contributions with open arms. Occasionally, the project won’t agree that the code is correct or the feature is useful, and some contributions are bound to be rejected.</p>
<p>You can minimize the chance of experiencing rejections by having a solid internal review process that includes assessing how the upstream community is likely to understand the proposal. Sometimes how things are communicated is more important than how they are coded. Polishing inline comments and git commit messages helps ensure high-quality communication, along with a commitment to respond quickly to review feedback and conducting regular follow-ups until a contribution is finalized and accepted.</p>
<h2 id="start-small-to-grow-expertise-and-reputation"><a href="#start-small-to-grow-expertise-and-reputation" class="header-anchor"></a>Start small to grow expertise and reputation
</h2><p>In addition to keeping the open source contribution policy lean and nimble, it is also good to start practical contributions with small issues. Don’t aim to contribute massive features until you have a track record of being able to make multiple small contributions.</p>
<p>Keep in mind that not all open source projects are equal. Each has its own culture, written and unwritten rules, development process, documented requirements (which may be outdated) and more. Starting with a tiny contribution, even just a typo fix, is a good way to validate how code submissions, reviews and approvals work in a particular project. Once you have staff who have successfully landed smaller contributions, you can start planning larger proposals. The exact same proposal might be unsuccessful when proposed by a new person, and successful when proposed by a person who already has a reputation for prior high-quality work.</p>
<h2 id="embrace-all-and-any-publicity-you-get"><a href="#embrace-all-and-any-publicity-you-get" class="header-anchor"></a>Embrace all and any publicity you get
</h2><p>Some companies have concerns about their employees working in the open. Indeed, every email and code patch an employee submits, and all related discussions, become public. This may initially sound scary, but it is actually a potential source of good publicity. Employees need to be trained on how to conduct themselves publicly, and discussions about code should contain only information strictly related to the code, without any references to actual production environments or other sensitive information. In the long run, most contributing employees have a positive impact, and the company should reap the benefits of positive publicity. If there are quality issues or lapses in employee judgment, hiding the activity or forcing employees to contribute under pseudonyms is not a proper solution. Instead, the problems should be addressed at the root, and bad behavior corrected rather than tolerated.</p>
<p>When people are working publicly, there tends to also be some degree of additional pride involved, which motivates people to try their best. Contributions need to be public for the sponsoring corporation to later be able to claim copyright or licenses. Considering that thousands of companies participate in open source every day, the prevalence of bad publicity is quite low, and the benefits far exceed the risks.</p>
<h2 id="scratch-your-own-itch"><a href="#scratch-your-own-itch" class="header-anchor"></a>Scratch your own itch
</h2><p>When choosing what to contribute, select things that benefit your own company. This is not purely about being selfish: often the people working on resolving a problem they suffer from are the same people with the best understanding of what the problem is and what kind of solution is optimal. Also, the issues that are most pressing to your company are more likely to be universally useful to solve than any random bug or feature request in the upstream project’s issue tracker.</p>
<h2 id="remember-there-are-many-ways-to-help-upstream"><a href="#remember-there-are-many-ways-to-help-upstream" class="header-anchor"></a>Remember there are many ways to help upstream
</h2><p>While submitting code is often considered the primary way to contribute, please keep in mind there are also other highly impactful ways to contribute. Submitting high-quality bug reports will help developers quickly identify and prioritize issues to fix. Providing good research, benchmarks, statistics or feedback helps guide development and helps the project make better design decisions. Documentation, translations, organizing events and providing marketing support can help increase adoption and strengthen long-term viability for the project.</p>
<p>In some of the largest open source projects there are already far more pending contributions than the core maintainers can process. <strong>Therefore, developers who contribute code should also get into the habit of contributing reviews.</strong> As <a class="link" href="https://en.wikipedia.org/wiki/Linus%27s_law" target="_blank" rel="noopener"
>Linus’ law</a> states, <em>given enough eyeballs, all bugs are shallow</em>. Reviewing other contributors’ submissions will help improve quality, and also alleviate the pressure on core maintainers who are the only ones providing feedback. Reviewing code submitted by others is also a great learning opportunity for the reviewer. The reviewer does not need to be “better” than the submitter - any feedback is useful; merely posting review feedback is not the same thing as making an approval decision.</p>
<p>Many projects are also happy to accept monetary support and sponsorships. Some offer specific perks in return. By human nature, the largest sponsors always get their voice heard in important decisions, as no open source project wants to take actions that scare away major financial contributors.</p>
<h2 id="starting-is-the-hardest-part"><a href="#starting-is-the-hardest-part" class="header-anchor"></a>Starting is the hardest part
</h2><p>Long-term success in open source comes from a positive feedback loop of an ever-increasing number of users and collaborators. As the examples of countless corporations contributing to open source show, the benefits are concrete, and the process usually runs well once the initial ramp-up and organizational learning phase has passed.</p>
<p>In open source ecosystems, contributing upstream should be as natural as paying vendors in any business. If you are using open source and not contributing at all, you likely have latent business risks without realizing it. You don’t want to wake up one morning to learn that your top talent left because they were <em>forbidden</em> from participating in open source for the company’s benefit, or that you were fined due to <em>CRA violations</em> and mismanagement in sharing security fixes with the correct parties. The faster you start with the process, the less likely those risks will materialize.</p> Creating Debian packages from upstream Git https://optimizedbyotto.com/post/debian-packaging-from-git/Mon, 26 May 2025 00:00:00 +0000 https://optimizedbyotto.com/post/debian-packaging-from-git/ <img src="https://optimizedbyotto.com/post/debian-packaging-from-git/git-to-deb.jpg" alt="Featured image of post Creating Debian packages from upstream Git" /><p>In this post, I demonstrate the optimal workflow for creating new Debian packages in 2025, preserving the upstream Git history. The motivation for this is to lower the barrier for sharing improvements to and from upstream, and to improve software provenance and supply-chain security by making it easy to inspect every change at any level using standard Git tooling.</p>
<p>Key elements of this workflow include:</p>
<ul>
<li>Using a Git fork/clone of the upstream repository as the starting point for creating Debian packaging repositories.</li>
<li>Consistent use of the same <code>git-buildpackage</code> commands, with all package-specific options in <code>gbp.conf</code>.</li>
<li>DEP-14 tag and branch names for an optimal <a class="link" href="https://optimizedbyotto.com/post/debian-source-package-git/" >Git packaging repository structure</a>.</li>
<li>Pristine-tar and upstream signatures for supply-chain security.</li>
<li>Use of <code>Files-Excluded</code> in the <code>debian/copyright</code> file to filter out unwanted files in Debian.</li>
<li>Patch queues to easily rebase and cherry-pick changes across Debian and upstream branches.</li>
<li>Efficient use of Salsa, Debian’s GitLab instance, for both automated feedback from CI systems and human feedback from peer reviews.</li>
</ul>
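As a sketch of what centralizing package-specific options in <code>gbp.conf</code> can look like, a minimal <code>debian/gbp.conf</code> using the DEP-14 branch names might contain (the option names are real <code>git-buildpackage</code> settings; the exact contents depend on the package):

```ini
# Minimal example gbp.conf; branch names follow DEP-14
[DEFAULT]
debian-branch = debian/latest
upstream-branch = upstream/latest
pristine-tar = True
```

With these in place, plain <code>gbp buildpackage</code> and friends behave consistently for everyone working on the repository, without long command lines.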
<p>To make the instructions so concrete that anyone can repeat all the steps themselves on a real package, I demonstrate the steps by packaging the command-line tool <a class="link" href="https://eradman.com/entrproject/" target="_blank" rel="noopener"
>Entr</a>. It is written in C, has very few dependencies, and its final Debian source package structure is simple, yet exemplifies all the important parts that go into a complete Debian package:</p>
<ol>
<li>Creating a new packaging repository and publishing it under your personal namespace on salsa.debian.org.</li>
<li>Using <code>dh_make</code> to create the initial Debian packaging.</li>
<li>Posting the first draft of the Debian packaging as a Merge Request (MR) and using Salsa CI to verify Debian packaging quality.</li>
<li>Running local builds efficiently and iterating on the packaging process.</li>
</ol>
<h2 id="create-new-debian-packaging-repository-from-the-existing-upstream-project-git-repository"><a href="#create-new-debian-packaging-repository-from-the-existing-upstream-project-git-repository" class="header-anchor"></a>Create new Debian packaging repository from the existing upstream project Git repository
</h2><p>First, create a new empty directory, then clone the upstream Git repository inside it:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">mkdir debian-entr
cd debian-entr
git clone --origin upstreamvcs --branch master \
--single-branch https://github.com/eradman/entr.git</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mkdir debian-entr
</span></span><span style="display:flex;"><span>cd debian-entr
</span></span><span style="display:flex;"><span>git clone --origin upstreamvcs --branch master <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --single-branch https://github.com/eradman/entr.git</span></span></code></pre></div></div></div>
<p>Using a clean directory makes it easier to inspect the build artifacts of a Debian package, which will be output in the parent directory of the Debian source directory.</p>
<p>The extra parameters given to <code>git clone</code> lay the foundation for the <a class="link" href="https://optimizedbyotto.com/post/debian-source-package-git/" >Debian packaging Git repository structure</a> where the upstream Git remote name is <code>upstreamvcs</code>. Only the upstream main branch is tracked to avoid cluttering Git history with upstream development branches that are irrelevant for packaging in Debian.</p>
<p>Next, enter the Git repository directory and list the Git tags. Pick the latest upstream <em>release tag</em> as the commit to start the branch <code>upstream/latest</code>. This <em>latest</em> refers to the upstream <em>release</em>, not the upstream development branch. Immediately after, branch off the <code>debian/latest</code> branch, which will have the actual Debian packaging files in the <code>debian/</code> subdirectory.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">cd entr
git tag # shows the latest upstream release tag was '5.6'
git checkout -b upstream/latest 5.6
git checkout -b debian/latest</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cd entr
</span></span><span style="display:flex;"><span>git tag <span style="color:#75715e"># shows the latest upstream release tag was '5.6'</span>
</span></span><span style="display:flex;"><span>git checkout -b upstream/latest 5.6
</span></span><span style="display:flex;"><span>git checkout -b debian/latest</span></span></code></pre></div></div></div>
<pre class="mermaid">%%{init: { 'gitGraph': { 'mainBranchName': 'master' } } }%%
gitGraph:
checkout master
commit id: "Upstream 5.6 release" tag: "5.6"
branch upstream/latest
checkout upstream/latest
commit id: "New upstream version 5.6" tag: "upstream/5.6"
branch debian/latest
checkout debian/latest
commit id: "Initial Debian packaging"
commit id: "Additional change 1"
commit id: "Additional change 2"
commit id: "Additional change 3"
</pre>
<p>At this point, the repository is structured according to DEP-14 conventions, ensuring a clear separation between upstream and Debian packaging changes, but there are no Debian changes yet. Next, add the Salsa repository as a new remote called <code>origin</code>, matching the default remote name in Git.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">git remote add origin git@salsa.debian.org:otto/entr-demo.git
git push --set-upstream origin debian/latest</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>git remote add origin git@salsa.debian.org:otto/entr-demo.git
</span></span><span style="display:flex;"><span>git push --set-upstream origin debian/latest</span></span></code></pre></div></div></div>
<p>This is an important preparation step to later be able to create a Merge Request on Salsa that targets the <code>debian/latest</code> branch, which does not yet have any <code>debian/</code> directory.</p>
<h2 id="launch-a-debian-sid-unstable-container-to-run-builds-in"><a href="#launch-a-debian-sid-unstable-container-to-run-builds-in" class="header-anchor"></a>Launch a Debian Sid (unstable) container to run builds in
</h2><p>To ensure that all packaging tools are of the latest versions, run everything inside a fresh Sid container. This has two benefits: you are guaranteed to have the most up-to-date toolchain, and your host system stays clean without getting polluted by various extra packages. Additionally, this approach works even if your host system is not Debian/Ubuntu.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">cd ..
podman run --interactive --tty --rm --shm-size=1G --cap-add SYS_PTRACE \
--env='DEB*' --volume=$PWD:/tmp/test --workdir=/tmp/test debian:sid bash</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cd ..
</span></span><span style="display:flex;"><span>podman run --interactive --tty --rm --shm-size<span style="color:#f92672">=</span>1G --cap-add SYS_PTRACE <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --env<span style="color:#f92672">=</span><span style="color:#e6db74">'DEB*'</span> --volume<span style="color:#f92672">=</span>$PWD:/tmp/test --workdir<span style="color:#f92672">=</span>/tmp/test debian:sid bash</span></span></code></pre></div></div></div>
<p>Note that the container should be started from the parent directory of the Git repository, not inside it. The <code>--volume</code> parameter will loop-mount the current directory inside the container. Thus all files created and modified are on the host system, and will persist after the container shuts down.</p>
<p>Once inside the container, install the basic dependencies:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">apt update -q && apt install -q --yes git-buildpackage dpkg-dev dh-make</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>apt update -q <span style="color:#f92672">&&</span> apt install -q --yes git-buildpackage dpkg-dev dh-make</span></span></code></pre></div></div></div>
<h2 id="automate-creating-the-debian-files-with-dh-make"><a href="#automate-creating-the-debian-files-with-dh-make" class="header-anchor"></a>Automate creating the <code>debian/</code> files with dh-make
</h2><p>To create the files needed for the actual Debian packaging, use <a class="link" href="https://manpages.debian.org/unstable/dh-make/dh_make.1.en.html" target="_blank" rel="noopener"
><code>dh_make</code></a>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-6"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-6" style="display:none;"># dh_make --packagename entr_5.6 --single --createorig
Maintainer Name : Otto Kekäläinen
Email-Address : otto@debian.org
Date : Sat, 15 Feb 2025 01:17:51 +0000
Package Name : entr
Version : 5.6
License : blank
Package Type : single
Are the details correct? [Y/n/q]
Done. Please edit the files in the debian/ subdirectory now.</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e"># dh_make --packagename entr_5.6 --single --createorig</span>
</span></span><span style="display:flex;"><span>Maintainer Name : Otto Kekäläinen
</span></span><span style="display:flex;"><span>Email-Address : otto@debian.org
</span></span><span style="display:flex;"><span>Date : Sat, <span style="color:#ae81ff">15</span> Feb <span style="color:#ae81ff">2025</span> 01:17:51 +0000
</span></span><span style="display:flex;"><span>Package Name : entr
</span></span><span style="display:flex;"><span>Version : 5.6
</span></span><span style="display:flex;"><span>License : blank
</span></span><span style="display:flex;"><span>Package Type : single
</span></span><span style="display:flex;"><span>Are the details correct? <span style="color:#f92672">[</span>Y/n/q<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Done. Please edit the files in the debian/ subdirectory now.</span></span></code></pre></div></div></div>
<p>Due to how <code>dh_make</code> works, the package name and version need to be written as a single underscore-separated string. In this case, you should choose <code>--single</code> to specify that the package type is a single binary package. Other options would be <code>--library</code> for library packages (see <a class="link" href="https://salsa.debian.org/gnome-team/libgda5/-/tree/debian/latest/debian" target="_blank" rel="noopener"
>libgda5 sources</a> as an example) or <code>--indep</code> (see <a class="link" href="https://salsa.debian.org/dns-team/dns-root-data/-/tree/debian/master/debian" target="_blank" rel="noopener"
>dns-root-data sources</a> as an example). The <code>--createorig</code> option will create a mock upstream release tarball (<code>entr_5.6.orig.tar.xz</code>) from the current release directory. This is necessary for historical reasons: <code>dh_make</code> predates the widespread use of Git repositories, from the era when Debian source packages were based on upstream release tarballs (e.g. <code>*.tar.gz</code>).</p>
<p>At this stage, a <code>debian/</code> directory has been created with template files, and you can start modifying the files and iterating towards actual working packaging.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-7"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-7" style="display:none;">git add debian/
git commit -a -m "Initial Debian packaging"</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>git add debian/
</span></span><span style="display:flex;"><span>git commit -a -m <span style="color:#e6db74">"Initial Debian packaging"</span></span></span></code></pre></div></div></div>
<h2 id="review-the-files"><a href="#review-the-files" class="header-anchor"></a>Review the files
</h2><p>The full list of files after the above steps with <code>dh_make</code> would be:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-8"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-8" style="display:none;">|-- entr
| |-- LICENSE
| |-- Makefile.bsd
| |-- Makefile.linux
| |-- Makefile.linux-compat
| |-- Makefile.macos
| |-- NEWS
| |-- README.md
| |-- configure
| |-- data.h
| |-- debian
| | |-- README.Debian
| | |-- README.source
| | |-- changelog
| | |-- control
| | |-- copyright
| | |-- gbp.conf
| | |-- entr-docs.docs
| | |-- entr.cron.d.ex
| | |-- entr.doc-base.ex
| | |-- manpage.1.ex
| | |-- manpage.md.ex
| | |-- manpage.sgml.ex
| | |-- manpage.xml.ex
| | |-- postinst.ex
| | |-- postrm.ex
| | |-- preinst.ex
| | |-- prerm.ex
| | |-- rules
| | |-- salsa-ci.yml.ex
| | |-- source
| | | `-- format
| | |-- upstream
| | | `-- metadata.ex
| | `-- watch.ex
| |-- entr.1
| |-- entr.c
| |-- missing
| | |-- compat.h
| | |-- kqueue_inotify.c
| | |-- strlcpy.c
| | `-- sys
| | `-- event.h
| |-- status.c
| |-- status.h
| `-- system_test.sh
`-- entr_5.6.orig.tar.xz</code><pre><code>|-- entr
| |-- LICENSE
| |-- Makefile.bsd
| |-- Makefile.linux
| |-- Makefile.linux-compat
| |-- Makefile.macos
| |-- NEWS
| |-- README.md
| |-- configure
| |-- data.h
| |-- debian
| | |-- README.Debian
| | |-- README.source
| | |-- changelog
| | |-- control
| | |-- copyright
| | |-- gbp.conf
| | |-- entr-docs.docs
| | |-- entr.cron.d.ex
| | |-- entr.doc-base.ex
| | |-- manpage.1.ex
| | |-- manpage.md.ex
| | |-- manpage.sgml.ex
| | |-- manpage.xml.ex
| | |-- postinst.ex
| | |-- postrm.ex
| | |-- preinst.ex
| | |-- prerm.ex
| | |-- rules
| | |-- salsa-ci.yml.ex
| | |-- source
| | | `-- format
| | |-- upstream
| | | `-- metadata.ex
| | `-- watch.ex
| |-- entr.1
| |-- entr.c
| |-- missing
| | |-- compat.h
| | |-- kqueue_inotify.c
| | |-- strlcpy.c
| | `-- sys
| | `-- event.h
| |-- status.c
| |-- status.h
| `-- system_test.sh
`-- entr_5.6.orig.tar.xz</code></pre></div>
<p>You can browse these files in the <a class="link" href="https://salsa.debian.org/otto/entr-demo/-/tree/debian/latest-dh-make/debian" target="_blank" rel="noopener"
>demo repository</a>.</p>
<p>The mandatory files in the <code>debian/</code> directory are:</p>
<ul>
<li><code>changelog</code>,</li>
<li><code>control</code>,</li>
<li><code>copyright</code>,</li>
<li>and <code>rules</code>.</li>
</ul>
<p>All the other files have been created for convenience so the packager has template files to work from. The files with the suffix <code>.ex</code> are example files that won’t have any effect until their content is adjusted and the suffix removed.</p>
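<p>For instance, activating one of those template files could look like the following sketch (the file name mirrors the listing above; the exact set of templates generated for your package may vary):</p>

```shell
# Setup for illustration only: fake a template file like the ones listed above
mkdir -p debian && touch debian/salsa-ci.yml.ex
# After adjusting the file's content, drop the .ex suffix to activate it
mv debian/salsa-ci.yml.ex debian/salsa-ci.yml
```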
<p>For detailed explanations of the purpose of each file in the <code>debian/</code> subdirectory, see the following resources:</p>
<ul>
<li><a class="link" href="https://www.debian.org/doc/debian-policy/" target="_blank" rel="noopener"
>The Debian Policy Manual</a>: Describes the structure of the operating system, the package archive, and the requirements for packages to be included in the Debian archive.</li>
<li><a class="link" href="https://www.debian.org/doc/manuals/developers-reference/developers-reference.en.html" target="_blank" rel="noopener"
>The Developer’s Reference</a>: A collection of best practices and process descriptions Debian packagers are expected to follow while interacting with one another.</li>
<li><a class="link" href="https://manpages.debian.org/unstable/debhelper/" target="_blank" rel="noopener"
>Debhelper man pages</a>: Detailed information on how the Debian package build system works and how the contents of the various files in ‘debian/’ affect the end result.</li>
</ul>
<p>As Entr, the package used in this example, is a real package that already exists in the Debian archive, you may want to browse the actual Debian packaging source at <a class="link" href="https://salsa.debian.org/debian/entr/-/tree/debian/latest/debian" target="_blank" rel="noopener"
>https://salsa.debian.org/debian/entr/-/tree/debian/latest/debian</a> for reference.</p>
<p>Most of these files have standardized formatting conventions to make collaboration easier. To automatically format the files following the most popular conventions, simply run <code>wrap-and-sort -vast</code> or <code>debputy reformat --style=black</code>.</p>
<h2 id="identify-build-dependencies"><a href="#identify-build-dependencies" class="header-anchor"></a>Identify build dependencies
</h2><p>The most common reason for builds to fail is missing dependencies. The easiest way to identify which Debian package ships a required dependency is to use <a class="link" href="https://manpages.debian.org/unstable/apt-file/apt-file.1.en.html" target="_blank" rel="noopener"
>apt-file</a>. If, for example, a build fails complaining that <code>pcre2posix.h cannot be found</code> or that <code>libpcre2-posix.so</code> is missing, you can use these commands:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-9"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-9" style="display:none;">$ apt install -q --yes apt-file && apt-file update
$ apt-file search pcre2posix.h
libpcre2-dev: /usr/include/pcre2posix.h
$ apt-file search libpcre2-posix.so
libpcre2-dev: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so
libpcre2-posix3: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so.3
libpcre2-posix3: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so.3.0.6</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ apt install -q --yes apt-file <span style="color:#f92672">&&</span> apt-file update
</span></span><span style="display:flex;"><span>$ apt-file search pcre2posix.h
</span></span><span style="display:flex;"><span>libpcre2-dev: /usr/include/pcre2posix.h
</span></span><span style="display:flex;"><span>$ apt-file search libpcre2-posix.so
</span></span><span style="display:flex;"><span>libpcre2-dev: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so
</span></span><span style="display:flex;"><span>libpcre2-posix3: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so.3
</span></span><span style="display:flex;"><span>libpcre2-posix3: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so.3.0.6</span></span></code></pre></div></div></div>
<p>The output above implies that <code>debian/control</code> should be extended to declare a <code>Build-Depends: libpcre2-dev</code> relationship.</p>
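<p>As a sketch only (not the actual Entr packaging, which does not need PCRE2), a <code>debian/control</code> declaring such a build dependency could look like this; the maintainer name and description are placeholders:</p>

```
Source: entr
Section: misc
Priority: optional
Maintainer: Jane Packager <packager@example.com>
Build-Depends: debhelper-compat (= 13),
               libpcre2-dev
Standards-Version: 4.7.0

Package: entr
Architecture: any
Depends: ${misc:Depends}, ${shlibs:Depends}
Description: run arbitrary commands when files change
 Placeholder long description for this sketch.
```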
<p>There is also <a class="link" href="https://manpages.debian.org/unstable/devscripts/dpkg-depcheck.1.en.html" target="_blank" rel="noopener"
>dpkg-depcheck</a> that uses <a class="link" href="https://manpages.debian.org/unstable/strace/strace.1.en.html" target="_blank" rel="noopener"
>strace</a> to trace which files the build process tries to access, and lists which Debian packages those files belong to. Example usage:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-10"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-10" style="display:none;">dpkg-depcheck -b debian/rules build</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>dpkg-depcheck -b debian/rules build</span></span></code></pre></div></div></div>
<h2 id="build-the-debian-sources-to-generate-the-deb-package"><a href="#build-the-debian-sources-to-generate-the-deb-package" class="header-anchor"></a>Build the Debian sources to generate the .deb package
</h2><p>After the first pass of refining the contents of the files in <code>debian/</code>, test the build by running <a class="link" href="https://manpages.debian.org/unstable/dpkg-dev/dpkg-buildpackage.1.en.html" target="_blank" rel="noopener"
>dpkg-buildpackage</a> inside the container:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-11"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-11" style="display:none;">dpkg-buildpackage -uc -us -b</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>dpkg-buildpackage -uc -us -b</span></span></code></pre></div></div></div>
<p>The options <code>-uc -us</code> will skip signing the resulting Debian source package and other build artifacts. The <code>-b</code> option will skip creating a source package and only build the (binary) <code>*.deb</code> packages.</p>
<p>The output is very verbose and provides plenty of context about what happens during the build, which makes debugging build failures easier. In the build log of <code>entr</code> you will see, for example, the line <code>dh binary --buildsystem=makefile</code>. This and other <code>dh</code> commands can also be run manually if you need to quickly repeat only part of the build while debugging build failures.</p>
<p>To see which files were generated or modified by the build, simply run <code>git status --ignored</code>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-12"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-12" style="display:none;">$ git status --ignored
On branch debian/latest
Untracked files:
(use "git add <file>..." to include in what will be committed)
debian/debhelper-build-stamp
debian/entr.debhelper.log
debian/entr.substvars
debian/files
Ignored files:
(use "git add -f <file>..." to include in what will be committed)
Makefile
compat.c
compat.o
debian/.debhelper/
debian/entr/
entr
entr.o
status.o</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ git status --ignored
</span></span><span style="display:flex;"><span>On branch debian/latest
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Untracked files:
</span></span><span style="display:flex;"><span> <span style="color:#f92672">(</span>use <span style="color:#e6db74">"git add <file>..."</span> to include in what will be committed<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span> debian/debhelper-build-stamp
</span></span><span style="display:flex;"><span> debian/entr.debhelper.log
</span></span><span style="display:flex;"><span> debian/entr.substvars
</span></span><span style="display:flex;"><span> debian/files
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Ignored files:
</span></span><span style="display:flex;"><span> <span style="color:#f92672">(</span>use <span style="color:#e6db74">"git add -f <file>..."</span> to include in what will be committed<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span> Makefile
</span></span><span style="display:flex;"><span> compat.c
</span></span><span style="display:flex;"><span> compat.o
</span></span><span style="display:flex;"><span> debian/.debhelper/
</span></span><span style="display:flex;"><span> debian/entr/
</span></span><span style="display:flex;"><span> entr
</span></span><span style="display:flex;"><span> entr.o
</span></span><span style="display:flex;"><span> status.o</span></span></code></pre></div></div></div>
<p>Re-running <code>dpkg-buildpackage</code> will include running the command <code>dh clean</code>, which, assuming it is configured correctly in the <code>debian/rules</code> file, will reset the source directory to its original pristine state. The same can of course also be done with the regular Git commands <code>git reset --hard; git clean -fdx</code>. To avoid accidentally committing unnecessary build artifacts to Git, a <code>debian/.gitignore</code> can be useful; it would typically include all four files listed as “untracked” above.</p>
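<p>As a sketch, such a <code>debian/.gitignore</code> covering the four untracked files above could contain (entries are relative to the <code>debian/</code> directory):</p>

```
debhelper-build-stamp
entr.debhelper.log
entr.substvars
files
```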
<p>After a successful build you would have the following files:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-13"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-13" style="display:none;">|-- entr
| |-- LICENSE
| |-- Makefile -> Makefile.linux
| |-- Makefile.bsd
| |-- Makefile.linux
| |-- Makefile.linux-compat
| |-- Makefile.macos
| |-- NEWS
| |-- README.md
| |-- compat.c
| |-- compat.o
| |-- configure
| |-- data.h
| |-- debian
| | |-- README.source.md
| | |-- changelog
| | |-- control
| | |-- copyright
| | |-- debhelper-build-stamp
| | |-- docs
| | |-- entr
| | | |-- DEBIAN
| | | | |-- control
| | | | `-- md5sums
| | | `-- usr
| | | |-- bin
| | | | `-- entr
| | | `-- share
| | | |-- doc
| | | | `-- entr
| | | | |-- NEWS.gz
| | | | |-- README.md
| | | | |-- changelog.Debian.gz
| | | | `-- copyright
| | | `-- man
| | | `-- man1
| | | `-- entr.1.gz
| | |-- entr.debhelper.log
| | |-- entr.substvars
| | |-- files
| | |-- gbp.conf
| | |-- patches
| | | |-- PR149-expand-aliases-in-system-test-script.patch
| | | |-- series
| | | |-- system-test-skip-no-tty.patch
| | | `-- system-test-with-system-binary.patch
| | |-- rules
| | |-- salsa-ci.yml
| | |-- source
| | | `-- format
| | |-- tests
| | | `-- control
| | |-- upstream
| | | |-- metadata
| | | `-- signing-key.asc
| | `-- watch
| |-- entr
| |-- entr.1
| |-- entr.c
| |-- entr.o
| |-- missing
| | |-- compat.h
| | |-- kqueue_inotify.c
| | |-- strlcpy.c
| | `-- sys
| | `-- event.h
| |-- status.c
| |-- status.h
| |-- status.o
| `-- system_test.sh
|-- entr-dbgsym_5.6-1_amd64.deb
|-- entr_5.6-1.debian.tar.xz
|-- entr_5.6-1.dsc
|-- entr_5.6-1_amd64.buildinfo
|-- entr_5.6-1_amd64.changes
|-- entr_5.6-1_amd64.deb
`-- entr_5.6.orig.tar.xz</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>|-- entr
</span></span><span style="display:flex;"><span>| |-- LICENSE
</span></span><span style="display:flex;"><span>| |-- Makefile -> Makefile.linux
</span></span><span style="display:flex;"><span>| |-- Makefile.bsd
</span></span><span style="display:flex;"><span>| |-- Makefile.linux
</span></span><span style="display:flex;"><span>| |-- Makefile.linux-compat
</span></span><span style="display:flex;"><span>| |-- Makefile.macos
</span></span><span style="display:flex;"><span>| |-- NEWS
</span></span><span style="display:flex;"><span>| |-- README.md
</span></span><span style="display:flex;"><span>| |-- compat.c
</span></span><span style="display:flex;"><span>| |-- compat.o
</span></span><span style="display:flex;"><span>| |-- configure
</span></span><span style="display:flex;"><span>| |-- data.h
</span></span><span style="display:flex;"><span>| |-- debian
</span></span><span style="display:flex;"><span>| | |-- README.source.md
</span></span><span style="display:flex;"><span>| | |-- changelog
</span></span><span style="display:flex;"><span>| | |-- control
</span></span><span style="display:flex;"><span>| | |-- copyright
</span></span><span style="display:flex;"><span>| | |-- debhelper-build-stamp
</span></span><span style="display:flex;"><span>| | |-- docs
</span></span><span style="display:flex;"><span>| | |-- entr
</span></span><span style="display:flex;"><span>| | | |-- DEBIAN
</span></span><span style="display:flex;"><span>| | | | |-- control
</span></span><span style="display:flex;"><span>| | | | <span style="color:#e6db74">`</span>-- md5sums
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- usr
</span></span><span style="display:flex;"><span>| | | |-- bin
</span></span><span style="display:flex;"><span>| | | | <span style="color:#e6db74">`</span>-- entr
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- share
</span></span><span style="display:flex;"><span>| | | |-- doc
</span></span><span style="display:flex;"><span>| | | | <span style="color:#e6db74">`</span>-- entr
</span></span><span style="display:flex;"><span>| | | | |-- NEWS.gz
</span></span><span style="display:flex;"><span>| | | | |-- README.md
</span></span><span style="display:flex;"><span>| | | | |-- changelog.Debian.gz
</span></span><span style="display:flex;"><span>| | | | <span style="color:#e6db74">`</span>-- copyright
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- man
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- man1
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- entr.1.gz
</span></span><span style="display:flex;"><span>| | |-- entr.debhelper.log
</span></span><span style="display:flex;"><span>| | |-- entr.substvars
</span></span><span style="display:flex;"><span>| | |-- files
</span></span><span style="display:flex;"><span>| | |-- gbp.conf
</span></span><span style="display:flex;"><span>| | |-- patches
</span></span><span style="display:flex;"><span>| | | |-- PR149-expand-aliases-in-system-test-script.patch
</span></span><span style="display:flex;"><span>| | | |-- series
</span></span><span style="display:flex;"><span>| | | |-- system-test-skip-no-tty.patch
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- system-test-with-system-binary.patch
</span></span><span style="display:flex;"><span>| | |-- rules
</span></span><span style="display:flex;"><span>| | |-- salsa-ci.yml
</span></span><span style="display:flex;"><span>| | |-- source
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- format
</span></span><span style="display:flex;"><span>| | |-- tests
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- control
</span></span><span style="display:flex;"><span>| | |-- upstream
</span></span><span style="display:flex;"><span>| | | |-- metadata
</span></span><span style="display:flex;"><span>| | | <span style="color:#e6db74">`</span>-- signing-key.asc
</span></span><span style="display:flex;"><span>| | <span style="color:#e6db74">`</span>-- watch
</span></span><span style="display:flex;"><span>| |-- entr
</span></span><span style="display:flex;"><span>| |-- entr.1
</span></span><span style="display:flex;"><span>| |-- entr.c
</span></span><span style="display:flex;"><span>| |-- entr.o
</span></span><span style="display:flex;"><span>| |-- missing
</span></span><span style="display:flex;"><span>| | |-- compat.h
</span></span><span style="display:flex;"><span>| | |-- kqueue_inotify.c
</span></span><span style="display:flex;"><span>| | |-- strlcpy.c
</span></span><span style="display:flex;"><span>| | <span style="color:#e6db74">`</span>-- sys
</span></span><span style="display:flex;"><span>| | <span style="color:#e6db74">`</span>-- event.h
</span></span><span style="display:flex;"><span>| |-- status.c
</span></span><span style="display:flex;"><span>| |-- status.h
</span></span><span style="display:flex;"><span>| |-- status.o
</span></span><span style="display:flex;"><span>| <span style="color:#e6db74">`</span>-- system_test.sh
</span></span><span style="display:flex;"><span>|-- entr-dbgsym_5.6-1_amd64.deb
</span></span><span style="display:flex;"><span>|-- entr_5.6-1.debian.tar.xz
</span></span><span style="display:flex;"><span>|-- entr_5.6-1.dsc
</span></span><span style="display:flex;"><span>|-- entr_5.6-1_amd64.buildinfo
</span></span><span style="display:flex;"><span>|-- entr_5.6-1_amd64.changes
</span></span><span style="display:flex;"><span>|-- entr_5.6-1_amd64.deb
</span></span><span style="display:flex;"><span><span style="color:#e6db74">`</span>-- entr_5.6.orig.tar.xz</span></span></code></pre></div></div></div>
<p>The contents of <code>debian/entr</code> are essentially what goes into the resulting <code>entr_5.6-1_amd64.deb</code> package. Familiarizing yourself with the majority of the files in the original upstream source, as well as with all the resulting build artifacts, is time-consuming, but it is a necessary investment to produce high-quality Debian packages.</p>
<p>There are also tools such as <a class="link" href="https://salsa.debian.org/debian/debcraft" target="_blank" rel="noopener"
>Debcraft</a> that automate generating the build artifacts in separate output directories for each build, thus making it easy to compare the changes to correlate what change in the Debian packaging led to what change in the resulting build artifacts.</p>
<h2 id="re-run-the-initial-import-with-git-buildpackage"><a href="#re-run-the-initial-import-with-git-buildpackage" class="header-anchor"></a>Re-run the initial import with git-buildpackage
</h2><p>When upstreams publish releases as tarballs, those tarballs should also be imported for optimal software supply-chain security, in particular if upstream also publishes cryptographic signatures that can be used to verify their authenticity.</p>
<p>To achieve this, the files <code>debian/watch</code>, <code>debian/upstream/signing-key.asc</code>, and <code>debian/gbp.conf</code> need to be present with the correct options. In the <code>gbp.conf</code> file, ensure you have the correct options based on:</p>
<ol>
<li>Does upstream release tarballs? If so, enforce <code>pristine-tar = True</code>.</li>
<li>Does upstream sign the tarballs? If so, configure explicit signature checking with <code>upstream-signatures = on</code>.</li>
<li>Does upstream have a Git repository, and does it have release Git tags? If so, configure the release Git tag format, e.g. <code>upstream-vcs-tag = %(version%~%.)s</code>.</li>
</ol>
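<p>Putting the options above together, a hypothetical <code>debian/gbp.conf</code> could look like the sketch below. The branch names follow the DEP-14 layout used in this series; adjust all values to the upstream project at hand:</p>

```
[DEFAULT]
debian-branch = debian/latest
upstream-branch = upstream/latest
pristine-tar = True
upstream-signatures = on
upstream-vcs-tag = %(version%~%.)s
```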
<p>To validate that the above files are working correctly, run <code>gbp import-orig</code> with the current version explicitly defined:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-14"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-14" style="display:none;">$ gbp import-orig --uscan --upstream-version 5.6
gbp:info: Launching uscan...
gpgv: Signature made 7. Aug 2024 07.43.27 PDT
gpgv: using RSA key 519151D83E83D40A232B4D615C418B8631BC7C26
gpgv: Good signature from "Eric Radman <ericshane@eradman.com>"
gbp:info: Using uscan downloaded tarball ../entr_5.6.orig.tar.gz
gbp:info: Importing '../entr_5.6.orig.tar.gz' to branch 'upstream/latest'...
gbp:info: Source package is entr
gbp:info: Upstream version is 5.6
gbp:info: Replacing upstream source on 'debian/latest'
gbp:info: Running Postimport hook
gbp:info: Successfully imported version 5.6 of ../entr_5.6.orig.tar.gz</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ gbp import-orig --uscan --upstream-version 5.6
</span></span><span style="display:flex;"><span>gbp:info: Launching uscan...
</span></span><span style="display:flex;"><span>gpgv: Signature made 7. Aug <span style="color:#ae81ff">2024</span> 07.43.27 PDT
</span></span><span style="display:flex;"><span>gpgv: using RSA key 519151D83E83D40A232B4D615C418B8631BC7C26
</span></span><span style="display:flex;"><span>gpgv: Good signature from <span style="color:#e6db74">"Eric Radman <ericshane@eradman.com>"</span>
</span></span><span style="display:flex;"><span>gbp:info: Using uscan downloaded tarball ../entr_5.6.orig.tar.gz
</span></span><span style="display:flex;"><span>gbp:info: Importing <span style="color:#e6db74">'../entr_5.6.orig.tar.gz'</span> to branch <span style="color:#e6db74">'upstream/latest'</span>...
</span></span><span style="display:flex;"><span>gbp:info: Source package is entr
</span></span><span style="display:flex;"><span>gbp:info: Upstream version is 5.6
</span></span><span style="display:flex;"><span>gbp:info: Replacing upstream source on <span style="color:#e6db74">'debian/latest'</span>
</span></span><span style="display:flex;"><span>gbp:info: Running Postimport hook
</span></span><span style="display:flex;"><span>gbp:info: Successfully imported version 5.6 of ../entr_5.6.orig.tar.gz</span></span></code></pre></div></div></div>
<p>As the original packaging was done based on the upstream release Git tag, the above command will fetch the tarball release, create the <code>pristine-tar</code> branch, and store the tarball delta on it. This command will also attempt to create the tag <code>upstream/5.6</code> on the <code>upstream/latest</code> branch.</p>
<h3 id="import-new-upstream-versions-in-the-future"><a href="#import-new-upstream-versions-in-the-future" class="header-anchor"></a>Import new upstream versions in the future
</h3><p>Forking the upstream Git repository, creating the initial packaging, and creating the DEP-14 branch structure are all one-off tasks needed only when packaging the software for the first time.</p>
<p>Going forward, to import new upstream releases, one would simply run <code>git fetch upstreamvcs; gbp import-orig --uscan</code>, which fetches the upstream Git tags, checks for new upstream tarballs, and automatically downloads, verifies, and imports the new version. See the <a class="link" href="https://optimizedbyotto.com/post/debian-source-package-git/#try-it-yourself-example-repository-galera-4-demo" ><code>galera-4-demo</code> example in the <em>Debian source packages in Git explained</em> post</a> as a demo you can try running yourself and examine in detail.</p>
<p>You can also try running <code>gbp import-orig --uscan</code> without specifying a version. It will notice that Entr version 5.7 is now available, fetch it, and import it.</p>
<h2 id="build-using-git-buildpackage"><a href="#build-using-git-buildpackage" class="header-anchor"></a>Build using git-buildpackage
</h2><p>From this stage onwards you should build the package using <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-buildpackage.1.en.html" target="_blank" rel="noopener"
><code>gbp buildpackage</code></a>, which will do a more comprehensive build.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-15"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-15" style="display:none;">gbp buildpackage -uc -us</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gbp buildpackage -uc -us</span></span></code></pre></div></div></div>
<p>The <code>git-buildpackage</code> build also includes running <a class="link" href="https://manpages.debian.org/unstable/lintian/lintian.1.en.html" target="_blank" rel="noopener"
>Lintian</a> to find potential Debian policy violations in the sources or in the resulting <code>.deb</code> binary packages. Many Debian Developers run <code>lintian -EviIL +pedantic</code> after every build to check that there are no new nags, and to validate that changes intended to address previous Lintian nags were correct.</p>
<h2 id="open-a-merge-request-on-salsa-for-debian-packaging-review"><a href="#open-a-merge-request-on-salsa-for-debian-packaging-review" class="header-anchor"></a>Open a Merge Request on Salsa for Debian packaging review
</h2><p>Getting everything perfectly right takes a lot of effort, and may require reaching out to an experienced <a class="link" href="https://www.debian.org/intro/people" target="_blank" rel="noopener"
>Debian Developer</a> for review and guidance. Thus, you should aim to publish your initial packaging work on Salsa, Debian’s GitLab instance, for review and feedback as early as possible.</p>
<p>To make it easy for someone to review what you have done, rename your <code>debian/latest</code> branch to another name, for example <code>next/debian/latest</code>, and open a Merge Request that targets the <code>debian/latest</code> branch on your Salsa fork, which still contains only the unmodified upstream files.</p>
<p>If you have followed the workflow in this post so far, you can simply run:</p>
<ol>
<li><code>git checkout -b next/debian/latest</code></li>
<li><code>git push --set-upstream origin next/debian/latest</code></li>
<li>Open in a browser the URL visible in the Git remote response</li>
<li>Write the Merge Request description in case the default text from your commit is not enough</li>
<li>Mark the MR as “Draft” using the checkbox</li>
<li>Publish the MR and request feedback</li>
</ol>
<p>Once a Merge Request exists, discussion regarding what additional changes are needed can be conducted as MR comments. With an MR, you can easily iterate on the contents of <code>next/debian/latest</code>, rebase, force push, and request re-review as many times as you want.</p>
<p>While at it, make sure that on the <em>Settings > CI/CD</em> page the <em>CI/CD configuration file</em> field is set to <code>debian/salsa-ci.yml</code> so that the CI can run and give you immediate automated feedback.</p>
<p>For an example of an initial packaging Merge Request, see <a class="link" href="https://salsa.debian.org/otto/entr-demo/-/merge_requests/1" target="_blank" rel="noopener"
>https://salsa.debian.org/otto/entr-demo/-/merge_requests/1</a>.</p>
<h2 id="open-a-merge-request--pull-request-to-fix-upstream-code"><a href="#open-a-merge-request--pull-request-to-fix-upstream-code" class="header-anchor"></a>Open a Merge Request / Pull Request to fix upstream code
</h2><p>Due to Debian’s high quality standards, it is fairly common that, while doing the initial Debian packaging of an open source project, issues are found that stem from the upstream source code. While it is possible to carry extra patches in Debian, it is not good practice to deviate too much from the upstream code with custom Debian patches. Instead, the Debian packager should try to get the fixes applied directly upstream.</p>
<p>Using <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-pq.1.en.html" target="_blank" rel="noopener"
>git-buildpackage patch queues</a> is the most convenient way to make modifications to the upstream source code so that they automatically convert into Debian patches (stored at <code>debian/patches</code>), and can also easily be submitted upstream as any regular Git commit (and rebased and resubmitted many times over).</p>
<p>First, decide if you want to work out of the upstream development branch and later cherry-pick to the Debian packaging branch, or work out of the Debian packaging branch and cherry-pick to an upstream branch.</p>
<p>The example below starts from the upstream development branch and then cherry-picks the commit into the git-buildpackage patch queue:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-16"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-16" style="display:none;">git checkout -b bugfix-branch master
nano entr.c
make
./entr # verify change works as expected
git commit -a -m "Commit title" -m "Commit body"
git push # submit upstream
gbp pq import --force --time-machine=10
git cherry-pick <commit id>
git commit --amend # extend commit message with DEP-3 metadata
gbp buildpackage -uc -us -b
./entr # verify change works as expected
gbp pq export --drop --commit
git commit --amend # Write commit message along lines "Add patch to .."</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>git checkout -b bugfix-branch master
</span></span><span style="display:flex;"><span>nano entr.c
</span></span><span style="display:flex;"><span>make
</span></span><span style="display:flex;"><span>./entr <span style="color:#75715e"># verify change works as expected</span>
</span></span><span style="display:flex;"><span>git commit -a -m <span style="color:#e6db74">"Commit title"</span> -m <span style="color:#e6db74">"Commit body"</span>
</span></span><span style="display:flex;"><span>git push <span style="color:#75715e"># submit upstream</span>
</span></span><span style="display:flex;"><span>gbp pq import --force --time-machine<span style="color:#f92672">=</span><span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>git cherry-pick <commit id>
</span></span><span style="display:flex;"><span>git commit --amend <span style="color:#75715e"># extend commit message with DEP-3 metadata</span>
</span></span><span style="display:flex;"><span>gbp buildpackage -uc -us -b
</span></span><span style="display:flex;"><span>./entr <span style="color:#75715e"># verify change works as expected</span>
</span></span><span style="display:flex;"><span>gbp pq export --drop --commit
</span></span><span style="display:flex;"><span>git commit --amend <span style="color:#75715e"># Write commit message along lines "Add patch to .."</span></span></span></code></pre></div></div></div>
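<p>The DEP-3 metadata referred to in the steps above is a structured header at the top of each file in <code>debian/patches</code>. As a purely hypothetical illustration (all values are made up for this sketch), such a header could look like:</p>

```
Description: Skip system test when no tty is available
 The test suite assumes a controlling terminal, which is not
 present in minimal build environments.
Author: Jane Packager <packager@example.com>
Forwarded: no
Last-Update: 2024-08-07
```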
<p>The example below starts by making the fix on a git-buildpackage patch queue branch, and then cherry-picking it onto the upstream development branch:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-17"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-17" style="display:none;">gbp pq import --force --time-machine=10
nano entr.c
git commit -a -m "Commit title" -m "Commit body"
gbp buildpackage -uc -us -b
./entr # verify change works as expected
gbp pq export --drop --commit
git commit --amend # Write commit message along lines "Add patch to .."
git checkout -b bugfix-branch master
git cherry-pick <commit id>
git commit --amend # prepare commit message for upstream submission
git push # submit upstream</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gbp pq import --force --time-machine<span style="color:#f92672">=</span><span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>nano entr.c
</span></span><span style="display:flex;"><span>git commit -a -m <span style="color:#e6db74">"Commit title"</span> -m <span style="color:#e6db74">"Commit body"</span>
</span></span><span style="display:flex;"><span>gbp buildpackage -uc -us -b
</span></span><span style="display:flex;"><span>./entr <span style="color:#75715e"># verify change works as expected</span>
</span></span><span style="display:flex;"><span>gbp pq export --drop --commit
</span></span><span style="display:flex;"><span>git commit --amend <span style="color:#75715e"># Write commit message along lines "Add patch to .."</span>
</span></span><span style="display:flex;"><span>git checkout -b bugfix-branch master
</span></span><span style="display:flex;"><span>git cherry-pick <commit id>
</span></span><span style="display:flex;"><span>git commit --amend <span style="color:#75715e"># prepare commit message for upstream submission</span>
</span></span><span style="display:flex;"><span>git push <span style="color:#75715e"># submit upstream</span></span></span></code></pre></div></div></div>
<p>The key git-buildpackage commands to enter and exit the patch-queue mode are:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-18"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-18" style="display:none;">gbp pq import --force --time-machine=10
gbp pq export --drop --commit</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gbp pq import --force --time-machine<span style="color:#f92672">=</span><span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>gbp pq export --drop --commit</span></span></code></pre></div></div></div>
<pre class="mermaid">%%{init: { 'gitGraph': { 'mainBranchName': 'debian/latest' } } }%%
gitGraph
checkout debian/latest
commit id: "Initial packaging"
branch patch-queue/debian/latest
checkout patch-queue/debian/latest
commit id: "Delete debian/patches/..."
commit id: "Patch 1 title"
commit id: "Patch 2 title"
commit id: "Patch 3 title"
</pre>
<p>These can be run at any time, regardless of whether any <code>debian/patches</code> existed before, whether existing patches applied cleanly, or whether there were old patch queue branches around. Note that the extra <code>-b</code> in <code>gbp buildpackage -uc -us -b</code> instructs it to build only binary packages, avoiding complaints from <code>dpkg-source</code> about modifications in the upstream sources while building in the patches-applied mode.</p>
<h2 id="programming-language-specific-dh-make-alternatives"><a href="#programming-language-specific-dh-make-alternatives" class="header-anchor"></a>Programming-language specific dh-make alternatives
</h2><p>As each programming language has its own way of building source code, along with many other conventions regarding file layout, Debian has multiple custom tools for creating new Debian source packages for specific programming languages.</p>
<ul>
<li>Go: <a class="link" href="https://manpages.debian.org/unstable/dh-make-golang/dh-make-golang.1.en.html" target="_blank" rel="noopener"
>dh-make-golang</a></li>
<li>Haskell: <a class="link" href="https://manpages.debian.org/unstable/cabal-debian/cabal-debian.1.en.html" target="_blank" rel="noopener"
>cabal-debian</a></li>
<li>Java: <a class="link" href="https://manpages.debian.org/unstable/javahelper/jh_makepkg.1.en.html" target="_blank" rel="noopener"
>jh_makepkg</a></li>
<li>JavaScript/Node.js: <a class="link" href="https://manpages.debian.org/unstable/npm2deb/npm2deb.1.en.html" target="_blank" rel="noopener"
>npm2deb</a></li>
<li>Lua: <a class="link" href="https://manpages.debian.org/unstable/dh-lua/dh-lua.7.en.html" target="_blank" rel="noopener"
>dh-lua</a></li>
<li>OCaml: <a class="link" href="https://manpages.debian.org/unstable/dh-ocaml/dh-ocaml.7.en.html" target="_blank" rel="noopener"
>dh-ocaml</a></li>
<li>Perl: <a class="link" href="https://manpages.debian.org/unstable/dh-make-perl/dh-make-perl.1p.en.html" target="_blank" rel="noopener"
>dh-make-perl</a></li>
<li>PHP: <a class="link" href="https://manpages.debian.org/unstable/pkg-php-tools/pkg-php-tools.7.en.html" target="_blank" rel="noopener"
>pkg-php-tools</a></li>
<li>Ruby: <a class="link" href="https://manpages.debian.org/unstable/gem2deb/gem2deb.1.en.html" target="_blank" rel="noopener"
>gem2deb</a></li>
</ul>
<p>Notably, Python does not have a tool of its own, but there is a <code>dh_make --python</code> option for <a class="link" href="https://manpages.debian.org/unstable/dh-make/dh_make.1.en.html" target="_blank" rel="noopener"
>Python support directly in dh_make itself</a>. The list above is not complete, and many more tools exist. For some languages there are even competing options: for Go, in addition to <code>dh-make-golang</code> there is also <a class="link" href="https://manpages.debian.org/unstable/gophian/gophian.1.en.html" target="_blank" rel="noopener"
>Gophian</a>.</p>
<p>When learning Debian packaging, there is no need to learn these tools upfront. Being aware that they exist is enough, and one can learn them only if and when one starts packaging a project in a new programming language.</p>
<h2 id="the-difference-between-source-git-repository-vs-source-packages-vs-binary-packages"><a href="#the-difference-between-source-git-repository-vs-source-packages-vs-binary-packages" class="header-anchor"></a>The difference between source Git repository vs source packages vs binary packages
</h2><p>As seen in the earlier example, running <code>gbp buildpackage</code> on the Entr packaging repository will produce several files:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-20"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-20" style="display:none;">entr_5.6-1_amd64.changes
entr_5.6-1_amd64.deb
entr_5.6-1.debian.tar.xz
entr_5.6-1.dsc
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc</code><pre><code>entr_5.6-1_amd64.changes
entr_5.6-1_amd64.deb
entr_5.6-1.debian.tar.xz
entr_5.6-1.dsc
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc</code></pre></div>
<p>The <code>entr_5.6-1_amd64.deb</code> is the <em>binary package</em>, which can be installed on a Debian/Ubuntu system. The rest of the files constitute the <em>source package</em>. To do a source-only build, run <code>gbp buildpackage -S</code> and note the files produced:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-21"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-21" style="display:none;">entr_5.6-1_source.changes
entr_5.6-1.debian.tar.xz
entr_5.6-1.dsc
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc</code><pre><code>entr_5.6-1_source.changes
entr_5.6-1.debian.tar.xz
entr_5.6-1.dsc
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc</code></pre></div>
<p>The source package files can be used to build the binary <code>.deb</code> for amd64, or for any other architecture the package supports. It is important to grasp that the Debian <em>source package</em> is the preferred form for building the binary packages on the various Debian build systems, and that the <em>Debian source package</em> is not the same thing as the contents of the <em>Debian packaging Git repository</em>.</p>
<pre class="mermaid">flowchart LR
git[Git repository<br>branch debian/latest] -->|gbp buildpackage -S| src[Source Package<br>.dsc + .tar.xz]
src -->|dpkg-buildpackage| bin[Binary Packages<br>.deb]
</pre>
<p>If the package is large and complex, the build can result in multiple binary packages. However, one set of package definition files in <code>debian/</code> will only ever produce a single source package.</p>
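<p>As a rough illustration (not part of the gbp tooling, just plain shell string manipulation), all of the file names above derive from the source package name and the full Debian version, where the part after the last hyphen is the Debian revision:</p>

```shell
# Illustrative sketch: how Debian source package file names are derived.
# 'pkg' and 'debver' are just local shell variables for this demo.
pkg=entr
debver=5.6-1                # upstream version 5.6 + Debian revision 1
upstream=${debver%-*}       # strip the Debian revision -> 5.6
echo "${pkg}_${upstream}.orig.tar.gz"    # pristine upstream tarball
echo "${pkg}_${debver}.debian.tar.xz"    # the debian/ directory contents
echo "${pkg}_${debver}.dsc"              # source control file tying it together
```

<p>Note that only the <code>.orig.tar.gz</code> name lacks the Debian revision, since the same upstream tarball is shared across all Debian revisions of a given upstream version.</p>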
<h2 id="option-to-repackage-source-packages-with-files-excluded-lists-in-the-debiancopyright-file"><a href="#option-to-repackage-source-packages-with-files-excluded-lists-in-the-debiancopyright-file" class="header-anchor"></a>Option to repackage source packages with <code>Files-Excluded</code> lists in the <code>debian/copyright</code> file
</h2><p>Some upstream projects may include binary files in their release, or other undesirable content that needs to be omitted from the source package in Debian. The easiest way to filter them out is by adding to the <code>debian/copyright</code> file a <code>Files-Excluded</code> field listing the undesired files. The <code>debian/copyright</code> file is read by <code>uscan</code>, which will repackage the upstream sources on-the-fly when importing new upstream releases.</p>
<p>For a real-life example, see the <a class="link" href="https://salsa.debian.org/games-team/godot/-/blob/debian/latest/debian/copyright" target="_blank" rel="noopener"
><code>debian/copyright</code> file in the Godot package</a>, which lists:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">debian</span>
<button
class="codeblock-copy"
data-id="codeblock-id-23"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-23" style="display:none;">Files-Excluded: platform/android/java/gradle/wrapper/gradle-wrapper.jar</code><pre><code>Files-Excluded: platform/android/java/gradle/wrapper/gradle-wrapper.jar</code></pre></div>
<p>The resulting repackaged upstream source tarball, as well as the upstream version component, will have an extra <code>+ds</code> to signify that it is not the true original upstream source but has been modified by Debian:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-24"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-24" style="display:none;">godot_4.3+ds.orig.tar.xz
godot_4.3+ds-1_amd64.deb</code><pre><code>godot_4.3+ds.orig.tar.xz
godot_4.3+ds-1_amd64.deb</code></pre></div>
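<p>The <code>+ds</code> suffix itself typically comes from a <code>repacksuffix</code> option in <code>debian/watch</code>, which tells <code>uscan</code> what to append to the upstream version whenever it repacks the tarball (paired with <code>dversionmangle</code> so version comparisons still work). A minimal sketch of such a watch file follows; the URL pattern is hypothetical and not copied from the actual Godot package:</p>

```
version=4
opts="repacksuffix=+ds,dversionmangle=auto" \
  https://github.com/example/project/tags .*/v?(\d\S+)\.tar\.gz
```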
<h2 id="creating-one-debian-source-package-from-multiple-upstream-source-packages-also-possible"><a href="#creating-one-debian-source-package-from-multiple-upstream-source-packages-also-possible" class="header-anchor"></a>Creating one Debian source package from multiple upstream source packages is also possible
</h2><p>In some rare cases the upstream project may be split across multiple Git repositories, or the upstream release may consist of multiple components, each in its own separate tarball. Usually these are very large projects that benefit from releasing components separately. If in Debian these are deemed to belong in a single source package, that is technically possible using the <em>component</em> system in git-buildpackage and uscan. For an example, see the <a class="link" href="https://salsa.debian.org/js-team/node-cacache/-/blob/master/debian/gbp.conf" target="_blank" rel="noopener"
>gbp.conf</a> and <a class="link" href="https://salsa.debian.org/js-team/node-cacache/-/blob/master/debian/watch" target="_blank" rel="noopener"
>watch</a> files in the node-cacache package.</p>
<p>Using this type of structure should be a last resort, as it creates complexity and inter-dependencies that are bound to cause issues later on. It is usually better to work with upstream and champion universal best practices with clear releases and version schemes.</p>
<h2 id="when-not-to-start-the-debian-packaging-repository-as-a-fork-of-the-upstream-one"><a href="#when-not-to-start-the-debian-packaging-repository-as-a-fork-of-the-upstream-one" class="header-anchor"></a>When not to start the Debian packaging repository as a fork of the upstream one
</h2><p>Not all upstreams use Git for version control. It is by far the most popular, but there are still some that use e.g. Subversion or Mercurial. Who knows — maybe in the future some new version control systems will start to compete with Git. There are also projects that use Git in massive monorepos and with complex submodule setups that invalidate the basic assumptions required to map an upstream Git repository into a Debian packaging repository.</p>
<p>In those cases one can’t use a <code>debian/latest</code> branch on a clone of the upstream Git repository as the starting point for the Debian packaging, but one must revert to the traditional way of starting from an upstream release tarball with <code>gbp import-orig package-1.0.tar.gz</code>.</p>
<h2 id="conclusion"><a href="#conclusion" class="header-anchor"></a>Conclusion
</h2><p>Created in August 1993, Debian is one of the oldest Linux distributions. In the 32 years since its inception, the <code>.deb</code> packaging format and the tooling to work with it have evolved through several generations. In the past 10 years, more and more Debian Developers have converged on certain core practices, as evidenced by <a class="link" href="https://trends.debian.net/" target="_blank" rel="noopener"
>https://trends.debian.net/</a>, but there is still a lot of variance in workflows, even for identical tasks. Hopefully you will find this post useful as practical guidance on exactly how to do the most common tasks when packaging software for Debian.</p>
<p>Happy packaging!</p> Going full-time as an open source developer https://optimizedbyotto.com/post/full-time-open-source-developer/Wed, 16 Apr 2025 00:00:00 +0000 https://optimizedbyotto.com/post/full-time-open-source-developer/ <img src="https://optimizedbyotto.com/post/full-time-open-source-developer/featured-image.jpg" alt="Featured image of post Going full-time as an open source developer" /><p>After careful consideration, I’ve decided to embark on a new chapter in my professional journey. I’ve left my position at AWS to dedicate <strong><em>at least</em> the next six months to developing open source software</strong> and strengthening digital ecosystems. My focus will be on contributing to Linux distributions (primarily Debian) and other <em>critical infrastructure components</em> that our modern society depends on, but which may not receive adequate attention or resources.</p>
<h2 id="the-evolution-of-open-source"><a href="#the-evolution-of-open-source" class="header-anchor"></a>The Evolution of Open Source
</h2><p><strong>Open source won.</strong> Over the 25+ years I’ve been involved in the open source movement, I’ve witnessed its remarkable evolution. Today, Linux powers billions of devices — from tiny embedded systems and Android smartphones to massive cloud datacenters and even space stations. Examine any modern large-scale digital system, and you’ll discover it’s built upon thousands of open source projects.</p>
<p><strong>I feel the priority for the open source movement should no longer be increasing adoption, but rather <em>solving how to best maintain</em> the vast ecosystem of software.</strong> This requires building <em>robust institutions and processes</em> to secure proper resourcing and ensure the collaborative development process remains efficient and leads to ever-increasing quality of software.</p>
<h2 id="what-is-special-about-debian"><a href="#what-is-special-about-debian" class="header-anchor"></a>What is Special About Debian?
</h2><p>Debian, established in 1993 by Ian Murdock, stands as one of these institutions that has demonstrated exceptional resilience. There is no single authority, but instead a complex web of various stakeholders, each with their own goals and sources of funding. Every idea needs to be championed at length to a wide audience and implemented through a process of organic evolution.</p>
<p>Thanks to this approach, Debian has been consistently delivering production-quality, universally useful software for over three decades. Having been a Debian Developer for more than ten years, I’m well-positioned to contribute meaningfully to this community.</p>
<p><strong>If your organization relies on Debian or its derivatives such as Ubuntu, and you’re interested in funding cyber infrastructure maintenance by <em>sponsoring Debian work</em>, please don’t hesitate to reach out.</strong> This could include package maintenance and version currency, improving automated upgrade testing, general quality assurance and supply chain security enhancements.</p>
<blockquote>
<p>Best way to reach me is <strong>by e-mail <em>otto at debian.org</em></strong>. You can also <a class="link" href="https://cal.com/ottok" target="_blank" rel="noopener"
><strong>book a 15-minute chat with me</strong></a> for a quick introduction.</p>
</blockquote>
<h2 id="grow-or-die"><a href="#grow-or-die" class="header-anchor"></a>Grow or Die
</h2><p>My four-year tenure as a Software Development Manager at Amazon Web Services was very interesting. I’m grateful for my time at AWS and proud of my team’s accomplishments, particularly for creating an open source contribution process that got Amazon from zero to the largest external contributor to the MariaDB open source database.</p>
<p>During this time, I got to experience and witness a plethora of interesting things. I will surely share some of my key learnings in future blog posts. Unfortunately, the rate of progress in this mammoth 1.5 million employee organization was slowing down, and I didn’t feel I learned much new in the last few years. This realization, combined with the opportunity cost of not spending enough time on new cutting-edge technology, motivated me to take this leap.</p>
<p>Being a full-time open source developer may not be financially the most lucrative idea, but I think it is an excellent way to force myself to truly assess what is important on a global scale and what areas I want to contribute to.</p>
<p>Working fully on open source presents a fascinating duality: you’re not bound by any external resource or schedule limitations, and the progress you make is directly proportional to how much energy you decide to invest. Yet, you also depend on collaboration with people you might never meet and who are not financially incentivized to collaborate. This will undoubtedly expose me to all kinds of challenges. But what could be better for fostering holistic personal growth? I know that deep down in my DNA, I am not made to stay cozy or to do easy things. <strong>I need momentum.</strong></p>
<p>OK, let’s get going 🙂</p> Debian Salsa CI in Google Summer of Code 2025 https://optimizedbyotto.com/post/debian-salsa-ci-gsoc-2025/Tue, 25 Mar 2025 00:00:00 +0000 https://optimizedbyotto.com/post/debian-salsa-ci-gsoc-2025/ <img src="https://optimizedbyotto.com/post/debian-salsa-ci-gsoc-2025/featured-image.jpg" alt="Featured image of post Debian Salsa CI in Google Summer of Code 2025" /><p>Are you a student aspiring to participate in the <a class="link" href="https://summerofcode.withgoogle.com/" target="_blank" rel="noopener"
>Google Summer of Code 2025</a>? Would you like to improve the continuous integration pipeline used at <a class="link" href="https://salsa.debian.org" target="_blank" rel="noopener"
>salsa.debian.org</a>, the Debian GitLab instance, to help improve the quality of tens of thousands of software packages in Debian?</p>
<p>This summer 2025, I and Emmanuel Arias will be participating as mentors in the GSoC program. We are available to mentor students who propose and develop improvements to the <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline" target="_blank" rel="noopener"
>Salsa CI pipeline</a>, as we are members of the Debian team that maintains it.</p>
<p>A post by Santiago Ruano Rincón in the <a class="link" href="https://about.gitlab.com/blog/2023/09/19/debian-customizes-ci-tooling-with-gitlab/" target="_blank" rel="noopener"
>GitLab blog explains what Salsa CI is</a> and its short history since inception in 2018. At the time of the article in fall 2023 there were 9000+ source packages in Debian using Salsa CI. Now in 2025 there are over <a class="link" href="https://codesearch.debian.net/search?q=salsa+path%3Adebian%2F.*.yml&literal=0" target="_blank" rel="noopener"
>27,000 source packages</a> in Debian using it, and since summer 2024 some Ubuntu developers have started using it for enhanced quality assurance of packaging changes before uploading new package revisions to Ubuntu. Personally, I have been using Salsa CI since its inception, and contributing as a team member since 2019. See my blog post about <a class="link" href="https://optimizedbyotto.com/post/gitlab-mariadb-debian/" >GitLab CI for MariaDB in Debian</a> for a description of an advanced and extensive use case.</p>
<p><strong>Helping Salsa CI is a great way to make a global impact</strong>, as it will help avoid regressions and improve the quality of Debian packages. The benefits reach far beyond just Debian, as it will <a class="link" href="https://en.wikipedia.org/wiki/List_of_Linux_distributions" target="_blank" rel="noopener"
>also help hundreds of Debian derivatives</a>, such as Ubuntu, Linux Mint, Tails, Purism PureOS, Pop!_OS, Zorin OS, Raspberry Pi OS, a large portion of Docker containers, and even the Windows Subsystem for Linux.</p>
<h2 id="improving-salsa-ci-more-features-robustness-speed"><a href="#improving-salsa-ci-more-features-robustness-speed" class="header-anchor"></a>Improving Salsa CI: more features, robustness, speed
</h2><p>While Salsa CI, with contributions from 71 people, is already quite mature and capable, there are many ideas floating around about how it could be further extended. For example, <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/147" target="_blank" rel="noopener"
>Salsa CI issue #147</a> describes various static analyzers and linters that may be generally useful. <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/411" target="_blank" rel="noopener"
>Issue #411</a> proposes using <a class="link" href="https://manpages.debian.org/bookworm/faketime/faketime.1.en.html" target="_blank" rel="noopener"
>libfaketime</a> to run <a class="link" href="https://manpages.debian.org/bookworm/autopkgtest/autopkgtest.1.en.html" target="_blank" rel="noopener"
>autopkgtest</a> on arbitrary future dates to test for failures caused by date assumptions, such as the <a class="link" href="https://en.wikipedia.org/wiki/Year_2038_problem" target="_blank" rel="noopener"
>Y2038 issue</a>.</p>
<p>There are also ideas about making Salsa CI more robust and code easier to reuse by refactoring some of the yaml scripts into independent scripts in <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/230" target="_blank" rel="noopener"
>#230</a>, which could make it easier to run Salsa CI locally as suggested in <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/169" target="_blank" rel="noopener"
>#169</a>. There are also ideas about improving the Salsa CI’s own CI to avoid regressions from pipeline changes in <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/318" target="_blank" rel="noopener"
>#318</a>.</p>
<p>The CI system is also better when it’s faster, and some speed improvement ideas have been noted in <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/412" target="_blank" rel="noopener"
>#412</a>.</p>
<p>Improvements don’t have to be limited to changes in the pipeline itself. A useful project would also be to update more Debian packages to use Salsa CI, and ensure they adopt it in an optimal way as noted in <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/416" target="_blank" rel="noopener"
>#416</a>. It would also be nice to have a dashboard with statistics about all public Salsa CI pipeline runs as suggested in <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/413" target="_blank" rel="noopener"
>#413</a>.</p>
<p>These and more ideas can be found in the <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/?sort=priority_desc&state=opened&or%5Blabel_name%5D%5B%5D=Accepting%20MRs&or%5Blabel_name%5D%5B%5D=Newcomer&or%5Blabel_name%5D%5B%5D=Nice-To-Have&first_page_size=100" target="_blank" rel="noopener"
>issue list by filtering for tags Newcomer, Nice-To-Have or Accepting MRs</a>. A Google Summer of Code proposal does not have to be limited to these existing ideas. Participants are also welcome to propose completely novel ideas!</p>
<h2 id="good-time-to-also-learn-debian-packaging"><a href="#good-time-to-also-learn-debian-packaging" class="header-anchor"></a>Good time to also learn Debian packaging
</h2><p>Anyone working with a Debian team should also take the opportunity to <a class="link" href="https://optimizedbyotto.com/post/debian-maintainer-habits/" >learn Debian packaging</a>, and contribute to the packaging or maintenance of one or two packages in parallel to improving Salsa CI. All Salsa CI team members are also Debian Developers who can mentor and sponsor uploads to Debian.</p>
<p>Maintaining a few packages is a <a class="link" href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food" target="_blank" rel="noopener"
>great way to eat your own cooking and experience Salsa CI from the user perspective</a>, and it will likely make you better at Salsa CI development.</p>
<h2 id="apply-now"><a href="#apply-now" class="header-anchor"></a>Apply now!
</h2><p><strong>The contributor applications opened yesterday on March 24, so to participate act now!</strong> If you are an eligible student and want to attend, head over to <a class="link" href="https://summerofcode.withgoogle.com/" target="_blank" rel="noopener"
>summerofcode.withgoogle.com</a> to learn more.</p>
<p>There are over a thousand participating organizations, with <a class="link" href="https://summerofcode.withgoogle.com/programs/2025/organizations/debian" target="_blank" rel="noopener"
>Debian</a>, <a class="link" href="https://summerofcode.withgoogle.com/programs/2025/organizations/gitlab" target="_blank" rel="noopener"
>GitLab</a> and <a class="link" href="https://mariadb.com/kb/en/google-summer-of-code-2025/" target="_blank" rel="noopener"
>MariaDB</a> being some examples. Within these organizations there may be multiple subteams and projects to choose from. The <a class="link" href="https://wiki.debian.org/SummerOfCode2025/Projects" target="_blank" rel="noopener"
>full list of participating Debian projects</a> can be found in the Debian wiki.</p>
<p>If you are interested in <a class="link" href="https://wiki.debian.org/SummerOfCode2025/Projects#SummerOfCode2025.2FApprovedProjects.2FSalsaCI.Salsa_CI_in_Debian" target="_blank" rel="noopener"
>GSoC for Salsa CI</a> specifically, feel free to</p>
<ol>
<li>Reach out to me and Emmanuel by email at otto@ and eamanu@ (debian.org).</li>
<li>Sign up for an account at salsa.debian.org (note that it takes a few days due to a manual vetting and approval process)</li>
<li>Read the project <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/blob/master/README.md" target="_blank" rel="noopener"
>README</a>, <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/blob/master/STRUCTURE.md" target="_blank" rel="noopener"
>STRUCTURE</a> and <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/blob/master/CONTRIBUTING.md" target="_blank" rel="noopener"
>CONTRIBUTING</a> to get a developer’s overview</li>
<li>Participate in issue discussions at <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/" target="_blank" rel="noopener"
>https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/</a></li>
</ol>
<p>Note that you don’t have to wait for GSoC to officially start to contribute. In fact, it may be useful to start immediately by submitting a Merge Request to do some small contribution, just to learn the process and to get more familiar with how everything works, and the team maintaining Salsa CI. Looking forward to seeing new contributors!</p>
<blockquote>
<h2 id="update-may-8th-2025"><a href="#update-may-8th-2025" class="header-anchor"></a>Update May 8th, 2025
</h2><p>The approved Google Summer of Code 2025 students have now been announced, and I am thrilled that <a class="link" href="https://summerofcode.withgoogle.com/programs/2025/organizations/debian" target="_blank" rel="noopener"
>9 students got approved for Debian</a>. I will be mentoring two students, <a class="link" href="https://summerofcode.withgoogle.com/programs/2025/projects/mmwLagR0" target="_blank" rel="noopener"
>Aquila</a> and <a class="link" href="https://summerofcode.withgoogle.com/programs/2025/projects/YTXVewUk" target="_blank" rel="noopener"
>Aayush</a>. You can follow their progress on the <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline" target="_blank" rel="noopener"
>Salsa CI Team GitLab page</a>.</p>
</blockquote> Will decentralized social media soon go mainstream? https://optimizedbyotto.com/post/distributed-social-media/Wed, 05 Mar 2025 00:00:00 +0000 https://optimizedbyotto.com/post/distributed-social-media/ <img src="https://optimizedbyotto.com/post/distributed-social-media/featured-image.png" alt="Featured image of post Will decentralized social media soon go mainstream?" /><p>In today’s digital landscape, social media is more than just a communication tool — it is the primary medium for global discourse. Heads of state, corporate leaders and cultural influencers now broadcast their statements directly to the world, shaping public opinion in real time. However, the dominance of a few centralized platforms — X/Twitter, Facebook and YouTube — raises critical concerns about control, censorship and the monopolization of information. Those who control these networks effectively wield significant power over public discourse.</p>
<p>In response, a new wave of distributed social media platforms has emerged, each built on different decentralized protocols designed to provide greater autonomy, censorship resistance and user control. While <a class="link" href="https://en.wikipedia.org/wiki/Comparison_of_software_and_protocols_for_distributed_social_networking" target="_blank" rel="noopener"
>Wikipedia maintains a comprehensive list</a> of distributed social networking software and protocols, it does not cover recent blockchain-based systems, nor does it highlight which have the most potential for mainstream adoption.</p>
<p>This post explores the leading decentralized social media platforms and the protocols they are based on: <a class="link" href="https://joinmastodon.org/" target="_blank" rel="noopener"
>Mastodon</a> (ActivityPub), <a class="link" href="https://bsky.app/" target="_blank" rel="noopener"
>Bluesky</a> (AT Protocol), <a class="link" href="https://warpcast.com/" target="_blank" rel="noopener"
>Warpcast</a> (Farcaster), <a class="link" href="https://hey.xyz/" target="_blank" rel="noopener"
>Hey</a> (Lens) and <a class="link" href="https://primal.net/" target="_blank" rel="noopener"
>Primal</a> (Nostr).</p>
<h2 id="comparison-of-architecture-and-mainstream-adoption-potential"><a href="#comparison-of-architecture-and-mainstream-adoption-potential" class="header-anchor"></a>Comparison of architecture and mainstream adoption potential
</h2><table>
<thead>
<tr>
<th>Protocol</th>
<th>Identity System</th>
<th>Example</th>
<th>Storage model</th>
<th>Cost for end users</th>
<th>Potential</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Mastodon</strong></td>
<td>Tied to server domain</td>
<td><a class="link" href="https://mastodon.social/@ottok" target="_blank" rel="noopener"
><code>@ottok@mastodon.social</code></a></td>
<td>Federated instances</td>
<td>Free (some instances charge)</td>
<td>High</td>
</tr>
<tr>
<td><strong>Bluesky</strong></td>
<td>Portable (DID)</td>
<td><a class="link" href="https://bsky.app/profile/ottoke.bsky.social" target="_blank" rel="noopener"
><code>ottoke.bsky.social</code></a></td>
<td>Federated instances</td>
<td>Free</td>
<td>Moderate</td>
</tr>
<tr>
<td><strong>Farcaster</strong></td>
<td>ENS (Ethereum)</td>
<td><a class="link" href="https://warpcast.com/ottok" target="_blank" rel="noopener"
><code>@ottok</code></a></td>
<td>Blockchain + off-chain</td>
<td>Small gas fees</td>
<td>Moderate</td>
</tr>
<tr>
<td><strong>Lens</strong></td>
<td>NFT-based (Polygon)</td>
<td><a class="link" href="https://hey.xyz/u/ottok" target="_blank" rel="noopener"
><code>@ottok</code></a></td>
<td>Blockchain + off-chain</td>
<td>Small gas fees</td>
<td>Niche</td>
</tr>
<tr>
<td><strong>Nostr</strong></td>
<td>Cryptographic Keys</td>
<td><a class="link" href="https://primal.net/p/npub16lc6uhqpg6dnqajylkhwuh3j7ynhcnje508tt4v6703w9kjlv9vqzz4z7f" target="_blank" rel="noopener"
><code>npub16lc6uhqpg6dnqajylkhwuh3j7ynhcnje508tt4v6703w9kjlv9vqzz4z7f</code></a></td>
<td>Federated instances</td>
<td>Free (some instances charge)</td>
<td>Niche</td>
</tr>
</tbody>
</table>
<h2 id="1-mastodon-activitypub"><a href="#1-mastodon-activitypub" class="header-anchor"></a>1. Mastodon (ActivityPub)
</h2><p><img src="https://optimizedbyotto.com/post/distributed-social-media/mastodon-screenshot.png"
width="800"
height="611"
srcset="https://optimizedbyotto.com/post/distributed-social-media/mastodon-screenshot_hu4527169553592586294.png 480w, https://optimizedbyotto.com/post/distributed-social-media/mastodon-screenshot.png 800w"
loading="lazy"
alt="Screenshot of Mastodon"
class="gallery-image"
data-flex-grow="130"
data-flex-basis="314px"
>
</p>
<p>Mastodon was created in 2016 by <a class="link" href="https://mastodon.social/@Gargron" target="_blank" rel="noopener"
>Eugen Rochko</a>, a German software developer who sought to provide a decentralized and user-controlled alternative to Twitter. It was built on the <a class="link" href="https://activitypub.rocks/" target="_blank" rel="noopener"
>ActivityPub protocol</a>, now standardized by the W3C Social Web Working Group, to allow users to join independent servers while still communicating across the broader Mastodon network.</p>
<p>Mastodon operates on a <em>federated model</em>, where multiple independently run servers communicate via ActivityPub. Each server sets its own moderation policies, leading to a decentralized but fragmented experience. The servers can alternatively be called instances, relays or nodes, depending on what vocabulary a protocol has standardized on.</p>
<ul>
<li><strong>Identity</strong>: User identity is tied to the instance where they registered, represented as <code>@username@instance.tld</code>.</li>
<li><strong>Storage</strong>: Data is stored on individual instances, which federate messages to other instances based on their configurations.</li>
<li><strong>Cost</strong>: Free to use, but relies on instance operators willing to run the servers.</li>
</ul>
<p>The protocol defines multiple activities such as:</p>
<ul>
<li>Creating a post</li>
<li>Liking</li>
<li>Sharing</li>
<li>Following</li>
<li>Commenting</li>
</ul>
<h3 id="example-message-in-activitypub-json-ld-format"><a href="#example-message-in-activitypub-json-ld-format" class="header-anchor"></a>Example Message in ActivityPub (JSON-LD Format)
</h3><div class="codeblock ">
<header>
<span class="codeblock-lang">json</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">{
"@context": "https://www.w3.org/ns/activitystreams",
"type": "Create",
"actor": "https://mastodon.social/users/ottok",
"object": {
"type": "Note",
"content": "Hello from #Mastodon!",
"published": "2025-03-03T12:00:00Z",
"to": ["https://www.w3.org/ns/activitystreams#Public"]
}
}</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"@context"</span>: <span style="color:#e6db74">"https://www.w3.org/ns/activitystreams"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"type"</span>: <span style="color:#e6db74">"Create"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"actor"</span>: <span style="color:#e6db74">"https://mastodon.social/users/ottok"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"object"</span>: {
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"type"</span>: <span style="color:#e6db74">"Note"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"content"</span>: <span style="color:#e6db74">"Hello from #Mastodon!"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"published"</span>: <span style="color:#e6db74">"2025-03-03T12:00:00Z"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"to"</span>: [<span style="color:#e6db74">"https://www.w3.org/ns/activitystreams#Public"</span>]
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}</span></span></code></pre></div></div></div>
<p>Servers communicate across different platforms by publishing activities to their followers or forwarding activities between servers. Standard HTTPS is used between servers for communication, and the messages use JSON-LD for data representation. The <a class="link" href="https://en.wikipedia.org/wiki/WebFinger" target="_blank" rel="noopener"
>WebFinger protocol</a> is used for user discovery. There is, however, no neat way for home server discovery yet. This means that if you are browsing e.g. <a class="link" href="https://fosstodon.org/" target="_blank" rel="noopener"
>Fosstodon</a> and want to follow a user and press <em>Follow</em>, a dialog will pop up asking you to enter your own <em>home server</em> (e.g. <a class="link" href="https://mastodon.social/" target="_blank" rel="noopener"
>mastodon.social</a>) to redirect you there for actually executing the <em>Follow</em> action with your account.</p>
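<p>The WebFinger lookup itself is simple to sketch. Below is a minimal Python example (handle and instance from the examples above; the function names are my own) that builds the <code>/.well-known/webfinger</code> query for a handle and extracts the ActivityPub actor URL from the response:</p>

```python
import json
import urllib.parse
import urllib.request


def webfinger_url(handle: str) -> str:
    """Build the WebFinger lookup URL for a handle like '@user@instance.tld'."""
    user, _, domain = handle.lstrip("@").partition("@")
    query = urllib.parse.urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


def resolve_actor(handle: str) -> str:
    """Fetch the WebFinger document and return the ActivityPub actor URL."""
    with urllib.request.urlopen(webfinger_url(handle)) as response:
        document = json.load(response)
    # The 'self' link points at the actor document used for federation
    for link in document.get("links", []):
        if link.get("rel") == "self":
            return link["href"]
    raise LookupError(f"No ActivityPub actor found for {handle}")
```

<p>Calling <code>resolve_actor("@ottok@mastodon.social")</code> should return the actor URL that other servers use when federating activities to that account.</p>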
<p>Mastodon is open source under the <a class="link" href="https://en.wikipedia.org/wiki/GNU_Affero_General_Public_License" target="_blank" rel="noopener"
>AGPL</a> at <a class="link" href="https://github.com/mastodon/mastodon" target="_blank" rel="noopener"
>github.com/mastodon/mastodon</a>. Anyone can operate their own instance. It just requires running your own server, some skills to maintain a Ruby on Rails app with a PostgreSQL database backend, and a basic understanding of the protocol to configure federation with other ActivityPub instances.</p>
<h3 id="popularity-already-established-but-will-it-grow-more"><a href="#popularity-already-established-but-will-it-grow-more" class="header-anchor"></a>Popularity: Already established, but will it grow more?
</h3><p>Mastodon has seen steady growth, especially after Twitter was acquired in 2022, with some estimates stating it peaked at 10 million users across thousands of instances. However, its fragmented user experience and the complexity of choosing instances have hindered mainstream adoption. Still, it remains the most established decentralized alternative to Twitter.</p>
<p>Note that Donald Trump’s <a class="link" href="https://help.truthsocial.com/legal/open-source/" target="_blank" rel="noopener"
>Truth Social is based on the Mastodon software</a> but does not federate with the ActivityPub network.</p>
<p>The ActivityPub protocol is the most widely used of its kind. One of the other most popular services is the <a class="link" href="https://join-lemmy.org/" target="_blank" rel="noopener"
>Lemmy</a> link sharing service, similar to Reddit. The larger ecosystem of ActivityPub is called <a class="link" href="https://en.wikipedia.org/wiki/Fediverse" target="_blank" rel="noopener"
>Fediverse</a>, and estimates put the <a class="link" href="https://www.fediverse.to/" target="_blank" rel="noopener"
>total active user count</a> around 6 million.</p>
<h2 id="2-bluesky-at-protocol"><a href="#2-bluesky-at-protocol" class="header-anchor"></a>2. Bluesky (AT Protocol)
</h2><p><img src="https://optimizedbyotto.com/post/distributed-social-media/bluesky-screenshot.png"
width="800"
height="611"
srcset="https://optimizedbyotto.com/post/distributed-social-media/bluesky-screenshot_hu12409526736856122866.png 480w, https://optimizedbyotto.com/post/distributed-social-media/bluesky-screenshot.png 800w"
loading="lazy"
alt="Screenshot of Bluesky"
class="gallery-image"
data-flex-grow="130"
data-flex-basis="314px"
>
</p>
<p>Interestingly, <a class="link" href="https://bsky.app/" target="_blank" rel="noopener"
>Bluesky</a> was conceived within Twitter in 2019 by Twitter founder Jack Dorsey. After being incubated as a Twitter-funded project, it spun off as an independent Public Benefit LLC in February 2022 and launched its public beta in February 2023.</p>
<p>Bluesky runs on top of the <strong>Authenticated Transfer (AT) Protocol</strong> published at <a class="link" href="https://github.com/bluesky-social/atproto" target="_blank" rel="noopener"
>https://github.com/bluesky-social/atproto</a>. The protocol enables portable identities and data ownership, meaning users can migrate between platforms while keeping their identity and content intact. In practice, however, there is only one popular server at the moment, which is Bluesky itself.</p>
<ul>
<li><strong>Identity</strong>: Usernames are domain-based (e.g., <code>@user.bsky.social</code>).</li>
<li><strong>Storage</strong>: Content is theoretically federated among various servers.</li>
<li><strong>Cost</strong>: Free to use, but relies on instance operators willing to run the servers.</li>
</ul>
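<p>The identity portability comes from resolving a handle to a stable DID that stays with the user even if they move servers. A minimal Python sketch, assuming the public <code>com.atproto.identity.resolveHandle</code> XRPC endpoint on <code>bsky.social</code> (function names are my own):</p>

```python
import json
import urllib.parse
import urllib.request

# Bluesky's main public service; any server implementing the lexicon works
XRPC_BASE = "https://bsky.social/xrpc"


def resolve_handle_url(handle: str) -> str:
    """Build the com.atproto.identity.resolveHandle query URL."""
    query = urllib.parse.urlencode({"handle": handle})
    return f"{XRPC_BASE}/com.atproto.identity.resolveHandle?{query}"


def resolve_handle(handle: str) -> str:
    """Resolve a handle such as 'ottoke.bsky.social' to its stable DID."""
    with urllib.request.urlopen(resolve_handle_url(handle)) as response:
        return json.load(response)["did"]
```

<p>The returned <code>did:plc:...</code> identifier, not the human-readable handle, is what the repository and records are keyed on.</p>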
<h3 id="example-message-in-at-protocol-json-format"><a href="#example-message-in-at-protocol-json-format" class="header-anchor"></a>Example Message in AT Protocol (JSON Format)
</h3><div class="codeblock ">
<header>
<span class="codeblock-lang">json</span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">{
"repo": "ottoke.bsky.social",
"collection": "app.bsky.feed.post",
"record": {
"$type": "app.bsky.feed.post",
"text": "Hello from Bluesky!",
"createdAt": "2025-03-03T12:00:00Z",
"langs": ["en"]
}
}</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"repo"</span>: <span style="color:#e6db74">"ottoke.bsky.social"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"collection"</span>: <span style="color:#e6db74">"app.bsky.feed.post"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"record"</span>: {
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"$type"</span>: <span style="color:#e6db74">"app.bsky.feed.post"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"text"</span>: <span style="color:#e6db74">"Hello from Bluesky!"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"createdAt"</span>: <span style="color:#e6db74">"2025-03-03T12:00:00Z"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"langs"</span>: [<span style="color:#e6db74">"en"</span>]
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}</span></span></code></pre></div></div></div>
<h3 id="popularity-hybrid-approach-may-have-business-benefits"><a href="#popularity-hybrid-approach-may-have-business-benefits" class="header-anchor"></a>Popularity: Hybrid approach may have business benefits?
</h3><p>Bluesky reported over 3 million users by 2024, likely gaining traction due to its Twitter-like interface and Jack Dorsey’s involvement. Its hybrid approach — decentralized identity with centralized components — could make it a strong candidate for mainstream adoption, assuming it can scale effectively.</p>
<h2 id="3-warpcast-farcaster-network"><a href="#3-warpcast-farcaster-network" class="header-anchor"></a>3. Warpcast (Farcaster Network)
</h2><p>Farcaster was launched in 2021 by <a class="link" href="https://warpcast.com/dan" target="_blank" rel="noopener"
>Dan Romero</a> and <a class="link" href="https://warpcast.com/varunsrin" target="_blank" rel="noopener"
>Varun Srinivasan</a>, both former crypto exchange Coinbase executives, to create a decentralized but user-friendly social network. Built on the <a class="link" href="https://ethereum.org/" target="_blank" rel="noopener"
>Ethereum blockchain</a>, it could potentially offer a very attack-resistant communication medium.</p>
<p>However, in my own testing, Farcaster does not seem to fully leverage what Ethereum could offer. First of all, there is no diversity in clients implementing the protocol, as at the moment there is only <a class="link" href="https://warpcast.com/" target="_blank" rel="noopener"
>Warpcast</a>. In Warpcast the signup requires an initial 5 USD fee that is not payable in ETH, and users need to create a new wallet address on the <a class="link" href="https://en.wikipedia.org/w/index.php?title=Base_%28blockchain%29&redirect=no" target="_blank" rel="noopener"
>Ethereum layer 2 network Base</a> instead of simply reusing their existing Ethereum wallet address or <a class="link" href="https://ens.domains/" target="_blank" rel="noopener"
>ENS name</a>.</p>
<p>Despite this, I can understand why Farcaster may have decided to start out like this. Having a single client program may be the best strategy initially. One of the <a class="link" href="https://matrix.org/" target="_blank" rel="noopener"
>decentralized chat protocol Matrix</a> founders, Matthew Hodgson, shared in <a class="link" href="https://fosdem.org/2025/schedule/event/fosdem-2025-6274-the-road-to-mainstream-matrix/" target="_blank" rel="noopener"
>his FOSDEM 2025</a> talk that he slightly regrets focusing too much on developing the protocol instead of making sure the app built on it is attractive to end users. So it may be sensible to ensure Warpcast gets popular first, before attempting to make the Farcaster protocol widely used.</p>
<p>As a protocol, Farcaster’s hybrid approach makes it more scalable than fully on-chain networks, giving it a higher chance of mainstream adoption if it integrates seamlessly with broader Web3 ecosystems.</p>
<ul>
<li><strong>Identity</strong>: ENS (Ethereum Name Service) domains are used as usernames.</li>
<li><strong>Storage</strong>: Messages are stored in off-chain hubs, while identity is on-chain.</li>
<li><strong>Cost</strong>: Users must pay gas fees for some operations but reading and posting messages is mostly free.</li>
</ul>
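<p>Reading messages back out of the network is a plain HTTP query against a hub. The hub hostname and port below are placeholders, and the <code>/v1/castsByFid</code> path assumes a Hubble-style hub HTTP API; the fid is the one from the example message:</p>

```python
# Hypothetical hub address: real hubs are run by various operators, and
# the host/port here are assumptions for illustration only.
HUB = "https://hub.example.com:2281"


def casts_by_fid_url(fid: int) -> str:
    """Build a Hubble-style HTTP API query for all casts by a Farcaster id."""
    return f"{HUB}/v1/castsByFid?fid={fid}"


# fid 766579 is taken from the example message above
url = casts_by_fid_url(766579)
```

<p>Because hubs replicate each other's data, any hub can in principle answer this query, which is what makes the message layer decentralized even though identity registration happens on-chain.</p>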
<h3 id="example-message-in-farcaster-json-format"><a href="#example-message-in-farcaster-json-format" class="header-anchor"></a>Example Message in Farcaster (JSON Format)
</h3><div class="codeblock ">
<header>
<span class="codeblock-lang">json</span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">{
"fid": 766579,
"username": "ottok",
"custodyAddress": "0x127853e48be3870172baa4215d63b6d815d18f21",
"connectedWallet": "0x3ebe43aa3ae5b891ca1577d9c49563c0cee8da88",
"text": "Hello from Farcaster!",
"publishedAt": 1709424000,
"replyTo": null,
"embeds": []
}</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"fid"</span>: <span style="color:#ae81ff">766579</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"username"</span>: <span style="color:#e6db74">"ottok"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"custodyAddress"</span>: <span style="color:#e6db74">"0x127853e48be3870172baa4215d63b6d815d18f21"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"connectedWallet"</span>: <span style="color:#e6db74">"0x3ebe43aa3ae5b891ca1577d9c49563c0cee8da88"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"text"</span>: <span style="color:#e6db74">"Hello from Farcaster!"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"publishedAt"</span>: <span style="color:#ae81ff">1709424000</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"replyTo"</span>: <span style="color:#66d9ef">null</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"embeds"</span>: []
</span></span><span style="display:flex;"><span>}</span></span></code></pre></div></div></div>
<h3 id="popularity-decentralized-social-media--decentralized-payments-a-winning-combo"><a href="#popularity-decentralized-social-media--decentralized-payments-a-winning-combo" class="header-anchor"></a>Popularity: Decentralized social media + decentralized payments a winning combo?
</h3><p>Ethereum founder Vitalik Buterin (<a class="link" href="https://warpcast.com/vbuterin" target="_blank" rel="noopener"
>warpcast.com/vbuterin</a>) and many core developers are active on the platform. Warpcast, the main client for Farcaster, has seen increasing adoption, especially among Ethereum developers and Web3 enthusiasts. I too have a profile at <a class="link" href="https://warpcast.com/ottok" target="_blank" rel="noopener"
>warpcast.com/ottok</a>. However, the numbers are still very low and far from reaching network effects to really take off.</p>
<p>Blockchain-based social media networks, particularly those built on Ethereum, are compelling because they leverage existing user wallets and persistent identities while enabling native payment functionality. When combined with decentralized content funding through micropayments, these blockchain-backed social networks could offer unique advantages that centralized platforms may find difficult to replicate, as they are decentralized both as a technical network and in their funding mechanism.</p>
<h2 id="4-heyxyz-lens-network"><a href="#4-heyxyz-lens-network" class="header-anchor"></a>4. Hey.xyz (Lens Network)
</h2><p>The Lens Protocol was developed by decentralized finance (DeFi) team <a class="link" href="https://aave.com/" target="_blank" rel="noopener"
>Aave</a> and launched in May 2022 to provide a user-owned social media network. While initially built on <a class="link" href="https://polygon.technology/" target="_blank" rel="noopener"
>Polygon</a>, it has since launched its own Layer 2 network called the Lens Network in February 2024. Lens is currently the main competitor to Farcaster.</p>
<p>Lens stores profile ownership and references on-chain, while content is stored on <a class="link" href="https://ipfs.tech/" target="_blank" rel="noopener"
>IPFS</a>/<a class="link" href="https://arweave.org/" target="_blank" rel="noopener"
>Arweave</a>, enabling composability with DeFi and NFTs.</p>
<ul>
<li><strong>Identity</strong>: Profile ownership is tied to NFTs on the Polygon blockchain.</li>
<li><strong>Storage</strong>: Content references are on-chain, while the content itself is stored on IPFS/Arweave (like NFT metadata).</li>
<li><strong>Cost</strong>: Users must pay gas fees for some operations but reading and posting messages is mostly free.</li>
</ul>
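<p>Because the content itself lives on Arweave or IPFS, reading a post means mapping its <code>contentURI</code> to a gateway URL. A minimal sketch (the gateway hosts are common public gateways, chosen here for illustration):</p>

```python
def gateway_url(content_uri: str) -> str:
    """Map a contentURI like 'ar://<id>' or 'ipfs://<cid>' to a public gateway URL."""
    if content_uri.startswith("ar://"):
        # Arweave transaction ids are served by gateways such as arweave.net
        return "https://arweave.net/" + content_uri[len("ar://"):]
    if content_uri.startswith("ipfs://"):
        # IPFS CIDs can be fetched through any public IPFS gateway
        return "https://ipfs.io/ipfs/" + content_uri[len("ipfs://"):]
    raise ValueError(f"Unsupported content URI: {content_uri}")
```

<p>The on-chain record only pins the reference, so anyone can fetch the same content through whichever gateway or local node they trust.</p>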
<h3 id="example-message-in-lens-json-format"><a href="#example-message-in-lens-json-format" class="header-anchor"></a>Example Message in Lens (JSON Format)
</h3><div class="codeblock ">
<header>
<span class="codeblock-lang">json</span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">{
"profileId": "@ottok",
"contentURI": "ar://QmExampleHash",
"collectModule": "0x23b9467334bEb345aAa6fd1545538F3d54436e96",
"referenceModule": "0x0000000000000000000000000000000000000000",
"timestamp": 1709558400
}</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"profileId"</span>: <span style="color:#e6db74">"@ottok"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"contentURI"</span>: <span style="color:#e6db74">"ar://QmExampleHash"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"collectModule"</span>: <span style="color:#e6db74">"0x23b9467334bEb345aAa6fd1545538F3d54436e96"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"referenceModule"</span>: <span style="color:#e6db74">"0x0000000000000000000000000000000000000000"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"timestamp"</span>: <span style="color:#ae81ff">1709558400</span>
</span></span><span style="display:flex;"><span>}</span></span></code></pre></div></div></div>
<h3 id="popularity-probably-not-as-social-media-site-but-maybe-as-protocol"><a href="#popularity-probably-not-as-social-media-site-but-maybe-as-protocol" class="header-anchor"></a>Popularity: Probably not as social media site, but maybe as protocol?
</h3><p>The social media side of Lens is mainly the <a class="link" href="https://hey.xyz" target="_blank" rel="noopener"
>Hey.xyz</a> website, which seems to have fewer users than Warpcast, and is even further away from reaching critical mass for network effects. The Lens protocol however has a <a class="link" href="https://www.lens.xyz/docs" target="_blank" rel="noopener"
>lot of advanced features</a> and it may gain adoption as the building block for many Web3 apps.</p>
<h2 id="5-primalnet-nostr-network"><a href="#5-primalnet-nostr-network" class="header-anchor"></a>5. Primal.net (Nostr Network)
</h2><p>Nostr (Notes and Other Stuff Transmitted by Relays) was conceptualized in 2020 by an anonymous developer known as <a class="link" href="https://fiatjaf.com/" target="_blank" rel="noopener"
>fiatjaf</a>. One of the primary design tenets was to be a censorship-resistant protocol and it is popular among Bitcoin enthusiasts, with Jack Dorsey being one of the public supporters. Unlike the Farcaster and Lens protocols, Nostr is not blockchain-based but just a network of relay servers for message distribution. It does however use public key cryptography for identities, similar to how wallets work in crypto.</p>
<ul>
<li><strong>Identity</strong>: Public-private key pairs define identity (with prefix <code>npub...</code>).</li>
<li><strong>Storage</strong>: Content is federated among multiple servers, which in Nostr vocabulary are called <em>relays</em>.</li>
<li><strong>Cost</strong>: No gas fees, but relies on relay operators willing to run the servers.</li>
</ul>
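<p>The cryptographic identity is baked into every event: per the NIP-01 specification, the event <code>id</code> is the SHA-256 hash of a canonical JSON serialization of its fields (with the pubkey as lowercase hex rather than the <code>npub...</code> bech32 form), which the author then signs with their private key. A minimal Python sketch with placeholder values:</p>

```python
import hashlib
import json


def nostr_event_id(pubkey: str, created_at: int, kind: int,
                   tags: list, content: str) -> str:
    """Compute a Nostr event id as defined in NIP-01.

    The id is the SHA-256 hash of the canonical serialization
    [0, pubkey, created_at, kind, tags, content], where pubkey is the
    author's public key in lowercase hex.
    """
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),  # no extra whitespace, per the spec
        ensure_ascii=False,
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Placeholder pubkey; kind 1 is a plain text note
event_id = nostr_event_id("ab" * 32, 1709558400, 1, [], "Hello from Nostr!")
```

<p>Any relay or client can recompute this hash to detect tampering, which is what makes the protocol censorship-resistant without needing a blockchain.</p>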
<h3 id="example-message-in-nostr-json-format"><a href="#example-message-in-nostr-json-format" class="header-anchor"></a>Example Message in Nostr (JSON Format)
</h3><div class="codeblock ">
<header>
<span class="codeblock-lang">json</span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">{
"id": "note1xyz...",
"pubkey": "npub1...",
"kind": 1,
"content": "Hello from Nostr!",
"created_at": 1709558400,
"tags": [],
"sig": "sig1..."
}</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"id"</span>: <span style="color:#e6db74">"note1xyz..."</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"pubkey"</span>: <span style="color:#e6db74">"npub1..."</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"kind"</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"content"</span>: <span style="color:#e6db74">"Hello from Nostr!"</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"created_at"</span>: <span style="color:#ae81ff">1709558400</span>,
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"tags"</span>: [],
</span></span><span style="display:flex;"><span> <span style="color:#f92672">"sig"</span>: <span style="color:#e6db74">"sig1..."</span>
</span></span><span style="display:flex;"><span>}</span></span></code></pre></div></div></div>
<h3 id="popularity-if-jack-dorsey-and-bitcoiners-promote-it-enough"><a href="#popularity-if-jack-dorsey-and-bitcoiners-promote-it-enough" class="header-anchor"></a>Popularity: If Jack Dorsey and Bitcoiners promote it enough?
</h3><p>Primal.net as a web app is pretty solid, but it does not stand out much. While Jack Dorsey has shown support by donating $1.5 million to the protocol development in December 2021, its success likely depends on broader adoption by the Bitcoin community.</p>
<h2 id="will-any-of-these-replace-xtwitter"><a href="#will-any-of-these-replace-xtwitter" class="header-anchor"></a>Will any of these replace X/Twitter?
</h2><p>As usage patterns vary, the statistics are not fully comparable, but this snapshot of the situation in March 2025 gives a decent overview.</p>
<table>
<thead>
<tr>
<th>Platform</th>
<th>Total Accounts</th>
<th>Active Users</th>
<th>Growth Trend</th>
</tr>
</thead>
<tbody>
<tr>
<td>Mastodon</td>
<td><a class="link" href="https://mastodon-analytics.com/" target="_blank" rel="noopener"
>~10 million</a></td>
<td><a class="link" href="https://mastodon-analytics.com/" target="_blank" rel="noopener"
>~1 million</a></td>
<td>Steady</td>
</tr>
<tr>
<td>Bluesky</td>
<td><a class="link" href="https://bsky.jazco.dev/stats" target="_blank" rel="noopener"
>~33 million</a></td>
<td><a class="link" href="https://bsky.jazco.dev/stats" target="_blank" rel="noopener"
>~1 million</a></td>
<td>Steady</td>
</tr>
<tr>
<td>Nostr</td>
<td><a class="link" href="https://stats.nostr.band/" target="_blank" rel="noopener"
>~41 million</a></td>
<td><a class="link" href="https://stats.nostr.band/" target="_blank" rel="noopener"
>~20 thousand</a></td>
<td>Steady</td>
</tr>
<tr>
<td>Farcaster</td>
<td><a class="link" href="https://dune.com/pixelhack/farcaster" target="_blank" rel="noopener"
>~850 thousand</a></td>
<td><a class="link" href="https://dune.com/moesalih/web3-social" target="_blank" rel="noopener"
>~50 thousand</a></td>
<td>Flat</td>
</tr>
<tr>
<td>Lens</td>
<td><a class="link" href="https://testnet.lenscan.io/chart/total-accounts" target="_blank" rel="noopener"
>~140 thousand</a></td>
<td><a class="link" href="https://dune.com/moesalih/web3-social" target="_blank" rel="noopener"
>~20 thousand</a></td>
<td>Flat</td>
</tr>
</tbody>
</table>
<p>Mastodon and Bluesky have already reached millions of users, while Lens and Farcaster are growing within crypto communities. It is, however, clear that none of these comes anywhere close to X/Twitter in popularity. In particular, Mastodon had a huge influx of users in the fall of 2022 when Twitter was acquired, but to challenge the incumbents the growth would need to significantly accelerate. <strong>We can all accelerate this development by embracing decentralized social media now alongside existing dominant platforms.</strong></p>
<p>Who knows, given the right circumstances maybe <a class="link" href="https://x.com/elonmusk" target="_blank" rel="noopener"
>X.com leadership</a> decides to change the operating model and start federating contents to break out from a walled garden model. The likelihood of such development would increase if decentralized networks get popular, and the incumbents feel they need to participate to not lose out.</p>
<h2 id="past-and-future"><a href="#past-and-future" class="header-anchor"></a>Past and future
</h2><p>The idea of decentralized social media is not new. One early pioneer, identi.ca, launched in 2008, only two years after Twitter, using the OStatus protocol to promote decentralization. A few years later it evolved into pump.io with the ActivityPump protocol, and also forked into GNU Social, which continued with OStatus. I remember when these happened, and that Diaspora also launched in 2010 with fairly large publicity. Surprisingly, both of these still operate (I can still post both on <a class="link" href="https://identi.ca/otto" target="_blank" rel="noopener"
>identi.ca</a> and <a class="link" href="https://diasp.org/people/4d1b3973ec26e266b50005e7" target="_blank" rel="noopener"
>diasp.org</a>), but the activity fizzled out years ago. <strong>The protocol however survived partially and evolved into ActivityPub, which is now the backbone of the Fediverse.</strong></p>
<p>The evolution of decentralized social media over the next decade will likely parallel developments in democracy, freedom of speech and public discourse. While the early 2010s emphasized maximum independence and freedom, the late 2010s saw growing support for content moderation to combat misinformation. The AI era introduces new challenges, potentially requiring proof-of-humanity verification for content authenticity.</p>
<p>Key factors that will determine success:</p>
<ul>
<li>User experience and ease of onboarding</li>
<li>Network effects and critical mass of users</li>
<li>Integration with existing web3 infrastructure</li>
<li>Balance between decentralization and usability</li>
<li>Sustainable economic models for infrastructure</li>
</ul>
<p>This is clearly an area of development worth monitoring closely, as the next few years may determine which protocol becomes the de facto standard for decentralized social communication.</p> 10 habits to help becoming a Debian maintainer https://optimizedbyotto.com/post/debian-maintainer-habits/Sun, 26 Jan 2025 00:00:00 +0000 https://optimizedbyotto.com/post/debian-maintainer-habits/ <img src="https://optimizedbyotto.com/post/debian-maintainer-habits/featured-image.jpg" alt="Featured image of post 10 habits to help becoming a Debian maintainer" /><p>Becoming a <a class="link" href="https://www.debian.org/devel/join/index.en.html#joining" target="_blank" rel="noopener"
>Debian maintainer</a> is a journey that combines technical expertise, community collaboration, and continuous learning. In this post, I’ll share 10 key habits that will both help you navigate the complexities of Debian packaging without getting lost and enable you to contribute more effectively to one of the world’s largest open source projects.</p>
<h2 id="1-read-and-re-read-the-debian-policy-the-developers-reference-and-the-git-buildpackage-manual"><a href="#1-read-and-re-read-the-debian-policy-the-developers-reference-and-the-git-buildpackage-manual" class="header-anchor"></a>1. Read and re-read the Debian Policy, the Developer’s Reference and the git-buildpackage manual
</h2><p>Anyone learning Debian packaging and aspiring to become a Debian maintainer is likely to wade through a lot of documentation, only to realize that much of it is outdated or sometimes outright incorrect.</p>
<p>Therefore, it is important to learn right from the start which sources are the most reliable and truly worth reading and re-reading. I recommend these documents, in order of importance:</p>
<ul>
<li><a class="link" href="https://www.debian.org/doc/debian-policy/" target="_blank" rel="noopener"
>The Debian Policy Manual</a>: Describes the structure of the operating system, the package archive, and requirements for packages to be included in the Debian archive.</li>
<li><a class="link" href="https://www.debian.org/doc/manuals/developers-reference/developers-reference.en.html" target="_blank" rel="noopener"
>The Developer’s Reference</a>: A collection of best practices and process descriptions Debian packagers are expected to follow while interacting with one another.</li>
<li><a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/index.html" target="_blank" rel="noopener"
>The git-buildpackage man pages</a>: While the Policy focuses on the end result and is intentionally void of practical instructions on creating or maintaining Debian packages, the Developer’s Reference goes into greater detail. However, it too lacks step-by-step instructions. For the exact commands, consult the man pages of git-buildpackage and its subcommands (e.g., <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-clone.1.en.html" target="_blank" rel="noopener"
><code>gbp clone</code></a>, <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-import-orig.1.en.html" target="_blank" rel="noopener"
><code>gbp import-orig</code></a>, <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-pq.1.en.html" target="_blank" rel="noopener"
><code>gbp pq</code></a>, <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp-dch.1.en.html" target="_blank" rel="noopener"
><code>gbp dch</code></a>, <code>gbp push</code>). See also my post on <a class="link" href="https://optimizedbyotto.com/post/debian-source-package-git/" >Debian source package Git branch and tags</a> for easy-to-understand diagrams.</li>
</ul>
<h2 id="2-make-reading-man-pages-a-habit"><a href="#2-make-reading-man-pages-a-habit" class="header-anchor"></a>2. Make reading man pages a habit
</h2><p>In addition to the above, try to make a habit of <strong>checking out the man page of every new tool</strong> you use to ensure you are using it as intended.</p>
<p>The best place to read accurate and up-to-date documentation is <a class="link" href="https://manpages.debian.org" target="_blank" rel="noopener"
>manpages.debian.org</a>. The manual pages are maintained alongside the tools by their developers, ensuring greater accuracy than any third-party documentation.</p>
<p>If you are using a tool in the way the tool author documented, you can be confident you are doing the right thing, even if it wasn’t explicitly mentioned in some third-party guide about Debian packaging best practices.</p>
<h2 id="3-read-and-write-emails"><a href="#3-read-and-write-emails" class="header-anchor"></a>3. Read and write emails
</h2><p>While members of the Debian community have many channels of communication, the <a class="link" href="https://lists.debian.org/completeindex.html" target="_blank" rel="noopener"
>mailing lists</a> are by far the most prominent. Asking questions on the appropriate list is a good way to get current advice from other people doing Debian packaging. Staying subscribed to lists of interest is also a good way to read about new developments as they happen.</p>
<p>Note that every post is public and archived permanently, so the discussions on <strong>the mailing lists also form a body of documentation</strong> that can later be searched and referred to.</p>
<p>Regularly writing short and well-structured emails on the mailing lists is <a class="link" href="https://optimizedbyotto.com/post/efficient-communication-software-engineering-org/" >great practice for improving technical communication skills</a> — a useful ability in general. For Debian specifically, being active on mailing lists helps build a reputation that can later attract collaborators and supporters for more complex initiatives.</p>
<h2 id="4-create-and-use-an-openpgp-key"><a href="#4-create-and-use-an-openpgp-key" class="header-anchor"></a>4. Create and use an OpenPGP key
</h2><p>Related to reputation and identity, OpenPGP keys play a central role in the Debian community. OpenPGP is used to various degrees to sign Git commits and tags, sign and encrypt email, and — most importantly — to sign Debian packages so their origin can be verified.</p>
<p>The process of becoming a Debian Maintainer and eventually a Debian Developer culminates in getting your OpenPGP key included in the Debian keyring, which is used to control who can upload packages into the Debian archive.</p>
<p>The earlier you create a key and start signing your work with it, the sooner that specific key starts accumulating reputation, so create one early. Note that due to a recent <a class="link" href="https://lwn.net/Articles/953797/" target="_blank" rel="noopener"
>schism</a> in the OpenPGP standards working group, it is safest to create an OpenPGP key using GnuPG version 2.2.x (not 2.4.x), or using <a class="link" href="https://sequoia-pgp.org/" target="_blank" rel="noopener"
>Sequoia-PGP</a>.</p>
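<p>As a minimal sketch of what key creation can look like with GnuPG 2.2+ (the name and email are placeholders, and the throwaway <code>GNUPGHOME</code> keeps the demo out of your real keyring):</p>

```shell
# Sketch: non-interactive key generation with GnuPG 2.2+.
# "Your Name <you@example.com>" is a placeholder identity.
export GNUPGHOME="$(mktemp -d)"   # throwaway keyring for this demo
chmod 700 "$GNUPGHOME"
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'Your Name <you@example.com>' default default never
gpg --list-keys                   # shows the newly created key
```

<p>For real Debian work you would of course protect the key with a strong passphrase and keep an offline backup of the private key.</p>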
<h2 id="5-integrate-salsa-ci-in-all-work"><a href="#5-integrate-salsa-ci-in-all-work" class="header-anchor"></a>5. Integrate Salsa CI in all work
</h2><p>One reason Debian remains popular, even 30 years after its inception, is due to its culture of maintaining high standards. For a newcomer, learning all the quality assurance tools such as Lintian, Piuparts, Adequate, various build variations, and reproducible builds may be overwhelming. However, these tasks are easier to manage thanks to Salsa CI, the continuous integration pipeline in Debian that runs tests on every commit at <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a>.</p>
<p>The earlier you <a class="link" href="https://salsa.debian.org/salsa-ci-team/pipeline" target="_blank" rel="noopener"
>activate Salsa CI</a> in the package repository you are working on, the faster you will achieve high quality in your package with fewer missteps. You can also further <a class="link" href="https://optimizedbyotto.com/post/gitlab-mariadb-debian/" >customize a package-specific <code>salsa-ci.yml</code></a> to have more testing coverage.</p>
<p><img src="https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-example.png"
loading="lazy"
alt="Example Salsa CI pipeline with customizations"
>
</p>
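<p>Activating the pipeline typically amounts to a tiny <code>debian/salsa-ci.yml</code> that includes the shared recipe, which you can then extend with package-specific variables. The file below is the commonly documented starting point, not a complete configuration:</p>

```yaml
# debian/salsa-ci.yml — pull in the standard Salsa CI pipeline
include:
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml
```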
<h2 id="6-fork-on-salsa-and-use-draft-merge-requests-to-solicit-feedback"><a href="#6-fork-on-salsa-and-use-draft-merge-requests-to-solicit-feedback" class="header-anchor"></a>6. Fork on Salsa and use draft Merge Requests to solicit feedback
</h2><p>All modern Debian packages are hosted on <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a>. If you want to make a change to any package, it is easy to fork, make an initial attempt at the change, and publish it as a <strong>draft Merge Request</strong> (MR) on Salsa to solicit feedback.</p>
<p>People might have surprising reasons to object to the change you propose, or they might need time to get used to the idea before agreeing to it. Also, some people might object to a vague idea out of suspicion but agree once they see the exact implementation. There may also be a surprising number of people supporting your idea, and if there is an MR, they have a place to show their support.</p>
<p><strong>Don’t expect every Merge Request to be accepted.</strong> However, proposing an idea as running code in an MR is far more effective than raising the idea on a mailing list or in a bug report. Get into the habit of publishing plenty of merge requests to solicit feedback and drive discussions toward consensus.</p>
<h2 id="7-use-git-rebase-frequently"><a href="#7-use-git-rebase-frequently" class="header-anchor"></a>7. Use git rebase frequently
</h2><p>Linear Git history is much easier to read. The ease of reading <code>git log</code> and <code>git blame</code> output is vital in Debian, where packages often have updates from multiple people spanning many years — even decades. Debian packagers likely spend more time than the average software developer reading Git history.</p>
<p>Make sure you master <a class="link" href="https://optimizedbyotto.com/post/advanced-git-commands/" >Git commands</a> such as <code>gitk --all</code>, <code>git citool --amend</code>, <code>git commit -a --fixup <commit id></code>, <code>git rebase -i --autosquash <target branch></code>, <code>git cherry-pick <commit id 1> <id 2> <id 3></code>, and <code>git pull --rebase</code>.</p>
<p>If rebasing is not done on your initiative, rest assured others will ask you to do it. Thus, if the commands above are familiar, rebasing will be quick and easy for you.</p>
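<p>The <code>--fixup</code>/<code>--autosquash</code> pair from the list above can be tried safely in a throwaway repository. A sketch (the commit messages and file names are made up for the demo):</p>

```shell
# Demo of git commit --fixup and git rebase --autosquash in a temp repo.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo one > a.txt && git add a.txt && git commit -q -m "Add feature"
echo two > b.txt && git add b.txt && git commit -q -m "Unrelated work"
# Correct a mistake in "Add feature" without a noisy "oops" commit:
echo fixed > a.txt
git commit -a -q --fixup=HEAD^    # records a "fixup! Add feature" commit
# Replay history non-interactively; the fixup folds into its target:
GIT_SEQUENCE_EDITOR=: git rebase -i --autosquash --root
git log --oneline                 # linear history, no fixup! commits left
```

<p>The same flow works against any target commit: <code>git commit -a --fixup &lt;commit id&gt;</code> followed by <code>git rebase -i --autosquash</code> onto the branch point.</p>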
<h2 id="8-reviews-give-some-get-some"><a href="#8-reviews-give-some-get-some" class="header-anchor"></a>8. Reviews: give some, get some
</h2><p>In open source, the larger a project becomes, the more it attracts contributions, and the bottleneck for its growth isn’t how much code developers can write but how many code submissions can be properly reviewed.</p>
<p>At the time of writing, the <a class="link" href="https://salsa.debian.org/groups/debian/-/merge_requests" target="_blank" rel="noopener"
>main Salsa group “Debian” has over 800 open merge requests</a> pending reviews and approvals. Feel free to read and comment on any merge request you find. <a class="link" href="https://optimizedbyotto.com/post/how-to-code-review/" >You don’t have to be a subject matter expert</a> to provide valuable feedback. Even if you don’t have specific feedback, your comment as another human acknowledging that you read the MR and found no issues is viewed positively by the author. Besides, if you spend enough time reviewing MRs in a specific domain, you will eventually become an expert in it. <strong>Code reviews are not just about providing feedback to the submitter; they are also great learning opportunities for the reviewer.</strong></p>
<p>As a rule of thumb, you should review at least twice as many merge requests as you submit yourself.</p>
<h2 id="9-improve-debian-by-improving-upstream"><a href="#9-improve-debian-by-improving-upstream" class="header-anchor"></a>9. Improve Debian by improving upstream
</h2><p>It is common that bugs are uncovered and patched while packaging software for Debian. <strong>Do not forget to submit the fixes upstream</strong>, and add a <code>Forwarded</code> field to the patch file in <code>debian/patches</code>! As the person building and packaging something in Debian, you automatically become an authority on that software, and the upstream is likely glad to receive your improvements.</p>
<p>While submitting patches upstream is a bit of work initially, getting improvements merged upstream eventually saves time for everyone and makes packaging in Debian easier, as there will be fewer patches to maintain with each new upstream release.</p>
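<p>The <code>Forwarded</code> field lives in the DEP-3 header at the top of the patch file. A hypothetical example (the description, author, and URL are invented for illustration):</p>

```
Description: Fix build failure on riscv64
 The upstream build script assumes an x86-64 host; detect the
 architecture instead. (Hypothetical example patch.)
Author: Your Name <you@example.com>
Forwarded: https://github.com/example/foo/pull/42
Last-Update: 2025-01-26
```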
<h2 id="10-dont-hold-any-habits-too-firmly"><a href="#10-dont-hold-any-habits-too-firmly" class="header-anchor"></a>10. Don’t hold any habits too firmly
</h2><p>Last but not least: Once people learn a specific way of working, they tend to stick to it for decades. Learning how to create and maintain Debian packages requires significant effort, and people tend to stop learning once they feel they’ve reached a sufficient level. <strong>This tendency to get stuck in a “local optimum” is understandable and natural, but try to resist it.</strong></p>
<p>It is likely that better techniques will evolve over time, so stay humble and re-evaluate your beliefs and practices every few years.</p>
<p>Mastering these habits takes time, but each small step brings you closer to making a meaningful impact on Debian. By staying curious, collaborative, and adaptable, you can ensure your contributions stand the test of time — just like Debian itself. Good luck on your journey toward becoming a Debian Maintainer!</p> Debian source packages in git explained https://optimizedbyotto.com/post/debian-source-package-git/Thu, 09 Jan 2025 00:00:00 +0000 https://optimizedbyotto.com/post/debian-source-package-git/ <img src="https://optimizedbyotto.com/post/debian-source-package-git/debian-git-non-native-package.svg" alt="Featured image of post Debian source packages in git explained" /><p>Most people with Linux experience have at some point installed <code>.deb</code> files on Debian or the more famous variant of it, Ubuntu. Programmers who have been involved with packaging and shipping software know that the code that generates those <code>.deb</code> packages is always in the <code>debian/</code> subdirectory in a software project. However, anyone who has tried to do Debian packaging also knows that all the automation involved can be challenging to grasp, and building packages, modifying packaging files, and repeatedly rebuilding them can feel way more frustrating than iterating in regular software development. As Debian has been around for three decades already, there is a lot of online documentation available, but unfortunately, most of it is outdated, and reading about old tools might just add to the confusion.</p>
<p>Thus, let me offer an explainer of what the structure of a well-maintained Debian source package should look like in 2025, and what benefits it brings. First, I’ll run through the basics to ensure all readers have them fresh in mind, and further down I dig into the more complex details of how Debian source packaging works and why the packages have a particular git repository structure.</p>
<h2 id="native-vs-non-native-package-and-the-role-of-debian-revisions-of-form-10-1"><a href="#native-vs-non-native-package-and-the-role-of-debian-revisions-of-form-10-1" class="header-anchor"></a>Native vs non-native package and the role of Debian revisions of form 1.0-1
</h2><p>The first thing to decide when working on a Debian package is whether it should be a so-called <em>native</em> package or a <em>non-native</em> package. In Debian, a native package means that the Debian packaging is developed together with the software to be packaged. So if the upstream project <em>Foo</em> releases version <em>1.0</em> and the Debian packages are released simultaneously as part of it, the packages would be native and also have the version <em>1.0</em>.</p>
<p>A non-native package, on the other hand, means that the Debian package was created separately from the upstream project. A key difference between native and non-native is that the <strong>native package contents cannot diverge from the upstream release contents</strong>, as they are by definition the same single entity. <strong>A non-native package, however, <em>can</em> diverge from upstream, and it can have multiple releases of its own, with designated Debian revisions.</strong> So for the example <em>Foo 1.0</em>, the non-native Debian packages could be <em>1.0-1</em>, <em>1.0-2</em> and so forth, and they can contain custom patches that modify the behavior of <em>Foo</em> so it runs better, or is at least more suitable for users, in the view of the author of the Debian packaging.</p>
<p>This distinction is also key to understanding why the Debian packaging repositories have additional complexity. <strong>Non-native packages need a clear separation from the upstream they package, which leads to the need for separate branches, patches, and git tags</strong> to signify which commit was the Debian revision, as distinct from the upstream release.</p>
<h2 id="native-packages-are-simple-one-git-branch-one-debian-subdirectory"><a href="#native-packages-are-simple-one-git-branch-one-debian-subdirectory" class="header-anchor"></a>Native packages are simple: one git branch, one <code>debian/</code> subdirectory
</h2><p>For a native package, the story is simple. If the project is developed on, say, branch <code>main</code>, then the native Debian packaging is done on that branch in the subdirectory <code>debian/</code>. To build the native <code>.deb</code> packages of a specific release, just <code>git checkout</code> the project sources from the release git tag, for example, <code>v1.0</code>, and build with <a class="link" href="https://manpages.debian.org/unstable/dpkg-dev/dpkg-buildpackage.1.en.html" target="_blank" rel="noopener"
>dpkg-buildpackage</a> (or some other tool that wraps around it with additional build environment management).</p>
<pre class="mermaid">gitGraph:
checkout main
commit id: "Foo project 1.0" tag: "v1.0" tag: "debian/1.0"
commit id: "Foo commit a1"
commit id: "Foo commit b2"
commit id: "Foo project 1.1" tag: "v1.1" tag: "debian/1.1"
commit id: "Foo commit c3"
commit id: "Foo commit d4"
</pre>
<p>This is easy to grasp, as there are no additional abstractions layered on top of this. Native packages are, however, quite rare and limited in practice to software projects where the developers and the Debian packagers are the same set of people. <strong>If you are not the upstream of a project yourself, you cannot do <em>native</em> Debian packages.</strong></p>
<h2 id="non-native-packages-multiple-branches-patches-inside-git"><a href="#non-native-packages-multiple-branches-patches-inside-git" class="header-anchor"></a>Non-native packages: multiple branches, patches inside git
</h2><p>For a non-native package, things are more complex. <strong>First of all, the non-native packaging needs a branch of its own to live on.</strong> In 2025, modern Debian packages use the branch name <code>debian/latest</code> for all commits that improve the packaging. When a release is made from <code>debian/latest</code>, the commit will get a git tag of form <code>debian/1.0-1</code> (note the Debian revision <code>-1</code>).</p>
<p><strong>Second, the non-native packaging branch commits will only ever modify files in the <code>debian/</code> subdirectory.</strong> Maintaining clear separation between what is upstream code and what is a modification done in the Debian packaging is crucial both for security and supply chain auditability reasons (as explained in detail later), and also for ensuring that long-term maintenance is straightforward by <strong>avoiding downstream and upstream changes getting mingled and mixed up</strong>.</p>
<p>If the non-native Debian package needs to have some upstream code slightly changed, for example, to make the software build correctly on an architecture untested by upstream, a patch file would be added in the directory <code>debian/patches/</code>, which <code>dpkg-source</code> (one of the tools that <code>dpkg-buildpackage</code> automatically runs) then applies to the upstream code at build time. However, the Debian packaging branches never permanently commit any changes to upstream code, i.e., to files outside the <code>debian/</code> directory.</p>
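<p>A minimal sketch of that layout (the patch name is invented for illustration): each patch is a plain diff under <code>debian/patches/</code>, and <code>debian/patches/series</code> lists the patch file names to apply, one per line, in order:</p>

```
# debian/patches/series
fix-build-on-riscv64.patch
```

<p>Tools like <code>gbp pq</code> can generate and refresh the patch files and the series file from git commits, so you rarely need to edit them entirely by hand.</p>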
<h2 id="from-upstream-release-branch-to-upstream-import-branch-to-debian-packaging-branch"><a href="#from-upstream-release-branch-to-upstream-import-branch-to-debian-packaging-branch" class="header-anchor"></a>From upstream release branch to upstream import branch to Debian packaging branch
</h2><p>In addition to the <code>main</code> and <code>debian/latest</code> branches already explained, there is a <strong>third branch</strong> called <code>upstream/latest</code> which acts as the target branch for upstream imports. If the upstream source code contains files that are not acceptable in Debian (and usually listed in the <code>debian/copyright</code> file under the <a class="link" href="https://manpages.debian.org/unstable/devscripts/uscan.1.en.html#COPYRIGHT_FILE_EXAMPLES" target="_blank" rel="noopener"
><code>Files-Excluded</code></a> section), they would be purged on this branch before this import branch gets merged into the Debian packaging branch.</p>
<pre class="mermaid">gitGraph:
checkout main
commit id: "Foo project 1.0" tag: "v1.0"
branch upstream/latest
commit id: "New upstream version 1.0" tag: "upstream/1.0"
branch debian/latest
commit id: "Update upstream source from tag 'upstream/1.0'"
commit id: "Create initial Debian packaging"
commit id: "Update changelog for 1.0-1 release into unstable" tag: "debian/1.0-1"
checkout main
commit id: "Foo commit a1"
commit id: "Foo commit b2"
commit id: "Foo project 1.1" tag: "v1.1"
checkout upstream/latest
merge main id: "New upstream version 1.1" tag: "upstream/1.1"
checkout debian/latest
merge upstream/latest id: "Update upstream source from tag 'upstream/1.1'"
commit id: "Update changelog and refresh patches after 1.1 import"
commit id: "Debian commit 1a"
commit id: "Debian commit 2b"
commit id: "Update changelog for 1.1-1 release into unstable" tag: "debian/1.1-1"
checkout main
commit id: "Foo commit c3"
</pre>
<p>In the rare case that an <em>upstream software project is not using git</em> for version control, the upstream release branch (e.g., <code>main</code>) would not exist at all, and the <code>upstream/latest</code> would only contain synthetic commits made by the Debian packager by importing the upstream <code>.tar.gz</code> source package releases or equivalent.</p>
<pre class="mermaid">%%{init: { 'gitGraph': { 'mainBranchName': 'upstream/latest' } } }%%
gitGraph:
checkout upstream/latest
commit id: "New upstream version 1.0" tag: "upstream/1.0"
branch debian/latest
commit id: "Update upstream source from tag 'upstream/1.0'"
commit id: "Create initial Debian packaging"
commit id: "Update changelog for 1.0-1 release into unstable" tag: "debian/1.0-1"
checkout upstream/latest
commit id: "New upstream version 1.1" tag: "upstream/1.1"
checkout debian/latest
merge upstream/latest id: "Update upstream source from tag 'upstream/1.1'"
commit id: "Update changelog and refresh patches after 1.1 import"
commit id: "Debian commit 1a"
commit id: "Debian commit 2b"
commit id: "Update changelog for 1.1-1 release into unstable" tag: "debian/1.1-1"
</pre>
<p>Some packages might import <strong>both</strong> from the upstream release git tag and the upstream source tarball release to ensure maximum supply chain auditability.</p>
<p>There are several steps involved, but the whole process is automated. The developer doing Debian packaging would normally just run <code>gbp import-orig --uscan</code> and everything else would be automatic. In a correctly configured Debian packaging repository, the <a class="link" href="https://manpages.debian.org/unstable/git-buildpackage/gbp.1.en.html" target="_blank" rel="noopener"
>gbp</a> tool knows what the upstream git remote address is, and what is the form of release git tags to look for. In most packages, the gbp tool uses <a class="link" href="https://manpages.debian.org/unstable/devscripts/uscan.1.en.html" target="_blank" rel="noopener"
>uscan</a> to get the upstream release tarball and, when available, also verify its signature. Configuring it all is a bit complex, because there are multiple files to edit (mainly <code>debian/control</code>, <code>debian/changelog</code>, <code>debian/copyright</code>, <code>debian/watch</code>, <code>debian/gbp.conf</code>, <code>debian/upstream/metadata</code> and <code>debian/upstream/signing-key.asc</code>), but it is a one-off effort that won’t need re-doing unless upstream changes how new releases are done.</p>
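<p>As one concrete piece of that configuration, a typical <code>debian/gbp.conf</code> for the branch layout described here could look like the following sketch (exact values vary per package):</p>

```
# debian/gbp.conf
[DEFAULT]
debian-branch = debian/latest
upstream-branch = upstream/latest
upstream-tag = upstream/%(version)s
pristine-tar = True
```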
<h2 id="verifying-upstream-release-signatures-with-openpgp-and-pristine-tar"><a href="#verifying-upstream-release-signatures-with-openpgp-and-pristine-tar" class="header-anchor"></a>Verifying upstream release signatures with OpenPGP and pristine-tar
</h2><p>For a complete audit trail, three branches aren’t actually enough yet. There is also a fourth branch named <code>pristine-tar</code>. This branch is never merged on any other branch. Its only purpose is to hold the xdelta data files, which combined with the git repository contents can be used to reconstruct the exact upstream release tarball (if the format is <a class="link" href="https://manpages.debian.org/unstable/pristine-tar/pristine-tar.1.en.html#LIMITATIONS" target="_blank" rel="noopener"
>supported by pristine-tar</a>).</p>
<p>Being able to reproduce the bit-by-bit exact upstream release tarball out of the git repository is important so that tarball signatures can be verified and source authenticity checked. This naturally works only for upstreams that publish signed release tarballs (typically releasing <code>.asc</code> files along with their tarballs).</p>
<p>Upstreams might also sign their release git tags, but currently git-buildpackage does not support checking them automatically for authenticity (see <a class="link" href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=839866" target="_blank" rel="noopener"
>Bug#839866</a>).</p>
<h2 id="maintaining-multiple-generations-of-a-package-across-several-debian-and-ubuntu-releases-or-other-downstreams"><a href="#maintaining-multiple-generations-of-a-package-across-several-debian-and-ubuntu-releases-or-other-downstreams" class="header-anchor"></a>Maintaining multiple generations of a package across several Debian and Ubuntu releases (or other downstreams)
</h2><p>Debian (and Ubuntu) releases are very conservative. Once a release is made, updates to the packages are not accepted unless there are serious bugs or security issues that absolutely need to be fixed. If maintenance releases are made, they would all branch off from the <code>debian/latest</code> branch at the commit that was the last upload into the respective release. For example, if <em>Foo 1.0-1</em> in Debian Bookworm needs to have an update, a new branch <code>debian/bookworm</code> would be branched off tag <code>debian/1.0-1</code> and released as <em>Foo 1.0-1+deb12u1</em> and tagged in git as <code>debian/1.0-1+deb12u1</code>.</p>
<pre class="mermaid">gitGraph:
checkout main
commit id: "Foo project 1.0" tag: "v1.0"
branch upstream/latest
commit id: "New upstream version 1.0" tag: "upstream/1.0"
branch debian/latest
commit id: "Update upstream source from tag 'upstream/1.0'"
commit id: "Create initial Debian packaging"
commit id: "Update changelog for 1.0-1 release into unstable" tag: "debian/1.0-1"
branch debian/bookworm
commit id: "Backport fix 1a for Bookworm"
commit id: "Update changelog for 1.0-1+deb12u1 release into bookworm" tag: "debian/1.0-1+deb12u1"
checkout main
commit id: "Foo commit a1"
commit id: "Foo commit b2"
commit id: "Foo project 1.1" tag: "v1.1"
checkout upstream/latest
merge main id: "New upstream version 1.1" tag: "upstream/1.1"
checkout debian/latest
merge upstream/latest id: "Update upstream source from tag 'upstream/1.1'"
commit id: "Update changelog and refresh patches after 1.1 import"
commit id: "Update changelog for 1.1-1 release into unstable" tag: "debian/1.1-1"
checkout main
commit id: "Foo commit c3"
</pre>
<p>Note that these maintenance releases always use semantically lower versions than the latest version on the <code>debian/latest</code> branch. This ensures that any system with the latest maintenance-branch version installed will, on a full system upgrade, pick up the newer version from the next Debian release.</p>
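<p>On a Debian system, <code>dpkg --compare-versions</code> is the authoritative way to check this ordering; as a quick portable illustration, GNU <code>sort -V</code> orders the example versions from this post the same way:</p>

```shell
# A maintenance upload must compare lower than the next release's version,
# so a full upgrade replaces it. GNU sort -V illustrates the ordering:
printf '%s\n' '1.1-1' '1.0-1+deb12u1' '1.0-1' | sort -V
# On Debian, the authoritative check (exit status 0 means "true"):
# dpkg --compare-versions '1.0-1+deb12u1' lt '1.1-1' && echo yes
```

<p>The <code>+deb12u1</code> suffix sorts above its base <code>1.0-1</code> but still below <code>1.1-1</code>, which is exactly the property the maintenance branches rely on.</p>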
<h2 id="try-it-yourself-example-repository-galera-4-demo"><a href="#try-it-yourself-example-repository-galera-4-demo" class="header-anchor"></a>Try it yourself: example repository galera-4-demo
</h2><p>To fully grasp step-by-step what is happening, try doing a new upstream version import on a Debian package yourself, using the example package repository prepared specifically for this exercise.</p>
<p>You don’t need to be a Debian (or Ubuntu) expert to test this. Any Linux user can easily start a Debian unstable (sid) Linux container by running <code>podman run --interactive --network host --tty --rm --shm-size=1G -e DISPLAY=$DISPLAY --volume=$PWD:/tmp/run --workdir=/tmp/run debian:sid bash</code> and then inside the container install the required software, configure temporary dummy settings, and run the import. Passing the <code>DISPLAY</code> will allow launching graphical programs from the container, and mounting the current host system path as a volume inside the container will ensure the created git repository will persist after exiting the container.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">apt update && apt install --yes --no-install-recommends git-buildpackage pristine-tar git-gui gitk
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
export DEBEMAIL="you@example.com"
gbp clone --add-upstream-vcs https://salsa.debian.org/otto/galera-4-demo.git
cd galera-4-demo
gitk --all &</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>apt update <span style="color:#f92672">&&</span> apt install --yes --no-install-recommends git-buildpackage pristine-tar git-gui gitk
</span></span><span style="display:flex;"><span>git config --global user.name <span style="color:#e6db74">"Your Name"</span>
</span></span><span style="display:flex;"><span>git config --global user.email <span style="color:#e6db74">"you@example.com"</span>
</span></span><span style="display:flex;"><span>export DEBEMAIL<span style="color:#f92672">=</span><span style="color:#e6db74">"you@example.com"</span>
</span></span><span style="display:flex;"><span>gbp clone --add-upstream-vcs https://salsa.debian.org/otto/galera-4-demo.git
</span></span><span style="display:flex;"><span>cd galera-4-demo
</span></span><span style="display:flex;"><span>gitk --all &</span></span></code></pre></div></div></div>
<p>Note in the <code>gitk</code> window what it tells about the git commits, branches, and tags. Then proceed to run the import command:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">gbp import-orig --uscan --no-sign-tags
gbp:info: Launching uscan...
Newest version of galera-4 on remote site is 26.4.21, local version is 26.4.20
=> Newer package available from:
=> https://releases.galeracluster.com/galera-4/source/galera-4-26.4.21.tar.gz
gpgv: Signature made Tue Nov 26 20:07:41 2024 +00:00
gpgv: using RSA key 3D53839A70BC938B08CDD47F45460A518DA84635
gpgv: Good signature from "Codership Oy (Codership Signing Key) &lt;info@galeracluster.com>"
Leaving ../galera-4_26.4.21.orig.tar.gz where it is.
gbp:info: Using uscan downloaded tarball ../galera-4_26.4.21.orig.tar.gz
gbp:info: Importing '../galera-4_26.4.21.orig.tar.gz' to branch 'upstream/latest'...
gbp:info: Source package is galera-4
gbp:info: Upstream version is 26.4.21
gbp:info: Replacing upstream source on 'debian/latest'
gbp:info: Running Postimport hook
gbp:info: Successfully imported version 26.4.21 of ../galera-4_26.4.21.orig.tar.gz</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gbp import-orig --uscan --no-sign-tags
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>gbp:info: Launching uscan...
</span></span><span style="display:flex;"><span>Newest version of galera-4 on remote site is 26.4.21, local version is 26.4.20
</span></span><span style="display:flex;"><span> <span style="color:#f92672">=</span>> Newer package available from:
</span></span><span style="display:flex;"><span> <span style="color:#f92672">=</span>> https://releases.galeracluster.com/galera-4/source/galera-4-26.4.21.tar.gz
</span></span><span style="display:flex;"><span>gpgv: Signature made Tue Nov <span style="color:#ae81ff">26</span> 20:07:41 <span style="color:#ae81ff">2024</span> +00:00
</span></span><span style="display:flex;"><span>gpgv: using RSA key 3D53839A70BC938B08CDD47F45460A518DA84635
</span></span><span style="display:flex;"><span>gpgv: Good signature from <span style="color:#e6db74">"Codership Oy (Codership Signing Key) &lt;info@galeracluster.com>"</span>
</span></span><span style="display:flex;"><span>Leaving ../galera-4_26.4.21.orig.tar.gz where it is.
</span></span><span style="display:flex;"><span>gbp:info: Using uscan downloaded tarball ../galera-4_26.4.21.orig.tar.gz
</span></span><span style="display:flex;"><span>gbp:info: Importing <span style="color:#e6db74">'../galera-4_26.4.21.orig.tar.gz'</span> to branch <span style="color:#e6db74">'upstream/latest'</span>...
</span></span><span style="display:flex;"><span>gbp:info: Source package is galera-4
</span></span><span style="display:flex;"><span>gbp:info: Upstream version is 26.4.21
</span></span><span style="display:flex;"><span>gbp:info: Replacing upstream source on <span style="color:#e6db74">'debian/latest'</span>
</span></span><span style="display:flex;"><span>gbp:info: Running Postimport hook
</span></span><span style="display:flex;"><span>gbp:info: Successfully imported version 26.4.21 of ../galera-4_26.4.21.orig.tar.gz</span></span></code></pre></div></div></div>
<p>After successfully running the import, switch focus to the <code>gitk</code> window and press <code>F5</code> to reload the view and see how all branches and tags look after the import.</p>
<p><img src="https://optimizedbyotto.com/post/debian-source-package-git/debian-git-galera-demo-after-import.png"
width="918"
height="585"
srcset="https://optimizedbyotto.com/post/debian-source-package-git/debian-git-galera-demo-after-import_hu13840403377005627032.png 480w, https://optimizedbyotto.com/post/debian-source-package-git/debian-git-galera-demo-after-import.png 918w"
loading="lazy"
alt="Demo Debian repository right after gbp import-orig --uscan"
class="gallery-image"
data-flex-grow="156"
data-flex-basis="376px"
>
</p>
<h2 id="managing-patches-with-gbp-pq"><a href="#managing-patches-with-gbp-pq" class="header-anchor"></a>Managing patches with <code>gbp pq</code>
</h2><p>As explained above, Debian has good reasons to carry patch files in the <code>debian/patches</code> subdirectory separately from upstream code, and to apply them only at build time. These files are cumbersome to edit manually, and they really shouldn’t be edited by hand. Instead, the far superior way is to use <code>gbp pq</code>, which converts the patches into a temporary branch where each patch is a single commit. Package maintainers can then use regular git commands to rebase, amend and cherry-pick those commits and test that builds work.</p>
<p>To activate this branch, simply run <code>gbp pq switch --force</code>. The <code>--force</code> ensures that any existing patches-applied branch is automatically deleted and overwritten, in case one was left around from a previous <code>gbp pq</code> session. When all the commits on the temporary branch are final, convert them back into <code>debian/patches</code> contents on the packaging branch by running <code>gbp pq export --drop --commit</code>.</p>
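<p>Conceptually, <code>gbp pq</code> round-trips between patch files and commits. The following scratch-repo sketch mimics that round trip with plain git only; it does not use <code>gbp</code> itself, and the repository, file and commit names are purely illustrative:</p>

```shell
# Sketch of the gbp pq idea using plain git in a throwaway repository:
# one commit on a temporary branch corresponds to one patch file.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b upstream
git config user.email demo@example.com
git config user.name "Demo Maintainer"
echo 'original' > file.txt
git add file.txt
git commit -q -m 'Upstream code'
git checkout -q -b patch-queue             # roughly: gbp pq switch --force
echo 'fixed' > file.txt
git commit -q -am 'Fix crash on startup'   # edit patches as normal commits
git format-patch -q upstream -o patches/   # roughly: gbp pq export
ls patches/                                # -> 0001-Fix-crash-on-startup.patch
```

<p>With the real tool, the export step also maintains <code>debian/patches/series</code> and can commit the result in one go via <code>gbp pq export --drop --commit</code>.</p>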
<h2 id="managing-the-debianchangelog-with-gbp-dch"><a href="#managing-the-debianchangelog-with-gbp-dch" class="header-anchor"></a>Managing the <code>debian/changelog</code> with <code>gbp dch</code>
</h2><p>The git log is of course all that developers need for working with git. However, when the package is eventually built and shipped, the end users won’t see any git repositories but only the end result - the package and its files. Therefore Debian packagers maintain an extra <code>debian/changelog</code> file that summarizes to end users what changed in each Debian release.</p>
<p>Maintaining the changelog is greatly simplified with the following two commands. Immediately after a new upstream import, one would typically run <code>gbp dch --distribution=UNRELEASED -- debian/</code> to put the changelog in an <em>UNRELEASED</em> state, signaling to other packagers that the next upload is still being prepared. The <code>-- debian/</code> at the end is important: it tells git-buildpackage that only changes to the Debian packaging should be considered and all upstream changes ignored.</p>
<p>Later, right before the upload, one would typically run <code>gbp dch --release --commit</code> to finalize the changelog based on the git commit log entries.</p>
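<p>For reference, the file these commands manage follows Debian’s standard changelog format. A hand-written sketch of an entry in the unreleased state (the package name and maintainer below are hypothetical; in practice <code>gbp dch</code> generates entries like this for you):</p>

```shell
# Write a minimal debian/changelog entry in the UNRELEASED state,
# roughly as gbp dch would leave it after a new upstream import.
# Package name and maintainer are placeholders.
mkdir -p debian
cat > debian/changelog <<'EOF'
demo (26.4.21-1) UNRELEASED; urgency=medium

  * New upstream version 26.4.21.

 -- Demo Maintainer <demo@example.com>  Fri, 19 Dec 2025 12:00:00 +0000
EOF
head -n 1 debian/changelog   # -> demo (26.4.21-1) UNRELEASED; urgency=medium
```

<p>Running <code>gbp dch --release --commit</code> later finalizes such an entry, replacing <em>UNRELEASED</em> with the actual target distribution before upload.</p>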
<p>Once an upload has been made, <code>gbp tag</code> should be used to automatically add the correct git tags to the repository.</p>
<h2 id="benefit-complete-record-of-software-provenance-for-the-sake-of-copyright-security-and-long-term-maintainability"><a href="#benefit-complete-record-of-software-provenance-for-the-sake-of-copyright-security-and-long-term-maintainability" class="header-anchor"></a>Benefit: complete record of software provenance for the sake of copyright, security and long-term maintainability
</h2><p>The multiple layers of versioning may seem complicated at first, but everything exists for a purpose. This git structure allows for very good software provenance, where the origin of any line of code can be tracked across multiple branches and tags. Learning to manage it all may require some initial effort, but once the most common git-buildpackage commands are familiar, using it is quite fast.</p>
<p>When working with big and complex packages, a packager typically ends up debugging a lot of build failures and doing detective work to understand where a change came from and why it was made, in order to fix the failures correctly. In modern Debian packaging repositories it is incredibly empowering to be able to run <code>git blame</code> on any file and see all the upstream and Debian changes, or to compare files across tags or branches with commands such as <code>git difftool --dir-diff upstream/latest -- debian/</code> (shown with Meld in the screenshot below), which shows how the <code>debian/</code> contents on the current branch differ from the <code>upstream/latest</code> branch.</p>
<p><img src="https://optimizedbyotto.com/post/debian-source-package-git/git-dir-diff-meld-on-galera-sources.gif"
width="1200"
height="673"
loading="lazy"
alt="git difftool --dir-diff upstream/latest -- debian/ with Meld Merge"
class="gallery-image"
data-flex-grow="178"
data-flex-basis="427px"
>
</p>
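<p>The branch comparison can also be done non-interactively with plain <code>git diff</code>. A self-contained sketch in a scratch repository (the branch names follow the conventions above; the repository contents are purely illustrative):</p>

```shell
# Scratch repository with an upstream branch and a packaging branch,
# then a non-interactive comparison of the debian/ directory between them.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b upstream/latest
git config user.email demo@example.com
git config user.name "Demo Maintainer"
echo 'int main(void) { return 0; }' > main.c
git add main.c
git commit -q -m 'Upstream release'
git checkout -q -b debian/latest
mkdir debian
echo 'Source: demo' > debian/control
git add debian/control
git commit -q -m 'Add Debian packaging'
# Non-interactive equivalent of the difftool invocation in the text:
git diff upstream/latest -- debian/ | head -n 4
```

<p>The same two-argument form works for tags, which makes it easy to script provenance checks in CI.</p>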
<p>Hosting the upstream release branch and the Debian packaging branch in the same repository provides near-perfect software provenance, which goes a long way toward managing the copyright and security issues that are of high importance in Debian.</p> Are AI language models capable of doing financial forecasting? https://optimizedbyotto.com/post/ai-language-models-financial-forecasting/Thu, 12 Dec 2024 00:00:00 +0000 https://optimizedbyotto.com/post/ai-language-models-financial-forecasting/ <img src="https://optimizedbyotto.com/post/ai-language-models-financial-forecasting/featured-image.jpg" alt="Featured image of post Are AI language models capable of doing financial forecasting?" /><p>Considering that LLMs are basically massive statistical machines, a logical assumption would be that they should be well suited for predicting how the economy and the stock market develop. Let’s conduct a small test to see if that holds for the largest generic LLMs of today.</p>
<p>Let’s ask multiple models this: <em>The USD to CAD exchange ratio is as of now 1.3917. Considering everything you already know about what affects the price of USD and CAD, and how it is likely to develop, what do you predict the exchange rate to be in 7, 30 and 90 days? Don’t use ranges in your reply, but give your best estimate with three decimal accuracy.</em></p>
<p>Additionally, let’s ask all the models that have the capability to do lookups online this follow-up question so they are given <strong>a chance to revise</strong> their prediction with the most recent data they have access to:</p>
<p><em>When you include all additional information you can access online, how would you revise the prediction? Give your best estimate with three decimal accuracy.</em></p>
<p>Also, to test if the answer is actually based on any kind of rational and consistent thought, let’s repeat prompts in three separate chat sessions to see if the LLM arrives at the same prediction, or if it is just hallucinating and giving random replies.</p>
<table>
<thead>
<tr>
<th>Chat session</th>
<th>Model</th>
<th>7-day outlook</th>
<th>30-day forecast</th>
<th>90-day forecast</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>ChatGPT 4o</td>
<td>1.394</td>
<td>1.392</td>
<td>1.386</td>
</tr>
<tr>
<td></td>
<td>+ online</td>
<td>1.391</td>
<td>1.396</td>
<td>1.405</td>
</tr>
<tr>
<td>2</td>
<td>ChatGPT 4o</td>
<td>1.394</td>
<td>1.398</td>
<td>1.405</td>
</tr>
<tr>
<td></td>
<td>+ online</td>
<td>1.390</td>
<td>1.400</td>
<td>1.410</td>
</tr>
<tr>
<td>3</td>
<td>ChatGPT 4o</td>
<td>1.395</td>
<td>1.400</td>
<td>1.410</td>
</tr>
<tr>
<td></td>
<td>+ online</td>
<td>1.396</td>
<td>1.400</td>
<td>1.410</td>
</tr>
<tr>
<td>4</td>
<td>Gemini</td>
<td>1.395</td>
<td>1.402</td>
<td>1.410</td>
</tr>
<tr>
<td></td>
<td>+ online</td>
<td>1.395</td>
<td>1.405</td>
<td>1.415</td>
</tr>
<tr>
<td>5</td>
<td>Gemini</td>
<td>1.395</td>
<td>1.402</td>
<td>1.410</td>
</tr>
<tr>
<td></td>
<td>+ online</td>
<td>1.400</td>
<td>1.410</td>
<td>1.420</td>
</tr>
<tr>
<td>6</td>
<td>Gemini</td>
<td>1.395</td>
<td>1.402</td>
<td>1.410</td>
</tr>
<tr>
<td></td>
<td>+ online</td>
<td>1.398</td>
<td>1.405</td>
<td>1.415</td>
</tr>
<tr>
<td>7</td>
<td>Perplexity</td>
<td>1.389</td>
<td>1.395</td>
<td>1.378</td>
</tr>
<tr>
<td></td>
<td>+ revised</td>
<td>1.393</td>
<td>1.398</td>
<td>1.402</td>
</tr>
<tr>
<td>8</td>
<td>Perplexity</td>
<td>1.389</td>
<td>1.395</td>
<td>1.378</td>
</tr>
<tr>
<td></td>
<td>+ revised</td>
<td>1.388</td>
<td>1.392</td>
<td>1.378</td>
</tr>
<tr>
<td>9</td>
<td>Perplexity</td>
<td>1.389</td>
<td>1.395</td>
<td>1.378</td>
</tr>
<tr>
<td></td>
<td>+ revised</td>
<td>1.392</td>
<td>1.399</td>
<td>1.404</td>
</tr>
<tr>
<td>10</td>
<td>Grok 2</td>
<td>1.405</td>
<td>1.412</td>
<td>1.427</td>
</tr>
<tr>
<td></td>
<td>+ revised</td>
<td>1.398</td>
<td>1.405</td>
<td>1.415</td>
</tr>
<tr>
<td>11</td>
<td>Grok 2</td>
<td>1.403</td>
<td>1.418</td>
<td>1.428</td>
</tr>
<tr>
<td></td>
<td>+ revised</td>
<td>1.403</td>
<td>1.423</td>
<td>1.435</td>
</tr>
<tr>
<td>12</td>
<td>Grok 2</td>
<td>1.405</td>
<td>1.420</td>
<td>1.435</td>
</tr>
<tr>
<td></td>
<td>+ revised</td>
<td>1.403</td>
<td>1.415</td>
<td>1.428</td>
</tr>
<tr>
<td>13</td>
<td>Meta AI</td>
<td>1.395</td>
<td>1.388</td>
<td>1.382</td>
</tr>
<tr>
<td></td>
<td>+ revised</td>
<td>1.395</td>
<td>1.393</td>
<td>1.386</td>
</tr>
<tr>
<td>14</td>
<td>Meta AI</td>
<td>1.395</td>
<td>1.393</td>
<td>1.386</td>
</tr>
<tr>
<td></td>
<td>+ revised</td>
<td>1.395</td>
<td>1.393</td>
<td>1.385</td>
</tr>
<tr>
<td>15</td>
<td>Yi-Lightning</td>
<td>1.390</td>
<td>1.395</td>
<td>1.402</td>
</tr>
<tr>
<td>16</td>
<td>Yi-Lightning</td>
<td>1.393</td>
<td>1.397</td>
<td>1.401</td>
</tr>
<tr>
<td>17</td>
<td>Yi-Lightning</td>
<td>1.390</td>
<td>1.386</td>
<td>1.385</td>
</tr>
</tbody>
</table>
<h2 id="analysis"><a href="#analysis" class="header-anchor"></a>Analysis
</h2><p>As we can see, ChatGPT, Gemini and Grok 2 predicted a rising trend, while Perplexity and Yi-Lightning were a bit more mixed. Meta AI was the only model with a consistent downward prediction. Almost all the models were surprisingly consistent in the 7-day prediction across different chats. Meta AI had by far the highest consistency and kept giving the same replies when asked repeatedly, while ChatGPT and Gemini appeared to converge on a set of predictions and eventually kept repeating the same numbers. All the chat sessions took place within a couple of hours, so as an external tester it is hard to say whether the models actually became consistent or whether the systems were caching the replies and thus ended up giving the same results repeatedly.</p>
<p>I also tested Claude 3.5 Sonnet, but it refused to give any predictions at all, and instead offered to discuss which factors affect the exchange rate. Such a response from <strong>Claude is probably the best answer any LLM should give at the moment</strong>.</p>
<h2 id="actual-results"><a href="#actual-results" class="header-anchor"></a>Actual results
</h2><p>Assuming there are no new wars or pandemics, and the existing trends in interest rates, employment rate, oil price and such continue, at least one of the predictions above should turn out to be correct.</p>
<table>
<thead>
<tr>
<th>Date</th>
<th>USD-CAD</th>
<th>Closest estimate</th>
</tr>
</thead>
<tbody>
<tr>
<td>Nov 11th, 2024</td>
<td>1.3917</td>
<td>ChatGPT (1.391)</td>
</tr>
<tr>
<td>Nov 18th, 2024</td>
<td>1.402</td>
<td>Gemini (1.402)</td>
</tr>
<tr>
<td>Dec 11th, 2024</td>
<td>1.417</td>
<td>Grok 2 (1.418)</td>
</tr>
<tr>
<td>Feb 11th, 2025</td>
<td>1.429</td>
<td>Grok 2 (1.428)</td>
</tr>
</tbody>
</table>
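<p>To make the comparison concrete, the deviation of each model’s forecast from the actual rate can be computed directly. A small sketch using, for each model, the 30-day prediction from the sessions table above that came closest to the Dec 11th close of 1.417 (the per-model values are hand-picked from that table, one per model):</p>

```shell
# Absolute error of each model's best 30-day USD-CAD prediction
# against the actual Dec 11th, 2024 close of 1.417.
awk 'BEGIN {
  actual = 1.417
  pred["ChatGPT 4o"]   = 1.400
  pred["Gemini"]       = 1.410
  pred["Perplexity"]   = 1.399
  pred["Grok 2"]       = 1.418
  pred["Meta AI"]      = 1.393
  pred["Yi-Lightning"] = 1.397
  for (m in pred) {
    err = pred[m] - actual
    if (err < 0) err = -err
    printf "%-13s predicted %.3f, error %.3f\n", m, pred[m], err
  }
}'
```

<p>Even cherry-picking each model’s best session this way, only Grok 2 lands within one tenth of a cent of the actual rate.</p>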
<h2 id="conclusion"><a href="#conclusion" class="header-anchor"></a>Conclusion
</h2><p>This is of course just a quick test to get a sense of how the LLMs predict, and by no means a reliable study on how well LLMs can be used to predict financial markets.</p>
<p>It is however enough data to show that:</p>
<ol>
<li>
<p><strong>The same model tends to give different predictions when prompted several times</strong>. This alone indicates that the models probably do not have some kind of latent internal understanding of financial markets, and <strong>thus the generic LLMs do not possess hidden superpowers to predict</strong> it.</p>
</li>
<li>
<p><strong>No model was consistently more accurate than others.</strong> ChatGPT, Gemini and Grok 2 each got one value out of four correct, which can be attributed to pure luck.</p>
</li>
</ol>
<p>However, there is nothing preventing a modern-day <a class="link" href="https://en.wikipedia.org/wiki/Jim_Simons" target="_blank" rel="noopener"
>Jim Simons</a> from building an AI specialized in financial data that gives consistent and reliable predictions. Undoubtedly many are already working on this, as the financial incentives are high. It would also serve the greater good. How well a market economy works depends largely on how efficiently price arbitrage and the allocation of resources happen. If the majority of the world’s capital just goes mechanically into index-tracking ETFs, it could lead to massive self-reinforcing asset bubbles. More intelligence is needed for the “invisible hand of markets” to play out properly.</p>
<p>But current generic LLMs, despite being massively big, do not seem to possess this capability. As with most other LLM applications, they seem to be good at generating convincing-looking content, helping humans find information, and assisting in simple tasks. LLMs might occasionally outperform stupid or lazy people, but for true progress we still need humans with original ideas and good judgement. At the very least, the good judgement to choose which of the LLM-generated results are accepted and acted upon.</p> The simple art of effective decision-making for managers https://optimizedbyotto.com/post/make-good-decisions/Sun, 23 Jun 2024 00:00:00 +0000 https://optimizedbyotto.com/post/make-good-decisions/ <img src="https://optimizedbyotto.com/post/make-good-decisions/featured-image.jpg" alt="Featured image of post The simple art of effective decision-making for managers" /><p>A large part of a manager’s role is to make decisions and be responsible for their outcomes. While there is ample advice on how to be successful in many other managerial core areas, such as growing your people, the domain of high-quality decision-making seems less crowded. In this post, I summarize what I have found during my 20+ years as a manager to be a simple and effective way to approach decision-making.</p>
<h2 id="identify-options"><a href="#identify-options" class="header-anchor"></a>Identify options
</h2><p>The first step in making a decision is to identify that there is a decision to be made to begin with. Ask yourself: Does something have to be done in a particular way, or are there options? If there is only one option, is the decision about timing when to execute it? To be able to drive influence as a manager, one must be able to recognize the opportunities to make decisions.</p>
<p>A manager might also be explicitly asked to make a decision. People might be at crossroads and waiting for a manager to take the responsibility of choosing which way to proceed. In those cases, a manager should also start by finding out what the possible options are beyond what was presented initially.</p>
<p>The ability to grasp a process, break it down into smaller steps, and exhaustively find the options is a skill that can be honed. You can, however, accelerate acquiring this skill by constantly asking yourself, “Are there more options?”</p>
<h2 id="explore-and-rank-the-options-to-discover-what-is-actually-the-optimal-outcome"><a href="#explore-and-rank-the-options-to-discover-what-is-actually-the-optimal-outcome" class="header-anchor"></a>Explore and rank the options to discover what is actually the optimal outcome
</h2><p>Once you have at least two options you can embark on the exploratory phase of collecting data on the various options. During this phase you can still discover more options, and then also explore them.</p>
<p>The reason you need at least two options to explore is that you need to be able to rank them on some metric. Hence, the exploratory phase should also lead you to uncover what are the metrics you actually care about. In addition to discovering options you thus also discover what is the optimal outcome of the decision. The need to make a decision often arises from some sort of problem that was faced, but quite often it is unclear what would actually be the best possible outcome. <strong>Effective decision-making hinges both on discovering the options, as well as on discovering what outcome is most valued.</strong> The alternative options are external factors dictated by the situation, while the discovery of the desired outcome comes from within and from one’s values.</p>
<p>Once you understand the options and the optimal outcome, it is fairly easy to rank them or use techniques such as a <a class="link" href="https://en.wikipedia.org/wiki/SWOT_analysis" target="_blank" rel="noopener"
>SWOT matrix</a> to compare the options. With complete information, most decisions become clear and most managers would end up making an identical decision. The differentiating factor is thus not the actual decision-making given the facts, but at what point different people are content with the data points collected. As a general rule, if you have the time and resources, continue to push for slightly more data points even after you reach the level where you initially thought you had enough information.</p>
<p>When prioritizing data collection, try to think ahead about what kind of information would definitely confirm something, or what discovery would disprove an assumption. If you start to lean strongly towards some decision already during the exploratory phase, put effort into specifically seeking views and information that would disprove it.</p>
<h2 id="understand-severity-urgency-and-finality"><a href="#understand-severity-urgency-and-finality" class="header-anchor"></a>Understand severity, urgency and finality
</h2><p>Obviously, when the stakes are high, a manager needs to spend a lot more energy on making the decision. To ensure the correct amount of effort is spent on each decision, be explicit about assessing the importance and urgency of a decision. Small decisions should be made frequently and without a delay. If a decision has significant impact, but is not urgent, use that to your advantage and postpone it to allow more information about options to be collected.</p>
<p>Additionally, consider the finality of a decision. If the decision is easy to revert, there should be less of a reason to delay it. The <a class="link" href="https://www.youtube.com/watch?v=DcWqzZ3I2cY&t=3599s" target="_blank" rel="noopener"
>famous concept of “one-way and two-way doors”</a> helps to understand this well. No decision is ever totally free from consequences if reverted, but understanding where the decision sits on a scale of “picking a hat, haircut or tattoo” significantly helps in making a better decision.</p>
<h2 id="break-your-bias"><a href="#break-your-bias" class="header-anchor"></a>Break your bias
</h2><p>This is probably the biggest challenge, even for very experienced leaders. There is no source of absolute truth and everything we hold in our heads is a result of ingesting decades worth of information with varying degrees of trustworthiness. Even if there were some imaginary filter that ensures we only learn about things that are true, many snippets of information will eventually become outdated and false over time. Our brain is constantly in a flux of multiple levels of different thought processes, emotions and moods. We can make good decisions only by using our own brain, yet at the same time we need to be aware that we can’t fully trust our brain.</p>
<p>Setting aside the <a class="link" href="https://en.wikipedia.org/wiki/Epistemology" target="_blank" rel="noopener"
>epistemological</a> thoughts that there is no absolute knowledge, even when we could have access to the truth we might fail to recognize it if our brain is trapped by unconscious bias. To have a better chance at breaking free, everyone should familiarize themselves with the <a class="link" href="https://en.wikipedia.org/wiki/List_of_cognitive_biases" target="_blank" rel="noopener"
>common cognitive biases</a> in order to recognize when running at risk of any of them. Listening attentively to people with opposing views is also a good way to break bias.</p>
<p>You should be particularly careful in your decision <strong>if you feel you knew the decision before collecting enough data</strong> points to support it. Don’t let your quick but stupid lower limbic brain system take the driver’s seat, but instead make a continuous effort to allow your brain cortex to process things and only then make the decision.</p>
<h2 id="sleep-on-it"><a href="#sleep-on-it" class="header-anchor"></a>Sleep on it
</h2><p>The previous paragraph neatly leads us to the last, but not least, important advice on how to make good decisions: sleep on it. Ask anyone who worked with me and they will remember at least one case where I said this and postponed making the final decision by one extra day.</p>
<p>You should continue working on a decision until you think you have everything you need to make the decision, no further research or consultations are needed, and you could just announce it. However, if the decision is not urgent to the day, write down your decision as a draft, keep it to yourself, and sleep on it. The next morning, look at the draft, ask yourself if you still agree with it, and only then commit. Having that extra night of sleep not only ensures our energy levels are recharged and we are more likely to think clearly, but also allows our brain and unconscious to process the information and thoughts from the previous day. Sometimes you might wake up in the morning having realized that you missed something or that you actually value a certain outcome more than another.</p>
<p>If the decision is urgent and it can’t wait until the next day, you can still drastically increase the quality or at least the confidence of a decision by taking a break, going for a walk or at least taking a couple of deep breaths before committing.</p>
<h2 id="some-decision-is-better-than-no-decision"><a href="#some-decision-is-better-than-no-decision" class="header-anchor"></a>Some decision is better than no decision
</h2><p>The saying “sleep on it” specifically means just one night of sleep, or perhaps a weekend, but not postponing a decision for too long. If a decision is postponed for weeks or months, the circumstances are likely to change, and it is no longer the same decision. Something can of course be postponed, but in those cases one should make an explicit decision to postpone. Avoiding making explicit decisions is just bad management.</p>
<p>It is said that the Finnish army taught its leaders during WWII that if they don’t know what decision to make, then always just “hook from the right”. It was considered both detrimental to troop morale and tactically inferior to stay put in the same location on the battlefield. Keeping the troops moving and executing a maneuver, even with incomplete information about the enemy positions on their right flank and taking a huge risk, was considered a superior option compared to not making any decision at all.</p>
<p>Luckily, very few of us are forced to make decisions about matters of life and death, but all managers need to remember that making decisions is a core duty of theirs, and that taking some action that leads somewhere is better than not making a decision at all. If the decision was wrong, one should own it and learn from it, so that the next decision will be better. Indecisiveness is worse, and will eventually lead to a figurative slow death.</p>
<p>If you have principles you follow or anecdotes about making decisions, please share them in the comments!</p> Should developers always just write code and never design documents? https://optimizedbyotto.com/post/write-high-quality-design-documents/Thu, 30 May 2024 00:00:00 +0000 https://optimizedbyotto.com/post/write-high-quality-design-documents/ <img src="https://optimizedbyotto.com/post/write-high-quality-design-documents/featured-image.jpg" alt="Featured image of post Should developers always just write code and never design documents?" /><p><strong>In software engineering, most ideas can be implemented without writing any design document at all.</strong> This is particularly prominent in open source communities. For example, the Linux kernel has 35 million lines of code that have been written and rewritten many times over alongside 30 years of mailing list discussions. Linux wasn’t created as a result of a grandiose design document by Linus Torvalds, but it evolved organically in small increments of actual running code.</p>
<p>In open source and in software engineering in general, ideas are presented most of the time directly as patches that add, delete, or change specific lines of code. Those patches are not only read but also directly built, run, tested – and contested. Design and implementation are intertwined, and decisions are relatively small, quick, and done in writing.</p>
<p>However, this is not always the best way to evolve software. I would argue that <strong>even in open source, design documents are needed</strong>, and it would be good to see more of them being written than is the practice today. Let’s dive into why, when, and how to write design documents.</p>
<h2 id="top-3-benefits-reducing-risk-communicating-intent-and-growing-the-authors-ability-to-think"><a href="#top-3-benefits-reducing-risk-communicating-intent-and-growing-the-authors-ability-to-think" class="header-anchor"></a>Top-3 benefits: reducing risk, communicating intent, and growing the author’s ability to think
</h2><p>Design documents have three clear benefits. First of all, a design document helps to <strong>manage technical risk and organizational cost risk</strong>. If it takes several months or years to develop something, starting the process with a high-quality design document helps to map out unknown dimensions and decreases the technical risk of the idea being impossible to implement. If there is no design and something is developed right away, there is also a risk that it might be rejected after implementation by downstream users or collaborators, and thus all the work put in would be wasted.</p>
<p>Secondly, design documents are an excellent medium to <strong>communicate the intent</strong> of the change to others. An engineering team might have stakeholders, executives, customers, maintainers of other dependent software packages, legal requirements, etc. While it’s perfectly possible to develop something without writing the design first, designing forces a clear articulation of what the idea is about and why it is needed.</p>
<p>Thirdly, <strong>writing documents grows the author’s ability to think</strong>. When writing, it becomes clear very quickly what parts of the idea are still vague, ambiguous, incomplete, or even misunderstood. It helps in revealing blind spots, giving ideas shape and detail, and thus increasing their quality. Dumping a part of your brain in a document, and then revisiting and restructuring those thoughts many times over will always lead to a better thought process and higher quality of outcome.</p>
<p>In open source, you rarely see people writing design documents or conducting formal review and approval processes. They do, however, exist, and open source developers should also practice their skills in writing design docs. A well-written design document not only conveys the merits of a great idea clearly but also shows that the author is a great thinker and fully understands what they are doing.</p>
<h2 id="why-are-design-docs-common-in-enterprise-but-rare-in-open-source"><a href="#why-are-design-docs-common-in-enterprise-but-rare-in-open-source" class="header-anchor"></a>Why are design docs common in enterprise but rare in open source?
</h2><p>The above benefits are universal and benefit any type of software development. In the enterprise software setting, these additional aspects are often true, making design documents more common:</p>
<ul>
<li>Making the change happen requires a significant investment, potentially multiple people working full-time for an extended time. A written description of how that time will be used and what it will produce is needed to get buy-in from the authority that funds the development.</li>
<li>Rolling out the change affects many other people and the software they develop and maintain. The whole change might be moot if there isn’t prior approval and commitment from the stakeholders to adapt to the change.</li>
<li>The change might involve technical risks or have security considerations, and a technical plan needs to be vetted and approved by risk bearers before implementation starts to avoid major technical catastrophes.</li>
</ul>
<p>Ad-hoc implementation may take place both in the enterprise setting and in open source, but ad-hoc is much more common in open source, as the person carrying the cost of the implementation work and the consequences is often the very same person. In contrast, in an enterprise setting, the costs and risks are carried by a larger group that needs to agree before any work starts.</p>
<h2 id="famous-series-of-design-documents-in-open-source-development"><a href="#famous-series-of-design-documents-in-open-source-development" class="header-anchor"></a>Famous series of design documents in open source development
</h2><p>In open source communities, ideas that are too large to be put into code directly typically surface first as a mailing list discussion or in an issue tracker thread. Full-fledged design documents are rare, but they do exist, particularly in large projects. Examples include:</p>
<ul>
<li><a class="link" href="https://www.rfc-editor.org/about/" target="_blank" rel="noopener"
>Request for Comments (RFC)</a> documents from the Internet Engineering Task Force (IETF)</li>
<li><a class="link" href="https://peps.python.org/" target="_blank" rel="noopener"
>Python Enhancement Proposals (PEP)</a></li>
<li><a class="link" href="https://dep-team.pages.debian.net/" target="_blank" rel="noopener"
>Debian Enhancement Proposals (DEP)</a></li>
<li><a class="link" href="https://eips.ethereum.org/" target="_blank" rel="noopener"
>Ethereum Improvement Proposals (EIP)</a></li>
</ul>
<h2 id="when-to-write-a-design-document"><a href="#when-to-write-a-design-document" class="header-anchor"></a>When to write a design document
</h2><p>If in doubt, just start writing a design document. It is cheap, and you can always stop in the middle and never publish it. The mere fact that you are contemplating it is a sign that you probably have some unstructured thoughts floating around in your brain, and starting the writing process will benefit you a lot.</p>
<p>To decide if you should go all the way and actually finalize and publish a design document in contrast to just implementing the idea directly, consider the three benefits listed above: risk management, communicating intent, and building trust in the author’s ability. Are these benefits relevant for your idea? If so, publish a design document.</p>
<h2 id="how-to-start-a-design-doc-use-a-blank-sheet-of-paper"><a href="#how-to-start-a-design-doc-use-a-blank-sheet-of-paper" class="header-anchor"></a>How to start a design doc: use a blank sheet of paper
</h2><p>If you decide to start writing a design document, don’t use a template; simply start with a blank sheet. Writing down any idea, large or small, should begin with the author noting the core idea first. In particular, avoid the pitfall of grandiose thoughts that lead to convoluted designs and bloated documents.</p>
<p>Over the past 15 years I have seen the design document template pattern repeat many times, and it has never resulted in high-quality outcomes. Various design document templates, and fancy-sounding software development methodologies in general, are surely good business for large consulting companies, but I have never seen them actually increase the quality of software development. In the best case, templates result in good-looking documents that are thin on content; in the worst case, they massively dumb down the authors and put them in a mode where they feel exonerated from all responsibility for the contents.</p>
<p>Just start from scratch and focus on the core idea. If you can’t express it briefly and clearly in a one-pager, you need to spend more time thinking about the core idea. Polish the idea before you even consider polishing the document.</p>
<h2 id="how-to-finalize-a-design-doc-expand-and-iterate"><a href="#how-to-finalize-a-design-doc-expand-and-iterate" class="header-anchor"></a>How to finalize a design doc: expand and iterate
</h2><p>Only once the core idea is crisp and clear, and there is agreement to invest in it, should the work to polish the design document start. If there is a template, this is the point in time when it makes sense to start applying the template.</p>
<p>Having a crisp one-pager first also helps structure the design document so that it presents the solution first, and only then the motivations for it. If people jump directly to writing the final design document, it often starts with a lengthy analysis of the problem and the motivation for the solution, and only then the solution itself. Such a structure reflects the thought process of the author but is not a good way to structure a design document. The design doc should always start with the solution; the rest of the document exists to motivate and support it.</p>
<p><strong>Writing out the full design document is all about constantly expanding and iterating on it.</strong>
To flesh out all aspects, it is good to have a list of questions and ensure that they are all addressed while working through the document:</p>
<ul>
<li>What is the title? If there are similar competing or earlier designs, what is the unique identifier of this specific design doc?</li>
<li>Who is the author? Who is responsible for the design being good and correct?</li>
<li>Who is going to implement it? Who is sponsoring or funding the design or implementation work?</li>
<li>What is the status of the document? Is it, for example, just a draft, or is it pending comments or review, or has it already been approved?</li>
<li>If the document is approved, who approved it and when? How is the approval tracked? Is it easy to prove what specific version of the design each approver read when giving their approval to it?</li>
<li>What is the proposed solution exactly? Are there diagrams, pictures, prototypes, or other artifacts that cover the key parts of the proposed solution?</li>
<li>What is the scope of the solution? Was something intentionally left out and why?</li>
<li>Why should this solution be done now? What happens if it is not done at all, or if it is postponed? Are there workarounds?</li>
<li>What assumptions is the design based on? What happens to those assumptions over time? Will they still hold?</li>
<li>What alternative designs were considered? Why is the proposed solution the best?</li>
<li>What are the known trade-offs and downsides of the proposed solution? How are those being mitigated?</li>
<li>What is the work estimate and cost of the solution? What is the long-term cost and total cost of ownership?</li>
<li>What is the development and testing plan? How will it be rolled out? Can the rollout be phased? Can it be rolled back if needed?</li>
<li>What is the quality assurance process? How is security being reviewed and assured?</li>
<li>What is the impact on performance? How does the system scale? Where are the limits of scalability? What is the maximum load to be used in load testing and benchmarking?</li>
<li>How will the solution be operated, monitored, and measured?</li>
<li>How is the success of the solution measured and validated? How does one know if the solution actually worked?</li>
<li>When the solution is ready, how will it be documented and communicated? Are there different audiences that need different communications (e.g., internal vs external, developers vs users)?</li>
</ul>
<h2 id="take-your-time"><a href="#take-your-time" class="header-anchor"></a>Take your time
</h2><p><strong>Designing is not fast.</strong> Authors should not expect to be able to sit down one day and write a design document. A good idea takes time to mature, several rounds of writing to become clear, and a lot of time spent waiting for and collecting feedback. For this reason, designing can’t be anybody’s main task; it should be done alongside other work. The design of the next idea should typically already be in progress while the previous one is being implemented.</p>
<p>Jeff Bezos famously wrote in the <a class="link" href="https://www.aboutamazon.com/news/company-news/2017-letter-to-shareholders" target="_blank" rel="noopener"
>2017 letter to shareholders</a>:</p>
<blockquote>
<p>The great memos are written and re-written, shared with colleagues who are asked to improve the work, set aside for a couple of days, and then edited again with a fresh mind. They simply can’t be done in a day or two.</p><span class="cite"><span>― </span><span>Jeff Bezos, founder of Amazon</span><cite></cite></span></blockquote>
<p>A complete and high-quality design document takes a lot of calendar time. A good design matures like a bottle of wine. It can’t be forced to take shape quickly. Designing is like practicing wisdom – give it time.</p> Heartbleed and XZ backdoor learnings: open source infrastructure can be improved efficiently with moderate funding https://optimizedbyotto.com/post/what-heartbleed-xz-utils-had-in-common/Sun, 07 Apr 2024 00:00:00 +0000 https://optimizedbyotto.com/post/what-heartbleed-xz-utils-had-in-common/ <img src="https://optimizedbyotto.com/post/what-heartbleed-xz-utils-had-in-common/featured-image.jpg" alt="Featured image of post Heartbleed and XZ backdoor learnings: open source infrastructure can be improved efficiently with moderate funding" /><p>The XZ Utils backdoor, discovered last week, and the Heartbleed security vulnerability ten years ago, share the same ultimate root cause. Both of them, and in fact all critical infrastructure open source projects, should be fixed with the same solution: ensure baseline funding for proper open source maintenance.</p>
<p>Open source software is the foundation of much of our digital infrastructure. From web servers to encryption libraries to operating systems, open source code powers systems that millions rely on daily. The open source model has proven tremendously successful at producing innovative, reliable, and widely used software.</p>
<p>However, the Heartbleed vulnerability in OpenSSL and the recent backdoor discovered in the XZ Utils compression library have highlighted potential weaknesses in how open source software is funded, developed, and maintained. <strong>These incidents showed that even very widely used open source projects can have serious, undiscovered bugs due to lack of resources.</strong></p>
<h2 id="learnings-from-heartbleed"><a href="#learnings-from-heartbleed" class="header-anchor"></a>Learnings from Heartbleed
</h2><p>Today, April 7th, 2024, marks the 10-year anniversary since <a class="link" href="https://www.cve.org/CVERecord?id=CVE-2014-0160" target="_blank" rel="noopener"
>CVE-2014-0160</a> was published. This security vulnerability known as <a class="link" href="https://en.wikipedia.org/wiki/Heartbleed" target="_blank" rel="noopener"
>“Heartbleed”</a> was a flaw in the <a class="link" href="https://www.openssl.org/" target="_blank" rel="noopener"
>OpenSSL</a> cryptography software, the most popular option to implement <a class="link" href="https://en.wikipedia.org/wiki/Transport_Layer_Security" target="_blank" rel="noopener"
>Transport Layer Security (TLS)</a>. In more layman’s terms, if you type <code>https://</code> in your browser address bar, chances are high that you are interacting with OpenSSL.</p>
<p>The fallout from Heartbleed was immense, prompting widespread panic among developers, businesses, and users alike. About one-fifth of all web servers in the world at the time were believed to be vulnerable to the attack, allowing theft of the servers’ private keys and users’ session cookies and passwords.</p>
<p>The software bug existed in OpenSSL’s codebase for over two years before being discovered. Despite code reviews being in place, the bug wasn’t spotted and went into OpenSSL’s source code repository on New Year’s Eve, December 31st, 2011. At the time, the OpenSSL project was maintained by a small 4-person team with limited funding, working essentially as volunteers driven by the importance of their mission.</p>
<p><strong>This was the ultimate root cause – a piece of software that had started as a hobby project (<a class="link" href="https://en.wikipedia.org/wiki/History_of_Linux#The_creation_of_Linux" target="_blank" rel="noopener"
>just like Linux</a>) grew over time and became part of the Internet infrastructure, but there was no mechanism to ensure resources would grow to be able to maintain it well long-term.</strong></p>
<p>In April 2014, the <a class="link" href="https://www.linuxfoundation.org/" target="_blank" rel="noopener"
>Linux Foundation</a> Executive Director Jim Zemlin seized the opportunity to get visibility and managed to get Amazon Web Services, Cisco, Dell, Facebook, Fujitsu, Google, IBM, Intel, Microsoft, NetApp, Qualcomm, Rackspace, and VMware to <a class="link" href="https://arstechnica.com/information-technology/2014/04/tech-giants-chastened-by-heartbleed-finally-agree-to-fund-openssl/" target="_blank" rel="noopener"
>all pledge to commit at least $100,000 a year for at least three years</a> to the <a class="link" href="https://en.wikipedia.org/wiki/Core_Infrastructure_Initiative" target="_blank" rel="noopener"
>Core Infrastructure Initiative</a>. The initiative continued for many years and eventually transformed into the <a class="link" href="https://openssf.org/" target="_blank" rel="noopener"
>Open Source Security Foundation</a>. Also due to Heartbleed, the European Commission launched the <a class="link" href="https://joinup.ec.europa.eu/collection/eu-fossa-2" target="_blank" rel="noopener"
>EU-Free and Open Source Software Auditing project</a> and spent at least a million euros on auditing OpenSSL, the Apache Server, KeePass, and other security-critical open source software.</p>
<p>This relatively modest funding, along with code audits and process improvements, allowed OpenSSL to become more secure and sustainable. Today the OpenSSL project is thriving: it is <a class="link" href="https://www.openssl.org/blog/blog/2024/01/23/fips-309/" target="_blank" rel="noopener"
>FIPS 140-2 certified</a> and has a healthy base of both financial and code contributors.</p>
<h2 id="learnings-from-the-xz--liblzma-library-backdoor"><a href="#learnings-from-the-xz--liblzma-library-backdoor" class="header-anchor"></a>Learnings from the XZ / liblzma library backdoor
</h2><p>While there are surely still more details to uncover in the coming weeks, when the news broke about the <a class="link" href="https://en.wikipedia.org/wiki/XZ_Utils_backdoor" target="_blank" rel="noopener"
>XZ compression software backdoor</a> (<a class="link" href="https://www.cve.org/CVERecord?id=CVE-2024-3094" target="_blank" rel="noopener"
>CVE-2024-3094</a>), it was immediately clear that <strong>it happened because XZ had become hugely popular and widely used but was still maintained by a single overworked person as a spare-time project</strong>. A well-resourced malicious actor was able to manipulate and pressure the maintainer to give them commit access, and thus the software supply chain was compromised. We should not blame the original maintainer, but rather everyone else for not realizing how widely used XZ was while it got by with very little support and resources.</p>
<p>A huge number of applications depend on XZ. Right now the priority should be to offer help to maintain it properly, both upstream and at various downstreams, such as in Linux distributions, so the <a class="link" href="https://en.wikipedia.org/wiki/Software_supply_chain" target="_blank" rel="noopener"
>whole software supply chain is secured</a>. It does not require a massive effort – just having a couple more maintainers to share the maintenance and review work should go a long way.</p>
<h2 id="would-we-be-better-off-with-closed-source-software"><a href="#would-we-be-better-off-with-closed-source-software" class="header-anchor"></a>Would we be better off with closed-source software?
</h2><p>In both cases, the vulnerabilities were fixed quickly because the world had access to the source code of the affected software. This is a major advantage of open source software: it allows anyone to inspect the code and find potential vulnerabilities, <strong>and submit fixes to them</strong>.</p>
<p>In the case of Heartbleed, Google’s security team reported it to OpenSSL first, but the Finnish national <a class="link" href="https://www.kyberturvallisuuskeskus.fi/en/" target="_blank" rel="noopener"
>NCSC-FI</a> has records of the local cybersecurity company Codenomicon reporting it independently. In the case of XZ, Microsoft employee and PostgreSQL developer Andres Freund found the backdoor while doing performance regression testing on a Debian Linux development version. It was a huge stroke of luck that the XZ backdoor didn’t make it into any stable Linux distribution releases. Next time we might not be as lucky, so more reviews, testing, and validation are needed. That will require resources, but at least public review is possible – thanks to this infrastructure-level software being open source.</p>
<p><strong>Public scrutiny, testing, and validation are not possible for closed-source software.</strong> In fact, if closed-source code gets backdoored, it will go unnoticed for a much longer time. For example, the <a class="link" href="https://en.wikipedia.org/wiki/2020_United_States_federal_government_data_breach" target="_blank" rel="noopener"
>2020 U.S. government data breach</a> was possible due to multiple backdoors and flaws – including the Zerologon vulnerability – that went undetected for a long time in closed-source software from SolarWinds, Microsoft, and VMware. In theory, companies always have money (unless they are bankrupt), but in practice, the pressure to channel that money into software review and testing varies wildly, and working without exposure to public scrutiny often incentivizes companies to skimp on security to maximize profits.</p>
<p>Thus, I firmly believe in open source software having a better overall security posture as long as there are reasonable resources. And if the source code is public, anybody can audit how active the maintenance is and thus also the <strong>fact that maintenance is funded itself is a public and auditable property of open source</strong>.</p>
<h2 id="pledge-for-funding-and-participation"><a href="#pledge-for-funding-and-participation" class="header-anchor"></a>Pledge for funding and participation
</h2><p>Both Heartbleed and the XZ backdoor incident underscore the critical role that open source software plays in powering the digital infrastructure of today’s world. Such important and widely used projects shouldn’t be struggling to get by. It’s time for companies to step up and provide reasonable funding to the projects they depend on.</p>
<p>You don’t need billions to meaningfully improve open source security – the OpenSSL example shows that even modest funding increases can have an outsized impact. A tiny slice of the corporate IT budget pie could go a long way. Additionally, some of the <strong>government defense spending should be funneled into key open source software projects</strong> that our society relies on.</p>
<p>The incidents of Heartbleed and the XZ backdoor serve as sobering reminders of the vulnerabilities that may exist within our open source infrastructure today. However, they also present an opportunity for positive change. By investing in the security and maintenance of open source projects through moderate funding and support, we can enhance the resilience of our digital infrastructure and ensure a safer and more secure internet for all.</p> Communication is the key to efficiency in a software engineering organization https://optimizedbyotto.com/post/efficient-communication-software-engineering-org/Sun, 31 Mar 2024 00:00:00 +0000 https://optimizedbyotto.com/post/efficient-communication-software-engineering-org/ <img src="https://optimizedbyotto.com/post/efficient-communication-software-engineering-org/featured-image.jpg" alt="Featured image of post Communication is the key to efficiency in a software engineering organization" /><p>For a software engineering organization to be efficient, <strong>it is key that everyone is an efficient communicator</strong>. Everybody needs to be calibrated in <em>what</em> to communicate, to <em>whom</em> and <em>how</em> to ensure information spreads properly in the organization. Having smart people with a lot of knowledge results in progress only if information flows well in the veins of the organization.</p>
<p>This does not mean that everyone needs to communicate everything – on the contrary, it is also the responsibility of every individual to make sure there is the right amount of communication, not too much and not too little. From an individual point of view, it is also important to be a good communicator, both verbally and in <a class="link" href="https://optimizedbyotto.com/post/writing-tips-for-software-professionals/" >writing</a>, as that defines to a large degree how professionally others will perceive you.</p>
<p>Reflecting on the principles below may help <strong>both your organization and you personally</strong> to become a more efficient communicator.</p>
<h2 id="communicate-early-and-exactly"><a href="#communicate-early-and-exactly" class="header-anchor"></a>Communicate Early and Exactly
</h2><p>Foster a culture where people share small updates early. <strong>When you introduce a change, describe it immediately.</strong> Don’t accept “I will document it later” from yourself or others. People are interested in the change when they learn about it, and should be able to immediately read up on Git commit messages, ticket communications, etc. When you make a change that affects the workflow of others, announce it immediately. Don’t wait until other people run into problems and start asking questions.</p>
<p>When you announce a change, be exact. If the change has a commit id or a URL, or if there is a document that describes it in detail, reference it. Avoid abbreviations and spell out the names of things to avoid misunderstandings. Use the same name consistently when referencing the same thing. Don’t be vague if being exact requires just a couple of seconds more of effort. If you know something, spell it out and don’t force other people to guess. In a larger organization, it might even make sense to have a written vocabulary to ensure that people understand the daily jargon and assign the same meaning to the words used.</p>
<p>Keep in mind that you, the announcer, are <strong>one</strong> person, but your audience consists of <strong>many</strong> people: if you take additional time and effort to be precise, you may save a great deal of repeated effort by many others to determine precisely what you were referring to. When you see other people putting effort into making easy-to-understand, brief, and crisp communication, thank them for it.</p>
<h2 id="use-the-right-channels"><a href="#use-the-right-channels" class="header-anchor"></a>Use the Right Channels
</h2><p><strong>Always keep communication in context.</strong> If a piece of code does something that needs an explanation, do it in the inline comments rather than in an external document. For example, if the user of a software application needs guidance, don’t offer it on a completely separate system that is hard to discover, but instead offer it via the user interface of the software itself, or in a <a class="link" href="https://en.wikipedia.org/wiki/Man_page" target="_blank" rel="noopener"
>man page</a> or a <a class="link" href="https://en.wikipedia.org/wiki/README" target="_blank" rel="noopener"
>README</a> that a user is likely to come across when trying to use the software. If there is a bug that needs to be debugged and fixed, discuss it in the issue tracker about the bug itself, not in a separate communication channel elsewhere. If there is code review open on GitHub or GitLab, don’t discuss it on Slack or equivalent, but do it in the code review comments – as it was designed for. Always communicate about something as closely as possible to the subject of the message.</p>
<p><strong>Prefer asynchronous channels over synchronous channels.</strong> Chat systems like Slack or Zulip are better than a phone call. A well-written email is better than scheduling a 30-minute meeting. In an async channel, people can process incoming communication and respond in their own time. Sometimes a response requires some research and is not possible in real-time anyway. Having to juggle schedules can also be a wasteful use of time compared to working through a queue of tasks. Interrupting software engineers is very costly, as it can take tens of minutes before one gets back into “flow” and back to cracking a hard engineering problem. Also, as many teams today work across many time zones, you might need to wait until the next day for the reply anyway.</p>
<p>When using Slack and similar chat software, try to pick the most appropriate channel. Avoid private one-on-one discussions unless the matter is personal or confidential. A discussion in a public channel is more inclusive and allows others to join spontaneously or be pinged into the conversation. In chat systems that have threads, use them correctly to make it easier for participants to follow a topic while keeping the main channel cleaner. Avoid using <em>@here</em> on channels with more than 20 participants. Get into the habit of using <code>Shift+Enter</code> to write multiple paragraphs in one message instead of sending several short messages in succession, which might cause unnecessary notifications.</p>
<p>In chat systems, do <strong>not</strong> send people messages that only say “<a class="link" href="https://nohello.net/" target="_blank" rel="noopener"
>Hello</a>”. Get straight to the point.</p>
<p><strong>Have a chat when it is appropriate.</strong> If you feel there is miscommunication and you can’t resolve it async with well-written and thoughtful messages, a short chat 1:1 or with a small group of people can bring a lot of clarity. An in-person meeting or video chat usually works best, as both parties can read each other’s cues to see that they understand and can follow the topic.</p>
<h2 id="teams-exist-to-channel-the-flow-of-information-in-the-veins-of-an-organization"><a href="#teams-exist-to-channel-the-flow-of-information-in-the-veins-of-an-organization" class="header-anchor"></a>Teams Exist to Channel the Flow of Information in the Veins of an Organization
</h2><p>Team members interact mostly with others inside their team. It is not the responsibility of individual team members to know what people in other teams are doing. If something noteworthy is happening or is planned to happen, it is the responsibility of the team lead to communicate that upwards and laterally along the organizational lines. The team lead is also responsible for the inward flow of information and making team members aware of things that are relevant to the team.</p>
<p><strong>The reason teams exist is to limit the flow of information.</strong> Most organizations are divided into 5–15 person teams simply because if teams were very large with 20 or more people, the overhead of everybody communicating with everybody would eat up too much time.</p>
<p>With this in mind, please be considerate and try to avoid approaching individual engineers in other teams too often. <strong>Channel communication through managers and architects, who are responsible for gatekeeping and prioritizing things.</strong> In a large organization, if you notice that people are reaching out to you personally all the time, just politely refer those requests to your manager.</p>
<p>In particular, when doing cross-org communication for large groups of people, think about the <a class="link" href="https://github.com/github/how-engineering-communicates" target="_blank" rel="noopener"
>signal vs noise</a> ratio. Free flow of information may sound like a noble principle, but a lot of information does not necessarily convert into real knowledge sharing. Write and share summaries instead of raw information. Be deliberate in selecting who should know what and when.</p>
<h2 id="make-meetings-intentionally-efficient"><a href="#make-meetings-intentionally-efficient" class="header-anchor"></a>Make Meetings Intentionally Efficient
</h2><p>Principles for <a class="link" href="https://optimizedbyotto.com/post/tips-for-efficient-meetings/" >good meetings</a>:</p>
<ul>
<li>If you organize a meeting, make sure it has an agenda in the meeting invite so attendees know what the meeting is about and have a chance to prepare in advance. The agenda also allows people to make a better-informed decision on whether they can skip the meeting.</li>
<li>If the meeting makes decisions, those should be written down somewhere (e.g. design doc, issue, ticket, meeting minutes). People tend to forget, so there must be some way to recall what the meeting decided.</li>
<li>Don’t invite too many people. If there are more than 5 attendees in a 30-minute meeting, there can be no genuine discussion, as each participant would get 5 minutes or less of speaking time.</li>
<li>Don’t attend all possible meetings. If dozens of people are invited to the meeting and it seems like an announcement event rather than a discussion, maybe just skip it and read the announcement documents instead.</li>
</ul>
<h2 id="practice-efficient-statusprogress-communication"><a href="#practice-efficient-statusprogress-communication" class="header-anchor"></a>Practice Efficient Status/Progress Communication
</h2><p>The purpose of progress information is to allow others to learn the state of an issue and allow them to adapt their own work in relation to that issue. Issue status information also helps the author themself to remember where and in what state they left something, essentially being communication to their future self.</p>
<p>Good principles to follow:</p>
<ul>
<li><strong>Avoid duplication.</strong> If an issue tracking system is in use in an organization, and an issue tracker entry has been filed, maintain it and do not disperse the information out in multiple places. Focus your energy on making sure the issue tracker is up-to-date so followers of that issue don’t need to ask about status or search for separate updates in email or old chat messages.</li>
<li><strong>Keep status information current.</strong> There is no point in a status-tracking system if the statuses are out-of-date. On the other hand, there is no need to update the status daily. A good rule of thumb is to update important status information immediately when it happens and less important statuses perhaps bi-weekly or monthly, depending on what the normal cadence of reviewing and prioritizing work is.</li>
<li><strong>Annotate status changes.</strong> If an issue was closed but there is no comment whatsoever on why it was closed, it raises more questions than it answers. When updating the status of issues, and in particular when closing them, add a comment on what changed and why. Take advantage of the feature in most issue-tracking software (e.g. GitLab and GitHub) that automatically closes issues when a commit with a closing note lands on the mainline Git branch; those status updates are automatically annotated with a link to the change, including its date and author.</li>
<li><strong>No news is bad news.</strong> In the context of status information and progress communication, people tend to view a lack of communication as a sign of a lack of progress. If something is blocked and there is no progress, a quick message noting “no progress” is better than silence and letting people stare at issues with no updates. Eventually people will start to worry and will reach out for updates, so skipping status updates to save effort might not actually save any effort.</li>
<li><strong>Remember the purpose.</strong> At the end of the day, progress is more important than communication. If a task is <em>small</em> and you work on it <em>alone</em>, issues/status reporting may be omitted completely. If you find yourself spending more effort on communication about an issue than working on the issue itself, something is wrong with the overall process and you should review it.</li>
</ul>
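<p>The auto-close behavior mentioned in the list above can be sketched with a few shell commands (the issue number, file name, and commit message are hypothetical; <code>Closes</code>, <code>Fixes</code>, and <code>Resolves</code> are among the keywords GitHub and GitLab recognize):</p>

```shell
# Hypothetical sketch: in a throwaway repository, make a bug-fix commit
# that carries a closing keyword in its message. When such a commit lands
# on the project's default branch, GitHub and GitLab close issue #123
# automatically and annotate it with a link to the commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "fix" > scheduler.c
git add scheduler.c
git commit -q -m "scheduler: fix race condition in job queue

Closes #123"

# The closing keyword is plain text in the commit message:
git log -1 --pretty=%B
```

<p>In both systems the issue is closed only once the commit reaches the project’s default branch; merge request and pull request descriptions support the same keywords.</p>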
<h2 id="give-and-get-feedback"><a href="#give-and-get-feedback" class="header-anchor"></a>Give and Get Feedback
</h2><p><strong>Be honest.</strong> Engineering is about building stuff that works, and a lot of effort goes into making sure stuff actually works. That requires a constant loop of testing and feedback. If you spot something that is off, report it. Don’t waste time on sugar-coating when communicating as one professional engineer to another; just report the problem you’ve spotted, and do it exactly.</p>
<p><strong>Be grateful for all the feedback you get.</strong> Thank the person for taking the time to give feedback. The more you get, the better. Never scold another engineer for giving you feedback you don’t like. Stay professional and just read/listen and try to understand it.</p>
<p>Not all feedback is valid, and <strong>as a professional you choose what feedback you act on</strong>. If you don’t understand the feedback, ask for clarification. <strong>Engineering is, and should be, full of debate</strong> about what is the wrong or right solution, so that the chances of landing on the actually right solution are maximized. Be professional and don’t take those discussions emotionally. Engage in them, and make sure all data points are analyzed. Intentionally challenge your own bias and preconceptions, and try to reach the best conclusion you can.</p>
<h2 id="less-is-more"><a href="#less-is-more" class="header-anchor"></a>Less Is More
</h2><p>Delete stuff that is outdated or obsolete. Remove weasel words and duplicate texts. If you come across some documentation that is clearly outdated but might be needed for archival purposes, then add a banner to it stating that it is no longer up-to-date and kept only for archival purposes (in particular in a wiki where anybody can contribute to maintaining the contents). False information may cause more harm than no information.</p>
<p>Avoid <a class="link" href="https://en.wikipedia.org/wiki/Link_rot" target="_blank" rel="noopener"
>link rot</a>. If a document is moved from one place to another, delete the old version and replace it with a link to the new one.</p>
<p>Always shorten and simplify code when you can do so without sacrificing other desirable qualities: correctness, readability, maintainability, and consistency with the rest of the codebase. For example, if you are writing or maintaining 10 unit tests which are 20 lines of code each, but differ only in a couple of inputs and outputs, then combine them into a single parameterized test.</p>
<p>Layering and abstractions are valuable techniques for writing reusable and correct code. However, <strong>too much abstraction</strong> can make code difficult to understand and reason about. An integrated development environment (IDE) can be a useful tool for quickly navigating a code base. However, if you <em>have to</em> use an IDE to follow the code’s logic, it is a telltale sign that the code is way too abstracted.</p>
<h2 id="a-good-coder-is-also-good-at-writing-in-human-languages"><a href="#a-good-coder-is-also-good-at-writing-in-human-languages" class="header-anchor"></a>A Good Coder Is Also Good at Writing in Human Languages
</h2><p>The primary goal for code is to be easy to understand and follow. Optimize for readability and maintainability. Do not optimize for speed or build layers of abstractions upfront. Instead, do such things only later on if and when you really need to.</p>
<p>Principles to make code easy to understand:</p>
<ul>
<li><strong>Start with good naming:</strong> Files, functions, variables should all be named in such a way that one can guess from the name what they do or contain. Don’t be afraid of changing a name if you realize that something else would describe it much better. Functionality and contents evolve – so should the naming.</li>
<li><strong>Use the project’s coding conventions.</strong> A suboptimal but consistent convention is better than mixing multiple conventions in one code base. Use correct indentation, line length, white space, etc. to enhance the readability of your code. Keep the flow easy to follow.</li>
<li><strong>Add inline comments</strong> in places that require some additional explanation of what the code does or why the code was written in a particular way. Good inline comments prevent the code from being deleted or refactored by somebody else, or by yourself a year later when you can no longer recall the details.</li>
<li><strong>Longer or higher-level documentation should go into README files.</strong> The convention of using README files in code repositories is a great application of coupling code and documentation. A README is easy to discover in a code repository, and very likely to get updated by the same commits that update the code itself. If a code repository completely lacks a README file, the code is most likely not intended to be long-lived and should be avoided.</li>
</ul>
<h2 id="software-engineers-need-to-excel-at-making-git-commits"><a href="#software-engineers-need-to-excel-at-making-git-commits" class="header-anchor"></a>Software Engineers Need to Excel at Making Git Commits
</h2><p>Last but not least - remember that good software engineers write both <strong>great code</strong> and <strong><a class="link" href="https://optimizedbyotto.com/post/good-git-commit/" >brilliant Git commit messages</a></strong>. The more senior the engineer, the more they value both the code <em>and</em> the description of it, and the whole <a class="link" href="https://optimizedbyotto.com/post/how-to-code-review/" >feedback cycle</a> that eventually leads to making the world a better place – or at least one piece of software better. Practice this skill even if it requires a bit of extra effort initially.</p>
<h2 id="if-you-are-not-a-native-speaker-invest-in-improving-your-english-skills"><a href="#if-you-are-not-a-native-speaker-invest-in-improving-your-english-skills" class="header-anchor"></a>If You Are Not a Native Speaker, Invest in Improving Your English Skills
</h2><p>Most of the world’s population are not native English speakers – including myself. Yet, as English is the <a class="link" href="https://en.wikipedia.org/wiki/Lingua_franca" target="_blank" rel="noopener"
>lingua franca</a> of the software world, we all need to put effort into becoming more fluent in English. The best way to become fluent is to simply force yourself to read, write, listen, and speak English, and to do it in a way where you intentionally try to improve. For example, I watch YouTube videos in well-articulated English, such as the <a class="link" href="https://www.youtube.com/@TheQIElves" target="_blank" rel="noopener"
>British comedy panel show QI</a>, or listen to podcasts attentively, trying to pick up new expressions that help articulate ideas and opinions more accurately, and in general to expand my vocabulary.</p>
<h2 id="high-quality-communication-facilitates-high-quality-engineering"><a href="#high-quality-communication-facilitates-high-quality-engineering" class="header-anchor"></a>High-Quality Communication Facilitates High-Quality Engineering
</h2><p>From an organizational point of view, <strong>it doesn’t matter how many amazingly smart engineers you hire if there are no proper mechanisms in place to ensure that the right amount of relevant information flows</strong> between the experts. Efficient communication is also vital for growing junior engineers quickly. You don’t want any engineers wasting time trying to solve problems that have already been solved. The whole organization will be <strong>vastly more productive <em>if</em> everyone is able to find information easily and quickly at the time they need it</strong>.</p>
<p>Achieving this is neither hard nor expensive – it just requires setting a couple of ground rules, reflecting on their meaning, and executing them consistently. Managers play a vital role in building a strong communication culture by leading by example and showing that good communication is valued across the organization.</p> 8 writing tips for software professionals https://optimizedbyotto.com/post/writing-tips-for-software-professionals/Sun, 24 Mar 2024 00:00:00 +0000 https://optimizedbyotto.com/post/writing-tips-for-software-professionals/ <img src="https://optimizedbyotto.com/post/writing-tips-for-software-professionals/featured-image.jpg" alt="Featured image of post 8 writing tips for software professionals" /><p>People usually associate advanced software engineering with gray-bearded experts with vast knowledge of how computers and things like compiler internals work. However, having technical knowledge is just the base requirement to work in the field. In my experience, the <strong>greatest minds</strong> in the field are not just deeply knowledgeable experts, but also <strong>extremely efficient communicators, particularly in writing</strong>.</p>
<p>Following these 8 principles can help you maximize your efficiency in written communication:</p>
<h2 id="1-less-is-more"><a href="#1-less-is-more" class="header-anchor"></a>1. Less is more
</h2><p>In a workplace setting, the ability to summarize something in three sentences is far more valuable than the ability to write fancy-looking research papers. Forget school assignments with minimum lengths – in reality, you need to put in effort to specifically keep it short.</p>
<h2 id="2-start-with-the-solution-or-the-ask"><a href="#2-start-with-the-solution-or-the-ask" class="header-anchor"></a>2. Start with the solution or the ask
</h2><p>Unless you are a professional novel writer building up an arc of drama, your readers are most likely not captivated enough to read all of your text fully. Therefore, you need to put forward your main <strong>suggestion</strong> or <strong>request</strong> as early in the text as possible. In ideal cases, the main message you want to convey is already in the title.</p>
<h2 id="3-show-the-facts-with-examples"><a href="#3-show-the-facts-with-examples" class="header-anchor"></a>3. Show the facts, with examples
</h2><p>If you are an expert, people will value your opinions. But it is always much more compelling if they are delivered with supporting facts, numbers, timelines, and references. Ideally, there is a reliable source to refer to or an indicator or statistic to look at, but a couple of anecdotal case examples also work well, both as evidence and as a concrete story showcasing cause and effect.</p>
<h2 id="4-always-quantify"><a href="#4-always-quantify" class="header-anchor"></a>4. Always quantify
</h2><p>A number is always more expressive than an adjective. Instead of a vague “<em>expensive</em>”, just write “<em>500 USD/h</em>” if the price is known. Don’t state that something is “<em>significantly faster</em>” as it does not actually mean anything. Saying, for example, “<em>travel time decreased to 5 hours (down 30% from 7 hours)</em>” paints a much more accurate picture.</p>
<h2 id="5-include-links-and-references"><a href="#5-include-links-and-references" class="header-anchor"></a>5. Include links and references
</h2><p>Instead of a verbal reference like “<em>read the report for more,</em>” do a service to readers and include a direct URL they can simply click. When describing a system or a problem, include the documentation link or issue tracker identifier.</p>
<h2 id="6-explain-why-it-matters"><a href="#6-explain-why-it-matters" class="header-anchor"></a>6. Explain why it matters
</h2><p>After stating facts, ask yourself “<em>so what?</em>”. Cater to readers who are not fully familiar with the domain by being explicit on <strong>why</strong> something matters and <strong>what it means</strong>, in as concrete terms as possible.</p>
<h2 id="7-ask-feedback-from-one-person"><a href="#7-ask-feedback-from-one-person" class="header-anchor"></a>7. Ask for feedback from one person
</h2><p>Before sending out a text to a large group of recipients, ask one person to read it first. If your main message does not get across, iterate on your text until at least one person understands it in the way you intended. If the text has great significance, you might continue to ask for feedback from two or three more people, but remember that everyone has an opinion, and there is no guarantee that more opinions will converge into one. Asking multiple people for feedback is not bad per se, but it is often wasteful, as it quickly leads to diminishing returns.</p>
<h2 id="8-sleep-on-it"><a href="#8-sleep-on-it" class="header-anchor"></a>8. Sleep on it
</h2><p>When it comes to your own text, <em>the most important opinion is your own</em>. A good way to figure out what <strong>you really want</strong> and value is to write a text, put it away, and then return to it one or more days later and ask yourself <strong>if you still</strong> really agree with it.</p>
<h2 id="sender-is-responsible-for-delivery"><a href="#sender-is-responsible-for-delivery" class="header-anchor"></a>Sender is responsible for delivery
</h2><p>Last but not least, remember <strong>it is the responsibility of the broadcaster to make sure the message was received.</strong> Don’t assume people received and saw your message, or that they read it, or that they understood what they read. You need to put in the effort to prepare your message and follow up on how it was received.</p>
<p>Writing well is also a way to show respect for the reader’s intellect and time. Think about it this way: If you send a message to a hundred people and expect them to spend 6 minutes each reading it, you are spending 600 minutes (10 hours) of organizational time. If you spend 15 minutes extra to polish your message so it can be read and understood in just 2 minutes, you save the organization almost a full workday (600 minutes vs. 15 + 200 minutes equals 385 minutes or 6 ½ hours less).</p>
<p>It does not matter how good your idea is if the text describing it is bad. If you practice writing well, people near and far will become more receptive to your ideas.</p>
<h2 id="what-are-your-tips"><a href="#what-are-your-tips" class="header-anchor"></a>What are your tips?
</h2><p>Are you a seasoned professional who masters written communication? What are your tips? Please comment below!</p> Tab-tastic tips for streamlined web browser use https://optimizedbyotto.com/post/web-browser-tab-tips/Fri, 08 Mar 2024 00:00:00 +0000 https://optimizedbyotto.com/post/web-browser-tab-tips/ <img src="https://optimizedbyotto.com/post/web-browser-tab-tips/featured-image.jpg" alt="Featured image of post Tab-tastic tips for streamlined web browser use" /><p>What is the single most common action you repeat over and over when using your computer? Let me guess – opening a new tab in the browser. Here are my tips for opening, switching and closing tabs everyone should know.</p>
<h2 id="opening-a-tab"><a href="#opening-a-tab" class="header-anchor"></a>Opening a tab
</h2><p>This one most people know: press <code>Ctrl+T</code> to open a new tab. But did you know that you don’t always need to type a URL or start a web search? <strong>You can also jump directly to the content you wanted to view by using custom address bar shortcuts.</strong></p>
<p>All popular browsers support defining custom keywords so that what you type in the address bar can take you where you are going even faster. In Chrome (and <a class="link" href="https://en.wikipedia.org/wiki/Chromium_%28web_browser%29" target="_blank" rel="noopener"
>Chromium</a>) you can customize what shortcuts can be used in the address bar by opening <em>Settings > Search engine > Manage search engines and site search</em>. Below are my favorite custom searches.</p>
<h3 id="ask-with-perplexity-ai"><a href="#ask-with-perplexity-ai" class="header-anchor"></a>Ask with Perplexity AI
</h3><p>Want to quickly ask an AI for something? Just configure <code>@p</code> in your shortcuts to query <code>https://www.perplexity.ai/search?q=%s&focus=internet</code> and you are never further than a couple key strokes away from asking <a class="link" href="https://www.perplexity.ai/" target="_blank" rel="noopener"
>Perplexity AI</a>.</p>
<p><img src="https://optimizedbyotto.com/post/web-browser-tab-tips/chrome-shotcut-perplexity.gif"
width="942"
height="704"
loading="lazy"
alt="Ask Perplexity AI a question directly from the browser address bar"
class="gallery-image"
data-flex-grow="133"
data-flex-basis="321px"
>
</p>
<p>I used to always <em>google</em> everything I wanted to know, but nowadays I find myself doing it less and less. Instead, I type <code>@p <question></code> in the address bar, press enter and immediately get the answer from Perplexity along with links to the information sources. No more wasting time on skimming through irrelevant search result pages!</p>
<h3 id="open-a-man-page-instantly"><a href="#open-a-man-page-instantly" class="header-anchor"></a>Open a man page instantly
</h3><p>Yes, any <a class="link" href="https://en.wikipedia.org/wiki/Man_page" target="_blank" rel="noopener"
>man page</a> can be accessed by running <code>man</code> followed by the command name on the command line. But reading a man page in a browser – with nice fonts, in a separate window next to the terminal – is much more ergonomic and makes it easier to craft commands. For this use case, I have configured the shortcut <code>@man</code> to jump to the latest version of the man page in Debian using the URL <code>https://dyn.manpages.debian.org/jump?suite=unstable&language=en&q=%s</code>.</p>
<p><img src="https://optimizedbyotto.com/post/web-browser-tab-tips/chrome-custom-search-address-bar-keywords.png"
width="1200"
height="627"
srcset="https://optimizedbyotto.com/post/web-browser-tab-tips/chrome-custom-search-address-bar-keywords_hu8377311690373025271.png 480w, https://optimizedbyotto.com/post/web-browser-tab-tips/chrome-custom-search-address-bar-keywords.png 1200w"
loading="lazy"
alt="Custom search engine configuration view in Chrome"
class="gallery-image"
data-flex-grow="191"
data-flex-basis="459px"
>
</p>
<h3 id="jump-to-any-google-drive-file-or-folder-quickly"><a href="#jump-to-any-google-drive-file-or-folder-quickly" class="header-anchor"></a>Jump to any Google Drive file or folder quickly
</h3><p>Oddly enough, Chrome does not have any built-in shortcut for Google Drive. Adding a custom shortcut with the URL <code>https://drive.google.com/drive/u/0/search?q=%s</code> achieves it.</p>
<p><img src="https://optimizedbyotto.com/post/web-browser-tab-tips/chrome-shotcut-google-drive-search.gif"
width="700"
height="277"
loading="lazy"
alt="Search Google Drive directly from the browser address bar"
class="gallery-image"
data-flex-grow="252"
data-flex-basis="606px"
>
</p>
<h2 id="jumping-_between_-tabs"><a href="#jumping-_between_-tabs" class="header-anchor"></a>Jumping <em>between</em> tabs
</h2><p>If you are like me and have dozens of tabs open simultaneously, learn to use the keyboard shortcut <code>Ctrl+Tab</code>, which jumps to the next tab. Pressing <code>Ctrl+Shift+Tab</code> does the same in the reverse direction. By pressing <code>Ctrl+1</code> you can instantly jump to the first tab, with <code>Ctrl+2</code> to the second tab, and so forth. This is particularly handy if your first tabs are pinned to pages you open frequently, such as e-mail or calendar.</p>
<p>Too many tabs to cycle through? No worries – press <code>Ctrl+Shift+A</code> to open a dialog where you can search for a tab by website title.</p>
<p><img src="https://optimizedbyotto.com/post/web-browser-tab-tips/chrome-search-tabs.png"
width="333"
height="376"
srcset="https://optimizedbyotto.com/post/web-browser-tab-tips/chrome-search-tabs.png 333w"
loading="lazy"
alt="Searching open tabs after pressing Ctrl+Shift+A"
class="gallery-image"
data-flex-grow="88"
data-flex-basis="212px"
>
</p>
<p>In Chrome you can also type <code>@tabs</code> in the address bar to search your open tabs, or <code>@history</code> to search tabs and pages you recently closed.</p>
<h2 id="close-a-tab-or-reopen-a-closed-tab"><a href="#close-a-tab-or-reopen-a-closed-tab" class="header-anchor"></a>Close a tab, or reopen a closed tab
</h2><p>To close a tab, press <code>Ctrl+W</code>. Oops – if you accidentally close a tab, re-open it quickly with <code>Ctrl+Shift+T</code>. You can even press it multiple times to re-open several old tabs in the reverse order of closing them, basically <em>undo</em> for tab closing.</p>
<h2 id="bookmark-all-tabs"><a href="#bookmark-all-tabs" class="header-anchor"></a>Bookmark all tabs
</h2><p>What if you have too many tabs open and you need to close the browser window? In Chrome, there is a handy shortcut <code>Ctrl+Shift+D</code> that will bookmark all open tabs in a folder name you choose. Then you can safely close the window knowing that you will always find them in that specific folder in your bookmarks.</p>
<h2 id="keyboard-shortcut-summary"><a href="#keyboard-shortcut-summary" class="header-anchor"></a>Keyboard shortcut summary
</h2><table>
<thead>
<tr>
<th>Action</th>
<th>Shortcut</th>
</tr>
</thead>
<tbody>
<tr>
<td>Open a new tab</td>
<td>Ctrl+T</td>
</tr>
<tr>
<td>Close a tab</td>
<td>Ctrl+W</td>
</tr>
<tr>
<td>Undo closing a tab</td>
<td>Ctrl+Shift+T</td>
</tr>
<tr>
<td>Jump one tab to the right</td>
<td>Ctrl+Tab</td>
</tr>
<tr>
<td>Jump one tab to the left</td>
<td>Ctrl+Shift+Tab</td>
</tr>
<tr>
<td>Open first tab, open nth tab</td>
<td>Ctrl+1, Ctrl+2, …</td>
</tr>
<tr>
<td>Search tab by website title</td>
<td>Ctrl+Shift+A</td>
</tr>
<tr>
<td>Bookmark all open tabs (e.g. before closing window)</td>
<td>Ctrl+Shift+D</td>
</tr>
<tr>
<td>Open link in a new tab without leaving current web page</td>
<td>Ctrl+click</td>
</tr>
</tbody>
</table>
<h2 id="what-is-your-tip"><a href="#what-is-your-tip" class="header-anchor"></a>What is your tip?
</h2><p>Knowing how to use a web browser efficiently should be considered a basic life skill in modern society. The <strong>above keyboard shortcuts work in all browsers</strong> and are as universal as <a class="link" href="https://linuxnatives.net/2021/copy-paste-like-a-pro" target="_blank" rel="noopener"
>Ctrl+C and Ctrl+V</a>.</p>
<p>What is your additional browser productivity tip? Share it in a comment below.</p> Advanced Git commands every senior software developer needs to know https://optimizedbyotto.com/post/advanced-git-commands/Thu, 29 Feb 2024 00:00:00 +0000 https://optimizedbyotto.com/post/advanced-git-commands/ <img src="https://optimizedbyotto.com/post/advanced-git-commands/featured-image.jpg" alt="Featured image of post Advanced Git commands every senior software developer needs to know" /><p><a class="link" href="https://git-scm.com/" target="_blank" rel="noopener"
>Git</a> is by far the most popular software version control system today, and every software developer surely knows the basics of how to <a class="link" href="https://optimizedbyotto.com/post/good-git-commit/" >make a Git commit</a>. Given its popularity, it is surprising how many people don’t actually know its more advanced commands. Mastering them can unlock a new level of productivity. Let’s dive in!</p>
<h2 id="avoid-excess-downloads-with-selective-and-shallow-git-clone"><a href="#avoid-excess-downloads-with-selective-and-shallow-git-clone" class="header-anchor"></a>Avoid excess downloads with selective and shallow Git clone
</h2><p>When working with large Git repositories, it is not always desirable to clone the full repository, as it would take too long to download. Instead, you can clone selectively, for example like this:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">git clone --branch 11.5 --shallow-since=3m https://github.com/MariaDB/server.git mariadb-server</code><pre><code>git clone --branch 11.5 --shallow-since=3m https://github.com/MariaDB/server.git mariadb-server</code></pre></div>
<p>This will make a clone that <strong>only tracks branch <em>11.5</em> and no other branches</strong>. Additionally, this uses the shallow clone feature to <strong>fetch commit history only for the past 3 months</strong> instead of the entire history (which in this example would otherwise be 20+ years). You could also specify <code>3w</code> or <code>1y</code> to fetch three weeks or one year. After the initial clone, you can use <code>git remote set-branches --add origin 10.11</code> to start tracking an additional branch, which will be downloaded on <code>git fetch</code>.</p>
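<p>After such a clone, a few follow-up commands are handy. Below is a minimal sketch (the branch names are the examples from above; substitute your own):</p>

```shell
# Inside the shallow, single-branch clone created above:
git rev-parse --is-shallow-repository   # prints "true" for a shallow clone
git branch -r                           # initially lists only origin/11.5
git remote set-branches --add origin 10.11
git fetch origin                        # now also downloads branch 10.11
git fetch --unshallow                   # later: fetch the complete history
```

<p>Note that <code>--depth</code> and <code>--shallow-since</code> imply <code>--single-branch</code>, which is why only one remote branch is listed at first.</p>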
<p>If you already have a git repository, and all you want to do is <strong>fetch one single branch from a remote repository one-off</strong>, without adding it as a new remote, you can run:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">$ git fetch https://github.com/robinnewhouse/mariadb-server.git ninja-build-cracklib
From https://github.com/robinnewhouse/mariadb-server
* branch ninja-build-cracklib -> FETCH_HEAD
$ git merge FETCH_HEAD
Updating 112eb14f..c649d78a
Fast-forward
plugin/cracklib_password_check/CMakeLists.txt | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
$ git show
commit c649d78a8163413598b83f5717d3ef3ad9938960 (HEAD -> 11.5)
Author: Robin</code><pre><code>$ git fetch https://github.com/robinnewhouse/mariadb-server.git ninja-build-cracklib
From https://github.com/robinnewhouse/mariadb-server
* branch ninja-build-cracklib -> FETCH_HEAD
$ git merge FETCH_HEAD
Updating 112eb14f..c649d78a
Fast-forward
plugin/cracklib_password_check/CMakeLists.txt | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
$ git show
commit c649d78a8163413598b83f5717d3ef3ad9938960 (HEAD -> 11.5)
Author: Robin</code></pre></div>
<p>This is a very fast and small download, which will not persist as a remote. It creates a temporary Git reference called <code>FETCH_HEAD</code>, which you can then use to inspect the branch history by running <code>git show FETCH_HEAD</code>, or you can merge it, cherry-pick, or perform other operations.</p>
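<p>A minimal sketch of what <code>FETCH_HEAD</code> then allows (independent alternatives, not a sequence):</p>

```shell
# FETCH_HEAD is a temporary ref written by the previous `git fetch`;
# use it before the next fetch overwrites it:
git show FETCH_HEAD                  # inspect the fetched tip commit
git log --oneline HEAD..FETCH_HEAD   # commits the fetched branch would add
git cherry-pick FETCH_HEAD           # apply only the tip commit locally
```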
<p>If you want to download the bare minimum, you can even operate on individual commits as raw patch files. A typical example would be to <strong>download a GitHub Pull Request as a patch file and apply it locally</strong>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">$ curl -LO https://patch-diff.githubusercontent.com/raw/MariaDB/server/pull/3026.patch
$ git am 3026.patch
Applying: Fix ninja build for cracklib_password_check
$ git show
commit a9c44bc204735574f2724020842373b53864e131 (HEAD -> 11.5)
Author: Robin</code><pre><code>$ curl -LO https://patch-diff.githubusercontent.com/raw/MariaDB/server/pull/3026.patch
$ git am 3026.patch
Applying: Fix ninja build for cracklib_password_check
$ git show
commit a9c44bc204735574f2724020842373b53864e131 (HEAD -> 11.5)
Author: Robin</code></pre></div>
<p>The same works for GitLab Merge Requests as well – just add <code>.patch</code> at the end of the MR URL. This applies the code change inside the patch, honors the author field, and uses the patch description as the commit subject line and message body. However, when running <code>git am</code>, the committer name, email, and date will be those of the user applying the patch, and thus the resulting commit ID (SHA) will not be identical to the original.</p>
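<p>You can verify this yourself by printing the author and committer identities of the latest commit side by side – a small sketch using standard <code>git log</code> format placeholders:</p>

```shell
# After `git am`, the author fields come from the patch, while the
# committer fields are yours -- so these two lines can differ:
git log -1 --format='author:    %an <%ae> %ad%ncommitter: %cn <%ce> %cd'
```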
<p>The latest Git has a new experimental command <a class="link" href="https://git-scm.com/docs/git-sparse-checkout" target="_blank" rel="noopener"
>sparse-checkout</a> that allows checking out only a subset of files, but as it is still experimental I won’t recommend it here – this post sticks to best practices and tips I myself use frequently.</p>
<h2 id="inspecting-git-history-and-comparing-revisions"><a href="#inspecting-git-history-and-comparing-revisions" class="header-anchor"></a>Inspecting Git history and comparing revisions
</h2><p>The best command to view the history of a single file is:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">git log --oneline --follow path/to/filename.ext</code><pre><code>git log --oneline --follow path/to/filename.ext</code></pre></div>
<p>The extra <code>--follow</code> makes Git traverse further back in history to detect if the same contents previously existed under a different file name, thus <strong>showing the file’s history across renames</strong>. Using <code>--oneline</code> provides a nice short list of just the Git subject lines. To view the full Git commit messages as well as the actual changes, use this:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">git log --patch --follow path/to/filename.ext</code><pre><code>git log --patch --follow path/to/filename.ext</code></pre></div>
<p>If there is a specific change you are looking for, search for it with <code>git log --patch -S <keyword></code>.</p>
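<p>A small sketch of this “pickaxe” search in action (the string <code>max_connections</code>, the regex, and the path are made-up examples):</p>

```shell
# -S finds commits that change the number of occurrences of the string:
git log --oneline -S max_connections -- path/to/filename.ext
# -G instead matches a regex against every added or removed line:
git log --oneline -G 'max_conn(ections)?'
```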
<p>To view the project history in general, having this alias is handy:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">alias g-log="git log --graph --format='format:%C(yellow)%h%C(reset) %s %C(magenta)%cr%C(reset)%C(auto)%d%C(reset)'"</code><pre><code>alias g-log="git log --graph --format='format:%C(yellow)%h%C(reset) %s %C(magenta)%cr%C(reset)%C(auto)%d%C(reset)'"</code></pre></div>
<p><img src="https://optimizedbyotto.com/post/advanced-git-commands/git-log-branches.png"
width="1253"
height="769"
srcset="https://optimizedbyotto.com/post/advanced-git-commands/git-log-branches_hu12940301691173766031.png 480w, https://optimizedbyotto.com/post/advanced-git-commands/git-log-branches_hu16415094831403011112.png 1024w, https://optimizedbyotto.com/post/advanced-git-commands/git-log-branches.png 1253w"
loading="lazy"
alt="Custom git log format with all branches"
class="gallery-image"
data-flex-grow="162"
data-flex-basis="391px"
>
</p>
<p>The output shows all references and multiple branches in parallel, nicely colorized. If the project has a lot of messy merges, sticking to one branch may be more readable:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-6"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-6" style="display:none;">git log --oneline --no-merges --first-parent</code><pre><code>git log --oneline --no-merges --first-parent</code></pre></div>
<p><img src="https://optimizedbyotto.com/post/advanced-git-commands/git-log-parent-branch-only.png"
width="1252"
height="774"
srcset="https://optimizedbyotto.com/post/advanced-git-commands/git-log-parent-branch-only_hu15579114208679559908.png 480w, https://optimizedbyotto.com/post/advanced-git-commands/git-log-parent-branch-only_hu13676921408714466298.png 1024w, https://optimizedbyotto.com/post/advanced-git-commands/git-log-parent-branch-only.png 1252w"
loading="lazy"
alt="Custom git log format with parent branches only"
class="gallery-image"
data-flex-grow="161"
data-flex-basis="388px"
>
</p>
<p>However, an even better option is to use <code>gitk --all &</code>. This standard Git graphical user interface allows you to browse the history, search for changes with a specific string, jump to a specific commit to quickly inspect it and what preceded it, open a graphical Git blame in a new window, etc. The <code>--all</code> instructs <code>gitk</code> to show all branches and references, and the ampersand backgrounds the process so that your command-line prompt is freed to run other commands. If your workflow is based on working over SSH on a remote server, simply connect with <code>ssh -X remote.server.example</code> to have X11 forwarding enabled (only works on Linux). Then on the SSH command-line just run <code>gitk --all &</code> and a window should pop up.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-7"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-7" style="display:none;">laptop$ ssh -X remote-server.example.com
server$ echo $DISPLAY
:0 (X11 forwarding is enabled and xauth running)
server$ cd path/to/git/repo
server$ gitk --all &</code><pre><code>laptop$ ssh -X remote-server.example.com
server$ echo $DISPLAY
:0 (X11 forwarding is enabled and xauth running)
server$ cd path/to/git/repo
server$ gitk --all &</code></pre></div>
<p>A typical need is also to compare files and changes across multiple commits or branches using Git diff. A nicer graphical option is to run <code>git difftool --dir-diff branch1..branch2</code>, which will open the diff program of your choice. Personally I have opted to always use <a class="link" href="https://meldmerge.org/" target="_blank" rel="noopener"
>Meld</a> with <code>git config diff.tool meld</code>.</p>
<p><img src="https://optimizedbyotto.com/post/advanced-git-commands/git-difftool-meld.png"
width="1254"
height="778"
srcset="https://optimizedbyotto.com/post/advanced-git-commands/git-difftool-meld_hu11108748408415570093.png 480w, https://optimizedbyotto.com/post/advanced-git-commands/git-difftool-meld_hu6908882807784497843.png 1024w, https://optimizedbyotto.com/post/advanced-git-commands/git-difftool-meld.png 1254w"
loading="lazy"
alt="Demo of git difftool with Meld"
class="gallery-image"
data-flex-grow="161"
data-flex-basis="386px"
>
</p>
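<p>For <code>git difftool</code> to open Meld without asking per file, a little one-time configuration helps. A minimal sketch, assuming Meld is installed:</p>

```shell
# Register Meld as the graphical diff tool (one-time setup)
git config --global diff.tool meld
# Skip the "Launch 'meld' [Y/n]?" prompt for every file
git config --global difftool.prompt false
# Example invocation, comparing two branches directory-wise:
# git difftool --dir-diff branch1..branch2
```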
<h2 id="committing-rebasing-cherry-picking-and-merging"><a href="#committing-rebasing-cherry-picking-and-merging" class="header-anchor"></a>Committing, rebasing, cherry-picking and merging
</h2><p>When making a Git commit, doing it graphically with <code>git citool</code> helps to clearly see what changes have been made, and to select the files and even the exact lines to be committed with the click of a mouse. The tool also offers built-in spell-checking, and the text box is sized just right to visually enforce keeping line lengths within limits. Since development involves committing and amending commits all the time, I recommend having these aliases:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-8"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-8" style="display:none;">alias g-commit='git citool &'
alias g-amend='git citool --amend &'</code><pre><code>alias g-commit='git citool &'
alias g-amend='git citool --amend &'</code></pre></div>
<p>Personally, I practically never commit by simply running <code>git commit</code>. If I commit from the command line at all, it is usually due to the need to do something special, such as change the author with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-9"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-9" style="display:none;">git commit --amend --no-edit --author "Otto Kekäläinen <otto@debian.org>"</code><pre><code>git commit --amend --no-edit --author "Otto Kekäläinen <otto@debian.org>"</code></pre></div>
<p>Another case where a command-line commit fits my workflow well is during final testing before a code submission when I find a flaw on the branch I am working on. In these cases, I fix the code, and quickly issue:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-10"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-10" style="display:none;">git commit -a --fixup a1b2c3
git rebase -i --autosquash main</code><pre><code>git commit -a --fixup a1b2c3
git rebase -i --autosquash main</code></pre></div>
<p>This will commit the change, mark it as a fix for commit <code>a1b2c3</code>, and then open the interactive rebase view <strong>with the fixup commit automatically placed at the right location</strong>, resulting in a quick turnaround to make the branch flawless and ready for submission.</p>
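<p>If the target hash is not at hand, Git can also resolve the fixup target by commit message; the subject text below is just an example:</p>

```shell
# Reference the fixup target by its commit message instead of a hash:
# ':/<text>' resolves to the newest commit whose message contains <text>
git commit -a --fixup ':/Add input validation'

# Make plain 'git rebase -i' honor fixup commits without --autosquash:
git config --global rebase.autosquash true
```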
<p>Occasionally a Git commit needs to be applied to multiple branches. For example, after making a bugfix with the id <code>a1b2c3</code> on the main branch, you might want to backport it to release branches 11.4 and 11.3 with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-11"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-11" style="display:none;">git cherry-pick -x a1b2c3</code><pre><code>git cherry-pick -x a1b2c3</code></pre></div>
<p>The extra <code>-x</code> makes Git amend the commit message with a reference to the commit id it originated from. In this case, it would state: <code>(cherry picked from commit a1b2c3)</code>. This helps people reading the commit messages later track down when and where the commit was first made.</p>
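<p>Applied to the example above, the backport can be scripted as a small loop (the release branch names are hypothetical):</p>

```shell
# Backport bugfix a1b2c3 onto each release branch, recording provenance
for branch in 11.4 11.3; do
  git checkout "$branch"
  git cherry-pick -x a1b2c3
done
```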
<p>When doing merges, the most effective way to handle conflicts is by <strong>using Meld to graphically compare and resolve merges</strong>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-12"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-12" style="display:none;">$ git merge branch2
Auto-merging VERSION
CONFLICT (content): Merge conflict in SOMEFILE
Automatic merge failed; fix conflicts and then commit the result.
$ git mergetool
Merging:
SOMEFILE
Normal merge conflict for 'SOMEFILE':
{local}: modified file
{remote}: modified file
$ git commit -a
[branch1 e4952e06] Merge branch 'branch2' into branch1</code><pre><code>$ git merge branch2
Auto-merging VERSION
CONFLICT (content): Merge conflict in SOMEFILE
Automatic merge failed; fix conflicts and then commit the result.
$ git mergetool
Merging:
SOMEFILE
Normal merge conflict for 'SOMEFILE':
{local}: modified file
{remote}: modified file
$ git commit -a
[branch1 e4952e06] Merge branch 'branch2' into branch1</code></pre></div>
<p><img src="https://optimizedbyotto.com/post/advanced-git-commands/git-mergetool-with-meld-demo.gif"
width="1246"
height="765"
loading="lazy"
alt="Demo of git mergetool with Meld"
class="gallery-image"
data-flex-grow="162"
data-flex-basis="390px"
>
</p>
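<p>As with diffs, Meld must first be registered as the merge tool; a minimal setup, assuming Meld is installed:</p>

```shell
# Use Meld for 'git mergetool' (one-time setup)
git config --global merge.tool meld
# Don't litter the working tree with *.orig backup files after merges
git config --global mergetool.keepBackup false
```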
<p>One more thing: if a merge or rebase fails, remember to run <code>git merge --abort</code> or <code>git rebase --abort</code> to stop it and get back to the normal state. Another typical need is to discard all temporary changes and return to a clean state, ready for new commits. For that I recommend this alias:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-13"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-13" style="display:none;">alias g-clean='git clean -fdx && git reset --hard && git submodule foreach --recursive git clean -fdx && git submodule foreach --recursive git reset --hard'</code><pre><code>alias g-clean='git clean -fdx && git reset --hard && git submodule foreach --recursive git clean -fdx && git submodule foreach --recursive git reset --hard'</code></pre></div>
<p>This will reset all modified files to their pristine state from the last commit, as well as delete all files that are not in version control but may be present in the project directory.</p>
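<p>Since <code>git clean -fdx</code> permanently deletes files, it can be worth previewing what the alias would remove before running it:</p>

```shell
# -n (--dry-run) lists what 'git clean' would delete, without deleting it
git clean -ndx
```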
<h2 id="managing-multiple-remotes-and-branches"><a href="#managing-multiple-remotes-and-branches" class="header-anchor"></a>Managing multiple remotes and branches
</h2><p>The most important tip for working with Git repositories is to always run <code>git remote update</code> at the start of every coding session. This fetches all remotes and ensures you have all the latest Git commits made since the last time you worked with the repository.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-14"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-14" style="display:none;">$ git remote update
Fetching origin
Fetching upstream
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), 445 bytes | 55.00 KiB/s, done.
From https://github.com/eradman/entr
e2a6ab7..6fa963e master -> upstream/master</code><pre><code>$ git remote update
Fetching origin
Fetching upstream
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), 445 bytes | 55.00 KiB/s, done.
From https://github.com/eradman/entr
e2a6ab7..6fa963e master -> upstream/master</code></pre></div>
<p>In the example above, you can see that there isn’t just the <code>origin</code> remote, but also a second remote called <code>upstream</code>. Most people use Git in a centralized model, meaning that there is one central main repository on e.g. GitHub or GitLab, and each developer in the project pushes to and pulls from that central repository. However, Git was designed from the start to be a distributed system that can sync with multiple remotes. To control this, one needs to learn the concept of tracking branches and the options of the <code>git remote</code> command.</p>
<p>Consider this example with two remotes, <em>origin</em> and <em>upstream</em>, where the <em>origin</em> remote has three push URLs:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-15"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-15" style="display:none;">$ git remote -v
origin git@salsa.debian.org:debian/entr.git (fetch)
origin git@salsa.debian.org:debian/entr.git (push)
origin git@gitlab.com:ottok/entr.git (push)
origin git@github.com:ottok/entr.git (push)
upstream https://github.com/eradman/entr (fetch)
upstream https://github.com/eradman/entr (push)
$ cat .git/config
[remote "origin"]
url = git@salsa.debian.org:debian/entr.git
fetch = +refs/heads/*:refs/remotes/origin/*
pushurl = git@salsa.debian.org:debian/entr.git
pushurl = git@gitlab.com:ottok/entr.git
pushurl = git@github.com:ottok/entr.git
[remote "upstream"]
url = https://github.com/eradman/entr
fetch = +refs/heads/*:refs/remotes/upstream/*
[branch "debian/latest"]
remote = origin
merge = refs/heads/debian/latest
[branch "master"]
remote = upstream
merge = refs/heads/master</code><pre><code>$ git remote -v
origin git@salsa.debian.org:debian/entr.git (fetch)
origin git@salsa.debian.org:debian/entr.git (push)
origin git@gitlab.com:ottok/entr.git (push)
origin git@github.com:ottok/entr.git (push)
upstream https://github.com/eradman/entr (fetch)
upstream https://github.com/eradman/entr (push)
$ cat .git/config
[remote "origin"]
url = git@salsa.debian.org:debian/entr.git
fetch = +refs/heads/*:refs/remotes/origin/*
pushurl = git@salsa.debian.org:debian/entr.git
pushurl = git@gitlab.com:ottok/entr.git
pushurl = git@github.com:ottok/entr.git
[remote "upstream"]
url = https://github.com/eradman/entr
fetch = +refs/heads/*:refs/remotes/upstream/*
[branch "debian/latest"]
remote = origin
merge = refs/heads/debian/latest
[branch "master"]
remote = upstream
merge = refs/heads/master</code></pre></div>
<p>In this repository, the branch <code>master</code> is configured to track the remote <code>upstream</code>. Thus, if I am on branch <code>master</code> and run <code>git pull</code>, it will fetch <code>master</code> from the upstream repository. I can then check out the <code>debian/latest</code> branch, merge in the changes from <code>upstream</code> and make other changes. Eventually, when I am done and issue <code>git push</code>, the changes on branch <code>debian/latest</code> will go to the remote <code>origin</code> automatically. The <code>origin</code> has three <code>pushurl</code> entries, which means the updated <code>debian/latest</code> will end up on the Debian server as well as on GitHub and GitLab.</p>
<p>The commands to set this up were:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-16"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-16" style="display:none;">git clone git@salsa.debian.org:debian/entr.git
cd entr
git remote set-url --add --push origin git@salsa.debian.org:otto/entr.git
git remote set-url --add --push origin git@gitlab.com:ottok/entr.git
git remote set-url --add --push origin git@github.com:ottok/entr.git
git remote add upstream https://github.com/eradman/entr</code><pre><code>git clone git@salsa.debian.org:debian/entr.git
cd entr
git remote set-url --add --push origin git@salsa.debian.org:otto/entr.git
git remote set-url --add --push origin git@gitlab.com:ottok/entr.git
git remote set-url --add --push origin git@github.com:ottok/entr.git
git remote add upstream https://github.com/eradman/entr</code></pre></div>
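<p>The tracking relationships themselves can be changed at any time with <code>git branch --set-upstream-to</code> or <code>git push -u</code>. A sketch, using the branch and remote names from the config above:</p>

```shell
# Make the local 'master' branch track the upstream remote
git branch --set-upstream-to=upstream/master master
# Push a branch and set its tracking remote in one go
git push -u origin debian/latest
```

With <code>-u</code>, subsequent plain <code>git push</code> and <code>git pull</code> on that branch know where to go.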
<h2 id="keeping-repositories-nice-and-tidy"><a href="#keeping-repositories-nice-and-tidy" class="header-anchor"></a>Keeping repositories nice and tidy
</h2><p>As most developers use feature and bug branches to make changes and submit them for review, a lot of old and unnecessary branches will start to pollute the Git history over time. Therefore it is good to check from time to time what branches have been merged with <code>git branch --merged</code> and delete them.</p>
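<p>A sketch for deleting such merged branches in bulk; the pattern protecting <code>main</code> and <code>master</code> is just an example, adjust it to your own branch naming:</p>

```shell
# Delete local branches already merged into the current branch,
# keeping the current branch and the main line
git branch --merged | grep -vE '^\*|^[[:space:]]*(main|master)$' | xargs -r git branch -d
```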
<p>If a branch is deleted remotely as a result of somebody else doing cleanup, you can make Git automatically delete those branches locally as well with <code>git config --local fetch.prune true</code>. You can also run this as a one-off with <code>git fetch --prune --verbose --dry-run</code>.</p>
<p>When working with multiple remotes, it can at times be hard to reason about what will happen on a <code>git pull</code> or <code>git push</code>. To see which tags and branches would be updated, and how, without actually updating them, run:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-17"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-17" style="display:none;">git fetch --verbose --dry-run
git push --verbose --dry-run
git push --tags --verbose --dry-run</code><pre><code>git fetch --verbose --dry-run
git push --verbose --dry-run
git push --tags --verbose --dry-run</code></pre></div>
<p>Using the <code>--dry-run</code> option is particularly important when running <code>push</code> or <code>pull</code> with <code>--prune</code> or <code>--prune-tags</code> to see which branches or tags would be deleted locally or on the remote.</p>
<p>Another maintenance task to occasionally spend time on is to run this command to make Git delete all unreachable objects and to pack the ones that should be kept forever:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-18"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-18" style="display:none;">git prune --verbose --progress; git repack -ad; git gc --aggressive; git prune-packed</code><pre><code>git prune --verbose --progress; git repack -ad; git gc --aggressive; git prune-packed</code></pre></div>
<p>To do this for every Git repository on your computer, you can run:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-19"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-19" style="display:none;">find ~ -name .git -type d | while read D
do
echo "===> $D: "
(cd "$D"; git prune --verbose --progress; nice -n 15 git repack -ad; nice -n 15 git gc --aggressive; git prune-packed)
done</code><pre><code>find ~ -name .git -type d | while read D
do
echo "===> $D: "
(cd "$D"; git prune --verbose --progress; nice -n 15 git repack -ad; nice -n 15 git gc --aggressive; git prune-packed)
done</code></pre></div>
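<p>On Git 2.30 and newer, much of this housekeeping can also be delegated to the built-in maintenance command instead of hand-rolled loops:</p>

```shell
# Run all enabled maintenance tasks once in the current repository
git maintenance run
# Schedule background maintenance via cron/systemd/launchd:
# git maintenance start
```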
<h2 id="better-git-experience-with-liquip-prompt-and-fzf"><a href="#better-git-experience-with-liquip-prompt-and-fzf" class="header-anchor"></a>Better Git experience with Liquid Prompt and fzf
</h2><p>It is not practical to constantly run <code>git status</code> (or <code>git status --ignored</code>) or to press <code>F5</code> in a <code>gitk</code> window to be aware of the Git repository status. A much handier solution is to have the Git status integrated in the command-line prompt. My favorite is <a class="link" href="https://linuxnatives.net/2020/liquid-prompt" target="_blank" rel="noopener"
>Liquid Prompt</a>, which shows the branch name, and displays green if everything is committed and clean, red if there are uncommitted changes, and yellow if changes are not pushed.</p>
<p>Another additional tool I recommend is the <a class="link" href="https://linuxnatives.net/2021/save-time-command-line-fuzzy-finder" target="_blank" rel="noopener"
>Fuzzy Finder fzf</a>. It has many uses in the command-line environment, and for Git this alias is handy for changing branches:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-20"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-20" style="display:none;">alias g-checkout='git checkout $(git branch --sort=-committerdate --no-merged | fzf)'</code><pre><code>alias g-checkout='git checkout $(git branch --sort=-committerdate --no-merged | fzf)'</code></pre></div>
<p>This will list all local branches with the recent ones topmost, and present the list in an interactive form using fzf so you can select the branch either using arrow keys, or typing a part of the branch name.</p>
<p><img src="https://optimizedbyotto.com/post/advanced-git-commands/liquid-prompt-fzf-git-demo.gif"
width="1246"
height="765"
loading="lazy"
alt="Demo of Liquid Prompt and git branch selection with Fuzzy Finder (fzf)"
class="gallery-image"
data-flex-grow="162"
data-flex-basis="390px"
>
</p>
<h2 id="bash-aliases"><a href="#bash-aliases" class="header-anchor"></a>Bash aliases
</h2><p>While Git has its own alias system, I prefer to have everything in plain <a class="link" href="https://en.wikipedia.org/wiki/Bash_%28Unix_shell%29" target="_blank" rel="noopener"
>Bash aliases</a> defined in my <code>.bashrc</code>. Many of these are explained in this post, but there are a couple of extras as well. I leave it up to the reader to study the <a class="link" href="https://manpages.debian.org/unstable/git-man/git-push.1.en.html" target="_blank" rel="noopener"
>Git man page</a> to learn for example what <code>git push --force-with-lease</code> does.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-21"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-21" style="display:none;">alias g-log="git log --graph --format='format:%C(yellow)%h%C(reset) %s %C(magenta)%cr%C(reset)%C(auto)%d%C(reset)'"
alias g-history='gitk --all &'
alias g-checkout='git checkout $(git branch --sort=-committerdate --no-merged | fzf)'
alias g-commit='git citool &'
alias g-amend='git citool --amend &'
alias g-fixup='git commit -a --fixup'
alias g-rebase='git rebase --interactive --autosquash'
alias g-pull='git pull --verbose --rebase'
alias g-pushf='git push --verbose --force-with-lease'
alias g-status='git status --ignored'
alias g-clean='git clean -fdx && git reset --hard && git submodule foreach --recursive git clean -fdx && git submodule foreach --recursive git reset --hard'</code><pre><code>alias g-log="git log --graph --format='format:%C(yellow)%h%C(reset) %s %C(magenta)%cr%C(reset)%C(auto)%d%C(reset)'"
alias g-history='gitk --all &'
alias g-checkout='git checkout $(git branch --sort=-committerdate --no-merged | fzf)'
alias g-commit='git citool &'
alias g-amend='git citool --amend &'
alias g-fixup='git commit -a --fixup'
alias g-rebase='git rebase --interactive --autosquash'
alias g-pull='git pull --verbose --rebase'
alias g-pushf='git push --verbose --force-with-lease'
alias g-status='git status --ignored'
alias g-clean='git clean -fdx && git reset --hard && git submodule foreach --recursive git clean -fdx && git submodule foreach --recursive git reset --hard'</code></pre></div>
<h2 id="keep-on-learning"><a href="#keep-on-learning" class="header-anchor"></a>Keep on learning
</h2><p>As a programmer, it is not enough to know programming languages and how to write code well. You also need to understand the software lifecycle and change management. Understanding Git deeply helps you better prepare for situations where potentially hundreds of people collaborate on the same code base for years and years.</p>
<p>To learn more about Git concepts, I recommend reading the entire <a class="link" href="https://git-scm.com/book/en/v2" target="_blank" rel="noopener"
>Pro Git book</a>. The original version is over a decade old, but the online version keeps getting regular updates by people contributing to it in the open source spirit. As an example, I <a class="link" href="https://github.com/progit/progit2/pull/1850" target="_blank" rel="noopener"
>wrote</a> a new section last year about <a class="link" href="https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work#_everyone_must_sign" target="_blank" rel="noopener"
>automatically signing Git commits</a>. Skimming through the <a class="link" href="https://git-scm.com/docs" target="_blank" rel="noopener"
>Git reference documentation</a> (online version of <a class="link" href="https://en.wikipedia.org/wiki/Man_page" target="_blank" rel="noopener"
>man pages</a>) is also a great way to become aware of what capabilities Git offers.</p>
<p>What is your favorite command-line Git trick or favorite tool? Comment below.</p> Learn to write better Git commit messages by example https://optimizedbyotto.com/post/git-commit-message-examples/Sun, 18 Feb 2024 00:00:00 +0000 https://optimizedbyotto.com/post/git-commit-message-examples/ <img src="https://optimizedbyotto.com/post/git-commit-message-examples/featured-image.jpg" alt="Featured image of post Learn to write better Git commit messages by example" /><p>When people learn programming they – for completely obvious and natural reasons – initially focus on learning the syntax of programming languages and libraries. However, these are just tools. The essence of software engineering is about automating thought, applying algorithmic thinking, and anticipating the known and unknown. The code might be succinct, but the reasoning behind it can be extensive, and it needs to show in the communication around the code. <strong>The more senior a programmer is, the more their success depends on their communication skills.</strong></p>
<h2 id="communication-is-important--even-in-programming"><a href="#communication-is-important--even-in-programming" class="header-anchor"></a>Communication is important – even in programming
</h2><p>One could even claim that software development teams <strong>thrive or fall based on how quick and efficient the feedback cycle</strong> about the code is and how well the team shares information while researching and solving problems.</p>
<p>At the core of code-related communication is <strong>Git commit messages</strong>. When a team member shares a new code change for others to review, the <strong>speed and accuracy of the reviewers</strong> depends heavily on how well the <strong>intent of the change</strong> was described and motivated.</p>
<p>In addition to reviews, a great <strong>Git commit also has permanent utility</strong> as part of the code base. If it later turns out the commit had a bug, whoever is trying to fix it will have a much easier time reading in the commit what the change was supposed to do, and consequently understanding where it fell short, and will thus be able to rewrite the same change in the correct way. This leads to <strong>bugs being fixed much more quickly and with less effort</strong> – and most often the person doing the fix is a <em>future you</em> who no longer remembers what the <em>present you</em> was thinking while making that commit, and the future you just has to stare at the commit message and contents until it makes sense.</p>
<h2 id="common-mistakes"><a href="#common-mistakes" class="header-anchor"></a>Common mistakes
</h2><p>If you haven’t already, first read <a class="link" href="https://optimizedbyotto.com/post/good-git-commit/" >How to make a good git commit</a>. In addition to knowing what a good end result looks like, it might be useful to learn the <strong>typical mistakes</strong> developers make and to know explicitly <strong>what to not do</strong>.</p>
<p>Repeating and extending the recommendations from the <a class="link" href="https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines" target="_blank" rel="noopener"
>Git Pro book</a> (authored by, among others, GitHub co-founder Scott Chacon):</p>
<ul>
<li>Never exceed 70 characters in the Git title, and preferably keep it under 50</li>
<li>Use imperative format, not past tense: instead of “Fixed” or “Added”, write “Fix” or “Add”</li>
<li>Write at least one sentence in the Git message body</li>
<li>Separate the message body by one empty line from the subject line</li>
<li>A title is like an e-mail subject line – no dot at the end</li>
<li>The body should use full sentences that end with a dot</li>
<li>Wrap the message body at around 72 characters; lines should not be overly long</li>
<li>Don’t use diary-like language to explain what you did, but rather what the change does: if the description starts with “In this commit I..” or “I checked..”, there surely is a simpler way to express it clearly and universally</li>
<li>If you have multiple commit messages with exactly the same title, something is surely wrong as well, since the changes themselves certainly aren’t identical</li>
<li>Writing “Update <em>filename</em>” is never a good description of the change, as it just states the obvious – instead, describe the <strong>intent</strong> of what the change tries to achieve</li>
<li>Don’t use AI to write your commit messages – AI can only see what changed in the files, it cannot possibly know <strong>why you made the change</strong>, which is exactly the essence of the Git commit message</li>
</ul>
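<p>One way to keep these rules in front of you is a commit message template that Git preloads into the editor; the file path below is just a convention:</p>

```shell
# Write a template that encodes the guidelines above
cat > ~/.gitmessage <<'EOF'
# Subject: imperative mood, aim for 50 characters, no trailing dot

# Body: full sentences wrapped at ~72 characters. Describe the intent
# of the change and why it is made, not a diary of what you did.
EOF
# Preload it into the editor on every 'git commit'
git config --global commit.template ~/.gitmessage
```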
<h2 id="example-1"><a href="#example-1" class="header-anchor"></a>Example 1
</h2><p>In this example the improved version has a more descriptive title that captures both what the change was and why it was made. The Git commit message is restructured to explain the same thing with less repetition.</p>
<div class="codeblock error">
<header>
<span class="codeblock-lang">Initial</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">Fix security vulnerabilities found by FlawFinder
Fixing security issues found by FlawFinder. Project code base
contains a number of old-style unsafe C function usage. In this commit we are
replacing string functions: `strcpy()` `strcat()` and `sprint()` with the safe
new and/or custom functions such as `snprintf()` `safe_strcpy()` and
`safe_strcat()`
The FlawFinder log before changes:
$ cat flawfinder-all-vulnerabilities.html | grep "Hits ="
Hits = 14955
After the change:
$ cat flawfinder-all-vulnerabilities.html | grep "Hits ="
Hits = 14668
The number of fixes - 287</code><pre><code>Fix security vulnerabilities found by FlawFinder
Fixing security issues found by FlawFinder. Project code base
contains a number of old-style unsafe C function usage. In this commit we are
replacing string functions: `strcpy()` `strcat()` and `sprint()` with the safe
new and/or custom functions such as `snprintf()` `safe_strcpy()` and
`safe_strcat()`
The FlawFinder log before changes:
$ cat flawfinder-all-vulnerabilities.html | grep "Hits ="
Hits = 14955
After the change:
$ cat flawfinder-all-vulnerabilities.html | grep "Hits ="
Hits = 14668
The number of fixes - 287</code></pre></div>
<div class="codeblock success">
<header>
<span class="codeblock-lang">Improved</span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">Fix insecure use of strcpy, strcat and sprintf in Connect
Old style C functions `strcpy()`, `strcat()` and `sprintf()` are vulnerable
to security issues due to lacking memory boundary checks. Replace these in
the Connect storage engine with safe new and/or custom functions such as
`snprintf()` `safe_strcpy()` and `safe_strcat()`.
With this change, FlawFinder static security analyzer reports 287 fewer
findings.</code><pre><code>Fix insecure use of strcpy, strcat and sprintf in Connect
Old style C functions `strcpy()`, `strcat()` and `sprintf()` are vulnerable
to security issues due to lacking memory boundary checks. Replace these in
the Connect storage engine with safe new and/or custom functions such as
`snprintf()` `safe_strcpy()` and `safe_strcat()`.
With this change, FlawFinder static security analyzer reports 287 fewer
findings.</code></pre></div>
<h2 id="example-2"><a href="#example-2" class="header-anchor"></a>Example 2
</h2><p>In this example the title was changed to use the imperative format and to state more precisely what was changed, in order to distinguish the commit from other similar ones that fix <code>cppcheck</code> failures. The message body explains what the change does instead of what “we” did, and quotes the error message verbatim so anybody searching for it will find this text.</p>
<div class="codeblock error">
<header>
<span class="codeblock-lang">Initial</span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">Fixing cppcheck failure
We have an error while running CI in gitlab "There is an unknown macro here
somewhere. Configuration is required. If DBUG_EXECUTE_IF is a macro then please
configure it." Add a workaround - change problematic string with false alarm
before cppcheck run then revert it back.</code><pre><code>Fixing cppcheck failure
We have an error while running CI in gitlab "There is an unknown macro here
somewhere. Configuration is required. If DBUG_EXECUTE_IF is a macro then please
configure it." Add a workaround - change problematic string with false alarm
before cppcheck run then revert it back.</code></pre></div>
<div class="codeblock success">
<header>
<span class="codeblock-lang">Improved</span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">Add certain DBUG_EXECUTE_IF cases to cppcheck allowlist
Cppcheck failed on error:
There is an unknown macro here somewhere. Configuration is required.
If DBUG_EXECUTE_IF is a macro then please configure it.
This is a false positive and safe to ignore. Extend filtering to exclude it
from cppcheck results.</code><pre><code>Add certain DBUG_EXECUTE_IF cases to cppcheck allowlist
Cppcheck failed on error:
There is an unknown macro here somewhere. Configuration is required.
If DBUG_EXECUTE_IF is a macro then please configure it.
This is a false positive and safe to ignore. Extend filtering to exclude it
from cppcheck results.</code></pre></div>
<h2 id="example-3"><a href="#example-3" class="header-anchor"></a>Example 3
</h2><p>Here again the title was made more specific about exactly which fix this is, to distinguish it from other similar fixes. Since both the error message and the previous commit that caused it were identified, they are included in the Git message to clearly justify the change, as well as to make debugging similar things much easier in the future.</p>
<div class="codeblock error">
<header>
<span class="codeblock-lang">Initial</span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">Releaser fix
releaser failed due to missing manifest path. Adding return statement to
the function</code><pre><code>Releaser fix
releaser failed due to missing manifest path. Adding return statement to
the function</code></pre></div>
<div class="codeblock success">
<header>
<span class="codeblock-lang">Improved</span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">Add missing return to find_manifest_file()
The refactor in f9f6d299 split load_manifest() into two functions by simply
copy-pasting the lines. This omitted that the new function needs to have a
`return` added, otherwise it might return `None`.
This fixes the releaser failure about:
line 214, in get_engine_name_from_manifest_file
with open(manifest_filename, "r") as manifest:
TypeError: expected str, bytes or os.PathLike object, not NoneType</code><pre><code>Add missing return to find_manifest_file()
The refactor in f9f6d299 split load_manifest() into two functions by simply
copy-pasting the lines. This omitted that the new function needs to have a
`return` added, otherwise it might return `None`.
This fixes the releaser failure about:
line 214, in get_engine_name_from_manifest_file
with open(manifest_filename, "r") as manifest:
TypeError: expected str, bytes or os.PathLike object, not NoneType</code></pre></div>
<h2 id="example-4"><a href="#example-4" class="header-anchor"></a>Example 4
</h2><p>This example shows making the title more specific by spelling out exactly which component is extended and with which variables. In the description, use the imperative ‘Add’ instead of ‘Adding’, and restructure the text to clearly say what is being done, followed by <em>why</em> it is useful, and include an explanation of backwards compatibility to further demonstrate that the change is safe to make. Also fix line breaks and add space between paragraphs.</p>
<div class="codeblock error">
<header>
<span class="codeblock-lang">Initial</span>
<button
class="codeblock-copy"
data-id="codeblock-id-6"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-6" style="display:none;">Add TLS version to auth plugin available variables
The authentication audit plugins currently do not have access to the TLS
version used. Adding this variable to list of available variables for
audit plugin.
Logging the TLS version can be useful for traceability and to
help identify suspicious or malformed connections attempting to use
unsupported TLS versions.
This can be used to detect and block malicious connection attempts.</code><pre><code>Add TLS version to auth plugin available variables
The authentication audit plugins currently do not have access to the TLS
version used. Adding this variable to list of available variables for
audit plugin.
Logging the TLS version can be useful for traceability and to
help identify suspicious or malformed connections attempting to use
unsupported TLS versions.
This can be used to detect and block malicious connection attempts.</code></pre></div>
<div class="codeblock success">
<header>
<span class="codeblock-lang">Improved</span>
<button
class="codeblock-copy"
data-id="codeblock-id-7"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-7" style="display:none;">Extend audit plugin to include tls_version and tls_version_length variables
Add tls_version and tls_version_length variables to the audit plugin so
they can be logged. This is useful to help identify suspicious or malformed
connections attempting to use unsupported TLS versions. A log with this
information will allow to detect and block more malicious connection attempts.
Users with 'server_audit_events' empty will have these two new variables
automatically visible in their logs, but if users don't want them, they can
always configure what fields to include by listing the fields in
'server_audit_events'.</code><pre><code>Extend audit plugin to include tls_version and tls_version_length variables
Add tls_version and tls_version_length variables to the audit plugin so
they can be logged. This is useful to help identify suspicious or malformed
connections attempting to use unsupported TLS versions. A log with this
information will allow to detect and block more malicious connection attempts.
Users with 'server_audit_events' empty will have these two new variables
automatically visible in their logs, but if users don't want them, they can
always configure what fields to include by listing the fields in
'server_audit_events'.</code></pre></div>
<h2 id="example-5"><a href="#example-5" class="header-anchor"></a>Example 5
</h2><p>In this example the title can be simplified to summarize the change. The change was initially an experiment, but later became permanent. The contents of the change never varied, however, so in this case it was better not to describe the lifecycle of the commit in the Git commit message title, but instead keep that information elsewhere among the developers, or let it be inferred from the fact that the commit was initially on a development branch and only later applied on mainline.</p>
<p>Also avoid writing in the first person (“I have …”); instead use the imperative form that describes what the change is, and most importantly extend it to explain <em>why</em> the change was made. Who renamed which file or changed which line of code is always visible in the Git commit anyway. The focus of the Git commit message should be on communicating the intent of the change, as <em>why</em> something was changed isn’t always obvious, yet it is incredibly important for assessing whether the change is correct.</p>
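<p>As a rough illustration of these conventions, here is a small Python sketch that flags some of the issues discussed above. The function and its checks are my own for illustration, not a standard tool:</p>

```python
# Hypothetical helper (not a standard tool): flags a few of the
# commit-message issues discussed above.
def lint_commit_message(message: str) -> list[str]:
    problems = []
    lines = message.splitlines()
    title = lines[0] if lines else ""
    if len(title) > 72:
        problems.append("title longer than 72 characters")
    if title.split(" ")[0].endswith("ing"):
        # e.g. "Adding ..." instead of the imperative "Add ..."
        problems.append("title not in imperative mood")
    if title.startswith("I "):
        problems.append("avoid first person in the title")
    if len(lines) < 3:
        problems.append("body missing: explain why the change was made")
    return problems

print(lint_commit_message("Adding return statement to the function"))
# -> ['title not in imperative mood',
#     'body missing: explain why the change was made']
```

<p>Checks like these are crude heuristics, but even a sketch like this catches the “Adding …” and “I have …” patterns from the examples above.</p>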
<div class="codeblock error">
<header>
<span class="codeblock-lang">Initial</span>
<button
class="codeblock-copy"
data-id="codeblock-id-8"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-8" style="display:none;">Switch to `new-archives` branch to test the new archives layout
I have renamed `archives.html` to `.archives.html` to disable overriding.
I have modified archives.md to output JSON.</code><pre><code>Switch to `new-archives` branch to test the new archives layout
I have renamed `archives.html` to `.archives.html` to disable overriding.
I have modified archives.md to output JSON.</code></pre></div>
<div class="codeblock success">
<header>
<span class="codeblock-lang">Improved</span>
<button
class="codeblock-copy"
data-id="codeblock-id-9"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-9" style="display:none;">Use new layout for archives page
Update theme to version with new archives page and disable the old archives
page. Ensure new archive page is also published as JSON so that
the interactive search can use the JSON file as backend.</code><pre><code>Use new layout for archives page
Update theme to version with new archives page and disable the old archives
page. Ensure new archive page is also published as JSON so that
the interactive search can use the JSON file as backend.</code></pre></div>
<h2 id="easiest-way-to-update-a-commit-message-git-citool---amend"><a href="#easiest-way-to-update-a-commit-message-git-citool---amend" class="header-anchor"></a>Easiest way to update a commit message: <code>git citool --amend</code>
</h2><p><img src="https://optimizedbyotto.com/post/good-git-commit/git-citool-example.png"
loading="lazy"
alt="screenshot of git citool"
>
</p>
<p>Writing a good Git commit message while preparing a code submission is much easier if you follow this process:</p>
<ol>
<li>Start writing code changes</li>
<li>Save intermediate changes with <code>git commit -am WIP</code></li>
<li>Test, polish, iterate</li>
<li>Run <code>git citool --amend</code> to polish the Git commit message when the code change is final and you can see the whole change and thus are able to easily explain what you did and why</li>
<li>Rebase on latest main branch, e.g. <code>git fetch origin main; git rebase -i origin/main</code></li>
<li>Push to code review</li>
</ol>
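<p>The steps above can be sketched in a throwaway repository. Note that <code>git citool --amend</code> opens a GUI, so the non-interactive equivalent <code>git commit --amend -m</code> is used below to let the sketch run unattended; the repository path and messages are made up for illustration:</p>

```shell
# Demonstrate the WIP-then-amend workflow in a temporary repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"

# Step 2: save intermediate work with a throwaway message
git commit -q --allow-empty -m "WIP"

# Step 4: once the change is final, polish the message
# (non-GUI stand-in for `git citool --amend`)
git commit -q --amend --allow-empty -m "Use new layout for archives page"

msg=$(git log -1 --format=%s)
echo "$msg"
```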
<p>If you find yourself doing frequent rebases and amends, congratulations! It means that you have mastered the craft of preparing great code submissions.</p>
<h2 id="good-git-commit-messages-help-you-avoid-duplicate-effort"><a href="#good-git-commit-messages-help-you-avoid-duplicate-effort" class="header-anchor"></a>Good git commit messages help you avoid duplicate effort
</h2><p>One extra benefit of having a great commit message is that you don’t have to rewrite anything when submitting the code for review. Every single code review system I have ever used will <strong>automatically use the git commit title and message as the review title and message</strong> (at least if the review is a single commit review).</p>
<p><img src="https://optimizedbyotto.com/post/git-commit-message-examples/git-commit-message-automatic-gitlab-merge-request-description.gif"
width="1254"
height="732"
loading="lazy"
alt="Screencast of git commit message automatically being reused as the merge request description on GitLab"
class="gallery-image"
data-flex-grow="171"
data-flex-basis="411px"
>
</p>
<h2 id="why-you-should-always-polish-the-commit-message-even-if-the-commit-does-not-feel-important"><a href="#why-you-should-always-polish-the-commit-message-even-if-the-commit-does-not-feel-important" class="header-anchor"></a>Why you should always polish the commit message, even if the commit does not feel important
</h2><p>If you are proud of your work and like doing things well, you will follow these guidelines by nature. However, some lazy ass might say that while they agree with the principles, they <em>don’t have time to follow them</em>. To that I always respond that <strong>humans tend to get really good at the things we practice</strong>. If you do the wrong thing over and over, you become an expert at doing things incorrectly. Is that really what you want?</p>
<p>Doing things correctly from the outset will steer you away from situations where you are knowingly doing lousy quality just to “save time”. Practicing doing something well will ultimately lead you to become <strong>a person who does that thing well, <em>effortlessly</em></strong>.</p>
<p>Have you run into situations where you find it challenging to write a good git commit title and message? Share your example in the comments below and I will try to help you formulate the text and capture the essence of the change in a concise title and description.</p> When everyone else is wrong https://optimizedbyotto.com/post/when-everyone-else-is-wrong/Fri, 19 Jan 2024 00:00:00 +0000 https://optimizedbyotto.com/post/when-everyone-else-is-wrong/ <img src="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/fed-balance-sp500-2022-2024.png" alt="Featured image of post When everyone else is wrong" /><p>The stock market is a powerful globally distributed forecasting system. Last Friday it was forecasting a rosy future as the <a class="link" href="https://www.investing.com/indices/msci-world" target="_blank" rel="noopener"
>MSCI World index</a> got to a new all-time high. But it does not make sense.</p>
<p>There is no single entity controlling stock prices. They represent the collective best guess of the whole economic world on what the value of each company should be. <strong>Historically the market has been pretty accurate.</strong> For example, when the news broke about COVID-19 spreading, stocks plummeted in anticipation of a global crisis. And indeed, a pandemic followed that forced a large part of the economy to a standstill, and major government intervention was necessary to avoid total mayhem. As another example, in the first month of the Russian invasion of Ukraine in February-March 2022, the stock price of Cheniere Energy went up almost 50%, as the market predicted that this American liquefied natural gas producer would hugely benefit from Russian competitors being blockaded. Again, the market was right – by the end of 2022, Cheniere’s profits had doubled.</p>
<p>The MSCI World index has grown about 30% since its low point in the fall of 2022. The U.S.-focused S&P 500 index is even more extreme, having grown almost 40% since its low point in the fall of 2022, ending last Friday at 4840, slightly above the previous record close of 4797 set on Jan 3, 2022. Investors participating in the market are collectively forecasting that the U.S. economy has recovered and things will be fine. <strong>There is just one problem.</strong> Everyone is wrong.</p>
<h2 id="the-us-market-is-looking-way-_too_-good"><a href="#the-us-market-is-looking-way-_too_-good" class="header-anchor"></a>The U.S. market is looking way <em>too</em> good
</h2><p>The S&P 500 will probably soon surpass 5000 points, but this party can’t last for long. <a class="link" href="https://www.reuters.com/markets/us/sp-500s-wild-ride-an-all-time-high-2024-01-19/" target="_blank" rel="noopener"
>Reuters has a great summary</a> of economic events overlaid on the S&P 500 over the past two years:</p>
<p><img src="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/thomsonreuters-snp-500-composite-2022-2023-2024.png"
width="915"
height="699"
srcset="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/thomsonreuters-snp-500-composite-2022-2023-2024_hu4732180407981972937.png 480w, https://optimizedbyotto.com/post/when-everyone-else-is-wrong/thomsonreuters-snp-500-composite-2022-2023-2024.png 915w"
loading="lazy"
alt="Reuters graph on S&P 500 development in 2022–2024"
class="gallery-image"
data-flex-grow="130"
data-flex-basis="314px"
>
</p>
<p><strong>The situation seems very contradictory.</strong> Increased inflation should affect the purchasing power of consumers and companies negatively. While salary increases help consumers survive, and higher sales prices help companies offset increased costs, undeniably the overall effect is still negative. The <a class="link" href="https://fred.stlouisfed.org/series/USACPALTT01CTGYM" target="_blank" rel="noopener"
>U.S. Consumer Price Index</a> growth has slowed down a bit since the peak in 2022, but prices are still growing faster than at any point in the 2010s. Due to inflation, in July 2023 the U.S. Federal Reserve raised its <a class="link" href="https://fred.stlouisfed.org/series/DFF" target="_blank" rel="noopener"
>interest rate to 5.5%</a>, and still keeps it there. High interest rates discourage companies from taking loans and making investments, so for the time being the economy should slow down, not accelerate.</p>
<h2 id="is-the-money-printer-is-on-again"><a href="#is-the-money-printer-is-on-again" class="header-anchor"></a>Is the money printer on again?
</h2><p>Following the insolvency of Silicon Valley Bank on March 10th, 2023, the U.S. Federal Reserve announced it would guarantee the 200+ billion dollars the bank had lost. Soon it repeated this for a few more banks. Essentially, the Fed printed 300+ billion dollars in March 2023. When you overlay the <a class="link" href="https://fred.stlouisfed.org/series/SP500" target="_blank" rel="noopener"
>S&P 500</a> and the <a class="link" href="https://fred.stlouisfed.org/series/TOTRESNS" target="_blank" rel="noopener"
>U.S Federal Reserve balance sheet</a> the correlation seems pretty strong:</p>
<p><img src="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/fed-balance-sp500-2022-2024.png"
width="958"
height="450"
srcset="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/fed-balance-sp500-2022-2024_hu6982137695610780262.png 480w, https://optimizedbyotto.com/post/when-everyone-else-is-wrong/fed-balance-sp500-2022-2024.png 958w"
loading="lazy"
alt="The S&P 500 index and the U.S. Federal Reserve balance in 2022–2024"
class="gallery-image"
data-flex-grow="212"
data-flex-basis="510px"
>
</p>
<p>The upward trend in November and December might be related to U.S. lawmakers passing so-called “appropriation bills” to allow the U.S. government to continue overspending. Similar bills and resolutions are <a class="link" href="https://www.pgpf.org/blog/2024/01/continuing-resolutions-were-designed-to-be-stopgap-measures-but-now-we-average-five-a-year" target="_blank" rel="noopener"
>likely to repeat on March 1st and 8th, 2024</a>. All of this makes me think that stock prices are out of touch with the underlying companies’ growth, and are mostly just a function of how much money is being pumped into the U.S. economy by the government and the Fed.</p>
<p>This is not exactly new. Taking on <a class="link" href="https://fred.stlouisfed.org/series/GFDEBTN" target="_blank" rel="noopener"
>more national debt</a> seems to be U.S. policy, no matter which president or party is in power:</p>
<p><img src="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/us-gov-debt-2003-2023.png"
width="958"
height="450"
srcset="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/us-gov-debt-2003-2023_hu9438979962476967897.png 480w, https://optimizedbyotto.com/post/when-everyone-else-is-wrong/us-gov-debt-2003-2023.png 958w"
loading="lazy"
alt="The U.S. government public debt growth in 2003–2023"
class="gallery-image"
data-flex-grow="212"
data-flex-basis="510px"
>
</p>
<p>What is new is the scale it has reached now. The sum is rapidly approaching <a class="link" href="https://fiscaldata.treasury.gov/americas-finance-guide/national-debt/" target="_blank" rel="noopener"
>34 trillion</a> U.S. dollars. That is 34 000 000 000 000 (12 zeros!). <strong>The sum is insanely large.</strong> Think about the most expensive single item one could buy: a USS Gerald R. Ford class nuclear-powered aircraft carrier has a price tag of 12 000 000 000 (12 billion) dollars. The U.S. has 11 aircraft carriers and all other countries combined have 6–9, so in total there are under 20 in the world. With 34 trillion one could buy 2833 aircraft carriers at 12 billion each.</p>
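<p>The aircraft carrier arithmetic can be verified with a one-liner, using the figures as quoted in the text:</p>

```python
# Check: how many 12-billion-dollar carriers does 34 trillion buy?
national_debt = 34_000_000_000_000  # 34 trillion USD
carrier_cost = 12_000_000_000       # 12 billion USD
print(national_debt // carrier_cost)  # -> 2833
```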
<p>The <a class="link" href="https://www.cbo.gov/publication/59946" target="_blank" rel="noopener"
>U.S. budget projections for 2024-2034</a> are a grim read. The deficit in 2024 is 1.6 trillion USD, and is forecast to fluctuate between 1.6 and 2.6 trillion for the next 10 years. The interest payment on the national debt is 870 billion in 2024, and 951 billion in 2025 – the first year when U.S. debt interest payments exceed the U.S. national defence budget!</p>
<h2 id="is-this-sustainable"><a href="#is-this-sustainable" class="header-anchor"></a>Is this sustainable?
</h2><p>In 2024 the U.S. government is going to issue 1.6 trillion USD worth of completely new bonds. In addition, it needs to issue several trillion worth of new bonds to pay old lenders for bonds that mature in 2024. Somebody needs to buy this 5–10 trillion of U.S. bonds, and it is very unlikely that the normal money markets would be able to absorb this supply. Thus, the Federal Reserve system will need to step in and “print money” to buy up the bonds that don’t sell on the free market.</p>
<p>The USA is not alone in having a lot of debt. The national debt of Greece is around 150% of its GDP, and Japan has been at over 200% for a decade. However, the U.S. has a huge GDP, so a 120% debt-to-GDP ratio is exceptional in absolute numbers. Consider that <strong>when comparing national debt per capita, Japan and the U.S. both hover around 100k USD</strong> (depending on the <a class="link" href="https://countryeconomy.com/national-debt?anio=2022" target="_blank" rel="noopener"
>source</a>), so actually the U.S. situation is already very worrying.</p>
<p>What is extra concerning is that for the past decades the <a class="link" href="https://fred.stlouisfed.org/series/GDP" target="_blank" rel="noopener"
>GDP growth in the U.S.</a> seems to correlate with their <a class="link" href="https://fred.stlouisfed.org/series/GFDEBTN" target="_blank" rel="noopener"
>government overspending and taking on debt</a>:</p>
<p><img src="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/us-gov-debt-gdp-1980-2023.png"
width="958"
height="450"
srcset="https://optimizedbyotto.com/post/when-everyone-else-is-wrong/us-gov-debt-gdp-1980-2023_hu1344056553667587972.png 480w, https://optimizedbyotto.com/post/when-everyone-else-is-wrong/us-gov-debt-gdp-1980-2023.png 958w"
loading="lazy"
alt="The U.S. government public debt and gross national product growth correlation in 1980–2023"
class="gallery-image"
data-flex-grow="212"
data-flex-basis="510px"
>
</p>
<p>This raises the question: how much of the economic growth in the U.S. has actually been based on printing money to begin with? How much improvement has there been in real productivity? Due to how GDP is accounted, public debt automatically increases it, so some correlation is expected, but a correlation as strong as the one in the graph above could indicate that the real economy is not growing, only debt is. The stock market is supposed to grow over time as a function of GDP and overall productivity growing, but is that really happening now?</p>
<p>One interesting observation when inspecting the S&P 500 and MSCI World index <a class="link" href="https://www.msci.com/constituents" target="_blank" rel="noopener"
>constituents</a>, is that they are both dominated by just seven large U.S. tech firms. However, their effect on U.S. GDP is limited, as their corporate profits are booked in Ireland due to tax planning. This has caused the GDP of Ireland to skyrocket in the past 10 years <a class="link" href="https://fred.stlouisfed.org/series/PCAGDPIEA646NWDB#0" target="_blank" rel="noopener"
>from 50k to 100k USD per capita</a>. If the U.S. government runs out of money, it will be much more likely to crack down on tax evasion practices and force the large tech companies to pay more corporate tax in the U.S. That seems like a major disruption factor to me. Yet their stock prices keep growing.</p>
<h2 id="expectations-on-the-magnificent-seven"><a href="#expectations-on-the-magnificent-seven" class="header-anchor"></a>Expectations on the “magnificent seven”
</h2><p>The extremely high valuations are justified only if these seven companies are about to make a huge productivity leap that lifts the entire U.S. economy, including creating 1.7 trillion in additional tax revenue to close the deficit gap. <strong>It seems the stock market is predicting exactly that</strong>. Sure, artificial intelligence will help increase productivity everywhere, but I find it very hard to believe that AI productivity gains would accrue to these tech companies to such a degree that their astronomical valuations would be justified. To me it seems that <strong>the market is now just plain wrong.</strong></p>
<p>According to the <a class="link" href="https://www.world-exchanges.org/" target="_blank" rel="noopener"
>World Federation of Exchanges (WFE)</a>, there are over 58 000 listed companies in the world. There should be plenty of options for the stock market to bet on. Yet the market predicts that U.S. stocks in particular will be the future winners.</p>
<p><strong>My prediction for 2024 is a large correction across the U.S. stock market.</strong> If it does not happen already during the spring, it will surely come at the latest after the U.S. presidential elections in November, when <em>keeping up appearances</em> ends and policymakers are ready to face reality.</p>
<p>What is your read of the situation? Post in the comments below.</p> Make habits, not goals https://optimizedbyotto.com/post/make-habits-not-goals/Fri, 29 Dec 2023 00:00:00 +0000 https://optimizedbyotto.com/post/make-habits-not-goals/ <img src="https://optimizedbyotto.com/post/make-habits-not-goals/featured-image.jpg" alt="Featured image of post Make habits, not goals" /><blockquote>
<p>First we make our habits, and then they make us</p><span class="cite"><span>― </span><span>John Dryden, poet and literary critic</span><cite></cite></span></blockquote>
<p>Are you perhaps planning to make a new year’s resolution to run a marathon? Or are you committing to a new sales quota at work for the year 2024? If you haven’t achieved the goal by July 2024, will you be unhappy? Will you give yourself some slack and simply postpone the goal? And if you exceed it, will you immediately make a new goal? Would you feel happy about it?</p>
<p>Most people are really bad at setting and reaching goals. First of all, people seldom reach their goals. In environments where people consistently achieve their goals, it is usually because the goal was set too low to begin with. In rare cases where goals are set high and then met, or even far exceeded, the fact that the goal existed or the level of the goal doesn’t seem to make any difference.</p>
<p>As I see it, thinking about goals isn’t healthy. I find it a much better approach to think about habits. <strong>Habits focus on the immediate action instead of a long-term outcome.</strong> People tend to be much more efficient and also happier after adopting habits. For example, if you want to run a marathon, start by adopting a habit of running 10 km twice a week instead of the sporadic running you would do otherwise. If you work in sales – or if you manage a team of sellers – don’t think about the annual sales quota but instead focus on adopting a habit of reaching out to, for example, 10 new customers daily.</p>
<p>Habits are better than goals because:</p>
<ul>
<li><strong>Habits are adopted immediately.</strong> You know in a matter of days if you are keeping the habit or not, while goals make it too easy to postpone doing the work.</li>
<li><strong>Habits are easier to plan.</strong> Selecting a goal that is concrete and measurable is hard. Even if a good metric exists, it is hard to choose a level low enough to be reachable and thus motivating to work towards, yet high enough to be admirable and to fuel people to make extra effort. For a habit, you pick something concrete, and just do it consistently.</li>
<li><strong>Habits are easy to track and measure.</strong> It is immediately evident if somebody is not sticking to a habit. There is no denial, just effort to get back into the habit. Goals are vague, and not meeting a goal is evident only when it is too late to do anything about it.</li>
<li><strong>Habits can be sustained</strong> for years and years. Goals often compel acts of heroism, which are not sustainable in the long run. As Bruce Lee once said, <em>“long-term consistency trumps short-term intensity.”</em></li>
<li><strong>Habits make people happier.</strong> If you forget or are unable to do something, just get back into the habit the following day. If you fail to meet a goal, you just feel miserable and have no immediate way to rectify the failure.</li>
</ul>
<h2 id="first-step-adopt-a-new-microhabit"><a href="#first-step-adopt-a-new-microhabit" class="header-anchor"></a>First step: Adopt a new microhabit
</h2><p>The smaller the step is, the easier it is to take. Microhabits are a concept of improving something, one tiny step at a time.</p>
<p>An example of a microhabit could be the act of drinking a glass of water every morning. This is something I have been doing myself for almost 3 years now: Every morning, I go straight out of bed to the kitchen and drink a glass of water. Only after that do I allow myself to brush my teeth and do other things that are part of the usual morning routine.</p>
<p>This is a great microhabit, as after a full night’s sleep, one is bound to have a dry mouth and some dehydration, for which water is the best cure.</p>
<p>Starting the day with a glass of water gives a nice head start into a larger habit of drinking water frequently. People typically don’t drink enough plain water, even though it is cheap (practically free), and ensuring good water balance takes away all symptoms of dehydration, such as headache and tiredness. For some reason, modern humans tend to prefer other drinks, such as beer or coffee, which can actually make dehydration worse. Water is essential for all life on earth, and its importance cannot be overstated. Humans can go without food for 2–3 weeks, but without water, we perish in only 3–5 days. Water has zero calories, and the sensation of fullness from drinking benefits weight control.</p>
<h2 id="second-step-dont-give-up-stick-to-the-habit-until-it-becomes-effortless"><a href="#second-step-dont-give-up-stick-to-the-habit-until-it-becomes-effortless" class="header-anchor"></a>Second step: Don’t give up, stick to the habit until it becomes effortless
</h2><p>If you lapse from a habit one day, don’t worry, just get back into the habit the next day. Think of ways to remind yourself of and strengthen the habit. The above example of drinking a glass of water every morning can be strengthened by keeping a water purifier filled with water on the kitchen countertop. Every time you walk past the kitchen, it will remind you of the habit. This setup also minimizes the effort required to perform the microhabit, as you always have the water easily at hand. Personally, I also think water at room temperature feels healthier than drinking ice cold water directly from the tap.</p>
<p>The fact that you decided to do this, and you keep doing it after weeks and months will help strengthen your willpower. Studies on <a class="link" href="https://en.wikipedia.org/wiki/Neuroplasticity" target="_blank" rel="noopener"
>neuroplasticity</a> show that it will physically help to rewire the circuits in your brain into making a decision, and sticking to it. Once you master this one microhabit, you’ll find it easier to adopt other habits that take more effort to get into.</p>
<p>As explained in a <a class="link" href="https://doi.org/10.3389/fnins.2022.699817" target="_blank" rel="noopener"
>review article in Frontiers in Neuroscience</a> of 63 meta-analyses, this occurs because of an increased ability to exert effortful control in our brains. Simply put, as with all biological organisms, our brain has evolved to save energy and only think in situations where that extra energy consumption is necessary. Most of the time, our brains run on autopilot, which means not only that the existing pathways keep getting reinforced, but also that the pathways of the pathway control system itself stay weak, as new pathways don’t need to be formed. Hence, when we take on a new habit and exert the mental effort to repeat the routine with conscious intent over and over, it leads to the creation and reinforcement of pathways related to the habit. The day a habit has grown strong enough to become part of our brain’s autopilot program, the system that decides what goes into the autopilot and what goes out is also at its strongest.</p>
<h2 id="third-step-increase-the-number-of-microhabits"><a href="#third-step-increase-the-number-of-microhabits" class="header-anchor"></a>Third step: Increase the number of microhabits
</h2><p>After successfully adopting your first intentional microhabit, the next step could be to either <strong>expand the first microhabit</strong> to make it more complicated, <strong>or to adopt a second microhabit</strong>.</p>
<ol>
<li>An example of expanding the first one could be taking a multivitamin pill with the water, or adding salt to the water to follow the <a class="link" href="https://www.hubermanlab.com/episode/using-salt-to-optimize-mental-and-physical-performance" target="_blank" rel="noopener"
>Huberman routine</a>.</li>
<li>An example of an easy second microhabit would be to drink a cup of warm water every evening before going to bed. Since it is just water, you can drink it even after brushing your teeth. It feels somewhat filling, and helps me avoid drinking or snacking on anything else, which is likely unhealthy. The warmth of the water feels good, and I think it is plausible that it has similar health benefits to drinking tea, just without any flavor or sweeteners.</li>
</ol>
<h2 id="get-into-a-habit-now"><a href="#get-into-a-habit-now" class="header-anchor"></a>Get into a habit now
</h2><p>Get into good habits. Keep them for the rest of your life. Change them only if you find a new, better habit to replace the old. If you slip from your habit one day, forgive yourself, and get back into the habit the next day. By keeping the good habits, you will eventually reach your goal – and in time far exceed it. <strong>Make habits, not goals.</strong></p> How to conduct an effective code review https://optimizedbyotto.com/post/how-to-code-review/Sun, 19 Nov 2023 00:00:00 +0000 https://optimizedbyotto.com/post/how-to-code-review/ <img src="https://optimizedbyotto.com/post/how-to-code-review/featured-image.jpg" alt="Featured image of post How to conduct an effective code review" /><p>In software development, the code review process stands as a crucial checkpoint for ensuring code quality, fostering collaboration, and promoting knowledge sharing among team members. Despite its importance, many engineers lack a clear mental map of how effective reviews work. This is my attempt to help code reviews and reviewers improve.</p>
<h2 id="first-review-question-what-is-the-intention"><a href="#first-review-question-what-is-the-intention" class="header-anchor"></a>First review question: What is the intention?
</h2><p>Focus first on the <a class="link" href="https://optimizedbyotto.com/post/good-git-commit/" >git commit title and message</a>. Do you understand the description? Does the proposed change make sense based on only the description? Is the idea clear? Ask yourself if you can come up with reasons why the change should not be made, or if there are obvious better alternatives.</p>
<p>If the description does not make sense to you, immediately share that as your initial feedback. <strong>If the description of the intent is incomprehensible, don’t waste time reviewing code implementation details.</strong> It could be that the submission was still just a draft, and the only (and immediate) feedback should be a request for the submitter to clarify their intent.</p>
<p>The next thing to check is if the code matches the description. For example, if the change proposes to fix a bug, it should not include extra code that adds a new feature. Sometimes when you review the code, 90% of the changes make sense and do exactly what the git commit message stated, but there might be some code lines changed that you don’t understand. Ask the submitter about these. Maybe they were included by mistake, or for an unobvious reason. This should be addressed by some comments inline next to the code, or by amending the change description.</p>
<h2 id="next-is-the-implementation-good"><a href="#next-is-the-implementation-good" class="header-anchor"></a>Next: Is the implementation good?
</h2><p>Assuming the description and changes align and make sense, shift your focus to the implementation details and assess how the problem was solved. As a reviewer, you should ask yourself: Are there alternative ways to achieve the same outcome? Would these alternatives bring clear advantages, or is the current proposal the most sensible one?</p>
<p>If the code changes look mostly good and you don’t anticipate a complete overhaul in the next revision, proceed to give feedback on the details in the code. Look for potential logical errors, deviations from coding conventions, anti-patterns, spelling mistakes, and flag any security concerns or performance issues you anticipate. Scrutinize every aspect that catches your eye and suggest improvements wherever possible.</p>
<p>If you find yourself commenting on nearly every code line changed, <strong>consider aborting the review</strong> and asking the submitter to address the first set of comments before completing a comprehensive review.</p>
<p><strong>Beware that some people might take every comment literally.</strong> When pointing out things that can be improved, remember to include phrases like “<em>in general it is better to…</em>” or “<em>if you have time do…</em>” to soften the general feedback, and clarify what actually needs to be addressed before you can give your approval for the change, and which things you merely suggest as quality improvements without insisting on them. <strong>Also beware of submitters who address comments mechanically.</strong> Instead of giving your own proposed better implementation as a code snippet, simply ask the submitter to “<em>please rethink X so it always does Y without doing Z</em>”, which forces the submitter to write the better version themselves, and in the process train their brain muscle. Who knows, after thinking about the thing you pointed out, the submitter might even write a better fix than you initially thought of.</p>
<h2 id="nitpicking-is-not-only-ok-but-actually-important"><a href="#nitpicking-is-not-only-ok-but-actually-important" class="header-anchor"></a>Nitpicking is not only OK, but actually important
</h2><p>Some may be hesitant to nitpick in code reviews, as nitpicking is not something civilized people tend to do in normal daily interactions. However, <strong>nitpicking is an integral part of code reviews</strong> and software engineering in general. Computers read every dot and comma literally, and these need to be exactly correct. The whole idea of a code review is to maximize quality and minimize the room for error, in both the short term and the long term. Thus, in the context of a code review, please nitpick as much as you can.</p>
<p><strong>The amount of feedback should be maximal, but the bar for approval does not need to be maximally high.</strong> The requirements for minimum quality should be suitable for the skill level of the developer team. Yes, over time the quality bar can be raised as the development team collectively learns about best practices, but if the quality bar is too high from the onset, it will discourage developers from making new submissions. And the iterative cycle that allows for learning to happen won’t take place.</p>
<p>Managing quality is hard, but ensuring acceptable quality is central to the code review process. As a general rule, however, if you are in doubt about what is reasonable to require, err on the side of demanding higher quality rather than settling for something you feel is substandard. In the long term, you are more likely to be happy about having made such a decision.</p>
<p>Keep in mind that the main reason people settle for low quality is that <em>doing things at the highest quality level requires more effort</em>, and some people are lazy. However, if people are constantly pushed to do things at high quality, doing things correctly will soon become effortless.</p>
<h2 id="continuous-integration-and-tests-save-everybodys-time"><a href="#continuous-integration-and-tests-save-everybodys-time" class="header-anchor"></a>Continuous integration and tests save everybody’s time
</h2><p>To support the review process by humans, all software projects should have a CI in place that runs automatic tests on every code change and helps detect <em>at least all easily detectable</em> regressions. Code submissions with failing CI typically don’t attract many reviewers, so submitters should review and fix all CI issues themselves as soon as possible.</p>
<p>On a related note, all high-quality code submissions that change the program’s behavior should also update and extend the associated test code. If, for example, the software project has unit tests, and a submitter sends new code without new unit tests, the reviewer should request the submitter to write the missing tests. This feedback can even be automatic – a good CI system could detect a drop in test coverage.</p>
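<p>As an illustration, such an automatic gate can be a few lines of shell in a CI job. This is a minimal sketch with hard-coded placeholder numbers; in a real pipeline the <code>baseline</code> and <code>coverage</code> values would come from a stored baseline and the test runner’s report:</p>

```shell
# Minimal CI coverage-gate sketch. The numbers are illustrative
# placeholders, not output from a real test runner.
baseline=85
coverage=83
if [ "$coverage" -lt "$baseline" ]; then
  result="fail: coverage dropped from ${baseline}% to ${coverage}%"
else
  result="pass"
fi
echo "$result"   # a real CI job would exit non-zero on "fail"
```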
<h2 id="communicate-clearly-and-precisely"><a href="#communicate-clearly-and-precisely" class="header-anchor"></a>Communicate clearly and precisely
</h2><p>In the review process, it is important that both the submitter and the reviewer communicate clearly and precisely.</p>
<p>From the reviewer’s side, this means that the reviewer should be clear on which things are required from the submitter and which are just nice to have. If a reviewer thinks the whole approach is bad, they should be frank and reject the submission from the get-go.</p>
<p>From the submitter’s side, the initial submission should be as complete as possible. If code is submitted for review (e.g., Pull Request or Merge Request) but is not yet in its final form, the submitter should be explicit about this and prefix their git commits or the review title with <code>WIP:</code> to signify that it is work-in-progress.</p>
<p>In my experience, the typical reason for code submissions and reviews getting stalled is simply unclear communication. The code submitted might be fully correct, but the plain English part is lacking, leading to miscommunication. Typically, reviewers also postpone diving into submissions that they don’t understand at first glance, as having to do detective work to figure out an unclear code submission can feel overwhelming. <strong>Therefore, the best way to ensure submissions are reviewed in a timely manner is to communicate clearly.</strong></p>
<h2 id="avoid-noise-maximize-signal"><a href="#avoid-noise-maximize-signal" class="header-anchor"></a>Avoid noise, maximize “signal”
</h2><p>Needless to say, both submitter and reviewer should know how to properly use the review tool. For example, both GitHub and GitLab have “Start review” and “Finish review” features that allow the reviewer to write multiple comments without spamming the submitter with multiple emails. Pressing “Finish review” will trigger the submission of one single email with all comments. Most systems also have buttons for requesting review and re-requesting review that the submitter should use to communicate clearly when the review feedback is addressed.</p>
<p>When a submitter re-submits code for review, they should always use <code>git push --force</code>. The reviewer is always looking at the submission with the question “Is this ready to be merged?” in their mind, and the submission they look at should be the final polished code. There should be no <code>WIP</code> commits or multiple intermediate git commits – the reviewer is not interested in how the submitter ended up with the correct end result. <strong>Only the final version matters.</strong></p>
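<p>As a hedged sketch of that workflow (repository, branch, and file names below are hypothetical), the intermediate commits can be squashed into a single polished commit before re-pushing; <code>git push --force-with-lease</code> is a slightly safer variant of <code>--force</code> that refuses to overwrite remote work you have not yet seen:</p>

```shell
# Squash three WIP commits into one polished commit, in a throwaway
# repo so the example is self-contained.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com && git config user.name Dev
git commit -q --allow-empty -m "Initial commit"
git checkout -q -b fix-validation
for i in 1 2 3; do
  echo "step $i" >> fix.txt
  git add fix.txt && git commit -q -m "WIP: step $i"
done
git reset -q --soft main                  # keep all changes staged
git commit -q -m "Fix input validation"   # one polished commit
# git push --force-with-lease origin fix-validation   # then re-push
count=$(git rev-list --count main..HEAD)
echo "$count"   # prints 1, not 3
```

<p>The reviewer then sees a single, final commit instead of the history of how it was produced.</p>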
<h2 id="respect-peoples-time-prioritize-correctly"><a href="#respect-peoples-time-prioritize-correctly" class="header-anchor"></a>Respect people’s time, prioritize correctly
</h2><p>Reviews require time. The submitter should be extra diligent not to waste the reviewer’s time. Likewise, the reviewer should try to review as quickly as possible, or at least share their initial impression as a quick, short comment.</p>
<p><strong>If reviewers are short on time, they should prioritize re-reviewing submissions they’ve already once given feedback on.</strong> People who already have context about something should continue on it, as it is most efficient for them. Other people on the team are more likely to review code submissions that have no reviews at all.</p>
<p>Likewise, submitters should prioritize responding to review feedback and updating and polishing their existing submissions as soon as possible. A reviewer is much more likely to re-review and approve a submission they have already looked at a couple days earlier than submissions that have been lingering for weeks or months.</p>
<h2 id="honor-original-code-let-submitters-feel-ownership"><a href="#honor-original-code-let-submitters-feel-ownership" class="header-anchor"></a>Honor original code, let submitters feel ownership
</h2><p>A code submission and its review serve as the process by which a new submitter becomes initiated into the software project. Reviewers should keep this in mind, and not rush to grab the code, but put in a little extra effort to guide the submitter through the code base and the quality bar. Reviewers might feel frustrated at times, but that is not an excuse for bad behaviour.</p>
<p>Sometimes, the reviewer might be tempted to fix the issues in the submission themselves instead of giving the original submitter a chance to do the final polish. <strong>A reviewer must resist this, as this will kill the feeling of ownership of the original submitter.</strong> In the context of open source projects, grabbing somebody else’s submission and committing it yourself might also constitute a copyright violation, and it certainly does not encourage the submitter to continue making further submissions. The reviewer should at most just rebase the commit, nothing more.</p>
<p>If the review process takes too many rounds of back-and-forth, then as a compromise the reviewer could merge the submission as-is, and immediately follow up with their own changes on top of the commit.</p>
<p>In both open source and company-internal work, the achievement that one’s code got merged is very valuable, in particular if it was the first accepted contribution to a particular project from that person. Don’t rob this from code submitters; let them earn it and have their <a class="link" href="https://en.wikipedia.org/wiki/Git" target="_blank" rel="noopener"
>git</a> credits and purple GitHub merge badges in their profile.</p>
<h3 id="there-may-be-many-reviewers-but-never-more-than-one-code-submitter"><a href="#there-may-be-many-reviewers-but-never-more-than-one-code-submitter" class="header-anchor"></a>There may be many reviewers, but never more than one code submitter
</h3><p>A fundamental principle in code submissions is maintaining a single code submitter. While there may be <strong>multiple people reviewing</strong> and posting comments, there should never be more than <strong>one person submitting</strong> the code (and subsequently improved revisions). Having one author ensures a clear owner, who iterates the submission and decides how to address all feedback. If using a git branch, that person is the only one who amends commits and force pushes the branch until the final version has been completed.</p>
<p>If the submission is a patch on a mailing list, having multiple submitters of a single email is impossible, so authorship and ownership are not an issue. On GitHub or GitLab, this problem might arise, as the submission is a git branch, and the branch permissions may allow multiple people to push to it. <strong>Having multiple people pushing to the same pull request, however, is plain wrong.</strong></p>
<p>In some projects, git pull requests are intentionally misused by having multiple authors push git commits on them and discussing the collaboration in the “review comments” section. <strong>The correct way for a group of people to write code together in git before the code goes on the mainline branch is to have a <em>feature branch</em>.</strong> Both <a class="link" href="https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request" target="_blank" rel="noopener"
>GitHub Pull Requests</a> and <a class="link" href="https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html" target="_blank" rel="noopener"
>GitLab Merge Requests</a> allow the submitter to select the <em>target branch</em> of the submission. With a feature branch model, each collaborator submits their own individual pull requests towards the feature branch, and once the feature is complete, the feature owner carries the responsibility of getting the feature branch accepted into the mainline branch of the git project.</p>
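<p>The branch topology behind this model can be sketched as follows (branch names are hypothetical). Each collaborator’s pull request targets the shared feature branch, and the feature owner later submits the whole feature branch towards <code>main</code>:</p>

```shell
# Feature-branch collaboration sketch in a throwaway local repo.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com && git config user.name Dev
git commit -q --allow-empty -m "Initial commit"
# The feature owner creates the shared feature branch off main
git checkout -q -b feature/search main
# A collaborator branches off the feature branch; their pull request
# targets feature/search, not main
git checkout -q -b search-indexing feature/search
git commit -q --allow-empty -m "Add search indexing"
# The collaborator's submission is merged into the feature branch first
git checkout -q feature/search
git merge -q --no-ff -m "Merge search indexing" search-indexing
# Commits on the feature branch that main has not yet received
ahead=$(git rev-list --count main..feature/search)
echo "$ahead"   # prints 2: the indexing commit plus the merge commit
```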
<p>Sometimes it may happen that there is disagreement or laziness, and the submitter refuses to properly address all feedback. In these cases, the recipient/reviewer either dismisses the submission completely or accepts it as-is (merges on target branch), where work can continue in the form of follow-up submissions from other people to further improve it.</p>
<p>It may also happen that a submission has a great idea, but the implementation is bad and unacceptable. The submitter is asked to improve the implementation, but the submitter might fail to do so. In this situation, nobody else should rewrite that same submission, as it blurs the lines of authorship and ownership. Instead, the other person with a better implementation should simply open a new pull request or submit an email with their own version and in their own name. After that, it will be up to the reviewers to decide which submission gets accepted and which one declined.</p>
<h2 id="code-reviews-are-opportunities-to-learn--for-both-the-code-submitter-and-the-reviewer"><a href="#code-reviews-are-opportunities-to-learn--for-both-the-code-submitter-and-the-reviewer" class="header-anchor"></a>Code reviews are opportunities to learn – for both the code submitter and the reviewer
</h2><p>Keep in mind that the reviewer does not need to be a superior developer. <strong>Anybody can be a reviewer, no matter how senior or junior a software developer they are</strong>. Reading code written by somebody else is a good way to get exposure to various coding styles and various ways that different developers solve coding problems. Reviewing code is a learning opportunity for the reviewer, and thus the review process ultimately leads to both the submitter and the reviewer writing better code in the future.</p>
<p>If you have an opportunity to do code reviews, do it, even if you don’t feel like an authority on a specific domain. You can still make a valuable contribution by giving feedback on some aspects, while leaving the final approval decision to a domain expert.</p>
<p>Remember to communicate clearly – the better the code is documented, and the better the feedback is explained, the more both submitter and reviewer learn from the experience.</p>
<p>If you have additional tips on best practices for code reviews, please add a comment below!</p> My 5 tips for efficient meetings https://optimizedbyotto.com/post/tips-for-efficient-meetings/Sun, 08 Oct 2023 00:00:00 +0000 https://optimizedbyotto.com/post/tips-for-efficient-meetings/ <img src="https://optimizedbyotto.com/post/tips-for-efficient-meetings/featured-image.jpg" alt="Featured image of post My 5 tips for efficient meetings" /><p>While large organisations scale best by emphasizing asynchronous communication, in-person or video meetings also have their place. As a manager involved in a lot of planning and coordination work, I’ve noticed that in recent years I’ve spent the majority of my working time in meetings. These are my 5 tips to make meetings as efficient as possible.</p>
<h2 id="have-a-written-meeting-agenda-before-and-during-the-meeting"><a href="#have-a-written-meeting-agenda-before-and-during-the-meeting" class="header-anchor"></a>Have a written meeting agenda before and during the meeting
</h2><p>While a simple <em>‘hey we should chat’</em> works for small informal meetings, having <strong>an agenda always makes the meeting much more efficient</strong> – even if there are just two participants. Put an agenda in every meeting invite you send. Explain what the purpose of the meeting is, who is attending, and what the outcome should be. It does not have to be formal. Even a tiny agenda summary is enough to show you’ve put some thought into why the meeting is taking place.</p>
<p>When the meeting starts, show the agenda to remind everyone what the purpose of the meeting is so that attendees can help make the meeting productive. When possible, I prefer to organize meetings with <a class="link" href="https://calendar.google.com/" target="_blank" rel="noopener"
>Google Calendar</a> because <a class="link" href="https://meet.google.com/" target="_blank" rel="noopener"
>Meet</a> will automatically show the invite text (the agenda) in the video call as a small popup in the lower left corner.</p>
<h3 id="fall-back-agenda-if-unprepared-whot"><a href="#fall-back-agenda-if-unprepared-whot" class="header-anchor"></a>Fall-back agenda if unprepared: WHOT
</h3><p>If you find yourself chairing a meeting unprepared, remember the acronym <em>WHOT</em> to improvise an agenda and meeting structure that works in most situations:</p>
<ul>
<li>
<p><strong>Why are we here</strong> - Explain why the thing that is the topic of the meeting matters, and briefly the context or perhaps latest developments around it.</p>
</li>
<li>
<p><strong>How attendees relate to the topic</strong> - introduce participants if they have not met before, and even if they know each other, explain how they ended up being invited to the meeting or how they are expected to participate in the topic.</p>
</li>
<li>
<p><strong>Opinions (or orders)</strong> - Ask attendees for opinions on the topic. If the purpose of the meeting was to get input from everyone, your duty as the chair of the meeting is to make sure everyone has a chance to talk. Most meetings about non-urgent topics are about collecting and aligning views. If the meeting is about an urgent topic such as an operational issue, the O in the acronym stands for orders and the main part of the meeting is to make sure all participants know what they should execute immediately after the meeting. Even when giving out orders you should still verify that the participants understand and agree with the ask and not assume too much.</p>
</li>
<li>
<p><strong>Timeframe</strong> - Conclude the meeting by summarizing the timeframe of the agreed actions, when the next event on the topic is expected to happen, or, if a follow-up meeting is expected, its timeframe.</p>
</li>
</ul>
<h2 id="be-inclusive-with-open-ended-questions-and-dont-fear-silence"><a href="#be-inclusive-with-open-ended-questions-and-dont-fear-silence" class="header-anchor"></a>Be inclusive with open-ended questions and don’t fear silence
</h2><p>Don’t be afraid of moments of silence when running a meeting. In fact, you should ensure that there are some pauses so that people who need more time to formulate their thoughts have an opportunity to speak up. This is important in particular if many participants are not native speakers of the meeting language. The best ideas might not be the ones that are voiced first, but the ones that come after some pondering on the topic.</p>
<p>If some participants talk too much, ask them politely to give space to others. If everyone seems silent, ask open-ended questions. Maintain a welcoming environment for discussion where speakers or their opinions are not directly criticized as that might silence some participants from sharing their honest opinions. Make sure all discussions have a respectful tone and opinions are voiced in a constructive manner with focus on solutions, or when discussing problems encourage participants to put forward concrete data points instead of pure opinions. As a chair for the meeting strive to be kind, but firm.</p>
<h2 id="respect-the-peoples-time"><a href="#respect-the-peoples-time" class="header-anchor"></a>Respect the people’s time
</h2><p>As the organizer you are responsible for allocating enough time. If the meeting is going overtime, end it, and schedule a follow-up meeting. If you think some participants talk too much or off-topic, steer the discussion back on-topic and remind about the time remaining. Even if there is allocated time left, don’t allow room for random mumblings. <strong>Respectful use of people’s time includes cutting the meeting short if the goal was met.</strong></p>
<p>To keep meetings efficient, they should not take longer than one hour. When planning the meeting and thinking about the agenda, make sure there are not too many items for one meeting. Also make sure that there are not too many participants. If the intent is to discuss something for an hour, there should be at most 12 participants (which means on average 5 minutes of speaking time per person). If the meeting has 20 or more participants, it is not a meeting but mostly a one-way announcement to the audience.</p>
<p>One large meeting with 20+ people could alternatively be a series of meetings where the same chair talks to groups of 5 at a time, and after all the meetings sends out a summary of all of them with a final conclusion.</p>
<h2 id="send-a-summary-after-the-meeting"><a href="#send-a-summary-after-the-meeting" class="header-anchor"></a>Send a summary after the meeting
</h2><p>People have a tendency to forget what was discussed or agreed after 2-3 weeks. Meetings are at risk of being wasted time unless the outcome is recorded at least in a small written summary. Writing formal meeting minutes is not an efficient use of time for everyday business meetings, but a 5-minute investment in a short summary is always worth it.</p>
<p>Formal meeting minutes are justified if the meeting is making decisions with ramifications in the tens or hundreds of thousands of dollars. Formal meeting minutes should record who was present, what the decision was, and <em>why</em>. For formal meetings it is important to agree to have some kind of approval mechanism for the minutes, and to not execute the decisions until the minutes have been approved with e.g. signatures.</p>
<p>If the meeting topic is very contentious, the meeting minutes should be written during the meeting and shared on-screen so participants can already during the meeting object in real-time if they see the decisions or their justifications not written down correctly.</p>
<h2 id="dont-have-meetings-if-possible"><a href="#dont-have-meetings-if-possible" class="header-anchor"></a>Don’t have meetings if possible
</h2><p>The last tip is to take some time and think about whether a meeting really is needed. If you just want an opinion on one specific thing, perhaps the meeting could be replaced with a chat thread where your first message is the question followed by a bit of context. This might even work better, as people can take as much time as they want to formulate their best possible reply in writing, with exact data points and references. To ensure everybody participates, you can ask everyone to acknowledge the question with an 👀 emoji or a short reply. Such ‘meetings’ have the additional benefit that they don’t need a separate agenda or summary, as the chat is automatically a written record.</p>
<p>As a replacement for a large 20+ participant meeting, one could have a well-written announcement email or a video recording, since most of the participants in a large meeting will not have time to speak anyway, and will mostly just be receiving information without participating.</p>
<p>If you feel that a meeting is needed for social cohesion, perhaps get the core decision done in an asynchronous chat or email discussion, and organize a breakfast or lunch separately to get the best social effect in a setting that is free from debating and just purely positive social time together.</p>
<p>If the meeting is a one-on-one between two people, it could be replaced with a walk in the park. That will make sure you both get some fresh air, and who knows, maybe also some fresh ideas?</p> Pulsar, the best code editor https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/Sun, 24 Sep 2023 00:00:00 +0000 https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/ <img src="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/featured-pulsar-screenshot.png" alt="Featured image of post Pulsar, the best code editor" /><p>The key to being productive as a programmer is to have a great code editor. I have been an avid user of <a class="link" href="https://web.archive.org/web/20221002074229/https://atom.io/" target="_blank" rel="noopener"
>Atom</a> since 2014, and its successor <a class="link" href="https://pulsar-edit.dev/" target="_blank" rel="noopener"
>Pulsar</a> since 2023.</p>
<h2 id="the-best-code-editor-came-from-github"><a href="#the-best-code-editor-came-from-github" class="header-anchor"></a>The best code editor came from GitHub
</h2><p><a class="link" href="https://en.wikipedia.org/wiki/Atom_%28text_editor%29" target="_blank" rel="noopener"
>Atom the code editor</a> was created by Nathan Sobo, who joined GitHub in 2011 specifically to build the best possible code editor in the world. He also co-led the development of Teletype for Atom, which pioneered collaborative code editing, i.e. multiple developers writing the same code files at the same time. The team also created the <a class="link" href="https://en.wikipedia.org/wiki/Electron_%28software_framework%29" target="_blank" rel="noopener"
>Electron Framework</a>, which lives on as the user interface framework for e.g. <a class="link" href="https://en.wikipedia.org/wiki/Visual_Studio_Code" target="_blank" rel="noopener"
>Microsoft VS Code</a> (created a couple years after Atom).</p>
<p>VS Code is actually the reason why Atom eventually <a class="link" href="https://github.blog/2022-06-08-sunsetting-atom/" target="_blank" rel="noopener"
>died</a>. In 2018, Microsoft acquired GitHub, the world’s largest open source hosting platform, to win premier mind share among developers, and as part of that plan, Microsoft also wanted as many developers as possible to do all their coding in an editor controlled by Microsoft. To achieve this goal, Microsoft invested hugely in VS Code, and naturally discontinued Atom after gaining control of it through the GitHub acquisition.</p>
<p>However, being <a class="link" href="https://opensource.org/licenses/" target="_blank" rel="noopener"
>open source software</a>, Atom can’t fully be killed. Yes, Microsoft controls the name, trademark and website – which are all now shut down – but the <a class="link" href="https://github.com/pulsar-edit/pulsar/blob/master/LICENSE.md" target="_blank" rel="noopener"
>source code is MIT licensed</a> and has resurrected as Pulsar the editor.</p>
<h2 id="local-app-but-built-with-web-technologies-htmljavascript"><a href="#local-app-but-built-with-web-technologies-htmljavascript" class="header-anchor"></a>Local app but built with web technologies HTML/JavaScript
</h2><p>The key innovation in Atom (and now Pulsar) was to build the whole editor as a web application that runs locally (offline) inside a dedicated browser window (Electron). This radically lowers the barrier of entry for all the millions of web developers out there to participate in the development of the editor. This is reflected today in Pulsar’s tagline “hyper-hackable text editor”, with reference to the original meaning of the word “hackable” as users’ ability to use the tool in ways the original designer had not anticipated.</p>
<p>To make Pulsar even more flexible, it also has a plugin system (called packages) with <a class="link" href="https://web.pulsar-edit.dev/packages" target="_blank" rel="noopener"
>over 10,000 community-developed easily installable packages</a> one can use to extend the features of the editor.</p>
<p>Pulsar is feature rich and very powerful already out-of-the-box, but thanks to the vast number of additional packages, a developer can easily optimize their workflow to be as productive as possible. If a package does not already exist for your use case, creating one yourself is moderately easy for most developers, as you only need to know web development tech (JavaScript/CSS/HTML) to do it.</p>
<p>Consider the screenshot below showing a simple C source code file being edited.</p>
<p><img src="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-code-file-example.png"
width="1003"
height="774"
srcset="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-code-file-example_hu7353216324560348284.png 480w, https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-code-file-example.png 1003w"
loading="lazy"
alt="Editing a simple C file in Pulsar"
class="gallery-image"
data-flex-grow="129"
data-flex-basis="311px"
>
</p>
<p>In this picture, you can see:</p>
<ul>
<li>a linter that automatically highlights in yellow a specific word in the code that the user should improve</li>
<li>a yellow vertical line marking which lines in the file have been changed but are not yet committed in git</li>
<li>a file browser showing the files in the project, with modified files in yellow, new files in green and files ignored by git greyed out</li>
<li>a rich status bar showing the git status, cursor location, file encoding and type, git branch name and even the GitLab CI status of the latest git commit</li>
</ul>
<p>This is perfect for my coding workflow – it does not feel bloated but has all the features I need without excess. Check out the post about <a class="link" href="https://optimizedbyotto.com/post/develop-code-10x-faster/" >How to code 10x faster than an average programmer</a> to see the Pulsar autosave feature being used to create the optimal code-test-repeat workflow.</p>
<h2 id="powerful-keyboard-shortcuts--without-the-need-to-memorize-too-much"><a href="#powerful-keyboard-shortcuts--without-the-need-to-memorize-too-much" class="header-anchor"></a>Powerful keyboard shortcuts – without the need to memorize too much
</h2><p>For maximum productivity, most developers prefer to do everything from the keyboard, without lifting a hand to fiddle with the mouse. Pulsar has a keyboard shortcut for almost everything. If something is missing, you can configure your own, and you can change the default keyboard bindings if you have existing preferences. However, the best part is that you don’t really need to spend any time learning keyboard shortcuts - <strong>you just need to remember Ctrl+Shift+P</strong>. This opens the command palette, where you can type a keyword, select the action and press Enter, or note the keyboard shortcut shown on screen and use it until it settles effortlessly into your muscle memory.</p>
<p><img src="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-command-palette.png"
width="1003"
height="774"
srcset="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-command-palette_hu11632327062133518543.png 480w, https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-command-palette.png 1003w"
loading="lazy"
alt="Open the command palette in Pulsar by pressing Ctrl+Shift+P"
class="gallery-image"
data-flex-grow="129"
data-flex-basis="311px"
>
</p>
<p>The two other keyboard shortcuts worth remembering are <strong>Ctrl+P</strong> (the same, minus Shift), which lets you type a partial filename and press Enter to quickly open any file in the project, and <strong>Ctrl+Shift+F</strong>, which searches for a string anywhere in the project. The search feature is amazing, and also comes with a powerful search-and-replace mode with interactive previews of what string would be replaced in which files.</p>
<h2 id="why-not-use-vim-like-all-real-programmers-do"><a href="#why-not-use-vim-like-all-real-programmers-do" class="header-anchor"></a>Why not use Vim like all real programmers do?
</h2><p>I know several world-class programmers, and interestingly, the commonality among them is that they all seem to use <a class="link" href="https://en.wikipedia.org/wiki/Vim_%28text_editor%29" target="_blank" rel="noopener"
>Vim</a> as their code editor. Many people I know who think of themselves as world-class programmers use <a class="link" href="https://en.wikipedia.org/wiki/Editor_war" target="_blank" rel="noopener"
>Emacs</a>.</p>
<p>While both of these can probably compete with Pulsar in the number of extensions available and the amount of customizations and keyboard shortcuts, neither of them is at heart a graphical application, and in a typical terminal session they are incapable of doing things like showing images or, for example, a Markdown preview (which by the way you can open in Pulsar by pressing <strong>Ctrl+Shift+M</strong>).</p>
<p><img src="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-markdown-example.png"
width="1003"
height="774"
srcset="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-markdown-example_hu17082990645493122934.png 480w, https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/pulsar-markdown-example.png 1003w"
loading="lazy"
alt="Markdown preview in Pulsar by pressing Ctrl+Shift+M"
class="gallery-image"
data-flex-grow="129"
data-flex-basis="311px"
>
</p>
<p>Also, Vim and Emacs use <em>modes</em> to allow users to either write the text in the file itself or to enter commands for the editor. <a class="link" href="https://www.theregister.com/2020/02/19/larry_tesler/" target="_blank" rel="noopener"
>Modes are terrible</a> to use, and thanks to people like Larry Tesler, hardly any new software has shipped with modes since the 1980s; nearly all of humanity is blissfully unaware of what modes even are and will never be exposed to them. Instead of commands issued via modes, today we have modeless keyboard shortcuts like Ctrl+C and Ctrl+V, which are easy to use.</p>
<h2 id="are-electron-apps-slow"><a href="#are-electron-apps-slow" class="header-anchor"></a>Are Electron apps slow?
</h2><p>Admittedly, Pulsar is a bit slow to start up. It takes maybe 6–8 seconds for it to fully load from scratch, but once it is open, using it feels pretty snappy. The slowness is due to being an Electron app, which is basically a web browser. Indeed, the original author of Atom/Pulsar, Nathan Sobo, started a new code editor project after Atom was shut down. The new editor <a class="link" href="https://zed.dev/" target="_blank" rel="noopener"
>Zed</a> is extremely fast, as it is written in Rust and also because it utilizes the GPU for rendering. Zed, however, is not open source, and it is not available for Linux, so I have not even tried it. Instead, I tested another promising Rust-based blazingly fast editor, <a class="link" href="https://lapce.dev/" target="_blank" rel="noopener"
>Lapce</a>. It is truly fast and pleasant to use. However, it lacks a lot of features, so I end up having to do extra manual work.</p>
<p>Thus, Pulsar holds up as my favorite editor, and for me <strong>Pulsar is the fastest one when measuring how quickly I am able to deliver code changes</strong> and write complete and correct text.</p>
<h2 id="my-pulsar-config-and-packages"><a href="#my-pulsar-config-and-packages" class="header-anchor"></a>My Pulsar config and packages
</h2><p>If you want to replicate the Pulsar setup I have going, this is my <code>.pulsar/config.cson</code>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">yaml</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">"*":
autocomplete-plus:
confirmCompletion: "tab always, enter when suggestion explicitly selected"
autosave:
enabled: true
welcome:
showChangeLog: false
showOnStartup: false</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#e6db74">"*"</span>:
</span></span><span style="display:flex;"><span> <span style="color:#f92672">autocomplete-plus</span>:
</span></span><span style="display:flex;"><span> <span style="color:#f92672">confirmCompletion</span>: <span style="color:#e6db74">"tab always, enter when suggestion explicitly selected"</span>
</span></span><span style="display:flex;"><span> <span style="color:#f92672">autosave</span>:
</span></span><span style="display:flex;"><span> <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span> <span style="color:#f92672">welcome</span>:
</span></span><span style="display:flex;"><span> <span style="color:#f92672">showChangeLog</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span> <span style="color:#f92672">showOnStartup</span>: <span style="color:#66d9ef">false</span></span></span></code></pre></div></div></div>
<p>These are some of my favorite packages:</p>
<ul>
<li><a class="link" href="https://github.com/pulsar-edit/sort-lines" target="_blank" rel="noopener"
>Sort Lines</a></li>
<li><a class="link" href="https://web.pulsar-edit.dev/packages/linter-spell" target="_blank" rel="noopener"
>Linter Spell (human languages)</a></li>
<li><a class="link" href="https://web.pulsar-edit.dev/packages/linter-flake8" target="_blank" rel="noopener"
>Linter Flake8 (Python)</a></li>
<li><a class="link" href="https://web.pulsar-edit.dev/packages/linter-clang" target="_blank" rel="noopener"
>Linter Clang</a></li>
<li><a class="link" href="https://web.pulsar-edit.dev/packages/linter-shellcheck-pulsar" target="_blank" rel="noopener"
>Linter ShellCheck</a></li>
<li><a class="link" href="https://web.pulsar-edit.dev/packages/linter-markdown" target="_blank" rel="noopener"
>Linter MarkDown</a></li>
<li><a class="link" href="https://github.com/burodepeper/language-markdown" target="_blank" rel="noopener"
>Markdown language grammar</a></li>
<li><a class="link" href="https://github.com/tsbarnes/language-debian" target="_blank" rel="noopener"
>Debian control and rules file linter</a></li>
<li><a class="link" href="https://github.com/lloiser/language-ansi-styles" target="_blank" rel="noopener"
>ANSI color styles</a></li>
<li><a class="link" href="https://github.com/russpowers/nim-atom" target="_blank" rel="noopener"
>Nim language support</a></li>
<li><a class="link" href="https://github.com/T-Huelsken/gitlab-manager" target="_blank" rel="noopener"
>GitLab Manager</a></li>
<li><a class="link" href="https://github.com/Stepsize/atom-better-git-blame" target="_blank" rel="noopener"
>Better Git Blame</a></li>
<li><a class="link" href="https://github.com/atom-minimap/minimap" target="_blank" rel="noopener"
>Minimap</a></li>
<li><a class="link" href="https://github.com/ansballard/minimap-autohider" target="_blank" rel="noopener"
>Minimap autohider</a></li>
</ul>
<h2 id="using-pulsar-to-write-git-commit-messages"><a href="#using-pulsar-to-write-git-commit-messages" class="header-anchor"></a>Using Pulsar to write git commit messages
</h2><p>Run <code>git config --global core.editor "pulsar --wait"</code> to use Pulsar for git
commit messages. While writing commit messages remember to press <code>Ctrl+Shift+Q</code>
to reflow paragraphs and have line wrapping that follows <a class="link" href="https://optimizedbyotto.com/post/git-commit-message-examples/" >git best
practices</a>.</p> Unpacking Linux containers: understanding Docker and its alternatives https://optimizedbyotto.com/post/linux-containers-docker/Mon, 08 May 2023 00:00:00 +0000 https://optimizedbyotto.com/post/linux-containers-docker/ <img src="https://optimizedbyotto.com/post/linux-containers-docker/featured-image.jpg" alt="Featured image of post Unpacking Linux containers: understanding Docker and its alternatives" /><p>In popularizing Linux containers, Docker brought about a new era of systems design based on these lightweight platforms, rather than heavy virtual machines. However, now that Docker is slowly declining, it’s time to learn about the next generation of Linux container tools.</p>
<h2 id="docker"><a href="#docker" class="header-anchor"></a>Docker
</h2><p>When <a class="link" href="https://en.wikipedia.org/wiki/Docker_%28software%29" target="_blank" rel="noopener"
>Docker</a> officially launched in 2013, it was not the first containerization solution for Linux. For example, Linux already had <a class="link" href="https://en.wikipedia.org/wiki/LXC" target="_blank" rel="noopener"
>LXC</a> back in 2008 (early versions of Docker ran on top of it), and <a class="link" href="https://en.wikipedia.org/wiki/FreeBSD_jail" target="_blank" rel="noopener"
>FreeBSD jails</a> had been around since 1999. Nevertheless, Docker <em>was</em> the first developer-friendly and complete end-to-end solution that let us easily create, distribute, and run Linux containers.</p>
<p>Not only was it technically sound and convenient to use, but Docker was also a great example of a successful and well-run open source project. I experienced this personally during a couple of contributions where two people did the initial review within 24 hours of my Pull Request and a third person merged it in less than two weeks from the submission date. Docker developers also contributed <em>back</em> to Linux plenty of containerization-related improvements, drove standardization efforts, and spun off many subcomponents (e.g., <a class="link" href="https://containerd.io/" target="_blank" rel="noopener"
>containerd</a>, <a class="link" href="https://en.wikipedia.org/wiki/Open_Container_Initiative" target="_blank" rel="noopener"
>OCI</a>, <a class="link" href="https://github.com/moby/buildkit" target="_blank" rel="noopener"
>BuildKit</a>).</p>
<p>Today, container-based system architectures and development workflows are extremely popular, as seen with, for instance, the rise of <a class="link" href="https://en.wikipedia.org/wiki/Kubernetes" target="_blank" rel="noopener"
>Kubernetes</a>. While we are <em>still</em> waiting for the <em>‘year of the Linux desktop’</em> to happen, Docker certainly made more Windows and Mac users run a virtual Linux machine on their laptops than ever before.</p>
<p>The company Docker Inc was, from the start, a venture-funded endeavor centered around an <a class="link" href="https://en.wikipedia.org/wiki/Open-core_model" target="_blank" rel="noopener"
>open core model</a> and launched many closed-source products that drove revenue over the years. What used to be the core Docker software was renamed <em>Moby</em> in 2017, and that is where the open-source contributions (e.g., <a class="link" href="https://github.com/moby/moby/commit/b619220ce11770ffaea068b54d3975c74f7c24f9" target="_blank" rel="noopener"
>mine from 2015</a>) can be found. The founder <a class="link" href="https://twitter.com/solomonstre" target="_blank" rel="noopener"
>Solomon Hykes</a> no longer works for Docker Inc, and in recent years public sentiment around Docker has suffered due to various controversies. Yet at the same time, many similar (and some perhaps <em>better</em>) solutions have entered the space.</p>
<h2 id="what-actually-is-a-docker-container"><a href="#what-actually-is-a-docker-container" class="header-anchor"></a>What actually <em>is</em> a Docker container?
</h2><p>To build a container, a software developer first writes a <a class="link" href="https://docs.docker.com/engine/reference/builder/" target="_blank" rel="noopener"
>Dockerfile</a>, which defines what Linux distribution the container is based on, along with what software, configuration files, and data it contains. Much of a <code>Dockerfile</code>’s contents is essentially shell script.</p>
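<p>As a concrete sketch of what such a file can look like (the <code>debian:sid</code> base image is real, but the <code>app.sh</code> script and the installed package are made-up placeholders for illustration):</p>

```dockerfile
# Each instruction below becomes one layer in the resulting image.
FROM debian:sid

# Shell commands executed inside the image being built.
RUN apt-get update && apt-get install -y --no-install-recommends curl

# Copy a (hypothetical) script from the build context into the image.
COPY app.sh /usr/local/bin/app.sh

# Metadata: the binary to run when the container starts.
ENTRYPOINT ["/usr/local/bin/app.sh"]
```

<p>Building this with <code>docker build</code> produces an image whose filesystem layers map one-to-one to these instructions.</p>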
<p>The build is done with the command <code>docker build</code>, which executes the contents of the <code>Dockerfile</code> line-by-line and creates a Linux-compatible root filesystem (files under <code>/</code>). This is done using a clever overlay filesystem, where each line in the <code>Dockerfile</code> produces one new layer. Thus, rebuilds of the container do <em>not</em> need to rebuild the whole filesystem, but only need to re-execute the <code>Dockerfile</code> from the first changed line onward.</p>
<p>On a typical Linux system, the filesystem layers after a <code>docker build</code> execution can be found at <code>/var/lib/docker/</code>. If the container was based on Debian, one could find, for example, the <code>apt-get</code> binary of the image at a path like <code>/var/lib/docker/overlay2/c1ead1[...]d04e06/diff/usr/bin/apt-get</code>.</p>
<p>Additionally, some metadata is created in the process, which designates among other things the <em>entrypoint</em> of the container — i.e. what binary on the root filesystem to run when starting the container.</p>
<h2 id="unpacking-a-container"><a href="#unpacking-a-container" class="header-anchor"></a>Unpacking a container
</h2><p>To inspect what the root filesystem of the Docker image <code>debian:sid</code> looks like, one could create a container and inspect the mounted merged filesystem:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">$ docker container create -i -t --name demo debian:sid
2734eb[...]d18852
$ cat /var/lib/docker/image/overlay2/layerdb/mounts/2734eb[...]d18852/mount-id
2854c7[...]9dfe25
$ find /var/lib/docker/overlay2/2854c7[...]9dfe25 | grep apt-get
/var/lib/docker/overlay2/2854c7[...]9dfe25/merged/usr/share/man/man8/apt-get.8.gz
/var/lib/docker/overlay2/2854c7[...]9dfe25/merged/usr/share/man/pt/man8/apt-get.8.gz
/var/lib/docker/overlay2/2854c7[...]9dfe25/merged/usr/bin/apt-get</code><pre><code>$ docker container create -i -t --name demo debian:sid
2734eb[...]d18852
$ cat /var/lib/docker/image/overlay2/layerdb/mounts/2734eb[...]d18852/mount-id
2854c7[...]9dfe25
$ find /var/lib/docker/overlay2/2854c7[...]9dfe25 | grep apt-get
/var/lib/docker/overlay2/2854c7[...]9dfe25/merged/usr/share/man/man8/apt-get.8.gz
/var/lib/docker/overlay2/2854c7[...]9dfe25/merged/usr/share/man/pt/man8/apt-get.8.gz
/var/lib/docker/overlay2/2854c7[...]9dfe25/merged/usr/bin/apt-get</code></pre></div>
<p>The command <a class="link" href="https://docs.docker.com/engine/reference/commandline/export/" target="_blank" rel="noopener"
>docker export</a> makes it easy to get the root filesystem into a tar package, for example.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">$ docker export demo > debian-sid.tar
$ tar xvf debian-sid.tar
.dockerenv
bin
boot/
dev/
dev/console
dev/pts/
dev/shm/
etc/
etc/.pwd.lock
etc/alternatives/
etc/alternatives/README
etc/alternatives/awk
...
var/spool/mail
var/tmp/
$ find . | grep apt-get
./usr/share/man/man8/apt-get.8.gz
./usr/bin/apt-get</code><pre><code>$ docker export demo > debian-sid.tar
$ tar xvf debian-sid.tar
.dockerenv
bin
boot/
dev/
dev/console
dev/pts/
dev/shm/
etc/
etc/.pwd.lock
etc/alternatives/
etc/alternatives/README
etc/alternatives/awk
...
var/spool/mail
var/tmp/
$ find . | grep apt-get
./usr/share/man/man8/apt-get.8.gz
./usr/bin/apt-get</code></pre></div>
<p>In theory, <strong>anything could create this root filesystem</strong>, and likewise almost any tool could run a binary inside it — even the classic <a class="link" href="https://en.wikipedia.org/wiki/Chroot" target="_blank" rel="noopener"
>chroot</a>. If you edit the files and want to get them back into Docker to run as a container, <a class="link" href="https://docs.docker.com/engine/reference/commandline/import/" target="_blank" rel="noopener"
>docker import</a> makes it easy.</p>
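<p>To convince yourself that an exported container is nothing more than an ordinary directory tree in a tar archive, the round trip can be simulated without Docker at all (the file names below are made up for illustration):</p>

```shell
# Build a tiny stand-in for a container root filesystem: just a directory tree.
mkdir -p rootfs/usr/bin rootfs/etc
printf '#!/bin/sh\necho hello\n' > rootfs/usr/bin/hello
chmod +x rootfs/usr/bin/hello

# "Export" it to a tar archive, much like docker export does.
tar -C rootfs -cf rootfs.tar .

# "Import" it by unpacking elsewhere; the tree survives intact.
mkdir extracted
tar -C extracted -xf rootfs.tar
find extracted -name 'hello'   # → extracted/usr/bin/hello
```

<p>With root privileges, <code>chroot extracted /usr/bin/hello</code> would already run that binary against this filesystem — which is, in essence, what a container runtime does, plus namespaces and cgroups for isolation.</p>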
<p>To export a full container image with both the root filesystem and the metadata, the <a class="link" href="https://docs.docker.com/build/exporters/oci-docker/" target="_blank" rel="noopener"
>docker buildx</a> command offers some output format options, such as the <a class="link" href="https://github.com/opencontainers/image-spec/blob/v1.0.2/image-layout.md" target="_blank" rel="noopener"
>Open Container Initiative standard format</a> or the <a class="link" href="https://github.com/moby/moby/blob/v24.0.5/image/spec/v1.2.md" target="_blank" rel="noopener"
>Docker native image format</a>. To import a full container image with metadata, refer to the <a class="link" href="https://docs.docker.com/engine/reference/commandline/load/" target="_blank" rel="noopener"
>docker load</a> command.</p>
<h2 id="orchestrating-a-container-start-with-dockerd-containerd-and-runc"><a href="#orchestrating-a-container-start-with-dockerd-containerd-and-runc" class="header-anchor"></a>Orchestrating a container start with dockerd, containerd and runc
</h2><p>In the above example, a container was created, but not <em>started</em>. To start a container, one can try running:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">$ docker run -it debian:sid bash
root@c9a8e6c222ae:/#</code><pre><code>$ docker run -it debian:sid bash
root@c9a8e6c222ae:/#</code></pre></div>
<p>From a user experience point of view, you are basically dropped into a Bash shell in a Debian Sid container. Under the hood, the <code>docker</code> command-line tool sends an HTTP request to the <code>dockerd</code> daemon running on the local system, which in turn asks <code>containerd</code> to run the container, which <em>in turn</em> starts <code>runc</code> directly or (due to backwards compatibility reasons) a <code>containerd-runc-shim</code>. Inside this one, you can find the actual running Bash binary:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">$ ps fax | grep -C container
1122 /usr/bin/containerd
1660 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
55409 /usr/bin/containerd-shim-runc-v2 -namespace moby -id c9a8e[..]0847e -address /run/containerd/containerd.sock
55428 \_ bash</code><pre><code>$ ps fax | grep -C1 container
1122 /usr/bin/containerd
1660 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
55409 /usr/bin/containerd-shim-runc-v2 -namespace moby -id c9a8e[..]0847e -address /run/containerd/containerd.sock
55428 \_ bash</code></pre></div>
<p>Anyway, if you’re fine with slightly <em>less</em> automation and having more of a “hands-on” experience, read the <a class="link" href="https://manpages.debian.org/unstable/runc/runc.8.en.html" target="_blank" rel="noopener"
>man page for runc</a> and try running the container directly with it.</p>
<h2 id="alternatives-in-the-linux-containers-stack"><a href="#alternatives-in-the-linux-containers-stack" class="header-anchor"></a>Alternatives in the Linux containers stack
</h2><p>The Linux Foundation has a nice architecture schema to illustrate the various components and alternatives in the stack that originally evolved from Docker:</p>
<p><img src="https://optimizedbyotto.com/post/linux-containers-docker/linux-container-stack-alternatives.png"
width="1200"
height="727"
srcset="https://optimizedbyotto.com/post/linux-containers-docker/linux-container-stack-alternatives_hu11125483801176160481.png 480w, https://optimizedbyotto.com/post/linux-containers-docker/linux-container-stack-alternatives.png 1200w"
loading="lazy"
alt="Linux containers architecture diagram from containerd.io"
class="gallery-image"
data-flex-grow="165"
data-flex-basis="396px"
>
</p>
<p><code>runc</code> is the <a class="link" href="https://en.wikipedia.org/wiki/Open_Container_Initiative" target="_blank" rel="noopener"
>OCI</a> reference implementation of the OCI runtime specification. A popular alternative to <code>runc</code> is <a class="link" href="https://github.com/containers/crun" target="_blank" rel="noopener"
>crun</a>, implemented in C to be faster and use less memory than <code>runc</code>, which is written in Go. One level up the stack sits <a class="link" href="https://cri-o.io/" target="_blank" rel="noopener"
>CRI-O</a>, strictly speaking an alternative to <code>containerd</code> rather than to <code>runc</code>: smaller and faster, with just enough features to be perfect for Kubernetes.</p>
<p>There are also container runtimes such as <a class="link" href="https://katacontainers.io/" target="_blank" rel="noopener"
>Kata</a> and <a class="link" href="https://www.zerovm.org/" target="_blank" rel="noopener"
>ZeroVM</a> based on the idea of running each container inside a minimal virtual machine, which achieve better isolation between the containers compared to running them directly on the same host. This design aims to hit a “sweet spot” between the optimized performance of lightweight containers and the security of traditional full virtual machines.</p>
<h2 id="podman"><a href="#podman" class="header-anchor"></a>Podman
</h2><p>Missing from the diagram above is the current major competitor, the Red Hat-sponsored <a class="link" href="https://github.com/containers" target="_blank" rel="noopener"
>Podman, which offers a complete replacement</a> for the whole Docker stack.</p>
<p>The command-line tool <code>podman</code> is designed to be a drop-in-replacement for <code>docker</code>, so one can run the earlier command examples by just changing the first word: <code>podman build ..</code>, <code>podman container create ...</code>, <code>podman export ..</code> and so forth. Even <code>podman volume prune --force && podman system prune --force</code> does exactly the same as the Docker equivalent — which is nice, as I tend to run that frequently to clean away containers and free disk space when I’m not actively using them.</p>
<p>To start a container one can run (for example):</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">$ podman run -it debian:sid bash
root@312cbccb5938:/#</code><pre><code>$ podman run -it debian:sid bash
root@312cbccb5938:/#</code></pre></div>
<p>When a container started like this is running, you would see in the process list something along the lines of:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">87524 \_ podman
99902 \_ /usr/libexec/podman/conmon --api-version 1 -c 312cbc[...]93a0e1 -u 312cbc[...]93a0e1 -r /usr/bin/crun
-b /home/otto/.local/share/containers/storage/overlay-containers/312cbc[...]93a0e1/userdata
-p /run/user/1001/containers/overlay-containers/312cbc[...]93a0e1/userdata/pidfile -n naughty_dewdney
--exit-dir /run/user/1001/libpod/tmp/exits --full-attach -l journald --log-level warning --runtime-arg
--log-format=json --runtime-arg --log
--runtime-arg=/run/user/1001/containers/overlay-containers/312cbc[...]93a0e1/userdata/oci-log -t
--conmon-pidfile /run/user/1001/containers/overlay-containers/312cbc[...]93a0e1/userdata/conmon.pid
--exit-command /usr/bin/podman --exit-command-arg --root
--exit-command-arg /home/otto/.local/share/containers/storage --exit-command-arg --runroot
--exit-command-arg /run/user/1001/containers --exit-command-arg --log-level --exit-command-arg warning
--exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir
--exit-command-arg /run/user/1001/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun
--exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend
--exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup
--exit-command-arg 312cbc[...]93a0e1
99905 \_ bash</code><pre><code>87524 \_ podman
99902 \_ /usr/libexec/podman/conmon --api-version 1 -c 312cbc[...]93a0e1 -u 312cbc[...]93a0e1 -r /usr/bin/crun
-b /home/otto/.local/share/containers/storage/overlay-containers/312cbc[...]93a0e1/userdata
-p /run/user/1001/containers/overlay-containers/312cbc[...]93a0e1/userdata/pidfile -n naughty_dewdney
--exit-dir /run/user/1001/libpod/tmp/exits --full-attach -l journald --log-level warning --runtime-arg
--log-format=json --runtime-arg --log
--runtime-arg=/run/user/1001/containers/overlay-containers/312cbc[...]93a0e1/userdata/oci-log -t
--conmon-pidfile /run/user/1001/containers/overlay-containers/312cbc[...]93a0e1/userdata/conmon.pid
--exit-command /usr/bin/podman --exit-command-arg --root
--exit-command-arg /home/otto/.local/share/containers/storage --exit-command-arg --runroot
--exit-command-arg /run/user/1001/containers --exit-command-arg --log-level --exit-command-arg warning
--exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir
--exit-command-arg /run/user/1001/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun
--exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend
--exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup
--exit-command-arg 312cbc[...]93a0e1
99905 \_ bash</code></pre></div>
<p>Unlike Docker, there is no <code>containerd</code> or <code>runc</code> at play, but instead <a class="link" href="https://manpages.debian.org/unstable/conmon/conmon.8.en.html" target="_blank" rel="noopener"
>conmon</a> runs <a class="link" href="https://manpages.debian.org/unstable/crun/crun.1.en.html" target="_blank" rel="noopener"
>crun</a>, which is the <em>actual</em> container runtime. Note also that the container runs with regular user permissions (no need for root) and that the default location for storing container images and other data is in <code>~/.local/share/containers/</code> in the user home directory.</p>
<h3 id="podman-desktop"><a href="#podman-desktop" class="header-anchor"></a>Podman desktop
</h3><p>While I <em>personally</em> prefer to work on the command-line, I need to give a shoutout to Podman for also having a nifty desktop application for those who prefer to use graphical tools:</p>
<p><img src="https://optimizedbyotto.com/post/linux-containers-docker/podman-desktop-demo.gif"
width="926"
height="597"
loading="lazy"
alt="Podman Desktop demo"
class="gallery-image"
data-flex-grow="155"
data-flex-basis="372px"
>
</p>
<h2 id="lxc-and-lxd"><a href="#lxc-and-lxd" class="header-anchor"></a>LXC and LXD
</h2><p>The basic utility of Linux containers is to give system administrators a building block which behaves a bit <em>like</em> a virtual machine in being an encapsulated unit — <strong>but without being as slow and resource-hungry as actual virtual machines!</strong> Although containers typically contain a full root filesystem, the Docker philosophy was that each container should run just <em>one</em> process — and run it <em>well</em> — and crucially, not have any process managers or init systems inside the container. Many system administrators, however, <em>do</em> in practice run Docker containers that use, for example, <a class="link" href="https://en.wikipedia.org/wiki/Runit" target="_blank" rel="noopener"
>runit</a> to ‘boot’ the container and manage server daemon processes inside them.</p>
<p>The Canonical-backed <a class="link" href="https://ubuntu.com/lxd" target="_blank" rel="noopener"
>LXD</a>, however, is tailored specifically for this type of use case, building upon LXC. After installing LXD and running <code>lxd init</code> to configure it, you can run full containerized operating systems with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-6"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-6" style="display:none;">$ lxc launch images:debian/sid demo
Creating demo
Starting demo
$ lxc exec demo -- bash
root@demo:~#</code><pre><code>$ lxc launch images:debian/sid demo
Creating demo
Starting demo
$ lxc exec demo -- bash
root@demo:~#</code></pre></div>
<p>The host process list will show something along the lines of:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-7"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-7" style="display:none;">root 105632 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root 105466 /bin/sh /snap/lxd/24061/commands/daemon.start
root 105645 \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
lxd 105975 \_ dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo
--pid-file= --no-ping --interface=lxdbr0 --dhcp-rapid-commit --quiet-dhcp --quiet-dhcp6
--quiet-ra --listen-address=10.199.145.1 --dhcp-no-override --dhcp-authoritative
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts
--dhcp-range 10.199.145.2,10.199.145.254,1h --listen-address=fd42:3147:bafe:37e5::1
--enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd
--interface-name _gateway.lxd,lxdbr0 -S /lxd/
--conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd
...
root 107867 \_ /snap/lxd/current/bin/lxd forkexec demo /var/snap/lxd/common/lxd/containers
/var/snap/lxd/common/lxd/logs/demo/lxc.conf 0 0 0
-- env PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOME=/root
USER=root LANG=C.UTF-8 TERM=xterm-256color
-- cmd bash
1000000 107870 \_ bash
root 106209 [lxc monitor] /var/snap/lxd/common/lxd/containers demo
1000000 106221 \_ /sbin/init
1000000 106372 \_ /lib/systemd/systemd-journald
1000000 106401 \_ /lib/systemd/systemd-udevd
1000997 106420 \_ /lib/systemd/systemd-resolved
1000100 106431 \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
--syslog-only
1000000 106433 \_ /lib/systemd/systemd-logind
1000000 106436 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud - 115200,38400,9600 linux
1000998 106446 \_ /lib/systemd/systemd-networkd</code><pre><code>root 105632 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root 105466 /bin/sh /snap/lxd/24061/commands/daemon.start
root 105645 \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
lxd 105975 \_ dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo
--pid-file= --no-ping --interface=lxdbr0 --dhcp-rapid-commit --quiet-dhcp --quiet-dhcp6
--quiet-ra --listen-address=10.199.145.1 --dhcp-no-override --dhcp-authoritative
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts
--dhcp-range 10.199.145.2,10.199.145.254,1h --listen-address=fd42:3147:bafe:37e5::1
--enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd
--interface-name _gateway.lxd,lxdbr0 -S /lxd/
--conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd
...
root 107867 \_ /snap/lxd/current/bin/lxd forkexec demo /var/snap/lxd/common/lxd/containers
/var/snap/lxd/common/lxd/logs/demo/lxc.conf 0 0 0
-- env PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOME=/root
USER=root LANG=C.UTF-8 TERM=xterm-256color
-- cmd bash
1000000 107870 \_ bash
root 106209 [lxc monitor] /var/snap/lxd/common/lxd/containers demo
1000000 106221 \_ /sbin/init
1000000 106372 \_ /lib/systemd/systemd-journald
1000000 106401 \_ /lib/systemd/systemd-udevd
1000997 106420 \_ /lib/systemd/systemd-resolved
1000100 106431 \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
--syslog-only
1000000 106433 \_ /lib/systemd/systemd-logind
1000000 106436 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud - 115200,38400,9600 linux
1000998 106446 \_ /lib/systemd/systemd-networkd</code></pre></div>
<p>Notice how the daemon runs as root (and interacting with lxd/lxc requires root permissions). However, thanks to UID mapping, the root user inside the container is <em>not</em> the root user of the host system. This is one of the key design differences — and <em>why LXD is considered more secure than Docker</em>.</p>
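<p>A quick way to see this UID mapping in action is the kernel's user namespaces, which LXD builds on. The following is a minimal sketch using the util-linux <code>unshare</code> tool rather than LXD itself, and it assumes your kernel allows unprivileged user namespaces:</p>

```shell
# Minimal sketch of UID mapping with a user namespace (not LXD itself).
# Assumes util-linux `unshare` and unprivileged user namespaces enabled.
id -u                                 # our real UID on the host
unshare --user --map-root-user id -u  # prints 0: "root" only inside the namespace
```

<p>Inside the namespace the process believes it is root, but the host maps it back to the original unprivileged UID, much like the <code>1000000</code>-range UIDs in the process list above.</p>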
<p>The downloaded root filesystems are stored at <code>/var/snap/lxd/common/lxd/images/</code>, while the filesystems of running containers can be found at <code>/var/snap/lxd/common/lxd/storage-pools/default/containers/</code> as long as the LXD storage is directory-based (as opposed to an LVM or OpenZFS pool).</p>
<p>The examples above all have <code>snap</code> in their path, as there is no native Ubuntu package for LXD; even running <code>apt install lxd</code> forces users to install the Snap.</p>
<p>While the <code>lxd</code> daemon controls the whole system, the client command for managing individual containers is <code>lxc</code>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-8"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-8" style="display:none;">$ lxc list
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+
| demo | RUNNING | 10.199.145.6 (eth0) | fd42:3147:bafe:37e5:216:3eff:fe01:8da8 (eth0) | CONTAINER | 0 |
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+
$ lxc delete demo --force</code><pre><code>$ lxc list
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+
| demo | RUNNING | 10.199.145.6 (eth0) | fd42:3147:bafe:37e5:216:3eff:fe01:8da8 (eth0) | CONTAINER | 0 |
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+
$ lxc delete demo --force</code></pre></div>
<p>Creating LXC-compatible container images is fairly simple. One can use any container builder to create the root filesystem (the <a class="link" href="https://ubuntu.com/tutorials/create-custom-lxd-images#3-creating-basic-system-installation" target="_blank" rel="noopener"
>LXC docs recommend using debootstrap</a> directly), and the basic metadata YAML file is so brief that it can be written by hand. These are then imported into LXC with <code>lxc image import metadata.tar.gz rootfs.tar.gz --alias demo</code>.</p>
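<p>As a hedged sketch, hand-building such an image could look like the following. The file contents and metadata values are made up for illustration; a real root filesystem would come from debootstrap or another container builder, and the final import step requires a running LXD, so it is shown only as a comment:</p>

```shell
# Hypothetical sketch: hand-building a minimal LXC/LXD-importable image.
# The rootfs below is a stub; populate it with debootstrap in real use.
set -e
work=$(mktemp -d)
cd "$work"

# 1. Root filesystem tarball (stub contents for illustration)
mkdir -p rootfs/etc
echo demo > rootfs/etc/hostname
tar -czf rootfs.tar.gz -C rootfs .

# 2. Brief metadata file, short enough to write by hand
cat > metadata.yaml <<'EOF'
architecture: x86_64
creation_date: 1679000000
properties:
  description: Hand-built demo image
  os: debian
  release: sid
EOF
tar -czf metadata.tar.gz metadata.yaml

# 3. Import into LXD (requires a running LXD; not executed here):
#    lxc image import metadata.tar.gz rootfs.tar.gz --alias demo
```

<p>The two tarballs are all LXD needs; everything else (networking, storage, UID maps) is configured on the host side.</p>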
<p>The whole LXD stack ships with integrated tooling — even offering metal-as-a-service (MAAS) capabilities — so it goes <em>way beyond</em> what the Docker stack offers.</p>
<h2 id="so-where-are-we-headed"><a href="#so-where-are-we-headed" class="header-anchor"></a>So, where are we headed?
</h2><p>To fully grasp how containers actually work, you should read the Linux kernel documentation on <a class="link" href="https://man7.org/linux/man-pages/man7/namespaces.7.html" target="_blank" rel="noopener"
>namespaces</a> and permission control via <a class="link" href="https://man7.org/linux/man-pages/man7/capabilities.7.html" target="_blank" rel="noopener"
>capabilities</a>. Keeping an eye on the progress of the <a class="link" href="https://opencontainers.org/" target="_blank" rel="noopener"
>Open Container Initiative</a> will keep you right on top of the latest developments, and considering OCI compatibility in your infrastructure will enable you to migrate between Docker, Podman, and LXD easily.</p>
<p>Choosing the right container technology to use depends on <em>where</em> you intend to ship your containers. For developers targeting Kubernetes compatible production environments, Podman probably makes the most sense at the moment. Or, if your infrastructure consists of a lot of virtualized Ubuntu hosts and you want to have more flexibility, LXD is probably a good choice.</p>
<p>Podman is certainly gaining a lot of popularity <a class="link" href="https://trends.google.com/trends/explore/TIMESERIES/1693782600?hl=en-US&tz=420&date=today+5-y&hl=en-CA&q=%2Fg%2F11j4j_npvw,lxc&sni=3" target="_blank" rel="noopener"
>according to Google Trends</a>. Docker will, however, continue to have the largest mindshare among average developers for years to come. For now, my recommendation is that all systems administrators and software architects try to understand how the tools they <em>rely on</em> actually work — <em>by getting their hands dirty with them</em>. Choose the solutions you understand best, and keep an eye on the horizon for what’s coming next!</p> The optimal home office https://optimizedbyotto.com/post/optimal-home-office-setup/Mon, 17 Apr 2023 00:00:00 +0000 https://optimizedbyotto.com/post/optimal-home-office-setup/ <img src="https://optimizedbyotto.com/post/optimal-home-office-setup/featured-home-office.jpg" alt="Featured image of post The optimal home office" /><p>The perfect home office setup achieves two things: it helps you <strong>stay focused</strong> for extended periods, allowing you to be in “the flow”, and it prioritizes ergonomic design to ensure that long hours at the computer <strong>don’t compromise your health</strong>.</p>
<h2 id="optimal-ergonomics"><a href="#optimal-ergonomics" class="header-anchor"></a>Optimal ergonomics
</h2><p>I’ve spent a lot of time and thought creating my personal setup so that it optimizes home office efficiency and inspiration. Here’s a glimpse into my workspace:</p>
<ul>
<li>
<p>At the center of my workspace is an <a class="link" href="https://www.amazon.ca/gp/product/B07SPHL3HF" target="_blank" rel="noopener"
>extra wide, curved monitor</a>. It sits on an elevated shelf, at eye level. This positioning keeps my neck straight, reducing tension in my shoulders.</p>
</li>
<li>
<p>The heart of my setup is an <a class="link" href="https://www.amazon.ca/gp/product/B08QJLCT7D" target="_blank" rel="noopener"
>adjustable electric desk</a>. I can set it to the perfect height, so my arms maintain a comfortable 90-degree angle. This eliminates slouching, saving my lower back from unnecessary stress.</p>
</li>
<li>
<p>Paired with my desk is a comfortable chair, adjustable to the optimal positions. The design ensures my feet and waist maintain the right angles. Plus, its wheels offer the flexibility to slide it away when I prefer standing.</p>
</li>
<li>
<p>At my fingertips is a wireless mouse and an <a class="link" href="https://www.amazon.ca/Microsoft-A11-00337-Natural-Keyboard-Elite/dp/B0000642RX" target="_blank" rel="noopener"
>ergonomic split keyboard</a>. This design allows my wrists to rest, and my elbows to angle outward at a comfy 45 degrees, reducing wrist pain or carpal tunnel syndrome.</p>
</li>
<li>
<p>I keep my workspace feeling fresh and at a cool, consistent temperature with an air conditioner. Additionally, a higher ceiling ensures ample airflow, while my laptop sits on a <a class="link" href="https://www.amazon.ca/Portable-Laptop-Cooling-Powered-Support/dp/B07WC31MHQ" target="_blank" rel="noopener"
>cooling fan</a>.</p>
</li>
<li>
<p>Positioned above the keyboard and behind the monitor is a strategically placed <a class="link" href="https://www.amazon.ca/gp/product/B08HMLKS2N" target="_blank" rel="noopener"
>light source</a>. This minimizes eye strain and can also be used to cast a flattering light on me during video calls. Speaking of which…</p>
</li>
<li>
<p>I use an <a class="link" href="https://www.amazon.ca/MEE-audio-1080p-Webcam-Light/dp/B08PYWL6T6" target="_blank" rel="noopener"
>external high-resolution camera equipped with a microphone</a> and supplemental lighting, so I’m always ready to take video calls. It’s easy to adjust, so I can look my best on screen.</p>
</li>
</ul>
<p>As you can see in the pictures, my office doesn’t have windows. A distant view can provide a restful break for the eyes, so having windows would be good, but natural light isn’t always optimal. A room illuminated solely by engineered light ensures I always have optimal conditions even into the wee hours, as I tend to be a night owl. To avoid disrupting your <a class="link" href="https://en.wikipedia.org/wiki/Circadian_rhythm" target="_blank" rel="noopener"
>circadian rhythm</a>, remember to use display settings that automatically decrease emitted blue light in the evenings.</p>
<p><img src="https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-overview.jpg"
width="1200"
height="1600"
srcset="https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-overview_hu12672435799749552238.jpg 480w, https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-overview.jpg 1200w"
loading="lazy"
alt="Home office (click for larger image)"
class="gallery-image"
data-flex-grow="75"
data-flex-basis="180px"
>
<img src="https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-drawer.jpg"
width="1200"
height="900"
srcset="https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-drawer_hu15858657560502272036.jpg 480w, https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-drawer.jpg 1200w"
loading="lazy"
alt="Desk drawer and built-in USB ports"
class="gallery-image"
data-flex-grow="133"
data-flex-basis="320px"
>
<img src="https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-air-conditioning.jpg"
width="1200"
height="1600"
srcset="https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-air-conditioning_hu4667689472114826547.jpg 480w, https://optimizedbyotto.com/post/optimal-home-office-setup/home-office-air-conditioning.jpg 1200w"
loading="lazy"
alt="Air conditioning"
class="gallery-image"
data-flex-grow="75"
data-flex-basis="180px"
>
</p>
<p><img src="https://optimizedbyotto.com/post/optimal-home-office-setup/laptop-cooler.jpg"
width="915"
height="791"
srcset="https://optimizedbyotto.com/post/optimal-home-office-setup/laptop-cooler_hu4473267889152095994.jpg 480w, https://optimizedbyotto.com/post/optimal-home-office-setup/laptop-cooler.jpg 915w"
loading="lazy"
alt="Elevated laptop stand with cooling fan"
class="gallery-image"
data-flex-grow="115"
data-flex-basis="277px"
>
<img src="https://optimizedbyotto.com/post/optimal-home-office-setup/quintis-monitor-lamp.jpg"
width="1462"
height="1225"
srcset="https://optimizedbyotto.com/post/optimal-home-office-setup/quintis-monitor-lamp_hu4715146806035682769.jpg 480w, https://optimizedbyotto.com/post/optimal-home-office-setup/quintis-monitor-lamp_hu11863886389729882159.jpg 1024w, https://optimizedbyotto.com/post/optimal-home-office-setup/quintis-monitor-lamp.jpg 1462w"
loading="lazy"
alt="Quntis computer monitor light"
class="gallery-image"
data-flex-grow="119"
data-flex-basis="286px"
>
<img src="https://optimizedbyotto.com/post/optimal-home-office-setup/MEE-CL8A-webcam.jpg"
width="630"
height="630"
srcset="https://optimizedbyotto.com/post/optimal-home-office-setup/MEE-CL8A-webcam_hu13311346569840562195.jpg 480w, https://optimizedbyotto.com/post/optimal-home-office-setup/MEE-CL8A-webcam.jpg 630w"
loading="lazy"
alt="MEE CL8A HD webcam with detachable tripod"
class="gallery-image"
data-flex-grow="100"
data-flex-basis="240px"
>
</p>
<h2 id="cable-management-and-docking"><a href="#cable-management-and-docking" class="header-anchor"></a>Cable management and docking
</h2><p>Ensuring cables are arranged efficiently, and accessibly, guarantees not just an aesthetically pleasing office but a functional one, too. Hidden from view (not pictured) is a metallic shelf beneath the desk with an extension cord boasting multiple electrical sockets and USB ports. This solitary extension powers the entire desk setup, with just a singular cord discreetly extending to the wall. Its design is flexible, moving in harmony as I adjust the desk’s height. A one-button power-off feature on the extension cord is particularly handy when I’m about to leave for a trip and want to quickly shut everything off.</p>
<p>My monitor isn’t just for visuals — it’s also a hub. Equipped with USB ports, it connects my keyboard, mouse, and monitor light. Docking my laptop is a breeze. A <strong>single USB-C cable</strong> handles both charging and connectivity to all peripheral devices (monitor, keyboard and mouse).</p>
<p>Integrated within my desk are two USB ports. One port feeds a USB-A multi-device converter that charges essentials like my phone and headset. Despite having numerous USB-powered gadgets like my laptop cooler, watch winder, and LED light strip on the table, I always have ports at the ready for anything else that needs a charge.</p>
<h2 id="staying-in-the-flow"><a href="#staying-in-the-flow" class="header-anchor"></a>Staying in “the flow”
</h2><p>Harnessing the elusive state of ‘flow’ is the holy grail of productivity for those engaged in creative endeavors. It’s that state of immersive focus where hours can feel like mere minutes and where our most significant accomplishments often emerge. Central to this deep concentration is the environment in which I work.</p>
<p>The foundation of my productivity temple is its isolation — a dedicated room free from distractions. This tranquility, essential for anyone looking to dive deep into their work, forms the cornerstone of my home office. I’ve chosen background lighting that sets the right mood to help ideas flow. Every item on my desk has been chosen intentionally and there is nothing extra. <strong>A clutter-free workspace translates to a clutter-free mind.</strong> With integrated USB outlets, a comprehensive cable management system, and a single power cord, I ensure my physical space remains pristine. A concealed drawer further houses miscellaneous items, keeping them out of sight but within reach.</p>
<p>The acoustics of my setup are the only thing that still needs work. The rug behind my computer and the carpet on the floor help cut down on echoes, but I want to add acoustic panels to minimize the echo further. Next time I can choose the room layout, I will try to avoid a wall directly behind my screen and narrow walls, as they create echo too easily.</p>
<p>Over time, I tried integrating scheduled breaks into my routine, using apps like <a class="link" href="https://workrave.org/" target="_blank" rel="noopener"
>Workrave</a> designed to prompt hourly pauses. However, these frequent interruptions shattered my flow state more than they helped. So, I’ve shifted my strategy. Instead of disrupting the rhythm, I ensure the ergonomics of my space support extended hours of comfortable, strain-free work.</p>
<h2 id="optimized-breaks"><a href="#optimized-breaks" class="header-anchor"></a>Optimized breaks
</h2><p>Thanks to having my laptop docked with a single USB-C cable, it is convenient to unplug and transition from a stationary work setting to a mobile one, be it another spot in my home or stepping out for a change of scenery.</p>
<p>The audio experience shouldn’t be tethered either. Investing in a high-quality <a class="link" href="https://www.amazon.ca/Sony-WH-CH720N-Cancelling-Headphones-Microphone/dp/B0BS1QCFHX" target="_blank" rel="noopener"
>Bluetooth headset equipped with noise-cancelling</a> capabilities is invaluable. While it’s an obvious tool during travel or cafe-work sessions, its utility extends to the home office too. With it, I can freely roam my living space during remote meetings, providing a much-needed break from prolonged sitting. </p>
<p>The importance of physical activity during break times can’t be overstated. Interspersing work hours with moments of physical exertion, whether through pull-ups or exercises with <a class="link" href="https://www.amazon.ca/gp/product/B08CMZMFCS" target="_blank" rel="noopener"
>parallettes</a>, combats desk-bound fatigue and invigorates the mind.</p>
<h2 id="working-out-of-office"><a href="#working-out-of-office" class="header-anchor"></a>Working out-of-office
</h2><p>One of the healthiest things we can do is keep active and go for a walk. On occasion, I’ve joined meetings with my headset on, walking outside, absorbing the content of the conversation and the rejuvenating effects of nature. Even though this may be limited to meetings that don’t require my visual attention, I find myself reenergized and coming to projects with fresh perspectives.</p>
<p>I can envision a not-so-distant future where technology may also play a role in this. Imagine augmented reality goggles paired with headsets that allow you to both view and participate in presentations as you walk a woodland path. Visualize crafting emails or designing projects using intuitive hand gestures while you’re perched on a park bench. This immersive, tech-driven experience could redefine our notion of a ‘workspace,’ transforming the outdoors into a potential office.</p>
<p>As technology continues to evolve and offers us new ways to work, the value of a dedicated workspace — especially a home office setup — remains undeniable. While the boundaries of ‘workspaces’ may expand and become more fluid in the future, the significance of a well-optimized home office is an investment that will stand the test of time. Here’s to many more productive days and better work habits!</p> How to make a good git commit https://optimizedbyotto.com/post/good-git-commit/Sun, 26 Mar 2023 00:00:00 +0000 https://optimizedbyotto.com/post/good-git-commit/ <img src="https://optimizedbyotto.com/post/good-git-commit/git-citool-example.png" alt="Featured image of post How to make a good git commit" /><p>As a software developer, your core skill is how to improve an existing code base to make the software better iteratively, <strong>patch by patch</strong>.</p>
<p>To be a good software developer you need to:</p>
<ul>
<li>understand the concept of a code patch,</li>
<li>know how to make code improvements in well-sized and properly documented patches, and</li>
<li>skillfully use <a class="link" href="https://git-scm.com/" target="_blank" rel="noopener"
>git version control software</a> to manage patches.</li>
</ul>
<h2 id="what-is-a-patch"><a href="#what-is-a-patch" class="header-anchor"></a>What is a patch?
</h2><p>A <strong>patch defines the changes</strong> to be made to a code base. It is basically a list of code lines to be added, removed or modified. Each patch also has an <strong>author</strong>, a <strong>timestamp</strong> of when it was written, a <strong>title</strong> that describes it and a longer text body that <strong>explains why</strong> this particular patch is good and why applying it to the code base is beneficial.</p>
<p>Example:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">patch</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">Author: Otto Kekäläinen
Date: June 22nd, 2022 08:08:08
Make output friendlier for users
Add line break so text is readable and add a 2 second delay between
messages so it does not scroll too fast.
--- a/demo.c
+++ b/demo.c
@@ -8,7 +8,8 @@ int main()
{
for(;;)
{
- printf("Hello world!");
+ printf("Hello world!\n");
+ sleep(2);
}
return 0;
}</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-patch" data-lang="patch"><span style="display:flex;"><span>Author: Otto Kekäläinen
</span></span><span style="display:flex;"><span>Date: June 22nd, 2022 08:08:08
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Make output friendlier for users
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Add line break so text is readable and add a 2 second delay between
</span></span><span style="display:flex;"><span>messages so it does not scroll too fast.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">--- a/demo.c
</span></span></span><span style="display:flex;"><span><span style="color:#f92672"></span><span style="color:#a6e22e">+++ b/demo.c
</span></span></span><span style="display:flex;"><span><span style="color:#a6e22e"></span><span style="color:#75715e">@@ -8,7 +8,8 @@ int main()
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> {
</span></span><span style="display:flex;"><span> for(;;)
</span></span><span style="display:flex;"><span> {
</span></span><span style="display:flex;"><span><span style="color:#f92672">- printf("Hello world!");
</span></span></span><span style="display:flex;"><span><span style="color:#f92672"></span><span style="color:#a6e22e">+ printf("Hello world!\n");
</span></span></span><span style="display:flex;"><span><span style="color:#a6e22e">+ sleep(2);
</span></span></span><span style="display:flex;"><span><span style="color:#a6e22e"></span> }
</span></span><span style="display:flex;"><span> return 0;
</span></span><span style="display:flex;"><span> }
</span></span></code></pre></div></div></div>
<h2 id="how-to-make-a-patch"><a href="#how-to-make-a-patch" class="header-anchor"></a>How to make a patch
</h2><p>You can make a patch by simply copying a file, changing something in it, and then comparing the unchanged copy to the modified file using the <a class="link" href="https://manpages.debian.org/unstable/diffutils/diff.1.en.html" target="_blank" rel="noopener"
>command diff</a> and saving the output.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">$ cp demo.c demo.c.orig
$ nano demo.c
$ diff -u demo.c.orig demo.c > demo.patch
$ cat demo.patch</code><pre><code>$ cp demo.c demo.c.orig
$ nano demo.c
$ diff -u demo.c.orig demo.c > demo.patch
$ cat demo.patch</code></pre></div>
<div class="codeblock ">
<header>
<span class="codeblock-lang">patch</span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">--- demo.c.orig
+++ demo.c
@@ -8,7 +8,8 @@ int main()
{
for(;;)
{
- printf("Hello world!");
+ printf("Hello world!\n");
+ sleep(2);
}
return 0;
}</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-patch" data-lang="patch"><span style="display:flex;"><span><span style="color:#f92672">--- demo.c.orig
</span></span></span><span style="display:flex;"><span><span style="color:#f92672"></span><span style="color:#a6e22e">+++ demo.c
</span></span></span><span style="display:flex;"><span><span style="color:#a6e22e"></span><span style="color:#75715e">@@ -8,7 +8,8 @@ int main()
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> {
</span></span><span style="display:flex;"><span> for(;;)
</span></span><span style="display:flex;"><span> {
</span></span><span style="display:flex;"><span><span style="color:#f92672">- printf("Hello world!");
</span></span></span><span style="display:flex;"><span><span style="color:#f92672"></span><span style="color:#a6e22e">+ printf("Hello world!\n");
</span></span></span><span style="display:flex;"><span><span style="color:#a6e22e">+ sleep(2);
</span></span></span><span style="display:flex;"><span><span style="color:#a6e22e"></span> }
</span></span><span style="display:flex;"><span> return 0;
</span></span><span style="display:flex;"><span> }
</span></span></code></pre></div></div></div>
<p>The patch can be sent by email or uploaded somewhere. After that, anybody can download the patch, read it, and apply it to their copy of the code base using the <a class="link" href="https://manpages.debian.org/unstable/patch/patch.1.en.html" target="_blank" rel="noopener"
>command patch</a>.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">$ grep Hello demo.c
printf("Hello world!");
$ curl -O https://…/demo.patch
$ patch -p0 < demo.patch
patching file demo.c
$ grep Hello demo.c
printf("Hello world!\n");</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ grep Hello demo.c
</span></span><span style="display:flex;"><span> printf<span style="color:#f92672">(</span><span style="color:#e6db74">"Hello world!"</span><span style="color:#f92672">)</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>$ curl -O https://…/demo.patch
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>$ patch -p0 < demo.patch
</span></span><span style="display:flex;"><span>patching file demo.c
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>$ grep Hello demo.c
</span></span><span style="display:flex;"><span> printf<span style="color:#f92672">(</span><span style="color:#e6db74">"Hello world!\n"</span><span style="color:#f92672">)</span>;</span></span></code></pre></div></div></div>
<p>As this is neither very fast nor convenient, software developers like to use git, version control software that automates all of this. In git, we tend to talk about <strong>commits</strong>, which basically means <strong>patches that have been applied to a code base</strong>.</p>
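<p>To illustrate (with invented file names and messages), the manual <code>diff</code>/<code>patch</code> round-trip above maps onto git commands roughly like this sketch, assuming GNU tools:</p>

```shell
# Sketch: the git equivalents of the manual diff/patch workflow.
# All file names and commit messages are invented for illustration.
set -e
dir=$(mktemp -d)
cd "$dir" && git init -q demo && cd demo
git config user.name "Demo" && git config user.email "demo@example.com"
printf 'printf("Hello world!");\n' > demo.c
git add demo.c && git commit -q -m "Add demo program"

sed -i 's/world!/world!\\n/' demo.c   # edit the file
git diff > ../demo.patch              # git produces the unified diff for us
git checkout -- demo.c                # undo the edit...
git apply ../demo.patch               # ...and re-apply it from the patch
git commit -aq -m "Make output friendlier for users"
git log --oneline
```

<p>The commit recorded at the end is essentially the earlier patch plus its metadata: author, timestamp, title and body.</p>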
<h2 id="examples-of-good-git-commit-messages"><a href="#examples-of-good-git-commit-messages" class="header-anchor"></a>Examples of good git commit messages
</h2><p>A good git commit message typically has these characteristics (adapted from the <a class="link" href="https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines" target="_blank" rel="noopener"
>Git Pro book</a>):</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">Capitalized, short summary of what the change is
More detailed explanatory text that focuses on the 'why' to motivate
the change. Use present tense and imperative format (write "Fix bug",
not "Fixed bug"). Wrap it to about 72 characters or so. The blank line
separating the summary from the body is critical.
Further paragraphs come after blank lines.
- Bullet points are okay, too
- Typically a hyphen or asterisk is used for the bullet, followed by a
single space, with blank lines in between, but conventions vary here
- Use a hanging indent</code><pre><code>Capitalized, short summary of what the change is
More detailed explanatory text that focuses on the 'why' to motivate
the change. Use present tense and imperative format (write "Fix bug",
not "Fixed bug"). Wrap it to about 72 characters or so. The blank line
separating the summary from the body is critical.
Further paragraphs come after blank lines.
- Bullet points are okay, too
- Typically a hyphen or asterisk is used for the bullet, followed by a
single space, with blank lines in between, but conventions vary here
- Use a hanging indent</code></pre></div>
<p>Here are a couple of real-world examples in pure text form:</p>
<p>From <a class="link" href="https://github.com/MariaDB/server/commit/ff1d8fa7b0fe473a6bacd23ac553e711b3a11032" target="_blank" rel="noopener"
>MariaDB@ff1d8fa7</a>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">Deb: Clean away Buster to Bookworm upgrade tests in Salsa-CI

Upgrades from Debian 10 "Buster" directly to Debian 12 "Bookworm",
skipping Debian 11 "Bullseye", fail with apt erroring on:
libcrypt.so.1: cannot open shared object file

This is an intentional OpenSSL transition as described in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993755</code><pre><code>Deb: Clean away Buster to Bookworm upgrade tests in Salsa-CI

Upgrades from Debian 10 "Buster" directly to Debian 12 "Bookworm",
skipping Debian 11 "Bullseye", fail with apt erroring on:
libcrypt.so.1: cannot open shared object file

This is an intentional OpenSSL transition as described in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993755</code></pre></div>
<p>From <a class="link" href="https://github.com/MariaDB/server/commit/2c5294142382469a9ad48c44979b7fcb7b146417" target="_blank" rel="noopener"
>MariaDB@2c529441</a>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-6"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-6" style="display:none;">Deb: Run wrap-and-sort -av

Sort and organize the Debian packaging files.
Also revert 4d03269 that was done in vain.</code><pre><code>Deb: Run wrap-and-sort -av

Sort and organize the Debian packaging files.
Also revert 4d03269 that was done in vain.</code></pre></div>
<h2 id="five-requirements-for-a-good-git-commit"><a href="#five-requirements-for-a-good-git-commit" class="header-anchor"></a>Five requirements for a good git commit
</h2><p>In order of importance:</p>
<h3 id="1-commits-should-be-atomic"><a href="#1-commits-should-be-atomic" class="header-anchor"></a>1. Commits should be <em>atomic</em>
</h3><p>The first and most important thing about <strong>a good patch or a commit</strong> is that it <strong>should be a <a class="link" href="https://git-scm.com/docs/gitworkflows#_separate_changes" target="_blank" rel="noopener"
>self-standing change</a></strong>. If a commit fixes a bug, it should not at the same time add a new feature or fix some other completely unrelated bug; otherwise it is not <em>atomic</em>. If you add a new feature, the same commit should ideally also add automated tests for the feature to ensure it won’t regress, and the same commit should update the documentation to mention the feature, as it is all related and should go (or not go) into the code base together with the feature itself.</p>
<p>If your changes are not properly scoped and self-standing, you might end up in a situation later on where somebody decides to revert or reject the commit that introduced a new feature, but miss removing the tests or documentation about it, which would not have happened if they were added in separate commits.</p>
<p>There is no clear rule on what the optimal scope for a commit is; it is something you will learn by experience. Sometimes it makes sense to have several separate changes in one single commit simply because each of them is so small. In other cases, one single logical change might span multiple commits, because it was perhaps clearer to move or rename files in one commit and then update their contents in another.</p>
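<p>As a minimal sketch (file names and commit messages are hypothetical), staging related files together and committing them separately is what keeps each commit atomic:</p>

```shell
# Throwaway repository demonstrating two atomic commits instead of one
# mixed commit; all names here are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo 'fix' > parser.c
echo 'test' > parser_test.c
echo 'notes' > README
# The bug fix and its test belong together in one commit:
git add parser.c parser_test.c
git commit -q -m "Fix off-by-one in token parser"
# The unrelated documentation change goes into its own commit:
git add README
git commit -q -m "Document parser limitations"
git log --oneline
```

<p>If the parser fix is later reverted, the documentation commit can stay or go independently.</p>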
<h3 id="2-the-title-should-be-descriptive-yet-terse-and-not-too-long"><a href="#2-the-title-should-be-descriptive-yet-terse-and-not-too-long" class="header-anchor"></a>2. The title should be descriptive, yet terse and not too long
</h3><p>A title starts with a capital letter and has no trailing dot, just like the subject line in an email. The title should make sense when read in a list of commits. If the title is too long, it will be cut off. A limit of 72 characters is safe in all the typical places where people will read it, such as a terminal window or when browsing GitHub or GitLab, but striving for under 50 characters is even better.</p>
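<p>One quick way to audit titles, sketched here on a throwaway repository, is to filter commit subjects by length:</p>

```shell
# Demo repository with one short and one overlong commit title:
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Add feature flag for slow query log"
git commit -q --allow-empty -m "This subject line is deliberately written to be far too long to fit the seventy-two character safe limit"
# Print any commit subject exceeding 72 characters:
git log --format='%s' | awk 'length($0) > 72'
```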
<h3 id="3-the-commit-message-should-explain-why-it-was-made"><a href="#3-the-commit-message-should-explain-why-it-was-made" class="header-anchor"></a>3. The commit message should explain <em>why</em> it was made
</h3><p>The text should be verbose enough for anybody reviewing the commit to understand <strong>why</strong> it was made, and to be convinced that the change is good. Every commit must have a text body, even if it is very short. This forces the author to spend a few seconds thinking about the change before committing.</p>
<p>Note that the commit message is about the change itself, so it should answer the question ‘why’. If you want to explain how a certain line of code works, simply use an inline comment next to the code itself. That way, the <em>documentation is in the correct context</em>. <strong>The git commit description should have just a tiny bit of ‘what’ and ‘how’, and mostly focus on the ‘why’.</strong></p>
<p>The commit body should be wrapped at about 72 characters. Proper use of empty lines and lists that are indented with a dash or star makes the body more readable.</p>
<p>Remember to use the imperative format. Don’t write <em>Fixed bug</em> or <em>Added feature</em>. Instead write <em>Fix bug</em> or <em>Add feature</em>. The patch hasn’t added or fixed anything at the time you write it. Think of it as an order you give to the code base to start following. Also, keep your text in the present tense and imperative form. Don’t write <em>This commit makes X</em> but simply <em>Make X</em>. Don’t write <em>I changed Y</em> but simply <em>Change Y</em>.</p>
<h3 id="4-use-references-when-available"><a href="#4-use-references-when-available" class="header-anchor"></a>4. Use references when available
</h3><p>If your code change is related to a previous commit, mention the commit ID. In most code hosting tools, commit IDs will automatically become links. If the code change is related to something that was discussed or tracked elsewhere, include the bug tracker ID or a URL to the discussion. However, the reference alone does not remove the need to write a git commit message. You cannot expect that somebody reading your commit has time or even access to open and read all references - use them only as pointers for more information.</p>
<p>Viewing one of the earlier git commit message examples in <a class="link" href="https://git-scm.com/docs/gitk" target="_blank" rel="noopener"
>gitk</a>, <a class="link" href="https://gitlab.com/ottok/mariadb/-/commit/2c5294142382469a9ad48c44979b7fcb7b146417" target="_blank" rel="noopener"
>GitLab</a> and <a class="link" href="https://github.com/MariaDB/server/commit/ff1d8fa7b0fe473a6bacd23ac553e711b3a11032" target="_blank" rel="noopener"
>GitHub</a> illustrates how a git commit automatically becomes a link:</p>
<p><img src="https://optimizedbyotto.com/post/good-git-commit/good-git-commit-gitk-gitlab-github.png"
width="1440"
height="424"
srcset="https://optimizedbyotto.com/post/good-git-commit/good-git-commit-gitk-gitlab-github_hu15316212958483157633.png 480w, https://optimizedbyotto.com/post/good-git-commit/good-git-commit-gitk-gitlab-github_hu17365610380609069757.png 1024w, https://optimizedbyotto.com/post/good-git-commit/good-git-commit-gitk-gitlab-github.png 1440w"
loading="lazy"
alt="Same commit in gitk, GitLab and GitHub"
class="gallery-image"
data-flex-grow="339"
data-flex-basis="815px"
>
</p>
<h3 id="5-maintain-correct-authorship-and-copyright-credits"><a href="#5-maintain-correct-authorship-and-copyright-credits" class="header-anchor"></a>5. Maintain correct authorship and copyright credits
</h3><p>The author name and timestamp are automatic if you configure git correctly, so this should be a non-issue. If you neglect to configure git with your real name and email, you will be muddying the waters for anybody who later wants to verify something about authorship. In the worst-case scenario, all your commits might be purged from the git repository due to unclear copyright.</p>
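<p>Configuring the identity is a one-time step. The sketch below uses a throwaway repository and placeholder values; normally you would set these once per machine with <code>--global</code>:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity; use your real name and email here:
git config user.name "Otto Example"
git config user.email "otto@example.com"
# Verify what git will record as the author of new commits:
git config user.name
git config user.email
```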
<p>Also keep in mind that if you commit code on behalf of somebody else, you must tell git that the author for a particular commit was somebody else and you only committed it. Read up on <a class="link" href="https://git-scm.com/docs/git-commit#Documentation/git-commit.txt---authorltauthorgt" target="_blank" rel="noopener"
>git commit –author</a> for details.</p>
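<p>A minimal sketch of committing on behalf of someone else (names and the commit message are placeholders):</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Committer Name"
git config user.email "committer@example.com"
# Record the original patch author while git keeps you as the committer:
git commit -q --allow-empty \
  --author="Original Author <author@example.com>" \
  -m "Fix null pointer check in input validation"
git log -1 --format='author: %an, committer: %cn'
```

<p>The last command prints <code>author: Original Author, committer: Committer Name</code>, showing that both identities are preserved in the commit.</p>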
<h2 id="the-right-tools-make-git-commits-easy"><a href="#the-right-tools-make-git-commits-easy" class="header-anchor"></a>The right tools make git commits easy
</h2><p>Using a good tool to craft your git commits goes a long way in making the commit flawless.</p>
<p>My personal choice is <a class="link" href="https://github.com/prati0100/git-gui/" target="_blank" rel="noopener"
>git-citool</a>, which is distributed together with git itself, so anybody can use it on any operating system. It does not use the native graphics of each operating system, but a cross-platform graphics library, which may look a bit ugly. It is, however, very easy and convenient to use, so I love it.</p>
<p>To make a new commit, simply run <code>git citool</code>. It starts off empty, and you can then select which files you want to stage, write the git commit message, and press Commit. Super easy, and it is very clear what changes you are committing.</p>
<p><img src="https://optimizedbyotto.com/post/good-git-commit/git-citool-example.png"
width="1317"
height="617"
srcset="https://optimizedbyotto.com/post/good-git-commit/git-citool-example_hu15856864830268359891.png 480w, https://optimizedbyotto.com/post/good-git-commit/git-citool-example_hu18299217684725536277.png 1024w, https://optimizedbyotto.com/post/good-git-commit/git-citool-example.png 1317w"
loading="lazy"
alt="git citool example"
class="gallery-image"
data-flex-grow="213"
data-flex-basis="512px"
>
</p>
<h2 id="dont-settle-for-bad-commits---amend-them"><a href="#dont-settle-for-bad-commits---amend-them" class="header-anchor"></a>Don’t settle for bad commits - amend them
</h2><p>If you are not happy with your commit and want to edit it, or to use git terminology, “amend” it, this is possible only for the topmost git commit that does not yet have any child commits. Run <code>git citool --amend</code>.</p>
<p>Even a git commit that is really bad and badly needs fixing is easy and fast to repair with git citool.</p>
<h2 id="wip-commits-how-to-avoid-postponing-writing-the-perfect-git-commit-message"><a href="#wip-commits-how-to-avoid-postponing-writing-the-perfect-git-commit-message" class="header-anchor"></a>WIP commits: how to avoid postponing writing the perfect git commit message
</h2><p>Remember that you don’t have to make a perfect git commit right off the bat. Do it only once you know what you actually want to write in it. While still working on the code and saving intermediate versions of it, I recommend using WIP commits where the title is simply <em>WIP</em>, or if you already have some commit text draft, prefix the title with <em>WIP:</em>.</p>
<h3 id="use-git-rebase--i-frequently"><a href="#use-git-rebase--i-frequently" class="header-anchor"></a>Use <code>git rebase -i</code> frequently
</h3><p>When you are done with WIP commits, you can run <a class="link" href="https://git-scm.com/docs/git-rebase" target="_blank" rel="noopener"
>git rebase -i</a> to squash them together and write the final git commit message.</p>
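<p>One non-interactive way to sketch the squash step is to mark WIP commits as fixups and let <code>--autosquash</code> arrange the rebase todo list; setting <code>GIT_SEQUENCE_EDITOR=true</code> accepts the generated plan as-is:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"
echo 'v1' > feature.c
git add feature.c
git commit -q -m "Add feature X"
echo 'v2' > feature.c
git add feature.c
git commit -q --fixup HEAD          # WIP-style follow-up commit
# Squash the fixup into "Add feature X" without opening an editor:
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2
git log --format='%s'
```

<p>After the rebase, the history contains only "Add feature X" and "Initial commit"; the WIP change has been melded into its parent.</p>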
<p>For a visual explanation see the presentation <a class="link" href="https://www.youtube.com/watch?v=1NoNTqank_U&t=76s" target="_blank" rel="noopener"
>A Branch in Time (a story about revision histories)</a> by Tekin Süleyman on how to use interactive rebase in git and why rebasing and amending commits will end up making the code quality better in the long run.</p>
<div class="video-wrapper">
<iframe loading="lazy"
src="https://www.youtube.com/embed/1NoNTqank_U?start=76"
allowfullscreen
title="YouTube Video"
>
</iframe>
</div>
<h2 id="a-polished-git-commit-is-always-worth-the-effort"><a href="#a-polished-git-commit-is-always-worth-the-effort" class="header-anchor"></a>A polished git commit is always worth the effort
</h2><p>Someone who is lazy might say that while they agree with the principles, they don’t have time to follow them. To that I respond that <strong>doing things correctly from the onset actually saves time down the road</strong>.</p>
<ul>
<li>
<p><strong>If your git commits are good, the job of the reviewer will be much easier.</strong> They won’t waste time on just trying to understand your change, but they will get it directly and will be able to focus their energy on actually reviewing and spotting flaws in your code. If you avoid shipping a bug, you save a lot of work not having to debug, write a fix and ship a new release.</p>
</li>
<li>
<p><strong>A great git commit is also useful even if it later turns out the commit had a bug</strong>, because whoever fixes that bug will have a much easier time reading in the commit what the change was supposed to do, understanding where it fell short, and then making the same change in the correct way. This leads to bugs being fixed much more quickly and with less effort - and most often the person doing the fix is a future you who no longer remembers what the present you was thinking while making that commit. A good commit message saves the future you from having to stare at the diff until it makes sense.</p>
</li>
<li>
<p><strong>You don’t have to rewrite anything when it comes time to submit the commit for review</strong>. Every single code review system I have ever used will automatically use the commit title and message as the review title and message if the review is a single commit review.</p>
</li>
</ul>
<h2 id="now-go-and-build-great-software--patch-by-patch"><a href="#now-go-and-build-great-software--patch-by-patch" class="header-anchor"></a>Now go and build great software – patch by patch!
</h2><p>Now you know how to make a good git commit message. If you are proud of your work and like doing things well, you will follow these guidelines. To further learn how to polish your git commit messages, see also the post on <a class="link" href="https://optimizedbyotto.com/post/git-commit-message-examples/" >git commit messages by example</a>.</p>
<p>The Linux kernel developer’s guide also has an excellent description of <a class="link" href="https://docs.kernel.org/process/submitting-patches.html#separate-your-changes" target="_blank" rel="noopener"
>how to separate changes into self-standing logical changes</a>, and <a class="link" href="https://docs.kernel.org/process/submitting-patches.html#describe-your-changes" target="_blank" rel="noopener"
>how to describe the changes</a>.</p> Quick builds and rebuilds of MariaDB using Docker https://optimizedbyotto.com/post/quick-builds-and-rebuilds-of-mariadb-with-docker/Sun, 12 Mar 2023 00:00:00 +0000 https://optimizedbyotto.com/post/quick-builds-and-rebuilds-of-mariadb-with-docker/ <img src="https://optimizedbyotto.com/post/quick-builds-and-rebuilds-of-mariadb-with-docker/mariadb-server-atom-autosave-entr-demo.gif" alt="Featured image of post Quick builds and rebuilds of MariaDB using Docker" /><p>The MariaDB server has over 2 million lines of code. Downloading, compiling (and re-compiling), and running the test suite can potentially consume a lot of time away from actually making the code changes and being productive. Knowing a few simple shortcuts can help avoid wasting time.</p>
<p>While the official build instructions on <a class="link" href="https://mariadb.org/get-involved/getting-started-for-developers/get-code-build-test/" target="_blank" rel="noopener"
>mariadb.org</a> and <a class="link" href="https://mariadb.com/kb/en/generic-build-instructions/" target="_blank" rel="noopener"
>mariadb.com/kb</a> are useful to read, there are ways to make the build (and rebuild) significantly faster and more efficient.</p>
<blockquote>
<h2 id="tldr-for-debianubuntu-users"><a href="#tldr-for-debianubuntu-users" class="header-anchor"></a>TL;DR for Debian/Ubuntu users
</h2><p>Get the latest MariaDB 11.0 source code, install build dependencies, configure, build and run test suite to validate binaries work:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">mkdir quick-rebuilds
cd quick-rebuilds
git clone --branch 11.0 --shallow-since=3m \
--recurse-submodules --shallow-submodules \
https://github.com/MariaDB/server.git mariadb-server
mkdir -p ccache build data
docker run --interactive --tty --rm -v ${PWD}:/quick-rebuilds \
-w /quick-rebuilds debian:sid bash
echo 'deb-src https://deb.debian.org/debian sid main' \
> /etc/apt/sources.list.d/deb-src-sid.list
apt update
apt install -y --no-install-recommends \
devscripts equivs ccache eatmydata ninja-build clang entr moreutils
mk-build-deps -r -i mariadb-server/debian/control \
-t 'apt-get -y -o Debug::pkgProblemResolver=yes --no-install-recommends'
export CCACHE_DIR=$PWD/ccache
export CXX=${CXX:-clang++}
export CC=${CC:-clang}
export CXX_FOR_BUILD=${CXX_FOR_BUILD:-clang++}
export CC_FOR_BUILD=${CC_FOR_BUILD:-clang}
export CFLAGS='-Wno-unused-command-line-argument'
export CXXFLAGS='-Wno-unused-command-line-argument'
cmake -S mariadb-server/ -B build/ -G Ninja --fresh \
-DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache \
-DPLUGIN_COLUMNSTORE=NO -DPLUGIN_ROCKSDB=NO -DPLUGIN_S3=NO \
-DPLUGIN_MROONGA=NO -DPLUGIN_CONNECT=NO -DPLUGIN_TOKUDB=NO \
-DPLUGIN_PERFSCHEMA=NO -DWITH_WSREP=OFF
eatmydata cmake --build build/
./build/mysql-test/mysql-test-run.pl --force --parallel=auto</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mkdir quick-rebuilds
</span></span><span style="display:flex;"><span>cd quick-rebuilds
</span></span><span style="display:flex;"><span>git clone --branch 11.0 --shallow-since<span style="color:#f92672">=</span>3m <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --recurse-submodules --shallow-submodules <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> https://github.com/MariaDB/server.git mariadb-server
</span></span><span style="display:flex;"><span>mkdir -p ccache build data
</span></span><span style="display:flex;"><span>docker run --interactive --tty --rm -v <span style="color:#e6db74">${</span>PWD<span style="color:#e6db74">}</span>:/quick-rebuilds <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -w /quick-rebuilds debian:sid bash
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">'deb-src https://deb.debian.org/debian sid main'</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> > /etc/apt/sources.list.d/deb-src-sid.list
</span></span><span style="display:flex;"><span>apt update
</span></span><span style="display:flex;"><span>apt install -y --no-install-recommends <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> devscripts equivs ccache eatmydata ninja-build clang entr moreutils
</span></span><span style="display:flex;"><span>mk-build-deps -r -i mariadb-server/debian/control <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -t <span style="color:#e6db74">'apt-get -y -o Debug::pkgProblemResolver=yes --no-install-recommends'</span>
</span></span><span style="display:flex;"><span>export CCACHE_DIR<span style="color:#f92672">=</span>$PWD/ccache
</span></span><span style="display:flex;"><span>export CXX<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>CXX<span style="color:#66d9ef">:-</span>clang++<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>export CC<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>CC<span style="color:#66d9ef">:-</span>clang<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>export CXX_FOR_BUILD<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>CXX_FOR_BUILD<span style="color:#66d9ef">:-</span>clang++<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>export CC_FOR_BUILD<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>CC_FOR_BUILD<span style="color:#66d9ef">:-</span>clang<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>export CFLAGS<span style="color:#f92672">=</span><span style="color:#e6db74">'-Wno-unused-command-line-argument'</span>
</span></span><span style="display:flex;"><span>export CXXFLAGS<span style="color:#f92672">=</span><span style="color:#e6db74">'-Wno-unused-command-line-argument'</span>
</span></span><span style="display:flex;"><span>cmake -S mariadb-server/ -B build/ -G Ninja --fresh <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -DCMAKE_CXX_COMPILER_LAUNCHER<span style="color:#f92672">=</span>ccache -DCMAKE_C_COMPILER_LAUNCHER<span style="color:#f92672">=</span>ccache <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -DPLUGIN_COLUMNSTORE<span style="color:#f92672">=</span>NO -DPLUGIN_ROCKSDB<span style="color:#f92672">=</span>NO -DPLUGIN_S3<span style="color:#f92672">=</span>NO <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -DPLUGIN_MROONGA<span style="color:#f92672">=</span>NO -DPLUGIN_CONNECT<span style="color:#f92672">=</span>NO -DPLUGIN_TOKUDB<span style="color:#f92672">=</span>NO <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -DPLUGIN_PERFSCHEMA<span style="color:#f92672">=</span>NO -DWITH_WSREP<span style="color:#f92672">=</span>OFF
</span></span><span style="display:flex;"><span>eatmydata cmake --build build/
</span></span><span style="display:flex;"><span>./build/mysql-test/mysql-test-run.pl --force --parallel<span style="color:#f92672">=</span>auto</span></span></code></pre></div></div></div>
<p>To rebuild after code change simply run:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">eatmydata cmake --build build/</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>eatmydata cmake --build build/</span></span></code></pre></div></div></div>
<p>For full details, read the whole article.</p>
</blockquote>
<h2 id="stay-organized-keep-directories-clean"><a href="#stay-organized-keep-directories-clean" class="header-anchor"></a>Stay organized, keep directories clean
</h2><p>The first step is to create the working directory and a few directories inside it:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">mkdir quick-rebuilds
cd quick-rebuilds
mkdir -p ccache build data</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mkdir quick-rebuilds
</span></span><span style="display:flex;"><span>cd quick-rebuilds
</span></span><span style="display:flex;"><span>mkdir -p ccache build data</span></span></code></pre></div></div></div>
<p>The directory <code>ccache</code> will be used by the tool with the same name to store build cache permanently. Build artifacts will be output in the directory <code>build</code> to avoid polluting the source code directory so that Git in the source tree will not accidentally commit any machine-generated files. The <code>data</code> directory is useful for temporary test installs.</p>
<p>The next step is to get the source code into this working directory.</p>
<h2 id="dont-download-the-whole-project--use-shallow-git-clone"><a href="#dont-download-the-whole-project--use-shallow-git-clone" class="header-anchor"></a>Don’t download the whole project – use shallow Git clone
</h2><p>The oldest Git commit in the project is from <a class="link" href="https://github.com/MariaDB/server/commit/7eec25e393727b16bb916b50d82b0aa3084e065c" target="_blank" rel="noopener"
>July 2000</a>. Since then, MariaDB has had nearly 200 000 commits. To build the latest version and perhaps submit a Pull Request with your improvement to the project, you don’t necessarily need to have all those 200 000 commits available in your Git clone. You can use <a class="link" href="https://git-scm.com/docs/shallow" target="_blank" rel="noopener"
>shallow Git clone</a> to, for example, fetch only the history of the past 3 months:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">$ git clone --branch 11.0 --shallow-since=3m \
--recurse-submodules --shallow-submodules \
https://github.com/MariaDB/server.git mariadb-server
Cloning into 'mariadb-server'...
remote: Enumerating objects: 41075, done.
remote: Counting objects: 100% (41075/41075), done.
remote: Compressing objects: 100% (29333/29333), done.
remote: Total 41075 (delta 19706), reused 20092 (delta 10708), pack-reused 0
Receiving objects: 100% (41075/41075), 75.85 MiB | 8.48 MiB/s, done.
Resolving deltas: 100% (19706/19706), done.
Checking out files: 100% (24070/24070), done.
Submodule 'extra/wolfssl/wolfssl' (https://github.com/wolfSSL/wolfssl.git) registered for path 'extra/wolfssl/wolfssl'
Submodule 'libmariadb' (https://github.com/MariaDB/mariadb-connector-c.git) registered for path 'libmariadb'
Submodule 'storage/columnstore/columnstore' (https://github.com/mariadb-corporation/mariadb-columnstore-engine.git) registered for path 'storage/columnstore/columnstore'
Submodule 'storage/maria/libmarias3' (https://github.com/mariadb-corporation/libmarias3.git) registered for path 'storage/maria/libmarias3'
Submodule 'storage/rocksdb/rocksdb' (https://github.com/facebook/rocksdb.git) registered for path 'storage/rocksdb/rocksdb'
Submodule 'wsrep-lib' (https://github.com/codership/wsrep-lib.git) registered for path 'wsrep-lib'
Cloning into '/srv/sources/mariadb/quick-rebuilds/mariadb-server/extra/wolfssl/wolfssl'...
remote: Enumerating objects: 2851, done.
remote: Counting objects: 100% (2851/2851), done.
remote: Compressing objects: 100% (2124/2124), done.
remote: Total 2851 (delta 800), reused 1576 (delta 589), pack-reused 0
Receiving objects: 100% (2851/2851), 20.91 MiB | 10.43 MiB/s, done.
Resolving deltas: 100% (800/800), done.
...
Unpacking objects: 100% (3/3), done.
From https://github.com/codership/wsrep-API
* branch 694d6ca47f5eec7873be99b7d6babccf633d1231 -> FETCH_HEAD
Submodule path 'wsrep-lib/wsrep-API/v26': checked out '694d6ca47f5eec7873be99b7d6babccf633d1231'
$ git -C mariadb-server/ show --oneline --summary
f2dc4d4c (HEAD -> 11.0, origin/HEAD, origin/11.0) MDEV-30673 InnoDB recovery hangs when buf_LRU_get_free_block
$ git -C mariadb-server submodule
4fbd4fd36a21efd9d1a7e17aba390e91c78693b1 extra/wolfssl/wolfssl (4fbd4fd)
12bd1d5511fc2ff766ff6256c71b79a95739533f libmariadb (12bd1d5)
8b032853b7a200d9af4d468ac58bb9f4b6ac7040 storage/columnstore/columnstore (8b03285)
3846890513df0653b8919bc45a7600f9b55cab31 storage/maria/libmarias3 (3846890)
bba5e7bc21093d7cfa765e1280a7c4fdcd284288 storage/rocksdb/rocksdb (bba5e7b)
275a0af8c5b92f0ee33cfe9e23f3db5f59b56e9d wsrep-lib (275a0af)
$ du -shc mariadb-server/.git/modules/{storage/*,extra/wolfssl,libmariadb,wsrep-lib} \
mariadb-server/.git mariadb-server/
30M mariadb-server/.git/modules/storage/columnstore
1M mariadb-server/.git/modules/storage/maria
20M mariadb-server/.git/modules/storage/rocksdb
40M mariadb-server/.git/modules/extra/wolfssl
2M mariadb-server/.git/modules/libmariadb
1M mariadb-server/.git/modules/wsrep-lib
80M mariadb-server/.git
548M mariadb-server/
=720M total</code><pre><code>$ git clone --branch 11.0 --shallow-since=3m \
--recurse-submodules --shallow-submodules \
https://github.com/MariaDB/server.git mariadb-server
Cloning into 'mariadb-server'...
remote: Enumerating objects: 41075, done.
remote: Counting objects: 100% (41075/41075), done.
remote: Compressing objects: 100% (29333/29333), done.
remote: Total 41075 (delta 19706), reused 20092 (delta 10708), pack-reused 0
Receiving objects: 100% (41075/41075), 75.85 MiB | 8.48 MiB/s, done.
Resolving deltas: 100% (19706/19706), done.
Checking out files: 100% (24070/24070), done.
Submodule 'extra/wolfssl/wolfssl' (https://github.com/wolfSSL/wolfssl.git) registered for path 'extra/wolfssl/wolfssl'
Submodule 'libmariadb' (https://github.com/MariaDB/mariadb-connector-c.git) registered for path 'libmariadb'
Submodule 'storage/columnstore/columnstore' (https://github.com/mariadb-corporation/mariadb-columnstore-engine.git) registered for path 'storage/columnstore/columnstore'
Submodule 'storage/maria/libmarias3' (https://github.com/mariadb-corporation/libmarias3.git) registered for path 'storage/maria/libmarias3'
Submodule 'storage/rocksdb/rocksdb' (https://github.com/facebook/rocksdb.git) registered for path 'storage/rocksdb/rocksdb'
Submodule 'wsrep-lib' (https://github.com/codership/wsrep-lib.git) registered for path 'wsrep-lib'
Cloning into '/srv/sources/mariadb/quick-rebuilds/mariadb-server/extra/wolfssl/wolfssl'...
remote: Enumerating objects: 2851, done.
remote: Counting objects: 100% (2851/2851), done.
remote: Compressing objects: 100% (2124/2124), done.
remote: Total 2851 (delta 800), reused 1576 (delta 589), pack-reused 0
Receiving objects: 100% (2851/2851), 20.91 MiB | 10.43 MiB/s, done.
Resolving deltas: 100% (800/800), done.
...
Unpacking objects: 100% (3/3), done.
From https://github.com/codership/wsrep-API
* branch 694d6ca47f5eec7873be99b7d6babccf633d1231 -> FETCH_HEAD
Submodule path 'wsrep-lib/wsrep-API/v26': checked out '694d6ca47f5eec7873be99b7d6babccf633d1231'
$ git -C mariadb-server/ show --oneline --summary
f2dc4d4c (HEAD -> 11.0, origin/HEAD, origin/11.0) MDEV-30673 InnoDB recovery hangs when buf_LRU_get_free_block
$ git -C mariadb-server submodule
4fbd4fd36a21efd9d1a7e17aba390e91c78693b1 extra/wolfssl/wolfssl (4fbd4fd)
12bd1d5511fc2ff766ff6256c71b79a95739533f libmariadb (12bd1d5)
8b032853b7a200d9af4d468ac58bb9f4b6ac7040 storage/columnstore/columnstore (8b03285)
3846890513df0653b8919bc45a7600f9b55cab31 storage/maria/libmarias3 (3846890)
bba5e7bc21093d7cfa765e1280a7c4fdcd284288 storage/rocksdb/rocksdb (bba5e7b)
275a0af8c5b92f0ee33cfe9e23f3db5f59b56e9d wsrep-lib (275a0af)
$ du -shc mariadb-server/.git/modules/{storage/*,extra/wolfssl,libmariadb,wsrep-lib} \
mariadb-server/.git mariadb-server/
30M mariadb-server/.git/modules/storage/columnstore
1M mariadb-server/.git/modules/storage/maria
20M mariadb-server/.git/modules/storage/rocksdb
40M mariadb-server/.git/modules/extra/wolfssl
2M mariadb-server/.git/modules/libmariadb
1M mariadb-server/.git/modules/wsrep-lib
80M mariadb-server/.git
548M mariadb-server/
=720M total</code></pre></div>
<p>With a 3-month history, the main Git data for MariaDB is about 50 MB, and the submodules as shallow clones add 30 MB more. If not using shallow cloning, the whole MariaDB repository and submodules would amount to over 1 GB of data, so using shallow clones cuts the amount of data to be downloaded by over 80%.</p>
<p>The checked-out data is almost 550 MB, but that is unpacked from the Git data, so the actual network transfer was at most 80 MB of Git data.</p>
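<p>If deeper history turns out to be needed later, a shallow clone can be deepened on demand with <code>git fetch --deepen</code> or converted to a full clone with <code>--unshallow</code>. A sketch using a local throwaway repository:</p>

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Create a small source repository with three commits:
git init -q src
git -C src config user.name "Example"
git -C src config user.email "example@example.com"
for i in 1 2 3; do
  git -C src commit -q --allow-empty -m "commit $i"
done
# A shallow clone sees only the most recent commit
# (the file:// URL forces the real transport so --depth applies):
git clone -q --depth 1 "file://$PWD/src" shallow
cd shallow
git rev-list --count HEAD        # 1
# Later, fetch the rest of the history when needed:
git fetch -q --unshallow
git rev-list --count HEAD        # 3
```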
<h2 id="build-inside-a-throwaway-container"><a href="#build-inside-a-throwaway-container" class="header-anchor"></a>Build inside a throwaway container
</h2><p>In addition to the source code, one also needs a long list of build dependencies installed. Instead of polluting your laptop or workstation with dozens of new libraries, install all the dependencies inside a container that has a working directory mounted into it. This way your system stays clean, while files written in the working directory are accessible both inside and outside the container and persist after the container is gone.</p>
<p>Next, start the container:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">docker run --interactive --tty --rm \
-v ${PWD}:/quick-rebuilds -w /quick-rebuilds debian:sid bash</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>docker run --interactive --tty --rm <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -v <span style="color:#e6db74">${</span>PWD<span style="color:#e6db74">}</span>:/quick-rebuilds -w /quick-rebuilds debian:sid bash</span></span></code></pre></div></div></div>
<p>This example uses <a class="link" href="https://en.wikipedia.org/wiki/Docker_%28software%29" target="_blank" rel="noopener"
>Docker</a>, but the principle is the same with <a class="link" href="https://en.wikipedia.org/wiki/OS-level_virtualization#Implementations" target="_blank" rel="noopener"
>any Linux container</a> tool, such as Podman.</p>
<p>Inside the Debian container, use apt to automatically install all dependencies (about 160 MB download, over 660 MB when unpacked to disk) as defined in the MariaDB source file <code>debian/control</code>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">echo 'deb-src https://deb.debian.org/debian sid main' \
> /etc/apt/sources.list.d/deb-src-sid.list
apt update
apt install -y --no-install-recommends \
devscripts equivs ccache eatmydata ninja-build clang entr moreutils
mk-build-deps -r -i mariadb-server/debian/control \
-t 'apt-get -y -o Debug::pkgProblemResolver=yes --no-install-recommends'</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>echo <span style="color:#e6db74">'deb-src https://deb.debian.org/debian sid main'</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> > /etc/apt/sources.list.d/deb-src-sid.list
</span></span><span style="display:flex;"><span>apt update
</span></span><span style="display:flex;"><span>apt install -y --no-install-recommends <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> devscripts equivs ccache eatmydata ninja-build clang entr moreutils
</span></span><span style="display:flex;"><span>mk-build-deps -r -i mariadb-server/debian/control <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -t <span style="color:#e6db74">'apt-get -y -o Debug::pkgProblemResolver=yes --no-install-recommends'</span></span></span></code></pre></div></div></div>
<p>The single biggest boost to the (re-)compilation speed is gained with <a class="link" href="https://ccache.dev/" target="_blank" rel="noopener"
>Ccache</a>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-6"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-6" style="display:none;">export CCACHE_DIR=$PWD/ccache
ccache --show-stats --verbose</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>export CCACHE_DIR<span style="color:#f92672">=</span>$PWD/ccache
</span></span><span style="display:flex;"><span>ccache --show-stats --verbose</span></span></code></pre></div></div></div>
<p>We also want to prime the environment to use <a class="link" href="https://clang.llvm.org/" target="_blank" rel="noopener"
>Clang</a>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-7"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-7" style="display:none;">export CXX=${CXX:-clang++}
export CC=${CC:-clang}
export CXX_FOR_BUILD=${CXX_FOR_BUILD:-clang++}
export CC_FOR_BUILD=${CC_FOR_BUILD:-clang}
export CFLAGS='-Wno-unused-command-line-argument'
export CXXFLAGS='-Wno-unused-command-line-argument'</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>export CXX<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>CXX<span style="color:#66d9ef">:-</span>clang++<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>export CC<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>CC<span style="color:#66d9ef">:-</span>clang<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>export CXX_FOR_BUILD<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>CXX_FOR_BUILD<span style="color:#66d9ef">:-</span>clang++<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>export CC_FOR_BUILD<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>CC_FOR_BUILD<span style="color:#66d9ef">:-</span>clang<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>export CFLAGS<span style="color:#f92672">=</span><span style="color:#e6db74">'-Wno-unused-command-line-argument'</span>
</span></span><span style="display:flex;"><span>export CXXFLAGS<span style="color:#f92672">=</span><span style="color:#e6db74">'-Wno-unused-command-line-argument'</span></span></span></code></pre></div></div></div>
<p>The first step in actual compilation is to run <a class="link" href="https://manpages.debian.org/unstable/cmake/cmake.1.en.html" target="_blank" rel="noopener"
>CMake</a>, instructing it to look at the source in directory <code>mariadb-server/</code>, output build artifacts in directory <code>build/</code> and use <a class="link" href="https://ninja-build.org/" target="_blank" rel="noopener"
>Ninja</a> as the build system. This invocation also forces a fresh configuration by discarding any previous CMakeCache.txt files, routes compiler calls through ccache instead of invoking the compiler directly, and skips a number of rarely used large plugins to save a lot of compilation time.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-8"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-8" style="display:none;">cmake -S mariadb-server/ -B build/ -G Ninja --fresh \
-DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache \
-DPLUGIN_COLUMNSTORE=NO -DPLUGIN_ROCKSDB=NO -DPLUGIN_S3=NO \
-DPLUGIN_MROONGA=NO -DPLUGIN_CONNECT=NO -DPLUGIN_TOKUDB=NO \
-DPLUGIN_PERFSCHEMA=NO -DWITH_WSREP=OFF</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cmake -S mariadb-server/ -B build/ -G Ninja --fresh <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -DCMAKE_CXX_COMPILER_LAUNCHER<span style="color:#f92672">=</span>ccache -DCMAKE_C_COMPILER_LAUNCHER<span style="color:#f92672">=</span>ccache <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -DPLUGIN_COLUMNSTORE<span style="color:#f92672">=</span>NO -DPLUGIN_ROCKSDB<span style="color:#f92672">=</span>NO -DPLUGIN_S3<span style="color:#f92672">=</span>NO <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -DPLUGIN_MROONGA<span style="color:#f92672">=</span>NO -DPLUGIN_CONNECT<span style="color:#f92672">=</span>NO -DPLUGIN_TOKUDB<span style="color:#f92672">=</span>NO <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -DPLUGIN_PERFSCHEMA<span style="color:#f92672">=</span>NO -DWITH_WSREP<span style="color:#f92672">=</span>OFF</span></span></code></pre></div></div></div>
<p>If you are interested in knowing all possible build flags available, simply query them from CMake with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-9"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-9" style="display:none;">cmake build/ -LH</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cmake build/ -LH</span></span></code></pre></div></div></div>
<p>Note that after the configure stage has run, there are no traditional Makefiles in <code>build/</code>, only a <code>build.ninja</code>, since we are using Ninja. Thus, running <code>make</code> will not work; the Ninja equivalent is <code>ninja -C build</code>. However, we don’t need to call Ninja directly either, but can just let CMake orchestrate everything with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-10"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-10" style="display:none;">$ eatmydata cmake --build build/
[173/1462] Building C object plugin/auth_ed25519/CMakeFiles/ref10.dir/ref10/ge_add.c.o</code><pre><code>$ eatmydata cmake --build build/
[173/1462] Building C object plugin/auth_ed25519/CMakeFiles/ref10.dir/ref10/ge_add.c.o</code></pre></div>
<p>In interactive mode, Ninja shows just one line of output at a time to indicate progress. The numbers inside the brackets show how many files of the total have been compiled, and the filename after them shows which file is currently being compiled. Ninja runs by default on all available CPU cores, so there is no need to define parallelism manually. If Ninja encounters warnings or errors, it will print them but continue to show the one-liner status at the bottom of the terminal. To abort Ninja, feel free to press <code>Ctrl+C</code> at any time.</p>
<p>Re-starting the compilation will continue where it left off – Ninja is very smart and fast in figuring out which files need to be compiled.</p>
<h2 id="running-the-mariadb-test-suite-mtr"><a href="#running-the-mariadb-test-suite-mtr" class="header-anchor"></a>Running the MariaDB test suite (MTR)
</h2><p>While the MariaDB server does have a small amount of <a class="link" href="https://cmake.org/cmake/help/book/mastering-cmake/chapter/Testing%20With%20CMake%20and%20CTest.html#testing-using-ctest" target="_blank" rel="noopener"
>CTest unit tests</a>, the main test system is the <a class="link" href="https://mariadb.com/kb/en/mysql-test-runpl-options/" target="_blank" rel="noopener"
>mariadb-test-run script</a> (inherited from mysql-test-run). Each test file (suffix <code>.test</code>) consists mainly of SQL code that is executed by <code>mariadb-test-run</code> (MTR), and its output is compared to the corresponding expected-output text file (suffix <code>.result</code>).</p>
<p>To start MTR via CMake, run:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-11"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-11" style="display:none;">cmake --build build/ -t test-force</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cmake --build build/ -t test-force</span></span></code></pre></div></div></div>
<p>Alternatively, one can simply invoke the script directly after the binaries have been compiled:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-12"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-12" style="display:none;">./build/mysql-test/mysql-test-run.pl --force</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>./build/mysql-test/mysql-test-run.pl --force</span></span></code></pre></div></div></div>
<p>This offers more flexibility, as you can easily add parameters such as <code>--parallel=auto</code> (as the default is to run just one test worker on one CPU) or limit the scope to just one suite or just one individual test:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-13"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-13" style="display:none;">./build/mysql-test/mysql-test-run.pl --force --parallel=auto --skip-rpl --suite=main</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>./build/mysql-test/mysql-test-run.pl --force --parallel<span style="color:#f92672">=</span>auto --skip-rpl --suite<span style="color:#f92672">=</span>main</span></span></code></pre></div></div></div>
<p>Note that all commands in this example run as root, as the container must be started with a root user inside it to have permission to apt install the build dependencies. However, mariadb-test-run is not designed to be run as root and will end up skipping some tests when it is. Also, when run like this, a lot of the debugging information isn’t fully shown. To make the most of the mysql-test-run/mariadb-test-run script, read more in the post <a class="link" href="https://optimizedbyotto.com/post/grokking-mariadb-test-run-mtr" >Grokking the MariaDB test runner (MTR)</a>.</p>
<h2 id="more-build-targets"><a href="#more-build-targets" class="header-anchor"></a>More build targets
</h2><p>As concluded above, the target <code>test-force</code> was for MTR, and the plainly named target <code>test</code> is for the CTest unit tests. The equivalent direct Ninja command for running target <code>test</code> would be <code>ninja -C build/ test</code>. To list all targets, run <code>cmake --build build/ --target help</code> or <code>ninja -C build/ -t targets all</code>.</p>
<p>MariaDB 11.0 currently has over 1300 targets. There does not seem to be a very consistent pattern in how build targets are named or how they are intended to be used. One way to find CMake targets that might be more important than others is to simply grep them from the main level CMake configuration file:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-14"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-14" style="display:none;">$ grep ADD_CUSTOM_TARGET mariadb-server/CMakeLists.txt
ADD_CUSTOM_TARGET(import_executables
ADD_CUSTOM_TARGET(INFO_SRC ALL
ADD_CUSTOM_TARGET(INFO_BIN ALL
ADD_CUSTOM_TARGET(minbuild)
ADD_CUSTOM_TARGET(smoketest</code><pre><code>$ grep ADD_CUSTOM_TARGET mariadb-server/CMakeLists.txt
ADD_CUSTOM_TARGET(import_executables
ADD_CUSTOM_TARGET(INFO_SRC ALL
ADD_CUSTOM_TARGET(INFO_BIN ALL
ADD_CUSTOM_TARGET(minbuild)
ADD_CUSTOM_TARGET(smoketest</code></pre></div>
<p>One of the standard targets is <code>install</code>, which can be run with <code>ninja -C build install</code> or with CMake:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-15"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-15" style="display:none;">$ cmake --install build/
-- Install configuration: "RelWithDebInfo"
-- Up-to-date: /usr/local/mysql/./README.md
-- Up-to-date: /usr/local/mysql/./CREDITS
-- Up-to-date: /usr/local/mysql/./COPYING
-- Up-to-date: /usr/local/mysql/./THIRDPARTY
-- Up-to-date: /usr/local/mysql/./INSTALL-BINARY
-- Up-to-date: /usr/local/mysql/lib/plugin/dialog.so
-- Up-to-date: /usr/local/mysql/lib/plugin/client_ed25519.so
-- Up-to-date: /usr/local/mysql/lib/plugin/caching_sha2_password.so
-- Up-to-date: /usr/local/mysql/lib/plugin/sha256_password.so
...
-- Installing: /usr/local/mysql/support-files/systemd/mysql.service
-- Installing: /usr/local/mysql/support-files/systemd/mysqld.service
-- Installing: /usr/local/mysql/support-files/systemd/mariadb@.service
-- Installing: /usr/local/mysql/support-files/systemd/mariadb@.socket
-- Installing: /usr/local/mysql/support-files/systemd/mariadb-extra@.socket
-- Up-to-date: /usr/local/mysql/support-files/systemd/mysql.service
-- Up-to-date: /usr/local/mysql/support-files/systemd/mysqld.service</code><pre><code>$ cmake --install build/
-- Install configuration: "RelWithDebInfo"
-- Up-to-date: /usr/local/mysql/./README.md
-- Up-to-date: /usr/local/mysql/./CREDITS
-- Up-to-date: /usr/local/mysql/./COPYING
-- Up-to-date: /usr/local/mysql/./THIRDPARTY
-- Up-to-date: /usr/local/mysql/./INSTALL-BINARY
-- Up-to-date: /usr/local/mysql/lib/plugin/dialog.so
-- Up-to-date: /usr/local/mysql/lib/plugin/client_ed25519.so
-- Up-to-date: /usr/local/mysql/lib/plugin/caching_sha2_password.so
-- Up-to-date: /usr/local/mysql/lib/plugin/sha256_password.so
...
-- Installing: /usr/local/mysql/support-files/systemd/mysql.service
-- Installing: /usr/local/mysql/support-files/systemd/mysqld.service
-- Installing: /usr/local/mysql/support-files/systemd/mariadb@.service
-- Installing: /usr/local/mysql/support-files/systemd/mariadb@.socket
-- Installing: /usr/local/mysql/support-files/systemd/mariadb-extra@.socket
-- Up-to-date: /usr/local/mysql/support-files/systemd/mysql.service
-- Up-to-date: /usr/local/mysql/support-files/systemd/mysqld.service</code></pre></div>
<p>To better understand the full capabilities of the build tools, it is recommended to skim through the <a class="link" href="https://manpages.debian.org/unstable/cmake/cmake.1.en.html" target="_blank" rel="noopener"
>cmake man page</a> and the <a class="link" href="https://manpages.debian.org/unstable/ninja-build/ninja.1.en.html" target="_blank" rel="noopener"
>ninja man page</a>.</p>
<h2 id="run-the-build-binaries-directly"><a href="#run-the-build-binaries-directly" class="header-anchor"></a>Run the build binaries directly
</h2><p>Instead of wasting time on running the <code>install</code> target, one can simply invoke the build binaries directly:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-16"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-16" style="display:none;">$ ./build/client/mariadb --version
./build/client/mariadb from 11.0.1-MariaDB, client 15.2 for Linux (x86_64) using EditLine wrapper
$ ./build/sql/mariadbd --version
./build/sql/mariadbd Ver 11.0.1-MariaDB for Linux on x86_64 (Source distribution)</code><pre><code>$ ./build/client/mariadb --version
./build/client/mariadb from 11.0.1-MariaDB, client 15.2 for Linux (x86_64) using EditLine wrapper
$ ./build/sql/mariadbd --version
./build/sql/mariadbd Ver 11.0.1-MariaDB for Linux on x86_64 (Source distribution)</code></pre></div>
<p>To actually run the server, it needs a data directory and a user, which can be created with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-17"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-17" style="display:none;">$ ./build/scripts/mariadb-install-db --srcdir=mariadb-server
$ adduser --disabled-password mariadb
$ chown -R mariadb:mariadb ./data
$ ./build/sql/mariadbd --datadir=./data --user=mariadb &
[Note] Starting MariaDB 11.0.1-MariaDB source revision as process 5428
[Note] InnoDB: Compressed tables use zlib 1.2.13
[Note] InnoDB: Using transactional memory
[Note] InnoDB: Number of transaction pools: 1
[Note] InnoDB: Using crc32 + pclmulqdq instructions
[Warning] mariadbd: io_uring_queue_init() failed with errno 0
[Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
[Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB
[Note] InnoDB: Completed initialization of buffer pool
[Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
[Note] InnoDB: Opened 3 undo tablespaces
[Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
[Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
[Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
[Note] InnoDB: log sequence number 47391; transaction id 14
[Note] InnoDB: Loading buffer pool(s) from /quick-rebuilds/data/ib_buffer_pool
[Note] InnoDB: Buffer pool(s) load completed at 230220 20:28:45
[Note] Plugin 'FEEDBACK' is disabled.
[Note] Server socket created on IP: '0.0.0.0'.
[Note] Server socket created on IP: '::'.
[Note] ./build/sql/mariadbd: ready for connections.
Version: '11.0.1-MariaDB' socket: '/tmp/mysql.sock' port: 3306 Source distribution</code><pre><code>$ ./build/scripts/mariadb-install-db --srcdir=mariadb-server
$ adduser --disabled-password mariadb
$ chown -R mariadb:mariadb ./data
$ ./build/sql/mariadbd --datadir=./data --user=mariadb &
[Note] Starting MariaDB 11.0.1-MariaDB source revision as process 5428
[Note] InnoDB: Compressed tables use zlib 1.2.13
[Note] InnoDB: Using transactional memory
[Note] InnoDB: Number of transaction pools: 1
[Note] InnoDB: Using crc32 + pclmulqdq instructions
[Warning] mariadbd: io_uring_queue_init() failed with errno 0
[Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
[Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB
[Note] InnoDB: Completed initialization of buffer pool
[Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
[Note] InnoDB: Opened 3 undo tablespaces
[Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
[Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
[Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
[Note] InnoDB: log sequence number 47391; transaction id 14
[Note] InnoDB: Loading buffer pool(s) from /quick-rebuilds/data/ib_buffer_pool
[Note] InnoDB: Buffer pool(s) load completed at 230220 20:28:45
[Note] Plugin 'FEEDBACK' is disabled.
[Note] Server socket created on IP: '0.0.0.0'.
[Note] Server socket created on IP: '::'.
[Note] ./build/sql/mariadbd: ready for connections.
Version: '11.0.1-MariaDB' socket: '/tmp/mysql.sock' port: 3306 Source distribution</code></pre></div>
<p>It is necessary to define a custom data directory path and a custom user, as otherwise mariadbd will fail to start:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-18"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-18" style="display:none;">[Warning] Can't create test file /usr/local/mysql/data/03727bdc8fe2.lower-test
./build/sql/mariadbd: Can't change dir to '/usr/local/mysql/data/' (Errcode: 2 "No such file or directory")
[ERROR] Aborting
./build/sql/mariadbd: Please consult the Knowledge Base to find out how to run mysqld as root!
[ERROR] Aborting</code><pre><code>[Warning] Can't create test file /usr/local/mysql/data/03727bdc8fe2.lower-test
./build/sql/mariadbd: Can't change dir to '/usr/local/mysql/data/' (Errcode: 2 "No such file or directory")
[ERROR] Aborting
./build/sql/mariadbd: Please consult the Knowledge Base to find out how to run mysqld as root!
[ERROR] Aborting</code></pre></div>
<p>To gracefully stop the server, send it the <a class="link" href="https://optimizedbyotto.com/post/stop-senseless-killing/" >SIGTERM signal</a>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-19"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-19" style="display:none;">$ pkill -ef mariadbd
[Note] ./build/sql/mariadbd (initiated by: unknown): Normal shutdown
[Note] InnoDB: FTS optimize thread exiting.
[Note] InnoDB: Starting shutdown...
[Note] InnoDB: Dumping buffer pool(s) to /quick-rebuilds/data/ib_buffer_pool
[Note] InnoDB: Buffer pool(s) dump completed at 230220 20:29:05
[Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
[Note] InnoDB: Shutdown completed; log sequence number 47391; transaction id 15
[Note] ./build/sql/mariadbd: Shutdown complete
mariadbd killed (pid 5428)</code><pre><code>$ pkill -ef mariadbd
[Note] ./build/sql/mariadbd (initiated by: unknown): Normal shutdown
[Note] InnoDB: FTS optimize thread exiting.
[Note] InnoDB: Starting shutdown...
[Note] InnoDB: Dumping buffer pool(s) to /quick-rebuilds/data/ib_buffer_pool
[Note] InnoDB: Buffer pool(s) dump completed at 230220 20:29:05
[Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
[Note] InnoDB: Shutdown completed; log sequence number 47391; transaction id 15
[Note] ./build/sql/mariadbd: Shutdown complete
mariadbd killed (pid 5428)</code></pre></div>
<h2 id="quick-rebuilds"><a href="#quick-rebuilds" class="header-anchor"></a>Quick rebuilds
</h2><p>With this setup, you can invoke <code>eatmydata cmake --build build/</code> to have the source code re-compiled as quickly as possible.</p>
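<p>The build dependencies installed earlier included <code>entr</code>, which can drive this rebuild automatically on every file save. Below is a sketch (the watched file patterns are illustrative), written as a helper script since entr blocks and keeps watching until interrupted:</p>

```shell
# Create a watch-and-rebuild helper: entr re-runs the incremental build
# whenever one of the listed files changes (-c clears the screen first).
cat > watch-rebuild.sh <<'EOF'
#!/bin/sh
# List tracked C/C++ sources and headers, then hand them to entr.
git -C mariadb-server ls-files '*.c' '*.cc' '*.h' \
  | sed 's|^|mariadb-server/|' \
  | entr -c eatmydata cmake --build build/
EOF
chmod +x watch-rebuild.sh
```

<p>Run <code>./watch-rebuild.sh</code> in one terminal and edit sources in another; press <code>Ctrl+C</code> to stop watching.</p>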
<p>The ‘screenshot’ below showcases how Ninja/CMake only rebuilds changed files and whatever depends on them. For a simple MariaDB client version string change, only 5 files needed to be rebuilt, and it <strong>took less than a second</strong>:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-20"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-20" style="display:none;">$ sed 's/*VER= "15.1"/*VER= "15.2"/' -i mariadb-server/client/mysql.cc
$ time eatmydata cmake --build build/
[5/5] Linking CXX executable client/mariadb
real 0m0.992s
user 0m0.374s
sys 0m0.353s</code><pre><code>$ sed 's/*VER= "15.1"/*VER= "15.2"/' -i mariadb-server/client/mysql.cc
$ time eatmydata cmake --build build/
[5/5] Linking CXX executable client/mariadb
real 0m0.992s
user 0m0.374s
sys 0m0.353s</code></pre></div>
<p>A similar version string change in the server leads to having to rebuild over a thousand files:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-21"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-21" style="display:none;">$ sed 's/MYSQL_VERSION_PATCH=1/MYSQL_VERSION_PATCH=2/' -i mariadb-server/VERSION
$ time eatmydata cmake --build build/
[0/1] Re-running CMake...
-- Running cmake version 3.25.1
-- MariaDB 11.0.2
-- Packaging as: mariadb-11.0.2-Linux-x86_64
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
== Configuring MariaDB Connector/C
-- SYSTEM_LIBS: dl;m;dl;m;/usr/lib/x86_64-linux-gnu/libssl.so;/usr/lib/x86_64-linux-gnu/libcrypto.so;/usr/lib/x86_64-linux-gnu/libz.so
-- Configuring OQGraph
-- Configuring done
-- Generating done
-- Build files have been written to: /quick-rebuilds/build
[377/1257] Generating user.t
troff: fatal error: can't find macro file m
[378/1257] Generating user.ps
troff: fatal error: can't find macro file m
[433/1257] Building CXX object storage/archive/CMakeFiles/archive.dir/ha_archive.cc.o
In file included from /quick-rebuilds/mariadb-server/storage/archive/ha_archive.cc:29:
/quick-rebuilds/mariadb-server/storage/archive/ha_archive.h:91:15: warning: 'index_type' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
const char *index_type(uint inx) { return "NONE"; }
^
/quick-rebuilds/mariadb-server/sql/handler.h:3915:23: note: overridden virtual function is here
virtual const char *index_type(uint key_number) { DBUG_ASSERT(0); return "";}
^
[...]
In file included from /quick-rebuilds/mariadb-server/storage/archive/ha_archive.cc:29:
/quick-rebuilds/mariadb-server/storage/archive/ha_archive.h:163:7: warning: 'external_lock' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
int external_lock(THD *thd, int lock_type);
^
/quick-rebuilds/mariadb-server/sql/handler.h:5153:15: note: overridden virtual function is here
virtual int external_lock(THD *thd __attribute__((unused)),
^
36 warnings generated.
[1257/1257] Linking CXX executable extra/mariabackup/mariadb-backup
real 2m7.786s
user 12m56.232s
sys 1m57.842s</code><pre><code>$ sed 's/MYSQL_VERSION_PATCH=1/MYSQL_VERSION_PATCH=2/' -i mariadb-server/VERSION
$ time eatmydata cmake --build build/
[0/1] Re-running CMake...
-- Running cmake version 3.25.1
-- MariaDB 11.0.2
-- Packaging as: mariadb-11.0.2-Linux-x86_64
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
== Configuring MariaDB Connector/C
-- SYSTEM_LIBS: dl;m;dl;m;/usr/lib/x86_64-linux-gnu/libssl.so;/usr/lib/x86_64-linux-gnu/libcrypto.so;/usr/lib/x86_64-linux-gnu/libz.so
-- Configuring OQGraph
-- Configuring done
-- Generating done
-- Build files have been written to: /quick-rebuilds/build
[377/1257] Generating user.t
troff: fatal error: can't find macro file m
[378/1257] Generating user.ps
troff: fatal error: can't find macro file m
[433/1257] Building CXX object storage/archive/CMakeFiles/archive.dir/ha_archive.cc.o
In file included from /quick-rebuilds/mariadb-server/storage/archive/ha_archive.cc:29:
/quick-rebuilds/mariadb-server/storage/archive/ha_archive.h:91:15: warning: 'index_type' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
const char *index_type(uint inx) { return "NONE"; }
^
/quick-rebuilds/mariadb-server/sql/handler.h:3915:23: note: overridden virtual function is here
virtual const char *index_type(uint key_number) { DBUG_ASSERT(0); return "";}
^
[...]
In file included from /quick-rebuilds/mariadb-server/storage/archive/ha_archive.cc:29:
/quick-rebuilds/mariadb-server/storage/archive/ha_archive.h:163:7: warning: 'external_lock' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
int external_lock(THD *thd, int lock_type);
^
/quick-rebuilds/mariadb-server/sql/handler.h:5153:15: note: overridden virtual function is here
virtual int external_lock(THD *thd __attribute__((unused)),
^
36 warnings generated.
[1257/1257] Linking CXX executable extra/mariabackup/mariadb-backup
real 2m7.786s
user 12m56.232s
sys 1m57.842s</code></pre></div>
<p>The above example also shows how Ninja spits out warnings.</p>
<p>Despite the majority of the project files being rebuilt, it still <strong>took only two minutes</strong>, mainly thanks to ccache having a high hit rate.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-22"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-22" style="display:none;">$ ccache --show-stats
Cacheable calls: 3235 / 3235 (100.0%)
Hits: 1932 / 3235 (59.72%)
Direct: 49 / 1932 ( 2.54%)
Preprocessed: 1883 / 1932 (97.46%)
Misses: 1303 / 3235 (40.28%)
Local storage:
Cache size (GB): 0.11 / 5.00 ( 2.18%)</code><pre><code>$ ccache --show-stats
Cacheable calls: 3235 / 3235 (100.0%)
Hits: 1932 / 3235 (59.72%)
Direct: 49 / 1932 ( 2.54%)
Preprocessed: 1883 / 1932 (97.46%)
Misses: 1303 / 3235 (40.28%)
Local storage:
Cache size (GB): 0.11 / 5.00 ( 2.18%)</code></pre></div>
<p>Without ccache, the build time in the same scenario is 6–8 minutes. There are some extra settings in ccache (such as <a class="link" href="https://ccache.dev/manual/4.7.4.html#_configuration_options" target="_blank" rel="noopener"
>CCACHE_SLOPPINESS</a>) which can be used to further tune ccache for speed, but in my experiments none of them made a visible impact.</p>
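<p>If you want to experiment with this yourself, ccache is configured through environment variables. A minimal sketch (the cache path and size here are illustrative examples, not recommendations; consult the ccache manual for the full list of options):</p>

```shell
# Illustrative ccache tuning; values are examples only.
export CCACHE_DIR=/quick-rebuilds/.ccache   # where the cache lives
export CCACHE_MAXSIZE=5G                    # cap the cache size
export CCACHE_SLOPPINESS=time_macros,include_file_mtime
ccache --zero-stats                         # reset counters before measuring
```

<p>Resetting the statistics before a rebuild makes it easy to see what effect, if any, a given setting had on the hit rate.</p>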
<p>Without <a class="link" href="https://manpages.debian.org/unstable/eatmydata/eatmydata.1.en.html" target="_blank" rel="noopener"
>eatmydata</a>, the build takes 10–20 seconds longer, as the system calls to disk will wait for <a class="link" href="https://manpages.debian.org/unstable/manpages-dev/fsync.2.en.html" target="_blank" rel="noopener"
>fsync</a> and the like to complete. We are fine skipping those waits, since data durability and crash recovery do not matter in a throwaway environment anyway. Using regular <a class="link" href="https://gcc.gnu.org/" target="_blank" rel="noopener"
>GNU GCC</a> instead of Clang adds another 20–40 seconds to the rebuild time.</p>
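<p>Putting the pieces together, the toolchain described above is wired up when configuring the build. A sketch (the source and build paths are examples; the CMake launcher flags are the standard way to hook ccache into a CMake build):</p>

```shell
# Configure once: Clang as the compiler, ccache as a compiler
# launcher, Ninja as the build tool...
cmake -S mariadb-server -B build -G Ninja \
  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache

# ...then rebuild under eatmydata to skip the fsync waits.
eatmydata cmake --build build/
```

<p>With this in place, subsequent rebuilds only pay for what actually changed plus any cache misses.</p>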
<p>The current build time of two minutes on my laptop with an Intel i7-8650U CPU @ 1.90GHz (4 cores, 8 threads) is not exactly instant, but it is fast enough that I can sit and wait it out without feeling the need to context switch and lose my focus.</p>
<h2 id="automatic-rebuild"><a href="#automatic-rebuild" class="header-anchor"></a>Automatic rebuild
</h2><p>As showcased in the post <a class="link" href="https://optimizedbyotto.com/post/develop-code-10x-faster/" >How to code 10x faster than an average programmer</a>, a high-performing software developer does not waste time manually running commands to build and test code. Instead, set up your environment so that you write code in your editor and have it automatically recompile and run whenever the source file is saved.</p>
<p>For MariaDB, the automatic rebuild part can easily be achieved with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-23"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-23" style="display:none;">find mariadb-server/* | entr eatmydata cmake --build build/</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find mariadb-server/* | entr eatmydata cmake --build build/</span></span></code></pre></div></div></div>
<p>To automatically rebuild and also run a binary (in this case the <em>mariadb</em> client), pass multiple commands in quotes via the <code>-s</code> parameter:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-24"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-24" style="display:none;">find mariadb-server/* | \
entr -s 'eatmydata cmake --build build/; ./build/client/mariadb --version'</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find mariadb-server/* | <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> entr -s <span style="color:#e6db74">'eatmydata cmake --build build/; ./build/client/mariadb --version'</span></span></span></code></pre></div></div></div>
<p><img src="https://optimizedbyotto.com/post/quick-builds-and-rebuilds-of-mariadb-with-docker/mariadb-atom-autosave-entr-demo.gif"
width="1200"
height="611"
loading="lazy"
alt="MariaDB client automatic compilation and re-run"
class="gallery-image"
data-flex-grow="196"
data-flex-basis="471px"
>
</p>
<p>When running the server, use the <code>-r</code> parameter to have Entr automatically restart it:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-25"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-25" style="display:none;">find mariadb-server/* | \
entr -r ./build/sql/mariadbd --datadir=./data --user=mariadb</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find mariadb-server/* | <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> entr -r ./build/sql/mariadbd --datadir<span style="color:#f92672">=</span>./data --user<span style="color:#f92672">=</span>mariadb</span></span></code></pre></div></div></div>
<p><img src="https://optimizedbyotto.com/post/quick-builds-and-rebuilds-of-mariadb-with-docker/mariadb-server-atom-autosave-entr-demo.gif"
width="1200"
height="611"
loading="lazy"
alt="MariaDB server automatic compilation and restart"
class="gallery-image"
data-flex-grow="196"
data-flex-basis="471px"
>
</p>
<p>If you are developing an MTR test by editing <code>*.test</code> files, there is no need to recompile anything; simply have Entr re-run the test every time a file is changed:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-26"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-26" style="display:none;">find mariadb-server/* | entr -r ./build/mysql-test/mysql-test-run.pl main.connect</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find mariadb-server/* | entr -r ./build/mysql-test/mysql-test-run.pl main.connect</span></span></code></pre></div></div></div>
<p><img src="https://optimizedbyotto.com/post/quick-builds-and-rebuilds-of-mariadb-with-docker/mariadb-mtr-atom-autosave-entr-demo.gif"
width="1200"
height="611"
loading="lazy"
alt="MariaDB test run automatic restart"
class="gallery-image"
data-flex-grow="196"
data-flex-basis="471px"
>
</p>
<h2 id="conclusion"><a href="#conclusion" class="header-anchor"></a>Conclusion
</h2><p>The examples above are specific to MariaDB and illustrate in detail how to be efficient and avoid wasting <a class="link" href="https://xkcd.com/303/" target="_blank" rel="noopener"
>time compiling</a>, but the principles of utilizing ccache, Clang and Ninja apply to any C/C++ software project, and Entr comes in handy in a myriad of situations.</p>
<p>Hopefully this inspires you to raise the bar on what to expect of speed and efficiency in the future!</p> Grokking the MariaDB test runner (MTR) https://optimizedbyotto.com/post/grokking-mariadb-test-run-mtr/Sun, 19 Feb 2023 00:00:00 +0000 https://optimizedbyotto.com/post/grokking-mariadb-test-run-mtr/ <img src="https://optimizedbyotto.com/post/grokking-mariadb-test-run-mtr/featured-image.jpg" alt="Featured image of post Grokking the MariaDB test runner (MTR)" /><p>The main test system in the MariaDB open source database project is the <a class="link" href="https://mariadb.com/kb/en/mysql-test-runpl-options/" target="_blank" rel="noopener"
>mariadb-test-run script</a> (inherited from <a class="link" href="https://dev.mysql.com/doc/dev/mysql-server/latest/PAGE_MYSQL_TEST_RUN_PL.html" target="_blank" rel="noopener"
>mysql-test-run</a>). It is easy to run and does not require you to compile any source code.</p>
<p>While writing MTR tests is relevant only for MariaDB developers, knowing how to run MTR is useful for any database administrator running MariaDB, as it is a <strong>quick way to validate that MariaDB can run correctly on your hardware and operating system version</strong>.</p>
<blockquote>
<h2 id="tldr-for-debianubuntu-users"><a href="#tldr-for-debianubuntu-users" class="header-anchor"></a>TL;DR for Debian/Ubuntu users
</h2><p>Run the full 6000+ test suite as the current user in a temporary
directory, with multiple workers in parallel and detailed logging on
failures. This typically takes over 30 minutes on a modern laptop:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">apt install -y mariadb-test mariadb-backup mariadb-plugin-* patch gdb
cd /usr/share/mysql/mysql-test
export MTR_PRINT_CORE=detailed
./mtr --force --parallel=auto --vardir=$(mktemp -d) \
--skip-test-list=unstable-tests.amd64 --big-test</code><pre><code>apt install -y mariadb-test mariadb-backup mariadb-plugin-* patch gdb
cd /usr/share/mysql/mysql-test
export MTR_PRINT_CORE=detailed
./mtr --force --parallel=auto --vardir=$(mktemp -d) \
--skip-test-list=unstable-tests.amd64 --big-test</code></pre></div>
<p>For full details, read the whole article.</p>
</blockquote>
<h2 id="install-mariadb-test-package-in-debianubuntu"><a href="#install-mariadb-test-package-in-debianubuntu" class="header-anchor"></a>Install ‘mariadb-test’ package in Debian/Ubuntu
</h2><p>To avoid polluting your actual system with new packages, it is convenient to run MTR in a throwaway container. Start one with some RAM-backed disk space allocated via <code>--shm-size=1G</code> so that running MTR with <code>--mem</code> later on is possible:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">docker run --interactive --tty --rm --shm-size=1G debian:sid bash</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>docker run --interactive --tty --rm --shm-size<span style="color:#f92672">=</span>1G debian:sid bash</span></span></code></pre></div></div></div>
<p>This example uses <a class="link" href="https://en.wikipedia.org/wiki/Docker_%28software%29" target="_blank" rel="noopener"
>Docker</a>, but the principle is the same with <a class="link" href="https://en.wikipedia.org/wiki/OS-level_virtualization#Implementations" target="_blank" rel="noopener"
>any Linux container</a> tool, such as Podman.</p>
<p>Next, install the MariaDB test suite package. This will also pull in the MariaDB server and all the necessary dependencies. Additionally, install <a class="link" href="https://en.wikipedia.org/wiki/GNU_Debugger" target="_blank" rel="noopener"
>the GNU Debugger (gdb)</a> for automatic stack traces and <a class="link" href="https://manpages.debian.org/unstable/patch/patch.1.en.html" target="_blank" rel="noopener"
>patch</a>.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">apt update
apt install -y mariadb-test mariadb-backup mariadb-plugin-* patch gdb</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>apt update
</span></span><span style="display:flex;"><span>apt install -y mariadb-test mariadb-backup mariadb-plugin-* patch gdb</span></span></code></pre></div></div></div>
<p>The mariadb-test-run script is not intended to be run as root and will skip some tests if it is. Therefore, create a new user inside the container and grant it permissions to the test directory:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">adduser --disabled-password mariadb-test-runner
chown -R mariadb-test-runner /usr/share/mysql/mysql-test</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>adduser --disabled-password mariadb-test-runner
</span></span><span style="display:flex;"><span>chown -R mariadb-test-runner /usr/share/mysql/mysql-test</span></span></code></pre></div></div></div>
<p>At minimum, the test runner user needs to be able to write to the path <code>/usr/share/mysql/mysql-test/var</code> (unless some other path is defined with <code>--vardir</code> and <code>--tmpdir</code>), but some tests also run <code>patch</code> to modify the test files on-the-fly, so just grant permissions to the whole test directory. As this is a throwaway container anyway, there is no need to be prudent with the permissions.</p>
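<p>Alternatively, if you would rather not change ownership of the system directory, a sketch of another approach is to point MTR's writable paths at throwaway temporary directories (the test name here is just an example; note that the tests which run <code>patch</code> will still need write access to the test tree):</p>

```shell
cd /usr/share/mysql/mysql-test
# All writes go to fresh temp dirs, so no chown of the tree is needed.
./mtr --vardir="$(mktemp -d)" --tmpdir="$(mktemp -d)" main.connect
```

<p>This is the same technique the TL;DR above uses with <code>--vardir=$(mktemp -d)</code>.</p>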
<p>Next, switch to the test user and test directory and start the run with default settings:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">$ su - mariadb-test-runner
$ cd /usr/share/mysql/mysql-test
$ ./mariadb-test-run
Logging: ./mariadb-test-run
VS config:
vardir: /usr/share/mysql/mysql-test/var
Creating var directory '/usr/share/mysql/mysql-test/var'...
Checking supported features...
MariaDB Version 10.11.2-MariaDB-1
- SSL connections supported
- binaries built with wsrep patch
Using suites: main-,archive-,atomic-,binlog-,binlog_encryption-,client-,csv-,compat/oracle-,compat/mssql-,compat/maxdb-,encryption-,federated-,funcs_1-,funcs_2-,gcol-,handler-,heap-,innodb-,innodb_fts-,innodb_gis-,innodb_i_s-,innodb_zip-,json-,maria-,mariabackup-,multi_source-,optimizer_unfixed_bugs-,parts-,perfschema-,plugins-,roles-,rpl-,stress-,sys_vars-,sql_sequence-,unit-,vcol-,versioning-,period-,sysschema-,disks,func_test,metadata_lock_info,query_response_time,sequence,sql_discovery,type_inet,type_uuid,user_variables
Collecting tests...
...
main.subselect_innodb 'innodb' w4 [ pass ] 3359
main.subselect_sj2 'innodb' w1 [ pass ] 2737
main.subselect_sj2_jcl6 'innodb' w3 [ pass ] 2933
main.parser_bug21114_innodb 'innodb' w8 [ pass ] 15364
innodb_gis.rtree_search 'innodb' w5 [ pass ] 52379
--------------------------------------------------------------------------
The servers were restarted 1736 times
Spent 8527.863 of 1659 seconds executing testcases
Completed: All 5131 tests were successful.
995 tests were skipped, 280 by the test itself.</code><pre><code>$ su - mariadb-test-runner
$ cd /usr/share/mysql/mysql-test
$ ./mariadb-test-run
Logging: ./mariadb-test-run
VS config:
vardir: /usr/share/mysql/mysql-test/var
Creating var directory '/usr/share/mysql/mysql-test/var'...
Checking supported features...
MariaDB Version 10.11.2-MariaDB-1
- SSL connections supported
- binaries built with wsrep patch
Using suites: main-,archive-,atomic-,binlog-,binlog_encryption-,client-,csv-,compat/oracle-,compat/mssql-,compat/maxdb-,encryption-,federated-,funcs_1-,funcs_2-,gcol-,handler-,heap-,innodb-,innodb_fts-,innodb_gis-,innodb_i_s-,innodb_zip-,json-,maria-,mariabackup-,multi_source-,optimizer_unfixed_bugs-,parts-,perfschema-,plugins-,roles-,rpl-,stress-,sys_vars-,sql_sequence-,unit-,vcol-,versioning-,period-,sysschema-,disks,func_test,metadata_lock_info,query_response_time,sequence,sql_discovery,type_inet,type_uuid,user_variables
Collecting tests...
...
main.subselect_innodb 'innodb' w4 [ pass ] 3359
main.subselect_sj2 'innodb' w1 [ pass ] 2737
main.subselect_sj2_jcl6 'innodb' w3 [ pass ] 2933
main.parser_bug21114_innodb 'innodb' w8 [ pass ] 15364
innodb_gis.rtree_search 'innodb' w5 [ pass ] 52379
--------------------------------------------------------------------------
The servers were restarted 1736 times
Spent 8527.863 of 1659 seconds executing testcases
Completed: All 5131 tests were successful.
995 tests were skipped, 280 by the test itself.</code></pre></div>
<h2 id="defining-what-tests-to-run"><a href="#defining-what-tests-to-run" class="header-anchor"></a>Defining what tests to run
</h2><p>If no test suite is selected, MTR will run about 6000 tests, which on my laptop takes about 30 minutes. If MTR is started with the additional <code>--big-test</code> parameter, it will also run tests that are resource intensive, for example consuming a lot of RAM or taking a long time to run, bringing the total to 6100 tests and 43 minutes (on my laptop). To <em>only</em> run big tests, use <code>--big --big</code>.</p>
<p>If there is a need to limit the scope, such as in build systems that want to validate that the built binary works without running all tests, typically <code>--suite=main --skip-rpl</code> is used. This results in about 1000 tests being run, which on my laptop takes about 3½ minutes.</p>
<p>Even when running without any limitations on which tests are run, many tests contain code that makes them opt out automatically when some precondition is not met, and on my laptop about 1000 tests end up being skipped. Some examples:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">main.connect2 [ skipped ] Requires debug build
main.mysql_client_test_comp [ skipped ] No IPv6
main.connect-abstract [ skipped ] Need Linux
main.grant_cache_ps_prot [ skipped ] Need ps-protocol
main.fix_priv_tables [ skipped ] Test need MYSQL_FIX_PRIVILEGE_TABLES
main.no-threads [ skipped ] Test requires: 'one_thread_per_connection'
main.udf_skip_grants [ skipped ] Need udf example
main.lowercase_mixed_tmpdir_innodb [ skipped ] Test requires: 'lowercase2'
main.innodb_load_xa [ skipped ] Need InnoDB plugin
mariabackup.alter_copy_excluded [ skipped ] No mariabackup
binlog.binlog_expire_warnings [ skipped ] Test needs --big-test
encryption.innodb-spatial-index [ skipped ] requires patch executable
plugins.pam_cleartext [ skipped ] Not run as user owning auth_pam_tool_dir
rpl.rpl_gtid_mdev4474 'innodb,row' [ skipped ] Neither MIXED nor STATEMENT binlog format</code><pre><code>main.connect2 [ skipped ] Requires debug build
main.mysql_client_test_comp [ skipped ] No IPv6
main.connect-abstract [ skipped ] Need Linux
main.grant_cache_ps_prot [ skipped ] Need ps-protocol
main.fix_priv_tables [ skipped ] Test need MYSQL_FIX_PRIVILEGE_TABLES
main.no-threads [ skipped ] Test requires: 'one_thread_per_connection'
main.udf_skip_grants [ skipped ] Need udf example
main.lowercase_mixed_tmpdir_innodb [ skipped ] Test requires: 'lowercase2'
main.innodb_load_xa [ skipped ] Need InnoDB plugin
mariabackup.alter_copy_excluded [ skipped ] No mariabackup
binlog.binlog_expire_warnings [ skipped ] Test needs --big-test
encryption.innodb-spatial-index [ skipped ] requires patch executable
plugins.pam_cleartext [ skipped ] Not run as user owning auth_pam_tool_dir
rpl.rpl_gtid_mdev4474 'innodb,row' [ skipped ] Neither MIXED nor STATEMENT binlog format</code></pre></div>
<p>If you are interested in one particular test, just give it as the last argument:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang"></span>
<button
class="codeblock-copy"
data-id="codeblock-id-6"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-6" style="display:none;">$ ./mariadb-test-run main.connect
Logging: ./mariadb-test-run main.connect
Creating var directory '/usr/share/mysql/mysql-test/var'...
Checking supported features...
MariaDB Version 10.11.2-MariaDB-1
- SSL connections supported
- binaries built with wsrep patch
Collecting tests...
Installing system database...
==============================================================================
TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------
main.connect [ pass ] 14206
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 14.206 of 20 seconds executing testcases
Completed: All 1 tests were successful.</code><pre><code>$ ./mariadb-test-run main.connect
Logging: ./mariadb-test-run main.connect
Creating var directory '/usr/share/mysql/mysql-test/var'...
Checking supported features...
MariaDB Version 10.11.2-MariaDB-1
- SSL connections supported
- binaries built with wsrep patch
Collecting tests...
Installing system database...
==============================================================================
TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------
main.connect [ pass ] 14206
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 14.206 of 20 seconds executing testcases
Completed: All 1 tests were successful.</code></pre></div>
<h2 id="optimize-the-mtr-run-for-speed"><a href="#optimize-the-mtr-run-for-speed" class="header-anchor"></a>Optimize the MTR run for speed
</h2><p>There are three parameters that can greatly improve how quickly MTR runs. The primary one is <code>--parallel=auto</code>, which will run MTR in parallel with as many workers as there are CPUs (by default MTR runs with just one worker). On my laptop, going from one MTR worker to 8 in parallel reduced the total run time for the <em>main</em> suite from 17 to about 3 minutes.</p>
<p>Another parameter is <code>--fast</code>, which makes MTR kill all server processes violently, without waiting for them to shut down gracefully. The test run restarts the MariaDB server hundreds of times, so saving half a second on every shutdown results in the main suite completing 20 seconds faster.</p>
<p>The parameter <code>--mem</code> instructs MTR to make the directory <code>var/</code> a symbolic link to a subdirectory on the shared memory device (<code>/dev/shm</code>). This works if the container is started with <code>--shm-size</code> and has at least 350MB of space on the ramdisk. On my laptop, this further reduced the main test suite duration down to 2½ minutes.</p>
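<p>Before using <code>--mem</code>, it is worth checking how much space is actually available on the shared memory device:</p>

```shell
# Show free space on /dev/shm; MTR's --mem needs roughly 350 MB here.
df -h /dev/shm
```

<p>If the available space is too small, restart the container with a larger <code>--shm-size</code>.</p>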
<h2 id="optimize-the-mtr-run-for-logging"><a href="#optimize-the-mtr-run-for-logging" class="header-anchor"></a>Optimize the MTR run for logging
</h2><p>When running a single test, one might add <code>--verbose</code> as an additional argument to see the commands that run in the test. It is also possible to give <code>--verbose</code> twice, but that makes the test run so verbose that it is unusable.</p>
<p>When running a suite of tests, you only want to have extra output visible if the test fails. If the server fails to start and a particular test doesn’t run at all, using <code>--verbose-restart</code> might be beneficial. If running the test in an automated system, one might want to save results in a JUnit-compatible XML file that can, for example, be <a class="link" href="https://docs.gitlab.com/ee/ci/testing/unit_test_reports.html" target="_blank" rel="noopener"
>rendered by GitLab CI</a>.</p>
<p>This is the command that several CI systems run MTR with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-7"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-7" style="display:none;">export MTR_PRINT_CORE=detailed
eatmydata perl -I. ./mariadb-test-run \
--force --testcase-timeout=120 --suite-timeout=540 --retry=3 \
--verbose-restart --max-save-core=1 --max-save-datadir=1 \
--parallel=auto --skip-rpl --suite=main \
--xml-report=mariadb-test-run-junit.xml</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>export MTR_PRINT_CORE<span style="color:#f92672">=</span>detailed
</span></span><span style="display:flex;"><span>eatmydata perl -I. ./mariadb-test-run <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --force --testcase-timeout<span style="color:#f92672">=</span><span style="color:#ae81ff">120</span> --suite-timeout<span style="color:#f92672">=</span><span style="color:#ae81ff">540</span> --retry<span style="color:#f92672">=</span><span style="color:#ae81ff">3</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --verbose-restart --max-save-core<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> --max-save-datadir<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --parallel<span style="color:#f92672">=</span>auto --skip-rpl --suite<span style="color:#f92672">=</span>main <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --xml-report<span style="color:#f92672">=</span>mariadb-test-run-junit.xml</span></span></code></pre></div></div></div>
<p>The command above also showcases use of <a class="link" href="https://manpages.debian.org/unstable/eatmydata/eatmydata.1.en.html" target="_blank" rel="noopener"
>eatmydata</a>, which makes <a class="link" href="https://manpages.debian.org/unstable/manpages-dev/fsync.2.en.html" target="_blank" rel="noopener"
>fsync</a> and similar system calls skip memory-to-disk guarantees, but in my testing with MTR, it didn’t affect speed.</p>
<h2 id="running-more-tests-for-mariadb"><a href="#running-more-tests-for-mariadb" class="header-anchor"></a>Running more tests for MariaDB
</h2><p>The above summarizes everything one typically needs to know for <em>running</em> the mariadb-test-run.</p>
<p>For a <a class="link" href="https://en.wikipedia.org/wiki/Database_administration" target="_blank" rel="noopener"
>DBA</a>, there are also several other tools that ship with MariaDB, which can help with testing and validating one’s own environment, such as <a class="link" href="https://dyn.manpages.debian.org/unstable/mariadb-test/mysql-stress-test.pl.1.en.html" target="_blank" rel="noopener"
>mariadb-stress-test</a> and <a class="link" href="https://manpages.debian.org/unstable/mariadb-client/mariadb-slap.1.en.html" target="_blank" rel="noopener"
>mariadb-slap</a>.</p>
<h2 id="writing-mtr-tests-for-mariadb"><a href="#writing-mtr-tests-for-mariadb" class="header-anchor"></a>Writing MTR tests for MariaDB
</h2><p>If you want to <em>write</em> a new test or fix an existing test, there are many more additional parameters to learn, such as <code>--record</code> and <code>--gcov</code>. The official <a class="link" href="https://mariadb.org/get-involved/getting-started-for-developers/get-code-build-test/" target="_blank" rel="noopener"
>contribution docs at mariadb.org</a> list some of the most useful MTR parameters. The <a class="link" href="https://mariadb.com/kb/en/generic-build-instructions/" target="_blank" rel="noopener"
>mariadb.com knowledge base article</a> lists them all. As with all commands in Linux, there is also the <a class="link" href="https://manpages.debian.org/unstable/mariadb-test/mysql-test-run.pl.1.en.html" target="_blank" rel="noopener"
>man page for mariadb-test-run</a>.</p>
<p>The structure of the tests is actually quite easy, and <strong>requires no C/C++ skills to write</strong>. Each test file (suffix <code>.test</code>) consists mainly of SQL code which is executed by <code>mariadb-test-run</code> (MTR), with the output compared (with <a class="link" href="https://manpages.debian.org/unstable/diffutils/diff.1.en.html" target="_blank" rel="noopener"
>diff</a>) to the corresponding file with the expected output in text format (suffix <code>.result</code>).</p>
<p>In my development <a class="link" href="https://optimizedbyotto.com/post/develop-code-10x-faster/" >workflow</a>, writing a test and running MTR might look something like this:
<img src="https://optimizedbyotto.com/post/quick-builds-and-rebuilds-of-mariadb-with-docker/mariadb-mtr-atom-autosave-entr-demo.gif"
loading="lazy"
alt="MariaDB test run automatic restart"
>
</p>
<p>If you want to contribute to the MariaDB open source project, extending the test coverage is a great place to start. To <em>scratch your own itch</em>, think about a MariaDB bug you have encountered, and consider whether it can be reproduced as a test and submitted upstream, so that it becomes part of the body of tests and is easy to catch if it ever regresses again.</p>
<p>To learn more about writing mariadb-test-run tests, read the <a class="link" href="https://mariadb.org/get-involved/getting-started-for-developers/writing-good-test-cases-mariadb-server/" target="_blank" rel="noopener"
>test case authoring guide at mariadb.org</a>.</p> How to code 10x faster than an average programmer https://optimizedbyotto.com/post/develop-code-10x-faster/Sun, 29 Jan 2023 00:00:00 +0000 https://optimizedbyotto.com/post/develop-code-10x-faster/ <img src="https://optimizedbyotto.com/post/develop-code-10x-faster/featured-image.jpg" alt="Featured image of post How to code 10x faster than an average programmer" /><p>What is the key to being an efficient programmer? Well, the answer is surprisingly simple. Having a setup where you can write and test your code over and over in an uninterrupted flow will dramatically increase your productivity.</p>
<p>The cost of doing <em>just one more tweak</em> to make the code perfect should be as close to zero as possible. The developer should not feel any drain when doing <em>just one more test</em> to ensure everything is absolutely correct. The experience should be fast and frictionless.</p>
<h2 id="instant-run-change-re-run-cycle"><a href="#instant-run-change-re-run-cycle" class="header-anchor"></a>Instant run, change, re-run cycle
</h2><p>I always try to set up my development environment in a way that I can write <strong>code in one window, and <em>immediately</em> see the result in another</strong>. It does not matter if I am doing front-end or back-end development – I insist on having the code in one window and the result update in another window as soon as I press <em>Ctrl+S</em> or switch focus between windows.</p>
<p><img src="https://optimizedbyotto.com/post/develop-code-10x-faster/atom-autosave-entr-demo.gif"
width="1200"
height="615"
loading="lazy"
alt="Atom/Pulsar autosave and Entr in action"
class="gallery-image"
data-flex-grow="195"
data-flex-basis="468px"
>
</p>
<p>I achieve this with a combination of two great developer tools:</p>
<ul>
<li>
<p>The <del>Atom</del> <a class="link" href="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/" >Pulsar code editor</a> with <a class="link" href="https://github.com/atom/autosave" target="_blank" rel="noopener"
>autosave</a> to automatically save code files.</p>
</li>
<li>
<p>The <a class="link" href="https://eradman.com/entrproject/" target="_blank" rel="noopener"
>Entr command-line tool</a> to restart programs automatically when files change.</p>
</li>
</ul>
<p>The basic usage of Entr is to list all files in your coding project with <a class="link" href="https://manpages.debian.org/unstable/findutils/find.1.en.html" target="_blank" rel="noopener"
>find</a> and pipe the list to <a class="link" href="https://manpages.debian.org/unstable/entr/entr.1.en.html" target="_blank" rel="noopener"
>entr</a> telling it what command to run when any of the files are updated:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">find * | entr python3 demo.py</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find * | entr python3 demo.py</span></span></code></pre></div></div></div>
<p>If you are working on a long-running process which does not exit on every run, such as a server app, you might want to use the <code>-r</code> and <code>-z</code> parameters to make Entr restart the program:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">find * | entr -rz node server.js</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find * | entr -rz node server.js</span></span></code></pre></div></div></div>
<p>Occasionally the workflow needs two commands, such as in this example, which compiles and runs a demo program in C. That can be achieved with the <code>-s</code> parameter, passing both commands as one quoted string:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">find * | entr -s "gcc demo.c -o demo; ./demo"</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find * | entr -s <span style="color:#e6db74">"gcc demo.c -o demo; ./demo"</span></span></span></code></pre></div></div></div>
<p><img src="https://optimizedbyotto.com/post/develop-code-10x-faster/atom-autosave-entr-multiple-commands-demo.gif"
width="1200"
height="615"
loading="lazy"
alt="Atom/Pulsar autosave and Entr in action with GCC compilation and execution"
class="gallery-image"
data-flex-grow="195"
data-flex-basis="468px"
>
</p>
<p>But what if the full development cycle includes uploading the files to a remote server? No problem: Entr and Rsync can handle that as well, and by piping the output through <code>ts</code> (from the <em>moreutils</em> package) you will also see timestamps showing when Rsync last ran.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">find * | entr rsync -avz --delete-after * example.com:/path-to-target-dir/ | ts</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find * | entr rsync -avz --delete-after * example.com:/path-to-target-dir/ | ts</span></span></code></pre></div></div></div>
<h2 id="browser-window-auto-reload"><a href="#browser-window-auto-reload" class="header-anchor"></a>Browser window auto-reload
</h2><p>The same principle applies to all development workflows, not just command-line stuff. For example this very blog is written in Markdown and is converted to static HTML pages with <a class="link" href="https://gohugo.io/" target="_blank" rel="noopener"
>Hugo</a>, which has a built-in command <code>hugo server</code> that serves the pages locally and automatically reloads pages thanks to <a class="link" href="https://www.npmjs.com/package/livereload-js" target="_blank" rel="noopener"
>livereload.js</a>.</p>
<p><img src="https://optimizedbyotto.com/post/develop-code-10x-faster/auto-hugo-server-reload-demo.gif"
width="1200"
height="615"
loading="lazy"
alt="Hugo server automatic page reload in action"
class="gallery-image"
data-flex-grow="195"
data-flex-basis="468px"
>
</p>
<h2 id="real-time-feedback-from-code-editor"><a href="#real-time-feedback-from-code-editor" class="header-anchor"></a>Real-time feedback from code editor
</h2><p>While optimizing the <em>code-change-compile-run cycle</em> is the key to productivity, additional gains can also be achieved when the code editor gives real-time visual clues of what is going on and what might be an issue.</p>
<p>The screenshot below shows how the <a class="link" href="https://github.com/AtomLinter/linter-gcc" target="_blank" rel="noopener"
>GCC linter in <del>Atom</del> Pulsar</a> adds a red dot to the line with an issue, and underlines the exact section on the line. A popup shows in-context information when the cursor is in the problematic function call.</p>
<p>In the same screenshot, note also how nicely <del>Atom</del> Pulsar shows (with a yellow hint) the filename that has uncommitted changes, and inside the file we can see the lines that have changed. The dark grey communicates that a file is excluded from git tracking with <code>.gitignore</code> (in this case the <em>demo</em> binary, as we only want to have source code in git).</p>
<p><img src="https://optimizedbyotto.com/post/develop-code-10x-faster/atom-features-git-and-gcc-linter-demo.png"
width="894"
height="558"
srcset="https://optimizedbyotto.com/post/develop-code-10x-faster/atom-features-git-and-gcc-linter-demo_hu14367765195504544949.png 480w, https://optimizedbyotto.com/post/develop-code-10x-faster/atom-features-git-and-gcc-linter-demo.png 894w"
loading="lazy"
alt="Atom/Pulsar git and linter integrations in action"
class="gallery-image"
data-flex-grow="160"
data-flex-basis="384px"
>
</p>
<p>In <del>Atom</del> <a class="link" href="https://optimizedbyotto.com/post/pulsar-best-text-file-and-code-editor/" >Pulsar</a> I have linters for all the programming languages I use, and also <a class="link" href="https://www.shellcheck.net/" target="_blank" rel="noopener"
>Shellcheck</a> for bash scripts and <a class="link" href="https://yamllint.readthedocs.io/" target="_blank" rel="noopener"
>yamllint</a> for configuration files.</p>
<h2 id="what-sets-some-programmers-apart"><a href="#what-sets-some-programmers-apart" class="header-anchor"></a>What sets some programmers apart
</h2><p>During my career in software engineering, I’ve noticed that some coders simply have an instinct for what is <em>too slow</em>. While most people keep grinding along the path they set out on when trying to solve a problem, the more passionate programmers feel frustrated if their progress is too slow. An effective programmer will switch to optimizing the speed of their progress – and only once they are happy with their velocity will they switch back to solving the original problem. This means they are eventually much faster at it, and at all similar problems in the future. Smart programmers have a great sense of when it is appropriate to stop, improve their tooling, and then let the process run much faster.</p>
<p>Next time you catch yourself wasting time on doing something repetitive and slow, stop and ask yourself why you are doing it. A good programmer will always raise the bar for what they consider acceptable development velocity.</p> Resist the urge of the first solution https://optimizedbyotto.com/post/restist-the-urge-of-quick-solutions/Sun, 11 Dec 2022 00:00:00 +0000 https://optimizedbyotto.com/post/restist-the-urge-of-quick-solutions/ <img src="https://optimizedbyotto.com/post/restist-the-urge-of-quick-solutions/featured-image.jpg" alt="Featured image of post Resist the urge of the first solution" /><p>This is the biggest and most common mistake I see that prevents people from thinking clearly, and it is also the most difficult one to unlearn.</p>
<p>Too often people rush to the first solution they see. <em>It is just too easy.</em> But the risk of the first solution not being the best one is way too high. If you want to think clearly, innovate and solve problems in engineering or just in life in general, learn this.</p>
<p>Always resist the temptation to define the problem so that it fits the solution. It must always be the other way around: first learn as much as possible about the problem. Define it well. You must first know what problem you are solving before you can even start to think about potential solutions. If you rush to the first solution you come across, you will most likely focus too much on that particular solution, with the result that you become blind to the real problem and lose the ability to see a good solution that will actually solve the problem.</p>
<p><strong>Focus your effort on the problem.</strong> Once you fully understand the problem, the solution will follow almost naturally and without too much effort, since the insight you have gained about the problem will automatically direct you towards the right solution.</p>
<h2 id="how-to-resist-the-temptation"><a href="#how-to-resist-the-temptation" class="header-anchor"></a>How to resist the temptation
</h2><p>Resisting the urge is easier said than done. There are three things I practice to break away from my bias:</p>
<p><strong>1. Write it down.</strong> Putting it in writing forces you to produce coherent sentences and thus think it through. Seeing it in writing is a quick way to distance yourself from the issue and be your own first critic.</p>
<p><strong>2. Go for a walk.</strong> Take a timeout. Grab some fresh air to elevate your mood and attentiveness. Try to forget the issue for a while. This will allow you to approach the issue from a new angle when you return to it. Sleep on it, and maybe your unconscious self will process something while you sleep. Let time pass to increase the odds that you come up with a fresh revelation about the issue. Even if no new thoughts come, the fact that time passed without any new aspects surfacing will increase the odds that you have understood the issue well.</p>
<p><strong>3. Present to or teach somebody else.</strong> Find somebody who is smart and honest. Then either in writing or in person, explain the issue and all the relevant data points. When you try to convince somebody else of your point of view, your own brain will try to anticipate counterpoints, which forces you to do your research, and when you actually present to the other person, their feedback will help you solidify your solution. The feedback from the other person does not need to be correct. If it is, that is great, but even the mere fact that you defended an idea will let you know how you feel about it.</p>
<p>If you still feel inclined after this, <em>go for it</em>.</p> Ensuring software quality with GitLab CI – case MariaDB in Debian https://optimizedbyotto.com/post/gitlab-mariadb-debian/Sun, 02 Oct 2022 00:00:00 +0000 https://optimizedbyotto.com/post/gitlab-mariadb-debian/ <img src="https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-example.png" alt="Featured image of post Ensuring software quality with GitLab CI – case MariaDB in Debian" /><p>Of <strong>all</strong> CI systems I’ve used during my software development career, both as developer and manager, <a class="link" href="https://docs.gitlab.com/ee/ci/" target="_blank" rel="noopener"
>GitLab CI</a> has proven itself to be the overall best system out there.</p>
<p>First of all, for a CI system to fulfill its purpose, it needs to test every <a class="link" href="https://optimizedbyotto.com/post/good-git-commit/" >git commit</a> to validate that no code change breaks the test suite no matter how small the change is. Having code hosting and CI integrated in the same system is the obvious way it should be, and GitLab CI does that integration well on all levels from <a class="link" href="https://git-scm.com/" target="_blank" rel="noopener"
>git</a> command-line <a class="link" href="https://docs.gitlab.com/ee/user/project/push_options.html#push-options-for-gitlab-cicd" target="_blank" rel="noopener"
>option support</a> to repository permission control to user interface in every view.</p>
<p>For developers to pay attention to tests that stop passing after their code changes, there need to be automatic and easy to read notifications. GitLab CI does email and dozens of integrations, such as Slack messages.</p>
<p>For developers to quickly find the error, root cause it and deliver a proper fix, the CI pipeline needs to be visually clean yet offer options to drill into logs and build artifacts. GitLab CI checks all the boxes here.</p>
<p>Finally, when it is time to review a code change, the way GitLab CI integrates with <a class="link" href="https://docs.gitlab.com/ee/user/project/merge_requests/" target="_blank" rel="noopener"
>GitLab Merge Requests</a> is seamless. For example, a human reviewer can, after reading the code change, choose the action <em>Merge automatically if CI pipeline passed</em> without having to attend the pipeline. Brilliant time-saver.</p>
<p><img src="https://optimizedbyotto.com/post/gitlab-mariadb-debian/gitlab-merge-request-example.png"
width="1200"
height="793"
srcset="https://optimizedbyotto.com/post/gitlab-mariadb-debian/gitlab-merge-request-example_hu1808563607855547059.png 480w, https://optimizedbyotto.com/post/gitlab-mariadb-debian/gitlab-merge-request-example.png 1200w"
loading="lazy"
alt="Example merge request"
class="gallery-image"
data-flex-grow="151"
data-flex-basis="363px"
>
</p>
<p>GitLab is also open source, so if it is missing a feature that is critical for your software development work, you can add that feature yourself (or pay somebody to do it). Being open source also brings many other benefits and ultimately sets it apart from GitHub, its traditional closed source rival.</p>
<p>Last but not least, <a class="link" href="https://docs.gitlab.com/ee/user/" target="_blank" rel="noopener"
>GitLab has excellent documentation</a> anybody can dive into easily. Therefore, instead of duplicating that by explaining the general <a class="link" href="https://about.gitlab.com/features/" target="_blank" rel="noopener"
>GitLab features and benefits</a>, I will instead <strong>showcase how it is used in a real-life project: MariaDB Debian package maintenance</strong>.</p>
<h2 id="salsa---debians-gitlab-instance"><a href="#salsa---debians-gitlab-instance" class="header-anchor"></a>Salsa - Debian’s GitLab instance
</h2><p>Debian launched <a class="link" href="https://salsa.debian.org/" target="_blank" rel="noopener"
>salsa.debian.org</a> in early 2018 as a platform for Debian developers to host the source code of Debian packages. In July of 2018, <a class="link" href="https://salsa.debian.org/salsa-ci-team" target="_blank" rel="noopener"
>Salsa-CI</a>, which provides a standardized GitLab CI pipeline template for Debian packaging quality assurance, was launched. I adopted this in August 2018 for all the packages I maintain in Debian, of which by far the biggest is the <a class="link" href="https://mariadb.org/" target="_blank" rel="noopener"
>MariaDB database server</a>.</p>
<p>Over the years it has grown into a very extensive pipeline that, <strong>in addition to the inherited general Salsa-CI steps, also runs a variety of additional test jobs</strong>, including:</p>
<ul>
<li>Building MariaDB in parallel on multiple Debian releases and processor architectures</li>
<li>Building consumers of the MariaDB Client C library to ensure the interface stays stable</li>
<li>Upgrading old versions of MariaDB to the latest one, both full server upgrades and partial small upgrades, such as the client library upgrades</li>
<li>Upgrading various versions of MySQL, Percona and others to ensure that cross-upgrades from MariaDB variants work</li>
<li>Upgrading various combinations of Debian releases and MariaDB, simulating full system upgrades</li>
<li>Running static analysis to detect security issues and general software quality issues</li>
</ul>
<p><img src="https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-example.png"
width="2648"
height="1335"
srcset="https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-example_hu10652549863109246822.png 480w, https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-example_hu9306486898252685304.png 1024w, https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-example.png 2648w"
loading="lazy"
alt="Example pipeline"
class="gallery-image"
data-flex-grow="198"
data-flex-basis="476px"
>
</p>
<p>Not only does the pipeline do all this, but it is also optimized to use <a class="link" href="https://ccache.dev/" target="_blank" rel="noopener"
>ccache</a> and other techniques to run as fast as possible. For details on exactly what the pipeline does, one can simply read the file <a class="link" href="https://salsa.debian.org/mariadb-team/mariadb-server/-/blob/debian/latest/debian/salsa-ci.yml" target="_blank" rel="noopener"
><code>debian/salsa-ci.yml</code></a>. Normal GitLab CI uses the file <code>.gitlab-ci.yml</code> in the project root, but since in Debian packaging one is only allowed to modify files under the <code>debian</code> sub-directory, the file resides at this customized path.</p>
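<p>For illustration, such a file can be minimal, because most of the pipeline is pulled in from the shared template. The sketch below is based on the Salsa-CI team’s documented boilerplate and is not the actual MariaDB file, which defines many additional jobs:</p>

```yaml
# Minimal debian/salsa-ci.yml sketch (assumed boilerplate, not the MariaDB file):
# reuse the shared pipeline definition and job list maintained by the Salsa-CI team
include:
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml

variables:
  RELEASE: 'unstable'   # Debian release to build and test against
```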
<p>The fact that the structure and all the steps the CI runs are defined in the code repository itself is very powerful. <strong>Anybody inspecting a pipeline run of a specific git commit can always find the specific version of the GitLab CI definition in the source code of that exact git commit itself.</strong> This is vastly superior to the structure e.g. Buildbot or Jenkins uses, where the pipeline is defined separately from the code it tests.</p>
<p>Not only does this make it much easier to read the pipeline steps, but it also makes contributing to the pipeline code as straightforward as filing a Merge Request on the repository, just like with any other file. The fact that the CI code and actual software code are together in the same repository makes it much easier to enforce a rule that CI must always pass, as any commit that changes the software behavior can at the same time also update the CI pipeline to account for that intentional change in behavior. Needless to say, GitLab CI works seamlessly with the <a class="link" href="https://salsa.debian.org/help/user/project/protected_branches" target="_blank" rel="noopener"
>protected branches</a> feature and Merge Requests in GitLab to ensure easy and sensible rules to <strong>enforce that the mainline always stays green</strong>.</p>
<p>Using standard GitLab CI features, there is also a <a class="link" href="https://salsa.debian.org/mariadb-team/mariadb-server/-/pipeline_schedules" target="_blank" rel="noopener"
>scheduled monthly rebuild</a> that reveals if an update in any of the dependencies causes the pipeline to fail in the absence of new code commits.</p>
<h2 id="case-example-mariadb-1069-incompatibility-with-percona-xtradb-57-temporary-table-format"><a href="#case-example-mariadb-1069-incompatibility-with-percona-xtradb-57-temporary-table-format" class="header-anchor"></a>Case example: MariaDB 10.6.9 incompatibility with Percona XtraDB 5.7 temporary table format
</h2><p>To illustrate GitLab CI in action, let’s take a look at a recent example in which it prevented breaking upgrades for (some) Debian users. In June 2022, the <a class="link" href="https://salsa.debian.org/mariadb-team/mariadb-server/-/pipelines/386929" target="_blank" rel="noopener"
>Salsa-CI pipeline for MariaDB 10.6.8</a> was all green and upgrades from Percona XtraDB 5.7 were passing flawlessly. However, after importing a new upstream minor maintenance release on August 16th, the <a class="link" href="https://salsa.debian.org/mariadb-team/mariadb-server/-/pipelines/411764" target="_blank" rel="noopener"
>Salsa-CI pipeline for MariaDB 10.6.9</a> started failing on the Percona upgrade. From the <a class="link" href="https://salsa.debian.org/mariadb-team/mariadb-server/-/jobs/3110875" target="_blank" rel="noopener"
>CI job log</a>, it was easy to see that MariaDB failed to start with the <code>/var/lib/mysql</code> data directory from Percona XtraDB 5.7. From the build artifacts, it was easy to inspect the precise error message from the MariaDB server (as build artifacts are configured to expire after 30 days, I can’t link to them for reference).</p>
<p><img src="https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-percona-job-3110875-failure.png"
width="1548"
height="1302"
srcset="https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-percona-job-3110875-failure_hu8037768173750809726.png 480w, https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-percona-job-3110875-failure_hu2608142498116434987.png 1024w, https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-percona-job-3110875-failure.png 1548w"
loading="lazy"
alt="MariaDB Server startup failure on upgrade from Percona XtraDB Server"
class="gallery-image"
data-flex-grow="118"
data-flex-basis="285px"
>
</p>
<p>This quickly led to filing <a class="link" href="https://jira.mariadb.org/browse/MDEV-29321" target="_blank" rel="noopener"
>MDEV-29321</a> on August 18th. As the failure was caught by CI, it was easy to provide the steps to reproduce in the bug report, along with logs and CI job references. This helped the upstream developer to immediately pinpoint the issue and post a patch to fix it, which was <a class="link" href="https://salsa.debian.org/illuusio/mariadb-server/-/jobs/3122466" target="_blank" rel="noopener"
>validated by a Salsa-CI test run</a> on a temporary development branch. The fix was <a class="link" href="https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/39fe420b64afa157108dbaced14821cc9652ff3d" target="_blank" rel="noopener"
>applied on mainline</a> in the Debian packaging repository of MariaDB on August 24th.</p>
<p>Such a quick turnaround time would not have been possible without a good CI system. <strong>The process clearly benefited from the very clear user interface of GitLab CI</strong> that made it easy for all parties – even those who didn’t have prior experience of GitLab CI – to read the pipeline and inspect the logs.</p>
<h2 id="computers-are-good-at-repeating-the-same-tasks-over-and-over-humans-are-good-at-exploring-visual-things"><a href="#computers-are-good-at-repeating-the-same-tasks-over-and-over-humans-are-good-at-exploring-visual-things" class="header-anchor"></a>Computers are good at repeating the same tasks over and over; humans are good at exploring visual things
</h2><p>Through the years, Salsa-CI (GitLab CI) has proven incredibly valuable – it has been able to catch the tiniest packaging mistake immediately as the commit is pushed to Salsa (GitLab) and the git mainline stays in a condition that can be shipped at any time. This makes it possible to import new upstream releases at any given time, and to upload them to Debian with a high confidence that nothing will break. Before Salsa-CI, the MySQL and MariaDB packages had hundreds of open bugs in Debian (and also Ubuntu, which inherits the packages from Debian). <strong>Now genuine new bugs are rare, staying consistently in the lower tens.</strong></p>
<p>If you are developing software professionally but not using CI in all your projects, you should definitely start now. Computers excel at running repetitive tasks over and over, and CI is exactly that. No human would have the diligence to always test everything. Humans tend to test code changes only when they have doubts, and thus most <em>bugs slip in when a human makes a small change they don’t think can break anything – and then it breaks</em>. These cases always come as a surprise, and you don’t want to discover them in production – better to delegate the validation to a CI system.</p>
<p>What humans are good at is visual inspection. A large part of our brain cortex is devoted to vision, so we should leverage it. GitLab CI does a great job converting the repetitive CI test run results into pipelines with various colors and symbols, perfect for human consumption. Humans also have an eye for beauty and elegance, and I personally enjoy exploring the GitLab CI pipelines. If you haven’t already, please try it out, and enjoy the warm fuzzy feeling of seeing all green pipelines!</p>
<p><img src="https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-badges.png"
width="1542"
height="498"
srcset="https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-badges_hu14464022739993467323.png 480w, https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-badges_hu5879156277011565549.png 1024w, https://optimizedbyotto.com/post/gitlab-mariadb-debian/mariadb-salsa-ci-badges.png 1542w"
loading="lazy"
alt="GitLab badges on the Salsa-CI MariaDB project page"
class="gallery-image"
data-flex-grow="309"
data-flex-basis="743px"
>
</p>
<h2 id="so-easy-and-intuitive-you-can-start-using-it-without-reading-lengthy-manuals"><a href="#so-easy-and-intuitive-you-can-start-using-it-without-reading-lengthy-manuals" class="header-anchor"></a>So easy and intuitive you can start using it without reading lengthy manuals
</h2><p>This blog was just a sneak peek into what GitLab CI can do – there are so many useful features, such as scheduled pipeline runs (to detect regressions introduced from updated dependencies), automatic repository mirroring, and many UI goodies, such as badges. I recommend skimming through the <a class="link" href="https://docs.gitlab.com/ee/ci/" target="_blank" rel="noopener"
>GitLab CI documentation</a> and in particular the <a class="link" href="https://docs.gitlab.com/ee/ci/yaml/" target="_blank" rel="noopener"
><code>.gitlab-ci.yml</code> reference</a> and then just start playing around with it. GitLab.com has a free plan for basic use which (unlike GitHub.com) also includes private repositories.</p>
<p>In short, GitLab CI makes the life of a programmer (or a team of programmers) significantly more productive by ensuring that tests run all the time, failures are quick to detect, and the whole test system is handy to maintain and evolve together with the code. Using it feels like a breeze, both because the features work just like one would expect and because the user interface is very easy to navigate. There is rarely a need to read documentation to get started – just try GitLab CI today and discover its power yourself!</p>
<blockquote>
<p>Want to read more? See also the <a class="link" href="https://about.gitlab.com/blog/2023/09/19/debian-customizes-ci-tooling-with-gitlab/" target="_blank" rel="noopener"
>GitLab.com blog post “Debian customizes CI tooling with GitLab”</a></p>
</blockquote> Stop the senseless killing https://optimizedbyotto.com/post/stop-senseless-killing/Sun, 18 Sep 2022 00:00:00 +0000 https://optimizedbyotto.com/post/stop-senseless-killing/ <img src="https://optimizedbyotto.com/post/stop-senseless-killing/featured-image.jpg" alt="Featured image of post Stop the senseless killing" /><p>In over 22 years of working with Linux systems, I have often seen people use excessive force to kill various computer programs, causing unnecessary suffering.</p>
<p>A typical example would be:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-0"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-0" style="display:none;">$ killall -9 mysqld</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ killall -9 mysqld</span></span></code></pre></div></div></div>
<p>Please <strong>stop</strong> using this command. Both <code>killall</code> and the parameter <code>-9</code> are harmful when dealing with programs that store data, like the MySQL or MariaDB database, and better alternatives exist.</p>
<h2 id="avoid-killing-in-vain"><a href="#avoid-killing-in-vain" class="header-anchor"></a>Avoid killing in vain
</h2><p>The <code>9</code> means signal 9, which is SIGKILL. All standard Linux signals can easily be listed with:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-1"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-1" style="display:none;">$ kill -L
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ kill -L
</span></span><span style="display:flex;"><span> 1<span style="color:#f92672">)</span> SIGHUP 2<span style="color:#f92672">)</span> SIGINT 3<span style="color:#f92672">)</span> SIGQUIT 4<span style="color:#f92672">)</span> SIGILL 5<span style="color:#f92672">)</span> SIGTRAP
</span></span><span style="display:flex;"><span> 6<span style="color:#f92672">)</span> SIGABRT 7<span style="color:#f92672">)</span> SIGBUS 8<span style="color:#f92672">)</span> SIGFPE 9<span style="color:#f92672">)</span> SIGKILL 10<span style="color:#f92672">)</span> SIGUSR1
</span></span><span style="display:flex;"><span>11<span style="color:#f92672">)</span> SIGSEGV 12<span style="color:#f92672">)</span> SIGUSR2 13<span style="color:#f92672">)</span> SIGPIPE 14<span style="color:#f92672">)</span> SIGALRM 15<span style="color:#f92672">)</span> SIGTERM</span></span></code></pre></div></div></div>
<p>The Linux kernel sends most of these signals to the program itself, and we expect programs to have code to handle various signals. One exception is signal number 9 (<a class="link" href="https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html#index-SIGKILL" target="_blank" rel="noopener"
>SIGKILL</a>). SIGKILL terminates the program immediately: the program cannot catch, block, or ignore it, so it is always fatal. If <code>kill</code> is issued without a signal argument, it defaults to signal 15 (<a class="link" href="https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html#index-SIGTERM" target="_blank" rel="noopener"
>SIGTERM</a>), which requests the running process to terminate itself and clean up file handles and other resources it was using. Thus, always <em>prefer SIGTERM</em> and <em>only use SIGKILL as a last resort</em>.</p>
<p>Violently killing programs that store data can cut off a write halfway through, leaving inconsistent data or even unusable, corrupted files. Even when the program is not a database or similar, an immediate kill drops everything mid-flight, potentially causing unnecessary harm to other programs communicating with it over the network. In some rare cases, poorly terminated programs turn into zombie processes, showing up in the Linux process listing as <code>&lt;defunct&gt;</code>. A zombie lingers until its parent process reaps it or exits; if the parent never does, only a restart of the operating system clears it. So, please, do not kill without trying gentler means first.</p>
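<p>The difference is easy to demonstrate. The sketch below (file paths are illustrative) shows a shell script that traps SIGTERM and shuts down cleanly – something no program can do when it receives SIGKILL:</p>

```shell
# Write a small script that handles SIGTERM gracefully (illustrative paths)
cat > /tmp/trap-demo.sh <<'EOF'
trap 'echo "caught SIGTERM, cleaning up"; exit 0' TERM
sleep 5 &
wait
EOF

sh /tmp/trap-demo.sh > /tmp/trap-demo.log &
pid=$!
sleep 1
kill "$pid"               # default signal 15 (SIGTERM): the trap handler runs
wait "$pid"
cat /tmp/trap-demo.log    # prints: caught SIGTERM, cleaning up
```

<p>Replacing <code>kill "$pid"</code> with <code>kill -9 "$pid"</code> terminates the script instantly: the trap handler never fires and the log stays empty.</p>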
<h2 id="dont-be-senseless"><a href="#dont-be-senseless" class="header-anchor"></a>Don’t be senseless
</h2><p>Let’s dive into the first part of the command in the opening paragraph, which is also wrong.</p>
<p>The command <code>killall</code> has been around in Unix and its descendants, such as Linux, since 1983. It does the job. However, there are also more modern tools, which I prefer, like <a class="link" href="https://en.wikipedia.org/wiki/Pkill" target="_blank" rel="noopener"
>pkill</a> from 1998.</p>
<p>The program <code>pkill</code> is superior because it has the parameter <code>-e</code> to display what is killed (or terminated). Those who still use <code>killall</code> but don’t want to be senseless also use variations of <code>ps ax | grep <processname></code> to see which processes are running before and after the killing.</p>
<p>Compare the clarity of these two equivalent examples:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-2"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-2" style="display:none;">$ killall mysqld
$ killall mysqld
mysqld: no process found</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ killall mysqld
</span></span><span style="display:flex;"><span>$ killall mysqld
</span></span><span style="display:flex;"><span>mysqld: no process found</span></span></code></pre></div></div></div>
<p>The <code>killall</code> command stays silent when it succeeds and only reports errors. In contrast, <code>pkill</code> with <code>-e</code> reports which process it terminated and is silent only when it did nothing.</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-3"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-3" style="display:none;">$ pkill -e mysqld
mysqld_safe killed (pid 3911)
mysqld killed (pid 4011)
$ pkill -e mysqld
(nothing)</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ pkill -e mysqld
</span></span><span style="display:flex;"><span>mysqld_safe killed <span style="color:#f92672">(</span>pid 3911<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>mysqld killed <span style="color:#f92672">(</span>pid 4011<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>$ pkill -e mysqld
</span></span><span style="display:flex;"><span><span style="color:#f92672">(</span>nothing<span style="color:#f92672">)</span></span></span></code></pre></div></div></div>
<h2 id="the-bonus-of-being-sensible"><a href="#the-bonus-of-being-sensible" class="header-anchor"></a>The bonus of being sensible
</h2><p>A bonus for system administrators using <code>pkill</code> is that they can learn something valuable by observing the effects of their commands. For example, a database sysadmin could encounter the following:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-4"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-4" style="display:none;">$ pkill -9 -e mariadbd
mariadbd killed (pid 5281)
$ pkill -9 -e mariadbd
mariadbd killed (pid 5383)
$ pkill -9 -e mariadbd
mariadbd killed (pid 5412)
$ pkill -e mariadbd
mariadbd killed (pid 5497)
$ pkill -e mariadbd
(nothing)</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ pkill -9 -e mariadbd
</span></span><span style="display:flex;"><span>mariadbd killed <span style="color:#f92672">(</span>pid 5281<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>$ pkill -9 -e mariadbd
</span></span><span style="display:flex;"><span>mariadbd killed <span style="color:#f92672">(</span>pid 5383<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>$ pkill -9 -e mariadbd
</span></span><span style="display:flex;"><span>mariadbd killed <span style="color:#f92672">(</span>pid 5412<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>$ pkill -e mariadbd
</span></span><span style="display:flex;"><span>mariadbd killed <span style="color:#f92672">(</span>pid 5497<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>$ pkill -e mariadbd
</span></span><span style="display:flex;"><span><span style="color:#f92672">(</span>nothing<span style="color:#f92672">)</span></span></span></code></pre></div></div></div>
<p>With MariaDB, if the main server process dies abruptly (as with signal 9/SIGKILL), the wrapper <code>mysqld_safe</code> (if used) detects this and automatically restarts the server, assuming it died in a crash and should be brought back up. However, when the proper signal 15/SIGTERM is used (the default in <code>pkill</code> when no signal is given), the <code>mariadbd</code> process shuts itself down intentionally, and the wrapper respects that instead of restarting it.</p>
<h2 id="the-procps-package"><a href="#the-procps-package" class="header-anchor"></a>The procps package
</h2><p>The command <code>pkill</code> is part of the <a class="link" href="https://gitlab.com/procps-ng/procps" target="_blank" rel="noopener"
>procps suite</a>. You can find it by default on most Linux systems, but if you don’t have it, run <code>apt install procps</code> or <code>yum install procps</code> to get it. The suite also includes the command <code>pgrep</code>, which replaces manual <code>ps ax | grep <processname></code> invocations, as mentioned above.</p>
<p>An example of <code>pgrep</code> in action:</p>
<div class="codeblock ">
<header>
<span class="codeblock-lang">shell</span>
<button
class="codeblock-copy"
data-id="codeblock-id-5"
data-copied-text="Copied!"
>
Copy
</button>
</header>
<code id="codeblock-id-5" style="display:none;">$ pgrep -af mysqld
5577 /bin/sh /usr/bin/mysqld_safe
5676 /usr/sbin/mariadbd --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --skip-log-error --pid-file=/run/mysqld/mysqld.pid --socket=/run/mysqld/mysqld.sock
5677 logger -t mysqld -p daemon error</code><div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ pgrep -af mysqld
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">5577</span> /bin/sh /usr/bin/mysqld_safe
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">5676</span> /usr/sbin/mariadbd --basedir<span style="color:#f92672">=</span>/usr --datadir<span style="color:#f92672">=</span>/var/lib/mysql --plugin-dir<span style="color:#f92672">=</span>/usr/lib/mysql/plugin --user<span style="color:#f92672">=</span>mysql --skip-log-error --pid-file<span style="color:#f92672">=</span>/run/mysqld/mysqld.pid --socket<span style="color:#f92672">=</span>/run/mysqld/mysqld.sock
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">5677</span> logger -t mysqld -p daemon error</span></span></code></pre></div></div></div>
<p>To see the complete list of options, run <code>pkill --help</code> and <code>pgrep --help</code> or <code>man pkill</code> to read <a class="link" href="https://man7.org/linux/man-pages/man1/pgrep.1.html" target="_blank" rel="noopener"
>the man page</a>. I recommend that all system administrators make a general habit of checking <code>--help</code> and <code>man <command></code> so they can act more sensibly on the command line.</p>
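<p>As a small taste of what those man pages cover, here is a sketch of two matching options worth knowing, again using a throwaway <code>sleep</code> process with a distinctive command line:</p>

```shell
# Start a throwaway process with a distinctive command line
sleep 297 &
sleep 1  # give the background process a moment to start

# By default pgrep/pkill match against the process name only;
# -x requires an exact name match, -f matches the full command line
pgrep -x sleep >/dev/null && echo "found by exact name"
pgrep -f 'sleep 297' >/dev/null && echo "found by full command line"

# Clean up, letting -e report what was terminated
pkill -e -f 'sleep 297'
```

<p>Matching the full command line with <code>-f</code> is handy when several unrelated processes share a binary name, such as multiple interpreters running different scripts.</p>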
<p>Most importantly, don’t use signal 9 (SIGKILL) on databases other than as the last resort. Please avoid any senseless killings!</p> Truth, health and wealth https://optimizedbyotto.com/post/truth-health-wealth/Sat, 13 Aug 2022 00:00:00 +0000 https://optimizedbyotto.com/post/truth-health-wealth/ <img src="https://optimizedbyotto.com/post/truth-health-wealth/featured-image.jpg" alt="Featured image of post Truth, health and wealth" /><p>With so many options to choose from and with so many solutions offered by others, each screaming <em>pick me!</em>, people too often lose sight of the actual goal. In the most complex system of them all, life itself, I find it much easier to navigate when the priorities are clear. Money is not number one. It is just a means to an end. In my opinion, the best choice of priorities, in this order, are: truth, health and wealth.</p>
<p><strong>1. Truth</strong> is about the ideas themselves. The meaning of everything. Don’t live in a lie. Don’t give in when you know it’s not the right thing to do. If you sacrifice truth, you will lose everything else.</p>
<p><strong>2. Health</strong> is about yourself and about humans around you. If you lose your health, then you lose your life. The only thing worth risking your health on is the truth.</p>
<p><strong>3. Wealth</strong> is what gives you power and influence over your environment. It is the tool you can use to effect change. As they say, money makes the world go around. But it is not always simply money; wealth comes in many forms and there are many ways to accumulate it. Knowledge is a form of wealth. Wealth is something everybody can have. The wealth of one does not have to take away from another.</p>
<p>Keep these priorities clear. If you do not prioritize the truth first and foremost, you will live in an illusion waiting to vanish. Wealth gained at the expense of the truth lacks a stable foundation and might fall apart in an instant. If you don’t have health, you can’t use your wealth.</p>
<p>Setting these three priorities in the correct order is the key to optimizing life.</p> The optimal https://optimizedbyotto.com/post/the-optimal/Mon, 08 Aug 2022 00:00:00 +0000 https://optimizedbyotto.com/post/the-optimal/ <img src="https://optimizedbyotto.com/post/the-optimal/featured-image.jpg" alt="Featured image of post The optimal" /><blockquote>
<p>Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.</p><span class="cite"><span>― </span><span>Antoine de Saint-Exupéry, aviator and author</span><cite></cite></span></blockquote>
<p>This is my favorite quote, and it applies particularly well to our contemporary world, where we have access to anything at any time. Any piece of information can be in front of us in an instant, and pretty much any physical object can reach our doorstep in a matter of days if we want it to. But having a lot of everything does not bring happiness – more often it just adds to the pile of garbage, whether in physical form or mental construction.</p>
<p>I love optimizing things. To me, it is a journey towards the truth. Having just enough, but not more, is what I seek in my work, in designing and building things where the construction material is thoughts and ideas that combine to form constructions of processes and software. I apply the same principles of optimization to myself, my physical body, my life, and the physical world around me.</p>
<p>This blog is a collection of small stories about how I optimized various things. Finding out what truly matters and focusing on the correct things, while leaving the rest out, leads to end results that can be described as elegant. This beauty can be found throughout nature and physics, where things that have found their optimal state are often the ones that we humans describe as elegant and serene. My optimal might not be optimal for everybody. What fits my environment might not be the best fit everywhere. But hopefully the stories of how I got there can spark inspiration in others regarding how to optimize in their own lives, in their own surroundings and situations.</p> About me https://optimizedbyotto.com/about/Mon, 01 Jan 0001 00:00:00 +0000 https://optimizedbyotto.com/about/ <p><img src="https://optimizedbyotto.com/about/otto-kekalainen.jpg"
width="395"
height="600"
srcset="https://optimizedbyotto.com/about/otto-kekalainen.jpg 395w"
loading="lazy"
alt="Otto"
class="gallery-image"
data-flex-grow="65"
data-flex-basis="158px"
>
</p>
<p>I am a results-driven technology executive with 25+ years of professional experience and a rich blend of strategic leadership, product development expertise, and deep-seated passion for open-source software. I am currently working as an independent consultant. I was previously a Software Development Manager for the core engine team delivering the Amazon RDS for MySQL and MariaDB database services. Before that, I was the CEO of <a class="link" href="https://seravo.com" target="_blank" rel="noopener"
>Seravo</a> until 2021, and the CEO of <a class="link" href="https://mariadb.org/" target="_blank" rel="noopener"
>MariaDB Foundation</a> until 2018.</p>
<p>I have a strong bias for action and a proven history of transforming and growing organizations by leveraging a unique blend of technological acumen and business strategy, which I hope to share in this blog. My goal is to help readers gain a deeper understanding of open source software, software engineering in general, management in technical fields, business improvement, life hacks, and much more.</p>
<p>I like spending my free time hiking, running and contributing to various open source projects. I am an active Debian and Ubuntu developer. See my <a class="link" href="https://github.com/ottok" target="_blank" rel="noopener"
>GitHub</a>, <a class="link" href="https://gitlab.com/ottok" target="_blank" rel="noopener"
>GitLab</a> and <a class="link" href="https://salsa.debian.org/otto" target="_blank" rel="noopener"
>Salsa</a> profiles for recent contributions.</p>
<p>You can follow me on <a class="link" href="https://mastodon.social/@ottok" target="_blank" rel="noopener"
>Mastodon</a>, <a class="link" href="https://twitter.com/OttoKekalainen" target="_blank" rel="noopener"
>Twitter</a>, <a class="link" href="https://warpcast.com/ottok" target="_blank" rel="noopener"
>Farcaster</a>, <a class="link" href="https://bsky.app/profile/ottoke.bsky.social/" target="_blank" rel="noopener"
>Bluesky</a> or connect on <a class="link" href="https://linkedin.com/in/ottokekalainen" target="_blank" rel="noopener"
>LinkedIn</a>.</p>
<h2 id="lets-connect"><a href="#lets-connect" class="header-anchor"></a>Let’s Connect
</h2><p>Interested in discussing open source strategy, Debian contributions, or how I can help your organization?</p>
<p><a class="link" href="https://cal.com/ottok" target="_blank" rel="noopener"
><strong>Book a chat with me on Cal.com</strong></a></p>
<p>You can schedule a quick 15-minute introduction call for free, or book a 1-hour consultation for in-depth problem-solving.</p>
<p>You can also reach me by e-mail to <em>otto at debian.org</em>.</p> Archives https://optimizedbyotto.com/archives/Mon, 01 Jan 0001 00:00:00 +0000 https://optimizedbyotto.com/archives/ Debian Mentoring https://optimizedbyotto.com/mentoring/Mon, 01 Jan 0001 00:00:00 +0000 https://optimizedbyotto.com/mentoring/ <img src="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/featured-image.jpg" alt="Featured image of post Debian Mentoring" /><h2 id="why-i-do-mentoring"><a href="#why-i-do-mentoring" class="header-anchor"></a>Why I do mentoring
</h2><p><a class="link" href="https://en.wikipedia.org/wiki/Debian" target="_blank" rel="noopener"
>Debian</a> is one of the oldest and largest Linux distributions, known for its technical quality and stability. If you are a software developer, but have never tried Debian, download <a class="link" href="https://cdimage.debian.org/debian-cd/current-live/amd64/bt-hybrid/debian-live-13.0.0-amd64-gnome.iso.torrent" target="_blank" rel="noopener"
>Debian 13 live</a>, <a class="link" href="https://www.debian.org/CD/faq/#live-cd" target="_blank" rel="noopener"
>make a bootable USB drive</a> and take it for a test drive.</p>
<p><a class="link" href="https://en.wikipedia.org/wiki/Ubuntu" target="_blank" rel="noopener"
>Ubuntu</a>, the world’s most popular Linux distribution, and many other contenders such as Linux Mint, MX, Pop!_OS and Zorin, are all <strong>based on Debian</strong>. Contributing to Debian is an excellent way to help improve all of these and make the Linux ecosystem better overall. That is why I have been contributing to Debian and Ubuntu since the early 2000s. If you subscribe to the ideals and values of open source software and believe it makes the modern information society better for everyone, you should be contributing too.</p>
<p>In the spring of 2025 <a class="link" href="https://optimizedbyotto.com/post/full-time-open-source-developer/" >I quit my job to work on Debian development</a> and other open source projects full-time. I am <strong>also devoting some of my time to mentoring new aspiring Debian</strong> (and Ubuntu) contributors. If you want me to be your <a class="link" href="https://wiki.debian.org/Mentors" target="_blank" rel="noopener"
>Debian Mentor</a>, feel free to send me an email to introduce yourself.</p>
<h2 id="what-i-expect-from-you"><a href="#what-i-expect-from-you" class="header-anchor"></a>What I expect from you
</h2><p>I am a good mentor for you if:</p>
<ul>
<li>
<p><strong>You are keen to learn</strong> the best possible way to do something, and want to do more than just the quick and easy thing.</p>
</li>
<li>
<p><strong>You have read my blog</strong>, in particular posts tagged <a class="link" href="https://optimizedbyotto.com/tags/debian/" >Debian</a>, <a class="link" href="https://optimizedbyotto.com/tags/git/" >Git</a> and <a class="link" href="https://optimizedbyotto.com/tags/open-source/" >Open Source</a>, and you already know the basics of high-quality software engineering.</p>
</li>
<li>
<p><strong>You tend to agree with the engineering principles I express in my blog</strong>, such as <em>writing good git commit messages is important</em> and that <em>continuous integration testing</em> and <em>code reviews</em> should be an integral part of modern open source development.</p>
</li>
<li>
<p><strong>You respect my time as a limited and valuable resource</strong>, and when I spend time reviewing your Merge Requests and writing feedback, you in return spend time reading and reflecting on the feedback properly.</p>
</li>
<li>
<p><strong>You use AI to enhance learning, not to replace it.</strong> It is not a good idea to use an LLM to generate a bunch of garbage and send it to somebody else to review. It is, however, a good idea to use LLMs to review your code, help you debug issues, explain software engineering concepts, and find specific information.</p>
</li>
<li>
<p><strong>You act responsibly</strong> and understand that even though we are all volunteers and Debian won’t pay you any salary, volunteering is still a commitment and comes with responsibilities. Our open source work is public and intentionally bad acts will tarnish your reputation.</p>
</li>
<li>
<p><strong>You want to solve problems once and for all, literally.</strong> While working on a Debian package we often uncover problems that affect multiple other Debian packages, or that are inherited from the upstream project. Thanks to open source it is possible to find the source of any problem (pun intended) and submit a fix there, so let’s strive to always do that when feasible. It is initially more work, but will have a larger long-term impact. I operate with this mindset, and hope you adopt it too.</p>
</li>
</ul>
<h2 id="how-i-typically-do-mentoring"><a href="#how-i-typically-do-mentoring" class="header-anchor"></a>How I typically do mentoring
</h2><p>I am able to scale and efficiently mentor multiple people in parallel because I have a good workflow based on constantly checking <a class="link" href="https://salsa.debian.org/dashboard/merge_requests?reviewer_username=otto" target="_blank" rel="noopener"
>my open Merge Requests</a>, actively following up on them and jumping between them. I do <strong>not</strong> use the <a class="link" href="https://mentors.debian.net/" target="_blank" rel="noopener"
>mentors.debian.net</a> website, as I find <a class="link" href="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/" >using Merge Requests on Salsa</a> far more efficient for submitting, reviewing, re-submitting and re-reviewing Debian packages.</p>
<p>Therefore I ask all my mentees to start their Debian journey by signing up for an account at Salsa, and plan their work in a way that all packaging improvements, new package versions and totally new packages result in <a class="link" href="https://optimizedbyotto.com/post/debian-salsa-merge-request-best-practices/" >Merge Requests</a> we can collaborate on.</p>
<p>For instant messaging I prefer <a class="link" href="https://en.wikipedia.org/wiki/Matrix_%28protocol%29" target="_blank" rel="noopener"
>Matrix</a>, and I also enjoy video or voice calls and meeting in person if possible.</p>
<h2 id="where-to-start"><a href="#where-to-start" class="header-anchor"></a>Where to start?
</h2><p>If you haven’t read my blog before but are interested in my mentoring offer, you probably want to start by reading <a class="link" href="https://optimizedbyotto.com/post/debian-maintainer-habits/" >10 habits to help becoming a Debian maintainer</a>.</p>
<p>If you want to contribute to Debian, but don’t already have an existing <em>itch you want to scratch</em>, I recommend installing the <a class="link" href="https://manpages.debian.org/unstable/how-can-i-help/how-can-i-help.1.en.html" target="_blank" rel="noopener"
>how-can-i-help</a> tool on your Debian/Ubuntu system and running it to get a list of open issues regarding software on your system, so that if you help fix them, you will benefit from the improvements yourself directly. Open source is not about altruism but about freedom to improve things collaboratively with others and get better software for yourself.</p>