Tuesday, March 26, 2024

Thunar sort order

For many years I have run Debian with xfce4 desktop.

I have always hated the way Thunar sorts files by name. The developers would say it is not a Thunar issue but rather a Gtk issue. But I really don't care: I hate it whatever the root cause. It is made worse by the fact that there is absolutely no configuration possible other than collation options (LC_ALL and LC_COLLATE), and these don't fix the problem.

So, today I dug in to fix it.

I tried several other file browsers, but they all have slight variations of the same nonsense and I didn't find any that sort file names sensibly (e.g. like ls does).
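
To make the difference concrete: given files named a1, a10 and a2, ls with C collation lists them in strict lexicographic order:

$ LC_ALL=C ls
a1
a10
a2

whereas Gtk's 'natural' sort gives a1, a2, a10, treating runs of digits as numbers.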

There are many bug reports against Thunar, Nautilus and several others, reporting that the available sorting options are not satisfactory.

There are so many, it makes me wonder why there isn't a simple plug-in option to separate sorting from other aspects of the file browser, allowing people to easily develop the sorting algorithm they need. Gtk could provide this, but so could the various file browsers. It seems all the browsers I investigated in any detail delegate sorting to Gtk and refuse to add any features to work around its limitations and stupidity (i.e. 'natural' sorting).

I read many bug reports and posts, with the conclusion that there is no configuration option to fix it.

I tried building Thunar from source, thinking I would hack on it to fix the sorting or add an interface to an external sort implementation, but there were too many dependencies and I got tired of installing them.

But in reading through all the bug reports, I came across GlibSortFileNameHackLibrary so I gave it a try. I cloned the repo and tried to build it.

On Debian Bookworm, despite having built various other packages, I still had to install libglib2.0-dev:

$ sudo apt install libglib2.0-dev

Then:

$ make all

This terminates with an error compiling test.c, but that's only a test program. The library builds OK.

Then, from a command window:

$ thunar -q; LD_PRELOAD=./glibSortFileNameHack.so thunar

This launched a new Thunar instance and files were sorted sensibly (as opposed to the Gtk idea of 'naturally'). It was wonderful! Finally, I can find files in Thunar without wasting time scrolling up and down to try to figure out where they have been misplaced.

I couldn't find any trace of a Thunar background process running after I logged in (i.e. ps -ef showed nothing with 'thunar' in the command line), so I guessed there isn't one and didn't worry about why.

I use the Whisker menu but don't really know how to configure it. When I tried prefixing the command for the file browser launcher with LD_PRELOAD=/path/to/glibSortFileNameHack.so, I got an error that LD_PRELOAD isn't executable: the launcher runs the command directly, without a shell, so the environment assignment is treated as the program name.

So, I made a bash script:

#!/bin/bash
# Preload the sort hack, then run Thunar with any arguments the launcher passes.
LD_PRELOAD=/home/ian/lib/glibSortFileNameHack.so thunar "$@"

And I changed the launcher to run the script.

This seems to work fine. At least, I haven't noticed any problems yet.

Now every Thunar instance sorts files sensibly.

Kudos to Alexandre Richonnier for publishing GlibSortFileNameHackLibrary. It hasn't been updated in 9 years, but it still works a treat!

Sunday, April 2, 2023

@ig3/srf - Spaced Repetition Flashcards

It is 2 years since I gave up on Anki and wrote @ig3/srf.

I have used it to study almost every day, for 2 years now. I have changed the scheduling algorithm several times. Sometimes little tweaks of parameters and sometimes fundamental changes to the algorithm.

The most recent big change was to eliminate the percentage of correct answers as a factor in calculating the new interval. Instead, each time a card with an interval longer than the learning threshold is reviewed, the intervals and due dates of all cards with intervals longer than the learning threshold are adjusted, according to the difference between the actual and the target percentage of correct answers. This decreases the delay of feedback. It's early days, but it seems to be working.
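
The details are in the srf code; as a rough sketch of the idea (the names, the threshold and the scaling formula here are illustrative, not srf's actual implementation):

// Sketch only: adjust all mature cards after each review of a mature card.
// The learning threshold and 0.02 sensitivity factor are illustrative.
const learningThreshold = 7 * 24 * 60 * 60; // one week, in seconds

function adjustMatureCards (cards, percentCorrect, targetPercentCorrect) {
  // Answering better than target stretches intervals; worse shrinks them.
  const factor = 1 + 0.02 * (percentCorrect - targetPercentCorrect);
  for (const card of cards) {
    if (card.interval > learningThreshold) {
      card.interval = Math.round(card.interval * factor);
      card.due = card.lastReview + card.interval;
    }
  }
}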

Overall I am very pleased with @ig3/srf. I am limited by my poor memory but not overwhelmed and not feeling like a failure. At least, not most days. There are some days, usually after a few days without sleep, that it truly seems hopeless. But then I get some rest and get back on track.

I have put a lot of time into developing @ig3/srf over the past two years, but I am confident that it was less effort for a better outcome than trying to maintain my Anki plugin to improve Anki's algorithm. I don't regret my decision at all.

Thursday, March 23, 2023

apt update failing with connection failed

Today, apt update was failing:

Hit:1 http://deb.debian.org/debian bullseye InRelease
Err:2 http://security.debian.org/debian-security bullseye-security InRelease
  Connection failed [IP: 151.101.166.132 80]
Err:3 http://deb.debian.org/debian bullseye-updates InRelease
  Connection failed [IP: 151.101.166.132 80]
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://deb.debian.org/debian/dists/bullseye-updates/InRelease  Connection failed [IP: 151.101.166.132 80]
W: Failed to fetch http://security.debian.org/debian-security/dists/bullseye-security/InRelease  Connection failed [IP: 151.101.166.132 80]
W: Some index files failed to download. They have been ignored, or old ones used instead.

I was able to access the resources in my browser, where I was redirected to https.

So I updated /etc/apt/sources.list, changing http to https throughout, and then apt update completed without errors:

Hit:1 https://deb.debian.org/debian bullseye InRelease
Get:2 https://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:3 https://security.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:4 https://security.debian.org/debian-security bullseye-security/main Sources [192 kB]
Get:5 https://security.debian.org/debian-security bullseye-security/main amd64 Packages [236 kB]
Get:6 https://security.debian.org/debian-security bullseye-security/main Translation-en [154 kB]
Fetched 675 kB in 5s (138 kB/s)                            
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
68 packages can be upgraded. Run 'apt list --upgradable' to see them.

The original sources had been working for well over a year. It seems http is no longer supported.
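
For anyone wanting to make the same change in one step, something like this should do it (the -i.bak keeps a backup copy; check the result before running apt update again):

$ sudo sed -i.bak 's|http://|https://|g' /etc/apt/sources.list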




Monday, March 13, 2023

@ig3/couchapp is deprecated

@ig3/couchapp is deprecated.

It works with CouchDB 3.x, but vhosts and rewrite rules are deprecated and planned to be removed in CouchDB 4. Without them, couchapps will not work.

The primary feature of a couchapp is that it is served from a CouchDB server. No other server is required.

The concept of a CouchApp arose with CouchDB. But, as the CouchDB docs indicate, CouchApps are deprecated:

Note: Previously, the functionality provided by CouchDB’s design documents, in combination with document attachments, was referred to as “CouchApps.” The general principle was that entire web applications could be hosted in CouchDB, without need for an additional application server.

Use of CouchDB as a combined standalone database and application server is no longer recommended. There are significant limitations to a pure CouchDB web server application stack, including but not limited to: fully-fledged fine-grained security, robust templating and scaffolding, complete developer tooling, and most importantly, a thriving ecosystem of developers, modules and frameworks to choose from.

The developers of CouchDB believe that web developers should pick “the right tool for the right job”. Use CouchDB as your database layer, in conjunction with any number of other server-side web application frameworks, such as the entire Node.JS ecosystem, Python’s Django and Flask, PHP’s Drupal, Java’s Apache Struts, and more.

Several tools have been written to automate building and deploying CouchApps. The earliest I know about were developed by Chris Anderson. Some history is available in the post "What is Couchapp?".

I had been using a node based implementation of the couchapp tool, but when I set up a new CouchApp in early 2022 I found it no longer worked. I forked it and released @ig3/couchapp, but learned that CouchDB had deprecated the features that make CouchApps possible.

So I rebuilt my CouchApp / couchapp based apps using a combination of nginx and CouchDB.

It isn't hard.

Everything comes from nginx:

 * static content is served by nginx directly

 * CouchDB access is proxied by nginx

 * Rewrite rules are written in nginx

A very simple app might be configured in nginx as:

server {
  server_name mycouchapp.example.com;
  listen 443 ssl http2;

  client_max_body_size 8M;

  location / {
    root /usr/local/data/mycouchapp/attachments;
    try_files $uri /index.html;
  }

  location /couchdb {
    proxy_pass http://localhost:5984/;
  }
  location /couchdb/ {
    proxy_pass http://localhost:5984/;
  }
}
 
Note the trailing slashes on the URLs passed to proxy_pass. They are significant. Without them, the entire request path is passed to CouchDB. With them, the part of the path matched by the location (/couchdb or /couchdb/) is replaced by /, so a request for /couchdb/mydb/doc1 is proxied to http://localhost:5984/mydb/doc1.



Thursday, February 2, 2023

How I study Chinese

Just a few notes on how I study Chinese.

I study daily using Spaced Repetition Flashcards (a.k.a. srf). About an hour per day.

I started using Anki, but I found the algorithm didn't work for me: I became overwhelmed with big backlogs that I couldn't get through, and I had to adjust new cards manually. The srf algorithm works better for me, and it regulates new cards automatically.

But there are many programs for presenting flashcards and any of them can work.

Because I started with Anki, I began with a few Anki decks: one of characters, one of Chinese provinces and cities, and one of words and sentences. I have added many of my own cards, mostly from videos I have watched.

I find the sentences harder to learn initially but easier in the long term. Individual characters are too abstract, with too many unrelated meanings. As I progress, it seems more important and more effective to be able to recognize them in the context of words (combinations of two or more characters) and sentences. Even when I know all the meanings of an individual character, I am often unsure about the meaning of a word or sentence or, worse, left with the wrong interpretation.

I watch a lot of Chinese videos with English and Chinese subtitles. Mostly modern drama with, I hope, modern vocabulary and accent.

A few years ago, when I started, I had a lot of difficulty finding good videos with clear audio and good subtitles. More recently there is a lot of good content. YouTube has many channels of good content. More than I can watch. I sometimes watch videos from Chinese sites, but many of them aren't available to me or require setting up accounts, so I don't bother. There's enough content on the free sites.

I have begun reading. I use Calibre to maintain a library of epub books but I view them in my Firefox browser using calibre-web to serve them and the @ig3/zhongwen add-on for lookup of Chinese characters.

The @ig3/zhongwen add-on is available from the Firefox add-ons website.

I use @ig3/zhongwen rather than the Zhongwen add-on because @ig3/zhongwen supports lookup of content in iframes and calibre-web presents the epub content in iframes, so the Zhongwen add-on doesn't work. I have also improved the positioning of the pop-up in @ig3/zhongwen so it doesn't cover the selected characters.

With a local Calibre library and calibre-web running on my laptop, I can read Chinese with character lookup even when I'm offline.

I'm old. My memory isn't as good as it used to be. But I am making progress learning Chinese. The stats in srf make this clear. Also, when I watch videos or listen to conversations of native Chinese speakers, I can't yet follow them, but I begin to catch words and phrases and understand them without consciously translating. It's always a thrill when I clearly get the meaning without having to think about it.

I only use Firefox, not Chrome, Safari, Opera or any other browser. While calibre-web might work from other browsers, I haven't tried and don't intend to support using @ig3/zhongwen from other browsers.  In particular, I have not published it as a Chrome add-on, as the original Zhongwen add-on is. I have submitted a patch to the Zhongwen add-on maintainer, for support of content in iframes, but have had no response to it and it hasn't been merged. Since then, I have made several more changes to @ig3/zhongwen, including a revision of how iframes are supported, making support more reliable.

I also produce my own epub documents for study. I first put the text into Markdown files then I use a simple script to convert collections of Markdown files (one per chapter) into an epub file that I can load into my Calibre library and review with calibre-web and the @ig3/zhongwen add-on.
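
For anyone without such a script, pandoc can do much the same job; for example (the chapter file names and title here are placeholders):

$ pandoc chapter-*.md -o book.epub --metadata title="My Book"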

There's plenty of Chinese content on the Internet, including many books and novels available as epub documents or easily converted to epub documents. The @ig3/zhongwen add-on doesn't work directly on all sites. Some don't allow selection of text at all. I also find it convenient to be able to review when I'm offline, which I can do if I have copied the text to make my own epub documents. A little collection that I can review as many times as I need to.

While @ig3/zhongwen is easy and very helpful, its definitions don't always help and miss the broader context. Sometimes, for example, I don't know how to break down a sentence into the correct words and phrases. While it is probably obvious to a Chinese person, I often find that there are several overlapping possibilities. I find Google Translate very helpful for understanding the meaning and structure of whole sentences.

I sometimes copy a whole sentence into Google Translate, then split it onto multiple lines in the input box, to check that I have identified the words and phrases correctly.

The MDBG website, from which the dictionary used by @ig3/zhongwen comes, is very helpful, with various lookup functions. 

The stroke diagrams on the MDBG website are great! Though the stroke counts are sometimes nonsense. I think the discrepancies come from MDBG adding up the stroke counts of radicals in their traditional forms when the character uses the simplified forms, which have fewer strokes. Or sometimes they are simply errors. I try not to get hung up on the stroke counts, but I do try to learn the 'correct' stroke order and direction. Teaching sites for hand writing seem to emphasize stroke order and direction very much.

How to clean greasy scouring pads (Scotch-Brite and equivalent)

I use scouring pads to clean pots and pans in the kitchen. They are like magnets for oil and grease, which get into them and won't come out with dish detergent. For a long time I was throwing them out when they became greasy, but I didn't like the waste.

Today I discovered that a bit of caustic soda / sodium hydroxide removes the oil and grease very quickly and doesn't seem to damage the pads at all, leaving them clean and oil free.

Put a little cold water in a small pot. Add a little caustic soda - maybe half a teaspoon. Put in the pad and heat to a simmer for a few minutes, stirring occasionally.

Be very careful with the caustic soda. It is a powerful chemical that can cause burns. Instructions say to add it to cold water, not hot. When it mixes with water, it produces a lot of heat. Instructions say NEVER ADD WATER TO CAUSTIC SODA. Always add caustic soda to cold water.

Fool that I am, I added the caustic soda to hot water. Only a little at a time. Each time, it boiled up. Good thing it was only a little. I'll start with cold water next time.

But almost immediately, I could see the oil and grease coming out of the pad. The water went dark with it. I left it to simmer for maybe 5 minutes, stirring it around a few times. Then I rinsed it out with lots of cold water before touching it.

Caustic soda is a strong base. It can burn you. Handle it carefully. Don't get it on your skin. Don't get water with too much caustic soda on your skin. Definitely don't let it splash in your eyes. Rinse with plenty of water.

 


Thursday, September 29, 2022

Flush node's process.stdin

I wrote an interactive program. It uses the read package to prompt the user.

But it suffered from type ahead buffering, particularly at a prompt for confirmation to proceed. An accidental extra <CR> entered before the prompt, while waiting for some slow processing to occur, would be processed after the prompt and select the default, which wasn't good whether the default was to confirm or reject.

I needed a way to ensure that only something entered after the prompt would be accepted as a response to the prompt. In other words: defeat the type ahead - flush / purge the stdin buffer before prompting for the confirmation.

For other prompts the type ahead was fine: a set of routine prompts that the user might become very familiar with and type ahead, knowing what they want, despite the slow processing. But it was not so good for the confirmation.

After much searching I couldn't find a simple way to flush the input buffer.

So, I did it the hard way:

function flushStdin () {
  return new Promise((resolve) => {
    let n = 0;
    // Poll stdin on every turn of the event loop, discarding whatever is
    // already buffered. Buffered lines become readable one turn at a time,
    // so only resolve after several consecutive empty reads.
    const interval = setInterval(() => {
      const chunk = process.stdin.read();
      if (chunk === null) {
        if (++n > 3) {
          clearInterval(interval);
          resolve();
        }
      } else {
        n = 0;
      }
    }, 0);
  });
}

It probably isn't the best way, but it is the best I have found so far and it seems to work OK. It's human interaction; speed isn't a great concern.

I have tried many variations. The most surprising was this:

function flushStdin () {
  return new Promise((resolve) => {
    // Resuming the stream lets node drain the buffered input even though
    // nothing is listening for it; give it a few milliseconds to finish.
    process.stdin.resume();
    setTimeout(() => {
      resolve();
    }, 10);
  });
}

This works somewhat, except that it takes time to flush all the input: it doesn't work with a timeout of 0, nor with no timeout at all (calling resolve synchronously, in the same phase of node's event loop). But wait long enough and all the input will be flushed. The problem is that the time required to flush it all is indeterminate: it depends on how much input is buffered, and there is no way to check how much data is in the buffer.

While I can't find documentation of it, and I haven't read the node source code to find out what it actually does, my guess, based on observations, is:

The Linux terminal driver is buffering input. Absent a pending read, the terminal driver will buffer some amount of data. I don't know how much. There must be a limit. Eventually the input must be blocked.

It appears node gets data from the terminal driver in line mode: it reads one line of input and buffers it. So process.stdin.readableLength never reports more than the length of one line of input plus the terminating linefeed. Node doesn't fetch the next line from the terminal driver until the current line has been read.

So, even if there are several lines of input buffered by the terminal driver, it takes several iterations of reading a line and processing it before the buffer is drained, and I have found no interface in node for inspecting how much data is in the terminal driver's buffer.

It seems odd to me that resuming the input without a data event listener or anything else to read the input actually flushes the input, given sufficient time, but doesn't interfere with subsequent reads (e.g. by readline). Yet it does.

The call to resume causes a resume event to be emitted, followed by a series of readable and data events: one pair of events for each line of input and, contrary to my understanding of what the documentation says, these events are emitted despite there being no listeners for them. Effectively, the buffer is cleared and the data discarded even though nothing read the data. But it takes time.

If you too are curious, you might try this:

const oldEmitter = process.stdin.emit;

process.stdin.emit = function () {
  // Log every event emitted on process.stdin, then pass it through.
  console.log(Date.now(), 'emit:', arguments);
  return oldEmitter.apply(process.stdin, arguments);
};

This lets you observe the events emitted from process.stdin. It must interfere with them somewhat: it takes time to write to the console. But my understanding is that console.log is synchronous: at least, the output is written to an output buffer synchronously, even if it doesn't immediately appear on the display (e.g. if the display is connected by a low speed tty).

But this is mostly speculation: deductions based on observations of various tests using various methods and properties of process.stdin and, via the read package, I think the readline interface.

Node documentation says two seemingly inconsistent things about process.stdin when connected to a TTY:

In the TTY documentation it says:

When Node.js detects that it is being run with a text terminal ("TTY") attached, process.stdin will, by default, be initialized as an instance of tty.ReadStream and both process.stdout and process.stderr will, by default, be instances of tty.WriteStream. The preferred method of determining whether Node.js is being run within a TTY context is to check that the value of the process.stdout.isTTY property is true.

But in the process.stdin documentation it says:

The process.stdin property returns a stream connected to stdin (fd 0). It is a net.Socket (which is a Duplex stream) unless fd 0 refers to a file, in which case it is a Readable stream.

So, which is it? Is it an instance of tty.ReadStream or an instance of net.Socket? Or do they deem that the returned object is at the same time an instance of both? Is a tty.ReadStream an instance of net.Socket?
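
This is easy to check from a terminal (assuming stdin is attached to a TTY):

const tty = require('tty');
const net = require('net');

// If tty.ReadStream inherits from net.Socket, both of these print true.
console.log(process.stdin instanceof tty.ReadStream);
console.log(process.stdin instanceof net.Socket);

As far as I can tell, tty.ReadStream does inherit from net.Socket, so the answer would be: both at once.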

 
