gh-150662: Stop unbounded memory growth in Tachyon `--gecko` collector by maurycy · Pull Request #150845 · python/cpython

maurycy · 2026-06-03T12:03:54Z

This PR fixes unbounded memory growth caused by:

cpython/Lib/profiling/sampling/gecko_collector.py

Lines 267 to 270 in 350e9de

    
    for t in times:
        samples_stack.append(stack_index)
        samples_time.append(t)
        samples_delay.append(None)

It was reported in gh-150662, and the detailed idea for the fix came from @pablogsal:

#150662 (comment)
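To put the growth from the loop quoted above in perspective, here is a back-of-the-envelope sketch (an estimate, not a measurement). It assumes 64-bit CPython, that each timestamp `t` is a distinct `float` object, and that `stack_index` and `None` are shared references. Under those assumptions each sample costs three list slots plus one float, and list over-allocation is ignored. The helper `estimate_sample_bytes` is hypothetical.

```python
import struct
import sys

# One list slot is a PyObject* pointer: 8 bytes on 64-bit builds.
POINTER_SIZE = struct.calcsize("P")
# Assumption: every timestamp is a fresh float object (24 bytes on 64-bit CPython).
FLOAT_SIZE = sys.getsizeof(0.0)


def estimate_sample_bytes(n_samples: int) -> int:
    """Rough lower bound on memory held by samples_stack/time/delay.

    Each sample appends one slot to each of the three lists; only the
    timestamp is assumed to be a new object. Over-allocation is ignored.
    """
    per_sample = 3 * POINTER_SIZE + FLOAT_SIZE
    return n_samples * per_sample


# Same sampling rate (10,000/s) and duration (900 s) as the reproduction.
n = 10_000 * 900
print(f"{n:,} samples -> ~{estimate_sample_bytes(n) / 2**20:.0f} MiB")
```

The cost is linear in the number of samples, so it never plateaus, no matter how repetitive the profiled program is.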

Discussion

I don't think the other collectors have this issue: pstats, collapsed/flamegraph, heatmap, and jsonl should all plateau. I've reviewed them. I pondered this for a day and can't think of a better fix. It's not really crash-resilient, though. That likely doesn't matter much here, as I'm not sold on using Gecko for really long-term profiling. The binary format is much better in this regard, and I've started experimenting with a different fix there. Perhaps we should encourage recording in the binary format more? The tests stay as is.
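The plateau-versus-growth distinction can be illustrated with a toy model: an aggregating collector (in the spirit of pstats or collapsed stacks) keeps one counter per distinct stack, so its memory stops growing once no new stacks appear, while a Gecko-style timeline keeps one entry per sample. Both classes below are hypothetical simplifications, not the actual collector code.

```python
from collections import Counter


class AggregatingCollector:
    """Keeps a count per distinct stack: memory ~ number of unique stacks."""

    def __init__(self):
        self.counts = Counter()

    def collect(self, stack: tuple[str, ...]) -> None:
        self.counts[stack] += 1


class TimelineCollector:
    """Keeps every sample in order: memory ~ number of samples."""

    def __init__(self):
        self.samples_stack = []
        self.samples_time = []

    def collect(self, stack: tuple[str, ...], t: float) -> None:
        self.samples_stack.append(stack)
        self.samples_time.append(t)


# A hot loop like the reproduction produces the same few stacks repeatedly.
stacks = [("main", "work"), ("main", "work", "<genexpr>")]
agg, timeline = AggregatingCollector(), TimelineCollector()
for i in range(100_000):
    stack = stacks[i % 2]
    agg.collect(stack)
    timeline.collect(stack, i * 0.1)

print(len(agg.counts))             # stays at the number of distinct stacks
print(len(timeline.samples_time))  # grows with every sample
```

A timeline format needs every sample in order, so the only way to bound memory is to move samples out of the process, for example by spilling them to disk.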

(No longer) Reproduction

2026-06-03T13:48:12.219584000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (gecko-ad-inf fcfb002*) % ./python.exe -c "
def work(): return sum(i*i for i in range(2000))
while True: work()
" & TARGET=$!

sudo ./python.exe -m profiling.sampling attach --gecko -r 10000 -d 900 -o /tmp/gecko.json $TARGET &
sleep 2; PROF=$(pgrep -fn "profiling.sampling attach")
for i in $(seq 15); do printf "t=%2dmin  RSS=%d MB\n" $i $(($(ps -o rss= -p $PROF|tr -d ' ')/1024)); sleep 60; done
[1] 80893
[2] 80894
t= 1min  RSS=30 MB
t= 2min  RSS=30 MB
t= 3min  RSS=30 MB
t= 4min  RSS=30 MB
t= 5min  RSS=30 MB
t= 6min  RSS=30 MB
t= 7min  RSS=30 MB
t= 8min  RSS=30 MB
t= 9min  RSS=30 MB
t=10min  RSS=30 MB
t=11min  RSS=30 MB
t=12min  RSS=30 MB
t=13min  RSS=30 MB
t=14min  RSS=30 MB
t=15min  RSS=30 MB
Captured 9,000,001 samples in 900.00 seconds
Sample rate: 10,000.00 samples/sec
Error rate: 27.59
Gecko profile written to /tmp/gecko.json
Open in Firefox Profiler: https://profiler.firefox.com/

[2]  + done       sudo ./python.exe -m profiling.sampling attach --gecko -r 10000 -d 900 -o

maurycy · 2026-06-03T12:07:27Z

+                yield chunk
+
+
+class NDJSONSpillColumn:


The story here is that I started with TypedSpillColumn, as in the idea from gh-150662, but `array` is not a great fit for opcode markers if we don't want to maintain a separate serialization layer.
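For purely numeric columns, an array-based spill column could look roughly like this sketch: values are buffered in an `array.array` and flushed to a temporary file in fixed-size chunks, so at most one chunk stays in memory. It also shows the limitation mentioned above: an `array` holds a single C type, so `None` or marker objects are rejected unless they get a separate encoding. The class layout, method names, and chunk size are assumptions, not the PR's code.

```python
import array
import os
import tempfile


class TypedSpillColumn:
    """Hypothetical append-only column of C values spilled to a temp file."""

    def __init__(self, typecode: str = "d", chunk_size: int = 4096):
        self.typecode = typecode
        self.chunk_size = chunk_size
        self._buffer = array.array(typecode)
        self._file = tempfile.TemporaryFile()  # binary mode ("w+b")
        self._spilled = 0  # number of items already written to disk

    def append(self, value) -> None:
        # Raises TypeError for values the typecode cannot hold, e.g. None.
        self._buffer.append(value)
        if len(self._buffer) >= self.chunk_size:
            self._flush()

    def _flush(self) -> None:
        # Always append at the end, even if a reader moved the position.
        self._file.seek(0, os.SEEK_END)
        self._buffer.tofile(self._file)
        self._spilled += len(self._buffer)
        self._buffer = array.array(self.typecode)

    def __len__(self) -> int:
        return self._spilled + len(self._buffer)

    def iter_chunks(self):
        """Yield the column in order: spilled chunks first, then the buffer."""
        self._file.seek(0)
        remaining = self._spilled
        while remaining:
            n = min(self.chunk_size, remaining)
            chunk = array.array(self.typecode)
            chunk.fromfile(self._file, n)
            remaining -= n
            yield chunk
        if self._buffer:
            yield array.array(self.typecode, self._buffer)  # copy, not the live buffer


col = TypedSpillColumn(chunk_size=3)
for t in [0.0, 0.1, 0.2, 0.3, 0.4]:
    col.append(t)
print(len(col), [x for chunk in col.iter_chunks() for x in chunk])
```

Each double costs exactly 8 bytes on disk, which is why this layout is compact; the price is one column class (or encoding) per value type.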

Using the name NDJSON (rather than JSONL) avoids confusion with the --jsonl collector.

I think the best call would be to keep only SpillColumn, without the array-based variant. It would massively simplify GeckoThreadSpill, but at the expense of 2-3x higher disk usage.
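A minimal sketch of the NDJSON approach, assuming one JSON value per line in a temporary text file: any JSON-serializable value (floats, `None`, marker dicts) can be appended, and values are read back in chunks instead of all at once. The disk-usage trade-off shows up because each number is stored as text plus a newline rather than as raw bytes. The class name comes from the PR, but the methods and chunk size here are hypothetical.

```python
import json
import tempfile


class NDJSONSpillColumn:
    """Hypothetical append-only column stored as newline-delimited JSON."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self._file = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        self._count = 0

    def append(self, value) -> None:
        # Always write at the end, even if a reader moved the position.
        # Do not append while an iter_chunks() generator is still running.
        self._file.seek(0, 2)
        self._file.write(json.dumps(value, separators=(",", ":")))
        self._file.write("\n")
        self._count += 1

    def __len__(self) -> int:
        return self._count

    def iter_chunks(self):
        """Yield lists of at most chunk_size decoded values, in order."""
        self._file.flush()
        self._file.seek(0)
        chunk = []
        for line in self._file:
            chunk.append(json.loads(line))
            if len(chunk) >= self.chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


col = NDJSONSpillColumn(chunk_size=2)
for v in [0.5, None, {"name": "marker"}]:
    col.append(v)
print(list(col.iter_chunks()))  # [[0.5, None], [{'name': 'marker'}]]
```

One class handles every column type, which is where the simplification comes from.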

maurycy · 2026-06-03T12:08:36Z

+            "processType": thread_data["processType"],
+            "processName": thread_data["processName"],
+        }
+        file.write("{")


Serializing this with `json` would pull the whole spilled file back into memory.
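A sketch of the streaming alternative: small, fixed-size fields go through `json.dumps`, while the potentially huge samples array is written element by element from an iterator, so it is never materialized as one Python object. The function name, the `meta`/`samples` layout, and the sample shape are simplified assumptions, not the Gecko format or the PR's code.

```python
import io
import json
from typing import Iterable, Iterator


def stream_thread(file, meta: dict, samples: Iterable) -> None:
    """Write {<meta keys>..., "samples": [...]} without building the samples list."""
    file.write("{")
    for key, value in meta.items():
        file.write(f"{json.dumps(key)}:{json.dumps(value)},")
    file.write('"samples":[')
    for i, sample in enumerate(samples):
        if i:
            file.write(",")
        file.write(json.dumps(sample))
    file.write("]}")


def generate_samples(n: int) -> Iterator[list]:
    # Stand-in for reading spilled columns back from disk chunk by chunk.
    for i in range(n):
        yield [i % 3, i * 0.1, None]


buf = io.StringIO()
stream_thread(buf, {"processType": "default", "processName": "python"},
              generate_samples(3))
print(buf.getvalue())
```

The output is still a single valid JSON document, so it can be loaded back with `json.loads` in tests, but writing it only ever holds one sample in memory.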

maurycy · 2026-06-03T12:17:34Z

+            self._prepare_for_serialization()
+            file = io.StringIO()
+            self._stream_profile(file)
+            return json.loads(file.getvalue())


`_build_profile()` is now only used as a test helper. Maybe we should move it into the tests?

maurycy added 3 commits June 3, 2026 13:46

taking a stab · fcfb002

news · 57694cb

test · e0805d0

maurycy requested a review from pablogsal as a code owner June 3, 2026 12:03

bedevere-app Bot mentioned this pull request Jun 3, 2026

Tachyon --gecko collector memory growing ad infinitum #150662

Open

bedevere-app Bot added the awaiting review label Jun 3, 2026

maurycy commented Jun 3, 2026


maurycy added 2 commits June 3, 2026 15:15

this? · 866b8fd

now? · a23814b

maurycy wants to merge 5 commits into python:main from maurycy:gecko-ad-inf
