Description
Describe the bug
Simultaneous invocations of "archivebox add" crash with a "database is locked" error or, after enough retries, run alongside each other but then produce errors in both processes.
Steps to reproduce
The first invocation of "archivebox add" works normally.
While it continues to run, subsequent invocations crash with a "database is locked" error at the point where they attempt to insert into the main index. Oddly, they do seem to insert SOME data into the main index before crashing. Each rerun of the add with the same source reduces the number of items left to write to the main index, until eventually the second add starts executing as well (this may take dozens of attempts for large sets).
At that point, however, both running "add" processes begin to show mysterious errors in the processing stages, and I am unsure how reliable the resulting archive is.
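The failure at the insert step can be illustrated outside ArchiveBox with a minimal sketch using Python's standard `sqlite3` module. This is not ArchiveBox code: the function name `reproduce_lock_error`, the `snapshot` table, and the example URLs are hypothetical, and the sketch assumes SQLite's default rollback-journal mode with `timeout=0` so the second writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def reproduce_lock_error() -> str:
    """Return the error message a second writer sees while a first writer holds the lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.sqlite3")
        # isolation_level=None gives autocommit mode, so transactions are explicit.
        # timeout=0 makes a blocked writer fail at once rather than wait.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute("CREATE TABLE snapshot (url TEXT PRIMARY KEY)")
            # The first "process" opens a write transaction and keeps it open.
            first.execute("BEGIN IMMEDIATE")
            first.execute("INSERT INTO snapshot VALUES ('https://example.com/a')")
            try:
                # The second "process" tries to write while the lock is held.
                second.execute("INSERT INTO snapshot VALUES ('https://example.com/b')")
            except sqlite3.OperationalError as exc:
                return str(exc)
            return "no error"
        finally:
            first.close()
            second.close()


if __name__ == "__main__":
    print(reproduce_lock_error())
```

SQLite allows only one writer per database file at a time, so whether a second "add" fails depends on whether its write overlaps a write transaction held by the first.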
Screenshots or log output
The failed "archivebox add" error looks like this:
docker run --rm -i -v /opt/deleteme:/data archivebox/archivebox add < ~/tmp/foo
[i] [2021-07-05 20:13:01] ArchiveBox v0.6.2: archivebox add
> /data
[+] [2021-07-05 20:13:02] Adding 156589 links to index (crawl depth=0)...
> Saved verbatim input to sources/1625515982-import.txt
> Parsed 23649 URLs from input (Generic TXT)
> Found 3809 new URLs not already in index
[*] [2021-07-05 20:13:55] Writing 3809 links to main index...
Traceback (most recent call last):
File "/app/archivebox/index/sql.py", line 41, in write_link_to_sql_index
info["timestamp"] = Snapshot.objects.get(url=link.url).timestamp
File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 429, in get
raise self.model.DoesNotExist(
core.models.DoesNotExist: Snapshot matching query does not exist.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 589, in update_or_create
obj = self.select_for_update().get(**kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 429, in get
raise self.model.DoesNotExist(
core.models.DoesNotExist: Snapshot matching query does not exist.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute
return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: database is locked
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/archivebox", line 33, in <module>
sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
File "/app/archivebox/cli/__init__.py", line 140, in main
run_subcommand(
File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
module.main(args=subcommand_args, stdin=stdin, pwd=pwd) # type: ignore
File "/app/archivebox/cli/archivebox_add.py", line 103, in main
add(
File "/app/archivebox/util.py", line 114, in typechecked_function
return func(*args, **kwargs)
File "/app/archivebox/main.py", line 602, in add
write_main_index(links=new_links, out_dir=out_dir)
File "/app/archivebox/util.py", line 114, in typechecked_function
return func(*args, **kwargs)
File "/app/archivebox/index/__init__.py", line 232, in write_main_index
write_sql_main_index(links, out_dir=out_dir)
File "/app/archivebox/util.py", line 114, in typechecked_function
return func(*args, **kwargs)
File "/app/archivebox/index/sql.py", line 88, in write_sql_main_index
write_link_to_sql_index(link)
File "/app/archivebox/util.py", line 114, in typechecked_function
return func(*args, **kwargs)
File "/app/archivebox/index/sql.py", line 46, in write_link_to_sql_index
snapshot, _ = Snapshot.objects.update_or_create(url=link.url, defaults=info)
File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 594, in update_or_create
obj, created = self._create_object_from_params(kwargs, params, lock=True)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 610, in _create_object_from_params
obj = self.create(**params)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 447, in create
obj.save(force_insert=True, using=self.db)
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 753, in save
self.save_base(using=using, force_insert=force_insert,
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 790, in save_base
updated = self._save_table(
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 895, in _save_table
results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 933, in _do_insert
return manager._insert(
File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1254, in _insert
return query.get_compiler(using=using).execute_sql(returning_fields)
File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1397, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute
return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: database is locked
I have observed that if the first downloader is busy with something large, such as downloading from YouTube, subsequent invocations may proceed without an error.
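That observation is consistent with the lock being held only while a process is actually writing to the index. One possible mitigation, shown here only as a hedged sketch and not as ArchiveBox's behaviour, is to retry a write when SQLite reports a lock. The helper name `retry_on_locked` and its parameters are hypothetical. Inside Django the exception would surface as `django.db.utils.OperationalError` rather than `sqlite3.OperationalError`, and Django's SQLite backend also accepts a `timeout` entry under the database `OPTIONS` setting that lengthens the wait before this error is raised.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_locked(operation: Callable[[], T], attempts: int = 5, delay: float = 0.1) -> T:
    """Call `operation`, retrying with linear backoff while SQLite reports a lock."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, or the final failure.
            if "database is locked" not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay * attempt)
    raise RuntimeError("unreachable")
```

A caller would wrap a single write, for example `retry_on_locked(lambda: conn.execute("INSERT ..."))`. Retrying only hides contention for short writes; it does not make two long-running writers safe to run side by side.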
ArchiveBox version
ArchiveBox v0.6.2
Cpython Linux Linux-5.10.0-0.bpo.7-amd64-x86_64-with-glibc2.28 x86_64
IN_DOCKER=True DEBUG=False IS_TTY=False TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep
[i] Dependency versions:
√ ARCHIVEBOX_BINARY v0.6.2 valid /usr/local/bin/archivebox
√ PYTHON_BINARY v3.9.5 valid /usr/local/bin/python3.9
√ DJANGO_BINARY v3.1.10 valid /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py
√ CURL_BINARY v7.64.0 valid /usr/bin/curl
√ WGET_BINARY v1.20.1 valid /usr/bin/wget
√ NODE_BINARY v15.14.0 valid /usr/bin/node
√ SINGLEFILE_BINARY v0.3.16 valid /node/node_modules/single-file/cli/single-file
√ READABILITY_BINARY v0.0.2 valid /node/node_modules/readability-extractor/readability-extractor
√ MERCURY_BINARY v1.0.0 valid /node/node_modules/@postlight/mercury-parser/cli.js
√ GIT_BINARY v2.20.1 valid /usr/bin/git
√ YOUTUBEDL_BINARY v2021.04.26 valid /usr/local/bin/youtube-dl
√ CHROME_BINARY v90.0.4430.93 valid /usr/bin/chromium
√ RIPGREP_BINARY v0.10.0 valid /usr/bin/rg
[i] Source-code locations:
√ PACKAGE_DIR 22 files valid /app/archivebox
√ TEMPLATES_DIR 3 files valid /app/archivebox/templates
- CUSTOM_TEMPLATES_DIR - disabled
[i] Secrets locations:
- CHROME_USER_DATA_DIR - disabled
- COOKIES_FILE - disabled
[i] Data locations:
√ OUTPUT_DIR 7 files valid /data
√ SOURCES_DIR 4 files valid ./sources
√ LOGS_DIR 1 files valid ./logs
√ ARCHIVE_DIR 8 files valid ./archive
√ CONFIG_FILE 136.0 Bytes valid ./ArchiveBox.conf
√ SQL_INDEX 3.6 MB valid ./index.sqlite3