MegaParse - Your Parser for every type of document
MegaParse is a powerful and versatile parser that handles various types of documents with ease. Whether you're dealing with text, PDFs, PowerPoint presentations, or Word documents, MegaParse has got you covered. Its focus is on losing no information during parsing.
Key Features 🎯
Versatile Parser: MegaParse is a powerful and versatile parser that can handle various types of documents with ease.
No Information Loss: Focuses on losing no information during parsing.
Fast and Efficient: Designed with speed and efficiency at its core.
Note: The models supported by MegaParse Vision are the multimodal ones, such as Claude 3.5, Claude 4, GPT-4o and GPT-4.
Use as an API
There is a Makefile for you. Simply run:
make dev
at the root of the project and you are good to go.
See localhost:8000/docs for more information on the available endpoints.
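To confirm that the server started by make dev is reachable before exploring the endpoints, a small check like the one below can help. This is only a sketch: the base URL comes from the line above, while the helper names docs_url and is_server_up are hypothetical and not part of MegaParse.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen

# Address mentioned in this README; adjust it if your server runs elsewhere.
BASE_URL = "http://localhost:8000"


def docs_url(base_url: str = BASE_URL) -> str:
    """Build the URL of the interactive docs page (hypothetical helper)."""
    return urljoin(base_url.rstrip("/") + "/", "docs")


def is_server_up(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if the docs page answers with HTTP 200 (hypothetical helper)."""
    try:
        with urlopen(docs_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    if is_server_up():
        print(f"Server is up, open {docs_url()} in a browser")
    else:
        print("Server not reachable, did you run `make dev`?")
```

The endpoint list itself is best read from the docs page, since the routes are not described in this README.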
Benchmark

| Parser                        | similarity_ratio |
| ----------------------------- | ---------------- |
| megaparse_vision              | 0.87             |
| unstructured_with_check_table | 0.77             |
| unstructured                  | 0.59             |
| llama_parser                  | 0.33             |

Higher is better.
Note: Want to evaluate and compare your MegaParse module with ours? Add your config to evaluations/script.py, then run python evaluations/script.py. If it scores better, open a PR. Let's go higher together.
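To give an intuition for what a similarity ratio measures, here is a minimal sketch that compares a parsed output against a reference text using Python's standard difflib. It is an assumption that this resembles the metric in evaluations/script.py; the function name similarity_ratio and the sample strings are illustrative only.

```python
from difflib import SequenceMatcher


def similarity_ratio(reference: str, parsed: str) -> float:
    """Return a score in [0, 1]; 1.0 means the two texts are identical.

    Illustrative only: assumes a difflib-style ratio, which may differ
    from the metric actually used by the evaluation script.
    """
    return SequenceMatcher(None, reference, parsed).ratio()


if __name__ == "__main__":
    # Hypothetical sample data: a hand-written reference and two parser outputs.
    reference = "# Report\n\n| Name | Value |\n| ---- | ----- |\n| a | 1 |\n"
    outputs = {
        "parser_a": "# Report\n\n| Name | Value |\n| ---- | ----- |\n| a | 1 |\n",
        "parser_b": "Report Name Value a 1",
    }
    # Rank parsers from best to worst score.
    scores = {name: similarity_ratio(reference, text) for name, text in outputs.items()}
    for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {score:.2f}")
```

A character-level ratio like this rewards outputs that preserve both the content and the layout of the reference, which is why losing table structure lowers the score.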
In Construction 🚧
Improve table checker
Create checkers to add modular postprocessing ⚙️
Add structured output, let's get computers talking 🤖
About
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.