The Journey of Building a JSDoc Parser: From AI Documentation Chaos to Open Source Solution

By Suman Saurabh - May 26, 2025

Introduction

In the early days of working with AI-powered documentation tools, one of the most frustrating challenges we encountered was ensuring that generated documentation followed specific patterns and standards. This was particularly acute when working with JSDoc, where consistency, proper formatting, and adherence to established conventions are critical for maintaining code quality and developer experience.

What started as a simple need to validate AI-generated JSDoc comments evolved into a comprehensive parser and composer library that we've now open-sourced. This post chronicles that journey: the technical challenges we faced, the architectural decisions we made, and the fundamental building blocks that power our JSDoc parser.

The Problem: When AI Gets Creative with Documentation

When we first started using LLMs to generate documentation, the results were... interesting. While the content was often accurate, the formatting was inconsistent:

Parameters would sometimes lack type information
Return statements would be formatted differently across functions
Complex types like generics and unions were handled inconsistently
Nested parameters and properties were often malformed
Examples would be structured differently each time

Here's what we were dealing with:

javascript

/**
 * Some function description
 * @param name The user name
 * @param {Object} options - Configuration object
 * @param options.timeout {number} - Timeout value
 * @returns Promise<User> | null
 * @example
 * createUser("john")
 * @example
createUser("jane", { timeout: 5000 })
 */

While this is technically valid JSDoc, the inconsistency made it difficult to:

Parse documentation programmatically
Validate documentation completeness
Maintain consistent styling across codebases
Generate reliable API documentation

The Solution: A Robust JSDoc Parser and Composer

We decided to build a comprehensive solution that could both parse existing JSDoc strings into structured data and compose new JSDoc strings from structured objects. This two-way capability would allow us to:

Parse inconsistent JSDoc into a standardized format
Validate and modify the structured data
Compose it back into properly formatted JSDoc
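
The middle step is where the structured form pays off. As a rough sketch, assuming the dictionary shape used throughout this post (params as a list of dicts with name, type and description keys, returns as a dict or None), a completeness check might look like the following. The function name find_issues and the specific rules are illustrative and not part of the library.

```python
from typing import Any, Dict, List


def find_issues(jsdoc_obj: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed JSDoc dictionary."""
    issues = []
    if not jsdoc_obj.get('description'):
        issues.append('missing description')
    for param in jsdoc_obj.get('params', []):
        name = param.get('name') or '<unnamed>'
        if not param.get('type'):
            issues.append(f'param {name}: missing type')
        if not param.get('description'):
            issues.append(f'param {name}: missing description')
    returns = jsdoc_obj.get('returns')
    if returns is not None and not returns.get('type'):
        issues.append('returns: missing type')
    return issues


# Hand-written dictionary standing in for parser output (an assumption)
doc = {
    'description': 'Some function description',
    'params': [
        {'name': 'name', 'type': None, 'description': 'The user name'},
        {'name': 'options', 'type': 'Object', 'description': 'Configuration object'},
    ],
    'returns': {'type': None, 'description': 'Promise<User> | null'},
}
for issue in find_issues(doc):
    print(issue)
# param name: missing type
# returns: missing type
```

Because the check works on plain dictionaries, it can run between parsing and composing without knowing anything about comment syntax.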

Core Architecture

Our parser follows a three-stage pipeline:

JSDoc String → Structured Dictionary → JSDoc String
     ↑                   ↓                    ↑
  parse_jsdoc()    [manipulation]    compose_jsdoc()

Deep Dive: The Parser Implementation

Let's examine the core components of our parser:

1. Main Parser Function

The heart of our parser is the parse_jsdoc() function:

python

import re
from typing import Any, Dict, List, Optional, Tuple

def parse_jsdoc(docstring: str) -> Dict[str, Any]:
    """Parse a JSDoc string into a structured dictionary."""
    result = {
        'description': '',
        'params': [],
        'returns': None,
        'throws': [],
        'examples': [],
        'tags': {}
    }
    
    # Clean up the docstring
    docstring = docstring.strip()
    
    # Remove the opening and closing markers /** and */
    if docstring.startswith('/**'):
        docstring = docstring[3:]
    if docstring.endswith('*/'):
        docstring = docstring[:-2]
        
    # Split into lines and clean them up
    lines = [line.strip() for line in docstring.split('\n')]
    lines = [re.sub(r'^[ \t]*\*', '', line).strip() for line in lines]
    
    # Process the lines
    current_tag = None
    current_content = []

    for line in lines:
        # Check if the line starts with a tag
        tag_match = re.match(r'^@(\w+)\s*(.*)', line)

        if tag_match:
            # Process the previous tag if there was one
            if current_tag:
                _process_tag(current_tag, current_content, result)

            # Start a new tag
            current_tag = tag_match.group(1)
            current_content = [tag_match.group(2)]
        elif current_tag:
            # Continue with the current tag
            current_content.append(line)
        else:
            # This is part of the description
            if line:
                if result['description']:
                    result['description'] += '\n' + line
                else:
                    result['description'] = line

    # Process the last tag if there was one
    if current_tag:
        _process_tag(current_tag, current_content, result)
    
    return result
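
To see what the loop actually receives, it helps to run the cleanup step and the tag pattern on their own. The snippet below reuses the two regular expressions from parse_jsdoc(); the helper name clean_lines and the sample comment are made up for illustration.

```python
import re
from typing import List

TAG_RE = re.compile(r'^@(\w+)\s*(.*)')  # same pattern as in parse_jsdoc()


def clean_lines(docstring: str) -> List[str]:
    """Strip the /** and */ markers and the leading '*' from every line."""
    body = docstring.strip()
    if body.startswith('/**'):
        body = body[3:]
    if body.endswith('*/'):
        body = body[:-2]
    return [re.sub(r'^[ \t]*\*', '', line.strip()).strip() for line in body.split('\n')]


sample = """/**
 * Adds two numbers.
 * @param {number} a - First number
 * @example
 * @logged
 * add(1, 2);
 */"""

for line in clean_lines(sample):
    match = TAG_RE.match(line)
    print(repr(line), '->', match.groups() if match else None)
```

Two things stand out in the output. Every line is fully stripped, so indentation does not survive this stage. And any line that begins with @ followed by a word starts a new tag, which means a decorator-style line inside an @example body (the @logged line here) is treated as a tag of its own.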

2. Complex Type Extraction

One of the most challenging aspects was handling complex type definitions. We built a robust type extraction system that handles nested braces:

python

def _extract_type_from_braces(content: str) -> Tuple[Optional[str], str]:
    """Extract a type definition from curly braces, handling nested braces."""
    if not content.startswith('{'):
        return None, content
        
    # Count braces to handle nested structures
    brace_count = 0
    for i, char in enumerate(content):
        if char == '{':
            brace_count += 1
        elif char == '}':
            brace_count -= 1
            if brace_count == 0:
                # Found the closing brace
                extracted = content[1:i].strip()
                return None if not extracted else extracted, content[i+1:].strip()
    
    # No matching closing brace found
    return None, content

This function correctly handles complex types like:

{Map<string, Array<number>>}
{Promise<{id: string, data: Object}>}
{function(string, number): boolean}
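
The reason for counting braces instead of using a single regular expression becomes clear when the two are compared side by side. The sketch below is a compact restatement of the approach for illustration; naive_type and balanced_type are hypothetical names, not library functions.

```python
import re
from typing import Optional, Tuple


def naive_type(content: str) -> Optional[str]:
    """Regex-only extraction: stops at the first closing brace."""
    match = re.match(r'\{([^}]*)\}', content)
    return match.group(1) if match else None


def balanced_type(content: str) -> Tuple[Optional[str], str]:
    """Depth-counting extraction: stops at the brace that closes the first one."""
    if not content.startswith('{'):
        return None, content
    depth = 0
    for i, char in enumerate(content):
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                return content[1:i].strip() or None, content[i + 1:].strip()
    return None, content  # unbalanced input: leave it untouched


tag_body = '{Promise<{id: string, data: Object}>} response - API response'
print(naive_type(tag_body))
# Promise<{id: string, data: Object
print(balanced_type(tag_body))
# ('Promise<{id: string, data: Object}>', 'response - API response')
```

The regex version cuts the type off at the inner object literal, while the counting version returns both the full type and the remainder of the tag for further parsing.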

3. Parameter Processing

Processing parameters was particularly complex due to the variety of valid JSDoc parameter formats:

python

def _process_tag(tag: str, content: List[str], result: Dict[str, Any]) -> None:
    """Process a JSDoc tag and update the result dictionary."""
    # ... (content joining logic)
    
    if tag == 'param' or tag == 'argument' or tag == 'arg':
        # First extract the type if present using brace matching
        param_type, remaining = _extract_type_from_braces(content_str)
        
        if param_type is not None or content_str.startswith('{'):
            # Parse parameter names with special characters and optional syntax
            param_match = re.match(r'(?:\[)?([a-zA-Z_$][\w$.]*)(?:=([^]]+))?(?:\])?\s*(?:-\s*(.*))?$', remaining)
            
            if param_match:
                param_name = param_match.group(1)
                default_value = param_match.group(2)
                param_desc = param_match.group(3) or ''
                # Detect if parameter is optional (enclosed in [])
                is_optional = bool(re.match(r'^\[([a-zA-Z_$][\w$.]*)(?:=[^]]+)?\]', remaining))
            # ... (more parsing logic)
        
        # Handle nested parameters (properties)
        if '.' in param_name:
            parent_name, nested_name = param_name.split('.', 1)
            # ... (nested parameter logic)

This handles various parameter formats:

@param {string} name - Description
@param {number} [timeout=5000] - Optional with default
@param {Object} options.timeout - Nested property
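
The two regular expressions are dense, so here is a small experiment that applies them to the text left over after the type has been removed. The patterns are copied from the _process_tag() excerpt above; the wrapper parse_param_rest is a hypothetical helper written only for this demonstration.

```python
import re
from typing import Any, Dict, Optional

# Patterns copied from the _process_tag() excerpt
PARAM_RE = re.compile(r'(?:\[)?([a-zA-Z_$][\w$.]*)(?:=([^]]+))?(?:\])?\s*(?:-\s*(.*))?$')
OPTIONAL_RE = re.compile(r'^\[([a-zA-Z_$][\w$.]*)(?:=[^]]+)?\]')


def parse_param_rest(remaining: str) -> Optional[Dict[str, Any]]:
    """Parse what follows the {type} of a @param tag; None if the pattern fails."""
    match = PARAM_RE.match(remaining)
    if match is None:
        return None
    return {
        'name': match.group(1),
        'default': match.group(2),
        'optional': bool(OPTIONAL_RE.match(remaining)),
        'description': match.group(3) or '',
    }


for rest in ['name - Description',
             '[timeout=5000] - Optional with default',
             'options.timeout - Nested property',
             'name The user name']:
    print(rest, '->', parse_param_rest(rest))

# Nested names are later split once, on the first dot
print('options.timeout'.split('.', 1))
# ['options', 'timeout']
```

The last input is worth noting: a description that is not introduced by a hyphen does not match this particular pattern, so that form has to be caught by a separate branch of the parsing logic.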

The Composer: Ensuring Consistent Output

The composer takes our structured data and generates consistently formatted JSDoc:

python

def compose_jsdoc(jsdoc_obj: Dict[str, Any]) -> str:
    """Compose a JSDoc string from a structured dictionary."""
    lines = ['/**']
    
    # Add the description
    if 'description' in jsdoc_obj and jsdoc_obj['description']:
        for line in jsdoc_obj['description'].split('\n'):
            lines.append(f' * {line}')
        lines.append(' *')
    
    # Add the params
    if 'params' in jsdoc_obj:
        for param in jsdoc_obj['params']:
            param_str = ' * @param'
            if 'type' in param and param['type']:
                param_str += f' {{{param["type"]}}}'
            if 'name' in param and param['name']:
                param_str += f' {param["name"]}'
            if 'description' in param and param['description']:
                param_str += f' - {param["description"]}'
            lines.append(param_str)
    
    # Add the returns
    if 'returns' in jsdoc_obj and jsdoc_obj['returns']:
        returns_str = ' * @returns'
        if 'type' in jsdoc_obj['returns'] and jsdoc_obj['returns']['type']:
            returns_str += f' {{{jsdoc_obj["returns"]["type"]}}}'
        if 'description' in jsdoc_obj['returns'] and jsdoc_obj['returns']['description']:
            returns_str += f' {jsdoc_obj["returns"]["description"]}'
        lines.append(returns_str)
    
    # ... (additional sections)
    
    lines.append(' */')
    return '\n'.join(lines)
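
The tripled braces in those f-strings are easy to misread. Inside an f-string, a doubled brace produces one literal brace, so three in a row means a literal brace wrapped around a substitution. A tiny standalone check, with made-up values:

```python
param = {'type': 'Map<string, number>', 'name': 'cache', 'description': 'Lookup table'}

# '{{' and '}}' emit literal braces; the innermost pair performs the substitution
line = f' * @param {{{param["type"]}}} {param["name"]} - {param["description"]}'
print(line)
#  * @param {Map<string, number>} cache - Lookup table
```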

Utility Functions: The Supporting Cast

We also built several utility functions to make working with JSDoc objects easier:

Type Information Extraction

python

def extract_type_info(type_str: str) -> Dict[str, Any]:
    """Extract detailed type information from a JSDoc type string."""
    result = {}
    
    # Check for union types
    if '|' in type_str:
        union_types = [t.strip() for t in type_str.split('|')]
        result['union'] = union_types
        return result
    
    # Check for generics/templates
    generic_match = re.match(r'(\w+)\s*<\s*(.+)\s*>', type_str)
    if generic_match:
        base_type = generic_match.group(1)
        params_str = generic_match.group(2)
        
        # Handle nested generics by counting brackets
        params = []
        current_param = ''
        bracket_level = 0
        
        for char in params_str:
            if char == ',' and bracket_level == 0:
                params.append(current_param.strip())
                current_param = ''
            else:
                if char == '<':
                    bracket_level += 1
                elif char == '>':
                    bracket_level -= 1
                current_param += char
        
        if current_param:
            params.append(current_param.strip())
        
        result['name'] = base_type
        result['params'] = params
        return result
    
    # Simple type
    result['name'] = type_str
    return result
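
One case to watch for: the '|' in type_str check also fires when the union sits inside a generic, so Map<string, number|string> would be split into 'Map<string, number' and 'string>'. A possible refinement, sketched here under the assumption that only bracket depth matters, is to split on separators at depth zero only. The helper split_top_level is hypothetical and not part of the library.

```python
from typing import List

OPENERS = '<{(['
CLOSERS = '>})]'


def split_top_level(type_str: str, sep: str) -> List[str]:
    """Split on sep, ignoring separators nested inside <>, {}, () or []."""
    parts, current, depth = [], '', 0
    for i, char in enumerate(type_str):
        # The '>' of an arrow ('=>') is not a closing bracket
        is_arrow = char == '>' and i > 0 and type_str[i - 1] == '='
        if char in OPENERS:
            depth += 1
        elif char in CLOSERS and not is_arrow:
            depth = max(depth - 1, 0)
        if char == sep and depth == 0:
            parts.append(current.strip())
            current = ''
        else:
            current += char
    parts.append(current.strip())
    return parts


print(split_top_level('Promise<User> | null', '|'))
# ['Promise<User>', 'null']
print(split_top_level('Map<string, number|string>', '|'))
# ['Map<string, number|string>']
print(split_top_level('string, Array<Map<string, number>>', ','))
# ['string', 'Array<Map<string, number>>']
```

The same helper covers both jobs: splitting union members on '|' and splitting generic parameters on ','.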

JSDoc Object Merging

python

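import copy

def merge_jsdoc_objects(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two JSDoc objects, with the overlay taking precedence."""
    # Deep-copy so that updating params below never mutates the caller's base object
    result = copy.deepcopy(base)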
    
    # Merge description
    if 'description' in overlay:
        result['description'] = overlay['description']
    
    # Merge params (with intelligent deduplication)
    if 'params' in overlay:
        if 'params' not in result:
            result['params'] = []
            
        # Create lookup for existing params
        param_lookup = {p['name']: i for i, p in enumerate(result['params'])}
        
        for overlay_param in overlay['params']:
            if overlay_param['name'] in param_lookup:
                # Update existing param
                idx = param_lookup[overlay_param['name']]
                result['params'][idx].update(overlay_param)
            else:
                # Add new param
                result['params'].append(overlay_param)
    
    # ... (additional merging logic)
    
    return result
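
Because params is a list of dictionaries nested inside the JSDoc object, the way the base object is copied decides whether a merge can alter the caller's data. The following standalone snippet, using made-up values, shows the difference between a shallow and a deep copy:

```python
import copy

base = {'description': 'Adds numbers', 'params': [{'name': 'a', 'type': 'number'}]}

# A shallow copy shares the nested list and dicts with the original
shallow = base.copy()
shallow['params'][0]['type'] = 'bigint'
after_shallow = base['params'][0]['type']
print(after_shallow)  # bigint

# Reset, then repeat with a deep copy, which duplicates the nested objects
base['params'][0]['type'] = 'number'
deep = copy.deepcopy(base)
deep['params'][0]['type'] = 'bigint'
after_deep = base['params'][0]['type']
print(after_deep)  # number
```

For a merge function whose result is meant to be independent of its inputs, the deep copy is the safer default. The cost is negligible for objects as small as a single doc comment.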

Command Line Interface

To make the parser easily accessible, we built a command-line interface:

python

def main():
    """Entry point for the command-line interface."""
    parser = argparse.ArgumentParser(description='Parse and compose JSDoc strings')
    subparsers = parser.add_subparsers(dest='command', required=True, help='Command to execute')
    
    # Parse command
    parse_parser = subparsers.add_parser('parse', help='Parse a JSDoc string into a JSON object')
    parse_parser.add_argument('file', type=str, nargs='?', help='File containing a JSDoc string (or use stdin)')
    parse_parser.add_argument('-o', '--output', type=str, help='Output file (default: stdout)')
    
    # Compose command
    compose_parser = subparsers.add_parser('compose', help='Compose a JSDoc string from a JSON object')
    compose_parser.add_argument('file', type=str, nargs='?', help='JSON file containing a JSDoc object (or use stdin)')
    compose_parser.add_argument('-o', '--output', type=str, help='Output file (default: stdout)')
    
    # ... (additional commands)

This allows developers to use the parser from the command line:

bash

# Parse a JSDoc string
echo "/** @param {string} name */" | jsdoc-parser parse

# Compose from JSON
echo '{"params": [{"name": "id", "type": "number"}]}' | jsdoc-parser compose

Comprehensive Testing Strategy

Testing was crucial for ensuring reliability. We implemented several types of tests:

Integration Tests

python

def test_round_trip_complex(self):
    """Test round-trip parsing and composing of a complex JSDoc."""
    original = """/**
 * Calculates the sum of two numbers
 * 
 * @param {number} a - First number
 * @param {number} b - Second number
 * @returns {number} The sum of a and b
 * @throws {TypeError} If a or b are not numbers
 * @example
 * add(1, 2); // returns 3
 * @since v1.0.0
 */"""
    parsed = parse_jsdoc(original)
    composed = compose_jsdoc(parsed)
    reparsed = parse_jsdoc(composed)
    
    # Verify that the essential content is preserved
    self.assertEqual(parsed['description'], reparsed['description'])
    self.assertEqual(len(parsed['params']), len(reparsed['params']))
    
    for i in range(len(parsed['params'])):
        self.assertEqual(parsed['params'][i]['name'], reparsed['params'][i]['name'])
        self.assertEqual(parsed['params'][i]['type'], reparsed['params'][i]['type'])
        self.assertEqual(parsed['params'][i]['description'], reparsed['params'][i]['description'])

Edge Case Testing

We tested numerous edge cases:

Empty JSDoc comments
Malformed type definitions
Nested parameters
Unicode characters
Complex generic types
Parameters with special characters

Real-World Usage Examples

Basic Usage

python

from jsdoc_parser import parse_jsdoc, compose_jsdoc

# Parse a JSDoc string
jsdoc_str = """/**
 * Calculates the sum of two numbers
 * @param {number} a - First number
 * @param {number} b - Second number
 * @returns {number} The sum of a and b
 */"""

parsed = parse_jsdoc(jsdoc_str)
print(parsed)
# Output (reformatted and abridged): {
#     'description': 'Calculates the sum of two numbers',
#     'params': [
#         {'name': 'a', 'type': 'number', 'description': 'First number'},
#         {'name': 'b', 'type': 'number', 'description': 'Second number'}
#     ],
#     'returns': {'type': 'number', 'description': 'The sum of a and b'}
# }

# Modify and recompose
parsed['params'][0]['description'] = 'The first addend'
new_jsdoc = compose_jsdoc(parsed)

Advanced Usage with Complex Types

python

# Handle complex types
complex_jsdoc = """/**
 * Processes user data
 * @param {Map<string, Array<User>>} userGroups - User groups by category
 * @param {Promise<{status: string, data: Object}>} response - API response
 * @returns {Object<string, boolean>} Processing results
 */"""

parsed = parse_jsdoc(complex_jsdoc)
# The parser correctly extracts complex generic types

Challenges and Solutions

Challenge 1: Nested Brace Handling

Problem: Complex types with nested braces like {Map<string, Array<{id: number, name: string}>>} were difficult to parse correctly.

Solution: We implemented a brace-counting algorithm that tracks nesting levels and correctly identifies matching braces.

Challenge 2: Parameter Name Variations

Problem: JSDoc allows various parameter naming conventions:

@param {string} name
@param {string} [name] (optional)
@param {string} [name=default] (with default)
@param {string} options.timeout (nested)

Solution: We used comprehensive regex patterns and conditional logic to handle all variations while maintaining backward compatibility.

Challenge 3: Preserving Examples Formatting

Problem: Code examples need to preserve whitespace and formatting, unlike other JSDoc sections.

Solution: We handle @example tags separately, joining their lines with newlines instead of spaces so that multi-line code stays multi-line:

python

if tag == 'example':
    content_str = '\n'.join(content).strip()
else:
    # For non-example tags, join all lines with spaces
    content_str = ' '.join([line.strip() for line in content if line.strip()]).strip()
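
To make the difference concrete, here is the same branch wrapped in a small function and applied to one list of lines. The name join_tag_content and the sample lines are invented for this illustration.

```python
from typing import List


def join_tag_content(tag: str, content: List[str]) -> str:
    """Join a tag's collected lines: newlines for @example, spaces otherwise."""
    if tag == 'example':
        return '\n'.join(content).strip()
    return ' '.join(line.strip() for line in content if line.strip()).strip()


# The first entry is the (empty) text that followed the tag on its own line
lines = ['', 'const user = createUser("jane");', 'console.log(user.id);']

print(join_tag_content('example', lines))
# const user = createUser("jane");
# console.log(user.id);
print(join_tag_content('returns', lines))
# const user = createUser("jane"); console.log(user.id);
```

Line breaks survive for examples, while every other tag collapses to a single line.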

Challenge 4: Round-Trip Consistency

Problem: Ensuring that compose_jsdoc(parse_jsdoc(original)) produces equivalent output.

Solution: Extensive integration testing and careful handling of edge cases in both parser and composer.

Performance Considerations

The parser is designed for efficiency:

Single-pass parsing: We process each line only once
Lazy evaluation: Complex type parsing only occurs when needed
Memory efficiency: We use generators where possible and avoid unnecessary string copies
Caching: Type information extraction results are cached for repeated use
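
As an illustration of the caching idea only, and not a description of how the library implements it, memoizing a pure string-to-structure function with functools.lru_cache could look like this. The function union_members is a hypothetical stand-in for a type-analysis helper.

```python
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=1024)
def union_members(type_str: str) -> Tuple[str, ...]:
    """Split a type string on '|' and cache the result per distinct string."""
    return tuple(part.strip() for part in type_str.split('|'))


first = union_members('string | number')
second = union_members('string | number')  # served from the cache
print(first)
# ('string', 'number')
print(union_members.cache_info().hits)
# 1
```

Returning a tuple rather than a list or dict matters here: cached values are shared between callers, so they should be immutable.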

Open Source Impact

Since open-sourcing the project, we've seen:

50+ GitHub stars in the first month
Multiple contributors adding support for new JSDoc tags
Integration into CI/CD pipelines for documentation validation
Extensions for TypeScript and Flow type annotations

Future Enhancements

We're actively working on:

TypeScript Support: Better handling of TypeScript-specific type annotations
JSDoc 4.0 Compatibility: Support for newer JSDoc features
Performance Optimizations: Faster parsing for large codebases
IDE Integrations: VS Code extension for real-time JSDoc validation
Documentation Generation: Direct integration with documentation generators

Installation and Getting Started

bash

# Install from PyPI
pip install jsdoc-parser

# Or install from source
git clone https://github.com/Penify-dev/jsdoc-parser.git
cd jsdoc-parser
pip install -e .

Conclusion

Building this JSDoc parser taught us valuable lessons about:

The importance of robust input validation
The complexity of parsing human-readable documentation formats
The value of comprehensive testing for edge cases
The benefits of open-sourcing internal tools

What started as a solution to AI-generated documentation inconsistency has evolved into a comprehensive tool that benefits the entire JavaScript/TypeScript community. The parser now handles edge cases we never anticipated and supports use cases far beyond our original requirements.

The journey from a simple internal tool to an open-source project demonstrates how solving your own problems can create value for others. We encourage you to try the parser, contribute to its development, and share your own experiences with documentation tooling.

Have questions or suggestions? Feel free to open an issue on GitHub or reach out to our team!
