
The Journey of Building a JSDoc Parser: From AI Documentation Chaos to Open Source Solution

By Suman Saurabh - May 26, 2025

Introduction

In the early days of working with AI-powered documentation tools, one of the most frustrating challenges we encountered was ensuring that generated documentation followed specific patterns and standards. This was particularly acute when working with JSDoc, where consistency, proper formatting, and adherence to established conventions are critical for maintaining code quality and developer experience.

What started as a simple need to validate AI-generated JSDoc comments evolved into a comprehensive parser and composer library that we've now open-sourced. This post chronicles that journey: the technical challenges we faced, the architectural decisions we made, and the fundamental building blocks that power our JSDoc parser.

The Problem: When AI Gets Creative with Documentation

When we first started using LLMs to generate documentation, the results were... interesting. While the content was often accurate, the formatting was inconsistent:

  • Parameters would sometimes lack type information
  • Return statements would be formatted differently across functions
  • Complex types like generics and unions were handled inconsistently
  • Nested parameters and properties were often malformed
  • Examples would be structured differently each time

Here's what we were dealing with:

javascript
/**
 * Some function description
 * @param name The user name
 * @param {Object} options - Configuration object
 * @param options.timeout {number} - Timeout value
 * @returns Promise<User> | null
 * @example
 * createUser("john")
 * @example
createUser("jane", { timeout: 5000 })
 */

Most JSDoc tooling will tolerate a comment like this, but the inconsistency made it difficult to:

  • Parse documentation programmatically
  • Validate documentation completeness
  • Maintain consistent styling across codebases
  • Generate reliable API documentation

The Solution: A Robust JSDoc Parser and Composer

We decided to build a comprehensive solution that could both parse existing JSDoc strings into structured data and compose new JSDoc strings from structured objects. This two-way capability would allow us to:

  1. Parse inconsistent JSDoc into a standardized format
  2. Validate and modify the structured data
  3. Compose it back into properly formatted JSDoc

Core Architecture

Our parser follows a three-stage pipeline:

JSDoc String → Structured Dictionary → JSDoc String
     ↑                   ↓                    ↑
  parse_jsdoc()    [manipulation]    compose_jsdoc()
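
The manipulation stage in the middle is where validation fits. As a minimal sketch, assuming the parsed dictionary uses the `description`, `params`, and `returns` keys that appear in the examples in this post, a completeness check might look like the following. `find_issues` is a hypothetical helper written for illustration, not a function from the library.

```python
from typing import Any, Dict, List


def find_issues(jsdoc_obj: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed JSDoc dictionary.

    Hypothetical helper: the key names mirror the dictionaries shown in
    this post, but the checks themselves are illustrative assumptions.
    """
    issues = []
    if not jsdoc_obj.get('description'):
        issues.append('missing description')
    for param in jsdoc_obj.get('params', []):
        name = param.get('name', '<unnamed>')
        if not param.get('type'):
            issues.append(f'param {name}: missing type')
        if not param.get('description'):
            issues.append(f'param {name}: missing description')
    returns = jsdoc_obj.get('returns')
    if returns is not None and not returns.get('type'):
        issues.append('returns: missing type')
    return issues


# Hand-written dictionary resembling the inconsistent comment shown earlier
doc = {
    'description': 'Some function description',
    'params': [
        {'name': 'name', 'type': None, 'description': 'The user name'},
        {'name': 'options', 'type': 'Object', 'description': 'Configuration object'},
    ],
    'returns': {'type': None, 'description': 'Promise<User> | null'},
}
print(find_issues(doc))
# → ['param name: missing type', 'returns: missing type']
```

Returning a list of messages rather than raising on the first problem lets a caller report every gap in one pass.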

Deep Dive: The Parser Implementation

Let's examine the core components of our parser:

1. Main Parser Function

The heart of our parser is the parse_jsdoc() function:

python
import re
from typing import Any, Dict, List, Optional, Tuple


def parse_jsdoc(docstring: str) -> Dict[str, Any]:
    """Parse a JSDoc string into a structured dictionary."""
    result = {
        'description': '',
        'params': [],
        'returns': None,
        'throws': [],
        'examples': [],
        'tags': {}
    }
    
    # Clean up the docstring
    docstring = docstring.strip()
    
    # Remove the opening and closing markers /** and */
    if docstring.startswith('/**'):
        docstring = docstring[3:]
    if docstring.endswith('*/'):
        docstring = docstring[:-2]
        
    # Split into lines and clean them up
    lines = [line.strip() for line in docstring.split('\n')]
    lines = [re.sub(r'^[ \t]*\*', '', line).strip() for line in lines]
    
    # Process the lines
    current_tag = None
    current_content = []

    for line in lines:
        # Check if the line starts with a tag
        tag_match = re.match(r'^@(\w+)\s*(.*)', line)

        if tag_match:
            # Process the previous tag if there was one
            if current_tag:
                _process_tag(current_tag, current_content, result)

            # Start a new tag
            current_tag = tag_match.group(1)
            current_content = [tag_match.group(2)]
        elif current_tag:
            # Continue with the current tag
            current_content.append(line)
        else:
            # This is part of the description
            if line:
                if result['description']:
                    result['description'] += '\n' + line
                else:
                    result['description'] = line

    # Process the last tag if there was one
    if current_tag:
        _process_tag(current_tag, current_content, result)
    
    return result
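
To make the cleanup step concrete, the sketch below isolates the marker-stripping logic from the function above and runs it on a comment where one line is missing its leading asterisk. `clean_jsdoc_lines` is a name chosen here for illustration; in the excerpt this logic lives inline in `parse_jsdoc()`.

```python
import re
from typing import List


def clean_jsdoc_lines(docstring: str) -> List[str]:
    """Strip the /** and */ markers and one leading '*' from each line."""
    docstring = docstring.strip()
    if docstring.startswith('/**'):
        docstring = docstring[3:]
    if docstring.endswith('*/'):
        docstring = docstring[:-2]
    lines = [line.strip() for line in docstring.split('\n')]
    return [re.sub(r'^[ \t]*\*', '', line).strip() for line in lines]


messy = '''/**
 * Some function description
 * @example
createUser("jane", { timeout: 5000 })
 */'''
print(clean_jsdoc_lines(messy))
# → ['', 'Some function description', '@example',
#    'createUser("jane", { timeout: 5000 })', '']
```

The line without an asterisk passes through unchanged, so it can still be attached to the preceding `@example` tag. The empty strings at both ends come from the lines that held only the comment markers.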

2. Complex Type Extraction

One of the most challenging aspects was handling complex type definitions. We built a robust type extraction system that handles nested braces:

python
def _extract_type_from_braces(content: str) -> Tuple[Optional[str], str]:
    """Extract a type definition from curly braces, handling nested braces."""
    if not content.startswith('{'):
        return None, content
        
    # Count braces to handle nested structures
    brace_count = 0
    for i, char in enumerate(content):
        if char == '{':
            brace_count += 1
        elif char == '}':
            brace_count -= 1
            if brace_count == 0:
                # Found the closing brace
                extracted = content[1:i].strip()
                return None if not extracted else extracted, content[i+1:].strip()
    
    # No matching closing brace found
    return None, content

This function correctly handles complex types like:

  • {Map<string, Array<number>>}
  • {Promise<{id: string, data: Object}>}
  • {function(string, number): boolean}
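
A short, self-contained run shows the return values for a nested type, an unbalanced type, and input with no type at all. The function is restated compactly under the hypothetical name `extract_type` so the snippet runs on its own. It follows the same brace-counting idea as the excerpt above.

```python
from typing import Optional, Tuple


def extract_type(content: str) -> Tuple[Optional[str], str]:
    """Compact restatement of the brace-counting extraction shown above."""
    if not content.startswith('{'):
        return None, content
    depth = 0
    for i, char in enumerate(content):
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                inner = content[1:i].strip()
                return (inner or None), content[i + 1:].strip()
    return None, content  # unbalanced braces: leave the input untouched


print(extract_type('{Promise<{id: string, data: Object}>} response - API response'))
# → ('Promise<{id: string, data: Object}>', 'response - API response')
print(extract_type('{Map<string, Array<number>> broken'))
# → (None, '{Map<string, Array<number>> broken')
print(extract_type('name - no type here'))
# → (None, 'name - no type here')
```

Only curly braces are counted, so angle brackets and parentheses inside the type need no special treatment at this stage.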

3. Parameter Processing

Processing parameters was particularly complex due to the variety of valid JSDoc parameter formats:

python
def _process_tag(tag: str, content: List[str], result: Dict[str, Any]) -> None:
    """Process a JSDoc tag and update the result dictionary."""
    # ... (content joining logic)
    
    if tag == 'param' or tag == 'argument' or tag == 'arg':
        # First extract the type if present using brace matching
        param_type, remaining = _extract_type_from_braces(content_str)
        
        if param_type is not None or content_str.startswith('{'):
            # Parse parameter names with special characters and optional syntax
            param_match = re.match(r'(?:\[)?([a-zA-Z_$][\w$.]*)(?:=([^]]+))?(?:\])?\s*(?:-\s*(.*))?$', remaining)
            
            if param_match:
                param_name = param_match.group(1)
                default_value = param_match.group(2)
                param_desc = param_match.group(3) or ''
                # Detect if parameter is optional (enclosed in [])
                is_optional = bool(re.match(r'^\[([a-zA-Z_$][\w$.]*)(?:=[^]]+)?\]', remaining))
            # ... (more parsing logic)
        
        # Handle nested parameters (properties)
        if '.' in param_name:
            parent_name, nested_name = param_name.split('.', 1)
            # ... (nested parameter logic)

This handles various parameter formats:

  • @param {string} name - Description
  • @param {number} [timeout=5000] - Optional with default
  • @param {Object} options.timeout - Nested property
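
To see what the regular expressions capture for each format, the sketch below applies the two patterns from the excerpt to the text that remains once the `{type}` prefix has been removed. `split_param` is a hypothetical wrapper added for illustration.

```python
import re

# Patterns copied from the excerpt above
PARAM_RE = re.compile(
    r'(?:\[)?([a-zA-Z_$][\w$.]*)(?:=([^]]+))?(?:\])?\s*(?:-\s*(.*))?$'
)
OPTIONAL_RE = re.compile(r'^\[([a-zA-Z_$][\w$.]*)(?:=[^]]+)?\]')


def split_param(remaining: str):
    """Return (name, default, description, is_optional), or None if no match."""
    match = PARAM_RE.match(remaining)
    if match is None:
        return None
    name = match.group(1)
    default = match.group(2)
    description = match.group(3) or ''
    is_optional = bool(OPTIONAL_RE.match(remaining))
    return name, default, description, is_optional


print(split_param('name - Description'))
# → ('name', None, 'Description', False)
print(split_param('[timeout=5000] - Optional with default'))
# → ('timeout', '5000', 'Optional with default', True)
print(split_param('options.timeout - Nested property'))
# → ('options.timeout', None, 'Nested property', False)
print(split_param('name The user name'))
# → None, because this pattern expects a dash before the description
```

The last case shows why the elided fallback logic matters: a description written without the `-` separator does not match this pattern and has to be handled by another branch.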

The Composer: Ensuring Consistent Output

The composer takes our structured data and generates consistently formatted JSDoc:

python
def compose_jsdoc(jsdoc_obj: Dict[str, Any]) -> str:
    """Compose a JSDoc string from a structured dictionary."""
    lines = ['/**']
    
    # Add the description
    if 'description' in jsdoc_obj and jsdoc_obj['description']:
        for line in jsdoc_obj['description'].split('\n'):
            lines.append(f' * {line}')
        lines.append(' *')
    
    # Add the params
    if 'params' in jsdoc_obj:
        for param in jsdoc_obj['params']:
            param_str = ' * @param'
            if 'type' in param and param['type']:
                param_str += f' {{{param["type"]}}}'
            if 'name' in param and param['name']:
                param_str += f' {param["name"]}'
            if 'description' in param and param['description']:
                param_str += f' - {param["description"]}'
            lines.append(param_str)
    
    # Add the returns
    if 'returns' in jsdoc_obj and jsdoc_obj['returns']:
        returns_str = ' * @returns'
        if 'type' in jsdoc_obj['returns'] and jsdoc_obj['returns']['type']:
            returns_str += f' {{{jsdoc_obj["returns"]["type"]}}}'
        if 'description' in jsdoc_obj['returns'] and jsdoc_obj['returns']['description']:
            returns_str += f' {jsdoc_obj["returns"]["description"]}'
        lines.append(returns_str)
    
    # ... (additional sections)
    
    lines.append(' */')
    return '\n'.join(lines)
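
The excerpt above does not show how optional parameters and default values are written back out. One possible approach, sketched here under the assumption that each param dictionary carries `optional` and `default` keys (hypothetical names, not confirmed by the excerpts), is a small helper that renders the name in JSDoc's bracket syntax.

```python
from typing import Any, Dict


def format_param_name(param: Dict[str, Any]) -> str:
    """Render a param name, using [name] or [name=default] when optional.

    Assumes hypothetical 'optional' and 'default' keys on the param dict.
    """
    name = param['name']
    default = param.get('default')
    if default is not None:
        # A default value implies the parameter is optional
        return f'[{name}={default}]'
    if param.get('optional'):
        return f'[{name}]'
    return name


print(format_param_name({'name': 'timeout', 'optional': True, 'default': '5000'}))
# → [timeout=5000]
print(format_param_name({'name': 'name', 'optional': True}))
# → [name]
print(format_param_name({'name': 'id'}))
# → id
```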

Utility Functions: The Supporting Cast

We also built several utility functions to make working with JSDoc objects easier:

Type Information Extraction

python
def extract_type_info(type_str: str) -> Dict[str, Any]:
    """Extract detailed type information from a JSDoc type string."""
    result = {}
    
    # Check for union types
    if '|' in type_str:
        union_types = [t.strip() for t in type_str.split('|')]
        result['union'] = union_types
        return result
    
    # Check for generics/templates
    generic_match = re.match(r'(\w+)\s*<\s*(.+)\s*>', type_str)
    if generic_match:
        base_type = generic_match.group(1)
        params_str = generic_match.group(2)
        
        # Handle nested generics by counting brackets
        params = []
        current_param = ''
        bracket_level = 0
        
        for char in params_str:
            if char == ',' and bracket_level == 0:
                params.append(current_param.strip())
                current_param = ''
            else:
                if char == '<':
                    bracket_level += 1
                elif char == '>':
                    bracket_level -= 1
                current_param += char
        
        if current_param:
            params.append(current_param.strip())
        
        result['name'] = base_type
        result['params'] = params
        return result
    
    # Simple type
    result['name'] = type_str
    return result
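
One caveat with the union check as excerpted: it splits on every `|`, including one nested inside a generic such as `Array<string|number>`. A depth-aware split is one way to handle that case. The sketch below uses a hypothetical helper, `split_top_level`, that is not part of the library.

```python
from typing import List


def split_top_level(type_str: str, separator: str = '|') -> List[str]:
    """Split on `separator` only where it is not nested in <>, {}, () or [].

    Illustrative only: arrow-style function types containing '=>' would
    confuse the bracket counter and are not handled here.
    """
    openers, closers = '<{([', '>})]'
    parts, current, depth = [], '', 0
    for char in type_str:
        if char == separator and depth == 0:
            parts.append(current.strip())
            current = ''
            continue
        if char in openers:
            depth += 1
        elif char in closers:
            depth -= 1
        current += char
    parts.append(current.strip())
    return parts


print(split_top_level('Array<string|number>|null'))
# → ['Array<string|number>', 'null']
print(split_top_level('string, Array<number>', separator=','))
# → ['string', 'Array<number>']
```

The same helper covers both the union split and the generic-parameter split, since the two differ only in the separator.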

JSDoc Object Merging

python
import copy

def merge_jsdoc_objects(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two JSDoc objects, with the overlay taking precedence."""
    # Deep copy so that updating nested params below never mutates `base`
    result = copy.deepcopy(base)
    
    # Merge description
    if 'description' in overlay:
        result['description'] = overlay['description']
    
    # Merge params (with intelligent deduplication)
    if 'params' in overlay:
        if 'params' not in result:
            result['params'] = []
            
        # Create lookup for existing params
        param_lookup = {p['name']: i for i, p in enumerate(result['params'])}
        
        for overlay_param in overlay['params']:
            if overlay_param['name'] in param_lookup:
                # Update existing param
                idx = param_lookup[overlay_param['name']]
                result['params'][idx].update(overlay_param)
            else:
                # Add new param
                result['params'].append(overlay_param)
    
    # ... (additional merging logic)
    
    return result

Command Line Interface

To make the parser easily accessible, we built a command-line interface:

python
import argparse

def main():
    """Entry point for the command-line interface."""
    parser = argparse.ArgumentParser(description='Parse and compose JSDoc strings')
    subparsers = parser.add_subparsers(dest='command', required=True, help='Command to execute')
    
    # Parse command
    parse_parser = subparsers.add_parser('parse', help='Parse a JSDoc string into a JSON object')
    parse_parser.add_argument('file', type=str, nargs='?', help='File containing a JSDoc string (or use stdin)')
    parse_parser.add_argument('-o', '--output', type=str, help='Output file (default: stdout)')
    
    # Compose command
    compose_parser = subparsers.add_parser('compose', help='Compose a JSDoc string from a JSON object')
    compose_parser.add_argument('file', type=str, nargs='?', help='JSON file containing a JSDoc object (or use stdin)')
    compose_parser.add_argument('-o', '--output', type=str, help='Output file (default: stdout)')
    
    # ... (additional commands)

This allows developers to use the parser from the command line:

bash
# Parse a JSDoc string
echo "/** @param {string} name */" | jsdoc-parser parse

# Compose from JSON
echo '{"params": [{"name": "id", "type": "number"}]}' | jsdoc-parser compose
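
Both subcommands accept either a file argument or piped input. The excerpt elides how that choice is made. A minimal sketch, assuming the optional `file` argument is `None` when omitted (which is how `nargs='?'` behaves without a default), could be the following. `read_input` is a hypothetical helper name.

```python
import sys
from typing import Optional


def read_input(path: Optional[str]) -> str:
    """Read the whole input from `path`, or from stdin when no path is given."""
    if path is None:
        # No file argument: consume whatever was piped in
        return sys.stdin.read()
    with open(path, encoding='utf-8') as handle:
        return handle.read()
```

A handler for the `parse` subcommand could then pass `read_input(args.file)` straight to `parse_jsdoc()`.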

Comprehensive Testing Strategy

Testing was crucial for ensuring reliability. We implemented several types of tests:

Integration Tests

python
def test_round_trip_complex(self):
    """Test round-trip parsing and composing of a complex JSDoc."""
    original = """/**
 * Calculates the sum of two numbers
 * 
 * @param {number} a - First number
 * @param {number} b - Second number
 * @returns {number} The sum of a and b
 * @throws {TypeError} If a or b are not numbers
 * @example
 * add(1, 2); // returns 3
 * @since v1.0.0
 */"""
    parsed = parse_jsdoc(original)
    composed = compose_jsdoc(parsed)
    reparsed = parse_jsdoc(composed)
    
    # Verify that the essential content is preserved
    self.assertEqual(parsed['description'], reparsed['description'])
    self.assertEqual(len(parsed['params']), len(reparsed['params']))
    
    for i in range(len(parsed['params'])):
        self.assertEqual(parsed['params'][i]['name'], reparsed['params'][i]['name'])
        self.assertEqual(parsed['params'][i]['type'], reparsed['params'][i]['type'])
        self.assertEqual(parsed['params'][i]['description'], reparsed['params'][i]['description'])

Edge Case Testing

We tested numerous edge cases:

  • Empty JSDoc comments
  • Malformed type definitions
  • Nested parameters
  • Unicode characters
  • Complex generic types
  • Parameters with special characters

Real-World Usage Examples

Basic Usage

python
from jsdoc_parser import parse_jsdoc, compose_jsdoc

# Parse a JSDoc string
jsdoc_str = """/**
 * Calculates the sum of two numbers
 * @param {number} a - First number
 * @param {number} b - Second number
 * @returns {number} The sum of a and b
 */"""

parsed = parse_jsdoc(jsdoc_str)
print(parsed)
# Output (abridged): {
#     'description': 'Calculates the sum of two numbers',
#     'params': [
#         {'name': 'a', 'type': 'number', 'description': 'First number'},
#         {'name': 'b', 'type': 'number', 'description': 'Second number'}
#     ],
#     'returns': {'type': 'number', 'description': 'The sum of a and b'}
# }

# Modify and recompose
parsed['params'][0]['description'] = 'The first addend'
new_jsdoc = compose_jsdoc(parsed)

Advanced Usage with Complex Types

python
# Handle complex types
complex_jsdoc = """/**
 * Processes user data
 * @param {Map<string, Array<User>>} userGroups - User groups by category
 * @param {Promise<{status: string, data: Object}>} response - API response
 * @returns {Object<string, boolean>} Processing results
 */"""

parsed = parse_jsdoc(complex_jsdoc)
# The parser correctly extracts complex generic types

Challenges and Solutions

Challenge 1: Nested Brace Handling

Problem: Complex types with nested braces like {Map<string, Array<{id: number, name: string}>>} were difficult to parse correctly.

Solution: We implemented a brace-counting algorithm that tracks nesting levels and correctly identifies matching braces.

Challenge 2: Parameter Name Variations

Problem: JSDoc allows various parameter naming conventions:

  • @param {string} name
  • @param {string} [name] (optional)
  • @param {string} [name=default] (with default)
  • @param {string} options.timeout (nested)

Solution: We used comprehensive regex patterns and conditional logic to handle all variations while maintaining backward compatibility.

Challenge 3: Preserving Examples Formatting

Problem: Code examples need to preserve whitespace and formatting, unlike other JSDoc sections.

Solution: We implemented special handling for @example tags that preserves original formatting:

python
if tag == 'example':
    content_str = '\n'.join(content).strip()
else:
    # For non-example tags, join all lines with spaces
    content_str = ' '.join([line.strip() for line in content if line.strip()]).strip()
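
The difference between the two branches is easiest to see side by side. The sketch below wraps the same logic in a hypothetical function, `join_tag_content`, and feeds it the kind of line lists the parser collects for a tag. Note that in the excerpts shown, each line is stripped before it reaches this point, so what survives for `@example` is the line breaks rather than the indentation.

```python
from typing import List


def join_tag_content(tag: str, content: List[str]) -> str:
    """Join a tag's collected lines: newlines for @example, spaces otherwise."""
    if tag == 'example':
        return '\n'.join(content).strip()
    return ' '.join(line.strip() for line in content if line.strip()).strip()


example_lines = ['', 'const user = createUser("jane");', 'console.log(user.id);']
print(repr(join_tag_content('example', example_lines)))
# → 'const user = createUser("jane");\nconsole.log(user.id);'

returns_lines = ['{number} The sum', 'of a and b', '']
print(join_tag_content('returns', returns_lines))
# → {number} The sum of a and b
```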

Challenge 4: Round-Trip Consistency

Problem: Ensuring that compose_jsdoc(parse_jsdoc(original)) produces equivalent output.

Solution: Extensive integration testing and careful handling of edge cases in both parser and composer.

Performance Considerations

The parser is designed for efficiency:

  1. Single-pass parsing: We process each line only once
  2. Lazy evaluation: Complex type parsing only occurs when needed
  3. Memory efficiency: We use generators where possible and avoid unnecessary string copies
  4. Caching: Type information extraction results are cached for repeated use

Open Source Impact

Since open-sourcing the project, we've seen:

  • 50+ GitHub stars in the first month
  • Multiple contributors adding support for new JSDoc tags
  • Integration into CI/CD pipelines for documentation validation
  • Extensions for TypeScript and Flow type annotations

Future Enhancements

We're actively working on:

  1. TypeScript Support: Better handling of TypeScript-specific type annotations
  2. JSDoc 4.0 Compatibility: Support for newer JSDoc features
  3. Performance Optimizations: Faster parsing for large codebases
  4. IDE Integrations: VS Code extension for real-time JSDoc validation
  5. Documentation Generation: Direct integration with documentation generators

Installation and Getting Started

bash
# Install from PyPI
pip install jsdoc-parser

# Or install from source
git clone https://github.com/Penify-dev/jsdoc-parser.git
cd jsdoc-parser
pip install -e .

Conclusion

Building this JSDoc parser taught us valuable lessons about:

  • The importance of robust input validation
  • The complexity of parsing human-readable documentation formats
  • The value of comprehensive testing for edge cases
  • The benefits of open-sourcing internal tools

What started as a solution to AI-generated documentation inconsistency has evolved into a comprehensive tool that benefits the entire JavaScript/TypeScript community. The parser now handles edge cases we never anticipated and supports use cases far beyond our original requirements.

The journey from a simple internal tool to an open-source project demonstrates how solving your own problems can create value for others. We encourage you to try the parser, contribute to its development, and share your own experiences with documentation tooling.


Have questions or suggestions? Feel free to open an issue on GitHub or reach out to our team!