Document objects
The main Document and related objects.
Document objects
-
class docx.document.Document
WordprocessingML (WML) document.
Not intended to be constructed directly. Use docx.Document() to open or create a document.
-
add_comment(runs: Run | Sequence[Run], text: str | None = '', author: str = '', initials: str | None = '') → Comment
Add a comment to the document, anchored to the specified runs.
runs can be a single Run object or a non-empty sequence of Run objects. Only the first and last run of a sequence are used; passing a whole sequence is simply more convenient when that is what you have on hand, such as paragraph.runs. When runs contains a single Run object, that run serves as both the first and last run.
A comment can be anchored only on run boundaries, meaning the text the comment “references” must consist of one or more whole, consecutive runs. The runs need not be contiguous within a single paragraph (the first can be in one paragraph and the last in the next, for example), but all runs between the first and the last are included in the reference.
The comment reference range is delimited by placing a w:commentRangeStart element before the first run and a w:commentRangeEnd element after the last run. This is why only the first and last run are required and why a single run can serve as both first and last. Word works out which text to highlight in the UI based on these range markers.
text allows the contents of a simple comment to be provided in the call, covering the common case where a comment is a single phrase or sentence without special formatting such as bold or italics. More complex comments can be built using the returned Comment object in much the same way as a Document or (table) Cell object, using methods like .add_paragraph(), .add_run(), etc.
The author and initials parameters allow that metadata to be set for the comment. author is a required attribute on a comment and is the empty string by default. initials is optional on a comment and may be omitted by passing None, but Word adds an initials attribute by default and we follow that convention by using the empty string when no initials argument is provided.
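The range-marker placement described above can be illustrated without python-docx at all. The following is only a sketch built on the standard library's ElementTree: the helper `mark_comment_range` is hypothetical (it is not part of python-docx), it assumes both runs sit in the same paragraph, and it omits the comment-reference run and the comment part that a complete document would also need.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, conventionally bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the namespace-qualified (Clark notation) name for a w: tag."""
    return "{%s}%s" % (W_NS, tag)


def mark_comment_range(paragraph, first_run, last_run, comment_id):
    """Hypothetical helper: bracket first_run..last_run with range markers.

    Assumes both runs are direct children of the same paragraph element.
    """
    children = list(paragraph)
    start = ET.Element(w("commentRangeStart"), {w("id"): str(comment_id)})
    end = ET.Element(w("commentRangeEnd"), {w("id"): str(comment_id)})
    # Insert the end marker first so the position of first_run is unaffected.
    paragraph.insert(children.index(last_run) + 1, end)
    paragraph.insert(children.index(first_run), start)


# Build a paragraph holding three runs of text.
paragraph = ET.Element(w("p"))
runs = []
for text in ("Alpha ", "beta ", "gamma"):
    run = ET.SubElement(paragraph, w("r"))
    ET.SubElement(run, w("t")).text = text
    runs.append(run)

# Only the first and last run determine where the markers go.
mark_comment_range(paragraph, runs[0], runs[-1], comment_id=0)
tags = [child.tag.split("}")[1] for child in paragraph]
print(tags)
```

Passing the same run as both `first_run` and `last_run` brackets that single run, which mirrors the single-Run case described above.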
-
add_heading(text: str = '', level: int = 1)
Return a heading paragraph newly added to the end of the document.
The heading paragraph will contain text and have its paragraph style determined by level. If level is 0, the style is set to Title. If level is 1 (or omitted), Heading 1 is used. Otherwise the style is set to Heading {level}. Raises ValueError if level is outside the range 0-9.
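The level-to-style rule can be restated as a small function. This is a sketch of the rule as documented above, not the library's own code, and `heading_style_name` is a hypothetical name chosen for illustration.

```python
def heading_style_name(level=1):
    """Return the paragraph style name that the rule above selects for level."""
    if not 0 <= level <= 9:
        raise ValueError("level must be in range 0-9, got %d" % level)
    # Level 0 maps to the Title style; every other level maps to "Heading N".
    return "Title" if level == 0 else "Heading %d" % level


style_names = [heading_style_name(level) for level in (0, 1, 3)]
print(style_names)
```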
-
add_paragraph(text: str = '', style: str | ParagraphStyle | None = None) → Paragraph
Return a paragraph newly added to the end of the document.
The paragraph is populated with text and has paragraph style style.
text can contain tab (\t) characters, which are converted to the appropriate XML form for a tab. text can also include newline (\n) or carriage return (\r) characters, each of which is converted to a line break.
-
add_picture(image_path_or_stream: str | IO[bytes], width: int | Length | None = None, height: int | Length | None = None)
Return a new picture shape added in its own paragraph at the end of the document.
The picture contains the image at image_path_or_stream, scaled based on width and height. If neither width nor height is specified, the picture appears at its native size. If only one is specified, it is used to compute a scaling factor that is then applied to the unspecified dimension, preserving the aspect ratio of the image. The native size of the picture is calculated using the dots-per-inch (dpi) value specified in the image file, defaulting to 72 dpi if no value is specified, as is often the case.
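The sizing rule above amounts to simple arithmetic. The sketch below works in English Metric Units (EMU, 914400 per inch), the unit python-docx lengths are expressed in. The function names are hypothetical and the rounding is an assumption for illustration, not necessarily what the library does internally.

```python
EMU_PER_INCH = 914400  # English Metric Units per inch
DEFAULT_DPI = 72       # used when the image file specifies no dpi


def native_size_emu(px_width, px_height, dpi=None):
    """Return the (width, height) in EMU of an image at its native size."""
    dpi = dpi or DEFAULT_DPI
    return (
        round(px_width / dpi * EMU_PER_INCH),
        round(px_height / dpi * EMU_PER_INCH),
    )


def scaled_size_emu(native, width=None, height=None):
    """Apply the width/height rule: fill in a missing dimension proportionally."""
    native_width, native_height = native
    if width is None and height is None:
        return native_width, native_height
    if height is None:
        return width, round(native_height * width / native_width)
    if width is None:
        return round(native_width * height / native_height), height
    return width, height


# A 1440 x 720 pixel image with no dpi recorded is 20 x 10 inches natively.
native = native_size_emu(1440, 720)
# Requesting a 5 inch width scales the height to 2.5 inches.
scaled = scaled_size_emu(native, width=5 * EMU_PER_INCH)
print(native, scaled)
```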
-
add_section(start_type: docx.enum.section.WD_SECTION_START = <WD_SECTION_START.NEW_PAGE: 2>)
Return a Section object newly added at the end of the document.
The optional start_type argument must be a member of the WD_SECTION_START enumeration, and defaults to WD_SECTION.NEW_PAGE if not provided.
-
add_table(rows: int, cols: int, style: str | _TableStyle | None = None)
Add a table having row and column counts of rows and cols respectively.
style may be a table style object or a table style name. If style is None, the table inherits the default table style of the document.
-
core_properties
A CoreProperties object providing Dublin Core properties of the document.
-
inline_shapes
The InlineShapes collection for this document.
An inline shape is a graphical object, such as a picture, contained in a run of text and behaving like a character glyph, being flowed like other text in a paragraph.
-
iter_inner_content() → Iterator[Paragraph | Table]
Generate each Paragraph or Table in this document in document order.
-
paragraphs
The Paragraph instances in the document, in document order.
Note that paragraphs within revision marks such as <w:ins> or <w:del> do not appear in this list.
-
part
The DocumentPart object of this document.
-
CoreProperties objects
Each Document object provides access to its CoreProperties object via its
core_properties attribute. A CoreProperties object provides
read/write access to the so-called core properties for the document. The
core properties are author, category, comments, content_status, created,
identifier, keywords, language, last_modified_by, last_printed, modified,
revision, subject, title, and version.
Each property is one of three types: str, datetime.datetime, or int. String
properties are limited in length to 255 characters and return an empty string
(‘’) if not set. Date properties are assigned and returned as datetime.datetime
objects without timezone, i.e. in UTC. Any timezone conversions are the
responsibility of the client. Date properties return None if not set.
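Because date properties carry no timezone and are taken to be UTC, a timezone-aware value needs converting before it is assigned. The sketch below uses only the standard library. `to_naive_utc` is a hypothetical helper, and treating an already-naive value as UTC is an assumption that follows the convention described above.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(moment):
    """Return moment as a timezone-naive datetime expressed in UTC."""
    if moment.tzinfo is None:
        # Assumption: a naive value is already in UTC.
        return moment
    return moment.astimezone(timezone.utc).replace(tzinfo=None)


# 09:30 at UTC-05:00 is 14:30 UTC.
local = datetime(2024, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
created = to_naive_utc(local)
print(created)
```

The resulting value is suitable for assigning to a date property such as created or modified.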
python-docx does not automatically set any of the document core properties other
than to add a core properties part to a document that doesn’t have one
(very uncommon). If python-docx adds a core properties part, it contains default
values for the title, last_modified_by, revision, and modified properties.
Client code should update properties like revision and last_modified_by
if that behavior is desired.
-
class docx.opc.coreprops.CoreProperties
-
author
string – An entity primarily responsible for making the content of the resource.
-
category
string – A categorization of the content of this package. Example values might include: Resume, Letter, Financial Forecast, Proposal, or Technical Presentation.
-
comments
string – An account of the content of the resource.
-
content_status
string – completion status of the document, e.g. ‘draft’
-
created
datetime – time of initial creation of the document
-
identifier
string – An unambiguous reference to the resource within a given context, e.g. ISBN.
-
keywords
string – descriptive words or short phrases likely to be used as search terms for this document
-
language
string – language the document is written in
-
last_modified_by
string – name or other identifier (such as email address) of person who last modified the document
-
last_printed
datetime – time the document was last printed
-
modified
datetime – time the document was last modified
-
revision
int – number of this revision, incremented by Word each time the document is saved. Note, however, that python-docx does not automatically increment the revision number when it saves a document.
-
subject
string – The topic of the content of the resource.
-
title
string – The name given to the resource.
-
version
string – free-form version string