Python with DOCX
SolvedHello,
I am a new user of Python.
I am using Python 3.11 with python-docx-0.8.11, with which I am creating a WORD document containing a table with 2 columns and X rows.
I would like to be able to use the table property "Allow line breaks across pages" to avoid having a line split across multiple pages.
Does anyone know how to access this property?
Thank you
Stéphane
3 réponses
Hello Stéphane,
Some preliminary questions to better understand your problem and what you are trying to achieve.
- Why store a table in a docx file, wouldn't it be more appropriate to create an excel file? If so, we usually use pandas to create a dataframe, and save it in excel format (see here).
- There seems to be a newer version of python-docx, why not use it?
- Have you looked at the python-docx documentation? In particular the API of the Table object and the index?
- Have you started writing a python script? If so, what does it contain?
Good luck
Hello,
I have a lot of images to import into a Word file report.
To do this, I use a hierarchy of directories where I place the images to be imported. My Python development scans the directories and creates the Word file by taking each directory name and assigning it a Heading style with the corresponding hierarchical level and imports all images found into a 2-column table (modifying the dpi and sizes).
I am on a network completely disconnected from the WEB, which is why I am using version 0.8.11 of docx. I don't think it is a version issue; it's just that I don't know how to use Python well.
I looked at the documentation but I am stuck with the function identified on this page (property: AllowBreakAcrossPages)
Here is my current code without the user interface:
My attempt is located at line 110.
# -*- coding: utf8 -*- """ Script file and functional for writing docx documents """ from docx import Document from docx.shared import Inches from docx.enum.text import WD_ALIGN_PARAGRAPH from docx.enum.table import WD_TABLE_ALIGNMENT import PIL from PIL import Image import os import io import time import interface_pour_docx as itf #Supported formats by Pillow: PNG,JPEG,PPM,TIFF,BMP def process_image(image_file,image_loc,table_for_dir_count,cell,dpi=220,inches=3,titres=True): """ Function to process an image file found in a folder Arguments (* for mandatory arguments): *image_file : image in the format opened by Pillow *image_loc : location of the image *table_for_dir_count : counter for the number of images already processed for the part or sub-part *cell : cell in which to add the image dpi : resolution in dots per inch inches : size of the image in the docx (in inches) titres : variable indicating whether to add image titles as captions Returns: The counter of the number of already processed images incremented """ # Retrieve dpi info and crop the image according to the target dpi (w,h) = image_file.size w_obj = inches*dpi h_obj = h*w_obj/w image_out_shape = (int(w_obj),int(h_obj)) resized_img = image_file.resize(image_out_shape) #Save the image as a stream imdata = io.BytesIO() resized_img.save(imdata,format='png') # Add the image to the cell, along with its title p = cell.add_paragraph() p.alignment = WD_ALIGN_PARAGRAPH.CENTER run = p.add_run() run.add_picture(imdata,width=Inches(inches)) if titres: p.add_run(image_loc.split(".")[0].split("/")[-1]) else : p.add_run(" ") # Finally, return the incremented counter of processed images table_for_dir_count+=1 del imdata return table_for_dir_count def explore_dir(document,directory,depth,f_part_count = "",dpi=220,avec_numeros=True,titres=True,niveau_initial=0): """ Function for exploring a directory Arguments (* for mandatory arguments): *document : the output document (docx.Document) *directory : the directory to process *depth : the depth of the title f_part_count : an optional prefix to add for the title of the part (for console display only) dpi : resolution of images in dots per inch avec_numeros : indicates if the titles of subfolders have numbers (e.g., 01 folder-1, 02 folder-2...) titres : variable indicating whether to add image titles as captions niveau_initial : variable indicating the initial processing level Returns: does not return anything """ print(f"Processing part {f_part_count} (folder {directory})") # Recursively explore different folders and subfolders, adding the title as part/sub-part title, and adding the files if depth != niveau_initial and avec_numeros: header = " ".join(directory.split("/")[-1].split(" ")[1:]) else: header = directory.split("/")[-1] heading = document.add_heading(header,level=depth) table_for_dir_count = 0 # Counter for the number of images already added in this part sub_fold_counter = 0 lst = os.listdir(directory) lst.sort() for element in lst: # List everything in the directory in alphabetical order if os.path.isdir(directory+"/"+element): sub_fold_counter += 1 if depth==0: new_f_part_count = f"{sub_fold_counter}." else: new_f_part_count = f_part_count+f"{sub_fold_counter}." # If the element is a directory, search inside recursively explore_dir(document,directory+"/"+element,depth=depth+1,f_part_count=new_f_part_count,dpi=dpi,avec_numeros=avec_numeros,titres=titres,niveau_initial=niveau_initial) else: # Otherwise, try to open it as an image, and if successful, process it try: img = Image.open(directory+"/"+element) # Depending on the number of images already present, add a column or not, and point to the first empty cell if table_for_dir_count==0: table = document.add_table(rows=1,cols=2,style="Table Grid") #table.rows.AllowBreakAcrossPages=False cell = table.rows[0].cells[1] cell.width=Inches(3.15) cell = table.rows[0].cells[0] table.alignment = WD_TABLE_ALIGNMENT.CENTER cell.width=Inches(3.15) elif table_for_dir_count%2==0: cell = table.add_row().cells[0] else: cell = table.rows[-1].cells[1] table_for_dir_count = process_image(img,directory+"/"+element,table_for_dir_count,cell,dpi=dpi,titres=titres) except PIL.UnidentifiedImageError: continue def main(): """ Main function: opens the window to retrieve processing options, creates the document, and starts the exploration script. All that's left is to save it and that's it! """ name,dpi,in_directory,out_directory,dossiers_avec_numeros,titres,niveau_initial = itf.fenetre_analyse_docx() # Creation of the document document = Document() #Exploration explore_dir(document,in_directory,niveau_initial,dpi=dpi,avec_numeros=dossiers_avec_numeros,titres=titres,niveau_initial=niveau_initial) # Save print("Saving document") document.save(out_directory+f"/{name}.docx") print("Done!") time.sleep(1.) if __name__=="__main__": main() I hope I have been clear enough.
Thank you for your attention
Stéphane
Hello,
I think I understand. The only problem is that I don't know docx and therefore I would tend to get inspired by this solution.
That said, I wouldn't have done it like you. Indeed, you could simply create an HTML file, much easier to generate, and include your images there.
Assuming this solution interests you, I am proposing this other solution, you no longer need either docx or PIL. The document will be displayed in a "responsive" manner to the dimensions of your browser, and you won’t have to worry about the inherent page issues of Word documents. Finally, you'll be able to easily print your document from your browser.
#!/usr/bin/env python3 from pathlib import Path def make_html( photos_dir: Path, output_filename: Path, num_columns: int = 2, max_files_per_dir: int = None ): with open(output_filename, "w") as f: subdirs = photos_dir.glob("**/") print( f""" <html> <head> <style> h1 {{ color: blue; }} h2 {{ color: darkgreen; }} h3 {{ color: orange; }} h4 {{ color: red; }} h5 {{ color: purple; }} .grid-container {{ display: grid; grid-template-columns: {"auto " * num_columns}; padding: 2%; width: 100%; }} .grid-item {{ padding: 5%; }} </style> </head> <body>""", file=f ) for subdir in subdirs: n = len(str(subdir).split("/")) print( f""" <h{n}><pre>{subdir}</pre></h{n}> <div class="grid-container">""", file=f ) photos = subdir.glob("**/*jpg") for photo in list(photos)[:max_files_per_dir]: print( f""" <div class="grid-item"> <pre>{photo.name}</pre> <img src="{photo}" width="100%"/> </div>""", file=f ) print(""" </div>""", file=f ) print( """ </body> </html>""", file=f ) def main(): photos_dir = Path.home() / "Photos" output_filename = "toto.html" num_columns = 2 max_files_per_dir = 10 # None --if you want to take all files from each folder make_html(photos_dir, output_filename, num_columns) main() To improve this program, you can use the argparse module so that your executable takes parameters, which will avoid hardcoding them in the main function.
Some remarks about this program:
- The huge advantage of HTML/CSS compared to DOCX is that it is portable.
- In general, you don't necessarily have Word or LibreOffice installed on your computer/tablet/laptop, but you always have a browser.
- Here, my HTML file stores references to the images.
- The advantage is that the HTML file is very small, since the images are not stored within it.
- On the downside, the rendering can only happen in the presence of this image hierarchy.
- If you want to generate a "standalone" file, one solution would be to encode each image in base64 (see here).
- In CSS, the right way to display images is to use a grid.
- The advantage is that you don’t have to worry about any empty cells to add to complete the last row of the table.
- The principle is simple. You define a grid, list the elements without worrying about their number, and the browser takes care of placing and resizing them. The number of columns of the said grid is defined in the style of the grid-container (one "auto" per column). For more details, see here.
- My python code uses f-strings. These have been standard since python 3.6 and significantly lighten the code.
- This allows injecting code (between braces) at the time of creating the string. For example, I generate as many "auto" in the CSS style as the value defined in num_columns.
- On the other hand, it’s necessary to escape the braces by doubling them.
- To simplify the handling of paths, I use pathlib which is a standard module in modern versions of python3.
- My recommendation is to use pathlib as much as possible in python when manipulating paths; it allows for portable code between Windows and Linux and often more elegant.
- If you want to support the program to consider multiple file extensions (e.g., ".jpg", ".png", etc.), you need to slightly adjust the program by drawing inspiration from this message.
Good luck
Hello,
Thank you very much for all these explanations, but I really need to enable the "AllowBreakAcrossPages" property for all my tables contained in the Word file generated by my small development.
The generation of this document with the styles used is inserted into another Word document (copy-paste) which is actually a report that must be in Word format.
I have the name of the table property in Word but I don't know how to set it in Python.
Thanks again for your information, I will continue to dig into it but it's difficult for a beginner :)
Stéphane
Alright, so if the Word format is required, my solution is indeed out the window. So, with python-docx, have you tried this?
Good luck